I'm trying to fetch the content of a web page with the Spring Framework, using this method:

public <T> HttpReply<T> httpRequest(final String uri, final HttpMethod method,
        final Class<T> expectedReturnType, final List<HttpMessageConverter<?>> messageConverters,
        final HashMap<String, Object> formValues, final HashMap<String, Object> headers)
        throws HttpNullUriOrMethodException, HttpInvocationException {
    try {
        redirectInfo.set(new AbstractMap.SimpleEntry<String, String>(uri, ""));
        if (method == null) {
            throw new HttpNullUriOrMethodException("HttpMethod cannot be null.");
        }
        if (!StringUtils.hasText(uri)) {
            throw new HttpNullUriOrMethodException("URI cannot be null or empty.");
        }
        HttpRequestExecutingMessageHandler handler =
                buildMessageHandler(uri, method, expectedReturnType, messageConverters);
        // Default queue for reply
        QueueChannel replyChannel = new QueueChannel();
        handler.setOutputChannel(replyChannel);
        // Exec Http Request
        Message<?> message = buildMessage(formValues, headers);
        try {
            handler.handleMessage(message);
        }
        catch (Exception e) {
            throw new HttpInvocationException("Error Handling HTTP Message.");
        }
        // Get Response
        Message<?> response = replyChannel.receive();
        if (response == null) {
            throw new HttpInvocationException("Error: communication is interrupted.");
        }
        // Read response Headers
        String[] usefulHeaders = readUsefulHeaders(response.getHeaders());
        // Return payload
        Object respObj = response.getPayload();
        if (expectedReturnType != null && !expectedReturnType.isInstance(respObj)) {
            // expectedReturnType is already a Class, so getName() is called on it directly
            throw new HttpInvocationException("Error: response payload is instance of "
                    + respObj.getClass().getName() + ". Expected: " + expectedReturnType.getName());
        }
        HttpReply<T> retVal = new HttpReply<>();
        retVal.setPayload((T) respObj);
        String valRedirect = uri;
        if (redirectInfo.get().getKey().equals(uri)) {
            if (StringUtils.hasText(redirectInfo.get().getValue())) {
                valRedirect = redirectInfo.get().getValue();
            }
        }
        else {
            throw new HttpInvocationException("ERROR READING REDIRECT INFORMATION!!! Original URI: "
                    + uri + " - FOUND URI: " + redirectInfo.get().getKey());
        }
        retVal.setActualLocation(valRedirect);
        return retVal;
    }
    finally {
        redirectInfo.remove();
    }
}

It is called like this:

HttpReply<byte[]> feedContent = httpUtil.httpRequest(rssFeed.getUrl(), HttpMethod.GET, byte[].class, null,
 null, null);
rawXml = new String(feedContent.getPayload());

This method works fine, except that rawXml sometimes contains � (U+FFFD, the Unicode replacement character), especially when the page uses a charset other than UTF-8.

I tried calling handler.setCharset(StandardCharsets.ISO_8859_1) and, separately, changing the message header so that it contains "contentType=application/xml; charset=ISO-8859-1", without success.

I also tried to convert the text after it has been stored in rawXml, but sometimes the message is neither UTF-8 nor ISO-8859-1, so the conversion does not restore the missing characters.
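
For illustration, here is a minimal sketch of how this symptom can arise, assuming the payload bytes are intact and the damage happens when they are turned into a String. The class name ReplacementCharDemo and the decode helper are hypothetical, not part of the code above. Note that new String(byte[]) with no charset argument uses the platform default charset, and malformed input is replaced with U+FFFD during decoding, so a later conversion cannot recover the original characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original code.
class ReplacementCharDemo {

    // Decodes bytes with an explicit charset instead of the platform default.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "è" (U+00E8) encoded as ISO-8859-1 is the single byte 0xE8.
        byte[] latin1 = "caff\u00e8".getBytes(StandardCharsets.ISO_8859_1);

        // Wrong charset: 0xE8 is not a valid UTF-8 sequence on its own,
        // so the decoder substitutes U+FFFD and the original byte is lost.
        String wrong = decode(latin1, StandardCharsets.UTF_8);

        // Right charset: the character survives.
        String right = decode(latin1, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("\uFFFD"));   // true
        System.out.println(right.endsWith("\u00e8"));   // true
    }
}
```

If this is what is happening, the charset has to be chosen before the bytes are decoded, not afterwards.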

asked Dec 28, 2022 at 15:34
  • Welcome to the world of character sets. They are terribly difficult to get right. Hopefully the server tells you what character set it is using and then you use that character set to read your XML. However, not all character sets play nicely with the expected character ranges. One that is notorious for this is the Microsoft character set that uses control characters, which has caused all sorts of problems. I would recommend looking for those control characters in your returned XML and replacing them with suitable substitutes. alanwood.net/demos/ansi.html Commented Dec 28, 2022 at 16:47
  • The characters that get changed into � are the classic àòèéìù, "", «» and SOMETIMES '. The problem comes from the fact that the retVal is an array of bytes that already gets populated with the wrong character. Commented Jan 2, 2023 at 14:31
  • So, I am not totally sure what your question/goal is... Are you trying to convert the square question mark into the correct characters? Do you want to reject any xml that isn't utf-8? Do you want to accept any character set and then change it into utf-8 as part of the response? My guess is that you want to convert the characters, but that can sometimes be difficult. First and foremost, I would urge you to figure out the code points of the characters that are messed up, then determine what those values should actually be. stackoverflow.com/q/23979676/42962 Commented Jan 3, 2023 at 17:23
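
The suggestion in the comments (use the charset the server declares) could be sketched roughly as follows. This is only a sketch under stated assumptions: XmlCharsetSniffer, detect and decode are hypothetical names, the contentType string is assumed to be read from the response headers by the caller, and the windows-1252 fallback is a guess based on the quote characters mentioned above, which ISO-8859-1 does not contain. It does not handle a byte order mark or UTF-16 documents.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code.
class XmlCharsetSniffer {

    // Matches the charset parameter of a Content-Type value, e.g. "application/xml; charset=UTF-8".
    private static final Pattern CONTENT_TYPE_CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern XML_DECL_ENCODING =
            Pattern.compile("<\\?xml[^>]*encoding\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Picks a charset: Content-Type header first, then the XML declaration, then the fallback.
    static Charset detect(String contentType, byte[] body, Charset fallback) {
        Charset fromHeader = lookup(CONTENT_TYPE_CHARSET, contentType);
        if (fromHeader != null) {
            return fromHeader;
        }
        // ISO-8859-1 maps every byte to one char, so the ASCII declaration can be read safely.
        int len = Math.min(body.length, 200);
        String head = new String(body, 0, len, StandardCharsets.ISO_8859_1);
        Charset fromDeclaration = lookup(XML_DECL_ENCODING, head);
        return fromDeclaration != null ? fromDeclaration : fallback;
    }

    private static Charset lookup(Pattern pattern, String text) {
        if (text == null) {
            return null;
        }
        Matcher m = pattern.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Illegal or unsupported charset name: treat it as "not declared".
            return null;
        }
    }

    // Decodes the raw bytes with the detected charset (assumed fallback: windows-1252).
    static String decode(String contentType, byte[] body) {
        return new String(body, detect(contentType, body, Charset.forName("windows-1252")));
    }
}
```

With something like this, the call site would keep requesting byte[].class and replace new String(feedContent.getPayload()) with a decode call that passes the Content-Type value along with the bytes.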
