I'm calling a service that returns an error telling me there is an encoding problem with the following String:
Universal®
My understanding is that this String is UTF-8 encoded. Is that a correct understanding of UTF-8 encoding? If so, does this mean I should remove the UTF-8 encoding, and how can I decode a UTF-8 String in Java?
Or am I wrong, and the above String is not UTF-8 encoded? If so, how can I encode it?
How Java stores the string isn't the same as how it is encoded in messages. You can try something like:
import java.nio.charset.Charset;
String s = "Universal®";
byte[] encoded = s.getBytes(Charset.forName("UTF-8")); // the UTF-8 bytes to send
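Charset.forName can throw UnsupportedCharsetException, but that exception is unchecked, so you are not forced to catch it. In any case, UTF-8 is one of the standard charsets every Java platform must support, so it will not be thrown here.
You may also need to declare the encoding in the API you use to send the data, for example with the HTTP header Content-Type: text/plain; charset=UTF-8.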
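As a minimal sketch of declaring the charset on the sending side, assuming the service is an HTTP endpoint and Java 11+ (for java.net.http), the request below carries the UTF-8 bytes of the string and says so in its Content-Type header. The URL and the class name are hypothetical placeholders; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class Utf8Request {
    // Hypothetical endpoint: replace with the real service URL.
    static final URI SERVICE = URI.create("https://example.com/service");

    // Builds a POST whose body is the UTF-8 encoding of `text`
    // and whose Content-Type header declares that same encoding.
    static HttpRequest buildRequest(String text) {
        return HttpRequest.newBuilder(SERVICE)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("Universal\u00AE");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
        // To actually send it (needs network access and a real endpoint):
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

The important part is that the charset used to produce the body bytes and the charset named in the header are the same; if the service documents a different expected encoding, both should change together.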
Use StandardCharsets.UTF_8 instead of Charset.forName("UTF-8").
"Universal®", where ® is U+00AE, cannot be represented in plain 7-bit ASCII, though it can in several other charsets/encodings. UTF-8, the universal Unicode encoding, can mix any script.
Text only has an encoding once it has been converted to bytes; you need those bytes to be able to say which encoding it is in.
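To make that concrete, here is a small sketch (class and helper names are illustrative) that converts the same String with three standard charsets and prints the resulting bytes in hex, or reports that the charset cannot represent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Formats bytes as space-separated two-digit hex values, e.g. "C2 AE".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Universal\u00AE"; // the registered sign is U+00AE
        Charset[] charsets = {
            StandardCharsets.US_ASCII, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8
        };
        for (Charset cs : charsets) {
            // canEncode checks whether every character has a mapping in this charset.
            if (cs.newEncoder().canEncode(s)) {
                System.out.println(cs.name() + ": " + hex(s.getBytes(cs)));
            } else {
                System.out.println(cs.name() + ": cannot encode");
            }
        }
    }
}
```

With this input, US-ASCII reports that it cannot encode the string, ISO-8859-1 ends in the single byte AE, and UTF-8 ends in the two bytes C2 AE: the same text, three different byte sequences.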
In Java, a String is Unicode internally and can hold any character.
The encoding of a Java source file, however, is not fixed: it must be the encoding the Java compiler javac is told to use (for example with its -encoding option). To sidestep that, you can use \u escapes, which represent special characters in plain ASCII by their UTF-16 code unit:
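import java.nio.charset.StandardCharsets;
String s = "Universal\u00AE";                         // the registered sign as a Unicode escape
byte[] bytes = s.getBytes(StandardCharsets.UTF_8);     // encode: String to UTF-8 bytes
String t = new String(bytes, StandardCharsets.UTF_8);  // decode: UTF-8 bytes to String
assert t.equals(s);                                    // round trip (checked only with java -ea)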
In a very general sense, an encoding is just the mapping of characters to the bits used to represent them. See the link below for more detailed information. Text can usually be converted from one encoding to another, but there are exceptions: not every encoding can represent every character. You have probably seen the blank squares or similar placeholders shown for a symbol that cannot be displayed. This is often caused by an encoding error (such as the character not existing in that encoding scheme).
https://en.wikipedia.org/wiki/UTF-8
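A short sketch of both failure modes described above, using only the JDK (class and method names are illustrative): encoding to a charset that lacks a character silently substitutes a replacement byte, and decoding bytes with the wrong charset can produce U+FFFD, the replacement character that many UIs draw as a box or a question mark.

```java
import java.nio.charset.StandardCharsets;

public class LossyConversion {
    // Encodes to US-ASCII and back. getBytes(Charset) silently replaces every
    // character the charset cannot represent with its replacement byte ('?' here).
    static String viaAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Encodes as ISO-8859-1 but decodes as UTF-8. The lone byte 0xAE is not
    // valid UTF-8, so the decoder substitutes U+FFFD, the replacement character.
    static String latin1ReadAsUtf8(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Universal\u00AE";
        System.out.println(viaAscii(s));                                   // Universal?
        System.out.println(latin1ReadAsUtf8(s).equals("Universal\uFFFD")); // true
    }
}
```

Neither call throws, which is why such problems often surface later, at the receiving end, rather than where the conversion happened.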
As for your specific problem, that string should be encodable in UTF-8. It may have been saved in another encoding, which could cause your issue. You could try converting it to UTF-8 and see what happens.
Edit: Regarding the comments, I expect the issue is that the string is not encoded properly before it is sent to the service.
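To illustrate what "not encoded properly" can look like, here is a sketch of one common mismatch, assuming (hypothetically) that the sender writes UTF-8 but the receiver decodes as ISO-8859-1; the actual service may behave differently. The two UTF-8 bytes of ® turn into two characters, and the mistake can be undone only because this particular mismatch loses no bytes.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates a receiver that decodes UTF-8 bytes as ISO-8859-1 (a mismatch).
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);      // 0xC2 0xAE for the sign
        return new String(utf8, StandardCharsets.ISO_8859_1); // each byte becomes one char
    }

    // Reverses that specific mistake: recover the original bytes, then decode correctly.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Universal\u00AE";
        String garbled = misdecode(s);
        System.out.println(garbled.equals("Universal\u00C2\u00AE")); // true
        System.out.println(repair(garbled).equals(s));               // true
    }
}
```

The comparisons use equals rather than printing the garbled text, because what a console shows also depends on its own encoding.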
A quick look here: http://www.utf8-chartable.de/ (and we should know that without looking, people) shows that ® (U+00AE) is a character UTF-8 can encode. So I don't know which framework complains about it not being one, but it's wrong.
As for the String object: internally, the String is not encoded in UTF-8. It is encoded in UTF-16. That's largely irrelevant, though: the issue is all about how you transfer the string data to the service you are trying to call, and how that service expects you to do so. Apparently those are mismatched.
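A small sketch of what "UTF-16 internally" means in practice: the String API counts UTF-16 code units, while the byte count only exists once you pick an encoding. (Since Java 9 the JVM may store Latin-1-only strings more compactly, but that is invisible through the API.) The class name and the choice of a second sample character, U+1D11E (MUSICAL SYMBOL G CLEF), are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Internals {
    // Returns {chars, code points, UTF-16BE bytes, UTF-8 bytes} for a string.
    static int[] sizes(String s) {
        return new int[] {
            s.length(),                                   // UTF-16 code units (chars)
            s.codePointCount(0, s.length()),              // Unicode characters (code points)
            s.getBytes(StandardCharsets.UTF_16BE).length, // bytes as UTF-16, big-endian, no BOM
            s.getBytes(StandardCharsets.UTF_8).length     // bytes as UTF-8
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes("Universal\u00AE"))); // [10, 10, 20, 11]

        // A character outside the Basic Multilingual Plane needs two chars (a surrogate pair).
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(Arrays.toString(sizes(clef)));              // [2, 1, 4, 4]
    }
}
```

So "Universal®" is 10 chars in memory regardless of how it will be sent; whether the service receives 11 bytes (UTF-8), 10 bytes (ISO-8859-1), or something else depends entirely on the conversion done when transferring it.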