Better code for converting a char to its UTF-8 percent encoding representation?

Question 1

This is working code for a URI Template (RFC 6570) implementation. When the character to render is not within a specific character set, the code has to take the UTF-8 representation of that character and encode each byte as a percent-encoded sequence.

The question is whether something more efficient than this exists.

private static String encodeChar(final char c)
{
    final String tmp = new String(new char[] { c });
    final byte[] bytes = tmp.getBytes(Charset.forName("UTF-8"));
    final StringBuilder sb = new StringBuilder();
    for (final byte b: bytes)
        sb.append('%').append(UnsignedBytes.toString(b, 16));
    return sb.toString();
}

Note: UnsignedBytes.toString() is from Guava.
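
One thing worth checking in the code above: if UnsignedBytes.toString(b, 16) behaves like Integer.toString (an assumption, not verified against the Guava documentation here), it would neither zero-pad nor upper-case its output, so a byte such as 0x09 would come out as "%9" rather than "%09". The following is a minimal sketch of one way to avoid that without Guava, using a hex lookup table. The class name PercentEncodeCharSketch and the HEX table are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodeCharSketch {
    // Upper-case hex digits, indexed by nibble value (hypothetical helper table).
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes every UTF-8 byte of a single char, always with two hex digits.
    static String encodeChar(final char c) {
        final byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        final StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (final byte b : bytes) {
            sb.append('%')
              .append(HEX[(b >> 4) & 0x0F])  // high nibble
              .append(HEX[b & 0x0F]);        // low nibble
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(encodeChar('\u00e9')); // e with acute accent: %C3%A9
        System.out.println(encodeChar('\t'));     // tab: %09
    }
}
```

A single char cannot hold a character outside the Basic Multilingual Plane, so a method with this signature cannot encode half of a surrogate pair correctly. Working on code points rather than chars would avoid that.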

Question 2

What kind of performance do you need? Is this just a general question? Did you try essentially the same thing on the whole string and see if it was faster?

Question 3

Doing it on a whole string is a no-no. Only the characters outside a specified character set need to be percent-encoded this way. What I am asking is whether a better way to achieve such an encoding of a single character exists. For reference, see RFC 6570, section 1.5.

Question 4

Of course, on a whole string it would skip most characters - I don't know how easy it would be with getBytes. (You didn't answer my first two questions.)

Question 5

It is not a question of performance; the problem is that I think the code is a little clumsy as it stands. In this sense it is more of a general question. The main encoding function uses a CharMatcher to determine whether the current character in a string should be percent encoded or not.
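
For illustration, a main loop of that shape might look like the sketch below. The actual CharMatcher is not shown in this thread, so a plain method, isUnreserved, stands in for it. It accepts only the unreserved set listed in RFC 6570 section 1.5 (ALPHA, DIGIT, "-", ".", "_", "~"), which is an assumption about which set is wanted. The class and method names are hypothetical. The loop walks code points rather than chars so that surrogate pairs are encoded as one character.

```java
import java.nio.charset.StandardCharsets;

final class TemplateEncodingSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Stand-in for the CharMatcher: true for ALPHA / DIGIT / "-" / "." / "_" / "~".
    static boolean isUnreserved(final int cp) {
        return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z')
            || (cp >= '0' && cp <= '9')
            || cp == '-' || cp == '.' || cp == '_' || cp == '~';
    }

    // Copies unreserved characters as-is and percent-encodes the UTF-8 bytes of the rest.
    static String encode(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            final int cp = input.codePointAt(i);
            final int len = Character.charCount(cp); // 2 for a surrogate pair, else 1
            if (isUnreserved(cp)) {
                sb.append((char) cp); // safe cast: unreserved characters are ASCII
            } else {
                final byte[] bytes =
                    input.substring(i, i + len).getBytes(StandardCharsets.UTF_8);
                for (final byte b : bytes) {
                    sb.append('%').append(HEX[(b >> 4) & 0x0F]).append(HEX[b & 0x0F]);
                }
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        System.out.println(encode("a b~\u00e9")); // a%20b~%C3%A9
    }
}
```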

Question 6

Documentation here

public static void main(final String[] args) {
    try {
        System.out.println(URLEncoder.encode("a?>éàùA-Z",
            System.getProperty("sun.jnu.encoding")));
    } catch (final UnsupportedEncodingException e) {
        System.out.println("BIG PB");
        return;
    }
}

Output:

a%3F%3E%E9%E0%F9A-Z

The same exists for decoding: URLDecoder.

EDIT

For URI, found in the source of the iMeMex Dataspace Management System:

public static URI encode(String uriStr) throws URISyntaxException {
    if (uriStr != null) {
        // poor man's encoding
        uriStr = uriStr.replaceAll(" ", "%20");
    }
    return new URI(uriStr);
}

JavaDoc here

Note, however, that Java's URI class does not change any other %XX values.

Question 7

This does not fit the bill. For instance, it replaces spaces with '+', and the encoded character list differs from what URI Template requires.
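
A small demonstration of that mismatch, as a sketch: URLEncoder performs HTML form encoding, so a space becomes '+', '~' is escaped even though it is unreserved in RFC 6570, and '*' is left alone. The class and method names below are hypothetical; "UTF-8" is passed explicitly rather than relying on a system property.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class UrlEncoderDemo {
    // Thin wrapper around URLEncoder.encode with an explicit UTF-8 charset name.
    static String formEncode(final String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (final UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String[] args) {
        // Space becomes '+', '~' becomes %7E, '*' is kept, e-acute becomes %C3%A9.
        System.out.println(formEncode("a b~*\u00e9")); // a+b%7E*%C3%A9
    }
}
```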

Answer 1 · cl-r · 2013-05-24 06:57:22Z

