Efficient URL Escape (Percent-Encoding)

Question 1

Here I have a simple algorithm to percent-encode any string.

(Specification from Wikipedia; note that this is not compatible with URLEncoder.encode())

Is this an efficient solution to the problem?

Using a StringBuilder should be efficient, but it doesn't seem great since every character is added to the StringBuilder individually. Could this have any significant impact?

This method works only for ASCII characters; that is sufficient for my use case.

private static String urlEscape(String toEscape){
 //if null, keep null (no gain or loss of safety)
 if (toEscape==null)
 return null;
 StringBuilder sb=new StringBuilder();
 for (char character: toEscape.toCharArray())//for every character in the string
 switch (character){//if the character needs to be escaped, add its escaped value to the StringBuilder
 case '!': sb.append("%21"); continue;
 case '#': sb.append("%23"); continue;
 case '$': sb.append("%24"); continue;
 case '&': sb.append("%26"); continue;
 case '\'': sb.append("%27"); continue;
 case '(': sb.append("%28"); continue;
 case ')': sb.append("%29"); continue;
 case '*': sb.append("%2A"); continue;
 case '+': sb.append("%2B"); continue;
 case ',': sb.append("%2C"); continue;
 case '/': sb.append("%2F"); continue;
 case ':': sb.append("%3A"); continue;
 case ';': sb.append("%3B"); continue;
 case '=': sb.append("%3D"); continue;
 case '?': sb.append("%3F"); continue;
 case '@': sb.append("%40"); continue;
 case '[': sb.append("%5B"); continue;
 case ']': sb.append("%5D"); continue;
 case ' ': sb.append("%20"); continue;
 case '"': sb.append("%22"); continue;
 case '%': sb.append("%25"); continue;
 case '-': sb.append("%2D"); continue;
 case '.': sb.append("%2E"); continue;
 case '<': sb.append("%3C"); continue;
 case '>': sb.append("%3E"); continue;
 case '\\': sb.append("%5C"); continue;
 case '^': sb.append("%5E"); continue;
 case '_': sb.append("%5F"); continue;
 case '`': sb.append("%60"); continue;
 case '{': sb.append("%7B"); continue;
 case '|': sb.append("%7C"); continue;
 case '}': sb.append("%7D"); continue;
 case '~': sb.append("%7E"); continue;
 default: sb.append(character);//if it does not need to be escaped, add the character itself to the StringBuilder
 }
 return sb.toString();//build the string, and return
}

Question 2

Won't this fail with UTF-8 characters?

Question 3

@IsmaelMiguel Yes, this will fail for special characters. However, it's not important for my use case.

Question 4

That should be specified in the question.

Question 5

Also have a look at this SO answer (and the other answer): stackoverflow.com/a/4605848/866915

Question 6

You are in dire need of a lookup table. What you want to do is define a mapping somewhere else, and then just percent-encode all characters you have mapped. This gets rid of that unwieldy, huge and btw. hacky switch-statement.

consider the following:

StringBuilder sb = new StringBuilder(toEncode.length());
for (Character c : toEncode.toCharArray()) {
 if (MAPPING.containsKey(c)) {
 sb.append(MAPPING.get(c));
 } else {
 sb.append(c);
 }
}

This should be equally fast, if not faster. Also it makes use of a MAPPING you can change, without significantly affecting how this method works. btw. I am prespecifying the StringBuilder's capacity, since it's more efficient when the backing collection in there does not need to be resized too often.

Oh and another thing. Your switch-case only works by sheer luck. continue; is not the correct keyword to use there, instead you should rely on break; that allows you to add more work in the loop body, which is currently more or less impossible.

Question 7

Would it be worth adding roughly +10% capacity to the StringBuilder?

Question 8

@Builder_K depends a little on what you are encoding, and you'd have to run measurements.... It depends

Question 9

Some mapping implementations also have a 'get or default value' function, which would eliminate the needs for an if statement. In Java 8 even the default map implementation has this.

Question 10

An alternative is to put your list of reserved characters as a String, then iterate-and-compare using indexOf():

private static final String VALUES = "!#$&'()*+,/:;=?@[] \"%-.<>\\^_`{|}~";
private static String encode(String input) {
 if (input == null || input.isEmpty()) {
 return input;
 }
 StringBuilder result = new StringBuilder(input);
 for (int i = input.length() - 1; i >= 0; i--) {
 if (VALUES.indexOf(input.charAt(i)) != -1) {
 result.replace(i, i + 1, 
 "%" + Integer.toHexString(input.charAt(i)).toUpperCase());
 }
 }
 return result.toString();
}

You may want to (slightly) optimize your initial validation by skipping "" values as well. Over here, I used input as the initial contents of the StringBuilder. Helpfully, the initial capacity adds another 16 on top of the input's length, so if there's only... 8 replacements, then it's pretty 'efficient' (for some definition of that word).

Also, take note that the iteration is done from the end of the String, to preserve the start positioning of the character to replace. To visualize this in another manner, the replacement is always done to the right, while moving left-wise.

Question 11

VALUES.indexOf() is O(n) time. A lookup (let's assume hash) table is theoretically faster, with O(1) time.

Question 12

The VALUES string is of course very useful to the function you write that builds the lookup table (once, on first use).

Question 13

@Builder_K Big O notation is only important for asymptotic performance. With a constant set of items it's completely irrelavant. There may be a performance penalty to iterating through the array, but it's so small it's insignificant. Premature optimization is the root of all evil!

Question 14

@AnubianNoob actually not "completely irrelevant". but you do have a point. for this use-case O(n) and O(1) are the same since n is constant ...

Question 15

@AnubianNoob I guess I'm used to thinking in big O estimates. All I meant was that iterating over the string many times will incur a performance hit.

Vogel612 Vogel612 25.5k7 gold badges59 silver badges141 bronze badges · Accepted Answer · 2015-09-02 14:12:15Z

You are in dire need of a lookup table. What you want to do is define a mapping somewhere else, and then just percent-encode all characters you have mapped. This gets rid of that unwieldy, huge and btw. hacky switch-statement.

consider the following:

StringBuilder sb = new StringBuilder(toEncode.length());
for (Character c : toEncode.toCharArray()) {
 if (MAPPING.containsKey(c)) {
 sb.append(MAPPING.get(c));
 } else {
 sb.append(c);
 }
}

This should be equally fast, if not faster. Also it makes use of a MAPPING you can change, without significantly affecting how this method works. btw. I am prespecifying the StringBuilder's capacity, since it's more efficient when the backing collection in there does not need to be resized too often.

Oh and another thing. Your switch-case only works by sheer luck. continue; is not the correct keyword to use there, instead you should rely on break; that allows you to add more work in the loop body, which is currently more or less impossible.

Would it be worth adding roughly +10% capacity to the StringBuilder?
@Builder_K depends a little on what you are encoding, and you'd have to run measurements.... It depends
Some mapping implementations also have a 'get or default value' function, which would eliminate the needs for an if statement. In Java 8 even the default map implementation has this.

Stack Exchange Network

Efficient URL Escape (Percent-Encoding)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Efficient URL Escape (Percent-Encoding)

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions