Here I have a simple algorithm to percent-encode any string.
(Specification from Wikipedia; note that this is not compatible with URLEncoder.encode()
)
Is this an efficient solution to the problem?
Using a StringBuilder
should be efficient, but it doesn't seem great since every character is added to the StringBuilder
individually. Could this have any significant impact?
This method works only for ASCII characters; that is sufficient for my use case.
private static String urlEscape(String toEscape){
//if null, keep null (no gain or loss of safety)
if (toEscape==null)
return null;
StringBuilder sb=new StringBuilder();
for (char character: toEscape.toCharArray())//for every character in the string
switch (character){//if the character needs to be escaped, add its escaped value to the StringBuilder
case '!': sb.append("%21"); continue;
case '#': sb.append("%23"); continue;
case '$': sb.append("%24"); continue;
case '&': sb.append("%26"); continue;
case '\'': sb.append("%27"); continue;
case '(': sb.append("%28"); continue;
case ')': sb.append("%29"); continue;
case '*': sb.append("%2A"); continue;
case '+': sb.append("%2B"); continue;
case ',': sb.append("%2C"); continue;
case '/': sb.append("%2F"); continue;
case ':': sb.append("%3A"); continue;
case ';': sb.append("%3B"); continue;
case '=': sb.append("%3D"); continue;
case '?': sb.append("%3F"); continue;
case '@': sb.append("%40"); continue;
case '[': sb.append("%5B"); continue;
case ']': sb.append("%5D"); continue;
case ' ': sb.append("%20"); continue;
case '"': sb.append("%22"); continue;
case '%': sb.append("%25"); continue;
case '-': sb.append("%2D"); continue;
case '.': sb.append("%2E"); continue;
case '<': sb.append("%3C"); continue;
case '>': sb.append("%3E"); continue;
case '\\': sb.append("%5C"); continue;
case '^': sb.append("%5E"); continue;
case '_': sb.append("%5F"); continue;
case '`': sb.append("%60"); continue;
case '{': sb.append("%7B"); continue;
case '|': sb.append("%7C"); continue;
case '}': sb.append("%7D"); continue;
case '~': sb.append("%7E"); continue;
default: sb.append(character);//if it does not need to be escaped, add the character itself to the StringBuilder
}
return sb.toString();//build the string, and return
}
-
\$\begingroup\$ Won't this fail with UTF-8 characters? \$\endgroup\$Ismael Miguel– Ismael Miguel2015年09月02日 14:21:13 +00:00Commented Sep 2, 2015 at 14:21
-
\$\begingroup\$ @IsmaelMiguel Yes, this will fail for special characters. However, it's not important for my use case. \$\endgroup\$Kent– Kent2015年09月02日 14:24:45 +00:00Commented Sep 2, 2015 at 14:24
-
1\$\begingroup\$ That should be specified in the question. \$\endgroup\$Ismael Miguel– Ismael Miguel2015年09月02日 14:26:41 +00:00Commented Sep 2, 2015 at 14:26
-
\$\begingroup\$ Also have a look at this SO answer (and the other answer): stackoverflow.com/a/4605848/866915 \$\endgroup\$ErikR– ErikR2015年09月02日 15:01:00 +00:00Commented Sep 2, 2015 at 15:01
2 Answers 2
You are in dire need of a lookup table. What you want to do is define a mapping somewhere else, and then just percent-encode all characters you have mapped. This gets rid of that unwieldy, huge and btw. hacky switch-statement.
consider the following:
StringBuilder sb = new StringBuilder(toEncode.length());
for (Character c : toEncode.toCharArray()) {
if (MAPPING.containsKey(c)) {
sb.append(MAPPING.get(c));
} else {
sb.append(c);
}
}
This should be equally fast, if not faster. Also it makes use of a MAPPING you can change, without significantly affecting how this method works. btw. I am prespecifying the StringBuilder's capacity, since it's more efficient when the backing collection in there does not need to be resized too often.
Oh and another thing. Your switch-case only works by sheer luck. continue;
is not the correct keyword to use there, instead you should rely on break;
that allows you to add more work in the loop body, which is currently more or less impossible.
-
\$\begingroup\$ Would it be worth adding roughly +10% capacity to the StringBuilder? \$\endgroup\$Kent– Kent2015年09月02日 14:36:53 +00:00Commented Sep 2, 2015 at 14:36
-
\$\begingroup\$ @Builder_K depends a little on what you are encoding, and you'd have to run measurements.... It depends \$\endgroup\$Vogel612– Vogel6122015年09月02日 14:37:43 +00:00Commented Sep 2, 2015 at 14:37
-
2\$\begingroup\$ Some mapping implementations also have a 'get or default value' function, which would eliminate the needs for an
if
statement. In Java 8 even the default map implementation has this. \$\endgroup\$David says Reinstate Monica– David says Reinstate Monica2015年09月02日 20:57:33 +00:00Commented Sep 2, 2015 at 20:57
An alternative is to put your list of reserved characters as a String
, then iterate-and-compare using indexOf()
:
private static final String VALUES = "!#$&'()*+,/:;=?@[] \"%-.<>\\^_`{|}~";
private static String encode(String input) {
if (input == null || input.isEmpty()) {
return input;
}
StringBuilder result = new StringBuilder(input);
for (int i = input.length() - 1; i >= 0; i--) {
if (VALUES.indexOf(input.charAt(i)) != -1) {
result.replace(i, i + 1,
"%" + Integer.toHexString(input.charAt(i)).toUpperCase());
}
}
return result.toString();
}
You may want to (slightly) optimize your initial validation by skipping ""
values as well. Over here, I used input
as the initial contents of the StringBuilder
. Helpfully, the initial capacity adds another 16 on top of the input's length, so if there's only... 8 replacements, then it's pretty 'efficient' (for some definition of that word).
Also, take note that the iteration is done from the end of the String
, to preserve the start
positioning of the character to replace. To visualize this in another manner, the replacement is always done to the right, while moving left-wise.
-
1\$\begingroup\$ VALUES.indexOf() is O(n) time. A lookup (let's assume hash) table is theoretically faster, with O(1) time. \$\endgroup\$Kent– Kent2015年09月02日 15:47:28 +00:00Commented Sep 2, 2015 at 15:47
-
\$\begingroup\$ The
VALUES
string is of course very useful to the function you write that builds the lookup table (once, on first use). \$\endgroup\$Toby Speight– Toby Speight2015年09月02日 16:37:55 +00:00Commented Sep 2, 2015 at 16:37 -
1\$\begingroup\$ @Builder_K Big O notation is only important for asymptotic performance. With a constant set of items it's completely irrelavant. There may be a performance penalty to iterating through the array, but it's so small it's insignificant. Premature optimization is the root of all evil! \$\endgroup\$Anubian Noob– Anubian Noob2015年09月02日 16:59:38 +00:00Commented Sep 2, 2015 at 16:59
-
2\$\begingroup\$ @AnubianNoob actually not "completely irrelevant". but you do have a point. for this use-case O(n) and O(1) are the same since n is constant ... \$\endgroup\$Vogel612– Vogel6122015年09月02日 17:09:44 +00:00Commented Sep 2, 2015 at 17:09
-
\$\begingroup\$ @AnubianNoob I guess I'm used to thinking in big O estimates. All I meant was that iterating over the string many times will incur a performance hit. \$\endgroup\$Kent– Kent2015年09月02日 17:55:01 +00:00Commented Sep 2, 2015 at 17:55