14
\$\begingroup\$

Introduction

As some of you may know, URLs reserve certain characters for special purposes. For example, the / character separates parts of the URL, and the ?, &, and = characters are used to pass query parameters to the server. The full set of characters with special functions is $&+,/:;=?@. When you need to use one of these characters in a URL for any other purpose, you have to percent-encode it.

Percent encoding is when you take a character's hexadecimal value and prepend a % character to the beginning of it. For example, the character ? would be encoded as %3F, and the character & would be encoded as %26. In a URL specifically, this allows you to send these characters as data via the URL without causing parsing problems. Your challenge will be to take a string, and percent-encode all of the characters that need to be encoded.
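
As a minimal illustration of the rule just described, here is a sketch in Python. The helper name percent_encode_char is made up for this example, and it assumes two uppercase hex digits per character, as in the %3F and %26 examples.

```python
def percent_encode_char(ch: str) -> str:
    """Return '%' followed by the two-digit uppercase hex code point of ch."""
    code = ord(ch)
    if code > 0xFF:
        # Assumption: this sketch only covers single-byte code points.
        raise ValueError("only code points 00-FF are handled in this sketch")
    return "%{:02X}".format(code)


print(percent_encode_char("?"))  # %3F
print(percent_encode_char("&"))  # %26
```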

The Challenge

You shall write a program or function that takes in a single string consisting of characters with codepoints 00-FF (ASCII and Extended ASCII characters). You will then have to output or return the same string with each character percent-encoded if necessary. Built-ins that accomplish this task are not allowed, nor are standard loopholes. For reference, here is a list of every character that needs to be percent encoded:

  • Control characters (Codepoints 00-1F and 7F)
  • Extended ASCII characters (Codepoints 80-FF)
  • Reserved characters ($&+,/:;=?@, i.e. codepoints 24, 26, 2B, 2C, 2F, 3A, 3B, 3D, 3F, 40)
  • Unsafe characters (" <>#%{}|\^~[]`, i.e. codepoints 20, 22, 3C, 3E, 23, 25, 7B, 7D, 7C, 5C, 5E, 7E, 5B, 5D, 60)

Here is the same list again, as decimal codepoints:

0-31, 32, 34, 35, 36, 37, 38, 43, 44, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 96, 123, 124, 125, 126, 127, 128-255
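
For reference, the four categories listed above can be turned into an ungolfed encoder. This is only a sketch, not a competitive answer; the names MUST_ENCODE and percent_encode are invented here.

```python
# Code points that must be encoded, built from the four categories in the list.
MUST_ENCODE = (
    set(range(0x00, 0x20)) | {0x7F}            # control characters
    | set(range(0x80, 0x100))                  # extended ASCII
    | {ord(c) for c in "$&+,/:;=?@"}           # reserved characters
    | {ord(c) for c in " \"<>#%{}|\\^~[]`"}    # unsafe characters
)


def percent_encode(text: str) -> str:
    """Encode every character whose code point is in MUST_ENCODE."""
    return "".join(
        "%{:02X}".format(ord(ch)) if ord(ch) in MUST_ENCODE else ch
        for ch in text
    )


print(percent_encode("Test String"))  # Test%20String
```

Everything outside the set passes through unchanged: letters, digits, and the characters !'()*-._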

This is code golf, so shortest code in bytes (or approved alternative scoring method) wins!

Test Cases

http://codegolf.stackexchange.com/ => http%3A%2F%2Fcodegolf.stackexchange.com%2F
[@=>]{#} => %5B%40%3D%3E%5D%7B%23%7D
Test String => Test%20String
ÑÉÐÔ® => %D1%C9%D0%D4%AE
 => %0F%16%7F (Control characters 0F, 16, and 7F)
 ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ => %80%81%82%83%84%85%86%87%88%89%8A%8B%8C%8D%8E%8F%90%91%92%93%94%95%96%97%98%99%9A%9B%9C%9D%9E%9F%A0%A1%A2%A3%A4%A5%A6%A7%A8%A9%AA%AB%AC%AD%AE%AF%B0%B1%B2%B3%B4%B5%B6%B7%B8%B9%BA%BB%BC%BD%BE%BF%C0%C1%C2%C3%C4%C5%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF%D0%D1%D2%D3%D4%D5%D6%D7%D8%D9%DA%DB%DC%DD%DE%DF%E0%E1%E2%E3%E4%E5%E6%E7%E8%E9%EA%EB%EC%ED%EE%EF%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF (Extended ASCII characters 80-FF)
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ => %20!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
Οurous
asked Aug 14, 2016 at 5:29
\$\endgroup\$
13
  • \$\begingroup\$ Would you have a testcase that shows the control characters? \$\endgroup\$ Commented Aug 14, 2016 at 5:31
  • \$\begingroup\$ @LeakyNun done. \$\endgroup\$ Commented Aug 14, 2016 at 5:36
  • \$\begingroup\$ I'm sure codepoint EF doesn't contain the question mark. \$\endgroup\$ Commented Aug 14, 2016 at 5:36
  • \$\begingroup\$ @zyabin101 where did you find that? I'm not seeing it. \$\endgroup\$ Commented Aug 14, 2016 at 5:38
  • \$\begingroup\$ "For example, the character ? would be encoded as %EF..." \$\endgroup\$ Commented Aug 14, 2016 at 5:39

14 Answers

3
\$\begingroup\$

Vim, 67 bytes/keystrokes

:s/\c[^a-z!'()*0-9._-]/\='%'.printf("%02x",char2nr(submatch(0)))/g<cr>

Note that <cr> represents the enter key, i.e. 0x0D, which is a single byte.

This is a pretty straightforward solution. Explanation:

:s/                                    "Search and replace
  \c                                   "Case-insensitive
  [^a-z!'()*0-9._-]/                   "A negated range. Matches any character that is not alphabetic, numeric or in "!'()*._-"
  \=                                   "Evaluate
  '%'                                  "a percent sign string
  .                                    "Concatenated with
  printf("%02x",char2nr(submatch(0)))  "The hex value of the character we just matched
  /g                                   "Make this apply to every match
  <cr>                                 "Actually run the command

That printf("%02x",char2nr(submatch(0))) garbage is terribly ungolfy.
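
For readers who don't use Vim, the same substitute-with-expression idea can be sketched in Python. This is an illustrative analogue rather than the answer's code: the names NOT_ALLOWED and vim_style_encode are invented, and the lowercase hex digits mirror what the "%02x" format produces.

```python
import re

# Same character class as the Vim pattern; re.IGNORECASE plays the role of \c.
# re.ASCII keeps the case folding restricted to ASCII letters.
NOT_ALLOWED = re.compile(r"[^a-z!'()*0-9._-]", re.IGNORECASE | re.ASCII)


def vim_style_encode(line: str) -> str:
    # The replacement is computed per match, like Vim's \= expression.
    return NOT_ALLOWED.sub(lambda m: "%" + "%02x" % ord(m.group(0)), line)


print(vim_style_encode("Test String"))  # Test%20String
```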

answered Aug 14, 2016 at 6:31
\$\endgroup\$
1
  • \$\begingroup\$ "That printf("%02x",char2nr(submatch(0))) garbage is terribly ungolfy" and extremely hacky \$\endgroup\$ Commented Aug 14, 2016 at 6:34
2
\$\begingroup\$

Pyth, <s>30 28</s> 26 bytes

L?hx+G+rG1CGbb+\%.HCbsmydz

try it online

Explanation

L?hx+G+rG1CGbb+\%.HCbsmydz
L?hx+G+rG1CGbb+\%.HCb       First part, L defines the function y(b)
 ?hx+G+rG1CGbb+\%.HCb       ? is the ternary operator
  hx+G+rG1CGb               This part will be evaluated
  hx                        x will find the first occurrence of a
                            character in a list. If it doesn't
                            find one, it will return -1. hx then
                            equals 0 (or false).
    +G+rG1CG                The list of allowed characters, a
                            concatenation (+) of the alphabet (G),
                            uppercase alphabet (rG1) and numbers
                            (CG, see below for details)
            b               The character to find in the list
             b              True branch of the ternary operator,
                            the character is allowed and returned.
              +\%.HCb       False branch, convert to hex and add %
                     smydz  The actual program
                      mydz  Map every character in the input (z)
                            using the function y on every d
                     s      Join the array, and implicit print.

CG is a trick that generates a huge number containing every possible digit. This is perfect, since duplicates don't matter when checking whether a character occurs in a string.
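
The control flow described above (look the character up, add one to the index, branch on the result) might look like this in Python. This is a sketch under assumptions: the names ALLOWED, y and program only mirror the roles in the explanation, the allowed list is written out directly instead of using the CG trick, and the two-digit uppercase hex format follows the challenge rather than whatever .H actually prints.

```python
import string

# Allowed list as the explanation describes it: lowercase, uppercase, digits.
ALLOWED = string.ascii_lowercase + string.ascii_uppercase + string.digits


def y(b: str) -> str:
    # str.find returns -1 when b is absent, so find + 1 is 0 (falsy), like hx.
    return b if ALLOWED.find(b) + 1 else "%{:02X}".format(ord(b))


def program(z: str) -> str:
    # Map y over every character of the input and join the pieces.
    return "".join(y(d) for d in z)


print(program("Test String"))  # Test%20String
```

With only letters and digits in the allowed list, a character such as . takes the encoding branch, although the challenge leaves it unencoded.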

answered Aug 14, 2016 at 18:16
\$\endgroup\$
1
  • \$\begingroup\$ This answer does not meet the spec in the question. There are more allowed characters than just A-Za-z0-9. For example, . should be preserved rather than translated to %2E. (cc: @GamrCorps) \$\endgroup\$ Commented Nov 16, 2016 at 17:22
2
\$\begingroup\$

Perl, 40 bytes

39 bytes code + -p.

A bit lame, but I think it's the shortest solution...

s/[^!'()*-.\w]/sprintf'%%%02x',ord$&/ge

Usage

echo -n ' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqstuvwxyz{|}~' | perl -pe "s/[^'()*-.\w]/sprintf'%%%02x',ord$&/ge"
%20%21%22%23%24%25%26'()*+,-.%2f0123456789%3a%3b%3c%3d%3e%3f%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5b%5c%5d%5e_%60abcdefghijklmnopqstuvwxyz%7b%7c%7d%7e
answered Aug 14, 2016 at 7:49
\$\endgroup\$
2
\$\begingroup\$

Julia, 47 bytes

!s=replace(s,r"[^\w!'()*.-]",c->"%"hex(c[1],2))

Try it online!

answered Aug 14, 2016 at 17:24
\$\endgroup\$
1
\$\begingroup\$

Python 3, 92 bytes

5 bytes thanks to orlp.

1 byte thanks to Sp3000.

import re;lambda s:''.join(re.match("[!'-*.0-9\w-]",c,256)and c or'%%%02X'%ord(c)for c in s)

Ideone it!

answered Aug 14, 2016 at 5:41
\$\endgroup\$
4
  • \$\begingroup\$ re.match("[!'()*.0-9A-Za-z_-]",c)and c or'%%%02X'%ord(c) \$\endgroup\$ Commented Aug 14, 2016 at 6:19
  • \$\begingroup\$ @Sp3000 \w includes extended ASCII \$\endgroup\$ Commented Aug 14, 2016 at 6:23
  • \$\begingroup\$ Also, '()* -> '-* \$\endgroup\$ Commented Aug 14, 2016 at 6:29
  • \$\begingroup\$ I think \w works with the 256 (re.ASCII) option: ideone. It definitely works in Python 3 on ideone, and it should work with u"..." strings in Python 2, but ideone seems to do funky things to the latter (e.g. print len(u"ÑÉÐÔ®") gives 10 on ideone but 5 on repl.it and my computer, despite all being 2.7.10) \$\endgroup\$ Commented Aug 14, 2016 at 7:24
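
The point about the 256 flag can be checked directly in Python 3. The short sketch below only uses the standard re module: it shows that re.ASCII has the integer value 256 and that \w stops matching extended characters once the flag is set.

```python
import re

print(int(re.ASCII))                              # 256
print(bool(re.match(r"\w", "\u00d1")))            # True: \w is Unicode-aware by default
print(bool(re.match(r"\w", "\u00d1", re.ASCII)))  # False
print(bool(re.match(r"\w", "\u00d1", 256)))       # False: same flag passed as a plain integer
```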
1
\$\begingroup\$

C, 83 bytes

f(char*p){for(;*p;++p)printf(isalnum(*p)||strchr("!'()*-._",*p)?"%c":"%%%02X",*p);}
answered Aug 14, 2016 at 6:04
\$\endgroup\$
0
1
\$\begingroup\$

Python, 86 bytes

lambda s:"".join(["%%%02X"%ord(c),c][c<"{"and c.isalnum()or c in"!'()*-._"]for c in s)

Port of my C answer.

answered Aug 14, 2016 at 6:13
\$\endgroup\$
0
1
\$\begingroup\$

Ruby, 37 + 3 = 40 bytes

Run with -p (3 extra bytes), like $ ruby -p percent_encode.rb:

gsub(/[^\w!'()*-.]/){"%%%02X"%$&.ord}
answered Aug 14, 2016 at 13:54
\$\endgroup\$
1
\$\begingroup\$

Jelly, <s>28</s> 27 bytes

ḟØWḟ©“!'()*-.”Od⁴‘ịØH”%p®,y

This is a monadic link. Try it online!

How it works

ḟØWḟ©“!'()*-.”Od⁴‘ịØH”%p®,y  Monadic link. Argument: s (string)
 ØW                          Yield "0...9A...Z_a...z".
ḟ                            Remove these characters from s.
     “!'()*-.”               Yield "!'()*-.".
   ḟ                         Remove these characters from s.
    ©                        Copy the result to the register.
              O              Ordinal; get the code point of each character.
               d⁴            Divmod 16; yield quotient and remainder modulo 16.
                 ‘           Increment the results.
                  ịØH        Index into "0123456789ABCDEF".
                     ”%p     Perform Cartesian product with '%', prepending it to
                             each pair of hexadecimal digits.
                        ®,   Yield [t, r], where t is the string in the register
                             and r the result of the Cartesian product.
                          y  Use this pair to perform transliteration on s.
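
The same pipeline (filter out the kept characters, split each remaining code point with divmod 16, index into the hex digits, then transliterate) can be sketched in Python. The names HEX_DIGITS, KEEP and jelly_style_encode are made up for illustration. Python indexes from 0, so the adjustment applied before indexing in the Jelly code has no counterpart here.

```python
HEX_DIGITS = "0123456789ABCDEF"
KEEP = set(
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz" "!'()*-."
)


def jelly_style_encode(s: str) -> str:
    # Characters left after both filters are the ones to translate.
    targets = [c for c in s if c not in KEEP]
    table = {}
    for c in targets:
        high, low = divmod(ord(c), 16)  # quotient and remainder modulo 16
        table[c] = "%" + HEX_DIGITS[high] + HEX_DIGITS[low]
    # Transliterate: replace each target by its encoding, keep the rest.
    return "".join(table.get(c, c) for c in s)


print(jelly_style_encode("[@=>]{#}"))  # %5B%40%3D%3E%5D%7B%23%7D
```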
answered Aug 14, 2016 at 7:23
\$\endgroup\$
0
1
\$\begingroup\$

Haskell, <s>201</s> <s>179</s> <s>178</s> <s>127</s> 119 bytes

import Data.Char;import Numeric;f=(=<<)(\c->if isAlphaNum c&&isAscii c||elem c"-_.~"then[c]else '%':(showHex$ord c)"")

Ungolfed:

import Data.Char
import Numeric
f = (=<<) e
e c = if isAlphaNum c && isAscii c || c `elem` "-_.~" then [c] else '%' : (showHex $ ord c) ""
answered Aug 14, 2016 at 19:07
\$\endgroup\$
4
  • \$\begingroup\$ Can you remove a bunch of the spaces? \$\endgroup\$ Commented Aug 14, 2016 at 19:09
  • \$\begingroup\$ You can lose the where, turn the if into guards, make e partial, lose the last argument of showHex, inline p, inline s, lose the signature, reorder the elem and lose even more whitespace. As a first approximation I got down to 118 that way. \$\endgroup\$ Commented Aug 14, 2016 at 19:29
  • \$\begingroup\$ Thanks @MarLinn for a bunch of good suggestions on trimming down the code. However, I had some trouble with certain suggestions. First of all, if I remove the signature, GHC will complain that No instance for (Foldable t0) arising from a use of ‘foldr’. It says that the type of the function is ambiguous, resulting in an inferred binding of f :: t0 Char -> [Char]. And second of all, I could not remove the empty string argument from showHex as it returns a ShowS, which is a type alias for String -> String thus needing the empty string. \$\endgroup\$ Commented Aug 14, 2016 at 19:59
  • \$\begingroup\$ @sham1, yes, ShowS takes a String... but you have one: the one you're adding with (++). So you can lose both at the same time. That's actually why ShowS looks that way. I don't get the type error, so I guess it's a version thing? Two other things I noticed by now: otherwise can always be replaced by 1<2 (a shorthand for True), but if you return to if instead you can inline e and drop all names. And even turn the fold into a concatMap, i.e. a (>>=). Doesn't save a lot, but at least a little. Might solve the type error, too. \$\endgroup\$ Commented Aug 14, 2016 at 20:27
1
\$\begingroup\$

16/32-bit x86 assembly, 73 bytes

Byte code:

AC 3C 21 72 2A 74 3E 3C 26 76 24 3C 2B 72 36 3C
2C 76 1C 3C 2F 72 2E 74 16 3C 3A 72 28 74 10 3C
5F 74 22 50 0C 60 3C 60 74 02 3C 7B 58 72 16 D4
10 3C 09 1C 69 2F 86 E0 3C 09 1C 69 2F 92 B0 25
AA 92 AA 86 E0 AA E2 B8 C3

Disassembly:

l0: lodsb ;fetch a character
 cmp al, 21h
 jb l1 ;encode 0x00-0x20
 je l2 ;store 0x21
 cmp al, 26h
 jbe l1 ;encode 0x22-0x26
 cmp al, 2bh
 jb l2 ;store 0x27-0x2A
 cmp al, 2ch
 jbe l1 ;encode 0x2B-0x2C
 cmp al, 2fh
 jb l2 ;store 0x2D-0x2E
 je l1 ;encode 0x2F
 cmp al, 3ah
 jb l2 ;store 0x30-0x39
 je l1 ;encode 0x3A
 cmp al, 5fh
 je l2 ;store 0x5F
 push eax
 or al, 60h ;merge ranges
 cmp al, 60h
 je l3 ;encode 0x40, 0x60
 cmp al, 7bh
l3: pop eax
 jb l2 ;store 0x41-0x5A, 0x61-0x7A
 ;encode 0x3B-0x3F, 0x5B-0x5E, 0x7B-0xFF
l1: aam 10h ;split byte to nibbles
 cmp al, 9 ;convert 0x0A-0x0F 
 sbb al, 69h ;to
 das ;0x41-0x46 ('A'-'F')
 xchg ah, al ;swap nibbles
 cmp al, 9 ;do
 sbb al, 69h ;other
 das ;half
 xchg edx, eax ;save in edx
 mov al, '%'
 stosb ;emit '%'
 xchg edx, eax
 stosb ;emit high nibble
 xchg ah, al
l2: stosb ;emit low nibble or original character
 loop l0 ;until end of string
 ret

Call with:
- esi = pointer to buffer that holds source string;
- edi = pointer to buffer that receives encoded string;
- ecx = length of source string.

answered Mar 16, 2018 at 5:24
\$\endgroup\$
0
\$\begingroup\$

Python 2, 78 bytes

lambda s:"".join(["%%%02x"%ord(c),c][c.isalnum()or c in"!'()*-._"]for c in s)

More nicely formatted:

lambda s: "".join(
    ["%%%02x" % ord(c), c][c.isalnum() or c in "!'()*-._"] for c in s
)
answered Aug 14, 2016 at 14:45
\$\endgroup\$
0
\$\begingroup\$

SQF, <s>199</s> 176

Using the function-as-a-file format:

i="";a="0123456789ABCDEF!'()*-.GHIJKLMNOPQRSTUVWXYZ_";{i=i+if((toUpper _x)in a)then{_x}else{x=(toArray[_x])select 0;"%"+(a select floor(x/16))+(a select(x%16))}}forEach _this;i

Call as "STRING" call NAME_OF_COMPILED_FUNCTION

answered Aug 14, 2016 at 23:10
\$\endgroup\$
0
\$\begingroup\$

PowerShell v2+, 146 bytes

param($n)37,38+0..36+43,44,47+58..64+91,93+96+123..255-ne33|%{$n=$n-replace"[$([char]$_)]",("%{0:x2}"-f$_)};$n-replace'\\','%5c'-replace'\^','%5e'

Long because I wanted to show a different approach rather than just copy-pasting the same regex string that everyone else is using.

Instead here, we loop through every code point that must be percent-encoded, and do a literal -replace on the input string $n each iteration (re-saving back into $n). Then we need to account for the two special characters that need escaping, \ and ^, so those are in separate -replace elements at the end. Since we didn't re-save that final string, it's left on the pipeline and printing is implicit.
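
The loop-and-replace approach can be sketched in Python as follows. This is an analogue, not a translation: the function name replace_loop_encode is invented, and because str.replace works on literal text, the backslash and caret need no separate handling here. Lowercase hex digits mirror the x2 format in the answer.

```python
def replace_loop_encode(text: str) -> str:
    code_points = (
        [37]                                       # '%' first, so the percent signs
                                                   # added later are not re-encoded
        + [cp for cp in range(0, 37) if cp != 33]  # 0-36 without '!'
        + [38, 43, 44, 47]
        + list(range(58, 65))
        + [91, 92, 93, 94, 96]
        + list(range(123, 256))
    )
    for cp in code_points:
        # One literal replacement per code point, re-saving the result each time.
        text = text.replace(chr(cp), "%{:02x}".format(cp))
    return text


print(replace_loop_encode("100% sure"))  # 100%25%20sure
```

Handling the percent sign before anything else matters: if it came later, the % in an already inserted sequence such as %20 would be encoded a second time.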

answered Aug 15, 2016 at 14:16
\$\endgroup\$
