Percent-Encode a String

Question 1

Introduction

As some of you may know, URLs reserve certain characters for special purposes. For example, the / character separates parts of the URL, and the ?, &, and = characters are used to pass query parameters to the server. In fact, there are a number of characters with special functions: $&+,/:;=?@. When you need to use one of these characters in a URL for any purpose other than its special function, you have to percent-encode it.

Percent-encoding takes a character's code point, written as two hexadecimal digits, and prepends a % character to it. For example, the character ? would be encoded as %3F, and the character & would be encoded as %26. In a URL specifically, this allows you to send these characters as data via the URL without causing parsing problems. Your challenge will be to take a string and percent-encode all of the characters that need to be encoded.
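As a minimal illustration of the rule just described, the Python sketch below formats one character's code point as two uppercase hexadecimal digits behind a %. The helper name percent_encode_char is made up for this example and assumes a single character with a code point in 00-FF.

```python
def percent_encode_char(ch):
    """Return '%' followed by the two-digit uppercase hex code point of ch."""
    # Assumption: ch is exactly one character with a code point in 00-FF.
    return "%%%02X" % ord(ch)


print(percent_encode_char("?"))  # %3F
print(percent_encode_char("&"))  # %26
```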

The Challenge

You shall write a program or function that takes in a single string consisting of characters with codepoints 00-FF (ASCII and Extended ASCII characters). You will then have to output or return the same string with each character percent-encoded if necessary. Built-ins that accomplish this task are not allowed, nor are standard loopholes. For reference, here is a list of every character that needs to be percent encoded:

Control characters (Codepoints 00-1F and 7F)
Extended ASCII characters (Codepoints 80-FF)
Reserved characters ($&+,/:;=?@, i.e. codepoints 24, 26, 2B, 2C, 2F, 3A, 3B, 3D, 3F, 40)
Unsafe characters (" <>#%{}|\^~[]`, i.e. codepoints 20, 22, 3C, 3E, 23, 25, 7B, 7D, 7C, 5C, 5E, 7E, 5B, 5D, 60)

Here is the same list, but as decimal codepoints:

0-31, 32, 34, 35, 36, 37, 38, 43, 44, 47, 58, 59, 60, 61, 62, 63, 64, 91, 92, 93, 94, 96, 123, 124, 125, 126, 127, 128-255
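It can help to look at the list from the other side. The Python sketch below expands the decimal list into a set and prints every codepoint in 00-FF that is not in it, that is, the characters that pass through unchanged. The names MUST_ENCODE and UNENCODED are illustrative only.

```python
# Decimal code points copied from the list above (ranges expanded).
MUST_ENCODE = (
    set(range(0, 32))
    | {32, 34, 35, 36, 37, 38, 43, 44, 47}
    | set(range(58, 65))      # 58-64
    | {91, 92, 93, 94, 96}
    | set(range(123, 256))    # 123-127 and 128-255
)

# Everything in 00-FF that is not in the list stays as it is.
UNENCODED = "".join(chr(cp) for cp in range(256) if cp not in MUST_ENCODE)
print(UNENCODED)
# !'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
```

Only letters, digits, and !'()*-._ survive, which is a much shorter set to describe than the list of characters to encode.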

This is code golf, so shortest code in bytes (or approved alternative scoring method) wins!

Test Cases

http://codegolf.stackexchange.com/ => http%3A%2F%2Fcodegolf.stackexchange.com%2F
[@=>]{#} => %5B%40%3D%3E%5D%7B%23%7D
Test String => Test%20String
ÑÉÐÔ® => %D1%C9%D0%D4%AE
 => %0F%16%7F (Control characters 0F, 16, and 7F)
 ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ => %80%81%82%83%84%85%86%87%88%89%8A%8B%8C%8D%8E%8F%90%91%92%93%94%95%96%97%98%99%9A%9B%9C%9D%9E%9F%A0%A1%A2%A3%A4%A5%A6%A7%A8%A9%AA%AB%AC%AD%AE%AF%B0%B1%B2%B3%B4%B5%B6%B7%B8%B9%BA%BB%BC%BD%BE%BF%C0%C1%C2%C3%C4%C5%C6%C7%C8%C9%CA%CB%CC%CD%CE%CF%D0%D1%D2%D3%D4%D5%D6%D7%D8%D9%DA%DB%DC%DD%DE%DF%E0%E1%E2%E3%E4%E5%E6%E7%E8%E9%EA%EB%EC%ED%EE%EF%F0%F1%F2%F3%F4%F5%F6%F7%F8%F9%FA%FB%FC%FD%FE%FF (Extended ASCII characters 80-FF)
 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ => %20!%22%23%24%25%26'()*%2B%2C-.%2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D%7E
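For checking a golfed answer against the cases above, an ungolfed reference can be handy. The following is only a sketch, not an answer to the challenge: the names SAFE, percent_encode, and CASES are invented here, and the input is assumed to be a Python str whose characters all have codepoints in 00-FF.

```python
# Characters that are passed through unchanged; everything else is encoded.
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!'()*-._"
)


def percent_encode(text):
    """Percent-encode every character of text that is not in SAFE."""
    return "".join(ch if ch in SAFE else "%%%02X" % ord(ch) for ch in text)


# A few of the test cases listed above (non-printable input written as escapes).
CASES = {
    "http://codegolf.stackexchange.com/": "http%3A%2F%2Fcodegolf.stackexchange.com%2F",
    "[@=>]{#}": "%5B%40%3D%3E%5D%7B%23%7D",
    "Test String": "Test%20String",
    "\u00d1\u00c9\u00d0\u00d4\u00ae": "%D1%C9%D0%D4%AE",
    "\x0f\x16\x7f": "%0F%16%7F",
}
for source, expected in CASES.items():
    assert percent_encode(source) == expected
```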

Question 2

Would you have a testcase that shows the control characters?

Question 3

@LeakyNun done.

Question 4

I'm sure codepoint EF doesn't contain the question mark.

Question 5

@zyabin101 where did you find that? I'm not seeing it.

Question 6

"For example, the character ? would be encoded as %EF..."

Question 7

Vim, 67 bytes/keystrokes

:s/\c[^a-z!'()*0-9._-]/\='%'.printf("%02x",char2nr(submatch(0)))/g<cr>

Note that <cr> represents the enter key, i.e. 0x0D, which is a single byte.

This is a pretty straightforward solution. Explanation:

:s/ "Search and replace
 \c "Case-insensitive
 [^a-z!'()*0-9._-]/ "A negated range. Matches any character that is not alphabetic, numeric, or in "!'()*0-9._-"
 \= "Evaluate
 '%' "a percent sign string
 . "Concatenated with
 printf("%02x",char2nr(submatch(0))) "The hex value of the character we just matched
 /g "Make this apply to every match
 <cr> "Actually run the command

That printf("%02x",char2nr(submatch(0))) garbage is terribly ungolfy.
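For readers who do not use Vim, the same substitute-with-an-expression idea can be sketched in Python with re.sub and a callback. This is an illustration rather than a translation the answer's author provided; the function name encode_like_vim is invented, and like the Vim command it emits lowercase hex digits.

```python
import re


def encode_like_vim(line):
    # Same character class as the Vim command; re.IGNORECASE plays the role
    # of \c, and re.ASCII keeps case folding limited to ASCII letters.
    pattern = re.compile(r"[^a-z!'()*0-9._-]", re.IGNORECASE | re.ASCII)
    # The callback mirrors '%'.printf("%02x", char2nr(submatch(0))).
    return pattern.sub(lambda m: "%" + "%02x" % ord(m.group(0)), line)


print(encode_like_vim("Test String"))  # Test%20String
print(encode_like_vim("[@=>]{#}"))     # %5b%40%3d%3e%5d%7b%23%7d
```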

Question 8

"That printf("%02x",char2nr(submatch(0))) garbage is terribly ungolfy" and extremely hacky

Question 9

Pyth, 26 bytes (previously 30, 28)

L?hx+G+rG1CGbb+\%.HCbsmydz

try it online

Explanation

L?hx+G+rG1CGbb+\%.HCbsmydz
L?hx+G+rG1CGbb+\%.HCb First part, L defines the function y(b)
 ?hx+G+rG1CGbb+\%.HCb ? is the ternary operator
 hx+G+rG1CGb This part will be evaluated
 hx x will find the first occurrence of a
 character in a list. If it doesn't
 find one, it will return -1. hx then
 equals 0 (or false).
 +G+rG1CG The list of allowed characters, a
 concatenation (+) of the alphabet (G),
 uppercase alphabet (rG1) and numbers
 (CG, see below for details)
 b The character to find in the list
 b True branch of the ternary operator,
 the character is allowed and returned.
 +\%.HCb False branch, convert to hex and add %
 smydz The actual program
 mydz Map every character in the input (z)
 using the function y on every d
 s Join the array, and implicit print.

CG is a trick that generates a huge number containing all possible digits. This is perfect, since we don't care about duplicates when checking whether a string is in another.

Question 10

This answer does not meet the spec in the question. There are more allowed characters than just A-Za-z0-9. For example, . should be preserved rather than translated to %2E. (cc: @GamrCorps)

Question 11

Perl, 40 bytes

39 bytes code + -p.

A bit lame, but I think it's the shortest solution...

s/[^!'()*-.\w]/sprintf'%%%02x',ord$&/ge

Usage

echo -n ' !"#$%&'\''()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqstuvwxyz{|}~' | perl -pe "s/[^'()*-.\w]/sprintf'%%%02x',ord$&/ge"
%20%21%22%23%24%25%26'()*+,-.%2f0123456789%3a%3b%3c%3d%3e%3f%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5b%5c%5d%5e_%60abcdefghijklmnopqstuvwxyz%7b%7c%7d%7e
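The sample output above leaves + and , unencoded. A likely explanation is that *-. inside a character class is read as a range rather than as three literal characters. The sketch below shows the effect with Python's re module, on the assumption that it handles character-class ranges the same way Perl does here.

```python
import re

# Inside a character class, *-. is read as a range from "*" (2A) to "." (2E),
# which also covers "+" (2B) and "," (2C).
print(re.findall(r"[*-.]", "*+,-."))   # ['*', '+', ',', '-', '.']

# Moving "-" to the end of the class makes it a literal hyphen instead.
print(re.findall(r"[*.-]", "*+,-."))   # ['*', '-', '.']
```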

Question 12

Julia, 47 bytes

!s=replace(s,r"[^\w!'()*.-]",c->"%"hex(c[1],2))

Try it online!

Question 13

Python 3, 92 bytes

5 bytes thanks to orlp.

1 byte thanks to Sp3000.

import re;lambda s:''.join(re.match("[!'-*.0-9\w-]",c,256)and c or'%%%02X'%ord(c)for c in s)

Ideone it!

Question 14

re.match("[!'()*.0-9A-Za-z_-]",c)and c or'%%%02X'%ord(c)

Question 15

@Sp3000 \w includes extended ASCII

Question 16

Also, '()* -> '-*

Question 17

I think \w works with the 256 (re.ASCII) option: ideone. It definitely works in Python 3 on ideone, and it should work with u"..." strings in Python 2, but ideone seems to do funky things to the latter (e.g. print len(u"ÑÉÐÔ®") gives 10 on ideone but 5 on repl.it and my computer, despite all being 2.7.10)
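A quick way to see what the 256 argument does in Python 3 is sketched below. It only relies on the standard re module; the character Ñ is taken from the test cases.

```python
import re

# re.ASCII is the flag with integer value 256.
print(int(re.ASCII))                          # 256

# Without the flag, \w matches non-ASCII letters in a str pattern.
print(bool(re.match(r"\w", "\u00d1")))        # True

# With the flag, \w is limited to [a-zA-Z0-9_].
print(bool(re.match(r"\w", "\u00d1", 256)))   # False
```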

Question 18

C, 83 bytes

f(char*p){for(;*p;++p)printf(isalnum(*p)||strchr("!'()*-._",*p)?"%c":"%%%02X",*p);}

Question 19

Python, 86 bytes

lambda s:"".join(["%%%02X"%ord(c),c][c<"{"and c.isalnum()or c in"!'()*-._"]for c in s)

Port of my C answer.

Question 20

Ruby, 37 + 3 = 40 bytes

Run with -p (3 extra bytes), like $ ruby -p percent_encode.rb:

gsub(/[^\w!'()*-.]/){"%%%02X"%$&.ord}

Question 21

Jelly, 27 bytes (previously 28)

ḟØWḟ©“!'()*-.”Od⁴‘ịØH”%p®,y

This is a monadic link. Try it online!

How it works

ḟØWḟ©“!'()*-.”Od⁴‘ịØH”%p®,y  Monadic link. Argument: s (string)
 ØW                          Yield "0...9A...Z_a...z".
ḟ                            Remove these characters from s.
     “!'()*-.”               Yield "!'()*-.".
   ḟ                         Remove these characters from s.
    ©                        Copy the result to the register.
              O              Ordinal; get the code point of each character.
               d⁴            Divmod 16; yield quotient and remainder modulo 16.
                 ‘           Increment the results (indexing in Jelly is 1-based).
                  ịØH        Index into "0123456789ABCDEF".
                     ”%p     Perform Cartesian product with ”%, prepending it to
                             each pair of hexadecimal digits.
                        ®,   Yield [t, r], where t is the string in the register
                             and r the result of the Cartesian product.
                          y  Use this pair to perform transliteration on s.
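The same pipeline (filter out the allowed characters, split each remaining code point with divmod 16, look up hex digits, then transliterate) can be approximated in Python. This is a rough sketch of the approach rather than a faithful port; HEX_DIGITS, KEEP, and encode_by_transliteration are names made up for the example.

```python
HEX_DIGITS = "0123456789ABCDEF"
KEEP = set(
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
    "!'()*-."
)


def encode_by_transliteration(s):
    # Characters of s that are left after removing the allowed ones.
    to_encode = [ch for ch in s if ch not in KEEP]
    table = {}
    for ch in to_encode:
        # divmod by 16 yields the values of the two hex digits; Python
        # indexes from 0, so they can be used as indices directly.
        high, low = divmod(ord(ch), 16)
        table[ord(ch)] = "%" + HEX_DIGITS[high] + HEX_DIGITS[low]
    # str.translate plays the role of the final transliteration step.
    return s.translate(table)


print(encode_by_transliteration("[@=>]{#}"))  # %5B%40%3D%3E%5D%7B%23%7D
```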

Question 22

Haskell, 119 bytes (previously 201, 179, 178, 127)

import Data.Char;import Numeric;f=(=<<)(\c->if isAlphaNum c&&isAscii c||elem c"-_.~"then[c]else '%':(showHex$ord c)"")

Ungolfed:

import Data.Char
import Numeric
f=(=<<) e
e c = if isAlphaNum c && isAscii c || c `elem` "-_.~" then [c] else '%' : (showHex $ ord c) ""

Question 23

Can you remove a bunch of the spaces?

Question 24

You can lose the where, turn the if into guards, make e partial, lose the last argument of showHex, inline p, inline s, lose the signature, reorder the elem and lose even more whitespace. As a first approximation I got down to 118 that way.

Question 25

Thanks @MarLinn for a bunch of good suggestions on trimming down the code. However, I had some trouble with certain suggestions. First of all, if I remove the signature, GHC will complain that No instance for (Foldable t0) arising from a use of ‘foldr’. It says that the type of the function is ambiguous, resulting in an inferred binding of f :: t0 Char -> [Char]. And second of all, I could not remove the empty string argument from showHex as it returns a ShowS, which is a type alias for String -> String thus needing the empty string.

Question 26

@sham1, yes, ShowS takes a String... but you have one: the one you're adding with (++). So you can lose both at the same time. That's actually why ShowS looks that way. I don't get the type error, so I guess it's a version thing? Two other things I noticed by now: otherwise can always be replaced by 1<2 (a shorthand for True), but if you return to if instead you can inline e and drop all names. And even turn the fold into a concatMap, i.e. a (>>=). Doesn't save a lot, but at least a little. Might solve the type error, too.

Question 27

16/32-bit x86 assembly, 73 bytes

Byte code:

AC 3C 21 72 2A 74 3E 3C 26 76 24 3C 2B 72 36 3C
2C 76 1C 3C 2F 72 2E 74 16 3C 3A 72 28 74 10 3C
5F 74 22 50 0C 60 3C 60 74 02 3C 7B 58 72 16 D4
10 3C 09 1C 69 2F 86 E0 3C 09 1C 69 2F 92 B0 25
AA 92 AA 86 E0 AA E2 B8 C3

Disassembly:

l0: lodsb ;fetch a character
 cmp al, 21h
 jb l1 ;encode 0x00-0x20
 je l2 ;store 0x21
 cmp al, 26h
 jbe l1 ;encode 0x22-0x26
 cmp al, 2bh
 jb l2 ;store 0x27-0x2A
 cmp al, 2ch
 jbe l1 ;encode 0x2B-0x2C
 cmp al, 2fh
 jb l2 ;store 0x2D-0x2E
 je l1 ;encode 0x2F
 cmp al, 3ah
 jb l2 ;store 0x30-0x39
 je l1 ;encode 0x3A
 cmp al, 5fh
 je l2 ;store 0x5F
 push eax
 or al, 60h ;merge ranges
 cmp al, 60h
 je l3 ;encode 0x40, 0x60
 cmp al, 7bh
l3: pop eax
 jb l2 ;store 0x41-0x5A, 0x61-0x7A
 ;encode 0x3B-0x3F, 0x5B-0x5E, 0x7B-0xFF
l1: aam 10h ;split byte to nibbles
 cmp al, 9 ;convert 0x0A-0x0F 
 sbb al, 69h ;to
 das ;0x41-0x46 ('A'-'F')
 xchg ah, al ;swap nibbles
 cmp al, 9 ;do
 sbb al, 69h ;other
 das ;half
 xchg edx, eax ;save in edx
 mov al, '%'
 stosb ;emit '%'
 xchg edx, eax
 stosb ;emit high nibble
 xchg ah, al
l2: stosb ;emit low nibble or original character
 loop l0 ;until end of string
 ret

Call with:
- esi = pointer to buffer that holds source string;
- edi = pointer to buffer that receives encoded string;
- ecx = length of source string.
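The chain of comparisons in the disassembly can be hard to follow, so here is a Python sketch of just the classification step (store versus encode), following the branches in the same order. It is an illustration under the assumption that al holds one byte in 0-255; the function name stored_unchanged is invented, and the hex-digit conversion at l1 is not modelled.

```python
def stored_unchanged(al):
    """Return True if the byte would be stored as is, False if encoded."""
    if al < 0x21:
        return False          # encode 0x00-0x20
    if al == 0x21:
        return True           # store 0x21
    if al <= 0x26:
        return False          # encode 0x22-0x26
    if al < 0x2B:
        return True           # store 0x27-0x2A
    if al <= 0x2C:
        return False          # encode 0x2B-0x2C
    if al < 0x2F:
        return True           # store 0x2D-0x2E
    if al == 0x2F:
        return False          # encode 0x2F
    if al < 0x3A:
        return True           # store 0x30-0x39
    if al == 0x3A:
        return False          # encode 0x3A
    if al == 0x5F:
        return True           # store 0x5F
    merged = al | 0x60        # fold 0x40-0x5F onto 0x60-0x7F
    if merged == 0x60:
        return False          # encode 0x40, 0x60
    return merged < 0x7B      # store letters, encode everything else


SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!'()*-._")
assert all(stored_unchanged(b) == (b in SAFE) for b in range(256))
```

The OR with 0x60 is what lets a single comparison against 0x7B cover both the uppercase and the lowercase letter ranges.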

Question 28

Python 2, 78 bytes

lambda s:"".join(["%%%02x"%ord(c),c][c.isalnum()or c in"!'()*-._"]for c in s)

More nicely formatted:

lambda s:
 "".join(["%%%02x" % ord(c), c][c.isalnum() or c in"!'()*-._"] for c in s)

Question 29

SQF, 176 (previously 199)

Using the function-as-a-file format:

i="";a="0123456789ABCDEF!'()*-.GHIJKLMNOPQRSTUVWXYZ_";{i=i+if((toUpper _x)in a)then{_x}else{x=(toArray[_x])select 0;"%"+(a select floor(x/16))+(a select(x%16))}}forEach _this;i

Call as "STRING" call NAME_OF_COMPILED_FUNCTION

Question 30

PowerShell v2+, 146 bytes

param($n)37,38+0..36+43,44,47+58..64+91,93+96+123..255-ne33|%{$n=$n-replace"[$([char]$_)]",("%{0:x2}"-f$_)};$n-replace'\\','%5c'-replace'\^','%5e'

Long because I wanted to show a different approach rather than just copy-pasting the same regex string that everyone else is using.

Instead here, we loop through every code point that must be percent-encoded, and do a literal -replace on the input string $n each iteration (re-saving back into $n). Then we need to account for the two special characters that need escaping, \ and ^, so those are in separate -replace elements at the end. Since we didn't re-save that final string, it's left on the pipeline and printing is implicit.
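The replace-one-code-point-at-a-time idea can be sketched in Python as well. This is an approximation of the approach, not a port of the PowerShell code: the names SAFE and encode_by_repeated_replace are invented, and str.replace is literal, so the separate handling of \ and ^ is not needed here. As in the code point list of the PowerShell answer, 37 (%) is handled first.

```python
SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!'()*-._"
)


def encode_by_repeated_replace(text):
    # "%" (code point 37) goes first so that the percent signs introduced
    # by later replacements are not encoded a second time.
    code_points = [37] + [
        cp for cp in range(256) if cp != 37 and chr(cp) not in SAFE
    ]
    for cp in code_points:
        # One literal replacement per code point, re-saving the result.
        text = text.replace(chr(cp), "%%%02X" % cp)
    return text


print(encode_by_repeated_replace("[@=>]{#}"))  # %5B%40%3D%3E%5D%7B%23%7D
print(encode_by_repeated_replace("100%"))      # 100%25
```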

Accepted answer (Vim, 67 bytes/keystrokes) by DJMcMayhem, 2016-08-14 06:31:28Z

