Challenge
Most of us have come across the URL encoding mechanism; it is used practically everywhere.
Given a URL-encoded string on stdin, decode it and output the decoded form to stdout.
The encoding is very simple: + or %20 represents a space. Every percent sign followed by two hex digits (uppercase or lowercase) has to be replaced with the ASCII character whose code is that number.
Related, but it's the other way round
Example
100%25+working
=>
100% working
%24+%26+%3C+%3E+%3F+%3B+%23+%3A+%3D+%2C+%22+%27+%7E+%2B+%25
=>
$ & < > ? ; # : = , " ' ~ + %
%414243
=>
A4243
Test case that fails:
%24%XY+%%%
   ^~~~~~~
=>
$
Rules
- Loopholes are forbidden.
- If your programming language has a built-in function to decode URLs, using that function is forbidden.
- Assume ASCII character set. No Unicode and all that stuff.
- This is code-golf, so the answer with the fewest bytes used to accomplish the task wins.
- Please include a link to an online interpreter for your code.
- Your program has to decode until EOF or until an error (e.g. a percent sign without valid hex digits, such as %xy) is spotted; at that point you can safely terminate.
- If anything is unclear, please let me know in the comments.
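For reference, the rules above can be sketched as an ungolfed decoder. This is only an illustration under stated assumptions, not a competing answer: the names url_decode and HEX_DIGITS are made up here, and the input is assumed to be ASCII and small enough to hold in memory. The hex digits are checked explicitly because Python's int() also tolerates signs and surrounding whitespace.

```python
# Hypothetical reference decoder for the challenge rules (names are illustrative).
HEX_DIGITS = set("0123456789abcdefABCDEF")


def url_decode(text: str) -> str:
    """Decode `text`, stopping silently at the first malformed % escape."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "+":
            # A plus sign stands for a space.
            out.append(" ")
            i += 1
        elif ch == "%":
            pair = text[i + 1 : i + 3]
            # Explicit check: int() alone would accept inputs such as "+f" or " f".
            if len(pair) != 2 or not set(pair) <= HEX_DIGITS:
                break  # error spotted: stop decoding, keep what we have
            out.append(chr(int(pair, 16)))
            i += 3
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

To use it as a filter in the sense of the challenge, one would pass sys.stdin.read() to url_decode and write the result to sys.stdout.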
13 Answers
05AB1E, 30 bytes (was 41)
'+ð:Δć©'%Qi2ôćuDHç©Ç`hÊiq}J}®?
-11 bytes thanks to @Grimy.
Try it online or verify all test cases.
Explanation:
'+ð: '# Replace all "+" in the (implicit) input with spaces
Δ # Loop until the result no longer changes:
ć # Extract head; pop and push remainder-string and first character
© # Store the character in variable `®` (without popping)
'%Qi '# If this character is a "%":
2ô # Split the remainder-string into parts of size 2
# i.e. "abcde" → ["ab","cd","e"]
ć # Extract head again
u # Convert it to uppercase
D # Duplicate it
H # Convert it from hexadecimal to integer
# (NOTE: even if it isn't a valid hexadecimal string,
# it will still result in an integer regardless)
ç # And then from integer to ASCII-character with this codepoint
© # Replace variable `®` with this (without popping)
Ç`h # Reverse process: ASCII-character → integer → hexadecimal string
Êi # If both are NOT equal (so it initially was invalid hexadecimal):
q # Stop the program
}J # And join the list of 2-char strings back together
}®? # And then print `®` without newline
- Fails on lowercase (challenge explicitly says lowercase hex has to be handled too). – Grimmy, Aug 2, 2019 at 13:02
- @Grimy Straightforward fix with +3 bytes for now (adding Dl«). Thanks for reporting. I did check if H would convert lowercase or mixed case characters correctly, but forgot about my å check. – Kevin Cruijssen, Aug 2, 2019 at 13:09
- @Grimy Nice, thanks. :) – Kevin Cruijssen, Aug 2, 2019 at 15:39
JavaScript (V8), 112 bytes
u=>u.match(/([^%]|%[\da-f]{2})*/i)[0].replace(/%..|./g,d=>d=="+"?" ":d[1]?String.fromCharCode("0x"+d[1]+d[2]):d)
Noticed Arnauld's 92 byte Node.js answer shortly after spending some time golfing this. Porting that method would save quite a few bytes (with the main difference being Buffer vs. String.fromCharCode), but I wanted to post this one as it's more interesting and the V8 port wouldn't be worth a separate answer.
This challenge requires input validation, so the first .match takes only the valid part of the URL. Then, each part of it is replaced using a function. One trick I used that's kind of neat is the "0x"+d[1]+d[2]. Ordinarily you can convert hexadecimal to decimal using +("0x"+n), but it seems String.fromCharCode casts to a number on its own, saving three bytes. Instead of slicing off the initial %, I just concatenate the second and third characters, which is shorter.
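The two-step idea described above (first keep only the valid prefix, then replace token by token) might look roughly like this in Python. It is a sketch of the same technique, not a translation of the golfed code: the names VALID_PREFIX, TOKEN and decode_valid_prefix are invented for illustration.

```python
import re

# Longest prefix consisting only of plain characters and well-formed %XX escapes.
VALID_PREFIX = re.compile(r"(?:[^%]|%[0-9a-fA-F]{2})*")
# One token: either a full escape (captured without the %) or any single character.
TOKEN = re.compile(r"%(..)|.", re.DOTALL)


def decode_valid_prefix(url: str) -> str:
    # Step 1: validation. The match always succeeds, possibly with an empty prefix.
    prefix = VALID_PREFIX.match(url).group(0)

    def decode_token(m: re.Match) -> str:
        if m.group(1) is not None:
            # Escape: the two captured characters are known to be hex digits.
            return chr(int(m.group(1), 16))
        return " " if m.group(0) == "+" else m.group(0)

    # Step 2: replace each token of the validated prefix.
    return TOKEN.sub(decode_token, prefix)
```

Because the prefix is validated up front, the replacement callback never has to deal with a malformed escape.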
Gema, 45 characters
+=
%<X2>=@int-char{@radix{16;10;$1}}
%=@fail
Case-insensitive with respect to hexadecimal digits.
Sample run:
bash-5.0$ gema '+= ;%<X2>=@int-char{@radix{16;10;$1}};%=@fail' <<< $'100%25+working\n%24+%26+%3C+%3E+%3F+%3B+%23+%3A+%3D+%2C+%22+%27+%7E+%2B+%25\n%414243\n%24+%XY+%%%'
100% working
$ & < > ? ; # : = , " ' ~ + %
A4243
$
JavaScript (Node.js), 92 bytes
f=([x,y,z,...a])=>x=='%'?1/(n='0x'+y+z)?Buffer([n])+f(a):'':x?(x=='+'?' ':x)+f([y,z,...a]):a
Python 3, 90 bytes
def d(s):t='%'!=s[0];print(end=t*s[0].replace('+',' ')or chr(int(s[1:3],16)));d(s[3-2*t:])
Explanation
Checks if the first character is %. If that is the case, it will try to hex-decode the following two characters and print the result. If not, it will just print the first character, replacing + with a space if necessary.
If the first character was %, the first three characters are sliced off the string and the function is called recursively. If not, only the first character is sliced off and the function is called again.
Raises an error if the hex string cannot be decoded or if the end of the string is reached.
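The explanation can be made concrete with an expanded version of the same recursion. This is a sketch for readability only: the names decode_and_print and run are hypothetical, and run exists merely to capture the output and swallow the terminating exception so the behaviour can be inspected.

```python
import contextlib
import io


def decode_and_print(s: str) -> None:
    """Expanded form of the golfed recursion; like the original, it ends by raising."""
    is_plain = s[0] != "%"  # IndexError here once the string is exhausted
    if is_plain:
        piece = s[0].replace("+", " ")
    else:
        piece = chr(int(s[1:3], 16))  # ValueError on invalid hex digits
    print(end=piece)
    # 3 - 2*True == 1 drops one character, 3 - 2*False == 3 drops "%XX".
    decode_and_print(s[3 - 2 * is_plain:])


def run(s: str) -> str:
    """Helper (not part of the answer): collect the printed output as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            decode_and_print(s)
        except (IndexError, ValueError):
            pass  # both exceptions simply mark the end of decoding
    return buffer.getvalue()
```

Since every character costs one stack frame, long inputs would hit Python's recursion limit; for the short test cases this does not matter.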
Jelly, 35 bytes
ṣ"+Kṣ"%μḊḢ;ḢƊ€ŒuØHiⱮⱮ’ḅ48żFO<0œṗƊḢỌ
A monadic link that takes a string as its argument and returns the decoded string, terminating early at any invalid hex.
I’ve assumed for now that standard I/O rules apply. If it really has to be stdin, that will cost a byte.
Perl 6, 62 bytes
{S:g{(<-[%]>*)\%?(..)?}={TR/+/ /}($0).print+print chr "0x"~$1}
Anonymous code block that outputs to STDOUT and then errors. This assumes that the input cannot contain spaces.
Charcoal, 65 bytes
≔E¹⁶⍘ιφθWθ«≔⮌⪪S%ι⊟ιW∧ι⊟ι«¿∧›Lκ¹⬤01№θ↧§κμ«℅⍘↧…κ²¦¹⁶✂κ²»«≔υι≔υθ»»D⎚
Try it online! Link is to verbose version of code. Note that Charcoal prompts "Enter input:" if it runs out of input. Explanation:
≔E¹⁶⍘ιφθ
Grab the list of hex digits into a variable.
Wθ«
Repeat while the variable is not empty. This is used as a flag to break out of the loop, since Charcoal has no other way of terminating the loop.
≔⮌⪪S%ι
Read the next line of text and split it on %s.
⊟ι
Output the first split.
W∧ι⊟ι«
Repeat while there are more splits to process, but stop if any of them are empty.
¿∧›Lκ¹⬤01№θ↧§κμ«
Also check that the length of the split is at least 2 and that the first 2 characters are hex digits. (Inconveniently I can't use a literal 2, I have to use a string of length 2 instead.)
℅⍘↧…κ²¦¹⁶
Convert the first two characters from hex and output the character with that code.
✂κ²
Output the rest of that split.
»«≔υι≔υθ»»D⎚
Otherwise clear the loop variables so that we terminate processing. The canvas is also printed after each loop as otherwise Charcoal's input handling gets in the way again.
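The split-based strategy walked through above can be illustrated in Python. This is a loose sketch of the idea rather than a port of the Charcoal program: decode_by_splitting and HEX_DIGITS are invented names, it handles a single line, and the initial replacement of + by a space is an addition made here because the challenge requires it.

```python
HEX_DIGITS = "0123456789abcdef"


def decode_by_splitting(line: str) -> str:
    """Split on '%', emit the first piece as is, then decode the head of each later piece."""
    # Assumption added for this sketch: '+' is turned into a space up front.
    first, *rest = line.replace("+", " ").split("%")
    out = [first]
    for piece in rest:
        head = piece[:2].lower()
        # Stop at the first piece that does not start with two hex digits.
        if len(head) < 2 or any(c not in HEX_DIGITS for c in head):
            break
        # Decoded character followed by the remainder of the piece.
        out.append(chr(int(head, 16)) + piece[2:])
    return "".join(out)
```

Replacing + before splitting is safe here because a literal plus can only come from %2B, which is decoded afterwards.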
Python 3, 91 bytes
lambda s:re.sub('%([A-Fa-f\d]{2})',lambda t:chr(int(t[1],16)),s.replace('+',' '))
import re
This approaches the problem with regular expressions.
- Nice approach, but prints $%XY %%% instead of $ in the fourth example. – Jitse, Aug 3, 2019 at 7:49
Janet, 88 bytes
|(peg/match~(%(any(+(/"+"" ")(*"%"(/(number(2 :h)16),string/from-bytes))(*(!"%")'1))))$)
Returns a single-element array with the resulting string. +3 bytes if this is unacceptable:
|((peg/match~(%(any(+(/"+"" ")(*"%"(/(number(2 :h)16),string/from-bytes))(*(!"%")'1))))$)0)
In the case of invalid input, chops off the string at the last valid point, as requested.
APL(NARS), 182 chars
r←f w;i;c;b
b←r←''⋄l←≢w⋄i×ばつ⍳l<i×ばつ⍳'%'=c←w[i×ばつ⍳'+'=c⋄r,←c×ばつ⍳l<i+←1⋄b,←w[i×ばつ⍳l<i+←1⋄b,←w[i×ばつ⍳∼b⊆⎕D∪⎕A[1..6]∪⎕a[1..6]⋄r,←⎕AV[1+⍎'16b',b]⋄b←''⋄→2
r,←' '⋄→2
r←,'$'
// +/ 12 16 44 93 10 7
The function with line numbers:
0:r←f w;i;c;b
1:b←r←''⋄l←≢w⋄i×ばつ⍳l<i×ばつ⍳'%'=c←w[i×ばつ⍳'+'=c⋄r,←c×ばつ⍳l<i+←1⋄b,←w[i×ばつ⍳l<i+←1⋄b,←w[i×ばつ⍳∼b⊆⎕D∪⎕A[1..6]∪⎕a[1..6]⋄r,←⎕AV[1+⍎'16b',b]⋄b←''⋄→2
4:r,←' '⋄→2
5:r←,'$'
f takes one string as input and returns one string. If an error is found (a % not followed by two hexadecimal digits, or a value outside the allowable range), it returns the string "$". ⍎'16bBB' evaluates to 187, the decimal value of hex 0xBB.
It is a little long whatever the language, but if this is the only APL answer it would win all the same.
f '100%25+working'
100% working
f '%24+%26+%3C+%3E+%3F+%3B+%3A'
$ & < > ? ; :
f'%414243'
A4243
f'%24%XY+%%%'
$
stdin, rather than default IO rules? Also, agreed the required input validation makes it feel like a chameleon challenge... on some platforms the code to do that may be longer than the actual challenge.