Interpret a SqueezeL string
SqueezeL is a golfing language I'm developing. Its main distinguishing feature is its 40-character code page, which led me to create a semi-complicated encoding method for string literals. I think decoding it could make an interesting code golf challenge.
Input
Any string consisting only of spaces, quotes, parentheses, digits and lowercase letters.
Output
The string represented by the input. Here are the tokens that can appear in a string:
- A space, digit, or lowercase letter, which represents itself.
- A doubled parenthesis, which represents a single parenthesis. That is, (( represents ( and )) represents ).
- )", which represents ".
- A right parenthesis followed by a lowercase letter, which represents the uppercase version of that letter.
- A "short base-36 code", which consists of a right parenthesis, followed by a digit, followed by an alphanumeric character. This can represent any character code point in [0, 359], in base 36. For example, a comma has the code point 44, which is 18 in base 36, so )18 represents a comma.
- A "long base-36 code": a left parenthesis followed by four alphanumeric characters. Works the same way as a short base-36 code, but can represent any Unicode character.
You may assume the input won't contain any invalid tokens, such as (), (", or ) .
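For reference, the token rules above can be sketched as an ungolfed scanner. This is only an illustration, not the language's actual implementation: the name decodeSqueezeL is hypothetical, and the code assumes valid input as stated in the challenge.

```javascript
// Ungolfed reference decoder (hypothetical name, assumes valid input).
function decodeSqueezeL(src) {
  let out = "";
  let i = 0;
  while (i < src.length) {
    const c = src[i];
    const next = src[i + 1];
    if (c === "(" && next === "(") {
      // Doubled left parenthesis -> single left parenthesis
      out += "(";
      i += 2;
    } else if (c === "(") {
      // Long base-36 code: four alphanumeric characters
      out += String.fromCodePoint(parseInt(src.slice(i + 1, i + 5), 36));
      i += 5;
    } else if (c === ")" && (next === ")" || next === '"')) {
      // Doubled right parenthesis or escaped quote
      out += next;
      i += 2;
    } else if (c === ")" && next >= "0" && next <= "9") {
      // Short base-36 code: a digit followed by one alphanumeric character
      out += String.fromCodePoint(parseInt(src.slice(i + 1, i + 3), 36));
      i += 3;
    } else if (c === ")") {
      // Right parenthesis followed by a lowercase letter -> uppercase letter
      out += next.toUpperCase();
      i += 2;
    } else {
      // Space, digit, or lowercase letter represents itself
      out += c;
      i += 1;
    }
  }
  return out;
}
```

Checking for a digit before falling back to the uppercase rule keeps the short code and the capital-letter token from being confused.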
Examples
foobar123 -> foobar123
)capital -> Capital
((parentheses)) -> (parentheses)
)"quotes)" -> "quotes"
)who)18 me)1r -> Who, me?
)9z -> ŧ
(2r5s -> 😀
// more cases suggested by Arnauld
)9zzz -> ŧzz
(2r5uuu -> 😂uu
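The base-36 codes used in the examples above can be checked by hand with parseInt. This is a small sketch; the variable names are just for illustration.

```javascript
// Code bodies taken from the examples (without the leading parenthesis).
const codeBodies = ["18", "1r", "9z", "2r5s", "2r5u"];

// Decode each body from base 36 and look up the corresponding character.
const decoded = codeBodies.map((body) => {
  const codePoint = parseInt(body, 36);
  return { body, codePoint, char: String.fromCodePoint(codePoint) };
});

for (const { body, codePoint, char } of decoded) {
  console.log(`${body} -> ${codePoint} -> ${char}`);
}
// 18   -> 44     (comma)
// 1r   -> 63     (question mark)
// 9z   -> 359    (the largest short code)
// 2r5s -> 128512 (U+1F600)
// 2r5u -> 128514 (U+1F602)
```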
This is code-golf, so fewest bytes wins.
JavaScript (ES9), 105 bytes
s=>s.replace(/[()]((?<=\()\w{4}|\d?.)/g,(_,s)=>s[1]?String.fromCodePoint(parseInt(s,36)):s.toUpperCase())
Method
Regular expression:
[()] // a parenthesis followed by either:
( //
(?<=\()\w{4} // if it's an opening parenthesis: 4 alphanumeric characters
// matches: (xxxx
| // or
\d?. // an optional digit followed by any character
) // matches: ((, )), )a, )", and )xx
where the a in )a is a lower case letter and xx / xxxx are base-36 strings.
NB: Nothing else will be matched provided that the input string does not contain invalid tokens, as specified in the challenge.
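To see what the capture group picks up for each kind of token, here is a quick sketch. The sample string and the variable names are made up for illustration.

```javascript
// Same regular expression as in the answer, bound to a name for readability.
const tokenPattern = /[()]((?<=\()\w{4}|\d?.)/g;

// One token of each kind: ((, )), )b, )", a short code and a long code.
const sample = '((a)) )b )" )18 (2r5s';

// Collect only the captured part, i.e. what follows the escape character.
const captured = [...sample.matchAll(tokenPattern)].map((m) => m[1]);

console.log(captured);
// [ '(', ')', 'b', '"', '18', '2r5s' ]
```

The lookbehind only succeeds right after an opening parenthesis, so )18 can never be read as a 4-character code.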
We don't really care about the escape character and only retrieve the string s that follows. It boils down to two cases:
- If s is at least 2 characters long, proceed with base-36 decoding: String.fromCodePoint(parseInt(s,36))
- Otherwise, return s in upper case (leaving (, ) and " unchanged): s.toUpperCase()
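An ungolfed equivalent of the two cases might look like this. The function and parameter names are hypothetical; the logic is the same as in the golfed version, except that the length test is written out explicitly.

```javascript
// Ungolfed sketch of the regex-based approach (names are illustrative).
function decodeWithRegex(input) {
  return input.replace(/[()]((?<=\()\w{4}|\d?.)/g, (_match, body) =>
    body.length > 1
      ? String.fromCodePoint(parseInt(body, 36)) // short or long base-36 code
      : body.toUpperCase() // letter, or (, ) and " left unchanged
  );
}

console.log(decodeWithRegex(")who)18 me)1r")); // Who, me?
```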
Japt, 35 bytes
Port of Arnauld's JS solution.
r"%)%d?.|%(%w\{0,3}."Ȥ?XÅn36 d:XÌu
Try it (includes all test cases)
r"..."Ȥ?XÅn36 d:XÌu :Implicit input of string
r :Replace
"..." : RegEx /\)\d?.|\(\w{0,3}./g
È : Pass each match, X, through the following function
¤ : Slice off the first 2 characters
? : If truthy (non-empty string)
XÅ : Slice off the first character
n36 : Convert from base 36
d : Get character at that codepoint
: : Else
XÌ : Last character
u : Uppercase
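For readers who don't know Japt, the steps above translate roughly to the following JavaScript. This is a sketch rather than an exact transpilation, and decodeJaptStyle is a made-up name.

```javascript
// Rough JavaScript rendering of the Japt program (illustrative only).
function decodeJaptStyle(input) {
  return input.replace(/\)\d?.|\(\w{0,3}./g, (x) =>
    x.slice(2) // non-empty only for base-36 codes
      ? String.fromCodePoint(parseInt(x.slice(1), 36)) // drop the parenthesis, decode
      : x.slice(-1).toUpperCase() // last character, uppercased
  );
}

console.log(decodeJaptStyle("((parentheses))")); // (parentheses)
```

Here the whole match is used instead of a capture group, which is why the leading parenthesis has to be sliced off before the base conversion.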
- I've saved 2 bytes in the JS version with an updated regex. This may just make things longer in Japt, though. – Arnauld, Jan 16, 2025 at 17:34
- Thanks, @Arnauld :) But, you're right, it would be longer in Japt; I'd only be able to reclaim 2 bytes, either by removing the matching group from the new RegEx, or by removing the slice before the base conversion and the indexing into the string before the case conversion. – Shaggy, Jan 17, 2025 at 9:56
Charcoal, 62 bytes
ＦＳ≡⪫υωω¿№()ι⊞υιι(≡ι⊟υι⊞υι)¿⁼ιＩΣι≔⟦ωωι⟧υ∧⊟υ↥ι¿⁼Ｌ⊞Ｏυι⁴«℅⍘υ³⁶≔⟦⟧υ
Try it online! Link is to verbose version of code. Explanation:
ＦＳ
Loop over each character of the input string.
≡⪫υω
Check which state the program is in. The program state is kept as a list as this makes it easier to reset the program state by popping the character (saving 6 bytes), but switch only works on hashable types, so the list is joined here, although the only interesting cases are the empty list and the list containing a parenthesis.
ω
If the program is in the starting state:
¿№()ι
If the current character is a parenthesis, then...
⊞υι
... set the state to that character, otherwise...
ι
... output the character.
(
If the program is in the open parenthesis state:
≡ι⊟υ
Reset the program state. If the current character is also an open parenthesis, then...
ι
... output the current character, otherwise...
⊞υι
... add this base 36 digit to the program state.
)
If the program is in the close parenthesis state:
¿⁼ιＩΣι
If the current character is a digit, then...
≔⟦ωωι⟧υ
... set the program state to waiting for the final base 36 digit, otherwise...
∧⊟υ↥ι
... reset the program state and output the current character in upper case.
¿⁼Ｌ⊞Ｏυι⁴«
Otherwise, if this is the last base 36 digit to collect, then:
℅⍘υ³⁶
Convert the state from base 36 and output that Unicode character.
≔⟦⟧υ
Reset the program state.
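The same state machine can be sketched in JavaScript. This is an adaptation, not a literal translation: it tracks the expected number of digits in a separate variable (wanted, a name chosen here) instead of padding the state list with empty strings, and the function name is hypothetical.

```javascript
// Character-by-character state machine in the spirit of the Charcoal answer.
function decodeWithStateMachine(input) {
  let output = "";
  let state = ""; // "", "(", ")" or "digits"
  let digits = ""; // base-36 digits collected so far
  let wanted = 0; // how many digits the current code needs (2 or 4)
  for (const ch of input) {
    if (state === "") {
      // Starting state: a parenthesis becomes the new state, anything else is output.
      if (ch === "(" || ch === ")") state = ch;
      else output += ch;
    } else if (state === "(") {
      // Doubled parenthesis, or the first digit of a long code.
      if (ch === "(") { output += ch; state = ""; }
      else { digits = ch; wanted = 4; state = "digits"; }
    } else if (state === ")") {
      // A digit starts a short code; anything else is output in upper case.
      if (ch >= "0" && ch <= "9") { digits = ch; wanted = 2; state = "digits"; }
      else { output += ch.toUpperCase(); state = ""; }
    } else {
      // Collecting digits: decode once the last one has arrived.
      digits += ch;
      if (digits.length === wanted) {
        output += String.fromCodePoint(parseInt(digits, 36));
        digits = "";
        state = "";
      }
    }
  }
  return output;
}
```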
05AB1E, 43 bytes
¶ì„)(vJy©¡NUεDõQi®ë¬diX>·ôćAžhìÅβçšJëćuì]J¦
Explanation:
¶ì # Prepend a newline before the (implicit) input-string
„)( # Push string ")("
v # Loop over its characters `y`:
J # (First iteration: no-op)
# Second iteration: Join the list of the previous iteration back together to a string
y # Push the current parenthesis-character
© # Store it in variable `®` (without popping)
¡ # First iteration: split the implicit input by this character
# Second iteration: split the current string by this character
NU # And also store the current index in variable `X`
ε # Map over each part:
DõQi # If the current part is empty:
® # Push the current parenthesis-character `®` instead
ë¬di # Else-if the current part starts with a digit:
X # Push index `X`
>· # Increment and double (0 becomes 2 and 1 becomes 4)
ô # Split the string into parts of that size
ć # Extract head; push first item and remainder-list separately
AžhìÅβ # Convert it from custom base "0-9a-z" to a base-10 integer
ç # Convert that from a codepoint-integer to a character
š # Prepend it back to the list
J # Join the list back together
ë # Else: it doesn't start with a digit
ć # Extract head
u # Uppercase it (no-op for '"')
ì # Prepend it back to the remainder-string
] # Close the if-else statements; map; and loop
J # Join the list back together to a string
¦ # Remove the leading newline again
# (after which the result is output implicitly)
Minor note: the double J (one right after v and one after ]) is shorter than a single join after closing both if-else statements and the map at the end of every loop-iteration:
↓ ↓↓
¶ì„)(vJy©¡NUεDõQi®ë¬diX>·ôćAžhìÅβçšJëćuì]J¦
¶ì„)(vy©¡NUεDõQi®ë¬diX>·ôćAžhìÅβçšJëćuì}}}J}¦
↑↑↑↑↑
Perl 5 -MMath::Base36=:all -p, 108 bytes
s/(?<!\))\)(\pL)/\U$1/g;s/(?<!\))\)(\d.)|(?<!\()\((\w{4})/chr decode_base36$1.$2/ge;s/\)([")])|\((\()/$1$2/g