Challenge
The challenge is to implement the bottom encoding (only encoding, not decoding). There is a wide variety of existing implementations in the bottom-software-foundation org.
Bottom is a text encoding in which each byte of the input's UTF-8 encoding is represented by a group of emoji.
Unicode escape(s) | Character | Value |
---|---|---|
U+1FAC2 | 🫂 | 200 |
U+1F496 | 💖 | 50 |
U+2728 | ✨ | 10 |
U+1F97A | 🥺 | 5 |
U+002C | , | 1 |
U+2764, U+FE0F | ❤️ | 0 |

Unicode escape(s) | Character | Purpose |
---|---|---|
U+1F449, U+1F448 | 👉👈 | Byte separator |
Notes on encoding
- The output stream will be a sequence of groups of value characters (see table above) with each group followed by the byte separator, e.g. `💖✨✨✨👉👈💖💖🥺,,,👉👈💖💖,👉👈💖✨✨✨✨🥺,,👉👈💖💖✨🥺👉👈💖💖,👉👈💖✨,,,👉👈`
- The total numerical value of each group must equal the decimal value of the corresponding input byte.
  - For example, the numerical value of `💖💖,,,,`, according to the character table above, is `50 + 50 + 1 + 1 + 1 + 1`, or 104. This sequence would thus represent `U+0068` or `h`, which has a decimal value of `104`.
  - Note the ordering of characters within groups. Groups of value characters must be in descending order. While character order (within groups) technically does not affect the output in any way, arbitrary ordering can encroach significantly on decoding speed and is considered both illegal and bad form.
- Byte separators that do not follow a group of value characters are illegal, e.g. `💖💖,,,,👉👈👉👈` or `👉👈💖💖,,,,👉👈`. As such, `👉👈` alone is illegal.
- Groups of value characters must be followed by a byte separator. `💖💖,,,,` alone is illegal, but `💖💖,,,,👉👈` is valid.
- The null value must not be followed by a byte separator. `💖💖,,,,👉👈❤️💖💖,,,,👉👈` and `💖💖,,,,👉👈❤️` alone are valid, but `💖💖,,,,👉👈❤️👉👈` is illegal.
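The arithmetic in these notes can be checked with a few lines of Python. This is only a sketch: the names `VALUES`, `group_value` and `is_descending` are made up for illustration, and a group is taken to be the value characters without the trailing byte separator. The null value `❤️` is deliberately left out of the table, since it stands on its own rather than inside a group.

```python
# Values from the character table above (the dictionary name is illustrative).
VALUES = {"🫂": 200, "💖": 50, "✨": 10, "🥺": 5, ",": 1}


def group_value(group: str) -> int:
    """Sum the values of the characters in one group (separator excluded)."""
    return sum(VALUES[ch] for ch in group)


def is_descending(group: str) -> bool:
    """True if the value characters never increase in value left to right."""
    vals = [VALUES[ch] for ch in group]
    return all(a >= b for a, b in zip(vals, vals[1:]))


print(group_value("💖💖,,,,"))       # 104
print(chr(group_value("💖💖,,,,")))  # h
print(is_descending(",💖"))          # False
```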
Some pseudocode to illustrate how each character is converted:
    let o = new string
    for b in input_stream:
        # `b` is one byte of the UTF-8 encoding of the input
        let v = b as number
        if v == 0:
            # The null value is not followed by a byte separator
            o.append("❤️")
        else:
            while true:
                if v >= 200:
                    o.append("🫂")
                    v = v - 200
                else if v >= 50:
                    o.append("💖")
                    v = v - 50
                else if v >= 10:
                    o.append("✨")
                    v = v - 10
                else if v >= 5:
                    o.append("🥺")
                    v = v - 5
                else if v >= 1:
                    o.append(",")
                    v = v - 1
                else:
                    break
            o.append("👉👈")
    return o
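For reference, here is an ungolfed Python sketch of the same conversion. It is not taken from any existing Bottom implementation: the function name `bottom_encode` is an assumption, the greedy subtraction loop is swapped for `divmod`, and a null byte is assumed to emit `❤️` with no separator, as in the null-byte test case.

```python
def bottom_encode(text: str) -> str:
    """Encode a string as Bottom (illustrative sketch, not a reference implementation)."""
    table = ((200, "🫂"), (50, "💖"), (10, "✨"), (5, "🥺"), (1, ","))
    out = []
    for v in text.encode("utf-8"):  # iterate over the UTF-8 bytes, not code points
        if v == 0:
            out.append("\u2764\ufe0f")  # null value, no byte separator after it
            continue
        for value, char in table:
            count, v = divmod(v, value)  # how many of this character, and the remainder
            out.append(char * count)
        out.append("👉👈")  # byte separator closes every non-null group
    return "".join(out)


print(bottom_encode("test"))
```

Walking the table from the largest value to the smallest keeps each group in descending order, and an empty input produces an empty string.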
Rules
- The standard I/O rules apply.
- The input must be a string, not a list of bytes.
- An empty string should return an empty string.
- It should return the same result as bottom-web for any input with no null bytes (UTF-8 validation is not required).
Some test cases:
- `test` -> `💖💖✨🥺,👉👈💖💖,👉👈💖💖✨🥺👉👈💖💖✨🥺,👉👈`
- `Hello World!` -> `💖✨✨,,👉👈💖💖,👉👈💖💖🥺,,,👉👈💖💖🥺,,,👉👈💖💖✨,👉👈✨✨✨,,👉👈💖✨✨✨🥺,,👉👈💖💖✨,👉👈💖💖✨,,,,👉👈💖💖🥺,,,👉👈💖💖👉👈✨✨✨,,,👉👈`
- `🥺` -> `🫂✨✨✨✨👉👈💖💖💖🥺,,,,👉👈💖💖💖✨🥺👉👈💖💖💖✨✨✨🥺,👉👈`
- (2 spaces) -> `✨✨✨,,👉👈✨✨✨,,👉👈`
- `𐀊` -> `🫂✨✨✨✨👉👈💖💖✨✨✨✨,,,,👉👈💖💖✨✨🥺,,,👉👈💖💖✨✨✨🥺,,,👉👈`
- `⍉⃠` -> `🫂✨✨🥺,👉👈💖💖✨✨✨✨,👉👈💖💖✨✨✨🥺,,👉👈🫂✨✨🥺,👉👈💖💖✨✨✨,👉👈💖💖💖✨👉👈`
- `X⃩` -> `💖✨✨✨🥺,,,👉👈🫂✨✨🥺,👉👈💖💖✨✨✨,👉👈💖💖💖✨🥺,,,,👉👈`
- `❤️` -> `🫂✨✨🥺,👉👈💖💖💖🥺,,👉👈💖💖💖✨,,,,👉👈🫂✨✨✨🥺,,,,👉👈💖💖💖✨✨✨,,,,👉👈💖💖✨✨✨✨,,,👉👈`
- `.\0.` (null byte surrounded by full stops) -> `✨✨✨✨🥺,👉👈❤️✨✨✨✨🥺,👉👈`
This is code-golf, so fewest bytes win!
8 Answers
APL (Dyalog Unicode), 133 bytes SBCS
    {∊,/{⍵=0:'❤️'⋄'👉👈',⍨⍬{⍵≥200:(⍺,'🫂')∇⍵-200⋄⍵≥50:(⍺,'💖')∇⍵-50⋄⍵≥10:(⍺,'✨')∇⍵-10⋄⍵≥5:(⍺,'🥺')∇⍵-5⋄⍵≥1:(⍺,',')∇⍵-1⋄⍺}⍵}¨'UTF-8'⎕ucs⍵}
-5 thanks to ovs.
- This question uses the bytes in the UTF-8 encoding of the input, which differ from the code points. You can get these with `'UTF-8'⎕ucs⍵`: Slightly golfed – ovs, Feb 4, 2021 at 12:41
- And I don't think these emojis are included in the single-byte code page, which means you can't encode the program with this. You either have to encode it in UTF-8, which is quite long, or remove those. – ovs, Feb 4, 2021 at 12:43
- hmm, fair, i overlooked that and just trusted the byte counter. – Kamila Szewczyk, Feb 4, 2021 at 12:44
- @Razetime doesn't work – Kamila Szewczyk, Feb 4, 2021 at 12:53
05AB1E, ~~101~~ ~~100~~ ~~90~~ 86 bytes
โinรยฆt("รฟ",รฎ3aries: 1)โ.E#ฮตรพD_iโขyรรกรโข2รครงJรซโข$Hรด}โขฦต&ะฒvyโฐ`}1D)โข}รaรใฐใค]S
Output as a list of characters.
-10 bytes thanks to @ovs.
Try it online or verify all test cases.
Explanation:
โinรยฆt("รฟ",รฎ3aries: 1)โ # Push dictionary string "inspect("รฟ",binaries: 1)",
# where `รฟ` is automatically replaced with the (implicit) input
.E # Execute it as Elixir code
# # Split the result in spaces
ฮต # Map over each string-part:
รพ # Only leave the digits of the string
D_i # If it's 0 (thus a null-byte):
โขyรรกรโข # Push compressed integer 1008465039
2ä # Split it into 2 parts: [10084,65039]
ç # Convert each to a character with this codepoint: ["❤","️"]
J # Join the inner pair together: "❤️"
รซ # Else:
โข$Hรด}โข # Push compressed integer 1626142265
ฦต& # Push compressed integer 201
ะฒ # Convert 1626142265 to base-201 as list: [200,50,10,5]
v # Loop over each integer `y`:
yโฐ # Take divmod-`y` on the top integer
# (which is the codepoint in the first iteration)
` # Pop and push the pair separated to the stack: n//y and n%y
}1D # After the loop: push two 1s
) # And wrap everything into a list
โข}รaโฤลลร{1โฮธว2?โข
# Push compressed integer 618452457360154459249099105698966985
โข1รพรโข # Push compressed integer 129731
ะฒ # Convert the larger integer to base-129731 as list:
# [129730,128150,10024,129402,44,128073,128072]
ç # Convert each to a character:
# ["🫂","💖","✨","🥺",",","👉","👈"]
× # Repeat each character the divmod-list amount of times
] # Close the if-else statement and map
S # Convert the list of lists to a flattened list of characters
# (after which the result is output implicitly)
See this 05AB1E tip of mine (sections How to use the dictionary?; How to compress large integers?; and How to compress integer lists?) to understand why โinรยฆt("รฟ",รฎ3aries: 1)โ
is "inspect("รฟ",binaries: 1)"
; โขyรรกรโข
is 1008465039
; โข$Hรด}โข
is 1626142265
; ฦต&
is 201
; โข$Hรด}โขฦต&ะฒ
is [200,50,10,5]
; โข}รaโฤลลร{1โฮธว2?โข
is 618452457360154459249099105698966985
; โข1รพรโข
is 129731
; and โข}รaโฤลลร{1โฮธว2?โขโข1รพรโขะฒ
is [129730,128150,10024,129402,44,128073,128072]
.
- The reason this doesn't work on TIO is that 05AB1E now uses Elixir to evaluate the expression: Try it online!. (This is the shortest way I could find) – ovs, Feb 4, 2021 at 16:21
- @ovs Oh, didn't know `.E` worked as an Elixir-eval now. Is that the new functionality in the new 05AB1E, or only on TIO? It does make sense, but I thought `.E` was still a Python eval despite 05AB1E being built in Elixir now. But apparently not? – Kevin Cruijssen, Feb 4, 2021 at 16:35
- @ovs I've changed the entire program to Elixir evals now. Unfortunately it's quite a bit longer, but it works on TIO now, so whatever. Also, these are slightly shorter, but thanks for the `"inspect("ÿ",binaries: 1)`. :) – Kevin Cruijssen, Feb 4, 2021 at 17:52
- You can generate the `❤️` emoji with `•yÑáÜ•2äçJ` which is 10 bytes shorter ;) – ovs, Feb 4, 2021 at 17:58
Python 3, ~~134~~ 130 bytes
-4 bytes thanks to bb94!
    for v in open(0,'rb').read():print(v//200*'🫂'+v//50%4*'💖'+v//10%5*'✨'+v//5%2*'🥺'+v%5*',',end=0**v*'❤️'or'👉👈')
Try it online! Test case includes a trailing null byte.
`open(0,'rb').read()` is the exact same length as `bytes(input(),'u8')`, but reads the entire input instead of one line.
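To make the arithmetic easier to follow, here is an ungolfed sketch of the per-byte expression. The function name `encode_byte` is made up for illustration, and returning a string instead of printing is a change from the answer. The loop also checks that the shortened terms match the longer `v%200//50` style terms for every byte value.

```python
def encode_byte(v: int) -> str:
    """Ungolfed form of the per-byte expression (function name is illustrative)."""
    body = (
        "🫂" * (v // 200)        # 0 or 1 for a byte
        + "💖" * (v // 50 % 4)   # fifties left after removing the 200s
        + "✨" * (v // 10 % 5)   # tens left after removing the 50s
        + "🥺" * (v // 5 % 2)    # at most one five
        + "," * (v % 5)          # remaining ones
    )
    # 0**v is 1 only when v == 0, so the heart is chosen for a null byte;
    # otherwise the product is an empty (falsy) string and `or` picks the separator.
    end = 0**v * "\u2764\ufe0f" or "👉👈"
    return body + end


# The shorter terms agree with the longer ones for every possible byte.
for v in range(256):
    assert v // 50 % 4 == v % 200 // 50
    assert v // 10 % 5 == v % 50 // 10
    assert v // 5 % 2 == v % 10 // 5

print(encode_byte(104))  # 💖💖,,,,👉👈
```

The identities hold because each value divides the next larger one evenly (200 = 4 × 50, 50 = 5 × 10, 10 = 2 × 5), so taking the quotient modulo that ratio discards exactly what the larger characters already covered.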
- There was an indent issue with the pseudocode, if there is a 0 byte it should still add a byte separator. bottom-web should be considered the main test – Orangutan, Feb 4, 2021 at 13:05
- Also reading from stdin and outputting to stdout isn't required due to the standard I/O rule so you could use `lambda` instead – Orangutan, Feb 4, 2021 at 13:08
- @Orangutan bottom-web doesn't add a byte separator after a null byte though. The output for `.\0.` is `✨✨✨✨🥺,👉👈❤️✨✨✨✨🥺,👉👈`. (The shortest lambda I found was 136 bytes) – ovs, Feb 4, 2021 at 13:10
- Actually never mind my comment, yeah no byte separator needed as you don't need to know how many 0s there are – Orangutan, Feb 4, 2021 at 13:12
- Couldn't you use `v//50%4`, `v//10%5`, and `v//5%2` instead of `v%200//50`, `v%50//10`, and `v%10//5` to save some bytes? – bb94, Feb 5, 2021 at 1:19
JavaScript (newer browsers), ~~166~~ 164 bytes
    f=
    s=>[...new TextEncoder().encode(s)].map(c =>(`🫂`[r=`repeat`](c/200)+`💖`[r](c/50%4)+`✨`[r](c/10%5)+`🥺`[r](c/5%2)+`,`[r](c%5)||`❤️`)+`👉👈`).join``
    <input oninput=o.textContent=f(this.value)><div id=o>
(Only the second line qualifies as the code; the rest is just part of the Stack Snippet.)
Can I Use says that Firefox supported TextEncoder before arrow functions, but I didn't bother to check other browser support.
- `TextEncoder` is a browser API, not a part of the ES standard. Maybe you can claim the language as JavaScript (browser) or something like it. – tsh, Feb 5, 2021 at 3:48
Wolfram Language (Mathematica), 141 bytes
    Table@@@({Characters@"🫂💖✨🥺,",#~NumberDecompose~{200,50,10,5,1}}\[Transpose])<>If[#>0,"👉👈","❤️"]&/@Normal@StringToByteArray@#<>""&
JavaScript (Node.js), 133 bytes
    s=>[...Buffer(s)].map(n=>n?[200,50,10,5,1].map((v,i)=>[..."🫂💖✨🥺,"][i].repeat(n/v,n%=v)).join``+"👉👈":"❤️").join``
Raku, ~~110~~ 112 bytes
    {[~] map ->\a {[~] "❤️🫂💖✨🥺,👉👈".comb Zx(!a,a/200,a/50%4,a/10%5,a/5%2,a%5,?a,?a)},@(.encode)}
+2 bytes to properly implement null byte encoding
Lua (LuaJIT), ~~184~~ ~~172~~ 158 bytes
    r=s.rep;s:gsub('.',function(v)v=v:byte()io.write(r('🫂',v/200),r('💖',v/50%4),r('✨',v/10%5),r('🥺',v/5%2),r(',',v%5),v~=0 and'👉👈'or'❤️')end)
- … `❤️` not followed by `👉👈` in the new test case?
- … `encode`, I would say this test case is the most accurate interpretation
- … `✨✨✨✨🥺,👉👈❤️✨✨✨✨🥺,👉👈` properly, but it does for `✨✨✨✨🥺,👉👈❤️👉👈✨✨✨✨🥺,👉👈`. Also, `✨✨✨✨🥺,👉👈❤️✨✨✨✨🥺,👉👈` violates the EBNF syntax you yourself specified.