TLDR: Sort your input according to a new English alphabet somewhat based on Chinese stroke count methods.
Background: In a Chinese glossary/index, finding terms that are contained within the book is different from English because Chinese doesn't have an alphabet like English; instead, terms are sorted by stroke count. (一畫 = 1 stroke, 二畫 = 2 strokes, 三畫 = 3 strokes, 四畫 = 4 strokes, and so on)
An English glossary, having an alphabet, is naturally sorted alphabetically. For this challenge, we flip that idea somewhat to follow the Chinese manner. And we'll follow some Chinese writing rules to help determine stroke order for the alphabet below.
Counting Strokes: Take 口 (kou) for example, a simple square. You'd think it is 4 strokes, but it is actually 3. The 1st being the left vertical line. The 2nd being the top horizontal and right vertical in one fluid stroke, forming the corner. And the 3rd being the lower horizontal line, completing the square. This pattern, among others, holds relatively true across Chinese characters. For sake of simplicity though, and for some added diversity in the English Stroke Count Alphabet, there is a somewhat subjective choice for stroke counts.
Defining the NESCA
First, I need to define stroke count for each letter. For sake of simplicity, and somewhat subjectively, I'll use the characters as they appear below. If there are any arguments why a letter should have a different stroke count, please make your case, but again in order to promote diversity in stroke counts, I made some personal judgment calls. For example, W could arguably be done in 2 strokes, where each stroke makes a v shape, but if that was the case for every letter, this new alphabet would essentially resemble the original. Hence my subjective choice of stroke counts. (For those that also read/speak Mandarin, 不好意思!)
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
3 3 1 2 4 3 2 3 3 1 3 2 4 3 1 2 2 3 1 2 1 2 4 2 3 3
a b c d e f g h i j k l m n o p q r s t u v w x y z
2 2 1 2 2 2 2 2 3 2 3 2 4 2 1 2 2 2 1 2 2 2 4 2 2 3
Letters with equal stroke counts should retain the original alphabetic order as before. The only tie-breaker should be upper and lower case letters with the same stroke counts. C and c, O and o, S and s, D and d, etc. So the English Stroke Order Alphabet is as follows. (If I made an error, please say as much, there are a lot of examples that I might have to adjust)
The NESCA
C J O S U D G L P Q T V X A B F H I K N R Y Z E M W
c o s a b d e f g h j l n p q r t u v x y i k z m w
and more specifically...
CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw
1111111122222222222222222222222222333333333333344444
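As a rough illustration of the ordering rule, here is a minimal Python sketch that rebuilds the combined listing from the two stroke-count tables. The names `UPPER_COUNTS`, `LOWER_COUNTS` and `nesca_from_counts` are made up for this example, and it assumes lowercase m counts as 4 strokes, which is what the combined listing shows.

```python
import string

# Stroke counts in A..Z and a..z order, copied from the tables above.
# Assumption: lowercase m is taken as 4 strokes, as in the combined listing.
UPPER_COUNTS = [int(n) for n in
                "3 3 1 2 4 3 2 3 3 1 3 2 4 3 1 2 2 3 1 2 1 2 4 2 3 3".split()]
LOWER_COUNTS = [int(n) for n in
                "2 2 1 2 2 2 2 2 3 2 3 2 4 2 1 2 2 2 1 2 2 2 4 2 2 3".split()]

strokes = dict(zip(string.ascii_uppercase, UPPER_COUNTS))
strokes.update(zip(string.ascii_lowercase, LOWER_COUNTS))


def nesca_from_counts(strokes):
    """Order letters by stroke count, then by original alphabet position,
    putting the uppercase form first when both cases share a count."""
    return "".join(
        sorted(strokes, key=lambda ch: (strokes[ch], ch.lower(), ch.islower()))
    )


derived = nesca_from_counts(strokes)
print(derived)
```

The sort key is a tuple, so the stroke count dominates, the letter itself breaks ties between different letters, and the case flag only matters for a pair such as C and c.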
Note 1: Tiebreakers - If upper and lowercase for the same letter have the same stroke count, uppercase letters take precedence.
- "Cousin" precedes "cousin"
- "father" precedes "Father" (because lowercase f is 2 strokes, while the uppercase is 3)
- "Stop" precedes "soap" (while the o would precede t in stroke count, uppercase S precedes lowercase s)
- KO precedes kO (K precedes k)
- kO precedes ko (O precedes o)
- make precedes When (both have 4 strokes, but m precedes W in the original alphabet)
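The pairs in Note 1 can be checked with a small Python sketch. It assumes the combined NESCA string given above; `RANK`, `nesca_key` and `precedes` are hypothetical helper names, not part of the challenge.

```python
NESCA = "CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"
RANK = {ch: i for i, ch in enumerate(NESCA)}


def nesca_key(word):
    # A word becomes the list of its letters' positions in the NESCA;
    # Python compares such lists element by element, like a dictionary order.
    return [RANK[ch] for ch in word]


def precedes(first, second):
    return nesca_key(first) < nesca_key(second)


note_1_pairs = [
    ("Cousin", "cousin"),
    ("father", "Father"),
    ("Stop", "soap"),
    ("KO", "kO"),
    ("kO", "ko"),
    ("make", "When"),
]
for first, second in note_1_pairs:
    print(first, "precedes", second, "->", precedes(first, second))
```

Passing `nesca_key` as the `key` argument of `sorted` then sorts a whole word list; Python's sort is stable, so duplicates are kept.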
Note 2: Input will never include any numbers, punctuation, or special characters, nor will it be empty.
Note 3: I left this challenge in the Sandbox for 2 weeks as a precaution. I'm worried a lot of people will argue against my subjective decisions in defining this alphabet (especially the letter g). I merely tried to allow for a very new and very different alphabet, and to add more diversity to Challenges.
The Challenge
Given a string input containing a sentence, series of words, or a list of words, organize those words according to the NESCA. Output can be either a single string or a list of the properly organized words, including duplicates should they exist.
EDIT At the behest of users, I have changed my examples to be one consistent input/output format. My example formats can be found here, and exact examples can be obviously found in edit history.
Example Format 1
"INPUT HERE" / "OUTPUT HERE"
Example Format 2
[INPUT HERE] / [OUTPUT HERE]
Example Format 3
["INPUT", "HERE"] / ["OUTPUT", "HERE"]
Any suitable format for your language, as per community standards.
Input / Output
"It was the best of times it was the worst of tImes" / "of of best the the tImes times It it worst was was"
"When life gives you lemons make Lemonade" / "gives Lemonade lemons life you make When"
"The journey of a thousand miles begins with one step" / "of one step a begins journey The thousand miles with"
"English Stroke Count Alphabet" / "Count Stroke Alphabet English"
"A man a plan a canal panama" / "canal a a panama plan A man"
"Carry on my wayward son" / "Carry on son my wayward"
"Close our store and begin destroying every flower green house just lose no people quietly rather than using vexing xrays yesterday it killed Zachs mini wombat" / Same as input (If you can write a better sentence than above, I'd be much appreciated. I'd gift reputation, but I don't know how)
"May the Force be with you" / "be the you Force May with"
"Im going to make him an offer hE cant refuse" / "cant offer an going him hE refuse to Im make"
"jello Jello JellO JEllo jellO JELlo JELlO jEllo JELLO JelLo JeLlo" / "JeLlo JelLo JellO Jello JELLO JELlO JELlo JEllo jellO jello jEllo" (Annoyed? Me too!)
"We suffer more often In imagination than IN reality" / "often suffer reality than In IN imagination more We"
"Code Golf and Coding Challenges" / "Code Coding Challenges and Golf"
"Do or DO not there is no try" / "or DO Do no not there try" is"
"Failure the best teacher is" / "best teacher the Failure is"
"Can you tell that I am a Star Wars fan" / "Can Star a am fan tell that you I Wars"
"enough examples no more words" / "enough examples no more words"
14 Answers
Japt, 53 bytes
I/O as an array of words. I wasn't able to run all the test cases 'cause trying to format the input for them all on my phone got to be infuriating.
n`CcJOoSsUabD¸fGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw
Try it (header splits input strings on spaces)
-
7 Coding on your phone?!?! That's worth an upvote in and of itself! – Sumner18, Nov 16, 2020 at 21:37
-
What is the purpose of the comma character between D and f? – Sumner18, Nov 16, 2020 at 21:49
-
@Sumner18, it's the
decompressed; the backtick encloses a compressed string. – Shaggy, Nov 16, 2020 at 21:54
-
13 @Sumner18 Shaggy golfs on his phone after a couple of pints. Unclear if the latter improves his golfing or not. – Giuseppe, Nov 16, 2020 at 22:31
Jelly, 38 bytes
“EḂ 2JḶ]{5+cUBẋ÷ỌṫƇÆ7ɗ"CỵƊ¢Ṁċ’œ?ØẠ¤w)Þ
A monadic link that accepts and returns a list of words.
Explanation
“...’œ?ØẠ¤w)Þ  Main monadic link
            Þ  Sort by
           )   Map [over each letter in a given word]
          w    Find index of subsequence in
         ¤     (
“...’          3928442642485912187600397757783525135099072511850472479412437675483
     œ?        rd permutation of
       ØẠ      the string "ABC...XYZabc...xyz"
         ¤     )
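For readers without Jelly at hand, the idea of addressing a permutation by its lexicographic index can be sketched in Python with the factorial number system. This is only an illustration of the technique, not a port of the builtin; `nth_permutation` and `permutation_index` are hypothetical names, and both use 0-based indices.

```python
from math import factorial

NESCA = "CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"


def nth_permutation(items, index):
    """Permutation of sorted(items) at the given 0-based lexicographic index."""
    pool = sorted(items)
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice of the next element covers (remaining - 1)! permutations.
        position, index = divmod(index, factorial(remaining - 1))
        result.append(pool.pop(position))
    return "".join(result)


def permutation_index(perm):
    """Inverse of nth_permutation: 0-based lexicographic index of perm."""
    pool = sorted(perm)
    index = 0
    for ch in perm:
        position = pool.index(ch)
        index += position * factorial(len(pool) - 1)
        pool.pop(position)
    return index


print(nth_permutation("abc", 3))   # bca
print(permutation_index(NESCA))    # a 67-digit number
```

Since Jelly counts permutations from 1, the literal in the program would be expected to be one more than the 0-based value printed here; that expectation is not checked by the sketch.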
-
1 How did you find that permutation number? That's insane! Have an upvote! – Sumner18, Nov 16, 2020 at 22:53
-
5 @Sumner18 Jelly has a built-in that does exactly that: Œ¿ is essentially the inverse of Œ?. – Arnauld, Nov 16, 2020 at 22:58
-
1 @Arnauld Could that be translated to other languages? I see most other examples, like your JavaScript solution, are just hard coding the NESCA into the solution, which is fine, but I'm just imagining the possibilities. – Sumner18, Nov 16, 2020 at 23:06
-
4 @Sumner18 It wouldn't make sense in practical languages. In order for this approach to save bytes, it needs the language to have compressed number literals, a built-in for the alphabet and a built-in for indexing permutations. – Adamátor, Nov 16, 2020 at 23:15
Perl 5 -a, ~~104~~ 100 bytes
@NahuelFouilleul saved 4 bytes
sub t{pop=~y/CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw/A-z/r}say for sort{t($a)cmp t$b}@F
-
1 some ideas: pop instead of "@_" and A-z instead of A-Za-z – Nahuel Fouilleul, Nov 17, 2020 at 16:50
-
1 @NahuelFouilleul Good call on using pop. I didn't even know that A-z was possible. – Xcali, Nov 17, 2020 at 20:31
-
for A-z may be more tricky than what you think, it could also be A-t, also as cmp is dependent on locale (LC_COLLATE) and on tio, it works [tio.run/##K0gtyjH9/79S3znZyz8/… Try it online!] – Nahuel Fouilleul, Nov 18, 2020 at 8:18
Python 3, ~~107~~ ~~101~~ 99 bytes
Saved ~~6~~ 8 bytes (and got below 100!) thanks to ovs!!!
lambda s:s.sort(key=lambda k:[*map("CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw".find,k)])
Inputs a list of strings and sorts them accordingly.
-
Your key function can be shortened to lambda k:[*map("CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw".find,k)]. – ovs, Nov 17, 2020 at 10:56
-
@ovs Very nice - thanks! :D – Noodle9, Nov 17, 2020 at 11:28
-
You can save another 2 bytes by modifying the input list instead of returning a new one: def f(s):s.sort(key=...). – ovs, Nov 17, 2020 at 12:46
-
@ovs Nice one - thanks! :-) – Noodle9, Nov 17, 2020 at 13:39
R, ~~159~~ ~~134~~ 125 bytes
icuSetCollate(locale="ASCII");s=scan(,"");s[order(chartr("CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw","A-Za-z",s))]
Thanks to Dominic van Essen for -7 bytes.
Similar to others, translate the characters in the string using chartr into an appropriate order, then sort the strings using that order.
The default collation order in the R install on TIO, en_US.UTF8, is very odd: while, for instance, e comes before E, ekF comes after EgHTk (those being the translations of "and" and "begin" in the unchanged test case). So I switch to an ASCII locale, which compares by byte value instead.
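The transliterate-then-sort idea can be sketched in Python, where plain `str` comparison already goes by code point, so no locale switch is involved. This is an illustration rather than a port of the R code; `TABLE` and `nesca_sort` are names invented for the example.

```python
import string

NESCA = "CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"

# Map the i-th NESCA letter to the i-th letter of "A..Za..z".
# Those 52 targets have strictly increasing code points, so comparing
# translated words by code point matches comparing by NESCA position.
TABLE = str.maketrans(NESCA, string.ascii_uppercase + string.ascii_lowercase)


def nesca_sort(words):
    # Sort the original words, using the translated form only as the key.
    return sorted(words, key=lambda word: word.translate(TABLE))


print(" ".join(nesca_sort("Do or DO not there is no try".split())))
```

A locale-aware comparison (for example via `locale.strxfrm`) could reorder mixed-case strings, which is the kind of surprise the paragraph above describes; the sketch avoids it by never leaving code-point order.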
-
I follow you, but I really don't. I have no idea what you're saying in that second paragraph. – Sumner18, Nov 16, 2020 at 22:58
-
1 @Sumner18 Try it online! -- the docs say something about string comparison here (just ctrl + F for "lexicographic"), so I just dug around until I found the commands to give me the right sort order. – Giuseppe, Nov 16, 2020 at 23:01
-
I wish I could say that I understand that too. I'm just a statistician with less than 3 years in the workforce. I'll figure it out someday! – Sumner18, Nov 16, 2020 at 23:11
-
2 Seeing as we define languages by their interpreter here, why not just assume an interpreter running in an ASCII locale? – Shaggy, Nov 17, 2020 at 0:08
-
1 152 bytes using intToUtf8... – Dominic van Essen, Nov 17, 2020 at 15:14
05AB1E, 38 bytes
ΣžnS•f[?θ$Ÿ)*:TMûò0Æì+Ω£μ\.—g"Ý»θä•.Isk
I/O as a list of list of characters.
Port of @xigoi's Jelly answer, so make sure to upvote him/her as well!
Try it online or verify all test cases.
Explanation:
Σ # Sort the (implicit) input-list by:
žn # Push the constant string "ABC...XYZabc...xyz"
S # Convert it to a list of characters
•f[?θ$Ÿ)*:TMûò0Æì+Ω£μ\.—g"Ý»θä•
"# Push compressed integer 3928442642485912187600397757783525135099072511850472479412437675482
.I # Get the 392...482nd permutation of the character-list
s # Swap to get the current list of characters
k # And get the index of each character in the permutation
# (we sort on those lists of indices)
# (after which the sorted list is output implicitly as result)
See this 05AB1E tip of mine (section How to compress large integers?) to understand why •f[?θ$Ÿ)*:TMûò0Æì+Ω£μ\.—g"Ý»θä• is 3928442642485912187600397757783525135099072511850472479412437675482. (Note that it's 1 lower than the number used in the Jelly answer, because 05AB1E uses 0-based indexing and Jelly uses 1-based indexing instead. This number is generated with the œ¿ Jelly builtin.)
-
1 Slight correction: The Jelly program you linked completely ignores the second argument. Œ¿ is similar to œ¿, but it takes only one argument and uses its sorted version as the second argument. Here it just happens to work because the alphabet is sorted. – Adamátor, Nov 17, 2020 at 12:56
-
1 @xigoi Ah, thanks a lot for mentioning that. That explains why I sometimes had trouble finding the permutation index in other challenges, when I was using that Jelly builtin. I always assumed I just had to sort the characters in the string prior to using the $n^{th}$ permutation when it happened, but apparently I was sometimes just using the wrong builtin in Jelly to calculate $n$.. Thanks for letting me know (and I'll edit my answer to reduce confusion if someone else reads it). – Kevin Cruijssen, Nov 17, 2020 at 13:39
-
1 Jelly actually has a naming convention for this: atoms starting with an uppercase letter are monadic and atoms starting with a lowercase letter are dyadic. – Adamátor, Nov 17, 2020 at 17:25
JavaScript (ES6), 119 bytes
a=>a.sort((a,b)=>(g=s=>[...s].map(c=>"CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw".search(c)+10))(a)>g(b)||-1)
Retina 0.8.2, 141 bytes
T`CcJ\O\oSsUabD\defGg\hj\L\lnP\pQqrTtuVvXxyABF\HIiKkNRYZz\EMmW\w`Ll
O`\w+
T`Ll`CcJ\O\oSsUabD\defGg\hj\L\lnP\pQqrTtuVvXxyABF\HIiKkNRYZz\EMmW\w
Try it online! Link includes test cases. Explanation: Simply replaces all letters with other letters that are in the desired sort order, then replaces them back after sorting the words into order. Note that Transliterate has several shorthand letters (such as L and l of course) which need to be quoted in the master list.
K (ngn/k), 61 bytes
{x@<"CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"?x}
Takes the input as a list of words; returns a list of words.
{ } a function with parameter x
"CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"?x find the indices of each character of the input words in the NESCA string
< grade up (the permutation of indices that sorts ascending)
x@ take the words at the graded indices
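The three steps above (find, grade, index) can be mimicked in Python as a rough sketch. `grade_up` is a hypothetical helper standing in for K's `<`; it is not a claim about how ngn/k implements grading.

```python
NESCA = "CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"


def grade_up(keys):
    # Indices that would sort `keys` ascending; ties keep their input order.
    return sorted(range(len(keys)), key=keys.__getitem__)


def nesca_sort(words):
    keys = [[NESCA.find(ch) for ch in word] for word in words]  # like "..."?x
    order = grade_up(keys)                                       # like <
    return [words[i] for i in order]                             # like x@


words = "May the Force be with you".split()
print(grade_up([[NESCA.find(ch) for ch in w] for w in words]))
print(nesca_sort(words))
```

Grading returns positions rather than sorted values, which is why the final step indexes back into the original word list.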
-
Very confusing, needs an explanation – Razetime, Nov 17, 2020 at 8:23
Red, 162 bytes
func[b][forall b[b/1: collect[foreach c b/1[keep index?
find/case"CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"c]keep
b/1]]sort b forall b[b/1: last b/1]]
-
1 I will be joining with answers in Rebol very soon. – Razetime, Nov 17, 2020 at 9:56
-
@Razetime That's great, I'm looking forward to it! Is there any online "try it" suite for Rebol? – Galen Ivanov, Nov 17, 2020 at 10:08
-
Yeah, REBOL 2 and 3 are there.. I found it through Hostilefork. – Razetime, Nov 17, 2020 at 10:10
-
1 @Razetime Thanks! BTW Arturo language is inspired (among others) by Rebol and has some functional tools that would be useful for golfing. – Galen Ivanov, Nov 17, 2020 at 10:12
Jelly, 28 bytes
“ẹ1ʋỴḂỤ$*Ɗż©zk’b4żØẠŒuÞFiⱮμÞ
A monadic Link accepting and yielding a list of words (each being a list of characters).
Try it online! Or see the test-suite.
How?
“...’b4żØẠŒuÞFiⱮμÞ - Link: words
                μÞ - sort (words) by this monadic chain, f(word):
“...’              - base 250 literal = 12827082404216683880457031718358
     b4            - in base 4 -> 2201321220213201120101312211011111212121011101113112
        ØẠ         - alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
       ż           - zip -> [[2,'A'],[2,'B'],[0,'C'],...,[2,'z']]
            Þ      - sort by:
          Œu       - to upper-case -> [[0,'C'],[0,'c'],[0,'J'],...,[3,'w']]
             F     - flatten -> [0,'C',0,'c',0,'J',...,3,'w']
               Ɱ   - for each character, c, in word:
              i    - index (of c) in (that)
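The decoding steps in this explanation can be followed in Python. The sketch below starts from the integer quoted above rather than from the base 250 literal, and names such as `PACKED`, `to_base`, `lookup` and `word_key` are invented for illustration.

```python
import string

# Integer quoted in the explanation above (the decoded base 250 literal).
PACKED = 12827082404216683880457031718358


def to_base(n, base):
    """Digits of a positive integer n in the given base, most significant first."""
    digits = []
    while n:
        n, digit = divmod(n, base)
        digits.append(digit)
    return digits[::-1]


letters = string.ascii_uppercase + string.ascii_lowercase    # ØẠ
digits = to_base(PACKED, 4)                                   # b4
pairs = list(zip(digits, letters))                            # ż
# Þ with Œu: stable sort on (digit, upper-cased letter), so an uppercase
# letter stays ahead of its lowercase form when both share a digit.
pairs.sort(key=lambda pair: (pair[0], pair[1].upper()))
lookup = [item for pair in pairs for item in pair]            # F


def word_key(word):
    # iⱮ: 1-based position of each character in the flattened list.
    return [lookup.index(ch) + 1 for ch in word]


print("".join(map(str, digits)))
print(word_key("Cc"))
```

Each base 4 digit is a stroke count minus one, so the flattened list interleaves those digits with the letters; the interleaving doubles every position but does not change the relative order of the letters.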
Julia 1.0, ~~93~~ 92 bytes
l->sort(l,by=x->findlast.(i for i=x,"CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw"))
works with Julia > 1.3
uses as input and output a list of the words
based on this answer by @Noodle9
edit: replace collect(x) with i for i=x (-1 byte)
Charcoal, ~~72~~ 63 bytes
≔⭆4⭆⌕A")"∧·1]↗¿¤≕τB}VC↘"Iι§⭆α+ν↧νληUMθEι⌕ηλW−θυFNoθ⌊ι⊞υ⌊ιEυ⭆ι§ηλ
Try it online! Link is to verbose version of code. Partly inspired by @JonathanAllan's answer. Takes input as a list and outputs the sorted words on separate lines. Explanation:
≔⭆4⭆⌕A")"∧·1]↗¿¤≕τB}VC↘"Iι§⭆α+ν↧νλη
The compressed string ")"∧·1]↗¿¤≕τB}VC↘" expands to 2121001131211121220122113321001111210011011133112122 which represents the decremented stroke counts of each letter in the order AaBbC...Zz. The NESCA is then calculated by extracting the relevant letters in order of stroke count.
UMθEι⌕ηλ
Replace each string with an array of integer offsets into the NESCA.
W−θυFNoθ⌊ι⊞υ⌊ι
For each unique word in the list in ascending order, push each occurrence to the sorted list. (Minus filters out all matches, so we have to explicitly push the duplicates.)
Eυ⭆ι§ηλ
Restore each integer array to its original string and output each string on its own line.
05AB1E, 26 bytes
Σε•a ̄æ·$ÎÐ+MAî+X•4Bžnø{Ssk
Try it online! Beats all other answers.
Σε•...•4Bžnø{Ssk # trimmed program
# implicit input...
Σ # sorted by...
sk # index of...
# (implicit) current element in...
ε # map over letters of...
# (implicit) current element in sort...
sk # in...
# (implicit) flat...
S # list of characters in...
# (implicit) each element of...
{ # sorted list of...
# (implicit) all elements of...
•...• # 12827082404216683880457031718358...
B # in base...
4 # literal...
ø # with each element paired with corresponding element from...
žn # "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
# implicit output
-
Code Coding Challenges Golf and but a precedes G in CcJOoSsUabDdefGghjLlnPpQqrTtuVvXxyABFHIiKkNRYZzEMmWw?
-
c cost 一畫, but C cost 壹畫.