(This post is partly self-plagiarized.)
Objective
Given a Hangul syllable, toggle its vowel harmony.
Introduction to Hangul syllables
Hangul (한글) is the Korean writing system invented by Sejong the Great. Hangul syllables occupy the Unicode code points U+AC00 – U+D7A3. A Hangul syllable consists of an initial consonant, a vowel, and an optional final consonant.
The initial consonants are:
ᄀ ᄁ ᄂ ᄃ ᄄ ᄅ ᄆ ᄇ ᄈ ᄉ ᄊ ᄋ ᄌ ᄍ ᄎ ᄏ ᄐ ᄑ ᄒ
The vowels are:
ᅡ ᅢ ᅣ ᅤ ᅥ ᅦ ᅧ ᅨ ᅩ ᅪ ᅫ ᅬ ᅭ ᅮ ᅯ ᅰ ᅱ ᅲ ᅳ ᅴ ᅵ
The final consonants are:
(none) ᄀ ᄁ ᆪ ᄂ ᆬ ᆭ ᄃ ᄅ ᆰ ᆱ ᆲ ᆳ ᆴ ᆵ ᄚ ᄆ ᄇ ᄡ ᄉ ᄊ ᄋ ᄌ ᄎ ᄏ ᄐ ᄑ ᄒ
For example, 뷁 has initial consonant ᄇ, vowel ᅰ, and final consonant ᆰ.
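Since the Unicode block is laid out as every initial/vowel/final combination in order, a syllable's code point can be computed from the three indices (counting from 0 in the lists above, with final index 0 meaning no final consonant): code = 0xAC00 + (initial × 21 + vowel) × 28 + final. This is the standard composition arithmetic from the Unicode Standard. A minimal sketch, where `decompose` and `compose` are illustrative names only:

```javascript
// Convert between a Hangul syllable and its (initial, vowel, final) indices.
// Indices follow the orders listed above; final index 0 means "no final consonant".
const SBASE = 0xac00; // first syllable, 가
const VCOUNT = 21;    // number of vowels
const TCOUNT = 28;    // number of finals, including "none"

// Hypothetical helper: split a syllable into its three indices.
function decompose(syllable) {
  const s = syllable.codePointAt(0) - SBASE;
  return {
    initial: Math.floor(s / (VCOUNT * TCOUNT)),
    vowel: Math.floor(s / TCOUNT) % VCOUNT,
    final: s % TCOUNT,
  };
}

// Hypothetical helper: rebuild a syllable from its three indices.
function compose({ initial, vowel, final }) {
  return String.fromCodePoint(SBASE + (initial * VCOUNT + vowel) * TCOUNT + final);
}

console.log(decompose("뷁")); // { initial: 7, vowel: 15, final: 9 }
```

The printed indices match the example: ᄇ is initial 7, ᅰ is vowel 15 and ᆰ is final 9.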
South Korean dictionary order
The consonants and vowels above are listed in South Korean dictionary order. Syllables are sorted first by initial consonant, then by vowel, and finally by the (optional) final consonant.
The Unicode block for Hangul syllables contains every consonant/vowel combination and is entirely sorted in South Korean dictionary order.
The Unicode block can be seen here; the first 256 characters are shown for illustrative purposes:
가각갂갃간갅갆갇갈갉갊갋갌갍갎갏감갑값갓갔강갖갗갘같갚갛개객갞갟갠갡갢갣갤갥갦갧갨갩갪갫갬갭갮갯갰갱갲갳갴갵갶갷갸갹갺갻갼갽갾갿걀걁걂걃걄걅걆걇걈걉걊걋걌걍걎걏걐걑걒걓걔걕걖걗걘걙걚걛걜걝걞걟걠걡걢걣걤걥걦걧걨걩걪걫걬걭걮걯거걱걲걳건걵걶걷걸걹걺걻걼걽걾걿검겁겂것겄겅겆겇겈겉겊겋게겍겎겏겐겑겒겓겔겕겖겗겘겙겚겛겜겝겞겟겠겡겢겣겤겥겦겧겨격겪겫견겭겮겯결겱겲겳겴겵겶겷겸겹겺겻겼경겾겿곀곁곂곃계곅곆곇곈곉곊곋곌곍곎곏곐곑곒곓곔곕곖곗곘곙곚곛곜곝곞곟고곡곢곣곤곥곦곧골곩곪곫곬곭곮곯곰곱곲곳곴공곶곷곸곹곺곻과곽곾곿
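Because of this ordering, walking the three indices in nested loops (initial outermost, final innermost) visits the code points one after another. A small sketch, assuming the index orders listed above, that regenerates the 256 characters shown; `firstSyllables` is a made-up name:

```javascript
// Enumerate syllables in dictionary order: initial (outermost), then vowel,
// then final (innermost). Each innermost step yields the next code point.
function firstSyllables(limit) {
  const out = [];
  for (let initial = 0; initial < 19; initial++) {
    for (let vowel = 0; vowel < 21; vowel++) {
      for (let final = 0; final < 28; final++) {
        if (out.length === limit) return out.join("");
        out.push(String.fromCodePoint(0xac00 + (initial * 21 + vowel) * 28 + final));
      }
    }
  }
  return out.join(""); // all 11172 syllables if limit exceeds the block size
}

const s = firstSyllables(256);
console.log(s.slice(0, 4), s.slice(-4)); // 가각갂갃 과곽곾곿
```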
Vowel Harmony
Korean vowels express vowel harmony as positive-negative pairs. They are paired as follows:
(Positive) - (Negative)
ᅡ - ᅥ
ᅢ - ᅦ
ᅣ - ᅧ
ᅤ - ᅨ
ᅩ - ᅮ
ᅪ - ᅯ
ᅫ - ᅰ
ᅬ - ᅱ
ᅭ - ᅲ
Note that ᅳ, ᅴ, and ᅵ lack counterparts. More precisely, ᅵ is neither positive nor negative, while ᅳ and ᅴ are negative but their positive counterparts have vanished historically. As such, Hangul syllables whose vowel is ᅳ, ᅴ, or ᅵ are considered invalid input.
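For reference, an ungolfed sketch of the task: look up the vowel's partner in a table built from the pairs above and move the code point by whole blocks of 28 (one per vowel position). Returning `null` for ᅳ, ᅴ and ᅵ is a choice made here for illustration; the challenge simply treats those as invalid input. `PARTNER` and `toggleHarmony` are hypothetical names:

```javascript
// PARTNER[v] is the counterpart of vowel index v (orders as listed above).
// Vowels 18-20 (ᅳ, ᅴ, ᅵ) have no counterpart and are not in the table.
const PARTNER = [4, 5, 6, 7, 0, 1, 2, 3, 13, 14, 15, 16, 17, 8, 9, 10, 11, 12];

function toggleHarmony(syllable) {
  const s = syllable.codePointAt(0) - 0xac00; // offset within the block
  const vowel = Math.floor(s / 28) % 21;      // vowel index
  if (vowel >= PARTNER.length) return null;   // ᅳ, ᅴ, ᅵ: invalid input
  // Changing only the vowel moves the code point by 28 per vowel position.
  return String.fromCodePoint(0xac00 + s + (PARTNER[vowel] - vowel) * 28);
}

for (const ch of ["뷁", "냥", "멍", "망"]) console.log(ch, "→", toggleHarmony(ch));
```

Running it should reproduce the four examples given in the challenge.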
I/O format
Flexible. In particular, I/O as Unicode code points is okay.
Examples
뷁 → 봵
냥 → 녕
멍 → 망
망 → 멍
Comment by Neil (Feb 28, 2025 at 00:40 UTC): I tried answering this in Retina but it came out at 2399 bytes...
4 Answers
JavaScript (Node.js), 47 bytes
x=>x+28*((x=(x+68)/28%21)<4?4:x<8?-4:x<13?5:-5)
JavaScript (Node.js), 49 bytes by Arnauld
x=>x+((x%=588)<44|x>519?4:x<156?-4:x<296?5:-5)*28
JavaScript (Node.js), 50 bytes
x=>x+((x+68)%588<224?-4:5)*((x-44)%588<252||-1)*28
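The constants in these answers follow from the block layout: 0xAC00 + 68 = 44100 = 28 × 1575, and 1575 is a multiple of 21, so `(x+68)/28%21` equals the vowel index plus a fraction final/28 that never reaches the next integer. Equivalently, `(x+68)%588` is 28 × vowel + final (588 = 21 × 28). The sketch below restates the 47-byte answer with named steps and checks that all three golfed answers agree on every syllable whose vowel has a counterpart; `toggleCodePoint` and the `a47`/`a49`/`a50` names are made up for illustration:

```javascript
// The three golfed answers, verbatim.
const a47 = x=>x+28*((x=(x+68)/28%21)<4?4:x<8?-4:x<13?5:-5);
const a49 = x=>x+((x%=588)<44|x>519?4:x<156?-4:x<296?5:-5)*28;
const a50 = x=>x+((x+68)%588<224?-4:5)*((x-44)%588<252||-1)*28;

// Ungolfed version of the 47-byte answer.
function toggleCodePoint(x) {
  // 0xAC00 + 68 = 28 * 1575 and 1575 % 21 === 0, so v = vowel + final / 28.
  const v = (x + 68) / 28 % 21;
  // Offset in vowel positions: +4, -4, +5 or -5 depending on the vowel group.
  const offset = v < 4 ? 4 : v < 8 ? -4 : v < 13 ? 5 : -5;
  return x + 28 * offset; // one vowel position = 28 code points
}

// Compare all versions on every syllable whose vowel index is below 18.
let mismatches = 0;
for (let x = 0xac00; x <= 0xd7a3; x++) {
  if (Math.floor((x - 0xac00) / 28) % 21 >= 18) continue; // ᅳ, ᅴ, ᅵ: invalid input
  const want = toggleCodePoint(x);
  if (a47(x) !== want || a49(x) !== want || a50(x) !== want) mismatches++;
}
console.log(mismatches); // 0
```

The golfed versions reassign `x` inside the expression; this works because JavaScript reads the left operand `x` of the outer `+` before evaluating the right-hand side.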
x86-64 machine code, 26 bytes
8D 97 54 54 FF FF 6A AC 58 01 C2 78 0A 04 E4 7B F5 01 C2 79 F4 F7 D8 01 F8 C3
Following the standard calling convention for Unix-like systems (from the System V AMD64 ABI), this takes a 32-bit integer in EDI and returns a 32-bit integer in EAX.
The offsets to add to the vowel indices are [4, 4, 4, 4, -4, -4, -4, -4, 5, 5, 5, 5, 5, -5, -5, -5, -5, -5, ?, ?, ?]. The ?s are don't-care values; they are set to -3 to fit the pattern of n repeats of n and of -n. When working on the combined character code, the pattern is scaled up by a factor of 28.
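The table can be checked directly: adding each offset to its vowel index should land on the partner vowel from the pairs in the challenge, and each value should repeat as many times as its absolute value. A short sketch with the three don't-care entries filled with -3 as described; the names are illustrative:

```javascript
// Offsets to add to each vowel index 0..20; the last three are don't-cares set to -3.
const OFFSETS = [4, 4, 4, 4, -4, -4, -4, -4, 5, 5, 5, 5, 5, -5, -5, -5, -5, -5, -3, -3, -3];

// Positive/negative vowel index pairs, from the vowel harmony table.
const PAIRS = [[0, 4], [1, 5], [2, 6], [3, 7], [8, 13], [9, 14], [10, 15], [11, 16], [12, 17]];
const partner = new Map();
for (const [p, n] of PAIRS) { partner.set(p, n); partner.set(n, p); }

let ok = true;
for (let v = 0; v < 18; v++) {
  if (v + OFFSETS[v] !== partner.get(v)) ok = false; // offset must land on the partner
}
console.log(ok); // true

// Run-length encode the table: each value o should repeat |o| times.
const runs = [];
for (const o of OFFSETS) {
  const last = runs[runs.length - 1];
  if (last && last[0] === o) last[1]++;
  else runs.push([o, 1]);
}
console.log(runs.every(([o, n]) => Math.abs(o) === n)); // true
```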
In assembly:
f: lea edx, [rdi - 43948] # Set EDX to the character code minus 43948.
# With this offset, the Hangul characters start at 28*3.
r3: push -84; pop rax # Set EAX to -84 = -28 * 3.
r: add edx, eax # Add EAX to EDX.
js e # Jump if the result is negative (offset in EAX).
add al, -28 # Decrease EAX by 28, using its low byte for shortness.
jpo r3 # Jump back to r3 (resetting EAX) if the low byte of EAX has an odd number
# of set bits (this first happens at -28 * 6 = -168).
add edx, eax # Add EAX to EDX.
jns r # Jump if the result is not negative.
neg eax # Negate EAX (to reverse the offset).
e: add eax, edi # Add EDI (the original character code) to EAX.
ret # Return.
Uiua 0.15.0-dev.2, 31 bytes SBCS (previously 32)
⍜(◿21÷28+68)(⨬(+8◿10+1)◿8⊸<4-4)
Takes a Unicode code point as input and returns one as output (the link uses under F @\0 to convert to and from code points).
-1 byte inspired by l4m2's comment
Explanation
⍜(◿21÷28+68)(⨬(+8◿10+1)◿8⊸<4-4)
⍜( ) # do this first, then undo at the end:
◿21÷28+68 # add 68, divide by 28, mod 21
-4 # minus 4
⨬ ⊸<4 # test if less than 4
◿8 # if so, mod 8
(+8◿10+1) # if not, add 1, mod 10, add 8
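A rough JavaScript transliteration may help readers unfamiliar with ⍜ (under). It is a sketch, not the answerer's code. It multiplies every quantity by 28 so the vowel index and final consonant stay packed in one integer instead of a fraction, and it assumes ◿ is a floored modulo, which the ◿8 step needs in order to map -4..-1 onto 4..7. The names `mod` and `toggleLikeUiua` are hypothetical:

```javascript
// Floored modulo (non-negative for a positive modulus), as the ◿8 step requires.
const mod = (a, m) => ((a % m) + m) % m;

// Transliteration of ⍜(◿21÷28+68)(⨬(+8◿10+1)◿8⊸<4-4), scaled by 28.
function toggleLikeUiua(x) {
  const u = x + 68;                  // +68: the block now starts at a multiple of 588
  const base = u - mod(u, 588);      // the part ◿21 (scaled) discards; ⍜ restores it
  let w = mod(u, 588) - 4 * 28;      // vowel position (with final) minus 4
  w = w < 4 * 28
    ? mod(w, 8 * 28)                 // vowels 0..7: mod 8
    : mod(w + 28, 10 * 28) + 8 * 28; // vowels 8..17: add 1, mod 10, add 8
  return base + w - 68;              // undo: restore the multiple, subtract 68
}
```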
Comment by l4m2 (Feb 27, 2025 at 03:33 UTC): Does dividing first result in longer code?
Charcoal, 38 bytes
℅⁺℅θ×28I§⪪"{⊞¶∨A⧴⧴C〜pNo↙✳⊖@";"2÷⁺68℅θ28
Try it online! Link is to the verbose version of the code. I/O is in characters. Explanation:
θ Input character
℅ Take the ordinal
⁺ Plus
"..." Compressed look-up table of offsets
⪪ Split into substrings of length
2 Literal integer `2`
§ Indexed by
θ Input character
℅ Take the ordinal
⁺ Plus
68 Literal integer `68`
÷ Integer divided by
28 Literal integer `28`
I Cast to integer
× Multiplied by
28 Literal integer `28`
℅ Convert to character
Implicitly print
(You can change the I/O format to ordinals by replacing all of the ℅s with Is.)
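The Charcoal answer replaces the arithmetic with a table of two-character offsets indexed by (ordinal + 68) integer-divided by 28. The decompressed table is not shown in the post, so the sketch below builds a hypothetical one (`" 4"`, `"-4"` and so on, with `" 0"` as filler for the three invalid vowels). It also assumes that indexing wraps around modulo the table length, which is what lets a 21-entry table serve every initial consonant. `TABLE`, `entries` and `toggleLikeCharcoal` are made-up names:

```javascript
// Hypothetical table: 21 two-character entries, one per vowel index.
// " 0" is filler for ᅳ, ᅴ, ᅵ; the real compressed string may differ.
const TABLE = " 4".repeat(4) + "-4".repeat(4) + " 5".repeat(5) + "-5".repeat(5) + " 0".repeat(3);
const entries = TABLE.match(/../g); // split into substrings of length 2

function toggleLikeCharcoal(ch) {
  const code = ch.codePointAt(0);                          // take the ordinal
  const index = Math.floor((code + 68) / 28);              // plus 68, integer divided by 28
  const offset = Number(entries[index % entries.length]);  // indexed (assumed to wrap), cast
  return String.fromCodePoint(code + offset * 28);         // times 28, plus ordinal, to char
}
```

Because (0xAC00 + 68) / 28 = 1575 is a multiple of 21, the wrapped index is exactly the vowel index.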