Your program is given a string consisting entirely of lowercase letters on STDIN (or the closest alternative). It must then output a truthy or falsey value, depending on whether the input is valid romaji.
Rules:
- It must be possible to divide the entire string into a sequence of kana without any leftover characters.
- Each kana can be a single vowel (aeiou).
- Each kana can also be a consonant p, g, z, b, d, k, s, t, n, h, m, or r followed by a vowel. For example, ka and te are valid kana, but qa is not.
- The exceptions to the above rule are that zi, di, du, si, ti, and tu are not valid kana.
- The following are also valid kana: n, wa, wo, ya, yu, yo, ji, vu, fu, chi, shi, tsu.
- If a particular consonant is valid before an i (e.g. ki, pi), the i can be replaced by ya, yu, or yo and the result is still valid (e.g. kya, kyu, kyo).
- The exceptions to the above rule are chi and shi, for which the y has to be dropped too (i.e. cha, chu, cho, sha, shu, sho).
- It is also valid to double a consonant if it is the first character of a kana (kka is valid but chhi is not).
- Shortest answer wins. All regular loopholes are disallowed.
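A non-golfed Python sketch, not taken from either answer, of how rules 4 to 10 expand into a concrete set of single kana; doubling (rule 11) is left out. The names VOWELS, BANNED, EXTRA and base_kana are illustrative. It reads rule 9 as applying to ji as well, which gives jya, jyu and jyo.

```python
# Sketch: expand rules 4-10 of the challenge into a set of single kana.
# Doubling (rule 11) is deliberately left out.
VOWELS = "aiueo"
CONSONANTS = "pgzbdkstnhmr"                        # rule: consonant + vowel
BANNED = {"zi", "di", "du", "si", "ti", "tu"}      # exceptions to that rule
EXTRA = {"n", "wa", "wo", "ya", "yu", "yo", "ji", "vu", "fu", "chi", "shi", "tsu"}


def base_kana():
    kana = set(VOWELS)
    kana |= {c + v for c in CONSONANTS for v in VOWELS} - BANNED
    kana |= EXTRA
    # Rules 9 and 10: a prefix that is valid before "i" also takes ya/yu/yo,
    # except ch and sh, which drop the y as well (cha, shu, ...).
    for k in sorted(kana):  # iterate over a copy while the set grows
        if len(k) > 1 and k.endswith("i"):
            prefix = k[:-1]
            if prefix in ("ch", "sh"):
                kana |= {prefix + v for v in "auo"}
            else:
                kana |= {prefix + "y" + v for v in "auo"}
    return kana


K = base_kana()
print(len(K), "kya" in K, "kyi" in K, "zi" in K, "cha" in K, "chya" in K)
# 104 True False False True False
```

The 104 entries are the 77 kana in the list of valid kana plus 27 rule-9 forms (kya, nyo, jyu, ...) that the list does not spell out.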
List of all valid kana:
Can have double consonant:
ba, bu, be, bo, bi
ga, gu, ge, go, gi
ha, hu, he, ho, hi
ka, ku, ke, ko, ki
ma, mu, me, mo, mi
na, nu, ne, no, ni
pa, pu, pe, po, pi
ra, ru, re, ro, ri
sa, su, se, so
za, zu, ze, zo
da, de, do
ta, te, to
wa, wo
ya, yu, yo
fu
vu
ji
Cannot have double consonant:
a, i, u, e, o
tsu
chi, cha, cho, chu
shi, sha, sho, shu
n
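A sketch of a complete validator built from this table, assuming (neither the rules nor the table say so explicitly) that rule-9 forms such as kya can be doubled like their base kana. It uses dynamic programming over prefixes: ok[i] is true when the first i letters split completely into kana. Every token, doubled or not, is at most four letters long (kkya), so each position only looks back four letters. DOUBLE_OK, NO_DOUBLE, build_tokens and is_romaji are hypothetical names.

```python
# Sketch: validate romaji against the question's table of kana.
# Assumption: rule-9 forms (kya, jyo, ...) may be doubled like their base kana.
DOUBLE_OK = (
    "ba bu be bo bi ga gu ge go gi ha hu he ho hi ka ku ke ko ki "
    "ma mu me mo mi na nu ne no ni pa pu pe po pi ra ru re ro ri "
    "sa su se so za zu ze zo da de do ta te to wa wo ya yu yo fu vu ji"
).split()
NO_DOUBLE = "a i u e o tsu chi cha cho chu shi sha sho shu n".split()


def build_tokens():
    double_ok = set(DOUBLE_OK)
    for k in DOUBLE_OK:
        if k.endswith("i"):  # rule 9: ki -> kya, kyu, kyo (all prefixes are one letter)
            double_ok |= {k[0] + "y" + v for v in "auo"}
    doubled = {k[0] + k for k in double_ok}  # rule 11: kka, ppa, kkya, ...
    return set(NO_DOUBLE) | double_ok | doubled


TOKENS = build_tokens()


def is_romaji(s):
    # ok[i] is True when s[:i] can be split completely into kana.
    ok = [True] + [False] * len(s)
    for i in range(1, len(s) + 1):
        # The longest token is four letters (e.g. "kkya").
        ok[i] = any(ok[i - n] and s[i - n:i] in TOKENS
                    for n in range(1, min(4, i) + 1))
    return ok[len(s)]


for word in ["kyoto", "watashi", "tsunami", "bunpu", "yappari",
             "yi", "chhi", "zhi", "kyi"]:
    print(word, is_romaji(word))
```

Under these assumptions the nine test cases come out as expected, and so do zi, zye and bku from the comments. To match the I/O in the question, the result of is_romaji can be printed for a line read from STDIN with input().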
Test cases
Pass:
kyoto
watashi
tsunami
bunpu
yappari
Fail:
yi
chhi
zhi
kyi
Ruby, 149 bytes (previously 96)
Regex solution to match all the valid kana. Interestingly, "ecchi" is not valid according to the current rules, but perhaps it's for the best.
->s{s.gsub(/(?![dt]u)(sh|ch|([gbknhmrp])\2?y?|([zdst])\3?)?[auo]|(\g<2>)?\4?[ie]|(\g<3>)\5?e|ww?[ao]|n|tsu|([fv])\6?u|jj?i|j?y?[aou]|yy[aou]/){}==""}
Try it online! feat. Cruel Angel's Thesis
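The Ruby answer removes every kana match with gsub and checks that nothing is left. Below is a different Python sketch of the regex idea, not a port of that answer: re.fullmatch anchors the whole string, so the engine can backtrack across kana boundaries, and a capture followed by a lookahead backreference, ([a-z])(?=\1), only lets a consonant be doubled when the next letter is the same one. It assumes rule-9 forms like kya may be doubled; DOUBLEABLE, SINGLE and is_romaji are illustrative names.

```python
import re

# Kana from the table that may be doubled, as one regex alternation.
# Assumption: rule-9 forms (kya, jyo, ...) may be doubled too.
DOUBLEABLE = (r"[bghkmnpr](?:[aiueo]|y[auo])|j(?:i|y[auo])"
              r"|[sz][aueo]|[dt][aeo]|w[ao]|y[auo]|[fv]u")
# Kana that are never doubled.
SINGLE = r"[aiueon]|tsu|[cs]h[iauo]"
# ([a-z])(?=\1) consumes a letter only if the next letter is the same one,
# so "kka" is accepted but "bku" is not.
KANA = r"(?:([a-z])(?=\1))?(?:" + DOUBLEABLE + r")|" + SINGLE
ROMAJI = re.compile(r"(?:" + KANA + r")*")


def is_romaji(s):
    # fullmatch lets the engine backtrack over different kana splits.
    return ROMAJI.fullmatch(s) is not None


print(is_romaji("yappari"), is_romaji("bku"))  # True False
```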
It fails on simple tests: zi and zye. – Dead Possum, May 24, 2017 at 13:32
@DeadPossum fixed. – Value Ink, May 24, 2017 at 23:01
Python 2, 166 bytes
Long regex solution
Try it online
I think f-strings (Python 3.6+) could help shorten it by replacing the repeated [auo and {1,2}.
Unfortunately, I can't check it myself right now :c
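The f-string idea can at least be checked for correctness. A sketch assuming Python 3.6 or later (f-strings were added in 3.6, so this would also mean moving the answer off Python 2): a holds the repeated prefix [auo and t holds the quantifier {1,2}; both names are hypothetical. The comparison confirms that the f-string rebuilds exactly the pattern the answer gets from .replace(); it says nothing about the byte count.

```python
# Pattern from the Python 2 answer, with ~ as a placeholder for {1,2}.
ORIGINAL = ('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]'
            '|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]')


def via_replace():
    return ORIGINAL.replace('~', '{1,2}')


def via_fstring():
    a = '[auo'   # hypothetical name for the repeated prefix
    t = '{1,2}'  # hypothetical name for the quantifier
    return (f'[bghkmnpr]{t}({a}ei]|y{a}])|[sz]{t}{a}e]|[dt]{t}[aeo]|w{t}[ao]'
            f'|([fv]{t}|ts)u|(j{t}|[cs]h)(i|y{a}])|y{t}{a}]|{a}ien]')


print(via_fstring() == via_replace())  # True
```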
import re
lambda x:re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~','{1,2}'),'',x)==''
re.sub('~','{1,2}',(your regex)) is shorter than (your regex).replace('~','{1,2}') by 1 byte. – Value Ink, May 24, 2017 at 23:03
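The 1-byte saving can be checked directly. Both calls build the same pattern (in a re.sub replacement string only backslash escapes are special, so the braces in '{1,2}' are copied literally), and since the answer already imports re, the shorter call costs nothing extra. R below is a shortened stand-in for the full regex.

```python
import re

R = '[sz]~[auoe]|w~[ao]'  # short stand-in for the answer's full regex

a = re.sub('~', '{1,2}', R)
b = R.replace('~', '{1,2}')
print(a == b, a)  # True [sz]{1,2}[auoe]|w{1,2}[ao]

# Length of each call as it would appear in the golfed source.
print(len("re.sub('~','{1,2}',R)"), len("R.replace('~','{1,2}')"))  # 21 22
```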
Your regex is also failing on a simple test case: bku. Doubled consonants have to be the same consonant. – Value Ink, May 24, 2017 at 23:04
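The bku failure comes from {1,2}: it accepts any one or two letters from the character class, not a repeated letter. A sketch of one possible fix on a single fragment of the pattern, using a capture group and an optional backreference:

```python
import re

# One fragment of the Python 2 answer's pattern after expanding ~ to {1,2}:
# it accepts any one or two letters from the class, e.g. "bk".
LOOSE = re.compile(r'[bghkmnpr]{1,2}[auoei]')
# Capture the consonant, then allow at most one copy of that same letter.
STRICT = re.compile(r'([bghkmnpr])\1?[auoei]')

for word in ['bku', 'kku', 'ku']:
    print(word, bool(LOOSE.fullmatch(word)), bool(STRICT.fullmatch(word)))
# bku True False
# kku True True
# ku True True
```

In the full pattern the number in \1 has to match the group's position, since the answer's regex already contains other groups; a named group avoids having to renumber.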
n cannot be doubled. I know enough about the Japanese alphabets to say that. If n was doubled, it would need to have a vowel after, but then it wouldn't be n. So if kanna was a word (just making it up), it'd actually be ka n na.
unicodedata, but it'll definitely be longer than a regex solution. Partial program
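Regarding the comment about n: the challenge only asks whether some split exists, so ambiguity of this kind does not change the result. Under the question's table, kanna actually has three splits, because na may be doubled and a lone a is itself a kana. A sketch that lists every split (the helper name splits is hypothetical, and rule-9 forms are assumed to be doubleable):

```python
from functools import lru_cache

# Kana from the question's table; rule-9 forms are assumed to be doubleable.
DOUBLE_OK = (
    "ba bu be bo bi ga gu ge go gi ha hu he ho hi ka ku ke ko ki "
    "ma mu me mo mi na nu ne no ni pa pu pe po pi ra ru re ro ri "
    "sa su se so za zu ze zo da de do ta te to wa wo ya yu yo fu vu ji"
).split()
NO_DOUBLE = "a i u e o tsu chi cha cho chu shi sha sho shu n".split()

YFORMS = {k[0] + "y" + v for k in DOUBLE_OK if k.endswith("i") for v in "auo"}
DOUBLEABLE = set(DOUBLE_OK) | YFORMS
TOKENS = set(NO_DOUBLE) | DOUBLEABLE | {k[0] + k for k in DOUBLEABLE}


def splits(word):
    """Return every way to cut `word` into kana (empty list if none)."""
    @lru_cache(maxsize=None)
    def go(i):
        if i == len(word):
            return [[]]
        out = []
        for n in range(1, 5):  # no kana is longer than four letters
            piece = word[i:i + n]
            if len(piece) == n and piece in TOKENS:
                out += [[piece] + rest for rest in go(i + n)]
        return out
    return go(0)


print(splits("kanna"))
# [['ka', 'n', 'n', 'a'], ['ka', 'n', 'na'], ['ka', 'nna']]
```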