I wanted a hash that lets me reference Japanese syllables by their romanized names. In hindsight I could have searched for an existing one-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:
katakana:
  v_eng: a i u e o
  v_jap: ア イ ウ エ オ
  K: カ キ ク ケ コ
  S: サ シ ス セ ソ
  T: タ チ ツ テ ト
  N: ナ ニ ヌ ネ ノ
  H: ハ ヒ フ ヘ ホ
  M: マ ミ ム メ モ
  Y: ヤ _ ユ _ ヨ
  R: ラ リ ル レ ロ
  W: ワ ヰ _ ヱ ヲ
hiragana:
  v_eng: a i u e o
  v_jap: あ い う え お
  k: か き く け こ
  s: さ し す せ そ
  t: た ち つ て と
  n: な に ぬ ね の
  h: は ひ ふ へ ほ
  m: ま み む め も
  y: や _ ゆ _ よ
  r: ら り る れ ろ
  w: わ ゐ _ ゑ を
  nn: ん _ _ _ _
I was able to create the serializing function, syllabarys():
#!/usr/bin/env ruby
require 'yaml'
def syllabarys
  @syllabarys ||= lambda{
    raw_data = YAML.load_file 'japanese.dic'
    syllabary_names = ['katakana','hiragana']
    a = syllabary_names.map{|syllabary|
      syllabary_data = raw_data[syllabary]
      veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split
      vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat
      #jp row strings by en consonants:
      jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}
      #jp row arrays by en consonants:
      jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]
      #en vowels with jp row arrays by en consonants:
      evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat
      #jp syllables by en syllables:
      #outer map provides en consonant to inner map
      #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
      #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
      jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]
      #remove forgotten syllables:
      jp_by_en.select{|en_syl,jp_syl|jp_syl != '_'}
    }
    Hash[*syllabary_names.zip(a).flatten(1)]
  }.call
end
And it returns the desired hash:
{"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}
This task was a good exercise in index-free coding, but I notice my code exhibits a recurring pattern of map|zip->flatten->Hash. I'd like to know if this is a normal pattern in Ruby or if there's a better way of serializing tabular data.
1 Answer
Feedback:
- Instead of `||= lambda { ... }.call`, you can use `||= begin ... end`.
- Instead of `Hash[*arr]`, you can use `arr.to_h` on an array of [key, value] pairs (available on arrays since Ruby 2.1).
- You don't need to convert everything to a hash if you just want to use it for `.map` later: `[[1, 2], [3, 4]].map { |k, v| k + v } #=> [3, 7]`
- Instead of `.map { ... }.flatten(1)`, you can use `.flat_map { ... }`.
- If you won't be using a variable in a block, you can use `_` instead, like `.map { |key, _| key }`.
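To make the points above concrete, here is a small sketch that puts each old form next to its suggested replacement. The `VOWELS` and `ROWS` constants and the `vowels` helper are made up for illustration (two hiragana rows only) and are not part of the original code.

```ruby
# Hypothetical sample data, a small stand-in for the parsed YAML.
VOWELS = %w[a i u e o]
ROWS = { 'k' => %w[か き く け こ], 's' => %w[さ し す せ そ] }

# Memoize with begin ... end instead of lambda { ... }.call.
def vowels
  @vowels ||= begin
    VOWELS.dup.freeze
  end
end

# Array#to_h on an array of pairs replaces Hash[*pairs.flatten].
pairs = vowels.zip(ROWS['k'])
splat_hash = Hash[*pairs.flatten]
to_h_hash = pairs.to_h

# flat_map replaces map { ... }.flatten(1).
nested = ROWS.map { |con, row| vowels.zip(row).map { |v, jp| [con + v, jp] } }.flatten(1)
flat = ROWS.flat_map { |con, row| vowels.zip(row).map { |v, jp| [con + v, jp] } }

# An underscore marks a block parameter that is not used.
consonants = ROWS.map { |con, _| con }

p splat_hash == to_h_hash #=> true
p nested == flat          #=> true
p consonants              #=> ["k", "s"]
p flat.to_h['ka']         #=> "か"
```

Both forms produce the same values; the replacements mainly drop the splat and the explicit flatten depth.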
I rewrote the code the way I'd write it today.
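require 'yaml'
def do_it(raw)
  # Pair each romanized vowel with its kana, e.g. ["a", "ア"]
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  # Keep only the single-letter consonant rows (drops v_eng, v_jap and nn)
  raw.select do |k, _|
    k.size == 1
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      # Skip the '_' placeholders for missing syllables
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end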
want = {"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}
raw = YAML.load_file 'japanese.dic'
p do_it(raw["katakana"]) == want["katakana"] #=> true
p do_it(raw["hiragana"]) == want["hiragana"] #=> true
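If you want to try the function without having japanese.dic on disk, the following is a minimal sketch that parses an inline YAML fragment instead. The `SAMPLE` constant and its two katakana rows are assumptions made for illustration, and it assumes the file nests each table under its syllabary name; `do_it` is repeated only so the snippet runs on its own.

```ruby
require 'yaml'

# Same function as above, repeated so this sketch is self-contained.
def do_it(raw)
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  raw.select do |k, _|
    k.size == 1
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end

# Hypothetical inline fragment standing in for japanese.dic.
SAMPLE = <<~YAML_DOC
  katakana:
    v_eng: a i u e o
    v_jap: ア イ ウ エ オ
    K: カ キ ク ケ コ
    W: ワ ヰ _ ヱ ヲ
YAML_DOC

katakana = do_it(YAML.load(SAMPLE)['katakana'])

p katakana['Ka']      #=> "カ"
p katakana.key?('Wu') #=> false, the '_' placeholder was dropped
p katakana.size       #=> 9
```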
Hope that helps. Let me know if you want any other clarification(s) in a comment below.