
I wanted a hash that lets me reference Japanese syllables by their romanized names. In hindsight I could have searched for an existing one-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:

katakana:
 v_eng: a i u e o
 v_jap: ア イ ウ エ オ
 K: カ キ ク ケ コ
 S: サ シ ス セ ソ
 T: タ チ ツ テ ト
 N: ナ ニ ヌ ネ ノ
 H: ハ ヒ フ ヘ ホ
 M: マ ミ ム メ モ
 Y: ヤ _ ユ _ ヨ
 R: ラ リ ル レ ロ
 W: ワ ヰ _ ヱ ヲ
hiragana:
 v_eng: a i u e o
 v_jap: あ い う え お
 k: か き く け こ
 s: さ し す せ そ
 t: た ち つ て と
 n: な に ぬ ね の
 h: は ひ ふ へ ほ
 m: ま み む め も
 y: や _ ゆ _ よ
 r: ら り る れ ろ
 w: わ ゐ _ ゑ を
 nn: ん _ _ _ _

I was able to create the serializing function, syllabarys():

#!/usr/bin/env ruby
require 'yaml'
def syllabarys
  @syllabarys ||= lambda{
    raw_data = YAML.load_file 'japanese.dic'
    syllabary_names = ['katakana','hiragana']
    a = syllabary_names.map{|syllabary|
      syllabary_data = raw_data[syllabary]
      veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split
      vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat
      #jp row strings by en consonants:
      jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}
      #jp row arrays by en consonants:
      jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]
      #en vowels with jp row arrays by en consonants:
      evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat
      #jp syllables by en syllables:
      #outer map provides en consonant to inner map
      #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
      #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
      jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]
      #remove forgotten syllables:
      jp_by_en.select{|en_syl,jp_syl|jp_syl != '_'}
    }
    Hash[*syllabary_names.zip(a).flatten(1)]
  }.call
end

And it returns the desired hash:

{"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}

This task was a good exercise in index-free coding, but I notice my code exhibits a recurring pattern of map|zip -> flatten -> Hash. I'd like to know whether this is a normal pattern in Ruby or whether there's a better way of serializing tabular data.

asked Jan 29, 2014 at 19:16

1 Answer


Feedback:

  • Instead of ||= lambda { ... }.call, you can use ||= begin ... end
  • Instead of Hash[*arr.flatten] you can use arr.to_h on an array of [key, value] pairs in Ruby 2.1+
  • You don't need to convert everything to a hash if you just want to use it for .map later -- [[1, 2], [3, 4]].map { |k, v| k + v } #=> [3, 7]
  • Instead of .map { ... }.flatten(1), you can use .flat_map { ... }
  • If you won't be using a variable in a block, you can use _ instead, like .map { |key, _| key }
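
Each of the points above can be tried in isolation. The following is only a minimal sketch: the sample data and the Memo class are hypothetical, made up for illustration, and are not part of the original code.

```ruby
# Hypothetical stand-in data (not read from japanese.dic).
pairs = [['Ka', 'カ'], ['Ki', 'キ']]

# Array#to_h builds the same hash as Hash[*pairs.flatten].
old_style = Hash[*pairs.flatten]
new_style = pairs.to_h

# flat_map does the work of map { ... }.flatten(1) in one step.
rows   = { 'K' => %w[カ キ], 'S' => %w[サ シ] }
vowels = %w[a i]
nested  = rows.map { |con, row| vowels.zip(row).map { |v, jp| [con + v, jp] } }.flatten(1)
flatter = rows.flat_map { |con, row| vowels.zip(row).map { |v, jp| [con + v, jp] } }

# An array of pairs destructures in a block just like a hash does,
# so there is no need to build a Hash only to iterate over it.
sums = [[1, 2], [3, 4]].map { |k, v| k + v }

# _ marks a block parameter that is deliberately unused.
keys = pairs.map { |key, _| key }

# begin ... end memoizes a multi-statement computation without a lambda.
# Memo is a hypothetical class that counts how often the body runs.
class Memo
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def table
    @table ||= begin
      @calls += 1
      { 'Ka' => 'カ' }
    end
  end
end
```

Calling Memo#table twice runs the begin ... end body only once, which is the same caching behaviour as the lambda { ... }.call version.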

I rewrote the code the way I would write it today.

require 'yaml'
def do_it(raw)
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  raw.select do |k, _|
    k.size == 1
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end
want = {"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}
raw = YAML.load_file 'japanese.dic'
p do_it(raw["katakana"]) == want["katakana"] #=> true
p do_it(raw["hiragana"]) == want["hiragana"] #=> true
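
If you still want a single memoized method that returns both syllabaries, as in the original syllabarys, something along these lines might work. This is a sketch under assumptions: the inline SAMPLE string is a hypothetical, shortened stand-in for japanese.dic, and the method name syllabaries is my own choice.

```ruby
require 'yaml'

# Same logic as do_it above, repeated so this sketch runs on its own.
def do_it(raw)
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  raw.select do |k, _|
    k.size == 1                     # keeps consonant rows, drops v_eng, v_jap and nn
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end

# Hypothetical inline sample standing in for japanese.dic (a few rows only).
SAMPLE = <<~EOS
  katakana:
    v_eng: a i u e o
    v_jap: ア イ ウ エ オ
    K: カ キ ク ケ コ
    Y: ヤ _ ユ _ ヨ
  hiragana:
    v_eng: a i u e o
    v_jap: あ い う え お
    k: か き く け こ
    nn: ん _ _ _ _
EOS

# Memoized with ||= begin ... end instead of lambda { ... }.call.
def syllabaries
  @syllabaries ||= begin
    raw = YAML.load(SAMPLE)
    raw.map { |name, table| [name, do_it(table)] }.to_h
  end
end
```

With the real file you would swap YAML.load(SAMPLE) for YAML.load_file 'japanese.dic'. Note that the nn row is skipped by the k.size == 1 filter, which matches the desired output in the question.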

Hope that helps. Let me know if you want any other clarification(s) in a comment below.

answered Feb 1, 2014 at 15:29
