Serializing tabular data in ruby -- is map, flatten, hash the correct approach?

Question 1

I wanted a hash that lets me reference Japanese syllables by their romanized names. In hindsight I could have searched for an existing one-column table, but I wanted to improve my Ruby by writing a function that serializes these multi-column tables I found on Wikipedia:

katakana:
  v_eng: a i u e o
  v_jap: ア イ ウ エ オ
  K: カ キ ク ケ コ
  S: サ シ ス セ ソ
  T: タ チ ツ テ ト
  N: ナ ニ ヌ ネ ノ
  H: ハ ヒ フ ヘ ホ
  M: マ ミ ム メ モ
  Y: ヤ _ ユ _ ヨ
  R: ラ リ ル レ ロ
  W: ワ ヰ _ ヱ ヲ
hiragana:
  v_eng: a i u e o
  v_jap: あ い う え お
  k: か き く け こ
  s: さ し す せ そ
  t: た ち つ て と
  n: な に ぬ ね の
  h: は ひ ふ へ ほ
  m: ま み む め も
  y: や _ ゆ _ よ
  r: ら り る れ ろ
  w: わ ゐ _ ゑ を
  nn: ん _ _ _ _

I was able to create the serializing function, syllabarys():

#!/usr/bin/env ruby
require 'yaml'
def syllabarys
  @syllabarys ||= lambda{
    raw_data = YAML.load_file 'japanese.dic'
    syllabary_names = ['katakana','hiragana']
    a = syllabary_names.map{|syllabary|
      syllabary_data = raw_data[syllabary]
      veng,vjap = syllabary_data['v_eng'].split, syllabary_data['v_jap'].split
      vowels = Hash[*veng.zip(vjap).flatten] #zipped flat array => splat
      #jp row strings by en consonants:
      jrsbec = syllabary_data.select{|con,row|con =~ /^[KSTNHMYRWkstnhmyrwN]$/}
      #jp row arrays by en consonants:
      jrabec = Hash[*jrsbec.map{|con,row|[con,row.split]}.flatten(1)]
      #en vowels with jp row arrays by en consonants:
      evwjrabec = Hash[jrabec.map{|con,row|[con,Hash[veng.zip(row)]]}] #array of hashes => no splat
      #jp syllables by en syllables:
      #outer map provides en consonant to inner map
      #inner map creates the dictionary we want in array form, e.g. [#K#[['Ka','カ'],..], #S..]
      #flatten(1) removes outer array created by outer map [['Ka','カ'],..] => no splat
      jp_by_en = Hash[evwjrabec.map{|con,row|row.map{|vowel,jp_syl| [con+vowel,jp_syl] }}.flatten(1)]
      #remove forgotten syllables:
      jp_by_en.select{|en_syl,jp_syl|jp_syl != '_'}
    }
    Hash[*syllabary_names.zip(a).flatten(1)]
  }.call
end

And it returns the desired hash:

{"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}

This task was a good exercise in index-free coding, but I notice my code exhibits a recurring pattern of map|zip->flatten->Hash. I'd like to know if this is a normal pattern in Ruby or if there's a better way of serializing tabular data.

Answer

Feedback:

Instead of ||= lambda { ... }.call, you can use ||= begin ... end
Instead of Hash[*arr] you can use arr.to_h in Ruby 2.1+ (Array#to_h takes an array of [key, value] pairs, so no flatten or splat is needed)
You don't need to convert everything to a hash if you just want to use it for .map later -- [[1, 2], [3, 4]].map { |k, v| k + v } #=> [3, 7]
Instead of .map { ... }.flatten(1), you can use .flat_map { ... }
If you won't be using a variable in a block, you can use _ instead, like .map { |key, _| key }
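As a rough sketch of the first, second and fourth tips on toy data (the arrays and the Table class below are made up for illustration and are not part of the original code; Array#to_h assumes Ruby 2.1 or later):

```ruby
# Toy data (hypothetical): one vowel header and one consonant row.
vowels = %w[a i u e o]
row    = %w[カ キ ク ケ コ]

# Hash[*flat_array] and Array#to_h build the same hash from key/value pairs.
pairs     = vowels.zip(row)
old_style = Hash[*pairs.flatten]
new_style = pairs.to_h

# map { ... }.flatten(1) and flat_map { ... } both flatten exactly one level.
rows = { 'K' => %w[カ キ], 'S' => %w[サ シ] }
via_flatten  = rows.map      { |con, syls| syls.map { |s| [con, s] } }.flatten(1)
via_flat_map = rows.flat_map { |con, syls| syls.map { |s| [con, s] } }

# ||= begin ... end memoizes a multi-statement computation without a lambda.
class Table
  attr_reader :builds

  def initialize
    @builds = 0
  end

  def data
    @data ||= begin
      @builds += 1 # counts how often the block actually runs
      { 'Ka' => 'カ' }
    end
  end
end
```

Calling Table#data repeatedly runs the begin block only on the first call; later calls return the cached hash.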

I rewrote the code the way I'd write it today.

require 'yaml'
def do_it(raw)
  map = raw["v_eng"].split.zip(raw["v_jap"].split)
  raw.select do |k, _|
    k.size == 1
  end.flat_map do |pre, japs|
    map.zip(japs.split).map do |(post, _), jap|
      [pre + post, jap] unless jap == '_'
    end.compact
  end.to_h
end
want = {"katakana"=>{"Ka"=>"カ", "Ki"=>"キ", "Ku"=>"ク", "Ke"=>"ケ", "Ko"=>"コ", "Sa"=>"サ", "Si"=>"シ", "Su"=>"ス", "Se"=>"セ", "So"=>"ソ", "Ta"=>"タ", "Ti"=>"チ", "Tu"=>"ツ", "Te"=>"テ", "To"=>"ト", "Na"=>"ナ", "Ni"=>"ニ", "Nu"=>"ヌ", "Ne"=>"ネ", "No"=>"ノ", "Ha"=>"ハ", "Hi"=>"ヒ", "Hu"=>"フ", "He"=>"ヘ", "Ho"=>"ホ", "Ma"=>"マ", "Mi"=>"ミ", "Mu"=>"ム", "Me"=>"メ", "Mo"=>"モ", "Ya"=>"ヤ", "Yu"=>"ユ", "Yo"=>"ヨ", "Ra"=>"ラ", "Ri"=>"リ", "Ru"=>"ル", "Re"=>"レ", "Ro"=>"ロ", "Wa"=>"ワ", "Wi"=>"ヰ", "We"=>"ヱ", "Wo"=>"ヲ"}, "hiragana"=>{"ka"=>"か", "ki"=>"き", "ku"=>"く", "ke"=>"け", "ko"=>"こ", "sa"=>"さ", "si"=>"し", "su"=>"す", "se"=>"せ", "so"=>"そ", "ta"=>"た", "ti"=>"ち", "tu"=>"つ", "te"=>"て", "to"=>"と", "na"=>"な", "ni"=>"に", "nu"=>"ぬ", "ne"=>"ね", "no"=>"の", "ha"=>"は", "hi"=>"ひ", "hu"=>"ふ", "he"=>"へ", "ho"=>"ほ", "ma"=>"ま", "mi"=>"み", "mu"=>"む", "me"=>"め", "mo"=>"も", "ya"=>"や", "yu"=>"ゆ", "yo"=>"よ", "ra"=>"ら", "ri"=>"り", "ru"=>"る", "re"=>"れ", "ro"=>"ろ", "wa"=>"わ", "wi"=>"ゐ", "we"=>"ゑ", "wo"=>"を"}}
raw = YAML.load_file 'japanese.dic'
p do_it(raw["katakana"]) == want["katakana"] #=> true
p do_it(raw["hiragana"]) == want["hiragana"] #=> true
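Note that do_it handles one syllabary at a time, while the original syllabarys returned a hash of both. A minimal sketch of getting the two-level hash back, assuming the same file layout: the inline SAMPLE string stands in for japanese.dic (trimmed to two rows per syllabary), and the names syllables and syllabaries are hypothetical.

```ruby
require 'yaml'

# Inline stand-in for japanese.dic, trimmed to two rows per syllabary
# (an assumption made so the sketch runs without the file).
SAMPLE = <<~YAML
  katakana:
    v_eng: a i u e o
    v_jap: ア イ ウ エ オ
    K: カ キ ク ケ コ
    W: ワ ヰ _ ヱ ヲ
  hiragana:
    v_eng: a i u e o
    v_jap: あ い う え お
    k: か き く け こ
    w: わ ゐ _ ゑ を
YAML

# Same idea as do_it: one-letter keys are consonant rows, '_' marks a gap.
def syllables(raw)
  vowels = raw['v_eng'].split
  raw.select { |key, _| key.size == 1 }.flat_map do |con, row|
    vowels.zip(row.split)
          .reject { |_, jp| jp == '_' }
          .map { |vowel, jp| [con + vowel, jp] }
  end.to_h
end

# Hypothetical wrapper: one inner hash per syllabary, keyed by its name.
def syllabaries(source = SAMPLE)
  YAML.load(source).map { |name, raw| [name, syllables(raw)] }.to_h
end

table = syllabaries
puts table['katakana']['Ka']
```

Iterating over the loaded hash also removes the need for a hard-coded list of syllabary names.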

Hope that helps. Let me know if you want any other clarification(s) in a comment below.

Dogbert · Accepted Answer · 2014-02-01 15:29:35Z
