I am working on a Ruby gem to pull data from the Wikidata API. The data is deeply nested, as it contains a lot of repetition.
I have written a piece of Ruby (sans Rails) code that enables an array of hashes to be merged together, with any duplicate key-value pairs being stored as an array under the same key.
For example:
[{:a=>1, :b=>2}, {:b=>3, :c=>4}, {:d=>5, :a=>6}].merge_hashes
gives:
{:a=>[1, 6], :b=>[2, 3], :c=>4, :d=>5}
I began with the following:
class Array
def merge_hashes
raise StandardError.new "Array is not an Array of Hashes" unless self.all? {|e| e.is_a? Hash}
self.each_with_object({}) do |el, h|
el.each do |k, v|
if h[k].nil?
h[k] = v
else
h[k] = Array.new([h[k]]) << v
h[k] = h[k].flatten
end
end
end
end
end
And refactored to:
class Array
def merge_hashes
raise StandardError.new "Array is not an Array of Hashes" unless self.all? {|e| e.is_a? Hash}
self.each_with_object({}) { |el, h| el.each { |k, v| h[k].nil? ? h[k] = v : h[k] = (Array.new([h[k]]) << v).flatten } }
end
end
Are there any other improvements I could make? Is that an acceptable exception to throw?
2 Answers 2
I think the refactor is barely readable, the first version is better. Anyway, if you use functional abstractions, instead of doing a imperative processing from scratch, algorithms tend to be more clear.
In this case, we can use the abstraction Enumerable#map_by (a group_by
variation where you can control both the grouping function and the accumulated values) and write:
require 'facets'
class Array
def merge_hashes
flat_map(&:to_a).map_by { |k, v| [k, v] }
end
end
Note that this snippet always returns an array as a value, instead of the scalar/list you have.
-
\$\begingroup\$ Thanks for the answer and for the link to facets; I didn't know about it. \$\endgroup\$Bradley Marques– Bradley Marques2016年05月30日 06:22:32 +00:00Commented May 30, 2016 at 6:22
-
1\$\begingroup\$ Huh, I see your point about the mixed data types. Thanks for the good advice. \$\endgroup\$Bradley Marques– Bradley Marques2016年06月01日 10:16:03 +00:00Commented Jun 1, 2016 at 10:16
I've tried to simplify assignment condition (splat operator is used instead of #flatten
):
class Array
def merge_hashes_alt
# ...
self.each_with_object({}) do |el, h|
el.each { |k, v| h[k] = h[k] ? [*h[k]] << v : v }
end
end
end
-
\$\begingroup\$ I think this is good, because it doesn't check for
nil?
explicitly and only has oneh[k] =
assignment. In addition, it gets rid of theArray.new()
and theflatten
, as you mentioned. \$\endgroup\$Bradley Marques– Bradley Marques2016年05月30日 06:17:48 +00:00Commented May 30, 2016 at 6:17