I am trying to write the fastest possible #to_struct method for Ruby's Hash.
I have included a use case and a benchmark so you can run it and check whether a change really improves the code.
My implementation is below, followed by the benchmark. The times at the bottom are from my machine. How can I make this faster?
require "json"
require 'benchmark'
require 'ostruct' # OpenStruct is used in the benchmark below
require 'bigdecimal/math'
class Hash
  def to_struct
    k = self.keys
    # Name the Struct class after the sorted, joined keys so it can be reused
    klass = k.map(&:to_s).sort_by { |word| word.downcase }.join.capitalize
    begin
      Kernel.const_get("Struct::" + klass).new(*self.values_at(*k))
    rescue NameError
      Struct.new(klass, *(k)).new(*self.values_at(*k))
    end
  end
end
# You have a hash that you have built in your app
sample_hash = {
  foo_key: "foo_val",
  bar_key: "bar_val",
  baz_key: "baz_val",
  foo1_key: "foo_val",
  bar1_key: "bar_val",
  baz1_key: "baz_val",
  foo2_key: "foo_val",
  bar2_key: "bar_val",
  baz2_key: "baz_val",
  foo3_key: "foo_val",
  bar3_key: "bar_val",
  baz3_key: "baz_val",
  foo4_key: "foo_val",
  bar4_key: "bar_val",
  baz4_key: "baz_val",
  foo5_key: "foo_val",
  bar5_key: "bar_val",
  baz5_key: "baz_val",
  foo6_key: "foo_val",
  bar6_key: "bar_val",
  baz6_key: "baz_val",
  foo7_key: "foo_val",
  bar7_key: "bar_val",
  baz7_key: "baz_val",
}
# Then you have JSON coming from some external api
json_response = "{\"qux_key\":\"qux_val\",\"quux_key\":\"quux_val\",\"corge_key\":\"corge_val\"}"
hash_with_unknown_keys = JSON.parse(json_response)
# Merge these two together
sample_hash.merge!(hash_with_unknown_keys)
iterations = 100_000
Benchmark.bm do |bm|
  bm.report "#to_struct" do
    iterations.times do
      # Would be super nice if I could convert this to a struct with a method
      # Somehow a bit faster than the explicit example below and much faster than open struct
      sample_struct = sample_hash.to_struct
      unless sample_struct.foo_key == "foo_val"
        raise "Wrong value"
      end
    end
  end
  bm.report "Struct" do
    iterations.times do
      sample_struct = Struct.new(*sample_hash.keys)
                            .new(*sample_hash.values)
      unless sample_struct.foo_key == "foo_val"
        raise "Wrong value"
      end
    end
  end
  bm.report "OpenStruct" do
    iterations.times do
      sample_open_struct = OpenStruct.new(sample_hash)
      unless sample_open_struct.foo_key == "foo_val"
        raise "Wrong value"
      end
    end
  end
end
#                  user     system      total        real
# #to_struct   4.030000   0.010000   4.040000 (  4.072031)
# Struct       6.870000   0.290000   7.160000 (  7.320459)
# OpenStruct  23.550000   0.210000  23.760000 ( 23.895187)
- The benchmark is testing the time it takes to run #to_struct together with the time it takes to access the structure's attributes. Is that correct? (Wayne Conrad, May 24, 2014 at 22:44)
- @WayneConrad The benchmark has three different ways to instantiate similar ruby objects. The first two are two different ways to create structs and the third instantiates an open struct. And yes all three benchmarks access the objects attributes. (mpiccolo, May 27, 2014 at 3:14)
- I think the reason that the to_struct version is faster than the plain Struct version is that it's not doing the full task on each iteration. It only constructs a new Struct subclass on the first iteration, and then re-uses it on all the subsequent iterations. It's therefore not a strictly fair comparison, as the Struct method has to create a new class for each iteration. I think you'd find that if you randomised the hash keys for each iteration, there would be little difference between the to_struct version and the Struct version. (AlexT, Jul 5, 2014 at 12:15)
- And why do the key joining to get a struct class name? It's optional, any reason to have it? (DiegoSalazar, Sep 4, 2016 at 16:30)
1 Answer
Use OpenStruct and Ruby >= 2.3.0
Starting with MRI 2.3.0, your benchmark using OpenStruct gets fast. Very fast:
ruby-2.2.5: ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-linux]
                 user     system      total        real
#to_struct   1.780000   0.000000   1.780000 (  1.774490)
Struct       9.100000   0.000000   9.100000 (  9.099619)
OpenStruct   7.910000   0.000000   7.910000 (  7.911342)
ruby-2.3.0: ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-linux]
                 user     system      total        real
#to_struct   1.700000   0.000000   1.700000 (  1.695587)
Struct       7.660000   0.000000   7.660000 (  7.660869)
OpenStruct   0.650000   0.000000   0.650000 (  0.658817)
With the latest MRI, your #to_struct method gets a bit of a speed boost as well.
ruby-2.4.1: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]
                 user     system      total        real
#to_struct   1.460000   0.000000   1.460000 (  1.459063)
Struct       7.420000   0.000000   7.420000 (  7.416505)
OpenStruct   0.660000   0.000000   0.660000 (  0.658009)
So if you can, use Ruby >= 2.3.0 and OpenStruct.
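For reference, here is a minimal sketch of the OpenStruct route on a hash shaped like the one in the question: symbol keys from the app merged with string keys parsed from JSON. The variable names (app_hash, api_hash, merged, record) and the shortened data are made up for illustration.

```ruby
require "ostruct"
require "json"

# Hypothetical data modelled on the question's benchmark:
# symbol keys built in the app, string keys coming from parsed JSON.
app_hash = { foo_key: "foo_val", bar_key: "bar_val" }
api_hash = JSON.parse('{"qux_key":"qux_val"}')
merged   = app_hash.merge(api_hash)

# OpenStruct converts each key to a symbol, so both kinds of key
# end up as reader methods on the same object.
record = OpenStruct.new(merged)

puts record.foo_key
puts record.qux_key
```

No class has to be named or cached here, which is one reason the OpenStruct version needs no key sorting or joining at all.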
How to make #to_struct faster
I made the following changes for performance:
- Eliminate the mapping of hash keys using #downcase.
- Use #values instead of #values_at (values are always the same order as keys). See https://stackoverflow.com/a/31425274/238886
and these for clarity:
- Eliminate the temporary for self.keys
- DRY the creation of the struct instance
- Remove self references.
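The switch from #values_at to #values relies on Hash keeping insertion order, so that keys and values line up index by index. A small check of that property, using a made-up hash named h:

```ruby
# Hypothetical hash, deliberately not in alphabetical order.
h = { b: 2, a: 1 }
h[:c] = 3

# keys and values are both returned in insertion order,
# so pairing them by index reproduces the original pairs.
pairs_match = h.keys.zip(h.values) == h.to_a

# values_at with every key, in key order, yields the same array as values.
same_values = h.values == h.values_at(*h.keys)

puts pairs_match
puts same_values
```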
With these changes, the code is:
class Hash
  def new_to_struct
    klass_name = keys.map(&:to_s).sort.join.capitalize
    klass = begin
      Kernel.const_get("Struct::" + klass_name)
    rescue NameError
      Struct.new(klass_name, *keys)
    end
    klass.new(*values)
  end
end
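A short usage sketch may help show what the method does on repeated calls. It repeats the definition above so that it runs on its own; the two sample hashes and the names first and second are invented for the example.

```ruby
class Hash
  def new_to_struct
    klass_name = keys.map(&:to_s).sort.join.capitalize
    klass = begin
      Kernel.const_get("Struct::" + klass_name)
    rescue NameError
      # First time this set of keys is seen: define Struct::<klass_name>
      Struct.new(klass_name, *keys)
    end
    klass.new(*values)
  end
end

# Two hypothetical hashes with the same keys in the same order.
first  = { foo_key: "foo_val", bar_key: "bar_val" }.new_to_struct
second = { foo_key: "other",   bar_key: "thing"   }.new_to_struct

# The second call finds the class defined by the first call and reuses it.
puts first.class.equal?(second.class)
puts first.foo_key
puts second.bar_key
```

The reuse on the second call is where the time is saved: only the first hash with a given set of keys pays for defining a new Struct class.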
The benchmark, run against ruby-2.4.1:
                     user     system      total        real
#to_struct       1.410000   0.000000   1.410000 (  1.403908)
#new_to_struct   0.760000   0.000000   0.760000 (  0.757548)
Struct           7.060000   0.010000   7.070000 (  7.075619)
OpenStruct       0.650000   0.000000   0.650000 (  0.649057)
These changes bring #new_to_struct close to OpenStruct, but it is still not as fast.
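One further direction that might be worth measuring is to skip the class name altogether and cache anonymous Struct classes in a plain Hash keyed by the array of keys. The sketch below is only that: StructCache, CLASSES, fetch and cached_to_struct are hypothetical names, it assumes every key is a String or Symbol, it makes no attempt at thread safety, and I have not benchmarked it, so whether it beats the numbers above is something to check with the question's benchmark.

```ruby
# Hypothetical helper, not part of the code above.
module StructCache
  # Maps an array of keys (in hash order) to the Struct class built for it.
  CLASSES = {}

  def self.fetch(keys)
    # Arrays hash by content, so an equal key list finds the same class.
    CLASSES[keys] ||= Struct.new(*keys.map(&:to_sym))
  end
end

class Hash
  def cached_to_struct
    StructCache.fetch(keys).new(*values)
  end
end

# Hypothetical sample data mixing symbol and string keys.
a = { foo_key: "foo_val", "qux_key" => "qux_val" }.cached_to_struct
b = { foo_key: "x",       "qux_key" => "y"       }.cached_to_struct
# Same keys, different order: gets its own class, so values stay
# attached to the right members.
c = { "qux_key" => 1, foo_key: 2 }.cached_to_struct

puts a.class.equal?(b.class)
puts c.foo_key
```

Keying the cache by the unsorted key list is a deliberate choice. With a name built from sorted keys, two hashes that hold the same keys in a different order share one class whose members follow the first hash's order, so the second hash's values are assigned by position to the wrong members. The cost of this choice is one cached class per distinct key order.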