I have multiple multidimensional arrays that each have the same structure with different lengths.
arr1 = [["23 Jan",3],["29 Jan",10],["19 Jan",23],["25 Jan",23]]
arr2 = [["31 Jan",7],["29 Jan",12],["01 Feb",4]]
My result would be a merged array without duplicate dates ordered by the date. For the dates that get combined, like 29 Jan
above, I would add the second values together.
Here is what the outcome to the above arrays should be:
arr = [["19 Jan",23],["23 Jan",3],["25 Jan",23],["29 Jan",22],["31 Jan",7],["01 Feb",4]]
As you can see the two with the date, 29 Jan
, were combined and added.
Here is my current solution:
(arr1 | arr2).group_by(&:first).
map { |k, v| [k, v.map(&:last).inject(:+)] }.
sort_by { |x| x.first.to_date }
Which does indeed get my the correct result, I just want to refactor it. Any suggestions/insights would be appreciated.
3 Answers 3
It looks as if you have chosen the wrong data structure. The multidimensional Array
looks very much like a Hash
. Even your code mentions the terms key (k
) and value (v
).
If you were to use a Hash
, you could use Hash#merge
:
one = Hash[[["23 Jan",3],["29 Jan",10],["19 Jan",23],["25 Jan",23]]]
two = Hash[[["31 Jan",7],["29 Jan",12],["01 Feb",4]]]
one.merge(two) { |_value, old, new| old + new }
# => {"23 Jan"=>3, "29 Jan"=>22, "19 Jan"=>23, "25 Jan"=>23, "31 Jan"=>7, "01 Feb"=>4}
Note: If you're using a recent version of Ruby, you can replace Hash::[]
with Array#to_hash
If you really do need multidimensional Array
s, you could still use Hash
es internally and convert to the required format (Hash#to_a
) when needed.
-
\$\begingroup\$ Yea I need the result to be a multidimensional array, but of course converting the hash to an array does that very well. this is also much more descriptive than my original solution. The only thing left is to sort the hashes by date. \$\endgroup\$Justin– Justin2015年02月05日 15:22:09 +00:00Commented Feb 5, 2015 at 15:22
Your solution is algorithmically fine. My main critique is that it isn't obvious at a glance what the code does. In particular, k
, v
, :first
, and :last
can only be understood if you mentally step through the code to follow what the data structure looks like at each point.
I'd like to present an alternative solution. It's much less clever — four statements instead of one expression. However, it's easier to see that there is a date-to-value mapping, that I'm grouping by date, summing the values, and sorting by date.
aggregator = Hash.new(0)
arr1.each { |date, val| aggregator[date.to_date] += val }
arr2.each { |date, val| aggregator[date.to_date] += val }
aggregator.to_a.sort_by { |date, val| date }
In addition, I've chosen to call #to_date
earlier. Presumably, your #to_date
method would treat "01 Feb"
and "1 Feb"
equally, in case your input is sloppy.
-
\$\begingroup\$ Yea, I couldn't agree more that my code presented could have be written in a clearer way. This definitely does a good job of that. The returned array's dates are formatted incorrectly though,
[2015年1月19日, 23]
, but this can be fixed easily. \$\endgroup\$Justin– Justin2015年02月05日 15:16:20 +00:00Commented Feb 5, 2015 at 15:16 -
\$\begingroup\$ I added your method to the benchmark after removing
to_date
. \$\endgroup\$Cary Swoveland– Cary Swoveland2015年02月10日 05:28:51 +00:00Commented Feb 10, 2015 at 5:28
Suppose the arrays were large and you were looking for an efficient solution.
Let arr1
and arr2
denote the two arrays, with arr2
being the smaller of the two.
First, convert arr2
to a hash:
h2 = Hash[arr2]
Now map each element of arr1
and append to the result the elements of arr2
that do not have matching dates among the elements of arr1
:
arr1.map { |k1,v1| [k1, h2.key?(k1) ? v1 + h2.delete(k1) : v] }.concat(h2.to_a)
Note that h2.delete(k)
does double-duty: it deletes the key-value pair k,v2
from h2
and returns v2
to be added to v1
.
You can of course tack on .sort
to sort the array by date.
I suspect this may be reasonably fast considering that only one hash is constructed and that is for the smaller of the two arrays.
Let's see how this compares with the other methods that have been suggested.
Test case
def test(n1, n2, f1)
m = [(n1*f1).to_i, n2].min # elements of `arr1` in `arr2`
[Array.new(n1) { |i| [i,i] }.shuffle,
Array.new(n2) { |i| [i+m-n2,i+i] }.shuffle]
end
Suppose, for example, arr1
and arr2
are to have 10
and 8
tuples, respectively, and in 10*0.5 => 5
cases, the first element of a tuple in arr1
matches the first element of a tuple in arr2
. Thus 0-5 => 5
tuples in arr1
have no match in arr2
and 8-5 => 3
tuples in arr2
have no match in arr1
.
Then
arr1, arr2 = test(10,8,0.5)
#=> [[[7, 7], [0, 0], [2, 2], [4, 4], [3, 3], [5, 5], [1, 1], [8, 8], [9, 9], [6, 6]],
# [[-2, 2], [-3, 0], [3, 12], [-1, 4], [4, 14], [0, 6], [2, 10], [1, 8]]]
arr1.sort
#=> [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6], [7, 7], [8, 8], [9, 9]]
arr2.sort
#=> [[-3, 0], [-2, 2], [-1, 4], [0, 6], [1, 8], [2, 10], [3, 12], [4, 14]]
Methods to be benchmarked
def justin(arr1, arr2)
(arr1 + arr2).group_by(&:first)
.map { |k, v| [k, v.map(&:last).inject(:+)] }
.sort_by { |x| x.first }
end
I've corrected justin
by changing arr1 | arr2
to arr1 + arr2
, and removed to_date
.
def btea(arr1, arr2)
one = Hash[arr1]
two = Hash[arr2]
one.merge(two) { |_value, old, new| old + new }.to_a
end
In btea
I converted the merged hash back to an array.
def success(arr1, arr2)
aggregator = Hash.new(0)
arr1.each { |date, val| aggregator[date] += val }
arr2.each { |date, val| aggregator[date] += val }
aggregator.to_a.sort_by { |date, val| date }
end
I removed .to_date
in success
.
def cary(arr1, arr2)
h2 = Hash[arr2]
arr1.map { |d1,n1| [d1, h2.key?(d1) ? n1 + h2.delete(d1) : n1] }.concat(h2.to_a)
end
Confirm methods give the same results
arr1, arr2 = test(1000,800,0.8)
bt = btea(arr1, arr2).sort
su = success(arr1, arr2).sort
justin(arr1, arr2).sort == bt #=> true
bt == su #=> true
su == cary(arr1, arr2).sort #=> true
Benchmark code
require 'benchmark'
def bench(arr1, arr2)
Benchmark.bm(10) do |bm|
bm.report "justin" do
justin(arr1, arr2)
end
bm.report "btea" do
btea(arr1, arr2)
end
bm.report "success" do
success(arr1, arr2)
end
bm.report "cary" do
cary(arr1,arr2)
end
end
end
Results
arr1, arr2 = test(1_000_000, 800_000, 0.8)
bench(arr1, arr2)
user system total real
justin 6.650000 0.160000 6.810000 ( 6.855140)
btea 5.290000 0.100000 5.390000 ( 5.439332)
success 2.970000 0.020000 2.990000 ( 2.999598)
cary 3.970000 0.110000 4.080000 ( 4.090756)
arr1, arr2 = test(2_000_000, 200_000, 0.5)
bench(arr1, arr2)
user system total real
justin 10.690000 0.190000 10.880000 ( 10.958374)
btea 8.650000 0.180000 8.830000 ( 8.868550)
success 5.240000 0.010000 5.250000 ( 5.268781)
cary 1.330000 0.020000 1.350000 ( 1.351937)
arr1, arr2 = test(1_000_000, 1_000_000, 0.5)
bench(arr1, arr2)
user system total real
justin 10.660000 0.160000 10.820000 ( 10.845127)
btea 5.810000 0.110000 5.920000 ( 5.922358)
success 4.060000 0.020000 4.080000 ( 4.078427)
cary 2.400000 0.040000 2.440000 ( 2.447106)
k
,v
, andx
, I don't see any need to refactor anything here. I'd probably write the same thing. \$\endgroup\$|
, two pairs are not duplicated if they are on different arrays, aren't they? It seems you needuniq
for each array. \$\endgroup\$Array#+
. Here it will have the unfortunate (or desired?) effect of ignoring items inarr2
that are equal to items inarr1
(e.g. if["1 Jan", 1]
is in botharr1
andarr2
, the OPs code will return["Jan 1", 1]
instead of["Jan 1", 2]
). If this is intentional, both answers here are not valid, unfortunately. Could you clarify Justin? \$\endgroup\$arr1
contain["23 Jan",3]
and["23 Jan",5]
. If "yes" (I'm guessing the answer is "no") and ifarr2
contained["23 Jan",7]
, would["23 Jan",3+5+7]
be part of your result? The efficiency as well as the correctness of the solution depends on whetherarr1
orarr2
can contain pairs with the same date. \$\endgroup\$