Data
I have 4 Collection objects of the following kind:
dates1=(12/01/02, 15/01/03, 17/01/03, 25/04/04...)
dates2=(15/02/02, 17/06/07, 10/05/10, ...)
numbers1 = (3,5,2,7,...)
numbers2 = (1,-3,-1,...)
The rules of the game are:
dates1
will always have the same length ofnumbers1
,dates2
will always have the same length ofnumbers
, butdates1
is not necessarily as long asdates2
.The dates into the two
dates
collections are always growing within the same collection, but not necessarily the first date ofdates1
will be before the first date ofdates2
.
Problem in words
I would like to create a "merged" collection of these two. In particular,
The
datesTot
collection will have to contain only one date for each record. So, for example, if both "dates1" and "dates2" contain the 14/01/13, then this date will appear in the collection only once.The
numbersTot
collection will have to contain the sum of the two "numbers" collection in case the date is a duplicate, otherwise only the single value.
For example:
numbers1 = (1,3,4)
numbers2 = (4,7,12)
dates1 = (13/01/14, 14/01/15, 16/01/17)
dates2 = (12/01/14, 14/01/15, 18/01/17)
will result in the following:
datesTot = (12/01/14, 13/01/14, 14/01/15, 16/01/17, 18/01/17)
numbersTot = (4, 1, 10, 4, 12) '<-- where 10 is the sum of 7+3, because they both are on 14/01/15
What I have tried so far
I have tried to stretch the human logic described above in a code:
Determining with which date the joint collection should start:
Do While (dates1Index <= dates1Index.Count) And (dates2Index <= dates2Index.Count)
If dates1(dates1Index) < dates2(dates2Index) Then '<-- to put in a while loop with some conditions check
datesTot.Add dates1(dates1Index)
numbersTot.Add numbers1(dates1Index)
dates1Index = dates1Index + 1
ElseIf dates1(dates1Index) > dates2(dates2Index) Then
datesTot.Add dates2(dates2Index)
numbersTot.Add numbers2(dates2Index)
dates2Index = dates2Index+1
Else
datesTot.Add dates1(dates1Index)
numbersTot.Add numbers1(dates1Index) + numbers2(dates2Index)
dates1Index = dates1Index + 1
dates2Index = dates2Index + 1
End If
Loop
The idea, hence, is to keep on looping through the elements like this with the separate indexes, by adding some controls (for the collections lengths) and once reached the shortest collection's length, keep on adding all the others of the longest collection.
My question
The more I look at this method, the less I'm convinced. It's heavy, dirty and (in my opinion) not the best algorithm ever. Is there any built in for collections in VBA to join the elements without duplicates and sort them (I've not found anything in the docs) hence? Or in general, how should I design my code to accomplish this purpose in a flexible way?
-
\$\begingroup\$ @RubberDuck, I can't find your project anymore, but for what I've read before holidays I found it really interesting and would have loved to have a deeper look once re-started the project. Did you migrate it somewhere else, or I've no chance to see it again? \$\endgroup\$Matteo NNZ– Matteo NNZ2015年01月15日 23:00:12 +00:00Commented Jan 15, 2015 at 23:00
-
\$\begingroup\$ There's a good link to that project on my CR profile. Just click my name. \$\endgroup\$RubberDuck– RubberDuck2015年01月16日 22:17:19 +00:00Commented Jan 16, 2015 at 22:17
2 Answers 2
You've got a pretty big bug here. You don't actually end up with unique items in your final list. In fact, with the sample data you gave, 1/15/14
ends up in the final output twice. You should be entering that Else
block, but you're not. This is because you're not comparing what's already stored in datesTot
to that value in dates2
, you're comparing it to what is at the current index of dates1
. (By the way, don't shorten your variable names like that. datesTotal
or dateTotals
would be a much better name).
I think this stems from the fact that you don't loop all the way through the first collection, before you shift the index on the second. Your attempt at optimizing the performance of the code has led to a bug. A wise dev once told me:
- Make it work.
- Make it right.
- Then make it fast.
The second problem I see is your data structure. You're keeping related data in two separate, but parallel collections. That's a recipe for disaster. Why? Because you can't overwrite an item in a collection. You have to add the replacement to the collection and then remove the original. That leaves a ton of room to make a mistake.
A correct algorithm requires three loops.
For Each outerItem in outerCollecton
For Each innerItem in innerCollection
If outerItem = innerItem Then
For Each resultItem In results
If innerItem = resultItem then
resultItem.Count = resultItem.Count + innerItem.Count
Exit For
Else
results.Add innerItem
End If
Next resultItem
End If
Next innerItem
Next outerItem
Implementing that in a clean way would be easier said than done though, because you can't overwrite a value in a collection. That means you will need to keep single collection of objects that represent your data structure. For simplicity's sake, you could create a class module as simple as this.
Option Explicit
Public DateValue As Date
Public Count As Long
Alternatively, assuming your original data that you want to merge has only one instance of each date, then a Scripting.Dictionary
could be a great data structure to use. See my example below.
Public Sub test()
Dim dates1 As New Scripting.Dictionary
dates1.Add #1/14/2013#, 1
dates1.Add #1/15/2014#, 3
dates1.Add #1/17/2016#, 4
Dim dates2 As New Scripting.Dictionary
dates2.Add #12/1/2014#, 4
dates2.Add #1/15/2014#, 7
dates2.Add #1/17/2018#, 12
' first copy the first dict to a new one to return
Dim results As New Dictionary
Dim currentKey
For Each currentKey In dates1.Keys
results.Add currentKey, dates1(currentKey)
Next
' second dict
For Each currentKey In dates2.Keys
If results.Exists(currentKey) Then
results(currentKey) = results(currentKey) + dates2(currentKey)
Else
results.Add currentKey, dates2(currentKey)
End If
Next
' print the results
For Each currentKey In results.Keys
Debug.Print "Date: " & currentKey & vbTab & "Count: " & results(currentKey)
Next
End Sub
Output:
Date: 1/14/2013 Count: 1 Date: 1/15/2014 Count: 10 Date: 1/17/2016 Count: 4 Date: 12/1/2014 Count: 4 Date: 1/17/2018 Count: 12
-
1\$\begingroup\$ First question here in Code Review, I must admit testing the answers is a bit longer than SO but you have actually hit a bug I didn't notice at all and that would have led to huge problems. The code works perfectly and fast now, thanks a lot for the time you spent on it! \$\endgroup\$Matteo NNZ– Matteo NNZ2015年01月17日 10:21:38 +00:00Commented Jan 17, 2015 at 10:21
-
1\$\begingroup\$ You're welcome. I'll admit it, this one had me stumped for a while. Come back anytime you're not feeling so great about the code you've gotten working. \$\endgroup\$RubberDuck– RubberDuck2015年01月17日 11:01:00 +00:00Commented Jan 17, 2015 at 11:01
-
\$\begingroup\$ I'll definitely do it. Thanks a lot once again. \$\endgroup\$Matteo NNZ– Matteo NNZ2015年01月17日 11:06:38 +00:00Commented Jan 17, 2015 at 11:06
1) Your answer seems to implement merge sort, which is an asymptotically optimal solution.
2) This is one of the most beautiful VBA implementations I have seen. It is so elegant, even moving common code to a different block will require complicated logic.
Enjoy your christmas early and quit bothering other programmers :)
-
\$\begingroup\$ I couldn't imagine my code was really good, I actually thought it was crap :) Well, seen your answer explains why it is good, I would say I will edit it with my final solution and then accept it as an answer. Can I do the edit if you don't mind, so it will be helpful to someone having my some issue in the future? \$\endgroup\$Matteo NNZ– Matteo NNZ2014年12月23日 17:12:55 +00:00Commented Dec 23, 2014 at 17:12
-
\$\begingroup\$ This is not really a code review answer. Please improve your answer in a way that actually reviews the code. \$\endgroup\$Phrancis– Phrancis2015年01月16日 22:10:14 +00:00Commented Jan 16, 2015 at 22:10
-
\$\begingroup\$ @Phrancis it looks like a code review answer to me, it just happens to be wrong. I'll admit it. I thought the same thing when I first looked at OP's solution. \$\endgroup\$RubberDuck– RubberDuck2015年01月16日 22:50:37 +00:00Commented Jan 16, 2015 at 22:50