2
\$\begingroup\$

In my previous post I was looking for a better solution to write a single column of a 2D array into a worksheet besides looping between code and worksheet and finally managed to achieve that.

On deploying my code, I found that the double loop to compare two arrays was inefficient when it came to comparing arrays of 5,000 or more and so I turned to the scripting.dictionary, for which I found the easiest instructions to follow here.

Here is an example that I have tested with arrays of 80,000 rows, returning results in seconds, but I am wondering if there might be some code I could cut or write better:

Option Explicit
Option Base 1
Public Sub arrayMatch()
 Dim firstArray As Variant
 With ThisWorkbook.Worksheets(1)
 firstArray = .Range("A1").CurrentRegion
 End With
 Dim secondArray As Variant
 With ThisWorkbook.Worksheets(2)
 secondArray = .Range("A1").CurrentRegion
 End With
 Dim i As Long
 Dim dictKeyForLookup As String
 Dim dictValueToReturn As String
 Dim dict As Object
 Set dict = CreateObject("Scripting.Dictionary")
 dict.CompareMode = vbTextCompare
 For i = LBound(firstArray, 1) To UBound(firstArray, 1)
 dictKeyForLookup = firstArray(i, 4)
 dictValueToReturn = firstArray(i, 3)
 If Not dict.Exists(dictKeyForLookup) Then
 dict.Add Key:=dictKeyForLookup, Item:=dictValueToReturn
 End If
 Next i
 Dim arrayToPaste As Variant
 ReDim arrayToPaste(UBound(secondArray, 1), 1)
 For i = LBound(secondArray, 1) To UBound(secondArray, 1)
 If dict.Exists(secondArray(i, 1)) Then
 arrayToPaste(i, 1) = dict.Item(secondArray(i, 1))
 End If
 Next i
 With ThisWorkbook.Worksheets(1)
 .Range(.Cells(1, 4), .Cells(UBound(secondArray, 1), 4)).Value = arrayToPaste
 End With
End Sub
asked Sep 23, 2016 at 22:39
\$\endgroup\$

1 Answer 1

4
\$\begingroup\$

This implementation actually looks really solid. A couple stylistic and micro-performance issues I would personally address:

Array base: I'd remove Option Base 1 and explicitly declare your array dimensions. The problem with using Option Base 1 is that it is a module level option, and it "taints" all of the code in the module with a non-default setting. I consider it similar to using #define foo bar in C - it alters the semantics of your code. There's not even really a reason to use it in this case because you are always using the LBound to initialize your loop counters anyway. Changing this line...

ReDim arrayToPaste(UBound(secondArray, 1), 1)

...to...

ReDim arrayToPaste(1 To UBound(secondArray, 1), 1 To 1)

...makes it immediately obvious that the array is 1 based without having to scroll to the top of the module or remember that the base is non-default.

Variable naming: firstArray and secondArray are about as meaningful as a1 and a2. Something like workingArray and comparisonArray would make it a little more obvious what your code is dealing with.

Worksheet references: You are currently using magic numbers to locate the worksheets that you're operating on, and the indexes are based on the ordinal of where the sheet is in the collection. If this is intended to operate on two fixed worksheets, use their class names instead. This has two benefits: first, you aren't using 2 procedure calls (Worksheets and the implicit Items) to get a reference that you already know and have access to . Second, your Sub won't break the first time you add a new worksheet. For example, use With Sheet1 (or whatever it is) instead of With ThisWorkbook.Worksheets(1).

Binding: There is no reason to late bind to the scripting runtime. The interface hasn't changed in this century, and is unlikely to. This is a pretty good performance hit in exchange for absolutely zero upside. Add a reference to Microsoft Scripting Runtime and declare it explicitly:

Dim dict As Scripting.Dictionary
Set dict = New Scripting.Dictionary

Use With appropriately: You're doing this for your worksheets, but it doesn't actually do anything for you:

With ThisWorkbook.Worksheets(1)
 firstArray = .Range("A1").CurrentRegion
End With

You use it exactly once inside the block, so this is exactly the same thing as:

firstArray = ThisWorkbook.Worksheets(1).Range("A1").CurrentRegion
'...or even better (see above)...
firstArray = Sheet1.Range("A1").CurrentRegion

What With does is hold an object reference to avoid having to dereference it multiple times. If you only use it once, it's reduced to 2 extra lines of code and 1 extra indentation level.

That said, the place where it is appropriate is with your Dictionary, and it isn't used there. This is much better:

With New Scripting.Dictionary 'You don't even need the variable.
 .CompareMode = vbTextCompare
 For i = LBound(firstArray, 1) To UBound(firstArray, 1)
 dictKeyForLookup = firstArray(i, 4)
 dictValueToReturn = firstArray(i, 3)
 If Not .Exists(dictKeyForLookup) Then
 .Add Key:=dictKeyForLookup, Item:=dictValueToReturn
 End If
 Next i
 Dim arrayToPaste As Variant
 ReDim arrayToPaste(1 To UBound(secondArray, 1), 1 To 1)
 For i = LBound(secondArray, 1) To UBound(secondArray, 1)
 If .Exists(secondArray(i, 1)) Then
 arrayToPaste(i, 1) = .Item(secondArray(i, 1))
 End If
 Next i
End With 'Dictionary goes out of scope here.

Named arguments: There is no reason to use them here:

dict.Add Key:=dictKeyForLookup, Item:=dictValueToReturn

If you are early bound, the name information is available via both Intellisense and the object browser. If you are late bound, it actually adds more overhead for the runtime to resolve them. Just use the ordinal argument positions:

dict.Add dictKeyForLookup, dictValueToReturn

They're also a potential bug source because they are resolved at runtime for late bound objects (which is the same reason you take a performance hit). That means that if you have a typo in the name (i.e. Keey:=foo), it won't be caught by the compiler. It is also dependant on how the typelib is defined for the class. For example it's possible for some late bound objects to introduce errors by specifying the same variable name multiple times. Excel seems to be prone to this type of error:

Debug.Print WorksheetFunction.Sum(Arg1:=1, Arg1:=42) 'Prints 1
answered Sep 24, 2016 at 0:03
\$\endgroup\$
6
  • 2
    \$\begingroup\$ Well that was fast! "They're a potential bug source... but you're probably not interested" - the reader isn't only the OP! Feel free to expand on that =) \$\endgroup\$ Commented Sep 24, 2016 at 0:08
  • \$\begingroup\$ Worksheets(1) reference should be a variable, it's re-referenced at the end of the procedure ...but then again, using the existing global-scope objects would make that moot :-) \$\endgroup\$ Commented Sep 24, 2016 at 0:16
  • \$\begingroup\$ @Mat'sMug - Re named arguments, good point - I let my hunger get the better of me there (edited). Re Worksheets(1), I actually typed that before I realized using the globals would moot it. ;-) \$\endgroup\$ Commented Sep 24, 2016 at 0:40
  • \$\begingroup\$ Thanks for expanding @Comintern, is mistyping named ranges the only potential bug source there? I will take your advice on board and do a rewrite, I think we may have just stumbled on my final version here :-) \$\endgroup\$ Commented Sep 24, 2016 at 7:31
  • \$\begingroup\$ Ah I see the other bug potential, I thought that VBA would always take the last variable name setting? \$\endgroup\$ Commented Sep 24, 2016 at 8:06

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.