1
\$\begingroup\$

I made a user-defined function union in VBA, such that:

  1. it could take variable parameters
  2. each parameter is a one-column range like A1, A2:A10; we don't need to consider passing constant values to parameters
  3. we could consider, within one input range, there are no duplicates; but it is very possible to have duplicates among input ranges.
  4. union combines the input ranges, and keeps the order of the elements. For instance, =union(A1:A5, C1:C2, E1:E3) has the following expected output in Column I:

enter image description here

I wrote the following code which works. However, it is slow. A union over a list of 4000 rows and a list of 20 rows takes several seconds.

First, I don't know whether the way I coded arrays could be improved. Second, the algorithm just consists in comparing each new element against the accumulating result list; there is no sort, no other techniques. Third, I don't know if there are any existing functions we could use in other objects of VBA (eg, VBA FILTER function, Collection, ArrayLists, Scripting.Dictionary).

Could anyone propose a more efficient code?

Function getDimension(var As Variant) As Long
 On Error GoTo Err
 Dim i As Long
 Dim tmp As Long
 i = 0
 Do While True
 i = i + 1
 tmp = UBound(var, i)
 Loop
Err:
 getDimension = i - 1
End Function
Function exists(v As Variant, arr As Variant, resCount As Long) As Boolean
 If resCount = 0 Then
 exists = False
 Else
 exists = False
 i = LBound(arr, 1)
 Do While (i <= resCount) And (Not exists)
 If arr(i) = v Then
 exists = True
 End If
 i = i + 1
 Loop
 End If
End Function
' assumption: every input is a range (eg, A1, A1:A2)
' assumption: each input range has only one column
Function union(ParamArray arr() As Variant) As Variant
 Dim res As Variant
 ReDim res(1 To 100000)
 Dim resCount As Long
 resCount = 0
 
 For k = LBound(arr) To UBound(arr)
 Dim arrk As Variant
 Dim v
 arrk = arr(k).Value2
 If getDimension(arrk) = 0 Then 'case of A1, B1
 v = arrk
 If Not exists(v, res, resCount) Then
 resCount = resCount + 1
 res(resCount) = v
 End If
 ElseIf getDimension(arrk) = 2 Then 'case of A1:A10, B1:B10
 For i = LBound(arrk, 1) To UBound(arrk, 1)
 v = arrk(i, 1)
 If Not exists(v, res, resCount) Then
 resCount = resCount + 1
 res(resCount) = v
 End If
 Next i
 End If
 Next k
 
 ReDim Preserve res(1 To resCount)
 union = Application.WorksheetFunction.Transpose(res)
End Function
asked Dec 7, 2021 at 3:29
\$\endgroup\$
11
  • \$\begingroup\$ Can you post your test workbook on GitHub or something? \$\endgroup\$ Commented Dec 7, 2021 at 4:04
  • 1
    \$\begingroup\$ A1:B2 is not a one-column range. If you are worried about performance, you probably also want to be more specific about your values in actual use cases. Are they numbers, arbitrary-length strings, fixed-length strings, strings of a maximum given length, etc.? \$\endgroup\$ Commented Dec 7, 2021 at 4:52
  • 2
    \$\begingroup\$ For an idea how to handle many ranges containing a huge number of values see k-way merge. \$\endgroup\$ Commented Dec 7, 2021 at 5:38
  • 1
    \$\begingroup\$ You can codereview.stackexchange.com/q/268446/146810 but it's a bit messy (although only has to be done once). If you can enforce some structure like consecutive columns and pass a single 2D array then you could use the new BYCOL function along with FILTER. Maybe I'll have a look... \$\endgroup\$ Commented Dec 7, 2021 at 8:41
  • 1
    \$\begingroup\$ This is getting a bit off topic - but limited to 253 in theory (limitation of LAMBDAS), and the Excel builtin TEXTJOIN is 252 args max. This could be done by giving all the arguments single character names where possible to not hit the max character limit for a formula. However the number you actually need is based on context, I'd say for PRINTF, a function call with more than 10 arguments is already kinda crazy, so the limit of ~100 args I don't consider constraint. More clever options here. Maybe add to your question the real life context - the why? \$\endgroup\$ Commented Dec 7, 2021 at 9:34

1 Answer 1

2
\$\begingroup\$

Option Explicit

Adding Option Explicit to the first line of your modules will force you to declare your variables. Always declare your variables!

getDimension()

Use arrk.CountLarge instead of this function.

 If arrk.CountLarge = 1 Then
 
 Else
 
 End If

union

Avoid naming User Defined Functions after built in functions.

Dim res As Variant
ReDim res(1 To 100000)

res could initialized when it was declared because it is never resized.

Dim res(1 To 100000) As Variant

Refactored Code

Rem Using a Collection
Function Union1(ParamArray Args() As Variant) As Variant
 Dim Map As New Collection
 Dim Item As Variant
 Dim r As Long
 On Error Resume Next
 For Each Item In Args
 If Item.CountLarge > 1 Then
 For r = 1 To Item.Rows.Count
 Map.Add Item(r, 1).Value, Item(r, 1).Text
 Next
 Else
 Map.Add Item.Value, Item.Value
 End If
 Next
 On Error GoTo 0
 
 If Map.Count = 0 Then Exit Function
 
 Dim Results() As Variant
 ReDim Results(1 To Map.Count, 1 To 1)
 
 For r = 1 To Map.Count
 Results(r, 1) = Map.Item(r)
 Next
 
 Union1 = WorksheetFunction.TextJoin(",", True, Results)
 
End Function
Rem Using an ArrayList
Function Union2(ParamArray Args() As Variant) As Variant
 Dim List As Object
 Set List = CreateObject("System.Collections.ArrayList")
 
 Dim r As Long
 For Each Item In Args
 If IsArray(Item) Then
 For r = 1 To Item.Rows.Count
 If Not List.Contains(Item(r, 1).Value) Then List.Add Item(r, 1).Value
 Next
 Else
 If Not List.Contains(Item.Value) Then List.Add Item.Value
 End If
 Next
 
 If List.Count = 0 Then Exit Function
 Union2 = WorksheetFunction.Transpose(List.ToArray)
 
End Function
Rem Using a Scripting.Dictionary
Function Union3(ParamArray Args() As Variant) As Variant
 Dim List As Object
 Set List = CreateObject("Scripting.Dictionary")
 
 Dim r As Long
 For Each Item In Args
 If IsArray(Item) Then
 For r = 1 To Item.Rows.Count
 If Not List.Exists(Item(r, 1).Value) Then List.Add Item(r, 1).Text, Item(r, 1).Value
 Next
 Else
 If Not List.Exists(Item.Value) Then List.Add Item.Text, Item.Value
 End If
 Next
 
 If List.Count = 0 Then Exit Function
 
 Dim Res
 Res = WorksheetFunction.Transpose(List.Items)
 
 Union3 = WorksheetFunction.Transpose(List.Items)
 
End Function
answered Dec 8, 2021 at 2:02
\$\endgroup\$
2
  • \$\begingroup\$ It works, and is much faster. Thank you. \$\endgroup\$ Commented Dec 8, 2021 at 7:18
  • \$\begingroup\$ Thanks for accepting my answer @SoftTimur \$\endgroup\$ Commented Dec 8, 2021 at 22:53

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.