I made a user-defined function union
in VBA, such that:
- it could take variable parameters
- each parameter is a one-column range like
A1
,A2:A10
; we don't need to consider passing constant values to parameters - we could consider, within one input range, there are no duplicates; but it is very possible to have duplicates among input ranges.
union
combines the input ranges, and keeps the order of the elements. For instance,=union(A1:A5, C1:C2, E1:E3)
has the following expected output in Column I:
I wrote the following code which works. However, it is slow. A union over a list of 4000 rows and a list of 20 rows takes several seconds.
First, I don't know whether the way I coded arrays could be improved. Second, the algorithm just consists in comparing each new element against the accumulating result list; there is no sort, no other techniques. Third, I don't know if there are any existing functions we could use in other objects of VBA (eg, VBA FILTER
function, Collection
, ArrayLists
, Scripting.Dictionary
).
Could anyone propose a more efficient code?
Function getDimension(var As Variant) As Long
On Error GoTo Err
Dim i As Long
Dim tmp As Long
i = 0
Do While True
i = i + 1
tmp = UBound(var, i)
Loop
Err:
getDimension = i - 1
End Function
Function exists(v As Variant, arr As Variant, resCount As Long) As Boolean
If resCount = 0 Then
exists = False
Else
exists = False
i = LBound(arr, 1)
Do While (i <= resCount) And (Not exists)
If arr(i) = v Then
exists = True
End If
i = i + 1
Loop
End If
End Function
' assumption: every input is a range (eg, A1, A1:A2)
' assumption: each input range has only one column
Function union(ParamArray arr() As Variant) As Variant
Dim res As Variant
ReDim res(1 To 100000)
Dim resCount As Long
resCount = 0
For k = LBound(arr) To UBound(arr)
Dim arrk As Variant
Dim v
arrk = arr(k).Value2
If getDimension(arrk) = 0 Then 'case of A1, B1
v = arrk
If Not exists(v, res, resCount) Then
resCount = resCount + 1
res(resCount) = v
End If
ElseIf getDimension(arrk) = 2 Then 'case of A1:A10, B1:B10
For i = LBound(arrk, 1) To UBound(arrk, 1)
v = arrk(i, 1)
If Not exists(v, res, resCount) Then
resCount = resCount + 1
res(resCount) = v
End If
Next i
End If
Next k
ReDim Preserve res(1 To resCount)
union = Application.WorksheetFunction.Transpose(res)
End Function
-
\$\begingroup\$ Can you post your test workbook on GitHub or something? \$\endgroup\$PChemGuy– PChemGuy2021年12月07日 04:04:09 +00:00Commented Dec 7, 2021 at 4:04
-
1\$\begingroup\$ A1:B2 is not a one-column range. If you are worried about performance, you probably also want to be more specific about your values in actual use cases. Are they numbers, arbitrary-length strings, fixed-length strings, strings of a maximum given length, etc.? \$\endgroup\$PChemGuy– PChemGuy2021年12月07日 04:52:19 +00:00Commented Dec 7, 2021 at 4:52
-
2\$\begingroup\$ For an idea how to handle many ranges containing a huge number of values see k-way merge. \$\endgroup\$greybeard– greybeard2021年12月07日 05:38:33 +00:00Commented Dec 7, 2021 at 5:38
-
1\$\begingroup\$ You can codereview.stackexchange.com/q/268446/146810 but it's a bit messy (although only has to be done once). If you can enforce some structure like consecutive columns and pass a single 2D array then you could use the new BYCOL function along with FILTER. Maybe I'll have a look... \$\endgroup\$Greedo– Greedo2021年12月07日 08:41:47 +00:00Commented Dec 7, 2021 at 8:41
-
1\$\begingroup\$ This is getting a bit off topic - but limited to 253 in theory (limitation of LAMBDAS), and the Excel builtin TEXTJOIN is 252 args max. This could be done by giving all the arguments single character names where possible to not hit the max character limit for a formula. However the number you actually need is based on context, I'd say for PRINTF, a function call with more than 10 arguments is already kinda crazy, so the limit of ~100 args I don't consider constraint. More clever options here. Maybe add to your question the real life context - the why? \$\endgroup\$Greedo– Greedo2021年12月07日 09:34:34 +00:00Commented Dec 7, 2021 at 9:34
1 Answer 1
Option Explicit
Adding Option Explicit
to the first line of your modules will force you to declare your variables. Always declare your variables!
getDimension()
Use arrk.CountLarge
instead of this function.
If arrk.CountLarge = 1 Then
Else
End If
union
Avoid naming User Defined Functions after built in functions.
Dim res As Variant ReDim res(1 To 100000)
res
could initialized when it was declared because it is never resized.
Dim res(1 To 100000) As Variant
Refactored Code
Rem Using a Collection
Function Union1(ParamArray Args() As Variant) As Variant
Dim Map As New Collection
Dim Item As Variant
Dim r As Long
On Error Resume Next
For Each Item In Args
If Item.CountLarge > 1 Then
For r = 1 To Item.Rows.Count
Map.Add Item(r, 1).Value, Item(r, 1).Text
Next
Else
Map.Add Item.Value, Item.Value
End If
Next
On Error GoTo 0
If Map.Count = 0 Then Exit Function
Dim Results() As Variant
ReDim Results(1 To Map.Count, 1 To 1)
For r = 1 To Map.Count
Results(r, 1) = Map.Item(r)
Next
Union1 = WorksheetFunction.TextJoin(",", True, Results)
End Function
Rem Using an ArrayList
Function Union2(ParamArray Args() As Variant) As Variant
Dim List As Object
Set List = CreateObject("System.Collections.ArrayList")
Dim r As Long
For Each Item In Args
If IsArray(Item) Then
For r = 1 To Item.Rows.Count
If Not List.Contains(Item(r, 1).Value) Then List.Add Item(r, 1).Value
Next
Else
If Not List.Contains(Item.Value) Then List.Add Item.Value
End If
Next
If List.Count = 0 Then Exit Function
Union2 = WorksheetFunction.Transpose(List.ToArray)
End Function
Rem Using a Scripting.Dictionary
Function Union3(ParamArray Args() As Variant) As Variant
Dim List As Object
Set List = CreateObject("Scripting.Dictionary")
Dim r As Long
For Each Item In Args
If IsArray(Item) Then
For r = 1 To Item.Rows.Count
If Not List.Exists(Item(r, 1).Value) Then List.Add Item(r, 1).Text, Item(r, 1).Value
Next
Else
If Not List.Exists(Item.Value) Then List.Add Item.Text, Item.Value
End If
Next
If List.Count = 0 Then Exit Function
Dim Res
Res = WorksheetFunction.Transpose(List.Items)
Union3 = WorksheetFunction.Transpose(List.Items)
End Function