6
\$\begingroup\$

I use the following code in many of my answers on Stack Overflow, to mimic the new TEXTJOIN function available in Office 365 Excel:

Function TEXTJOIN(delim As String, skipblank As Boolean, arr)
 Dim d As Long
 Dim c As Long
 Dim arr2()
 Dim t As Long, y As Long
 t = -1
 y = -1
 If TypeName(arr) = "Range" Then
 arr2 = arr.Value
 Else
 arr2 = arr
 End If
 On Error Resume Next
 t = UBound(arr2, 2)
 y = UBound(arr2, 1)
 On Error GoTo 0
 If t >= 0 And y >= 0 Then
 For c = LBound(arr2, 1) To UBound(arr2, 1)
 For d = LBound(arr2, 1) To UBound(arr2, 2)
 If arr2(c, d) <> "" Or Not skipblank Then
 TEXTJOIN = TEXTJOIN & arr2(c, d) & delim
 End If
 Next d
 Next c
 Else
 For c = LBound(arr2) To UBound(arr2)
 If arr2(c) <> "" Or Not skipblank Then
 TEXTJOIN = TEXTJOIN & arr2(c) & delim
 End If
 Next c
 End If
 TEXTJOIN = Left(TEXTJOIN, Len(TEXTJOIN) - Len(delim))
End Function

I have successfully made it so it works with both ranges and arrays, so it works with:

=TEXTJOIN(",",TRUE,A1:B7)

As well as

{=TEXTJOIN(",",TRUE,IF(A1:A7 = "x",B1:B7,""))}

The question is: Is this the best method to check for array vs. range? Is there a better method?

As stated this works, nicely with both, but I have to think that I am doing it in a round about manner.

asked Nov 13, 2017 at 16:38
\$\endgroup\$
1
  • \$\begingroup\$ Thanks for posting that here. I thought it was pretty efficient but wow it's cool to see what you can do to tighten it up. Lots of good info in the answers and I certainly still have tons to learn! \$\endgroup\$ Commented Nov 14, 2017 at 2:01

3 Answers 3

7
\$\begingroup\$

Okay, let's first eliminate the easy stuff Rubberduck points out:

  • Function is implicitly Public and implicitly returns a Variant - should be a String.
  • All parameters are implicitly passed ByRef (and semantically should be ByVal).
  • Parameter arr is implicitly Variant,
  • Local variables c, d, t, y, arr2 all have terrible, meaningless names (arr, matey! ..I like calling it Pirate Notation ;)
  • String-returning Left$ function should be used over Variant-returning Left function.
  • vbNullString should probably be preferred over "" empty string literal.

With default settings Rubberduck will also complain about Dim t As Long, y As Long, because having multiple declarations in a single instruction isn't ideal.

The variables are declared at the top of the function rather than as close as possible to their usage, which makes it hard to see what's used where.


Type-checking isn't type safe:

If TypeName(arr) = "Range" Then
 arr2 = arr.Value
Else
 arr2 = arr
End If

If the Microsoft Word object model is referenced, you can pass a Word.Range object and it will happily take it; same if I made my own Range class and gave it an instance. Use the TypeOf operator to perform compile-time type-safe type checks:

If TypeOf arr Is Excel.Range Then
 'definitely an Excel Range object
Else
 'could be anything
End If

Note, in the Else branch arr could literally be anything - but you're assuming it's an array. You could use the IsArray function to make sure of that, and then you can also assert on the upper bound using a rather ugly helper function, to make sure you're looking at the 2D array you're expecting.

Else
 Debug.Assert IsArray(arr)
 Debug.Assert GetArrayDimSize(arr) = 2
 arr2 = arr
End If

With the array dimension count known/asserted, the On Error Resume Next statement that follows can be removed... and t can be renamed sourceColumns or colCount or whatever, and y can be renamed sourceRows or rowCount or whatever - and arr2 could be renamed to sourceArray:

sourceColumns = UBound(sourceArray, 2)
sourceRows = UBound(sourceArray, 1)

And with that naming scheme I think I'd rename the arr parameter to source.

If t >= 0 And y >= 0 Then

Ah, ok, so that condition is leveraging the fact that VBA would have thrown an error given a 1D array, leaving t = 0. Not very obvious, let's improve that.

But first we need to tweak the assertions - we don't really want a 2D array, we just want any array with at most two dimensions. So... let's be less forgiving given anything other than that:

Dim dimensionCount As Long
If TypeOf(arr) Is Excel.Range Then
 sourceArray = source.Value
 dimensionCount = 2 'Range.Value is always a 2D array
Else
 Dim isValidArray As Boolean
 isValidArray = IsArray(source)
 If isValidArray Then dimensionCount = GetArrayDimSize(source)
 isValidArray = dimensionCount <> 0 And dimensionCount <= 2
 If Not isValidArray Then Err.Raise 5, "TEXTJOIN", "Expected: 1D or 2D array"
End If

And now that If statement can be much more explicit about what's going on and why:

If dimensionCount = 2 Then
 'handle 2D array
Else
 Debug.Assert dimensionCount = 1
 'handle 1D array
End If

So, c iterates dimension 1 / rows, d iterates dimension 2 / columns.

Now all these concatenations are inherently slow. I realize that's "just a worksheet function" and you'd like to share it as a single, simple, cohesive and focused little piece of code... but given thousands of iterations, VBA's string-handling will start becoming the performance bottleneck of the function.

There's a lightning-fast StringBuilder class right here on this site that you can use to address that.

As a bonus, using a StringBuilder makes the function assign to its return value only once - as opposed to what you have now, which is treating the return-value-identifier as if it were a local variable (which it technically is, ...I just don't like doing that).

Not sure why the d / column loop is inconsistent here:

For d = LBound(arr2, 1) To UBound(arr2, 2)

You're iterating the 2nd dimension, the LBound should be off the 2nd dimension too (yes, it should be the very same as that of the 1st dimension). Also, the upper bounds of both dimensions are already known and stored in local variables:

For currentRow = LBound(sourceArray, 1) To sourceRows
 For currentColumn = LBound(sourceArray, 2) To sourceColumns

This code will throw an error (by design?) if the array contains an Error value:

If arr2(c, d) <> "" Or Not skipblank Then

You could have a parameter that helps you decide how to treat errors - perhaps an Enum could be used:

Public Enum TEXTJOIN_ErrorValues
 ThrowOnError
 SkipError
 IncludeErrorText
End Enum

ThrowOnError would be the current/default behavior; SkipError would treat errors as blanks, and IncludeErrorText would include the e.g. #N/A error text into the result.

Except, once you have an Variant/Error value and not a Range, it's pretty much impossible to get the text back (unless you want to map CVErr(xlErrWhatever) values to a corresponding string... probably not worth it) - so scratch that enum value, and the behavior becomes either throw on error, or skip error values. And that's entirely possible, and quite easy to do - an additional optional Boolean parameter could be helpful for that.

Dim sb As StringBuilder
Set sb = New StringBuilder
'...
For currentRow = LBound(sourceArray, 1) To sourceRows
 For currentColumn = LBound(sourceArray, 2) To sourceColumns
 If Not IsError(sourceArray(currentRow, currentColumn)) Then
 If sourceArray(currentRow, currentColumn) <> vbNullString Or Not skipBlank Then
 sb.Append sourceArray(currentRow, currentColumn)
 sb.Append delim
 End If
 ElseIf Not skipErrors Then
 sb.Append delim
 End If
 Next
Next
TEXTJOIN = sb.ToString

Now, that's the 2D loop.. the 1D loop does essentially the same thing.. and that's annoying. I'd extract a method for that.

Private Sub ProcessValue(ByVal value As Variant, ByVal sb As StringBuilder, ByVal delim As String, ByVal skipBlanks As Boolean, ByVal skipErrors As Boolean)
 If Not IsError(value) Then
 If CStr(value) <> vbNullString Or Not skipBlanks Then
 sb.Append CStr(value)
 sb.Append delim
 End If
 ElseIf Not skipErrors Then
 sb.Append delim
 End If
End Sub

That turns the logic into:

If dimensionCount = 2 Then
 For currentRow = LBound(sourceArray, 1) To sourceRows
 For currentColumn = LBound(sourceArray, 2) To sourceColumns
 ProcessValue sourceArray(currentRow, currentColumn), sb, delim, skipBlanks, skipErrors
 Next
 Next
Else
 Debug.Assert dimensionCount = 1
 For currentRow = LBound(sourceArray, 1) To sourceRows
 ProcessValue sourceArray(currentRow), sb, delim, skipBlanks, skipErrors
 Next
End If
TEXTJOIN = sb.ToString

The order of the parameters strikes me as unintuitive - might be by design to match Microsoft's function, by I would have made the source array/range the first parameter, followed by an optional delimiter, followed by an optional flag to skip blanks (followed by an optional flag to skip errors).

answered Nov 13, 2017 at 18:15
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Gosh Darn it you guys, These are all good answers and I learned a lot just reading them(and is why I do not answer here), thanks. I will digest and then mark as correct the one I use. Not sure the full new and improved will be used on Stack Overflow, something tells me if I put these blocks of code in an answer the OP will freak a little, but for me to learn it is great. \$\endgroup\$ Commented Nov 13, 2017 at 18:33
  • \$\begingroup\$ @ScottCraner I can't answer them all... I'm sure you would do great too! \$\endgroup\$ Commented Nov 13, 2017 at 18:38
  • \$\begingroup\$ In your pseudo code you used TypeOf as a function; instead of an operator. If you get a chance could you check out my answer. I feel like I should have done something a little different with the scalar values. Using Goto statements to skip the array processing would work but I'd have to use Goto statements..lol. I guess maybe, I should have use a huge If..ElseIf...ElseIf...Else...If yuck. \$\endgroup\$ Commented Nov 14, 2017 at 8:21
5
\$\begingroup\$

Disclaimer: I know you are asking a relatively straightforward question, but this is CR after all...

Checking an array versus a range is perfectly fine (assuming this function will generally be called from the worksheet), but there are some important considerations to be made.

First, it took me a few minutes to decode what your code is doing. Given that you are quite active in the community (and thus you are helping reach many people) it would be worth investing in a copy of 'Clean Code' by Robert C Martin. This is a book that was recommended to me by Mat's Mug, and has completely change how I think about code, and how I approach coding.

As a result, the first thing I noticed was how the small things within your code add up towards a larger potential for bugs. For example:

On Error Resume Next
t = UBound(arr2, 2)
y = UBound(arr2, 1)
On Error GoTo 0
If t >= 0 And y >= 0 Then
...
End If

If I interpret this bit correctly, you're just testing whether there is a second dimension within the array, and if there is, you're looping through the array as a 2d array. Otherwise, you're looping through as a one-dimensional array. Wouldnt it be nice if we could explicitly say that in VBA?

Private Function ArrayIsTwoDimensional(ByVal TestArray As Variant) As Boolean
 Dim Test As Variant
 On Error Resume Next
 Test = TestArray(LBound(TestArray, 1), LBound(TestArray, 2))
 ' Note: This will work for a 3d array, or a 4d array, etc. If there is any risk of a higher-dimension array, use a function that returns
 ' the exact number of dimensions.
 ArrayIsTwoDimensional = (Err.Number = 0)
 On Error GoTo 0
End Function
Private Function JoinFromTwoDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant)
 Dim Join As String
 Dim i As Long
 For i = LBound(InputArray, 1) To UBound(InputArray, 1)
 Dim j As Long
 For j = LBound(InputArray, 2) To UBound(InputArray, 2)
 If InputArray(i, j) <> vbNullString Or Not SkipBlanks Then
 If Join <> vbNullString Then Join = Join & Delimeter
 Join = Join & InputArray(i, j)
 End If
 Next
 Next
 JoinFromTwoDimensionalArray= Join
End Function
Private Function JoinFromOneDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant) As String
 Dim Join As String
 Dim i As Long
 For i = LBound(InputArray) To UBound(InputArray)
 If InputArray(i) <> vbNullString Or Not SkipBlanks Then
 ' Note the placement of the delimeter forces the joined text to only ever add a delimeter when needed.
 If Join <> vbNullString Then Join = Join & Delimeter
 Join = Join & InputArray(i)
 End If
 Next i
 JoinFromOneDimensionalArray = Join
End Function

That was the first step I took in making your code make a little bit more sense. By extracting the two loops, and the dimension check, the main routine becomes much cleaner, and gets to rely on a few Private Functions to do the work it needs to do. The beauty of this is that your code now explicitly says what it is doing (anyone could read the code, regardless of whether they have been coding for days or years).

The next step I took was explicitly checking for the supported types. For example:

If TypeName(arr) = "Range" Then
 arr2 = arr.Value
Else
 arr2 = arr
End If

If arr is a Worksheet for example, you will get a With block or Object variable not set error when trying to assign arr2 to arr (the names here are troubling as well). Let's explicitly raise an error if someone passes a value we don't currently support:

Private Const ERR_NUMBER_TYPE_NOT_SUPPORTED As Long = 513
Private Const ERR_MESSAGE_TYPE_NOT_SUPPORTED As String = "Please provide a Variant() or Range for the InputValues argument of TEXTJOIN. Alternate types are not currently supported."
Dim Values As Variant
Select Case TypeName(InputValues)
Case "Range"
 Values = InputValues.Value
Case "Variant()"
 Values = InputValues
Case Else
 Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
End Select

The beauty here is that not only will our code raise an error explicitly related to the source of the problem, but we also have a very modular way of adding additional support. For example, if we wanted to support a worksheet (for whatever reason) we would want to update the error message, and add just a bit of additional code:

Dim Values As Variant
Select Case TypeName(InputValues)
Case "Range"
 Values = InputValues.Value
Case "Variant()"
 Values = InputValues
Case "Worksheet"
 Values = GetArrayFromWorksheet(InputValues)
Case Else
 Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
End Select

Finally, making these few small changes has a huge impact on the readability/maintainability of the code. Here is the finished product (identical in function):

Private Const ERR_NUMBER_TYPE_NOT_SUPPORTED As Long = 513
Private Const ERR_MESSAGE_TYPE_NOT_SUPPORTED As String = "Please provide a Variant() or Range for the InputValues argument of TEXTJOIN. Alternate types are not currently supported."
Public Function TEXTJOIN(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputValues As Variant) As String
 Dim Values As Variant
 Select Case TypeName(InputValues)
 Case "Range"
 Values = InputValues.Value
 Case "Variant()"
 Values = InputValues
 Case Else
 Err.Raise ERR_NUMBER_TYPE_NOT_SUPPORTED, ERR_MESSAGE_TYPE_NOT_SUPPORTED
 End Select
 If ArrayIsTwoDimensional(InputValues) Then
 TEXTJOIN = JoinFromTwoDimensionalArray(Delimeter, SkipBlanks, InputValues)
 Else
 TEXTJOIN = JoinFromOneDimensionalArray(Delimeter, SkipBlanks, InputValues)
 End If
End Function
Private Function ArrayIsTwoDimensional(ByVal TestArray As Variant) As Boolean
 Dim Test As Variant
 On Error Resume Next
 Test = TestArray(LBound(TestArray, 1), LBound(TestArray, 2))
 ' Note: This will work for a 3d array, or a 4d array, etc. If there is any risk of a higher-dimension array, use a function that returns
 ' the exact number of dimensions.
 ArrayIsTwoDimensional = (Err.Number = 0)
 On Error GoTo 0
End Function
Private Function JoinFromTwoDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant)
 Dim Join As String
 Dim i As Long
 For i = LBound(InputArray, 1) To UBound(InputArray, 1)
 Dim j As Long
 For j = LBound(InputArray, 2) To UBound(InputArray, 2)
 If InputArray(i, j) <> vbNullString Or Not SkipBlanks Then
 If Join <> vbNullString Then Join = Join & Delimeter
 Join = Join & InputArray(i, j)
 End If
 Next
 Next
 JoinFromTwoDimensionalArray= Join
End Function
Private Function JoinFromOneDimensionalArray(ByVal Delimeter As String, ByVal SkipBlanks As Boolean, ByVal InputArray As Variant) As String
 Dim Join As String
 Dim i As Long
 For i = LBound(InputArray) To UBound(InputArray)
 If InputArray(i) <> vbNullString Or Not SkipBlanks Then
 ' Note the placement of the delimeter forces the joined text to only ever add a delimeter when needed.
 If Join <> vbNullString Then Join = Join & Delimeter
 Join = Join & InputArray(i)
 End If
 Next i
 JoinFromOneDimensionalArray = Join
End Function

We get rid of all of those counter variables (opting instead for i and j which are pretty standard). Our other variables are quite clearly named, and everything is what you would expect. As a result if you (however many years from now) come back to the function to add something new, or if it breaks, you can quickly find the source instead of trying to remember what t and y were and why you set t or y to -1 and so on so forth.

Again, highly recommend picking up a copy of Clean Code. The idea of levels of abstraction, functions, classes, etc all can be difficult to understand at first but once you get going your code will become much easier to read.

answered Nov 13, 2017 at 18:00
\$\endgroup\$
6
  • \$\begingroup\$ Thanks, makes sense. BTW you have two Private Function JoinFromOneDimensionalArray I assume the first should be Private Function JoinFromTwoDimensionalArray \$\endgroup\$ Commented Nov 13, 2017 at 18:10
  • \$\begingroup\$ also would not Join = Join & Delimeter & InputArray(i, j) put the delimeter between the null string and the first value. so in the end we would get ,1,2,3? Shouldn't we then use JoinFromOneDimensionalArray = MID(Join,2)? or am I missing something here? I know you put this together quickly and are human, just want to ensure I am not missing something. \$\endgroup\$ Commented Nov 13, 2017 at 18:16
  • \$\begingroup\$ @ScottCraner It should be indeed. Good catch. A great reminder of why Copy/Paste can be dangerous :p. \$\endgroup\$ Commented Nov 13, 2017 at 18:16
  • \$\begingroup\$ Fixed the concatenation as well. I kept looking at that one and knew I did it differently, but just assumed I was doing it wrong previously. It should be fixed now. \$\endgroup\$ Commented Nov 13, 2017 at 18:20
  • 1
    \$\begingroup\$ "This is a book that was recommended to me by Mat's Mug" - look at you, name-dropping my mug like that! ;-) \$\endgroup\$ Commented Nov 13, 2017 at 18:22
2
\$\begingroup\$

I really like the OP's concept but I feel that a pseudo Excel Application.WorksheetFunction.TextJoin function should take a ParamArray of mixed data types.

In my implementation I use a combination of a string buffer and the Mid function improve the speed by avoiding concatenating large strings. To test the speed of my function I filled 500K cells with random strings ranging from 5 to 50 characters in length. Using the TheSpreadsheetGuru: Timer I determined that it took 1.95 seconds to create a string of 14,256,557 characters.

I'm sure that by utilizing the lightning-fast StringBuilder that @Mat'sMug mentioned would increase the speed 5 fold. I did not use it myself because I wanted convenience of having a single function do all the work.

Formulas

=TextJoin2(",",FALSE,"Numbers",A6:C6,A7:C9,{10,11,12})

=TextJoin2(",",TRUE,"Numbers",A6:C6,A7:C9,{10,11,12})

enter image description here

Code

Function TextJoin2(Delimiter As String, Ignore_Emtpy As Boolean, ParamArray Args() As Variant) As Variant
 Dim results As String
 Dim count As Long, i As Long, j As Long, length As Long, pos As Long
 Dim argument As Variant, v As Variant
 Select Case TypeName(Args(0))
 Case "Empty"
 argument = Array()
 Case "Range"
 If Args(0).count = 1 Then
 argument = Array(Args(0).value)
 Else
 argument = Args(0).value
 End If
 Case "String"
 argument = Array(Args(0))
 Case "Variant()"
 argument = Args(0)
 End Select
 For Each v In argument
 length = length + Len(v)
 count = count + 1
 Next
 results = Space(length + count * Len(Delimiter))
 If count - 1 + LBound(argument) = UBound(argument) Then
 For Each v In argument
 If Not Ignore_Emtpy Or Len(v) > 0 Then
 Mid(results, pos + 1, Len(v) + Len(Delimiter)) = v & Delimiter
 pos = pos + Len(v) + Len(Delimiter)
 End If
 Next
 Else
 For i = LBound(argument) To UBound(argument)
 For j = LBound(argument, 2) To UBound(argument, 2)
 If Not Ignore_Emtpy Or Len(argument(i, j)) > 0 Then
 Mid(results, pos + 1, Len(argument(i, j)) + Len(Delimiter)) = argument(i, j) & Delimiter
 pos = pos + Len(argument(i, j)) + Len(Delimiter)
 End If
 Next
 Next
 End If
 
 'Trim results needed to adjust for skipping empty values
 results = Left(results, pos)
 
 For i = 1 To UBound(Args)
 results = results & TextJoin2(Delimiter, Ignore_Emtpy, Args(i)) & Delimiter
 Next
 
 Debug.Print Left(results, Len(results) - Len(Delimiter))
 TextJoin2 = Left(results, Len(results) - Len(Delimiter))
End Function
answered Nov 14, 2017 at 8:03
\$\endgroup\$
1
  • \$\begingroup\$ Thanks, I was trying to remain like the actual formula but this is a good idea to use it more like SUMIFS(). \$\endgroup\$ Commented Nov 14, 2017 at 16:44

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.