44
\$\begingroup\$

In the process of trying to build a serializable data structure, I found myself building large strings, which gets very slow because VBA copies a string every time concatenation is performed.

To alleviate this, referring to Dynamic Array and Java's StringBuilder interface, I cobbled together a unicode clsStringBuilder class.

This isn't a very big chunk of code, but I'd be interested in any advice about edge cases that I maybe haven't considered, unexpected copying behavior that VBA might be performing "behind my back" which I could avoid, or corrections to coding style (or lack thereof).

Option Compare Database
Option Explicit
'******
'* v2 *
'******
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const DEFAULT_CAPACITY As Long = &H10
Private m_currLen As Long
Private m_stringBuffer() As Byte
Private Sub Class_Initialize()
 ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY * 2) - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(strString As String) As clsStringBuilder
On Error GoTo derp
 If m_currLen + LenB(strString) < UBound(m_stringBuffer) Then
 CopyMemory VarPtr(m_stringBuffer(m_currLen)), StrPtr(strString), LenB(strString)
 Else
 If m_currLen + LenB(strString) < UBound(m_stringBuffer) * 2 Then
 Expand
 Else
 Expand m_currLen + LenB(strString)
 End If
 CopyMemory VarPtr(m_stringBuffer(m_currLen)), StrPtr(strString), LenB(strString)
 End If
 m_currLen = m_currLen + LenB(strString)
 Set Append = Me
 Exit Function
derp:
 Stop
 Resume
End Function
Public Property Get Length() As Long
 Length = m_currLen / 2
End Property
Public Property Get Capacity() As Long
 Capacity = UBound(m_stringBuffer)
End Property
Private Sub Expand(Optional newSize As Long = 0)
 If newSize <> 0 Then
 ReDim Preserve m_stringBuffer(0 To newSize - 1)
 Else
 ReDim Preserve m_stringBuffer(0 To (UBound(m_stringBuffer) * 2) + 1)
 End If
End Sub
Public Function toString() As String
 toString = Mid(m_stringBuffer, 1, m_currLen / 2)
End Function

Here is a test:

Public Sub Main()
 Dim sb As clsStringBuilder
 Set sb = New clsStringBuilder
 Dim strString As String
 Dim i As Long
 Dim StartTime As Double
 'VBA String
 StartTime = MicroTimer()
 For i = 0 To 100000
 strString = strString + "Hello World;"
 Next
 Debug.Print "The VBA String took: " & Round(MicroTimer - StartTime, 3) & " seconds"
 'StringBuilder
 StartTime = MicroTimer()
 For i = 0 To 100000
 sb.Append "Hello World;"
 Next
 Debug.Print "The Stringbuilder took: " & Round(MicroTimer - StartTime, 3) & " seconds"
 'Are the strings the same?
 Debug.Print StrComp(strString, sb.toString, vbBinaryCompare)
End Sub

Here is Microsoft's MicroTimer function, which can be found here:

Private Declare Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
Function MicroTimer() As Double
'
' Returns seconds.
 Dim cyTicks1 As Currency
 Static cyFrequency As Currency
 '
 MicroTimer = 0
' Get frequency.
 If cyFrequency = 0 Then getFrequency cyFrequency
' Get ticks.
 getTickCount cyTicks1
' Seconds
 If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
Mathieu Guindon
75.5k18 gold badges194 silver badges467 bronze badges
asked Oct 22, 2014 at 19:02
\$\endgroup\$
17
  • 3
    \$\begingroup\$ Hey Blackhawk - I've tweaked this code a bit and uploaded it on GitHub, and I don't know if you saw Tim Hall's VBA-Blocks, but I think it would be awesome to have a StringBuilder class available through a nuget-like package manager for VBA (and Rubberduck might eventually craft a UI for it, so packages for a project can be add/updated/maintained, all in the VBE). Would be awesome if you could contact Tim to agree with what I've done with your code (or I can take down that repository if you prefer) - LMK! \$\endgroup\$ Commented Oct 24, 2018 at 18:13
  • 3
    \$\begingroup\$ @MathieuGuindon That's awesome! If you need me to poke anybody, let me know, but otherwise consider this comment a stamp of approval :) \$\endgroup\$ Commented Oct 26, 2018 at 15:03
  • 2
    \$\begingroup\$ @TonyDallimore to add a bit more clarification, the most expensive part of the standard ReDim Preserve is the copy. When you have a chunk of memory and you want to "expand" it by just one character, what actually happens is the OS has to find a completely new chunk of memory that is the new requested size, and then it has to copy everything from the old chunk into the new chunk. This takes longer and longer as your data grows. Imagine building a 2GB array expanding 1 byte at a time - you'd end up making ~ 2 billion copies totaling ~ 2 billion GB! \$\endgroup\$ Commented Apr 22, 2019 at 20:41
  • 2
    \$\begingroup\$ @Blackhawk Thanks for the comments. I understand the principle of how string concatenation and ReDim Preserve work behind the scenes. I see ReDim Preserve Arr(1 To UBound(Arr) + 1) so often I did not study your code carefully enough. Once, MathieuGuindon had pointed out that each ReDim doubled the size of the array, not just added enough for the next append, the reason for your code being so fast was obvious. \$\endgroup\$ Commented Apr 22, 2019 at 21:52
  • 1
    \$\begingroup\$ @Mathieu: You are partially right; using the test above: The VBA String took: 68,107 seconds; The MatsMugStringbuilder took: 0,883 seconds; The BlackhawkStringbuilder took: 0,396 seconds; and both StringBuilder versions avoid Out of String Space... No big deal under 10K concatenations. \$\endgroup\$ Commented Sep 9, 2020 at 23:54

4 Answers 4

29
\$\begingroup\$

I like this, a lot. It's.. brilliant. Like .NET String objects, VBA strings are immutable, which means like in .NET, when "the quick brown fox" gets appended with "jumps over" and then "the lazy dog", it's 4 strings that have been generated, and thus the first one got copied 3 times; a VBA StringBuilder class is therefore definitely welcome to any VBA toolkit!

This is some serious code you've got here. Let's take a look.

So you've called the class clsStringBuilder. I know where you're coming from, but there's no real reason for this "cls" Hungarian prefix - I'd remove it, and call the class StringBuilder.

'******
'* v2 *
'******

Don't bother with that. I know version control is natively near-impossible with VBA, but there is no need to "version" code in comments nonetheless; do you actually maintain the version number? Why bother? Just remove it, it's useless clutter.

Private Const DEFAULT_CAPACITY As Long = &H10

Why not 16? Hexadecimal 10 is 16 right? I think it would be clearer to use a decimal notation. Actually that capacity is potentially confusing, especially given the hex notation. Is &H10 the number of bytes? Characters?

ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY * 2) - 1) 'Each unicode character is 2 bytes

Ah, characters then. How about calling the constant DEFAULT_CHARACTER_CAPACITY? Nah, too long.. and I personally don't like the YELLCASE, I'd just call it InitialCharacterCapacity, but I've seen other people use ALL CAPS for constants - as long as you're consistent, it works :)

BTW that's a good comment you have there, but I wouldn't bother specifying "unicode" characters; it gets confusing when the IDE itself only supports ANSI strings!

I don't like prefixes and abbreviated names, so m_currLen would become currentLength and m_stringBuffer would become stringBufffer, or just buffer.

Actually since currentLength is in bytes, I'd call it currentByteLength, so as to avoid question marks when it comes to this:

Public Property Get Length() As Long
 Length = m_currLen / 2
End Property

Public Function Append(strString As String) As clsStringBuilder

strString, really? I doesn't get any more Hungarian than that! Also you should know that parameters are passed ByRef by default - I'd change the signature to this:

Public Function Append(ByVal value As String) As StringBuilder

The error handling isn't optimal - if anything blows up, you're bringing up the IDE for the end user to scratch their head and debug your code! That's not production-ready:

derp:
 Stop
 Resume

"derp" means nothing to me - I like following a "template" like this:

Public Sub Foo()
 On Error GoTo CleanFail
 'implementation
CleanExit:
 Exit Sub
CleanFail:
 'handle error
 Resume CleanExit
End Sub

You might also want to make sure Expand doesn't actually shrink the buffer. I think. ;)

Lastly, I'm not sure I understand why toString isn't following the convention and named in PascalCase like every public method - ToString would be better-looking.

Good job!

answered Oct 22, 2014 at 19:49
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Thanks for the feedback - fixing the naming conventions now. For the member variables, I've often run into cases where I need to make them private and provide access through Get/Set/Let properties. In those cases, I use "m_" to keep VBA from yelling "Ambiguous name detected: membername". What is the best practice for those situations? \$\endgroup\$ Commented Oct 22, 2014 at 20:36
  • 6
    \$\begingroup\$ I like stuffing all my private fields into a private type - here would be Private Type TStringBuilder, with all the members. Then the class only has 1 field, which I call this, like, Private this As TStringBuilder - this eliminates the name clashes, and I like seeing this.MemberName = value in a Public Property Let MemberName(ByVal value As Whatever) block ;) \$\endgroup\$ Commented Oct 22, 2014 at 20:39
  • \$\begingroup\$ Interesting... I like this :D I'll have to think through the implications... \$\endgroup\$ Commented Oct 22, 2014 at 20:54
  • 10
    \$\begingroup\$ Maybe it's just me, but I like YELLCASE constants... \$\endgroup\$ Commented Oct 23, 2014 at 1:13
19
\$\begingroup\$

Your StringBuilder is pretty damn impressive :) ++ to Mat's suggestions except the YELLCASE which I am on RubberDuck's side ;)

I think I have identified a potential overflow of memory (out of memory). It's probably very unlikely to happen to anyone but hey... If you wrap your loop with another loop VBA runtime doesn't seem to catch up with counting and releasing the references...Your StringBuilder is too damn quick to VBA runtime ;)

Ex:

For j = 0 To 1000
 Dim csb As New clsStringBuilder
 StartTime = MicroTimer()
 For i = 0 To 100000
 csb.Append "Hello World;"
 Next
Next

That will stop in derp at some point and cause an out of memory... AFAIC, nothing you can really do... except don't allow people like me to test your code ;P jk!

A few other minor things from me though:

しろいしかく Select Case is faster than If-Else

しろいしかく Division is more expensive than addition & multiplication

しろいしかく Multiple computation to get the same number is a bit inefficient. If you need to get the value of Ubound(arr) 5 times within one if-else/select case consider storing this number in a variable.

しろいしかく Mid$() (in ToString()) should be a tiny bit faster than Mid()

しろいしかく Probably a safer option to use & instead of + for String concatenation. (your Main())

Overall speed seems just a tiny bit faster with my improvements - too subtle? ;)

1000 tests each

enter image description here

Ok I just changed the name to StringBuilder and here's what I've done to it:

Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" _
 (ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const DEFAULT_CAPACITY As Long = 16
Private m_currLen As Long
Private m_stringBuffer() As Byte
Private Sub Class_Initialize()
 ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY + DEFAULT_CAPACITY) - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(strString As String) As StringBuilder
On Error GoTo derp
 Dim uBuffer As Long
 uBuffer = UBound(m_stringBuffer)
 Dim lengthB As Long
 lengthB = LenB(strString)
 Dim sPtr As Long
 sPtr = StrPtr(strString)
 Dim currLen As Long
 currLen = m_currLen + lengthB
 Select Case currLen
 Case Is < uBuffer
 CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
 Case Is < (uBuffer + uBuffer)
 Expand
 CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
 Case Else
 Expand currLen
 CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
 End Select
 m_currLen = currLen
 Set Append = Me
 Exit Function
derp:
 Stop
 Resume
End Function
Public Property Get Length() As Long
 Length = m_currLen * 0.5
End Property
Public Property Get Capacity() As Long
 Capacity = UBound(m_stringBuffer)
End Property
Private Sub Expand(Optional newSize As Long = 0)
 Select Case newSize
 Case Is = 0
 ReDim Preserve m_stringBuffer(0 To (UBound(m_stringBuffer) + UBound(m_stringBuffer)) + 1)
 Case Else
 ReDim Preserve m_stringBuffer(0 To newSize - 1)
 End Select
End Sub
Public Function ToString() As String
 ToString = Mid$(m_stringBuffer, 1, m_currLen * 0.5)
End Function

You could possibly play a bit more with the Select Case but I've left it in a state that I am happy with...

m_stringBuffer(m_currLen) should be O(1) so no need to store in a variable IMO

answered Oct 23, 2014 at 12:43
\$\endgroup\$
5
  • \$\begingroup\$ Thanks! Working through this now... what did you use to generate the fancy chart? \$\endgroup\$ Commented Oct 23, 2014 at 17:43
  • 4
    \$\begingroup\$ you can't be serious with your question? :P Excel 2010... \$\endgroup\$ Commented Oct 23, 2014 at 17:59
  • 2
    \$\begingroup\$ Ah, of course - it was the colors that threw me, since they aren't the canned "Office pastels" \$\endgroup\$ Commented Oct 23, 2014 at 18:01
  • \$\begingroup\$ @Blackhawk no worries ;) I probably wouldn't guess myself hehe I bet we all got our favorite RGBs ;) \$\endgroup\$ Commented Oct 23, 2014 at 22:14
  • 6
    \$\begingroup\$ @user28366 You've fallen victim to VBA's As New syntax. Using Dim csb As New StringBuilder in a loop does NOT create a new instance on each iteration, so you're only ever working with the first instance and you're effectively adding "Hello World;" a mere 100m times. That's a String in excess of 2GB, at 2,402,424,024 bytes, or a Long overflow well before you run out of memory. The overflow problem isn't with the StringBuilder class, it's with your over-usage of it. The open-source VBE add-in Rubberduck VBA (to which I'm a contributor) would have spotted that oversight. \$\endgroup\$ Commented Mar 29, 2017 at 22:22
11
\$\begingroup\$

Using CopyMemory is actually unnecessary, you can achieve the same thing simply with arrays. The code is not only shorter but faster as well.

Dim MyBuffer() As String
Dim MyCurrentIndex As Long
Dim MyMaxIndex As Long
Private Sub Class_Initialize()
 MyCurrentIndex = 0
 MyMaxIndex = 16
 ReDim MyBuffer(1 To MyMaxIndex)
End Sub
'Appends the given Text to this StringBuilder
Public Sub Append(Text As String)
 MyCurrentIndex = MyCurrentIndex + 1
 If MyCurrentIndex > MyMaxIndex Then
 MyMaxIndex = 2 * MyMaxIndex
 ReDim Preserve MyBuffer(1 To MyMaxIndex)
 End If
 MyBuffer(MyCurrentIndex) = Text
End Sub
'Returns the text in this StringBuilder
'Optional Parameter: Separator (default vbNullString) used in joining components
Public Function ToString(Optional Separator As String = vbNullString) As String
 If MyCurrentIndex > 0 Then
 ReDim Preserve MyBuffer(1 To MyCurrentIndex)
 MyMaxIndex = MyCurrentIndex
 ToString = Join(MyBuffer, Separator)
 End If
End Function
answered Feb 8, 2017 at 10:21
\$\endgroup\$
7
  • \$\begingroup\$ You might want to add some explanation, how and why this is better than what OP posted. Also note that this question is already quite old (and here we don't care about canonical answers, but the actual review of the code), so the OP might have moved on. \$\endgroup\$ Commented Feb 8, 2017 at 10:32
  • \$\begingroup\$ @Martin.Roller Just looking at the code it would seem that MyCurrentIndex needs to be incremented. Is that so?? myCurrentIndex = myCurrentIndex + Len(Text) - 1 \$\endgroup\$ Commented May 2, 2017 at 4:26
  • \$\begingroup\$ @donPablo: Look again or try stepping through the code. The length of the Text is irrelevant. \$\endgroup\$ Commented May 15, 2017 at 10:23
  • \$\begingroup\$ @Blackhawk: Note that speed here only depends on the number of appends, not on the size of the appended text. \$\endgroup\$ Commented May 15, 2017 at 10:28
  • 2
    \$\begingroup\$ This is clearly the version to use, only one with no 32/64 bits portablity issues. \$\endgroup\$ Commented Nov 15, 2018 at 13:53
7
\$\begingroup\$

I know this is an old question but people finding it may find this answer of value. I've tested 4 versions of StringBuilder using Excel 2013 and the fastest version appears to be an optimized version of Blackhawk's code (The optimized code listed below). The list below shows one example of the time it took each version to do Blackhawk's test 1000 times (100,000,000 total appends). The new optimized version is labeled "New" and took a little longer than 34 seconds. Martin.Roller's array-based code is labeled "Ary", takes a little more than 36 seconds, and is nearly tied with Blackhawk's code. I've ran the test several times and Blackhawk's code does come in ahead of Martin.Roller's code some of the time. Blackhawk's code is labeled "Old" and takes a little more than 36 seconds. The "Mid" version came in last and is a version I created using VBA's MID statement to replace the content of a string. Removing the "On Error" in the code below should improve the speed even more, but make sure the calling code never builds a string longer than VBA can handle.

  1. The New StringBuilder took: 34.396 seconds
  2. The Ary StringBuilder took: 36.467 seconds
  3. The Old StringBuilder took: 36.605 seconds
  4. The Mid StringBuilder took: 40.141 seconds

New optimized version of Blackhawk's StringBuilder:

Option Compare Text
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const InitialCharCount As Long = 16
' Define members
Private mUBound As Long
Private mString() As Byte
Private Sub Class_Initialize()
 Clear
End Sub
Public Sub Clear()
 mUBound = -1
 ReDim mString(0 To InitialCharCount * 2 - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(value As String) As StringBuilder
Dim NewUBound As Long
Dim CapacityUBound As Long
On Error GoTo Failed
 NewUBound = mUBound + LenB(value)
 If NewUBound > UBound(mString) Then
 CapacityUBound = UBound(mString) * 2 + 1
 If NewUBound > CapacityUBound Then CapacityUBound = NewUBound * 2 + 1
 ReDim Preserve mString(0 To CapacityUBound)
 End If
 CopyMemory VarPtr(mString(mUBound + 1)), StrPtr(value), LenB(value)
 mUBound = NewUBound
 Set Append = Me
 Exit Function
Failed:
 Stop
 Resume
End Function
Public Property Get Length() As Long
 Length = (mUBound + 1) / 2
End Property
Public Function ToString() As String
 ToString = Mid(mString, 1, Length)
End Function
answered Aug 20, 2018 at 8:58
\$\endgroup\$
1
  • 2
    \$\begingroup\$ Thanks for doing this! In the spirit of the question, can you add a description of the optimizations you made? That will help anyone looking at this answer to understand what changes cause the speedup. I have to say, one of the things that I love about VBA is the lack of a standard library - you get to learn how to roll your own everything :) \$\endgroup\$ Commented Aug 20, 2018 at 14:41

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.