In the process of trying to build a serializable data structure, I found myself building large strings, which gets very slow because VBA copies a string every time concatenation is performed.
To alleviate this, referring to Dynamic Array and Java's StringBuilder interface, I cobbled together a unicode clsStringBuilder
class.
This isn't a very big chunk of code, but I'd be interested in any advice about edge cases that I maybe haven't considered, unexpected copying behavior that VBA might be performing "behind my back" which I could avoid, or corrections to coding style (or lack thereof).
Option Compare Database
Option Explicit
'******
'* v2 *
'******
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const DEFAULT_CAPACITY As Long = &H10
Private m_currLen As Long
Private m_stringBuffer() As Byte
Private Sub Class_Initialize()
ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY * 2) - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(strString As String) As clsStringBuilder
On Error GoTo derp
If m_currLen + LenB(strString) < UBound(m_stringBuffer) Then
CopyMemory VarPtr(m_stringBuffer(m_currLen)), StrPtr(strString), LenB(strString)
Else
If m_currLen + LenB(strString) < UBound(m_stringBuffer) * 2 Then
Expand
Else
Expand m_currLen + LenB(strString)
End If
CopyMemory VarPtr(m_stringBuffer(m_currLen)), StrPtr(strString), LenB(strString)
End If
m_currLen = m_currLen + LenB(strString)
Set Append = Me
Exit Function
derp:
Stop
Resume
End Function
Public Property Get Length() As Long
Length = m_currLen / 2
End Property
Public Property Get Capacity() As Long
Capacity = UBound(m_stringBuffer)
End Property
Private Sub Expand(Optional newSize As Long = 0)
If newSize <> 0 Then
ReDim Preserve m_stringBuffer(0 To newSize - 1)
Else
ReDim Preserve m_stringBuffer(0 To (UBound(m_stringBuffer) * 2) + 1)
End If
End Sub
Public Function toString() As String
toString = Mid(m_stringBuffer, 1, m_currLen / 2)
End Function
Here is a test:
Public Sub Main()
Dim sb As clsStringBuilder
Set sb = New clsStringBuilder
Dim strString As String
Dim i As Long
Dim StartTime As Double
'VBA String
StartTime = MicroTimer()
For i = 0 To 100000
strString = strString + "Hello World;"
Next
Debug.Print "The VBA String took: " & Round(MicroTimer - StartTime, 3) & " seconds"
'StringBuilder
StartTime = MicroTimer()
For i = 0 To 100000
sb.Append "Hello World;"
Next
Debug.Print "The Stringbuilder took: " & Round(MicroTimer - StartTime, 3) & " seconds"
'Are the strings the same?
Debug.Print StrComp(strString, sb.toString, vbBinaryCompare)
End Sub
Here is Microsoft's MicroTimer
function, which can be found here:
Private Declare Function getFrequency Lib "kernel32" _
Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
Private Declare Function getTickCount Lib "kernel32" _
Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
Function MicroTimer() As Double
'
' Returns seconds.
Dim cyTicks1 As Currency
Static cyFrequency As Currency
'
MicroTimer = 0
' Get frequency.
If cyFrequency = 0 Then getFrequency cyFrequency
' Get ticks.
getTickCount cyTicks1
' Seconds
If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function
4 Answers 4
I like this, a lot. It's.. brilliant. Like .NET String
objects, VBA strings are immutable, which means like in .NET, when "the quick brown fox"
gets appended with "jumps over"
and then "the lazy dog"
, it's 4 strings that have been generated, and thus the first one got copied 3 times; a VBA StringBuilder
class is therefore definitely welcome to any VBA toolkit!
This is some serious code you've got here. Let's take a look.
So you've called the class clsStringBuilder
. I know where you're coming from, but there's no real reason for this "cls" Hungarian prefix - I'd remove it, and call the class StringBuilder
.
'****** '* v2 * '******
Don't bother with that. I know version control is natively near-impossible with VBA, but there is no need to "version" code in comments nonetheless; do you actually maintain the version number? Why bother? Just remove it, it's useless clutter.
Private Const DEFAULT_CAPACITY As Long = &H10
Why not 16
? Hexadecimal 10
is 16
right? I think it would be clearer to use a decimal notation. Actually that capacity is potentially confusing, especially given the hex notation. Is &H10
the number of bytes? Characters?
ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY * 2) - 1) 'Each unicode character is 2 bytes
Ah, characters then. How about calling the constant DEFAULT_CHARACTER_CAPACITY
? Nah, too long.. and I personally don't like the YELLCASE, I'd just call it InitialCharacterCapacity
, but I've seen other people use ALL CAPS for constants - as long as you're consistent, it works :)
BTW that's a good comment you have there, but I wouldn't bother specifying "unicode" characters; it gets confusing when the IDE itself only supports ANSI strings!
I don't like prefixes and abbreviated names, so m_currLen
would become currentLength
and m_stringBuffer
would become stringBufffer
, or just buffer
.
Actually since currentLength
is in bytes, I'd call it currentByteLength
, so as to avoid question marks when it comes to this:
Public Property Get Length() As Long Length = m_currLen / 2 End Property
Public Function Append(strString As String) As clsStringBuilder
strString
, really? I doesn't get any more Hungarian than that! Also you should know that parameters are passed ByRef
by default - I'd change the signature to this:
Public Function Append(ByVal value As String) As StringBuilder
The error handling isn't optimal - if anything blows up, you're bringing up the IDE for the end user to scratch their head and debug your code! That's not production-ready:
derp: Stop Resume
"derp" means nothing to me - I like following a "template" like this:
Public Sub Foo()
On Error GoTo CleanFail
'implementation
CleanExit:
Exit Sub
CleanFail:
'handle error
Resume CleanExit
End Sub
You might also want to make sure Expand
doesn't actually shrink the buffer. I think. ;)
Lastly, I'm not sure I understand why toString
isn't following the convention and named in PascalCase
like every public method - ToString
would be better-looking.
Good job!
-
2\$\begingroup\$ Thanks for the feedback - fixing the naming conventions now. For the member variables, I've often run into cases where I need to make them private and provide access through Get/Set/Let properties. In those cases, I use "m_" to keep VBA from yelling "Ambiguous name detected: membername". What is the best practice for those situations? \$\endgroup\$Blackhawk– Blackhawk2014年10月22日 20:36:53 +00:00Commented Oct 22, 2014 at 20:36
-
6\$\begingroup\$ I like stuffing all my private fields into a private type - here would be
Private Type TStringBuilder
, with all the members. Then the class only has 1 field, which I callthis
, like,Private this As TStringBuilder
- this eliminates the name clashes, and I like seeingthis.MemberName = value
in aPublic Property Let MemberName(ByVal value As Whatever)
block ;) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2014年10月22日 20:39:26 +00:00Commented Oct 22, 2014 at 20:39 -
\$\begingroup\$ Interesting... I like this :D I'll have to think through the implications... \$\endgroup\$Blackhawk– Blackhawk2014年10月22日 20:54:47 +00:00Commented Oct 22, 2014 at 20:54
-
10\$\begingroup\$ Maybe it's just me, but I like YELLCASE constants... \$\endgroup\$RubberDuck– RubberDuck2014年10月23日 01:13:28 +00:00Commented Oct 23, 2014 at 1:13
Your StringBuilder
is pretty damn impressive :) ++ to Mat's suggestions except the YELLCASE which I am on RubberDuck's side ;)
I think I have identified a potential overflow of memory (out of memory
). It's probably very unlikely to happen to anyone but hey... If you wrap your loop with another loop VBA runtime doesn't seem to catch up with counting and releasing the references...Your StringBuilder
is too damn quick to VBA runtime ;)
Ex:
For j = 0 To 1000
Dim csb As New clsStringBuilder
StartTime = MicroTimer()
For i = 0 To 100000
csb.Append "Hello World;"
Next
Next
That will stop in derp
at some point and cause an out of memory
... AFAIC, nothing you can really do... except don't allow people like me to test your code ;P jk!
A few other minor things from me though:
□しろいしかく Select Case
is faster than If-Else
□しろいしかく Division is more expensive than addition & multiplication
□しろいしかく Multiple computation to get the same number is a bit inefficient. If you need to get the value of Ubound(arr)
5 times within one if-else/select case consider storing this number in a variable.
□しろいしかく Mid$()
(in ToString()
) should be a tiny bit faster than Mid()
□しろいしかく Probably a safer option to use &
instead of +
for String concatenation. (your Main()
)
Overall speed seems just a tiny bit faster with my improvements - too subtle? ;)
1000 tests each
enter image description here
Ok I just changed the name to StringBuilder
and here's what I've done to it:
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" _
(ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const DEFAULT_CAPACITY As Long = 16
Private m_currLen As Long
Private m_stringBuffer() As Byte
Private Sub Class_Initialize()
ReDim m_stringBuffer(0 To (DEFAULT_CAPACITY + DEFAULT_CAPACITY) - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(strString As String) As StringBuilder
On Error GoTo derp
Dim uBuffer As Long
uBuffer = UBound(m_stringBuffer)
Dim lengthB As Long
lengthB = LenB(strString)
Dim sPtr As Long
sPtr = StrPtr(strString)
Dim currLen As Long
currLen = m_currLen + lengthB
Select Case currLen
Case Is < uBuffer
CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
Case Is < (uBuffer + uBuffer)
Expand
CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
Case Else
Expand currLen
CopyMemory VarPtr(m_stringBuffer(m_currLen)), sPtr, lengthB
End Select
m_currLen = currLen
Set Append = Me
Exit Function
derp:
Stop
Resume
End Function
Public Property Get Length() As Long
Length = m_currLen * 0.5
End Property
Public Property Get Capacity() As Long
Capacity = UBound(m_stringBuffer)
End Property
Private Sub Expand(Optional newSize As Long = 0)
Select Case newSize
Case Is = 0
ReDim Preserve m_stringBuffer(0 To (UBound(m_stringBuffer) + UBound(m_stringBuffer)) + 1)
Case Else
ReDim Preserve m_stringBuffer(0 To newSize - 1)
End Select
End Sub
Public Function ToString() As String
ToString = Mid$(m_stringBuffer, 1, m_currLen * 0.5)
End Function
You could possibly play a bit more with the Select Case
but I've left it in a state that I am happy with...
m_stringBuffer(m_currLen)
should be O(1)
so no need to store in a variable IMO
-
\$\begingroup\$ Thanks! Working through this now... what did you use to generate the fancy chart? \$\endgroup\$Blackhawk– Blackhawk2014年10月23日 17:43:23 +00:00Commented Oct 23, 2014 at 17:43
-
4\$\begingroup\$ you can't be serious with your question? :P Excel 2010... \$\endgroup\$user28366– user283662014年10月23日 17:59:39 +00:00Commented Oct 23, 2014 at 17:59
-
2\$\begingroup\$ Ah, of course - it was the colors that threw me, since they aren't the canned "Office pastels" \$\endgroup\$Blackhawk– Blackhawk2014年10月23日 18:01:46 +00:00Commented Oct 23, 2014 at 18:01
-
\$\begingroup\$ @Blackhawk no worries ;) I probably wouldn't guess myself hehe I bet we all got our favorite RGBs ;) \$\endgroup\$user28366– user283662014年10月23日 22:14:10 +00:00Commented Oct 23, 2014 at 22:14
-
6\$\begingroup\$ @user28366 You've fallen victim to VBA's
As New
syntax. UsingDim csb As New StringBuilder
in a loop does NOT create a new instance on each iteration, so you're only ever working with the first instance and you're effectively adding"Hello World;"
a mere 100m times. That's aString
in excess of 2GB, at 2,402,424,024 bytes, or aLong
overflow well before you run out of memory. The overflow problem isn't with theStringBuilder
class, it's with your over-usage of it. The open-source VBE add-in Rubberduck VBA (to which I'm a contributor) would have spotted that oversight. \$\endgroup\$ThunderFrame– ThunderFrame2017年03月29日 22:22:24 +00:00Commented Mar 29, 2017 at 22:22
Using CopyMemory
is actually unnecessary, you can achieve the same thing simply with arrays. The code is not only shorter but faster as well.
Dim MyBuffer() As String
Dim MyCurrentIndex As Long
Dim MyMaxIndex As Long
Private Sub Class_Initialize()
MyCurrentIndex = 0
MyMaxIndex = 16
ReDim MyBuffer(1 To MyMaxIndex)
End Sub
'Appends the given Text to this StringBuilder
Public Sub Append(Text As String)
MyCurrentIndex = MyCurrentIndex + 1
If MyCurrentIndex > MyMaxIndex Then
MyMaxIndex = 2 * MyMaxIndex
ReDim Preserve MyBuffer(1 To MyMaxIndex)
End If
MyBuffer(MyCurrentIndex) = Text
End Sub
'Returns the text in this StringBuilder
'Optional Parameter: Separator (default vbNullString) used in joining components
Public Function ToString(Optional Separator As String = vbNullString) As String
If MyCurrentIndex > 0 Then
ReDim Preserve MyBuffer(1 To MyCurrentIndex)
MyMaxIndex = MyCurrentIndex
ToString = Join(MyBuffer, Separator)
End If
End Function
-
\$\begingroup\$ You might want to add some explanation, how and why this is better than what OP posted. Also note that this question is already quite old (and here we don't care about canonical answers, but the actual review of the code), so the OP might have moved on. \$\endgroup\$Graipher– Graipher2017年02月08日 10:32:16 +00:00Commented Feb 8, 2017 at 10:32
-
\$\begingroup\$ @Martin.Roller Just looking at the code it would seem that MyCurrentIndex needs to be incremented. Is that so?? myCurrentIndex = myCurrentIndex + Len(Text) - 1 \$\endgroup\$donPablo– donPablo2017年05月02日 04:26:11 +00:00Commented May 2, 2017 at 4:26
-
\$\begingroup\$ @donPablo: Look again or try stepping through the code. The length of the Text is irrelevant. \$\endgroup\$Martin.Roller– Martin.Roller2017年05月15日 10:23:05 +00:00Commented May 15, 2017 at 10:23
-
\$\begingroup\$ @Blackhawk: Note that speed here only depends on the number of appends, not on the size of the appended text. \$\endgroup\$Martin.Roller– Martin.Roller2017年05月15日 10:28:47 +00:00Commented May 15, 2017 at 10:28
-
2\$\begingroup\$ This is clearly the version to use, only one with no 32/64 bits portablity issues. \$\endgroup\$Patrick Honorez– Patrick Honorez2018年11月15日 13:53:26 +00:00Commented Nov 15, 2018 at 13:53
I know this is an old question but people finding it may find this answer of value. I've tested 4 versions of StringBuilder using Excel 2013 and the fastest version appears to be an optimized version of Blackhawk's code (The optimized code listed below). The list below shows one example of the time it took each version to do Blackhawk's test 1000 times (100,000,000 total appends). The new optimized version is labeled "New" and took a little longer than 34 seconds. Martin.Roller's array-based code is labeled "Ary", takes a little more than 36 seconds, and is nearly tied with Blackhawk's code. I've ran the test several times and Blackhawk's code does come in ahead of Martin.Roller's code some of the time. Blackhawk's code is labeled "Old" and takes a little more than 36 seconds. The "Mid" version came in last and is a version I created using VBA's MID statement to replace the content of a string. Removing the "On Error" in the code below should improve the speed even more, but make sure the calling code never builds a string longer than VBA can handle.
- The New StringBuilder took: 34.396 seconds
- The Ary StringBuilder took: 36.467 seconds
- The Old StringBuilder took: 36.605 seconds
- The Mid StringBuilder took: 40.141 seconds
New optimized version of Blackhawk's StringBuilder:
Option Compare Text
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32.dll" Alias "RtlMoveMemory" (ByVal dst As Long, ByVal src As Long, ByVal Length As Long)
Private Const InitialCharCount As Long = 16
' Define members
Private mUBound As Long
Private mString() As Byte
Private Sub Class_Initialize()
Clear
End Sub
Public Sub Clear()
mUBound = -1
ReDim mString(0 To InitialCharCount * 2 - 1) 'Each unicode character is 2 bytes
End Sub
Public Function Append(value As String) As StringBuilder
Dim NewUBound As Long
Dim CapacityUBound As Long
On Error GoTo Failed
NewUBound = mUBound + LenB(value)
If NewUBound > UBound(mString) Then
CapacityUBound = UBound(mString) * 2 + 1
If NewUBound > CapacityUBound Then CapacityUBound = NewUBound * 2 + 1
ReDim Preserve mString(0 To CapacityUBound)
End If
CopyMemory VarPtr(mString(mUBound + 1)), StrPtr(value), LenB(value)
mUBound = NewUBound
Set Append = Me
Exit Function
Failed:
Stop
Resume
End Function
Public Property Get Length() As Long
Length = (mUBound + 1) / 2
End Property
Public Function ToString() As String
ToString = Mid(mString, 1, Length)
End Function
-
2\$\begingroup\$ Thanks for doing this! In the spirit of the question, can you add a description of the optimizations you made? That will help anyone looking at this answer to understand what changes cause the speedup. I have to say, one of the things that I love about VBA is the lack of a standard library - you get to learn how to roll your own everything :) \$\endgroup\$Blackhawk– Blackhawk2018年08月20日 14:41:02 +00:00Commented Aug 20, 2018 at 14:41
StringBuilder
class available through a nuget-like package manager for VBA (and Rubberduck might eventually craft a UI for it, so packages for a project can be add/updated/maintained, all in the VBE). Would be awesome if you could contact Tim to agree with what I've done with your code (or I can take down that repository if you prefer) - LMK! \$\endgroup\$ReDim Preserve
is the copy. When you have a chunk of memory and you want to "expand" it by just one character, what actually happens is the OS has to find a completely new chunk of memory that is the new requested size, and then it has to copy everything from the old chunk into the new chunk. This takes longer and longer as your data grows. Imagine building a 2GB array expanding 1 byte at a time - you'd end up making ~ 2 billion copies totaling ~ 2 billion GB! \$\endgroup\$ReDim Preserve
work behind the scenes. I seeReDim Preserve Arr(1 To UBound(Arr) + 1)
so often I did not study your code carefully enough. Once, MathieuGuindon had pointed out that each ReDim doubled the size of the array, not just added enough for the next append, the reason for your code being so fast was obvious. \$\endgroup\$