I have a simple code that loops through a text file to fix a few things in it and I use mostly dictionaries to find and handle the adjustment. However, with larger files (about 30K lines) it seems to get exponentially slower in a specific line of a loop. This line shouldn't be so slow because it just has to access an element specified and alter value. Anyone knows why this is happening ?
I can't figure out why that is happening and how to go around without a huge performance loss.
For i = 0 To tags.Count - 1
If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
linhas(CStr(i)) = 1
End If
Next i
This is the full code simplified:
Sub ajusta_ofx_teste() 'macro para converter arquivos ofx em xml
Dim my_file As Integer
Dim text_line As String
Dim file_name As String
Dim i As Long
Dim linhas As New Scripting.Dictionary
Dim tags As New Scripting.Dictionary
Dim fso As New Scripting.FileSystemObject
Dim nfso As New Scripting.FileSystemObject
Dim dados_tag(4) As Variant
strofx = Application.GetOpenFilename(MultiSelect:=False)
'Open file_name For Input As #my_file
Set arquivo = fso.OpenTextFile(strofx, ForReading)
L = 10000
f = 0
i = 0 'Define o número de linhas com Tag
loops = 0 'Define a linha do arquivo sendo lida
ntags = 0
achou = 0
While Not arquivo.AtEndOfStream
'Line Input #my_file, text_line
text_line = arquivo.ReadLine
If i > 100000 Then Stop
x = InStr(1, text_line, "<") 'Marca o início do Tag
'Save only lines with Tag in "tag" dictionary.
If x > 0 Then
linhas.Add CStr(i), text_line
If Mid(text_line, x + 1, 1) <> "/" Then
ntags = ntags + 1
fim_tag_i = InStr(1, text_line, ">")
dados_tag(0) = i 'Tag line number
dados_tag(1) = Mid(text_line, x, fim_tag_i - x + 1) 'Tag name
dados_tag(2) = 0 'Registry used after - I removed the part of the code that fills it depending on the tag.
dados_tag(3) = 0
chave = Format(ntags, "000000") & dados_tag(1) 'chave = "Key"
tags.Add chave, dados_tag
End If
i = i + 1
End If
loops = loops + 1
If loops > 100000 Then Stop 'saida de emergencia do loop
Wend
nlinhas = i
arquivo.Close
For i = 0 To tags.Count - 1
If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
linhas(CStr(i)) = 1
End If
Next i
Set fso = Nothing
Set nfso = Nothing
End Sub
The text file can be downloaded here: test.ofx
3 Answers 3
Speed issue
tags.Items
does a copy of the array of items. So, you are copying that array 32821 times which of course is slow.
Replace this:
For i = 0 To tags.Count - 1
If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
linhas(CStr(i)) = 1
End If
Next i
with this:
Dim arrItems() As Variant: arrItems = tags.Items
For i = 0 To tags.Count - 1
If arrItems(i)(2) = 0 Then
linhas(CStr(i)) = 1
End If
Next i
Other issues
Best practice is to have Option Explicit
at the top of each code module. This forces you to declare all variables. In your code there are 11 undeclared variables which are all defaulted to Variant
:
strofx, arquivo, L, f, loops, ntags, achou, x, fim_tag_i, chave, nlinhas
Imagine having Dim cake as Long
in your code and then somewhere later you can do a typing mistake and use cakes
instead of cake
. Without Option Explicit
your code will compile and you won't even know why it's not working correctly.
To make this into a habit, turn on 'Require Variable Declaration' under Tools/Options: option
Use Stop
only for debugging. Never for production code.
-
\$\begingroup\$ Hello, Thank you for the great for the great answer! I always thought that ".items" was a property and not a whole copy being made. It seems i have lots of codes to improve haha. Also thanks for the tip regarding declarations. I'm always lazy to declare everything but you are correct. Now the code takes only 15 seconds to run in comparison to more than 15 minutes before (i never waited until the end so a don't know the exact time it would take.). \$\endgroup\$KE_1234– KE_12342023年01月16日 11:39:52 +00:00Commented Jan 16, 2023 at 11:39
-
1\$\begingroup\$ @KE_1234 You are correct when you say that
.Items
is a property. However, this particular property returns an array. In VBA, when you assign an array, a copy is made by default. For example:arr2 = arr1
makes a copy of thearr1
array. \$\endgroup\$Cristian Buse– Cristian Buse2023年01月16日 12:47:55 +00:00Commented Jan 16, 2023 at 12:47
I believe your question regarding speed is answered by @CristianBuse. In addition to the excellent recommendations that came along with his answer, I wanted to suggest one more improvement un-related to performance...
Your code opens a Scripting.FileSystemObject
and TextStream
object and does not guarantee the destruction of both if an error should occur. Even though the provided code is "the full code simplified", I'm suspecting the full code has the same issue. So, rather than:
Sub ajusta_ofx_teste()
Dim fso As New Scripting.FileSystemObject
Set arquivo = fso.OpenTextFile(strofx, ForReading)
'... preparing for the While-Wend loop
While Not arquivo.AtEndOfStream
'... loop code
Wend
nlinhas = i
arquivo.Close
'... the 'slow part'
Set fso = Nothing
End Sub
It would be better to guarantee that arquivo.Close
and Set fso = Nothing
are both executed in the event that an error is encountered. To do so, the code could be re-organized as follows:
Option Explicit 'The importance of ALWAYS declaring this Option cannot be overstated
Sub ajusta_ofx_teste() 'macro para converter arquivos ofx em xml
Dim strofx As Variant
strofx = Application.GetOpenFilename(MultiSelect:=False)
If strofx = False Then 'User does not select a file
Exit Sub
End If
Dim fso As New Scripting.FileSystemObject
Dim arquivo As TextStream
On Error GoTo ErrorExit
Set arquivo = fso.OpenTextFile(strofx, ForReading)
ProcessTextStream arquivo
ErrorExit:
If Not arquivo Is Nothing Then
arquivo.Close
End If
Set fso = Nothing
End Sub
Private Sub ProcessTextStream (ByVal arquivo As TextStream)
'... Declare necessary variables, e.g., linhas, i, tags, ...
While Not arquivo.AtEndOfStream
'... loop code
Wend
nlinhas = i
'The recommended code for the code formerly known as 'the slow part'
Dim arrItems() As Variant: arrItems = tags.Items
For i = 0 To tags.Count - 1
If arrItems(i)(2) = 0 Then
linhas(CStr(i)) = 1
End If
Next i
End Sub
Now, if an error occurs during ProcessStream
, the Scripting.FileSystemObject
, and the TextStream
are properly cleaned up.
-
\$\begingroup\$ Thank you for the tip. What happens if i don't properly close the txt stream? I thought vba would close it automatically. Will it just remain as trash in the Memory until the next restart ? \$\endgroup\$KE_1234– KE_12342023年01月16日 11:51:39 +00:00Commented Jan 16, 2023 at 11:51
-
1\$\begingroup\$ The suggestion falls under the category of a best/better practice. It matters more if you opened a file for writing - in a scenario where an error occurs while writing to a file (resulting in skipping the call to
Close
). If the code later tries to open the same file for writing using a differentTextStream
object - a 'Permission denied' error is generated. I tested Reading a file using the same scenario. It executes successfully whetherClose
is called or not. \$\endgroup\$BZngr– BZngr2023年01月16日 17:54:49 +00:00Commented Jan 16, 2023 at 17:54
First of all, thank you to @Cristian Buse for the clarification and answer.
The code supplied works perfectly but i went with a slight different approach. I tested both and this last one is a bit faster.
For Each strkey In tags.Keys()
If tags(strkey)(2) = 0 Then
linhas(tags(strkey)(0)) = linhas(CStr(tags(strkey)(0))) & Replace(tags(strkey)(1), "<", "</")
End If
i = i + 1
Next strkey
The total run time was 16 seconds with Cristian's method and 14 seconds with this method.
-
\$\begingroup\$ Welcome to Code Review! Thanks for posting this code. It's a good idea to summarise which changes you made, and why - a self-answer ought to review the code, just like any other answer. \$\endgroup\$Toby Speight– Toby Speight2023年01月16日 11:51:07 +00:00Commented Jan 16, 2023 at 11:51