3
\$\begingroup\$

I have a simple code that loops through a text file to fix a few things in it and I use mostly dictionaries to find and handle the adjustment. However, with larger files (about 30K lines) it seems to get exponentially slower in a specific line of a loop. This line shouldn't be so slow because it just has to access an element specified and alter value. Anyone knows why this is happening ?

I can't figure out why that is happening and how to go around without a huge performance loss.

For i = 0 To tags.Count - 1
 If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
 linhas(CStr(i)) = 1
 End If
Next i

This is the full code simplified:

Sub ajusta_ofx_teste() 'macro para converter arquivos ofx em xml
 
 Dim my_file As Integer
 Dim text_line As String
 Dim file_name As String
 Dim i As Long
 Dim linhas As New Scripting.Dictionary
 Dim tags As New Scripting.Dictionary
 Dim fso As New Scripting.FileSystemObject
 Dim nfso As New Scripting.FileSystemObject
 Dim dados_tag(4) As Variant
 strofx = Application.GetOpenFilename(MultiSelect:=False)
 
 'Open file_name For Input As #my_file
 Set arquivo = fso.OpenTextFile(strofx, ForReading)
 
 L = 10000
 f = 0
 i = 0 'Define o número de linhas com Tag
 loops = 0 'Define a linha do arquivo sendo lida
 ntags = 0
 achou = 0
 
 While Not arquivo.AtEndOfStream
 'Line Input #my_file, text_line
 text_line = arquivo.ReadLine
 
 If i > 100000 Then Stop
 x = InStr(1, text_line, "<") 'Marca o início do Tag
 
 'Save only lines with Tag in "tag" dictionary.
 If x > 0 Then
 linhas.Add CStr(i), text_line
 
 If Mid(text_line, x + 1, 1) <> "/" Then
 ntags = ntags + 1
 fim_tag_i = InStr(1, text_line, ">")
 dados_tag(0) = i 'Tag line number
 dados_tag(1) = Mid(text_line, x, fim_tag_i - x + 1) 'Tag name
 dados_tag(2) = 0 'Registry used after - I removed the part of the code that fills it depending on the tag.
 dados_tag(3) = 0
 
 chave = Format(ntags, "000000") & dados_tag(1) 'chave = "Key"
 tags.Add chave, dados_tag
 End If
 i = i + 1
 End If
 loops = loops + 1
 If loops > 100000 Then Stop 'saida de emergencia do loop
 Wend
 nlinhas = i
 arquivo.Close
 For i = 0 To tags.Count - 1
 If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
 linhas(CStr(i)) = 1
 End If
 Next i
 Set fso = Nothing
 Set nfso = Nothing
End Sub

The text file can be downloaded here: test.ofx

Sᴀᴍ Onᴇᴌᴀ
29.5k16 gold badges45 silver badges201 bronze badges
asked Jan 13, 2023 at 21:24
\$\endgroup\$

3 Answers 3

2
\$\begingroup\$

Speed issue

tags.Items does a copy of the array of items. So, you are copying that array 32821 times which of course is slow.

Replace this:

For i = 0 To tags.Count - 1
 If tags.Items(i)(2) = 0 Then 'This is the slow part. I need the validation for my case usage.
 linhas(CStr(i)) = 1
 End If
Next i

with this:

Dim arrItems() As Variant: arrItems = tags.Items
For i = 0 To tags.Count - 1
 If arrItems(i)(2) = 0 Then
 linhas(CStr(i)) = 1
 End If
Next i

Other issues

Best practice is to have Option Explicit at the top of each code module. This forces you to declare all variables. In your code there are 11 undeclared variables which are all defaulted to Variant: strofx, arquivo, L, f, loops, ntags, achou, x, fim_tag_i, chave, nlinhas

Imagine having Dim cake as Long in your code and then somewhere later you can do a typing mistake and use cakes instead of cake. Without Option Explicit your code will compile and you won't even know why it's not working correctly.

To make this into a habit, turn on 'Require Variable Declaration' under Tools/Options: option

Use Stop only for debugging. Never for production code.

answered Jan 14, 2023 at 21:50
\$\endgroup\$
2
  • \$\begingroup\$ Hello, Thank you for the great for the great answer! I always thought that ".items" was a property and not a whole copy being made. It seems i have lots of codes to improve haha. Also thanks for the tip regarding declarations. I'm always lazy to declare everything but you are correct. Now the code takes only 15 seconds to run in comparison to more than 15 minutes before (i never waited until the end so a don't know the exact time it would take.). \$\endgroup\$ Commented Jan 16, 2023 at 11:39
  • 1
    \$\begingroup\$ @KE_1234 You are correct when you say that .Items is a property. However, this particular property returns an array. In VBA, when you assign an array, a copy is made by default. For example: arr2 = arr1 makes a copy of the arr1 array. \$\endgroup\$ Commented Jan 16, 2023 at 12:47
1
\$\begingroup\$

I believe your question regarding speed is answered by @CristianBuse. In addition to the excellent recommendations that came along with his answer, I wanted to suggest one more improvement un-related to performance...

Your code opens a Scripting.FileSystemObject and TextStream object and does not guarantee the destruction of both if an error should occur. Even though the provided code is "the full code simplified", I'm suspecting the full code has the same issue. So, rather than:

Sub ajusta_ofx_teste()
 Dim fso As New Scripting.FileSystemObject
 Set arquivo = fso.OpenTextFile(strofx, ForReading) 
 
 '... preparing for the While-Wend loop
 While Not arquivo.AtEndOfStream
 '... loop code
 Wend
 nlinhas = i
 arquivo.Close
 
 '... the 'slow part'
 
 Set fso = Nothing
End Sub

It would be better to guarantee that arquivo.Close and Set fso = Nothing are both executed in the event that an error is encountered. To do so, the code could be re-organized as follows:

Option Explicit 'The importance of ALWAYS declaring this Option cannot be overstated
Sub ajusta_ofx_teste() 'macro para converter arquivos ofx em xml
 Dim strofx As Variant
 strofx = Application.GetOpenFilename(MultiSelect:=False)
 
 If strofx = False Then 'User does not select a file
 Exit Sub
 End If
 
 Dim fso As New Scripting.FileSystemObject
 
 Dim arquivo As TextStream
On Error GoTo ErrorExit
 Set arquivo = fso.OpenTextFile(strofx, ForReading)
 
 ProcessTextStream arquivo
 
ErrorExit:
 If Not arquivo Is Nothing Then
 arquivo.Close
 End If
 Set fso = Nothing
End Sub
Private Sub ProcessTextStream (ByVal arquivo As TextStream)
 '... Declare necessary variables, e.g., linhas, i, tags, ...
 
 While Not arquivo.AtEndOfStream
 '... loop code
 Wend
 nlinhas = i
 'The recommended code for the code formerly known as 'the slow part'
 Dim arrItems() As Variant: arrItems = tags.Items
 For i = 0 To tags.Count - 1
 If arrItems(i)(2) = 0 Then
 linhas(CStr(i)) = 1
 End If
 Next i
 
End Sub

Now, if an error occurs during ProcessStream, the Scripting.FileSystemObject, and the TextStream are properly cleaned up.

answered Jan 16, 2023 at 5:57
\$\endgroup\$
2
  • \$\begingroup\$ Thank you for the tip. What happens if i don't properly close the txt stream? I thought vba would close it automatically. Will it just remain as trash in the Memory until the next restart ? \$\endgroup\$ Commented Jan 16, 2023 at 11:51
  • 1
    \$\begingroup\$ The suggestion falls under the category of a best/better practice. It matters more if you opened a file for writing - in a scenario where an error occurs while writing to a file (resulting in skipping the call to Close). If the code later tries to open the same file for writing using a different TextStream object - a 'Permission denied' error is generated. I tested Reading a file using the same scenario. It executes successfully whether Close is called or not. \$\endgroup\$ Commented Jan 16, 2023 at 17:54
0
\$\begingroup\$

First of all, thank you to @Cristian Buse for the clarification and answer.

The code supplied works perfectly but i went with a slight different approach. I tested both and this last one is a bit faster.

For Each strkey In tags.Keys()
 If tags(strkey)(2) = 0 Then
 linhas(tags(strkey)(0)) = linhas(CStr(tags(strkey)(0))) & Replace(tags(strkey)(1), "<", "</")
 End If
 i = i + 1
Next strkey

The total run time was 16 seconds with Cristian's method and 14 seconds with this method.

answered Jan 16, 2023 at 11:49
\$\endgroup\$
1
  • \$\begingroup\$ Welcome to Code Review! Thanks for posting this code. It's a good idea to summarise which changes you made, and why - a self-answer ought to review the code, just like any other answer. \$\endgroup\$ Commented Jan 16, 2023 at 11:51

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.