7
\$\begingroup\$

I have written some VBA to loop through a directory of 15,000 XML files and extract key information from these files into Excel - this is unbearably slow. Can someone please assist me in optimising the module?

Sub ImportXMLtoList()
 Application.DisplayAlerts = False
 Application.ScreenUpdating = False
 oldStatusBar = Application.DisplayStatusBar
 Application.DisplayStatusBar = True
 Dim myDir As String
 Dim StrFile As String
 StrFile = Dir("\\Ntsydfsp152\shared\Temp\Lucsan\LMIG\Output\XML\SBL\Staging\")
 myDir = "\\Ntsydfsp152\shared\Temp\Lucsan\LMIG\Output\XML\SBL\Staging\"
 myFile = "LMIG_List_XML_Files.xlsm"
 'clear sheet1
 Sheets(1).Range("a2:g30000").Clear
 x = 2
 Do While Len(StrFile) > 0
 Application.StatusBar = StrFile & " Row: " & x
 Sheets(1).Cells(x, 1).Value = StrFile
 Workbooks.OpenXML Filename:=myDir & StrFile, LoadOption:=xlXmlLoadImportToList
 temp2 = ActiveWorkbook.Sheets(1).Range("g2")
 temp3 = ActiveWorkbook.Sheets(1).Range("e2")
 temp4 = ActiveWorkbook.Sheets(1).Range("ak3")
 temp5 = ActiveWorkbook.Sheets(1).Range("av4") & "/" & ActiveWorkbook.Sheets(1).Range("ax4") & "/" & ActiveWorkbook.Sheets(1).Range("aw4") & "/" & ActiveWorkbook.Sheets(1).Range("ay4")
 ActiveWorkbook.Close
 Sheets(1).Cells(x, 2) = temp2
 Sheets(1).Cells(x, 3) = temp3
 Sheets(1).Cells(x, 4) = temp4
 Sheets(1).Cells(x, 5) = temp5
 x = x + 1
 StrFile = Dir
 Loop
 Application.DisplayAlerts = True
 Application.ScreenUpdating = True
 Application.StatusBar = False
 Application.DisplayStatusBar = oldStatusBar
End Sub
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Oct 2, 2014 at 6:58
\$\endgroup\$

4 Answers 4

6
\$\begingroup\$

A few short comments:

Use Option Explicit:

Why? because It makes thinks easier by orders of magnitude when developing.
You get more useful error messages and it becomes harder to misspell variable names without realizing.

That brings us right to the next:

Variable Names:

Your variable names are not consistently cased. You are mixing up "camel casing" (e.g. myDir) and "Pascal Casing" (e.g. StrFile)

Also temp 1 through 4 are not really good variable names. Neither is x, but hey ;)

Error Handling:

You should really make sure your "Application" is in a clean state after exit, even when you got unexpected errors:

Sub ImportXMLToList()
 On Error GoTo ErrorHandler
 ' Code goes here: 
:CleanExit
 Application.DisplayAlerts = True
 Application.ScreenUpdating = True
 Application.StatusBar = False
 Application.DisplayStatusBar = oldStatusBar
 Exit Sub
:ErrorHandler
 'Close resources, maybe errormessage
 Resume CleanExit
End Sub
answered Oct 2, 2014 at 7:19
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Great advice, but it can be hard to control the case in VB because it's case insensitive and the IDE will automatically change it on you. \$\endgroup\$ Commented Oct 2, 2014 at 15:14
  • 1
    \$\begingroup\$ Sorry Vogel but the OP says the code is slow and your tips don't really improve the execution time other than being just some nice cosmetic improvements. \$\endgroup\$ Commented Oct 8, 2014 at 7:55
  • \$\begingroup\$ @vba4all I never claimed they'd improve the performance and frankly speaking, I am not knowledgeable enough to suggest performance-relevant changes :( But I think a clean and well defined Error Handling is more than a "cosmetic improvement" ... Either way why are you sorry? \$\endgroup\$ Commented Oct 8, 2014 at 7:58
  • \$\begingroup\$ just don't want to you feel like im attacking you or something. just sharing my opinion \$\endgroup\$ Commented Oct 8, 2014 at 7:59
6
\$\begingroup\$

I'll try not to duplicate advice, but seriously, use an error handler and make sure the application gets set back to a workable state. There's nothing more frustrating for a user than to end up with a "frozen" instance of Excel because screen updating has been turned off.

  • I like that you're declaring variables for your directories and file names, but they should be constants instead. These are things that shouldn't change at runtime.
  • This won't actually clear the sheet.

     'clear sheet1
     Sheets(1).Range("a2:g30000").Clear
    

    You should find the used range.

  • The Sheets collection contains Charts and Pivot Tables. Use the Worksheets collection instead.

  • x is a pretty bad variable name. row would be much more descriptive.

  • You should declare a workbook variable instead of calling ActiveWorkbook all the time. It is possible for the user to select a different workbook while the code is running. Then the code is working on a different workbook and everything is a mess.

  • Anytime you're numbering variables, you should step back and rethink the names. temp is a terrible name on it's own. Start numbering them and they're completely obfuscated.

  • I recommend using named ranges instead of cell addresses. It's the difference between this:

    temp2 = ActiveWorkbook.Sheets(1).Range("g2")
    

    and this:

    personName = wb.Worksheets(1).Range("PersonName")
    
  • You should initialize StrFile closer to the loop. It makes it clearer that you're looping through all files in a directory.

Finally...... Performance:

You can set the recalculation of the sheet to manual prior to running the code. It will only make a large difference if your code has a lot of formulas, but it's worth a shot. Just be sure to set it back to xlAutomatic in the "CleanExit" part of your routine.

Application.Calculation = xlCalculationManual

But really, never mind that because the real slow down here is opening and closing all of those workbooks. I recommend looking into connecting to those workbooks using ADODB recordsets. That should significantly speed up the process.

answered Oct 2, 2014 at 15:41
\$\endgroup\$
3
  • \$\begingroup\$ no offence but most of those tips don't really affect performance although they're good syntax related advices. \$\endgroup\$ Commented Oct 8, 2014 at 7:54
  • 1
    \$\begingroup\$ @vba4all you're absolutely right, but connecting to the sheets with ADODB will solve the performance problem. I felt there were other issues with the code. While I try to address any particular concerns, all of the code is up for review. \$\endgroup\$ Commented Oct 8, 2014 at 9:58
  • \$\begingroup\$ Well, at the least, it will speed it up. Processing 15k files is going to take some time regardless. \$\endgroup\$ Commented Oct 8, 2014 at 10:04
5
\$\begingroup\$

The other posts cover the main issues very well, so I won't belabour them and get right down to your performance issue. Your code is unbearably slow because using Excel to retrieve values from XML documents is using the bluntest of instruments available for the job. Kind of like chopping down a tree with a noodle. The network retrieval is not your issue at all - the bottleneck is how long it takes Excel to open a new Workbook. Just for a quick test, open an XML document in Excel, then open the same one in Notepad. Now imagine doing that 15,000 times. This is exactly what your code is doing, and is exactly why it is running slow.

Luckily, there are XML handling tools available in VBA - just add a reference to Microsoft XML and use them instead. I ran some tests to compare the 2 access methods, opening this file, retrieving a value, and writing it to a cell in the current Worksheet.

XML Method:

Sub FastRead(file As String, sheet As Worksheet)
 Dim current As DOMDocument
 Dim tree As IXMLDOMSelection
 Set current = New DOMDocument
 'For the sake of argument, we'll write the artist of the 4th CD to cell A1.
 If current.Load(file) Then
 Set tree = current.SelectNodes("CATALOG/CD")
 sheet.Cells(1, 1) = tree.item(3).ChildNodes(1).Text
 Else
 'Do whatever you need to do when you can't open the file.
 Debug.Print current.parseError.ErrorCode
 End If
End Sub

Excel Method:

Sub SlowRead(file As String, sheet As Worksheet)
 Dim worker As Workbook
 Set worker = Workbooks.OpenXML(file , , xlXmlLoadImportToList)
 'For the sake of argument, we'll write the artist of the 4th CD to cell A1.
 sheet.Cells(1, 1) = worker.Sheets(1).Cells(5, 2).Value
 Call worker.Close(False)
End Sub

Test Code

Sub Test()
 Dim sheet As Worksheet
 Set sheet = ActiveSheet
 Application.DisplayAlerts = False
 Application.ScreenUpdating = False
 Dim i As Integer
 For i = 1 To 10000
 Call FastRead("C:\cd_catalog.xml", sheet)
' Call SlowRead("C:\cd_catalog.xml", sheet)
 Next i
 Application.DisplayAlerts = True
 Application.ScreenUpdating = True
End Sub

The FastRead version took 17 seconds over 10,000 calls. The SlowRead version clocked in at 28 minutes. Moral of the story is that Excel allows you to open XML files merely as a convenience - it is not intended to be an efficient way of retrieving values from them. Always best to use the right tools for the job.

answered Oct 11, 2014 at 5:57
\$\endgroup\$
2
  • \$\begingroup\$ This is spot on. The Windows XML libraries make this so much easier - I've been working with them a lot recently in rebuilding a clunky Access application, and using MSXML and particularly using XPath expressions to pull data out has massively improved performance. \$\endgroup\$ Commented Oct 13, 2014 at 10:33
  • \$\begingroup\$ somehow related \$\endgroup\$ Commented Oct 13, 2014 at 14:23
4
\$\begingroup\$

On top of what the other people have said I can only recommend one more thing...

It seems you're accessing files on a network drive and doing that always adds overhead to the runtime.

Since you mention 15K files I would consider temporarily copying the directory that stores all those files to a local location on the drive and then running the code on the local copy of that directory instead of network.

You can then copy and paste +override the directory back to where it originally was.

answered Oct 8, 2014 at 7:53
\$\endgroup\$
3
  • 1
    \$\begingroup\$ I don't mean any offense, but it doesn't seem reasonable to copy 15k files twice. \$\endgroup\$ Commented Oct 8, 2014 at 11:38
  • 1
    \$\begingroup\$ well, I would test it first :) \$\endgroup\$ Commented Oct 8, 2014 at 12:20
  • \$\begingroup\$ When you put it that way, it's worth trying. =;)- \$\endgroup\$ Commented Oct 8, 2014 at 12:33

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.