I have adapted a VBA code to import all the XML files I have on a folder into Excel. I have tried to run the code with only two files and it takes about 30 seconds. Several minutes to import less than 20 files.
I need to import more than 200,000 files. Can you please help me on how to improve my code?
Sub ListFiles()
'DECLARE AND SET VARIABLEs
Dim ShellApplication As Object
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
Set ShellApplication = CreateObject("Shell.Application").BrowseForFolder(0, "Please choose a folder", 0, OpenAt)
Path = ShellApplication.self.Path
Set ShellApplication = Nothing
[a3] = "XML"
[b3] = "Files"
'DEFAULT PATH FROM HIDDEN SHEET
Call ListMyFiles(Path, True)
End Sub
Sub ListMyFiles(mySourcePath, IncludeSubfolders)
Set MyObject = New Scripting.FileSystemObject
Set mySource = MyObject.GetFolder(mySourcePath)
'With SearchXML
Application.ScreenUpdating = False
'--------------------------------------------------------------------
'FIND XML FILES ONLY, APPLY SEARCH CRIERIA, DISPLAY MATCHES ONLY
For Each myfile In mySource.Files
If Right(myfile.Name, 3) = "XML" Or Right(myfile.Name, 3) = "xml" Then 'IS XML?
LastRow = ActiveSheet.Cells(Rows.Count, 2).End(xlUp).Row
'-------------------------------------------------------------
'IMPORT XML FILE
Application.DisplayAlerts = False
ActiveWorkbook.XmlImport URL:=mySource & "\" & myfile.Name, _
ImportMap:=Nothing, Overwrite:=True, Destination:=Range("$B$" & LastRow + 1)
Cells(LastRow + 1, 1) = myfile.Name
'------------------------------------------------------------
'DELETE MAPS
maps = ActiveWorkbook.XmlMaps.Count
For i = 1 To maps
ActiveWorkbook.XmlMaps(1).Delete
Next i
End If
Next
If IncludeSubfolders Then 'SEARCH SUBFOLDERS FOR SAME CRITERIA
For Each MySubFolder In mySource.SubFolders
Call ListMyFiles(MySubFolder.Path, True)
Next
End If
'End With
Application.ScreenUpdating = True
Application.DisplayAlerts = True
Sheets("XMLData").UsedRange.WrapText = False
End Sub
1 Answer 1
Not knowing what your XMl files are like or what you're trying to extract from them, this is mostly speculative:
Generally, you want to strip away as much unwanted data as early as possible in the process. Rather than trying to import an entire file to the worksheet every time, why not strip your document down to a NodeList, and then either target specific nodes (if you're only after certain data) or iterate over them and extract the data values into an array. Then you can print that Array to a worksheet.
I recently built a Tool to extract data from corporate accounts. It loads an XML doc (typically 100kb), parses the Node Tree, searches for specific nodes using XPath, extracts the displayed text of each and sticks it in an array that's printed to a worksheet every 500 lines or so. It grabs 5 sets of nodes from each document, and parses about 1 Million documents/hour, or nearly 300 documents / second.
Below is an extract showing the basic processing of loading, stripping, searching and retrieving.
Public Sub GetDirectorsFromFile(ByVal strFullFilename As String)
Dim xDoc As MSXML2.DOMDocument
Set xDoc = New MSXML2.DOMDocument
Dim colDirectors As Collection
Set colDirectors = New Collection
With xDoc
If .Load(strFullFilename) Then
.setProperty "SelectionLanguage", "XPath"
GetDirectorsFromXml xDoc, colDirectors
End If
End With
End Sub
Public Sub GetDirectorsFromXml(ByRef xDoc As MSXML2.DOMDocument, ByRef colDirectors As Collection)
Dim ixItem As Long
Dim xPathSearchString As String
Dim nodes As MSXML2.IXMLDOMNodeList
Dim nodeText As String
xPathSearchString = "//*[contains(@name,""NameEntityOfficer"")]"
Set nodes = xDoc.SelectNodes(xPathSearchString )
For ixItem = 0 To nodes.Length - 1
nodeText = nodes.Item(ixItem).text
colDirectors.Add text
Next ixItem
End Sub