I am new to stackoverflow and a newbie to VBA coding. At my work, we are supplied with shipment data in the form of Ms Word which is not very useful. I have found a way to transfer the data using VBA and have a code that is fully functional. However, the data set contains hundreds of thousands of records. I tried running a month's worth of data with 200k records and it took 5 days. Just wondering if there is anything in my code that I could be improved to speed up the process. I've tried turning off screen updates, events, calculations but it didn't do much. Thanks in advance for your help.
Sub Word_to_Excel()
Dim FName As String, FD As FileDialog
Dim wdApp As Object
Dim wdDoc As Object
Dim WDR, WDCheck, ShipmentID As Object
Dim ExR As Range
Dim file
Dim Path As String
Dim ImportDate As Object
Dim ImportValue As String
Dim ShipmentIDcheck As String
Dim objResult
Set objShell = CreateObject("WScript.Shell")
Set ExR = Selection ' current location in Excel Sheet
' Select Folder containing word documents
Set FD = Application.FileDialog(msoFileDialogFolderPicker)
FD.Show
FName = FD.SelectedItems(1)
file = Dir(FName & "\*.docx")
Set wdApp = CreateObject("Word.Application")
' Open word document in the folder, run macro, close it and open the next word document until there are none left
Do While file <> ""
wdApp.Documents.Open Filename:=FName & "\" & file
wdApp.ActiveWindow.ActivePane.View.Type = 1
wdApp.Visible = True
' Once the word doc is open, go to beginning of document and search for CTY/SITE/SORT:
wdApp.Selection.HomeKey Unit:=6
wdApp.Selection.Find.ClearFormatting
wdApp.Selection.Find.Execute "CTY/SITE/SORT:"
Set WDCheck = wdApp.Selection
' If "CTY/SITE/SORT:" is found, then look for Shipment ID
Do While WDCheck = "CTY/SITE/SORT:"
' Find first shipment
wdApp.Selection.HomeKey Unit:=5
wdApp.Selection.MoveDown Unit:=5, Count:=11
wdApp.Selection.MoveRight Unit:=1, Count:=1
wdApp.Selection.MoveRight Unit:=1, Count:=11, Extend:=1
Set ShipmentID = wdApp.Selection
ShipmentIDcheck = Replace(ShipmentID, " ", "")
' Transfer information from Word to Excel for a Shipment ID and go to the next one.
' Shipment ID should be a string that is 11 characters long
' If Shipment ID no longer exist, go to next page by searching for the next CTY/SITE/SORT:
Do While Len(Trim(ShipmentIDcheck)) = 11
i = i + 1
ExR(i, 1) = file
ExR(i, 2) = ShipmentIDcheck
' Consignee Name
wdApp.Selection.MoveUp Unit:=5, Count:=1
wdApp.Selection.MoveRight Unit:=1, Count:=12
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 3) = Trim(WDR)
' Importer Name
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 8) = Trim(WDR)
' Shipper Name
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 13) = Trim(WDR)
' Quantity
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=10, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 19) = Trim(WDR)
' Weight
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=12, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 20) = Trim(WDR)
' Value
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=12, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 21) = Trim(WDR)
' Broker
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=11, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 23) = Trim(WDR)
' Consignee Street
wdApp.Selection.HomeKey Unit:=5
wdApp.Selection.MoveDown Unit:=5, Count:=1
wdApp.Selection.MoveRight Unit:=1, Count:=13
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 4) = Trim(WDR)
' Importer Street
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 9) = Trim(WDR)
' Shipper Street
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=23, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 14) = Trim(WDR)
' Description
wdApp.Selection.MoveRight Unit:=1, Count:=8
wdApp.Selection.MoveRight Unit:=1, Count:=40, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 18) = Trim(WDR)
' Consignee City
wdApp.Selection.HomeKey Unit:=5
wdApp.Selection.MoveDown Unit:=5, Count:=1
wdApp.Selection.MoveRight Unit:=1, Count:=13
wdApp.Selection.MoveRight Unit:=1, Count:=13, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 5) = Trim(WDR)
' Consignee Province
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=2, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 6) = Trim(WDR)
' Consignee Postal
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=6, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 7) = Trim(WDR)
' Importer City
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=13, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 10) = Trim(WDR)
' Importer Province
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=2, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 11) = Trim(WDR)
' Importer Postal
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=6, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 12) = Trim(WDR)
' Shipper City
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=13, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 15) = Trim(WDR)
' Shipper Province
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=2, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 16) = Trim(WDR)
' Shipper Postal
wdApp.Selection.MoveRight Unit:=1, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=6, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 17) = Trim(WDR)
' Country of Origin
wdApp.Selection.MoveRight Unit:=1, Count:=29
wdApp.Selection.MoveRight Unit:=1, Count:=21, Extend:=1
Set WDR = wdApp.Selection
ExR(i, 22) = Trim(WDR)
wdApp.Selection.HomeKey Unit:=5
wdApp.Selection.MoveDown Unit:=5, Count:=2
wdApp.Selection.MoveRight Unit:=1, Count:=1
wdApp.Selection.MoveRight Unit:=1, Count:=11, Extend:=1
Set ShipmentID = wdApp.Selection
' Remove spaces from selection. Selection is then used to check if it is a shipment ID.
' If it is, then data for that shipment ID is transferred. If not, macro will go to the next page in the Word Doc.
ShipmentIDcheck = Replace(ShipmentID, " ", "")
ActiveCell.Offset(1).Select
Loop
'Simulate keyboard press "NUMLOCK" to prevent screen from locking
objResult = objShell.SendKeys("{NUMLOCK}")
wdApp.Selection.HomeKey Unit:=5
wdApp.Selection.Find.ClearFormatting
wdApp.Selection.Find.Execute "CTY/SITE/SORT:"
Set WDCheck = wdApp.Selection
Loop
wdApp.ActiveDocument.Close SaveChanges:=wdDoNotSaveChanges
ActiveWorkbook.Save
file = Dir()
Loop
wdApp.Quit
MsgBox "Data extraction completed at:" & vbNewLine & Format(Now, "mmmm d, yyyy hh:mm AM/PM")
End Sub
This is how the data is formatted in Ms Word. There are multiple word documents containing pages and pages of this dataset per day. Number of shipments per page varies. But the format are the same throughout. There are no tables in the word documents, just text separated by spaces. CTY/SITE/SORT: is unique to every page and I used it as an anchor point. if the macro finds it, then it goes down 11 lines and takes the first shipment ID and the other information. It then checks for the next shipment ID. If it is not there, then it goes to the next page and repeats the process.
REPORT NUM : ABC1234 OPERATIONS SYSTEM PAGE NUM: 2
CTY/SITE/SORT: CA 00123 SUMMARY CARGO RUN TIME: 07:33:43
SORT DATE : INBOUND - SCAN RUN DATE: 01AUG19
OPER ID : ABC123
MVMT: 12345678 MVMT DT: 01AUG19 MAWB: PROD TYP: DTY TYP: IMP CTY: EXP CTY: BL TYP:
COURIER REMISSION MANIFEST EXPORT SITE: US 12345
GCCN ID: EXPECTED SHPTS: EXPECTED PKGS: EXPECTED WEIGHT:
CUSTOMS NUM CONSIGNEE NAME IMPORTER NAME SHIPPER NAME CSA QTY WGT(LBS) VALUE BROKER
SHIPMENT ID DESCRIPTION (CAD) CTRY OF ORIGIN
JOHN SMITH ABC COMPANY XYZ COMPANY 1 1.1 1000.00 UNCONSIGNED
ABC12345678 123 MAIN STREET 345 RANDOM ROAD UNIVERSITY OF WASHINGTO BICYCLE PARTS
VANCOUVER BC V1A1A1 VANCOUVER BC V2B1B2 SEATTLE WA 981234 US
JOHN SMITH ABC COMPANY XYZ COMPANY 1 1.1 1000.00 UNCONSIGNED
ABC12345678 123 MAIN STREET 345 RANDOM ROAD UNIVERSITY OF WASHINGTO BICYCLE PARTS
VANCOUVER BC V1A1A1 VANCOUVER BC V2B1B2 SEATTLE WA 981234 US
JOHN SMITH ABC COMPANY XYZ COMPANY 1 1.1 1000.00 UNCONSIGNED
ABC12345678 123 MAIN STREET 345 RANDOM ROAD UNIVERSITY OF WASHINGTO BICYCLE PARTS
VANCOUVER BC V1A1A1 VANCOUVER BC V2B1B2 SEATTLE WA 981234 US
JOHN SMITH ABC COMPANY XYZ COMPANY 1 1.1 1000.00 UNCONSIGNED
ABC12345678 123 MAIN STREET 345 RANDOM ROAD UNIVERSITY OF WASHINGTO BICYCLE PARTS
VANCOUVER BC V1A1A1 VANCOUVER BC V2B1B2 SEATTLE WA 981234 US
TOTAL FOR DUTY TYPE COURIER REMISSION
TOTAL SHIPMENTS: 4
TOTAL PACKAGES : 4
TOTAL WEIGHT : 70.9 LBS
TOTAL VALUES : 4000.00
* * *
-
1\$\begingroup\$ Is the word document laid out with tables? If so I'd consider plain copy-pasting the Word tables into a spreadsheet, and try processing that instead. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2020年01月15日 14:14:42 +00:00Commented Jan 15, 2020 at 14:14
-
\$\begingroup\$ no tables. it is laid out into columns and separated by spaces. \$\endgroup\$Buroughs– Buroughs2020年01月15日 15:42:30 +00:00Commented Jan 15, 2020 at 15:42
-
2\$\begingroup\$ Then I try text import. Anything but line-by-line, column-by-column processing ;-) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2020年01月15日 15:56:22 +00:00Commented Jan 15, 2020 at 15:56
-
\$\begingroup\$ the-macro-recorder-curse can be cured! And 200K data should be stored in a database! \$\endgroup\$ComputerVersteher– ComputerVersteher2020年01月22日 01:41:33 +00:00Commented Jan 22, 2020 at 1:41
1 Answer 1
Declarations
My first comment is Option explicit
. Every. Single. Time.
Your first line of code is :
Set objShell = CreateObject("WScript.Shell")
Why? objshell
is not declared or used. And while on the matter of declarations:
Dim WDR, WDCheck, ShipmentID As Object
declares WDR
and WDCheck
as Variant, not Object.
Youi are writing a utility tool - using early binding instead of late binding will improve the code. (Dim wdApp as Word.Application
: Set wdApp = New Word.Application
, assuming you are running this from Excel).
Macro recorder
To me it is obvious you used the macro recorder and then simply copied the code to get what you wanted. In order to improve your code, look at each step that has been recorded (a couple of lines each time) to work out what is really happening
You open a word document, but do not assign that open document to the declared variable wdDoc
. Which should be declared as Word.Document
not Object
. Hint: Word.Application.Documents.Open
returns a Document.
Once you start looking at the recorded code and making sensible changes, you will stop working with the nebulous Selection
and start working directly with defined objects that you can control better.
Approach
A good approach is to first clean the input data. This can be as simple as identifying the block of text to be imported, copying that to an intermediate work area (perhaps a temporary word document, or a work area in your excel file) and then setting up the data format to suit your next step (the direct import).
What I have inferred from your code is that each column is separated by multiple spaces to create a nicely formatted output. So you can have two approaches here:
- replace those spaces with a known delimiter
- use the fixed column widths to do a text input into Excel (noted by @MathieuGuindon in the comments)
The first approach is useful if there is no consistency between the documents. The second is useful if there is this consistency.
Either way, at the end of these steps you have a consistent form of input data that you can now directly import into Excel.
I have deliberately not included any example code. The initial steps of refactoring the macro-recorded code is great learning experience for yourself and is something that will give you good insight into how you can improve your own code.
Looking forwards to the seeing the refactored code as a new article here for further review!