Background
The CSV files being used can be 6+ mill records in size, using a small sample size to proof the methodology.
Why are you doing this on excel? Access or SQL would be quicker
My company has disabled the VBE for anyone who is not recognised as a developer (progressive, I know.....).
I want to submit a third party macro & be recognised as a developer to continue building small applications when native tools do not fulfil my needs.
Objective
- Import two CSV files & add to respective arrays for comparison (attached for reference)
- Files contain account & unit balance data (dummy data for this example)
- Consolidate records onto the 'original' array
- If a record from the 'original' file is also on the 'external' file, the units from the 'external' file should be added to the corresponding record on the 'original' file.
- Output a CSV file with double quote text qualifiers & ";" delimiters.
enter image description here enter image description here
I have compiled the below code to complete but really appreciate any feedback on how to optimise given the size of the data that will be used.
Sub Import_Files_Create_Consolidated_File()
Const txtFilepath = "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"
Dim arrOrig, arrExt As Variant
Dim i, j, n, n2, x As Long
Dim joinOrig, joinExt, strTempArray1d, strTempArray1dExt As String
Dim FSOreadOrigBal As Object: Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
Dim FSOreadExtBal As Object: Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
Dim FSOcreatetxt As Object: Set FSOcreatetxt = CreateObject("Scripting.FileSystemObject")
'I move output file to later in the procedure after err handling is finished
Dim outputFile As Object: Set outputFile = FSOcreatetxt.CreateTextFile(txtFilepath & "Output.csv", True)
Dim fdImportOrigBal As FileDialog: Set fdImportOrigBal = Application.FileDialog(msoFileDialogFilePicker)
Dim fdImportExtBal As FileDialog: Set fdImportExtBal = Application.FileDialog(msoFileDialogFilePicker)
Dim strOrigBal, strExtBal As String
Dim tempArray1d, tempArray1dExt, finalArrayOrig2d, finalArrayExt2d As Variant
'Allow the user to select their input files, Enter err handling later
With fdImportOrigBal
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = "Select original balance file"
.AllowMultiSelect = False
.InitialFileName = txtFilepath
.Show
End With
strOrigBal = fdImportOrigBal.SelectedItems(1)
Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(strOrigBal, 1)
With fdImportExtBal
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = "Select external balance file"
.AllowMultiSelect = False
.InitialFileName = txtFilepath
.Show
End With
strExtBal = fdImportExtBal.SelectedItems(1)
Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(strExtBal, 1)
'Compile both files into a 1d array
arrOrig = Split(FSOreadOrigBal.ReadAll, vbNewLine)
arrExt = Split(FSOreadExtBal.ReadAll, vbNewLine)
'So I want to check if an item is on the external array but not on the original array_
'If not found, move the item from the external array & add to the original array_
'Important that the item just moved is deleted from the external array, otherwise it will incorrectly sum the units when_
'we complete this check
'I will need to only use the first 4 columns for the unique key, as units will differ even if on both external & original arrays.
ReDim tempArray1d(UBound(arrOrig))
'Create unique key on original array & pass it to a temp array
For n = LBound(arrOrig, 1) + 1 To UBound(arrOrig, 1)
tempArray1d(n) = Split(arrOrig(n), ",")(0) & Split(arrOrig(n), ",")(1) & Split(arrOrig(n), ",")(2) & Split(arrOrig(n), ",")(3)
Next n
ReDim tempArray1dExt(UBound(arrExt))
'Create unique key on original array & pass it to a temp array_
'Using a UDF & unique key, search the temp original array to find the record from the external array_
'If not found, resize original array & add the record which has not been found.
For n = LBound(arrExt, 1) + 1 To UBound(arrExt, 1)
tempArray1dExt(n) = Split(arrExt(n), ",")(0) & Split(arrExt(n), ",")(1) & Split(arrExt(n), ",")(2) & Split(arrExt(n), ",")(3)
strTempArray1dExt = tempArray1dExt(n)
'Credit function - https://stackoverflow.com/questions/38267950/check-if-a-value-is-in-an-array-or-not-with-excel-vba
If IsInArray(strTempArray1dExt, tempArray1d) = False Then
ReDim Preserve arrOrig(UBound(arrOrig) + 1)
arrOrig(UBound(arrOrig)) = arrExt(n)
arrExt(n) = ""
End If
Next n
'Now I want to resize my array before making a 2d array by removing the blank indexes
ReDim tempArray1dExt(0)
For n = LBound(arrExt, 1) To UBound(arrExt, 1)
'Write non blank rows to a different temp array & then resize & write back to arrExt
If arrExt(n) <> "" Then
tempArray1dExt(UBound(tempArray1dExt)) = arrExt(n)
ReDim Preserve tempArray1dExt(UBound(tempArray1dExt) + 1)
End If
Next n
ReDim arrExt(UBound(tempArray1dExt) - 1)
For n = LBound(tempArray1dExt, 1) To UBound(tempArray1dExt, 1) - 1
arrExt(n) = tempArray1dExt(n)
Next n
'Create the template for my final 2d array, will be used to write to a CSV output
ReDim finalArrayOrig2d(UBound(arrOrig), 4)
For n = LBound(arrOrig, 1) To UBound(arrOrig, 1)
tempArray1d = Split(arrOrig(n), ",")
For n2 = 0 To 4
finalArrayOrig2d(n, n2) = tempArray1d(n2)
Next n2
Next n
'Create the template for my final 2d external array, will be used to sum units if unique key is found
ReDim finalArrayExt2d(UBound(arrExt), 4)
For n = LBound(arrExt, 1) To UBound(arrExt, 1)
tempArray1d = Split(arrExt(n), ",")
For n2 = 0 To 4
finalArrayExt2d(n, n2) = tempArray1d(n2)
Next n2
Next n
'Change data type to double in final arrays for units addition
For i = LBound(finalArrayOrig2d, 1) + 1 To UBound(finalArrayOrig2d, 1)
finalArrayOrig2d(i, 4) = CDbl(finalArrayOrig2d(i, 4))
Next i
For i = LBound(finalArrayExt2d, 1) + 1 To UBound(finalArrayExt2d, 1)
finalArrayExt2d(i, 4) = CDbl(finalArrayExt2d(i, 4))
Next i
'Enter a loop to determine if there is a match & add units if there is
For i = LBound(finalArrayOrig2d, 1) + 1 To UBound(finalArrayOrig2d, 1)
joinOrig = Join(Array(finalArrayOrig2d(i, 0), finalArrayOrig2d(i, 1), finalArrayOrig2d(i, 2), finalArrayOrig2d(i, 3)))
'Enter external balance file & compile unique key
For j = LBound(finalArrayExt2d, 1) + 1 To UBound(finalArrayExt2d, 1)
joinExt = Join(Array(finalArrayExt2d(j, 0), finalArrayExt2d(j, 1), finalArrayExt2d(j, 2), finalArrayExt2d(j, 3)))
'Add units here is a match
If joinOrig = joinExt Then
finalArrayOrig2d(i, 4) = finalArrayOrig2d(i, 4) + finalArrayExt2d(j, 4)
End If
Next j
Next i
'Wrap final array in double quote text qualifiers
For i = LBound(finalArrayOrig2d, 1) To UBound(finalArrayOrig2d, 1)
For j = LBound(finalArrayOrig2d, 2) To UBound(finalArrayOrig2d, 2)
finalArrayOrig2d(i, j) = """" & finalArrayOrig2d(i, j) & """"
Next j
Next i
'Write the updated original array to a CSV file
For n = LBound(finalArrayOrig2d, 1) To UBound(finalArrayOrig2d, 1)
outputFile.WriteLine Join(Array(finalArrayOrig2d(n, 0), finalArrayOrig2d(n, 1), finalArrayOrig2d(n, 2), finalArrayOrig2d(n, 3), finalArrayOrig2d(n, 4)), ";")
Next n
End Sub
Public Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
Dim i As Long
For i = LBound(arr) To UBound(arr)
If arr(i) = stringToBeFound Then
IsInArray = True
Exit Function
End If
Next i
IsInArray = False
End Function
2 Answers 2
A well-known code development aphorism is "Make it work. Make it right. Make it fast" (Kent Beck). The most nebulous statement here is "Make it Right". I would propose that making code easier to read/understand, maintain, and test is a reasonable interpretation of "Make it Right".
Making it Work
Since you've posted this question on CR, it is assumed that the code works.
Step 1 complete!
Making it Right
Comprehension of code for a first time reader is made much easier if it emulates an outline of a document. That is, a well written document uses a nested structure consisting of a title, sections, paragraphs, and sentences.
When a module's code consists of a single long procedure, it is like publishing a document where there is a title (EntryPoint Sub
), no sections, and a single, really really long paragraph (the code). Sure, it's possible to read it. But, it is painful - and the reader is likely to give up if the ideas are not presented in a way that can be easily organized in the reader's head. (BTW: the 'first time reader' includes yourself when the code is set aside for a few days/weeks/months).
So, how to evaluate your code to avoid the single long paragraph in the above analogy? The simplest metric to apply is lines of code (Loc). Which, of course, only generates a new question - How many lines of code are just right? There is no perfect answer. But, if a reader has to scroll up and down frequently to see and understand the content in a function, then that function is likely too long. (BTW: after completing the "Make it Work" step, my own code always contains functions that are too long)
Import_Files_Create_Consolidated_File
has 167 lines of code...that's a lot of scrolling. Everyone's setup is different, but, when I have the VBE open, my editor pane displays around 32 lines (+/- depending on my Immediate and Locals Windows height). For lack of any better criteria, I would set out to initially get Import_Files_Create_Consolidated_File
to fit within 32 lines of code.
Despite my justifications, 32-Loc per function is an arbitrary goal (because, admittedly, one could just buy a really big monitor to avoid scrolling). The point of establishing a function Loc limit is that it naturally forces the use of helper functions to carry out specific tasks. After refactoring, Import_Files_Create_Consolidated_File
will contain function calls with names that describe the high-level tasks necessary to 'ImportFiles' and 'CreateConsolidatedFile'. Additionally, coding best practices are also naturally brought to bear during refactoring to meet this goal. Two in particular: The Single Responsibility Principle (SRP) and Don't Repeat Yourself (DRY).
Import_Files_Create_Consolidated_File
can be easily reduced to the following code:
Option Explicit
Private Const txtFilepath As String = "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"
Private strOrigBal As String
Private strExtBal As String
Sub Import_Files_Create_Consolidated_File()
ImportFiles
CreateConsolidatedFile strOrigBal, strExtBal, txtFilepath & "Output.csv"
End Sub
Private Sub ImportFiles()
Dim fdImportOrigBal As FileDialog
Set fdImportOrigBal = Application.FileDialog(msoFileDialogFilePicker)
Dim fdImportExtBal As FileDialog
Set fdImportExtBal = Application.FileDialog(msoFileDialogFilePicker)
'Allow the user to select their input files, Enter err handling later
With fdImportOrigBal
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = "Select original balance file"
.AllowMultiSelect = False
.InitialFileName = txtFilepath
.Show
End With
strOrigBal = fdImportOrigBal.SelectedItems(1)
With fdImportExtBal
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = "Select external balance file"
.AllowMultiSelect = False
.InitialFileName = txtFilepath
.Show
End With
strExtBal = fdImportExtBal.SelectedItems(1)
End Sub
Public Sub CreateConsolidatedFile(ByVal pOrigBalFilepath As String, _
ByVal pExtBalFilepath As String, ByVal pOutputFilepath As String)
Dim FSOreadOrigBal As Object
Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
Dim FSOreadExtBal As Object
Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(pOrigBalFilepath, 1)
Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(pExtBalFilepath, 1)
'Compile both files into a 1d array
'** The rest of the code **
End Sub
Import_Files_Create_Consolidated_File
now contains 4 lines of code...Check. It presents better because the wall-of-declarations is gone and it is easy to tell what the high level tasks are required to be accomplished.
Note: I've moved two variables and a constant to module-scope to simplify the hand-off of data between the two procedures. That's not necessarily a good practice, but handing off the data differently brings in more concepts that would distract from the goals of this answer.
ImportFiles
now contains 29 lines of code. This meets that 32-line criteria - and it is easy to read and understand. Still, I want to refactor this function to be even shorter to demonstrate the DRY principle. There is a fair amount of repeated code here. Removing the repeated code using a new function results in:
Private Sub ImportFiles()
strOrigBal = GetFilePathFromUser(txtFilepath, "Select original balance file")
strExtBal = GetFilePathFromUser(txtFilepath, "Select external balance file")
End Sub
Private Function GetFilePathFromUser(ByVal pInitialPath As String, _
ByVal pTitle As String) As String
Dim fDialog As FileDialog
Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
With fDialog
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = pTitle
.AllowMultiSelect = False
.InitialFileName = pInitialPath
.Show
End With
GetFilePathFromUser = fDialog.SelectedItems(1)
End Function
ImportFiles
is now really short. It's now so short that it's content can be relocated back to the entry point Sub
. Doing so removes the need for the module-scope fields introduced by the initial refactoring. The code now looks like:
Option Explicit
Sub Import_Files_Create_Consolidated_File()
Const txtFilepath As String _
= "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"
Dim strOrigBal As String
Dim strExtBal As String
strOrigBal = GetFilePathFromUser(txtFilepath, "Select original balance file")
strExtBal = GetFilePathFromUser(txtFilepath, "Select external balance file")
CreateConsolidatedFile strOrigBal, strExtBal, txtFilepath & "Output.csv"
End Sub
Public Sub CreateConsolidatedFile(ByVal pOrigBalFilepath As String, _
ByVal pExtBalFilepath As String, ByVal pOutputFilepath As String)
Dim FSOreadOrigBal As Object
Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
Dim FSOreadExtBal As Object
Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(pOrigBalFilepath, 1)
Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(pExtBalFilepath, 1)
'Compile both files into a 1d array
'** The rest of the code **
End Sub
Private Function GetFilePathFromUser(ByVal pInitialPath As String, _
ByVal pTitle As String) As String
Dim fDialog As FileDialog
Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
With fDialog
.Filters.Clear
.Filters.Add "CSV files", "*.CSV*"
.Title = pTitle
.AllowMultiSelect = False
.InitialFileName = pInitialPath
.Show
End With
GetFilePathFromUser= fDialog.SelectedItems(1)
End Function
Import_Files_Create_Consolidated_File
now has 12 lines of code (including the 2 lines used for Const txtFilepath
to avoid scrolling left/right)...Check
GetFilePathFromUser
now has 15 lines of code...Check
CreateConsolidatedFile
is now somewhere around 132 lines. Progress, but there is more work to be done.
Testing
The changes so far have been fairly straight forward. And, it is likely that the code hasn't been broken and still 'works'. However, additional refactoring of CreateConsolidatedFile
will be more complex and incurs the risk of breaking the code. How should one proceed from this point and avoid this risk?
Separating the User interaction code (retained in the EntryPoint procedure) from the consolidation code (CreateConsolidateFile
) has resulted in making CreateConsolidateFile
a testable function. This UI/Domain logic separation is really important to facilitate testing. Now you can add test functions in a separate test module that looks something like:
Sub TestConsolidatedFile()
Const myTestOriginalBalanceFilepath = "asdf/asdf/MyTestFiles/TestBalances.csv"
Const myTestExtBalanceFilepath = "asdf/asdf/MyTestFiles/TestExtBalances.csv"
Const myTestOutputFilepath = "asdf/asdf/MyTestFiles/TestConsolidationResults.csv"
CreateConsolidatedFile myTestOriginalBalanceFilepath, _
myTestExtBalanceFilepath, myTestOutputFilepath
Dim testResult As Boolean
testResult = False
'Add code to read/check that TestConsolidationResults.csv contains what you expect and
'sets the testResult flag
If testResult Then
Debug.Print "Success!"
Else
Debug.Print "TestConsolidatedFile Failed"
End If
End Sub
If you don't have them, create test files that can be used with CreateConsolidatedFile
. Add the necessary code to get the above test to print the success message when it executes. Now, you have a the ability to detect when you make a change that breaks the original/working logic.
From this point forward, the process is Modify-Test-Fix(if it fails) until all your functions are around 32 (or your own limit) lines of code. You will likely discover that many comments become unnecessary as function names describe what is happening within them. With respect to code documentation, always prefer a well-named function to a comment.
Make it Fast
By applying DRY and SRP along the way, the result will be code that is far more readable and easier to "Make Fast". When you are ready for that step, create test files and tests specifically designed to stress-test your code to verify that it performs well against your anticipated real csv files.
Obligatory
- Your posted code does not declare
Option Explicit
at the top of the module. Get in the habit of always doing this. It guarantees that you declare all variables before they are used. The VBE Tools->Options has a 'Require Variable Declaration' setting that will insertOption Explicit
for new modules automatically. - Variable Declarations
VBA interprets
Dim joinOrig, joinExt, strTempArray1d, strTempArray1dExt As String
As
Dim joinOrig As Variant
Dim joinExt As Variant
Dim strTempArray1d As Variant
Dim strTempArray1dExt As String
The code still 'works' because a Variant
can be assigned values or objects. And, in this case is able to behave like a String
if it is assigned a String
value. Still, it is a best practice to explicitly declare any variable as its intended Type
rather than letting VBA decide for you.
More information
If you are interested in more information about best/better coding practices demonstrated using VBA, I would recommend you visit RubberDuck and consider using the free RubberDuck Excel Addin to augment the VBE. Full Disclosure: I contribute to the Rubberduck project...but you should consider using it nevertheless :)
-
\$\begingroup\$ Nice review! Introduction to a lot of concepts. I really like the point on some sub becoming short enough that it no longer needs to be its own thing and can be expanded back into the parent method. Another general rule of thumb is not LOC, but if a child sub has all the same parameters as the parent sub then it should not be its own routine. One point, the only place you should really never have underscores is in public method names (and public variables) - it will come back to bite you when you get round to using interfaces, and you want to remain consistent throughout. PascalCase Not_This. \$\endgroup\$Greedo– Greedo2022年04月21日 08:09:28 +00:00Commented Apr 21, 2022 at 8:09
-
\$\begingroup\$ Really appreciate your in depth review! It's much clearer to me now how an experienced person constructs the project in their mind & putting it to action. I will replicate these concepts on some other projects I have lined up. I can't upvote just yet!! \$\endgroup\$JasonC– JasonC2022年04月25日 18:13:27 +00:00Commented Apr 25, 2022 at 18:13
The structure of your code would suggest that your chances of being recognised as a developer are practically zero.
You have many functions encapsulated in a single method. Split your Method into smaller methods that can be tested.
You are manually searching linear arrays. Itr would be simpler to put those arrays in a dictionary and use the .Exists method (or an ArrayList and use the '.Contains' method.
You don't provide any test methods for your code so maintenenance is going to be an issue.
-
\$\begingroup\$ Don't worry, I'm aware it's not optimal, I'm pretty much at ground zero of my learnings. I will attempt both of your suggestions. What do you mean y test methods? \$\endgroup\$JasonC– JasonC2022年04月19日 12:59:22 +00:00Commented Apr 19, 2022 at 12:59
-
\$\begingroup\$ Google 'Test driven development'. \$\endgroup\$Freeflow– Freeflow2022年04月19日 13:06:54 +00:00Commented Apr 19, 2022 at 13:06