I pieced together this code in answer to the following OP post on Stack Overflow: Get worksheet Names of All .xlsx files in a folder
Task:
The OP basically wanted to loop all .xlsx files in a folder and produce a list of worksheets and workbooks contained therein; and they gave an example output format.
My request:
Could I please have feedback on the following code, with particular focus on the encapsulation? How should I correctly address encapsulation so as to still produce the same output but without exposing the public fields e.g. key
, value
.
My code design:
I sought to create a process that would:
1) Get the worksheet names without opening the files;
2) Use a global collection with custom class to store the Filename and worksheet names in key, value pairs where both the "key" and the "value" could be accessed;
3) Produce output in the format the OP specified.
Example output format given by OP:
Number of workbooks = 3
Number of Worksheets = 100
ReportNames = TestFile1, TestFile2, TestFile3
SheetNames = TestFile1:TestSheet1, TestFile1:TestSheet2, TestFile1:TestSheet3, TestFile1:TestSheet4
Notes:
I didn't post my answer as I was unclear on how to alter my code properly to fix a number of my public fields breaking encapsulation i.e. I should have Get/Setters, I believe, but am not sure how to implement correctly. The overall code is a little patchwork-like and I give the links to any code sources where I have adapted code from elsewhere.
In a class module
called KeyValue
Option Explicit
'https://stackoverflow.com/questions/5702362/vba-collection-list-of-keys
Public key As String
Public value As String
Public Sub Init(ByVal k As String, ByVal v As String)
key = k
value = v
End Sub
In standard module 1:
Option Explicit
Public numWorksheets As Long
Public kv As KeyValue
Public Col As Collection
In standard module 2:
Option Explicit
Public Sub LoopAllExcelFilesInFolder()
'https://www.thespreadsheetguru.com/the-code-vault/2014/4/23/loop-through-all-excel-files-in-a-given-folder
Dim wb As Workbook
Dim myPath As String
Dim filename As String
Dim myExtension As String
Dim FldrPicker As FileDialog
Dim numWorkbooks As Long
numWorkbooks = 0
numWorksheets = 0
Set Col = New Collection
'Optimize Macro Speed
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Set FldrPicker = Application.FileDialog(msoFileDialogFolderPicker)
With FldrPicker
.Title = "Select A Target Folder"
.AllowMultiSelect = False
If .Show <> -1 Then GoTo NextCode
myPath = .SelectedItems(1) & "\"
End With
NextCode: 'In Case of Cancel
myPath = myPath
If myPath = vbNullString Then GoTo ResetSettings
myExtension = "*.xlsx*" 'Target File Extension (must include wildcard "*")
filename = Dir(myPath & myExtension)
Do While filename <> vbNullString
DoEvents
numWorkbooks = numWorkbooks + 1
AddItemsToCollection myPath, filename
filename = Dir
Loop
Dim ReportNames As String
Dim SheetNames As String
Dim counter As Long
ReportNames = vbNullString
SheetNames = vbNullString
For Each kv In Col
If InStr(1, ReportNames, kv.key) = 0 Then
counter = counter + 1
ReportNames = ReportNames & kv.key & " "
End If
SheetNames = SheetNames & kv.key & ":" & kv.value & " "
Next kv
ReportNames = "ReportNames = " & Replace$(Trim$(ReportNames), " ", ",")
SheetNames = "SheetNames = " & Replace$(Trim$(SheetNames), " ", ",")
Debug.Print "Num of Workbooks = " & numWorkbooks, "Num of Worksheets = " & numWorksheets
Debug.Print ReportNames
Debug.Print SheetNames
ResetSettings:
'Reset Macro Optimization Settings
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Sub
In standard module 3:
Option Explicit
''======================================================
' Required references:
'Microsoft ActiveX Data Object X.X Library
'Microsoft ADO Ext. X.X for DLL and Security
''=====================================================
Public Sub AddItemsToCollection(ByVal path As String, ByVal filename As String)
Dim objConn As ADODB.Connection
Dim objCat As ADOX.Catalog
Dim tbl As ADOX.Table
Dim sConnString As String
Dim WBName As String
Dim wsName As String
WBName = path & filename
'https://social.msdn.microsoft.com/Forums/windows/en-US/2cd64f27-135f-4d69-8f62-a3951c5e522b/how-to-checkinstall-jet-drivers?forum=winformsdatacontrols
sConnString = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & WBName & ";" & _
"Extended Properties=Excel 12.0;"
Set objConn = New ADODB.Connection
objConn.Open sConnString
Set objCat = New ADOX.Catalog
Set objCat.ActiveConnection = objConn
For Each tbl In objCat.Tables
numWorksheets = numWorksheets + 1
wsName = tbl.Name
wsName = Application.Substitute(wsName, "'", vbNullString)
wsName = Left$(wsName, InStr(1, wsName, "$", 1) - 1)
Set kv = New KeyValue
kv.Init filename, wsName
Col.Add kv
Set kv = Nothing
Next tbl
objConn.Close
Set objCat = Nothing
Set objConn = Nothing
End Sub
1 Answer 1
So you've done the tough part. You got the code running. Now it's time to reexamine it. How can you better encapsulate your code? Do you really need the public fields? Are your subroutines too long or overly complicated? Should you move all you code into custom classes?
When encapsulation your code you need to consider more than just hiding public fields. You need to consider the portability of the code. Currently your code is spread across 3 public modules, 1 class module, and requires 2 external references to be set. That is a lot of setup for 2 functions.
The 2 external references can be eliminated by employing Late Binding. A common technique is to use Early Binding in the developmental stage. But before you deployment you change the data types of the referenced objects to Object and set them using CreateObject. You would then test the code, remove the external references and retest the code. In this way, you get the both easy of development using Early Binding and the portability of Late Binding.
Developmental stage:
Dim objConn As ADODB.Connection
Dim objCat As ADOX.Catalog
Dim tbl As ADOX.Table
Set objConn = New ADODB.Connection
Set objCat = New ADOX.Catalog
Here is the same code block at deployment after Deployment after the external references have been removed:
Dim objConn As Object
Dim objCat As Object
Dim tbl As Object
Set objConn = CreateObject("ADODB.Connection")
Set objCat = CreateObject("ADOX.Catalog")
What to do about the evils of Public Fields? Pass them as a parameters. If you have too many public fields to sanely pass as parameters, then encapsulate them in a custom class or custom type (UDT) and them pass the class or type as a parameter.
The dreaded public fields!!!
Public numWorksheets As Long
Public kv As KeyValue
Public Col As Collection
None of these are even needed.
numWorksheets
: Shouldn't be a public field. Simply declare it in your main subroutine and pass it ByRef toAddItemsToCollection(ByRef numWorksheets as Long)
- kv As KeyValue: should have never been a public field. It should be declared in both subroutines independently.
- Col As Collection: Shouldn't be a public field. Simply pass it as a parameter to
AddItemsToCollection(ByVal Col as Collection)
. It should be noted that Objects passed ByVal are still referencing the original object.
Class KeyValue
Allows you to retrieve both a keys and values from a dictionary. I would either expand the class adding additional functionality or use a Scripting Dictionary. You could make an argument for using built-in VBA Collections because Macs don't have Scripting Dictionaries available by default. In that case, I would use @GSerg's answer to VBA collection: list of keys over @user895964 answer that you are using.
Public Sub LoopAllExcelFilesInFolder()
- Changes the Application State to improve performance
- Retrieves a folder path from a `Application.FileDialog(msoFileDialogFolderPicker)
- Loops through all the files in a directory
- Add a KeyValue class to a collection by calling
AddItemsToCollection
for each file in the directory - Prints the code results to the Immediate Window
- Resets the Application State
This subroutine emcompasses 81 lines of code which take up 3 pages of the code window and perform 6 tasks. Ideally, a subroutine should perform 1 or 2 simply tasks and take up 25-40 lines at the most. Whenever possible I refactor any code that I have to scroll vertically or horizontally to read.
Items 1 and 6 modify the Application State to speed up the Macro. Typically, I create a sub to do that Sub SpeedBoost(TurnOn as Boolean)
. This will save 4 lines of code in the in this subroutine. By using such a descriptive name SpeedBoost
, you could also remove 2 lines of comments. However, you are not opening any workbooks or writing and data to the workbook, so this technique will have no effect.
Item 2 should be moved into it's own function. This would make it easier to test and possibly modify in the future. You should also add some conditions in case the dialog was cancelled.
FolderName = getFolder()
If len(FolderName) = 0 then
Msgbox "Cancelled"
Exit Sub
End If
Item 3 loop through the files in a directory. This extracted to it's own function. In this way you can easily modify and test this functionality.
Function getFiles(FolderName As String, Optional IncludeSubFolders as Boolean) as Collection
'Some code
End Function
Main Function
Dim f as Object, colFiles as Collection
Set colFiles = getFiles(FolderName, False)
If colFiles.Count = 0 then
Msgbox "No Files Found"
Exit Sub
End If
For each f in colFiles
'Some code
Next
Item 5 printing the code results to the Immediate Window. It would be more useful to convert convert the sub to a function that will return the results
Just a couple notes on AddItemsToCollection
- When passing a path As String and filename separately you should check it the path ends with a path separator. Because their is no need to know the path I would use `FullFileName'.
numWorksheets = numWorksheets + 1
should be replaced with numWorksheets = objCat.Tables.Count.
-
\$\begingroup\$ Thank you for the really helpful feedback. Totally re-framed my thinking. Couple of questions: 1) Main Function code as end above.... Set colFiles = getFiles Would this actually be Set colFiles = getFiles(FolderName , IncludeSubFolders ) ? 2) What kind of additionally functionality were you implying could be added to the class KeyValue? 3) Is all the application.state code redundant in my example? 4) The function to return the output: would this return a string array for example? Thanks again for taking time to consider my post and giving such detailed feedback. \$\endgroup\$QHarr– QHarr2017年11月15日 09:45:46 +00:00Commented Nov 15, 2017 at 9:45
-
\$\begingroup\$ Here is how I would design the classes: Download Example Workbook \$\endgroup\$user109261– user1092612017年11月15日 16:28:09 +00:00Commented Nov 15, 2017 at 16:28
-
\$\begingroup\$ You are correct about
getFiles
. The Macro optimization isn't redundant it is not needed. Your Macro speed isn't going to effected by the Application State because you are not writing values to a worksheet or opening any workbooks that will cause recalculations or screen updating. \$\endgroup\$user109261– user1092612017年11月15日 16:31:45 +00:00Commented Nov 15, 2017 at 16:31 -
1\$\begingroup\$ Now consider 3 years done the line your need to modify a large project with many global variables. In order to fully understand the project you will have to know how each of these variables are used in all the functions that use them. Much better to limit the scope of the variables to the procedure that is using it. \$\endgroup\$user109261– user1092612018年02月20日 22:58:03 +00:00Commented Feb 20, 2018 at 22:58
-
1\$\begingroup\$ Yes ,
kv
should be a declared as a local variable in both scopes. \$\endgroup\$user109261– user1092612018年02月21日 06:36:40 +00:00Commented Feb 21, 2018 at 6:36