I've built a VBA Excel macro that works on a spreadsheet of duplicated files, and displays only groups in which at least one duplication is owned by the current user.
The spreadsheet lists file metadata, one file per row, grouped with other files whose content is identical, i.e. "duplicated". Each such duplication group is preceded by a title row in bold.
The macro uses text boldness to determine duplication groups, and checks within each group whether any file's owner (represented in column 7) is the currently logged user. All groups with no file owned by the user get hidden. (auto-filtering would not solve this problem because I want the user to see the other files in the duplication group, i.e. see files from other owners with the same content as theirs)
It works exactly as intended, but too slowly. It takes one or two minutes to run through a spreadsheet about 80 thousand lines long. Anyone can see a way to speed it up?
I had the idea of maybe hiding every row first (which I assume can be done much faster) and then de-hide what should be shown. But I couldn't find a command to mass-hide.
Also, since we're here, what best practices am I missing? I'm a complete novice to VBA, though I'm an experienced programmer in other languages.
Grateful for any help.
'Operates on .xslx report of network file duplication, as output by TreeSize tool.
'Hides all Duplication Groups not involving the currently logged user as a file Owner
'(regular filtering does not work because it hides the other duplicates which have a different owner)
'TODO: speed it up. It currently takes ~1min to run. Probably hide-all then unhide-selected will be faster.
Private Sub My_duplicates_Click()
Dim hidIt As Boolean
Dim foundInGroup As Boolean
Application.ScreenUpdating = False
Application.EnableEvents = False
login = Environ("Username")
ownerCol = 7
rowNum = 3 'skip header
Do While Not IsEmpty(Cells(rowNum, 1))
If Cells(rowNum, ownerCol).Font.Bold = True Then 'bold text indicates the beginning of a duplication group
foundInGroup = False
groupMemberRow = rowNum + 1
Do While Not IsEmpty(Cells(groupMemberRow, 1)) And Not Cells(groupMemberRow, ownerCol).Font.Bold = True
If Cells(groupMemberRow, ownerCol).Value2 = login Then
foundInGroup = True
End If
groupMemberRow = groupMemberRow + 1
Loop
If foundInGroup = False Then
hidIt = True
Else
hidIt = False
End If
Cells(rowNum, ownerCol).EntireRow.Hidden = hidIt
groupMemberRow = rowNum + 1
Do While Not IsEmpty(Cells(groupMemberRow, 1)) And Not Cells(groupMemberRow, ownerCol).Font.Bold = True
Cells(groupMemberRow, ownerCol).EntireRow.Hidden = hidIt
groupMemberRow = groupMemberRow + 1
Loop
rowNum = groupMemberRow - 1 'counting on next increment to place it back on the next group header
End If
rowNum = rowNum + 1
Loop
End Sub
2 Answers 2
Auto Filtering
(auto-filtering would not solve this problem because I want the user to see the other files in the duplication group, i.e. see files from other owners with the same content as theirs)
Auto Filtering could definitely solve this problem. The way to go about it would be to add a helper column, set the values in the column to "Show" if you want to show the column and then filter the helper column by the word "Show". You could even hide this column.
Hiding Rows
I had the idea of maybe hiding every row first (which I assume can be done much faster) and then de-hide what should be shown. But I couldn't find a command to mass-hide.
Yes, this undoubtedly faster. Consider that you have 80K rows and you only need to see 1K rows. Instead of performing 80K + 1 show/hide operations, hiding all the rows and then unhiding the rows you want will reduce the number of operations to 1K + 1. Union()
is much faster than Show/Hide. It would be even faster if you were to Union()
all the ranges that need to be shown together and only have 1 hide operation amd 1 show operation.
Note: Concatenating the Range addresses is considerably faster than Union()
. This can get somewhat complicated because there is a limit to the address length. To maximize performance, use a combination of address concatenation and Union()
.
Option Explicit
MSDN - Option Explicit Statement
Use Option Explicit to avoid incorrectly typing the name of an existing variable or to avoid confusion in code where the scope of the variable is not clear.
One of the best of the "Best Practices" is to always declare and Type your variables. The more information that you give the compiler the faster the compiler can catch errors.
Declaring Constants
By declaring a constant, you can assign a meaningful name to a value. You use the Const statement to declare a constant and set its value. After a constant is declared, it cannot be modified or assigned a new value.
A variable should be used when its value is either not know prior to compiling or will vary throughout the course of code execution. A value that will remain always remain constant should be declared as a such.
Const ownerCol As Long = 7
A Faster Way
Sub ShowUserDuplicates()
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.EnableEvents = False
Const OWNER_COLUMN As Long = 7
Dim cell As Range, VisibleRows As Range
Dim flagUser As Boolean
Dim login As String
login = Environ("Username")
With Worksheets("Sheet1")
'Hide All the Rows
.UsedRange.Offset(2).EntireRow.Hide
'Iterate though the cells in the Owner Column
For Each cell In Range("A3", .Range("A" & .Rows.Count).End(xlUp)).Offset(0, OWNER_COLUMN - 1)
If cell.Font.Bold Then flagUser = cell.Value = login
If flagUser Then
If VisibleRows Is Nothing Then
Set VisibleRows = cell
Else
Set VisibleRows = Union(VisibleRows, cell)
End If
End If
Next
If Not VisibleRows Is Nothing Then VisibleRows.EntireRow.Hidden = False
End With
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = False
Application.EnableEvents = False
End Sub
First off, regarding
But I couldn't find a command to mass-hide.
For me, the snippet WhateverSheet.Range("A3:A80000").Rows.Hidden = True
seems to do the trick.
I see a lot of not fully qualified Range()
s in your code. Be aware that these references always implicitly point to ActiveSheet.Range()
. So if - for whatever reason - the ActiveSheet
changes while your code executes, results might be unexpected. Better would be to use a reference to the sheet you want to modify:
Dim Ws as Worksheet
Set Ws = ThisWorkbook.Worksheets(1) ' ThisWorkbook always refers to the workbook the executing code resides in
'Set Ws = Workbooks("NameOfWorkbook.xlsx").Worksheets("NameOfWorksheet")
'Set Ws = Sheet1 ' this is the worksheet's code name as seen in the VBE (the sheet objects are listed as CodeName (ShownName))
Ws.Range("A1").Value = "A new value for my range"
With Ws
.Range("A2").Value = "With blocks ease access to frequently used parent objects"
.Range("A3").Value = .Range("A2").Value & " - as shown here"
End With
There are some instances of undeclared variables in your code. These open up room for a lot of erros: They will be implicitly declared as Variant
, meaning the compiler won't help you, if you intend to only store a specific data type in them. For example rowNum = "three"
would lead to a generic runtime error 1004 in your code (as an integer
or long
type is expected by the .Cells
property). If you declared Dim rowNum as Long
, rowNum = "three"
would instead lead to more specific Type Mismatch
error.
Specifying Option Explicit
at the top of each module will prevent the usage of undeclared variables (and is highly recommended!). As a sidenote, it will also help you against typos in variable names:
Const HelpText as String = "Enable Option Explicit for an easier coding life"
Debug.Print HlepText ' Compiler will tell you: Variable not defined, making it easy to spot spelling errors
In this case:
Cells(groupMemberRow, ownerCol).Font.Bold = True
you don't need = True
, as the .Bold
property already has the boolean type. In general there is no need for = True
when you are evaluating boolean statements. In other cases, having = True
in there might even be dangerous (= leading to unexpected results), as M.Doerner alludes to:
It means that
someValue = True
does returnFalse
for most values that get implicitly converted toTrue
. This is rather dangerous. So, the recommendation to never compare withTrue
is a very sensible one, not because it returns the same result as the implicit conversion, which it does not, but rather because it leads to unexpected results. E.g.1
converts toTrue
, but1 = True
evaluates toFalse
because the actual comparison performed is1 = - 1 (=CInt(True))
In general I'm not really sure what you could change to improve performance, except maybe to try and load the whole thing into an array and operate on that. See Chip Pearson's explanations on the topic and a link to a very relevant question on SO.
And last but not least, I want to mention the Rubberduck VBE Add-In (Open Source, Free, no admin privileges required, active development), that can provide many useful features to help you be a better coder in VBA, including - but not limited to:
- Indenter (Indent your procedure/module/project with just one click)
- Code inspections (it will tell you about a missing
Option Explicit
, missing declarations and about many more things that can possibly go wrong) - support for Unit Testing
- Rename assistant (renames all occurrences of the selected variable/object)
- AutoComplete (still experimental, but already working quite well)
- ...
(Disclaimer: I'm just a satisfied user who wants to spread the word. It really is worth a try.)
-
\$\begingroup\$ Please note that a comparison with
True
usually results in a different value than an implicit conversion to aBoolean
for all data types butBoolean
andString
. The reason is that for all other data types the valueTrue
gets converted to the other data type, interpreted as-1
, followed by a conversion in that data type, except forByte
, which is treated asInteger
. On the other hand, all nonzero values are converted toTrue
in an implicit conversion toBoolean
. \$\endgroup\$M.Doerner– M.Doerner2018年07月03日 13:40:24 +00:00Commented Jul 3, 2018 at 13:40 -
\$\begingroup\$ Oh, of course. I completely forgot about implicit type conversions. I was only thinking about Boolean type properties. So this means that
someValue = True
does have its place in evaluations, right? If one wants to make sure that only truthy boolean values pass? Edit: And all values thatTrue
can be converted to... (This is somewhat confusing right now.) \$\endgroup\$Inarion– Inarion2018年07月03日 14:11:15 +00:00Commented Jul 3, 2018 at 14:11 -
\$\begingroup\$ @Inarion It means that
someValue = True
does returnFalse
for most values that get implicitly converted toTrue
. This is rather dangerous. So, the recommendation to never compare withTrue
is a very sensible one, not because it returns the same result as the implicit conversion, which it does not, but rather because it leads to unexpected results. E.g.1
converts toTrue
, but1 = True
evaluates toFalse
because the actual comparison performed is1 = - 1 (=CInt(True))
. \$\endgroup\$M.Doerner– M.Doerner2018年07月03日 15:58:42 +00:00Commented Jul 3, 2018 at 15:58 -
\$\begingroup\$ @M.Doerner Now I can follow! So I basically recommended the right thing for the wrong reasons. :D Will update the answer to include your explanation. Thanks! \$\endgroup\$Inarion– Inarion2018年07月04日 05:59:49 +00:00Commented Jul 4, 2018 at 5:59