Need to merge data per column if records in the first column are the same.
I wrote a VBA code and it works but it works very slow as it executes "merge" function per each row no matter if consequent cells are the same or different.
Sub DuplicateValues()
'Declare All Variables:
Dim myCell As Range
Dim myRow As Integer
Dim myRange As Range
Dim myCol As Integer
Dim i As Integer
Dim k As Integer
Dim myFirstRow As Long
Dim myFirstColumn As Long
Dim rngCopy As Range, rngPaste As Range
'Count Number of Rows and Columns:
Application.DisplayAlerts = False
myRow = Range(Cells(3, 1), Cells(3, 1).End(xlDown)).Count + 2 'first two rows are headers
myCol = Range(Cells(2, 1), Cells(2, 1).End(xlToRight)).Count
'Apply formatting to the end of the table:
With ActiveSheet
Set rngCopy = .Range(.Range("A3"), .Cells(myRow, myCol))
With rngCopy.Borders
.LineStyle = xlContinuous
.ColorIndex = 0
.Weight = xlThin
End With
End With
'Merge duplicated cells per Row:
myFirstRow = 3
myFirstColumn = 1
Set iRow = Cells(myFirstRow, myFirstColumn)
Set n = Cells(myFirstRow + 1, myFirstColumn)
For i = 1 To myRow
If iRow <> n Then
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).HorizontalAlignment = xlCenter
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).VerticalAlignment = xlCenter
Range(Cells(myFirstRow, 8), Cells(myFirstRow, 8)).WrapText = True
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).EntireRow.AutoFit
iRow = Cells(myFirstRow + i, myFirstColumn)
n = Cells(myFirstRow + i + 1, myFirstColumn)
Else
n = Cells(myFirstRow + i + 1, myFirstColumn)
For k = 1 To myCol - 3 'need to merge data per column but don't need to merge data in the last 3 columns
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).Merge
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).HorizontalAlignment = xlCenter
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).VerticalAlignment = xlCenter
Range(Cells(myFirstRow + i - 1, 8), Cells(myFirstRow + i, 8)).WrapText = True
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).EntireRow.AutoFit
Next
End If
Next
End Sub
I tried to create a different logic to initiate "merge" by blocks (skip merging the same value cells in the first column right away):
merging only when the "base/initial" cell is not the same as a "check" cell and merge all appropriate cells all together (not by each row), but I can't find a way to make it work.
Would be very grateful for any code optimizations!
2 Answers 2
'Declare All Variables: Dim myCell As Range Dim myRow As Integer Dim myRange As Range Dim myCol As Integer Dim i As Integer Dim k As Integer Dim myFirstRow As Long Dim myFirstColumn As Long Dim rngCopy As Range, rngPaste As Range
Don't do this. This procedure is somewhere between 2 and 3 screens high on my laptop: when I'm at the bottom of the procedure, I don't know what I'm looking at, so I scroll back up to this wall of declarations, parse the chunk of code, locate the variable I'm looking for, then scroll back down to where I was, ...and that gets very annoying, very fast.
People defending this "declare everything at the top of the procedure" habit usually defend it pretty hard, and they're wrong. "I can see everything that's used in the procedure at once" sounds great on paper. Truth is, it only makes you scroll back and forth all the time for no reason, and makes it much harder than it should be, to know whether or not a variable is used, let alone where it's used.
If you declare variables where you first assign them, as you need to introduce them, then you simply can't miss the fact that myCell
, myRange
, rngPaste
are never assigned or referenced anywhere... and that iRow
and n
aren't declared at all. This is extremely bug-prone, you don't want to allow VBA code to run with undeclared variables. Fortunately you can completely prevent this, by simply specifying Option Explicit
at the very top of every module you ever write any VBA code in.
Proper indentation also helps, massively. It started well:
For i = 1 To myRow
If iRow <> n Then
...and then suddenly went way out there:
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).HorizontalAlignment = xlCenter
...to seemingly random:
Else
n = Cells(myFirstRow + i + 1, myFirstColumn)
For k = 1 To myCol - 3 'need to merge data per column but don't need to merge data in the last 3 columns
Use an indenter if you're unsure how to keep indentation consistent.
Row numbers should always be As Long
. An Integer
is a 16-bit signed integer type, so its maximum possible value is 32,767. This means as soon as your data involves row 32,768 and beyond, your code breaks with an "Overflow" run-time error. In fact, there's little to no reason at all to ever declare anything As Integer
in 2019: processors are optimized to deal with 32-bit integer types, and that's a Long
in VBA.
I like this:
With ActiveSheet
I makes ActiveSheet
references explicit, and that's very good.
However you're implicitly referring to whatever the ActiveSheet
is everywhere else outside this With
block.
myRow = Range(Cells(3, 1), Cells(3, 1).End(xlDown)).Count + 2 'first two rows are headers myCol = Range(Cells(2, 1), Cells(2, 1).End(xlToRight)).Count
Unqualified like this, Range
and Cells
are implicitly referring to the ActiveSheet
: the With ActiveSheet
block is therefore redundant, and makes accesses to that active sheet inconsistent: some are explicit, some are implicit - and that's very bug-prone.
A note about this:
myRow = Range(Cells(3, 1), Cells(3, 1).End(xlDown)).Count + 2 'first two rows are headers
Looks like you're trying to get the last used row; this is not a reliable way to do it. Consider flipping the logic around and going up from the last row on the sheet to get the .Row
of that specific cell, rather than the Count
of cells in a range:
myRow = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
Note how the fist two rows being headers, doesn't matter any more.
I'd strongly advise against using my
as a prefix for anything, regardless of how often you see it in documentation. Same goes for rng
and o
and whatever other prefixing scheme you might read anywhere. Use meaningful names instead. "myRow" doesn't say what's important about it: it's the last row with data. lastRow
would already be a much better name for it.
rngCopy
is assigned, its borders are formatted, ...but it's not being copied anywhere. The name strongly suggests it's a source
, especially since it's declared right next to the unused rngPaste
, which strongly suggests that would be a destination
for an operation involving the clipboard. Pretty misleading, since none of that is happening anywhere in the procedure.
I already mentioned n
isn't declared. Its usage is a problem.
Set n = Cells(myFirstRow + 1, myFirstColumn)
It's Set
-assigned to a Range
. Thus, its runtime type at that point is Variant/Range
.
If iRow <> n Then
Now it's being compared to iRow
, which is another undeclared Variant/Range
variable, and the two objects are only comparable through implicit let-coercion involving hidden default member calls - making it all explicit would look like this:
If iRow.Value <> n.Value Then
And while that's now explicit, it's wrong. It's wrong, because they shouldn't be Range
objects. They shouldn't be Range
objects, because they're later being Let
-assigned to plain values:
iRow = Cells(myFirstRow + i, myFirstColumn)
n = Cells(myFirstRow + i + 1, myFirstColumn)
Beyond that point, the data type of iRow
and n
is up in the air. If you're lucky, you're looking at two Variant/Double
values. Otherwise, you're looking at one or more Variant/Error
values (that would be the case if the worksheet contains error values), and everything blows up with a type mismatch error.
Not declaring variables is one thing, reusing variables is another, but reusing undeclared variables and assigning them a new runtime type, makes everything very hard to follow. Declare them with an explicit data type, and let them be values, instead of objects that behave like values thanks to implicit hidden code.
As for performance, the reason it's slow is because it's doing the single slowest thing any code can do: interact with worksheet cells directly, in a loop.
You can't help it, interacting with worksheet cells is precisely what this macro needs to do. All you can do is tell Excel to stop trying to keep up as you modify the worksheet.
Every time a cell is modified, Excel evaluates whether a recalc is needed, and will perform it if it has to. Switch calculation to manual before you start manipulating the worksheet, and toggle it back to automatic when you're done: Excel will only recalculate once. That's Application.Calculation
(set to xlCalculationManual
, then back to xlCalculationAutomatic
when you're ready).
Every time a cell is modified, Excel fires worksheet events (e.g. Worksheet.Change
), and if there's a handler procedure for that event, then that code will run before the next instruction does. Turn off application events before you start manipulating the worksheet, and toggle it back on when you're done: no worksheet events will be fired; that's Application.EnableEvents
(set to False
, then back to True
when you're ready).
Every time a cell is modified, Excel tries to repaint itself. While that's normally pretty fast, when you're modifying cells in a loop you don't want Excel to even try to keep up - you want to make your changes, and have Excel repaint itself when you're done. Do that by toggling Application.ScreenUpdating
(set to False
, then back to True
when you're ready).
Whenever you toggle this global Application
state, make sure you handle runtime errors, and have your error-handling subroutine ensure that the global state gets properly reset whether the procedure succeeds or fails, whatever the reason for failing might be.
-
\$\begingroup\$ Thanks for the detailed explanations on my errors, I just started writing codes on VBA and this is very helpful! Regarding performance, don't you think that if other logic was used, like merging only duplicated cells that are following each other (skipping merging every row) would increase the speed? I just don't know how to make it works that way... \$\endgroup\$Tart– Tart2019年11月22日 20:46:26 +00:00Commented Nov 22, 2019 at 20:46
-
\$\begingroup\$ It probably would, but I'm pretty sure the overhead of Excel recalculating and redrawing stuff every time dwarfs any algorithmic tweaks - best do both (turn off recalcs & repaints and review the algorithm) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2019年11月22日 21:11:04 +00:00Commented Nov 22, 2019 at 21:11
-
\$\begingroup\$ Another question is about
myRow = ws.Range("A" & ws.Rows.Count).End(xlUp).Row
that's a great way of getting the last row but I'm now getting an error: Object variable or With block variable not set \$\endgroup\$Tart– Tart2019年11月22日 21:13:50 +00:00Commented Nov 22, 2019 at 21:13 -
\$\begingroup\$ Is
ws
defined? I only put it there to emphasize thatRange
is a member call that needs to be qualified with a properWorksheet
object. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2019年11月22日 21:14:11 +00:00Commented Nov 22, 2019 at 21:14 -
\$\begingroup\$ Yes, I defined
ws
:Dim ws As Worksheet
\$\endgroup\$Tart– Tart2019年11月22日 21:19:58 +00:00Commented Nov 22, 2019 at 21:19
I tried to create a different logic to initiate "merge" by blocks (skip merging the same value cells in the first column right away):
The best way to do this is count the number of duplicates and then define your range as current row to (current row - (number of duplicates + 1))
(note: you add 1 because the original value is not a duplicate).
Range(Cells(i - (countDups + 1), k), Cells(i, k))
Hard coded values that may change in the future are called Magic Numbers. Replacing these values with an enumeration or constants will make you code easier to read and modify.
For example the number 3 appears appears 4 times in code. Each occurrence is being used to determine the number of columns to exclude. If you later decide that you only need to exclude a 2 columns, you could just change all the 3's to 2. No problem, ease pease. So let's say we did that but decided to we did indeed need the to exclude the 3rd column, no problem just replace the 2's with 3's. But wait there are 2 header rows, now there is a problem.
Before
Cells(myFirstRow, myCol - 3)
After
Cells(myFirstRow, myCol - ColumnOffset)
Anything that is in going to happen If
or Else
does not belong in an If
statement.
If iRow <> n Then
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).HorizontalAlignment = xlCenter
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).VerticalAlignment = xlCenter
Range(Cells(myFirstRow, 8), Cells(myFirstRow, 8)).WrapText = True
Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)).EntireRow.AutoFit
iRow = Cells(myFirstRow + i, myFirstColumn)
n = Cells(myFirstRow + i + 1, myFirstColumn)
Else
n = Cells(myFirstRow + i + 1, myFirstColumn)
For k = 1 To myCol - 3 'need to merge data per column but don't need to merge data in the last 3 columns
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).Merge
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).HorizontalAlignment = xlCenter
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).VerticalAlignment = xlCenter
Range(Cells(myFirstRow + i - 1, 8), Cells(myFirstRow + i, 8)).WrapText = True
Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).EntireRow.AutoFit
Next
End If
The k
loop is disguising the fact that the HorizontalAlignment
, VerticalAlignment
and row height are being adjusted whether iRow <> n
or not. Knowing this we can move these operations outside the scope of the If
statement. This will make it much easier to focus on the if...Else
logic.
If iRow <> n Then iRow = Cells(myFirstRow + i, myFirstColumn) Else For k = 1 To myCol - 3 Range(Cells(myFirstRow + i - 1, k), Cells(myFirstRow + i, k)).Merge Next End If n = Cells(myFirstRow + i + 1, myFirstColumn) Range(Cells(myFirstRow, 8), Cells(myFirstRow, 8)).WrapText = True With Range(Cells(myFirstRow, 1), Cells(myFirstRow, myCol - 3)) .HorizontalAlignment = xlCenter .VerticalAlignment = xlCenter .EntireRow.AutoFit End With
If we think about it a little deeper, these formats are being applied to all cells in our target area. The formatting isn't even essential to merging the cells, which is what we are really trying to do. So we could simply pass the area to be formatted to another subroutine (see code below). This will simplify the main code and make it much easier to focus on the task at hand.
Refactored Code
Sub MergeDuplicateValues()
Const DebugMode As Boolean = True
Dim Target As Range
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Data")
If DebugMode Then
Rem Close previous test workbook
CloseTestWorkbooks
Rem Copy the worksheet to a new workbook
ws.Copy
Set ws = ActiveSheet
End If
Rem Define the Target Range
Set Target = getMergeRange(ws.Range("A1"))
Rem Add breakpoint here and use Target.Select in the Immediate Window to test the range
If Target Is Nothing Then Exit Sub
Application.ScreenUpdating = False
MergeDuplicates Target
ApplyBorders Target.CurrentRegion
ApplyFormatsToMergeArea Target
End Sub
Private Sub MergeDuplicates(Target As Range)
Application.DisplayAlerts = False
Dim r As Long, countDups As Long
With Target
For r = 1 To .Rows.Count
If .Cells(r, 1).Value = .Cells(r + 1, 1).Value Then
countDups = countDups + 1
Else
If Count > 0 Then
Dim Column As Range
Rem Iterate over each column of the Target Range
For Each Column In .Columns
Dim section As Range
Set section = Column.Cells(r - countDups, 1).Resize(countDups + 1)
Rem Add breakpoint here and use section.Select in the Immediate Window to test the range
section.Merge
Next
Count = 0
End If
End If
Next
End With
Application.DisplayAlerts = True
End Sub
Private Function getMergeRange(Target As Range) As Range
Const LastColumnOffset As Long = -3
Const FirstRowOffset As Long = 2
Dim newTarget As Range
With Target.CurrentRegion
Rem An error handler is needed when setting a range using negative offsets
On Error Resume Next
Rem Define the actual Target range
Set newTarget = .Offset(FirstRowOffset).Resize(.Rows.Count - FirstRowOffset, .Columns.Count + LastColumnOffset)
Rem Add breakpoint here and use newTarget.Select in the Immediate Window to test the range
If Err.Number <> 0 Then
Err.Raise Number:=vbObjectError + 513, Description:="Unable to create Merged Range"
Exit Function
End If
On Error GoTo 0
End With
Set getMergeRange = newTarget
End Function
Private Sub ApplyBorders(Target As Range)
With Target.Borders
.LineStyle = xlContinuous
.ColorIndex = 0
.Weight = xlThin
End With
End Sub
Private Sub ApplyFormatsToMergeArea(Target As Range)
Const NumberOfColumns As Long = 8
With Target
.HorizontalAlignment = xlCenter
.VerticalAlignment = xlCenter
.Resize(, NumberOfColumns).WrapText = True
.Rows.EntireRow.AutoFit
End With
End Sub
Private Sub CloseTestWorkbooks()
Dim wb As Workbook
For Each wb In Workbooks
If Len(wb.Path) = 0 Then wb.Close 0
Next
End Sub
With
, it just obfuscates code and makes it less readable. \$\endgroup\$