Would like to have a code review so I can improve my knowledge and expertise in VBA more specifically with Dictionaries as this is the first time I used one.
For context I have duplicate data and need to consolidate creating a unique list. To achieve this I used a Dictionary concatenating 2 fields to make a unique key. Also I have a function that uses Dictionaries to get a unique key so I can find the number of occurrence for each order number and add a suffix at the end if more than 999 rows used for the same order.
Option Explicit
Sub UniqueListWithSumQty()
Dim wsSource As Worksheet
Dim ws As Worksheet
Dim sh As Worksheet
Dim strCell As String
Dim i As Long, j As Long
Dim LRow As Long
Dim dict As Object
Set ws = ActiveSheet
Set dict = CreateObject("Scripting.Dictionary")
With ws
LRow = FindLastRow(ActiveSheet, 3, 21)
.Range("C17:U" & LRow).Sort key1:=.Range("C17:C" & LRow), order1:=xlAscending, Header:=xlNo
wsSource = .Range("C17:U" & LRow).value
End With
With dict
For i = 1 To UBound(wsSource, 1)
strCell = Join(Array(wsSource(i, 1), wsSource(i, 3)), "|")
If Not .Exists(strCell) Then
.item(strCell) = .Count + 1
For j = 1 To UBound(wsSource, 2)
wsSource(.Count, j) = wsSource(i, j)
Next j
Else
wsSource(.item(strCell), 4) = wsSource(.item(strCell), 4) + wsSource(i, 4)
End If
Next i
i = .Count
End With
Set sh = ActiveSheet
sh.Range("C17:U" & LRow).Clear
With sh.Range("C17")
.Resize(i, UBound(wsSource, 2)) = wsSource
End With
Call CheckOccurance(Range("C17:C" & LRow), LRow, sh)
End Sub
Private Function CheckOccurance(rng As Range, LRow As Long, ws As Worksheet)
Dim maxOrdNoRow As Double
Dim timesOccured As Double
Dim occur As Double
Dim uniqueList As Object
Dim i As Long
Dim data As Variant
Dim x As Long
Dim key As Variant
Dim lastUsedRow As Long
Set uniqueList = CreateObject("Scripting.Dictionary")
data = rng.value
For i = 1 To UBound(data, 1)
uniqueList(data(i, 1)) = Empty
Next i
For Each key In uniqueList
maxOrdNoRow = Application.WorksheetFunction.CountIf(Range("C17:C" & LRow), "=" & key)
timesOccured = maxOrdNoRow / 999
lastUsedRow = ws.Range("C:C").Find(what:=key, after:=ws.Range("C16")).Row
If timesOccured > 1 Then
For x = 1 To WorksheetFunction.RoundUp(timesOccured, 0)
If timesOccured > x Then
occur = timesOccured - (timesOccured - x)
Else
occur = timesOccured
End If
ws.Range(Cells(lastUsedRow, 3), Cells((occur * 999) + 16, 3)).value = key & "-" & x
lastUsedRow = (occur * 999) + 17
Next x
Else
'do nothing
End If
Next key
End Function
1 Answer 1
Unfortunately the exercise is a bit abstract to me, because I cannot visualize the data you are manipulating. A data sample would have been welcome to better comprehend your purpose.
A few comments anyway
- I don't see the code for
FindLastRow
- You have multiple references to C17, so it should be defined as a constant variable. It could be named START_CELL or something. Using uppercase is common practice.
- 999: another magic number that also appears multiple times. It should be made a constant too, MAX_ITERATIONS or something. Or it could be passed as a parameter to
CheckOccurance
(you already have 3). The aim is to avoid repetition, and leave room for flexibility (because you might want to change that value at some point). - Warning:
CheckOccurance
is a typo - You also have more 'arbitrary' values like 3, 4... I suppose they are offsets. Again it would better to use variables instead because they are more descriptive, this will make your code more readable, easier to understand and will reduce the risk of errors, especially if you shift rows/columns later. And here, doing a search & replace on 3 or 4 is not an option.
- This line has 3 hardcoded values:
ws.Range(Cells(lastUsedRow, 3), Cells((occur * 999) + 16, 3)).value = key & "-" & x
. Andx
is not the most meaningful name for a variable. - Rather than using ranges like
"C17:C" & LRow
with hardcoded references it would be better to use named ranges. They are more descriptive and more flexible. Like variables, you define them once, and reuse them multiple times. Why You Should Be Using Named Ranges in Excel - Using
ActiveSheet
is tricky, because the context could change, for example your macro could switch to another sheet or workbook, and ruin your assumptions. It would be safer to retrieve a named reference (fromActiveSheet.Name
) to a variable and then use that variable. If you have no sheet selected, this property returns Nothing. - If you are concatenating just two cells maybe using an array and JOINing is a bit overblown:
strCell = Join(Array(wsSource(i, 1), wsSource(i, 3)), "|")
. But the best is to run a benchmark and test performance using different approaches. Don't be afraid to experiment. Good code is code that is efficient while remaining intelligible. - I don't understand why you declare your variable like this:
Dim wsSource As Worksheet
and then use it like this:wsSource = .Range("C17:U" & LRow).value
. Did you meanRange
? - The choice of data type for your variables is not always optimal: several variables are of type Double. From the doc: "Double (double-precision floating-point) variables are stored as IEEE 64-bit (8-byte) floating-point numbers ranging in value from: -1.79769313486231E308 to -4.94065645841247E-324 for negative values and 4.94065645841247E-324 to 1.79769313486232E308 for positive values". Performance matters when you are doing arithmetic operations in loops.
- Even Long is more than enough: "Long (long integer) variables are stored as signed 32-bit (4-byte) numbers ranging in value from -2,147,483,648 to 2,147,483,647."
- FYI: Total number of rows and columns on a worksheet: 1,048,576 rows by 16,384 columns (source: Excel specifications and limits). So when working with columns, Integer is sufficient: "Integer variables are stored as 16-bit (2-byte) numbers ranging in value from -32,768 to 32,767".
- But but but surprise: "In recent versions, however, VBA converts all integer values to type Long, even if they're declared as type Integer. So there's no longer a performance advantage to using Integer variables; in fact, Long variables may be slightly faster because VBA does not have to convert them." (Source: The Integer Data Types).
Code structure
- Overall, tabulation is OK except
UniqueListWithSumQty
- Usage of With where appropriate
Some line spacing would be desirable, calls to external methods eg.
Call CheckOccurance
should be highlighted and not drowned with the rest of the code because it is important to be able to follow the processing stream.One problem: lack of comments. There are important for you too, because in a few weeks or months, you will have lost your train of thought and you will have to re-analyze your own code. It is also very helpful to put some data sample in comments when you extract or compute data, to have an idea of what the values look like. For example, when you extract rows from a range, copy-paste a row or a small portion of your range and add it to comment.
- It is important not just to document the individual steps, but also the general logic of your code, for this you add a few lines of comments at the top of your module.
The code is quite short, but unfortunately it's not as easy to understand as it should be, because there are too many hardcoded values, and the underlying data is not known.
Strategy
Obviously this is what you are interested in and this is where I am going to be the least helpful, because I lack some insight.
First of all, you did not give a clear definition of what qualifies as duplicate data. You've mentioned order numbers but that's the only thing we know. So I am assuming that you are really looking for duplicate order numbers but maybe that's not the whole story. You mention 'concatenating 2 fields' but what are they ?
Maybe all of this was not even necessary, because Excel now has built-in tools to find duplicate data: How to Find Duplicates in Excel. But then, it this was not sufficient for your purpose it should have been made clear and maybe we could propose a better approach.
A COUNTIF
might suffice. You even use it in your code. Maybe there is a reason but what was it, I am wondering.
Dictionaries are certainly useful, but in the present case ?
-
\$\begingroup\$ thanks for the comments really helpful and insightful. I will address this issues. In essence I have order number and product to create unique identifier so I can add the duplicated keys qty together to consolidate the list without losing data values. I thought dictionaries are the best way to go. If I make the changes to provide more clarity are u able to reassess? Thanks again much appreciated. \$\endgroup\$QuickSilver– QuickSilver2020年04月15日 07:14:00 +00:00Commented Apr 15, 2020 at 7:14
-
\$\begingroup\$ I am not convinced dictionaries are the most efficient approach in this scenario. Doing your own code inevitably adds overhead and is unlikely to perform better than the built-in functions. It's perfectly possible to use
COUNTIF
with multiple criteria so I think I would have tried this before, and come up with my own sauce only if it is not satisfactory or too cumbersome (PS: using named ranges is always a good idea). \$\endgroup\$Kate– Kate2020年04月15日 15:23:48 +00:00Commented Apr 15, 2020 at 15:23 -
\$\begingroup\$ @QuickSilver not sure what your data looks like exactly, but usually the unique/natural key for order details (this happens to be my domain 😁) involves an order number and an order line number: the product on the order detail level is an attribute of the order line item record, not part of its key. If you have something like an
OrderLine
column, that's your guy. \$\endgroup\$Mathieu Guindon– Mathieu Guindon2020年04月16日 01:32:00 +00:00Commented Apr 16, 2020 at 1:32 -
\$\begingroup\$ @MathieuGuindon I have a order number and this can have >2000 rows of records for that one order number. Now in this order number for every row you have a product it can be duplicated across multiple rows that are not near to one another but the main thing is that I have unique product for ever order number and sum up their qty. \$\endgroup\$QuickSilver– QuickSilver2020年04月16日 07:31:11 +00:00Commented Apr 16, 2020 at 7:31
strCell
,wsSource
is declared as aWorksheet
but you're using it as an array, and theFindLastRow
is completely missing. Please provide updated and working code. \$\endgroup\$wsSource
toDim wsSource As Variant
and then remove the comma from the fourth declaration lineDim strCell As String,
, and then supply my own version ofFindLastRow
. These are the errors I was referring to, but tends to indicate posted code that is not fully vetted. \$\endgroup\$