1
\$\begingroup\$

Would like to have a code review so I can improve my knowledge and expertise in VBA more specifically with Dictionaries as this is the first time I used one.

For context I have duplicate data and need to consolidate creating a unique list. To achieve this I used a Dictionary concatenating 2 fields to make a unique key. Also I have a function that uses Dictionaries to get a unique key so I can find the number of occurrence for each order number and add a suffix at the end if more than 999 rows used for the same order.

Option Explicit
Sub UniqueListWithSumQty()
Dim wsSource As Worksheet
Dim ws As Worksheet
Dim sh As Worksheet
Dim strCell As String
Dim i As Long, j As Long
Dim LRow As Long
Dim dict As Object
Set ws = ActiveSheet
Set dict = CreateObject("Scripting.Dictionary")
With ws
 LRow = FindLastRow(ActiveSheet, 3, 21)
 .Range("C17:U" & LRow).Sort key1:=.Range("C17:C" & LRow), order1:=xlAscending, Header:=xlNo
 wsSource = .Range("C17:U" & LRow).value
End With
With dict
 For i = 1 To UBound(wsSource, 1)
 strCell = Join(Array(wsSource(i, 1), wsSource(i, 3)), "|")
 If Not .Exists(strCell) Then
 .item(strCell) = .Count + 1
 For j = 1 To UBound(wsSource, 2)
 wsSource(.Count, j) = wsSource(i, j)
 Next j
 Else
 wsSource(.item(strCell), 4) = wsSource(.item(strCell), 4) + wsSource(i, 4)
 End If
 Next i
 i = .Count
End With
Set sh = ActiveSheet
sh.Range("C17:U" & LRow).Clear
With sh.Range("C17")
 .Resize(i, UBound(wsSource, 2)) = wsSource
End With
Call CheckOccurance(Range("C17:C" & LRow), LRow, sh)
End Sub
Private Function CheckOccurance(rng As Range, LRow As Long, ws As Worksheet)
 Dim maxOrdNoRow As Double
 Dim timesOccured As Double
 Dim occur As Double
 Dim uniqueList As Object
 Dim i As Long
 Dim data As Variant
 Dim x As Long
 Dim key As Variant
 Dim lastUsedRow As Long
 Set uniqueList = CreateObject("Scripting.Dictionary")
 data = rng.value
 For i = 1 To UBound(data, 1)
 uniqueList(data(i, 1)) = Empty
 Next i
 For Each key In uniqueList
 maxOrdNoRow = Application.WorksheetFunction.CountIf(Range("C17:C" & LRow), "=" & key)
 timesOccured = maxOrdNoRow / 999
 lastUsedRow = ws.Range("C:C").Find(what:=key, after:=ws.Range("C16")).Row
 If timesOccured > 1 Then
 For x = 1 To WorksheetFunction.RoundUp(timesOccured, 0)
 If timesOccured > x Then
 occur = timesOccured - (timesOccured - x)
 Else
 occur = timesOccured
 End If
 ws.Range(Cells(lastUsedRow, 3), Cells((occur * 999) + 16, 3)).value = key & "-" & x
 lastUsedRow = (occur * 999) + 17
 Next x
 Else
 'do nothing
 End If
 Next key
End Function
Mast
13.8k12 gold badges56 silver badges127 bronze badges
asked Apr 13, 2020 at 12:44
\$\endgroup\$
6
  • \$\begingroup\$ If you want people to help you then you have to provide code that is understandable. Therefore your first task is to rework the code you have posted to provide meaningful variable names for third party reviewers (e.g. no i, ii, wsData,sCellData etc). A second point that is becoming increasingly relevant for VBA is, have you addressed all of the issues raised from a RubberDuck code inspection (Rubberduck is a fantastic and free addin for VBA that will do a much stricter check on your code than VBA and give you a list of issues, plus lots of other goodies.). \$\endgroup\$ Commented Apr 13, 2020 at 13:28
  • \$\begingroup\$ Also, a helpful pointer might be that it is perfectly valid for an Item in a Dictionary to be another dictionary i.e If not dict.exists(myKey) then dict.add myKey, new Scripting.Dictionary \$\endgroup\$ Commented Apr 13, 2020 at 13:37
  • 1
    \$\begingroup\$ Your code will not compile and there are numerous problems with it, such as the comma at the end of declaring strCell, wsSource is declared as a Worksheet but you're using it as an array, and the FindLastRow is completely missing. Please provide updated and working code. \$\endgroup\$ Commented Apr 14, 2020 at 15:41
  • \$\begingroup\$ @PeterT the code works perfectly and it dose what it needs I have validated the data and it matches with original except no duplicates and qty is matching as well. \$\endgroup\$ Commented Apr 14, 2020 at 15:44
  • \$\begingroup\$ I can get it to compile if I change the declaration of wsSource to Dim wsSource As Variant and then remove the comma from the fourth declaration line Dim strCell As String,, and then supply my own version of FindLastRow. These are the errors I was referring to, but tends to indicate posted code that is not fully vetted. \$\endgroup\$ Commented Apr 14, 2020 at 15:55

1 Answer 1

5
\$\begingroup\$

Unfortunately the exercise is a bit abstract to me, because I cannot visualize the data you are manipulating. A data sample would have been welcome to better comprehend your purpose.

A few comments anyway

  • I don't see the code for FindLastRow
  • You have multiple references to C17, so it should be defined as a constant variable. It could be named START_CELL or something. Using uppercase is common practice.
  • 999: another magic number that also appears multiple times. It should be made a constant too, MAX_ITERATIONS or something. Or it could be passed as a parameter to CheckOccurance (you already have 3). The aim is to avoid repetition, and leave room for flexibility (because you might want to change that value at some point).
  • Warning: CheckOccurance is a typo
  • You also have more 'arbitrary' values like 3, 4... I suppose they are offsets. Again it would better to use variables instead because they are more descriptive, this will make your code more readable, easier to understand and will reduce the risk of errors, especially if you shift rows/columns later. And here, doing a search & replace on 3 or 4 is not an option.
  • This line has 3 hardcoded values: ws.Range(Cells(lastUsedRow, 3), Cells((occur * 999) + 16, 3)).value = key & "-" & x. And x is not the most meaningful name for a variable.
  • Rather than using ranges like "C17:C" & LRow with hardcoded references it would be better to use named ranges. They are more descriptive and more flexible. Like variables, you define them once, and reuse them multiple times. Why You Should Be Using Named Ranges in Excel
  • Using ActiveSheet is tricky, because the context could change, for example your macro could switch to another sheet or workbook, and ruin your assumptions. It would be safer to retrieve a named reference (from ActiveSheet.Name) to a variable and then use that variable. If you have no sheet selected, this property returns Nothing.
  • If you are concatenating just two cells maybe using an array and JOINing is a bit overblown: strCell = Join(Array(wsSource(i, 1), wsSource(i, 3)), "|"). But the best is to run a benchmark and test performance using different approaches. Don't be afraid to experiment. Good code is code that is efficient while remaining intelligible.
  • I don't understand why you declare your variable like this: Dim wsSource As Worksheet and then use it like this: wsSource = .Range("C17:U" & LRow).value. Did you mean Range ?
  • The choice of data type for your variables is not always optimal: several variables are of type Double. From the doc: "Double (double-precision floating-point) variables are stored as IEEE 64-bit (8-byte) floating-point numbers ranging in value from: -1.79769313486231E308 to -4.94065645841247E-324 for negative values and 4.94065645841247E-324 to 1.79769313486232E308 for positive values". Performance matters when you are doing arithmetic operations in loops.
  • Even Long is more than enough: "Long (long integer) variables are stored as signed 32-bit (4-byte) numbers ranging in value from -2,147,483,648 to 2,147,483,647."
  • FYI: Total number of rows and columns on a worksheet: 1,048,576 rows by 16,384 columns (source: Excel specifications and limits). So when working with columns, Integer is sufficient: "Integer variables are stored as 16-bit (2-byte) numbers ranging in value from -32,768 to 32,767".
  • But but but surprise: "In recent versions, however, VBA converts all integer values to type Long, even if they're declared as type Integer. So there's no longer a performance advantage to using Integer variables; in fact, Long variables may be slightly faster because VBA does not have to convert them." (Source: The Integer Data Types).

Code structure

  • Overall, tabulation is OK except UniqueListWithSumQty
  • Usage of With where appropriate
  • Some line spacing would be desirable, calls to external methods eg. Call CheckOccurance should be highlighted and not drowned with the rest of the code because it is important to be able to follow the processing stream.

  • One problem: lack of comments. There are important for you too, because in a few weeks or months, you will have lost your train of thought and you will have to re-analyze your own code. It is also very helpful to put some data sample in comments when you extract or compute data, to have an idea of what the values look like. For example, when you extract rows from a range, copy-paste a row or a small portion of your range and add it to comment.

  • It is important not just to document the individual steps, but also the general logic of your code, for this you add a few lines of comments at the top of your module.

The code is quite short, but unfortunately it's not as easy to understand as it should be, because there are too many hardcoded values, and the underlying data is not known.


Strategy

Obviously this is what you are interested in and this is where I am going to be the least helpful, because I lack some insight.

First of all, you did not give a clear definition of what qualifies as duplicate data. You've mentioned order numbers but that's the only thing we know. So I am assuming that you are really looking for duplicate order numbers but maybe that's not the whole story. You mention 'concatenating 2 fields' but what are they ?

Maybe all of this was not even necessary, because Excel now has built-in tools to find duplicate data: How to Find Duplicates in Excel. But then, it this was not sufficient for your purpose it should have been made clear and maybe we could propose a better approach.

A COUNTIF might suffice. You even use it in your code. Maybe there is a reason but what was it, I am wondering. Dictionaries are certainly useful, but in the present case ?

answered Apr 14, 2020 at 23:46
\$\endgroup\$
4
  • \$\begingroup\$ thanks for the comments really helpful and insightful. I will address this issues. In essence I have order number and product to create unique identifier so I can add the duplicated keys qty together to consolidate the list without losing data values. I thought dictionaries are the best way to go. If I make the changes to provide more clarity are u able to reassess? Thanks again much appreciated. \$\endgroup\$ Commented Apr 15, 2020 at 7:14
  • \$\begingroup\$ I am not convinced dictionaries are the most efficient approach in this scenario. Doing your own code inevitably adds overhead and is unlikely to perform better than the built-in functions. It's perfectly possible to use COUNTIF with multiple criteria so I think I would have tried this before, and come up with my own sauce only if it is not satisfactory or too cumbersome (PS: using named ranges is always a good idea). \$\endgroup\$ Commented Apr 15, 2020 at 15:23
  • \$\begingroup\$ @QuickSilver not sure what your data looks like exactly, but usually the unique/natural key for order details (this happens to be my domain 😁) involves an order number and an order line number: the product on the order detail level is an attribute of the order line item record, not part of its key. If you have something like an OrderLine column, that's your guy. \$\endgroup\$ Commented Apr 16, 2020 at 1:32
  • \$\begingroup\$ @MathieuGuindon I have a order number and this can have >2000 rows of records for that one order number. Now in this order number for every row you have a product it can be duplicated across multiple rows that are not near to one another but the main thing is that I have unique product for ever order number and sum up their qty. \$\endgroup\$ Commented Apr 16, 2020 at 7:31

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.