8
\$\begingroup\$

I have a VBA script that does the following and I am trying to see if I can have it perform faster than 44 seconds:

  1. start with ~138k rows of data on sheets("Data")
  2. concatenate each cell in the row into a temp string variable
    • temp string will look some like this if my row are columns A:D, "I am cellAI am cell BI am cell CI am cell D"
  3. sort the column holding all temp strings, so I can see all duplicates
  4. filter to first temp string value to get the count of each occurrence
  5. copy count into a sheets("reporting") and hyperlink the count number
  6. create a new sheet that is opened from the hyperlink
    • in the end, after all count of duplicate strings are accounted for, I am creating 345 sheets
  7. copy the filtered results into the newly created sheet
  8. hide the sheet
  9. repeat steps 4 through 8

My question is, based on the amount of work being done, is 38 - 44 seconds reasonable or can it be in any way faster (less than 30 seconds)

Below is the code:

Option Explicit
Sub runReportV2()
 '----------------------------------------------------------------------------------------------------------
 '-V1 code
 ' - allow user to create grouping of fields
 ' - create temp strings of each row
 ' - compare all temp strings with each other
 ' - get count of each duplicate string occurrence and paste count to 'Report Summary' sheet
 '----------------------------------------------------------------------------------------------------------
 '----------------------------------------------------------------------------------------------------------
 '-V2 code
 ' - adding hyperlinks to aggregation count on Report Summary sheet
 ' - linking hyperlinks to a new sheet with filtered row data from data sheet
 '----------------------------------------------------------------------------------------------------------
 'These will help speed things up
 Application.Calculation = xlCalculationManual
 Application.ScreenUpdating = False
 Application.EnableEvents = False
 Application.DisplayStatusBar = False
 ActiveSheet.DisplayPageBreaks = False
 Dim x As Double ' used for the For Loop when creating temp strings
 Dim y As Double ' used for the For Loop when creating temp strings
 Dim tempStr1 As String ' cell value used to concatenate to str1 variable
 Dim str1 As String ' temp string from each cell value for the given row
 Dim aggStr As String ' temp string value used in the while loop
 Dim dataAggCount As Double ' get the last row on the rDataSheet in the while loop
 Dim count As Double: count = 1 ' used to get count of each temp string occurrence
 Dim overallRowCount As Double: overallRowCount = 2 ' this tells me which row to start on my next aggregation
 Dim aggCol As Long ' last column used on the rDataSheet. helps me know where to provide aggregation results (count variable)
 Dim pctdone As Single ' gives the statusBarForm the percentage completion
 Dim reportCount As Double ' used to provide next available row on reportSheet
 Dim sheetarray As Variant ' used to hold the worksheet creation variable. this is done in the while loop
 Dim rDataLastRow As Double ' get last row value when copying filtered data on rDataSheet
 Dim hOverallRowCount As Double ' get the overall row count to know where to paste the data in the sheetarray variable worksheet
 Dim hDataAggCount As Double ' get count of rows on temp string column
 'Variables for worksheets
 Dim rDataSheet As Worksheet: Set rDataSheet = Sheets(1) '!1 Sheet
 Dim reportSheet As Worksheet: Set reportSheet = Sheets(2)
 reportSheet.Name = "Report Summary"
'********** THESE COLLECTION VALUES ARE USER UPDATED ***********
 'Create Collection to hold items that are going to be used in the grouping
 Dim headerColl As New Collection
 headerColl.Add "SIM_c_site_id"
 headerColl.Add "iim_c_FcstName"
 headerColl.Add "iim_c_description"
'*********************************************
 'array to hold all of the column numbers used for each grouping column
 Dim headerArray As Variant
 ReDim headerArray(1 To headerColl.count)
 'variables used to get colum letter
 Dim rFind As Range
 Dim colNum As Long
 Dim z As Long
 'get count of fields (columns) with data
 Dim colCount As Long: colCount = rDataSheet.Cells(1, Columns.count).End(xlToLeft).Column
 For z = 1 To headerColl.count
 'find the needed header from header collection and get the column number
 With rDataSheet.Range(Cells(1, 1), Cells(1, colCount))
 Set rFind = .Find(What:=headerColl(z), LookAt:=xlWhole, MatchCase:=False, SearchFormat:=False)
 If Not rFind Is Nothing Then
 'gives me the column number
 colNum = rFind.Column
 'add column number to headerArray
 If z <> headerColl.count + 1 Then
 headerArray(z) = colNum
 End If
 End If
 End With
 Next z
 Set rFind = Nothing
 'insert header from data sheet to report sheet
 reportSheet.Rows(2).Value = rDataSheet.Rows(1).Value
'------------------------------------------------------------------------------------------------
'------------------------------------------------------------------------------------------------
'
'***This section will need to be updated once the user wants to add more aggregations (columns)***
' 'Alias the aggregation columns and possible the other columns
'
 'insert column for aggregating
 reportSheet.Cells(2, colCount + 1).Value = "nCount"
 'these variables are used for column numbers of the created columns above
 aggCol = colCount + 1
'------------------------------------------------------------------------------------------------
'------------------------------------------------------------------------------------------------
 'column letter conversion for the aggregation column
 Dim aggReportColLetter As String: aggReportColLetter = Col_Letter(aggCol)
 'column letter conversion for the aggregation column
 Dim lastReportColLetter As String: lastReportColLetter = Col_Letter(aggCol - 1)
 'set the progress label and show the form
 statusBarForm.LabelProgress.Width = 0
 statusBarForm.Show
 'update user on progress of script: this is where the temp strings will be produced and sorted
 With statusBarForm
 .LabelCaption.Caption = "Preparing data aggregation..."
 End With
 DoEvents
 'get count of rows on data sheet
 Dim dataRowCount As Double: dataRowCount = rDataSheet.Cells(Rows.count, 1).End(xlUp).Row
 'create tempStr column
 rDataSheet.Cells(1, colCount + 1).Value = "tempStr"
 str1 = vbNullString
 'create temp strings
 For y = 2 To dataRowCount
 For x = 1 To UBound(headerArray)
 tempStr1 = Cells(y, headerArray(x))
 str1 = str1 & tempStr1
 tempStr1 = vbNullString
 Next x
 rDataSheet.Cells(y, aggCol) = str1
 str1 = vbNullString
 Next y
 'create filter for sorting temp string column
 rDataSheet.Range("A1").AutoFilter
 'sort temp string column
 Columns("A:" & aggReportColLetter).Sort key1:=Range(aggReportColLetter & "1"), _
 order1:=xlAscending, Header:=xlYes
'********** THIS IS WHERE THE MAGIC HAPPENS **********
 'SUMMARY:
 ' - filter temp string
 ' - get the count of occurrences of temp string individual
 ' - paste count to 'Report Summary' sheet
 ' - create worksheet and paste aggregated row data results onto each sheet
 ' - do while the the row the temp string is on, is not greater than the overall row count
 Do While overallRowCount < dataRowCount
 'update progress bar percentage
 pctdone = Round((overallRowCount / dataRowCount) * 100, 2)
 With statusBarForm
 .LabelCaption.Caption = "Report Summary is " & pctdone & "%" & " complete."
 .LabelProgress.Width = pctdone * 2.7
 End With
 DoEvents
 rDataSheet.Select
 'row item to copy over to the 'Report Summary' sheet
 aggStr = Cells(overallRowCount, aggCol).Value
 'filter '!1' sheet to aggStr variable
 Range("$A1ドル:$" & aggReportColLetter & "$" & aggCol).AutoFilter Field:=aggCol, Criteria1:=aggStr
 'aggregation count (only counting visible rows)
 count = Application.Subtotal(103, Columns(aggCol)) - 1
 'last used row on the current aggregation
 dataAggCount = rDataSheet.Cells(Rows.count, aggCol).End(xlUp).Row
 'get count of rows on report sheet
 reportCount = reportSheet.Cells(Rows.count, 1).End(xlUp).Row
 With reportSheet
 'add row from data sheet to report sheet
 .Rows(reportCount + 1).Value = rDataSheet.Rows(overallRowCount).Value
 'copy aggregated result to 'Report Summary' sheet
 .Cells(reportCount + 1, aggCol).Value = count
 End With
 'next row to use for copying to 'Report Summary' sheet and aggregating
 overallRowCount = dataAggCount + 1
 aggStr = vbNullString
 'create new worksheet that will open up when the hyperlinked number is clicked
 Set sheetarray = Worksheets.Add(After:=Sheets(Sheets.count))
 sheetarray.Name = "!" & CStr(sheetarray.Index - 1)
'' create hyperlink to sheets created
 reportSheet.Select
 ActiveSheet.Hyperlinks.Add Anchor:=Cells(reportCount + 1, aggCol), Address:="", SubAddress:= _
 "'" & sheetarray.Name & "'!A1", TextToDisplay:=""
 rDataLastRow = rDataSheet.Cells(Rows.count, 1).End(xlUp).Row
 hDataAggCount = rDataSheet.Cells(Rows.count, aggCol - 1).End(xlUp).Row
 hOverallRowCount = hDataAggCount - count + 1
 'copy filtered data from rDataSheet and paste into the newly created sheet
 sheetarray.Select
 sheetarray.Range("A1:" & lastReportColLetter & 1).Value = rDataSheet.Range("A1:" & lastReportColLetter & 1).Value
 sheetarray.Range("A2:" & lastReportColLetter & count + 1).Value = rDataSheet.Range("A" & hOverallRowCount & ":" & lastReportColLetter & rDataLastRow).Value
 'format the sheet
 sheetarray.Range(Cells(1, 1), Cells(1, aggCol - 1)).EntireColumn.AutoFit
 'hide the sheet
 sheetarray.Visible = xlSheetHidden
 rDataSheet.AutoFilterMode = False
 'set the sheet to nothing, so the same variable can dynamically be used again for the next aggregation row
 Set sheetarray = Nothing
 Loop
'********** Clean up the report and close out the routine **********
 'delete the temp string column
 With rDataSheet
 .Columns(aggCol).Delete
 End With
 'auto fit columns on the Report Summary sheet
 With reportSheet
 .Range(Cells(1, 1), Cells(1, aggCol)).EntireColumn.AutoFit
 End With
 'close out of the status bar
 Unload statusBarForm
 MsgBox "Aggregation results are now availabe!", vbOKOnly, "Aggregation Completion"
 'restore order to the Excel world
 Application.ScreenUpdating = True
 Application.EnableEvents = True
 Application.Calculation = xlCalculationAutomatic
 Application.DisplayStatusBar = True
 ActiveSheet.DisplayPageBreaks = True
End Sub
'function that converts a number into a column letter
Function Col_Letter(lngCol As Long) As String
 Dim vArr
 vArr = Split(Cells(1, lngCol).Address(True, False), "$")
 Col_Letter = vArr(0)
End Function
asked Mar 29, 2019 at 18:59
\$\endgroup\$
10
  • 1
    \$\begingroup\$ Welcome to CR! I don't have time for a review right now, but I'd be curious what the performance is if you comment-out the progress indicator code? Consider updating progress once every x% of total, rather than at every single iteration. \$\endgroup\$ Commented Mar 29, 2019 at 19:21
  • \$\begingroup\$ @MathieuGuindon Thanks! I commented out the progress indicator code, and on the second run, commented out the statusBarForm as a whole and it surprisingly ran on average 2.5 second slower between both runs. For the change in progress update, are you thinking something like if overallRowCount / dataRowCount is equal to x% then update the progress indicator ? \$\endgroup\$ Commented Mar 29, 2019 at 20:16
  • 1
    \$\begingroup\$ Surprising indeed... did you leave the DoEvents in? As for the update, consider If overallRowCount Mod 100 = 0 Then UpdateProgress -- regarding the progress indicator itself, you might be interested to read this article I wrote a while back (the original code is somewhere on this site!) \$\endgroup\$ Commented Mar 29, 2019 at 20:33
  • 2
    \$\begingroup\$ Don't have time for a review at the moment. Most of your loops can be accomplished through an array, rather than switching between the Excel model and the VBA model - this will have a big impact on performance. \$\endgroup\$ Commented Mar 29, 2019 at 20:44
  • 1
    \$\begingroup\$ Welcome to Code Review! The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles. \$\endgroup\$ Commented Apr 1, 2019 at 8:40

3 Answers 3

6
\$\begingroup\$

In this review, I am only looking at options for addressing the loops. With the number of rows you have described, finding efficiencies in the loops should have a big payoff.

Don't collapse lines of code using ":", it is not necessary, breaks indenting and makes it harder to find some lines.

Loop 1

'get count of fields (columns) with data
Dim colCount As Long
colCount = rDataSheet.Cells(1, Columns.count).End(xlToLeft).Column
For z = 1 To headerColl.count
 'find the needed header from header collection and get the column number
 With rDataSheet.Range(Cells(1, 1), Cells(1, colCount))
 Set rFind = .Find(What:=headerColl(z), LookAt:=xlWhole, MatchCase:=False, SearchFormat:=False)
 If Not rFind Is Nothing Then
 'gives me the column number
 colNum = rFind.Column
 'add column number to headerArray
 If z <> headerColl.count + 1 Then
 headerArray(z) = colNum
 End If
 End If
 End With
Next z

You use the Excel model with Range, .Find, .Column. With your current example, this would only loop 3 times, so is not significant in terms of performance. However, this presents a good training opportunity.

You work with a single range(rDataSheet.Range(Cells(1, 1), Cells(1, colCount))), but you set that range each time in the loop. If you were using Excel objects, you coud set the With statement outside of the loop and save some performance time there.

But you are only working with the values, so this allows us to use Arrays.

Dim dataValues as Variant
Dim hCollValue as Variant ' Must be variant to work in a for each loop
dataValues = rDataSheet.Range(Cells(1, 1), Cells(1, colCount)).Value
'For each hCollValue in headerColl ' was For z = 1 To headerColl.count
For z = 1 To headerColl.count
 'find the needed header from header collection and get the column number
 For i = LBound(dataValues) to UBound(dataValues)
 If UCase(CStr(dataValues(i,1))) = UCase(CStr(headerColl(z)) Then
 headerArray(z) = i
 Exit For
 End If
 Next i
Next z

Iterating through the entire loop to find the one value (noting that I exit when the first one is found) can be cheaper than calling the equivalent Excel function. You can see now that I am not touching Excel at all during that loop. Because of the use of Variants, I have cast them to strings, and taken the UpperCase to conduct a case-insensitive search.

The array returned by a Range of values is always two-dimensional. Because we are taking it from a single column, the array is only one wide, hence why I have used dataValues(i,1)

Point to note in your original loop:

 'If z <> headerColl.count + 1 Then ' This can never be false, because you are in a loop
 ' headerArray(z) = colNum
 'End If

Another thing to consider is the use of a custom class that acts as a data structure. In that way, you could:

For Each MyCustomClass in headerColl
 ....
 If dV = MyCustomClass.HeaderTitle Then
 MyCustomClass.ColumnNumber = i
 End If
 ....
Next MyCustomClass

Could be a real game changer if you tie lots of data or logic to these items. I suggest a Class and not a Type because you cannot iterate over a Type collection in VBA and there are some other wrinkles.

Loop 2

'create temp strings
For y = 2 To dataRowCount
 For x = 1 To UBound(headerArray)
 tempStr1 = Cells(y, headerArray(x))
 str1 = str1 & tempStr1
 tempStr1 = vbNullString
 Next x
 rDataSheet.Cells(y, aggCol) = str1
 str1 = vbNullString
Next y

This is where you are going to get the real performance hit. I am also finding it difficult to unpack the loop and what you are trying to achieve. If I am interpreting this right, you are creating a temporary string out of the values across the row (selected columns only), and putting this string into another column on the same row. Except in a different sheet.

Note: Always use qualified ranges, as you can't really tell which is the active sheet once the code is running.

Dim sourceStrings as Variant ` this will be a multi-dimensional array
Dim targetArray(dataRowCount - 2 + 1, 1) as String
Dim unionRange as Range
Dim r as Long, r2 as Long
With [ThisSheet] ' whatever you have set this sheet too - qualify all ranges.
 For r = LBound(headerArray) to UBound(headerArray)
 If unionRange is Nothing Then
 set unionRange = .Range(.Cells(2, headerArray(r)),.Cells(dataRowCount, headerArray(r)) )
 Else
 set unionRange = Union(unionRange, .Range(.Cells(2, headerArray(r)),.Cells(dataRowCount, headerArray(r))))
 End If
 Next r
End With 
sourceStrings = unionRange.Value
For r = LBound(sourceStrings,1) to UBound(sourceStrings,1) ' loop through the first dimension - but "1" is optional default and not really needed here.
 targetArray(r) = vbNullString '""
 For r2 = LBound(sourceStrings,2) to UBound(sourceStrings,2) ' loop through the second dimension
 targetArray(r) = targetArray(r) & sourceStrings(r, r2)
 Next r2
Next r 
With rDataSheet
 .Range(.Cells(2,aggCol),.Cells(dataRowCount,aggCol)).Value = targetArray
End With 

The first r loop seems a bit complicated, but it is short (3 iterations in your example) and it now sets up the quicker array. DISCLAIMER: I have not tested this. Possible may require some tweaking if Excel does funky things with values from a multi-area range.

Instead of switching in and out of Excel (headers * rowcount + rowcount) times, you would only do it (headers + insert values) times - which in this case is about 4 times.

Loop 3

Sometimes, there is not much that can be done. I have had a quick look, but I don't think using Arrays here is going to help much because of the diverse amount of data and Excel object items (not just .Value) that are used. Avoid using .Select.

answered Mar 30, 2019 at 5:55
\$\endgroup\$
3
  • \$\begingroup\$ To answer your question on Loop 2: what I am doing is grabbing all cell values for the first row, concatenating them into a temp string, pasting them into the same row, but in a new column (taking the last column index +1 to know where to input the "temp string" column) and then repeating for each row. This is all done on the same sheet.With this amount of rows, it is currently taking ~3.5 seconds to complete. I am using a column to store the temp strings, so I can then sort them to get my duplicates in chunks. This helps not having to take the each temp string and check that against every row \$\endgroup\$ Commented Apr 2, 2019 at 21:19
  • \$\begingroup\$ To answer your question on Loop 3: I was initially trying to avoid using .Select, but I ran into errors. Mainly when trying to create the hyperlink ActiveSheet.Hyperlinks.Add Anchor:=Cells(reportCount + 1, aggCol), Address:="", SubAddress:= "'" & sheetarray.Name & "'!A1", TextToDisplay:="" and in this line of code when using the "sheetarray" sheet sheetarray.Range(Cells(1, 1), Cells(1, aggCol - 1)).EntireColumn.AutoFit When I used the sheetname.Select first, the errors disappeared. \$\endgroup\$ Commented Apr 2, 2019 at 21:23
  • \$\begingroup\$ To respond to your Loop 1: The code I was using took milliseconds to complete, so I wasn't too concerned with this section of the code. I did, though, use the block of code you provided, as it looks a lot cleaner than what I was using. \$\endgroup\$ Commented Apr 2, 2019 at 21:26
4
\$\begingroup\$

I like the project but ...

Separation of Concerns

It is best to keep your procedure under 40 lines. Generally speaking, it is best to identify each task that is to be performed, divide the tasks up and have subprocedures that process one or two of them at a time. The main method would be responsible for passing data between the methods as parameters. It is much easier to debug and modify a small block of code then it is to run complex subroutine before you can test a code block.

Qualify References

You should avoid selecting and activating objects whenever possible. Using fully qualified ranges will make the code more efficient and less fragile.

This line fails if rDataSheet is not the ActiveSheet because of the cells within the range not being qualified to rDataSheet. They are referencing the cells on the ActiveSheet.

With rDataSheet.Range(Cells(1, 1), Cells(1, colCount))

Here is how to properly reference ranges:

With rDataSheet
 .Range(.Cells(1, 1), .Cells(1, colCount))

Generating Unique Keys

It is important to use a delimiter when creating a key.

Generating keys for the table below without using a delimiter only yields two unique keys, even though, all rows are unique.

+----------+----------+ | Column A | Column B | +----------+----------+ | 12 | 34 | +----------+----------+ | 123 | 4 | +----------+----------+ | 1234 | | +----------+----------+ | ABC | DF | +----------+----------+

User Experience (UX)

  • Hyperlinking to a hidden sheet. I'm guessing you'll fix this using the Worksheet_FollowHyperlink event
  • 330 hidden WorkSheets? You probably just delete them between runs but wouldn't it be easier to create a new Workbook for the report?

Naming Convention

  • headerColl: This is obviously the Column Headers. Oh, my mistake, it is the columns used to generate unique keys. But doesn't "concatenate each cell in the row into a temp string variable" suggest that each cell in the row is part of the unique key? This explains why there is a worksheet for each key. Otherwise, all the rows per each key worksheet would be identical. Anyway, I would use keyColumns.
  • rDataLastRow: DataLastRow
  • rDataSheet: DataSheet, wsData
  • tempStr1: There is value in being able to watch tempStr1 in the Locals Window or add a watch. But in my opinion, it is just clutter. Helper variables should be used to make the code more readable. This looks easier to read to me:

    For x = 1 To UBound(headerArray)
     str1 = str1 & Cells(y, headerArray(x))
    Next x
    

Can it be done faster?

Hell yeah. Altough, not 100% to specs, this code is over 8 times faster.

Option Explicit
Private Const Delimiter As String = "|"
Sub Main()
 Dim t As Double: t = Timer
 Application.ScreenUpdating = False
 Dim groups As New Scripting.Dictionary, subDic As Scripting.Dictionary
 Set groups = getRowsGroupedByDuplicateKeyColumns(ThisWorkbook.Worksheets(1), 1, 2, 3, 4)
 Dim wbReport As Workbook
 Set wbReport = CreateReport(groups)
 Dim key As Variant
 For Each key In groups
 Set subDic = groups(key)
 AddDuplicatesWorksheet wbReport, subDic
 Next
 Debug.Print Round(Timer - t, 2)
End Sub
Private Function CreateReport(ByRef groups As Scripting.Dictionary) As Workbook
 Dim wb As Workbook
 Set wb = Workbooks.Add
 Dim subDic As Scripting.Dictionary
 Dim key As Variant, results As Variant
 For Each key In groups
 Set subDic = groups(key)
 '.......
 Next
 Set CreateReport = wb
End Function
Private Sub AddDuplicatesWorksheet(wbReport As Workbook, subDic As Scripting.Dictionary)
 Dim key As Variant, results() As Variant, rowData() As Variant
 Dim r As Long, c As Long
 For Each key In subDic
 rowData = subDic(key)
 If r = 0 Then ReDim results(1 To subDic.count, 1 To UBound(rowData) + 1)
 r = r + 1
 results(r, 1) = key
 For c = 1 To UBound(rowData)
 results(r, c + 1) = rowData(c)
 Next
 Next
 With wbReport.Worksheets.Add
 .Range("A1").Resize(UBound(results), UBound(results, 2)).Value = results
 End With
End Sub
Private Function getRowsGroupedByDuplicateKeyColumns(ByRef ws As Worksheet, ParamArray KeyColumns() As Variant) As Scripting.Dictionary
 Dim dic As New Scripting.Dictionary
 Dim data() As Variant
 With ws
 data = .Range(.Range("A1", .Range("A1").End(xlToRight)), .Range("A1", .Cells(.Rows.count, 1).End(xlUp))).Value
 End With
 Dim key As Variant, keyData() As Variant, rowData() As Variant
 Dim r As Long, c As Long, keyIndex As LongPtr
 ReDim keyData(0 To UBound(KeyColumns))
 ReDim rowData(1 To UBound(data, 2))
 For r = 2 To UBound(data)
 For c = 0 To UBound(KeyColumns)
 keyIndex = KeyColumns(c)
 keyData(c) = data(r, keyIndex)
 Next
 For c = 1 To UBound(data, 2)
 rowData(c) = data(r, c)
 Next
 key = Join(keyData, Delimiter)
 If Not dic.Exists(key) Then dic.Add key, New Scripting.Dictionary
 dic(key).Add r, rowData
 Next
 Set getRowsGroupedByDuplicateKeyColumns = dic
End Function
answered Apr 1, 2019 at 11:27
\$\endgroup\$
4
\$\begingroup\$

With this review, I hope to offer an alternative approach to your stated problem by using a SQL query.

Instead of copying your data into subdivided sheets, I think a better fit to your stated problem would be simply query or filter the data you want to find, when you need it. Leave your raw data as is, and just pull it up on demand when needed. I chose to use ADODB with a SQL statement, but you could achieve something very similar with AutoFilter or Advanced Filter too.

How is this better? (IMO)

  • By copying your unique groups into new sheets, you are effectively doubling your raw data size. With this approach, your raw data remains untouched, you just summarize it.
  • You don't need to create 300+ sheets, you'd only ever need 2 sheets (given the stated problem). A summary sheet and a raw data sheet. A lot easier to debug is something goes awry.
  • No column concatenation is needed with this approach. Instead of making a composite key (of sorts) with 4 columns of joined data, simply filter (or query) the columns with the values you need for each column. BTW, concatenating all columns again, doubles the size of your data...again.

I've mocked up a spreadsheet with 150,000 rows of data in Worksheet called Raw Data. This sheet has 4 columns of randomly generated single characters to mock up what you described.

Raw Data

You'll also need a sheet named Summary, this is where the data is output to.

The way this work is it will find all matching rows that match the parameters you supply for Column 1 through 4. Each Column value needs to match what you provided to get returned.

The code below has querying approach built out. I didn't create a form/UI to pass in values, however that should be fairly easy to do now, just update the SearchParameters type, you can do this in the CreateView sub. The performance is pretty good, queries are taking less than 2 seconds to finish on my machine.

Let me know if there are any questions, happy to help.

Code

Option Explicit
Private Const adCmdText As Long = 1
Private Const adVarWChar As Long = 202
Private Const adParamInput As Long = 1
Public Type SearchParameters
 Column1Value As String
 Column2Value As String
 Column3Value As String
 Column4Value As String
End Type
Private Function GetExcelConnection() As Object
 Set GetExcelConnection = CreateObject("ADODB.Connection")
 GetExcelConnection.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & _
 ThisWorkbook.FullName & ";Extended Properties='Excel 12.0;HDR=YES';"
 GetExcelConnection.Open
End Function
Private Sub DisplayFilteredRawData(SQLParameters As SearchParameters)
 Const SQL As String = "SELECT [Column 1],[Column 2],[Column 3],[Column 4] " & _
 "FROM [Raw Data$] " & _
 "WHERE [Column 1] = ? and [Column 2] = ? and [Column 3] = ? and [Column 4] = ?"
 Static dbConnection As Object
 Static OutputSheet As Excel.Worksheet
 Static OutputRange As Excel.Range
 Static RecordCount As Excel.Range
 Dim FilteredRS As Object
 Const MaxCellLength As Long = 32767
 Const NumberOfHeaderRows As Long = 4
 If OutputSheet Is Nothing Then Set OutputSheet = ThisWorkbook.Sheets("Summary")
 If OutputRange Is Nothing Then Set OutputRange = OutputSheet.Range("A5:F100000") 'Where data is output
 If RecordCount Is Nothing Then Set RecordCount = OutputSheet.Range("F4") 'Where the record count goes
 If dbConnection Is Nothing Then Set dbConnection = GetExcelConnection
 With CreateObject("ADODB.Command")
 .ActiveConnection = dbConnection
 .CommandType = adCmdText
 .CommandText = SQL
 .Parameters.Append .CreateParameter("@Value1", adVarWChar, adParamInput, MaxCellLength, SQLParameters.Column1Value)
 .Parameters.Append .CreateParameter("@Value2", adVarWChar, adParamInput, MaxCellLength, SQLParameters.Column2Value)
 .Parameters.Append .CreateParameter("@Value3", adVarWChar, adParamInput, MaxCellLength, SQLParameters.Column3Value)
 .Parameters.Append .CreateParameter("@Value4", adVarWChar, adParamInput, MaxCellLength, SQLParameters.Column4Value)
 Set FilteredRS = .Execute
 End With
 OutputRange.Clear
 If Not FilteredRS Is Nothing Then
 OutputSheet.Range(OutputRange.Cells(1, 1).Address).CopyFromRecordset FilteredRS
 End If
 RecordCount.Value2 = OutputSheet.Range("A1048576").End(xlUp).Row - NumberOfHeaderRows
End Sub
Public Sub CreateView()
 Dim myTimer As Double: myTimer = Timer
 Dim mySearchParameters As SearchParameters
 With mySearchParameters
 .Column1Value = "l"
 .Column2Value = "o"
 .Column3Value = "l"
 .Column4Value = "z"
 End With
 DisplayFilteredRawData mySearchParameters
 Debug.Print Timer - myTimer
End Sub
answered Mar 31, 2019 at 16:38
\$\endgroup\$
4
  • \$\begingroup\$ @AJD. I updated the answer slightly. I mention in my post there isn't a UI built out, but how to update the parameters for the query. \$\endgroup\$ Commented Apr 1, 2019 at 17:00
  • \$\begingroup\$ Pt.1 -This a thoughtful approach, but I don't think that this fits with what I am needing to do. The amount of rows on the sheet and the kind of data that will be provided on the "DataSheet" will vary each time the user pulls this data from a website and exports it to this Excel sheet, which houses the VBA script. With that being said, my goal when the user exports and opens the Excel doc is to get a summary sheet of all duplicate values based off of the grouping (set by the admin in the VBA code - Collection data), their count and a hyperlink to all the data when clicking the count hyperlink. \$\endgroup\$ Commented Apr 2, 2019 at 21:34
  • \$\begingroup\$ Pt. 2 - The hyperlink would then open up their respective sheet with all of the data that pertains to the grouping used. I may have 15 columns in the raw data, but only want to group by 4 columns. The hyperlink count, let's say is 10, would give me the 10 rows of data that pertain to my grouping. \$\endgroup\$ Commented Apr 2, 2019 at 21:37
  • \$\begingroup\$ Gotcha, if you want to return everything (e.g. all columns). You can change this part SELECT [Column 1],[Column 2],[Column 3],[Column 4] to SELECT *. This approach should work regardless of how many rows are returned. \$\endgroup\$ Commented Apr 2, 2019 at 22:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.