Slow VBA macro using nested loops and autofilter to consolidate select data from 2 worksheets into 1
I have two large worksheets that I need to consolidate select data into one worksheet. Both worksheets contain about 80K+ rows, the output is expected to be in that range as well. At this point in time, the code works, but it is extremely slow. I don't actually know how long it actually takes to run on the full data set. I have let is run for 8+ hours without having it finish.
The procedure uses a for each
loop to move through all the rows of the primary worksheet wsICD10
and selects all rows of concern (based on the criteria in the if
statement) and logs the LOS_Group
for use later in the VBA autofilter of the wsDUNST15
worksheet. I use two nested for
loops to exhaustively extract all the data from the wsDUNST15
. Once all my variables are full I write them to my new workbook and move on to the next record of interest in my wsICD10
with the for each
loop.
My guess is the combination of nested loops and auto-filtering the large worksheets is an inefficient way to complete my task. But I'm unaware of another way to select a worksheet row using multiple criteria.
Public Sub ICD10DataSet()
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Create ICD-10-CM/PCS data set
'
'ICD10_Type | ageCategory | DiagnosisProcedure_Category | ICD10_Code | LOS_Group | AVG_Stay
'
'
' This will use Truven October, 2017 data files
'
' By Trevor Pye
'
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Dim wb As Workbook
Dim wbData As Workbook
Dim ICD10_Data As Worksheet
Dim wsICD10 As Worksheet
Dim wsDUNST15 As Worksheet
Dim ICD_Cell As Range
Dim ICD_Range As Range
Dim ICD10_Type As String
Dim AgeCat As Integer
Dim diagProced_Cat As Integer
Dim ICD10_Code As String
Dim LOS_Group As String
Dim AVG_Stay As Double
Dim startTime As Double
Application.ScreenUpdating = False
startTime = Time
Set wb = Workbooks("LOS_WorkBookICD-10_201710.xlsm")
Set wbData = Workbooks.Add
Set wsICD10 = wb.Worksheets("ICD10Full")
Set wsDUNST15 = wb.Worksheets("DUNST15")
Set wsOUNST15 = wb.Worksheets("OUNST15")
Set ICD_Range = wsICD10.Range("A2", wsICD10.Range("A1048576").End(xlUp))
Set ICD_Cell = wsICD10.Range("A2")
Set ICD10_Data = wbData.Sheets(1)
ICD10_Data.Range("A1") = Format(startTime, "HH:MM:SS")
'---- Start Header with block -------
With ICD10_Data
.Cells(2, 1).Value = "ICD10_Type"
.Cells(2, 2).Value = "ageCategory"
.Cells(2, 3).Value = "DiagnosisProcedure_Category"
.Cells(2, 4).Value = "ICD10_Code"
.Cells(2, 5).Value = "LOS_Group"
.Cells(2, 6).Value = "AVG_Stay"
End With
'---- Header Header with block -------
t = 3 '<--- starting row number
' define ICD_Type, ICD10_Code, LOS_Group
'-----Start ICD10 Code loop ----
For Each ICD_Cell In ICD_Range
If ICD_Cell.Offset(, 4) = "@" Or ICD_Cell.Offset(-1, 2) = ICD_Cell.Offset(0, 2) Or ICD_Cell.Offset(, 12) = "" Then
GoTo SkipCodeBlock '<--- Row not of interest, skip to next row
Else
ICD10_Code = ICD_Cell.Offset(0, 2).Value
LOS_Group = ICD_Cell.Offset(, 12)
ICD10_Type = ICD_Cell.Value
End If
'-------Start of nested diagnosis Loops ----------
If ICD10_Type = "D" Then
For i = 1 To 5
AgeCat = i ' <--- setting the Age category integer value
For j = 1 To 4
Select Case j ' <--- setting the diagnosis integer value
Case 1
diagProced_Cat = 0
Case 2
diagProced_Cat = 1
Case 3
diagProced_Cat = 3
Case Else
diagProced_Cat = 4
End Select
With wsDUNST15.UsedRange '<-- retrieving the row of interest
.AutoFilter field:=4, Criteria1:=LOS_Group
.AutoFilter field:=5, Criteria1:=diagProced_Cat
.AutoFilter field:=6, Criteria1:=AgeCat
End With
AVG_Stay =
(wsDUNST15.Range(wsDUNST15.Range("H1048576").End(xlUp),
"H2").SpecialCells(xlCellTypeVisible).Value)/10
wsDUNST15.ShowAllData ' <---Resetting filters
With ICD10_Data ' log Results
finalRow = .Cells(Rows.Count, 1).End(xlUp).Row + 1
.Cells(t, 1).Value = ICD10_Type
.Cells(t, 2).Value = AgeCat
.Cells(t, 3).Value = diagProced_Cat
.Cells(t, 4).Value = ICD10_Code
.Cells(t, 5).Value = "'" & LOS_Group
.Cells(t, 6).Value = AVG_Stay
End With
t = t + 1
Next j
Next i
Else
GoTo complete
End If
'-------End of nested diagnosis Loops -------
SkipCodeBlock:
Next ICD_Cell
'------End ICD10 Code loop -----
Application.ScreenUpdating = True
complete:
endTime = Time
ICD10_Data.Range("b1") = Format(endTime, "HH:MM:SS")
MsgBox "Procedure Complete!"
End Sub
-
\$\begingroup\$ In order to get a quality answer you should post some sample data and sample output and any Worksheet formulas that you may be using. I think that you mentioned a result set of +10K before the question was edited. That implies autofiltering the data +10k times. If that is the case give up on this approach. It work probably take less than a minute to process all the data in arrays. \$\endgroup\$user109261– user1092612017年12月17日 08:31:37 +00:00Commented Dec 17, 2017 at 8:31
-
\$\begingroup\$ I will add sample data later today. \$\endgroup\$Trevor Pye– Trevor Pye2017年12月18日 16:38:44 +00:00Commented Dec 18, 2017 at 16:38
-
\$\begingroup\$ The Dataset I'm working with is too big for excel, it exceeds the 1,048,576 row max. I probably looking for a Database solution thanks for your help. \$\endgroup\$Trevor Pye– Trevor Pye2017年12月19日 21:25:16 +00:00Commented Dec 19, 2017 at 21:25
2 Answers 2
I didn't have any data to work with so it likely needs debugged, but the general idea is to use sorting and cell formulas to do more of the heavy lifting than AutoFilter
:
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Create ICD-10-CM/PCS data set
'
'ICD10_Type | ageCategory | DiagnosisProcedure_Category | ICD10_Code | LOS_Group | AVG_Stay
'
'
' This will use Truven October, 2017 data files
'
' By Trevor Pye
'
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Public Sub ICD10DataSet()
' Dim ICD10_Type As String
' Dim AgeCat As Integer
' Dim diagProced_Cat As Integer
' Dim ICD10_Code As String
' Dim LOS_Group As String
' Dim AVG_Stay As Double
Dim startTime As Double, endTime As Double
Application.ScreenUpdating = False
startTime = Time
Debug.Print "Started: " & Format(startTime, "HH:MM:SS")
Dim wbData As Workbook, ICD10_Data As Worksheet
Set wbData = Workbooks.Add
Set ICD10_Data = wbData.Sheets(1)
Dim wb As Workbook, wsICD10 As Worksheet, wsDUNST15 As Worksheet
Set wb = Workbooks("LOS_WorkBookICD-10_201710.xlsm")
Set wsICD10 = wb.Worksheets("ICD10Full")
Set wsDUNST15 = wb.Worksheets("DUNST15")
' Set wsOUNST15 = wb.Worksheets("OUNST15")
' Dim ICD_Range As Range
' Set ICD_Range = wsICD10.Range("A2", wsICD10.Range("A1048576").End(xlUp))
'---- Copy ICD10 data
wsICD10.UsedRange.Columns(1).Copy ICD10_Data.Range("A1")
wsICD10.UsedRange.Columns(3).Copy ICD10_Data.Range("D1")
wsICD10.UsedRange.Columns(13).Copy ICD10_Data.Range("E1")
'---- Filter ICD10 data
Dim ToRange As Range, ICD_Cell As Range
Set ToRange = ICD10_Data.UsedRange.Columns(2).Offset(1, 0)
Set ToRange = ToRange.Resize(ToRange.Rows.Count - 1, 1)
Set ICD_Cell = wsICD10.Range("A2")
ToRange.Formula = "=OR(" & _
ICD_Cell.Offset(, 4).Address(False, True, xlA1, True) & "=""@""," & _
ICD_Cell.Offset(-1, 2).Address(False, True, xlA1, True) & "=" & ICD_Cell.Offset(0, 2).Address(False, True, xlA1, True) & "," & _
ICD_Cell.Offset(, 12).Address(False, True, xlA1, True) & "=""""," & _
ICD_Cell.Address(False, True, xlA1, True) & "<>""D""" & _
")"
ToRange.Copy
ToRange.PasteSpecial xlPasteValues
Application.CutCopyMode = False
ICD10_Data.Sort.SortFields.Clear
ICD10_Data.Sort.SortFields.Add Key:=ICD10_Data.Range("B1"), _
SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
With ICD10_Data.Sort
.SetRange ICD10_Data.UsedRange
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.Apply
End With
Dim f As Range
Set f = ToRange.Find(What:="true", LookIn:=xlValues, lookat:=xlWhole, SearchOrder:=xlByColumns)
If Not f Is Nothing Then
ICD10_Data.Range(ICD10_Data.Cells(f.Row, 1), ICD10_Data.Cells(ICD10_Data.UsedRange.Rows.Count, 1)).EntireRow.Delete xlShiftUp
End If
ICD10_Data.Columns(2).ClearContents
'---- Start Header with block -------
With ICD10_Data
.Cells(1, 1).Value = "ICD10_Type"
.Cells(1, 2).Value = "ageCategory"
.Cells(1, 3).Value = "DiagnosisProcedure_Category"
.Cells(1, 4).Value = "ICD10_Code"
.Cells(1, 5).Value = "LOS_Group"
.Cells(1, 6).Value = "AVG_Stay"
End With
'---- Header Header with block -------
'---- Additional Filter
' ICD10_Data.UsedRange.RemoveDuplicates Columns:=Array(1, 4, 5), Header:=xlYes
Dim DupeRange As Range
Set DupeRange = ICD10_Data.UsedRange
Set DupeRange = DupeRange.Offset(1, 0).Resize(DupeRange.Rows.Count - 1, DupeRange.Columns.Count)
Set ToRange = DupeRange.Columns(1).Cells(DupeRange.Rows.Count).Offset(1, 0)
'---- Expand ICD10 data by AgeCat 1 to 5
'ICD10_Data.Range("B2") = [1..5]
Dim i As Long
For i = 1 To 5
DupeRange.Columns(2).Value = i
If i < 5 Then
DupeRange.Copy ToRange
Set ToRange = ToRange.Offset(DupeRange.Rows.Count, 0)
End If
Next i
'---- Expand ICD10 data by diagProced_Cat {0,1,3,4}
'ICD10_Data.Range("C2") = [0,1,3,4]
Dim diagProced_Cat As Variant
diagProced_Cat = Array(0, 1, 3, 4)
Set DupeRange = ICD10_Data.UsedRange
Set DupeRange = DupeRange.Offset(1, 0).Resize(DupeRange.Rows.Count - 1, DupeRange.Columns.Count)
Set ToRange = DupeRange.Columns(1).Cells(DupeRange.Rows.Count).Offset(1, 0)
For i = LBound(diagProced_Cat) To UBound(diagProced_Cat)
DupeRange.Columns(3).Value = diagProced_Cat(i)
If i < UBound(diagProced_Cat) Then
DupeRange.Copy ToRange
Set ToRange = ToRange.Offset(DupeRange.Rows.Count, 0)
End If
Next i
'---- Calculate AVG_Stay via cell formula
Dim LOS_GroupAddr As String, diagProced_CatAddr As String, AgeCatAddr As String, StayAddr As String
LOS_GroupAddr = wsDUNST15.UsedRange.Columns(4).Address(True, True, xlA1, True)
diagProced_CatAddr = wsDUNST15.UsedRange.Columns(5).Address(True, True, xlA1, True)
AgeCatAddr = wsDUNST15.UsedRange.Columns(6).Address(True, True, xlA1, True)
StayAddr = wsDUNST15.UsedRange.Columns(8).Address(True, True, xlA1, True)
Set ToRange = ICD10_Data.UsedRange.Columns(6).Offset(1, 0)
Set ToRange = ToRange.Resize(ToRange.Rows.Count - 1, 1)
ToRange.Formula = "=SUMPRODUCT((" & LOS_GroupAddr & "=$E2)*(" & diagProced_CatAddr & "=$C2)*(" & AgeCatAddr & "=$B2)*(" & StayAddr & "))/10"
ToRange.Copy
ToRange.PasteSpecial xlPasteValues
Application.CutCopyMode = False
ICD10_Data.Rows(1).Insert xlShiftDown
ICD10_Data.Range("A1") = Format(startTime, "HH:MM:SS")
endTime = Time
ICD10_Data.Range("B1") = Format(endTime, "HH:MM:SS")
Debug.Print "Finished: " & Format(endTime, "HH:MM:SS")
Application.ScreenUpdating = True
'complete:
MsgBox "Procedure Complete!"
End Sub
-
\$\begingroup\$ Looks promising, there are some bugs that I will work through and will publish the new version when complete. \$\endgroup\$Trevor Pye– Trevor Pye2017年12月18日 16:37:40 +00:00Commented Dec 18, 2017 at 16:37
SQL version based on comment:
I'm probably looking for a database solution
Since you're in Excel I'm guessing you'd be able to try a MS-Access solution
- Import ICD10Full and DUNST15 as tables
- Manually Create the two Tables [AgeCats] and [DiagProcs]:
AgeCats: DiagProcs: ageCategory DiagnosisProcedure_Category ----------- --------------------------- 1 0 2 1 3 3 4 4 5
- Create query [010 Filter ICD10Full], Note: ColumnE is referred to in your code
If ICD_Cell.Offset(, 4) = "@"
but never given a name
SELECT distinct ICD10_Type, ICD10_Code, LOS_Group
FROM [ICD10Full]
WHERE
ColumnE <> '@'
AND LOS_Group <> ''
AND ICD10_Type = 'D';
4. Create Query [020 ICD10Reduced]
SELECT ICD10_Type,ageCategory,DiagnosisProcedure_Category,ICD10_Code,LOS_Group
FROM [010 Filter ICD10Full],[AgeCats],[DiagProcs];
5. Create Query [030 ICD10 Summary], Note: I've left it as LEFT JOIN
to match your original code, but you'll probably want INNER JOIN
to reduce the nothingness, also ColumnH is never given a name in your code wsDUNST15.Range("H1048576")
SELECT
ICD10.ICD10_Type
,ICD10.ageCategory
,ICD10.DiagnosisProcedure_Category
,ICD10.ICD10_Code
,ICD10.LOS_Group
,Avg(DUNS15.ColumnH) AS AVG_Stay
FROM [020 ICD10Reduced] AS ICD10 LEFT JOIN [DUNS15] ON
ICD10.geCategory = DUNS15.ageCategory
AND ICD10.DiagnosisProcedure_Category = DUNS15.DiagnosisProcedure_Category
AND ICD10.LOS_Group = DUNS15.LOS_Group
GROUP BY
ICD10.ICD10_Type
,ICD10.ageCategory
,ICD10.DiagnosisProcedure_Category
,ICD10.ICD10_Code
,ICD10.LOS_Group;
Other databases (SQL-Server, Oracle, MySQL, PostgreSQL, ...) would have different formatting of the SQL code, but the steps would more or less remain the same.