I have built a COVID model using UVA data. Currently that data is unavailable so I'm using another source. The new source is, of course, a different format. So rather than refactor all of my model macros, I'm formatting the new data in the old format on import.
The new data looks like this:
The xlsx files go out to column EH with a new column added daily. There are 267 rows in these files. The import function ends up with a file that looks like this:
In this file I don't import all the Confirmed = 0 and I wind up deleting a bunch of the countries using the population column (population = 0 gets deleted). So the file end up with about 6800 rows. This will grow daily also.
I have a file for confirmed, another for deaths, and a third for recovered. Importing the Confirmed and deleting what I don't want takes about a minute. When I try to add in the Deaths file, I can see the column being filled in with the correct numbers from the new data but it's taking so long I can't imagine waiting for it to end. I've waited over 30 minutes before hitting Esc and Deaths still won't be finished.
I realize I'm going through a lot of cells a lot of times. So, is there a way to optimize my nested For loops in the Deaths and Recovered file imports to still be in the desired format yet not take over half an hour?
Option Explicit
Sub ImportCSSEConfirmed()
Dim i As Variant
Dim j As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim currentData As Range
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
If lastrow < 2 Then lastrow = 2
Set currentData = cws.Range("a2:l" & lastrow)
currentData.ClearContents
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Confirmed.xlsx"
Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
clastrow = cws.Cells(Rows.count, "a").End(xlUp).Row + 1
'takes the csse data files and combines and reformats them into the raw_data sheet in the combined file
'col a = province/state, col b = country, col c = date, col d = confirmed
For i = 2 To lastrow
For j = 3 To lastcol
If ws.Cells(i, j).Value <> 0 Then
cws.Cells(clastrow, "a").Value = ws.Cells(i, 1).Value
cws.Cells(clastrow, "b").Value = ws.Cells(i, 2).Value
cws.Cells(clastrow, "c").Value = ws.Cells(1, j).Value
cws.Cells(clastrow, "d").Value = ws.Cells(i, j).Value
cws.Cells(clastrow, "d").NumberFormat = "#,##0"
clastrow = clastrow + 1
End If
Next j
Next i
wb.Close False
Call PopulationColumn
Call DeleteExtras
predictDone = False
End Sub
Sub ImportCSSEDeaths()
Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Dim t As Double
Dim tt As String
t = Timer
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
For i = 2 To clastrow
For j = 2 To lastrow
For k = 3 To lastcol
If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
cws.Cells(i, "e").Value = ws.Cells(j, k).Value
cws.Cells(i, "e").NumberFormat = "#,##0"
End If
Next k
Next j
Next i
wb.Close False
tt = Format((Timer - t) / 86400, "hh:mm:ss")
predictDone = False
End Sub
Sub ImportCSSERecovered()
Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
Set wb = Excel.Workbooks.Open(filePath & fileName)
Set ws = wb.Worksheets(1)
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
For i = 2 To clastrow
For j = 2 To lastrow
For k = 3 To lastcol
If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
cws.Cells(i, "f").Value = ws.Cells(j, k).Value
cws.Cells(i, "f").NumberFormat = "#,##0"
End If
Next k
Next j
Next i
wb.Close False
predictDone = False
End Sub
Sub PopulationColumn()
Dim i As Variant
Dim country As String
Dim state As String
Dim rng As Range
Dim lastrow As Long
Dim population As Long
Dim landarea As Double
Dim popdensity As Double
Dim cws As Worksheet
Set cws = ThisWorkbook.Worksheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("b2:b" & lastrow)
For Each i In rng
country = i
state = cws.Cells(i.Row, "a").Value
If country = "United Arab Emirates" Then
population = 9890402
landarea = 32278
popdensity = population / landarea
ElseIf country = "Iran" Then
population = 83992949
landarea = 628786
popdensity = population / landarea
ElseIf country = "Oman" Then
population = 5080712
landarea = 119499
popdensity = population / landarea
ElseIf country = "Kuwait" Then
population = 4270571
landarea = 6880
popdensity = population / landarea
ElseIf country = "Bahrain" Then
population = 1701575
landarea = 293
popdensity = population / landarea
ElseIf country = "Iraq" Then
population = 40222493
landarea = 167692
popdensity = population / landarea
ElseIf country = "Pakistan" Then
population = 220892340
landarea = 297638
popdensity = population / landarea
ElseIf country = "Qatar" Then
population = 2881053
landarea = 4483
popdensity = population / landarea
ElseIf country = "Jordan" Then
population = 10203134
landarea = 34278
popdensity = population / landarea
ElseIf country = "Saudi Arabia" Then
population = 34810000
landarea = 830000
popdensity = population / landarea
ElseIf country = "Kazakhstan" Then
population = 18776707
landarea = 1042360
popdensity = population / landarea
ElseIf country = "Syria" Then
population = 17500658
landarea = 70900
popdensity = population / landarea
ElseIf country = "Yemen" Then
population = 29825964
landarea = 203850
popdensity = population / landarea
ElseIf country = "Afghanistan" Then
population = 38928346
landarea = 252071
popdensity = population / landarea
ElseIf country = "Italy" Then
population = 60478457
landarea = 113568
popdensity = population / landarea
ElseIf country = "France" Then
population = 65273511
landarea = 211413
popdensity = population / landarea
ElseIf country = "South Korea" Then
population = 51269185
landarea = 37541
popdensity = population / landarea
ElseIf country = "Spain" Then
population = 46754778
landarea = 192588
popdensity = population / landarea
ElseIf state = "South Carolina" Then
population = 5210095
landarea = 30111
popdensity = population / landarea
ElseIf state = "Texas" Then
population = 29472295
landarea = 261914
popdensity = population / landarea
ElseIf state = "Georgia" Then
population = 10736059
landarea = 57919
popdensity = population / landarea
ElseIf state = "Kentucky" Then
population = 4499692
landarea = 39732
popdensity = population / landarea
ElseIf state = "North Carolina" Then
population = 10611862
landarea = 48718
popdensity = population / landarea
ElseIf country = "United Kingdom" Then
population = 67886011
landarea = 93410
popdensity = population / landarea
ElseIf country = "Switzerland" Then
population = 8654281
landarea = 15257
popdensity = population / landarea
ElseIf country = "Hungary" Then
population = 9660351
landarea = 34954
popdensity = population / landarea
ElseIf country = "Turkey" Then
population = 84339067
landarea = 297156
popdensity = population / landarea
ElseIf country = "Portugal" Then
population = 10196709
landarea = 35363
popdensity = population / landarea
ElseIf country = "Austria" Then
population = 9010000
landarea = 31818
popdensity = population / landarea
ElseIf country = "Poland" Then
population = 37846611
landarea = 118236
popdensity = population / landarea
ElseIf country = "Germany" Then
population = 83783942
landarea = 134580
popdensity = population / landarea
ElseIf country = "Egypt" Then
population = 102334404
landarea = 384345
popdensity = population / landarea
ElseIf state = "Kansas" Then
population = 2910357
landarea = 81823
popdensity = population / landarea
ElseIf country = "Argentina" Then
population = 45516865
landarea = 1056641
popdensity = population / landarea
ElseIf country = "Belize" Then
population = 397628
landarea = 8807
popdensity = population / landarea
ElseIf country = "Norway" Then
population = 5413094
landarea = 141031
popdensity = population / landarea
ElseIf country = "Finland" Then
population = 5540720
landarea = 117333
popdensity = population / landarea
ElseIf country = "Japan" Then
population = 126476461
landarea = 140755
popdensity = population / landarea
ElseIf country = "Australia" Then
population = 25701300
landarea = 2969907
popdensity = population / landarea
ElseIf state = "Colorado" Then
population = 5845526
landarea = 103730
popdensity = population / landarea
ElseIf state = "Oregon" Then
population = 4301089
landarea = 96105
popdensity = population / landarea
ElseIf country = "Sweden" Then
population = 10087218
landarea = 173860
popdensity = population / landarea
Else
population = 0
popdensity = 0
End If
cws.Cells(i.Row, "h").Value = population
cws.Cells(i.Row, "i").Value = popdensity
Next i
cws.Range("h2:h" & lastrow).NumberFormat = "#,##0"
cws.Range("i2:i" & lastrow).NumberFormat = "#,##0"
End Sub
Sub DeleteExtras()
Dim lastrow As Long
Dim rng As Range
Dim i As Variant
Dim count As Integer
Dim cws As Worksheet
Set cws = ThisWorkbook.Worksheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("h2:h" & lastrow)
count = 0
Do While count <= 10
For Each i In rng
If i = 0 Then
i.EntireRow.Delete
End If
Next i
count = count + 1
Loop
End Sub
1 Answer 1
I have analyzed your code and suggest the following changes:
the second loop over
j
to find thestartdate
is superfluous and can be included in the search loop by using a simpleif
.avoid using
Redim Preserve
in a loop, it's time consuming as each time the array has to be copied completely;Dim
the array once to a set maximum and shorten it once after the loop.from your code I am deducting that the array
deaths()
is filled from scratch for each loop overi
. Therefore,k
should be set to zero within thei
-loop.in the end, the whole array
deaths()
is copied cell-by-cell to a target range. This can be done in one statement, which is multiple times faster than touching each element.Sub ImportCSSEDeaths() Dim i As Long, j As Long, k As Long Dim lastrow As Long, clastrow As Long, lastcol As Long Dim deaths() As Long Dim startDate As Date Dim filePath As String Dim wb As Excel.Workbook, ws As Excel.Worksheet, cws As Excel.Worksheet Set cws = ThisWorkbook.Sheets("Raw_Data") clastrow = cws.Cells(Rows.count, "b").End(xlUp).row filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx" Set wb = Excel.Workbooks.Open(filePath) Set ws = wb.Worksheets(1) lastrow = ws.Cells(Rows.count, "b").End(xlUp).row lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column For i = 2 To lastrow 'puts country row deaths into array With ws k = 0 ' deaths() is zero-based! Option Base 0 ReDim deaths(lastcol) ' cannot get larger than this For j = 3 To lastcol If .Cells(i, j).Value <> 0 Then deaths(k) = .Cells(i, j).Value If k = 0 Then startDate = .Cells(1, j).Value End If k = k + 1 End If Next j End With ReDim Preserve deaths(k - 1) ' shrink once to actual size 'finds startdate in compiled data and enters array values down column E With cws For j = 2 To clastrow If .Cells(j, "a").Value = ws.Cells(i, "a").Value And _ .Cells(j, "b").Value = ws.Cells(i, "b") And _ .Cells(j, "c").Value = startDate Then ' copy deaths(0..ub) to .cells(j..ub+j,"e") in one step Dim dest As Range Set dest = .Cells(j, "e") ' first cell in destination Set dest = dest.Resize(UBound(deaths) + 1, 1) dest.Value = Application.Transpose(deaths) End If Next j End With Next i wb.Close False End Sub ' ImportCSSEDeaths()
Edit: delete rows with a null value
Following your comment, your routine Delete_Extras()
not only searches row-by-row but does so for 11 times. You will probably have noticed that not all matching lines got deleted on the first pass.
One way to fix this is to loop from the end to the beginning of the range, so that deleting a row does not affect rows yet unprocessed.
Instead, I suggest the following: filter the range for a "0" in column H and delete all visible rows in one command, like this
Sub Delete_Extra_Rows_Based_On_Value()
' autofilter a range and delete visible rows
' 2020年07月01日
Dim cws As Worksheet
Dim lastrow As Long
Dim result As Range
Set cws = ThisWorkbook.Worksheets("H:\Raw_Data")
lastrow = cws.Cells(Rows.count, "B").End(xlUp).row
With Application
.ScreenUpdating = False
.EnableEvents = False
.Calculation = xlCalculationManual
.DisplayAlerts = False
End With
' clear any existing filters
If cws.AutoFilterMode Then cws.ShowAllData
' apply filter
With cws.Range("A1:H" & lastrow)
.AutoFilter Field:=8, Criteria1:=0
' delete matching rows
.Offset(1, 0).SpecialCells(xlCellTypeVisible).Delete
.AutoFilter
End With
With Application
.ScreenUpdating = True
.EnableEvents = True
.Calculation = xlCalculationAutomatic
.DisplayAlerts = True
End With
End Sub
-
\$\begingroup\$ This is awesome! Thank you so much! I caught the
k=0
should be inside the first loop after posting but the rest is all new. This teaches me a lot actually. \$\endgroup\$Chris H.– Chris H.2020年06月29日 12:42:04 +00:00Commented Jun 29, 2020 at 12:42 -
\$\begingroup\$ I was getting an out of bounds error at
ReDim Preserve deaths(k - 1)
so I changed it toReDim Preserve deaths(k)
and it removed the error. Honestly I'm not sure why this removed the error but it did. Now the biggest time suck on this process is removing the countries I don't track (population = 0
). Which takes anywhere from 3 minutes to 20 depending on what else is happening on the computer. \$\endgroup\$Chris H.– Chris H.2020年06月29日 13:59:32 +00:00Commented Jun 29, 2020 at 13:59 -
\$\begingroup\$ Yes I wasn't sure if
ReDim
takes the length or the highest index as parameter. For the other sub, have you already posted it here, as a new post? \$\endgroup\$user1016274– user10162742020年06月30日 13:12:53 +00:00Commented Jun 30, 2020 at 13:12 -
\$\begingroup\$ I actually cut off screen updateing on each sub and it really sped things up. Delete extras takes about a minute, now. The whole model takes around 2 minutes to run, which is totally exceptable. If that changes, I'll definitely post another question. Thank you so much!!!! \$\endgroup\$Chris H.– Chris H.2020年07月01日 14:21:23 +00:00Commented Jul 1, 2020 at 14:21
-
\$\begingroup\$ please see the sub I added - we missed each other while I was writing it. \$\endgroup\$user1016274– user10162742020年07月01日 15:23:16 +00:00Commented Jul 1, 2020 at 15:23
PopulationColumn()
andDeleteExtras()
? As is, this doesn't compile and those could likely be optimized as well. \$\endgroup\$For i = 2 To clastrow
mentionlastrow = 2
and before i loop ends increment lastrow by 1. So you can remove For j loop. \$\endgroup\$