3
\$\begingroup\$

I have built a COVID model using UVA data. Currently that data is unavailable so I'm using another source. The new source is, of course, a different format. So rather than refactor all of my model macros, I'm formatting the new data in the old format on import.

The new data looks like this:

enter image description here

The xlsx files go out to column EH with a new column added daily. There are 267 rows in these files. The import function ends up with a file that looks like this:

enter image description here

In this file I don't import all the Confirmed = 0 and I wind up deleting a bunch of the countries using the population column (population = 0 gets deleted). So the file end up with about 6800 rows. This will grow daily also.

I have a file for confirmed, another for deaths, and a third for recovered. Importing the Confirmed and deleting what I don't want takes about a minute. When I try to add in the Deaths file, I can see the column being filled in with the correct numbers from the new data but it's taking so long I can't imagine waiting for it to end. I've waited over 30 minutes before hitting Esc and Deaths still won't be finished.

I realize I'm going through a lot of cells a lot of times. So, is there a way to optimize my nested For loops in the Deaths and Recovered file imports to still be in the desired format yet not take over half an hour?

Option Explicit
Sub ImportCSSEConfirmed()
Dim i As Variant
Dim j As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim currentData As Range
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
If lastrow < 2 Then lastrow = 2
Set currentData = cws.Range("a2:l" & lastrow)
currentData.ClearContents
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Confirmed.xlsx"
Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
 
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
 
clastrow = cws.Cells(Rows.count, "a").End(xlUp).Row + 1
 
'takes the csse data files and combines and reformats them into the raw_data sheet in the combined file
'col a = province/state, col b = country, col c = date, col d = confirmed
For i = 2 To lastrow
 For j = 3 To lastcol
 If ws.Cells(i, j).Value <> 0 Then
 cws.Cells(clastrow, "a").Value = ws.Cells(i, 1).Value
 cws.Cells(clastrow, "b").Value = ws.Cells(i, 2).Value
 cws.Cells(clastrow, "c").Value = ws.Cells(1, j).Value
 cws.Cells(clastrow, "d").Value = ws.Cells(i, j).Value
 cws.Cells(clastrow, "d").NumberFormat = "#,##0"
 clastrow = clastrow + 1
 End If
 Next j
Next i
wb.Close False
Call PopulationColumn
Call DeleteExtras
predictDone = False
End Sub
Sub ImportCSSEDeaths()
Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Dim t As Double
Dim tt As String
t = Timer
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
 
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
 
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
For i = 2 To clastrow
 For j = 2 To lastrow
 For k = 3 To lastcol
 
 If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
 cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
 cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
 
 cws.Cells(i, "e").Value = ws.Cells(j, k).Value
 cws.Cells(i, "e").NumberFormat = "#,##0"
 End If
 Next k
 Next j
Next i
wb.Close False
tt = Format((Timer - t) / 86400, "hh:mm:ss")
predictDone = False
End Sub
Sub ImportCSSERecovered()
Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet
Set cws = ThisWorkbook.Sheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row
filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
Set wb = Excel.Workbooks.Open(filePath & fileName)
Set ws = wb.Worksheets(1)
 
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
 
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
 
For i = 2 To clastrow
 For j = 2 To lastrow
 For k = 3 To lastcol
 
 If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
 cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
 cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
 
 cws.Cells(i, "f").Value = ws.Cells(j, k).Value
 cws.Cells(i, "f").NumberFormat = "#,##0"
 End If
 Next k
 Next j
Next i
wb.Close False
predictDone = False
End Sub
Sub PopulationColumn()
Dim i As Variant
Dim country As String
Dim state As String
Dim rng As Range
Dim lastrow As Long
Dim population As Long
Dim landarea As Double
Dim popdensity As Double
Dim cws As Worksheet
Set cws = ThisWorkbook.Worksheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("b2:b" & lastrow)
For Each i In rng
 country = i
 state = cws.Cells(i.Row, "a").Value
 
 If country = "United Arab Emirates" Then
 population = 9890402
 landarea = 32278
 popdensity = population / landarea
 ElseIf country = "Iran" Then
 population = 83992949
 landarea = 628786
 popdensity = population / landarea
 ElseIf country = "Oman" Then
 population = 5080712
 landarea = 119499
 popdensity = population / landarea
 ElseIf country = "Kuwait" Then
 population = 4270571
 landarea = 6880
 popdensity = population / landarea
 ElseIf country = "Bahrain" Then
 population = 1701575
 landarea = 293
 popdensity = population / landarea
 ElseIf country = "Iraq" Then
 population = 40222493
 landarea = 167692
 popdensity = population / landarea
 ElseIf country = "Pakistan" Then
 population = 220892340
 landarea = 297638
 popdensity = population / landarea
 ElseIf country = "Qatar" Then
 population = 2881053
 landarea = 4483
 popdensity = population / landarea
 ElseIf country = "Jordan" Then
 population = 10203134
 landarea = 34278
 popdensity = population / landarea
 ElseIf country = "Saudi Arabia" Then
 population = 34810000
 landarea = 830000
 popdensity = population / landarea
 ElseIf country = "Kazakhstan" Then
 population = 18776707
 landarea = 1042360
 popdensity = population / landarea
 ElseIf country = "Syria" Then
 population = 17500658
 landarea = 70900
 popdensity = population / landarea
 ElseIf country = "Yemen" Then
 population = 29825964
 landarea = 203850
 popdensity = population / landarea
 ElseIf country = "Afghanistan" Then
 population = 38928346
 landarea = 252071
 popdensity = population / landarea
 ElseIf country = "Italy" Then
 population = 60478457
 landarea = 113568
 popdensity = population / landarea
 ElseIf country = "France" Then
 population = 65273511
 landarea = 211413
 popdensity = population / landarea
 ElseIf country = "South Korea" Then
 population = 51269185
 landarea = 37541
 popdensity = population / landarea
 ElseIf country = "Spain" Then
 population = 46754778
 landarea = 192588
 popdensity = population / landarea
 ElseIf state = "South Carolina" Then
 population = 5210095
 landarea = 30111
 popdensity = population / landarea
 ElseIf state = "Texas" Then
 population = 29472295
 landarea = 261914
 popdensity = population / landarea
 ElseIf state = "Georgia" Then
 population = 10736059
 landarea = 57919
 popdensity = population / landarea
 ElseIf state = "Kentucky" Then
 population = 4499692
 landarea = 39732
 popdensity = population / landarea
 ElseIf state = "North Carolina" Then
 population = 10611862
 landarea = 48718
 popdensity = population / landarea
 ElseIf country = "United Kingdom" Then
 population = 67886011
 landarea = 93410
 popdensity = population / landarea
 ElseIf country = "Switzerland" Then
 population = 8654281
 landarea = 15257
 popdensity = population / landarea
 ElseIf country = "Hungary" Then
 population = 9660351
 landarea = 34954
 popdensity = population / landarea
 ElseIf country = "Turkey" Then
 population = 84339067
 landarea = 297156
 popdensity = population / landarea
 ElseIf country = "Portugal" Then
 population = 10196709
 landarea = 35363
 popdensity = population / landarea
 ElseIf country = "Austria" Then
 population = 9010000
 landarea = 31818
 popdensity = population / landarea
 ElseIf country = "Poland" Then
 population = 37846611
 landarea = 118236
 popdensity = population / landarea
 ElseIf country = "Germany" Then
 population = 83783942
 landarea = 134580
 popdensity = population / landarea
 ElseIf country = "Egypt" Then
 population = 102334404
 landarea = 384345
 popdensity = population / landarea
 ElseIf state = "Kansas" Then
 population = 2910357
 landarea = 81823
 popdensity = population / landarea
 ElseIf country = "Argentina" Then
 population = 45516865
 landarea = 1056641
 popdensity = population / landarea
 ElseIf country = "Belize" Then
 population = 397628
 landarea = 8807
 popdensity = population / landarea
 ElseIf country = "Norway" Then
 population = 5413094
 landarea = 141031
 popdensity = population / landarea
 ElseIf country = "Finland" Then
 population = 5540720
 landarea = 117333
 popdensity = population / landarea
 ElseIf country = "Japan" Then
 population = 126476461
 landarea = 140755
 popdensity = population / landarea
 ElseIf country = "Australia" Then
 population = 25701300
 landarea = 2969907
 popdensity = population / landarea
 ElseIf state = "Colorado" Then
 population = 5845526
 landarea = 103730
 popdensity = population / landarea
 ElseIf state = "Oregon" Then
 population = 4301089
 landarea = 96105
 popdensity = population / landarea
 ElseIf country = "Sweden" Then
 population = 10087218
 landarea = 173860
 popdensity = population / landarea
 Else
 population = 0
 popdensity = 0
 End If
 
 cws.Cells(i.Row, "h").Value = population
 cws.Cells(i.Row, "i").Value = popdensity
Next i
cws.Range("h2:h" & lastrow).NumberFormat = "#,##0"
cws.Range("i2:i" & lastrow).NumberFormat = "#,##0"
End Sub
Sub DeleteExtras()
Dim lastrow As Long
Dim rng As Range
Dim i As Variant
Dim count As Integer
Dim cws As Worksheet
Set cws = ThisWorkbook.Worksheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("h2:h" & lastrow)
count = 0
Do While count <= 10
For Each i In rng
 
 If i = 0 Then
 i.EntireRow.Delete
 End If
Next i
count = count + 1
Loop
End Sub
Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Jun 17, 2020 at 14:59
\$\endgroup\$
4
  • \$\begingroup\$ Can you post the code for PopulationColumn() and DeleteExtras()? As is, this doesn't compile and those could likely be optimized as well. \$\endgroup\$ Commented Jun 17, 2020 at 17:46
  • \$\begingroup\$ The Population and Pop Density columns are just a list of interested countries. The states that are still in the list are from the UVA data. My new data set is only outside the US.I left them in the Sub becuase I just haven't gotten around to deleting them yet. The DeleteExtra Sub is looped because for some reason it doesn't catch all the 0 values the first time. \$\endgroup\$ Commented Jun 17, 2020 at 18:42
  • \$\begingroup\$ The question may be OBE. I've broken up the three loops into seperate macros and changed the order of the nesting. I did this so I could manage the new data files easier. I've also put a timer on the Deaths and Recovered macros. I will edit the above to reflect the new code and the actual time it takes to run. \$\endgroup\$ Commented Jun 18, 2020 at 14:19
  • \$\begingroup\$ In the second and the third loop you are looping on rows twice (clastrow and lastrow also). So, if the both rows are say 500 then it will loop 500*500=250000 times. It will take huge time. In the first loop, you are incrementing clastrow by 1 in every loop. Do the same in second and third loop. Before For i = 2 To clastrow mention lastrow = 2 and before i loop ends increment lastrow by 1. So you can remove For j loop. \$\endgroup\$ Commented Jun 18, 2020 at 15:10

1 Answer 1

3
\$\begingroup\$

I have analyzed your code and suggest the following changes:

  • the second loop over j to find the startdate is superfluous and can be included in the search loop by using a simple if.

  • avoid using Redim Preserve in a loop, it's time consuming as each time the array has to be copied completely; Dim the array once to a set maximum and shorten it once after the loop.

  • from your code I am deducting that the array deaths() is filled from scratch for each loop over i. Therefore, k should be set to zero within the i-loop.

  • in the end, the whole array deaths() is copied cell-by-cell to a target range. This can be done in one statement, which is multiple times faster than touching each element.

     Sub ImportCSSEDeaths()
     Dim i As Long, j As Long, k As Long
     Dim lastrow As Long, clastrow As Long, lastcol As Long
     Dim deaths() As Long
     Dim startDate As Date
     Dim filePath As String
     Dim wb As Excel.Workbook, ws As Excel.Worksheet, cws As Excel.Worksheet
     Set cws = ThisWorkbook.Sheets("Raw_Data")
     clastrow = cws.Cells(Rows.count, "b").End(xlUp).row
     filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
     Set wb = Excel.Workbooks.Open(filePath)
     Set ws = wb.Worksheets(1)
     lastrow = ws.Cells(Rows.count, "b").End(xlUp).row
     lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
     For i = 2 To lastrow
     'puts country row deaths into array
     With ws
     k = 0 ' deaths() is zero-based! Option Base 0
     ReDim deaths(lastcol) ' cannot get larger than this
     For j = 3 To lastcol
     If .Cells(i, j).Value <> 0 Then
     deaths(k) = .Cells(i, j).Value
     If k = 0 Then
     startDate = .Cells(1, j).Value
     End If
     k = k + 1
     End If
     Next j
     End With
     ReDim Preserve deaths(k - 1) ' shrink once to actual size
     'finds startdate in compiled data and enters array values down column E
     With cws
     For j = 2 To clastrow
     If .Cells(j, "a").Value = ws.Cells(i, "a").Value And _
     .Cells(j, "b").Value = ws.Cells(i, "b") And _
     .Cells(j, "c").Value = startDate Then
     ' copy deaths(0..ub) to .cells(j..ub+j,"e") in one step
     Dim dest As Range
     Set dest = .Cells(j, "e") ' first cell in destination
     Set dest = dest.Resize(UBound(deaths) + 1, 1)
     dest.Value = Application.Transpose(deaths)
     End If
     Next j
     End With
     Next i
     wb.Close False
     End Sub ' ImportCSSEDeaths()
    

Edit: delete rows with a null value

Following your comment, your routine Delete_Extras() not only searches row-by-row but does so for 11 times. You will probably have noticed that not all matching lines got deleted on the first pass.
One way to fix this is to loop from the end to the beginning of the range, so that deleting a row does not affect rows yet unprocessed.
Instead, I suggest the following: filter the range for a "0" in column H and delete all visible rows in one command, like this

 Sub Delete_Extra_Rows_Based_On_Value()
 ' autofilter a range and delete visible rows
 ' 2020年07月01日
 
 Dim cws As Worksheet
 Dim lastrow As Long
 Dim result As Range
 
 Set cws = ThisWorkbook.Worksheets("H:\Raw_Data")
 lastrow = cws.Cells(Rows.count, "B").End(xlUp).row
 
 With Application
 .ScreenUpdating = False
 .EnableEvents = False
 .Calculation = xlCalculationManual
 .DisplayAlerts = False
 End With
 
 ' clear any existing filters
 If cws.AutoFilterMode Then cws.ShowAllData
 ' apply filter
 With cws.Range("A1:H" & lastrow)
 .AutoFilter Field:=8, Criteria1:=0
 ' delete matching rows
 .Offset(1, 0).SpecialCells(xlCellTypeVisible).Delete
 .AutoFilter
 End With
 
 With Application
 .ScreenUpdating = True
 .EnableEvents = True
 .Calculation = xlCalculationAutomatic
 .DisplayAlerts = True
 End With
 End Sub
answered Jun 26, 2020 at 14:52
\$\endgroup\$
6
  • \$\begingroup\$ This is awesome! Thank you so much! I caught the k=0 should be inside the first loop after posting but the rest is all new. This teaches me a lot actually. \$\endgroup\$ Commented Jun 29, 2020 at 12:42
  • \$\begingroup\$ I was getting an out of bounds error at ReDim Preserve deaths(k - 1) so I changed it to ReDim Preserve deaths(k) and it removed the error. Honestly I'm not sure why this removed the error but it did. Now the biggest time suck on this process is removing the countries I don't track (population = 0). Which takes anywhere from 3 minutes to 20 depending on what else is happening on the computer. \$\endgroup\$ Commented Jun 29, 2020 at 13:59
  • \$\begingroup\$ Yes I wasn't sure if ReDim takes the length or the highest index as parameter. For the other sub, have you already posted it here, as a new post? \$\endgroup\$ Commented Jun 30, 2020 at 13:12
  • \$\begingroup\$ I actually cut off screen updateing on each sub and it really sped things up. Delete extras takes about a minute, now. The whole model takes around 2 minutes to run, which is totally exceptable. If that changes, I'll definitely post another question. Thank you so much!!!! \$\endgroup\$ Commented Jul 1, 2020 at 14:21
  • \$\begingroup\$ please see the sub I added - we missed each other while I was writing it. \$\endgroup\$ Commented Jul 1, 2020 at 15:23

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.