Hi folks: I posted a SO (https://stackoverflow.com/questions/57541246/optimizing-vba-function-loop) and was told to ask here.
I am starting to think that rather than relay on excel as my data repository, I should create a separate class that holds a variant array that I can then query must faster??
Anyway... here's my question. I hope someone can help. I agree that searching arrays may be faster but I need this data available across all functions of this sheet.
I am in need to optimizing some VBA which currently works functionally.
Given columns of sequential Dates (column B) and Times (column C), and Given a time window (T1 and T2), return a range of rows in which dates and times fall within T1 and T2. For example, I want MIN and MAX price between those two times.
The goal is to build Open/High/Low/Close charts for Excel candlestick charts and the data source has over 260,000 rows of data.
I currently have the following code that
Dim priceRange As Range
startRowNum = GetFirstRow(StartTime) << THIS TAKE 10 SECONDS
endRowNum = GetLastRow(endTime) << THIS TAKE 10 SECONDS
Set priceRange = Range(Cells(startRowNum, 4), Cells(endRowNum, 4))
targetRange.Offset(0, 2).Value = Application.WorksheetFunction.Max(priceRange)
targetRange.Offset(0, 3).Value = Application.WorksheetFunction.Min(priceRange)
To find the first row...
Function GetFirstRow(T As Date) As Long
'Starts at FirstRow and returns the first row where the time is greater than T1.
Dim currentRow As Long
Dim CompareTime As Date
Dim CompareDate As Date
currentRow = 4 'Start at row4 due to headers.
Do While (IsDate(Cells(currentRow, 2)))
CompareDate = Cells(currentRow, 2)
CompareTime = Cells(currentRow, 3)
marketTime = CompareDate + CompareTime
If (marketTime >= T) Then Exit Do
currentRow = currentRow + 1
Loop
GetFirstRow = currentRow
End Function
GetLastRow is very similar.
My issue is that the GetFirstRow function has to process 49,000 (yes, forty nine thousand) rows, and it takes about 10 seconds.... so it takes "minutes" to complete this run.
Can someone help me optimize this?
Note I Need the date since market data starts the night before. If this is what is slowing me down, I can filter that as I import the data?
1 Answer 1
You are doing twice the work by having a function to get the starting row and a second function get the last row. Passing the starting row into the GetLastRow()
function would be more efficient.
I prefer to have a single function return the range object. Using the WorkshetFunction.Match()
is far more efficient then iterating over the cells.
Results
getDateRange:Function
'Enumerations to clarify column data content
Public Enum DataColumns
dcStocks = 1
dcDates
dcTimes
dcValues
End Enum
' https://docs.microsoft.com/en-us/office/vba/api/excel.worksheetfunction.match
Function getDateRange(ByVal StartDateTime As Date, ByVal EndDateTime As Date) As Range
Const LargestValueGreaterThanOrEqualTo As Long = -1
Const FirstExactMatch As Long = 0
Const LagestValueLessThanOrEqualTo As Long = 1
Dim Target As Range
With ThisWorkbook.Worksheets(1)
Set Target = .Range("A4:Z4", .Cells(.Rows.Count, dcDates).End(xlUp))
End With
Dim dates
Dim RangeStart As Long, RangeEnd As Long
Dim SearchValue As Double
SearchValue = StartDateTime - 1
On Error Resume Next
RangeStart = WorksheetFunction.Match(SearchValue, Target.Columns(dcDates), LagestValueLessThanOrEqualTo)
On Error GoTo 0
If RangeStart = 0 Then Exit Function
Dim r As Long
Dim StartFlag As Boolean
Dim DateTime As Date
With Target
For r = RangeStart To .Rows.Count
DateTime = .Cells(r, dcDates).Value + .Cells(r, dcTimes).Value
If DateTime >= StartDateTime And Not StartFlag Then
RangeStart = r
StartFlag = True
End If
If DateTime > EndDateTime Then
RangeEnd = r - 1
Exit For
End If
Next
If r > .Rows.Count Then RangeEnd = .Rows.Count
Set getDateRange = .Rows(RangeStart & ":" & RangeEnd)
End With
End Function
Worksheet Test Preparation
Sub Prep()
Const RowCount As Long = 260000
'https://codereview.stackexchange.com/questions/226360/vba-loop-optimization
Dim codes, dates, stocks, times, Values
Dim d As Date, t As Date
codes = Array("ACB", "AYI", "A2B", "ABP", "ABL", "AEG", "ABT", "AJC", "AKG", "AX8", "AX1", "ACS", "ACQ", "ACF", "ACR", "ACW", "AIV")
ReDim stocks(1 To RowCount, 1 To 1)
ReDim dates(1 To RowCount, 1 To 1)
ReDim times(1 To RowCount, 1 To 1)
ReDim Values(1 To RowCount, 1 To 1)
Dim r As Long, r2 As Long
d = #1/1/2010#
For r = 1 To RowCount - 48
d = d + 1
For r2 = 0 To 47
t = TimeSerial(0, r2 * 30, 0)
stocks(r + r2, 1) = codes(WorksheetFunction.RandBetween(0, UBound(codes)))
dates(r + r2, 1) = d
times(r + r2, 1) = t
Values(r + r2, 1) = Int((Rnd * 100) + 1) + Rnd
Next
r = r + r2 - 1
Next
Range("A4").Resize(RowCount) = stocks
Range("B4").Resize(RowCount) = dates
Range("C4").Resize(RowCount) = times
Range("D4").Resize(RowCount) = Values
End Sub
Test
Sub Main()
Dim Results(5) As String * 25
Const TestCount As Long = 10
Dim n As Long
Results(0) = "Date Range"
Results(1) = "StartDateTime"
Results(2) = "EndDateTime"
Results(3) = "MinPrice"
Results(4) = "MaxPrice"
Results(5) = "Time"
Debug.Print Results(0), Results(1), Results(2), Results(3), Results(4), Results(5)
For n = 1 To TestCount
Test
Next
End Sub
Sub Test()
Dim Results(5) As String * 25
Dim t As Double: t = Timer
Dim Target As Range
Dim d As Date, StartDateTime As Date, EndDateTime As Date
StartDateTime = WorksheetFunction.RandBetween(#1/2/2010#, #8/30/2024#)
EndDateTime = StartDateTime + TimeSerial(WorksheetFunction.RandBetween(1, 24) - 1, WorksheetFunction.RandBetween(1, 2) * 60, 0) + WorksheetFunction.RandBetween(1, 60) - 1
Set Target = getDateRange(StartDateTime, EndDateTime)
Dim MinPrice As Double, MaxPrice As Double
MinPrice = WorksheetFunction.Min(Target.Columns(4))
MaxPrice = WorksheetFunction.Min(Target.Columns(4))
Results(0) = Target.Address
Results(1) = StartDateTime
Results(2) = EndDateTime
Results(3) = MinPrice
Results(4) = MaxPrice
Results(5) = Round(Timer - t, 2)
Debug.Print Results(0), Results(1), Results(2), Results(3), Results(4), Results(5)
Target.Select
End Sub
CandleStick Chart
The dataset in the image shows that you need to know the Open, High, Low, and Close for each day to create the Chart. Considering there are over 200 K rows, I presume that you will also need to filter by stock. If this is true then I would take a different approach.
I would have a dictionary that stores a sub-dictionary for each stock that stores a dictionary for each day that stores an arraylist to store the values.
Create Array From Data Structure and Write it To New Sheet
Dim CandleStickData
ReDim CandleStickData(1 To RowCount, 1 To 6)
r = 0
For Each StockKey In StockMap
Set DateMap = StockMap(StockKey)
For Each DateKey In DateMap
Set ValueList = DateMap(DateKey)
r = r + 1
rowData = ValueList.ToArray
CandleStickData(r, 1) = StockKey
CandleStickData(r, 2) = DateKey
CandleStickData(r, 3) = rowData(0)
CandleStickData(r, 4) = WorksheetFunction.Max(rowData)
CandleStickData(r, 5) = WorksheetFunction.Min(rowData)
CandleStickData(r, 6) = rowData(UBound(rowData))
Next
Next
Worksheets.Add
Range("A1:F1").Value = Array("Stock", "Date", "Open", "High", "Low", "Close")
Range("A2").Resize(RowCount, 6).Value = CandleStickData
Debug.Print Round(Timer - t)
I did a quick mock up and it took 21 seconds to load 259,967 rows of data into the dictionaries and ArrayList and just 11 seconds to build a new Array and write it to a worksheet. After the data has been processed, it would be a simply matter of getting the date range and updating the chart table. Changing the stocks or chart data should take no more than 1 tenth of a second.
-
\$\begingroup\$ Wow that is awesome. 'Never thought about returning a range. THANK YOU. FYI: 200,000 rows are all for one stock so will need to call getDateRange() for each increment (like 15 minutes). I also LOVE the idea of passing in the starting row of the previous iteration so reduce the amount of data to go thru as we progress thru the chart. This is awesome and I need to look deeper into you suggestions. THANK YOU! \$\endgroup\$Ed Landau– Ed Landau2019年08月20日 13:17:04 +00:00Commented Aug 20, 2019 at 13:17
-
\$\begingroup\$ Creating a Candlestick table from the data-set would take about 40 secs. Having 1 record per Candlestick would increase the performance by +15x and the storage capacity by 15x (assuming 15 records per Candlestick). Updating the table weekly would take far less time. \$\endgroup\$TinMan– TinMan2019年08月20日 17:54:11 +00:00Commented Aug 20, 2019 at 17:54
-
\$\begingroup\$ I recommend using a scrollbar as opposed to the mouse wheel. You would have to use WinApi calls to hook the mouse wheel. \$\endgroup\$TinMan– TinMan2019年08月20日 17:55:51 +00:00Commented Aug 20, 2019 at 17:55
-
\$\begingroup\$ TinMan: Would you be willing to consult for an hour? 'Willing to pay :) If so, my temporary throw-away email, valid for the next few days is...TempAug20Ed@gmail \$\endgroup\$Ed Landau– Ed Landau2019年08月21日 02:43:42 +00:00Commented Aug 21, 2019 at 2:43
GetFirstRow
andGetLastRow
with sql. \$\endgroup\$