This code is meant to run through a column of values, bin the values based on specified ranges, then output the average value of each bin. The problem is the code is running quite slowly (approximately 30 min for around 100000 values). I am definitely a beginner at coding and was hoping there was some way to speed this code along.
Sub BinValues()
'binns seperation distance values for the creation of variogram
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Dim Cell As Object
Dim R1 As Range
Dim R2 As Range
Dim rng As Range
'define range before running
Set rng = Range("A1:A105570")
Dim K, n, L As Integer
'n is equal to the number of lags
'L is the lag size
n = 12
L = 600
For K = L To (n * L) Step 600
For Each Cell In rng
Dim min As Integer
min = K - L
'upper bound exclusive and lower bound inclusive
If Cell.Value >= min And Cell.Value < K Then
If R1 Is Nothing Then
Set R1 = Range(Cell.Address)
Else
Set R1 = Union(R1, Range(Cell.Address))
End If
Cells((K / L), 5) = WorksheetFunction.Average(R1)
End If
Next
Set R1 = Nothing
Next
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
2 Answers 2
Data belongs in an Array
A worksheet *looks* like a grid of data, but there's an enourmous amount of overhead sitting behind it. Every time you do anything to a spreadsheet, events fire, formulas calculate and a million other things happen behind the scenes.
Working with Ranges is computationally expensive, and you're doing it N*105,570*2 times.
Instead, what you want is an Array
. An Array
is just a grid of data laid out in memory. Because it is *just* data there are no overheads, and so you can read/write to it about a Million times faster.
You can create an Array
by reading in a range, like so:
Dim dataRange As Range
Set dataRange = Range("A1:A105570")
Dim dataArray As Variant
dataArray = dataRange.Value
And now, the value in "A1"
is in dataArray(1, 1)
, "A2"
in dataArray(2, 1)
etc.
Let's re-write your code to use an Array:
Option Explicit
Public Sub BinValues()
'binns seperation distance values for the creation of variogram
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Dim dataRange As Range
Set dataRange = Range("A1:A105570")
Dim dataArray As Variant
dataArray = dataRange.Value
Const NUM_LAGS As Long = 12
Const LAG_SIZE As Long = 600
Dim minValue As Double
Dim maxValue As Double
Dim lagCounter As Long
Dim ix As Long
Dim elementValue As Double
Dim elementSum As Double
Dim numElements As Double
Dim elementAverage As Double
For lagCounter = 1 To NUM_LAGS
minValue = (lagCounter - 1) * LAG_SIZE
maxValue = (lagCounter * LAG_SIZE) - 1
numElements = 0
elementSum = 0
For ix = LBound(dataArray, 1) To UBound(dataArray, 1)
elementValue = dataArray(ix, 1)
If elementValue >= minValue And elementValue <= maxValue Then
numElements = numElements + 1
elementSum = elementSum + elementValue
End If
Next ix
elementAverage = elementSum / numElements
Cells(lagCounter, 5) = elementAverage
Next lagCounter
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
That alone should take your runtime from 1/2 an hour to a couple of seconds (if that).
-
\$\begingroup\$ thank you that worked wonderfully. Now I will try to piggy back on this post. I have a code nearly identical to the first I posted that averages all the values in column two that correspond with the bins in column 1 using the selection offset. I have commented in this portion of the code to my original post. I am having trouble figuring out how I can do the same with the code you provided. \$\endgroup\$zanwigz– zanwigz2016年08月05日 14:23:31 +00:00Commented Aug 5, 2016 at 14:23
-
\$\begingroup\$ @zanwigz 2 suggestions. 1, don't change the question after the fact, by all means post a follow-up instead. 2, just post the entire thing. The more code you show us, the more we can link things together and restructure them. \$\endgroup\$Kaz– Kaz2016年08月05日 15:23:29 +00:00Commented Aug 5, 2016 at 15:23
-
2\$\begingroup\$ Follow Zak's advice. I reworked an application (not mine) that way and it literally took processing down from 10 hours to under 10 minutes \$\endgroup\$user97873– user978732016年08月05日 20:41:09 +00:00Commented Aug 5, 2016 at 20:41
Limit data
the range Range("A1:A105570")
would have subsequent loops iterate over 105570 cells every time
since you're dealing with numbers it'd be faster to consider non blank cells filled with numbers only
Set rng = Range("A1:A105570").SpecialCells(xlCellTypeConstants, xlNumbers)
Do in loops only what actually varies in them
so
Dim min As Integer
must be taken outside bothFor K = L...
andFor Each Cell...
loopsmin = K - L
must be taken outsideFor K = L...
loopCells((K / L), 5) = WorksheetFunction.Average(R1)
must be taken outsideFor Each cell...
loop and placed right afterNext cell
ans substitutingSet R1 = Nothing
with:If Not R1 Is Nothing Then Cells((K / L), 5) = WorksheetFunction.Average(R1) Else Set R1 = Nothing End If
Avoid unnecessary IF statements
the following piece of code
For Each Cell In rng
....
If Cell.Value >= min And Cell.Value < K Then
If R1 Is Nothing Then
Set R1 = Range(Cell.Address)
Else
Set R1 = Union(R1, Range(Cell.Address))
End If
Cells((K / L), 5) = WorksheetFunction.Average(R1)
End If
Next
Set R1 = Nothing
could be refactored like follows
Set R1 = rng(1, 1).Offset(, 1) '<--| set R1 to a dummy "invalid" range
For Each cell In rng
'upper bound exclusive and lower bound inclusive
If cell.value >= min And cell.value < K Then Set R1 = Union(R1, cell) '<--| go with union without worrying about R1 being empty
Next cell
Set R1 = Intersect(R1, Columns(1)) '<-- intersect R1 with "valid" column
Miscellanea
beware that coding like:
Dim K, n, L As Integer
stands for:
Dim K As Variant, n As Variant, L As Integer
since without explicit type declaration after any variable has the compiler consider it implicitly as of
Variant
typeuse
Long
type instead ofInteger
oneit doesn't "cost" significantly in terms of memory and is much more appropriate when dealing with excel UI rows number since
Integer
type ranges from -32768 to 32767 while worksheet rows can reach up to 65,536 (till Excel 2003) or nearly 1 million (from Excel 2007 on)!
Summary #1
as for all what above a possible refactoring of your code could be the following
Option Explicit
Sub BinValues2()
'binns seperation distance values for the creation of variogram
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Dim cell As Object
Dim R1 As Range, rng As Range
Dim K As Long, min As Long, n As Long, L As Long
'define range before running
Set rng = Range("A1:A105570").SpecialCells(xlCellTypeConstants, xlNumbers) '<--| limit data to relevant ones
n = 12 'n is equal to the number of lags
L = 600 'L is the lag size
For K = L To (n * L) Step L
min = K - L
Set R1 = rng(1, 1).Offset(, 1) '<--| set R1 to a dummy "invalid" range
For Each cell In rng
'upper bound exclusive and lower bound inclusive
If cell.value >= min And cell.value < K Then Set R1 = Union(R1, cell) '<--| go with union without worrying about R1 being empty
Next cell
Set R1 = Intersect(R1, Columns(1)) '<-- intersect R1 with "valid" column
If Not R1 Is Nothing Then
Cells((K / L), 5) = WorksheetFunction.Average(R1)
Else
Set R1 = Nothing
End If
Next
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
The BIG JUMP: Use Excel built-in data manipulation
All what above leads (hopefully) to better coding habits and some significant speed increment, but the big jump comes with the use of excel built in data manipulation functions, namely Autofilter()
method of Range
object
Use With ... End With blocks
this not only shortens code typing (thus increasing code readability and decreases typos probability) but also reduces objects memory accesses, especially effective when loops are in the game
Summary #2
Autofilter()
and With...End With
blocks, can lead to a very short, elegant (at least to me it does look like) and fast code, like follows:
Option Explicit
Sub BinValues()
'binns seperation distance values for the creation of variogram
Dim K As Long, n As Long, L As Long
n = 12
L = 600
With Worksheets("bins") '<--|refer to data sheet (change "bins" with your actual worksheet name
.Cells(1, 1).Insert '<--|insert a temporary header cell: it'll be used for AutoFilter() method and eventually deleted
.Cells(1, 1).value = "bin" '<--| place a dummy header in the temporary header cell
With .Range("A1", .Cells(.Rows.Count, 1).End(xlUp)) '<--| refer columns "A" down to its last non empty row
For K = L To (n * L) Step L
.AutoFilter field:=1, Criteria1:=">=" & K - L, Operator:=xlAnd, Criteria2:="<" & K '<--| filter it on current department value
If Application.WorksheetFunction.Subtotal(103, .Cells) > 1 Then Cells((K / L), 5) = Application.WorksheetFunction.Subtotal(101, .Cells)
Next K
End With
.AutoFilterMode = False
.Cells(1, 1).Delete '<--| delete temporary header cell
End With
End Sub
which, even without any ScreenUpdating, Calculation or Events disabling runs much much faster than the one in Summary #1