7
\$\begingroup\$

This code is meant to run through a column of values, bin the values based on specified ranges, then output the average value of each bin. The problem is the code is running quite slowly (approximately 30 min for around 100000 values). I am definitely a beginner at coding and was hoping there was some way to speed this code along.

Sub BinValues()
'binns seperation distance values for the creation of variogram
Application.ScreenUpdating = False
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
Dim Cell As Object
Dim R1 As Range
Dim R2 As Range
Dim rng As Range
'define range before running
Set rng = Range("A1:A105570")
Dim K, n, L As Integer
'n is equal to the number of lags
'L is the lag size
n = 12
L = 600
For K = L To (n * L) Step 600
 For Each Cell In rng
 Dim min As Integer
 min = K - L
 'upper bound exclusive and lower bound inclusive
 If Cell.Value >= min And Cell.Value < K Then
 If R1 Is Nothing Then
 Set R1 = Range(Cell.Address)
 Else
 Set R1 = Union(R1, Range(Cell.Address))
 End If
 Cells((K / L), 5) = WorksheetFunction.Average(R1) 
 End If 
 Next
 Set R1 = Nothing
Next
Application.ScreenUpdating = True
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
asked Aug 3, 2016 at 21:29
\$\endgroup\$

2 Answers 2

5
\$\begingroup\$

Data belongs in an Array

A worksheet *looks* like a grid of data, but there's an enourmous amount of overhead sitting behind it. Every time you do anything to a spreadsheet, events fire, formulas calculate and a million other things happen behind the scenes.

Working with Ranges is computationally expensive, and you're doing it N*105,570*2 times.

Instead, what you want is an Array. An Array is just a grid of data laid out in memory. Because it is *just* data there are no overheads, and so you can read/write to it about a Million times faster.

You can create an Array by reading in a range, like so:

Dim dataRange As Range
Set dataRange = Range("A1:A105570")
Dim dataArray As Variant
dataArray = dataRange.Value

And now, the value in "A1" is in dataArray(1, 1), "A2" in dataArray(2, 1) etc.

Let's re-write your code to use an Array:

Option Explicit
Public Sub BinValues()
 'binns seperation distance values for the creation of variogram
 Application.ScreenUpdating = False
 Application.EnableEvents = False
 Application.Calculation = xlCalculationManual
 Dim dataRange As Range
 Set dataRange = Range("A1:A105570")
 Dim dataArray As Variant
 dataArray = dataRange.Value
 Const NUM_LAGS As Long = 12
 Const LAG_SIZE As Long = 600
 Dim minValue As Double
 Dim maxValue As Double
 Dim lagCounter As Long
 Dim ix As Long
 Dim elementValue As Double
 Dim elementSum As Double
 Dim numElements As Double
 Dim elementAverage As Double
 For lagCounter = 1 To NUM_LAGS
 minValue = (lagCounter - 1) * LAG_SIZE
 maxValue = (lagCounter * LAG_SIZE) - 1
 numElements = 0
 elementSum = 0
 For ix = LBound(dataArray, 1) To UBound(dataArray, 1)
 elementValue = dataArray(ix, 1)
 If elementValue >= minValue And elementValue <= maxValue Then
 numElements = numElements + 1
 elementSum = elementSum + elementValue
 End If
 Next ix
 elementAverage = elementSum / numElements
 Cells(lagCounter, 5) = elementAverage
 Next lagCounter
 Application.ScreenUpdating = True
 Application.EnableEvents = True
 Application.Calculation = xlCalculationAutomatic
End Sub

That alone should take your runtime from 1/2 an hour to a couple of seconds (if that).

answered Aug 4, 2016 at 0:43
\$\endgroup\$
3
  • \$\begingroup\$ thank you that worked wonderfully. Now I will try to piggy back on this post. I have a code nearly identical to the first I posted that averages all the values in column two that correspond with the bins in column 1 using the selection offset. I have commented in this portion of the code to my original post. I am having trouble figuring out how I can do the same with the code you provided. \$\endgroup\$ Commented Aug 5, 2016 at 14:23
  • \$\begingroup\$ @zanwigz 2 suggestions. 1, don't change the question after the fact, by all means post a follow-up instead. 2, just post the entire thing. The more code you show us, the more we can link things together and restructure them. \$\endgroup\$ Commented Aug 5, 2016 at 15:23
  • 2
    \$\begingroup\$ Follow Zak's advice. I reworked an application (not mine) that way and it literally took processing down from 10 hours to under 10 minutes \$\endgroup\$ Commented Aug 5, 2016 at 20:41
2
\$\begingroup\$

Limit data

the range Range("A1:A105570") would have subsequent loops iterate over 105570 cells every time

since you're dealing with numbers it'd be faster to consider non blank cells filled with numbers only

Set rng = Range("A1:A105570").SpecialCells(xlCellTypeConstants, xlNumbers)

Do in loops only what actually varies in them

so

  • Dim min As Integer must be taken outside both For K = L... and For Each Cell... loops

  • min = K - L must be taken outside For K = L... loop

  • Cells((K / L), 5) = WorksheetFunction.Average(R1) must be taken outside For Each cell... loop and placed right after Next cell ans substituting Set R1 = Nothing with: If Not R1 Is Nothing Then Cells((K / L), 5) = WorksheetFunction.Average(R1) Else Set R1 = Nothing End If


Avoid unnecessary IF statements

the following piece of code

For Each Cell In rng
 ....
 If Cell.Value >= min And Cell.Value < K Then
 If R1 Is Nothing Then
 Set R1 = Range(Cell.Address)
 Else
 Set R1 = Union(R1, Range(Cell.Address))
 End If
 Cells((K / L), 5) = WorksheetFunction.Average(R1) 
 End If 
Next
Set R1 = Nothing

could be refactored like follows

 Set R1 = rng(1, 1).Offset(, 1) '<--| set R1 to a dummy "invalid" range
 For Each cell In rng
 'upper bound exclusive and lower bound inclusive
 If cell.value >= min And cell.value < K Then Set R1 = Union(R1, cell) '<--| go with union without worrying about R1 being empty
 Next cell
 Set R1 = Intersect(R1, Columns(1)) '<-- intersect R1 with "valid" column

Miscellanea

  • beware that coding like:

    Dim K, n, L As Integer
    

    stands for:

    Dim K As Variant, n As Variant, L As Integer
    

    since without explicit type declaration after any variable has the compiler consider it implicitly as of Variant type

  • use Long type instead of Integer one

    it doesn't "cost" significantly in terms of memory and is much more appropriate when dealing with excel UI rows number since Integer type ranges from -32768 to 32767 while worksheet rows can reach up to 65,536 (till Excel 2003) or nearly 1 million (from Excel 2007 on)!


Summary #1

as for all what above a possible refactoring of your code could be the following

Option Explicit
Sub BinValues2()
 'binns seperation distance values for the creation of variogram
 Application.ScreenUpdating = False
 Application.EnableEvents = False
 Application.Calculation = xlCalculationManual
 Dim cell As Object
 Dim R1 As Range, rng As Range
 Dim K As Long, min As Long, n As Long, L As Long
 'define range before running
 Set rng = Range("A1:A105570").SpecialCells(xlCellTypeConstants, xlNumbers) '<--| limit data to relevant ones
 n = 12 'n is equal to the number of lags
 L = 600 'L is the lag size
 For K = L To (n * L) Step L
 min = K - L
 Set R1 = rng(1, 1).Offset(, 1) '<--| set R1 to a dummy "invalid" range
 For Each cell In rng
 'upper bound exclusive and lower bound inclusive
 If cell.value >= min And cell.value < K Then Set R1 = Union(R1, cell) '<--| go with union without worrying about R1 being empty
 Next cell
 Set R1 = Intersect(R1, Columns(1)) '<-- intersect R1 with "valid" column
 If Not R1 Is Nothing Then
 Cells((K / L), 5) = WorksheetFunction.Average(R1)
 Else
 Set R1 = Nothing
 End If
 Next
 Application.ScreenUpdating = True
 Application.EnableEvents = True
 Application.Calculation = xlCalculationAutomatic
End Sub

The BIG JUMP: Use Excel built-in data manipulation

All what above leads (hopefully) to better coding habits and some significant speed increment, but the big jump comes with the use of excel built in data manipulation functions, namely Autofilter() method of Range object


Use With ... End With blocks

this not only shortens code typing (thus increasing code readability and decreases typos probability) but also reduces objects memory accesses, especially effective when loops are in the game


Summary #2

Autofilter() and With...End With blocks, can lead to a very short, elegant (at least to me it does look like) and fast code, like follows:

Option Explicit
Sub BinValues()
 'binns seperation distance values for the creation of variogram 
 Dim K As Long, n As Long, L As Long
 n = 12
 L = 600
 With Worksheets("bins") '<--|refer to data sheet (change "bins" with your actual worksheet name
 .Cells(1, 1).Insert '<--|insert a temporary header cell: it'll be used for AutoFilter() method and eventually deleted
 .Cells(1, 1).value = "bin" '<--| place a dummy header in the temporary header cell 
 With .Range("A1", .Cells(.Rows.Count, 1).End(xlUp)) '<--| refer columns "A" down to its last non empty row
 For K = L To (n * L) Step L
 .AutoFilter field:=1, Criteria1:=">=" & K - L, Operator:=xlAnd, Criteria2:="<" & K '<--| filter it on current department value
 If Application.WorksheetFunction.Subtotal(103, .Cells) > 1 Then Cells((K / L), 5) = Application.WorksheetFunction.Subtotal(101, .Cells)
 Next K
 End With
 .AutoFilterMode = False
 .Cells(1, 1).Delete '<--| delete temporary header cell
 End With 
End Sub

which, even without any ScreenUpdating, Calculation or Events disabling runs much much faster than the one in Summary #1

answered Aug 6, 2016 at 15:58
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.