Problem
The following statistical functions were created to calculate statistics of experiments:
- Mean
- Peak to Peak
- Standard Deviation
- Variance
- Mean Absolute Error (MAE)
- Mean Square Error (MSE)
- Root Mean Square Error (RMSE)
So there are some statistics that needs an ideal point and some that doesn't.
The following image contains the data:
Where there are two columns, the first contains the measured value and the second the quantity that the data repeats.
Statistical UDFs
Each function will have as input only the first data column and the quantities must be on the right.
Mean
The function on G3 is =MeanArr(C2:C20)
And the code is:
Public Function MeanArr(rng As Range) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'Mean
MeanArr= Application.WorksheetFunction.Average(Arr)
End With
Exit Function
ErrHandler:
MeanArr = "Error"
End Function
It is the Arithmetic Mean:
Peak to Peak
The function on G4 is =PeaktoPeak(C2:C20)
And the code is:
Public Function PeaktoPeak(rng As Range) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'Peak to Peak
PeaktoPeak = WorksheetFunction.Max(Arr) - WorksheetFunction.Min(Arr)
End With
Exit Function
ErrHandler:
PeaktoPeak = "Error"
End Function
Peak to Peak is the amplitude of the data, it is the max minus the min.
Standard Deviation
The function on G5 is StdDeviation(C2:C20)
.
Public Function StdDeviation(rng As Range) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'Standard Deviation
StdDeviation = WorksheetFunction.StDev(Arr)
End With
Exit Function
ErrHandler:
StdDeviation = "Error"
End Function
The standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values.
Variance
The function on G7 is =Variance(C2:C20)
Public Function Variance(rng As Range) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'Var
Variance = WorksheetFunction.Var(Arr)
End With
Exit Function
ErrHandler:
Variance = "Error"
End Function
The Variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value.
Mean Absolute Error (MAE)
The function on G6 is =MAE(C2:C20;B1)
Public Function MAE(rng As Range, ideal As Double) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Dim Sum As Double
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'y=y1-t_ideal; %t_ideal is the square wave of ideal communication and y1 the test vector
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Arr(i) - ideal
Next i
'%Absolute Value
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Abs(Arr(i))
Next i
's=sum(se);
Sum = 0
For i = LBound(Arr) To UBound(Arr)
Sum = Sum + Arr(i)
Next i
'Mean Absolute Error
MAE = Sum / (UBound(Arr) + 1)
End With
Exit Function
ErrHandler:
MAE = "Error"
End Function
The Mean Absolute Error is a measure of difference between two continuous variables. Consider a scatter plot of n points, where point i has coordinates (xi, yi)... Mean Absolute Error (MAE) is the average vertical distance between each point and the identity line. MAE is also the average horizontal distance between each point and the identity line.
Calculated by the following formula:
Mean Squared Error (MSE)
The function on G2 is MSE(C2:C20;B1)
Public Function MSE(rng As Range, ideal As Double) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Dim Sum As Double
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'y=y1-t_ideal; %t_ideal is the square wave of ideal communication and y1 the test vector
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Arr(i) - ideal
Next i
'%Square Error, where .^ is used to square vector
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Arr(i) ^ 2
Next i
's=sum(se);
Sum = 0
For i = LBound(Arr) To UBound(Arr)
Sum = Sum + Arr(i)
Next i
'mse=s/n; %Mean Square Error
MSE = Sum / (UBound(Arr) + 1)
End With
Exit Function
ErrHandler:
MSE = "Error"
End Function
The Mean Squared Error of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors—that is, the average squared difference between the estimated values and what is estimated.
Formula:
Root Mean Square Deviation (RMSE)
The formula on G1 is =RMSE(C2:C20;B1)
Public Function RMSE(rng As Range, ideal As Double) As Double
Dim Arr()
Dim ws As Worksheet
Dim i As Long, j As Long
Dim Soma As Double, MSE As Double
Set ws = Application.Caller.Parent
Dim cell As Range
With ws
For Each cell In rng
If cell.Offset(0, 1) > 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
For i = 0 To cell.Offset(0, 1) - 1
Arr(j + i) = cell
Next i
j = j + i
ElseIf cell.Offset(0, 1) = 1 Then
ReDim Preserve Arr(cell.Offset(0, 1) + j - 1)
i = 0
Arr(j + i) = cell
j = j + 1
End If
Next cell
'y=y1-t_ideal; %t_ideal is the square wave of ideal communication and y1 the test vector
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Arr(i) - ideal
Next i
'%Square Error, where .^ is used to square vector
For i = LBound(Arr) To UBound(Arr)
Arr(i) = Arr(i) ^ 2
Next i
's=sum(se);
Sum = 0
For i = LBound(Arr) To UBound(Arr)
Sum = Sum + Arr(i)
Next i
'mse=s/n; %Mean Square Error
MSE = Sum / (UBound(Arr) + 1)
'rmse=sqrt(mse) %Root Mean Square Error
RMSE = Sqr(MSE)
End With
Exit Function
ErrHandler:
RMSE = "Error"
End Function
Root Mean Square Deviation (RMSE) is a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed.
Formula:
Questions
- How is the performance? Can it improve?
- Are the results ok? Are the functions working properly?
- How to make a proper
ErrHandler
? - Should i use
WorksheetFunction
or created my own UDFs? If the quantity of data gets really large. - I was thinking... Should I use a Global Array for each Sheet? So it doesn't have to calculate an array of data for each function again?
- Further tips/help are welcome. Or another improvements.
Just for reference:
-
1\$\begingroup\$ Be aware there are 2 different formulas for stddev, one for a population and one for a sample space. This also have an effect on related formulas, like variance and stderr. \$\endgroup\$dfhwze– dfhwze2019年07月31日 20:42:46 +00:00Commented Jul 31, 2019 at 20:42
2 Answers 2
Here are some ideas on improvements you can make in the overall code, presented in an example for one of the math functions. The ideas can be applied to all the other functions as well.
My lead-in to the rest of my comments and examples primarily deal with highly repeated logic in all of your functions. A big clue is when you end up copying a section of code into another function, you should STOP and consider creating a single common function. This "functional isolation" of logic makes your code more consistent and greatly helps when changes need to be made to the logic. You only have to make the change in once place.
So my first comments deals with your input range to all your functions. Each function seems to require a two-column range with both values and quantities. This is perfectly fine, but if that's the case then your input range should also be a two-column range. Your example accepts a one-column range and uses Offset
to check the quantity value. This is a mis-match between what you think is the input range: single-column, but really using two. So bottom line is to make your input range match what the UDF is actually using.
Along those lines, each UDF should perform checks against the input values to ensure they match the expectations of your function. In my example below, I've created an InputCheck
function that can be called from each of your UDFs, providing central (and functionally isolated) checks on your input data. My example only shows two quick checks, but you can add any other checks/tests as needed. I highly recommend reading Chip Pearson's Returning Errors From User Defined Functions In VBA for guidance. Returning errors from UDFs in this manner means that "error handling" won't stop execution or use a pop-up MsgBox
-- any error will be indicated in the cell.
Private Function InputCheck(ByRef dataRange As Variant) As Long
'--- returns 0 if all checks pass!!
'--- input must be a range
If Not TypeName(dataRange) = "Range" Then
InputCheck = xlErrRef
Exit Function
End If
'--- input range must be one or two columns ONLY
If (dataRange.Columns.Count < 1) Or (dataRange.Columns.Count > 2) Then
InputCheck = xlErrRef
Exit Function
End If
'--- all cells MUST contain numeric values
Dim cell As Variant
For Each cell In dataRange
If Not IsNumeric(cell) Then
InputCheck = xlErrNum
Exit Function
End If
Next cell
'--- create any other checks for valid input data...
'--- everything looks good!
InputCheck = 0
End Function
(You'll notice that I'm sneaking in an extra check I'll show later. But basically it's if the UDF is called with a single column of data everything will still work.)
Now you can create a common block at the beginning of each of your UDFs that will return a valid error code to the worksheet cell:
Dim checkResult As Long
checkResult = InputCheck(dataRange)
If checkResult <> 0 Then
QtyMean = CVErr(checkResult)
Exit Function
End If
Your next section of common logic builds an array of values from the input data, including repeating values based on the quantity indicator. I've also moved this logic into its own isolated function.
One step in speed improvement is to perform most of your logic in memory-based arrays instead of directly interacting with the worksheet or range object. So the first thing to do is move the input range to an array. You can then make a quick check and if it's a single-column, then you're all done. If there are two columns, the logic proceeds almost identically to your original code. Note that you can pre-determine the size of your return array by summing the quantities in the second column. This avoids the expense of ReDim Preserve
during code execution.
Private Function GetDataArray(ByRef srcRange As Variant) As Variant
Dim theSourceData As Variant
theSourceData = srcRange.Value
If srcRange.Columns.Count = 1 Then
'--- only one column, so we're done!
GetDataArray = theSourceData
Exit Function
End If
'--- we're building a single array and (possibly) repeating values
' based on the quantity indicator in the second column, so...
'--- size the results array first...
Dim resultsSize As Long
Dim n As Long
For n = LBound(theSourceData, 1) To UBound(theSourceData, 1)
resultsSize = resultsSize + theSourceData(n, 2)
Next n
Dim resultArray() As Variant
ReDim resultArray(0 To resultsSize)
'--- ... now build the array and repeat values as necessary
Dim i As Long
Dim j As Long
For n = LBound(theSourceData, 1) To UBound(theSourceData, 1)
If theSourceData(n, 2) > 1 Then
'--- repeat values in the array
For i = 0 To theSourceData(n, 2) - 1
resultArray(j + i) = theSourceData(n, 1)
Next i
j = j + i
ElseIf theSourceData(n, 2) = 1 Then
'--- only a single value
i = 0
resultArray(j + i) = theSourceData(n, 1)
j = j + 1
End If
Next n
GetDataArray = resultArray
End Function
Since this function returns a well-crafted array of values, the only thing left is the math logic. So my example UDF for calculating the Mean is
Public Function QtyMean(ByRef dataRange As Variant) As Double
'--- accepts a one- or two-column range where column 1 holds the
' values and (the optional) column 2 holds the quantities
Dim checkResult As Long
checkResult = InputCheck(dataRange)
If checkResult <> 0 Then
QtyMean = CVErr(checkResult)
Exit Function
End If
Dim dataWithQty As Variant
dataWithQty = GetDataArray(dataRange)
If IsArray(dataWithQty) Then
QtyMean = Application.WorksheetFunction.Average(dataWithQty)
Else
QtyMean = CVErr(xlErrValue)
End If
End Function
Some end notes on the example code:
- Most parameters are passed as
Variant
because it forces (reminds) me to perform all input error checking in my UDF and supporting functions. Excel will convert some inputs for you, but can't always know exactly what you're expecting and which appropriate error should be raised. - Use more descriptive names for your variables. It makes the code more readable and self-documenting. As an example, your
Arr
array variable is not hard to figure out but when I useresultArray
it makes more sense when "reading" the code. - Each of your functions consistently sets up a worksheet variable
ws
but never uses it. I can recommend a tool such as Rubberduck that can help with code quality checks like this. (Disclaimer: I have nothing to do with Rubberduck, just a satisfied user)
Here's the whole example module:
Option Explicit
Public Function QtyMean(ByRef dataRange As Variant) As Double
'--- accepts a one- or two-column range where column 1 holds the
' values and (the optional) column 2 holds the quantities
Dim checkResult As Long
checkResult = InputCheck(dataRange)
If checkResult <> 0 Then
QtyMean = CVErr(checkResult)
Exit Function
End If
Dim dataWithQty As Variant
dataWithQty = GetDataArray(dataRange)
If IsArray(dataWithQty) Then
QtyMean = Application.WorksheetFunction.Average(dataWithQty)
Else
QtyMean = CVErr(xlErrValue)
End If
End Function
Private Function GetDataArray(ByRef srcRange As Variant) As Variant
Dim theSourceData As Variant
theSourceData = srcRange.Value
If srcRange.Columns.Count = 1 Then
'--- only one column, so we're done!
GetDataArray = theSourceData
Exit Function
End If
'--- we're building a single array and (possibly) repeating values
' based on the quantity indicator in the second column, so...
'--- size the results array first...
Dim resultsSize As Long
Dim n As Long
For n = LBound(theSourceData, 1) To UBound(theSourceData, 1)
resultsSize = resultsSize + theSourceData(n, 2)
Next n
Dim resultArray() As Variant
ReDim resultArray(0 To resultsSize)
'--- ... now build the array and repeat values as necessary
Dim i As Long
Dim j As Long
For n = LBound(theSourceData, 1) To UBound(theSourceData, 1)
If theSourceData(n, 2) > 1 Then
'--- repeat values in the array
For i = 0 To theSourceData(n, 2) - 1
resultArray(j + i) = theSourceData(n, 1)
Next i
j = j + i
ElseIf theSourceData(n, 2) = 1 Then
'--- only a single value
i = 0
resultArray(j + i) = theSourceData(n, 1)
j = j + 1
End If
Next n
GetDataArray = resultArray
End Function
Private Function InputCheck(ByRef dataRange As Variant) As Long
'--- returns 0 if all checks pass!!
'--- input must be a range
If Not TypeName(dataRange) = "Range" Then
InputCheck = xlErrRef
Exit Function
End If
'--- input range must be one or two columns ONLY
If (dataRange.Columns.Count < 1) Or (dataRange.Columns.Count > 2) Then
InputCheck = xlErrRef
Exit Function
End If
'--- all cells MUST contain numeric values
Dim cell As Variant
For Each cell In dataRange
If Not IsNumeric(cell) Then
InputCheck = xlErrNum
Exit Function
End If
Next cell
'--- create any other checks for valid input data...
'--- everything looks good!
InputCheck = 0
End Function
Standard public announcement: always include Option Explicit
at the top of modules.
How to improve performance
Two tips here:
- Put your range into an array and work with the array, not Excel objects. The switching between Excel and VBA models is computationally expensive.
- Avoid wherever possible
ReDim
ming andPreserve
-ing arrays. They are computationally expensive.
To use your first function as an example
Public Function MeanArr(rng As Range) As Double '<-- I will make a comment about this later
'Error checks required for if the user selected a single cell, or if they selected more than one column.
Dim initialValues as Variant '<-- create an array here
Dim initialQuantities as Variant '<-- be explicit with all values you want to use
initialValues = rng.Value
initialQuantities = rng.Offset(0,1).Value
'Dim Arr() ' <-- This would have been an array of variants, which is subtly different
Dim i As Long, j As Long
Dim totalSum as Double, totalCount as Double
For j = LBound(initialValues,1) to UBound(initialValues,1)
'Error checking required for valid input - is it really a Number?
totalCount = totalCount + CDbl(initialQuantities(j,1))
totalSum = totalSum + CDBL(initialValues(j,1)) * CDbl(initialQuantities(j,1))
Next j
MeanArr = totalSum / totalCount
Exit Function '<--- left this here for now but will address errors later.
ErrHandler:
MeanArr = "Error" '<-- I will make a comment about this later
End Function
You don't use ws
(yes, you have a With
statement, but how the code is written, that With
is not used. Nor is it needed!
I was going to use a Collection instead of an expanding array - but turns out not necessary in this case. However, for the more complex formulae, consider using a Collection instead of an array, or try to resize the array only once at the beginning. See https://stackoverflow.com/a/56842847/9101981
Are the functions working correctly?
I don't know - I hope they are otherwise this post is off-topic and I am wasting my time on this response! However, you can derive tests to ensure your own correctness.
Proper error handling
You are creating a UDF - Let Excel do some of the work for you.
Public Function MeanArr(rng As Range) As Variant '<-- significant
'Error checks required for if the user selected a single cell, or if they selected more than one column.
If rng.Columns.Count > 1 then
MeanArr = CVErr(xlErrName)
Exit Function
End If
Dim initialValues as Variant '<-- create an array here
Dim initialQuantities as Variant '<-- be explicit with all values you want to use
initialValues = rng.Value
initialQuantities = rng.Offset(0,1).Value
Dim j As Long
Dim totalSum as Double, totalCount as Double
For j = LBound(initialValues,1) to UBound(initialValues,1)
'Error checking required for valid input - is it really a Number?
If Not IsNumeric() or Not IsNumeric() Then
MeanArr = CVErr(xlErrNum)
Exit Function
End If
totalCount = totalCount + CDbl(initialQuantities(j,1))
totalSum = totalSum + CDBL(initialValues(j,1)) * CDbl(initialQuantities(j,1))
Next j
MeanArr = CDbl(totalSum / totalCount)
' Error handling is now done in the code logic.
End Function
Leaving the function return type as Variant is deliberate, it means that you can control what is returned and use the in-built error types to indicate to your user any issues. It also means that if you link the results cell to any other cell, Excel knows how to handle chained errors. Excel might misinterpret a random string ("error") potentially hiding issues from the user if you didn't use the built in types.
Use Worksheet functions?
This is an 'it depends' answer. For speed, writing your own routines will generally be faster. However, some of the more complex functions, using the built in worksheet function may be easier. Trade off against your own requirements, just remember that every switch to the Excel model costs.
Use Global Array?
Definitely not. You are writing portable, reusable UDFs which take a range as a parameter. Using a global variable just ties you down with no flexibility.