I have been reading a bit about statistics to improve the way I understand data. This is my attempt at calculating Standard Variance using data tables. I solved this simple question:
StdVariance
This is the code I used (I used SEDE to execute it):
CREATE TABLE #Numbers
(
Number DECIMAL(10,4),
Variance DECIMAL(10,4)
);
GO
INSERT INTO #Numbers (Number) VALUES
(23),
(37),
(45),
(49),
(56),
(63),
(63),
(70),
(72),
(82);
GO
DECLARE @Mean DECIMAL(10,4);
SET @Mean = (SELECT SUM (Number) / COUNT(Number) FROM #Numbers);
UPDATE #Numbers
SET Variance = POWER( (@Mean - Number), 2);
DECLARE @StdVariance DECIMAL(10,4);
SET @StdVariance = (SELECT SQRT( SUM(Variance) / COUNT(Number) ) FROM #Numbers);
SELECT
@Mean AS [Mean],
@StdVariance AS [Standard Variance];
Output:
(10 row(s) affected) Mean Standard Variance ------- ----------------- 56.0000 16.8701 (10 row(s) affected) (1 row(s) affected)
I'm less familiar with SQL Server than other RDBMS, so I want to make sure my use of temp tables and variables is done the "Transact-SQL Way". Is there a cleaner, smarter way to do this?
-
\$\begingroup\$ Do you want "variance" or "standard deviation"? \$\endgroup\$200_success– 200_success2014年11月03日 21:50:24 +00:00Commented Nov 3, 2014 at 21:50
-
\$\begingroup\$ My bad, I used the wrong wording. I was looking for standard deviation not variance. \$\endgroup\$Phrancis– Phrancis2014年11月03日 22:02:47 +00:00Commented Nov 3, 2014 at 22:02
2 Answers 2
Assuming you want to reinvent the wheel...
You really shouldn't be using a data table as scratch space for calculations, even if it is a temporary table. You should also not need to rely on variables either — that is not idiomatic SQL. A Common Table Expression would solve both problems.
CREATE TABLE #Numbers
(
Number DECIMAL(10,4)
);
INSERT ...;
WITH Mean AS (
SELECT SUM(Number) / COUNT(Number) AS Mean
FROM #Numbers
), Deviation AS (
SELECT Mean, POWER(Number - Mean, 2) AS Error
FROM #Numbers CROSS JOIN Mean
)
SELECT Mean, SQRT(SUM(Error) / COUNT(Error)) AS [Standard Deviation]
FROM Deviation
GROUP BY Mean;
The smarter way is to search MSDN.
SELECT AVG(Number) AS Mean
, STDEVP(Number) AS [Standard Deviation]
, STDEVP(Number) * STDEVP(Number) AS Variance
FROM #Numbers;
Caution: The AVG()
function in SQL Server is a bit weird due to its naïve implementation:
AVG () computes the average of a set of values by dividing the sum of those values by the count of nonnull values. If the sum exceeds the maximum value for the data type of the return value an error will be returned.
That means that
- It is vulnerable to overflow
- If the inputs have an integral data type, then it performs integer division, truncating the result to an integer.