This is a script I have written to calculate the population standard deviation. I feel that this can be simplified and also be made more pythonic.
    from math import sqrt

    def mean(lst):
        """calculates mean"""
        sum = 0
        for i in range(len(lst)):
            sum += lst[i]
        return (sum / len(lst))

    def stddev(lst):
        """calculates standard deviation"""
        sum = 0
        mn = mean(lst)
        for i in range(len(lst)):
            sum += pow((lst[i]-mn),2)
        return sqrt(sum/len(lst)-1)

    numbers = [120,112,131,211,312,90]
    print stddev(numbers)
2 Answers
The easiest way to make mean() more pythonic is to use the sum() built-in function.

    def mean(lst):
        return sum(lst) / len(lst)
Concerning your loops on lists, you don't need to use range(). This is enough:

    for e in lst:
        sum += e
Other comments:
- You don't need parentheses around the return value (check out PEP 8 when you have a doubt about this).
- Your docstrings are useless: it's obvious from the name that it calculates the mean. At least make them more informative ("returns the mean of lst").
- Why do you use "-1" in the return for stddev? Is that a bug?
- You are computing the standard deviation using the variance: call that "variance", not sum!
- You should type pow(e-mn,2), not pow((e-mn),2). Extra parentheses inside a function call could make the reader think they are looking at a tuple (e.g. pow((e,mn),2) is valid syntax).
- You shouldn't use pow() anyway; ** is enough.
This would give:
    def stddev(lst):
        """returns the standard deviation of lst"""
        variance = 0
        mn = mean(lst)
        for e in lst:
            variance += (e-mn)**2
        variance /= len(lst)
        return sqrt(variance)
It's still way too verbose! Since we're handling lists, why not use a list comprehension?
    def stddev(lst):
        """returns the standard deviation of lst"""
        mn = mean(lst)
        variance = sum([(e-mn)**2 for e in lst]) / len(lst)
        return sqrt(variance)
This is not perfect. You could add tests using doctest. Obviously, you should not code those functions yourself, except in a small project. Consider using NumPy for a bigger project.
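As a rough sketch of the doctest suggestion, the two functions could carry their own tests in their docstrings. This assumes Python 3 (so that / is true division) and the population formula, which divides by the number of elements. The sample values in the docstrings are illustrative and do not come from the question.

```python
from math import sqrt


def mean(lst):
    """Return the arithmetic mean of lst.

    >>> mean([1, 2, 3, 4])
    2.5
    """
    return sum(lst) / len(lst)


def stddev(lst):
    """Return the population standard deviation of lst.

    >>> stddev([2, 4, 4, 4, 5, 5, 7, 9])
    2.0
    """
    mn = mean(lst)
    # A generator expression avoids building an intermediate list.
    variance = sum((e - mn) ** 2 for e in lst) / len(lst)
    return sqrt(variance)


if __name__ == "__main__":
    import doctest
    # Runs the examples embedded in the docstrings; silent when they pass.
    doctest.testmod()
```

Running the file directly executes the embedded examples, and python -m doctest -v on the file prints each one as it is checked.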
- Thank you Cygal for your answer. I realize things like tests and validation need to be added, but I think you put me in the right direction. (Animesh D, Feb 22, 2012 at 8:23)
- @mad, I realize you're not able to comment due to your reputation, but if you see a problem in a post and want to fix it, you'll either have to be patient and wait until you have 50 reputation or go out, answer a question, and get five upvotes (or ask a good question and get 10). Please don't try to circumvent the system. Third-party edits should only edit the content of the post (as opposed to formatting, grammar, spelling, pasting in content from links etc.) with explicit approval from the poster. (anon, Dec 30, 2015 at 20:09)
- Looks like you forgot to divide the variance by N before taking the sqrt in the last/least verbose example. (Cody A. Ray, Sep 29, 2016 at 16:41)
- @CodyA.Ray Your Rev 2 corrected the result, but it was not the right fix. (200_success, Sep 29, 2016 at 19:19)
- @200_success can you elaborate? Yeah, variance is the wrong variable name there. I could've just divided in the "return" line. But the equation seems correct for non-sampled std dev: libweb.surrey.ac.uk/library/skills/Number%20Skills%20Leicester/… (Cody A. Ray, Sep 29, 2016 at 22:09)
You have some serious calculation errors...
Assuming that this is Python 2, you also have bugs in the use of division: if both operands of / are integers, then Python 2 performs integer division. Possible remedies are:
- from __future__ import division
- Cast one of the operands to a float: return (float(sum)) / len(lst), for example.
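To make the difference concrete, here is a small sketch using the numbers from the question. It uses // to reproduce what Python 2 does when / is given two integers, and the float() cast from the second remedy to get the true quotient, so it behaves the same on Python 2 and Python 3. The variable names are only for illustration.

```python
numbers = [120, 112, 131, 211, 312, 90]  # data from the question

total = sum(numbers)   # 976, an int
count = len(numbers)   # 6, an int

# Floor division: this is what Python 2 computes for int / int.
truncated_mean = total // count

# Casting one operand to float forces true division on any version.
true_mean = float(total) / count

print(truncated_mean)  # 162
print(true_mean)       # roughly 162.67
```

The truncated mean is off by about 0.67, and that error then carries into every squared deviation computed from it.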
(Assuming that this is Python 3, you can just use statistics.stdev() for the sample standard deviation, or statistics.pstdev() for the population one.)
The formula for the sample standard deviation is
$$ s = \sqrt{\frac{\sum_{i=1}^{n}\ (x_i - \bar{x})^2}{n - 1}}$$
In return sqrt(sum/len(lst)-1), you have an error with the precedence of operations. It should be

    return sqrt(float(sum) / (len(lst) - 1))
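As a hedged cross-check, the corrected expression can be compared against the standard library. This sketch assumes Python 3.4 or later, where the statistics module is available. It computes both the sample form (dividing by n - 1) and the population form (dividing by n) from the question's data; the variable names are illustrative.

```python
import statistics
from math import sqrt

numbers = [120, 112, 131, 211, 312, 90]  # data from the question

mn = sum(numbers) / float(len(numbers))
squared_devs = sum((x - mn) ** 2 for x in numbers)

# The parentheses around (len(numbers) - 1) matter: without them the
# expression divides by n first and then subtracts 1 from the quotient.
sample_sd = sqrt(squared_devs / (len(numbers) - 1))
population_sd = sqrt(squared_devs / len(numbers))

# Each hand-rolled value should agree with its library counterpart.
print(sample_sd, statistics.stdev(numbers))
print(population_sd, statistics.pstdev(numbers))
```

The sample value is always the larger of the two for the same data, since it divides the same sum of squares by a smaller number.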
- Source for formula? (Agostino, Mar 26, 2015 at 23:40)
- @Agostino It's basically common knowledge in statistics. (200_success, Mar 26, 2015 at 23:42)