Calculating population standard deviation

Question 1

This is a script I have written to calculate the population standard deviation. I feel that it can be simplified and made more Pythonic.

from math import sqrt
def mean(lst):
    """calculates mean"""
    sum = 0
    for i in range(len(lst)):
        sum += lst[i]
    return (sum / len(lst))
def stddev(lst):
    """calculates standard deviation"""
    sum = 0
    mn = mean(lst)
    for i in range(len(lst)):
        sum += pow((lst[i]-mn),2)
    return sqrt(sum/len(lst)-1)
numbers = [120,112,131,211,312,90]
print stddev(numbers)

Question 2

The easiest way to make mean() more pythonic is to use the sum() built-in function.

def mean(lst):
    return sum(lst) / len(lst)
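A minimal runnable sketch of that one-liner, assuming Python 3, where / is true division (under Python 2 the integer inputs below would be floor-divided, as a later answer points out). The empty-list check is an illustrative addition, not part of the answer.

```python
def mean(lst):
    """Return the arithmetic mean of lst (Python 3 true division)."""
    if not lst:
        # Illustrative guard: without it, len([]) == 0 raises ZeroDivisionError.
        raise ValueError("mean() of an empty list is undefined")
    return sum(lst) / len(lst)


numbers = [120, 112, 131, 211, 312, 90]
print(mean(numbers))  # 976 / 6
```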

Concerning your loops over lists, you don't need range(). This is enough:

for e in lst:
    sum += e
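Put together, a loop version of mean() might look like the sketch below (Python 3 assumed). It iterates over the elements directly; the function name mean_loop and the accumulator name total are hypothetical choices for this sketch, the latter picked so the built-in sum() is not shadowed.

```python
def mean_loop(lst):
    """Return the mean of lst using an explicit loop (no range() needed)."""
    total = 0
    for e in lst:  # iterate over the elements themselves, not their indices
        total += e
    return total / len(lst)  # true division in Python 3


print(mean_loop([120, 112, 131, 211, 312, 90]))
```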

Other comments:

You don't need parentheses around the return value (check PEP 8 when in doubt).
Your docstrings are useless: it's obvious from the name that mean() calculates the mean. At least make them more informative ("returns the mean of lst").
Why do you use "-1" in the return for stddev? Is that a bug?
You are computing the standard deviation from the variance, so call that variable "variance", not sum (naming it sum also shadows the built-in sum() function).
You should write pow(e-mn,2), not pow((e-mn),2). Extra parentheses inside a function call could make the reader think they're reading a tuple (e.g. pow((e,mn),2) is valid syntax).
You don't need pow() anyway; ** is enough.
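A quick check of the last two points, assuming Python 3: ** and pow() give the same result for a square, while the tuple form parses fine but fails at run time. The values of e and mn are arbitrary examples.

```python
e, mn = 131, 162.5  # arbitrary example values

# ** and pow() agree for squaring.
assert (e - mn) ** 2 == pow(e - mn, 2)

# pow((e, mn), 2) is syntactically valid: the first argument is a tuple...
try:
    pow((e, mn), 2)
except TypeError as exc:
    # ...but raising a tuple to a power is not defined, so it fails at run time.
    print("TypeError:", exc)
```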

This would give:

def stddev(lst):
    """returns the standard deviation of lst"""
    variance = 0
    mn = mean(lst)
    for e in lst:
        variance += (e-mn)**2
    variance /= len(lst)
    return sqrt(variance)

It's still way too verbose! Since we're handling lists, why not use a list comprehension?

def stddev(lst):
    """returns the standard deviation of lst"""
    mn = mean(lst)
    variance = sum([(e-mn)**2 for e in lst]) / len(lst)
    return sqrt(variance)
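As a variation not taken from the answer, sum() also accepts a generator expression, so the square brackets can be dropped and no intermediate list is built. This sketch assumes Python 3 and cross-checks the result against statistics.pstdev(), the standard library's population standard deviation.

```python
from math import sqrt
import statistics


def mean(lst):
    return sum(lst) / len(lst)


def stddev(lst):
    """Return the population standard deviation of lst."""
    mn = mean(lst)
    # Generator expression: no temporary list is created.
    variance = sum((e - mn) ** 2 for e in lst) / len(lst)
    return sqrt(variance)


numbers = [120, 112, 131, 211, 312, 90]
print(stddev(numbers), statistics.pstdev(numbers))
```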

This is not perfect: you could add tests using doctest. Also, you probably shouldn't write these functions yourself except in a small project; consider using NumPy for a bigger project.
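A minimal sketch of the doctest suggestion, assuming Python 3. The examples use small inputs whose population standard deviation is exact (2.0 and 0.0), so the expected output is stable. Running the file directly executes the examples; python -m doctest -v yourfile.py also works (the file name is a placeholder).

```python
from math import sqrt


def stddev(lst):
    """Return the population standard deviation of lst.

    >>> stddev([2, 4, 4, 4, 5, 5, 7, 9])
    2.0
    >>> stddev([1, 1, 1])
    0.0
    """
    mn = sum(lst) / len(lst)
    return sqrt(sum((e - mn) ** 2 for e in lst) / len(lst))


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # prints nothing when all examples pass
```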

Question 3

Thank you Cygal for your answer. I realize things like tests and validation need to be added, but I think you put me in the right direction.

Question 4

@mad, I realize you're not able to comment due to your reputation, but if you see a problem in a post and want to fix it, you'll have to either be patient and wait until you have 50 reputation, or go out, answer a question, and get five upvotes (or ask a good question and get 10). Please don't try to circumvent the system. Third-party edits should change the content of a post (as opposed to its formatting, grammar, spelling, pasting in content from links, etc.) only with explicit approval from the poster.

Question 5

Looks like you forgot to divide the variance by N before taking the sqrt in the last/least verbose example.

Question 6

@CodyA.Ray Your Rev 2 corrected the result, but it was not the right fix.

Question 7

@200_success can you elaborate? Yeah, variance is the wrong variable name there. I could've just divided in the "return" line. But the equation seems correct for non-sampled std dev: libweb.surrey.ac.uk/library/skills/Number%20Skills%20Leicester/…

Question 8

You have some serious calculation errors...

Assuming that this is Python 2, you also have bugs in the use of division: if both operands of / are integers, then Python 2 performs integer division. Possible remedies are:

Add from __future__ import division at the top of the module.
Cast one of the operands to a float, for example return float(sum) / len(lst).
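A small illustration, assuming Python 3: there, / always performs true division, and // is the explicit floor division that Python 2's / performs when both operands are integers. Casting one operand with float(), the second remedy above, gives the same result under either version.

```python
total, n = 976, 6  # sum and length of the question's numbers

floor_mean = total // n        # what Python 2 computes for 976 / 6
true_mean = total / n          # Python 3, or Python 2 with from __future__ import division
cast_mean = float(total) / n   # the float() remedy; same result in Python 2 and 3

print(floor_mean, true_mean, cast_mean)
```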

(If this is Python 3.4 or later, you can just use statistics.stdev() for the sample standard deviation, or statistics.pstdev() for the population standard deviation.)
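For reference, a sketch of both standard-library functions on the question's numbers (Python 3.4+ assumed). stdev() divides the sum of squared deviations by n - 1 and pstdev() by n, so for data that isn't constant the sample value is the larger one.

```python
import statistics

numbers = [120, 112, 131, 211, 312, 90]

sample_sd = statistics.stdev(numbers)        # divides the sum of squares by n - 1
population_sd = statistics.pstdev(numbers)   # divides the sum of squares by n

print(sample_sd, population_sd)
```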

The formula for the sample standard deviation is

$$ s = \sqrt{\frac{\sum_{i=1}^{n}\ (x_i - \bar{x})^2}{n - 1}}$$
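For comparison with the population standard deviation the question asks about, the standard definition divides by the number of values N rather than by n - 1 (here μ is the mean of all N values):

$$ \sigma = \sqrt{\frac{\sum_{i=1}^{N}\ (x_i - \mu)^2}{N}}$$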

In return sqrt(sum/len(lst)-1), you have an error with the precedence of operations. It should be

return sqrt(float(sum) / (len(lst) - 1))
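The precedence problem is easy to see by evaluating both forms on the question's data. In sqrt(sum/len(lst)-1), division binds tighter than subtraction, so 1 is subtracted from the quotient instead of from the count. This sketch assumes Python 3; total_sq is an illustrative name for the accumulated sum of squared deviations.

```python
from math import sqrt

numbers = [120, 112, 131, 211, 312, 90]
n = len(numbers)
mn = sum(numbers) / n
total_sq = sum((x - mn) ** 2 for x in numbers)  # sum of squared deviations

buggy = sqrt(total_sq / n - 1)         # parsed as (total_sq / n) - 1
sample_sd = sqrt(total_sq / (n - 1))   # the intended n - 1 denominator

print(buggy, sample_sd)
```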

Question 9

Source for formula?

Question 10

@Agostino It's basically common knowledge in statistics.

Accepted answer (Question 2 above) by Quentin Pradet, 2012-02-21 13:01:11Z
