Standard deviation from iterators

Question 1

I want this to resemble the STL algorithms, but I don't find it elegant or concise at all:

#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <numeric>   // std::accumulate
#include <cmath>     // std::sqrt
using E = double;
template <typename IT>
E std_dev(IT begin, IT end){
    auto N = std::distance(begin, end);
    E average = std::accumulate(begin, end, E()) / N;
    // Adds the squared deviation of each value from the average.
    auto sum_term = [average](E init, E value)-> E{
        return init + (value - average)*(value - average);
    };
    E variance = std::accumulate(begin, end, E(), sum_term);
    return std::sqrt(variance * 1.0 / (N - 1));
}
int main(){
    std::vector<double> stuff {3.5, 3.4, 3.6, 3.9, 3.5, 3.5, 3.5, 3.5, 3.5};
    std::cout << std_dev(stuff.begin(), stuff.end()) << "\n";
}

Question 2

Please do not change the code in your question after an answer has been posted, so as not to invalidate it. See "What should I do when someone answers my question?"

Question 3

Firstly, make it correct.

N is integral, you could make it E so you don't accidentally do integer arithmetic.

N-1 is wrong.

Rename average to mean.

Don't hardcode E.

You get:

template <typename It, typename E = typename std::iterator_traits<It>::value_type>
E std_dev(It begin, It end){
 E N = std::distance(begin, end);
 E const mean = std::accumulate(begin, end, E()) / N;
 auto sum_term = [mean](E init, E value)-> E { return init + (value - mean)*(value - mean); };
 E variance = std::accumulate(begin, end, E(), sum_term);
 return std::sqrt(variance / N);
}
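One caveat with the version above, assuming it is called on an integer container such as the std::vector<int> used in the Boost comparison below: E deduces to int, so every step runs in integer arithmetic and the result is truncated. The sketch reproduces that version unchanged except for a hypothetical name (std_dev_value_type) and an added empty-range precondition; truncated_example is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <vector>

// The answer's first version, renamed, with an empty-range precondition added.
// E defaults to the iterator's value_type, so int input means int arithmetic.
template <typename It, typename E = typename std::iterator_traits<It>::value_type>
E std_dev_value_type(It begin, It end) {
    assert(begin != end && "empty range");
    E N = std::distance(begin, end);
    E const mean = std::accumulate(begin, end, E()) / N;  // integer division if E is int
    auto sum_term = [mean](E init, E value) -> E {
        return init + (value - mean) * (value - mean);
    };
    E variance = std::accumulate(begin, end, E(), sum_term);
    return std::sqrt(variance / N);  // double result converted back to E
}

// With int input: mean = 319 / 9 = 35, sum of squares = 18, 18 / 9 = 2,
// sqrt(2) = 1.414... which is then truncated to 1 on return.
inline int truncated_example() {
    std::vector<int> stuff{35, 34, 36, 39, 35, 35, 35, 35, 35};
    return std_dev_value_type(stuff.begin(), stuff.end());  // 1, not 1.34256
}
```

The same data stored as double gives 1.34256, which is why the stylized version derives a floating-point result type instead of reusing value_type.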

Slightly stylized, with a comparison to Boost.Accumulators:


#include <iostream>
#include <iterator>
#include <vector>
#include <algorithm>
#include <numeric>       // std::accumulate
#include <cmath>         // std::sqrt
#include <functional>    // std::ref
#include <type_traits>   // std::common_type
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics.hpp>

// Reference result computed with Boost.Accumulators.
template <typename It, typename E = typename std::iterator_traits<It>::value_type, typename R = typename std::common_type<double, E>::type>
R std_dev_boost(It begin, It end){
    namespace ba = boost::accumulators;
    ba::accumulator_set<R, ba::stats<ba::tag::variance> > accu;
    std::for_each(begin, end, std::ref(accu));   // feed every element to the accumulator
    return std::sqrt(ba::variance(accu));
}

// Two-pass version: R is at least double, so integer input is not truncated.
template <typename It,
    typename E = typename std::iterator_traits<It>::value_type,
    typename R = typename std::common_type<double, E>::type>
R std_dev(It b, It e)
{
    R N = std::distance(b, e);
    R const mean = std::accumulate(b, e, R{}) / N;
    R variance = std::accumulate(b, e, R{}, [mean](R a, E v)-> R { return a + (v-mean)*(v-mean); });
    return std::sqrt(variance / N);
}
int main(){
    std::vector<int> stuff {35, 34, 36, 39, 35, 35, 35, 35, 35};
    std::cout << std_dev_boost(stuff.begin(), stuff.end()) << "\n";
    std::cout << std_dev (stuff.begin(), stuff.end()) << "\n";
}

Prints

1.34256
1.34256
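The two-pass std_dev above walks the range three times (std::distance plus two std::accumulate calls), so it needs forward iterators; the Boost version visits each element once through std::for_each. If the input can be read only once, for example through std::istream_iterator, a single pass is required. The sketch below swaps in Welford's online algorithm, which neither answer uses; std_dev_one_pass and from_stream are hypothetical names, and it returns the population value (divide by N) to match the output above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <type_traits>

// Welford's online algorithm: one pass, so plain input iterators suffice.
template <typename It,
          typename E = typename std::iterator_traits<It>::value_type,
          typename R = typename std::common_type<double, E>::type>
R std_dev_one_pass(It b, It e) {
    std::size_t n = 0;
    R mean{};  // running mean of the elements seen so far
    R m2{};    // running sum of squared deviations from the running mean
    for (; b != e; ++b) {
        ++n;
        R const x = *b;
        R const delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);  // second factor uses the updated mean
    }
    assert(n > 0 && "empty range");
    return std::sqrt(m2 / n);  // population standard deviation
}

// Usage: reading straight from a stream, which can be traversed only once.
inline double from_stream(std::string const& text) {
    std::istringstream in(text);
    return std_dev_one_pass(std::istream_iterator<int>(in), std::istream_iterator<int>());
}
```

Called as from_stream("35 34 36 39 35 35 35 35 35"), it should agree with the two printed values to the shown precision.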

Question 4

@CaptainGiraffe Mmm. I was mistaken there. The N-1 seems to be the only mistake that made the outcome wrong :)

Question 5

Ah. Well, see my update for how to derive a proper result type that works even if the input is an integer container.
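The result type in the update comes down to std::common_type<double, E>. A few compile-time checks, using a hypothetical alias result_t that mirrors the default R template argument, show what it resolves to for different inputs:

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Mirrors the default template argument R in the updated std_dev.
template <typename It>
using result_t = typename std::common_type<
    double, typename std::iterator_traits<It>::value_type>::type;

// Integer and float inputs are promoted to double...
static_assert(std::is_same<result_t<std::vector<int>::iterator>, double>::value,
              "int input gives a double result");
static_assert(std::is_same<result_t<std::list<float>::const_iterator>, double>::value,
              "float input gives a double result");
// ...while a wider floating-point type is kept as-is.
static_assert(std::is_same<result_t<long double*>, long double>::value,
              "long double input keeps long double");
```

Because R is never narrower than double, integer containers no longer truncate the mean, and long double input keeps its extra precision.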

Question 6

The N-1 isn't wrong. It's just computing a different standard deviation. To be specific, if you're computing the standard deviation of "the universe", you use N. If you're computing the standard deviation of a sample, you use N-1.
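To make that choice explicit instead of baking it in, one option is to pass the divisor adjustment as a parameter and compute sqrt(sum of (x_i - mean)^2 / (N - ddof)). The parameter name ddof ("delta degrees of freedom") is borrowed from NumPy's std as an assumption, and std_dev_ddof is a hypothetical name: ddof = 0 gives the population value the accepted answer prints, ddof = 1 the sample value the question computed.

```cpp
#include <cassert>
#include <cmath>
#include <iterator>
#include <numeric>
#include <type_traits>

// ddof = 0: population standard deviation (divide by N).
// ddof = 1: sample standard deviation (divide by N - 1, Bessel's correction).
template <typename It,
          typename E = typename std::iterator_traits<It>::value_type,
          typename R = typename std::common_type<double, E>::type>
R std_dev_ddof(It b, It e, int ddof = 0) {
    auto const n = std::distance(b, e);
    assert(n > ddof && "need more elements than ddof");
    R const mean = std::accumulate(b, e, R{}) / n;
    R const sum_sq = std::accumulate(b, e, R{}, [mean](R acc, E v) -> R {
        return acc + (v - mean) * (v - mean);
    });
    return std::sqrt(sum_sq / (n - ddof));
}
```

For the integer data in the accepted answer, ddof = 0 gives about 1.34256 and ddof = 1 about 1.42400.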

Question 7

@JerryCoffin Yeah, that was clear to me since Giraffe told me :) I was content contributing the style feedback

Question 8

A simple conciseness improvement would be to inline non-informative variables:

template <typename IT>
E std_dev(IT begin, IT end) {
    auto N = std::distance(begin, end);
    E variance = std::accumulate(begin, end, E{},
        [average=std::accumulate(begin, end, E{}) / N](E init, E value) -> E
        {
            return init + (value - average)*(value - average);
        });
    return std::sqrt(variance * 1.0 / (N - 1));
}

average was moved into the lambda capture (an init-capture, which requires C++14), and the lambda itself is passed directly to std::accumulate.
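An init-capture evaluates its initializer once, when the lambda object is created, not on every call, so moving the mean into the capture does not recompute it per element. A minimal sketch with a hypothetical call counter (counted_mean and sum_sq_dev are illustrative names) makes that visible:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: computes the mean and counts how often it runs.
double counted_mean(std::vector<double> const& v, int& calls) {
    assert(!v.empty() && "empty vector");
    ++calls;
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Sum of squared deviations, with the mean computed in a C++14 init-capture.
double sum_sq_dev(std::vector<double> const& v, int& calls) {
    return std::accumulate(v.begin(), v.end(), 0.0,
        [mean = counted_mean(v, calls)](double acc, double x) {
            return acc + (x - mean) * (x - mean);
        });
}
```

For {1.0, 2.0, 3.0, 4.0}, counted_mean runs exactly once and the sum of squared deviations is 5.0.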

Question 9

You left in all the bugs.

Question 10

@sehe No, I didn't remove them. I'm here for conciseness (as stated in the answer), not for correctness.

Question 11

The main point of programming is to make code easier to read, not more concise.

Question 12

If you want it concise, use valarray. stackoverflow.com/a/1723071/179910
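A sketch of what the valarray route could look like. This is not the code from the linked answer; it assumes the data are already in a std::valarray<double>, and std_dev_valarray is a hypothetical name. It computes the population value, like the accepted answer.

```cpp
#include <cassert>
#include <cmath>
#include <valarray>

// Population standard deviation using whole-array arithmetic.
double std_dev_valarray(std::valarray<double> const& v) {
    assert(v.size() > 0 && "empty array");
    double const mean = v.sum() / v.size();
    std::valarray<double> const dev = v - mean;   // element-wise subtraction
    std::valarray<double> const sq = dev * dev;   // element-wise square
    return std::sqrt(sq.sum() / v.size());
}
```

Data held in a std::vector<double> can be copied in with std::valarray<double>(vec.data(), vec.size()).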

Accepted answer (reproduced above under Question 3): sehe, 2016-03-19 00:36:37Z

