
I wrote a z-score algorithm in Ruby that works, but it is incredibly slow when I have 8000+ entries to score. Can anyone help me figure out how to improve my code, please?

```ruby
module Enumerable
  def mean
    reduce(:+).to_f / length
  end

  def sample_variance
    sum = inject(0) { |acc, i| acc + (i - mean)**2 }
    1 / length.to_f * sum
  end

  def standard_deviation
    Math.sqrt(sample_variance)
  end

  def zscore
    if standard_deviation.zero?
      Array.new(length, 0)
    else
      collect { |v| (v - mean) / standard_deviation }
    end
  end
end
```

Each float carries up to 17 significant digits. Would rounding the scores to 8 decimal places speed things up?
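On the precision question, a quick check assuming a standard Ruby interpreter: Ruby's Float is always a 64-bit IEEE 754 double, so rounding a score to 8 decimal places just produces another Float of the same size. The arithmetic is no cheaper, and calling round on every score adds a step rather than removing one. The variable names below are only for illustration.

```ruby
# Ruby floats are 64-bit doubles, however many digits are displayed.
x = 1.0 / 3
rounded = x.round(8)

puts x              # 0.3333333333333333
puts rounded        # 0.33333333
puts rounded.class  # Float: same type and storage size as x
```

Rounding can still be handy for display, but it does not address the slowdown.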

EDIT: Here is an updated version of the algorithm that incorporates the advice from this thread.

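```ruby
class Array
  def mean(len = length)
    reduce(:+).to_f / len
  end

  # Divides by n, so this is the population variance
  # (the method name is kept from the original code).
  def sample_variance
    len = length
    m = mean(len)
    # Start from 0: without an initial value, reduce would use the first
    # element itself as the accumulator instead of its squared deviation.
    sum = reduce(0) { |acc, i| acc + (i - m)**2 }
    sum.to_f / len
  end

  def standard_deviation
    Math.sqrt(sample_variance)
  end

  def zscore
    # Compute the constants once, outside the block.
    stdev = standard_deviation
    m = mean
    stdev.zero? ? Array.new(length, 0) : collect { |v| (v - m) / stdev }
  end
end
```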
asked May 18, 2014 at 20:14
  • If that is a sample variance (as opposed to a population variance), you need to divide by n-1 (not n) for it to be an unbiased estimator (though in this case the difference is negligible). Also, I suggest you write that as sum / (length - 1.0). Commented May 21, 2014 at 8:38
  • @CarySwoveland I'm not a very mathy person... would you be so kind as to explain why I need to do sum / (length - 1.0) for sample variance? Thanks! Also, this is a population variance, as I am running it on all records. Commented May 21, 2014 at 12:16
  • Unfortunately, it's a mathy answer. I think sum.to_f / len reads better than 1 / len.to_f * sum. Commented May 21, 2014 at 16:28
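To illustrate the distinction raised in these comments, here is a minimal sketch with hypothetical helpers (mean_of, population_variance_of and sample_variance_of are not from the original code). The first variance divides the sum of squared deviations by n; the second divides by n - 1 (Bessel's correction), which gives an unbiased estimate when the data is only a sample.

```ruby
# Hypothetical helpers for comparison; not part of the original code.

# Arithmetic mean as a Float.
def mean_of(xs)
  xs.reduce(:+).to_f / xs.length
end

# Sum of squared deviations from the mean.
def squared_deviations(xs)
  m = mean_of(xs)
  xs.reduce(0.0) { |acc, x| acc + (x - m)**2 }
end

# Population variance: every record is included, so divide by n.
def population_variance_of(xs)
  squared_deviations(xs) / xs.length
end

# Sample variance: estimating from a sample, so divide by n - 1.
def sample_variance_of(xs)
  squared_deviations(xs) / (xs.length - 1)
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
puts population_variance_of(data)  # 4.0
puts sample_variance_of(data)      # 4.571428571428571
```

Since the calculation here runs on every record (the whole population), dividing by n, as the updated code does, is the appropriate choice; the two results also converge as n grows.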

1 Answer


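The problem is here: collect { |v| (v - mean) / standard_deviation }. standard_deviation is constant, but because it is called inside the block, it is recomputed on every iteration. Store it in a local variable before the loop instead. As noted by Flambino, the same principle applies to sample_variance (which calls mean inside a block). Combined, the cost is O(n³): sample_variance runs an O(n) mean pass for each of the n elements, making it O(n²), and zscore then repeats that work for every element. With 8,000 entries that is roughly 8,000³ ≈ 5 × 10¹¹ steps.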
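To make the cost visible, here is a minimal sketch using a hypothetical CountingStats class (not from the original code) that counts how often mean runs. The naive variance re-evaluates mean inside the block, like the original sample_variance; the hoisted version evaluates it once and keeps it in a local variable.

```ruby
# Hypothetical class for illustration: counts how often #mean is evaluated.
class CountingStats
  attr_reader :mean_calls

  def initialize(values)
    @values = values
    @mean_calls = 0
  end

  # Each call is a full O(n) pass over the data.
  def mean
    @mean_calls += 1
    @values.reduce(:+).to_f / @values.length
  end

  # Like the original code: mean is recomputed for every element, O(n^2).
  def naive_variance
    sum = @values.reduce(0) { |acc, v| acc + (v - mean)**2 }
    sum / @values.length.to_f
  end

  # Hoisted: mean is computed once and reused inside the block, O(n).
  def hoisted_variance
    m = mean
    sum = @values.reduce(0) { |acc, v| acc + (v - m)**2 }
    sum / @values.length.to_f
  end
end

data = (1..1000).to_a

naive = CountingStats.new(data)
naive_result = naive.naive_variance
puts naive.mean_calls    # 1000

hoisted = CountingStats.new(data)
hoisted_result = hoisted.hoisted_variance
puts hoisted.mean_calls  # 1

puts naive_result == hoisted_result  # true
```

The same reasoning applies with more force to standard_deviation inside zscore's block, since each of those calls repeats the whole O(n²) variance computation.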

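In a pure functional language (where immutability is honored), a compiler could in principle evaluate such a constant expression once and reuse it. Ruby cannot assume that: it is an imperative language plagued with side effects, where any method call might change state, so the call is repeated on every iteration.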
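A small sketch of why the call cannot simply be reused automatically: the hypothetical Ticker class below (not from the original code) changes its state on every call, so evaluating next_value once and reusing it gives a different result from calling it inside the block.

```ruby
# Hypothetical class with a side effect: each call mutates internal state.
class Ticker
  def initialize
    @count = 0
  end

  # Returns a different value on every call.
  def next_value
    @count += 1
  end
end

# Called inside the block: evaluated once per element.
t1 = Ticker.new
per_call = [10, 20, 30].map { |v| v + t1.next_value }
p per_call  # [11, 22, 33]

# Hoisted into a local: evaluated once and reused.
t2 = Ticker.new
once = t2.next_value
hoisted = [10, 20, 30].map { |v| v + once }
p hoisted   # [11, 21, 31]
```

For a pure computation like standard_deviation, hoisting does not change the result, which is why doing it by hand is the right fix here.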

Some additional notes on your code:

  • module Enumerable: you call .length, but Enumerable only requires the including class to implement each, and it does not provide length itself (its closest counterpart is count). Consider adding these methods to Array (which includes Enumerable) instead.

  • You use reduce in one method and inject in another. They are aliases, so I'd pick one and use it consistently.
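The first note can be checked with a minimal sketch: the hypothetical Countdown class below (not from the original code) includes Enumerable and defines only each. It gets reduce and count from Enumerable, but not length, so a mean written in Enumerable that calls length would fail on it. Array includes Enumerable and also defines length.

```ruby
# Hypothetical Enumerable that only implements #each.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Yields from, from - 1, ..., 1.
  def each
    @from.downto(1) { |i| yield i }
    self
  end
end

c = Countdown.new(3)
p c.reduce(:+)                    # 6, provided by Enumerable
p c.count                         # 3, provided by Enumerable
p c.respond_to?(:length)          # false, Enumerable does not define length
p Array.include?(Enumerable)      # true
p [1, 2, 3].respond_to?(:length)  # true
```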

answered May 18, 2014 at 21:38
  • I actually just noticed this before you wrote it. Setting standard_deviation to a variable at the beginning of the zscore method makes this run blazingly fast. Thanks! Commented May 18, 2014 at 21:39
  • @DaniG2k You can do the same local var trick with mean in your zscore and sample_variance methods. Not as big a boost as storing standard_deviation, but the principle's the same. Commented May 18, 2014 at 21:42
  • @tokland I'm confused: the reduce method should be called on an Enumerable but length should be called on an Array. Which is better to use? Commented May 18, 2014 at 21:53
  • @DaniG2k: Array includes Enumerable. Commented May 18, 2014 at 21:58
