2
\$\begingroup\$

Below is a Ruby implementation of a random statistical event, based on a hash with the actual observed counts of outcomes.

I'd be interested in feedback in particular on what techniques I might use to avoid a loop-based accumulator in the RandomEvent#predict! method. I'm also very curious as well about any other suggestions on refactoring, patterns and performance that might be applicable here.

The statistics material itself might be somewhat beyond the scope of a review but I'd appreciate any thoughts on appropriate naming and more effective (deterministic) ways to test this.

Spec

include Statistics
describe RandomEvent do
 context 'when an event has only one outcome' do
 it 'always happens' do
 expect(RandomEvent.from_hash(always: 1).predict!).to eq(:always)
 end
 end
 context 'when the event has multiple outcomes' do
 let(:trials) { 10_000 }
 subject(:event) do
 RandomEvent.from_hash(heads: 51, tails: 49)
 end
 it 'should distribute them' do
 coinflips = trials.times.map { event.predict! }
 heads_variance = (coinflips.count(:heads) - trials/2).abs
 tails_variance = (coinflips.count(:tails) - trials/2).abs
 expected_variance = trials/10
 expect(heads_variance).to be < expected_variance
 expect(tails_variance).to be < expected_variance
 end
 end
end

Implementation

class RandomEvent
 def initialize
 @outcome_counts = {}
 end
 def add_outcome(outcome, count:)
 @outcome_counts[outcome] = count
 end
 def normalized_outcome_probabilities
 total_outcome_counts = @outcome_counts.values.reduce(&:+)
 @outcome_counts.inject({}) do |hash,(outcome,count)|
 hash[outcome] = count / total_outcome_counts.to_f
 hash
 end
 end
 def predict!
 acc = 0.0
 roll = rand
 selected_outcome = nil
 normalized_outcome_probabilities.each do |outcome, probability|
 acc += probability
 if acc > roll
 selected_outcome = outcome
 break
 end
 end
 selected_outcome
 end
 def self.from_hash(outcome_counts_hash)
 event = new
 outcome_counts_hash.each do |outcome, count|
 event.add_outcome(outcome, count: count)
 end
 event
 end
end
200_success
145k22 gold badges190 silver badges478 bronze badges
asked Jun 23, 2016 at 20:49
\$\endgroup\$

2 Answers 2

1
\$\begingroup\$

First thing, it looks like RandomEvent.from_hash implements features of initialize method.

acc variable at RandomEvent#predict can be easily moved to inject iterator.

Code:

class RandomEvent
 def initialize(outcome_counts = {})
 @outcome_counts = outcome_counts
 end
 def add_outcome(outcome, count)
 @outcome_counts[outcome] = count
 end
 def normalized_outcome_probabilities
 total_outcome_counts = @outcome_counts.values.reduce(:+).to_f
 @outcome_counts.map { |outcome, count| [outcome, count / total_outcome_counts] }.to_h
 end
 def predict!
 roll = rand
 normalized_outcome_probabilities.inject(0.0) do |acc, (outcome, probability)|
 break outcome if (acc += probability) > roll
 acc
 end
 end
end

Now instead of RandomEvent.from_hash(heads: 51, tails: 49) you can write RandomEvent.new(heads: 51, tails: 49)

answered Jun 24, 2016 at 6:21
\$\endgroup\$
2
  • \$\begingroup\$ Hey, this is a great approach! Although I think we need to return acc in the body of the inject inside predict!? \$\endgroup\$ Commented Jun 24, 2016 at 14:43
  • \$\begingroup\$ Yes, definitely we need to return acc. Fixed. \$\endgroup\$ Commented Jun 24, 2016 at 19:48
2
\$\begingroup\$

Unit test

The statistical reasoning in this unit test looks sloppy to me:

context 'when the event has multiple outcomes' do
 let(:trials) { 10_000 }
 subject(:event) do
 RandomEvent.from_hash(heads: 51, tails: 49)
 end
 it 'should distribute them' do
 coinflips = trials.times.map { event.predict! }
 heads_variance = (coinflips.count(:heads) - trials/2).abs
 tails_variance = (coinflips.count(:tails) - trials/2).abs
 expected_variance = trials/10
 expect(heads_variance).to be < expected_variance
 expect(tails_variance).to be < expected_variance
 end
end

It looks like you are flipping a slightly biased coin, but for some reason you expect heads and tails each to be 50%. Then, for 10000 trials, you're requiring the count of heads and tails to be within the 40%–60% range — a very generous band.

The head count should follow a binomial distribution, namely \$B(n=10000, p=0.51)\$. Applying Hoeffding's inequality

$$ \mathrm{Pr}(X \le k) \le e^{\frac{-2 (np - k)^2}{n}} $$

for \$k = 4000\$ gives

$$ \mathrm{Pr}(X \le 4000) \le e^{-242} \approx 8 \times 10^{-106} $$

The conclusion: for 10 coin flips, sure, the outcome can easily vary by 10%. For 10000 coin flips, due to the Law of large numbers and the Central Limit Theorem, it basically never happens. (It would be more likely that your computer is smashed by a meteor during the time it takes to run the test.)

Implementation

The convention is to use the ! suffix on methods that mutate the object. Your predict! method doesn't really mutate the RandomEvent object, so I wouldn't name it with a !.1

Instead of doing the summation loop in predict!, it might be better to establish the cumulative thresholds in add_outcome, since that happens more rarely.


1 Functional programming purists would note that the method consumes randomness from the pseudorandom number generator when predict! calls rand, and thus it does have a side-effect. By Ruby standards, though, I wouldn't consider it mutation. Besides, your add_outcome is much more of a mutation than predict!.

answered Jun 24, 2016 at 21:06
\$\endgroup\$
1
  • \$\begingroup\$ "The convention is to use the ! suffix on methods that mutate the object." – No, the convention is to use the ! suffix when you have a pair of identical named methods which do roughly the same thing to mark the "more surprising" variant. See, e.g. Process::exit vs. Process::exit! which has nothing to do with mutation and e.g. String#replace which does mutate but doesn't have a pair and thus doesn't get a ! suffix. \$\endgroup\$ Commented Jul 29, 2016 at 12:44

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.