Random distribution in Ruby

Question 1

Below is a Ruby implementation of a random statistical event, based on a hash with the actual observed counts of outcomes.

I'd be interested in feedback in particular on what techniques I might use to avoid a loop-based accumulator in the RandomEvent#predict! method. I'm also very curious as well about any other suggestions on refactoring, patterns and performance that might be applicable here.

The statistics material itself might be somewhat beyond the scope of a review but I'd appreciate any thoughts on appropriate naming and more effective (deterministic) ways to test this.

Spec

include Statistics
describe RandomEvent do
 context 'when an event has only one outcome' do
 it 'always happens' do
 expect(RandomEvent.from_hash(always: 1).predict!).to eq(:always)
 end
 end
 context 'when the event has multiple outcomes' do
 let(:trials) { 10_000 }
 subject(:event) do
 RandomEvent.from_hash(heads: 51, tails: 49)
 end
 it 'should distribute them' do
 coinflips = trials.times.map { event.predict! }
 heads_variance = (coinflips.count(:heads) - trials/2).abs
 tails_variance = (coinflips.count(:tails) - trials/2).abs
 expected_variance = trials/10
 expect(heads_variance).to be < expected_variance
 expect(tails_variance).to be < expected_variance
 end
 end
end

Implementation

class RandomEvent
 def initialize
 @outcome_counts = {}
 end
 def add_outcome(outcome, count:)
 @outcome_counts[outcome] = count
 end
 def normalized_outcome_probabilities
 total_outcome_counts = @outcome_counts.values.reduce(&:+)
 @outcome_counts.inject({}) do |hash,(outcome,count)|
 hash[outcome] = count / total_outcome_counts.to_f
 hash
 end
 end
 def predict!
 acc = 0.0
 roll = rand
 selected_outcome = nil
 normalized_outcome_probabilities.each do |outcome, probability|
 acc += probability
 if acc > roll
 selected_outcome = outcome
 break
 end
 end
 selected_outcome
 end
 def self.from_hash(outcome_counts_hash)
 event = new
 outcome_counts_hash.each do |outcome, count|
 event.add_outcome(outcome, count: count)
 end
 event
 end
end

Question 2

First thing, it looks like RandomEvent.from_hash implements features of initialize method.

acc variable at RandomEvent#predict can be easily moved to inject iterator.

Code:

class RandomEvent
 def initialize(outcome_counts = {})
 @outcome_counts = outcome_counts
 end
 def add_outcome(outcome, count)
 @outcome_counts[outcome] = count
 end
 def normalized_outcome_probabilities
 total_outcome_counts = @outcome_counts.values.reduce(:+).to_f
 @outcome_counts.map { |outcome, count| [outcome, count / total_outcome_counts] }.to_h
 end
 def predict!
 roll = rand
 normalized_outcome_probabilities.inject(0.0) do |acc, (outcome, probability)|
 break outcome if (acc += probability) > roll
 acc
 end
 end
end

Now instead of RandomEvent.from_hash(heads: 51, tails: 49) you can write RandomEvent.new(heads: 51, tails: 49)

Question 3

Hey, this is a great approach! Although I think we need to return acc in the body of the inject inside predict!?

Question 4

Yes, definitely we need to return acc. Fixed.

Question 5

Unit test

The statistical reasoning in this unit test looks sloppy to me:

context 'when the event has multiple outcomes' do
 let(:trials) { 10_000 }
 subject(:event) do
 RandomEvent.from_hash(heads: 51, tails: 49)
 end
 it 'should distribute them' do
 coinflips = trials.times.map { event.predict! }
 heads_variance = (coinflips.count(:heads) - trials/2).abs
 tails_variance = (coinflips.count(:tails) - trials/2).abs
 expected_variance = trials/10
 expect(heads_variance).to be < expected_variance
 expect(tails_variance).to be < expected_variance
 end
end

It looks like you are flipping a slightly biased coin, but for some reason you expect heads and tails each to be 50%. Then, for 10000 trials, you're requiring the count of heads and tails to be within the 40%–60% range — a very generous band.

The head count should follow a binomial distribution, namely \$B(n=10000, p=0.51)\$. Applying Hoeffding's inequality

$$ \mathrm{Pr}(X \le k) \le e^{\frac{-2 (np - k)^2}{n}} $$

for \$k = 4000\$ gives

$$ \mathrm{Pr}(X \le 4000) \le e^{-242} \approx 8 \times 10^{-106} $$

The conclusion: for 10 coin flips, sure, the outcome can easily vary by 10%. For 10000 coin flips, due to the Law of large numbers and the Central Limit Theorem, it basically never happens. (It would be more likely that your computer is smashed by a meteor during the time it takes to run the test.)

Implementation

The convention is to use the ! suffix on methods that mutate the object. Your predict! method doesn't really mutate the RandomEvent object, so I wouldn't name it with a !.¹

Instead of doing the summation loop in predict!, it might be better to establish the cumulative thresholds in add_outcome, since that happens more rarely.

¹ Functional programming purists would note that the method consumes randomness from the pseudorandom number generator when predict! calls rand, and thus it does have a side-effect. By Ruby standards, though, I wouldn't consider it mutation. Besides, your add_outcome is much more of a mutation than predict!.

Question 6

"The convention is to use the ! suffix on methods that mutate the object." – No, the convention is to use the ! suffix when you have a pair of identical named methods which do roughly the same thing to mark the "more surprising" variant. See, e.g. Process::exit vs. Process::exit! which has nothing to do with mutation and e.g. String#replace which does mutate but doesn't have a pair and thus doesn't get a ! suffix.

Sergii K Sergii K 6664 silver badges12 bronze badges · Accepted Answer · 2016-06-24 06:21:15Z

First thing, it looks like RandomEvent.from_hash implements features of initialize method.

acc variable at RandomEvent#predict can be easily moved to inject iterator.

Code:

class RandomEvent
 def initialize(outcome_counts = {})
 @outcome_counts = outcome_counts
 end
 def add_outcome(outcome, count)
 @outcome_counts[outcome] = count
 end
 def normalized_outcome_probabilities
 total_outcome_counts = @outcome_counts.values.reduce(:+).to_f
 @outcome_counts.map { |outcome, count| [outcome, count / total_outcome_counts] }.to_h
 end
 def predict!
 roll = rand
 normalized_outcome_probabilities.inject(0.0) do |acc, (outcome, probability)|
 break outcome if (acc += probability) > roll
 acc
 end
 end
end

Now instead of RandomEvent.from_hash(heads: 51, tails: 49) you can write RandomEvent.new(heads: 51, tails: 49)

Hey, this is a great approach! Although I think we need to return acc in the body of the inject inside predict!?

Stack Exchange Network

Random distribution in Ruby

2 Answers 2

Unit test

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Random distribution in Ruby

2 Answers 2

Unit test

Implementation

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions