Below is a Ruby implementation of a random statistical event, based on a hash with the actual observed counts of outcomes.
I'd be interested in feedback in particular on what techniques I might use to avoid a loop-based accumulator in the RandomEvent#predict!
method. I'm also very curious as well about any other suggestions on refactoring, patterns and performance that might be applicable here.
The statistics material itself might be somewhat beyond the scope of a review but I'd appreciate any thoughts on appropriate naming and more effective (deterministic) ways to test this.
Spec
include Statistics
describe RandomEvent do
context 'when an event has only one outcome' do
it 'always happens' do
expect(RandomEvent.from_hash(always: 1).predict!).to eq(:always)
end
end
context 'when the event has multiple outcomes' do
let(:trials) { 10_000 }
subject(:event) do
RandomEvent.from_hash(heads: 51, tails: 49)
end
it 'should distribute them' do
coinflips = trials.times.map { event.predict! }
heads_variance = (coinflips.count(:heads) - trials/2).abs
tails_variance = (coinflips.count(:tails) - trials/2).abs
expected_variance = trials/10
expect(heads_variance).to be < expected_variance
expect(tails_variance).to be < expected_variance
end
end
end
Implementation
class RandomEvent
def initialize
@outcome_counts = {}
end
def add_outcome(outcome, count:)
@outcome_counts[outcome] = count
end
def normalized_outcome_probabilities
total_outcome_counts = @outcome_counts.values.reduce(&:+)
@outcome_counts.inject({}) do |hash,(outcome,count)|
hash[outcome] = count / total_outcome_counts.to_f
hash
end
end
def predict!
acc = 0.0
roll = rand
selected_outcome = nil
normalized_outcome_probabilities.each do |outcome, probability|
acc += probability
if acc > roll
selected_outcome = outcome
break
end
end
selected_outcome
end
def self.from_hash(outcome_counts_hash)
event = new
outcome_counts_hash.each do |outcome, count|
event.add_outcome(outcome, count: count)
end
event
end
end
2 Answers 2
First thing, it looks like RandomEvent.from_hash
implements features of initialize
method.
acc
variable at RandomEvent#predict
can be easily moved to inject
iterator.
Code:
class RandomEvent
def initialize(outcome_counts = {})
@outcome_counts = outcome_counts
end
def add_outcome(outcome, count)
@outcome_counts[outcome] = count
end
def normalized_outcome_probabilities
total_outcome_counts = @outcome_counts.values.reduce(:+).to_f
@outcome_counts.map { |outcome, count| [outcome, count / total_outcome_counts] }.to_h
end
def predict!
roll = rand
normalized_outcome_probabilities.inject(0.0) do |acc, (outcome, probability)|
break outcome if (acc += probability) > roll
acc
end
end
end
Now instead of RandomEvent.from_hash(heads: 51, tails: 49)
you can write RandomEvent.new(heads: 51, tails: 49)
-
\$\begingroup\$ Hey, this is a great approach! Although I think we need to return
acc
in the body of theinject
insidepredict!
? \$\endgroup\$Joseph Weissman– Joseph Weissman2016年06月24日 14:43:22 +00:00Commented Jun 24, 2016 at 14:43 -
\$\begingroup\$ Yes, definitely we need to return
acc
. Fixed. \$\endgroup\$Sergii K– Sergii K2016年06月24日 19:48:20 +00:00Commented Jun 24, 2016 at 19:48
Unit test
The statistical reasoning in this unit test looks sloppy to me:
context 'when the event has multiple outcomes' do let(:trials) { 10_000 } subject(:event) do RandomEvent.from_hash(heads: 51, tails: 49) end it 'should distribute them' do coinflips = trials.times.map { event.predict! } heads_variance = (coinflips.count(:heads) - trials/2).abs tails_variance = (coinflips.count(:tails) - trials/2).abs expected_variance = trials/10 expect(heads_variance).to be < expected_variance expect(tails_variance).to be < expected_variance end end
It looks like you are flipping a slightly biased coin, but for some reason you expect heads and tails each to be 50%. Then, for 10000 trials, you're requiring the count of heads and tails to be within the 40%–60% range — a very generous band.
The head count should follow a binomial distribution, namely \$B(n=10000, p=0.51)\$. Applying Hoeffding's inequality
$$ \mathrm{Pr}(X \le k) \le e^{\frac{-2 (np - k)^2}{n}} $$
for \$k = 4000\$ gives
$$ \mathrm{Pr}(X \le 4000) \le e^{-242} \approx 8 \times 10^{-106} $$
The conclusion: for 10 coin flips, sure, the outcome can easily vary by 10%. For 10000 coin flips, due to the Law of large numbers and the Central Limit Theorem, it basically never happens. (It would be more likely that your computer is smashed by a meteor during the time it takes to run the test.)
Implementation
The convention is to use the !
suffix on methods that mutate the object. Your predict!
method doesn't really mutate the RandomEvent
object, so I wouldn't name it with a !
.1
Instead of doing the summation loop in predict!
, it might be better to establish the cumulative thresholds in add_outcome
, since that happens more rarely.
1 Functional programming purists would note that the method consumes randomness from the pseudorandom number generator when predict!
calls rand
, and thus it does have a side-effect. By Ruby standards, though, I wouldn't consider it mutation. Besides, your add_outcome
is much more of a mutation than predict!
.
-
\$\begingroup\$ "The convention is to use the
!
suffix on methods that mutate the object." – No, the convention is to use the!
suffix when you have a pair of identical named methods which do roughly the same thing to mark the "more surprising" variant. See, e.g.Process::exit
vs.Process::exit!
which has nothing to do with mutation and e.g.String#replace
which does mutate but doesn't have a pair and thus doesn't get a!
suffix. \$\endgroup\$Jörg W Mittag– Jörg W Mittag2016年07月29日 12:44:00 +00:00Commented Jul 29, 2016 at 12:44
Explore related questions
See similar questions with these tags.