I was recently given a programming assignment by a prospective employer. Unfortunately, I didn't pass the assignment and didn't receive any feedback on what I could have done better. I would appreciate any and all feedback on how I could make the program below better.
Task:
Write a class or module that has a function or method that returns a random number chosen from a set of numbers where each has a specified probability. For example, the input could be an associative array consisting of:
1 => 0.25
2 => 0.5
7 => 0.25
When the function or method is called repeatedly, the above input might yield 2, 7, 1, 1, 2, 2, 2, 7, 2, etc.
Additional requirements:
The random number generator will be called billions of times, so performance is important.
The probabilities will be updated very infrequently, if ever. But neither the probabilities nor their distribution is known before processing.
The size of the set of numbers will typically be at least a thousand, possibly well into the millions, each with its own probability. No other number should be returned except for those specified in the input.
The class or module should support the ability to replay the same sequence of random numbers regardless of platform it is compiled on and regardless of whether the program is directly translated to another programming language. The class does not need to read or write to any device, but it is desirable to minimize the amount of data stored to reproduce the replay.
The code must be written such that a competent developer could translate your code directly into another common programming language without having to retrieve any code not written by you.
You can use whatever programming language you like as long as it is suitable for practical use (e.g., Befunge and INTERCAL are not acceptable but Python, C++, or Haskell is), has a visible community online, and is one we can freely and easily download and setup to interpret or run the code (e.g., MATLAB is not acceptable, but Octave is).
Code:
package main

import (
    "errors"
    "math/rand"
    "sort"
)
/* WeightedProbability is the class to handle all of the weights etc.
 * There will be a values and weights slice, think ArrayList in Java.
 * Total will be the total value of the weights.
 * ReplayValues is a slice containing the replay sequence.
 */
type WeightedProbability struct {
    ValueWeights []vw      // assume float64 because with such a large input size you would need decimal percentages, i.e. .25%
    Total        float64   // total value of the weights
    ReplayValues []float64
    Initialized  bool
}
/* vw is a struct used to map the key/value pairs of the map to a slice.
 */
type vw struct {
    Value  float64
    Weight float64
}
/* sortByWeight sorts a map by weight and returns a slice of struct vw and the total.
 * We can then perform binary search on a slice of structs; we can't perform binary search on a map.
 * @m map[float64]float64
 */
func sortByWeight(m map[float64]float64) ([]vw, float64) {
    var mkv []vw
    total := 0.0
    for value, weight := range m {
        total += weight
        mkv = append(mkv, vw{value, total})
    }
    sort.Slice(mkv, func(i, j int) bool {
        return mkv[i].Weight < mkv[j].Weight
    })
    return mkv, total
}
/* Init
 * @numPool map[float64]float64
 * numPool is of the form [value] => [weight]
 * Ex: 7 => .25, so 7 will appear with 25% frequency
 * Total probability can't be over 100%
 * Algorithm takes O(N) to create the weights and values
 * Since we are using a map there should be no duplicates except ones of the form 7 vs 7.00
 */
func (wp *WeightedProbability) Init(numPool map[float64]float64) error {
    if numPool == nil {
        return errors.New("Number Pool is not initialized!")
    }
    valueWeights, total := sortByWeight(numPool)
    if total > 100.00 {
        return errors.New("Total is greater than 100!")
    }
    replayValues := []float64{}
    wp.ValueWeights = valueWeights
    wp.Total = total
    wp.ReplayValues = replayValues
    wp.Initialized = true
    return nil
}
/* GenerateRandomNumber
 * Returns an error if the struct is not initialized or if the struct is unable to generate a random number.
 * sort.Search uses binary search to find the index of the first weight >= x.
 */
func (wp *WeightedProbability) GenerateRandomNumber() (float64, error) {
    if !wp.Initialized {
        return 0, errors.New("Not initialized")
    }
    x := rand.Float64() * wp.Total
    // search the distribution, essentially the same as Python's bisect_left
    i := sort.Search(len(wp.ValueWeights), func(i int) bool {
        return wp.ValueWeights[i].Weight >= x
    })
    if i >= len(wp.ValueWeights) {
        return 0, errors.New("Index too big")
    }
    wp.ReplayValues = append(wp.ReplayValues, wp.ValueWeights[i].Value)
    return wp.ValueWeights[i].Value, nil
}
/* Replay
 * Since we were told we shouldn't write to a file, just store and return the slice.
 */
func (wp *WeightedProbability) Replay() ([]float64, error) {
    if !wp.Initialized {
        return nil, errors.New("Not initialized")
    }
    return wp.ReplayValues, nil
}
- By the way, I don't know what job it was you were applying for, but unless it's specifically in the domain of data science / statistical analysis, be glad you aren't working for a company that's fishing for obscure algorithms. – Alex Reinking, May 15, 2018 at 16:45
- Hello @AlexReinking, thank you for the feedback and the excellent answer! I was applying as a regular React/Express software engineer, so I'm pretty glad things didn't work out at this point in time. – Turtle, May 15, 2018 at 19:33
- That is absolutely embarrassing for the company. Front end work has nothing in common with this. – Alex Reinking, May 15, 2018 at 19:44
3 Answers
You are looking for the Alias method as outlined in this StackOverflow question. The basic idea is to precompute some tables that turn a random number in [0, 1) into a value in your distribution. It works by creating a number of binary distributions such that when you select one of them uniformly at random, and then select between its two symbols, you recover the original probabilities.
There is a detailed explanation and a C++ implementation available here.
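To make the idea concrete, here is a minimal sketch of the alias method (Vose's variant) in Go. The names, the seeding, and the package layout are illustrative assumptions, not part of the original answer; it only aims to show the table construction and the O(1) sampling step.

package alias

import "math/rand"

// AliasTable supports O(1) sampling after O(n) setup.
type AliasTable struct {
    prob  []float64 // scaled probability kept in each column
    alias []int     // alias index for each column
    vals  []float64 // the values to return
    rng   *rand.Rand
}

// NewAliasTable builds the table from parallel values/weights slices.
func NewAliasTable(values, weights []float64, seed int64) *AliasTable {
    n := len(weights)
    prob := make([]float64, n)
    alias := make([]int, n)

    // Scale weights so they average to 1.
    total := 0.0
    for _, w := range weights {
        total += w
    }
    scaled := make([]float64, n)
    for i, w := range weights {
        scaled[i] = w * float64(n) / total
    }

    // Partition columns into "small" (< 1) and "large" (>= 1).
    var small, large []int
    for i, p := range scaled {
        if p < 1 {
            small = append(small, i)
        } else {
            large = append(large, i)
        }
    }

    // Pair each small column with a large one that donates the remainder.
    for len(small) > 0 && len(large) > 0 {
        s := small[len(small)-1]
        small = small[:len(small)-1]
        l := large[len(large)-1]
        large = large[:len(large)-1]

        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1 - scaled[s]
        if scaled[l] < 1 {
            small = append(small, l)
        } else {
            large = append(large, l)
        }
    }
    // Leftover columns (numerical residue) get probability 1.
    for _, i := range append(small, large...) {
        prob[i] = 1
    }

    return &AliasTable{prob: prob, alias: alias, vals: values, rng: rand.New(rand.NewSource(seed))}
}

// Sample draws one value in O(1): pick a column uniformly, then flip a biased coin.
func (t *AliasTable) Sample() float64 {
    i := t.rng.Intn(len(t.prob))
    if t.rng.Float64() < t.prob[i] {
        return t.vals[i]
    }
    return t.vals[t.alias[i]]
}

Once the table is built, each draw costs one uniform index and one comparison, independent of the size of the number set.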
- That is a great algorithm I'd never heard of before. Thanks. – Josiah, May 15, 2018 at 13:02
- @Josiah - You're welcome! It's a good one with mostly hand-rolled implementations. The constant is kind of high, but the asymptotic complexity is better. If you need to sample tons of numbers from the distribution, or there are a lot of different options, then this is best. Both are the case in this question. – Alex Reinking, May 15, 2018 at 16:40
- C++ implementation link is broken – Tony, Aug 11, 2021 at 18:31
- @Tony - still working for me. – Alex Reinking, Aug 11, 2021 at 21:41
- I guess I got an intermittent error. Please ignore. Meanwhile I found a C implementation: github.com/Tecnarca/Vose-Alias-Method/blob/master/extractor.c and a Java implementation: keithschwarz.com/interesting/code/?dir=alias-method I'm thinking of writing a Go implementation. – Tony, Aug 11, 2021 at 22:52
I see two main issues with this piece of code:
- the code is hard to read, and not idiomatic
- there are no tests or benchmarks
1. Readability
It's always a good idea to run golint and go vet on your code to detect common style mistakes, here mainly comment and error message formatting.
The name WeightedProbability might not be clear enough. Maybe Generator is enough?
The Init() function is not a good way to initialize a WeightedProbability.
Instead of
wp := &WeightedProbability{}
err := wp.Init(map)
if err != nil {
    ...
}
we could add a NewGenerator() function doing the same thing, which is the common way of doing this in Go:
g, err := NewGenerator(map)
if err != nil {
    ...
}
The Replay() function should not exist either. Instead, as @Josiah already said, instantiate the generator with a seed:
func NewGenerator(seed int64, numPool map[float64]float64) (*Generator, error) {
    ...
    return &Generator{
        randSource: rand.NewSource(seed),
        ...
    }, nil
}
Don't use a map[float64]float64 to get the number set, because iteration order is not guaranteed from one iteration to the next (see go maps in action for details).
This can be a problem if some values have the same weight: the order of ValueWeights would randomly change from one run to another. Instead, use two []float64 slices, as in the sketch below.
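A quick, self-contained illustration of that point (the map literal here is arbitrary): two range loops over the same map may visit the keys in different orders.

package main

import "fmt"

func main() {
    m := map[float64]float64{1: 0.25, 2: 0.5, 7: 0.25}

    // Go deliberately randomizes map iteration order, so these two passes
    // may print the keys in different orders.
    for k := range m {
        fmt.Print(k, " ")
    }
    fmt.Println()
    for k := range m {
        fmt.Print(k, " ")
    }
    fmt.Println()
}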
A new version of the code could look like this:
package generator

import (
    "fmt"
    "math/rand"
    "sort"
)
type numberSet struct {
    values []float64
    bounds []float64
}

func newNumberSet(values []float64, weight []float64) (*numberSet, error) {
    if len(values) != len(weight) {
        return nil, fmt.Errorf("values and weight should have the same length")
    }
    s := &numberSet{
        values: values,
        bounds: weight,
    }
    sort.Sort(s)
    sum := float64(0)
    for i, weight := range s.bounds {
        sum += weight
        s.bounds[i] = sum
    }
    if sum-1 > 1e-9 || 1-sum > 1e-9 {
        return nil, fmt.Errorf("sum of weight should be 1, but was %f", sum)
    }
    return s, nil
}
func (s *numberSet) Len() int { return len(s.values) }

func (s *numberSet) Swap(i, j int) {
    s.values[i], s.values[j] = s.values[j], s.values[i]
    s.bounds[i], s.bounds[j] = s.bounds[j], s.bounds[i]
}

func (s *numberSet) Less(i, j int) bool { return s.bounds[i] < s.bounds[j] }
// Generator is a struct that returns a random number chosen from a set
// of numbers where each has a specified probability.
type Generator struct {
    randSource rand.Source
    size       int
    numberSet
}

// NewGenerator returns a Generator. It returns an error if len(weight) != len(values),
// or if the sum of weights is != 1.
// Two Generators with the same seed, values and weights will always produce the same
// sequence of random numbers.
func NewGenerator(seed int64, values []float64, weight []float64) (*Generator, error) {
    s, err := newNumberSet(values, weight)
    if err != nil {
        return nil, err
    }
    return &Generator{
        randSource: rand.NewSource(seed),
        size:       len(values),
        numberSet:  *s,
    }, nil
}
// Random returns a random number from the generator number set.
func (g *Generator) Random() float64 {
    r := float64(g.randSource.Int63()) / (1 << 63)
    i := sort.Search(g.size, func(i int) bool {
        return g.bounds[i] >= r
    })
    return g.values[i]
}
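For completeness, here is a small usage sketch of the generator above. The import path (example.com/generator) and the seed value are placeholders for illustration, not part of the original answer.

package main

import (
    "fmt"

    "example.com/generator" // assumed import path for the package above
)

func main() {
    values := []float64{1, 2, 7}
    weights := []float64{0.25, 0.5, 0.25}

    // The seed (42 here) is all that needs to be stored to replay the sequence.
    g, err := generator.NewGenerator(42, values, weights)
    if err != nil {
        panic(err)
    }
    for i := 0; i < 10; i++ {
        fmt.Print(g.Random(), " ")
    }
    fmt.Println()
}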
2. Tests / benchmark
The first thing to do is to make sure that the code works as expected. Tests should check that the value weights are correctly taken into account, and that two generators with the same seed and the same values always generate the same sequence of values.
After that, we can add a benchmark to see how our solution performs, and more importantly how it scales:
package generator

import (
    "fmt"
    "testing"
    "time"
)

const (
    nbGeneration = 100000
    setSize      = 10
)
func TestGeneratorDistribution(t *testing.T) {
    g := newGenerator(t, time.Now().Unix(), setSize)
    numbers := map[float64]float64{}
    for i := 0; i < nbGeneration; i++ {
        r := g.Random()
        if _, ok := numbers[r]; !ok {
            numbers[r] = 0
        }
        numbers[r]++
    }
    startBound := float64(0)
    for index, value := range g.values {
        got := numbers[value]
        want := (g.bounds[index] - startBound) * nbGeneration
        delta := want * 5 / 100
        if got > want+delta || got < want-delta {
            t.Errorf("distribution not correct for value %f, expected %f +/- 5%%, but got %f", value, want, got)
        }
        startBound = g.bounds[index]
    }
}
func TestGeneratorSeeding(t *testing.T) {
    seed := int64(1)
    firstRun := generate(t, seed, nbGeneration)
    secondRun := generate(t, seed, nbGeneration)
    for i := 0; i < nbGeneration; i++ {
        if firstRun[i] != secondRun[i] {
            t.Errorf("expected same sequence of number, but at pos %d, got %f and %f", i, firstRun[i], secondRun[i])
        }
    }
}

func generate(t *testing.T, seed int64, size int) []float64 {
    g := newGenerator(t, seed, setSize)
    runVal := make([]float64, 0, size)
    for i := 0; i < size; i++ {
        runVal = append(runVal, g.Random())
    }
    return runVal
}
func newGenerator(t testing.TB, seed int64, size int) *Generator {
    values := make([]float64, 0, size)
    weight := make([]float64, 0, size)
    p := float64(1) / float64(size)
    for i := 0; i < size; i++ {
        values = append(values, float64(i))
        weight = append(weight, p)
    }
    g, err := NewGenerator(seed, values, weight)
    if err != nil {
        t.Error(err)
    }
    return g
}
func BenchmarkScaling(b *testing.B) {
    for size := 2; size <= 1024; size = size * 2 {
        name := fmt.Sprintf("numberSet_size_%d", size)
        b.Run(name, func(b *testing.B) {
            g := newGenerator(b, 1, size)
            b.ResetTimer()
            for n := 0; n < b.N; n++ {
                g.Random()
            }
        })
    }
}
From now on, we can check that we don't break anything when modifying the code, and we can accurately measure any performance gain.
3. Results
We can see a clear performance improvement with the new code:
old code:
BenchmarkScaling/numberSet_size_2-2 20000000 112 ns/op 41 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_4-2 20000000 109 ns/op 41 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_8-2 10000000 119 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_16-2 10000000 130 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_32-2 10000000 140 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_64-2 10000000 157 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_128-2 10000000 174 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_256-2 10000000 185 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_512-2 10000000 194 ns/op 42 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_1024-2 10000000 207 ns/op 42 B/op 0 allocs/op
new code:
BenchmarkScaling/numberSet_size_2-2 50000000 33.8 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_4-2 30000000 41.4 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_8-2 30000000 51.7 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_16-2 20000000 64.4 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_32-2 20000000 76.7 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_64-2 20000000 91.3 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_128-2 20000000 106 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_256-2 10000000 116 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_512-2 10000000 126 ns/op 0 B/op 0 allocs/op
BenchmarkScaling/numberSet_size_1024-2 10000000 137 ns/op 0 B/op 0 allocs/op
There's still lots of room for improvement, but it should be a good start!
- Thank you so much for writing this all up! Quick question, why use fmt.Errorf rather than errors.New()? – Turtle, May 15, 2018 at 15:08
- +1 - I don't speak Go, but it's good to have a perspective on implementation details in the language as an answer. – Alex Reinking, May 15, 2018 at 16:16
- @Turtle It's really handy for error messages with dynamic values, and I personally find it easier to read, but that's just my opinion! – felix, May 16, 2018 at 9:41
- @felix you learn something new every day! Thank you so much for sharing! – Turtle, May 16, 2018 at 12:14
The general approach seems sensible here, of generating the CDF and selecting a uniform number to do the look up.
I also like the prevalence of comments.
The approach to allowing replays is probably what catches you out most. Storing the list is space and time inefficient. Instead, the conventional approach with a PRNG is to remember a fixed seed, and reseed with it if needed.
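A minimal sketch of that idea in Go (the seed value and the number of draws are illustrative): the seed is the only thing that has to be stored, and replay is simply reseeding and drawing again.

package main

import (
    "fmt"
    "math/rand"
)

func main() {
    const seed int64 = 42 // the seed is the only state stored for replay

    first := rand.New(rand.NewSource(seed))
    second := rand.New(rand.NewSource(seed))

    // Two generators seeded identically produce the identical sequence.
    for i := 0; i < 5; i++ {
        fmt.Println(first.Float64() == second.Float64()) // always true
    }
}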
Aside from that, tiny acronym variable names are generally discouraged, with preference for descriptive names.
if total > 100.00 {
That is fine, but I would suggest seeking consistency. You are not enforcing that the distribution sums all the way up to 100, so forbidding it from going over may be an unnecessary restriction. Note also that very slight floating point errors could trigger that failure case.
For a very slight micro-optimisation, consider pre-dividing the thresholds by the total rather than multiplying by the total for each random number generated. A sketch of that change follows.
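Here is a small, self-contained sketch of that idea; the type and function names are illustrative rather than a drop-in patch to the original struct. Normalization happens once at setup, so the hot path only calls rand.Float64() and does a binary search.

package main

import (
    "fmt"
    "math/rand"
    "sort"
)

// normalized holds cumulative weights pre-divided by the total, so sampling
// needs no multiplication by the total on every draw.
type normalized struct {
    values []float64
    bounds []float64 // cumulative weights, already scaled into [0, 1]
}

func newNormalized(values, weights []float64) normalized {
    total := 0.0
    for _, w := range weights {
        total += w
    }
    bounds := make([]float64, len(weights))
    sum := 0.0
    for i, w := range weights {
        sum += w
        bounds[i] = sum / total // divide once here, not on every draw
    }
    return normalized{values: values, bounds: bounds}
}

func (n normalized) pick(rng *rand.Rand) float64 {
    x := rng.Float64() // already in [0, 1), directly comparable to bounds
    i := sort.Search(len(n.bounds), func(i int) bool { return n.bounds[i] >= x })
    return n.values[i]
}

func main() {
    n := newNormalized([]float64{1, 2, 7}, []float64{0.25, 0.5, 0.25})
    rng := rand.New(rand.NewSource(1))
    fmt.Println(n.pick(rng), n.pick(rng), n.pick(rng))
}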
- I love your idea to avoid storing the replays! Thank you so much for the feedback! – Turtle, May 14, 2018 at 21:44
- Storing the PRNG seed is important, yes, but generating a CDF and doing binary search is extremely slow when billions of queries will be made to the generator. An O(1) approach should be used instead. – Alex Reinking, May 14, 2018 at 21:58