1
\$\begingroup\$

I was playing with an implemention of sieve of Eratosthenes in Go and realised a memory-efficient solution would benefit from a memory-efficient data structure for storing boolean values at the given location - something like bitset in C++.

I created this simple implementation which is backed by map[int]byte. Initial benchmarks show that it's significantly faster than using map[int]bool for the same purposes. Any feedback is welcome.

package bitset
import "errors"
// BitSet represents a set of booleans.
type BitSet struct {
 buckets map[int]byte
}
// New creates a new BitSet.
func New() *BitSet {
 return &BitSet{make(map[int]byte)}
}
// Set sets the bit at the given location to the given value. For example,
// bitset.Set(18, true) sets the bit at the position 18 to 1.
func (b *BitSet) Set(pos int, value bool) {
 if pos < 0 {
 panic(errors.New("index out of range"))
 }
 index := pos / 8
 bit := uint(pos % 8)
 current, ok := b.buckets[index]
 if value {
 current = current | (1 << bit)
 } else {
 current = current &^ (1 << bit)
 }
 if current != 0 {
 b.buckets[index] = current
 } else if ok {
 delete(b.buckets, index)
 }
}
// Get reads the bit value at the given position.
func (b *BitSet) Get(pos int) bool {
 if pos < 0 {
 panic(errors.New("index out of range"))
 }
 index := pos / 8
 bit := uint(pos % 8)
 if ((b.buckets[index] >> bit) & 1) == 1 {
 return true
 }
 return false
}
// ToSlice returns a slice containing all the bit positions that are set to 1.
func (b *BitSet) ToSlice() (slice []int) {
 for index, value := range b.buckets {
 bit := 0
 for value > 0 {
 if (value & 1) == 1 {
 slice = append(slice, index*8+bit)
 }
 bit++
 value = value >> 1
 }
 }
 return slice
}

Benchmark results:

BenchmarkBitSet-4 100 17703207 ns/op 833918 B/op 492 allocs/op
BenchmarkMap-4 50 25846182 ns/op 3701760 B/op 3985 allocs/op

When both map and bitset are given the correct initial capacity:

BenchmarkBitSet-4 100 18789690 ns/op 605561 B/op 183 allocs/op
BenchmarkMap-4 100 23047036 ns/op 1856182 B/op 1676 allocs/op

Benchmark code (I'm aware there are refinements that can be applied, but I just went with a simple implementation, since the point was to test the bitset):

package bitset
import (
 "testing"
)
func generateSieveBitSet(max int) []int {
 bitset := New()
 for i := 2; i <= max; i++ {
 bitset.Set(i, true)
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !bitset.Get(candidate) {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 bitset.Set(notPrime, false)
 }
 }
 return bitset.ToSlice()
}
func generateSieveMap(max int) []int {
 bitset := make(map[int]bool)
 for i := 2; i <= max; i++ {
 bitset[i] = true
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !bitset[candidate] {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 delete(bitset, notPrime)
 }
 }
 primes := make([]int, 0, len(bitset))
 for prime := range bitset {
 primes = append(primes, prime)
 }
 return primes
}
func BenchmarkBitSet(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveBitSet(100000)
 }
}
func BenchmarkMap(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveMap(100000)
 }
}
asked Jan 21, 2018 at 21:33
\$\endgroup\$
3
  • \$\begingroup\$ Could you provide your implementation of Sieve of Eratosthenes using this code ? \$\endgroup\$ Commented Jan 22, 2018 at 14:09
  • \$\begingroup\$ You are making unsubstantiated performance claims. Where are your benchmarks? \$\endgroup\$ Commented Jan 22, 2018 at 18:06
  • \$\begingroup\$ Good point, added benchmark code that uses a sieve and results. \$\endgroup\$ Commented Jan 27, 2018 at 22:10

1 Answer 1

1
\$\begingroup\$

Your implementation is really nice, however you could get way better performance with an array of bool!

generateSieve with a []bool:

func generateSieveArray(max int) []int {
 array := make([]bool, max+1)
 for i := 2; i <= max; i++ {
 array[i] = true
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !array[candidate] {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 array[notPrime] = false
 }
 }
 result := make([]int, 0)
 for i, v := range array {
 if v {
 result = append(result, i)
 }
 }
 return result
}
func BenchmarkArray(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveArray(100000)
 }
}

and here is the result:

BenchmarkBitSet-2 50 33380755 ns/op 833430 B/op 487 allocs/op
BenchmarkMap-2 50 34550383 ns/op 3699287 B/op 3983 allocs/op
BenchmarkArray-2 2000 915800 ns/op 492792 B/op 21 allocs/op

Almost two order of magnitude faster ! And the code is way easier to understand

Also added a small test to make sure that all generateSieve implementations have the same output:

func TestSieveResult(t *testing.T) {
 nb := 100000
 result := generateSieveBitSet(nb)
 mapResult := generateSieveMap(nb)
 arrayResult := generateSieveArray(nb)
 l := [][]int{result, mapResult, arrayResult}
 for _, s := range l {
 sort.Slice(s, func(i, j int) bool {
 return s[i] < s[j]
 })
 }
 for i, v := range result {
 if v != mapResult[i] || v != arrayResult[i] {
 t.Fail()
 }
 }
}
answered Jan 30, 2018 at 9:28
\$\endgroup\$
1
  • \$\begingroup\$ Interesting - it seems that a bool slice trumps over even a modified bitset implementation that uses a byte slice. That said, I was looking more for tips on how to improve my code rather than how to write an efficient sieve - if you have any tips, it'd be appreciated. \$\endgroup\$ Commented Feb 1, 2018 at 21:44

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.