Simple bitset in Go

Question 1

I was playing with an implemention of sieve of Eratosthenes in Go and realised a memory-efficient solution would benefit from a memory-efficient data structure for storing boolean values at the given location - something like bitset in C++.

I created this simple implementation which is backed by map[int]byte. Initial benchmarks show that it's significantly faster than using map[int]bool for the same purposes. Any feedback is welcome.

package bitset
import "errors"
// BitSet represents a set of booleans.
type BitSet struct {
 buckets map[int]byte
}
// New creates a new BitSet.
func New() *BitSet {
 return &BitSet{make(map[int]byte)}
}
// Set sets the bit at the given location to the given value. For example,
// bitset.Set(18, true) sets the bit at the position 18 to 1.
func (b *BitSet) Set(pos int, value bool) {
 if pos < 0 {
 panic(errors.New("index out of range"))
 }
 index := pos / 8
 bit := uint(pos % 8)
 current, ok := b.buckets[index]
 if value {
 current = current | (1 << bit)
 } else {
 current = current &^ (1 << bit)
 }
 if current != 0 {
 b.buckets[index] = current
 } else if ok {
 delete(b.buckets, index)
 }
}
// Get reads the bit value at the given position.
func (b *BitSet) Get(pos int) bool {
 if pos < 0 {
 panic(errors.New("index out of range"))
 }
 index := pos / 8
 bit := uint(pos % 8)
 if ((b.buckets[index] >> bit) & 1) == 1 {
 return true
 }
 return false
}
// ToSlice returns a slice containing all the bit positions that are set to 1.
func (b *BitSet) ToSlice() (slice []int) {
 for index, value := range b.buckets {
 bit := 0
 for value > 0 {
 if (value & 1) == 1 {
 slice = append(slice, index*8+bit)
 }
 bit++
 value = value >> 1
 }
 }
 return slice
}

Benchmark results:

BenchmarkBitSet-4 100 17703207 ns/op 833918 B/op 492 allocs/op
BenchmarkMap-4 50 25846182 ns/op 3701760 B/op 3985 allocs/op

When both map and bitset are given the correct initial capacity:

BenchmarkBitSet-4 100 18789690 ns/op 605561 B/op 183 allocs/op
BenchmarkMap-4 100 23047036 ns/op 1856182 B/op 1676 allocs/op

Benchmark code (I'm aware there are refinements that can be applied, but I just went with a simple implementation, since the point was to test the bitset):

package bitset
import (
 "testing"
)
func generateSieveBitSet(max int) []int {
 bitset := New()
 for i := 2; i <= max; i++ {
 bitset.Set(i, true)
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !bitset.Get(candidate) {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 bitset.Set(notPrime, false)
 }
 }
 return bitset.ToSlice()
}
func generateSieveMap(max int) []int {
 bitset := make(map[int]bool)
 for i := 2; i <= max; i++ {
 bitset[i] = true
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !bitset[candidate] {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 delete(bitset, notPrime)
 }
 }
 primes := make([]int, 0, len(bitset))
 for prime := range bitset {
 primes = append(primes, prime)
 }
 return primes
}
func BenchmarkBitSet(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveBitSet(100000)
 }
}
func BenchmarkMap(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveMap(100000)
 }
}

Question 2

Could you provide your implementation of Sieve of Eratosthenes using this code ?

Question 3

You are making unsubstantiated performance claims. Where are your benchmarks?

Question 4

Good point, added benchmark code that uses a sieve and results.

Question 5

Your implementation is really nice, however you could get way better performance with an array of bool!

generateSieve with a []bool:

func generateSieveArray(max int) []int {
 array := make([]bool, max+1)
 for i := 2; i <= max; i++ {
 array[i] = true
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !array[candidate] {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 array[notPrime] = false
 }
 }
 result := make([]int, 0)
 for i, v := range array {
 if v {
 result = append(result, i)
 }
 }
 return result
}
func BenchmarkArray(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveArray(100000)
 }
}

and here is the result:

BenchmarkBitSet-2 50 33380755 ns/op 833430 B/op 487 allocs/op
BenchmarkMap-2 50 34550383 ns/op 3699287 B/op 3983 allocs/op
BenchmarkArray-2 2000 915800 ns/op 492792 B/op 21 allocs/op

Almost two order of magnitude faster ! And the code is way easier to understand

Also added a small test to make sure that all generateSieve implementations have the same output:

func TestSieveResult(t *testing.T) {
 nb := 100000
 result := generateSieveBitSet(nb)
 mapResult := generateSieveMap(nb)
 arrayResult := generateSieveArray(nb)
 l := [][]int{result, mapResult, arrayResult}
 for _, s := range l {
 sort.Slice(s, func(i, j int) bool {
 return s[i] < s[j]
 })
 }
 for i, v := range result {
 if v != mapResult[i] || v != arrayResult[i] {
 t.Fail()
 }
 }
}

Question 6

Interesting - it seems that a bool slice trumps over even a modified bitset implementation that uses a byte slice. That said, I was looking more for tips on how to improve my code rather than how to write an efficient sieve - if you have any tips, it'd be appreciated.

felix felix 6085 silver badges16 bronze badges · Answer 1 · 2018-01-30 09:28:14Z

Your implementation is really nice, however you could get way better performance with an array of bool!

generateSieve with a []bool:

func generateSieveArray(max int) []int {
 array := make([]bool, max+1)
 for i := 2; i <= max; i++ {
 array[i] = true
 }
 for candidate := 2; candidate <= max; candidate++ {
 if !array[candidate] {
 continue
 }
 for notPrime := candidate * candidate; notPrime <= max; notPrime += candidate {
 array[notPrime] = false
 }
 }
 result := make([]int, 0)
 for i, v := range array {
 if v {
 result = append(result, i)
 }
 }
 return result
}
func BenchmarkArray(b *testing.B) {
 for n := 0; n < b.N; n++ {
 generateSieveArray(100000)
 }
}

and here is the result:

BenchmarkBitSet-2 50 33380755 ns/op 833430 B/op 487 allocs/op
BenchmarkMap-2 50 34550383 ns/op 3699287 B/op 3983 allocs/op
BenchmarkArray-2 2000 915800 ns/op 492792 B/op 21 allocs/op

Almost two order of magnitude faster ! And the code is way easier to understand

Also added a small test to make sure that all generateSieve implementations have the same output:

func TestSieveResult(t *testing.T) {
 nb := 100000
 result := generateSieveBitSet(nb)
 mapResult := generateSieveMap(nb)
 arrayResult := generateSieveArray(nb)
 l := [][]int{result, mapResult, arrayResult}
 for _, s := range l {
 sort.Slice(s, func(i, j int) bool {
 return s[i] < s[j]
 })
 }
 for i, v := range result {
 if v != mapResult[i] || v != arrayResult[i] {
 t.Fail()
 }
 }
}

Interesting - it seems that a bool slice trumps over even a modified bitset implementation that uses a byte slice. That said, I was looking more for tips on how to improve my code rather than how to write an efficient sieve - if you have any tips, it'd be appreciated.

Stack Exchange Network

Simple bitset in Go

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple bitset in Go

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions