7
\$\begingroup\$

I am practicing Go and try to implement Boyer Moore Horspool Algorithm in Go, It returns all ocurrance of a "phrase" in a given text. This code is working, but I would be pleased to have any feedback and suggestions from Go Gurus, about all aspect of my code.

package boyer_moore_horspool_search
type badMatchTable struct{
 table [256]int
 pattern string
}
func newBadMatchTable(pattern string) *badMatchTable{
 b := badMatchTable{
 pattern: pattern,
 }
 b.table = [256]int{}
 b.table = b.generateTable()
 return &b
}
func (b *badMatchTable) generateTable() [256]int{
 for i := 0; i < 256 ; i++ {
 b.table[i] = len(b.pattern)
 }
 lastPatternByte := len(b.pattern) - 1
 for i := 0; i < lastPatternByte ; i++ {
 b.table[int(b.pattern[i])] = lastPatternByte - i
 }
 return b.table
}
func Search(text string, pattern string) []int{
 var indexes []int
 byteText := []byte(text)
 bytePattern := []byte(pattern)
 textLength := len(byteText)
 patternLength := len(bytePattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength{
 return indexes
 }
 lastPatternByte := patternLength - 1
 mt := newBadMatchTable(pattern)
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte ; byteText[index + i] == bytePattern[i]; i-- {
 if i == 0{
 indexes = append(indexes, index)
 break
 }
 }
 index += mt.table[byteText[index + lastPatternByte]]
 }
 return indexes
}

And this is the test file:

import (
 "fmt"
 "testing"
)
func TestSearch(t *testing.T) {
 indexes := Search("Hello, we want to find word Guru, so this #phrase has the word Guru.", "word Guru")
 fmt.Printf("Occurance: %v\n", len(indexes))
 for _, i := range indexes {
 fmt.Printf("Index: %v\n", i)
 }
}
Edward
67.2k4 gold badges120 silver badges284 bronze badges
asked Oct 19, 2019 at 14:22
\$\endgroup\$

1 Answer 1

3
\$\begingroup\$

I added

func BenchmarkSearch(b *testing.B) {
 c := "Hello, we want to find word Guru, so this #phrase has the word Guru."
 s := "word Guru"
 for i := 0; i < b.N; i++ {
 Search(c, s)
 }
}

Then i ran

$ go test -v -bench=. -benchmem -memprofile=mem.out

It yielded

goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 780 ns/op 104 B/op 3 allocs/op
PASS
ok test/boyermoor 2.360s

Now i open pprof to get detailed stats

$ go tool pprof mem.out
(pprof) list Sear
Total: 268.52MB
ROUTINE ======================== test/boyermoor.BenchmarkSearch in /home/mh-cbon/gow/src/test/boyermoor/main_test.go
 0 268.52MB (flat, cum) 100% of Total
 . . 18:
 . . 19:func BenchmarkSearch(b *testing.B) {
 . . 20: c := "Hello, we want to find word Guru, so this #phrase has the word Guru."
 . . 21: s := "word Guru"
 . . 22: for i := 0; i < b.N; i++ {
 . 268.52MB 23: Search(c, s)
 . . 24: }
 . . 25:}
ROUTINE ======================== test/boyermoor.Search in /home/mh-cbon/gow/src/test/boyermoor/main.go
 268.52MB 268.52MB (flat, cum) 100% of Total
 . . 36:}
 . . 37:
 . . 38:func Search(text string, pattern string) []int {
 . . 39:
 . . 40: var indexes []int
 208.52MB 208.52MB 41: byteText := []byte(text)
 . . 42: bytePattern := []byte(pattern)
 . . 43:
 . . 44: textLength := len(byteText)
 . . 45: patternLength := len(bytePattern)
 . . 46:
 . . 47: if textLength == 0 || patternLength == 0 || patternLength > textLength {
 . . 48: return indexes
 . . 49: }
 . . 50:
 . . 51: lastPatternByte := patternLength - 1
 . . 52:
 . . 53: mt := newBadMatchTable(pattern)
 . . 54: index := 0
 . . 55: for index <= (textLength - patternLength) {
 . . 56: for i := lastPatternByte; byteText[index+i] == bytePattern[i]; i-- {
 . . 57: if i == 0 {
 60MB 60MB 58: indexes = append(indexes, index)
 . . 59: break
 . . 60: }
 . . 61: }
 . . 62:
 . . 63: index += mt.table[byteText[index+lastPatternByte]]

You have allocations that are totally superfluous, given current test case, and they could be avoided by changing the input parameters types.

Lets give it a try

The code is updated to

package main
func main() {
}
type badMatchTable struct {
 table [256]int
 pattern []byte
}
func newBadMatchTable(pattern []byte) *badMatchTable {
 b := badMatchTable{
 pattern: pattern,
 }
 b.table = [256]int{}
 b.table = b.generateTable()
 return &b
}
func (b *badMatchTable) generateTable() [256]int {
 for i := 0; i < 256; i++ {
 b.table[i] = len(b.pattern)
 }
 lastPatternByte := len(b.pattern) - 1
 for i := 0; i < lastPatternByte; i++ {
 b.table[int(b.pattern[i])] = lastPatternByte - i
 }
 return b.table
}
func Search(text, pattern []byte) []int {
 var indexes []int
 textLength := len(text)
 patternLength := len(pattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength {
 return indexes
 }
 lastPatternByte := patternLength - 1
 mt := newBadMatchTable(pattern)
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
 if i == 0 {
 indexes = append(indexes, index)
 break
 }
 }
 index += mt.table[text[index+lastPatternByte]]
 }
 return indexes
}

I behncmarked again

$ go test -v -bench=. -benchmem -memprofile=mem.out
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 669 ns/op 24 B/op 2 allocs/op
PASS
ok test/boyermoor 2.023s

Those are small improvements.

I change the code again to remove the int slice allocation

package boyermoor
type badMatchTable struct {
 table [256]int
 pattern []byte
}
func newBadMatchTable(pattern []byte) *badMatchTable {
 b := badMatchTable{
 pattern: pattern,
 }
 b.table = [256]int{}
 b.table = b.generateTable()
 return &b
}
func (b *badMatchTable) generateTable() [256]int {
 for i := 0; i < 256; i++ {
 b.table[i] = len(b.pattern)
 }
 lastPatternByte := len(b.pattern) - 1
 for i := 0; i < lastPatternByte; i++ {
 b.table[int(b.pattern[i])] = lastPatternByte - i
 }
 return b.table
}
func Search(text, pattern []byte, indexes []int) []int {
 // var indexes []int
 indexes = indexes[:0]
 textLength := len(text)
 patternLength := len(pattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength {
 return indexes
 }
 lastPatternByte := patternLength - 1
 mt := newBadMatchTable(pattern)
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
 if i == 0 {
 indexes = append(indexes, index)
 break
 }
 }
 index += mt.table[text[index+lastPatternByte]]
 }
 return indexes
}

The benchmark is modified to

func BenchmarkSearch(b *testing.B) {
 indexes := make([]int, 0, 100)
 c := []byte("Hello, we want to find word Guru, so this #phrase has the word Guru.")
 s := []byte("word Guru")
 for i := 0; i < b.N; i++ {
 indexes = Search(c, s, indexes)
 }
}

Now i got

goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 3000000 597 ns/op 0 B/op 0 allocs/op
PASS
ok test/boyermoor 2.402s

That is more substantial.

In the details, you rewrite the generateTable this way to save a few more ops

func (b *badMatchTable) generateTable() [256]int {
 k := len(b.pattern)
 for i := 0; i < 256; i++ {
 b.table[i] = k
 }
 lastPatternByte := k - 1
 for i := 0; i < lastPatternByte; i++ {
 b.table[b.pattern[i]] = lastPatternByte - i
 }
 return b.table
}

At that point, consider that your algorithm is 1/ cpu bound 2/ a bunch of tight loops. So to get much out of the runtime, squeeze as many instructions as possible.

Given the last state of the code, let just get ride of the badTable struct. So the whole thing is contained into one function call.

func Search(text, pattern []byte, indexes []int) []int {
 // var indexes []int
 indexes = indexes[:0]
 textLength := len(text)
 patternLength := len(pattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength {
 return indexes
 }
 lastPatternByte := patternLength - 1
 table := [256]int{}
 {
 k := len(pattern)
 for i := 0; i < 256; i++ {
 table[i] = k
 }
 lastPatternByte := k - 1
 for i := 0; i < lastPatternByte; i++ {
 table[pattern[i]] = lastPatternByte - i
 }
 }
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
 if i == 0 {
 indexes = append(indexes, index)
 break
 }
 }
 index += table[text[index+lastPatternByte]]
 }
 return indexes
}

And now it gives us

goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 5000000 339 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 340 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 341 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 338 ns/op 0 B/op 0 allocs/op
PASS

Now we have a good 2 times improvement.

But this is not all about performance, it is also about usability for other devs. So the API can be presented this way, which produces a decent trade between the two.

type Horspool struct {
 table [256]int
 indexes []int
}
func (t *Horspool) Search(text, pattern []byte) []int {
 table := t.table
 indexes := t.indexes
 if cap(indexes) < 1 {
 indexes = make([]int, 0, 100)
 }
 indexes = indexes[:0]
 textLength := len(text)
 patternLength := len(pattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength {
 t.indexes = indexes
 return indexes
 }
 lastPatternByte := patternLength - 1
 {
 for i := 0; i < 256; i++ {
 table[i] = patternLength
 }
 lastPatternByte := patternLength - 1
 for i := 0; i < lastPatternByte; i++ {
 table[pattern[i]] = lastPatternByte - i
 }
 }
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
 if i == 0 {
 indexes = append(indexes, index)
 break
 }
 }
 index += table[text[index+lastPatternByte]]
 }
 t.indexes = indexes
 return indexes
}

It gives me a slightly slower program

BenchmarkSearch-4 5000000 370 ns/op 0 B/op 0 allocs/op

I have not tried to understand as to why it is slower because here is a more interesting case involving struct alignment and padding. See this code,

type Horspool struct {
 indexes []int
 table [256]int
}

And check the results

BenchmarkSearch-4 3000000 480 ns/op 0 B/op 0 allocs/op

read more: https://dave.cheney.net/2015/10/09/padding-is-hard

FTR, consider this is an old algorithm and that it is not utf-8 valid. Go being utf-8 first, this should be improved.

If my understanding of the algorithm is correct, the fix is rather simple

type Horspool struct {
 table map[rune]int
 indexes []int
}
func (t *Horspool) Search(text, pattern []rune) []int {
 table := t.table
 if table == nil {
 table = map[rune]int{}
 } else {
 for r := range table {
 delete(table, r)
 }
 }
 indexes := t.indexes
 if cap(indexes) < 1 {
 indexes = make([]int, 0, 100)
 }
 indexes = indexes[:0]
 textLength := len(text)
 patternLength := len(pattern)
 if textLength == 0 || patternLength == 0 || patternLength > textLength {
 return indexes
 }
 lastPatternByte := patternLength - 1
 {
 for _, r := range pattern {
 table[r] = patternLength
 }
 for i := 0; i < lastPatternByte; i++ {
 table[pattern[i]] = patternLength - 1 - i
 }
 }
 index := 0
 for index <= (textLength - patternLength) {
 for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
 if i == 0 {
 indexes = append(indexes, index)
 break
 }
 }
 x, ok := table[text[index+lastPatternByte]]
 if ok {
 index += x
 } else {
 index += lastPatternByte
 }
 }
 t.table = table
 t.indexes = indexes
 return indexes
}

But, take care of maps. https://stackoverflow.com/questions/58475257/map-delete-doesnt-actually-delete-entries

Use a simple counter to free-it-by-allocation regularly.

Finally, as somewhat expected the code is slower by a factor 2

goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 756 ns/op 0 B/op 0 allocs/op
PASS
ok test/boyermoor 2.282s

Two notes:

  • There might be more interesting/subtle optimizations that i m not aware of myself, i did the obvious.
answered Oct 20, 2019 at 14:40
\$\endgroup\$
3
  • \$\begingroup\$ Thank you, how do you call your Search func, so it has 0 alloc? I changed my Search fun to your version, but it still has allocs/op. (talking about the version you mentioned you got 0 alloc) \$\endgroup\$ Commented Oct 20, 2019 at 16:26
  • 1
    \$\begingroup\$ I preallocated it. If you don't preallocate, go runtime will allocate when you append. see stackoverflow.com/a/38654841/4466350 (i updated my answer). But anyways, what matter most is that you understand the overall procedure to optimize your go code. \$\endgroup\$ Commented Oct 20, 2019 at 16:32
  • 1
    \$\begingroup\$ you can also check for online resources like github.com/dgryski/go-perfbook to get more ideas \$\endgroup\$ Commented Oct 20, 2019 at 16:36

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.