I am practicing Go and try to implement Boyer Moore Horspool Algorithm in Go, It returns all ocurrance of a "phrase" in a given text. This code is working, but I would be pleased to have any feedback and suggestions from Go Gurus, about all aspect of my code.
package boyer_moore_horspool_search
type badMatchTable struct{
table [256]int
pattern string
}
func newBadMatchTable(pattern string) *badMatchTable{
b := badMatchTable{
pattern: pattern,
}
b.table = [256]int{}
b.table = b.generateTable()
return &b
}
func (b *badMatchTable) generateTable() [256]int{
for i := 0; i < 256 ; i++ {
b.table[i] = len(b.pattern)
}
lastPatternByte := len(b.pattern) - 1
for i := 0; i < lastPatternByte ; i++ {
b.table[int(b.pattern[i])] = lastPatternByte - i
}
return b.table
}
func Search(text string, pattern string) []int{
var indexes []int
byteText := []byte(text)
bytePattern := []byte(pattern)
textLength := len(byteText)
patternLength := len(bytePattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength{
return indexes
}
lastPatternByte := patternLength - 1
mt := newBadMatchTable(pattern)
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte ; byteText[index + i] == bytePattern[i]; i-- {
if i == 0{
indexes = append(indexes, index)
break
}
}
index += mt.table[byteText[index + lastPatternByte]]
}
return indexes
}
And this is the test file:
import (
"fmt"
"testing"
)
func TestSearch(t *testing.T) {
indexes := Search("Hello, we want to find word Guru, so this #phrase has the word Guru.", "word Guru")
fmt.Printf("Occurance: %v\n", len(indexes))
for _, i := range indexes {
fmt.Printf("Index: %v\n", i)
}
}
1 Answer 1
I added
func BenchmarkSearch(b *testing.B) {
c := "Hello, we want to find word Guru, so this #phrase has the word Guru."
s := "word Guru"
for i := 0; i < b.N; i++ {
Search(c, s)
}
}
Then i ran
$ go test -v -bench=. -benchmem -memprofile=mem.out
It yielded
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 780 ns/op 104 B/op 3 allocs/op
PASS
ok test/boyermoor 2.360s
Now i open pprof to get detailed stats
$ go tool pprof mem.out
(pprof) list Sear
Total: 268.52MB
ROUTINE ======================== test/boyermoor.BenchmarkSearch in /home/mh-cbon/gow/src/test/boyermoor/main_test.go
0 268.52MB (flat, cum) 100% of Total
. . 18:
. . 19:func BenchmarkSearch(b *testing.B) {
. . 20: c := "Hello, we want to find word Guru, so this #phrase has the word Guru."
. . 21: s := "word Guru"
. . 22: for i := 0; i < b.N; i++ {
. 268.52MB 23: Search(c, s)
. . 24: }
. . 25:}
ROUTINE ======================== test/boyermoor.Search in /home/mh-cbon/gow/src/test/boyermoor/main.go
268.52MB 268.52MB (flat, cum) 100% of Total
. . 36:}
. . 37:
. . 38:func Search(text string, pattern string) []int {
. . 39:
. . 40: var indexes []int
208.52MB 208.52MB 41: byteText := []byte(text)
. . 42: bytePattern := []byte(pattern)
. . 43:
. . 44: textLength := len(byteText)
. . 45: patternLength := len(bytePattern)
. . 46:
. . 47: if textLength == 0 || patternLength == 0 || patternLength > textLength {
. . 48: return indexes
. . 49: }
. . 50:
. . 51: lastPatternByte := patternLength - 1
. . 52:
. . 53: mt := newBadMatchTable(pattern)
. . 54: index := 0
. . 55: for index <= (textLength - patternLength) {
. . 56: for i := lastPatternByte; byteText[index+i] == bytePattern[i]; i-- {
. . 57: if i == 0 {
60MB 60MB 58: indexes = append(indexes, index)
. . 59: break
. . 60: }
. . 61: }
. . 62:
. . 63: index += mt.table[byteText[index+lastPatternByte]]
You have allocations that are totally superfluous, given current test case, and they could be avoided by changing the input parameters types.
Lets give it a try
The code is updated to
package main
func main() {
}
type badMatchTable struct {
table [256]int
pattern []byte
}
func newBadMatchTable(pattern []byte) *badMatchTable {
b := badMatchTable{
pattern: pattern,
}
b.table = [256]int{}
b.table = b.generateTable()
return &b
}
func (b *badMatchTable) generateTable() [256]int {
for i := 0; i < 256; i++ {
b.table[i] = len(b.pattern)
}
lastPatternByte := len(b.pattern) - 1
for i := 0; i < lastPatternByte; i++ {
b.table[int(b.pattern[i])] = lastPatternByte - i
}
return b.table
}
func Search(text, pattern []byte) []int {
var indexes []int
textLength := len(text)
patternLength := len(pattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength {
return indexes
}
lastPatternByte := patternLength - 1
mt := newBadMatchTable(pattern)
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
if i == 0 {
indexes = append(indexes, index)
break
}
}
index += mt.table[text[index+lastPatternByte]]
}
return indexes
}
I behncmarked again
$ go test -v -bench=. -benchmem -memprofile=mem.out
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 669 ns/op 24 B/op 2 allocs/op
PASS
ok test/boyermoor 2.023s
Those are small improvements.
I change the code again to remove the int slice allocation
package boyermoor
type badMatchTable struct {
table [256]int
pattern []byte
}
func newBadMatchTable(pattern []byte) *badMatchTable {
b := badMatchTable{
pattern: pattern,
}
b.table = [256]int{}
b.table = b.generateTable()
return &b
}
func (b *badMatchTable) generateTable() [256]int {
for i := 0; i < 256; i++ {
b.table[i] = len(b.pattern)
}
lastPatternByte := len(b.pattern) - 1
for i := 0; i < lastPatternByte; i++ {
b.table[int(b.pattern[i])] = lastPatternByte - i
}
return b.table
}
func Search(text, pattern []byte, indexes []int) []int {
// var indexes []int
indexes = indexes[:0]
textLength := len(text)
patternLength := len(pattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength {
return indexes
}
lastPatternByte := patternLength - 1
mt := newBadMatchTable(pattern)
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
if i == 0 {
indexes = append(indexes, index)
break
}
}
index += mt.table[text[index+lastPatternByte]]
}
return indexes
}
The benchmark is modified to
func BenchmarkSearch(b *testing.B) {
indexes := make([]int, 0, 100)
c := []byte("Hello, we want to find word Guru, so this #phrase has the word Guru.")
s := []byte("word Guru")
for i := 0; i < b.N; i++ {
indexes = Search(c, s, indexes)
}
}
Now i got
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 3000000 597 ns/op 0 B/op 0 allocs/op
PASS
ok test/boyermoor 2.402s
That is more substantial.
In the details, you rewrite the generateTable this way to save a few more ops
func (b *badMatchTable) generateTable() [256]int {
k := len(b.pattern)
for i := 0; i < 256; i++ {
b.table[i] = k
}
lastPatternByte := k - 1
for i := 0; i < lastPatternByte; i++ {
b.table[b.pattern[i]] = lastPatternByte - i
}
return b.table
}
At that point, consider that your algorithm is 1/ cpu bound 2/ a bunch of tight loops. So to get much out of the runtime, squeeze as many instructions as possible.
Given the last state of the code, let just get ride of the badTable
struct. So the whole thing is contained into one function call.
func Search(text, pattern []byte, indexes []int) []int {
// var indexes []int
indexes = indexes[:0]
textLength := len(text)
patternLength := len(pattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength {
return indexes
}
lastPatternByte := patternLength - 1
table := [256]int{}
{
k := len(pattern)
for i := 0; i < 256; i++ {
table[i] = k
}
lastPatternByte := k - 1
for i := 0; i < lastPatternByte; i++ {
table[pattern[i]] = lastPatternByte - i
}
}
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
if i == 0 {
indexes = append(indexes, index)
break
}
}
index += table[text[index+lastPatternByte]]
}
return indexes
}
And now it gives us
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 5000000 339 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 340 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 341 ns/op 0 B/op 0 allocs/op
BenchmarkSearch-4 5000000 338 ns/op 0 B/op 0 allocs/op
PASS
Now we have a good 2 times improvement.
But this is not all about performance, it is also about usability for other devs. So the API can be presented this way, which produces a decent trade between the two.
type Horspool struct {
table [256]int
indexes []int
}
func (t *Horspool) Search(text, pattern []byte) []int {
table := t.table
indexes := t.indexes
if cap(indexes) < 1 {
indexes = make([]int, 0, 100)
}
indexes = indexes[:0]
textLength := len(text)
patternLength := len(pattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength {
t.indexes = indexes
return indexes
}
lastPatternByte := patternLength - 1
{
for i := 0; i < 256; i++ {
table[i] = patternLength
}
lastPatternByte := patternLength - 1
for i := 0; i < lastPatternByte; i++ {
table[pattern[i]] = lastPatternByte - i
}
}
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
if i == 0 {
indexes = append(indexes, index)
break
}
}
index += table[text[index+lastPatternByte]]
}
t.indexes = indexes
return indexes
}
It gives me a slightly slower program
BenchmarkSearch-4 5000000 370 ns/op 0 B/op 0 allocs/op
I have not tried to understand as to why it is slower because here is a more interesting case involving struct alignment and padding. See this code,
type Horspool struct {
indexes []int
table [256]int
}
And check the results
BenchmarkSearch-4 3000000 480 ns/op 0 B/op 0 allocs/op
read more: https://dave.cheney.net/2015/10/09/padding-is-hard
FTR, consider this is an old algorithm and that it is not utf-8 valid. Go being utf-8 first, this should be improved.
If my understanding of the algorithm is correct, the fix is rather simple
type Horspool struct {
table map[rune]int
indexes []int
}
func (t *Horspool) Search(text, pattern []rune) []int {
table := t.table
if table == nil {
table = map[rune]int{}
} else {
for r := range table {
delete(table, r)
}
}
indexes := t.indexes
if cap(indexes) < 1 {
indexes = make([]int, 0, 100)
}
indexes = indexes[:0]
textLength := len(text)
patternLength := len(pattern)
if textLength == 0 || patternLength == 0 || patternLength > textLength {
return indexes
}
lastPatternByte := patternLength - 1
{
for _, r := range pattern {
table[r] = patternLength
}
for i := 0; i < lastPatternByte; i++ {
table[pattern[i]] = patternLength - 1 - i
}
}
index := 0
for index <= (textLength - patternLength) {
for i := lastPatternByte; text[index+i] == pattern[i]; i-- {
if i == 0 {
indexes = append(indexes, index)
break
}
}
x, ok := table[text[index+lastPatternByte]]
if ok {
index += x
} else {
index += lastPatternByte
}
}
t.table = table
t.indexes = indexes
return indexes
}
But, take care of maps. https://stackoverflow.com/questions/58475257/map-delete-doesnt-actually-delete-entries
Use a simple counter to free-it-by-allocation regularly.
Finally, as somewhat expected the code is slower by a factor 2
goos: linux
goarch: amd64
pkg: test/boyermoor
BenchmarkSearch-4 2000000 756 ns/op 0 B/op 0 allocs/op
PASS
ok test/boyermoor 2.282s
Two notes:
- There might be more interesting/subtle optimizations that i m not aware of myself, i did the obvious.
-
\$\begingroup\$ Thank you, how do you call your Search func, so it has 0 alloc? I changed my Search fun to your version, but it still has allocs/op. (talking about the version you mentioned you got 0 alloc) \$\endgroup\$Mazdak– Mazdak2019年10月20日 16:26:54 +00:00Commented Oct 20, 2019 at 16:26
-
1\$\begingroup\$ I preallocated it. If you don't preallocate, go runtime will allocate when you
append
. see stackoverflow.com/a/38654841/4466350 (i updated my answer). But anyways, what matter most is that you understand the overall procedure to optimize your go code. \$\endgroup\$mh-cbon– mh-cbon2019年10月20日 16:32:43 +00:00Commented Oct 20, 2019 at 16:32 -
1\$\begingroup\$ you can also check for online resources like github.com/dgryski/go-perfbook to get more ideas \$\endgroup\$mh-cbon– mh-cbon2019年10月20日 16:36:20 +00:00Commented Oct 20, 2019 at 16:36