I wrote some code to determine the additions and deletions that happened to a sorted slice after each (sorted) update.
Git repository (includes some tests)
// Package slicediff is a utility to determine the additions and deletions that happened to a sorted slice after each update.
//
// All the slices are assumed to be sorted!
package slicediff
import "container/list"
// SliceDiff stores the current state of the sorted slice
type SliceDiff struct {
l *list.List
}
// New creates a new SliceDiff
func New() *SliceDiff {
return &SliceDiff{
list.New(),
}
}
// Append appends a slice at the end of the SliceDiff
//
// sa is assumed to be sorted
func (sd *SliceDiff) Append(sa []string) {
for _, s := range sa {
sd.l.PushBack(s)
}
}
// SortedDiff compares the updated slice with l and returns the additions and deletions performed
//
// updated is assumed to be sorted
func (sd *SliceDiff) SortedDiff(updated []string) (additions []string, deletions []string) {
if sd.l.Len() == 0 {
sd.Append(updated)
additions = make([]string, len(updated))
copy(additions, updated)
return additions, make([]string, 0)
}
e := sd.l.Front()
additions = make([]string, 0)
deletions = make([]string, 0)
for _, s := range updated {
// Delete the small elements at the beginning of the list
for e != nil && e.Value.(string) < s {
deletions = append(deletions, e.Value.(string))
e = removeAndGetNext(sd.l, e)
}
if e == nil {
// End of list is empty: simply push it
additions = append(additions, s)
sd.l.PushBack(s)
} else if s == e.Value.(string) {
// Same as current element: skip it
e = e.Next()
} else {
// Smaller than current element: insert it
additions = append(additions, s)
sd.l.InsertBefore(s, e)
}
}
// delete end of the list
for e != nil {
deletions = append(deletions, e.Value.(string))
e = removeAndGetNext(sd.l, e)
}
return additions, deletions
}
func removeAndGetNext(l *list.List, e *list.Element) (next *list.Element) {
next = e.Next()
l.Remove(e)
return
}
Example:
// Create a new SliceDiff
sd := New()
// Add some initial elements
sd.Append([]string{
"a",
"b",
"d",
})
// Perform an update
update := []string{
"a",
"b",
"c",
}
add, del := sd.SortedDiff(update)
// Get the diff
fmt.Println(add, del)
// Output: [c] [d]
I'm interested to know if my implementation is correct (all the tests are passing, but maybe I missed something) and if my "Go-style" can be improved.
1 Answer 1
Unnecessary optimization
You can safely remove this block of code without changing the behavior of the program:
if sd.l.Len() == 0 { sd.Append(updated) additions = make([]string, len(updated)) copy(additions, updated) return additions, make([]string, 0) }
Possible optimization
When you append to additions
and deletions
,
reallocation might occur.
If you don't mind spending a little extra space,
you could ensure that these slices have enough capacity to avoid reallocation.
You know that the maximum number of additions is the size of the input,
and the maximum number of deletions is the size of the current content:
additions = make([]string, 0, len(updated))
deletions = make([]string, 0, sd.l.Len())
Shorter format declaration
Instead of this:
func (sd *SliceDiff) SortedDiff(updated []string) (additions []string, deletions []string) {
You can use this shorter form:
func (sd *SliceDiff) SortedDiff(updated []string) (additions, deletions []string) {
Naming
Short names are very popular and encouraged in Go.
But l
is really the worst of all single-letter variable names.
In addition to the usual badness of these variables,
depending on the viewer program,
l
might be indistinguishable from 1
or I
or |
.
I suggest to never use l
as a symbol name.
Alternative implementation
It seems to me that the use of container/List
doesn't really buy you much.
It would be simpler and clearer to implement without it.
Consider this alternative implementation using only slices:
// Package slicediff is a utility to determine the additions and deletions that happened to a sorted slice after each update.
//
// All the slices are assumed to be sorted!
package slicediff
// SliceDiff stores the current state of the sorted slice
type SliceDiff struct {
content []string
}
// New creates a new SliceDiff
func New() *SliceDiff {
return &SliceDiff{}
}
// Append appends a slice at the end of the SliceDiff
//
// sa is assumed to be sorted
func (sd *SliceDiff) Append(sa []string) {
sd.content = append(sd.content, sa...)
}
func diff(s1, s2 []string) (added, deleted []string) {
len1, len2 := len(s1), len(s2)
added = make([]string, 0, len2)
deleted = make([]string, 0, len1)
for i1, i2 := 0, 0; i1 < len1 || i2 < len2; {
if i1 == len1 {
added = append(added, s2[i2:]...)
return
} else if i2 == len2 {
deleted = append(deleted, s1[i1:]...)
return
}
if s1[i1] < s2[i2] {
deleted = append(deleted, s1[i1])
i1++
} else if s1[i1] > s2[i2] {
added = append(added, s2[i2])
i2++
} else {
i1++
i2++
}
}
return
}
// SortedDiff compares the updated slice with l and returns the additions and deletions performed
//
// updated is assumed to be sorted
func (sd *SliceDiff) SortedDiff(updated []string) (additions, deletions []string) {
additions, deletions = diff(sd.content, updated)
sd.content = updated
return
}
-
\$\begingroup\$ Thank you for your suggestions, I just implemented them in the git repository! Naming is hard: I replaced
l
withs
. I chose not to use aslice
as internal representation, because I think that deleting an element in the middle is quite expensive (the rest of the slice will be re-written). My use-case is to watch a huge list of folder and be alerted when one appears or disappear (it happens one folder at a time). \$\endgroup\$oliverpool– oliverpool2016年12月02日 08:27:30 +00:00Commented Dec 2, 2016 at 8:27 -
\$\begingroup\$ I see your point, that is indeed a valid justification. \$\endgroup\$janos– janos2016年12月02日 09:11:22 +00:00Commented Dec 2, 2016 at 9:11
go test
will run allTest***
functions - if nothing callst.Error
ort.Fail
it is considered passing) \$\endgroup\$