I'm using this right now to count the number of discrete values in a given []string
. Is there a faster way to do this?
func CountDiscrete(x []string) int {
discrete := 0
for i,xi := range x {
is_discrete := true
for j,xj := range x {
if i != j {
if xi == xj {
is_discrete = false
break
}
}
}
if is_discrete {
discrete = discrete + 1
}
}
return discrete
}
-
2\$\begingroup\$ If you're downvoting the question can you please leave a comment explaining why so that I can make any necessary changes or remove the question if it is not within the guidelines of this forum \$\endgroup\$Nathan F.– Nathan F.2020年03月06日 20:50:57 +00:00Commented Mar 6, 2020 at 20:50
2 Answers 2
Code review:
1. Use discrete++
instead of discrete = discrete + 1
2. Use if i != j && xi == xj
instead of
if i != j {
if xi == xj {
is_discrete = false
break
}
}
Don't use underscores in Go names:
is_discrete
, simply usediscrete
orisDiscrete
.You may use
slice ...string
instead ofslice []string
like the following code:
// O(n**2)
func distincts(x ...string) int {
result := 0
for i, xi := range x {
unique := 1
for j, xj := range x {
if i != j && xi == xj {
unique = 0
break
}
}
result += unique
}
return result
}
Also note the result += unique
instead of if ...
.
- Using map, the following code has the time complexity (asymptotic notation):
O(n)
package main
import "fmt"
func main() {
r := distincts("a", "b", "c", "a")
fmt.Println(r) // 2
}
// O(n)
func distincts(slice ...string) int {
result := 0
m := map[string]int{}
for _, str := range slice {
m[str]++
}
for _, v := range m {
if v == 1 {
result++
}
}
return result
}
You're currently iterating over the slice each once, and then again for each value. That's understandable, but not ideal. Why not simply sort the data, and then check whether or not the next value is the same as the current one?
func distinct(values []string) int {
if len(values) == 0 {
return 0
}
// the loop doesn't count the last value
// add if the slice isn't empty
distinct := 1
// sort the values
sort.Strings(values)
// stop iterating at the next to last element
for i := 0; i < len(values)-1; i++ {
// if the current value is distinct from the next, this is a unique word
if values[i] != values[i+1] {
distinct++
}
}
return distinct
}
Other, more general comments on your code:
- Variables like
is_discrete
is not in line with golang's coding conventions. Golang preferscamelCased
variable names.isDiscrete
. - Your nested
if
checkingi != j
and thenxi == xj
can, and should be combined:if i != j && xi == xj
- Personally, I'd probably use the indexes, and just use
for i := range x
andfor j := range x
. Although it doesn't make the code perform any better, it does get rid of that rather clunky looking
if isDiscrete
, you could use either one of the following snippets:for i := range x { inc := 1 for j := range x { for i != j && x[i] == x[j] { inc = 0 break } } discrete += inc }
You could also increment discrete
beforehand, and just replace inc = 0
with discrete--
in the inner loop.
While we're on the subjects of other approaches, while not efficient, it is a very easy approach:
func Discrete(vals []string) int {
unique := map[string]struct{}{}
for i := range vals {
unique[vals[i]] = struct{}{}
}
return len(unique)
}
In this code, any duplicate string value is just reassigning the same key in the map. At the end, getting the length of the map will give you the total of unique string values. Maps cause memory allocations, so it's not the most efficient way, but it's a solid way to test your code. You can pass some data to your function, and use a map to calculate what the return for any given slice of strings should be.
Last thoughts on efficiency: the golang toolchain packs a couple of really handy tools that will help you determine which approach is more efficient (writing benchmarks). If memory consumption is a worry, you can use pprof
. Not just to determine how much memory any given piece of code uses, but also which part of the code is the most likely bottleneck.
-
\$\begingroup\$ "Why not simply sort the data?" Because sorting is O(n log n). While O(n log n) is better than O(n*n), we want O(n). See Wikipedia: Big O notation. See texts on algorithms, for example, Sedgewick, CLRS, Knuth. \$\endgroup\$peterSO– peterSO2020年03月15日 13:48:14 +00:00Commented Mar 15, 2020 at 13:48
-
\$\begingroup\$ @peterSO: Compared to the OP's implementation (which is
O(n*n)
, and requires manually writing more code), simply usingsort
is easier, and requires less code. \$\endgroup\$Elias Van Ootegem– Elias Van Ootegem2020年03月16日 10:46:24 +00:00Commented Mar 16, 2020 at 10:46