Count the number of discrete values in a slice

Question 1

I'm using this right now to count the number of discrete values in a given []string. Is there a faster way to do this?

func CountDiscrete(x []string) int {
 discrete := 0
 for i,xi := range x {
 is_discrete := true
 for j,xj := range x {
 if i != j {
 if xi == xj {
 is_discrete = false
 break
 }
 }
 }
 if is_discrete {
 discrete = discrete + 1
 }
 }
 return discrete
}

Question 2

If you're downvoting the question can you please leave a comment explaining why so that I can make any necessary changes or remove the question if it is not within the guidelines of this forum

Question 3

Code review:
1. Use discrete++ instead of discrete = discrete + 1
2. Use if i != j && xi == xj instead of

if i != j {
 if xi == xj {
 is_discrete = false
 break
 }
}

Don't use underscores in Go names: is_discrete, simply use discrete or isDiscrete.
You may use slice ...string instead of slice []string like the following code:

// O(n**2)
func distincts(x ...string) int {
 result := 0
 for i, xi := range x {
 unique := 1
 for j, xj := range x {
 if i != j && xi == xj {
 unique = 0
 break
 }
 }
 result += unique
 }
 return result
}

Also note the result += unique instead of if ....

Using map, the following code has the time complexity (asymptotic notation): O(n)

package main
import "fmt"
func main() {
 r := distincts("a", "b", "c", "a")
 fmt.Println(r) // 2
}
// O(n)
func distincts(slice ...string) int {
 result := 0
 m := map[string]int{}
 for _, str := range slice {
 m[str]++
 }
 for _, v := range m {
 if v == 1 {
 result++
 }
 }
 return result
}

Question 4

You're currently iterating over the slice each once, and then again for each value. That's understandable, but not ideal. Why not simply sort the data, and then check whether or not the next value is the same as the current one?

func distinct(values []string) int {
 if len(values) == 0 {
 return 0
 }
 // the loop doesn't count the last value
 // add if the slice isn't empty
 distinct := 1
 // sort the values
 sort.Strings(values)
 // stop iterating at the next to last element
 for i := 0; i < len(values)-1; i++ {
 // if the current value is distinct from the next, this is a unique word
 if values[i] != values[i+1] {
 distinct++
 }
 }
 return distinct
}

Other, more general comments on your code:

Variables like is_discrete is not in line with golang's coding conventions. Golang prefers camelCased variable names. isDiscrete.
Your nested if checking i != j and then xi == xj can, and should be combined: if i != j && xi == xj
Personally, I'd probably use the indexes, and just use for i := range x and for j := range x.
Although it doesn't make the code perform any better, it does get rid of that rather clunky looking if isDiscrete, you could use either one of the following snippets:

for i := range x { inc := 1 for j := range x { for i != j && x[i] == x[j] { inc = 0 break } } discrete += inc }

You could also increment discrete beforehand, and just replace inc = 0 with discrete-- in the inner loop.

While we're on the subjects of other approaches, while not efficient, it is a very easy approach:

func Discrete(vals []string) int {
 unique := map[string]struct{}{}
 for i := range vals {
 unique[vals[i]] = struct{}{}
 }
 return len(unique)
}

In this code, any duplicate string value is just reassigning the same key in the map. At the end, getting the length of the map will give you the total of unique string values. Maps cause memory allocations, so it's not the most efficient way, but it's a solid way to test your code. You can pass some data to your function, and use a map to calculate what the return for any given slice of strings should be.

Last thoughts on efficiency: the golang toolchain packs a couple of really handy tools that will help you determine which approach is more efficient (writing benchmarks). If memory consumption is a worry, you can use pprof. Not just to determine how much memory any given piece of code uses, but also which part of the code is the most likely bottleneck.

Question 5

"Why not simply sort the data?" Because sorting is O(n log n). While O(n log n) is better than O(n*n), we want O(n). See Wikipedia: Big O notation. See texts on algorithms, for example, Sedgewick, CLRS, Knuth.

Question 6

@peterSO: Compared to the OP's implementation (which is O(n*n), and requires manually writing more code), simply using sort is easier, and requires less code.

wasmup wasmup 3114 silver badges7 bronze badges · Answer 1 · 2020-03-06 20:34:47Z

Code review:
1. Use discrete++ instead of discrete = discrete + 1
2. Use if i != j && xi == xj instead of

if i != j {
 if xi == xj {
 is_discrete = false
 break
 }
}

Don't use underscores in Go names: is_discrete, simply use discrete or isDiscrete.
You may use slice ...string instead of slice []string like the following code:

// O(n**2)
func distincts(x ...string) int {
 result := 0
 for i, xi := range x {
 unique := 1
 for j, xj := range x {
 if i != j && xi == xj {
 unique = 0
 break
 }
 }
 result += unique
 }
 return result
}

Also note the result += unique instead of if ....

Using map, the following code has the time complexity (asymptotic notation): O(n)

package main
import "fmt"
func main() {
 r := distincts("a", "b", "c", "a")
 fmt.Println(r) // 2
}
// O(n)
func distincts(slice ...string) int {
 result := 0
 m := map[string]int{}
 for _, str := range slice {
 m[str]++
 }
 for _, v := range m {
 if v == 1 {
 result++
 }
 }
 return result
}

Elias Van Ootegem Elias Van Ootegem 11k31 silver badges49 bronze badges · Answer 2 · 2020-03-14 02:28:19Z

You're currently iterating over the slice each once, and then again for each value. That's understandable, but not ideal. Why not simply sort the data, and then check whether or not the next value is the same as the current one?

func distinct(values []string) int {
 if len(values) == 0 {
 return 0
 }
 // the loop doesn't count the last value
 // add if the slice isn't empty
 distinct := 1
 // sort the values
 sort.Strings(values)
 // stop iterating at the next to last element
 for i := 0; i < len(values)-1; i++ {
 // if the current value is distinct from the next, this is a unique word
 if values[i] != values[i+1] {
 distinct++
 }
 }
 return distinct
}

Other, more general comments on your code:

Variables like is_discrete is not in line with golang's coding conventions. Golang prefers camelCased variable names. isDiscrete.
Your nested if checking i != j and then xi == xj can, and should be combined: if i != j && xi == xj
Personally, I'd probably use the indexes, and just use for i := range x and for j := range x.
Although it doesn't make the code perform any better, it does get rid of that rather clunky looking if isDiscrete, you could use either one of the following snippets:

for i := range x { inc := 1 for j := range x { for i != j && x[i] == x[j] { inc = 0 break } } discrete += inc }

You could also increment discrete beforehand, and just replace inc = 0 with discrete-- in the inner loop.

While we're on the subjects of other approaches, while not efficient, it is a very easy approach:

func Discrete(vals []string) int {
 unique := map[string]struct{}{}
 for i := range vals {
 unique[vals[i]] = struct{}{}
 }
 return len(unique)
}

In this code, any duplicate string value is just reassigning the same key in the map. At the end, getting the length of the map will give you the total of unique string values. Maps cause memory allocations, so it's not the most efficient way, but it's a solid way to test your code. You can pass some data to your function, and use a map to calculate what the return for any given slice of strings should be.

Last thoughts on efficiency: the golang toolchain packs a couple of really handy tools that will help you determine which approach is more efficient (writing benchmarks). If memory consumption is a worry, you can use pprof. Not just to determine how much memory any given piece of code uses, but also which part of the code is the most likely bottleneck.

"Why not simply sort the data?" Because sorting is O(n log n). While O(n log n) is better than O(n*n), we want O(n). See Wikipedia: Big O notation. See texts on algorithms, for example, Sedgewick, CLRS, Knuth.
@peterSO: Compared to the OP's implementation (which is O(n*n), and requires manually writing more code), simply using sort is easier, and requires less code.

Stack Exchange Network

Count the number of discrete values in a slice

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Count the number of discrete values in a slice

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions