I'm trying to remove the duplicates from a list of lists and then count the items that remain. For example:
seq = [[1,2,3], [1,2,3], [2,3,4], [4,5,6]]
new_seq = [[1,2,3], [2,3,4], [4,5,6]]
count = 3
My code takes around 23 seconds for a list containing around 66,000 lists.
How can I make my code faster?
def unique(seq):
    new_seq = []
    count = 0
    for i in seq:
        if i not in new_seq:
            new_seq.append(i)
            count += 1
    return count
- What are you really trying to accomplish? Is this function part of a larger program? Tell us about the context. (200_success, May 6, 2016 at 19:41)
- The lists come from another function which calculates an algorithm. (jack, May 6, 2016 at 19:45)
1 Answer
Your function is slow because it is O(n^2): each element being considered for new_seq
has to be compared against every previously added element, since `in` on a list is a linear search.
To deduplicate a sequence, use a set. Constructing the set is only O(n) on average because it uses hashing.
Lists are not hashable, so each inner list has to be converted to a tuple first.
Then, to obtain the size of the set, use len().
def unique(seq):
    return len(set(tuple(element) for element in seq))
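The question's example also shows the deduplicated list itself (new_seq), with elements in their original order, which a plain set does not preserve. If that list is needed as well as the count, one possible approach is to keep a set of tuples purely for membership tests while building the list. This is a minimal sketch, not part of the original answer; the name unique_in_order is made up for illustration, and it assumes the inner lists contain only hashable values such as numbers.

```python
def unique_in_order(seq):
    """Return (deduplicated list, count), keeping first occurrences in order.

    Hypothetical helper for illustration; assumes inner lists hold hashable items.
    """
    seen = set()   # tuple versions of the inner lists encountered so far
    new_seq = []
    for row in seq:
        key = tuple(row)  # lists are unhashable, so a tuple serves as the set key
        if key not in seen:  # average O(1) lookup instead of a linear scan
            seen.add(key)
            new_seq.append(row)
    return new_seq, len(new_seq)


seq = [[1, 2, 3], [1, 2, 3], [2, 3, 4], [4, 5, 6]]
new_seq, count = unique_in_order(seq)
print(new_seq)  # [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
print(count)    # 3
```

The loop has the same shape as the original function, but the membership test runs against a set rather than a list, so the overall work should grow roughly linearly with the number of inner lists.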