Is there a way to find out if a list contains any duplicates? For example:
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]
list1.*method* = False # no duplicates
list2.*method* = True # contains duplicates
- Is this assuming the lists are always sorted? – tyjkenn, Jun 28, 2012 at 17:25
- Possible duplicate: stackoverflow.com/questions/1920145/… – tyjkenn, Jun 28, 2012 at 17:27
- @tyjkenn: Checking for existence of duplicates is simpler than finding the actual duplicates (which is what the other question is about). – interjay, Jun 28, 2012 at 17:30
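To illustrate the distinction interjay draws, here is a small sketch (the function names are just for illustration): an existence check only compares sizes, while finding the actual duplicates needs a count per element, e.g. with collections.Counter:

from collections import Counter

def has_duplicates(items):
    # existence check: collapsing into a set drops repeated values
    return len(items) != len(set(items))

def find_duplicates(items):
    # actual duplicates: keep every value that occurs more than once
    return [value for value, count in Counter(items).items() if count > 1]

print(has_duplicates([1, 1, 2, 3, 4, 5]))   # True
print(find_duplicates([1, 1, 2, 3, 4, 5]))  # [1]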
4 Answers
If you temporarily convert the list to a set, the duplicates are eliminated in the set. You can then compare the lengths of the list and the set.
In code, it would look like this:
list1 = [...]
tmpSet = set(list1)
haveDuplicates = len(list1) != len(tmpSet)
- +1 for including some actual text to explain what you are doing, as opposed to just plopping down code. – jdi, Jun 28, 2012 at 17:34
- @jdi: I actually tried to just plop down some code originally but it came under the 30-character minimum. – 3Doubloons, Jun 28, 2012 at 17:50
Convert the list to a set to remove duplicates. Compare the lengths of the original list and the set to see if any duplicates existed.
>>> list1 = [1,2,3,4,5]
>>> list2 = [1,1,2,3,4,5]
>>> len(list1) == len(set(list1))
True # no duplicates
>>> len(list2) == len(set(list2))
False # duplicates
Check if the length of the original list is larger than the length of the unique "set" of elements in the list. If so, there must have been duplicates:
list1 = [1,2,3,4,5]
list2 = [1,1,2,3,4,5]

if len(list1) != len(set(list1)):
    print("duplicates")  # not reached here: list1 has no duplicates
The set() approach only works for hashable objects, so for completeness, you could do it with just plain iteration:
import itertools

def has_duplicates(iterable):
    """
    >>> has_duplicates([1,2,3])
    False
    >>> has_duplicates([1, 2, 1])
    True
    >>> has_duplicates([[1,1], [3,2], [4,3]])
    False
    >>> has_duplicates([[1,1], [3,2], [4,3], [4,3]])
    True
    """
    return any(x == y for x, y in itertools.combinations(iterable, 2))
- Ouch. This one hurts for complexity. Better to write hash functions for your unhashable objects. – Joel Cornett, Jun 28, 2012 at 17:58
- @JoelCornett Mind doing it for list? – lqc, Jun 28, 2012 at 18:07
- listHash = lambda x: hash(tuple(x)). Note that since this hash is just a one-time thing, you don't have to worry about objects mutating on you. – Joel Cornett, Jun 28, 2012 at 20:58
- Here's a simpler one: lambda x: 1. Creating such a function doesn't make list objects any more hashable, 'cause list.__hash__ is still None. As for efficiency, you can easily tweak this to take constant extra memory. Hashing is just a CPU/memory tradeoff. – lqc, Jun 29, 2012 at 7:04
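For what it's worth, here is a rough sketch of the workaround Joel Cornett suggests in the comments above, assuming the unhashable elements are plain lists that can be frozen into tuples (the function name is just for illustration). This keeps the check roughly linear in the number of elements rather than quadratic:

def has_duplicates_unhashable(items):
    # Hash-key idea from the comments: freeze each list into a tuple so it
    # can go into a set; later mutation of the element doesn't matter.
    seen = set()
    for item in items:
        key = tuple(item)
        if key in seen:
            return True
        seen.add(key)
    return False

print(has_duplicates_unhashable([[1, 1], [3, 2], [4, 3]]))          # False
print(has_duplicates_unhashable([[1, 1], [3, 2], [4, 3], [4, 3]]))  # True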