I have a list with duplicate elements:
list_a = [1, 2, 3, 5, 6, 7, 5, 2]
tmp = []
for i in list_a:
    if tmp.__contains__(i):
        print i
    else:
        tmp.append(i)
I used the code above to find the duplicate elements in list_a. I don't want to remove the elements from the list.
But I want to use a for loop here. In C/C++ we would normally write something like this, I guess:
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
How do we write this in Python?
for i in list_a:
    for j in list_a[1:]:
        ....
I tried the code above, but it gives the wrong result. I don't know how to advance the starting value for j.
20 Answers
Just for information: in Python 2.7+, we can use Counter.
>>> import collections
>>> x = [1, 2, 3, 5, 6, 7, 5, 2]
>>> x
[1, 2, 3, 5, 6, 7, 5, 2]
>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})
Unique List
>>> list(y)
[1, 2, 3, 5, 6, 7]
Items found more than 1 time
>>> [i for i in y if y[i]>1]
[2, 5]
Items found only one time
>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]
Use [n for n, i in y.iteritems() if i > 1] instead, and likewise with i == 1. – Roger Pate, Dec 17, 2009 at 8:40
...but why the list(y)? Isn't Counter iterable? – LeMiz, Dec 17, 2009 at 8:41
@Roger Pate, thanks. Yours needs no dict lookup, so it could be better. – YOU, Dec 17, 2009 at 8:50
The only drawbacks of Counter are that it doesn't terminate early if duplicates are found early in a big list, and that it doesn't work for an infinite iterator. – Lie Ryan, Sep 25, 2012 at 8:11
My solution: for k, v in x.most_common(): if v > 1: print k – luistm, Jul 3, 2014 at 13:44
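For readers on Python 3, where print is a function and iteritems() no longer exists, the Counter idea above can be sketched roughly as follows. The function name duplicates_with_counts is made up for this illustration, and the sketch assumes the items are hashable.

```python
from collections import Counter


def duplicates_with_counts(items):
    """Return {item: count} for every item that occurs more than once."""
    counts = Counter(items)  # one pass over the input
    # Iterate over (item, count) pairs directly, so no second lookup is needed.
    return {item: n for item, n in counts.items() if n > 1}


list_a = [1, 2, 3, 5, 6, 7, 5, 2]
print(duplicates_with_counts(list_a))
```

As one of the comments points out, this always consumes the whole input, so it is not suited to an endless iterator.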
Use the in operator instead of calling __contains__ directly.
What you have almost works (but is O(n**2)):
for i in xrange(len(list_a)):
    for j in xrange(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print "duplicate:", list_a[i]
But it's far easier to use a set (roughly O(n) due to the hash table):
seen = set()
for n in list_a:
    if n in seen:
        print "duplicate:", n
    else:
        seen.add(n)
Or a dict, if you want to track locations of duplicates (also O(n)):
import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
    items[item].append(i)
for item, locs in items.iteritems():
    if len(locs) > 1:
        print "duplicates of", item, "at", locs
Or even just detect a duplicate somewhere (also O(n)):
if len(set(list_a)) != len(list_a):
    print "duplicate"
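If all you need is to know whether a duplicate exists, and which item repeats first, the set-based loop can stop at the first hit. Below is a minimal Python 3 sketch of that idea; first_duplicate is a hypothetical name, and the items are assumed to be hashable.

```python
import itertools


def first_duplicate(iterable):
    """Return the first item that is seen a second time, or None."""
    seen = set()
    for item in iterable:
        if item in seen:
            return item  # stop here; the rest of the input is never read
        seen.add(item)
    return None


print(first_duplicate([1, 2, 3, 5, 6, 7, 5, 2]))
# Because it exits early, it also works on an endless iterator that repeats.
print(first_duplicate(itertools.cycle([1, 2, 3])))
```

Unlike the len(set(list_a)) check, this does not need the whole input in memory at once, at the cost of reporting only one duplicate.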
You could always use a list comprehension:
dups = [x for x in list_a if list_a.count(x) > 1]
This traverses the list once for each element (although the OP's code is O(N**2), too). – Alok Singhal, Dec 17, 2009 at 8:41
Yeah, I understand it's inefficient. If the OP is looking for efficiency, he should go with Roger's answer for sure. – Evan Fosmark, Dec 17, 2009 at 10:23
I think this is slightly more efficient: [x for i,x in enumerate(list_a) if list_a[i:].count(x) > 1] – dMb, Aug 12, 2011 at 19:50
This will return a list that itself contains duplicates, since list_a.count(x) > 1 is True for each occurrence of the element. I'd use set() to get unique duplicates. – dmitko, Nov 8, 2012 at 14:21
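Following up on the last comment, one way to keep the count()-based comprehension but report each duplicate only once is sketched below in Python 3. The name unique_duplicates is invented for this example. It relies on dicts preserving insertion order (guaranteed from Python 3.7), requires hashable items, and is still quadratic because of count().

```python
def unique_duplicates(items):
    """Return each repeated item once, in order of first appearance."""
    # Every occurrence of a repeated item passes the filter...
    repeated = (x for x in items if items.count(x) > 1)
    # ...so collapse them with an order-preserving dict.
    return list(dict.fromkeys(repeated))


list_a = [1, 2, 3, 5, 6, 7, 5, 2]
print(unique_duplicates(list_a))
```

Using set() as the comment suggests works just as well when the order of the result does not matter.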
Before Python 2.3, use a dict:
>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst:  # count occurrences of each item
...     stats[x] = stats.get(x, 0) + 1
...
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # filter items appearing more than once
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
So, as a function:
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    stats = {}
    for x in iterable:
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)
Python 2.3 brought sets (via the sets module), and set() has been a built-in since Python 2.4:
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    try:  # try using the built-in set
        found = set()
    except NameError:  # fall back on the sets module
        from sets import Set
        found = Set()
    for x in iterable:
        if x in found:  # a set is a collection that can't contain duplicates
            yield x
        found.add(x)  # a duplicate won't be added anyway
With Python 2.7 and above, the collections module provides a Counter class that does the same job as the dict version, so we can make it shorter than solution 1 (and faster, as it's probably C under the hood):
import collections
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)
I'd stick with solution 2.
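One thing to be aware of with solution 2: an item that occurs three or more times is yielded once per extra occurrence. If that is not wanted, a second set can remember what has already been reported. This is a Python 3 sketch under that assumption; iter_duplicates and reported are names made up here.

```python
def iter_duplicates(iterable):
    """Yield each duplicated item exactly once, when its second copy is seen."""
    seen = set()      # every item encountered so far
    reported = set()  # duplicates that were already yielded
    for item in iterable:
        if item in seen and item not in reported:
            reported.add(item)
            yield item
        seen.add(item)


print(list(iter_duplicates([1, 2, 1, 3, 4, 5, 4, 4, 6, 7, 8, 2])))
```

It stays lazy like the original generator, so a caller can stop consuming it as soon as it has seen enough.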
You can use this function to find duplicates:
def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)
    return list(set(dup_arr))
Examples
print get_duplicates([1,2,3,5,6,7,5,2])
[2, 5]
print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])
[1, 2, 4]
If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:
n = len(list_a)
for i in range(n):
    for j in range(i+1, n):
        if list_a[i] == list_a[j]:
            print list_a[i]
The code above is not "Pythonic". I would do it something like this:
seen = set()
for i in list_a:
    if i in seen:
        print i
    else:
        seen.add(i)
Also, don't use __contains__; rather, use in (as above).
The following requires the elements of your list to be hashable (not just implementing __eq__).
I find it more pythonic to use a defaultdict (and you have the number of repetitions for free):
import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
    d[x] += 1
print [k for k, v in d.iteritems() if v > 1]  # prints [1, 3]
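When the elements are not hashable (lists of lists, for instance), the dict and set approaches raise TypeError. A fallback that needs only == is sketched below in Python 3. It is quadratic, and duplicates_by_equality is a hypothetical name, not something from the answer above.

```python
def duplicates_by_equality(items):
    """Return each repeated item once, comparing with == only (O(n**2))."""
    found = []
    for i, item in enumerate(items):
        # "in" on a list compares element by element, so no hashing is needed.
        if item in items[:i] and item not in found:
            found.append(item)
    return found


print(duplicates_by_equality([[1, 2], [3], [1, 2], [3], [3]]))
```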
Using only itertools (this works fine on Python 2.5):
from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])
Result:
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
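To go from those counts to just the duplicated values, the groups can be filtered by size. A small Python 3 sketch follows; sorted_duplicates is an invented name, and it assumes the items can be sorted, since groupby only merges adjacent equal items.

```python
from itertools import groupby


def sorted_duplicates(items):
    """Return the values that occur more than once, in sorted order."""
    return [
        key
        for key, group in groupby(sorted(items))
        if sum(1 for _ in group) > 1  # size of the run without building a list
    ]


print(sorted_duplicates([1, 2, 3, 5, 6, 7, 5, 2]))
```

The sort makes this O(n log n), but it does not require the items to be hashable.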
It looks like you have a list (list_a) potentially including duplicates, which you would rather keep as it is, and want to build a de-duplicated list tmp based on list_a. In Python 2.7, you can accomplish this with one line:
tmp = list(set(list_a))
Comparing the lengths of tmp and list_a at this point should clarify whether there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.
You could just "translate" it line by line.
C++:
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
Python:
for i in range(0, len(list_a)):
    for j in range(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print list_a[i]
C++ for loop:
for(int x = start; x < end; ++x)
Python equivalent:
for x in range(start, end):
You should not accept this answer. Yes, it's valid code, but it's not the way you should code in Python. Don't code Python like C/C++ or Java. They are not the same languages, and are not meant to be used the same way. – Bite code, Dec 17, 2009 at 9:27
I agree with e-satis. Although the question specifically tries to compare the routine to C/C++, we should try to nudge it in the right direction. – Mizipzor, Dec 17, 2009 at 9:43
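In the direction these comments point, the "every i, then every j after i" pattern does not need index arithmetic at all: itertools.combinations produces exactly those pairs. Here is a Python 3 sketch; duplicate_pairs is a name chosen for this example only.

```python
from itertools import combinations


def duplicate_pairs(items):
    """Return (i, j, value) for every pair of positions i < j holding equal values."""
    indexed = enumerate(items)  # (position, value) tuples
    return [
        (i, j, a)
        for (i, a), (j, b) in combinations(indexed, 2)
        if a == b
    ]


print(duplicate_pairs([1, 2, 3, 5, 6, 7, 5, 2]))
```

It visits the same pairs as the nested loops, so it is still O(n**2); the set-based answers remain the better choice for large lists.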
Just quick and dirty,
list_a = [1, 2, 3, 5, 6, 7, 5, 2]
holding_list = []
for x in list_a:
    if x in holding_list:
        pass
    else:
        holding_list.append(x)
print holding_list
Output [1, 2, 3, 5, 6, 7]
Using numpy:
import numpy as np
count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[count>1]])
In the case of Python 3, if you have two lists:
def removedup(List1, List2):
    List1_copy = List1[:]  # iterate over a copy, since List1 is modified in the loop
    for i in List1_copy:
        if i in List2:
            List1.remove(i)
List1 = [4, 5, 6, 7]
List2 = [6, 7, 8, 9]
removedup(List1, List2)
print(List1)
Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:
import pandas as pd
pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()
You can use:
b = ['E', 'P', 'P', 'E', 'O', 'E']
c = {}
for i in b:
    value = 0
    for j in b:
        if i == j:
            value += 1
    c[i] = value
print(c)
Output:
{'E': 3, 'P': 2, 'O': 1}
Find duplicates in the list using loops, conditional logic, logical operators, and list methods
some_list = ['a','b','c','d','e','b','n','n','c','c','h',]
duplicates = []
for values in some_list:
    if some_list.count(values) > 1:
        if values not in duplicates:
            duplicates.append(values)
print("Duplicate Values are : ", duplicates)
How can I get this without using any Python built-in such as count? @Zaheer – Navi, Nov 27, 2020 at 10:24
Finding the number of repeating elements in a list:
myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
    count = 0
    for i in myList:
        if s == i:
            count += 1
    print(f'total {s} => {count}')
Try like this:
list_a = [1, 2, 3, 5, 6, 7, 5, 2]
unique_values = []
duplicates = []
for i in list_a:
    if i not in unique_values:
        unique_values.append(i)
    else:
        found = False
        for x in duplicates:
            if x.get("key") == i:
                found = True
                break  # keep x pointing at the matching entry
        if found:
            x["occurrence"] += 1
        else:
            duplicates.append({
                "key": i,
                "occurrence": 1
            })
some_string = list(input("Enter any string:\n"))
count = {}
dup_count = {}
for i in some_string:
    if i not in count:
        count[i] = 1
    else:
        count[i] += 1
        dup_count[i] = count[i]
print("Duplicates of given string are below:\n", dup_count)
A little bit more Pythonic implementation (not the most, of course), but in the spirit of your C code could be:
for i, elem in enumerate(seq):
    if elem in seq[i+1:]:
        print elem
Edit: yes, it prints the elements more than once if there're more than 2 repetitions, but that's what the op's C pseudo code does too.
You must sort before doing that. Use sorted. What's more, you will print the same duplicate several times if there is more than one of the same. – Bite code, Dec 17, 2009 at 9:03
This will print the same element multiple times if it occurs more than 2 times in the list. – mthurlin, Dec 17, 2009 at 9:04
Have you guys bothered to read the OP's code? It does exactly the same. @e-satis There's no need to sort; maybe you meant something like [k for k, it in itertools.groupby(sorted(l)) if len(list(it)) > 1]? – fortran, Dec 17, 2009 at 11:21