I would like to remove a certain number of duplicates of a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4]
and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]
. A naive way to do it would probably be
def remove_n_duplicates(remove_from, what, how_many):
for j in range(how_many):
remove_from.remove(what)
Is there a way to do remove the three 4's in one pass through the list, but keep the other two.
5 Answers 5
If you just want to remove the first n
occurrences of something from a list, this is pretty easy to do with a generator:
def remove_n_dupes(remove_from, what, how_many):
count = 0
for item in remove_from:
if item == what and count < how_many:
count += 1
else:
yield item
Usage looks like:
lst = [1,2,3,4,4,4,4,4]
print list(remove_n_dupes(lst, 4, 3)) # [1, 2, 3, 4, 4]
Keeping a specified number of duplicates of any item is similarly easy if we use a little extra auxiliary storage:
from collections import Counter
def keep_n_dupes(remove_from, how_many):
counts = Counter()
for item in remove_from:
counts[item] += 1
if counts[item] <= how_many:
yield item
Usage is similar:
lst = [1,1,1,1,2,3,4,4,4,4,4]
print list(keep_n_dupes(lst, 2)) # [1, 1, 2, 3, 4, 4]
Here the input is the list and the max number of items that you want to keep. The caveat is that the items need to be hashable...
Comments
You can use Python's set functionality with the & operator to create a list of lists and then flatten the list. The result list will be [1, 2, 3, 4, 4].
x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x) & set(x)] for val in sublist]
As a function you would have the following.
def remove_n_duplicates(remove_from, what, how_many):
return [val for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what else [item]*remove_from.count(item) for item in set(remove_from) & set(remove_from)] for val in sublist]
Comments
If the list is sorted, there's the fast solution:
def remove_n_duplicates(remove_from, what, how_many):
index = 0
for i in range(len(remove_from)):
if remove_from[i] == what:
index = i
break
if index + how_many >= len(remove_from):
#There aren't enough things to remove.
return
for i in range(index, how_many):
if remove_from[i] != what:
#Again, there aren't enough things to remove
return
endIndex = index + how_many
return remove_from[:index+1] + remove_from[endIndex:]
Note that this returns the new array, so you want to do arr = removeCount(arr, 4, 3)
Comments
Here is another trick which might be useful sometimes. Not to be taken as the recommended recipe.
def remove_n_duplicates(remove_from, what, how_many):
exec('remove_from.remove(what);'*how_many)
Comments
I can solve it in different way using collections.
from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print cntLi.keys()
3 Comments
Counter
at all...
n
duplicates? Or assert that there are at mostm
duplicates of any given item?pop
the indeces where you find the element. By iterating in reverse you make sure that the popping of an element doesn't disrupt the next iterations, so:for i, el in enumerate(reversed(seq)):if el == what:seq.pop(i)
and you stop when you have popped enough of them.n
duplicates. I might have[4,4,4,6,6,6,6,6]
and want to remove one 4, but leave the 6's alone. It doesn't matter which duplicate is removed and order does not need to be preserved at all.