Removing some of the duplicates from a list in Python

Question 1

I would like to remove a certain number of duplicates of a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)

Is there a way to remove three of the 4's in one pass through the list, while keeping the other two?

Question 2

@dot.Py: Definitely not a duplicate of that, because we're only trying to remove a limited number of items from the list, not completely eliminate duplicates.

Question 3

Do you want to remove n duplicates? Or assert that there are at most m duplicates of any given item?

Question 4

Also, does it matter which duplicates you remove? (e.g. can you remove the first 4 dupes or would it have to be the last 4?)

Question 5

You could iterate over the list in reverse and pop the indices where you find the element. Iterating in reverse ensures that popping an element doesn't shift the positions still to be visited, so: for i in reversed(range(len(seq))): if seq[i] == what: seq.pop(i), and you stop once you have popped enough of them.
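
The reverse-iteration idea in this comment could be sketched roughly as follows. The function name pop_n_from_end and its return value (the number of items actually removed) are illustrative choices, not something from the thread.

```python
def pop_n_from_end(seq, what, how_many):
    """Remove up to how_many occurrences of what from seq, in place."""
    removed = 0
    # Walk indices from last to first so that popping never shifts
    # the positions that have not been visited yet.
    for i in range(len(seq) - 1, -1, -1):
        if removed == how_many:
            break
        if seq[i] == what:
            seq.pop(i)
            removed += 1
    return removed


lst = [1, 2, 3, 4, 4, 4, 4, 4]
pop_n_from_end(lst, 4, 3)
print(lst)  # [1, 2, 3, 4, 4]
```

This is a single scan, but each pop from the middle of a list still shifts the tail, so the worst case is not strictly linear.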

Question 6

@mgilson I want to remove n duplicates. I might have [4,4,4,6,6,6,6,6] and want to remove one 4, but leave the 6's alone. It doesn't matter which duplicate is removed and order does not need to be preserved at all.

Question 7

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
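
The question's original function modified the list in place, whereas the generator produces a new sequence. If in-place behaviour is wanted, one possible sketch is to materialise the generator and assign it back through a slice. The alias variable exists only to show that the same list object is updated.

```python
def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item


lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # second reference to the same list object

# Build the filtered list first, then overwrite the contents in place.
lst[:] = list(remove_n_dupes(lst, 4, 3))
print(alias)  # [1, 2, 3, 4, 4]
```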

Keeping a specified number of duplicates of any item is similarly easy if we use a little extra auxiliary storage:

from collections import Counter

def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]

Here the input is the list and the max number of items that you want to keep. The caveat is that the items need to be hashable...
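
If the items are not hashable (lists, for example), a Counter cannot be used. One possible fallback, sketched here under the assumption that the items at least support ==, is to keep the counts in a plain list and search it linearly. The name keep_n_dupes_unhashable is hypothetical, and the approach is slower: roughly proportional to the list length times the number of distinct values.

```python
def keep_n_dupes_unhashable(items, how_many):
    """Yield items, keeping at most how_many copies of each, using == only."""
    seen = []  # [value, count] pairs, searched linearly
    for item in items:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                count = entry[1]
                break
        else:
            # No equal value recorded yet: start a new counter for it.
            seen.append([item, 1])
            count = 1
        if count <= how_many:
            yield item


rows = [[1, 2], [1, 2], [1, 2], [3], [3]]
print(list(keep_n_dupes_unhashable(rows, 2)))  # [[1, 2], [1, 2], [3], [3]]
```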

Question 8

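You can build one sublist per distinct value, using set to get the distinct values, and then flatten the result. For the example list this gives the values [1, 2, 3, 4, 4], although a set does not guarantee iteration order, so the order of the output may vary. Each count call rescans the list, so this is not a single pass, and this one-liner reduces the count of every value by 3 (down to a minimum of one copy), not only the count of 4.

x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x)] for val in sublist]

As a function, you would have the following. Note that max(1, ...) always leaves at least one copy of what.

def remove_n_duplicates(remove_from, what, how_many):
    return [val
            for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what
                            else [item]*remove_from.count(item)
                            for item in set(remove_from)]
            for val in sublist]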

Question 9

If the list is sorted, there's a fast solution, because all copies of a value are adjacent:

def remove_n_duplicates(remove_from, what, how_many):
    # Find the first occurrence of `what`.
    try:
        index = remove_from.index(what)
    except ValueError:
        # `what` is not in the list, so there is nothing to remove.
        return remove_from
    end_index = index + how_many
    # The how_many items starting at index must all be `what`.
    if remove_from[index:end_index] != [what] * how_many:
        # There aren't enough things to remove; return the list unchanged.
        return remove_from
    return remove_from[:index] + remove_from[end_index:]

Note that this returns a new list, so you would call it as arr = remove_n_duplicates(arr, 4, 3)
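
For a sorted list, the run of equal values can also be located with binary search from the standard bisect module. The sketch below is a variation rather than the answer's own code: the name remove_n_sorted is hypothetical, and it removes up to how_many copies instead of refusing when there are too few.

```python
from bisect import bisect_left, bisect_right


def remove_n_sorted(sorted_list, what, how_many):
    """Return a new list with up to how_many copies of what removed.

    Assumes sorted_list is sorted in ascending order.
    """
    lo = bisect_left(sorted_list, what)   # first index holding `what`
    hi = bisect_right(sorted_list, what)  # one past the last `what`
    n = max(0, min(how_many, hi - lo))    # never remove more than exist
    return sorted_list[:lo] + sorted_list[lo + n:]


print(remove_n_sorted([1, 2, 3, 4, 4, 4, 4, 4], 4, 3))  # [1, 2, 3, 4, 4]
```

Finding the run takes logarithmic time, although building the result list is still linear in its length.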

Question 10

Here is another trick which might be useful sometimes. Not to be taken as the recommended recipe.

def remove_n_duplicates(remove_from, what, how_many):
    exec('remove_from.remove(what);'*how_many)

Question 11

I can solve it in a different way, using collections.

from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print(list(cntLi.keys()))

Question 12

But this removes all duplicates and doesn't really take advantage of the Counter at all...

Question 13

This can be achieved by using the value stored for each key. cntLi.items() yields (key, value) tuples, where the key is the unique number and the value is its count. By adjusting the value you can decide how many copies to keep.

Question 14

Right. It definitely can be done that way (and that wouldn't even be a bad solution), but as it is, your answer is missing that crucial step.
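
The missing step discussed in these comments might look something like the sketch below: lower the stored count for the target value and rebuild a list from the counts. The function name remove_n_with_counter is hypothetical. It assumes hashable items and that the original order need not be preserved, since equal items come out grouped together.

```python
from collections import Counter


def remove_n_with_counter(items, what, how_many):
    """Return a new list with up to how_many copies of what removed."""
    counts = Counter(items)
    # Lower the count for `what`, never going below zero.
    counts[what] = max(0, counts[what] - how_many)
    # elements() repeats each key by its count and skips counts below one.
    return list(counts.elements())


li = [1, 2, 3, 4, 4, 4, 4]
print(sorted(remove_n_with_counter(li, 4, 2)))  # [1, 2, 3, 4, 4]
```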

Accepted answer (the generator-based answer above) by mgilson · 2016-07-26 20:20:48Z

