7

I would like to remove a certain number of duplicates of a list without removing all of them. For example, I have a list [1,2,3,4,4,4,4,4] and I want to remove 3 of the 4's, so that I am left with [1,2,3,4,4]. A naive way to do it would probably be

def remove_n_duplicates(remove_from, what, how_many):
    for j in range(how_many):
        remove_from.remove(what)

Is there a way to remove three of the 4's in one pass through the list, while keeping the other two?

asked Jul 26, 2016 at 20:13
  • @dot.Py: Definitely not a duplicate of that, because we're only trying to remove a limited number of items from the list, not completely eliminate duplicates. Commented Jul 26, 2016 at 20:16
  • Do you want to remove n duplicates? Or assert that there are at most m duplicates of any given item? Commented Jul 26, 2016 at 20:17
  • Also, does it matter which duplicates you remove? (e.g. can you remove the first 4 dupes or would it have to be the last 4?) Commented Jul 26, 2016 at 20:18
  • You could iterate over the list indices in reverse and pop the indices where you find the element. By iterating in reverse you make sure that popping an element doesn't disrupt the next iterations, so: for i in reversed(range(len(seq))): if seq[i] == what: seq.pop(i), and you stop when you have popped enough of them. Commented Jul 26, 2016 at 20:29
  • @mgilson I want to remove n duplicates. I might have [4,4,4,6,6,6,6,6] and want to remove one 4, but leave the 6's alone. It doesn't matter which duplicate is removed and order does not need to be preserved at all. Commented Jul 26, 2016 at 20:47
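
The reverse-iteration idea from the comments could be sketched roughly as follows. The name remove_n_in_place is made up for illustration, and the sketch assumes it is acceptable to remove the last occurrences rather than the first ones. Each del still shifts the tail of the list, so this is a single scan but not necessarily a single pass in terms of total work.

```python
def remove_n_in_place(seq, what, how_many):
    """Delete up to how_many occurrences of what from seq, scanning from the end."""
    removed = 0
    # Walk the indices backwards so a deletion never shifts an index
    # that has not been visited yet.
    for i in range(len(seq) - 1, -1, -1):
        if removed >= how_many:
            break
        if seq[i] == what:
            del seq[i]
            removed += 1
    return removed  # number of items actually removed


lst = [1, 2, 3, 4, 4, 4, 4, 4]
remove_n_in_place(lst, 4, 3)
print(lst)  # [1, 2, 3, 4, 4]
```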

5 Answers

8

If you just want to remove the first n occurrences of something from a list, this is pretty easy to do with a generator:

def remove_n_dupes(remove_from, what, how_many):
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item

Usage looks like:

lst = [1,2,3,4,4,4,4,4]
print(list(remove_n_dupes(lst, 4, 3)))  # [1, 2, 3, 4, 4]
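
The generator builds a new sequence, whereas the function in the question mutates its argument. If other references to the same list need to see the change, one possible approach (a sketch, not part of the original answer) is to materialise the generator and assign it back through a full slice:

```python
def remove_n_dupes(remove_from, what, how_many):
    # Same generator as above, repeated so this snippet runs on its own.
    count = 0
    for item in remove_from:
        if item == what and count < how_many:
            count += 1
        else:
            yield item


lst = [1, 2, 3, 4, 4, 4, 4, 4]
alias = lst  # a second reference to the same list object
# Build the filtered copy first, then overwrite the contents in place.
lst[:] = list(remove_n_dupes(lst, 4, 3))
print(alias)  # [1, 2, 3, 4, 4]
```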

Keeping a specified number of duplicates of any item is similarly easy if we use a little auxiliary storage:

from collections import Counter

def keep_n_dupes(remove_from, how_many):
    counts = Counter()
    for item in remove_from:
        counts[item] += 1
        if counts[item] <= how_many:
            yield item

Usage is similar:

lst = [1,1,1,1,2,3,4,4,4,4,4]
print(list(keep_n_dupes(lst, 2)))  # [1, 1, 2, 3, 4, 4]

Here the inputs are the list and the maximum number of copies of each item that you want to keep. The caveat is that the items need to be hashable, since they are used as Counter keys.
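
If the items are not hashable (lists, for example), one possible fallback is to track counts in a plain list and compare with ==. This is only a sketch under that assumption; keep_n_dupes_unhashable is a hypothetical name, and the linear search makes it quadratic in the worst case.

```python
def keep_n_dupes_unhashable(remove_from, how_many):
    # `seen` holds [item, count] pairs compared with ==, so items need not
    # be hashable, at the cost of a linear search for every item.
    seen = []
    for item in remove_from:
        for entry in seen:
            if entry[0] == item:
                entry[1] += 1
                break
        else:
            # No existing pair matched: start counting this item.
            entry = [item, 1]
            seen.append(entry)
        if entry[1] <= how_many:
            yield item


data = [[1], [1], [1], [2], [2]]
print(list(keep_n_dupes_unhashable(data, 2)))  # [[1], [1], [2], [2]]
```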

answered Jul 26, 2016 at 20:20

0

You can use a set to find the distinct items, build a list of lists holding the desired number of copies of each item, and then flatten it. For this example the result list will be [1, 2, 3, 4, 4]. Note that the one-liner below subtracts 3 from the count of every item but always keeps at least one copy (because of max(1, ...)), and that iterating over a set does not guarantee the original order in general.

x = [1,2,3,4,4,4,4,4]
x2 = [val for sublist in [[item]*max(1, x.count(item)-3) for item in set(x)] for val in sublist]

As a function that only reduces the count of one item, you would have the following.

def remove_n_duplicates(remove_from, what, how_many):
    return [val for sublist in [[item]*max(1, remove_from.count(item)-how_many) if item == what else [item]*remove_from.count(item) for item in set(remove_from)] for val in sublist]
answered Jul 26, 2016 at 20:39

0

If the list is sorted, there's a fast solution:

def remove_n_duplicates(remove_from, what, how_many):
    # Find the first occurrence of `what` (-1 means it is not in the list).
    index = -1
    for i in range(len(remove_from)):
        if remove_from[i] == what:
            index = i
            break
    if index < 0 or index + how_many > len(remove_from):
        # There aren't enough things to remove.
        return
    for i in range(index, index + how_many):
        if remove_from[i] != what:
            # Again, there aren't enough things to remove.
            return
    end_index = index + how_many
    # Drop the how_many items starting at the first occurrence.
    return remove_from[:index] + remove_from[end_index:]

Note that this returns a new list (or None when there aren't enough items to remove), so you want to do arr = remove_n_duplicates(arr, 4, 3)
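
For a sorted list, the linear search for the run of equal items could also be replaced by a binary search with the standard bisect module. The following is a sketch rather than a drop-in replacement: remove_n_sorted is a hypothetical name, and unlike the function above it removes as many items as it can (up to how_many) instead of returning None when there are too few. It assumes how_many is not negative.

```python
from bisect import bisect_left, bisect_right


def remove_n_sorted(sorted_list, what, how_many):
    # In a sorted list, [start, end) is exactly the run of items equal to `what`.
    start = bisect_left(sorted_list, what)
    end = bisect_right(sorted_list, what)
    # Never remove more items than the run contains.
    n = min(how_many, end - start)
    return sorted_list[:start] + sorted_list[start + n:]


arr = [1, 2, 3, 4, 4, 4, 4, 4]
arr = remove_n_sorted(arr, 4, 3)
print(arr)  # [1, 2, 3, 4, 4]
```

Locating the run takes O(log n) comparisons, although building the new list by slicing is still linear in its length.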

answered Jul 26, 2016 at 21:31

-1

Here is another trick which might be useful sometimes. Not to be taken as the recommended recipe.

def remove_n_duplicates(remove_from, what, how_many):
    exec('remove_from.remove(what);'*how_many)
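
For comparison, the same repeated remove call can be written as a plain loop without exec. The sketch below uses a hypothetical name, remove_up_to_n, and additionally stops quietly when the list runs out of matching items, since list.remove raises ValueError in that case.

```python
def remove_up_to_n(remove_from, what, how_many):
    removed = 0
    for _ in range(how_many):
        try:
            # list.remove deletes the first matching item, scanning from the start.
            remove_from.remove(what)
        except ValueError:
            # No occurrences left, so stop early.
            break
        removed += 1
    return removed  # number of items actually removed


lst = [1, 2, 3, 4, 4, 4, 4, 4]
remove_up_to_n(lst, 4, 3)
print(lst)  # [1, 2, 3, 4, 4]
```
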
answered Jul 26, 2016 at 21:17

-1

I can solve it in a different way using collections.

from collections import Counter
li = [1,2,3,4,4,4,4]
cntLi = Counter(li)
print(list(cntLi.keys()))
answered Jul 26, 2016 at 20:35

3 Comments

But this removes all duplicates and doesn't really take advantage of the Counter at all...
This can be achieved by using the value for the respective key. cntLi.items() provides a list of tuples in which the key is the unique number and the value is its count. By processing the value you can decide the operation.
Right. It definitely can be done that way (and that wouldn't even be a bad solution), but as it is, your answer is missing that crucial step.
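
The missing step discussed in these comments might look something like the sketch below. It is one possible reading, not the answerer's code: remove_n_with_counter is a hypothetical name, the items must be hashable, and the result is grouped by value rather than kept in the original order (which the asker said does not matter).

```python
from collections import Counter


def remove_n_with_counter(remove_from, what, how_many):
    counts = Counter(remove_from)
    # Lower the count for `what`, never going below zero.
    counts[what] = max(0, counts[what] - how_many)
    # elements() repeats each key by its count and skips counts below one.
    return list(counts.elements())


li = [1, 2, 3, 4, 4, 4, 4]
print(sorted(remove_n_with_counter(li, 4, 3)))  # [1, 2, 3, 4]
```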
