Can anyone suggest a good way to remove duplicates from a list of nested lists, where duplicates are determined by the first element of each nested list?
The main list looks like this:
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
If another list with the same element at the first position [k][0]
has already occurred, then I'd like to remove the later list and get this result:
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
Can you suggest an algorithm to achieve this goal?
Do you care about preserving order / which duplicate is removed? If not, then:
dict((x[0], x) for x in L).values()
will do it, keeping the last list seen for each first element. If you want to preserve order and keep the first one you find, then:
def unique_items(L):
    # First elements that have already been yielded.
    found = set()
    for item in L:
        if item[0] not in found:
            yield item
            found.add(item[0])

print(list(unique_items(L)))
-
your conversion to a dict was so much more elegant than mine that I stole it :) – Jiaaro, Jul 17, 2009 at 14:02
-
Doesn't the first one also preserve order, because dicts preserve order since Python 3.7 and the keys are inserted in the order that the comprehension produces them? – xuiqzy, Oct 1, 2020 at 13:49
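On the point raised in the comment above, a quick check may help. This is only a sketch using the sample data from the question, run on Python 3.7 or later: the keys do keep the position of their first insertion, but a repeated key overwrites the stored value, so the one-liner keeps the last duplicate rather than the first.

```python
# Sample data from the question.
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

# Keys stay where they were first inserted (guaranteed since Python 3.7),
# but the second '14' entry replaces the value stored under that key.
last_wins = list(dict((x[0], x) for x in L).values())

print(last_wins)
# [['14', '22', 46], ['2', '5', 6], ['7', '12', 33]]
```

So the order of the keys is preserved, but the result differs from the one asked for in the question, which keeps ['14', '65', 76].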
Use a dict instead, like so:
L = {'14': ['65', 76], '2': ['5', 6], '7': ['12', 33]}
L['14'] = ['22', 46]
If you are receiving the first list from some external source, convert it like so:
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]
L_dict = dict((x[0], x[1:]) for x in L)
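Note that the dict(...) conversion keeps the last value for a repeated key. If the first occurrence should win, as in the question, one possible variation is to fill the dict with setdefault, which only stores a value when the key is not already present. The sketch below assumes Python 3; the name deduped is just an illustrative choice.

```python
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46]]

L_dict = {}
for key, *rest in L:
    # setdefault leaves an existing entry untouched, so the first list wins.
    L_dict.setdefault(key, rest)

# Rebuild the nested-list form if it is needed again.
deduped = [[key] + rest for key, rest in L_dict.items()]

print(L_dict)   # {'14': ['65', 76], '2': ['5', 6], '7': ['12', 33]}
print(deduped)  # [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```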
Use pandas:
import pandas as pd
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46],['7','a','b']]
df = pd.DataFrame(L)
df = df.drop_duplicates()
L_no_duplicates = df.values.tolist()
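Note that drop_duplicates() with no arguments only drops rows that are identical in every column. To evaluate duplicates on specific columns only, pass those columns as the first argument (column 0 for the question as asked), for example: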
df = df.drop_duplicates([1,2])
I am not sure what you meant by "another list", so I assume you mean the lists inside L:
a = []  # first elements seen so far
L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]
for item in L:
    if item[0] not in a:
        a.append(item[0])
        print(item)
-
This would be more efficient if you used a set for 'a' - you're O(N^2) using a list like that, and amortised O(N) using a set. – RichieHindle, Jul 17, 2009 at 13:58
-
That had not come to mind, thanks for the info. Nevertheless, that code works in older Python versions that don't come with set. ;) – ghostdog74, Jul 17, 2009 at 14:14
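For reference, the set-based variant suggested in the comment above could look roughly like the sketch below. The function name unique_by and its key parameter are hypothetical, chosen here only for illustration, and the approach assumes the key values are hashable (the strings in this example are).

```python
from operator import itemgetter


def unique_by(items, key):
    """Yield each item whose key has not been seen before, in input order."""
    seen = set()  # set membership tests are O(1) on average
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            yield item


L = [['14', '65', 76], ['2', '5', 6], ['7', '12', 33], ['14', '22', 46], ['7', 'a', 'b']]

# itemgetter(0) picks the first element of each nested list as the key.
result = list(unique_by(L, itemgetter(0)))
print(result)
# [['14', '65', 76], ['2', '5', 6], ['7', '12', 33]]
```

Passing a different key function, such as itemgetter(1), would evaluate duplicates on another position without changing the loop.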
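If the order does not matter, the code below keeps the first occurrence of each key, because iterating over reversed(L) lets earlier entries overwrite later ones:
print([[k] + v for (k, v) in dict([[a[0], a[1:]] for a in reversed(L)]).items()])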
gives
[['2', '5', '6'], ['14', '65', '76'], ['7', '12', '33']]
def Remove(duplicate):
    final_list = []
    for num in duplicate:
        if num not in final_list:
            final_list.append(num)
    return final_list

duplicate = [2, 4, 10, 20, 5, 2, 20, 4]
print(Remove(duplicate))
-
Please provide some comments about your code and the changes you made to the original code. – SLDem, Mar 28, 2023 at 12:57