I have searched and haven't quite found the same question as mine. I want to remove duplicates from a list of lists in python; however, I don't care what order the values are in the list. They way I am doing it currently is too time-consuming.
What I want to do:
A = [[1,2,3] , [2,3,4] , [3,4,5] , [3,2,4]]
I want to search through A and remove all duplicates. The duplicates here would be [2,3,4] and [3,2,4]. This would reduce down to:
smaller_A = [[1,2,3] , [2,3,4], [3,4,5]]
How I am currently doing it:
todelete = []
for i in range(len(A)):
for j in range(i+1,len(A)):
if set(A[i]) == set(A[j]):
todelete.append(j)
todelete = sorted(set(todelete))
smaller_A= [A[i] for i in range(len(A)) if i not in todelete]
Again, this works, but it is very time consuming when my lists are large. Any ideas? Thanks!
2 Answers 2
Frozensets are perfect for cases like this, when you need to nest sets:
>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> smaller_A = {frozenset(x) for x in A}
>>> smaller_A
{frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})}
To convert back to lists, you can do this:
>>> [list(x) for x in smaller_A]
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]
This won't conserve the order of your lists or the elements inside them. (Although it didn't make a difference here.)
If you do need to preserve order, you can iterate over A
while keeping track of frozensets seen so far:
>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> seen = set()
>>> smaller_A = []
>>> for x in A:
... if frozenset(x) not in seen:
... smaller_A.append(x)
... seen.add(frozenset(x))
...
>>> smaller_A
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]
(This isn't optimized; ideally, you'd only call frozenset(x)
once and store the result in a variable.)
-
This is great! I have not come across frozenset. I did need to preserve some sort of order, so I used the small loop you provided. However, I didn't quite understand what isn't optimized. Are you saying before the if statement I would have fs = frozenset(x), then use fs for the next two lines? Thanks!mcfly– mcfly2013年09月05日 13:37:09 +00:00Commented Sep 5, 2013 at 13:37
-
For fun, this sped up my code (containing a list of 5040 lists) from 24.1s to .0067s! Wow! Props to flornquake!mcfly– mcfly2013年09月05日 13:56:00 +00:00Commented Sep 5, 2013 at 13:56
you can do a trick with sorting this way
for i in range(len(A)): A[i].sort()
then remove duplicates