3

I have searched and haven't quite found the same question as mine. I want to remove duplicates from a list of lists in python; however, I don't care what order the values are in the list. They way I am doing it currently is too time-consuming.

What I want to do:

A = [[1,2,3] , [2,3,4] , [3,4,5] , [3,2,4]]

I want to search through A and remove all duplicates. The duplicates here would be [2,3,4] and [3,2,4]. This would reduce down to:

smaller_A = [[1,2,3] , [2,3,4], [3,4,5]]

How I am currently doing it:

todelete = []
for i in range(len(A)):
 for j in range(i+1,len(A)):
 if set(A[i]) == set(A[j]):
 todelete.append(j)
todelete = sorted(set(todelete))
smaller_A= [A[i] for i in range(len(A)) if i not in todelete]

Again, this works, but it is very time consuming when my lists are large. Any ideas? Thanks!

user3483203
51.3k10 gold badges72 silver badges104 bronze badges
asked Sep 4, 2013 at 21:12

2 Answers 2

8

Frozensets are perfect for cases like this, when you need to nest sets:

>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> smaller_A = {frozenset(x) for x in A}
>>> smaller_A
{frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})}

To convert back to lists, you can do this:

>>> [list(x) for x in smaller_A]
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]

This won't conserve the order of your lists or the elements inside them. (Although it didn't make a difference here.)

If you do need to preserve order, you can iterate over A while keeping track of frozensets seen so far:

>>> A = [[1,2,3], [2,3,4], [3,4,5], [3,2,4]]
>>> seen = set()
>>> smaller_A = []
>>> for x in A:
... if frozenset(x) not in seen:
... smaller_A.append(x)
... seen.add(frozenset(x))
...
>>> smaller_A
[[1, 2, 3], [2, 3, 4], [3, 4, 5]]

(This isn't optimized; ideally, you'd only call frozenset(x) once and store the result in a variable.)

answered Sep 4, 2013 at 21:16
2
  • This is great! I have not come across frozenset. I did need to preserve some sort of order, so I used the small loop you provided. However, I didn't quite understand what isn't optimized. Are you saying before the if statement I would have fs = frozenset(x), then use fs for the next two lines? Thanks! Commented Sep 5, 2013 at 13:37
  • For fun, this sped up my code (containing a list of 5040 lists) from 24.1s to .0067s! Wow! Props to flornquake! Commented Sep 5, 2013 at 13:56
1

you can do a trick with sorting this way

for i in range(len(A)): A[i].sort()

then remove duplicates

answered Sep 4, 2013 at 21:29

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.