How to sort and remove duplicates from Python list? [duplicate]

Question 1

Given a list of strings, I want to sort it alphabetically and remove duplicates. I know I can do this:

from sets import Set
[...]
myHash = Set(myList)

but I don't know how to retrieve the list members from the hash in alphabetical order.

I'm not married to the hash, so any way to accomplish this will work. Also, performance is not an issue, so I'd prefer a solution that is expressed in code clearly to a fast but more opaque one.

Question 2

Also see here for more information

Question 3

This question, after @ColonelPanic's edit, is kind of a mess; the question in the title and the question in the body are not the same. The title indicates that the original order, pre-duplicate-removal, should be preserved. But the body presents a scenario where this is not in fact necessary.

Question 4

I changed the title to match the body and the accepted answer.

Question 5

A list can be sorted and deduplicated using built-in functions:

myList = sorted(set(myList))

set is a built-in type since Python 2.4 (Python 2.3 offered it only through the sets module)
sorted is a built-in function since Python 2.4
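
As a quick illustration of the one-liner above, here is a minimal sketch; the sample list `words` is made up for the example. Since the question asks for alphabetical order, note that the default string ordering places uppercase letters before lowercase ones, so a `key=str.lower` variant is shown as one possible way to sort case-insensitively.

```python
# Hypothetical sample data, not from the original question.
words = ["pear", "apple", "Banana", "apple", "pear", "cherry"]

# set() drops duplicates; sorted() returns a new, sorted list.
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['Banana', 'apple', 'cherry', 'pear']

# Default ordering compares code points, so "Banana" sorts before "apple".
# Passing key=str.lower sorts case-insensitively instead.
unique_sorted_ci = sorted(set(words), key=str.lower)
print(unique_sorted_ci)  # ['apple', 'Banana', 'cherry', 'pear']
```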

Question 6

This does not work if your myList has unhashable objects.
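
One possible workaround when the elements are unhashable but still comparable (for example, lists of numbers) is to sort first and then drop adjacent duplicates. This is only a sketch under that assumption; the helper name `sort_and_uniq_unhashable` and the sample data are hypothetical.

```python
from itertools import groupby


def sort_and_uniq_unhashable(items):
    """Sort items, then keep one element from each run of equal neighbours.

    Assumes the items can be compared with < and ==, but need not be hashable.
    """
    return [key for key, _group in groupby(sorted(items))]


# Lists are unhashable, so set(rows) would raise TypeError.
rows = [[2, 1], [1, 3], [2, 1], [1, 2]]
print(sort_and_uniq_unhashable(rows))  # [[1, 2], [1, 3], [2, 1]]
```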

Question 7

Wouldn't set(sorted(myList)) be faster? I mean, isn't it faster to sort the list first and then remove its duplicates, rather than removing the duplicates first and sorting afterwards?

Question 8

@CorneliuZuzu Removing duplicates with set() changes order, so you have to do it this way
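
To make the point above concrete: `set(sorted(myList))` produces a set, which does not promise any iteration order, so the sorting has to be the last step. A small sketch with made-up data:

```python
my_list = ["delta", "alpha", "charlie", "alpha", "bravo", "delta"]

# Sorting first and then building a set yields a set, not a list;
# sets do not promise any particular iteration order.
as_set = set(sorted(my_list))
print(type(as_set).__name__)  # set

# Deduplicating first and sorting last yields a sorted list.
as_sorted_list = sorted(set(my_list))
print(as_sorted_list)  # ['alpha', 'bravo', 'charlie', 'delta']
```

Deduplicating first also means `sorted` has fewer elements to process when the input contains many repeats.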

Question 9

Downvoted because there is a distinction between ordered and sorted. Ordered means keep the original order, e.g. f([3,1,4,1,5,9,2,6,5,3,5]) = [3,1,4,5,9,2,6]
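
For the order-preserving reading described in this comment, one commonly used sketch relies on `dict.fromkeys`, assuming Python 3.7 or later, where regular dicts preserve insertion order. The function name `f` simply mirrors the comment's notation.

```python
def f(items):
    """Remove duplicates while keeping the first occurrence of each item.

    Assumes Python 3.7+ (dicts keep insertion order) and hashable items.
    """
    return list(dict.fromkeys(items))


print(f([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]))  # [3, 1, 4, 5, 9, 2, 6]
```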

Question 10

@user3667349 The "keep order" clause was not a part of the original question, it was added by the Colonel Panic edit in 2015.

Question 11

If your input is already sorted, then there may be a simpler way to do it:

from operator import itemgetter
from itertools import groupby
unique_list = list(map(itemgetter(0), groupby(yourList)))

Question 12

This can also be expressed as [e for e, _ in groupby(sortedList)]
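
A short sketch comparing the two spellings, with made-up sample lists. It also shows why the "already sorted" precondition matters: `groupby` only merges adjacent equal elements.

```python
from itertools import groupby
from operator import itemgetter

sorted_list = ["a", "a", "b", "c", "c", "c"]

# Both spellings take the key of each run of equal, adjacent elements.
via_map = list(map(itemgetter(0), groupby(sorted_list)))
via_comprehension = [e for e, _ in groupby(sorted_list)]
print(via_map)            # ['a', 'b', 'c']
print(via_comprehension)  # ['a', 'b', 'c']

# groupby only merges *adjacent* duplicates, so unsorted input
# can still contain repeats afterwards.
unsorted_list = ["a", "b", "a"]
adjacent_only = [e for e, _ in groupby(unsorted_list)]
print(adjacent_only)  # ['a', 'b', 'a']
```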

Question 13

This is O(n) rather than O(n log n) right?

Question 14

FWIW something similar was added to the list of recipes from the documentation for itertools.

Question 15

If you want to keep the order of the original list, just use an OrderedDict with None as values.

In Python2:

from collections import OrderedDict
from itertools import izip, repeat
unique_list = list(OrderedDict(izip(my_list, repeat(None))))

In Python3 it's even simpler:

from collections import OrderedDict
from itertools import repeat
unique_list = list(OrderedDict(zip(my_list, repeat(None))))

If you'd rather avoid zip and repeat, you can use a generator expression (works in both Python 2 and 3):

from collections import OrderedDict
unique_list = list(OrderedDict((element, None) for element in my_list))
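
A shorter spelling of the same idea is the `OrderedDict.fromkeys` class method, which maps every element to `None` in one call. A minimal sketch, with `my_list` as made-up sample data:

```python
from collections import OrderedDict

my_list = ["b", "a", "b", "c", "a"]

# fromkeys() maps every element to None, keeping first-seen order.
unique_list = list(OrderedDict.fromkeys(my_list))
print(unique_list)  # ['b', 'a', 'c']
```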

Question 16

If it's clarity you're after, rather than speed, I think this is very clear:

def sortAndUniq(input):
    output = []
    for x in input:
        if x not in output:
            output.append(x)
    output.sort()
    return output

It's O(n^2) though, with the repeated use of not in for each element of the input list.
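
If the quadratic cost ever matters, one possible variant keeps a `set` alongside the loop so that each membership test is a hash lookup on average. This is a sketch assuming hashable, mutually comparable elements; the name `sort_and_uniq_fast` is hypothetical.

```python
def sort_and_uniq_fast(values):
    """Like the loop above, but tracks already-seen items in a set.

    Assumes the items are hashable and mutually comparable.
    """
    seen = set()
    output = []
    for x in values:
        if x not in seen:  # average O(1) lookup instead of scanning a list
            seen.add(x)
            output.append(x)
    output.sort()  # the O(n log n) sort now dominates the running time
    return output


print(sort_and_uniq_fast(["b", "a", "c", "a", "b"]))  # ['a', 'b', 'c']
```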

Question 17

> but I don't know how to retrieve the list members from the hash in alphabetical order.

Not really your main question, but for future reference Rod's answer using sorted can be used for traversing a dict's keys in sorted order:

for key in sorted(my_dict.keys()):
    print(key, my_dict[key])
    ...

and also, because tuples are ordered by their first member (with ties broken by later members), you can do the same with items:

for key, val in sorted(my_dict.items()):
    print(key, val)
    ...
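
In Python 3 syntax, a self-contained sketch of both traversals might look like this; `my_dict` holds made-up sample data.

```python
my_dict = {"pear": 3, "apple": 1, "cherry": 2}

# Iterating a dict yields its keys, so sorted(my_dict) sorts the keys.
sorted_keys = sorted(my_dict)
for key in sorted_keys:
    print(key, my_dict[key])

# items() yields (key, value) tuples, which sort by key first.
sorted_items = sorted(my_dict.items())
print(sorted_items)  # [('apple', 1), ('cherry', 2), ('pear', 3)]
```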

Question 18

For the string data

output = []

def uniq(input):
    # Append a string to the shared list only if it is not already there.
    if input not in output:
        output.append(input)

print(output)

score 213 · Accepted Answer · 2009-01-26 14:16:10Z


Answered Jan 26, 2009 at 14:16 by Rod Daunoravicius; edited Sep 26, 2012 at 14:22 by Bengt.

Comments:

This does not work if your myList has unhashable objects. – J_Zar, Nov 14, 2012 at 11:30

Wouldn't set(sorted(myList)) be faster? I mean, isn't it faster to sort the list first and then remove its duplicates, rather than removing the duplicates first and sorting afterwards? – Zuzu Corneliu, Jan 26, 2017 at 19:27

@CorneliuZuzu Removing duplicates with set() changes order, so you have to do it this way. – Dimali, Jun 13, 2017 at 9:25

Downvoted because there is a distinction between ordered and sorted. Ordered means keep the original order, e.g. f([3,1,4,1,5,9,2,6,5,3,5]) = [3,1,4,5,9,2,6] – Ken Seehart, Dec 19, 2019 at 3:38

@user3667349 The "keep order" clause was not a part of the original question; it was added by the Colonel Panic edit in 2015. – Rod Daunoravicius, Dec 24, 2019 at 17:18
