Get unique values from a list in python [duplicate]

Question 1

I want to get the unique values from the following list:

['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

The output which I require is:

['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']

This code works:

trends = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']
output = []
for x in trends:
    if x not in output:
        output.append(x)
print(output)

Is there a better solution I should use?

Question 2

Does the order matter? I.e. do you want the order of first occurrence, or would ["PBS", "debate", "job", "thenandnow", "nowplaying"] work as well?

Question 3

All the top solutions work for the example in the question, but they don't answer the question. They all use set, which depends on the types found in the list: e.g. d = dict(); l = list(); l.append(d); set(l) will raise TypeError: unhashable type: 'dict'. frozenset won't save you either. Learn it the real Pythonic way: implement a nested n^2 loop for the simple task of removing duplicates from a list. You can then optimize it to n log n. Or implement real hashing for your objects. Or marshal your objects before creating a set from them.
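A minimal sketch of the nested-loop idea from this comment, assuming that equality (==) is the notion of "duplicate" you want. The helper name unique_by_equality is hypothetical. It keeps first occurrences and works for unhashable items such as dicts and lists, at O(n^2) cost, while set() raises TypeError on the same data.

```python
def unique_by_equality(items):
    """Remove duplicates, keeping first occurrences (hypothetical helper).

    `item not in result` compares item with == against every kept element,
    so unhashable elements (dicts, lists) are fine. Cost is O(n^2).
    """
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


data = [{"a": 1}, [1, 2], {"a": 1}, [1, 2], "x"]

try:
    set(data)
except TypeError:
    # dict and list elements are unhashable, so set() cannot store them
    print("set() raised TypeError")

print(unique_by_equality(data))  # [{'a': 1}, [1, 2], 'x']
```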

Question 4

If you need to preserve the order of the list: unique_items = list(dict.fromkeys(list_with_duplicates)) (guaranteed from Python 3.7; CPython 3.6 already preserves insertion order as an implementation detail)

Question 5

related: How to use multiprocessing to drop duplicates in a very big list?

Question 6

Short and simple: np.unique(listName)

Question 7

First declare your list properly, separated by commas. You can get the unique values by converting the list to a set.

mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']
myset = set(mylist)
print(myset)

If you use it further as a list, you should convert it back to a list by doing:

mynewlist = list(myset)

Another possibility, probably faster, would be to use a set from the beginning instead of a list. Then your code should be:

output = set()
for x in trends:
 output.add(x)
print(output)

As it has been pointed out, sets do not maintain the original order. If you need that, you should look for an ordered set implementation (see this question for more).

Question 8

If you need to maintain the set order there is also a library on PyPI: pypi.python.org/pypi/ordered-set

Question 9

Why do lists have .append while sets have .add?

Question 10

"append" means to add to the end, which is accurate and makes sense for lists, but sets have no notion of ordering and hence no beginning or end, so "add" makes more sense for them.
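A short illustration of the difference described above (the variable names are only for illustration):

```python
# list.append always adds to the end, even if the value is already present
items = []
items.append("PBS")
items.append("PBS")
print(items)  # ['PBS', 'PBS']

# a set has no "end"; adding a value that is already present changes nothing
seen = set()
seen.add("PBS")
seen.add("PBS")
print(seen)  # {'PBS'}
```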

Question 11

The 'sets' module is deprecated (and was removed in Python 3), yes. So you don't have to import sets to get this functionality: if you see import sets; output = sets.Set(), that's the deprecated way. This answer uses the built-in set class: docs.python.org/2/library/stdtypes.html#set

Question 12

This does not work if the values of the list are not hashable (e.g., sets or lists)

Question 13

To be consistent with the type I would use:

mylist = list(set(mylist))

Question 14

Please note, the result will be unordered.

Question 15

@Ninjakannon your code will sort the list alphabetically. That does not have to be the order of the original list.

Question 16

Note that a neat way to do this in Python 3.5+ is mylist = [*{*mylist}]. This is an *arg-style set expansion followed by an *arg-style list expansion.
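A quick check of the unpacking idiom, assuming Python 3.5+ (the version that added unpacking inside set and list displays). Like list(set(...)), it does not preserve the original order, so the sketch compares sorted copies.

```python
mylist = ['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow']

# {*mylist} unpacks the list into a set display, dropping duplicates;
# [*...] then unpacks that set into a new list
via_unpacking = [*{*mylist}]
via_constructors = list(set(mylist))

# Same elements, arbitrary order, so compare sorted copies
print(sorted(via_unpacking) == sorted(via_constructors))  # True
print(sorted(via_unpacking))  # ['PBS', 'debate', 'job', 'nowplaying', 'thenandnow']
```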

Question 17

@LukeDavis best answer for me, sorted([*{*c}]) is 25% faster than sorted(list(set(c))) (measured with timeit.repeat with number=100000)

Question 18

N.B.: This fails if the list has unhashable elements (e.g. elements which are themselves sets, lists or dicts).

Question 19

If we need to keep the elements' order, how about this:

used = set()
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = [x for x in mylist if x not in used and (used.add(x) or True)]

And one more solution using reduce and without the temporary used var.

from functools import reduce  # reduce is only a builtin in Python 2
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])

UPDATE - Dec, 2020 - Maybe the best approach!

Starting from python 3.7, the standard dict preserves insertion order.

Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6.

So this gives us the ability to use dict.fromkeys() for de-duplication!

NOTE: Credit goes to @rlat for giving us this approach in the comments!

mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = list(dict.fromkeys(mylist))

In terms of speed, it's fast enough for me, and readable enough to become my new favorite approach!

UPDATE - March, 2019

And a 3rd solution, which is a neat one, but kind of slow since .index is O(n).

mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = [x for i, x in enumerate(mylist) if i == mylist.index(x)]

UPDATE - Oct, 2016

Another solution with reduce, but this time without .append, which makes it more human-readable and easier to understand.

from functools import reduce  # reduce is only a builtin in Python 2
mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
unique = reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])
# which can also be written as:
unique = reduce(lambda l, x: l if x in l else l+[x], mylist, [])

NOTE: Keep in mind that the more human-readable the code gets, the less performant it tends to be. The only exception is the dict.fromkeys() approach, which relies on Python 3.7+ ordering.

import timeit
setup = "mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']"
# Thanks to Michael for pointing out that we can get faster with set()
timeit.timeit('[x for x in mylist if x not in used and (used.add(x) or True)]', setup='used = set();'+setup)
0.2029558869980974
timeit.timeit('[x for x in mylist if x not in used and (used.append(x) or True)]', setup='used = [];'+setup)
0.28999493700030143
# Thanks to rlat for suggesting this approach!
timeit.timeit('list(dict.fromkeys(mylist))', setup=setup)
0.31227896199925453
timeit.timeit('reduce(lambda l, x: l.append(x) or l if x not in l else l, mylist, [])', setup='from functools import reduce;'+setup)
0.7149233570016804
timeit.timeit('reduce(lambda l, x: l+[x] if x not in l else l, mylist, [])', setup='from functools import reduce;'+setup)
0.7379565160008497
timeit.timeit('reduce(lambda l, x: l if x in l else l+[x], mylist, [])', setup='from functools import reduce;'+setup)
0.7400134069976048
timeit.timeit('[x for i, x in enumerate(mylist) if i == mylist.index(x)]', setup=setup)
0.9154880290006986

ANSWERING COMMENTS

Because @monica asked a good question, "how is this working?", and for everyone else having trouble figuring it out, I will try to give a deeper explanation of how this works and what sorcery is happening here ;)

So she first asked:

I try to understand why unique = [used.append(x) for x in mylist if x not in used] is not working.

Well, it actually is working:

>>> used = []
>>> mylist = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
>>> unique = [used.append(x) for x in mylist if x not in used]
>>> print(used)
[u'nowplaying', u'PBS', u'job', u'debate', u'thenandnow']
>>> print(unique)
[None, None, None, None, None]

The problem is that we are just not getting the desired results inside the unique variable, but only inside the used variable. This is because during the list comprehension .append modifies the used variable and returns None.

So in order to get the results into the unique variable, and still use the same logic with .append(x) if x not in used, we need to move this .append call to the right side of the list comprehension (the condition) and just return x on the left side.

But if we are too naive and just go with:

>>> unique = [x for x in mylist if x not in used and used.append(x)]
>>> print(unique)
[]

We will get nothing in return.

Again, this is because the .append method returns None, which gives our logical expression the following form:

x not in used and None

This will always:

evaluate to False when x is in used,
evaluate to None when x is not in used.

In both cases (False/None), the result is treated as a falsy value, and we get an empty list as a result.

But why does this evaluate to None when x is not in used, someone may ask?

Well, it's because this is how Python's short-circuit operators work.

The expression x and y first evaluates x; if x is false, its value is returned; otherwise, y is evaluated and the resulting value is returned.

So when x is not in used (i.e. when that test is True), the next part of the expression (used.append(x)) is evaluated and its value (None) is returned.

And evaluating it is exactly what we want in order to get the unique elements from a list with duplicates: we want to .append them to a new list only when we come across them for the first time.

So we really do want to evaluate used.append(x) only when x is not in used. If there were a way to turn this None value into a truthy one, we would be fine, right?

Well, yes, and this is where the second short-circuit operator comes into play.

The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.

We know that .append(x) always returns None, which is falsy, so if we just add an or after it, we always get the value of the next operand. That's why we write:

x not in used and (used.append(x) or True)

so we can evaluate used.append(x) and get True as a result, only when the first part of the expression (x not in used) is True.
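The short-circuit behaviour described above can be checked directly. This small sketch (the sample list seq is only for illustration) prints the value each piece of the expression produces and then runs the full comprehension:

```python
# `and` returns its first falsy operand, or the last operand if all are truthy
print(False and None)  # False  (x already in used)
print(True and None)   # None   (x not in used, append() returned None)

# `or` returns its first truthy operand, or the last operand if none is truthy
print(None or True)    # True

# Putting it together: append runs only when x is new, and the whole
# condition is True exactly in that case
used = []
seq = ['a', 'b', 'a', 'c', 'b']
unique = [x for x in seq if x not in used and (used.append(x) or True)]
print(unique)  # ['a', 'b', 'c']
print(used)    # ['a', 'b', 'c']
```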

The same pattern appears in the second approach, which uses reduce:

(l.append(x) or l) if x not in l else l
#similar as the above, but maybe more readable
#we return l unchanged when x is in l
#we append x to l and return l when x is not in l
l if x in l else (l.append(x) or l)

where we:

Append x to l and return l when x is not in l. Thanks to the or expression, .append is evaluated first and l is returned after it.
Return l untouched when x is in l.

Question 20

I try to understand why unique = [used.append(x) for x in mylist if x not in used] is not working. Why do we have to put and (used.append(x) or True) at the end of the list comprehensions?

Question 21

@Monica basically, because used.append(x) adds x to used, but the return value of this method is None. So if we skip the or True part, we get x not in used and None, which is always falsy (False or None), and the unique list remains empty.

Question 22

Don't worry, there are no stupid questions, only stupid answers :) I updated my answer with an attempt to better explain how it works, hope I make it clear and you can understand it now.

Question 23

Even faster is using a set: timeit.timeit('[x for x in mylist if x not in used and not used.add(x)]', setup='used = set();'+setup)

Question 24

Another option worth mentioning, which works since Python 3.7, is using a dict, as it keeps the order of the keys while also eliminating duplicates: list(dict.fromkeys(mylist)). Timing-wise it comes in third.

Question 25

A Python list:

>>> a = ['a', 'b', 'c', 'd', 'b']

To get unique items, just transform it into a set (which you can transform back again into a list if required):

>>> b = set(a)
>>> print(b)
{'b', 'c', 'd', 'a'}

Question 26

Nice, so a = list(set(a)) gets the unique items.

Question 27

Brian, set(a) is sufficient to "get the unique items". You only need to construct another list if you specifically need a list for some reason.

Question 28

Note the result will be unordered.

Question 29

What type is your output variable?

Python sets are what you need. Declare output like this:

output = set() # initialize an empty set

and you're ready to add elements with output.add(elem), knowing they will be unique.

Warning: sets DO NOT preserve the original order of the list.

Question 30

Options to remove duplicates may include the following generic data structures:

set: unordered, unique elements
ordered set: ordered, unique elements

Here is a summary on quickly getting either one in Python.

Given

from collections import OrderedDict
seq = [u"nowplaying", u"PBS", u"PBS", u"nowplaying", u"job", u"debate", u"thenandnow"]

Code

Option 1 - A set (unordered):

list(set(seq))
# ['thenandnow', 'PBS', 'debate', 'job', 'nowplaying']

Python doesn't have ordered sets, but here are some ways to mimic one.

Option 2 - an OrderedDict (insertion ordered):

list(OrderedDict.fromkeys(seq))
# ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']

Option 3 - a dict (insertion ordered), guaranteed since Python 3.7 (a CPython implementation detail in 3.6). See more details in this post:

list(dict.fromkeys(seq))
# ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']

Note: listed elements must be hashable. See details on the latter example in this blog post. Furthermore, see R. Hettinger's post on the same technique; the order preserving dict is extended from one of his early implementations. See also more on total ordering.
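A small sketch comparing Options 2 and 3, assuming Python 3.7+ so that plain dict ordering is guaranteed. It also illustrates the hashability note: fromkeys raises TypeError when the elements are lists.

```python
from collections import OrderedDict

seq = ["nowplaying", "PBS", "PBS", "nowplaying", "job", "debate", "thenandnow"]

# Keys keep first-insertion order; a repeated key just keeps its original slot
ordered = list(OrderedDict.fromkeys(seq))
plain = list(dict.fromkeys(seq))
print(ordered)           # ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']
print(ordered == plain)  # True

# Keys must be hashable, so list elements are rejected
try:
    dict.fromkeys([[1, 2], [1, 2]])
except TypeError:
    print("dict.fromkeys needs hashable elements")
```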

Question 31

@Henry Henrinson I appreciate you voicing your reason for down-voting this answer. However, your claim that "The Python 3.6 solution is not order preserving" is not backed by references. To be clear, in Python 3.6, dictionaries preserve insertion order in the CPython implementation; it is a language feature in Python 3.7+. Moreover, see an ongoing blog post on that approach, which was claimed at the time to be the fastest ordered option in Python 3.6.

Question 32

Maintaining order:

# one-liners
# slow: 14.417 seconds
[x for i, x in enumerate(array) if x not in array[0:i]]
# fast: 0.0378 seconds
[x for i, x in enumerate(array) if array.index(x) == i]
# multiple lines
# fastest: 0.012 seconds
uniq = []
[uniq.append(x) for x in array if x not in uniq]
uniq

Order doesn't matter:

# fastest of all: 0.0035 seconds
list(set(array))

Question 33

This has terrible performance (O(n^2)) for large lists and is neither simpler nor easier to read than list(set(array)). The only advantage is the preservation of order, which was not asked for.

Question 34

This is great for simple scripts where you want to keep order and don't care about speed.

Question 35

@JeffCharter- added one that maintains order and is mucho faster :)

Question 36

@MMT - list comprehension

Question 37

I really appreciate you taking the time to break out the timestamps too

Question 38

Getting unique elements from List

mylist = [1,2,3,4,5,6,6,7,7,8,8,9,9,10]

Using sets: a set is an unordered collection of unique items

mylist=list(set(mylist))
In [0]: mylist
Out[0]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Using Simple Logic

newList = []
for i in mylist:
    if i not in newList:
        newList.append(i)
In [0]: newList
Out[0]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Using the pop method: pop removes the item at the given index (the last item by default) and returns it.

k = 0
while k < len(mylist):
    if mylist[k] in mylist[k+1:]:
        mylist.pop(k)  # remove by index; a later copy of this value remains
    else:
        k = k + 1
In [0]: mylist
Out[0]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Using Numpy

import numpy as np
np.unique(mylist)
In [0]: mylist
Out[0]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


Question 39

this answer deserves more updoots: for unhashable types where you want to check value uniqueness rather than identity uniqueness the simple logic is correct - meaning it's more correct in general.

Question 40

Numpy is good here

Question 41

If you are using numpy in your code (which might be a good choice for larger amounts of data), check out numpy.unique:

>>> import numpy as np
>>> wordsList = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
>>> np.unique(wordsList)
array([u'PBS', u'debate', u'job', u'nowplaying', u'thenandnow'], 
 dtype='<U10')

(http://docs.scipy.org/doc/numpy/reference/generated/numpy.unique.html)

As you can see, numpy supports not only numeric data; string arrays are also possible. Of course, the result is a numpy array, but that doesn't matter much, because it still behaves like a sequence:

>>> for word in np.unique(wordsList):
...     print(word)
...
PBS
debate
job
nowplaying
thenandnow

If you really want to have a vanilla python list back, you can always call list().

However, the result is automatically sorted, as you can see from the above code fragments. Check out numpy unique without sort if retaining list order is required.
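One way to get numpy's deduplication without the sorting, offered as a sketch rather than as the method from the linked question: np.unique(..., return_index=True) also returns the index of each value's first occurrence, and sorting those indices restores the original order. This assumes numpy is installed.

```python
import numpy as np

words = np.array(['nowplaying', 'PBS', 'PBS', 'nowplaying', 'job', 'debate', 'thenandnow'])

# values come back sorted; first_idx[i] is where values[i] first occurs in words
values, first_idx = np.unique(words, return_index=True)

# Sorting the first-occurrence indices restores the original order
in_order = words[np.sort(first_idx)]
print(in_order.tolist())  # ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']
```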

Question 42

set is an unordered collection of unique elements. A list of elements can be passed to set's constructor. So, if we pass a list with duplicate elements, we get a set of unique elements, and converting it back to a list gives a list of unique elements. I can say nothing about performance and memory overhead, but I hope it's not so important with small lists.

list(set(my_not_unique_list))

Simple and short.

Question 43

Could you add some explanation on your code for OP?

Question 44

I tried your answer; this is a good answer, but with an explanation it will turn into a great answer :)

Question 46

Same-order unique list using only a list comprehension.

>>> my_list = [1, 2, 1, 3, 2, 4, 3, 5, 4, 3, 2, 3, 1]
>>> unique_list = [
...     e
...     for i, e in enumerate(my_list)
...     if my_list.index(e) == i
... ]
>>> unique_list
[1, 2, 3, 4, 5]

enumerate gives the index i and element e as a tuple.

my_list.index returns the first index of e. If the first index isn't i then the current iteration's e is not the first e in the list.

Edit

I should note that this isn't a good way to do it, performance-wise. It is just a way to achieve it using only a list comprehension.

Question 47

First thing, the example you gave is not a valid list.

example_list = [u'nowplaying',u'PBS', u'PBS', u'nowplaying', u'job', u'debate',u'thenandnow']

Suppose the above is the example list. Then you can use the following recipe from the itertools documentation, which returns the unique values while preserving the order, as you seem to require. The iterable here is example_list.

from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

Question 48

What's the reason for seen_add = seen.add ?

Question 49

It saves one attribute lookup for each element.

Question 50

What is the purpose of ifilterfalse(seen.__contains__, iterable)? Is there a benefit versus for element not in seen:... ?
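To illustrate the question above: filterfalse(seen.__contains__, iterable) yields exactly the elements for which seen.__contains__(element) is false, i.e. element not in seen, so it behaves like the explicit loop. The sketch below uses Python 3 names, and the two function names are hypothetical; it checks that both versions agree. Any speed benefit would come from running the membership test inside the C-implemented iterator instead of Python bytecode, which is an assumption and is not measured here.

```python
from itertools import filterfalse


def unique_filterfalse(iterable):
    seen = set()
    # filterfalse lazily skips elements for which seen.__contains__ is True;
    # seen is updated before the next element is tested
    for element in filterfalse(seen.__contains__, iterable):
        seen.add(element)
        yield element


def unique_loop(iterable):
    seen = set()
    for element in iterable:
        if element not in seen:  # the same test, written out explicitly
            seen.add(element)
            yield element


data = 'AAAABBBCCDAABBB'
print(list(unique_filterfalse(data)))  # ['A', 'B', 'C', 'D']
print(list(unique_loop(data)))         # ['A', 'B', 'C', 'D']
```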

Question 51

As a bonus, Counter is a simple way to get both the unique values and the count for each value:

from collections import Counter
l = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
c = Counter(l)
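A short usage sketch for the Counter above. It assumes Python 3.7+, where Counter (a dict subclass) keeps keys in first-seen order.

```python
from collections import Counter

l = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
c = Counter(l)

# The keys are the unique values, in first-seen order on Python 3.7+
print(list(c))   # ['nowplaying', 'PBS', 'job', 'debate', 'thenandnow']

# The values are how often each one occurred
print(c['PBS'])  # 2
print(c['job'])  # 1
```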

Question 52

By using a basic property of Python dictionaries:

inp = [u'nowplaying', u'PBS', u'PBS', u'nowplaying', u'job', u'debate', u'thenandnow']
d = {i for i in inp}
print(d)

Output will be:

set([u'nowplaying', u'job', u'debate', u'PBS', u'thenandnow'])

Question 53

And what about dynamic values?

Question 54

@e-info128 Quite similarly, put those in a set.

Question 55

This is a set, not a dict.

Question 56

def get_distinct(original_list):
    distinct_list = []
    for each in original_list:
        if each not in distinct_list:
            distinct_list.append(each)
    return distinct_list

Question 57

Please add some explanation; this is only code. If you look at the other answers, they always combine code and explanation.

Question 58

@Alexander not always useless, but typically is.

Question 59

set can help you filter out the elements from the list that are duplicates. It will work well for str, int or tuple elements, but if your list contains dict or other list elements, then you will end up with TypeError exceptions.

Here is a general order-preserving solution to handle some (not all) non-hashable types:

def unique_elements(iterable):
    seen = set()
    result = []
    for element in iterable:
        hashed = element
        if isinstance(element, dict):
            # dicts are unhashable: use a sorted tuple of their items instead
            hashed = tuple(sorted(element.items()))
        elif isinstance(element, list):
            # lists are unhashable: use a tuple with the same elements
            hashed = tuple(element)
        if hashed not in seen:
            result.append(element)
            seen.add(hashed)
    return result

Question 60

def setlist(lst=[]):
 return list(set(lst))

Question 61

Try not to use [] as a default parameter. It is the same instance that is used every time so modifications affect the next time the function is called. Not so much of an issue here but it's still unnecessary.

Question 62

@Trengot Exactly. It should be lst=None, with an added line: if lst is None: lst = []

Question 63

@xis: or simply lst or []
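A minimal sketch of the pitfall discussed in these comments and of the None-default fix. The functions append_bad and append_good are hypothetical, chosen only because they mutate their argument and so make the shared default visible (setlist itself never mutates lst).

```python
def append_bad(item, lst=[]):
    # The default list is created once, when the def statement runs,
    # so every call that omits lst shares (and mutates) the same object
    lst.append(item)
    return lst


def append_good(item, lst=None):
    # A fresh list is created on each call that omits lst
    if lst is None:
        lst = []
    lst.append(item)
    return lst


def setlist(lst=None):
    # The answer's function with the `lst or []` idiom from the comments
    return list(set(lst or []))


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]  (the same list both times)
print(append_good(1), append_good(2))  # [1] [2]
print(setlist())                       # []
```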

Question 64

Please note, the result will be unordered.

Accepted answer (Question 7 above): lefterav, answered Oct 15, 2012, edited Oct 9, 2019 by user3064538; score 1533.

The first comments on the accepted answer (Questions 8 to 12 above) were posted by Jace Browning (Sep 26, 2013), Antonello (Jan 28, 2014), maackle (Mar 11, 2014), FlipMcF (Dec 9, 2015) and steffen (May 2, 2018).
