
I was under the impression that set() would order a collection, much like .sort() does.

However, it seems that it doesn't. What puzzles me is why it reorders the collection the way it does.

>>> h = '321'
>>> set(h)
set(['1', '3', '2'])
>>> h
'321'
>>> h = '22311'
>>> set(h)
set(['1', '3', '2'])

Why doesn't it return set(['1', '2', '3'])? It also seems that no matter how many instances of each digit I use, or in what order I use them, it always returns set(['1', '3', '2']). Why?

Edit:

I have read your answers, and my counterexample is this:

>>> l = [1,2,3,3]
>>> set(l)
set([1, 2, 3])
>>> l = [3,3,2,3,1,1,3,2,3]
>>> set(l)
set([1, 2, 3])

Why does it order numbers and not strings?

Also

import random
l = []
for itr in xrange(101):
    l.append(random.randint(1,101))
print set(l)

Output:
set([1, 2, 4, 5, 6, 8, 10, 11, 12, 14, 15, 16, 18, 19, 23, 24, 25, 26, 29, 30, 31, 32, 34, 40, 43, 45, 46, 47, 48, 49, 50, 51, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64, 66, 67, 69, 70, 74, 75, 77, 79, 80, 83, 84, 85, 87, 88, 89, 90, 93, 94, 96, 97, 99, 101])
asked Aug 20, 2011 at 5:00

3 Answers


A Python set is unordered, so there is no guarantee that its elements will come out in the order you put them in.

If you want sorted output, call sorted:

sorted(set(h))
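A quick check of sorted on the inputs from the question, written for Python 3 (where print is a function). Note that sorted returns a list, not a set, and that strings compare character by character, so digit strings need a key if you want them ordered by numeric value; the words list is just an illustrative input:

```python
h = '22311'
print(sorted(set(h)))  # ['1', '2', '3']

l = [3, 3, 2, 3, 1, 1, 3, 2, 3]
print(sorted(set(l)))  # [1, 2, 3]

# Strings sort lexicographically, so '10' comes before '2' and '9' ...
words = ['10', '9', '2', '9']
print(sorted(set(words)))           # ['10', '2', '9']

# ... unless you sort by their integer value instead.
print(sorted(set(words), key=int))  # ['2', '9', '10']
```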

Responding to your edit: it comes down to the implementation of set. In CPython, it boils down to two things:

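1) each element is stored in a hash-table slot chosen by its hash (the __hash__ function) modulo the table size, and iteration visits the slots in order, so the output looks "sorted" by that remainder (when two elements want the same slot, one is probed into another slot, which bends this rule)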

2) the table size is a power of 2 that grows as the set fills up
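The two rules above can be sketched as a toy hash table. This is a simplified model under stated assumptions, not CPython's actual algorithm: toy_set_order is a hypothetical helper, the table never resizes, and collisions use plain linear probing (CPython's real probe sequence is more involved):

```python
def toy_set_order(items, table_size=8):
    """Order in which a simplified hash table would iterate `items`.

    Simplified model, not CPython's exact algorithm:
      * the table has a fixed power-of-2 size and never resizes,
      * each item starts at slot hash(item) % table_size,
      * a collision moves on to the next slot (plain linear probing),
      * None marks an empty slot, so items must not be None.
    """
    if table_size <= 0 or table_size & (table_size - 1):
        raise ValueError("table_size must be a positive power of 2")
    slots = [None] * table_size
    for item in items:
        i = hash(item) % table_size
        # Probe until we find an empty slot or the same item (a duplicate).
        for _ in range(table_size):
            if slots[i] is None or slots[i] == item:
                break
            i = (i + 1) % table_size
        else:
            raise ValueError("toy table is full; a real set would resize")
        slots[i] = item
    # Iteration walks the slots from 0 upward and skips the empty ones.
    return [x for x in slots if x is not None]


print(toy_set_order([3, 3, 2, 3, 1, 1, 3, 2, 3]))  # [1, 2, 3]
print(toy_set_order([1, 8]))  # [8, 1]: 8 % 8 == 0, so 8 takes slot 0
print(toy_set_order([1, 9]))  # [1, 9]: 9 collides with 1 in slot 1, moves to slot 2
print(toy_set_order([9, 1]))  # [9, 1]: now 1 is the one that moves
```

The last two calls show that once collisions happen, insertion order matters too, so "sorted by hash modulo the table size" is only exact when every element gets its own slot.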

So let's look at the int case:

x=1
type(x) # int
x.__hash__() # 1

For ints, the hash is the value itself (apart from edge cases such as -1, whose hash is -2):

[x==x.__hash__() for x in xrange(1000)].count(False) # = 0

Hence, when all the values are non-negative ints smaller than the table size, each one lands in the slot equal to its own value, so walking the table happens to produce them in ascending order.
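On a CPython 3 interpreter you can watch this happen. This relies on CPython implementation details (ints hash to themselves, and a newly created small set has 8 slots), so other interpreters such as PyPy may print a different order:

```python
# CPython detail: hash(n) == n for these ints, so in a freshly created
# small set (8 slots) each int's slot is n % 8.
print(hash(8) == 8)   # True

small = set([3, 1, 2])
print(list(small))    # [1, 2, 3]: slots 1, 2 and 3

# Ints are not truly sorted: 8 wraps around to slot 0, ahead of 1 in slot 1.
wrapped = set([1, 8])
print(list(wrapped))  # [8, 1]
```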

For strings, the hashes don't work the same way:

x='1'
type(x) 
# str
x.__hash__()
# 6272018864

To see why the order breaks for ['1','2','3'], look at their hash values:

[str(x).__hash__() for x in xrange(1,4)]
# [6272018864, 6400019251, 6528019634]
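Those particular numbers come from a 64-bit Python 2 build. Since Python 3.3, str hashes are randomized per process (controlled by the PYTHONHASHSEED environment variable), so you will see different values, and possibly a different set order, from run to run. A small sketch that asks fresh interpreters for a string's hash under fixed seeds; str_hash_with_seed is a hypothetical helper, not a library function:

```python
import os
import subprocess
import sys


def str_hash_with_seed(s, seed):
    """Return hash(s) as computed by a fresh interpreter with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", s],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# The same seed reproduces the same hash in every new process ...
print(str_hash_with_seed("1", 1) == str_hash_with_seed("1", 1))  # True
# ... while a different seed changes the hash key, and with it the value.
print(str_hash_with_seed("1", 1), str_hash_with_seed("1", 2))
```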

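In our example there are 3 elements, and the next power of 2 above 3 is 4, so a mod of 4 is a natural guess. CPython actually gives a newly created small set 8 slots, but for these three hashes the remainders mod 4 and mod 8 are identical ([0, 3, 2] either way), so: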

[str(x).__hash__()%4 for x in xrange(1,4)]
# [0, 3, 2]
[(str(x).__hash__()%4,str(x)) for x in xrange(1,4)]
# [(0, '1'), (3, '2'), (2, '3')]

Now if you sort this beast, you get the ordering that you see in set:

[y[1] for y in sorted([(str(x).__hash__()%4,str(x)) for x in xrange(1,4)])]
# ['1', '3', '2']
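The same hand calculation can be ported to Python 3, using whatever (randomized) string hashes the current process has. This is a sketch assuming CPython's 8-slot starting table; predicted_order and TABLE_SIZE are hypothetical names, and the prediction is only attempted when no two elements share a slot, since a collision sends one of them elsewhere:

```python
TABLE_SIZE = 8  # assumed: CPython's smallest set table, enough for 3 elements


def predicted_order(items, table_size=TABLE_SIZE):
    """Predict set iteration order by sorting on hash(item) % table_size.

    Only valid when every distinct item gets its own slot; on a collision
    the probe sequence decides the order, so None is returned instead.
    """
    distinct = list(dict.fromkeys(items))  # drop duplicates, keep first-seen order
    slots = [hash(x) % table_size for x in distinct]
    if len(set(slots)) != len(slots):
        return None
    return [x for _, x in sorted(zip(slots, distinct))]


h = '22311'
# String hashes are randomized per process, so these slots change between runs.
print([(hash(c) % TABLE_SIZE, c) for c in sorted(set(h))])

order = predicted_order(h)
if order is None:
    print("two characters share a slot this run; the simple rule does not apply")
else:
    print(order == list(set(h)))  # True on CPython
```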
answered Aug 20, 2011 at 5:01

From the Python documentation of the set type:

A set object is an unordered collection of distinct hashable objects.

This means that the set doesn't have a concept of the order of the elements in it. You should not be surprised when the elements are printed on your screen in an unusual order.
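Both parts of the quoted definition can be checked directly in Python 3: "distinct" means duplicates collapse into one element, and "hashable" means mutable containers such as lists are rejected, while immutable ones such as tuples and frozensets are accepted:

```python
s = set([3, 3, 2, 3, 1])
print(len(s))  # 3: duplicates collapse into a single element

try:
    set([[1, 2], [3]])  # lists are mutable, hence unhashable
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True

# Immutable containers are hashable and can be set elements.
pairs = {(1, 2), (3,), frozenset([4, 5])}
print(len(pairs))  # 3
```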

answered Aug 20, 2011 at 5:05


A set in Python tries to be a "set" in the mathematical sense of the term. No duplicates, and order shouldn't matter.
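In the mathematical spirit, two sets are equal when they contain the same elements, however those elements were ordered or repeated on the way in. A short Python 3 illustration, contrasted with a list, whose comparison is positional:

```python
a = {1, 2, 3}
b = set([3, 2, 1, 1, 2])
print(a == b)                         # True: order and repeats are irrelevant
print({'1', '3', '2'} == set('321'))  # True

# If order matters, use a sequence instead; lists compare position by position.
print([1, 2, 3] == [3, 2, 1])         # False
```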

answered Aug 20, 2011 at 5:06
