37

Which approach is better? Using a tuple, like:

if number in (1, 2):

or a list, like:

if number in [1, 2]:

Which one is recommended for such uses and why (both logical and performance wise)?

Martijn Pieters
1.1m326 gold badges4.2k silver badges3.4k bronze badges
asked Aug 18, 2014 at 16:58
7
  • 11
    Third option: set (which has a faster membership test). Commented Aug 18, 2014 at 17:00
  • 3
    CPython will do some internal optimisation and store your list literal as a tuple anyway... Commented Aug 18, 2014 at 17:00
  • 4
    Fourth option: frozenset, which has same membership test cost as set, O(1), but because it's immutable, the python interpreter knows the exact size of the hash table it needs to allocate, rather than leaving room for additional elements. Commented Aug 18, 2014 at 17:05
  • 3
    @IceArdor: but only in Python 3; using a set literal or frozenset([...]) expression in Python 2 means the object has to be created first, an operation more costly than the membership test against a tuple of equal length. Commented Aug 18, 2014 at 17:09
  • 3
    @sapam: in which case a simple equality test will beat both. You need to take the average cost into account here, not the best-case scenario. For 2 elements or more, the set wins. Provided it is a constant stored with the bytecode. Commented Aug 18, 2014 at 17:11

1 Answer 1

56

The CPython interpreter replaces the second form with the first.

That's because loading the tuple from a constant is one operation, but the list would be 3 operations; load the two integer contents and build a new list object.

Because you are using a list literal that isn't otherwise reachable, it is substituted for a tuple:

>>> import dis
>>> dis.dis(compile('number in [1, 2]', '<stdin>', 'eval'))
 1 0 LOAD_NAME 0 (number)
 3 LOAD_CONST 2 ((1, 2))
 6 COMPARE_OP 6 (in)
 9 RETURN_VALUE 

Here the second bytecode loads a (1, 2) tuple as a constant, in one step. Compare this to creating a list object not used in a membership test:

>>> dis.dis(compile('[1, 2]', '<stdin>', 'eval'))
 1 0 LOAD_CONST 0 (1)
 3 LOAD_CONST 1 (2)
 6 BUILD_LIST 2
 9 RETURN_VALUE 

Here N+1 steps are required for a list object of length N.

This substitution is a CPython-specific peephole optimisation; see the Python/peephole.c source. For other Python implementations then, you want to stick with immutable objects instead.

That said, the best option when using Python 3.2 and up, is to use a set literal:

if number in {1, 2}:

as the peephole optimiser will replace that with a frozenset() object and membership tests against sets are O(1) constant operations:

>>> dis.dis(compile('number in {1, 2}', '<stdin>', 'eval'))
 1 0 LOAD_NAME 0 (number)
 3 LOAD_CONST 2 (frozenset({1, 2}))
 6 COMPARE_OP 6 (in)
 9 RETURN_VALUE

This optimization was added in Python 3.2 but wasn't backported to Python 2.

As such, the Python 2 optimiser doesn't recognize this option and the cost of building either a set or frozenset from the contents is almost guaranteed to be more costly than using a tuple for the test.

Set membership tests are O(1) and fast; testing against a tuple is O(n) worst case. Although testing against a set has to calculate the hash (higher constant cost, cached for immutable types), the cost for testing against a tuple other than the first element is always going to be higher. So on average, sets are easily faster:

>>> import timeit
>>> timeit.timeit('1 in (1, 3, 5)', number=10**7) # best-case for tuples
0.21154764899984002
>>> timeit.timeit('8 in (1, 3, 5)', number=10**7) # worst-case for tuples
0.5670104179880582
>>> timeit.timeit('1 in {1, 3, 5}', number=10**7) # average-case for sets
0.2663505630043801
>>> timeit.timeit('8 in {1, 3, 5}', number=10**7) # worst-case for sets
0.25939063701662235
endolith
27k35 gold badges138 silver badges205 bronze badges
answered Aug 18, 2014 at 17:00
Sign up to request clarification or add additional context in comments.

16 Comments

@IceArdor: No, because the cost of building that set is greater than the cost of testing against a tuple. The tuple test is O(N) worst case, while creating the set is guaranteed to cost O(N), plus the membership test. This isn't about literal syntax, this is about the optimizer recognising you can replace the set with a frozenset constant.
Many Python programs that really care about efficiency will pre-build the data structure, regexp, etc. that is being matched. So if ACCEPTABLE = {1,2} is pre-defined (as a global or phantom kwarg), the only overhead is the testing, not the construction of what's being tested against.
@jfs: but this isn't a sequence. It is a set of possible values you are testing against. Containment doesn't care about position in a sequence, only the binary value of 'present' or 'not present'.
@endolith: Set literals are still better. The timeit test in my answer shows that for the worst case they are faster than using a tuple, for the best case they are nearly equivalent. That's because sets very much do not need to compare every element.
@jamesdlin because that wouldn’t work if the contents are not hashable.
|

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.