7

Given a list of integers, what is the most Pythonic / best way of counting how many elements are within a certain range?

I researched and found 2 ways of doing it:

>>> x = [10, 60, 20, 66, 79, 5]
>>> len([i for i in x if 60 < i < 70])
1

or:

>>> x = [10, 60, 20, 66, 79, 5]
>>> sum(1 for i in x if 60 < i < 70)
1

Which method uses less time/memory (for larger lists) and why? Or maybe another way is better...

asked Feb 24, 2016 at 15:16
4
  • Do you actually need the list? If not, the second version avoids ever creating it. Commented Feb 24, 2016 at 15:19
  • Most pythonic does not imply less time/memory, yet you ask about both in your question. Do you want to know which is the most pythonic or the most efficient? Commented Feb 24, 2016 at 15:25
  • You can also use if i in range(61, 70). Commented Feb 24, 2016 at 15:30
  • @pp_ yes that would also work, but I think it uses more operations because you are checking to see if every element of x is in the list/iterator [61, 62, 63, 64, 65, 66, 67, 68, 69, 70] Commented Feb 24, 2016 at 18:48

3 Answers 3

7

In the specific instances you presented

[i for i in x if 60 < i < 70]

actually generates a brand-new list, then takes its len. Conversely,

(1 for i in x if 60 < i < 70)

is a generator expression over which you take a sum.

For large enough relevant items, the second version will be more efficient (esp. in terms of memory).


Timings

x = [65] * 9999999
%%time
len([i for i in x if 60 < i < 70])
CPU times: user 724 ms, sys: 44 ms, total: 768 ms
Wall time: 768 ms
Out[7]:
9999999
%%time
sum(1 for i in x if 60 < i < 70)
CPU times: user 592 ms, sys: 0 ns, total: 592 ms
Wall time: 593 ms
answered Feb 24, 2016 at 15:19
Sign up to request clarification or add additional context in comments.

2 Comments

I question the speed advantage you are giving the generator. My timings can't reproduce the generator being faster. In the list version you don't have to do additions, just get the length.
@timgeb In the list version, there may be garbage collection playing a role.
5

The generator expression is more memory efficient, because you don't have to create an extra list.

Creating a list and getting it's length (the latter being a very fast O(1) operation) seems to be faster than creating a generator and doing n additions for relatively small lists.

In [13]: x = [1]
In [14]: timeit len([i for i in x if 60 < i < 70])
10000000 loops, best of 3: 141 ns per loop
In [15]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 355 ns per loop
In [16]: x = range(10)
In [17]: timeit len([i for i in x if 60 < i < 70])
1000000 loops, best of 3: 564 ns per loop
In [18]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 781 ns per loop
In [19]: x = range(50)
In [20]: timeit len([i for i in x if 60 < i < 70])
100000 loops, best of 3: 2.4 μs per loop
In [21]: timeit sum(1 for i in x if 60 < i < 70)
100000 loops, best of 3: 2.62 μs per loop
In [22]: x = range(1000)
In [23]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 50.9 μs per loop
In [24]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 51.7 μs per loop

I tried with various lists, for example [65]*n and the trend does not change. For example:

In [1]: x = [65]*1000
In [2]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 67.3 μs per loop
In [3]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 82.3 μs per loop
answered Feb 24, 2016 at 15:29

Comments

3

You can easily test this using the timeit module. For your particular example, the first len-based solution appears to be faster:

$ python --version
Python 2.7.10
$ python -m timeit -s "x = [10,60,20,66,79,5]" "len([i for i in x if 60 < i < 70])"
1000000 loops, best of 3: 0.514 usec per loop
$ python -m timeit -s "x = [10,60,20,66,79,5]" "sum(i for i in x if 60 < i < 70)"
1000000 loops, best of 3: 0.693 usec per loop

Even for larger lists -- but with most elements not matching your predicate -- the len version appears to be no slower:

$ python -m timeit -s "x = [66] + [8] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 504 usec per loop
$ python -m timeit -s "x = [66] + [8] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 501 usec per loop

In fact, even if most elements of the given list match (so a big result list is constructed to pass to len), the len version wins:

$ python -m timeit -s "x = [66] + [65] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 762 usec per loop
$ python -m timeit -s "x = [66] + [65] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 935 usec per loop

However, what seems to be a lot faster is to not have a list in the first place, if possible, but rather hold e.g. a collections.Counter. E.g. for 100000 elements, I get:

$ python -m timeit -s "import collections; x = [66] + [65] * 100000" "len([i for i in x if 60 < i < 70])"
100 loops, best of 3: 8.11 msec per loop
$ python -m timeit -s "import collections; x = [66] + [65] * 100000; d = collections.Counter(x)" "sum(v for k,v in d.items() if 60 < k < 70)"
1000000 loops, best of 3: 0.761 usec per loop
answered Feb 24, 2016 at 15:30

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.