Python: count number of elements in list for if condition

Question 1

Given a list of integers, what is the most Pythonic / best way of counting how many elements are within a certain range?

I researched and found 2 ways of doing it:

>>> x = [10, 60, 20, 66, 79, 5]
>>> len([i for i in x if 60 < i < 70])
1

or:

>>> x = [10, 60, 20, 66, 79, 5]
>>> sum(1 for i in x if 60 < i < 70)
1

Which method uses less time/memory (for larger lists) and why? Or maybe another way is better...

Question 2

Do you actually need the list? If not, the second version avoids ever creating it.

Question 3

Most pythonic does not imply less time/memory, yet you ask about both in your question. Do you want to know which is the most pythonic or the most efficient?

Question 4

You can also use if i in range(61, 70).

Question 5

@pp_ yes that would also work, but I think it uses more operations because you are checking to see if every element of x is in the list/iterator [61, 62, 63, 64, 65, 66, 67, 68, 69, 70]

Question 6

In the specific instances you presented

[i for i in x if 60 < i < 70]

actually generates a brand-new list, then takes its len. Conversely,

(1 for i in x if 60 < i < 70)

is a generator expression over which you take a sum.

For large enough relevant items, the second version will be more efficient (esp. in terms of memory).

Timings

x = [65] * 9999999
%%time
len([i for i in x if 60 < i < 70])
CPU times: user 724 ms, sys: 44 ms, total: 768 ms
Wall time: 768 ms
Out[7]:
9999999
%%time
sum(1 for i in x if 60 < i < 70)
CPU times: user 592 ms, sys: 0 ns, total: 592 ms
Wall time: 593 ms

Question 7

I question the speed advantage you are giving the generator. My timings can't reproduce the generator being faster. In the list version you don't have to do additions, just get the length.

Question 8

@timgeb In the list version, there may be garbage collection playing a role.

Question 9

The generator expression is more memory efficient, because you don't have to create an extra list.

Creating a list and getting it's length (the latter being a very fast O(1) operation) seems to be faster than creating a generator and doing n additions for relatively small lists.

In [13]: x = [1]
In [14]: timeit len([i for i in x if 60 < i < 70])
10000000 loops, best of 3: 141 ns per loop
In [15]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 355 ns per loop
In [16]: x = range(10)
In [17]: timeit len([i for i in x if 60 < i < 70])
1000000 loops, best of 3: 564 ns per loop
In [18]: timeit sum(1 for i in x if 60 < i < 70)
1000000 loops, best of 3: 781 ns per loop
In [19]: x = range(50)
In [20]: timeit len([i for i in x if 60 < i < 70])
100000 loops, best of 3: 2.4 μs per loop
In [21]: timeit sum(1 for i in x if 60 < i < 70)
100000 loops, best of 3: 2.62 μs per loop
In [22]: x = range(1000)
In [23]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 50.9 μs per loop
In [24]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 51.7 μs per loop

I tried with various lists, for example [65]*n and the trend does not change. For example:

In [1]: x = [65]*1000
In [2]: timeit len([i for i in x if 60 < i < 70])
10000 loops, best of 3: 67.3 μs per loop
In [3]: timeit sum(1 for i in x if 60 < i < 70)
10000 loops, best of 3: 82.3 μs per loop

Question 10

You can easily test this using the timeit module. For your particular example, the first len-based solution appears to be faster:

$ python --version
Python 2.7.10
$ python -m timeit -s "x = [10,60,20,66,79,5]" "len([i for i in x if 60 < i < 70])"
1000000 loops, best of 3: 0.514 usec per loop
$ python -m timeit -s "x = [10,60,20,66,79,5]" "sum(i for i in x if 60 < i < 70)"
1000000 loops, best of 3: 0.693 usec per loop

Even for larger lists -- but with most elements not matching your predicate -- the len version appears to be no slower:

$ python -m timeit -s "x = [66] + [8] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 504 usec per loop
$ python -m timeit -s "x = [66] + [8] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 501 usec per loop

In fact, even if most elements of the given list match (so a big result list is constructed to pass to len), the len version wins:

$ python -m timeit -s "x = [66] + [65] * 10000" "len([i for i in x if 60 < i < 70])"
1000 loops, best of 3: 762 usec per loop
$ python -m timeit -s "x = [66] + [65] * 10000" "sum(1 for i in x if 60 < i < 70)"
1000 loops, best of 3: 935 usec per loop

However, what seems to be a lot faster is to not have a list in the first place, if possible, but rather hold e.g. a collections.Counter. E.g. for 100000 elements, I get:

$ python -m timeit -s "import collections; x = [66] + [65] * 100000" "len([i for i in x if 60 < i < 70])"
100 loops, best of 3: 8.11 msec per loop
$ python -m timeit -s "import collections; x = [66] + [65] * 100000; d = collections.Counter(x)" "sum(v for k,v in d.items() if 60 < k < 70)"
1000000 loops, best of 3: 0.761 usec per loop

Ami Tavory 76.7k13 gold badges152 silver badges196 bronze badges · Accepted Answer · 2016-02-24 15:19:19Z

In the specific instances you presented

[i for i in x if 60 < i < 70]

actually generates a brand-new list, then takes its len. Conversely,

(1 for i in x if 60 < i < 70)

is a generator expression over which you take a sum.

For large enough relevant items, the second version will be more efficient (esp. in terms of memory).

Timings

x = [65] * 9999999
%%time
len([i for i in x if 60 < i < 70])
CPU times: user 724 ms, sys: 44 ms, total: 768 ms
Wall time: 768 ms
Out[7]:
9999999
%%time
sum(1 for i in x if 60 < i < 70)
CPU times: user 592 ms, sys: 0 ns, total: 592 ms
Wall time: 593 ms

I question the speed advantage you are giving the generator. My timings can't reproduce the generator being faster. In the list version you don't have to do additions, just get the length.
@timgeb In the list version, there may be garbage collection playing a role.

CollectivesTM on Stack Overflow

Python: count number of elements in list for if condition

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

CollectivesTM on Stack Overflow

3 Answers 3

2 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related