While discussing with some colleagues, one argued that a for loop over a list of objects to call a method on each one is bad practice, because it performs worse than deque(map(methodcaller(...), ...), maxlen=0).
He did not support his claim with a benchmark. I did one myself.
Here is the setup:
import collections
import operator
import pyperf

class Do:
    def nothing(self):
        pass

dos = [Do() for _ in range(10000)]
And the functions I'll benchmark:
def with_for_loop():
    for do in dos:
        do.nothing()

def with_deque_map():
    collections.deque(
        map(operator.methodcaller("nothing"), dos),
        maxlen=0,
    )
From a Pythonic point of view, if performance is not a requirement, I think the for loop is better by far. But here I'm interested in the performance point of view.
I would have expected the difference, if significant at all, to be minimal and in favor of the deque/map/methodcaller approach.
But here are the results:
.....................
for_loop: Mean +- std dev: 244 us +- 8 us
.....................
deque_map: Mean +- std dev: 1.09 ms +- 0.02 ms
(the time difference is the same with a larger list)
Did I do something wrong with the benchmark?
Is the overhead of methodcaller big enough to make it this slow?
I don't understand this result.
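To give an idea of where that overhead might come from, here is a rough pure-Python sketch of what operator.methodcaller does per element. The real implementation is in C, so this is only an approximation, not CPython's actual code:

# Rough pure-Python sketch of operator.methodcaller (the real one is written
# in C); it illustrates the extra per-object work: an attribute lookup plus a
# call through a wrapper object, for every element that map() passes in.
class methodcaller_sketch:
    def __init__(self, name, *args, **kwargs):
        self._name = name
        self._args = args
        self._kwargs = kwargs

    def __call__(self, obj):
        # getattr + call happen for every element
        return getattr(obj, self._name)(*self._args, **self._kwargs)

# Usage with the Do class from the setup above:
call_nothing = methodcaller_sketch("nothing")
call_nothing(Do())  # same effect as Do().nothing()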
When Do.nothing() is a static method:
class Do:
    @staticmethod
    def nothing():
        pass
The performance gap gets smaller:
.....................
for_loop: Mean +- std dev: 395 us +- 13 us
.....................
deque_map: Mean +- std dev: 712 us +- 10 us
Here, the for loop gets slower, and the deque/map version gets faster than before. I would think that making Do.nothing a static method should make both faster, since there is no need to instantiate a bound method.
I don't understand why the for loop gets slower.
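One way to try to isolate what changes (a hypothetical diagnostic, not part of the benchmark above) is to time the attribute lookup and the full call separately with timeit, for both variants:

# Hypothetical micro-check: time the attribute lookup alone and the full
# lookup-plus-call, for a regular method and a static method, to see which
# part of do.nothing() changes when nothing() becomes a staticmethod.
import timeit

class Regular:
    def nothing(self):
        pass

class Static:
    @staticmethod
    def nothing():
        pass

for label, obj in [("regular", Regular()), ("static", Static())]:
    lookup = timeit.timeit("obj.nothing", globals={"obj": obj})
    full = timeit.timeit("obj.nothing()", globals={"obj": obj})
    print(f"{label}: lookup only {lookup:.3f}s, lookup + call {full:.3f}s")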
If you want to run the benchmark yourself:
- Install pyperf: pip install pyperf
- Here is the full script:
import collections
import operator
import pyperf

class Do:
    def nothing(self):
        pass

dos = [Do() for _ in range(10000)]

def with_for_loop():
    for do in dos:
        do.nothing()

def with_deque_map():
    collections.deque(
        map(operator.methodcaller("nothing"), dos),
        maxlen=0,
    )

def pyperf_bench():
    runner = pyperf.Runner()
    runner.bench_func(
        name="for_loop",
        func=with_for_loop,
    )
    runner.bench_func(
        name="deque_map",
        func=with_deque_map,
    )

if __name__ == '__main__':
    pyperf_bench()
Notes: a deque with maxlen=0 consumes the map iterable without storing anything: docs.python.org/3/library/collections.html#deque-objects It's a known recipe for consuming a generator (see the "consume" recipe): docs.python.org/3/library/itertools.html#itertools-recipes
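For reference, the "consume" recipe from that itertools page is essentially the following (paraphrased from the docs):

# "consume" recipe (paraphrased from the itertools docs): advance an iterator
# n steps ahead, or exhaust it entirely when n is None.
import collections
from itertools import islice

def consume(iterator, n=None):
    if n is None:
        # Feed the whole iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)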
1 Answer
With regards to your benchmark, which uses two different approaches to invoke a method on each element of an iterable of objects, note that:
- In with_for_loop you are able to call the nothing method "directly", simply and efficiently, with do.nothing().
- In with_deque_map you invoke the nothing method on each iteration using the built-in map with operator.methodcaller, which is more or less necessitated by using a deque instance to do the iteration for you implicitly. The machinery used to perform the method call on each iteration this way incurs additional overhead not present in with_for_loop.
So if you would like to benchmark two different ways of invoking a method call on an iterable of object references, then I am not surprised to find that with_deque_map performs worse than with_for_loop, because of the use of map with operator.methodcaller. Let's now consider a benchmark that measures the two ways of iterating over an iterable of values without performing any function calls; I would modify the benchmark as follows:
import collections
import pyperf

iterable = [x for x in range(10_000)]

def with_for_loop():
    for _ in iterable:
        pass

def with_deque():
    collections.deque(
        iterable,
        maxlen=0,
    )

def pyperf_bench():
    runner = pyperf.Runner()
    runner.bench_func(
        name="with_for_loop",
        func=with_for_loop,
    )
    runner.bench_func(
        name="with_deque",
        func=with_deque,
    )

if __name__ == '__main__':
    pyperf_bench()
The output is:
with_for_loop: Mean +- std dev: 108 us +- 18 us
with_deque: Mean +- std dev: 25.0 us +- 1.8 us
You can clearly see that using a deque instance now performs significantly better than using an explicit for loop. But if you have to use the built-in map function with operator.methodcaller to invoke the nothing method on an iterable of objects just to be able to use the "consume" recipe, you will find that you are better off not using the recipe at all.
Using Do.nothing instead of operator.methodcaller('nothing') should bring them closer (probably still slower though).