While discussing with some colleagues, one argued that a for loop over a list of objects to call a method on each one is bad practice, because it performs worse than deque(map(methodcaller(...), ...), maxlen=0).
He did not support his claim with a benchmark. I did one myself.
Here is the setup:
import collections
import operator
import pyperf

class Do:
    def nothing(self):
        pass

dos = [Do() for _ in range(10000)]
And the functions I'll benchmark:
def with_for_loop():
    for do in dos:
        do.nothing()

def with_deque_map():
    collections.deque(
        map(operator.methodcaller("nothing"), dos),
        maxlen=0,
    )
From a Pythonic point of view, if performance is not a requirement, I think the for loop is better by far. But here I'm interested in the performance point of view.
I would have expected the difference, if significant at all, to be minimal and in favor of the deque/map/methodcaller approach.
But here are the results:
.....................
for_loop: Mean +- std dev: 244 us +- 8 us
.....................
deque_map: Mean +- std dev: 1.09 ms +- 0.02 ms
(the time difference is the same with a larger list)
Did I do something wrong with the benchmark?
Is the overhead of methodcaller big enough to make it this slow?
I don't understand this result.
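To give an idea of where that overhead might come from, here is a rough pure-Python sketch of what operator.methodcaller does per element. The real implementation is in C, so this is only an approximation, not CPython's actual code:

# Rough pure-Python sketch of operator.methodcaller (the real one is written
# in C); it illustrates the extra per-object work: an attribute lookup plus a
# call through a wrapper object, for every element that map() passes in.
class methodcaller_sketch:
    def __init__(self, name, *args, **kwargs):
        self._name = name
        self._args = args
        self._kwargs = kwargs

    def __call__(self, obj):
        # getattr + call happen for every element
        return getattr(obj, self._name)(*self._args, **self._kwargs)

# Usage with the Do class from the setup above:
call_nothing = methodcaller_sketch("nothing")
call_nothing(Do())  # same effect as Do().nothing()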
When Do.nothing() is a static method:
class Do:
    @staticmethod
    def nothing():
        pass
The performance gap gets smaller:
.....................
for_loop: Mean +- std dev: 395 us +- 13 us
.....................
deque_map: Mean +- std dev: 712 us +- 10 us
Here, the for loop gets slower, and the deque/map version gets faster than before. I would think that making Do.nothing a static method should make both faster, since there is no need to instantiate a bound method.
I don't understand why the for loop gets slower.
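One way to try to isolate what changes (a hypothetical diagnostic, not part of the benchmark above) is to time the attribute lookup and the full call separately with timeit, for both variants:

# Hypothetical micro-check: time the attribute lookup alone and the full
# lookup-plus-call, for a regular method and a static method, to see which
# part of do.nothing() changes when nothing() becomes a staticmethod.
import timeit

class Regular:
    def nothing(self):
        pass

class Static:
    @staticmethod
    def nothing():
        pass

for label, obj in [("regular", Regular()), ("static", Static())]:
    lookup = timeit.timeit("obj.nothing", globals={"obj": obj})
    full = timeit.timeit("obj.nothing()", globals={"obj": obj})
    print(f"{label}: lookup only {lookup:.3f}s, lookup + call {full:.3f}s")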
If you want to run the benchmark yourself:
- Install pyperf: pip install pyperf
- Here is the full script:
import collections
import operator
import pyperf

class Do:
    def nothing(self):
        pass

dos = [Do() for _ in range(10000)]

def with_for_loop():
    for do in dos:
        do.nothing()

def with_deque_map():
    collections.deque(
        map(operator.methodcaller("nothing"), dos),
        maxlen=0,
    )

def pyperf_bench():
    runner = pyperf.Runner()
    runner.bench_func(
        name="for_loop",
        func=with_for_loop,
    )
    runner.bench_func(
        name="deque_map",
        func=with_deque_map,
    )

if __name__ == '__main__':
    pyperf_bench()
Notes: a deque with maxlen=0 consumes the map iterable without storing anything: docs.python.org/3/library/collections.html#deque-objects It's a known recipe for consuming a generator (see the "consume" recipe): docs.python.org/3/library/itertools.html#itertools-recipes
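For reference, the "consume" recipe from that itertools page is essentially the following (paraphrased from the docs):

# "consume" recipe (paraphrased from the itertools docs): advance an iterator
# n steps ahead, or exhaust it entirely when n is None.
import collections
from itertools import islice

def consume(iterator, n=None):
    if n is None:
        # Feed the whole iterator into a zero-length deque.
        collections.deque(iterator, maxlen=0)
    else:
        # Advance to the empty slice starting at position n.
        next(islice(iterator, n, n), None)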
1 Answer
With regards to your benchmark, which uses two different approaches to invoke a method on each element of an iterable of objects, note that:
- In with_for_loop you are able to call the nothing method "directly", simply and efficiently, with do.nothing().
- In with_deque_map you invoke the nothing method on each iteration using the built-in map with operator.methodcaller, which is more or less necessitated by using a deque instance to do the iteration for you implicitly. The machinery used to perform the method call on each iteration this way incurs additional overhead not present in with_for_loop.
So if you would like to benchmark two different ways of invoking a method call on an iterable of object references, then I am not surprised to find that with_deque_map performs worse than with_for_loop, because of the use of map with operator.methodcaller. Let's now consider a benchmark that measures the two ways of iterating over an iterable of values without performing any function calls; I would modify the benchmark as follows:
import collections
import pyperf

iterable = [x for x in range(10_000)]

def with_for_loop():
    for _ in iterable:
        pass

def with_deque():
    collections.deque(
        iterable,
        maxlen=0,
    )

def pyperf_bench():
    runner = pyperf.Runner()
    runner.bench_func(
        name="with_for_loop",
        func=with_for_loop,
    )
    runner.bench_func(
        name="with_deque",
        func=with_deque,
    )

if __name__ == '__main__':
    pyperf_bench()
The output is:
with_for_loop: Mean +- std dev: 108 us +- 18 us
with_deque: Mean +- std dev: 25.0 us +- 1.8 us
You can clearly see that using a deque instance now performs significantly better than using an explicit for loop. But if you have to use the built-in map function with operator.methodcaller to invoke the nothing method on an iterable of objects just to be able to use the "consume" recipe, you will find that you are better off not using the recipe at all.
Using Do.nothing instead of operator.methodcaller('nothing') should bring them closer (probably still slower though).