[Python-Dev] accumulator display syntax

Guido van Rossum guido at python.org
Tue Oct 21 00:09:11 EDT 2003


> with a.py having:
>
> def asum(R):
>     return sum([ x*x for x in R ])
>
> def gen(R):
>     for x in R: yield x*x
>
> def gsum(R, gen=gen):
>     return sum(gen(R))
>
> I measure:
>
>> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.asum(R)'
> 10000 loops, best of 3: 96 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(100)' 'a.gsum(R)'
> 10000 loops, best of 3: 60 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.asum(R)'
> 1000 loops, best of 3: 930 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(1000)' 'a.gsum(R)'
> 1000 loops, best of 3: 590 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.asum(R)'
> 100 loops, best of 3: 1.28e+04 usec per loop
> [alex at lancelot auto]$ timeit.py -c -s'import a' -s'R=range(10000)' 'a.gsum(R)'
> 100 loops, best of 3: 8.4e+03 usec per loop
>
> not sure why gsum's advantage ratio over asum seems to be roughly
> constant, but, this IS what I measure!-)
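[For readers trying this today: a self-contained version of Alex's benchmark, using the stdlib timeit module directly instead of the timeit.py script. Function names match his a.py; absolute numbers and the exact ratio will vary by Python version and machine.]

```python
import timeit

def asum(R):
    # list comprehension: build the whole list, then sum it
    return sum([x * x for x in R])

def gen(R):
    for x in R:
        yield x * x

def gsum(R, gen=gen):
    # generator: sum() pulls one value at a time, no intermediate list
    return sum(gen(R))

R = range(1000)
print("asum:", timeit.timeit(lambda: asum(R), number=1000))
print("gsum:", timeit.timeit(lambda: gsum(R), number=1000))
```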

Great! This is a plus for iterator comprehensions (we need a better
term BTW). I guess that building up a list using repeated append()
calls slows things down more than the frame switching used by
generator functions; I knew the latter was fast but this is a pleasant
result.

BTW, if I use a variant that calls list() instead of sum(), the
generator version is a few percent slower than the list
comprehension. But that's because list(a) has a shortcut for the case
where a is already a list, while sum(a) always goes through
PyIter_Next(). So this is actually consistent: despite the huge win
of the shortcut, the generator version is barely slower.
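[The list() shortcut is easy to see from pure Python; this micro-benchmark is an editorial illustration, not from the thread. Copying an existing list with list() hits a fast internal path, while feeding the same elements through a generator forces the generic iteration protocol:]

```python
import timeit

data = list(range(10_000))

# list(a) special-cases a already being a list: a fast internal copy
t_copy = timeit.timeit(lambda: list(data), number=1000)

# a generator forces the generic one-item-at-a-time iterator path
t_iter = timeit.timeit(lambda: list(x for x in data), number=1000)

print(t_copy, t_iter)
```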

I think the answer lies in the bytecode:
>>> def lc(a):
...     return [x for x in a]
...
>>> import dis
>>> dis.dis(lc)
 2 0 BUILD_LIST 0
 3 DUP_TOP 
 4 LOAD_ATTR 0 (append)
 7 STORE_FAST 1 (_[1])
 10 LOAD_FAST 0 (a)
 13 GET_ITER 
 >> 14 FOR_ITER 16 (to 33)
 17 STORE_FAST 2 (x)
 20 LOAD_FAST 1 (_[1])
 23 LOAD_FAST 2 (x)
 26 CALL_FUNCTION 1
 29 POP_TOP 
 30 JUMP_ABSOLUTE 14
 >> 33 DELETE_FAST 1 (_[1])
 36 RETURN_VALUE 
 37 LOAD_CONST 0 (None)
 40 RETURN_VALUE 
>>> def gen(a):
...     for x in a: yield x
...
>>> dis.dis(gen)
 2 0 SETUP_LOOP 18 (to 21)
 3 LOAD_FAST 0 (a)
 6 GET_ITER 
 >> 7 FOR_ITER 10 (to 20)
 10 STORE_FAST 1 (x)
 13 LOAD_FAST 1 (x)
 16 YIELD_VALUE 
 17 JUMP_ABSOLUTE 7
 >> 20 POP_BLOCK 
 >> 21 LOAD_CONST 0 (None)
 24 RETURN_VALUE 
>>>
The list comprehension executes 7 bytecodes per iteration; the
generator version only 5 (it could be more, of course, if the
expression were more complicated than 'x'). YIELD_VALUE does very
little work; falling out of the frame is like falling off a log; and
gen_iternext() is pretty sparse code too. On the list comprehension
side, calling the list's append method has a bunch of overhead. (Some
of which could be avoided if we had a special-purpose opcode that
called PyList_Append().)

But the executive summary remains: the generator wins because it
doesn't have to materialize the whole list.
--Guido van Rossum (home page: http://www.python.org/~guido/)

