homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: with should be as fast as try/finally
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, belopolsky, benjamin.peterson, christian.heimes, collinwinter, jcea, jyasskin, ncoghlan, nnorwitz, rhettinger
Priority: normal Keywords: patch

Created on 2008年02月24日 23:10 by jyasskin, last changed 2022年04月11日 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
faster_with.patch jyasskin, 2008年03月03日 00:29 proof of concept
faster_with.patch jyasskin, 2008年03月03日 02:34 complete
Messages (10)
msg62951 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008年02月24日 23:10
Currently, 'with' costs about .2us over try/finally:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally:
lock.release()'
1000000 loops, best of 3: 0.617 usec per loop
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.774 usec per loop
Since it's doing the same thing (and calling the same C functions to do
the lock manipulation), it shouldn't take more time. The bytecodes
associated with the two options look like:
2) with lock:
3) pass
 2 0 LOAD_GLOBAL 0 (lock)
 3 DUP_TOP 
 4 LOAD_ATTR 1 (__exit__)
 7 STORE_FAST 0 (_[1])
 10 LOAD_ATTR 2 (__enter__)
 13 CALL_FUNCTION 0
 16 POP_TOP 
 17 SETUP_FINALLY 4 (to 24)
 3 20 POP_BLOCK 
 21 LOAD_CONST 0 (None)
 >> 24 LOAD_FAST 0 (_[1])
 27 DELETE_FAST 0 (_[1])
 30 WITH_CLEANUP 
 31 END_FINALLY 
 32 LOAD_CONST 0 (None)
 35 RETURN_VALUE 
6) lock.acquire()
7) try:
8) pass
9) finally:
10) lock.release()
 6 0 LOAD_GLOBAL 0 (lock)
 3 LOAD_ATTR 1 (acquire)
 6 CALL_FUNCTION 0
 9 POP_TOP 
 7 10 SETUP_FINALLY 4 (to 17)
 8 13 POP_BLOCK 
 14 LOAD_CONST 0 (None)
 10 >> 17 LOAD_GLOBAL 0 (lock)
 20 LOAD_ATTR 2 (release)
 23 CALL_FUNCTION 0
 26 POP_TOP 
 27 END_FINALLY 
 28 LOAD_CONST 0 (None)
 31 RETURN_VALUE
The major difference I see is the extra local variable (_[1]) used by
with. It looks like the compiler ought to be able to save that on the
stack instead, and save 3 opcodes. Neal Norwitz suggested that, if that
optimization is impossible, WITH_CLEANUP could be modified to take the
variable as a parameter, which would let it absorb the LOAD_FAST and
DELETE_FAST instructions.
I've added everyone on the previous bug to the nosy list. Sorry if you
don't care. :)
msg63165 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008年03月01日 16:24
A closer approximation of what the with statement is doing would be:
exit = lock.release()
lock.acquire()
try:
 pass
finally:
 exit()
The problem with trying to store the result of the retrieval of __exit__
on the stack is that we need to leave the context manager itself on top
of the stack for the next LOAD_ATTR opcode (when we're retrieving the
__enter__ method).
However, changing WITH_CLEANUP to take an argument indicating which
local variable holds the bound __exit__ method sounds like it might work.
msg63167 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008年03月01日 16:34
Scratch the parentheses on that first line of sample code in my previous
comment (isn't copy and paste wonderful?)
msg63197 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008年03月03日 00:29
Here's a proof-of-concept patch that keeps the __exit__ method on the
stack. It uses ROT_TWO to stuff it under the context object instead of
storing it into a temporary. (Thanks Nick for pointing out that problem
before I had to waste time on it.) test_with passes, although I need to
update several more things and maybe fix a refleak.
The patch changes the compilation of:
 def with_(l):
 with l:
 pass
from
 4 0 LOAD_FAST 0 (l)
 3 DUP_TOP 
 4 LOAD_ATTR 0 (__exit__)
 7 STORE_FAST 1 (_[1])
 10 LOAD_ATTR 1 (__enter__)
 13 CALL_FUNCTION 0
 16 POP_TOP 
 17 SETUP_FINALLY 4 (to 24)
 5 20 POP_BLOCK 
 21 LOAD_CONST 0 (None)
 >> 24 LOAD_FAST 1 (_[1])
 27 DELETE_FAST 1 (_[1])
 30 WITH_CLEANUP 
 31 END_FINALLY 
 32 LOAD_CONST 0 (None)
 35 RETURN_VALUE 
to
 4 0 LOAD_FAST 0 (l)
 3 DUP_TOP 
 4 LOAD_ATTR 0 (__exit__)
 7 ROT_TWO 
 8 LOAD_ATTR 1 (__enter__)
 11 CALL_FUNCTION 0
 14 POP_TOP 
 15 SETUP_FINALLY 4 (to 22)
 5 18 POP_BLOCK 
 19 LOAD_CONST 0 (None)
 >> 22 WITH_CLEANUP 
 23 END_FINALLY 
 24 LOAD_CONST 0 (None)
 27 RETURN_VALUE 
And speeds it up from:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.832 usec per loop
to:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'with lock: pass'
1000000 loops, best of 3: 0.762 usec per loop
That's only half of the way to parity with try/finally:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.acquire()' 'try: pass' 'finally:
lock.release()'
1000000 loops, best of 3: 0.638 usec per loop
What's strange is that calling __enter__ and __exit__ in a try/finally
block brings the speed back to the faster 'with' speed, even though they
call the same C functions:
$ ./python.exe -m timeit -s 'import thread; lock =
thread.allocate_lock()' 'lock.__enter__()' 'try: pass' 'finally:
lock.__exit__()'
1000000 loops, best of 3: 0.754 usec per loop
Any ideas?
msg63200 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008年03月03日 02:34
Now with documentation, a working test_compile, and one less refleak.
msg63323 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008年03月06日 15:53
> What's strange is that calling __enter__ and __exit__ in a 
> try/finally block brings the speed back to the faster 'with' speed, 
> even though they call the same C functions
Looking carefully at the code, there are two reasons for this:
- LockType has no methods! try "dir(thread.LockType)". Instead, LockType
defines a tp_getattr which does a *linear* search in a PyMethodDef
array. First items are served faster...
- After converting this to use the usual tp_methods slot, there is still
a performance difference, due to the fact that release() is a
METH_NOARGS method, while __exit__() uses METH_VARARGS.
Phew.
msg63353 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008年03月07日 12:01
Patch applied cleanly for me and all tests pass.
It also looked good on a visual scan over the diff text.
msg63355 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2008年03月07日 14:17
I went ahead and committed the change to the bytecode generation as r61290.
The deficiencies in the lock implementation should probably be raised as
a separate issue, so I'm closing this one.
msg63356 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008年03月07日 14:43
Hm, my tests do not see any speedup with this patch.
I used VS2005 on win2K, and VS2008 on winXP.
Timings are very similar before and after this patch.
Maybe the optimization is only useful with gcc?
msg63425 - (view) Author: Jeffrey Yasskin (jyasskin) * (Python committer) Date: 2008年03月09日 17:03
Thanks Nick and Amaury!
Amaury, what times are you seeing? It could be a just-gcc speedup, but I
wouldn't have thought so since the patch eliminates 2 times around the
eval loop. It could be that the overhead of WITH_CLEANUP is high enough
to drown out those iterations. You had mentioned optimizing the
PyObject_CallFunctionObjArgs() call before. If you're still seeing with
significantly slower than try, that's probably the right place to look.
Here are my current timings. To avoid the lock issues, I wrote
simple_cm.py containing:
class CM(object):
 def __enter__(self):
 pass
 def __exit__(self, *args):
 pass
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'with cm: pass'
1000000 loops, best of 3: 0.885 usec per loop
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()'
1000000 loops, best of 3: 0.858 usec per loop
If __exit__ doesn't contain *args (making it not a context manager), the
try/finally time goes down to:
$ ./python.exe -m timeit -s 'import simple_cm; cm = simple_cm.CM()'
'cm.__enter__()' 'try: pass' 'finally: cm.__exit__()'
1000000 loops, best of 3: 0.619 usec per loop
I think in theory, with could be slightly faster than finally with the
same argument list, but it's pretty close now.
History
Date User Action Args
2022年04月11日 14:56:31adminsetgithub: 46432
2008年03月09日 17:03:40jyasskinsetmessages: + msg63425
2008年03月08日 00:08:03jceasetnosy: + jcea
2008年03月07日 14:43:02amaury.forgeotdarcsetmessages: + msg63356
2008年03月07日 14:17:51ncoghlansetstatus: open -> closed
resolution: accepted
messages: + msg63355
2008年03月07日 12:01:48ncoghlansetmessages: + msg63353
2008年03月06日 15:54:00amaury.forgeotdarcsetmessages: + msg63323
2008年03月03日 15:27:57belopolskysetnosy: + belopolsky
2008年03月03日 02:34:12jyasskinsetfiles: + faster_with.patch
messages: + msg63200
2008年03月03日 00:29:46jyasskinsetfiles: + faster_with.patch
keywords: + patch
messages: + msg63197
2008年03月01日 16:34:03ncoghlansetmessages: + msg63167
2008年03月01日 16:24:17ncoghlansetnosy: + ncoghlan
messages: + msg63165
2008年02月25日 00:56:30collinwintersetnosy: + collinwinter
2008年02月24日 23:10:40jyasskincreate

AltStyle によって変換されたページ (->オリジナル) /