Issue 1479611: speed up function calls


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43303

classification

Title:	speed up function calls
Type:	performance	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.3

process

Dependencies:	Superseder:
Status:	closed	Resolution:	out of date
Assigned To:	Nosy List:	BreamoreBoy, belopolsky, bob.ippolito, collinwinter, jyasskin, loewis, nnorwitz, pas, pitrou, rhettinger
Priority:	low	Keywords:	needs review, patch

Created on 2006-05-01 06:58 by nnorwitz, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
speed.diff	nnorwitz, 2006-05-01 06:58	v1
s2.diff	nnorwitz, 2006-05-05 08:27	v2
func-speed.diff	nnorwitz, 2006-05-11 07:43	v3
funcall.patch	pitrou, 2008-01-13 23:23

Messages (12)
msg50158	Author: Neal Norwitz (nnorwitz) (Python committer)	Date: 2006-05-01 06:58
Results: 2.86% for 1 arg (len), 11.8% for 2 args (min), and 1.6% for pybench.

trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.74 msec per loop
trunk-speed$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 8.03 msec per loop

trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): len([])'
100 loops, best of 3: 4.88 msec per loop
trunk-clean$ ./python.exe -m timeit 'for x in xrange(10000): min(1,2)'
100 loops, best of 3: 9.09 msec per loop

pybench goes from 5688.00 down to 5598.00.

Details about the patch:

There are 2 unrelated changes. They both seem to provide equal benefits for calling varargs C functions.

One is very simple and just inlines calling a varargs C function rather than calling PyCFunction_Call(), which does extra checks that are already known. This moves meth and self up one block and breaks the C_TRACE into 2. (When looking at the patch, this will make sense I hope.)

The other change is more dangerous. It modifies load_args() to hold on to tuples so they aren't allocated and deallocated. The initialization is done one time in the new func _PyEval_Init(). It allocates 64 tuples of size 8 that are never deallocated. The idea is that there usually won't be more than 64 frames with 8 or fewer parameters active on the stack at any one time (stack depth). There are cases where this can degenerate, but even then it should only be marginally slower; in general this should be a fair amount faster, since it skips the alloc and dealloc and some extra work. Decrementing the _last_index inside the needs_free blocks could improve behaviour.

This really needs comments added to the code. But I'm not gonna get there tonight. I'd be interested in comments about the code.
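As a quick cross-check of the percentages, the timings above can be turned into relative reductions. The following is a minimal sketch: the message does not say which formula was used, so `percent_reduction` (a hypothetical helper name) assumes "time saved divided by the unpatched time", and its output may not match the quoted figures to the last digit.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage.

    Assumption: the speedup is expressed relative to the unpatched timing.
    """
    return (before - after) / before * 100.0


# Timings copied from the message: (trunk-clean, trunk-speed).
# timeit values are msec per loop; the pybench value is its reported total.
measurements = {
    "len([])": (4.88, 4.74),
    "min(1,2)": (9.09, 8.03),
    "pybench": (5688.00, 5598.00),
}

for name, (clean, patched) in measurements.items():
    print(f"{name}: {percent_reduction(clean, patched):.2f}% less time with the patch")
```

Small differences from the quoted numbers would be expected if they came from other runs or from a different base for the ratio.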
msg50159	Author: Neal Norwitz (nnorwitz) (Python committer)	Date: 2006-05-01 07:08
I should note the numbers 64 and 8 are total guesses. It might be good to try and determine values based on empirical data.
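One rough way to collect such data from pure Python is sketched below. It is only a proxy and not part of the patch: it assumes Python-level frames are representative, whereas the patch concerns calls into C functions. `profile_calls` is a hypothetical helper; it uses `sys.setprofile` and records `co_argcount`, which is the number of declared positional parameters rather than the number of arguments actually passed.

```python
import sys
from collections import Counter


def profile_calls(func, *args):
    """Run func(*args) and return (histogram of declared arg counts, max call depth).

    Depth is measured relative to this helper, and only Python-level
    frames are counted; C function calls are ignored.
    """
    arg_counts = Counter()
    depth = 0
    max_depth = 0

    def profiler(frame, event, arg):
        nonlocal depth, max_depth
        if event == "call":
            depth += 1
            max_depth = max(max_depth, depth)
            arg_counts[frame.f_code.co_argcount] += 1
        elif event == "return":
            depth -= 1
        # "c_call", "c_return" and "c_exception" events are skipped on purpose.

    sys.setprofile(profiler)
    try:
        func(*args)
    finally:
        sys.setprofile(None)
    return arg_counts, max_depth


def fib(n):
    # Small recursive workload used only to exercise the profiler.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    counts, deepest = profile_calls(fib, 10)
    print("declared-argument histogram:", dict(counts))
    print("maximum Python call depth:", deepest)
```

Run against a realistic workload, the histogram would hint at a sensible tuple size and the maximum depth at a sensible pool size, subject to the caveats above.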
msg50160	Author: Martin v. Löwis (loewis) (Python committer)	Date: 2006-05-01 08:27
The tuples should get deallocated when Py_Finalize is called. It would be good if there was (conditional) statistical analysis, showing how often no tuple was found because the number of arguments was too large, and how often no tuple was found because the candidate was in use. I think it should be more stack-like, starting off with no tuples allocated, then returning them inside the needs_free blocks only if the refcount is 1 (or 2?). This would avoid degenerate cases where some function holds onto its argument tuple indefinitely, thus consuming all 64 tuples.

For the other part, I think it would make the code more readable if it inlined PyCFunction_Call even more: the test for NOARGS|O could be integrated into the switch statement (one case for each), VARARGS and VARARGS|KEYWORDS would both load the arguments, then call the function directly (possibly with NULL keywords). OLDARGS should goto either METH_NOARGS, METH_O, or METH_VARARGS depending on na (if you don't like goto, modifying flags would work as well).
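The stack-like scheme with statistics can be modelled in a few lines. This is a toy simulation under stated assumptions, not the patch's C code: `ArgBufferPool` and its method names are hypothetical, Python lists stand in for argument tuples, and the `still_referenced` flag stands in for the refcount check done in the needs_free blocks. The limits 8 and 64 are the guessed values from the thread.

```python
class ArgBufferPool:
    """Toy model of a lazily filled, stack-like pool of argument buffers."""

    MAX_ARGS = 8     # largest call that may use a pooled buffer
    MAX_POOLED = 64  # upper bound on idle buffers kept for reuse

    def __init__(self):
        self._free = []  # starts empty: nothing is preallocated
        self.stats = {
            "reused": 0,          # buffer taken from the pool
            "pool_empty": 0,      # small call, but no idle buffer available
            "too_large": 0,       # call had more than MAX_ARGS arguments
            "kept_by_callee": 0,  # buffer not returned: callee still holds it
        }

    def acquire(self, nargs):
        """Return a buffer for a call with `nargs` arguments."""
        if nargs > self.MAX_ARGS:
            self.stats["too_large"] += 1
            return [None] * nargs
        if self._free:
            self.stats["reused"] += 1
            return self._free.pop()
        self.stats["pool_empty"] += 1
        return [None] * self.MAX_ARGS

    def release(self, buf, still_referenced):
        """Give a buffer back after the call, unless the callee kept it."""
        if len(buf) != self.MAX_ARGS:
            return  # oversized buffers are never pooled
        if still_referenced:
            self.stats["kept_by_callee"] += 1
            return
        if len(self._free) < self.MAX_POOLED:
            buf[:] = [None] * self.MAX_ARGS  # drop references to old arguments
            self._free.append(buf)


if __name__ == "__main__":
    pool = ArgBufferPool()
    first = pool.acquire(2)
    pool.release(first, still_referenced=False)
    second = pool.acquire(3)  # reuses the buffer released above
    pool.release(second, still_referenced=True)  # callee kept it: not pooled
    pool.acquire(12)  # too many arguments for the pool
    print(pool.stats)
```

Because a buffer the callee still references is simply never returned, a function that hoards its argument tuple costs one fresh allocation per call instead of permanently occupying a pool slot.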
msg50161	Author: Neal Norwitz (nnorwitz) (Python committer)	Date: 2006-05-05 08:27
v2 attached. You might not want to review yet. I mostly did the first part of your suggestion (stats, _Fini, and stack-like if I understood you correctly). I didn't do anything on the second part about inlining PyCFunction_Call. Perf seems to be about the same. I'm not entirely sure the patch is correct yet. I found one or two problems in the original. I added some more comments.
msg50162	Author: Neal Norwitz (nnorwitz) (Python committer)	Date: 2006-05-11 07:43
This version actually works (in both normal and debug builds). It adds some stats which are useful and updates Misc/SpecialBuilds.txt. I modified it to not preallocate and to only hold a ref when the function didn't keep a ref. I still need to inline more of PyCFunction_Call. Speed is still the same as before. I'm not sure if I'll finish this before the sprint next week. Anyone there feel free to check this in if you finish it.
msg50163	Author: Bob Ippolito (bob.ippolito) (Python committer)	Date: 2006-05-22 12:02
The performance gain for this patch (as-is) on Mac OS X i386 with a release build seems totally negligible. I'm not getting any consistent win with any of the timeit or pybench benchmarks.
msg50164	Author: Neal Norwitz (nnorwitz) (Python committer)	Date: 2006-05-23 05:32
Interesting. I did the original work for this on an amd64 (gcc 3.4, I think) and continued work on a PPC Mac laptop (gcc 4.0, I think). Both had improvements. I assume you tested with v3? What about v1?
msg50165	Author: Bob Ippolito (bob.ippolito) (Python committer)	Date: 2006-05-23 08:45
This was v3 on a MacBook Pro running 10.4.6 (gcc 4, of course, since that's the only Apple-distributed i386 GCC for OS X).
msg59872	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2008-01-13 23:23
Here is a patch applicable for SVN trunk. However, like Bob, I have mixed results on this. For example, functions with variable parameter count have become slower:

# With patch
$ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1)'
100 loops, best of 3: 4.92 msec per loop
$ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
100 loops, best of 3: 4.07 msec per loop
$ ./python -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1,2)'
100 loops, best of 3: 5.04 msec per loop

# Without patch
$ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1)'
100 loops, best of 3: 4.22 msec per loop
$ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f()'
100 loops, best of 3: 3.5 msec per loop
$ ./python-orig -m timeit -s "def f(*x): pass" 'for x in xrange(10000): f(1,2)'
100 loops, best of 3: 4.46 msec per loop
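To repeat this kind of comparison on a current interpreter, the same measurement can be scripted with the `timeit` module. This is a sketch under assumptions: it is a Python 3 port (so `range` replaces `xrange`), the benchmarked function takes a variable number of positional arguments, and `build_stmt` and `bench` are hypothetical helper names. Comparing a patched and an unpatched build still means running the script once under each binary.

```python
import timeit


def f(*args):
    """Target function: accepts any number of positional arguments."""
    pass


def build_stmt(nargs, loops=10000):
    """Build the timed statement, e.g. 'for x in range(10000): f(1, 2)'."""
    call_args = ", ".join(str(i + 1) for i in range(nargs))
    return f"for x in range({loops}): f({call_args})"


def bench(nargs, loops=10000, number=100, repeat=3):
    """Return the best time per execution of the statement, in msec."""
    timer = timeit.Timer(build_stmt(nargs, loops), globals={"f": f})
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1000.0


if __name__ == "__main__":
    for nargs in (1, 0, 2):
        print(f"{build_stmt(nargs)!r}: {bench(nargs):.2f} msec per loop")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that the command-line tool reported at the time.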
msg110446	Author: Mark Lawrence (BreamoreBoy)	Date: 2010-07-16 14:32
I'm not sure if this is worth pursuing given the way performance is so often governed by networking and/or IO issues today, bearing in mind comments like msg50163 and msg59872. I'd certainly like to see more comments from core developers. Could someone in the know please put them on the nosy list.
msg110448	Author: Alexander Belopolsky (belopolsky) (Python committer)	Date: 2010-07-16 14:48
I think Raymond might be interested. Since this is not a bug fix, it can only be considered for 3.x.
msg160886	Author: Antoine Pitrou (pitrou) (Python committer)	Date: 2012-05-16 16:45
Closing as terribly outdated (and not very promising).

History
Date	User	Action	Args
2022-04-11 14:56:17	admin	set	github: 43303
2012-05-16 16:45:57	pitrou	set	status: open -> closed resolution: out of date messages: + msg160886
2011-05-05 18:27:20	pas	set	nosy: + pas
2011-03-22 22:25:05	rhettinger	set	assignee: rhettinger -> nosy: loewis, nnorwitz, collinwinter, rhettinger, bob.ippolito, belopolsky, pitrou, jyasskin, BreamoreBoy versions: + Python 3.3, - Python 3.2
2010-08-11 19:10:55	rhettinger	set	priority: normal -> low assignee: rhettinger
2010-08-09 18:57:02	terry.reedy	set	type: enhancement -> performance
2010-07-16 14:48:54	belopolsky	set	nosy: + belopolsky, rhettinger messages: + msg110448 versions: + Python 3.2, - Python 2.7
2010-07-16 14:32:30	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg110446
2010-01-20 17:03:15	ezio.melotti	set	keywords: + needs review stage: patch review versions: + Python 2.7, - Python 2.6
2009-01-12 22:28:49	collinwinter	set	nosy: + collinwinter, jyasskin
2008-01-13 23:23:48	pitrou	set	files: + funcall.patch nosy: + pitrou messages: + msg59872
2008-01-12 04:43:23	christian.heimes	set	type: enhancement versions: + Python 2.6, - Python 2.5
2006-05-01 06:58:24	nnorwitz	create
