This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2013-02-09 15:59 by gvanrossum, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | | |
|---|---|---|---|
| File name | Uploaded | Description | Edit |
| getargs_freelist.patch | pitrou, 2013-02-10 00:05 | | review |
| Messages (27) | |||
|---|---|---|---|
| msg181741 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2013-02-09 15:59 | |
I'm trying to speed up a web template engine and I find that the code needs to do a lot of string replacements of this form:
name = name.replace('_', '-')
Characteristics of the data: the names are relatively short (1-10 characters usually), and the majority don't contain a '_' at all.
For this combination I've found that the following idiom is significantly faster:
if '_' in name:
name = name.replace('_', '-')
I'd hate for that idiom to become popular. I looked at the code (in the default branch) briefly, but it is already optimized for this case. So I am at a bit of a loss to explain the speed difference...
Some timeit experiments:
bash-3.2$ ./python.exe -m timeit -s "a = 'hundred'" "'x' in a"
bash-3.2$ ./python.exe -m timeit -s "a = 'hundred'" "a.replace('x', 'y')"
bash-3.2$ ./python.exe -m timeit -s "a = 'hundred'" "if 'x' in a: a.replace('x', 'y')"
bash-3.2$ ./python.exe -m timeit -s "a = 'hunxred'" "a.replace('x', 'y')"
bash-3.2$ ./python.exe -m timeit -s "a = 'hunxred'" "if 'x' in a: a.replace('x', 'y')"
|
|||
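[Editor's note: the guard idiom under discussion can be reproduced with a self-contained script. This is a sketch, not part of the original message; absolute timings vary by machine and Python version.]

```python
import timeit

def plain(name):
    # unconditional method call, as in the template engine
    return name.replace('_', '-')

def guarded(name):
    # the idiom Guido measured: skip replace() when no '_' is present
    if '_' in name:
        name = name.replace('_', '-')
    return name

# both spellings are functionally equivalent
assert plain('foo_bar') == guarded('foo_bar') == 'foo-bar'
assert plain('hundred') == guarded('hundred') == 'hundred'

t_plain = timeit.timeit("plain('hundred')", globals=globals(), number=100_000)
t_guard = timeit.timeit("guarded('hundred')", globals=globals(), number=100_000)
print(f"plain: {t_plain:.4f}s  guarded: {t_guard:.4f}s")
```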
| msg181742 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-09 16:07 | |
> Characteristics of the data: the names are relatively short (1-10
> characters usually)
$ ./python -m timeit -s "a = 'hundred'" "'x' in a"
10000000 loops, best of 3: 0.0431 usec per loop
$ ./python -m timeit -s "a = 'hundred'" "a.find('x')"
1000000 loops, best of 3: 0.206 usec per loop
$ ./python -m timeit -s "a = 'hundred'" "a.replace('x', 'y')"
10000000 loops, best of 3: 0.198 usec per loop
Basically, it's simply the overhead of method calls over operator calls. You only see it because the strings are very short, and therefore the cost of finding / replacing is tiny.
|
|||
| msg181743 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2013-02-09 16:18 | |
Hm, you seem to be right. Changing the bug title.
So, can we speed up method lookup? It's a shame that I have to start promoting this ugly idiom. There's a similar issue where s[:5]=='abcde' is faster than s.startswith('abcde'):
./python.exe -m timeit -s "a = 'hundred'" "a.startswith('foo')"
1000000 loops, best of 3: 0.281 usec per loop
./python.exe -m timeit -s "a = 'hundred'" "a[:3] == 'foo'"
10000000 loops, best of 3: 0.158 usec per loop
|
|||
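[Editor's note: the two spellings compared above are equivalent for a fixed-length prefix, which a quick sanity check confirms. A sketch; timings are machine-dependent.]

```python
import timeit

a = 'hundred'
# slice comparison and startswith() agree for a fixed-length prefix
assert a.startswith('hun') == (a[:3] == 'hun') == True
assert a.startswith('foo') == (a[:3] == 'foo') == False

t_method = timeit.timeit("a.startswith('foo')", globals=globals(), number=100_000)
t_slice = timeit.timeit("a[:3] == 'foo'", globals=globals(), number=100_000)
print(f"startswith: {t_method:.4f}s  slice: {t_slice:.4f}s")
```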
| msg181744 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-02-09 16:51 | |
There are two overheads: an attribute lookup and a function call.
$ ./python -m timeit -s "a = 'hundred'" "'x' in a"
10000000 loops, best of 3: 0.0943 usec per loop
$ ./python -m timeit -s "a = 'hundred'" "a.__contains__('x')"
1000000 loops, best of 3: 0.271 usec per loop
$ ./python -m timeit -s "a = 'hundred'" "a.__contains__"
10000000 loops, best of 3: 0.135 usec per loop
The time for "a.__contains__('x')" is greater than the sum of the times for "a.__contains__" and "'x' in a".
|
|||
| msg181753 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-09 19:43 | |
Indeed the function call cost actually dominates:
$ ./python -m timeit -s "a = 'hundred'" "a.find('x')"
1000000 loops, best of 3: 0.206 usec per loop
$ ./python -m timeit -s "a = 'hundred'; f=a.find" "f('x')"
10000000 loops, best of 3: 0.176 usec per loop
$ ./python -m timeit -s "a = 'hundred'" "'x' in a"
10000000 loops, best of 3: 0.0431 usec per loop
|
|||
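[Editor's note: Antoine's second measurement hoists the attribute lookup out of the timed statement. The same trick applies in Python code when a method is called in a tight loop. A sketch; exact savings vary.]

```python
import timeit

s = 'hundred'
find = s.find  # bind once: the attribute lookup happens here, not per call
assert find('x') == s.find('x') == -1
assert find('u') == s.find('u') == 1

t_per_call = timeit.timeit("s.find('x')", globals=globals(), number=100_000)
t_hoisted = timeit.timeit("find('x')", globals=globals(), number=100_000)
print(f"per-call lookup: {t_per_call:.4f}s  hoisted: {t_hoisted:.4f}s")
```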
| msg181754 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-09 19:58 | |
Some crude C benchmarking on this computer:
- calling PyUnicode_Replace is 35 ns (per call)
- calling "hundred".replace is 125 ns
- calling PyArg_ParseTuple with the same signature as "hundred".replace is 80 ns
Therefore, most of the overhead (125 - 35 = 90 ns) is in calling PyArg_ParseTuple() to unpack the method arguments. |
|||
| msg181755 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-02-09 20:22 | |
And PyArg_ParseTupleAndKeywords() is even slower.
$ ./python -m timeit "str(b'', 'utf-8', 'strict')"
1000000 loops, best of 3: 0.554 usec per loop
$ ./python -m timeit "str(object=b'', encoding='utf-8', errors='strict')"
1000000 loops, best of 3: 1.74 usec per loop |
|||
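[Editor's note: the two call spellings can be checked for equivalence directly. A sketch; the size of the timing gap depends on the interpreter version.]

```python
import timeit

# positional and keyword forms construct the same string
assert str(b'', 'utf-8', 'strict') == ''
assert str(object=b'', encoding='utf-8', errors='strict') == ''

t_pos = timeit.timeit("str(b'', 'utf-8', 'strict')", number=100_000)
t_kw = timeit.timeit("str(object=b'', encoding='utf-8', errors='strict')", number=100_000)
print(f"positional: {t_pos:.4f}s  keywords: {t_kw:.4f}s")
```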
| msg181761 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-09 21:00 | |
Here is a patch yielding a decent speedup (~ 40%) on PyArg_ParseTuple itself. More generally though, this would be improved by precompiling some of the information (like Argument Clinic does, perhaps). (note: PyArg_ParseTupleAndKeywords is a completely separate implementation...) |
|||
| msg181774 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-10 00:02 | |
Updated patch to also handle PyArg_ParseTupleAndKeywords. |
|||
| msg181775 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2013-02-10 00:16 | |
Great to see some action. Would there be a problem in backporting this? It's not a new feature after all... |
|||
| msg181776 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-10 00:20 | |
That would be left to the discretion of release managers. In all honesty the real-world benefit should be small (around 2% on the benchmark suite, apparently). Also, the principle of this patch doesn't apply to 2.7. |
|||
| msg181933 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2013-02-11 21:15 | |
A related issue: the speed of finding and hence replacing chars in strings is known to have regressed in 3.3 relative to 3.2, especially on Windows. For long strings, that will negate in 3.3 the speedup for the initial method call. See #16061, with patches. The holdup seems to be deciding which of two good patches to apply. |
|||
| msg181952 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) | Date: 2013-02-12 11:01 | |
I left some comments on Rietveld. I wonder if PyArg_ParseTupleAndKeywords can be replaced by something that would compute and cache the set of keywords; a bit like _Py_IDENTIFIER. |
|||
| msg181965 | Author: Guido van Rossum (gvanrossum) (Python committer) | Date: 2013-02-12 16:15 | |
What's the status of Argument Clinic? Won't that make this obsolete? --Guido van Rossum (sent from Android phone) |
|||
| msg181969 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-12 17:06 | |
> I left some comments on Rietveld.
>
> I wonder if PyArg_ParseTupleAndKeywords can be replaced by something
> that would compute and cache the set of keywords; a bit like
> _Py_IDENTIFIER.

It would make sense indeed. |
|||
| msg182001 | Author: Alyssa Coghlan (ncoghlan) (Python committer) | Date: 2013-02-13 07:44 | |
To answer Guido's question about clinic, see http://bugs.python.org/issue16612
Mostly positive feedback, but several of us would like a PEP to make sure we're happy with the resolution of the limited negative feedback. |
|||
| msg182002 | Author: Larry Hastings (larry) (Python committer) | Date: 2013-02-13 08:05 | |
Argument Clinic has languished for lack of time. I didn't get much feedback, though a couple people were shouting for a PEP, which I was resisting. I figured, if they have something to say, they can go ahead and reply on the tracker issue, and if they don't have something to say, why do we need a PEP? I need to reply to one bit of thorough feedback, and after that--I don't know. I'd like to get things moving before PyCon so we can point sprinters at it. |
|||
| msg182006 | Author: Larry Hastings (larry) (Python committer) | Date: 2013-02-13 08:57 | |
Oh, and, as to whether Argument Clinic would solve this problem, the answer is "not yet". Right now Argument Clinic literally generates calls to PyArg_ParseTupleAndKeywords. (In special cases it switches to PyArg_ParseTuple.) I'm more interested in Argument Clinic from the API perspective; I wanted to make a better way of specifying arguments to functions so we got all the metadata we needed without having to endlessly repeat ourselves. Truthfully I was hoping someone else would pick up the gauntlet once it was checked in and make a new argument processing API / hack up the Argument Clinic output to make it faster. |
|||
| msg182007 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-02-13 10:05 | |
> Truthfully I was hoping someone else would pick up the gauntlet once it
> was checked in and make a new argument processing API / hack up the
> Argument Clinic output to make it faster.

Argument Clinic's preprocessing would be a very nice building block to generate faster parsing sequences. Like Nick I'd still like to see a PEP, though ;-) |
|||
| msg182250 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-02-17 00:09 | |
New changeset 4e985a96a612 by Antoine Pitrou in branch 'default':
Issue #17170: speed up PyArg_ParseTuple[AndKeywords] a bit.
http://hg.python.org/cpython/rev/4e985a96a612 |
|||
| msg182607 | Author: Stefan Behnel (scoder) (Python committer) | Date: 2013-02-21 20:34 | |
Let me throw in a quick reminder that Cython has substantially faster argument parsing than the C-API functions provide, because it translates function signatures like

    def func(int a, b=1, *, list c, d=2): ...

into tightly specialised unpacking code, while keeping it as compatible as possible with the equivalent Python function (better than manually implemented C functions, BTW). Might be an alternative to Argument Clinic, one that has been working for a couple of years now and has already proven its applicability to a large body of real world code. |
|||
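[Editor's note: a rough Python-level illustration of what "specialised unpacking" buys. This is a hypothetical sketch, not Cython's actual output: a generic *args/**kwargs dispatcher must inspect its arguments on every call, while a fixed signature lets the interpreter bind them directly.]

```python
def generic(*args, **kwargs):
    # generic parsing: every call inspects the tuple and the dict
    a = args[0] if args else kwargs['a']
    b = args[1] if len(args) > 1 else kwargs.get('b', 1)
    return a + b

def specialised(a, b=1):
    # fixed signature: arguments bound directly, no container lookups
    return a + b

# both accept the same call shapes and agree on results
assert generic(2, 3) == specialised(2, 3) == 5
assert generic(2) == specialised(2) == 3
assert generic(a=2, b=3) == specialised(a=2, b=3) == 5
```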
| msg182612 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2013-02-21 22:20 | |
(Stefan)
> into tightly specialised unpacking code,

Are you suggesting that func.__call__ should be specialized to func's signature, more than it is now (which is perhaps not at all), or something else? |
|||
| msg182613 | Author: Stefan Behnel (scoder) (Python committer) | Date: 2013-02-21 22:32 | |
Cython does that in general, sure. However, this ticket is about a specific case where string methods (which are implemented in C) are slow when called from Python. Antoine found out that the main overhead is not so much from the method lookup itself but from argument parsing inside of the function. The unpacking code that Cython generates for the equivalent Python signature would speed this up, while keeping or improving the compatibility with Python call semantics. |
|||
| msg183721 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-03-08 01:56 | |
> More generally though, this would be improved by precompiling some of the information (like Argument Clinic does, perhaps).

The same idea was already proposed to optimize str%args and str.format(args). struct.unpack() also compiles the format into an optimized structure (and has a cache). We could do something like Martin von Loewis's _Py_IDENTIFIER API: compile at runtime on the first call, and cache the result in a static variable.

It's not a tiny project, and I don't know exactly how to build a "JIT compiler" for getargs.c, nor how complex it would be. But it would speed up *all* Python calls, and therefore any Python application. |
|||
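[Editor's note: the struct-module precompilation pattern Victor mentions looks like this in Python. A sketch; modern CPython also caches formats passed to the module-level functions, so the measured gap may be small.]

```python
import struct
import timeit

fmt = '<ihb'  # little-endian: int, short, signed char
compiled = struct.Struct(fmt)  # format string parsed once, reused per call

packed = struct.pack(fmt, 1, 2, 3)
# precompiled and module-level unpacking give identical results
assert compiled.unpack(packed) == struct.unpack(fmt, packed) == (1, 2, 3)

t_module = timeit.timeit("struct.unpack(fmt, packed)", globals=globals(), number=100_000)
t_compiled = timeit.timeit("compiled.unpack(packed)", globals=globals(), number=100_000)
print(f"module-level: {t_module:.4f}s  precompiled: {t_compiled:.4f}s")
```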
| msg212851 | Author: Mark Lawrence (BreamoreBoy) | Date: 2014-03-06 22:52 | |
What's the status of this issue? Code was committed to the default branch over a year ago; see msg182250. |
|||
| msg221085 | Author: Mark Lawrence (BreamoreBoy) | Date: 2014-06-20 13:04 | |
I don't think there's anything to do here so can it be closed? If anything else needs discussing surely it can go to python-ideas, python-dev or a new issue as appropriate. |
|||
| msg221274 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2014-06-22 16:38 | |
Indeed keeping this issue open wouldn't be very productive since it relates to the more general problem of Python's slow interpretation. |
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:41 | admin | set | github: 61372 |
| 2014-06-22 16:38:56 | pitrou | set | status: open -> closed; resolution: rejected; messages: + msg221274 |
| 2014-06-20 13:04:50 | BreamoreBoy | set | messages: + msg221085 |
| 2014-03-06 22:52:33 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg212851 |
| 2014-03-06 22:40:56 | josh.r | set | nosy: + josh.r |
| 2014-01-31 23:34:30 | yselivanov | set | nosy: + yselivanov |
| 2014-01-31 23:34:26 | yselivanov | set | versions: + Python 3.5, - Python 3.4 |
| 2013-05-01 22:48:52 | isoschiz | set | nosy: + isoschiz |
| 2013-03-08 01:56:10 | vstinner | set | messages: + msg183721 |
| 2013-02-28 10:02:26 | Mark.Shannon | set | nosy: + Mark.Shannon |
| 2013-02-21 22:32:25 | scoder | set | messages: + msg182613 |
| 2013-02-21 22:20:27 | terry.reedy | set | messages: + msg182612 |
| 2013-02-21 20:34:34 | scoder | set | nosy: + scoder; messages: + msg182607 |
| 2013-02-18 16:08:58 | jcea | set | nosy: + jcea |
| 2013-02-17 00:09:16 | python-dev | set | nosy: + python-dev; messages: + msg182250 |
| 2013-02-13 14:33:50 | barry | set | nosy: + barry |
| 2013-02-13 10:05:51 | pitrou | set | messages: + msg182007 |
| 2013-02-13 08:57:10 | larry | set | messages: + msg182006 |
| 2013-02-13 08:05:18 | larry | set | nosy: + larry; messages: + msg182002 |
| 2013-02-13 07:44:49 | ncoghlan | set | nosy: + ncoghlan; messages: + msg182001 |
| 2013-02-12 17:06:59 | pitrou | set | messages: + msg181969 |
| 2013-02-12 16:15:36 | gvanrossum | set | messages: + msg181965 |
| 2013-02-12 11:01:21 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg181952 |
| 2013-02-11 21:15:50 | terry.reedy | set | nosy: + terry.reedy; messages: + msg181933 |
| 2013-02-11 19:43:51 | flox | set | nosy: + vstinner, flox |
| 2013-02-10 00:20:36 | pitrou | set | messages: + msg181776 |
| 2013-02-10 00:16:33 | gvanrossum | set | messages: + msg181775; stage: patch review -> |
| 2013-02-10 00:05:34 | pitrou | set | files: + getargs_freelist.patch |
| 2013-02-10 00:05:28 | pitrou | set | files: - getargs_freelist.patch |
| 2013-02-10 00:05:24 | pitrou | set | files: - getargs_freelist.patch |
| 2013-02-10 00:02:32 | pitrou | set | files: + getargs_freelist.patch; messages: + msg181774 |
| 2013-02-09 21:04:31 | serhiy.storchaka | set | stage: patch review |
| 2013-02-09 21:00:54 | pitrou | set | files: + getargs_freelist.patch; keywords: + patch; messages: + msg181761 |
| 2013-02-09 20:22:48 | serhiy.storchaka | set | messages: + msg181755 |
| 2013-02-09 19:58:53 | pitrou | set | messages: + msg181754 |
| 2013-02-09 19:43:53 | pitrou | set | messages: + msg181753 |
| 2013-02-09 16:51:54 | serhiy.storchaka | set | messages: + msg181744 |
| 2013-02-09 16:23:51 | ezio.melotti | set | nosy: + ezio.melotti, serhiy.storchaka; versions: + Python 3.4, - Python 3.2 |
| 2013-02-09 16:18:55 | gvanrossum | set | messages: + msg181743; title: string replace is too slow -> string method lookup is too slow |
| 2013-02-09 16:07:30 | pitrou | set | nosy: + pitrou; messages: + msg181742 |
| 2013-02-09 15:59:30 | gvanrossum | create | |