
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Argument Clinic: inline parsing code for functions with only positional parameters
Type: performance Stage: resolved
Components: Argument Clinic Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: larry, pitrou, scoder, serhiy.storchaka, vstinner, xtreak
Priority: normal Keywords: patch

Created on 2018-12-25 14:40 by serhiy.storchaka, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
bench.py vstinner, 2019-01-11 15:12
Pull Requests
URL Status Linked
PR 11313 merged serhiy.storchaka, 2018-12-25 14:44
PR 11435 rhettinger, 2019-01-06 18:28
PR 11520 merged serhiy.storchaka, 2019-01-11 15:09
PR 11524 merged serhiy.storchaka, 2019-01-11 17:04
Messages (22)
msg332510 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-25 14:40
This is a continuation of issue23867. The proposed PR makes Argument Clinic inline the parsing code for functions with only positional parameters, i.e. functions that currently use PyArg_ParseTuple() and _PyArg_ParseStack(). This saves the time spent parsing format strings and calling several levels of functions. It can also save C stack, because of fewer nested (and potentially recursive) calls, fewer variables, and getting rid of a stack-allocated array for "objects" which would need to be deallocated or cleaned up if overall parsing fails.
PyArg_ParseTuple() and _PyArg_ParseStack() will still be used if there are parameters whose converter does not support inlining. The unsupported converters are: the deprecated Py_UNICODE API ("u", "Z"), encoded strings ("es", "et"), obsolete string/bytes converters ("y", "s#", "z#"), and some custom converters (DWORD, HANDLE, pid_t, intptr_t).
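As a rough Python analogy (Argument Clinic actually generates C), the difference is between a generic parser that re-reads a format string on every call and code specialized ahead of time for one signature. The helper names `parse_with_format` and `parse_replace_inlined` are hypothetical, and the tiny format language only echoes a few PyArg_ParseTuple format units ("O" for any object, "n" for an index-like integer, "|" before optional parameters).

```python
import operator


def parse_with_format(fmt: str, args: tuple) -> list:
    """Generic parser (hypothetical): interprets the format string on every call."""
    required = fmt.index("|") if "|" in fmt else len(fmt)
    codes = fmt.replace("|", "")
    if not required <= len(args) <= len(codes):
        raise TypeError(f"expected {required} to {len(codes)} arguments, got {len(args)}")
    result = []
    for code, arg in zip(codes, args):  # one format lookup per argument, every call
        if code == "n":
            result.append(operator.index(arg))  # TypeError for non-integers
        else:
            result.append(arg)
    return result


def parse_replace_inlined(args: tuple) -> tuple:
    """Specialized parser (hypothetical) for a signature like replace(old, new, count=-1).

    The format string has been "compiled away": the count check and each
    conversion are written out directly, with no loop over format codes.
    """
    nargs = len(args)
    if not 2 <= nargs <= 3:
        raise TypeError(f"expected 2 to 3 arguments, got {nargs}")
    old, new = args[0], args[1]
    count = operator.index(args[2]) if nargs > 2 else -1
    return old, new, count
```

Both accept the same calls; the specialized version simply does less work per call, which is the effect the PR aims for in C.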
msg333446 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 10:09
Some examples:
$ ./python -m timeit "format('abc')"
Unpatched: 5000000 loops, best of 5: 65 nsec per loop
Patched: 5000000 loops, best of 5: 42.4 nsec per loop
$ ./python -m timeit "'abc'.replace('x', 'y')"
Unpatched: 5000000 loops, best of 5: 101 nsec per loop
Patched: 5000000 loops, best of 5: 63.8 nsec per loop
$ ./python -m timeit "'abc'.ljust(5)"
Unpatched: 2000000 loops, best of 5: 120 nsec per loop
Patched: 5000000 loops, best of 5: 94.4 nsec per loop
$ ./python -m timeit "(1, 2, 3).index(2)"
Unpatched: 2000000 loops, best of 5: 100 nsec per loop
Patched: 5000000 loops, best of 5: 62.4 nsec per loop
$ ./python -m timeit -s "a = [1, 2, 3]" "a.index(2)"
Unpatched: 2000000 loops, best of 5: 93.8 nsec per loop
Patched: 5000000 loops, best of 5: 70.1 nsec per loop
$ ./python -m timeit -s "import math" "math.pow(0.5, 2.0)"
Unpatched: 2000000 loops, best of 5: 112 nsec per loop
Patched: 5000000 loops, best of 5: 82.3 nsec per loop
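To repeat these measurements from a script instead of the command line, something like the following sketch can be run under an unpatched and a patched interpreter. It is an assumption that this roughly matches what `python -m timeit` does (calibrate the loop count with `autorange()`, then report the best of 5 runs); it is not the bench.py attached later in the thread, which uses perf. The names `best_ns_per_loop` and `measure_all` are hypothetical.

```python
import timeit

# The statements from the comparison above, paired with their setup code.
CASES = [
    ("format('abc')", "pass"),
    ("'abc'.replace('x', 'y')", "pass"),
    ("'abc'.ljust(5)", "pass"),
    ("(1, 2, 3).index(2)", "pass"),
    ("a.index(2)", "a = [1, 2, 3]"),
    ("math.pow(0.5, 2.0)", "import math"),
]


def best_ns_per_loop(stmt: str, setup: str = "pass", repeat: int = 5) -> float:
    """Best of `repeat` runs, reported in nanoseconds per loop."""
    timer = timeit.Timer(stmt, setup=setup)
    # autorange() tries loop counts 1, 2, 5, 10, 20, 50, ... until a run takes >= 0.2 s.
    number, _ = timer.autorange()
    timings = timer.repeat(repeat=repeat, number=number)
    return min(timings) / number * 1e9


def measure_all(cases=CASES, repeat: int = 5) -> dict:
    """Map each statement to its best per-loop time in nanoseconds."""
    return {stmt: best_ns_per_loop(stmt, setup, repeat) for stmt, setup in cases}
```

Comparing the dictionaries returned by `measure_all()` on the two builds gives a table like the one above; the absolute numbers depend heavily on the machine.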
msg333449 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 10:33
> $ ./python -m timeit "format('abc')"
> Unpatched: 5000000 loops, best of 5: 65 nsec per loop
> Patched: 5000000 loops, best of 5: 42.4 nsec per loop
-23 ns on 65 ns: this is very significant! I spent something like 6 months implementing "FASTCALL" to avoid creating a tuple to pass positional arguments, and it was only 20 ns faster per call. The additional 23 ns make the code way faster compared to Python without FASTCALL! I estimate something like 80 ns => 42 ns: 2x faster!
> $ ./python -m timeit "'abc'.replace('x', 'y')"
> Unpatched: 5000000 loops, best of 5: 101 nsec per loop
> Patched: 5000000 loops, best of 5: 63.8 nsec per loop
-37 ns on 101 ns: that's even more significant! Wow, that's very impressive!
Please merge your PR, I want it now :-D
Can you maybe add a vague sentence in the Optimizations section of What's New in Python 3.8? Something like: "Parsing positional arguments in builtin functions has been made more efficient." I'm not sure if "builtin" is the proper term here. Functions using Argument Clinic to parse their arguments?
msg333453 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 10:52
I suppose that my computer is a bit faster than yours, so your 20 ns may be only 15 ns or 10 ns on my computer. Run the microbenchmarks on your computer to get a sense of scale.
It may be possible to save a few more nanoseconds by inlining a fast path for _PyArg_CheckPositional(), but I'm going to try this later.
This change is one step in a sequence. I will add a What's New note after finishing as many steps as possible. The next large step is to optimize argument parsing for functions with keyword parameters.
msg333455 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-01-11 10:57
Is it possible to run custom builds or benchmarks of this on speed.python.org once it is merged? I hope this will give a noticeable dip in the benchmark graphs.
msg333457 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 11:02
I can trigger a benchmark run on speed.python.org once the change is merged.
msg333458 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 11:11
Added Stefan because the new C API could be used in Cython once it stabilizes. We should cooperate more with the Cython team and provide a (semi-)official stable API for use in Cython.
I do not expect a large effect on most tests, since this optimization affects only some functions and is noticeable only for very fast function calls.
msg333463 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-01-11 12:37
It might be worth inlining a fast path of "_PyArg_CheckPositional()" that only tests "nargs < min || nargs > max" (even via a macro), and then branches to the full error checking and reporting code only if that fails. Determining the concrete exception to raise is not time critical, but the good case is. Also, that would immediately collapse into "nargs != minmax" for the cases where "min == max", i.e. we expect an exact number of arguments.
And yes, a function that raises the expected exception with the expected error message for a handful of common cases would be nice. :)
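The split Stefan describes (a cheap range test on the hot path, with all error classification moved to a separate slow path) can be modelled in Python as below. This only sketches the branching structure: in C the range test would be a macro or inline function so that a correct call costs one or two comparisons. The names `check_positional` and `_report_bad_nargs` are hypothetical, and the message wording only loosely imitates CPython's.

```python
def _report_bad_nargs(name: str, nargs: int, min_args: int, max_args: int) -> None:
    """Slow path: decide which error to raise. Runs only for bad calls."""
    if min_args == max_args:
        qualifier, expected = "", min_args
    elif nargs < min_args:
        qualifier, expected = "at least ", min_args
    else:
        qualifier, expected = "at most ", max_args
    plural = "" if expected == 1 else "s"
    raise TypeError(f"{name} expected {qualifier}{expected} argument{plural}, got {nargs}")


def check_positional(name: str, nargs: int, min_args: int, max_args: int) -> None:
    """Fast path: a single range test; the slow path is entered only on failure."""
    # With equal bounds this test is equivalent to nargs == min_args, which a
    # C compiler can fold to one comparison when the bounds are constants.
    if min_args <= nargs <= max_args:
        return
    _report_bad_nargs(name, nargs, min_args, max_args)  # always raises TypeError
```

Keeping the message-building code out of the fast path is what makes the common case cheap; the cost of formatting an error is only paid when a call is already going to fail.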
msg333469 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 14:01
New changeset 4fa9591025b6a098f3d6402e5413ee6740ede6c5 by Serhiy Storchaka in branch 'master':
bpo-35582: Argument Clinic: inline parsing code for positional parameters. (GH-11313)
https://github.com/python/cpython/commit/4fa9591025b6a098f3d6402e5413ee6740ede6c5
msg333476 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 15:12
I converted msg333446 into the attached bench.py using perf. Results on my laptop:
vstinner@apu$ ./python -m perf compare_to ref.json inlined.json --table -G
+-------------------------+---------+------------------------------+
| Benchmark               | ref     | inlined                      |
+=========================+=========+==============================+
| format('abc')           | 74.4 ns | 43.7 ns: 1.70x faster (-41%) |
+-------------------------+---------+------------------------------+
| 'abc'.replace('x', 'y') | 93.0 ns | 57.5 ns: 1.62x faster (-38%) |
+-------------------------+---------+------------------------------+
| (1, 2, 3).index(2)      | 92.5 ns | 59.2 ns: 1.56x faster (-36%) |
+-------------------------+---------+------------------------------+
| a.index(2)              | 93.6 ns | 59.9 ns: 1.56x faster (-36%) |
+-------------------------+---------+------------------------------+
| 'abc'.ljust(5)          | 124 ns  | 86.0 ns: 1.44x faster (-30%) |
+-------------------------+---------+------------------------------+
| math.pow(0.5, 2.0)      | 121 ns  | 88.1 ns: 1.37x faster (-27%) |
+-------------------------+---------+------------------------------+
The speedup on my laptop is between 30.7 and 38.0 ns per function call, on these specific functions.
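The per-call savings quoted above can be re-derived from the table. This sketch works from the rounded values perf printed, so its ratios may differ from perf's in the last digit (presumably perf computes them from the unrounded means); the names `RESULTS` and `summarize` are illustrative only.

```python
# (benchmark, ref ns, inlined ns), as printed in the perf table above.
RESULTS = [
    ("format('abc')", 74.4, 43.7),
    ("'abc'.replace('x', 'y')", 93.0, 57.5),
    ("(1, 2, 3).index(2)", 92.5, 59.2),
    ("a.index(2)", 93.6, 59.9),
    ("'abc'.ljust(5)", 124.0, 86.0),
    ("math.pow(0.5, 2.0)", 121.0, 88.1),
]


def summarize(results):
    """Return (name, nanoseconds saved per call, speedup ratio) per benchmark."""
    return [
        (name, round(ref - new, 1), round(ref / new, 2))
        for name, ref, new in results
    ]


for name, saved_ns, ratio in summarize(RESULTS):
    print(f"{name:25} saves {saved_ns:4.1f} ns per call ({ratio:.2f}x)")
```

The smallest saving (30.7 ns) is for format('abc') and the largest (38.0 ns) for 'abc'.ljust(5), matching the range stated in the message.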
1.7x faster on format() is very welcome, well done Serhiy!
Note: You need the just-released perf 1.6.0 to run this benchmark ;-)
msg333478 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 15:30
> I can trigger a benchmark run on speed.python.org once the change is merged.
Aha, it seems like Serhiy has more optimizations to come: PR #11520.
@Serhiy: tell me when you are done, so I can trigger a new benchmark run.
msg333479 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 15:41
PR 11520 additionally replaces PyArg_UnpackTuple() and _PyArg_UnpackStack() with _PyArg_CheckPositional() and inlined code in Argument Clinic.
Some examples for PR 11520:
$ ./python -m timeit "'abc'.strip()"
Unpatched: 5000000 loops, best of 5: 51.2 nsec per loop
Patched: 5000000 loops, best of 5: 45.8 nsec per loop
$ ./python -m timeit -s "d = {'a': 1}" "d.get('a')"
Unpatched: 5000000 loops, best of 5: 55 nsec per loop
Patched: 5000000 loops, best of 5: 51.1 nsec per loop
$ ./python -m timeit "divmod(5, 2)"
Unpatched: 5000000 loops, best of 5: 87 nsec per loop
Patched: 5000000 loops, best of 5: 80.6 nsec per loop
$ ./python -m timeit "hasattr(1, 'numerator')"
Unpatched: 5000000 loops, best of 5: 62.4 nsec per loop
Patched: 5000000 loops, best of 5: 54.8 nsec per loop
$ ./python -m timeit "isinstance(1, int)"
Unpatched: 5000000 loops, best of 5: 62.7 nsec per loop
Patched: 5000000 loops, best of 5: 54.1 nsec per loop
$ ./python -m timeit -s "from math import gcd" "gcd(6, 10)"
Unpatched: 2000000 loops, best of 5: 99.6 nsec per loop
Patched: 5000000 loops, best of 5: 89.9 nsec per loop
$ ./python -m timeit -s "from operator import add" "add(1, 2)"
Unpatched: 5000000 loops, best of 5: 40.7 nsec per loop
Patched: 10000000 loops, best of 5: 32.6 nsec per loop
msg333480 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 15:47
> $ ./python -m timeit -s "from operator import add" "add(1, 2)"
> Unpatched: 5000000 loops, best of 5: 40.7 nsec per loop
> Patched: 10000000 loops, best of 5: 32.6 nsec per loop
We should stop you, or the timing will become negative if you continue!
msg333482 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 16:01
New changeset 2a39d251f07d4c620e3b9a1848e3d1eb3067be64 by Serhiy Storchaka in branch 'master':
bpo-35582: Argument Clinic: Optimize the "all boring objects" case. (GH-11520)
https://github.com/python/cpython/commit/2a39d251f07d4c620e3b9a1848e3d1eb3067be64
msg333487 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 17:09
PR 11524 performs the same kind of changes as PR 11520, but for handwritten code (only where this gives a noticeable speedup). Also, iter() now uses the fast call convention.
$ ./python -m timeit "iter(())"
Unpatched: 5000000 loops, best of 5: 82.8 nsec per loop
Patched: 5000000 loops, best of 5: 56.3 nsec per loop
$ ./python -m timeit -s "it = iter([])" "next(it, None)"
Unpatched: 5000000 loops, best of 5: 54.1 nsec per loop
Patched: 5000000 loops, best of 5: 44.9 nsec per loop
$ ./python -m timeit "getattr(1, 'numerator')"
Unpatched: 5000000 loops, best of 5: 63.6 nsec per loop
Patched: 5000000 loops, best of 5: 57.5 nsec per loop
$ ./python -m timeit -s "from operator import attrgetter; f = attrgetter('numerator')" "f(1)"
Unpatched: 5000000 loops, best of 5: 64.1 nsec per loop
Patched: 5000000 loops, best of 5: 56.8 nsec per loop
$ ./python -m timeit -s "from operator import methodcaller; f = methodcaller('conjugate')" "f(1)"
Unpatched: 5000000 loops, best of 5: 79.5 nsec per loop
Patched: 5000000 loops, best of 5: 74.1 nsec per loop
It is also possible to speed up many math methods, and maybe some contextvar and hamt methods, but that is for other issues.
msg333488 - Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-01-11 17:11
Nice! Well done, Serhiy!
msg333491 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-11 17:19
> $ ./python -m timeit "iter(())"
> Unpatched: 5000000 loops, best of 5: 82.8 nsec per loop
> Patched: 5000000 loops, best of 5: 56.3 nsec per loop
That's quite significant. Oh, it's because you converted builtin_iter() from METH_VARARGS to METH_FASTCALL at the same time. Interesting.
msg333493 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-11 17:23
Just inlining the arg tuple unpacking in iter() gives only a 10% speedup. I would not apply this optimization for such a small difference. But combined with converting it to fast call, it looks more interesting.
msg333505 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-11 20:58
Are there any numbers on higher-level benchmarks?
msg333513 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-12 06:25
New changeset 793426687509be24a42663a27e568cc92dcc07f6 by Serhiy Storchaka in branch 'master':
bpo-35582: Inline arguments tuple unpacking in handwritten code. (GH-11524)
https://github.com/python/cpython/commit/793426687509be24a42663a27e568cc92dcc07f6
msg333516 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-01-12 06:30
I do not expect significant changes in higher-level benchmarks. But if there are some, they can be shown on speed.python.org.
I think all work at this stage is finished.
msg333776 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-16 17:11
I ran benchmarks on speed.python.org; it's the 5bb146aaea1484bcc117ab6cb38dda39ceb5df0f dot (Jan 13, 2019). I didn't look at the results.
History
Date | User | Action | Args
2022-04-11 14:59:09 | admin | set | github: 79763
2019-01-16 17:11:25 | vstinner | set | keywords: patch; messages: + msg333776
2019-01-12 06:30:26 | serhiy.storchaka | set | status: open -> closed; messages: + msg333516; keywords: patch; resolution: fixed; stage: patch review -> resolved
2019-01-12 06:25:45 | serhiy.storchaka | set | messages: + msg333513
2019-01-11 20:58:18 | pitrou | set | keywords: patch; nosy: + pitrou; messages: + msg333505
2019-01-11 17:23:43 | serhiy.storchaka | set | keywords: patch; messages: + msg333493
2019-01-11 17:19:15 | vstinner | set | keywords: patch; messages: + msg333491
2019-01-11 17:11:58 | scoder | set | messages: + msg333488
2019-01-11 17:09:39 | serhiy.storchaka | set | keywords: patch; messages: + msg333487
2019-01-11 17:04:27 | serhiy.storchaka | set | pull_requests: + pull_request11106
2019-01-11 17:04:17 | serhiy.storchaka | set | pull_requests: + pull_request11105
2019-01-11 17:04:08 | serhiy.storchaka | set | pull_requests: + pull_request11104
2019-01-11 16:01:55 | serhiy.storchaka | set | messages: + msg333482
2019-01-11 15:47:51 | vstinner | set | keywords: patch; messages: + msg333480
2019-01-11 15:41:27 | serhiy.storchaka | set | keywords: patch; messages: + msg333479
2019-01-11 15:30:25 | vstinner | set | keywords: patch; messages: + msg333478
2019-01-11 15:12:07 | vstinner | set | keywords: patch; files: + bench.py; messages: + msg333476
2019-01-11 15:09:21 | serhiy.storchaka | set | pull_requests: + pull_request11097
2019-01-11 15:09:13 | serhiy.storchaka | set | pull_requests: + pull_request11096
2019-01-11 15:09:03 | serhiy.storchaka | set | pull_requests: + pull_request11095
2019-01-11 14:01:18 | serhiy.storchaka | set | messages: + msg333469
2019-01-11 12:37:27 | scoder | set | messages: + msg333463
2019-01-11 12:14:24 | scoder | set | nosy: + scoder, - scode
2019-01-11 11:11:11 | serhiy.storchaka | set | keywords: patch; nosy: + scode; messages: + msg333458
2019-01-11 11:02:27 | vstinner | set | keywords: patch; messages: + msg333457
2019-01-11 10:57:33 | xtreak | set | keywords: patch; nosy: + xtreak; messages: + msg333455
2019-01-11 10:52:17 | serhiy.storchaka | set | keywords: patch; messages: + msg333453
2019-01-11 10:33:35 | vstinner | set | keywords: patch; messages: + msg333449
2019-01-11 10:09:17 | serhiy.storchaka | set | keywords: patch; messages: + msg333446
2019-01-06 18:28:58 | rhettinger | set | pull_requests: + pull_request10904
2019-01-06 18:28:53 | rhettinger | set | pull_requests: + pull_request10903
2019-01-05 16:43:58 | serhiy.storchaka | link | issue34838 dependencies
2019-01-05 15:14:40 | serhiy.storchaka | set | keywords: patch; nosy: + vstinner
2018-12-25 14:44:11 | serhiy.storchaka | set | keywords: + patch; stage: patch review; pull_requests: + pull_request10564
2018-12-25 14:44:07 | serhiy.storchaka | set | keywords: + patch; stage: (no value); pull_requests: + pull_request10563
2018-12-25 14:44:02 | serhiy.storchaka | set | keywords: + patch; stage: (no value); pull_requests: + pull_request10562
2018-12-25 14:40:52 | serhiy.storchaka | create |