homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: decimal: Use FASTCALL and/or Argument Clinic
Type: performance Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: skrah Nosy List: malin, rhettinger, serhiy.storchaka, skrah, vstinner
Priority: normal Keywords: patch

Created on 2017年01月17日 16:38 by vstinner, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
decimal.patch vstinner, 2017年01月17日 16:38
decimal-2.patch vstinner, 2017年01月27日 15:31 review
Messages (23)
msg285665 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月17日 16:38
I'm currently working on the isuse #29259: "Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects". I used bm_telco of the performance benchmark suite to check which functions still require to create a temporary tuple for positional arguments and a temporary dict for keyword arguments. I found 3 remaining functions which have an impact on the result of the benchmark:
* print(): optimized by the issue #29296
* _struct.unpack(): I just created the issue #29300 "Modify the _struct module to use FASTCALL and Argument Clinic"
* _decimal.Decimal.quantize()
I would like to know if Stephan would be ok to modify the _decimal module to use FASTCALL. I know that recently he reverted changes to keep the same code base on Python 3.5, 3.6 and 3.7.
With 4 changes (tp_fastcall #29259, print #29296, unpack #29300 and this issue), bm_telco becomes 22% faster which is not negligible!
 20.9 ms +- 0.5 ms => 16.4 ms +- 0.5 ms
Attached decimal.patch patch is the strict minimum change to optimize bm_telco, but I would prefer to change all _decimal functions and methods using METH_VARARGS and METH_VARARGS|METH_KEYWORDS to convert them to METH_FASTCALL.
The best would be to use Argument Clinic. AC exists since Python 3.5, so it should be possible to keep the same code base on Python 3.5-3.7, only generated code would be different.
msg285670 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月17日 17:10
Oh wait, I'm not sure that attached patch has a significant impact on performances. It seems like the speedup mostly comes from the print patch:
http://bugs.python.org/issue29296#msg285668
But well, it is still interesting to use METH_FASTCALL in decimal ;-)
msg285671 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017年01月17日 17:12
See msg207652 in issue20177.
msg285673 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017年01月17日 17:29
This should not go in without Stephan Krah's approval. This code is finely tuned and carefully arranged.
msg285680 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月17日 20:42
Raymond Hettinger: "This should not go in without Stephan Krah's approval. This code is finely tuned and carefully arrang"
Sure, that's why I opened this issue.
Serhiy Storchaka: "See msg207652 in issue20177."
Oh, I didn't know that Stefan Krah already gave his opinion.
Since the performance impact is unclear, I prefer to close this issue. I will come back if argument parsing is more clearly identified as a performance bottleneck.
msg286359 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月27日 14:56
Hello, I'm back! :-)
I almost abandonned my the tp_fastcall change (#29259).
print() now uses FASTCALL (#29296), it already made bm_telco faster:
https://speed.python.org/timeline/#/?exe=5&ben=telco&env=1&revs=50&equid=off&quarts=on&extr=on
(see the speedup around January 16 at the right)
For the unpack module, I'm still working on my patch in the issue #29300, but it doesn't seem to have a significant impact on bm_telco.
So the remaining question is the usage of FASTCALL or Argument Clinic in the _decimal module.
> Since the performance impact is unclear, I prefer to close this issue. I will come back if argument parsing is more clearly identified as a performance bottleneck.
I ran a new benchmark and FASTCALL makes bm_telco benchmark 1.11x faster:
Median +- std dev: [ref] 19.4 ms +- 0.7 ms -> [decimal-2.patch] 17.5 ms +- 0.6 ms: 1.11x faster (-10%)
So I decided to reopen the issue.
Stefan: what do you think of using Argument Clinic and/or FASTCALL in _decimal? Is "bm_telco 1.11x faster" significant enough for you to justify such change?
"The best would be to use Argument Clinic. AC exists since Python 3.5, so it should be possible to keep the same code base on Python 3.5-3.7, only generated code would be different."
msg286368 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017年01月27日 15:23
I would dissuade from direct using of FASTCALL, but Argument Clinic is enough reliable. I would recommend to use Argument Clinic unless Stefan supports external fork of the decimal module and is limited by public API.
msg286369 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月27日 15:31
Oops, I forgot to attach decimal-2.patch, here you have.
> I would dissuade from direct using of FASTCALL, but Argument Clinic is enough reliable.
I wrote decimal-2.patch in a few minutes, when I was trying to find all functions still calling _PyStack_AsTuple() in my tp_fast{new,init,call} fork. I just wrote it to identify which parts of CPython can be optimized to make bm_telco faster.
I agree that Argument Clinic should be preferred over using directly METH_FASTCALL, especially for the _decimal module: Stefan already wrote that he wants to have the same code base on Python 3.5, 3.6 and 3.7.
> I would recommend to use Argument Clinic unless Stefan supports external fork of the decimal module and is limited by public API.
What is the status of Argument Clinic? Is it something "public" or not?
The status of _decimal is also unclear to me. Is it part of CPython or not? :-) I know that there is also a "backported" module for Python 2:
 https://pypi.python.org/pypi/cdecimal
 http://www.bytereef.org/mpdecimal/index.html
IMHO it's a great tool. Maybe it's time to make it usable outside CPython in Python 3.7? Or maybe we should wait until the remaining missing features are implemented? Issues #20291 and #29299 for example.
I'm now waiting for Stefan's feedback ;-)
msg286371 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017年01月27日 15:43
I'll take a look when I have the opportunity.
AC will not happen: It makes the module too large and unreadable.
msg286372 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017年01月27日 15:45
> What is the status of Argument Clinic? Is it something "public" or not?
No, it is for for internal CPython use only. It lacks some features (support 
of var-positional and var-keyword parameters, optional parameters without 
default value), the syntax of positional-only parameters is not officially 
accepted still, and future optimizations can require incompatible changes.
Only when all CPython builtins and extensions be converted to Argument Clinic,
PEP 457 (or an alternative) be accepted, Argument Clinic issues be resolved, 
we could say it stable enough. For now even Argument Clinic tests are broken.
msg286374 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年01月27日 15:52
Stefan Krah:
> AC will not happen: It makes the module too large and unreadable.
Ah you dislike the additional [clinic input] sections?
It's kind of strange when you have to convert existing code, when once
the code uses AC, I prefer AC to separated documentation variables. It
helps to keep docstrings more up to date, and it helps to enhance the
API (ex: allow keywords, rename parameters to better names, etc.). It
also helps to make the documentation closer to the code, which is IMHO
a good thing :-)
IMHO the PyArg_ParseXXX() calls and their "kwlist" static variable are
"unreadable", and I'm happy to be able to hide them!
FYI decimal-2.patch replaces PyArg_ParseTupleAndKeywords() with
_PyArg_ParseStackAndKeywords() with static _PyArg_Parser object. This
object only decides keyword names once and is more efficient to parse
arguments. It explains partially the speedup. Only partially because
bm_telco only calls the .quantize() method, and it only uses
positional arguments (no keyword arguments) ;-)
msg286375 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017年01月27日 16:01
> STINNER Victor added the comment:
> > AC will not happen: It makes the module too large and unreadable.
> 
> Ah you dislike the additional [clinic input] sections?
Yes, they tear apart the code. I stopped reading many C files because
of this. Brett asked why several people don't review, this is actually
*one* of the reasons for me.
> It's kind of strange when you have to convert existing code, when once
> the code uses AC, I prefer AC to separated documentation variables. It
> helps to keep docstrings more up to date, and it helps to enhance the
> API (ex: allow keywords, rename parameters to better names, etc.). It
> also helps to make the documentation closer to the code, which is IMHO
> a good thing :-)
Apparently it works for several people, but not me.
> IMHO the PyArg_ParseXXX() calls and their "kwlist" static variable are
> "unreadable", and I'm happy to be able to hide them!
> 
> FYI decimal-2.patch replaces PyArg_ParseTupleAndKeywords() with
> _PyArg_ParseStackAndKeywords() with static _PyArg_Parser object. This
> object only decides keyword names once and is more efficient to parse
> arguments. It explains partially the speedup. Only partially because
> bm_telco only calls the .quantize() method, and it only uses
> positional arguments (no keyword arguments) ;-)
Okay, thanks!
msg286448 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017年01月29日 19:47
>> Ah you dislike the additional [clinic input] sections?
>
> Yes, they tear apart the code. I stopped reading many C files because
> of this.
I concur with Stefan. AC is very impactful on modules, greatly affecting their readability and maintainability. The existing PyArg_ParseXXX() calls and their "kwlist" static variable are very easy to work with, easy to modify, and easy to teach (I cover them effortlessly in the Python classes I teach). In contrast, AC is much harder to learn, harder to get right, doesn't fit some existing APIs, harder to change, and it greatly expands the number of lines of code.
msg287131 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017年02月06日 13:58
If the API can still change (msg287083), I think I'll wait with this until shortly before 3.7 beta.
msg287133 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年02月06日 14:01
> If the API can still change (msg287083), I think I'll wait with this until shortly before 3.7 beta.
I do not understand why Serhiy keeps repeating that the API is going to change, I have no such plan. IMHO the FASTCALL API is now very well defined: (PyObject **args, Py_ssize_t nargs, PyObject *kwnames), and helper functions are now well tested.
We might optimize Argument Clinic further, but for decimal, it seems like it's a no-no and that direct usage of METH_FASTCALL is preferred.
So no, I don't plan or expect any API change.
msg287149 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年02月06日 16:08
Oh ok, it seems like Serhiy wants to change METH_FASTCALL. I didn't know :-) He just opened the issue #29464: "Specialize FASTCALL for functions with positional-only parameters". Interesting idea.
msg297135 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017年06月28日 01:47
So Stefan: what do you want to do? Use FASTCALL or not? :-)
I would prefer to not wait until the beta if you are ok to use FASTCALL.
Note: it seems like bpo-29464 is going to be approved ;-)
msg297153 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017年06月28日 05:20
I think I'll wait until #29464 is committed and the API is considered frozen (see msg295176?).
msg318123 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018年05月29日 22:14
Sorry, I lost interest in this optimization. If someone wants to work on it, please open a new issue.
msg338430 - (view) Author: Ma Lin (malin) * Date: 2019年03月20日 02:01
> AC will not happen: It makes the module too large and unreadable.
AC has great performance improvements these days: issue35582 and issue36127
It's quite worthy of reconsidering this decision.
msg338462 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2019年03月20日 12:17
Thanks, but it is still not going to happen. Look at the increased code size
in e.g. blake2s_impl.c.h.
I want to know what is going on in the code. Also, the performance
improvements are in argument parsing, but not when you have numerical
code like a * b, where a and b are already decimals.
msg338513 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年03月20日 23:36
> Also, the performance improvements are in argument parsing, but not when you have numerical code like a * b, where a and b are already decimals.
If the function call takes around 100 ns, the benefit of FASTCALL and the new more efficient function to parse arguments becomes quite significant. Some Decimal methods are that fast if not faster.
python3 -m perf timeit \
 -s 'import decimal; a=decimal.Decimal(1); b=decimal.Decimal(2); ctx=decimal.getcontext()' \
 'ctx.copy_sign(a, b)' \
 --duplicate=1024
=> Mean +- std dev: [ref] 148 ns +- 1 ns -> [fastcall] 98.9 ns +- 4.9 ns: 1.49x faster (-33%)
./python -m perf timeit \
 -s 'import decimal; a=decimal.Decimal(1); b=decimal.Decimal(2)' \
 'a.copy_sign(b)' \
 --duplicate=1024
=> Mean +- std dev: [ref] 123 ns +- 5 ns -> [fastcall] 86.5 ns +- 1.4 ns: 1.42x faster (-29%)
I wrote a quick & dirty change using directly METH_FASTCALL with _PyArg_ParseStackAndKeywords() (dec_mpd_qcopy_sign) and _PyArg_CheckPositional() (ctx_mpd_qcopy_sign). Using Argument Clinic may be more verbose, but it generates *even more* efficient code for functions accepting keyword arguments (like Decimal.copy_sign) and generate better docstring (at least, the signature for function parameters if no text is given).
Obviously, if your application only uses large numbers, the benefit will be invisible. But I guess that some people use decimal with "small" numbers ;-)
Note: Sure, you're right that operators like "a * b" wouldn't benefit from FASTCALL since they don't parse explicitly arguments. BINARY_MULTIPLY instruction gets directly 2 values from the Python stack and pass them to PyNumber_Multiply() with is defined to always take exactly 2 PyObject* in C.
msg338516 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019年03月21日 00:24
Hum, after reading again my previous, I'm not sure that my intent was clear. I'm fine with Stefan rejecting the optimization. He is the maintainer of decimal. I just wanted to comment what he said ;-)
History
Date User Action Args
2022年04月11日 14:58:42adminsetgithub: 73487
2019年03月21日 00:24:41vstinnersetmessages: + msg338516
2019年03月20日 23:36:13vstinnersetmessages: + msg338513
2019年03月20日 12:17:57skrahsetmessages: + msg338462
2019年03月20日 02:01:04malinsetnosy: + malin
messages: + msg338430
2018年05月29日 22:14:26vstinnersetstatus: open -> closed
resolution: out of date
messages: + msg318123

stage: resolved
2017年06月28日 05:20:21skrahsetmessages: + msg297153
2017年06月28日 01:47:33vstinnersetmessages: + msg297135
2017年02月06日 16:08:36vstinnersetmessages: + msg287149
2017年02月06日 14:01:22vstinnersetmessages: + msg287133
2017年02月06日 13:58:08skrahsetmessages: + msg287131
2017年01月29日 19:47:51rhettingersetmessages: + msg286448
2017年01月27日 16:01:24skrahsetmessages: + msg286375
2017年01月27日 15:52:55vstinnersetmessages: + msg286374
2017年01月27日 15:45:46serhiy.storchakasetmessages: + msg286372
2017年01月27日 15:43:04skrahsetmessages: + msg286371
2017年01月27日 15:31:32vstinnersetfiles: + decimal-2.patch

messages: + msg286369
2017年01月27日 15:23:48serhiy.storchakasetmessages: + msg286368
2017年01月27日 14:56:35vstinnersetstatus: closed -> open
resolution: rejected -> (no value)
messages: + msg286359
2017年01月17日 20:42:06vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg285680
2017年01月17日 17:29:46rhettingersetassignee: skrah

messages: + msg285673
nosy: + rhettinger
2017年01月17日 17:12:19serhiy.storchakasetmessages: + msg285671
2017年01月17日 17:10:02vstinnersetmessages: + msg285670
2017年01月17日 16:38:48vstinnersetnosy: + serhiy.storchaka
2017年01月17日 16:38:44vstinnercreate

AltStyle によって変換されたページ (->オリジナル) /