Message286314
| Author | vstinner |
| Recipients | methane, python-dev, serhiy.storchaka, vstinner |
| Date | 2017-01-26 14:08:48 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1485439729.14.0.66968452831.issue29259@psf.upfronthosting.co.za> |
| In-reply-to | |

Content:
"While I feel your work is great, performance benefit seems very small,
compared complexity of this patch."
I have to agree. I spent a lot of time benchmarking these tp_fast* changes. While one or two benchmarks are faster, it's not really the case for the others.
I also agree about the complexity. In Python 3.6, most FASTCALL changes were internal. For example, PyObject_CallFunctionObjArgs() now uses FASTCALL internally, without requiring any change to callers of the API. I tried to only use _PyObject_FastCallDict()/_PyObject_FastCallKeywords() in the few places where the speedup was significant.
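As an illustration (not part of the original message), here is a minimal sketch of how a caller can use the private fast-call API to avoid building a temporary argument tuple. The helper name call_with_two_args is hypothetical, and the prototype assumed for _PyObject_FastCallDict() is the CPython 3.6-era private one, which has changed in later versions.

```c
#include <Python.h>

/* Hypothetical helper: call `callable(a, b)` without allocating an
 * argument tuple, via the private 3.6-era fast-call API.
 * Assumed prototype: _PyObject_FastCallDict(callable, args, nargs, kwargs). */
static PyObject *
call_with_two_args(PyObject *callable, PyObject *a, PyObject *b)
{
    PyObject *args[2] = {a, b};                               /* arguments as a plain C array */
    return _PyObject_FastCallDict(callable, args, 2, NULL);   /* NULL: no keyword arguments */
}
```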
The main visible change of the Python 3.6 FASTCALL work is the new METH_FASTCALL calling convention for C functions. Your change modifying print() to use METH_FASTCALL has a significant impact on the telco benchmark, with no drawback. I tested further changes to use METH_FASTCALL in the struct and decimal modules, and they optimize telco even more.
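For readers unfamiliar with the convention, this is a minimal, self-contained sketch of a METH_FASTCALL function (not taken from the patches discussed here). The demo_add/demo_methods names are hypothetical, and the signature shown is the 3.7+ positional-only form; the 3.6 convention also passed a kwnames tuple.

```c
#include <Python.h>

/* Hypothetical METH_FASTCALL function: arguments arrive as a C array
 * plus a count, so no temporary argument tuple is created per call. */
static PyObject *
demo_add(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
    if (nargs != 2) {
        PyErr_SetString(PyExc_TypeError, "add() expects exactly 2 arguments");
        return NULL;
    }
    return PyNumber_Add(args[0], args[1]);
}

static PyMethodDef demo_methods[] = {
    {"add", (PyCFunction)(void (*)(void))demo_add, METH_FASTCALL, "add(a, b)"},
    {NULL, NULL, 0, NULL}
};
```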
To continue the optimization work, I guess that using METH_FASTCALL in more cases, using Argument Clinic whenever possible, would have a more concrete and measurable impact on performance than this big tp_fastcall patch.
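As a rough sketch of what "using Argument Clinic" looks like in practice (my illustration, not part of the message): a clinic input block is written above the C implementation and Tools/clinic/clinic.py regenerates the argument-parsing wrapper, which can then use the fast calling convention. The demo module and add function below are hypothetical.

```c
/*[clinic input]
module demo
[clinic start generated code]*/

/*[clinic input]
demo.add

    a: object
    b: object
    /

Return a + b.
[clinic start generated code]*/

/* Running Tools/clinic/clinic.py on this file generates the parsing
 * wrapper and a DEMO_ADD_METHODDEF entry for the method table; the
 * hand-written part is only the _impl function below. */
static PyObject *
demo_add_impl(PyObject *module, PyObject *a, PyObject *b)
{
    return PyNumber_Add(a, b);
}
```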
But I'm not ready to abandon the whole approach yet, so I'm changing the status to Pending. I may come back in one or two months to check whether I missed anything obvious that could unlock even more optimizations ;-)
History

| Date | User | Action | Args |
|---|---|---|---|
| 2017-01-26 14:08:49 | vstinner | set | recipients: + vstinner, methane, python-dev, serhiy.storchaka |
| 2017-01-26 14:08:49 | vstinner | set | messageid: <1485439729.14.0.66968452831.issue29259@psf.upfronthosting.co.za> |
| 2017-01-26 14:08:49 | vstinner | link | issue29259 messages |
| 2017-01-26 14:08:48 | vstinner | create | |