Message286314
| Author | vstinner |
| Recipients | methane, python-dev, serhiy.storchaka, vstinner |
| Date | 2017-01-26 14:08:48 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1485439729.14.0.66968452831.issue29259@psf.upfronthosting.co.za> |
| In-reply-to | |

Content:
"While I feel your work is great, performance benefit seems very small,
compared complexity of this patch."
I have to agree. I spent a lot of time benchmarking these tp_fast* changes. While one or two benchmarks are faster, it's not really the case for the others.
I also agree about the complexity. In Python 3.6, most FASTCALL changes were internal. For example, PyObject_CallFunctionObjArgs() now uses FASTCALL internally, without requiring any change to callers of the API. I tried to only use _PyObject_FastCallDict()/_PyObject_FastCallKeywords() in the few places where the speedup was significant.
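As an illustration (not part of the original message), here is a minimal sketch of how a caller can use the private fast-call API to avoid building a temporary argument tuple. The helper name call_with_two_args is hypothetical, and the prototype assumed for _PyObject_FastCallDict() is the CPython 3.6-era private one, which has changed in later versions.

```c
#include <Python.h>

/* Hypothetical helper: call `callable(a, b)` without allocating an
 * argument tuple, via the private 3.6-era fast-call API.
 * Assumed prototype: _PyObject_FastCallDict(callable, args, nargs, kwargs). */
static PyObject *
call_with_two_args(PyObject *callable, PyObject *a, PyObject *b)
{
    PyObject *args[2] = {a, b};                               /* arguments as a plain C array */
    return _PyObject_FastCallDict(callable, args, 2, NULL);   /* NULL: no keyword arguments */
}
```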
The main visible change of the Python 3.6 FASTCALL work is the new METH_FASTCALL calling convention for C functions. Your change modifying print() to use METH_FASTCALL has a significant impact on the telco benchmark, with no drawback. I tested further changes to use METH_FASTCALL in the struct and decimal modules, and they optimize telco even more.
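For readers unfamiliar with the convention, this is a minimal, self-contained sketch of a METH_FASTCALL function (not taken from the patches discussed here). The demo_add/demo_methods names are hypothetical, and the signature shown is the 3.7+ positional-only form; the 3.6 convention also passed a kwnames tuple.

```c
#include <Python.h>

/* Hypothetical METH_FASTCALL function: arguments arrive as a C array
 * plus a count, so no temporary argument tuple is created per call. */
static PyObject *
demo_add(PyObject *module, PyObject *const *args, Py_ssize_t nargs)
{
    if (nargs != 2) {
        PyErr_SetString(PyExc_TypeError, "add() expects exactly 2 arguments");
        return NULL;
    }
    return PyNumber_Add(args[0], args[1]);
}

static PyMethodDef demo_methods[] = {
    {"add", (PyCFunction)(void (*)(void))demo_add, METH_FASTCALL, "add(a, b)"},
    {NULL, NULL, 0, NULL}
};
```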
To continue the optimization work, I guess that using METH_FASTCALL in more cases, using Argument Clinic whenever possible, would have a more concrete and measurable impact on performance than this big tp_fastcall patch.
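As a rough sketch of what "using Argument Clinic" looks like in practice (my illustration, not part of the message): a clinic input block is written above the C implementation and Tools/clinic/clinic.py regenerates the argument-parsing wrapper, which can then use the fast calling convention. The demo module and add function below are hypothetical.

```c
/*[clinic input]
module demo
[clinic start generated code]*/

/*[clinic input]
demo.add

    a: object
    b: object
    /

Return a + b.
[clinic start generated code]*/

/* Running Tools/clinic/clinic.py on this file generates the parsing
 * wrapper and a DEMO_ADD_METHODDEF entry for the method table; the
 * hand-written part is only the _impl function below. */
static PyObject *
demo_add_impl(PyObject *module, PyObject *a, PyObject *b)
{
    return PyNumber_Add(a, b);
}
```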
But I'm not ready to abandon the whole approach yet, so I'm changing the status to Pending. I may come back in one or two months to check whether I missed anything obvious that could unlock even more optimizations ;-)
History

| Date | User | Action | Args |
|---|---|---|---|
| 2017-01-26 14:08:49 | vstinner | set | recipients: + vstinner, methane, python-dev, serhiy.storchaka |
| 2017-01-26 14:08:49 | vstinner | set | messageid: <1485439729.14.0.66968452831.issue29259@psf.upfronthosting.co.za> |
| 2017-01-26 14:08:49 | vstinner | link | issue29259 messages |
| 2017-01-26 14:08:48 | vstinner | create | |