This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015年03月01日 00:50 by rhettinger, last changed 2022年04月11日 14:58 by admin.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| size_t.diff | rhettinger, 2015年03月01日 06:19 | Fix deque casts | review | |
| bounds_check_list.diff | rhettinger, 2015年03月01日 07:14 | Faster bounds checking for lists. | review | |
| bounds_check_deque.diff | rhettinger, 2015年03月01日 07:50 | Faster bounds checking for deques | review | |
| valid_index.diff | rhettinger, 2015年03月02日 04:13 | Isolate the technique in an in-lineable function call | review | |
| Messages (13) | |||
|---|---|---|---|
| msg236928 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 00:50 | |
Python's core is full of bound checks like this one in Objects/listobject.c: static PyObject * list_item(PyListObject *a, Py_ssize_t i) { if (i < 0 || i >= Py_SIZE(a)) { ... Abner Fog's high-level language optimization guide, http://www.agner.org/optimize/optimizing_cpp.pdf in section 14.2 Bounds Checking, shows a way to fold this into a single check: - if (i < 0 || i >= Py_SIZE(a)) { + if ((unsigned)i >= (unsigned)(Py_SIZE(a))) { if (indexerr == NULL) { indexerr = PyUnicode_FromString( "list index out of range"); The old generated assembly code looks like this: _list_item: subq 8,ドル %rsp testq %rsi, %rsi js L227 cmpq 16(%rdi), %rsi jl L228 L227: ... <error reporting and exit > ... L228: movq 24(%rdi), %rax movq (%rax,%rsi,8), %rax addq 1,ドル (%rax) addq 8,ドル %rsp ret The new disassembly looks like this: _list_item: cmpl %esi, 16(%rdi) ja L227 ... <error reporting and exit > ... L227: movq 24(%rdi), %rax movq (%rax,%rsi,8), %rax addq 1,ドル (%rax) ret Note, the new code not only saves a comparison/conditional-jump pair, it also avoids the need to adjust %rsp on the way in and the way out for a net savings of four instructions along the critical path. When we have good branch prediction, the current approach is very low cost; however, Abner Fog's recommendation is never more expensive, is sometimes cheaper, saves a possible misprediction, and reduces the total code generated. All in all, it is a net win. I recommend we put in a macro of some sort so that this optimization gets expressed exactly once in the code and so that it has a good clear name with an explanation of what it does. |
|||
| msg236931 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年03月01日 05:39 | |
Yes, this is a technique commonly used in STL implementations. This is why sizes and indices in STL are unsigned.
But in CPython implementation sizes are signed (Py_ssize_t). The problem with using this optimization (rather low-level than high-level) is that we need to know unsigned version of the type of compared values.
> - if (i < 0 || i >= Py_SIZE(a)) {
> + if ((unsigned)i >= (unsigned)(Py_SIZE(a))) {
Here is a bug. The type of i and Py_SIZE(a) is Py_ssize_t, so when casted to unsigned int, highest bits are lost. The correct casting type is size_t.
In changeset 5942fd9ab335 you introduced a bug.
|
|||
| msg236933 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 06:09 | |
> The type of i and Py_SIZE(a) is Py_ssize_t, so when casted to > unsigned int, highest bits are lost. The correct casting type is size_t. Yes, I had just seen that a early today and deciding whether to substitute size_t for the unsigned cast or whether to just revert. I believe size_t is guaranteed to hold any array index and that a cast from non-negative Py_ssize_t would not lose bits. > But in CPython implementation sizes are signed (Py_ssize_t). > The problem with using this optimization (rather low-level > than high-level) is that we need to know unsigned version of > the type of compared values. Wouldn't size_t always work for Py_ssize_t? |
|||
| msg236934 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年03月01日 06:15 | |
> Wouldn't size_t always work for Py_ssize_t? Yes. But it wouldn't work for say off_t. The consistent way is always use size_t instead of Py_ssize_t. But this boat has sailed. |
|||
| msg236935 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 06:21 | |
> But it wouldn't work for say off_t. I'm only proposing a bounds checking macro for the Py_ssize_t case which is what all of our IndexError tests look for. Also, please look at the attached deque fix. |
|||
| msg236938 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年03月01日 06:59 | |
It looks correct to me, but I would change type and introduce few new variables to get rid of casts. |
|||
| msg236939 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 07:14 | |
Attaching a diff for the bounds checking Objects/listobject.c. It looks like elsewhere in that file, (size_t) casts are done for various reasons. |
|||
| msg236940 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 07:50 | |
Also attaching a bounds checking patch for deques. |
|||
| msg236942 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年03月01日 08:13 | |
Parenthesis around Py_SIZE() are redundant. Are there any benchmarking results that show a speed up? Such microoptimization makes sense in tight loops, but optimized source code looks more cumbersome and errorprone. |
|||
| msg236945 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月01日 08:36 | |
I think the source in listobject.c would be benefit from a well-named macro for this. That would provide the most clarity. For deques, I'll just put in the simple patch because it only applies to a place that is already doing unsigned arithmetic/comparisons. FWIW, I don't usually use benchmarking on these kinds of changes. The generated assembler is sufficiently informative. Benchmarking each tiny change risks getting trapped in a local minimum. Also, little timeit tests tend to branch the same way every time (which won't show the cost of prediction misses) and it tends to have all code and data in cache (so you don't see the effects of cache misses) and it risks tuning to a single processor (in my case a haswell). Instead, I look at the code generated by GCC and CLang to see that it does less work. |
|||
| msg236946 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2015年03月01日 08:38 | |
New changeset 1e89094998b2 by Raymond Hettinger in branch 'default': Issue #23553: Use an unsigned cast to tighten-up the bounds checking logic. https://hg.python.org/cpython/rev/1e89094998b2 |
|||
| msg236948 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年03月01日 08:48 | |
My point is that if the benefit is too small (say < 5% in microbenchmarks), it is not worth code churning. Actually my bar for microbenchmarks is higher, about 20%. |
|||
| msg237011 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2015年03月02日 04:13 | |
FWIW, here is a small patch to show how this could can be done consistently and with code clarity. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:13 | admin | set | github: 67741 |
| 2015年03月02日 04:13:50 | rhettinger | set | files:
+ valid_index.diff messages: + msg237011 title: Reduce the number of comparison for range checking. -> Reduce the number of comparisons for range checking. |
| 2015年03月01日 12:15:21 | Arfrever | set | nosy:
+ Arfrever |
| 2015年03月01日 08:48:56 | serhiy.storchaka | set | messages: + msg236948 |
| 2015年03月01日 08:38:09 | python-dev | set | nosy:
+ python-dev messages: + msg236946 |
| 2015年03月01日 08:36:42 | rhettinger | set | messages: + msg236945 |
| 2015年03月01日 08:13:59 | serhiy.storchaka | set | nosy:
+ pitrou messages: + msg236942 |
| 2015年03月01日 07:50:31 | rhettinger | set | files:
+ bounds_check_deque.diff messages: + msg236940 |
| 2015年03月01日 07:14:55 | rhettinger | set | files:
+ bounds_check_list.diff messages: + msg236939 |
| 2015年03月01日 06:59:06 | serhiy.storchaka | set | messages: + msg236938 |
| 2015年03月01日 06:21:43 | rhettinger | set | messages: + msg236935 |
| 2015年03月01日 06:19:20 | rhettinger | set | files:
+ size_t.diff keywords: + patch |
| 2015年03月01日 06:15:23 | serhiy.storchaka | set | messages: + msg236934 |
| 2015年03月01日 06:09:15 | rhettinger | set | messages: + msg236933 |
| 2015年03月01日 05:39:46 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg236931 |
| 2015年03月01日 00:50:38 | rhettinger | create | |