This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015年05月06日 16:18 by larry, last changed 2022年04月11日 14:58 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| larry.range.hack.1.txt | larry, 2015年05月06日 16:18 | review | ||
| int_free_list.patch | serhiy.storchaka, 2015年05月06日 17:37 | review | ||
| Messages (15) | |||
|---|---|---|---|
| msg242685 - (view) | Author: Larry Hastings (larry) * (Python committer) | Date: 2015年05月06日 16:18 | |
This probably shouldn't be checked in. But it was an interesting experiment, and I did get it to work. My brother forwarded me this question from Stack Overflow: http://stackoverflow.com/questions/23453133/is-there-a-reason-python-3-enumerates-slower-than-python-2 The debate brought up a good point: a lot of the overhead of range() is in creating and destroying the long objects. I wondered, could I get rid of that? Long story short, yes. rangeiterobject is a special-case range object for when you're iterating over integer values and the values all fit inside C longs. Otherwise it has to use the much slower general-purpose longrangeiterobject.) rangeiter_next is simple: it computes the new value then returns PyLong_FromLong of that value. First thought: cache a reference to the previous value. If its reference count is 1, we have the only reference. Overwrite its value and return it. But that doesn't help in the general case, because in "for x in range(1000)" x is holding a reference at the time __next__ is called on the iterator. The trick: cache *two* old yielded objects. In the general case, by the second iteration, everyone else has dropped their references to the older cached object and we can modify it safely. The second trick: if the value you're yielding is one of the interned "small ints", you *have* to return the interned small int. (Otherwise you break 0 == 0, I kid you not.) With this patch applied all regression tests pass. And, on my desktop machine, the benchmark they used in the above link: ./python -mtimeit -n 5 -r 2 -s"cnt = 0" "for i in range(10000000): cnt += 1" drops from 410ms to 318ms ("5 loops, best of 2: 318 msec per loop"). This implementation requires the rangeiterobject to have intimate knowledge of the implementation of the PyLongObject, including copy-and-pasting some information that isn't otherwise exposed (max small int essentially). At the very least that information would need to be exposed properly so rangeiterobject could use it correctly before this could be checked in. It might be cleaner for longobject.c to expose a private "set this long object to this value" function. It would fail if we can't do it safely: if the value was an interned (small) int, or if the long was of the wrong size (Py_SIZE(o) != 1). Is this interesting enough to pursue? I'm happy to drop it. |
|||
| msg242686 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月06日 16:38 | |
This regression was just discussed in issue24076. General suggestion about free list for small ints was proposed. It could speed up other integer computations, but comprehensive benchmarking results are needed. See also similar issue23507 for tuples. Perhaps we need general solution for fast specialized free lists. |
|||
| msg242687 - (view) | Author: Stefan Behnel (scoder) * (Python committer) | Date: 2015年05月06日 16:38 | |
See issue 24076. |
|||
| msg242691 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月06日 17:37 | |
Here is a patch that adds a free list for 1-digit long objects. $ ./python -m timeit -s "r = range(10**3)" -- "for i in r: pass" Unpatched: 10000 loops, best of 3: 54.4 usec per loop With free list: 10000 loops, best of 3: 38 usec per loop With hacked range: 10000 loops, best of 3: 34.5 usec per loop Python 2.7: 10000 loops, best of 3: 37.1 usec per loop $ ./python -m timeit -s "r = list(range(10**3))" -- "for i in r: pass" 10000 loops, best of 3: 30.7 usec per loop In Python 2.7: $ ./python -m timeit -s "r = xrange(10**3)" -- "for i in r: pass" 10000 loops, best of 3: 41.4 usec per loop $ ./python -m timeit -s "r = range(10**3)" -- "for i in r: pass" 10000 loops, best of 3: 37.1 usec per loop |
|||
| msg242704 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2015年05月06日 19:55 | |
How does this apply where people like me scarcely if ever use range, it's standard for loops? There are plenty of recipes using itertools that show that you don't need range, is this really needed? No axe to grind, just curious. |
|||
| msg242707 - (view) | Author: Larry Hastings (larry) * (Python committer) | Date: 2015年05月06日 20:47 | |
10**3 doesn't show off this hack as much as other numbers would; the hack only operates from 257 to the max in that will fit in a single long "digit" (32767 on 32-bit, 2**30 on 64-bit). Anyway, freelist for one-digit longs seems nearly as good, and it's a lot more general-purpose, so I'm much more interested in that. |
|||
| msg242885 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2015年05月11日 08:57 | |
I like the idea of adding a free list of longs in Python 3, but I think we should extend this somewhat to cover more ground, e.g. by pre-allocating a block of 1 digit long objects, like we did for Python 2 ints, and perhaps allocate up to 4k (= 1 memory page) towards such a free list. The added cache locality of having the data in a pre-allocated block should provide some more performance. |
|||
| msg242886 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月11日 09:42 | |
According to my and Larry's measurements [1] the distribution of created int's by size during running Python tests is: On 32-bit Linux: int 42828741 13.40% 0 425353 0.99% 0.99% 1 21399290 49.96% 50.96% 2 10496856 24.51% 75.47% 3 4873346 11.38% 86.85% 4 1021563 2.39% 89.23% 5 1246444 2.91% 92.14% 6 733676 1.71% 93.85% 7 123074 0.29% 94.14% 8 139203 0.33% 94.47% ... On 64-bit Linux: int 47600237 13.53% 0 294964 0.62% 0.62% 1 36135772 75.92% 76.53% 2 4504046 9.46% 86.00% 3 2109837 4.43% 90.43% 4 1277995 2.68% 93.11% 5 542775 1.14% 94.25% 6 485451 1.02% 95.27% ... 86% of ints have size <= 3 on 32-bit and <= 2 on 64-bit. This is enough to represent 32-bit integers (as in Python 2 int). I think we should support free list not only for 1-digit ints, but at least up to 3 digit on 32-bit build and up to 2 digits on 64-bit build. Other natural limit is 3 digit on 64-bit build (enough to represent 64-bit C long on Linux or pointer on all platforms). Larger integers perhaps are encountered mainly in tests. Pre-allocating a block has a disadvantage. It is hard to free allocated block. The program can create a lot of integers, then drop most of them, and request the memory for other needs, but blocks once allocated for integers would not freed. This is not trivial design decision and should be discussed on Python-Dev and accepted by BDFL. |
|||
| msg242887 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月11日 09:43 | |
[1] http://comments.gmane.org/gmane.comp.python.devel/153078 |
|||
| msg242888 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2015年05月11日 10:09 | |
I'd rather stick to the simple freelist approach. People interested in (allegedly) more ambitious designs can open new issues, together with patches to evaluate their efficiency. |
|||
| msg242889 - (view) | Author: Larry Hastings (larry) * (Python committer) | Date: 2015年05月11日 10:10 | |
I'd rather have the general-purpose freelist for ints too. How about we close this issue now, and assuming the freelist for ints goes in we can abandon this approach entirely. |
|||
| msg242891 - (view) | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2015年05月11日 11:05 | |
On 11.05.2015 11:42, Serhiy Storchaka wrote: > > Pre-allocating a block has a disadvantage. It is hard to free allocated block. The program can create a lot of integers, then drop most of them, and request the memory for other needs, but blocks once allocated for integers would not freed. This is not trivial design decision and should be discussed on Python-Dev and accepted by BDFL. True, but if it's only 1-4k RAM, I don't think anyone would mind :-) Python 2 is doing exactly that with 1k RAM for the integer free list. I think it was one of the first free lists ever added to Python, so done in a time when RAM was expensive ;-). |
|||
| msg242892 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2015年05月11日 12:36 | |
Well, 1-4k RAM can be consumed just after the start up (it's only 100-300 integers) and never freed. What next? |
|||
| msg402137 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2021年09月18日 17:44 | |
Should we try the free list for integers again? |
|||
| msg402138 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2021年09月18日 17:55 | |
(Never mind, that should have gone to https://bugs.python.org/issue24165, which is the integer freelist -- conclusion there in 2015 was also that it didn't do much.) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:58:16 | admin | set | github: 68326 |
| 2021年09月18日 17:55:25 | gvanrossum | set | messages: + msg402138 |
| 2021年09月18日 17:44:32 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg402137 |
| 2015年05月11日 12:36:52 | serhiy.storchaka | set | messages: + msg242892 |
| 2015年05月11日 11:05:13 | lemburg | set | messages: + msg242891 |
| 2015年05月11日 10:10:56 | larry | set | status: open -> closed resolution: rejected messages: + msg242889 stage: patch review -> resolved |
| 2015年05月11日 10:09:17 | pitrou | set | messages: + msg242888 |
| 2015年05月11日 09:43:06 | serhiy.storchaka | set | messages: + msg242887 |
| 2015年05月11日 09:42:21 | serhiy.storchaka | set | messages: + msg242886 |
| 2015年05月11日 08:57:43 | lemburg | set | nosy:
+ lemburg messages: + msg242885 |
| 2015年05月07日 20:48:43 | ethan.furman | set | nosy:
+ ethan.furman |
| 2015年05月06日 20:47:57 | larry | set | messages: + msg242707 |
| 2015年05月06日 19:55:59 | BreamoreBoy | set | nosy:
+ BreamoreBoy messages: + msg242704 |
| 2015年05月06日 17:37:42 | serhiy.storchaka | set | files:
+ int_free_list.patch keywords: + patch |
| 2015年05月06日 17:37:23 | serhiy.storchaka | set | messages: + msg242691 |
| 2015年05月06日 16:38:52 | scoder | set | nosy:
+ scoder messages: + msg242687 |
| 2015年05月06日 16:38:08 | serhiy.storchaka | set | messages: + msg242686 |
| 2015年05月06日 16:28:20 | mark.dickinson | set | nosy:
+ mark.dickinson |
| 2015年05月06日 16:18:56 | larry | create | |