homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Improve list() pre-sizing for inputs with known lengths
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, miss-islington, pablogsal, rhettinger, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2018年04月05日 21:46 by rhettinger, last changed 2022年04月11日 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6493 closed pablogsal, 2018年04月17日 00:28
PR 9846 merged pablogsal, 2018年10月13日 17:01
PR 10200 merged pablogsal, 2018年10月28日 20:48
PR 11220 merged sir-sigurd, 2018年12月20日 08:05
PR 11899 merged rhettinger, 2019年02月16日 20:42
PR 14432 nmusolino, 2019年07月01日 08:43
Messages (14)
msg315006 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018年04月05日 21:46
The list() constructor isn't taking full advantage of known input lengths or length hints. Ideally, it should pre-size and not over-allocate when the input size is known or can be reasonably estimated.
Python 3.8.0a0 (heads/master:091e95e900, Apr 5 2018, 09:48:33)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sys import getsizeof
>>> getsizeof([0] * 10)
144
>>> getsizeof(list([0] * 10))
200
>>> getsizeof(list(range(10)))
200
msg315412 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018年04月17日 20:32
Calling PyObject_LengthHint() adds an overhead. It is significant for short sequences. I work on a patch that reduces it. PR 6493 adds the second call of PyObject_LengthHint() and increases the overhead.
As for this issue, in-place repeat overallocates too.
>>> a = [0]; a *= 10; getsizeof(a)
200
I think it would be better to make it not preallocating.
And maybe it would be worth to avoid overallocating if newsize > allocated + allocated/8 or something like.
msg315420 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年04月17日 21:07
Microbenchmarked with @vstinner's perf module:
python -m perf timeit "list([0]*10)" -o new.json
checkout master and build
python -m perf timeit "list([0]*10)" -o old.json
python -m perf compare_to new.json old.json
Mean +- std dev: [new] 241 ns +- 3 ns -> [old] 243 ns +- 18 ns: 1.01x slower (+1%)
Not significant!
msg315940 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018年04月30日 08:39
See also http://bugs.python.org/issue26814#msg264003 and bpo-26828.
msg315941 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018年04月30日 08:46
Should we shrink the list of the length hint was way too big? For example, if the length hint was 100 but the final list of only 10. Should we shrink in that case?
I'm asking because it seems like Pablo's PR doesn't shrink the list in that case.
I wasn't sure so I checked, del shrinks the list if needed:
>>> x=list(range(1000))
>>> import sys
>>> sys.getsizeof(x)
9112
>>> del x[1:]
>>> sys.getsizeof(x)
96
msg315973 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年04月30日 22:22
@serhiy.storchaka @rhettinger @vstinner Should we better make the pre-allocation if the length of the iterable is known (so we call PyObject_Length and not PyObject_LengthHint)? This will keep the logic simpler (so not shrinking if PyObject_LengthHint gives more than the real length) and it will not be as expensive as calling PyObject_LengthHint.
msg328738 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年10月28日 20:16
New changeset 372d705d958964289d762953d0a61622755f5386 by Pablo Galindo in branch 'master':
bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)
https://github.com/python/cpython/commit/372d705d958964289d762953d0a61622755f5386
msg328741 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018年10月28日 20:32
> bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)
Oh. Can you please document the optimization in the following doc as well?
https://docs.python.org/dev/whatsnew/3.8.html#optimizations 
msg328748 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年10月28日 20:49
>Oh. Can you please document the optimization in the following doc as well?
Sure! Opened https://github.com/python/cpython/pull/10200 to address that. :)
msg328762 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年10月28日 22:03
New changeset c61e229d2a4c54ffb4153e1f0f48126ba33c9cbf by Pablo Galindo in branch 'master':
bpo-33234: Add exact allocation optimization to lists in What's New (GH-10200)
https://github.com/python/cpython/commit/c61e229d2a4c54ffb4153e1f0f48126ba33c9cbf
msg332737 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018年12月29日 22:31
New changeset 0e5f771f38138714415f665651de7e674fcebc38 by Pablo Galindo (Sergey Fedoseev) in branch 'master':
bpo-33234: Simplify list_preallocate_exact() (GH-11220)
https://github.com/python/cpython/commit/0e5f771f38138714415f665651de7e674fcebc38
msg335717 - (view) Author: miss-islington (miss-islington) Date: 2019年02月16日 20:47
New changeset e182318e6a473e6e9341f1aab8e49f2b1035c674 by Miss Islington (bot) (Raymond Hettinger) in branch 'master':
bpo-33234: Add another attribution in Whatsnew (GH-11899)
https://github.com/python/cpython/commit/e182318e6a473e6e9341f1aab8e49f2b1035c674
msg363187 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2020年03月02日 15:42
Possible backward incompatibility caused by this issue: issue39829 
msg363587 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020年03月07日 09:22
See also bpo-14126: "Speed up list comprehensions by preallocating the list where possible".
History
Date User Action Args
2022年04月11日 14:58:59adminsetgithub: 77415
2020年03月07日 09:22:57vstinnersetmessages: + msg363587
2020年03月07日 09:22:27vstinnerunlinkissue14126 superseder
2020年03月07日 04:13:26ammar2linkissue14126 superseder
2020年03月02日 15:42:00eric.snowsetnosy: + eric.snow
messages: + msg363187
2019年07月01日 08:43:48nmusolinosetpull_requests: + pull_request14316
2019年02月16日 20:47:50miss-islingtonsetnosy: + miss-islington
messages: + msg335717
2019年02月16日 20:42:26rhettingersetpull_requests: + pull_request11926
2018年12月29日 22:33:35pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018年12月29日 22:31:39pablogsalsetmessages: + msg332737
2018年12月20日 08:05:14sir-sigurdsetpull_requests: + pull_request10490
2018年10月28日 22:03:22pablogsalsetmessages: + msg328762
2018年10月28日 20:49:56pablogsalsetmessages: + msg328748
2018年10月28日 20:48:53pablogsalsetpull_requests: + pull_request9518
2018年10月28日 20:32:24vstinnersetmessages: + msg328741
2018年10月28日 20:16:30pablogsalsetmessages: + msg328738
2018年10月15日 13:48:17rhettingersetnosy: + tim.peters
2018年10月13日 17:01:40pablogsalsetpull_requests: + pull_request9218
2018年04月30日 22:22:04pablogsalsetmessages: + msg315973
2018年04月30日 08:46:25vstinnersetmessages: + msg315941
2018年04月30日 08:39:36vstinnersetnosy: + vstinner
messages: + msg315940
2018年04月17日 21:07:31pablogsalsetnosy: + pablogsal
messages: + msg315420
2018年04月17日 20:32:06serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg315412
2018年04月17日 00:28:44pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request6190
2018年04月05日 21:46:53rhettingercreate

AltStyle によって変換されたページ (->オリジナル) /