
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: List object memory allocator
Type: performance
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7
process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: alecsandru.patrascu, catalin.manciu, florin.papa, methane, terry.reedy, vstinner
Priority: normal
Keywords: patch

Created on 2016-02-18 14:26 by catalin.manciu, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                    Uploaded                          Description
listobject_CPython3.patch    catalin.manciu, 2016-02-18 14:26  Patch for CPython 3.x
listobject_CPython2.patch    catalin.manciu, 2016-02-18 14:27  Patch for CPython 2.7.x
listobject_CPython2-2.patch  methane, 2016-12-28 13:26
listobject_CPython3-2.patch  methane, 2016-12-28 20:05
Messages (17)
msg260459 - Author: Catalin Gabriel Manciu (catalin.manciu) * Date: 2016-02-18 14:26
Hi All,
This is Catalin from the Server Scripting Languages Optimization Team at Intel Corporation. I would like to submit a patch that replaces the 'malloc' allocator used by the list object (Objects/listobject.c) with the small object allocator (obmalloc.c) and simplifies the 'list_resize' function by removing a redundant check and properly handling resizing to zero.
Replacing PyMem_* calls with PyObject_* inside the list implementation is beneficial because many PyMem_* calls request sizes that are better handled by the small object allocator. For example, when running Tools/pybench.py -w 1, a total of 48,295,840 allocation requests are made by the list implementation (either by using 'PyMem_MALLOC' directly or by calling 'PyMem_RESIZE'), of which 42,581,993 (88%) request sizes that the small object allocator can handle (less than or equal to 512 bytes).
The changes to 'list_resize' further improve performance by removing a redundant check and handling the 'resize to zero' case separately. The 'empty' state of a list, as established by 'PyList_New', has the 'ob_item' pointer set to NULL and the 'ob_size' and 'allocated' members equal to 0. Previously, when called with a size of zero, 'list_resize' would set 'ob_size' and 'allocated' to zero, but it would also call 'PyMem_RESIZE', which by design calls 'realloc' with a size of 1, needlessly allocating 1 byte and setting 'ob_item' to the newly obtained address. The proposed implementation simply frees the buffer pointed to by 'ob_item' and sets 'ob_size', 'allocated' and 'ob_item' to zero when receiving a 'resize to zero' request.
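For illustration, here is a minimal C sketch of the two changes described above, assuming direct access to the PyListObject fields; the function name and the elided over-allocation logic are placeholders, not the actual patch:

    /* Hedged sketch only: routes the item buffer through the small-object
     * allocator (PyObject_*) and treats a resize to zero as a return to the
     * empty state produced by PyList_New(0). */
    #include <Python.h>

    static int
    list_resize_sketch(PyListObject *self, Py_ssize_t newsize)
    {
        if (newsize == 0) {
            /* Free the buffer instead of realloc'ing it to a tiny block. */
            PyObject_FREE(self->ob_item);
            self->ob_item = NULL;
            Py_SIZE(self) = 0;     /* Py_SET_SIZE(self, 0) on CPython >= 3.9 */
            self->allocated = 0;
            return 0;
        }
        /* The real list_resize() over-allocates here; growth pattern elided. */
        size_t nbytes = (size_t)newsize * sizeof(PyObject *);
        PyObject **items = (PyObject **)PyObject_REALLOC(self->ob_item, nbytes);
        if (items == NULL) {
            PyErr_NoMemory();
            return -1;
        }
        self->ob_item = items;
        Py_SIZE(self) = newsize;
        self->allocated = newsize;
        return 0;
    }

PyObject_REALLOC() serves requests up to the small-object threshold from pymalloc pools and falls back to the system allocator for larger ones, which is where the reported speedups come from.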
Hardware and OS Configuration
=============================
Hardware: Intel XEON (Haswell-EP) 36 Cores / Intel XEON (Broadwell-EP) 36 Cores
BIOS settings: Intel Turbo Boost Technology: false
 Hyper-Threading: false 
OS: Ubuntu 14.04.2 LTS
OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce
 run-to-run variation, via "echo 0 > /proc/sys/kernel/randomize_va_space";
 CPU frequency fixed at 2.3 GHz
GCC version: 5.1.0
Benchmark: Grand Unified Python Benchmark from
 https://hg.python.org/benchmarks/
Measurements and Results
========================
A. Repository:
 GUPB Benchmark:
 hg id : 9923b81a1d34 tip
 hg --debug id -i : 9923b81a1d346891f179f57f8780f86dcf5cf3b9
 CPython3:
 hg id : 733a902ac816 tip
 hg id -r 'ancestors(.) and tag()': 737efcadf5a6 (3.4) v3.4.4
 hg --debug id -i : 733a902ac816bd5b7b88884867ae1939844ba2c5
 CPython2:
 hg id : 5715a6d9ff12 (2.7)
 hg id -r 'ancestors(.) and tag()': 6d1b6a68f775 (2.7) v2.7.11
 hg --debug id -i : 5715a6d9ff12053e81f7ad75268ac059b079b351
B. Results:
CPython2 and CPython3 sample results, measured on a Haswell and a Broadwell platform, can be viewed in Tables 1, 2, 3 and 4. The first column (Benchmark) is the benchmark name and the second (%D) is the speedup in percent relative to the unpatched version.
Table 1. CPython 3 results on Intel XEON (Haswell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 20.27
regex_effbot 6.07
fannkuch 5.87
mako_v2 5.19
meteor_contest 4.31
simple_logging 3.98
nqueens 3.40
json_dump_v2 3.14
fastpickle 2.16
django_v3 2.03
tornado_http 1.90
pathlib 1.84
fastunpickle 1.81
call_simple 1.75
nbody 1.60
etree_process 1.58
go 1.54
call_method_unknown 1.53
2to3 1.26
telco 1.04
etree_generate 1.02
json_load 0.85
etree_parse 0.81
call_method_slots 0.73
etree_iterparse 0.68
call_method 0.65
normal_startup 0.63
silent_logging 0.56
chameleon_v2 0.56
pickle_list 0.52
regex_compile 0.50
hexiom2 0.47
pidigits 0.39
startup_nosite 0.17
pickle_dict 0.00
unpack_sequence 0.00
formatted_logging -0.06
raytrace -0.06
float -0.18
richards -0.37
spectral_norm -0.51
chaos -0.65
regex_v8 -0.72
Table 2. CPython 3 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.75
nqueens 5.24
mako_v2 5.17
unpack_sequence 4.44
fannkuch 4.42
nbody 3.25
meteor_contest 2.86
regex_effbot 2.45
json_dump_v2 2.44
django_v3 2.26
call_simple 2.09
tornado_http 1.74
regex_compile 1.40
regex_v8 1.16
spectral_norm 0.89
2to3 0.76 
chameleon_v2 0.70
telco 0.70
normal_startup 0.64
etree_generate 0.61
etree_process 0.55
hexiom2 0.51
json_load 0.51
call_method_slots 0.48
formatted_logging 0.33
call_method 0.28
startup_nosite -0.02
fastunpickle -0.02
pidigits -0.20
etree_parse -0.23
etree_iterparse -0.27
richards -0.30
silent_logging -0.36
pickle_list -0.42
simple_logging -0.82
float -0.91
pathlib -0.99
go -1.16
raytrace -1.16
chaos -1.26
fastpickle -1.72
call_method_unknown -2.94
pickle_dict -4.73
Table 3. CPython 2 results on Intel XEON (Haswell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.89
json_load 11.53
fannkuch 7.90
mako_v2 7.01
meteor_contest 4.21
nqueens 3.81
fastunpickle 3.56
django_v3 2.91
call_simple 2.72
call_method_slots 2.45
slowpickle 2.23
call_method 2.21
html5lib_warmup 1.90
chaos 1.89
html5lib 1.81
regex_v8 1.81
tornado_http 1.66
2to3 1.56
json_dump_v2 1.49
nbody 1.38
rietveld 1.26
formatted_logging 1.12
regex_compile 0.99
spambayes 0.92
pickle_list 0.87
normal_startup 0.82
pybench 0.74
slowunpickle 0.71
raytrace 0.67
startup_nosite 0.59
float 0.47
hexiom2 0.46
slowspitfire 0.46
pidigits 0.44
etree_process 0.44
etree_generate 0.37
go 0.27
telco 0.24
regex_effbot 0.12
etree_iterparse 0.06
bzr_startup 0.04
richards 0.03
etree_parse 0.00
unpack_sequence 0.00
call_method_unknown -0.26
pathlib -0.57
fastpickle -0.64
silent_logging -0.94
simple_logging -1.10
chameleon_v2 -1.25
pickle_dict -1.67
spectral_norm -3.25
Table 4. CPython 2 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.44
json_load 11.11
fannkuch 7.55
meteor_contest 5.51
mako_v2 4.94
nqueens 3.49
html5lib_warmup 3.15
html5lib 2.78
call_simple 2.35
silent_logging 2.33
json_dump_v2 2.14
startup_nosite 2.09
bzr_startup 1.93
fastunpickle 1.93
slowspitfire 1.91
regex_v8 1.79
rietveld 1.74
pybench 1.59
nbody 1.57
regex_compile 1.56
pathlib 1.51
tornado_http 1.33
normal_startup 1.21
2to3 1.14
chaos 1.00
spambayes 0.85
etree_process 0.73
pickle_list 0.70
float 0.69
hexiom2 0.51
slowpickle 0.44
call_method_unknown 0.42
slowunpickle 0.37
pickle_dict 0.25
etree_parse 0.20
go 0.19
django_v3 0.12
call_method_slots 0.12
spectral_norm 0.05
call_method 0.01
unpack_sequence 0.00
raytrace -0.08
pidigits -0.11
richards -0.16
etree_generate -0.23
regex_effbot -0.26
telco -0.28
simple_logging -0.32
etree_iterparse -0.38
formatted_logging -0.50
fastpickle -1.08
chameleon_v2 -1.74
msg260467 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-18 16:21
Instead of modifying individual files, I proposed modifying PyMem_Malloc to use the PyObject_Malloc allocator: issue #26249.
But the patch for Python 2 still makes sense.
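For context, a minimal sketch of the approach described in that proposal as summarized here (not the actual issue #26249 change, which lives in the allocator configuration rather than a wrapper): PyMem_Malloc() forwards to the small-object allocator, so every PyMem_* call site benefits without editing individual files.

    #include <Python.h>

    /* Hedged sketch of the idea only. */
    static void *
    pymem_malloc_sketch(size_t size)
    {
        /* PyObject_Malloc() uses pymalloc for small requests and falls
         * back to the system malloc() for larger ones. */
        return PyObject_Malloc(size);
    }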
msg260513 - Author: Catalin Gabriel Manciu (catalin.manciu) * Date: 2016-02-19 09:20
Hi Victor,
This patch follows the same idea as your proposal, but it's focused on a single object type. I think doing this incrementally is the safer approach, allowing us to have finer control over the new areas where we enable allocating using the small object allocator and detect where this replacement might be detrimental to the performance.
msg260514 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-19 09:46
Catalin Gabriel Manciu: "(...) allowing us to have finer control over the new areas where we enable allocating using the small object allocator and detect where this replacement might be detrimental to the performance"
Ah, interesting, do you think that it's possible that my change can *slow down* Python? I don't think so, but I'm interested in feedback on my patch :-) You may try to run benchmarks with my patch.
msg260517 - Author: Catalin Gabriel Manciu (catalin.manciu) * Date: 2016-02-19 11:06
Theoretically, an object type that consistently allocates more than the small object threshold would perform a bit slower because it would first jump to the small object allocator, do the size comparison and then jump to malloc. There would be a small overhead if PyMem_* were redirected to PyObject_* in this (hypothetical) case, and the initial choice of PyMem_* over PyObject_* might have been determined by knowing about that overhead. This is because many think of PyMem_* as the lower-level allocator and PyObject_* as a higher-level one. Of course, PyMem_Raw* should be used in such cases, but it's not as widely adopted as the other two.
I will post some benchmark results on your issue page as soon as I get them.
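A hedged illustration of the dispatch overhead being discussed, assuming a simplified allocator front end; the threshold value and the pool stub are placeholders, and the real obmalloc.c logic differs in detail:

    #include <stdlib.h>

    #define SMALL_REQUEST_THRESHOLD 512   /* pymalloc cutoff in recent CPython 3.x */

    /* Stand-in for the pymalloc pool allocator; details elided. */
    static void *
    pool_alloc_stub(size_t size)
    {
        return malloc(size);
    }

    static void *
    small_object_malloc_sketch(size_t size)
    {
        if (size != 0 && size <= SMALL_REQUEST_THRESHOLD)
            return pool_alloc_stub(size);   /* fast path: pooled small block */
        return malloc(size);                /* large request pays for the extra
                                               size check before reaching malloc() */
    }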
msg260518 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-19 11:12
"Theoretically, an object type that consistently allocates more than the small object threshold would perform a bit slower because it would first jump to the small object allocator, do the size comparison and then jump to malloc."
I expect that the cost of the extra check is *very* cheap (completely negligible) compared to the cost of a call to malloc().
To have an idea of the cost of the Python code around system allocators, you can take a look at the Performance section of my PEP 445, which added an indirection to all Python allocators:
https://www.python.org/dev/peps/pep-0445/#performances
I was unable to measure an overhead on macro benchmarks (perf.py). The overhead on microbenchmarks was really hard to measure because it was so low that the benchmarks were very unstable.
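For reference, a minimal sketch of the PEP 445 indirection mentioned above, using the public PyMem_GetAllocator()/PyMem_SetAllocator() API (PyMemAllocatorEx form, Python 3.5+); the counting hook is illustrative only:

    #include <Python.h>

    static PyMemAllocatorEx orig_mem;

    static void *
    counting_malloc(void *ctx, size_t size)
    {
        (void)ctx;
        /* A real hook could count or log here before delegating. */
        return orig_mem.malloc(orig_mem.ctx, size);
    }

    static void
    install_counting_hook(void)
    {
        PyMemAllocatorEx hook;
        PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &orig_mem);
        hook = orig_mem;               /* keep ctx, calloc, realloc and free */
        hook.malloc = counting_malloc;
        PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &hook);
    }

Every PyMem_Malloc() call then goes through one extra indirect call, which is the overhead that the PEP 445 measurements found to be negligible.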
msg260548 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-02-20 02:20
My impression is that we do not do such performance enhancements to 2.7 for the same reason we would not do them to current 3.x -- the risk of breakage. Have I misunderstood?
msg260550 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-02-20 03:04
Terry J. Reedy added the comment:
> My impression is that we do not do such performance enhancements to 2.7 for the same reason we would not do them to current 3.x -- the risk of breakage. Have I misunderstood?
Breakage of what? The change looks very safe.
msg260678 - Author: Catalin Gabriel Manciu (catalin.manciu) * Date: 2016-02-22 13:42
Our Haswell-EP OpenStack Swift setup shows a 1% improvement in throughput rate using CPython 2.7 (5715a6d9ff12) with this patch.
msg284173 - Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-28 13:26
Update patch for Python 2.7
msg284200 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-28 21:47
I don't understand your change: in Python 3.6, PyMem now uses exactly the same allocator as PyObject.
msg284201 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-28 21:49
See https://bugs.python.org/issue26249 
msg284221 - Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-29 01:37
I know the PyMem and PyObject allocators are the same by default, but this is configurable.
How should I choose the right allocator?
msg284222 - Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-29 01:40
Maybe PyObject_MALLOC remains only for backward compatibility?
msg284239 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-29 09:41
> I know the PyMem and PyObject allocators are the same by default, but this is configurable.
> How should I choose the right allocator?
The two functions always use the same allocator:
https://docs.python.org/dev/using/cmdline.html#envvar-PYTHONMALLOC
Sorry, but which issue are you trying to fix here? Can you please elaborate on your use case?
As I wrote before, only Python 2 should be modified now (if you consider that it's worth it; the speedup is small).
msg284242 - Author: Inada Naoki (methane) * (Python committer) Date: 2016-12-29 10:10
OK. I didn't know the PyMem and PyObject allocators are always the same.
There is no reason to change Python 3.
How about Python 2?
Off topic: I would like to know which of the PyMem and PyObject allocators is preferred when writing new code.
msg288206 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-02-20 14:11
FYI, the Python 3.6 change in PyMem_Malloc() required implementing a new, complex check on the GIL. Search for "PyMem_Malloc() now fails if the GIL is not held" in my following blog post:
https://haypo.github.io/contrib-cpython-2016q1.html
Requiring that the GIL is held is a backward incompatible change. I suggest running your code with PYTHONMALLOC=debug on Python 3.6 ;-)
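A minimal sketch of the compatibility point above, assuming an extension thread that does not already hold the GIL; under the Python 3.6 rules only the PyMem_Raw* family stays GIL-free:

    #include <Python.h>

    static void
    worker_thread_sketch(void)
    {
        /* PyMem_Malloc()/PyMem_Free() now require the GIL to be held. */
        PyGILState_STATE st = PyGILState_Ensure();
        void *buf = PyMem_Malloc(256);
        if (buf != NULL) {
            /* ... use buf ... */
            PyMem_Free(buf);
        }
        PyGILState_Release(st);

        /* PyMem_RawMalloc()/PyMem_RawFree() stay usable without the GIL. */
        void *raw = PyMem_RawMalloc(256);
        PyMem_RawFree(raw);
    }

Running such code with PYTHONMALLOC=debug (Python 3.6+) enables the debug hooks that catch this kind of misuse, as suggested in the message above.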
History
Date                 User                 Action  Args
2022-04-11 14:58:27  admin                set     github: 70570
2017-02-20 14:11:04  vstinner             set     messages: + msg288206
2017-02-20 11:43:44  methane              set     status: open -> closed; resolution: wont fix; stage: resolved
2016-12-29 10:10:39  methane              set     messages: + msg284242; versions: - Python 3.7
2016-12-29 09:41:53  vstinner             set     messages: + msg284239
2016-12-29 01:40:38  methane              set     messages: + msg284222
2016-12-29 01:37:48  methane              set     messages: + msg284221
2016-12-28 21:49:58  vstinner             set     messages: + msg284201
2016-12-28 21:47:21  vstinner             set     messages: + msg284200
2016-12-28 20:05:46  methane              set     files: + listobject_CPython3-2.patch
2016-12-28 13:26:35  methane              set     files: + listobject_CPython2-2.patch; versions: + Python 3.7, - Python 3.6; nosy: + methane; messages: + msg284173
2016-02-22 13:42:56  catalin.manciu       set     messages: + msg260678
2016-02-20 03:04:19  vstinner             set     messages: + msg260550
2016-02-20 02:20:08  terry.reedy          set     nosy: + terry.reedy; messages: + msg260548
2016-02-19 11:20:24  alecsandru.patrascu  set     nosy: + alecsandru.patrascu
2016-02-19 11:12:44  vstinner             set     messages: + msg260518
2016-02-19 11:06:31  catalin.manciu       set     messages: + msg260517
2016-02-19 09:46:02  vstinner             set     messages: + msg260514
2016-02-19 09:20:12  catalin.manciu       set     messages: + msg260513
2016-02-19 07:18:51  florin.papa          set     nosy: + florin.papa
2016-02-18 16:21:29  vstinner             set     nosy: + vstinner; messages: + msg260467
2016-02-18 14:27:25  catalin.manciu       set     files: + listobject_CPython2.patch
2016-02-18 14:26:51  catalin.manciu       create
