This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Use VirtualAlloc to allocate memory arenas
Type: resource usage
Stage: commit review
Components: Interpreter Core, Windows
Versions: Python 3.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: brian.curtin, giampaolo.rodola, loewis, neologix, pitrou, python-dev, tim.golden, trent, vstinner
Priority: low
Keywords: patch

Created on 2011-11-26 13:12 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name   Uploaded
va.patch    pitrou, 2011-11-26 13:12
tuples.py   loewis, 2012-06-21 17:10
va.diff     loewis, 2012-06-21 17:11
Messages (23)
msg148399 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-11-26 13:12
Similar to issue #11849, this patch proposes to use VirtualAlloc/VirtualFree to allocate the Python allocator's memory arenas (rather than malloc() / free()). It might help release more memory if there is some fragmentation, although I don't know how Microsoft's malloc() works.
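Editorial sketch: the proposal amounts to asking the operating system directly for each arena instead of going through the C runtime heap. The following is only an illustration in Python, not the contents of va.patch. The names `arena_alloc` and `arena_free` and the 256 KiB `ARENA_SIZE` are assumptions made for this example. On Windows it calls VirtualAlloc/VirtualFree through ctypes; on other platforms it falls back to an anonymous mmap(), which mirrors the Unix change the thread refers to.

```python
import mmap
import sys

ARENA_SIZE = 256 * 1024  # assumed arena size, chosen for illustration only

if sys.platform == "win32":
    import ctypes
    from ctypes import wintypes

    # Constants from the Windows SDK headers.
    MEM_COMMIT = 0x1000
    MEM_RESERVE = 0x2000
    MEM_RELEASE = 0x8000
    PAGE_READWRITE = 0x04

    _kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _kernel32.VirtualAlloc.restype = wintypes.LPVOID
    _kernel32.VirtualAlloc.argtypes = [
        wintypes.LPVOID, ctypes.c_size_t, wintypes.DWORD, wintypes.DWORD]
    _kernel32.VirtualFree.restype = wintypes.BOOL
    _kernel32.VirtualFree.argtypes = [
        wintypes.LPVOID, ctypes.c_size_t, wintypes.DWORD]

    def arena_alloc(size=ARENA_SIZE):
        """Reserve and commit `size` bytes; return the base address."""
        addr = _kernel32.VirtualAlloc(
            None, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE)
        if not addr:
            raise ctypes.WinError(ctypes.get_last_error())
        return addr

    def arena_free(addr):
        """Release the whole region (MEM_RELEASE requires a size of 0)."""
        if not _kernel32.VirtualFree(addr, 0, MEM_RELEASE):
            raise ctypes.WinError(ctypes.get_last_error())
else:
    def arena_alloc(size=ARENA_SIZE):
        """Map `size` bytes of anonymous private memory."""
        return mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)

    def arena_free(arena):
        """Unmap the region, returning its pages to the OS."""
        arena.close()
```

In both branches the memory goes back to the operating system as soon as the arena is freed, which is the behaviour the patch hopes to obtain more reliably than with free().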
msg148605 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-11-29 20:48
The patch looks good to me.
To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at startup by __heap_select, inspecting an environment variable __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used.
The CRT heap, in turn, is created with HeapCreate (no flags).
As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL).
If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and make allocations from the process heap instead (that's a separate issue, though).
msg148611 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-11-29 21:02
> The patch looks good to me.
> 
> To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it
> uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and
> __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at
> startup by __heap_select, inspecting an environment variable
> __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used.
Ah, right, I guessed it was using HeapAlloc indeed. What would be more
interesting is how HeapAlloc works :)
I think it would be nice to know whether the patch has a chance of being
useful before committing it. I did it as a thought experiment after the
similar change was committed for Unix, but I'm not an expert in Windows
internals. Perhaps HeapAlloc deals fine with fragmentation? Tim, Brian,
do you know anything about this?
> As an alternative approach, Python could consider completely dropping
> obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
> instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
> GIL).
I'm not sure that would serve the same purpose as obmalloc, which
(AFAIU) is very fast at the expense of compactness.
msg148612 - Author: Tim Golden (tim.golden) (Python committer) Date: 2011-11-29 21:04
'fraid not. I've never had to dig into the allocation stuff at this level.
msg148621 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-11-29 22:13
> I think it would be nice to know whether the patch has a chance of being
> useful before committing it. I did it as a thought experiment after the
> similar change was committed for Unix, but I'm not an expert in Windows
> internals. Perhaps HeapAlloc deals fine with fragmentation?
Unfortunately, the implementation of HeapAlloc isn't really documented.
If ReactOS is right, it looks like this: http://bit.ly/t2NPHh
Blocks < 1024 bytes are allocated from per-size free lists.
Blocks < Heap->VirtualMemoryThreshold are allocated through the free
list for variable-sized blocks of the heap.
Other blocks are allocated through ZwAllocateVirtualMemory, adding
sizeof(HEAP_VIRTUAL_ALLOC_ENTRY) at the beginning. I think this header
will cause malloc() to allocate one extra page in front of an arena.
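Editorial sketch: the three tiers described above, and the extra-page effect of the header, can be made concrete with a little arithmetic. The values below are assumptions, since the thread gives none of them: a 4096-byte page, a 256 KiB arena, and a hypothetical 32-byte stand-in for sizeof(HEAP_VIRTUAL_ALLOC_ENTRY). The heap's VirtualMemoryThreshold is left as a parameter because its value is not stated.

```python
PAGE_SIZE = 4096         # assumed page size
ARENA_SIZE = 256 * 1024  # assumed arena size
HEADER_SIZE = 32         # hypothetical sizeof(HEAP_VIRTUAL_ALLOC_ENTRY)


def allocation_path(size, virtual_memory_threshold):
    """Pick the tier a request falls into, following the description above."""
    if size < 1024:
        return "per-size free list"
    if size < virtual_memory_threshold:
        return "variable-size free list"
    return "ZwAllocateVirtualMemory"


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Number of whole pages required to hold nbytes (ceiling division)."""
    return -(-nbytes // page_size)


# An arena on its own fills a whole number of pages exactly ...
arena_pages = pages_needed(ARENA_SIZE)
# ... but any non-zero header pushes the request onto one more page.
arena_pages_with_header = pages_needed(ARENA_SIZE + HEADER_SIZE)
```

With these assumed values the arena alone needs 64 pages and the arena plus header needs 65, whatever the exact header size is, as long as it is non-zero and smaller than a page.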
>> As an alternative approach, Python could consider completely dropping
>> obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
>> instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
>> GIL).
> 
> I'm not sure that would serve the same purpose as obmalloc, which
> (AFAIU) is very fast at the expense of compactness.
I'd expect that LFH heaps are also very fast. The major difference I can
see is that blocks in the LFH heap still have an 8-byte header (possibly
more on a 64-bit system). So I wouldn't expect any speed savings, but
(possibly relevant) memory savings from obmalloc.
msg148623 - Author: Brian Curtin (brian.curtin) (Python committer) Date: 2011-11-29 22:39
> Tim, Brian, do you know anything about this?
Unfortunately, no. It's on my todo list of things to understand but I don't see that happening in the near future.
I'm willing to run tests or benchmarks for this issue, but that's likely the most I can provide.
msg148625 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-11-29 23:12
On Tuesday 29 November 2011 at 22:39 +0000, Brian Curtin wrote:
> Brian Curtin <brian@python.org> added the comment:
> 
> > Tim, Brian, do you know anything about this?
> 
> Unfortunately, no. It's on my todo list of things to understand but I
> don't see that happening in the near future.
> 
> I'm willing to run tests or benchmarks for this issue, but that's
> likely the most I can provide.
Benchmarks would be nice indeed.
msg163350 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-06-21 17:10
Here is a benchmark. Based on my assumption that this patch may reduce allocation overheads due to minimizing padding+fragmentation, it allocates a lot of memory, and then waits 20s so you can check in the process explorer what the "Commit Size" of the process is.
For the current 3.3 tree, in 32-bit mode, on a 64-bit Windows 7 installation, I get 464,756K for the unpatched version, and 450,436K for the patched version.
This is a 3% saving, which seems good enough for me.
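Editorial sketch: the attached tuples.py is not reproduced in the thread, so the following is only a guess at a benchmark of the shape described (allocate many small objects, then pause so the commit size can be read from Process Explorer). The function names, the default tuple count, and the tuple layout are all assumptions. The `percent_saving` helper just redoes the arithmetic on the two figures quoted above.

```python
import time


def allocate_tuples(count):
    """Build `count` small tuples and keep them all alive in a list."""
    return [(i, i + 1) for i in range(count)]


def percent_saving(before_kb, after_kb):
    """Relative reduction in commit size, as a percentage."""
    return 100.0 * (before_kb - after_kb) / before_kb


def run(count=5_000_000, pause_seconds=20):
    """Allocate, then wait so the process can be inspected externally."""
    data = allocate_tuples(count)
    time.sleep(pause_seconds)
    return len(data)


# Figures quoted in the message: unpatched vs. patched commit size, in KB.
saving = percent_saving(464756, 450436)
```

Here `saving` works out to a little over 3%, matching the figure given in the message. Calling `run()` with its defaults reproduces the allocate-then-wait pattern; the counts would need tuning to reach a commit size comparable to the one reported.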
msg163351 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2012-06-21 17:11
Here is an updated patch.
msg189760 - Author: Charles-François Natali (neologix) (Python committer) Date: 2013-05-21 14:21
Martin, do you think your latest patch can be committed?
msg189771 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2013-05-21 16:24
Antoine's request for benchmarks still stands. I continue to think that it should be applied even in the absence of benchmarks. In the absence of third opinions on this specific aspect, I don't think it can be applied.
msg189824 - Author: Charles-François Natali (neologix) (Python committer) Date: 2013-05-22 17:03
I can't speak for Antoine, but I guess that the result of pybench
would be enough to make sure it doesn't introduce any regression
(which would be *really* surprising).
As for the memory savings, the benchmark you posted earlier is
conclusive enough IMO (especially since it can be difficult to
come up with a scheme leading to heap fragmentation).
msg189825 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-05-22 17:24
I asked for benchmarks because I don't know anything about Windows virtual memory management, but if other people think this patch should go in then it's fine.
The main point of using VirtualAlloc/VirtualFree was, in my mind, to allow *releasing* memory in more cases than when relying on free() (assuming Windows uses some sbrk() equivalent). But perhaps Windows is already tuned to release memory on most free() calls.
msg189850 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2013-05-23 06:34
Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch.
msg189876 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-05-23 20:07
See also issue #3329 which proposes an API to define memory allocators.
msg190427 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-05-31 23:29
I tested VirtualAlloc/VirtualFree versus malloc/free on Windows 7 SP1 64-bit. In my small test, when using VirtualAlloc/VirtualFree, the memory peak is lower (e.g. 58.1 MB vs 59.0 MB), and the memory usage is the same or sometimes lower. The difference is small; the malloc() implementation on Windows 7 is efficient! But I am in favor of using VirtualAlloc/VirtualFree because it is the native API and the gain may be bigger on a real application.
--
I used the following script for my test:
https://bitbucket.org/haypo/misc/raw/98eb42a3ed2144141d62c75e3d07933839fe2a0c/python/python_memleak.py
I reused the get_process_mem_info() code from psutil to get the current and peak memory usage (I failed to install psutil, I don't understand why).
I also replaced func() in my script with tuples.py to create many tuples.
--
Python < 3.3 wastes a lot of memory with python_memleak.py. Python 3.3 behaves much better thanks to the usage of mmap() on Linux, and the fixed threshold on 64-bit (min=512 bytes, instead of 256).
msg191252 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-06-16 02:05
Martin von Loewis: "If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and make allocations from the process heap instead (that's a separate issue, though)."
=> see issue #18203 
msg191253 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-06-16 02:11
haypo> I tested VirtualAlloc/VirtualFree versus malloc/free
haypo> on Windows 7 SP1 64-bit. On my small test, ...
I realized that I was not precise: I tried the attached va.diff patch. I didn't try to replace malloc() completely.
msg191461 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-06-19 12:03
> Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch.
According to my test, the memory usage is a little bit better with the patch. So Martin, do you plan to commit the patch?
Or is a benchmark required? Or should we first check the Low Fragmentation Allocator?
I plan to test the Low Fragmentation Allocator, at least on Windows 7. But I would prefer to do it later; I'm working on PEP 445 right now.
msg191504 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-06-20 08:24
> I plan to test the Low Fragmentation Allocator, at least on Windows 7.
I don't think it can be any better than raw mmap() / VirtualAlloc()...
msg191506 - Author: STINNER Victor (vstinner) (Python committer) Date: 2013-06-20 08:45
>> I plan to test the Low Fragmentation Allocator, at least on Windows 7.
> I don't think it can be any better than raw mmap() / VirtualAlloc()...
I mean using the Low Fragmentation Allocator for PyObject_Malloc()
instead of pymalloc.
Martin wrote (msg148605):
"As an alternative approach, Python could consider completely dropping
obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
GIL)."
msg191507 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2013-06-20 10:50
Ok, I'm going to commit this patch. Any further revisions (including reversions) can be done then.
msg191939 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-06-27 10:24
New changeset 44f455e6163d by Martin v. Löwis in branch 'default':
Issue #13483: Use VirtualAlloc in obmalloc on Windows.
http://hg.python.org/cpython/rev/44f455e6163d 
History
Date                 User              Action  Args
2022-04-11 14:57:24  admin             set     github: 57692
2013-06-27 10:24:52  loewis            set     status: open -> closed; resolution: fixed
2013-06-27 10:24:20  python-dev        set     nosy: + python-dev; messages: + msg191939
2013-06-20 10:50:56  loewis            set     messages: + msg191507
2013-06-20 08:45:06  vstinner          set     messages: + msg191506
2013-06-20 08:24:56  pitrou            set     messages: + msg191504
2013-06-19 12:03:01  vstinner          set     messages: + msg191461
2013-06-17 22:42:29  trent             set     nosy: + trent
2013-06-16 02:11:36  vstinner          set     messages: + msg191253
2013-06-16 02:05:20  vstinner          set     messages: + msg191252
2013-06-06 14:17:53  giampaolo.rodola  set     nosy: + giampaolo.rodola
2013-05-31 23:29:07  vstinner          set     messages: + msg190427
2013-05-23 20:07:33  vstinner          set     nosy: + vstinner; messages: + msg189876
2013-05-23 06:34:07  loewis            set     messages: + msg189850
2013-05-22 17:24:57  pitrou            set     stage: commit review; messages: + msg189825; versions: + Python 3.4, - Python 3.3
2013-05-22 17:03:57  neologix          set     messages: + msg189824
2013-05-21 16:24:28  loewis            set     messages: + msg189771
2013-05-21 14:21:57  neologix          set     messages: + msg189760
2012-06-21 17:11:04  loewis            set     files: + va.diff; messages: + msg163351
2012-06-21 17:10:39  loewis            set     files: + tuples.py; messages: + msg163350
2011-11-29 23:12:56  pitrou            set     messages: + msg148625
2011-11-29 22:39:18  brian.curtin      set     messages: + msg148623
2011-11-29 22:13:23  loewis            set     messages: + msg148621
2011-11-29 21:04:01  tim.golden        set     messages: + msg148612
2011-11-29 21:02:08  pitrou            set     messages: + msg148611
2011-11-29 20:48:33  loewis            set     nosy: + loewis; messages: + msg148605
2011-11-26 13:12:42  pitrou            create