This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2011-11-26 13:12 by pitrou, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| va.patch | pitrou, 2011-11-26 13:12 | |
| tuples.py | loewis, 2012-06-21 17:10 | |
| va.diff | loewis, 2012-06-21 17:11 | |
| Messages (23) | |||
|---|---|---|---|
| msg148399 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2011-11-26 13:12 | |
Similar to issue #11849, this patch proposes to use VirtualAlloc/VirtualFree to allocate the Python allocator's memory arenas (rather than malloc() / free()). It might help release more memory if there is some fragmentation, although I don't know how Microsoft's malloc() works. |
|||
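As a rough illustration of the idea in msg148399 (not the patch itself, which is C code calling VirtualAlloc/VirtualFree inside obmalloc), the sketch below asks the operating system for one arena-sized block and later hands the whole block back. It uses Python's `mmap` module as a portable stand-in for VirtualAlloc. The 256 KiB `ARENA_SIZE` is assumed to match obmalloc at the time of this issue, and `alloc_arena` / `free_arena` are hypothetical names.

```python
import mmap

# Assumption: obmalloc arenas were 256 KiB when this issue was discussed.
ARENA_SIZE = 256 * 1024


def alloc_arena(size=ARENA_SIZE):
    """Map `size` bytes of anonymous memory straight from the OS.

    Hypothetical helper: a stand-in for VirtualAlloc (Windows) or an
    anonymous mmap (Unix), bypassing the C library's malloc().
    """
    return mmap.mmap(-1, size)


def free_arena(arena):
    """Unmap the arena so the whole region is returned to the OS at once."""
    arena.close()


arena = alloc_arena()
arena[0:4] = b"pool"              # the mapping is ordinary read/write memory
arena_len = len(arena)
first_bytes = bytes(arena[0:4])
free_arena(arena)
arena_closed = arena.closed
```

Because the mapping is obtained and released as one unit, freeing it does not depend on how the C library's heap happens to be laid out, which is the effect the patch is hoping for.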
| msg148605 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2011-11-29 20:48 | |
The patch looks good to me.

To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at startup by __heap_select, inspecting an environment variable __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used. The CRT heap, in turn, is created with HeapCreate (no flags).

As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL).

If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and making allocations from the process heap instead (that's a separate issue, though). |
|||
| msg148611 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2011-11-29 21:02 | |
> The patch looks good to me.
>
> To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at startup by __heap_select, inspecting an environment variable __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used.

Ah, right, I guessed it was using HeapAlloc indeed. What would be more interesting is how HeapAlloc works :)

I think it would be nice to know whether the patch has a chance of being useful before committing it. I did it as a thought experiment after the similar change was committed for Unix, but I'm not an expert in Windows internals. Perhaps HeapAlloc deals fine with fragmentation? Tim, Brian, do you know anything about this?

> As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL).

I'm not sure that would serve the same purpose as obmalloc, which (AFAIU) is very fast at the expense of compactness. |
|||
| msg148612 | Author: Tim Golden (tim.golden) (Python committer) | Date: 2011-11-29 21:04 | |
'fraid not. I've never had to dig into the allocation stuff at this level. |
|||
| msg148621 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2011-11-29 22:13 | |
> I think it would be nice to know whether the patch has a chance of being useful before committing it. I did it as a thought experiment after the similar change was committed for Unix, but I'm not an expert in Windows internals. Perhaps HeapAlloc deals fine with fragmentation?

Unfortunately, the implementation of HeapAlloc isn't really documented. If ReactOS is right, it looks like this:

http://bit.ly/t2NPHh

Blocks < 1024 bytes are allocated from per-size free lists. Blocks < Heap->VirtualMemoryThreshold are allocated through the free list for variable-sized blocks of the heap. Other blocks are allocated through ZwAllocateVirtualMemory, adding sizeof(HEAP_VIRTUAL_ALLOC_ENTRY) at the beginning.

I think this header will cause malloc() to allocate one extra page in front of an arena.

>> As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL).
>
> I'm not sure that would serve the same purpose as obmalloc, which (AFAIU) is very fast at the expense of compactness.

I'd expect that LFH heaps are also very fast. The major difference I can see is that blocks in the LFH heap still have an 8-byte header (possibly more on a 64-bit system). So I wouldn't expect any speed savings, but (possibly relevant) memory savings from obmalloc. |
|||
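The "one extra page" remark in msg148621 can be checked with a little page arithmetic. The sketch below assumes 4 KiB pages, a 256 KiB obmalloc arena, and that an arena-sized request takes the ZwAllocateVirtualMemory path described above. `HEADER_SIZE` is a placeholder, since the thread does not give the real sizeof(HEAP_VIRTUAL_ALLOC_ENTRY); any value from 1 to 4096 bytes leads to the same result.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages
ARENA_SIZE = 256 * 1024   # assumption: obmalloc arena size at the time
HEADER_SIZE = 32          # placeholder, not the real sizeof(HEAP_VIRTUAL_ALLOC_ENTRY)


def pages_needed(nbytes, page_size=PAGE_SIZE):
    """Round a byte count up to a whole number of pages."""
    return -(-nbytes // page_size)


# Arena requested directly from the OS: exactly the arena, nothing else.
pages_direct = pages_needed(ARENA_SIZE)

# Arena requested through malloc(): the heap header is added in front,
# so the request no longer fits in a whole number of arena pages.
pages_via_malloc = pages_needed(ARENA_SIZE + HEADER_SIZE)

extra_pages = pages_via_malloc - pages_direct
overhead_percent = 100.0 * extra_pages / pages_direct
print(pages_direct, pages_via_malloc, extra_pages, round(overhead_percent, 2))
```

Under these assumptions each arena costs 65 pages instead of 64 when it goes through malloc(), an overhead of roughly 1.6% per arena.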
| msg148623 | Author: Brian Curtin (brian.curtin) (Python committer) | Date: 2011-11-29 22:39 | |
> Tim, Brian, do you know anything about this?

Unfortunately, no. It's on my todo list of things to understand but I don't see that happening in the near future.

I'm willing to run tests or benchmarks for this issue, but that's likely the most I can provide. |
|||
| msg148625 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2011-11-29 23:12 | |
On Tuesday 29 November 2011 at 22:39 +0000, Brian Curtin wrote:
> Brian Curtin <brian@python.org> added the comment:
>
> > Tim, Brian, do you know anything about this?
>
> Unfortunately, no. It's on my todo list of things to understand but I don't see that happening in the near future.
>
> I'm willing to run tests or benchmarks for this issue, but that's likely the most I can provide.

Benchmarks would be nice indeed. |
|||
| msg163350 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2012-06-21 17:10 | |
Here is a benchmark. Based on my assumption that this patch may reduce allocation overheads due to minimizing padding+fragmentation, it allocates a lot of memory, and then waits 20s so you can check in the process explorer what the "Commit Size" of the process is. For the current 3.3 tree, in 32-bit mode, on a 64-bit Windows 7 installation, I get 464,756K for the unpatched version, and 450,436K for the patched version. This is a 3% saving, which seems good enough for me. |
|||
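The 3% figure in msg163350 follows directly from the two "Commit Size" readings reported there. A minimal check, using only those two numbers (the variable names are illustrative):

```python
# Commit sizes reported in msg163350 (Process Explorer "K" units).
unpatched_k = 464_756
patched_k = 450_436

saved_k = unpatched_k - patched_k
saved_percent = 100.0 * saved_k / unpatched_k
print(f"saved {saved_k} K ({saved_percent:.1f}%)")
```

The difference is 14,320 K, or about 3.1% of the unpatched commit size.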
| msg163351 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2012-06-21 17:11 | |
Here is an updated patch. |
|||
| msg189760 | Author: Charles-François Natali (neologix) (Python committer) | Date: 2013-05-21 14:21 | |
Martin, do you think your latest patch can be committed? |
|||
| msg189771 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2013-05-21 16:24 | |
Antoine's request for benchmarks still stands. I continue to think that it should be applied even in the absence of benchmarks. In the absence of third opinions on this specific aspect, I don't think it can be applied. |
|||
| msg189824 | Author: Charles-François Natali (neologix) (Python committer) | Date: 2013-05-22 17:03 | |
I can't speak for Antoine, but I guess that the result of pybench would be enough to make sure it doesn't introduce any regression (which would be *really* surprising). As for the memory savings, the benchmark you posted earlier is conclusive enough IMO (especially since it can be difficult to come up with a scheme leading to heap fragmentation). |
|||
| msg189825 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-05-22 17:24 | |
I asked for benchmarks because I don't know anything about Windows virtual memory management, but if other people think this patch should go in then it's fine. The main point of using VirtualAlloc/VirtualFree was, in my mind, to allow *releasing* memory in more cases than when relying on free() (assuming Windows uses some sbrk() equivalent). But perhaps Windows is already tuned to release memory on most free() calls. |
|||
| msg189850 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2013-05-23 06:34 | |
Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch. |
|||
| msg189876 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-05-23 20:07 | |
See also issue #3329, which proposes an API to define memory allocators. |
|||
| msg190427 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-05-31 23:29 | |
I tested VirtualAlloc/VirtualFree versus malloc/free on Windows 7 SP1 64-bit. On my small test, when using VirtualAlloc/VirtualFree, the memory peak is lower (e.g. 58.1 MB vs 59.0 MB), and the memory usage is the same or sometimes lower. The difference is small: the malloc() implementation on Windows 7 is efficient! But I am in favor of using VirtualAlloc/VirtualFree because it is the native API and the gain may be bigger on a real application.

--

I used the following script for my test:
https://bitbucket.org/haypo/misc/raw/98eb42a3ed2144141d62c75e3d07933839fe2a0c/python/python_memleak.py

I reused the get_process_mem_info() code from psutil to get current and peak memory usage (I failed to install psutil, I don't understand why). I also replaced func() in my script with tuples.py to create many tuples.

--

Python < 3.3 wastes a lot of memory with python_memleak.py. Python 3.3 behaves much better thanks to the usage of mmap() on Linux, and the fixed threshold on 64-bit (min=512 bytes, instead of 256). |
|||
| msg191252 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-06-16 02:05 | |
Martin von Loewis: "If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and make allocations from the process heap instead (that's a separate issue, though)." => see issue #18203 |
|||
| msg191253 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-06-16 02:11 | |
haypo> I tested VirtualAlloc/VirtualFree versus malloc/free
haypo> on Windows 7 SP1 64-bit. On my small test, ...

I realized that I was not precise: I tried the attached va.diff patch. I didn't try to completely replace malloc(). |
|||
| msg191461 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-06-19 12:03 | |
> Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch.

According to my test, the memory usage is a little bit better with the patch. So Martin, do you plan to commit the patch? Or is a benchmark required? Or should we first check the Low Fragmentation Allocator?

I plan to test the Low Fragmentation Allocator, at least on Windows 7. But I prefer to do it later, I'm working on PEP 445 right now. |
|||
| msg191504 | Author: Antoine Pitrou (pitrou) (Python committer) | Date: 2013-06-20 08:24 | |
> I plan to test the Low Fragmentation Allocator, at least on Windows 7.

I don't think it can be any better than raw mmap() / VirtualAlloc()... |
|||
| msg191506 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2013-06-20 08:45 | |
>> I plan to test the Low Fragmentation Allocator, at least on Windows 7.
>
> I don't think it can be any better than raw mmap() / VirtualAlloc()...

I mean using the Low Fragmentation Allocator for PyObject_Malloc() instead of pymalloc.

Martin wrote (msg148605): "As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL)." |
|||
| msg191507 | Author: Martin v. Löwis (loewis) (Python committer) | Date: 2013-06-20 10:50 | |
Ok, I'm going to commit this patch. Any further revisions (including reversions) can be done then. |
|||
| msg191939 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-06-27 10:24 | |
New changeset 44f455e6163d by Martin v. Löwis in branch 'default': Issue #13483: Use VirtualAlloc in obmalloc on Windows. http://hg.python.org/cpython/rev/44f455e6163d |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:24 | admin | set | github: 57692 |
| 2013-06-27 10:24:52 | loewis | set | status: open -> closed; resolution: fixed |
| 2013-06-27 10:24:20 | python-dev | set | nosy: + python-dev; messages: + msg191939 |
| 2013-06-20 10:50:56 | loewis | set | messages: + msg191507 |
| 2013-06-20 08:45:06 | vstinner | set | messages: + msg191506 |
| 2013-06-20 08:24:56 | pitrou | set | messages: + msg191504 |
| 2013-06-19 12:03:01 | vstinner | set | messages: + msg191461 |
| 2013-06-17 22:42:29 | trent | set | nosy: + trent |
| 2013-06-16 02:11:36 | vstinner | set | messages: + msg191253 |
| 2013-06-16 02:05:20 | vstinner | set | messages: + msg191252 |
| 2013-06-06 14:17:53 | giampaolo.rodola | set | nosy: + giampaolo.rodola |
| 2013-05-31 23:29:07 | vstinner | set | messages: + msg190427 |
| 2013-05-23 20:07:33 | vstinner | set | nosy: + vstinner; messages: + msg189876 |
| 2013-05-23 06:34:07 | loewis | set | messages: + msg189850 |
| 2013-05-22 17:24:57 | pitrou | set | stage: commit review; messages: + msg189825; versions: + Python 3.4, - Python 3.3 |
| 2013-05-22 17:03:57 | neologix | set | messages: + msg189824 |
| 2013-05-21 16:24:28 | loewis | set | messages: + msg189771 |
| 2013-05-21 14:21:57 | neologix | set | messages: + msg189760 |
| 2012-06-21 17:11:04 | loewis | set | files: + va.diff; messages: + msg163351 |
| 2012-06-21 17:10:39 | loewis | set | files: + tuples.py; messages: + msg163350 |
| 2011-11-29 23:12:56 | pitrou | set | messages: + msg148625 |
| 2011-11-29 22:39:18 | brian.curtin | set | messages: + msg148623 |
| 2011-11-29 22:13:23 | loewis | set | messages: + msg148621 |
| 2011-11-29 21:04:01 | tim.golden | set | messages: + msg148612 |
| 2011-11-29 21:02:08 | pitrou | set | messages: + msg148611 |
| 2011-11-29 20:48:33 | loewis | set | nosy: + loewis; messages: + msg148605 |
| 2011-11-26 13:12:42 | pitrou | create | |