This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2009年08月13日 16:45 by matthiastroffaes, last changed 2022年04月11日 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| py3k-clearfreelist-dict_set_list.patch | matthiastroffaes, 2009年08月13日 16:45 | clearfreelist for dict, set, and list (against py3k, revision 74419) | review | |
| py3k-clearfreelist-gc_collect.patch | matthiastroffaes, 2009年08月13日 16:49 | also call the new PyXXX_ClearFreeList functions during gc.collect() | review | |
| py3k-freelist_test.py | matthiastroffaes, 2009年08月13日 16:57 | script to test memory usage after gc.collect() | ||
| py3k-tp_free_list-dict.patch | matthiastroffaes, 2009年08月14日 16:23 | tp_free_list for dict implementation, including call from gc.collect() | ||
| py3k-clearfreelist-time_gc_collect.patch | matthiastroffaes, 2009年08月17日 11:43 | patch to time effect of freeing freelists during garbage collection | ||
| py3k-rev81387-clearfreelist-dict_set_list.patch | matthiastroffaes, 2010年05月21日 12:11 | clearfreelist for dict, set, and list (against py3k, revision 81387) | ||
| py3k-rev81387-clearfreelist-gc_collect.patch | matthiastroffaes, 2010年05月21日 12:12 | also call the new PyXXX_ClearFreeList functions during gc.collect() (against py3k, revision 81387) | ||
| py3k-rev81387-clearfreelist-time_gc_collect.patch | matthiastroffaes, 2010年05月21日 12:12 | patch to time effect of freeing freelists during garbage collection (against py3k, revision 81387) | ||
| py3k-04082011-clearfreelist-dict_set_list.patch | matthiastroffaes, 2011年08月04日 13:44 | |||
| Messages (23) | |||
|---|---|---|---|
| msg91520 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2009年08月13日 16:45 | |
The Python C API provides PyXXX_ClearFreeList functions to allow the float, int, etc... freelists to be freed, potentially releasing memory to the OS earlier. Currently, there is no such API for the dict, set, and list freelists. The attached patch adds PyXXX_ClearFreeList functions to the C API, so the dict, set, and list freelists can be freed as well. |
|||
| msg91521 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2009年08月13日 16:49 | |
I attach a second patch which also calls the new PyXXX_ClearFreeList functions on garbage collection, during gc.collect(). |
|||
| msg91523 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2009年08月13日 16:57 | |
I'm also attaching a test script to check the effect of the two patches on gc.collect(). If many objects are allocated, space savings appear to be relevant. Before applying the patch (debug build on linux 64 bit): Memory used (begin): 121Mb memtest 2000000 int ==================== Memory used (peak): 297Mb Memory used (end): 122Mb Unfreed memory: 1Mb memtest 2000000 str ==================== Memory used (peak): 455Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 float ==================== Memory used (peak): 236Mb Memory used (end): 127Mb Unfreed memory: 6Mb memtest 2000000 int ==================== Memory used (peak): 313Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test ==================== Memory used (peak): 372Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test2 ==================== Memory used (peak): 361Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _tuple ==================== Memory used (peak): 529Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _set ==================== Memory used (peak): 846Mb Memory used (end): 765Mb Unfreed memory: 644Mb memtest 2000000 _dict ==================== Memory used (peak): 1241Mb Memory used (end): 1241Mb Unfreed memory: 1120Mb memtest 2000000 Test3 ==================== Memory used (peak): 1241Mb Memory used (end): 765Mb Unfreed memory: 644Mb [40720 refs] After applying the patch (same build system): Memory used (begin): 121Mb memtest 2000000 int ==================== Memory used (peak): 298Mb Memory used (end): 121Mb Unfreed memory: 0Mb memtest 2000000 str ==================== Memory used (peak): 455Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 float ==================== Memory used (peak): 236Mb Memory used (end): 127Mb Unfreed memory: 6Mb memtest 2000000 int ==================== Memory used (peak): 312Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test ==================== Memory used (peak): 374Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test2 ==================== Memory used (peak): 361Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _tuple ==================== Memory used (peak): 528Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _set ==================== Memory used (peak): 846Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _dict ==================== Memory used (peak): 1240Mb Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test3 ==================== Memory used (peak): 999Mb Memory used (end): 123Mb Unfreed memory: 2Mb [40740 refs] |
|||
| msg91526 - (view) | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2009年08月13日 18:00 | |
Instead of expanding the C API for each type which supports a free list perhaps there should be a single call, say, PyObject_ClearFreeList, which takes a pointer to the appropriate type object as an argument. PyTypeObject can then grow a tp_free_list slot through which the function calls a type-specific free list clearing function. Skip |
|||
| msg91558 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2009年08月14日 16:23 | |
Thanks for the feedback! Attaching a new patch which implements tp_free_list slot as suggested - I hope I did it correctly. I've only implemented the new slot for dict so far, but I'm happy to tp_free_list-ify the other freelist types as well, in a future patch, if this gets the green light. Description of the patch: * added tp_free_list slot to PyTypeObject (definitely for review: is the location of tp_free_list right after tp_free sensible?) * added PyType_ClearFreeList(PyTypeObject *) to C API, which calls the tp_free_list function if not NULL * inserted the new slot where necessary (e.g. in PyGen_Type) to sync type definitions with the updated PyTypeObject * created dict_free_list function and added it to PyDict_Type * call PyType_ClearFreeList(&PyDict_Type) from gc.collect() |
|||
| msg91561 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2009年08月14日 17:04 | |
Guido, do you want a slot assigned for this? |
|||
| msg91563 - (view) | Author: Guido van Rossum (gvanrossum) * (Python committer) | Date: 2009年08月14日 17:19 | |
A slot in every type object for this purpose seems wasteful; the large majority of types won't have a free list. (Remember, each user-defined class allocates a full type structure on the heap.) |
|||
| msg91565 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2009年08月14日 18:19 | |
Marking the PyXXX_ClearFreeList version as approved. |
|||
| msg91593 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2009年08月15日 05:37 | |
Does anyone here know why GC calls the free_xxx functions? ISTM, they cannot be involved in cycles. Free lists are kept by container objects to speed-up allocation. Having GC call the free_xxx just slows down the GC process and all the subsequent set/list/tuple allocations until the free lists are built-up again. IMO, the free_xxx functions should only be called during atexit or by an explicit call from the user perhaps, sys.clear_freelists() or somesuch. |
|||
| msg91613 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月15日 17:35 | |
The reason is that users expect gc.collect() to make its best to diminish memory use. Clearing free lists can allow deallocting some arenas which otherwise would still contain some used memory blocks. As the comment says: /* Clear all free lists * All free lists are cleared during the collection of the highest generation. * Allocated items in the free list may keep a pymalloc arena occupied. * Clearing the free lists may give back memory to the OS earlier. */ Full collections (collections of the oldest generation) are rather rare, so the performance impact is probably minimal, and it helps reduce memory fragmentation from time to time (which can produce significant effect as shown in Matthias' example). |
|||
| msg91618 - (view) | Author: Raymond Hettinger (rhettinger) * (Python committer) | Date: 2009年08月15日 22:06 | |
> The reason is that users expect gc.collect() to make > its best to diminish memory use. I thought GC was expected to eliminate reference cycles. Perhaps there ought to be a separate API, such as sys.clear_freelists(), to eliminate other memory use when needed. Putting this in GC seems like feature creep and has negative performance implications (long running programs will likely find an immediate need to reallocate the freed members). > Allocated items in the free list may keep a pymalloc arena occupied. > * Clearing the free lists may give back memory to the OS earlier. These are both good reasons to expose the functionality somewhere. |
|||
| msg91623 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月15日 22:50 | |
Le samedi 15 août 2009 à 22:06 +0000, Raymond Hettinger a écrit : > Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment: > > > The reason is that users expect gc.collect() to make > > its best to diminish memory use. > > I thought GC was expected to eliminate reference cycles. Of course, but it's also the de facto API when wanting to reclaim memory. The face that a single function call is sufficient is a good thing in itself. > Perhaps there ought to be a separate API, such as > sys.clear_freelists(), to eliminate other memory use when > needed. Putting this in GC seems like feature creep and > has negative performance implications (long running programs > will likely find an immediate need to reallocate the freed > members). Performance claims should be substanstiated with actual numbers, otherwise it's too easy to clutter the API with gratuitous complications. The impact of reallocating may be negligible, or it might even be positive if it improves cache locality. |
|||
| msg91630 - (view) | Author: Skip Montanaro (skip.montanaro) * (Python triager) | Date: 2009年08月16日 14:28 | |
>> I thought GC was expected to eliminate reference cycles. Antoine> Of course, but it's also the de facto API when wanting to Antoine> reclaim memory. When did that happen? I agree with Raymond. The cyclic gc should just reclaim cycles. Skip |
|||
| msg91631 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月16日 14:43 | |
> When did that happen? I agree with Raymond. The cyclic gc should just > reclaim cycles. People don't care about referential cycles, they care about freeing some memory (if memory was available in infinite quantities nobody would care about breaking cycles). That's how the API is used most of the time, IMO. And that's why measurements of the usefulness of calling gc.collect() are usually done in megabytes, not in number of references :-). So, while I agree that it sounds bizarre for the GC to do other memory-related tasks, it's also quite practical. Besides, the GC already has a heuristic for *when* to cleanup memory, and it makes sense to reuse this heuristic for other memory cleanup tasks, rather than to invent another heuristic or put the burden of the task on the user (who usually won't even know what those freelists are). |
|||
| msg91633 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2009年08月16日 15:13 | |
The change was originally made in r60797, « Implemented Martin's suggestion to clear the free lists during the garbage collection of the highest generation ». |
|||
| msg91634 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2009年08月16日 15:18 | |
FWIW, I agree with Antoine here. I think user expectation is that when "garbage" is collected, at least some freed memory will be returned to the operating system. The normal user's conception of what "garbage" is has nothing to do with cycles. It just so happens that in CPython, that's the main thing the garbage collector collects. |
|||
| msg91635 - (view) | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2009年08月16日 17:08 | |
I still stand by my suggestion to free memory as a side effect of garbage collection. It may well be that an application will start re-allocating blocks that soon end up in the free list again. OTOH, it may also be that releasing those free lists, along with the freeing that the GC just did, can cause arenas to become completely free, and thus be returned to the operating system. Users really really want Python to return memory to the operating system whenever possible, and on its own; those free lists can block memory from being returned, more or less unreasonably. So unless it can be demonstrated (preferably in a realistic application) that clearing the free lists has a measurable negative impact, I propose to keep things the way they are. IMO, it would be best if we could eliminate the freelists altogether. |
|||
| msg91637 - (view) | Author: Gregory P. Smith (gregory.p.smith) * (Python committer) | Date: 2009年08月16日 18:23 | |
+1 on the PyXXX_ClearFreeList patch and calling them from gc.collect() as is done with the others. I agree with Guido, don't add a tp_free_list slot as the common case would be NULL. Regarding gc clearing freelists: I agree with Antoine and Martin. Clearing free lists in the highest generation of GC is a very good thing. Rebuilding them infrequently should not have a significant performance impact and makes long running python jobs better behaved by releasing more memory when possible. |
|||
| msg91659 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2009年08月17日 11:43 | |
To aid the discussion, I attach another quick patch which reports the time spent on PyXXX_ClearFreeList calls during highest generation garbage collection (including gc.collect()). For simplicity, the timer uses clock() so the resolution is quite limited (appears to be 10ms on my machine) and I don't claim that this is the best way of measuring execution speed, but at least it gives some indication. The patch also gives an indication at how frequently the highest generation is collected. Below is the result of the patch on the py3k-freelist_test.py test script, on my machine (again, debug build). For reference, I've measured the total time spent by the script as well, with the time command. Summarizing, (30+70+30+30+50+70+20+110)/48420.0 = 0.0085 = 0.85% of time is spent on freeing the freelists, in my test. Another way to look at the data is that it roughly takes 10ms for each 100MB allocated (at least for the types of data in the script). Floats seem to be an exception and take at least twice as long (not sure why). Keep in mind that the test merely allocates and deallocates memory, without doing much else, so it isn't a typical Python application. $ time ./python py3k-freelist_test.py Memory used (begin): 121Mb memtest 2000000 int ==================== Memory used (peak): 297Mb cleared free lists in 0 clock ticks (0.000000ms) Memory used (end): 121Mb Unfreed memory: 0Mb memtest 2000000 str ==================== Memory used (peak): 455Mb cleared free lists in 30000 clock ticks (30.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 float ==================== Memory used (peak): 236Mb cleared free lists in 70000 clock ticks (70.000000ms) Memory used (end): 127Mb Unfreed memory: 6Mb memtest 2000000 int ==================== Memory used (peak): 312Mb cleared free lists in 0 clock ticks (0.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test ==================== cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) Memory used (peak): 372Mb cleared free lists in 30000 clock ticks (30.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test2 ==================== cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) Memory used (peak): 361Mb cleared free lists in 30000 clock ticks (30.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _tuple ==================== Memory used (peak): 529Mb cleared free lists in 50000 clock ticks (50.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 _set ==================== cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) Memory used (peak): 846Mb cleared free lists in 70000 clock ticks (70.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 200000 _dict ==================== Memory used (peak): 233Mb cleared free lists in 20000 clock ticks (20.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb memtest 2000000 Test3 ==================== cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) cleared free lists in 0 clock ticks (0.000000ms) Memory used (peak): 999Mb cleared free lists in 110000 clock ticks (110.000000ms) Memory used (end): 123Mb Unfreed memory: 2Mb cleared free lists in 0 clock ticks (0.000000ms) [40763 refs] real 0m50.921s user 0m48.420s sys 0m2.330s |
|||
| msg106231 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2010年05月21日 12:30 | |
I uploaded updates of the three relevant patches against the current revision of the py3k branch, as the old patches no longer applied cleanly due to whitespace changes. To summarize: * The first patch, py3k-rev81387-clearfreelist-dict_set_list.patch, simply adds freelist methods to the public API for dict, list, and set. No opposition has been expressed against this, so I hope this can be accepted. * The second patch, py3k-rev81387-clearfreelist-gc_collect.patch, adds calls to these methods to gc.collect() - some opposition was expressed against the (already present before my patch!!) method of freeing lists during highest generation garbage collection. I attempted to measure the actual time spent on freeing the freelists in a simply python program which does a lot of allocation (attached as py3k-freelist_test.py). This apparently shows that clearing the freelists does not affect timing much at all. * The third patch, file py3k-rev81387-clearfreelist-time_gc_collect.patch, causes estimates of the time spent on freeing the freelists to be printed to the console, and is obviously for testing/benchmarking purpose only. * The tp_free_list patch is no longer relevant (see comment by Guido). Hoping for a conclusion of this issue, Matthias |
|||
| msg141627 - (view) | Author: Matthias Troffaes (matthiastroffaes) | Date: 2011年08月04日 13:44 | |
Patch against current tip attached. I can no longer reproduce the large memory leaks with the current tip (which is of course wonderful!), so I guess the second part of the patch (freeing the freelists during gc.collect) makes no longer sense. |
|||
| msg149609 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011年12月16日 10:25 | |
New changeset 57f0af61da53 by Antoine Pitrou in branch 'default': Issue #6695: Full garbage collection runs now clear the freelist of set objects. http://hg.python.org/cpython/rev/57f0af61da53 |
|||
| msg149610 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月16日 10:26 | |
The clearing the dict and list freelists has been added independently, so I committed the part of the patch pertaining to sets. Thank you! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:56:51 | admin | set | github: 50944 |
| 2011年12月16日 10:26:26 | pitrou | set | status: open -> closed versions: + Python 3.3, - Python 3.2 messages: + msg149610 resolution: fixed stage: resolved |
| 2011年12月16日 10:25:43 | python-dev | set | nosy:
+ python-dev messages: + msg149609 |
| 2011年08月04日 13:44:05 | matthiastroffaes | set | files:
+ py3k-04082011-clearfreelist-dict_set_list.patch messages: + msg141627 |
| 2010年05月21日 15:26:04 | vstinner | set | nosy:
+ vstinner |
| 2010年05月21日 12:30:45 | matthiastroffaes | set | messages: + msg106231 |
| 2010年05月21日 12:12:56 | matthiastroffaes | set | files: + py3k-rev81387-clearfreelist-time_gc_collect.patch |
| 2010年05月21日 12:12:17 | matthiastroffaes | set | files: + py3k-rev81387-clearfreelist-gc_collect.patch |
| 2010年05月21日 12:11:40 | matthiastroffaes | set | files: + py3k-rev81387-clearfreelist-dict_set_list.patch |
| 2010年05月20日 20:31:21 | skip.montanaro | set | nosy:
- skip.montanaro |
| 2009年08月17日 11:43:11 | matthiastroffaes | set | files:
+ py3k-clearfreelist-time_gc_collect.patch messages: + msg91659 |
| 2009年08月16日 18:23:47 | gregory.p.smith | set | nosy:
+ gregory.p.smith messages: + msg91637 |
| 2009年08月16日 17:08:03 | loewis | set | messages: + msg91635 |
| 2009年08月16日 15:57:02 | rhettinger | set | resolution: accepted -> (no value) |
| 2009年08月16日 15:18:28 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg91634 |
| 2009年08月16日 15:13:55 | pitrou | set | nosy:
+ loewis, christian.heimes messages: + msg91633 |
| 2009年08月16日 14:43:54 | pitrou | set | messages: + msg91631 |
| 2009年08月16日 14:28:04 | skip.montanaro | set | messages: + msg91630 |
| 2009年08月15日 22:50:43 | pitrou | set | messages: + msg91623 |
| 2009年08月15日 22:06:15 | rhettinger | set | messages: + msg91618 |
| 2009年08月15日 17:35:02 | pitrou | set | nosy:
+ pitrou messages: + msg91613 |
| 2009年08月15日 05:37:49 | rhettinger | set | messages: + msg91593 |
| 2009年08月14日 18:19:58 | rhettinger | set | resolution: accepted messages: + msg91565 |
| 2009年08月14日 17:19:33 | gvanrossum | set | assignee: gvanrossum -> messages: + msg91563 |
| 2009年08月14日 17:04:05 | rhettinger | set | assignee: gvanrossum messages: + msg91561 nosy: + gvanrossum, rhettinger |
| 2009年08月14日 16:23:26 | matthiastroffaes | set | files:
+ py3k-tp_free_list-dict.patch messages: + msg91558 |
| 2009年08月13日 18:00:56 | skip.montanaro | set | nosy:
+ skip.montanaro messages: + msg91526 |
| 2009年08月13日 16:57:44 | matthiastroffaes | set | files:
+ py3k-freelist_test.py messages: + msg91523 |
| 2009年08月13日 16:49:07 | matthiastroffaes | set | files:
+ py3k-clearfreelist-gc_collect.patch messages: + msg91521 |
| 2009年08月13日 16:45:30 | matthiastroffaes | create | |