This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年04月16日 12:10 by Robert.Elsner, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| unpack_memory_leak.py | Robert.Elsner, 2012年04月16日 12:10 | Minimal example | ||
| unpack_memory_leak.py | Robert.Elsner, 2012年04月16日 12:30 | Revised minimal example for Python 3.1 and 2.6 | ||
| struct_repeat.patch | serhiy.storchaka, 2012年04月16日 16:09 | Patch for reduction of memory consumption | review | |
| struct_sizeof.patch | serhiy.storchaka, 2012年04月23日 21:01 | Patch for implementing struct.__sizeof__ for measuring memory consumption | review | |
| struct_sizeof-2.patch | serhiy.storchaka, 2012年06月23日 20:41 | review | ||
| struct_repeat_2.patch | serhiy.storchaka, 2013年05月10日 13:04 | review | ||
| Messages (38) | |||
|---|---|---|---|
| msg158418 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月16日 12:10 | |
When unpacking multiple files with _variable_ length, struct unpack leaks massive amounts of memory. The corresponding functions from numpy (fromfile) or the array (fromfile) standard lib module behave as expected. I prepared a minimal testcase illustrating the problem on Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48) [GCC 4.4.5] on linux2 This is a severe limitation when reading big files where performance is critical. The struct.Struct class does not display this behavior. Note that the variable length of the buffer is necessary to reproduce the problem (as is usually the case with real data files). I suspect this is due to some internal buffer in the struct module not being freed after use. I did not test on later Python versions, but could not find a related bug in the tracker. |
|||
| msg158419 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月16日 12:18 | |
Do you see the same results with Python 2.7? Python 2.6 is only receiving security bugfixes at this point. |
|||
| msg158420 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月16日 12:22 | |
I would love to test but I am in a production environment atm and can't really spare the time to set up a test box. But maybe somebody with access to 2.7 on linux could test it with the supplied script (just start it and it should happily eat 8GB of memory - I think most users are going to notice ;) |
|||
| msg158421 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月16日 12:25 | |
I suspect that this is due to the struct module cache, which caches Struct instances corresponding to formats used. If that's true, there's no real leak as such. As a test, what happens if you increase your xrange(30) to xrange(300)? (And perhaps decrease the size of the struct itself a bit to compensate). You should see that memory usage stays constant after the first ~100 runs. Using Struct directly is a good workaround if this is a problem. |
|||
| msg158422 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月16日 12:30 | |
Well seems like 3.1 is in the Debian repos as well. Same memory leak. So it is very unlikely it has been fixed in 2.7. I modified the test case to be compatible to 3.1 and 2.6. |
|||
| msg158423 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月16日 12:35 | |
Well the problem is, that performance is severely degraded when calling unpack multiple times. I do not know in advance the size of the files and they might vary in size from 1M to 1G. I could use some fixed-size buffer which is inefficient depending on the file size (too big or too small). And if I change the buffer on the fly, I end up with the memory leak. I think the caching should take into account the available memory on the system. the no_leak function has comparable performance without the leak. And I think there is no point in caching Struct instances when they go out of scope and can not be accessed anymore? If i let it slip from the scope I do not want to use it thereafter. Especially considering that struct.Struct behaves as expected as do array.fromfile and numpy.fromfile. |
|||
| msg158427 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月16日 12:51 | |
> I suspect that this is due to the struct module cache, which caches > Struct instances corresponding to formats used. If that's true, > there's no real leak as such. Well, the posted code creates 30 struct instances. That shouldn't exhaust the memory of a 8GB box (which it does here)... |
|||
| msg158429 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月16日 12:56 | |
Yes, the problem is the place to be in Python 2.7, in Python 3.2 and in Python 3.3. Random size of the structure is important -- if you remove the randint, leakage will not. The memory is not released in cached structuress, which are created in module-level unpack. I now deal with it. |
|||
| msg158431 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月16日 13:02 | |
It appears the storage of Struct instances is rather inefficient when there's a repeat code such as "<48L". In this example, 48 almost identical structures describing the "L" format (struct _formatcode) will be created. You can guess what happens with large repeat counts... |
|||
| msg158442 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月16日 13:25 | |
> It appears the storage of Struct instances is rather inefficient when > there's a repeat code such as "<48L" Right. Repeat counts aren't directly supported in the underlying PyStructObject; a format string containing repeat counts is effectively 'compiled' to a series of (type, offset, size) triples before it can be used. The caching is there to save repeated compilations when the same format string is used repeatedly. |
|||
| msg158451 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月16日 13:53 | |
Perhaps the best quick fix would be to only cache small PyStructObjects, for some value of 'small'. (Total size < a few hundred bytes, perhaps.) |
|||
| msg158453 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月16日 13:59 | |
> Perhaps the best quick fix would be to only cache small > PyStructObjects, for some value of 'small'. (Total size < a few > hundred bytes, perhaps.) Or perhaps not care at all? Is there a use case for huge repeat counts? (limiting cacheability could decrease performance in existing applications) |
|||
| msg158456 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月16日 14:21 | |
Well I stumbled across this leak while reading big files. And what is the point of having a fast C-level unpack when it can not be used with big files? I am not adverse to the idea of caching the format string but if the cache grows beyond a reasonable size, it should be freed. And "reasonable" is not the number of objects contained but the amount of memory it consumes. And caching an arbitrary amount of data (8GB here) is a waste of memory. And reading the Python docs, struct.Struct.unpack which is _not_ affected from the memory leak is supposed to be faster. Quote: > class struct.Struct(format) > > Return a new Struct object which writes and reads binary data according to the format string format. Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once. Caching in case of struct.Struct is straightforward: As long as the object exists, the format string is cached and if the object is no longer accessible, its memory gets freed - including the cached format string. The problem is with the "magic" creation of struct.Struct objects by struct.unpack that linger around even after all associated variables are no longer in scope. Using for example fixed 1MB buffer to read files (regardless of size) incurs a huge performance penalty. Reading everything at once into memory using struct.unpack (or with the same speed struct.Struct.unpack) is the fastest way. Approximately 40% faster than array.fromfile and and 70% faster than numpy.fromfile. I read some unspecified report about a possible memory leak in struct.unpack but the author did not investigate further. It took me quite some time to figure out what exactly happens. So there should be at least a warning about this (ugly) behavior when reading big files for speed and a pointer to a quick workaround (using struct.Struct.unpack). cheers Am 16.04.2012 15:59, schrieb Antoine Pitrou: > > Antoine Pitrou <pitrou@free.fr> added the comment: > >> Perhaps the best quick fix would be to only cache small >> PyStructObjects, for some value of 'small'. (Total size < a few >> hundred bytes, perhaps.) > > Or perhaps not care at all? Is there a use case for huge repeat counts? > (limiting cacheability could decrease performance in existing > applications) > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue14596> > _______________________________________ |
|||
| msg158462 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月16日 15:06 | |
> Or perhaps not care at all? That's also possible. :-) IMO, Robert's use-case doesn't really match the intended use-case for struct (parsing structures of values laid out like a C-struct ). There the caching makes sense. |
|||
| msg158479 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月16日 16:09 | |
The proposed patch uses a more compact encoding format of large structures. |
|||
| msg158862 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年04月20日 17:56 | |
IMO, the struct module does what it's intended to do just fine here. I don't a big need for any change. I'd propose closing this as "won't fix". |
|||
| msg159003 - (view) | Author: Robert Elsner (Robert.Elsner) | Date: 2012年04月23日 09:24 | |
Well then at least the docs need an update. I simply fail to see how a cache memory leak constitutes "just fine" (while the caching behavior of struct.unpack is not documented - if somebody wants caching, he ought to use struct.Struct.unpack which does cache and does not leak). Something like a warning: "struct.unpack might display memory leaks when parsing big files using large format strings" might be sufficient. I do not like the idea of code failing outside some not-documented "use-case". Especially as those problems usually indicate some underlying design flaw. I did not review the proposed patch but might find time to have a look in a few months. cheers Am 20.04.2012 19:56, schrieb Mark Dickinson: > > Mark Dickinson <dickinsm@gmail.com> added the comment: > > IMO, the struct module does what it's intended to do just fine here. I don't a big need for any change. I'd propose closing this as "won't fix". > > ---------- > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue14596> > _______________________________________ |
|||
| msg159011 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年04月23日 12:58 | |
Other of our functions that do caching have been fixed so that the cache does not grow unbounded (usually by using lrucache, I think). IMO a cache that is unbounded by default is a bug. |
|||
| msg159012 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年04月23日 13:00 | |
AFAIU, it's not unbounded, but it's the memory footprint of individual cached objects which can grow senseless. |
|||
| msg159013 - (view) | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2012年04月23日 13:05 | |
If that is the case, then a doc footnote that a large repeat count will result in a large cache seems appropriate. Some way to control the max size of the cache would also be a reasonable enhancement request, at which point the cache size issue could be documented along with the cache control. |
|||
| msg159079 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年04月23日 21:01 | |
Here is a patch that implements method __sizeof__ for Struct. This can be used to limit the caching. In any case, it is useful to know the real memory consumption of the object. I'm not sure of the correctness of the method implementation. |
|||
| msg163595 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年06月23日 12:21 | |
Reduction of memory consumption of struct is a new feature. Any chance to commit struct_repeat.patch+struct_sizeof.patch today and to get this feature in Python 3.3? |
|||
| msg163596 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年06月23日 12:24 | |
I'm still not convinced that something like struct_repeat.patch is necessary. So unless someone else wants to own this issue and review the struct_repeat, I'd say that it's too late for 3.3. |
|||
| msg163599 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年06月23日 12:47 | |
Now internal representation of Struct with small format string may
consume unexpectedly large memory and this representation may be
invisible cached. With patch you can get large internal representation
only for large format strings. It is expected.
And how about struct_sizeof.patch? Now sys.getsizeof() returns wrong
result for Struct:
28
>>> sys.getsizeof(struct.Struct('100B'))
28
The patch (it compatible with both Struct representations) fixes it:
52
>>> sys.getsizeof(struct.Struct('100B'))
1240
|
|||
| msg163612 - (view) | Author: Mark Dickinson (mark.dickinson) * (Python committer) | Date: 2012年06月23日 13:57 | |
The struct_sizeof patch looks fine, but lacks tests. I think it might be reasonable to call this a bugfix. |
|||
| msg163670 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年06月23日 20:41 | |
Here is Struct.__sizeof__ patch with tests. |
|||
| msg165812 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年07月18日 21:55 | |
Please, can anyone do the review? |
|||
| msg165892 - (view) | Author: Meador Inge (meador.inge) * (Python committer) | Date: 2012年07月20日 00:27 | |
I just read through all this and see three separate points be discussed: 1. The unbounded caching behavior. 2. A more compact representation for repeat counts. 3. Correct __sizeof__ support for struct. For issue (1) I think this is unfortunate, but I don't think any code changes are required because there is already a way to get the cached and uncached behavior by using the free function or a Struct object, respectively. I do think adjusting the documentation is appropriate. As a side note, we do have a private function named 'struct._clearcache' that is used by the regression tests. If others really think the caching is a problem we could make that public. Issues (2) and (3) are hijacking this issue IMO. However, since they are being discussed... I already implemented something like (2) when reworking the struct data structures for PEP 3118 in issue3132. Hopefully I will get the PEP 3118 patch pushed through one of these days. I do think issue (3) should be fixed, but a separate issue should be opened for it. This issue should just address the caching behavior. Serhiy, if you open another issue for the __sizeof__ change, then I promise to review ASAP. |
|||
| msg165903 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年07月20日 06:50 | |
> I do think issue (3) should be fixed, but a separate issue should be opened for it. Issue #15402. |
|||
| msg188831 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年05月10日 13:04 | |
So what about more compact Struct object? This is not only memory issue (a Struct object can require several times larger memory than size of processed data), but performance issue (time of creating such object is proportional to it's size). Here is a patch with updated __sizeof__ method and tests. |
|||
| msg188840 - (view) | Author: Meador Inge (meador.inge) * (Python committer) | Date: 2013年05月10日 14:28 | |
> > Serhiy Storchaka added the comment: > > So what about more compact Struct object? I already implemented the count optimization as a part of my patch for implementing PEP 3188 in issue3132. I need to rebaseline the patch. It has gotten stale. Hopefully there is still interest in the PEP 3188 work and I can get that patch pushed through. |
|||
| msg189214 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年05月14日 12:41 | |
I don't think Serhiy's patch should be blocked by a larger issue. I suppose you could rebase easily over his changes. |
|||
| msg189234 - (view) | Author: Meador Inge (meador.inge) * (Python committer) | Date: 2013年05月14日 16:37 | |
> I don't think Serhiy's patch should be blocked by a larger issue. > I suppose you could rebase easily over his changes. Where rebase=undo, sure. The changes for issue3132 are pretty extensive (the basic data structures are changed). And as mentioned in msg165892, msg188840, and msg125617 I have already proposed and implemented this optimization many months back. If we feel that this optimization is really critical, then I agree let's not hold it up and I will just work around it with my patch for issue3132. I don't see it as that critical, but I understand that the PEP 3118 changes are dragging on and this optimization might be important for some now. |
|||
| msg189239 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年05月14日 17:07 | |
Le mardi 14 mai 2013 à 16:37 +0000, Meador Inge a écrit : > If we feel that this optimization is really critical, then I agree > let's not hold it up and I will just work around it with my patch for > issue3132. I don't see it as that critical, but I understand that the > PEP 3118 changes are dragging on and this optimization might be important > for some now. Are you sure the PEP 3118 changes will land in 3.4? It would be a pity to lose a simple improvement because it was deferred to a bigger change. |
|||
| msg189249 - (view) | Author: Meador Inge (meador.inge) * (Python committer) | Date: 2013年05月14日 20:35 | |
> Are you sure the PEP 3118 changes will land in 3.4? It would be a pity > to lose a simple improvement because it was deferred to a bigger > change. No, I am not sure. That is why I said that I understand if others felt this bug was critical to fix now since the PEP 3118 changes were dragging on. In that case I will just rework my patch. I am not trying to stand in the way of this patch. I just wanted folks to be aware that this approach was implemented in the PEP 3118 work. |
|||
| msg189429 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年05月17日 07:50 | |
New changeset 6707637f68ca by Serhiy Storchaka in branch 'default': Issue #14596: The struct.Struct() objects now use more compact implementation. http://hg.python.org/cpython/rev/6707637f68ca |
|||
| msg189431 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年05月17日 08:07 | |
We already lost this improvement for 3.3. :( Could we now close the issue? |
|||
| msg189440 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2013年05月17日 09:48 | |
Closing indeed! |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:29 | admin | set | github: 58801 |
| 2013年05月17日 09:48:47 | pitrou | set | status: open -> closed resolution: fixed messages: + msg189440 stage: patch review -> resolved |
| 2013年05月17日 08:07:36 | serhiy.storchaka | set | messages: + msg189431 |
| 2013年05月17日 07:50:21 | python-dev | set | nosy:
+ python-dev messages: + msg189429 |
| 2013年05月14日 20:35:19 | meador.inge | set | messages: + msg189249 |
| 2013年05月14日 17:07:45 | pitrou | set | messages: + msg189239 |
| 2013年05月14日 16:37:54 | meador.inge | set | messages: + msg189234 |
| 2013年05月14日 12:41:24 | pitrou | set | messages:
+ msg189214 versions: + Python 3.4, - Python 2.7, Python 3.2, Python 3.3 |
| 2013年05月10日 14:28:44 | meador.inge | set | messages: + msg188840 |
| 2013年05月10日 13:04:40 | serhiy.storchaka | set | files:
+ struct_repeat_2.patch messages: + msg188831 |
| 2012年07月20日 06:50:59 | serhiy.storchaka | set | messages: + msg165903 |
| 2012年07月20日 00:27:47 | meador.inge | set | messages:
+ msg165892 stage: patch review |
| 2012年07月18日 22:02:34 | meador.inge | set | nosy:
+ meador.inge |
| 2012年07月18日 21:55:48 | serhiy.storchaka | set | messages: + msg165812 |
| 2012年06月23日 20:41:43 | serhiy.storchaka | set | files:
+ struct_sizeof-2.patch messages: + msg163670 |
| 2012年06月23日 13:57:24 | mark.dickinson | set | messages: + msg163612 |
| 2012年06月23日 12:47:04 | serhiy.storchaka | set | messages: + msg163599 |
| 2012年06月23日 12:24:14 | mark.dickinson | set | messages: + msg163596 |
| 2012年06月23日 12:21:05 | serhiy.storchaka | set | messages: + msg163595 |
| 2012年04月23日 21:01:55 | serhiy.storchaka | set | files:
+ struct_sizeof.patch messages: + msg159079 |
| 2012年04月23日 13:05:09 | r.david.murray | set | messages: + msg159013 |
| 2012年04月23日 13:00:18 | pitrou | set | messages: + msg159012 |
| 2012年04月23日 12:58:47 | r.david.murray | set | nosy:
+ r.david.murray messages: + msg159011 |
| 2012年04月23日 09:24:00 | Robert.Elsner | set | messages: + msg159003 |
| 2012年04月20日 17:56:27 | mark.dickinson | set | messages: + msg158862 |
| 2012年04月18日 20:09:17 | jcea | set | nosy:
+ jcea |
| 2012年04月16日 16:09:56 | serhiy.storchaka | set | files:
+ struct_repeat.patch keywords: + patch messages: + msg158479 |
| 2012年04月16日 15:06:46 | mark.dickinson | set | messages: + msg158462 |
| 2012年04月16日 14:21:30 | Robert.Elsner | set | messages: + msg158456 |
| 2012年04月16日 13:59:05 | pitrou | set | messages: + msg158453 |
| 2012年04月16日 13:53:52 | mark.dickinson | set | messages: + msg158451 |
| 2012年04月16日 13:25:28 | mark.dickinson | set | messages: + msg158442 |
| 2012年04月16日 13:02:51 | pitrou | set | messages: + msg158431 |
| 2012年04月16日 12:56:21 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages: + msg158429 |
| 2012年04月16日 12:51:28 | pitrou | set | versions:
+ Python 2.7, Python 3.2, Python 3.3, - Python 2.6, Python 3.1 nosy: + pitrou messages: + msg158427 type: resource usage |
| 2012年04月16日 12:35:43 | Robert.Elsner | set | messages: + msg158423 |
| 2012年04月16日 12:30:27 | Robert.Elsner | set | files:
+ unpack_memory_leak.py messages: + msg158422 versions: + Python 3.1 |
| 2012年04月16日 12:25:07 | mark.dickinson | set | messages: + msg158421 |
| 2012年04月16日 12:22:03 | Robert.Elsner | set | messages: + msg158420 |
| 2012年04月16日 12:18:43 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg158419 |
| 2012年04月16日 12:10:11 | Robert.Elsner | create | |