This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011年12月08日 12:52 by phillies, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| pickle_overflow.diff | neologix, 2011年12月17日 22:15 | |||
| pickle_overflow-1.diff | neologix, 2011年12月17日 22:46 | |||
| pickle_overflow-2.diff | neologix, 2011年12月22日 22:40 | |||
| pickle_overflow-4.diff | serhiy.storchaka, 2013年02月11日 21:09 | review | ||
| pickle | Arfrever, 2013年02月25日 19:57 | |||
| Messages (34) | |||
|---|---|---|---|
| msg149027 - (view) | Author: Philipp Lies (phillies) | Date: 2011年12月08日 12:52 | |
When I try to load a large file (>1GB) cPickle crashes with a MemoryError:
$python test.py
Traceback (most recent call last):
File "/tmp/test.py", line 8, in <module>
A2 = cPickle.load(f2)
MemoryError
test.py contains following code:
import numpy as np
import cPickle
A = np.random.randn(196,240000)
f = open('test.pydat', 'w')
cPickle.dump(A,f)
f.close()
f2 = open('test.pydat', 'rb')
A2 = cPickle.load(f2)
System:
cPickle 1.71
python 2.7.2
Ubuntu 11.10 amd64
Memory is not an issue as a) pickle works nicely and b) my computer has 122GB free RAM
|
|||
| msg149028 - (view) | Author: Ramchandra Apte (Ramchandra Apte) * | Date: 2011年12月08日 12:56 | |
Maybe Ubuntu doesn't think it is safe to allocate the memory. |
|||
| msg149029 - (view) | Author: Philipp Lies (phillies) | Date: 2011年12月08日 13:02 | |
Well, replace cPickle by pickle and it works. So if there is a memory allocation problem cPickle should be able to handle it, especially since it should be completely compatible to pickle. |
|||
| msg149034 - (view) | Author: Ramchandra Apte (Ramchandra Apte) * | Date: 2011年12月08日 13:38 | |
Have you checked the system monitor after all cPickle can use more memory than 1GB. |
|||
| msg149035 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月08日 13:40 | |
Is there a way to reproduce that doesn't involve numpy? |
|||
| msg149049 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月08日 18:57 | |
> my computer has 122GB free RAM 122, or 1.22? > Well, replace cPickle by pickle and it works. cPickle makes some direct call to malloc()/realloc()/free(), contrarily to pickle which uses pymalloc. This could lead to heap fragmentation. What does the following return ? $ /usr/bin/time -v python test.py |
|||
| msg149300 - (view) | Author: Philipp Lies (phillies) | Date: 2011年12月12日 13:39 | |
a) it's 122GB free RAM (out of 128GB total RAM) b) when I convert the numpy array to a list it works. So seems to be a problem with cPickle and numpy at/from a certain array size c) $ /usr/bin/time -v python test_np.py Traceback (most recent call last): File "test_np.py", line 12, in <module> A2 = cPickle.load(f2) MemoryError Command exited with non-zero status 1 Command being timed: "python test_np.py" User time (seconds): 73.72 System time (seconds): 4.56 Percent of CPU this job got: 87% Elapsed (wall clock) time (h:mm:ss or m:ss): 1:29.52 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 7402448 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 726827 Voluntary context switches: 41043 Involuntary context switches: 7793 Swaps: 0 File system inputs: 3368 File system outputs: 2180744 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 1 hth |
|||
| msg149319 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月12日 16:00 | |
@Antoine Couldn't this be linked to #11564 (pickle not 64-bit ready)? AFAICT this wasn't fixed in 2.7. Basically, an integer overflow, and malloc() would bail out when asked a ridiculous size. @Philipp I'd be curious to see the last lines of $ ltrace -e malloc python test_np.py you'll probably see a call to malloc() return NULL. |
|||
| msg149326 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月12日 16:20 | |
> Couldn't this be linked to #11564 (pickle not 64-bit ready)? Well, I don't know anything about numpy, but: >>> 196 * 240000 47040000 >>> 196 * 240000 * 8 # assuming 8 bytes per float 376320000 >>> 2**31 2147483648 So it seems unlikely to be the explanation. After all the report says ">1GB" for the file size, which is still comfortably in the capabilities of a 32-bit module. Philipp, perhaps you could try to run Python under gdb and try to diagnose where the MemoryError occurs? Also, posting the disassembly (using pickletools.dis()) of a very small equivalent pickle (e.g. of randn(2,3)) could help us know what is involved. |
|||
| msg149327 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月12日 16:35 | |
Oh, what's more interesting is that it works here (Python 2.7.1 and numpy 1.6.1 under Mageia with 8GB RAM). Looking at a pickle disassembly, the only remarkable thing is the presence of a long binary string (the raw serialization of all IEEE floats), which shouldn't give any problem to cPickle (and indeed doesn't). What is your numpy version, Philipp? |
|||
| msg149673 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月17日 16:21 | |
> So it seems unlikely to be the explanation.
Victor reproduced in on IRC, and it's indeed an overflow.
The problematic code is in readline_file:
"""
bigger = self->buf_size << 1;
if (bigger <= 0) { /* overflow */
PyErr_NoMemory();
return -1;
}
newbuf = (char *)realloc(self->buf, bigger);
if (!newbuf) {
PyErr_NoMemory();
return -1;
}
"""
self->buf_size is an int, which overflow pretty easily.
>>> 196 * 240000
47040000
>>> 196 * 240000 * 8 # assuming 8 bytes per float
376320000
>>> 2**31
2147483648
Hmmm... A byte is 8 bit, which gives:
>>> 196 * 240000 * 8 * 8
3010560000L
>>> 196 * 240000 * 8 * 8 > 2**31
True
Now, if it works on your box, it's probably due to the compiler optimizing the check away. Since `bigger` is cast to an unsigned 64-bit (size_t) when calling malloc(), it happens to work.
Maybe your distro doesn't build python with -fwrapv.
So, what do you suggest? Should we fix this (Py_ssize_t, overflow check before computation), as in #11564?
|
|||
| msg149674 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011年12月17日 16:24 | |
> Should we fix this (Py_ssize_t, overflow check before computation), as in #11564? Yes. Use Py_ssize_t type for the buf_size attribute, and replace "bigger <= 0" (test if an overflow occurred) by "self->buf_size > (PY_SSIZE_T_MAX >> 1)". |
|||
| msg149676 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月17日 16:35 | |
Ah, I see. It's a bit of a pity not to be able to load files > 1GB, especially on a 64-bit build (!). Perhaps cPickle could be made partly 64-bit compatible? Or at least, indeed, do a proper anti-overflow check. |
|||
| msg149711 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月17日 22:15 | |
Here's a patch which should fix this. However, I'm unable to test it. |
|||
| msg149713 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月17日 22:25 | |
I think there's a problem here: + self->data = realloc(self->data, self->size * sizeof(PyObject *)); + if (self->data == NULL) goto nomemory; If realloc() fails, the old data pointer is lost and therefore will never get free()ed. Same for: + self->buf = (char *)realloc(self->buf, self->buf_size); Here: - int *marks; - s=self->marks_size+20; - if (s <= self->num_marks) s=self->num_marks + 1; + size_t alloc; + Py_ssize_t *marks; + + /* Use the size_t type to check for overflow. */ + alloc = ((size_t)self->num_marks << 1) + 20; It seems you are changing the overallocation algorithm (from additive to multiplicative). I'm not sure it should be in the scope of the patch, although multiplicative overallocation is generally better. |
|||
| msg149715 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月17日 22:46 | |
New version. |
|||
| msg149757 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011年12月18日 14:21 | |
> New version. Looks good to me. |
|||
| msg150117 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2011年12月22日 22:40 | |
Here's a new version with a test (untested). Note that I'm absolutely not sure that the 'memsize' argument to bigmemtest is correct. |
|||
| msg150762 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2012年01月06日 19:16 | |
Antoine, could you test the last version (test_pickle and if possible with the OP testcase)? I can't test it myself (32-bit machine with 1 GB). |
|||
| msg150765 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年01月06日 19:33 | |
Le vendredi 06 janvier 2012 à 19:17 +0000, Charles-François Natali a écrit : > Charles-François Natali <neologix@free.fr> added the comment: > > Antoine, could you test the last version (test_pickle and if possible > with the OP testcase)? > I can't test it myself (32-bit machine with 1 GB). Well, first, there are compilation warnings on a 64-bit box: gcc -pthread -fPIC -fno-strict-aliasing -g -O2 -g -O0 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -I/usr/local/include -I/home/antoine/cpython/27/Include -I/home/antoine/cpython/27 -c /home/antoine/cpython/27/Modules/cPickle.c -o build/temp.linux-x86_64-2.7-pydebug/home/antoine/cpython/27/Modules/cPickle.o /home/antoine/cpython/27/Modules/cPickle.c: In function ‘put2’: /home/antoine/cpython/27/Modules/cPickle.c:810:9: attention : format ‘%d’ expects type ‘int’, but argument 4 has type ‘Py_ssize_t’ /home/antoine/cpython/27/Modules/cPickle.c: In function ‘load_mark’: /home/antoine/cpython/27/Modules/cPickle.c:4610:21: attention : assignment from incompatible pointer type gcc -pthread -shared build/temp.linux-x86_64-2.7-pydebug/home/antoine/cpython/27/Modules/cPickle.o -L/usr/local/lib -o build/lib.linux-x86_64-2.7-pydebug/cPickle.so Second, I can't seem to get the test to work with 8GB RAM (approximately 6.5GB free according to "free"). The MemoryError is quite expectable for test_pickle, though, since the code there doesn't try to conserve memory at all: test test_pickle failed -- Traceback (most recent call last): File "/home/antoine/cpython/27/Lib/test/test_support.py", line 983, in wrapper return f(self, maxsize) File "/home/antoine/cpython/27/Lib/test/pickletester.py", line 1298, in test_huge_str_32b pickled = self.dumps(data, proto) File "/home/antoine/cpython/27/Lib/test/test_pickle.py", line 74, in dumps return pickle.dumps(arg, proto) File "/home/antoine/cpython/27/Lib/pickle.py", line 1374, in dumps Pickler(file, protocol).dump(obj) File "/home/antoine/cpython/27/Lib/pickle.py", line 224, in dump self.save(obj) File "/home/antoine/cpython/27/Lib/pickle.py", line 286, in save f(self, obj) # Call unbound method with explicit self File "/home/antoine/cpython/27/Lib/pickle.py", line 488, in save_string self.write(STRING + repr(obj) + '\n') MemoryError I would therefore suggest to enable the test only for cPickle. For test_cpickle the behaviour is different: - for protocol 0, I get a MemoryError (which may be expected, if the test truly needs more than 6.5GB, for example because of suboptimal buffer management) - for protocol 1 and 2, I get "SystemError: error return without exception set" |
|||
| msg178042 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2012年12月24日 09:14 | |
Could someone with a 64-bit box take over this one? I won't go anywhere with my 32-bit box... |
|||
| msg178559 - (view) | Author: Ramchandra Apte (Ramchandra Apte) * | Date: 2012年12月30日 08:24 | |
Bump. @neologix I have a 64-bit laptop with 2 GB memory so I don't think I can do so. (though one could use swap) |
|||
| msg178564 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2012年12月30日 10:42 | |
I may tackle this but rare 2.7-only bugs are pretty low on my priorities list. |
|||
| msg178566 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2012年12月30日 11:29 | |
> I have a 64-bit laptop with 2 GB memory so I don't think I can do so. (though one could use swap) AFAICT, a binary string a little longer than 1GB should be enough to reproduce the bug. Just make sure Python isn't built with '-fwrapv'. |
|||
| msg180763 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月27日 12:32 | |
Charles-François, are you going to finish this before 2.7.4 RC released? The patch should be updated to address Antoine's comments. |
|||
| msg180777 - (view) | Author: Charles-François Natali (neologix) * (Python committer) | Date: 2013年01月27日 17:41 | |
> Charles-François, are you going to finish this before 2.7.4 RC released? The patch should be updated to address Antoine's comments. No. I don't have access to a 64-bit box, which makes it difficult to write (I don't get compiler warnings) and test. Feel free to pick up the patch and update it! |
|||
| msg180792 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年01月27日 20:49 | |
I have picked up it. Actually BINSTRING format supports only strings less 2GiB, so test should be changed. But there are other bugs which prohibit pickling a lot of data. For one I open issue17054. |
|||
| msg181750 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年02月09日 18:21 | |
Here is an updated patch. Fixed two bugs found by Antoine (an inappropriate format and a memory error in bigmemtest), fixed resizing of marks array and one possible integer overflow in write_other(). Used workaround to bypass limitations of cStringIO API. |
|||
| msg181932 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年02月11日 21:09 | |
Test updated too. Now it doesn't try to write a string larger than 2 GiB (it's impossible), instead writes a lot of shorter strings with total size larger than 2 GiB. |
|||
| msg181973 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年02月12日 19:37 | |
New changeset 680959a3ae2e by Serhiy Storchaka in branch '2.7': Issue #13555: cPickle now supports files larger than 2 GiB. http://hg.python.org/cpython/rev/680959a3ae2e |
|||
| msg182979 - (view) | Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) * (Python triager) | Date: 2013年02月25日 19:57 | |
680959a3ae2e has caused that g-ir-scanner tool from gobject-introspection (which seems to create some pickles and next load them?) fails with MemoryError, e.g. during building of GTK+. MemoryError occurs only for a subset of pickles. I attach a small pickle, which allows to reproduce this problem. Download this pickle and save it in /tmp. Additional preparation: $ cd /tmp $ wget -q http://ftp.gnome.org/pub/gnome/sources/gobject-introspection/1.34/gobject-introspection-1.34.2.tar.xz $ tar -xJf gobject-introspection-1.34.2.tar.xz $ sed -e '/^with LibtoolImporter/,/^$/d' -i gobject-introspection-1.34.2/giscanner/xmlwriter.py Behavior before 680959a3ae2e: $ PYTHONPATH="gobject-introspection-1.34.2" python2.7 -c 'import cPickle; print(cPickle.load(open("pickle")))' <giscanner.girparser.GIRParser object at 0x7f506299be90> Behavior after 680959a3ae2e: $ PYTHONPATH="gobject-introspection-1.34.2" python2.7 -c 'import cPickle; print(cPickle.load(open("pickle")))' Traceback (most recent call last): File "<string>", line 1, in <module> MemoryError |
|||
| msg183025 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年02月26日 08:08 | |
New changeset f3f23ecdb1c6 by Serhiy Storchaka in branch '2.7': Issue #13555: Fix an integer overflow check. http://hg.python.org/cpython/rev/f3f23ecdb1c6 |
|||
| msg183026 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年02月26日 08:18 | |
Thank you for the report. Standard tests do not cover pickling/unpickling to real files. |
|||
| msg183030 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2013年02月26日 08:42 | |
I have opened issue17299 for testing issue. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:24 | admin | set | github: 57764 |
| 2013年02月26日 14:10:33 | jcea | set | nosy:
+ jcea |
| 2013年02月26日 08:42:24 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg183030 stage: resolved |
| 2013年02月26日 08:18:14 | serhiy.storchaka | set | messages: + msg183026 |
| 2013年02月26日 08:08:41 | python-dev | set | messages: + msg183025 |
| 2013年02月25日 19:57:14 | Arfrever | set | status: closed -> open files: + pickle nosy: + benjamin.peterson stage: resolved -> (no value) messages: + msg182979 resolution: fixed -> (no value) priority: normal -> release blocker |
| 2013年02月15日 08:26:40 | serhiy.storchaka | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2013年02月12日 19:37:47 | python-dev | set | nosy:
+ python-dev messages: + msg181973 |
| 2013年02月11日 21:09:50 | serhiy.storchaka | set | files:
+ pickle_overflow-4.diff messages: + msg181932 |
| 2013年02月11日 21:07:19 | serhiy.storchaka | set | files: - pickle_overflow-3.diff |
| 2013年02月09日 18:21:40 | serhiy.storchaka | set | files:
+ pickle_overflow-3.diff assignee: serhiy.storchaka messages: + msg181750 stage: needs patch -> patch review |
| 2013年01月28日 09:04:30 | serhiy.storchaka | set | dependencies: + cStringIO not 64-bit safe, - cStringIO.StringIO aborted when more then INT_MAX bytes written |
| 2013年01月27日 20:49:24 | serhiy.storchaka | set | dependencies:
+ cStringIO.StringIO aborted when more then INT_MAX bytes written messages: + msg180792 |
| 2013年01月27日 17:41:53 | neologix | set | messages: + msg180777 |
| 2013年01月27日 12:32:38 | serhiy.storchaka | set | messages: + msg180763 |
| 2012年12月30日 11:29:16 | neologix | set | messages: + msg178566 |
| 2012年12月30日 10:42:36 | pitrou | set | messages: + msg178564 |
| 2012年12月30日 08:24:03 | Ramchandra Apte | set | messages: + msg178559 |
| 2012年12月24日 09:14:23 | neologix | set | messages: + msg178042 |
| 2012年12月02日 10:23:39 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka stage: needs patch |
| 2012年07月30日 18:21:26 | Arfrever | set | nosy:
+ Arfrever |
| 2012年01月06日 19:33:26 | pitrou | set | messages: + msg150765 |
| 2012年01月06日 19:16:59 | neologix | set | messages: + msg150762 |
| 2011年12月22日 22:40:59 | neologix | set | files:
+ pickle_overflow-2.diff messages: + msg150117 |
| 2011年12月18日 14:21:10 | pitrou | set | messages: + msg149757 |
| 2011年12月18日 11:13:33 | flox | set | nosy:
+ flox |
| 2011年12月17日 22:46:05 | neologix | set | files:
+ pickle_overflow-1.diff messages: + msg149715 |
| 2011年12月17日 22:25:28 | pitrou | set | messages: + msg149713 |
| 2011年12月17日 22:15:35 | neologix | set | files:
+ pickle_overflow.diff keywords: + patch messages: + msg149711 |
| 2011年12月17日 16:35:30 | pitrou | set | messages: + msg149676 |
| 2011年12月17日 16:24:08 | vstinner | set | nosy:
+ vstinner messages: + msg149674 |
| 2011年12月17日 16:21:29 | neologix | set | messages: + msg149673 |
| 2011年12月12日 16:35:35 | pitrou | set | messages: + msg149327 |
| 2011年12月12日 16:20:22 | pitrou | set | messages: + msg149326 |
| 2011年12月12日 16:03:24 | neologix | set | messages: - msg149321 |
| 2011年12月12日 16:02:21 | neologix | set | messages: + msg149321 |
| 2011年12月12日 16:00:33 | neologix | set | messages: + msg149319 |
| 2011年12月12日 13:39:15 | phillies | set | messages: + msg149300 |
| 2011年12月08日 18:57:44 | neologix | set | nosy:
+ neologix messages: + msg149049 |
| 2011年12月08日 13:40:36 | pitrou | set | nosy:
+ mark.dickinson |
| 2011年12月08日 13:40:28 | pitrou | set | nosy:
+ pitrou messages: + msg149035 components: + Extension Modules, - None |
| 2011年12月08日 13:38:18 | Ramchandra Apte | set | messages: + msg149034 |
| 2011年12月08日 13:02:08 | phillies | set | messages: + msg149029 |
| 2011年12月08日 12:56:55 | Ramchandra Apte | set | nosy:
+ Ramchandra Apte messages: + msg149028 |
| 2011年12月08日 12:52:02 | phillies | create | |