This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in Python's Developer Guide.
Created on 2011-03-15 22:58 by nyevik, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | | |
|---|---|---|---|
| File name | Uploaded | Description | Edit |
| pickle64.patch | pitrou, 2011-08-12 18:03 | | |
| pickle64-3.3.patch | nadeem.vawda, 2011-08-16 19:19 | | review |
| pickle64-4.patch | pitrou, 2011-08-27 12:56 | | |
| Messages (28) | | | |
|---|---|---|---|
| msg131060 - (view) | Author: Nik Galitsky (nyevik) | Date: 2011-03-15 22:58 | |
Python 3.2 on Linux (RHEL 5.3) x86_64, built from source.
Configure options:
./configure --prefix=/scratch/Python-3.2 --enable-big-digits=30 --with-universal-archs=all --with-fpectl --enable-shared
Built with GCC 4.3.3 with major options
-g3 -O3 -m64 -fPIC.
Testcase that shows the issue:
#import numpy
import pickle
print("begin")
#a = numpy.zeros((2.5e9 / 8,), dtype = numpy.float64)
a = ('a' * (2 ** 31))
print("allocated")
#print(a);
pickle.dumps(a, pickle.DEFAULT_PROTOCOL)
print("end")
The problem, as I see it, is that in pickle.py most length fields are packed as either 2-byte or 4-byte values.
For example, it is peppered with lines like:
self.write(SOMETYPE + pack("<i", n) + obj)
while pickling, and when unpickling:
len = mloads('i' + self.read(4))
which limits the size of the objects that can be pickled, if I understand correctly.
Replacing the above lines in pickle.py with something like
self.write(SOMETYPE + pack("<Q", n) + obj)
and
len = mloads('Q' + self.read(8))
lets the above testcase run to completion.
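The limit described above, and the effect of widening the field, can be reproduced with the struct module alone; a minimal sketch (illustrative only, not the eventual patch):

```python
import struct

n = 2 ** 31  # one element past the 4-byte signed range

# pickle.py packs many lengths as 4-byte signed ints ("<i"),
# which cannot represent 2**31 or more:
try:
    struct.pack("<i", n)
except struct.error as exc:
    print("4-byte signed field:", exc)

# An 8-byte unsigned field ("<Q") has plenty of room:
packed = struct.pack("<Q", n)
print("8-byte unsigned field:", packed.hex())
```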
Otherwise it crashes (on Python 2.7.1, with SIGSEGV); on Python 3.2, strace shows:
.......
open("/scratch/Python-3.2/lib/python3.2/lib-dynload/_pickle.cpython-32m.so", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0755, st_size=412939, ...}) = 0
open("/scratch/hpl005/UIT_test/apps_exc/Python-3.2/lib/python3.2/lib-dynload/_pickle.cpython-32m.so", O_RDONLY) = 5
read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300>\0\0\0\0\0\0"..., 832) = 832
fstat(5, {st_mode=S_IFREG|0755, st_size=412939, ...}) = 0
mmap(NULL, 2185384, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x2b05b5f68000
mprotect(0x2b05b5f7b000, 2093056, PROT_NONE) = 0
mmap(0x2b05b617a000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0x12000) = 0x2b05b617a000
close(5) = 0
close(4) = 0
close(3) = 0
write(1, "begin\n", 6begin
) = 6
mmap(NULL, 4294971392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b05b617e000
write(1, "allocated\n", 10allocated
) = 10
mmap(NULL, 8589938688, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b06b617f000
mremap(0x2b06b617f000, 8589938688, 2147487744, MREMAP_MAYMOVE) = 0x2b06b617f000
mmap(NULL, 4294971392, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b0736180000
munmap(0x2b06b617f000, 2147487744) = 0
munmap(0x2b0736180000, 4294971392) = 0
write(2, "Traceback (most recent call last"..., 35Traceback (most recent call last):
) = 35
write(2, " File \"pickle_long.py\", line 9,"..., 45 File "pickle_long.py", line 9, in <module>
) = 45
open("pickle_long.py", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7ffff9f7c9e0) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
lseek(3, 0, SEEK_CUR) = 0
dup(3) = 4
fcntl(4, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fstat(4, {st_mode=S_IFREG|0644, st_size=251, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b06b617f000
lseek(4, 0, SEEK_CUR) = 0
read(4, "#import numpy\n\nimport pickle\npri"..., 4096) = 251
close(4) = 0
munmap(0x2b06b617f000, 4096) = 0
lseek(3, 0, SEEK_SET) = 0
lseek(3, 0, SEEK_CUR) = 0
read(3, "#import numpy\n\nimport pickle\npri"..., 4096) = 251
close(3) = 0
write(2, " pickle.dumps(a, pickle.DEFAU"..., 45 pickle.dumps(a, pickle.DEFAULT_PROTOCOL)
) = 45
write(2, "SystemError: error return withou"..., 48SystemError: error return without exception set
) = 48
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x2b05b118e4c0}, {0x2b05b0e7a570, [], SA_RESTORER, 0x2b05b118e4c0}, 8) = 0
munmap(0x2b05b617e000, 4294971392) = 0
exit_group(1) = ?
Why does this limitation exist?
Please advise.
|
|||
| msg131062 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-03-15 23:05 | |
Indeed:

>>> s = b'a' * (2**31)
>>> d = pickle.dumps(s)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
SystemError: error return without exception set

There are two aspects to this:
- (bugfix) raise a proper exception when an object too large for handling by pickle is given
- (feature) improve the pickle protocol to handle objects larger than (2**31-1) elements

The improvement to the pickle protocol should probably be considered along with other improvements, because we don't want to create a new protocol too often. See also issue9614. |
|||
| msg131064 - (view) | Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) | Date: 2011-03-15 23:15 | |
We could resort to the text-based protocol which doesn't have these limitations with respect to object lengths (IIRC). Performance won't be amazing, but we won't have to modify the current pickle protocol. |
|||
| msg131118 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2011-03-16 14:18 | |
On Tue, Mar 15, 2011 at 7:05 PM, Antoine Pitrou <report@bugs.python.org> wrote:
..
> - (bugfix) raise a proper exception when an object too large for handling by pickle is given

What would be the "proper exception" here? With _pickle acceleration disabled, I get a struct.error:

$ cat p.py
import sys
sys.modules['_pickle'] = None
import pickle
s = b'a' * (2**31)
d = pickle.dumps(s)
$ ./python.exe p.py
Traceback (most recent call last):
 ..
 File "Lib/pickle.py", line 496, in save_bytes
 self.write(BINBYTES + pack("<i", n) + bytes(obj))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

I would say the "proper exception" would be ValueError, but that would mean changing the Python implementation in an incompatible way. |
|||
| msg131192 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-03-16 22:59 | |
> On Tue, Mar 15, 2011 at 7:05 PM, Antoine Pitrou <report@bugs.python.org> wrote:
> ..
> > - (bugfix) raise a proper exception when an object too large for handling by pickle is given
>
> What would be the "proper exception" here?

OverflowError. This is the exception that gets raised when some user-supplied value exceeds some internal limit. |
|||
| msg131193 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2011-03-16 23:10 | |
On Wed, Mar 16, 2011 at 6:59 PM, Antoine Pitrou <report@bugs.python.org> wrote:
..
>> What would be the "proper exception" here?
>
> OverflowError. This is the exception that gets raised when some
> user-supplied value exceeds some internal limit.

I don't think so. OverflowError is a subclass of ArithmeticError and is raised when the result of an arithmetic operation cannot be represented by the Python type. For example:

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
OverflowError: (34, 'Result too large')

I don't think a failing pickle dump should raise an ArithmeticError. |
|||
| msg131196 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-03-16 23:27 | |
> I don't think so. OverflowError is a subclass of ArithmeticError and
> is raised when result of an arithmetic operation cannot be represented
> by the python type.

If you grep for OverflowError in the C code base, you'll see that it is in practice used for the present kind of error. Examples:
- "signed short integer is less than minimum"
- "%c arg not in range(0x110000)"
- "size does not fit in an int"
- "module name is too long"
- "modification time overflows a 4 byte field"
- "range too large to represent as a range_iterator"
- "Python int too large to convert to C size_t"
(at this point I am bored of pasting examples but you get the point) |
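One easily reproduced instance of this convention (my own illustration, not one of the grepped examples above):

```python
# int.to_bytes() raises OverflowError when the value exceeds the
# requested width -- a size limit being hit, not an arithmetic result:
try:
    (1 << 16).to_bytes(2, "little")  # 65536 needs 3 bytes
except OverflowError as exc:
    print(type(exc).__name__, "-", exc)

# ...even though OverflowError formally lives under ArithmeticError:
print(OverflowError.__mro__[1].__name__)
```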
|||
| msg131197 - (view) | Author: Nik Galitsky (nyevik) | Date: 2011-03-16 23:38 | |
Thank you all for your responses. While getting a meaningful error message in this case is very important, the main thing for us is to somehow fix this problem to allow serialization of larger objects, which is not at all uncommon on 64-bit machines with large amounts of memory. I believe this issue affects cPickle as well, and also cStringIO, which uses pickle too. So, what are your plans/thoughts - will there be any action on fixing this problem in the near future? I think I grasp the extent of changes that need to be made to the Python code, and this issue will have to be addressed sooner or later anyhow. |
|||
| msg141284 - (view) | Author: Jorgen Skancke (jorgsk) | Date: 2011-07-28 07:47 | |
I recently ran into this problem when I tried to multiprocess jobs with large objects (3-4 GB). I have plenty of memory for this, but multiprocessing hangs without error, presumably because pickle hangs without error when multiprocessing tries to pickle the object. I can't offer a solution, but I can verify that the limitation in pickle is affecting Python usage. |
|||
| msg141981 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-12 18:03 | |
This patch contains assorted improvements for 64-bit compatibility of the pickle module. The protocol still doesn't support >4GB bytes or str objects, but at least its behaviour shouldn't be misleading anymore. |
|||
| msg142216 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-16 19:19 | |
pickle64.patch applies cleanly to 3.2, but not 3.3. I've attached an adapted version that applies cleanly to 3.3. |
|||
| msg142415 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-19 03:39 | |
I have tried running the tests on a machine with 12GB of RAM, but when I do so, the new tests get skipped saying "not enough memory", even when I specify "-M 11G" on the command-line. The problem seems to be the change to the precisionbigmemtest decorator in test.support. I don't understand what the purpose of the "dryrun" flag is, but the modified condition for skipping doesn't look right to me. (Now that I think about it, I should be able to get the tests to run by undoing that one part of the change. I'll get back to you about the results later today.) |
|||
| msg142426 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-19 12:23 | |
> I have tried running the tests on a machine with 12GB of RAM, but when I do so,
> the new tests get skipped saying "not enough memory", even when I specify "-M 11G"
> on the command-line.

How much does it say is required? Did you remove the skips in BigmemPickleTests?

> The problem seems to be the change to the precisionbigmemtest
> decorator in test.support. I don't understand what the purpose of the "dryrun"
> flag is, but the modified condition for skipping doesn't look right to me.

Well, perhaps I got the logic wrong. Debugging welcome :) |
|||
| msg142434 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-19 12:35 | |
> How much does it say is required?
> Did you remove the skips in BigmemPickleTests?

Yes, I did remove the skips. It says 2GB for some, and 4GB for others.

> Well, perhaps I got the logic wrong. Debugging welcome :)

I'd be glad to do so, but I'm not sure what the aim of the "dryrun" flag is. Do you want to make it the default that precisionbigmem tests are skipped, unless the decorator invocation explicitly specifies dryrun=False? |
|||
| msg142436 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-19 12:38 | |
> I'd be glad to do so, but I'm not sure what the aim of the "dryrun" flag is.
> Do you want to make it the default that precisionbigmem tests are skipped,
> unless the decorator invocation explicitly specifies dryrun=False?

No, the point is to avoid running these tests when -M is not specified. See what happens with other bigmem tests. |
|||
| msg142476 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-19 16:48 | |
D'oh. I just realized why the -M option wasn't being recognized - I had passed it after the actual test name, so it was being treated as another test instead of an option. Sorry for the confusion :/

As for the actual test results, test_huge_bytes_(32|64)b both pass, but test_huge_str fails with this traceback:

======================================================================
FAIL: test_huge_str (test.test_pickle.InMemoryPickleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
 File "/usr/local/google/home/nadeemvawda/code/cpython/3.2/Lib/test/support.py", line 1108, in wrapper
 return f(self, maxsize)
 File "/usr/local/google/home/nadeemvawda/code/cpython/3.2/Lib/test/pickletester.py", line 1151, in test_huge_str
 self.dumps(data, protocol=proto)
AssertionError: (<class 'ValueError'>, <class 'OverflowError'>) not raised

The same error occurs on the default branch. |
|||
| msg142477 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-19 16:58 | |
> D'oh. I just realized why the -M option wasn't being recognized - I had passed it
> after the actual test name, so it was being treated as another test instead of an
> option. Sorry for the confusion :/
>
> As for the actual test results, test_huge_bytes_(32|64)b both pass, but
> test_huge_str fails with this traceback:

Can you replace "_2G" with "_4G" in the decorator for that test? |
|||
| msg142481 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-19 18:01 | |
> Can you replace "_2G" with "_4G" in the decorator for that test? I'm not at work any more, but I'll try that out on Monday. |
|||
| msg142756 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-22 21:47 | |
> Can you replace "_2G" with "_4G" in the decorator for that test?

When I do that, it pushes the memory usage for the test up to 16GB, which is beyond what the machine can handle. When I tried with 2.5G (_2G * 5 // 4), that was enough to make it swap heavily (and in the end the test still failed).

As an aside, it turns out the problem with -M being ignored wasn't due to me being stupid; it seems that -j doesn't pass the memlimit on to subprocesses. I'll open a separate issue for this. |
|||
| msg142759 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-22 21:57 | |
> > Can you replace "_2G" with "_4G" in the decorator for that test?
>
> When I do that, it pushes the memory usage for the test up to 16GB, which is
> beyond what the machine can handle. When I tried with 2.5G (_2G * 5 // 4),
> that was enough to make it swap heavily (and in the end the test still failed).

Uh, does it? With 4G it should raise OverflowError, and not try to do anything else. Could I ask you to try to take a look? :S

> As an aside, it turns out the problem with -M being ignored wasn't due to me
> being stupid; it seems that -j doesn't pass the memlimit on to subprocesses.
> I'll open a separate issue for this.

Running bigmem tests in parallel doesn't make much sense IMO. You want to run as many of them as you can, which requires that you allocate all memory to *one* test process. |
|||
| msg142760 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-22 22:03 | |
> Uh, does it? With 4G it should raise OverflowError, and not try to do
> anything else.
> Could I ask you to try to take a look? :S

Sure; I'll see what I can figure out tomorrow.

> Running bigmem tests in parallel doesn't make much sense IMO. You want
> to run as many of them as you can, which requires that you allocate all
> memory to *one* test process.

Yeah, actually running them in parallel isn't a sensible use. But it bit me because I was just using "make test EXTRATESTOPTS='-uall -M11G test_pickle'". It would be nice to have a warning so other people don't get confused by the same problem. I guess that shouldn't be too hard to arrange. |
|||
| msg142854 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-23 18:35 | |
I was playing around with pickling large Unicode strings in an interactive interpreter, and it seems that you have to have at least 4G chars (not bytes) to trigger the OverflowError. Consider the following snippet of code:

out = dumps(data)
del data
result = loads(out)
assert isinstance(result, str)
assert len(result) == _1G

With data as (b"a" * _4G) the result is as expected:

Traceback (most recent call last):
 File "pickle-bigmem-test.py", line 5, in <module>
 out = dumps(data)
OverflowError: cannot serialize a string larger than 4GB

But with (b"a" * _2G), I get this:

Traceback (most recent call last):
 File "pickle-bigmem-test.py", line 7, in <module>
 result = loads(out)
_pickle.UnpicklingError: BINUNICODE pickle has negative byte count |
|||
| msg142855 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-23 18:41 | |
Some more info: the first few bytes of the output for the _2G case are this: b'\x80\x03X\x00\x00\x00\x80aaaaaa' |
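Those length bytes (the \x00\x00\x00\x80 after the X, i.e. BINUNICODE, opcode) are consistent with a signed/unsigned mismatch on the 32-bit length field; a sketch of that reading (my interpretation of the failure, not a confirmed diagnosis):

```python
import struct

# 2G chars of "a" encode to 2**31 UTF-8 bytes; written as an
# unsigned 32-bit little-endian length, that is b"\x00\x00\x00\x80".
length_bytes = struct.pack("<I", 2 ** 31)
assert length_bytes == b"\x00\x00\x00\x80"

# A reader that treats the same field as *signed* sees a negative
# count -- matching "BINUNICODE pickle has negative byte count":
print(struct.unpack("<i", length_bytes)[0])  # -2147483648
```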
|||
| msg142857 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-23 18:58 | |
> With data as (b"a" * _4G) the result is as expected:
>
> Traceback (most recent call last):
> File "pickle-bigmem-test.py", line 5, in <module>
> out = dumps(data)
> OverflowError: cannot serialize a string larger than 4GB
>
> But with (b"a" * _2G), I get this:
>
> Traceback (most recent call last):
> File "pickle-bigmem-test.py", line 7, in <module>
> result = loads(out)
> _pickle.UnpicklingError: BINUNICODE pickle has negative byte count
Correction: these should be ("a" * _4G) and ("a" * _2G).
|
|||
| msg143064 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-27 12:56 | |
Here is a new patch against 3.2. I can't say it works for sure, but it should be much better. It also adds a couple more tests. There seems to be a separate issue where pure-Python pickle.py considers 32-bit lengths signed where the C impl considers them unsigned... |
|||
| msg143166 - (view) | Author: Nadeem Vawda (nadeem.vawda) * (Python committer) | Date: 2011-08-29 17:45 | |
Tested the latest patch with -M11G. All tests pass. |
|||
| msg143180 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-08-29 21:24 | |
New changeset babc90f3cbf4 by Antoine Pitrou in branch '3.2':
Issue #11564: Avoid crashes when trying to pickle huge objects or containers
http://hg.python.org/cpython/rev/babc90f3cbf4

New changeset 56242682a931 by Antoine Pitrou in branch 'default':
Issue #11564: Avoid crashes when trying to pickle huge objects or containers
http://hg.python.org/cpython/rev/56242682a931 |
|||
| msg143183 - (view) | Author: Antoine Pitrou (pitrou) * (Python committer) | Date: 2011-08-29 21:43 | |
Should be fixed as far as possible (OverflowErrors will be raised instead of crashing). Making people actually 64-bit compliant is part of PEP 3154 (http://www.python.org/dev/peps/pep-3154/). |
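For readers on current versions: PEP 3154 was later implemented as pickle protocol 4 (Python 3.4), which added opcodes with 8-byte length fields. A quick check, assuming Python 3.4 or later:

```python
import pickletools

# Protocol 4 opcodes whose length fields are 8 bytes wide, so
# objects over 4GB can be written without truncating the length:
names = {op.name for op in pickletools.opcodes}
for wanted in ("BINBYTES8", "BINUNICODE8", "FRAME"):
    print(wanted, wanted in names)
```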
|||
| History | | | |
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:14 | admin | set | github: 55773 |
| 2012-12-15 19:15:18 | pitrou | link | issue10640 superseder |
| 2011-12-11 01:22:28 | jcea | set | nosy: + jcea |
| 2011-08-29 21:43:22 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg143183; stage: patch review -> resolved |
| 2011-08-29 21:24:38 | python-dev | set | nosy: + python-dev; messages: + msg143180 |
| 2011-08-29 17:45:13 | nadeem.vawda | set | messages: + msg143166 |
| 2011-08-27 12:56:13 | pitrou | set | files: + pickle64-4.patch; messages: + msg143064 |
| 2011-08-23 18:58:46 | nadeem.vawda | set | messages: + msg142857 |
| 2011-08-23 18:41:58 | nadeem.vawda | set | messages: + msg142855 |
| 2011-08-23 18:35:50 | nadeem.vawda | set | messages: + msg142854 |
| 2011-08-22 22:03:44 | nadeem.vawda | set | messages: + msg142760 |
| 2011-08-22 21:57:58 | pitrou | set | messages: + msg142759 |
| 2011-08-22 21:47:07 | nadeem.vawda | set | messages: + msg142756 |
| 2011-08-19 18:01:34 | nadeem.vawda | set | messages: + msg142481 |
| 2011-08-19 16:58:58 | pitrou | set | messages: + msg142477 |
| 2011-08-19 16:48:14 | nadeem.vawda | set | messages: + msg142476 |
| 2011-08-19 12:38:34 | pitrou | set | messages: + msg142436 |
| 2011-08-19 12:35:28 | nadeem.vawda | set | messages: + msg142434 |
| 2011-08-19 12:23:03 | pitrou | set | messages: + msg142426 |
| 2011-08-19 03:39:36 | nadeem.vawda | set | messages: + msg142415 |
| 2011-08-16 19:20:00 | nadeem.vawda | set | files: + pickle64-3.3.patch; messages: + msg142216 |
| 2011-08-16 18:50:30 | nadeem.vawda | set | nosy: + nadeem.vawda |
| 2011-08-12 18:03:16 | pitrou | set | files: + pickle64.patch; versions: - Python 3.1; messages: + msg141981; keywords: + patch; stage: patch review |
| 2011-07-28 07:47:21 | jorgsk | set | nosy: + jorgsk; messages: + msg141284 |
| 2011-04-26 17:39:50 | santoso.wijaya | set | nosy: + santoso.wijaya |
| 2011-03-16 23:38:51 | nyevik | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131197 |
| 2011-03-16 23:27:04 | pitrou | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131196 |
| 2011-03-16 23:10:08 | belopolsky | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131193 |
| 2011-03-16 22:59:36 | pitrou | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131192 |
| 2011-03-16 14:18:15 | belopolsky | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131118 |
| 2011-03-15 23:15:52 | alexandre.vassalotti | set | nosy: amaury.forgeotdarc, belopolsky, pitrou, alexandre.vassalotti, nyevik; messages: + msg131064 |
| 2011-03-15 23:05:52 | pitrou | set | nosy: + amaury.forgeotdarc, alexandre.vassalotti, pitrou, belopolsky; title: pickle limits most datatypes -> pickle not 64-bit ready; messages: + msg131062; versions: + Python 3.1, Python 3.3 |
| 2011-03-15 22:58:27 | nyevik | create | |