This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Increase pickle compatibility
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Ramchandra Apte, alexandre.vassalotti, josh.r, pitrou, sbt, serhiy.storchaka, terry.reedy, vajrasky, vstinner
Priority: normal Keywords: patch

Created on 2011-12-09 13:25 by sbt, last changed 2022-04-11 14:57 by admin.

Files
File name | Uploaded | Description
pickle_old_strings.patch | serhiy.storchaka, 2015-05-12 09:17 |
pickle-old-strings-2.patch | serhiy.storchaka, 2017-03-06 07:41 |
Messages (16)
msg149092 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-12-09 13:25
If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string. This makes python 2 reject the pickle.
#########################################
Python 3.3.0a0 (default, Dec 8 2011, 17:56:13) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle, array
>>> pickle.dumps(array.array('i', [1,2,3]), 2)
b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.'
#########################################
Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> pickle.loads(b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01]q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\pickle.py", line 1382, in loads
    return Unpickler(file).load()
  File "c:\Python27\lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "c:\Python27\lib\pickle.py", line 1133, in load_reduce
    value = func(*args)
TypeError: must be char, not unicode
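To see where the unicode typecode comes from, the pickle quoted in the report can be disassembled rather than loaded. The following is a minimal sketch for Python 3 using the standard pickletools module; the variable names are illustrative only.

```python
import pickletools

# The protocol-2 pickle that Python 3 produced in the report above.
data = (b'\x80\x02carray\narray\nq\x00X\x01\x00\x00\x00iq\x01'
        b']q\x02(K\x01K\x02K\x03e\x86q\x03Rq\x04.')

# genops walks the opcode stream without executing it and yields
# (opcode, argument, position) triples.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(data)]

# Find the opcode that carries the typecode 'i'.
typecode_ops = [name for name, arg in ops if arg == 'i']
print(typecode_ops)  # -> ['BINUNICODE']
```

BINUNICODE is loaded as a unicode object on Python 2, which is what array.array then refuses.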
msg149101 - Author: Ramchandra Apte (Ramchandra Apte) * Date: 2011-12-09 14:23
The problem is that pickle is calling array.array(u'i',[1,2,3]), and array.array in Python 2 doesn't allow unicode strings as a typecode (typecode is the first argument).
The docs in Python 2 and Py3k don't specify the type of the typecode argument of array.array.
In Python 2 it seems that typecode has to be a byte string.
In Python 3 it seems that typecode has to be a unicode string.
I suggest that array.array be changed in Python 2 to allow unicode strings as a typecode, or that pickle detect array.array being called and fix the call.
msg149104 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2011-12-09 15:27
> I suggest that array.array be changed in Python 2 to allow unicode strings 
> as a typecode or that pickle detects array.array being called and fixes 
> the call.
Interestingly, py3 does understand arrays pickled by py2. This appears to be because py2 pickles str using BINSTRING or SHORT_BINSTRING which will unpickle as str on py2 and py3. py3 pickles str using BINUNICODE which will unpickle as unicode on py2 and str on py3.
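The asymmetry can be checked on Python 3 alone. The sketch below uses two hand-assembled protocol-2 pickles (assumed shapes, not captured from a real Python 2 session) that differ only in the opcode carrying the typecode: SHORT_BINSTRING as py2 would write it, and BINUNICODE as py3 writes it.

```python
import array
import pickle

# Hand-assembled protocol-2 pickles (an assumption for illustration).
# 'U' is SHORT_BINSTRING (1-byte length), 'X' is BINUNICODE (4-byte length).
py2_style = b'\x80\x02carray\narray\nU\x01i]q\x00(K\x01K\x02K\x03e\x86R.'
py3_style = (b'\x80\x02carray\narray\nX\x01\x00\x00\x00i'
             b']q\x00(K\x01K\x02K\x03e\x86R.')

# Python 3 decodes SHORT_BINSTRING to str (ASCII by default), so both
# pickles call array.array('i', [1, 2, 3]) with a str typecode.
from_py2_style = pickle.loads(py2_style)
from_py3_style = pickle.loads(py3_style)
print(from_py2_style, from_py3_style)
```

Only the second form is a problem on py2, where BINUNICODE becomes a unicode object.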
I think it would be better to fix this in py3 if possible, but that does not look easy: modifying array.__reduce_ex__ alone would not be enough.
The only thing I can think of is for py3 to grow a "_binstr" type which only supports ascii strings and is special-cased by pickle to be pickled using BINSTRING. Then array.__reduce_ex__ could be something like:
    def __reduce_ex__(self, protocol):
        if protocol <= 2:
            return array.array, (_binstr(self.typecode), list(self))
        else:
            ...
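No "_binstr" type exists, so the idea cannot be run as written. As a rough stand-in, the sketch below assembles the protocol-2 byte stream by hand so that the typecode is written with SHORT_BINSTRING. The helper name dumps_array_with_binstring is hypothetical, and splicing pickle streams like this is only meant to show the target output, not a proposed implementation.

```python
import array
import pickle
import pickletools


def dumps_array_with_binstring(arr):
    """Hypothetical helper: protocol-2 pickle of arr whose typecode is
    written with SHORT_BINSTRING instead of BINUNICODE."""
    typecode = arr.typecode.encode('ascii')
    # Pickle the item list normally, then drop the 2-byte PROTO header and
    # the trailing STOP opcode so the fragment can be embedded.
    items = pickle.dumps(arr.tolist(), 2)[2:-1]
    return (b'\x80\x02'                                   # PROTO 2
            + b'carray\narray\n'                          # GLOBAL array.array
            + b'U' + bytes([len(typecode)]) + typecode    # SHORT_BINSTRING
            + items                                       # the list of items
            + b'\x86'                                     # TUPLE2
            + b'R'                                        # REDUCE
            + b'.')                                       # STOP


original = array.array('i', [1, 2, 3])
data = dumps_array_with_binstring(original)
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(data)]
restored = pickle.loads(data)
print(opcode_names)
print(restored)
```

On Python 3 the result round-trips to an equal array; whether Python 2 accepts it was not verified here.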
msg205445 - Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2013-12-07 10:17
Adding a special type is not a bad idea. We have to keep the code for loading BINSTRING opcodes anyway, so we might as well use it. It could be helpful for unit-testing our Python 2 compatibility support for pickle.
We should still fix array in 2.7 to accept a unicode object for the typecode, though.
msg206504 - Author: Vajrasky Kok (vajrasky) * Date: 2013-12-18 09:51
Alexandre Vassalotti said: "We should still fix array in 2.7 to accept unicode object for the typecode though."
I created issue #20014 (with the patch) for this feature.
msg206532 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-18 16:02
See issue20015 for a more general approach.
msg206536 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-18 16:08
> If you pickle an array object on python 3 the typecode is encoded as a unicode string rather than as a byte string. This makes python 2 reject the pickle.
Are pickle files written by Python 3 supposed to be compatible with Python 2?
It looks very tricky to produce pickle files that are compatible with both versions.
msg242949 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-12 09:17
The proposed patch pickles all ASCII strings with protocols < 3 and fix_imports=True using compatible opcodes (STRING, BINSTRING and SHORT_BINSTRING). Strings pickled this way are unpickled as str in both Python 2 and Python 3 (unless encoding="bytes").
As a side effect, short ASCII strings (length < 256) are pickled more compactly with protocols < 3.
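The size difference and the encoding="bytes" exception can both be illustrated without the patch. This is a minimal sketch on Python 3 that builds the two opcode encodings by hand; the wrap helper and variable names are illustrative, and nothing here reflects the patch's actual code.

```python
import pickle
import struct

text = 'abc'
raw = text.encode('ascii')

# BINUNICODE: opcode 'X' + 4-byte little-endian length + UTF-8 data.
binunicode = b'X' + struct.pack('<I', len(raw)) + raw
# SHORT_BINSTRING: opcode 'U' + 1-byte length + raw bytes.
short_binstring = b'U' + struct.pack('<B', len(raw)) + raw


def wrap(op_bytes):
    """Surround a single value with a PROTO 2 header and a STOP opcode."""
    return b'\x80\x02' + op_bytes + b'.'


# Both forms load as str on Python 3 with the default settings.
as_unicode = pickle.loads(wrap(binunicode))
as_binstring = pickle.loads(wrap(short_binstring))
# With encoding="bytes" the SHORT_BINSTRING payload is left as bytes.
as_bytes = pickle.loads(wrap(short_binstring), encoding='bytes')

saved = len(binunicode) - len(short_binstring)
print(as_unicode, as_binstring, as_bytes, saved)  # -> abc abc b'abc' 3
```

The three bytes saved per string come from the shorter length field.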
msg245048 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 08:05
Alexandre, Antoine, what are your thoughts?
msg245049 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 08:23
Won't that fail if a Python 2 API accepts only unicode strings?
msg245050 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 08:32
Does such an API even exist?
msg245051 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 08:38
I wouldn't be very surprised if third-party libraries enforce such typing, yes. If your library has a clear text/bytes separation, it makes sense to enforce it at the API level, to avoid mistakes by users.
msg245056 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-09 10:07
Such libraries already have a problem. Both str and unicode pickled in Python 2 are unpickled as str in Python 3.
msg245057 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-06-09 10:08
It's not a problem, since str *is* unicode in Python 3.
msg288158 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-19 19:28
This is a problem when pickling data in Python 3 for unpickling in Python 2.
msg289119 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-03-06 16:02
Right, but Antoine's objection is that suddenly strs pickled in Py3 can end up as strs in Py2, rather than unicode. If the library enforces a Py3-like type separation on Py2 (text arguments are unicode only, binary data is str only), then you have the problem where pickling on Py3 produces a pickle that will unpickle as str on Py2, and suddenly the library explodes because the argument, that should be unicode on Py2 and str on Py3, is suddenly str on both.
This means that, to fix a problem with libraries that are not forward compatible (that accept text only as Py2 str), a Py2 library that is (very) forward thinking would have problems.
Admittedly, I wouldn't expect there to be very many such libraries, and many of them would have their own custom pickle formats, but stuff like numpy is quite sensitive to argument type; numpy.array(u'123') and numpy.array(b'123') are different. In numpy's case, each of those produces a derived datatype that is explicitly pickled and (I believe) would prevent the error, but some other more heuristic library might not do so.
History
Date | User | Action | Args
2022-04-11 14:57:24 | admin | set | github: 57775
2017-03-06 16:02:42 | josh.r | set | nosy: + josh.r; messages: + msg289119
2017-03-06 07:41:33 | serhiy.storchaka | set | files: + pickle-old-strings-2.patch
2017-02-19 19:28:26 | serhiy.storchaka | set | messages: + msg288158; versions: + Python 3.7, - Python 3.5
2015-06-09 10:08:12 | pitrou | set | messages: + msg245057
2015-06-09 10:07:45 | serhiy.storchaka | set | messages: + msg245056
2015-06-09 08:38:57 | pitrou | set | messages: + msg245051
2015-06-09 08:32:04 | serhiy.storchaka | set | messages: + msg245050
2015-06-09 08:23:06 | pitrou | set | messages: + msg245049
2015-06-09 08:05:52 | serhiy.storchaka | set | messages: + msg245048
2015-05-12 09:17:43 | serhiy.storchaka | set | files: + pickle_old_strings.patch; title: Array objects pickled in 3.x with protocol <=2 are unpickled incorrectly in 2.x -> Increase pickle compatibility; keywords: + patch; type: behavior -> enhancement; versions: + Python 3.5, - Python 2.7, Python 3.3, Python 3.4; messages: + msg242949; stage: needs patch -> patch review
2015-05-11 07:54:27 | serhiy.storchaka | set | assignee: serhiy.storchaka
2013-12-21 07:30:06 | serhiy.storchaka | set | nosy: + terry.reedy
2013-12-18 16:08:33 | vstinner | set | nosy: + vstinner; messages: + msg206536
2013-12-18 16:02:15 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg206532
2013-12-18 09:51:31 | vajrasky | set | nosy: + vajrasky; messages: + msg206504
2013-12-07 10:17:59 | alexandre.vassalotti | set | stage: needs patch; messages: + msg205445; versions: + Python 2.7, Python 3.4, - Python 3.2
2011-12-09 15:27:03 | sbt | set | messages: + msg149104
2011-12-09 14:23:58 | Ramchandra Apte | set | nosy: + Ramchandra Apte; messages: + msg149101
2011-12-09 13:26:30 | pitrou | set | nosy: + pitrou, alexandre.vassalotti
2011-12-09 13:25:03 | sbt | create |
