This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: pickle is unable to encode unicode surrogates
Type:
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: lemburg, loewis, vstinner
Priority: normal
Keywords: patch

Created on 2010-04-13 00:39 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                Uploaded                    Description
pickle_surrogates.patch  vstinner, 2010-04-13 00:44
Messages (6)
msg102996 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-13 00:39
Python 3 uses unicode surrogates to store undecodable filenames. E.g. the filename b"abc\xff.py" is decoded as "abc\udcff.py" if the file system encoding is ASCII. Pickle is unable to store them:
./python -c 'import pickle; pickle.dumps("abc\udcff")'
(...)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 20: surrogates not allowed
This is a limitation of pickle (in binary mode): a Python string can hold any unicode code point, including lone surrogates, but pickle encodes strings with the strict UTF-8 codec, which rejects them.
Using the "surrogatepass" error handler should be enough to fix this issue.
Related issue: #3672 (Reject surrogates in utf-8 codec) -> r72208 creates "surrogatepass" error handler.
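For reference, here is a minimal sketch of the behaviour described above. It assumes a Python version that already includes the fix, so the final pickle round trip succeeds instead of raising UnicodeEncodeError. The variable names are illustrative only.

```python
import pickle

# Under "surrogateescape", the undecodable byte 0xff becomes the lone
# surrogate U+DCFF, which is how Python 3 represents such filenames.
raw = b"abc\xff.py"
name = raw.decode("ascii", "surrogateescape")

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" encodes the surrogate like any other code point.
encoded = name.encode("utf-8", "surrogatepass")
decoded = encoded.decode("utf-8", "surrogatepass")

# Assumption: the running interpreter contains the fix from this issue,
# so a binary-protocol pickle round trip preserves the string.
restored = pickle.loads(pickle.dumps(name, protocol=2))
```

Note that "surrogatepass" output is not valid UTF-8 for a strict decoder, so the same error handler is needed on the decoding side.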
msg102997 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-13 00:51
I found this bug indirectly: test_logging failed on a SocketHandler if LogRecord.pathname contains a surrogate character. SocketHandler uses pickle to serialize the record.
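The indirect failure can be sketched as follows. This is not the test_logging code itself: it builds a LogRecord by hand with a hypothetical pathname containing a surrogate and calls SocketHandler.makePickle(), which is the method that serializes the record. The host and port are placeholders, and no connection is opened because nothing is emitted. A fixed Python is assumed.

```python
import logging
import logging.handlers
import pickle
import struct

# Hypothetical record whose pathname contains a lone surrogate, as it would
# for a source file with an undecodable name.
record = logging.LogRecord(
    "demo", logging.INFO, "/tmp/abc\udcff.py", 1, "hello", None, None
)

# The handler only connects when a record is emitted, so constructing it
# and calling makePickle() does not touch the network.
handler = logging.handlers.SocketHandler("localhost", 9020)
payload = handler.makePickle(record)
handler.close()

# The payload is a 4-byte big-endian length followed by the pickled dict.
(length,) = struct.unpack(">L", payload[:4])
restored = pickle.loads(payload[4:])
```

Before the fix, the makePickle() call is where the UnicodeEncodeError would surface.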
msg103022 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-04-13 09:01
Both pickle and marshal will need to use the new error handler in order to stay compatible with Python 3.0 (and 2.x) and also to enable creating Unicode literals that include lone surrogates.
msg103029 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-13 09:28
> Both pickle and marshal will need to use the new error handler 
> in order to stay compatible with Python 3.0 (and 2.x) 
> and also to enable creating Unicode literals that include 
> lone surrogates.
The attached patch fixes pickle. Marshal already uses surrogatepass since Martin's commit r72208 (Issue #3672).
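A quick way to check the marshal side, as a sketch: round-trip a string containing a lone surrogate, and a code object compiled from a literal that contains one (the "Unicode literals" case mentioned above). The source string and names are made up for illustration.

```python
import marshal

text = "abc\udcff"

# marshal serializes the string with lone surrogates preserved.
restored = marshal.loads(marshal.dumps(text))

# A code object whose constant contains a lone surrogate also survives
# marshalling, which is what .pyc files rely on.
source = 'value = "abc\\udcff"'
code = compile(source, "<demo>", "exec")
code_copy = marshal.loads(marshal.dumps(code))

namespace = {}
exec(code_copy, namespace)
```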
msg103030 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-04-13 09:44
Looks good!
Thanks.
msg103034 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-13 11:10
Committed: r80031 (py3k) and r80032 (3.1). The commits also fix pickletools.
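The pickletools part can be checked with a small sketch like the one below, assuming a fixed Python. It uses pickletools.genops() to walk the opcodes of a protocol 2 pickle and collects the argument of the BINUNICODE opcode, which pickletools has to decode from the same surrogatepass-encoded bytes.

```python
import pickle
import pickletools

data = pickle.dumps("abc\udcff", protocol=2)

# genops() yields (opcode, argument, position) for each opcode in the pickle.
# Protocol 2 stores a str with the BINUNICODE opcode.
unicode_args = [
    arg for opcode, arg, pos in pickletools.genops(data)
    if opcode.name == "BINUNICODE"
]
```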
History
Date                 User      Action  Args
2022-04-11 14:56:59  admin     set     github: 52630
2010-04-16 01:14:54  vstinner  link    issue8242 dependencies
2010-04-13 11:10:22  vstinner  set     status: open -> closed; resolution: fixed; messages: + msg103034
2010-04-13 09:44:29  lemburg   set     messages: + msg103030
2010-04-13 09:28:21  vstinner  set     messages: + msg103029
2010-04-13 09:01:49  lemburg   set     messages: + msg103022
2010-04-13 00:51:33  vstinner  set     messages: + msg102997
2010-04-13 00:44:40  vstinner  set     files: + pickle_surrogates.patch; keywords: + patch
2010-04-13 00:39:55  vstinner  create