This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2010-04-13 00:39 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| pickle_surrogates.patch | vstinner, 2010-04-13 00:44 | |
| Messages (6) | |||
|---|---|---|---|
| msg102996 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-04-13 00:39 | |
Python 3 uses Unicode surrogates to store undecodable filenames. E.g., the filename b"abc\xff.py" is decoded as "abc\udcff.py" if the file system encoding is ASCII. Pickle is unable to store them:
./python -c 'import pickle; pickle.dumps("abc\udcff")'
(...)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcff' in position 20: surrogates not allowed
This is a limitation of pickle (in binary mode): a Python str can hold any Unicode code point, including lone surrogates, but pickle cannot serialize them.
Using the "surrogatepass" error handler should be enough to fix this issue.
Related issue: #3672 (Reject surrogates in utf-8 codec) -> r72208 creates "surrogatepass" error handler.
|
|||
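The behavior described in the report above can be sketched as follows. This is a minimal illustration, not the attached patch: it assumes a Python version that already contains the fix, and the variable names (`raw`, `name`, `encoded`, `restored`, `strict_error`) are hypothetical. The `surrogateescape` decoding step stands in for what the interpreter does with undecodable filename bytes.

```python
import pickle

# An undecodable byte (0xFF) in a filename, decoded the way Python 3 does
# for file names: the byte becomes the lone surrogate U+DCFF.
raw = b"abc\xff.py"
name = raw.decode("ascii", errors="surrogateescape")

# The strict UTF-8 codec refuses lone surrogates.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    strict_error = exc
else:
    strict_error = None

# The "surrogatepass" error handler lets the surrogate through,
# encoding U+DCFF as the three bytes ED B3 BF.
encoded = name.encode("utf-8", errors="surrogatepass")

# On a fixed interpreter, pickle round-trips the string unchanged.
restored = pickle.loads(pickle.dumps(name))
```

On an interpreter without the fix, the `pickle.dumps` call is the step that raises the `UnicodeEncodeError` quoted in the report.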
| msg102997 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-04-13 00:51 | |
I found this bug indirectly: test_logging failed on a SocketHandler if LogRecord.pathname contains a surrogate character. SocketHandler uses pickle to serialize the record. |
|||
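A rough reproduction of the logging path mentioned above, assuming a Python version with the fix applied. The host, port, and record contents are hypothetical; `SocketHandler.makePickle` only serializes the record, so no network connection is opened in this sketch.

```python
import logging
import logging.handlers
import pickle

# A record whose pathname contains a lone surrogate, as it would for a
# source file with an undecodable name.
record = logging.makeLogRecord({"msg": "hello", "pathname": "abc\udcff.py"})

# Hypothetical endpoint; the socket is only created when a record is emitted.
handler = logging.handlers.SocketHandler("localhost", 9020)

# makePickle returns a 4-byte big-endian length prefix followed by the
# pickled attribute dictionary of the record.
payload = handler.makePickle(record)
restored = pickle.loads(payload[4:])

handler.close()
```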
| msg103022 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2010-04-13 09:01 | |
Both pickle and marshal will need to use the new error handler in order to stay compatible with Python 3.0 (and 2.x) and also to enable creating Unicode literals that include lone surrogates. |
|||
| msg103029 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-04-13 09:28 | |
> Both pickle and marshal will need to use the new error handler
> in order to stay compatible with Python 3.0 (and 2.x)
> and also to enable creating Unicode literals that include
> lone surrogates.

Attached patch fixes pickle. Marshal does already use surrogatepass since Martin's commit r72208 (Issue #3672). |
|||
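The marshal behavior referred to above can be checked with a short round trip. This is only a sketch under the assumption that the interpreter includes r72208 or later; the variable names are hypothetical.

```python
import marshal

# A string containing a lone surrogate.
text = "abc\udcff"

# marshal serializes non-ASCII strings as UTF-8 with "surrogatepass",
# so the surrogate survives the round trip.
data = marshal.dumps(text)
restored = marshal.loads(data)
```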
| msg103030 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2010-04-13 09:44 | |
STINNER Victor wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
>> Both pickle and marshal will need to use the new error handler
>> in order to stay compatible with Python 3.0 (and 2.x)
>> and also to enable creating Unicode literals that include
>> lone surrogates.
>
> Attached patch fixes pickle. Marshal does already use surrogatepass since Martin's commit r72208 (Issue #3672).

Looks good! Thanks. |
|||
| msg103034 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-04-13 11:10 | |
Committed: r80031 (py3k) and r80032 (3.1); the fix also covers pickletools. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:59 | admin | set | github: 52630 |
| 2010-04-16 01:14:54 | vstinner | link | issue8242 dependencies |
| 2010-04-13 11:10:22 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg103034 |
| 2010-04-13 09:44:29 | lemburg | set | messages: + msg103030 |
| 2010-04-13 09:28:21 | vstinner | set | messages: + msg103029 |
| 2010-04-13 09:01:49 | lemburg | set | messages: + msg103022 |
| 2010-04-13 00:51:33 | vstinner | set | messages: + msg102997 |
| 2010-04-13 00:44:40 | vstinner | set | files: + pickle_surrogates.patch; keywords: + patch |
| 2010-04-13 00:39:55 | vstinner | create | |