This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2011-03-10 10:19 by ply, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| testutf16.py | ply, 2011-03-10 10:19 | Error reproducing script |
| partial_utf16.patch | amaury.forgeotdarc, 2011-03-10 12:19 | |
| partial_utf16-3.3.patch | serhiy.storchaka, 2012-09-27 13:29 | Patch for 3.3 |

Messages (4)

msg130498 | Author: Yuriy Pilgun (ply) | Date: 2011-03-10 10:19

Reading a UTF-16 text file with the `codecs` module fails if a surrogate pair straddles the 72-byte boundary of a read chunk. The attached Python script (testutf16.py) fails with: `UnicodeDecodeError: 'utf16' codec can't decode bytes in position 70-71: unexpected end of data`. The cause is that `readline()` splits its input into chunks: `readsize = size or 72`.
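
A minimal sketch of the failure scenario, assuming Python 3 and the standard `codecs.getreader()` API. It is a hypothetical reconstruction, not the attached testutf16.py: the text is chosen so that, after the 2-byte BOM, the high surrogate of U+1F600 occupies bytes 70-71, the last two bytes of the first 72-byte chunk that `readline()` reads. On a Python release that includes the fix, both lines decode correctly.

```python
import codecs
import io

# Hypothetical reconstruction (not the attached testutf16.py).
# The "utf-16" codec writes a 2-byte BOM, so 34 ASCII characters fill
# bytes 2-69 and the high surrogate of U+1F600 lands on bytes 70-71:
# the end of readline()'s first 72-byte read. The low surrogate only
# arrives with the next read.
text = "a" * 34 + "\U0001F600" + "\nsecond line\n"
raw = io.BytesIO(text.encode("utf-16"))

# codecs.getreader() returns the codec's StreamReader class;
# wrap the byte stream in it and read line by line.
reader = codecs.getreader("utf-16")(raw)
first = reader.readline()
second = reader.readline()

# Without the fix, the first readline() hit the "unexpected end of data"
# error quoted above; with it, the pending high surrogate is buffered
# until its low surrogate is read.
print(repr(first))
print(repr(second))
```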
msg130504 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) | Date: 2011-03-10 12:19

The UTF-16 incremental decoder does not handle incomplete surrogate pairs. Patch attached. I also plan to refactor all the `test_partial()` functions of `test_codecs` to give them a common implementation.
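
A minimal sketch of what "incomplete surrogate pairs" means for the incremental decoder, using the standard `codecs.getincrementaldecoder()` API. The helper `decode_in_chunks` is hypothetical (it is not taken from the patch or from `test_codecs`), and the behavior it shows is that of a Python release that includes the fix.

```python
import codecs


def decode_in_chunks(data, chunk_size, encoding="utf-16-le"):
    """Hypothetical helper: feed `data` to an incremental decoder
    `chunk_size` bytes at a time.

    Returns the text produced by each decode() call plus the final flush,
    so it is visible which calls held bytes back.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    pieces = []
    for start in range(0, len(data), chunk_size):
        pieces.append(decoder.decode(data[start:start + chunk_size]))
    # final=True flushes the decoder; it raises if bytes are still pending.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# U+1F600 lies outside the BMP, so UTF-16 encodes it as a surrogate pair:
# high surrogate U+D83D followed by low surrogate U+DE00 (4 bytes in total).
data = "\U0001F600".encode("utf-16-le")

# Two-byte chunks split the pair exactly between its two code units.
print(decode_in_chunks(data, 2))
# One-byte chunks additionally split the individual code units.
print(decode_in_chunks(data, 1))
```

A `decode()` call that ends on a high surrogate (or in the middle of a code unit) returns `''` and keeps the bytes buffered. Only a flush with `final=True` while a high surrogate is still pending raises `UnicodeDecodeError`.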
msg171373 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2012-09-27 13:29

In issue14624 the UTF-16 decoder was significantly reworked. Here is the patch adapted for 3.3.
msg179375 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013-01-08 21:47

- New changeset f2353e74b335 by Serhiy Storchaka in branch '2.7': Issue #11461: Fix the incremental UTF-16 decoder. Original patch by http://hg.python.org/cpython/rev/f2353e74b335
- New changeset 4677c5f6fcf7 by Serhiy Storchaka in branch '3.2': Issue #11461: Fix the incremental UTF-16 decoder. Original patch by http://hg.python.org/cpython/rev/4677c5f6fcf7
- New changeset eed1883b1974 by Serhiy Storchaka in branch '3.3': Issue #11461: Fix the incremental UTF-16 decoder. Original patch by http://hg.python.org/cpython/rev/eed1883b1974
- New changeset 5e84d020d001 by Serhiy Storchaka in branch 'default': Issue #11461: Fix the incremental UTF-16 decoder. Original patch by http://hg.python.org/cpython/rev/5e84d020d001

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:14 | admin | set | github: 55670 |
| 2013-01-08 21:49:54 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2013-01-08 21:47:35 | python-dev | set | nosy: + python-dev; messages: + msg179375 |
| 2013-01-07 17:55:37 | serhiy.storchaka | set | assignee: serhiy.storchaka |
| 2013-01-07 17:54:49 | serhiy.storchaka | link | issue15278 superseder |
| 2012-09-27 13:29:41 | serhiy.storchaka | set | files: + partial_utf16-3.3.patch; nosy: + serhiy.storchaka; messages: + msg171373; keywords: + needs review |
| 2012-09-26 20:07:36 | vstinner | set | title: Reading UTF-16 with codecs.readline() breaks on surrogate pairs -> UTF-16 incremental decoder doesn't support partial surrogate pair |
| 2012-09-26 20:06:08 | vstinner | set | versions: + Python 3.2, Python 3.3, Python 3.4 |
| 2012-09-26 17:27:59 | ezio.melotti | set | stage: test needed -> patch review |
| 2011-03-11 00:22:25 | pitrou | set | nosy: + vstinner |
| 2011-03-10 12:19:30 | amaury.forgeotdarc | set | files: + partial_utf16.patch; nosy: + amaury.forgeotdarc; messages: + msg130504; keywords: + patch |
| 2011-03-10 10:23:42 | ezio.melotti | set | nosy: + ezio.melotti; stage: test needed |
| 2011-03-10 10:19:57 | ply | create | |