Created on 2012-07-17 08:15 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files:

| File name | Uploaded | Description |
|---|---|---|
| decode_charmap_maxchar-3.3_2.patch | serhiy.storchaka, 2012-09-21 20:14 | Patch for 3.3 |
| decode_charmap_maxchar-3.2_2.patch | serhiy.storchaka, 2012-09-21 20:14 | Patch for 3.2 |
| decode_charmap_maxchar-2.7.patch | serhiy.storchaka, 2012-10-02 16:39 | Patch for 2.7 |
Messages (16):
msg165688 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-07-17 08:15

Yet another inconsistency in the charmap codec:

```pycon
>>> import codecs
>>> codecs.charmap_decode(b'\x00', 'strict', '\U0002000B')
('𠀋', 1)
>>> codecs.charmap_decode(b'\x00', 'strict', {0: '\U0002000B'})
('𠀋', 1)
>>> codecs.charmap_decode(b'\x00', 'strict', {0: 0x2000B})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: character mapping must be in range(65536)
```

The suggested patch removes this unnecessary limitation in the charmap decoder.
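The three mapping forms in msg165688 can be compared side by side. This is a minimal sketch, assuming Python 3.3 or later with the fix from this issue applied; the labels in `mappings` are illustrative names, not part of the `codecs` API.

```python
import codecs

DATA = b'\x00'  # a single input byte with value 0

# Three ways to say "byte 0 decodes to U+2000B" (labels are illustrative).
mappings = {
    'string': '\U0002000B',        # byte value n -> n-th character of the string
    'int2str': {0: '\U0002000B'},  # byte value -> replacement string
    'int2int': {0: 0x2000B},       # byte value -> code point (the case fixed here)
}

results = {
    name: codecs.charmap_decode(DATA, 'strict', mapping)
    for name, mapping in mappings.items()
}

for name, result in results.items():
    print(f'{name:8} {result!r}')
```

With the fix in place, all three entries should compare equal to `('\U0002000B', 1)`.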
msg165690 - Author: Antoine Pitrou (pitrou), Python committer - Date: 2012-07-17 08:54

Could you add a test to your patch? Is the issue 3.3-specific?
msg165710 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-07-17 11:36

Fixing this for 3.2 and earlier is possible but expensive, because of the narrow-build limitation. If necessary, I will provide a patch, but it is easier to mark it as "won't fix" for pre-3.3 versions. Here are tests for charmap decoding. They cover not only this issue but all previously uncovered cases with int2str and int2int mappings.
msg165753 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc), Python committer - Date: 2012-07-18 11:02

In 3.2, the narrow build is also broken when the "charmap" is a string:

```pycon
>>> codecs.charmap_decode(b'\0', 'strict', '\U0002000B')
```

This returns ('𠀋', 1) with a wide Unicode build, but ('\ud840', 1) with a narrow build.
3.2 could be fixed to allow characters up to sys.maxunicode, though.
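The narrow-build result in msg165753 follows from UTF-16 storage: a narrow build keeps U+2000B as a surrogate pair, so the one-character charmap string really holds two code units, and byte 0 selects only the high surrogate. A minimal sketch that emulates this on a modern (PEP 393) Python; `to_surrogate_pair` and `narrow_units` are hypothetical helpers, not CPython APIs.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f'U+{cp:04X} is not a non-BMP code point')
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def narrow_units(s: str) -> list[str]:
    """Emulate a narrow build's view of a str: one item per UTF-16 code unit."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            units.extend(chr(u) for u in to_surrogate_pair(cp))
        else:
            units.append(ch)
    return units


charmap = '\U0002000B'
units = narrow_units(charmap)
print(len(charmap), len(units))           # 1 2
print([f'{ord(u):#06x}' for u in units])  # ['0xd840', '0xdc0b']
# A string charmap is indexed by byte value, so byte 0 picks units[0].
print(repr(units[0]))                     # '\ud840'
```

The emulated code units match `'\U0002000B'.encode('utf-16-be')`, which is a quick way to double-check the surrogate arithmetic.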
msg165786 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-07-18 15:48

Well, here is a patch for 3.2.
msg165796 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc), Python committer - Date: 2012-07-18 20:26

About the patch for 3.2: `needed = 6 - extrachars`. Where does this 6 come from? There is another part that uses this `extrachars`. Why is the allocation strategy different here?
msg165798 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-07-18 20:33

It's the same strategy: `needed = (targetsize - extrachars) + (targetsize << 2)`, with `targetsize == 2`.
msg165801 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc), Python committer - Date: 2012-07-18 21:07

Ah, I was worried about possible quadratic behavior. So the other (existing) case is quadratic as well (I was misled by the `<<`, which made me think there was something clever going on). That's good enough for 3.2, I guess.
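Amaury's concern in msg165801 is that growing the buffer by a fixed step makes the number of reallocations proportional to the input length, so the total copying can become quadratic. The sketch below is a simplified model, not CPython's actual code: it assumes each input byte already owns one output slot, reuses the quoted `needed` expression for the fixed-step case, and counts each reallocation as copying everything written so far. `simulate` is a hypothetical helper.

```python
def simulate(n_bytes: int, targetsize: int,
             geometric: bool = False) -> tuple[int, int]:
    """Model decoding n_bytes bytes that each map to `targetsize` characters.

    Returns (resizes, chars_copied). The output buffer starts with one slot
    per input byte; `extrachars` tracks the spare slots beyond those.
    """
    size = n_bytes
    extrachars = 0
    written = resizes = copied = 0
    extra_per_byte = targetsize - 1  # slots needed beyond the byte's own slot
    for _ in range(n_bytes):
        if extra_per_byte > extrachars:
            if geometric:
                # Grow by at least the current size, i.e. at least double.
                needed = max(extra_per_byte - extrachars, size)
            else:
                # Fixed step, using the expression quoted in msg165798.
                needed = (targetsize - extrachars) + (targetsize << 2)
            extrachars += needed
            size += needed
            resizes += 1
            copied += written  # a reallocation may move all chars written so far
        written += targetsize
        extrachars -= extra_per_byte
    return resizes, copied


for n in (9000, 18000):
    print(n, simulate(n, 2), simulate(n, 2, geometric=True))
# 9000 (900, 8091000) (1, 0)
# 18000 (1800, 32382000) (1, 0)
```

In this model, doubling the input doubles the number of fixed-step resizes and roughly quadruples the characters copied, while the geometric variant resizes only once; the fixed step stays acceptable when few bytes map to multi-character strings.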
msg170567 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-09-16 18:43

Ping.
msg170913 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-09-21 20:14

Patches updated: added a few new tests, used MAX_UNICODE, and slightly changed the `extrachars` growth step.
msg171069 - Author: Roundup Robot (python-dev), Python triager - Date: 2012-09-23 18:01

New changeset 620d23f7ad41 by Antoine Pitrou in branch '3.2': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/620d23f7ad41

New changeset c64dec45d46f by Antoine Pitrou in branch 'default': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/c64dec45d46f
msg171070 - Author: Antoine Pitrou (pitrou), Python committer - Date: 2012-09-23 18:02

Thank you, I've committed the patches. There was a test failure in test_codeccallbacks in 3.2, which I fixed simply by replacing `sys.maxunicode` with a hardcoded `0x110000`.
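A plausible reading of the 3.2 test failure in msg171070: on a narrow build `sys.maxunicode` is 0xFFFF, so `sys.maxunicode + 1` is U+10000, a valid non-BMP code point that the fixed decoder now accepts, whereas 0x110000 is out of range on every build. A quick check on Python 3.3 or later; `decodes` is a hypothetical helper, and it assumes (as the error in msg165688 suggests) that out-of-range integers raise TypeError.

```python
import codecs

NARROW_MAXUNICODE = 0xFFFF  # value of sys.maxunicode on a narrow (UTF-16) build
UPPER_BOUND = 0x110000      # one past U+10FFFF, the highest code point


def decodes(cp: int) -> bool:
    """Return True if the charmap decoder accepts `cp` as an int2int value."""
    try:
        codecs.charmap_decode(b'\x00', 'strict', {0: cp})
    except TypeError:  # assumed error for integers outside range(0x110000)
        return False
    return True


print(decodes(NARROW_MAXUNICODE + 1))  # True: U+10000 is a valid code point
print(decodes(UPPER_BOUND - 1))        # True: U+10FFFF
print(decodes(UPPER_BOUND))            # False
```

Hardcoding the bound keeps such a test independent of how the interpreter stores strings.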
msg171814 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-10-02 16:39

We forgot about 2.7 (because I had not even planned to apply this to 3.2). Here is the backported patch.
msg173356 - Author: Serhiy Storchaka (serhiy.storchaka), Python committer - Date: 2012-10-19 19:05

The 2.7 patch is just a backport of the 3.2 patch (including Antoine's last fix). Please take a look and commit.
msg175802 - Author: Roundup Robot (python-dev), Python triager - Date: 2012-11-17 20:17

New changeset c7ce91756472 by Antoine Pitrou in branch '2.7': Issue #15379: Fix passing of non-BMP characters as integers for the charmap decoder (already working as unicode strings). http://hg.python.org/cpython/rev/c7ce91756472
msg175803 - Author: Antoine Pitrou (pitrou), Python committer - Date: 2012-11-17 20:17

Thanks for the backport, committed!
History:

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:32 | admin | set | github: 59584 |
| 2012-11-17 20:17:42 | pitrou | set | stage: commit review -> resolved |
| 2012-11-17 20:17:36 | pitrou | set | status: open -> closed; messages: + msg175803 |
| 2012-11-17 20:17:14 | python-dev | set | messages: + msg175802 |
| 2012-10-24 09:10:51 | serhiy.storchaka | set | stage: resolved -> commit review; versions: - Python 3.2, Python 3.3 |
| 2012-10-19 19:05:41 | serhiy.storchaka | set | messages: + msg173356 |
| 2012-10-02 16:40:09 | serhiy.storchaka | set | versions: + Python 2.7 |
| 2012-10-02 16:39:25 | serhiy.storchaka | set | status: closed -> open; files: + decode_charmap_maxchar-2.7.patch; messages: + msg171814 |
| 2012-09-23 18:02:19 | pitrou | set | stage: patch review -> resolved |
| 2012-09-23 18:02:05 | pitrou | set | status: open -> closed; resolution: fixed; messages: + msg171070 |
| 2012-09-23 18:01:12 | python-dev | set | nosy: + python-dev; messages: + msg171069 |
| 2012-09-21 20:16:26 | serhiy.storchaka | set | files: - decode_charmap_maxchar-3.2.patch |
| 2012-09-21 20:16:16 | serhiy.storchaka | set | files: - decode_charmap_tests.patch |
| 2012-09-21 20:16:09 | serhiy.storchaka | set | files: - decode_charmap_maxchar.patch |
| 2012-09-21 20:14:09 | serhiy.storchaka | set | files: + decode_charmap_maxchar-3.3_2.patch, decode_charmap_maxchar-3.2_2.patch; messages: + msg170913 |
| 2012-09-16 18:43:26 | serhiy.storchaka | set | messages: + msg170567 |
| 2012-08-05 10:48:52 | serhiy.storchaka | set | stage: needs patch -> patch review |
| 2012-08-05 10:47:32 | serhiy.storchaka | set | keywords: + needs review; priority: normal -> low; stage: patch review -> needs patch |
| 2012-07-18 21:07:56 | amaury.forgeotdarc | set | messages: + msg165801 |
| 2012-07-18 20:33:37 | serhiy.storchaka | set | messages: + msg165798 |
| 2012-07-18 20:26:53 | amaury.forgeotdarc | set | messages: + msg165796 |
| 2012-07-18 15:48:30 | serhiy.storchaka | set | files: + decode_charmap_maxchar-3.2.patch; messages: + msg165786; versions: + Python 3.2 |
| 2012-07-18 11:02:19 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg165753 |
| 2012-07-17 11:36:02 | serhiy.storchaka | set | files: + decode_charmap_tests.patch; messages: + msg165710 |
| 2012-07-17 08:54:29 | pitrou | set | nosy: + lemburg, pitrou, vstinner, benjamin.peterson, ezio.melotti; messages: + msg165690; stage: patch review |
| 2012-07-17 08:15:57 | serhiy.storchaka | create | |