This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-04-06 03:42 by malin, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Pull Requests | ||
|---|---|---|
| URL | Status | Linked |
| PR 1556 | merged | xiang.zhang, 2017-05-12 10:30 |
| PR 1718 | merged | xiang.zhang, 2017-05-22 14:46 |
| PR 1719 | merged | xiang.zhang, 2017-05-22 14:47 |
| PR 1720 | merged | xiang.zhang, 2017-05-22 15:10 |
| Messages (10) | |||
|---|---|---|---|
| msg291207 | Author: Ma Lin (malin) | Date: 2017-04-06 03:42 | |
hz is a Simplified Chinese codec, available in Python since around 2004.
However, the hz encoder has a serious bug: it forgets to escape ~
>>> 'hi~'.encode('hz')
b'hi~' # the correct output should be b'hi~~'
As a result, we can't finish a roundtrip:
>>> b'hi~'.decode('hz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'hz' codec can't decode byte 0x7e in position 2: incomplete multibyte
In all these years, no one has reported this bug, so I think it's pretty safe to remove the hz codec.
FYI:
The HZ codec is a 7-bit wrapper for GB2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee, and subsequently codified in 1995 into RFC 1843.
It was popular on USENET networks, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters.
https://en.wikipedia.org/wiki/HZ_(character_encoding)
Do other languages have an hz codec?
Java 8: no [1]
.NET: yes [2]
PHP: yes [3]
Perl: yes [4]
[1] http://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html
[2] https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
[3] http://php.net/manual/en/mbstring.supported-encodings.php
[4] http://perldoc.perl.org/Encode/CN.html
|
|||
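The escaping rule at issue can be illustrated with a minimal sketch. It assumes the RFC 1843 convention that a literal `~` is written as `~~` (with `~{` and `~}` switching into and out of GB mode), and it handles ASCII-only input. The helper name `hz_escape_ascii` is hypothetical and is not part of CPython; this is not the patch that was merged, only an illustration of the output the report says the encoder should produce.

```python
def hz_escape_ascii(text: str) -> bytes:
    """Hypothetical helper: HZ-encode ASCII-only text by doubling each '~'.

    Raises UnicodeEncodeError for non-ASCII input, since GB-mode
    switching ('~{' ... '~}') is deliberately out of scope here.
    """
    return text.encode("ascii").replace(b"~", b"~~")


escaped = hz_escape_ascii("hi~")
print(escaped)               # the form the report calls correct

# The built-in decoder already understands the '~~' escape,
# so the escaped bytes decode back to the original text.
print(escaped.decode("hz"))
```

Because only the encoder side was reported as broken, feeding correctly escaped bytes to the built-in `hz` decoder is enough to complete the round trip in this ASCII-only case.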
| msg291214 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2017-04-06 07:27 | |
Can't we fix the bug instead of removing the whole codec? Or do you know of other bugs? The bug is only in the encoder part, right? I see a unit test for '~' on the hz decoder. |
|||
| msg291216 | Author: Ma Lin (malin) | Date: 2017-04-06 07:54 | |
I tried to fix this two years ago; here is the patch (not merged): http://bugs.python.org/review/24117/diff/14803/Modules/cjkcodecs/_codecs_cn.c
But later I thought this was a good opportunity to remove the codec: such a serious bug indicates that almost no one is using it, whereas fixing it creates the possibility that someone will start using it in the future. So I suggest we don't fix it; just remove it or leave it as is. hz is outdated, and searching the internet turns up almost no one talking about it.
> Or do you know other bugs?
It has another small bug in the decoder, concerning state switching, but it's trivial and is also fixed in the patch. |
|||
| msg291298 | Author: Terry J. Reedy (terry.reedy) (Python committer) | Date: 2017-04-07 21:35 | |
We seldom just remove things; we usually deprecate in the doc and, if possible, issue a runtime warning. This is probably not the only obsolete codec. There should be a uniform policy for deprecation and removal, if ever. But for any codec, there might be archives, even if the codec is not used for new files. If the codec is buggy, I think it should be fixed. But you yourself closed #24117, suggesting that you did not believe that the patches should be applied. |
|||
| msg291312 | Author: STINNER Victor (vstinner) (Python committer) | Date: 2017-04-08 02:32 | |
"But for any codec, there might be archives, even if the codec is not used for new files." The bug is in the encoder. The codec is still usable to *decode* files. So maybe a few people use it but didn't notice the encoder bug? |
|||
| msg291315 | Author: Ma Lin (malin) | Date: 2017-04-08 04:09 | |
My subjective feeling is that probably no old archives still exist, but I can't assert it. That's why I suggest removing it, or at least not fixing it. Ah, let's slow down the pace: this bug has existed for over a decade, so we don't need to solve it at once. I closed #24117 because it became a soup of small issues, so I split it into individual issues (such as this issue). |
|||
| msg294150 | Author: Xiang Zhang (xiang.zhang) (Python committer) | Date: 2017-05-22 14:42 | |
New changeset 89a5e03244370f41ce9bed5cea38e0dd620edb73 by Xiang Zhang in branch 'master': bpo-30003: Fix handling escape characters in HZ codec (#1556) https://github.com/python/cpython/commit/89a5e03244370f41ce9bed5cea38e0dd620edb73 |
|||
| msg294161 | Author: Xiang Zhang (xiang.zhang) (Python committer) | Date: 2017-05-22 17:02 | |
New changeset 65440f8278351e16350be716dff61f5f786f7060 by Xiang Zhang in branch '3.5': bpo-30003: Fix handling escape characters in HZ codec (#1556) (#1718) https://github.com/python/cpython/commit/65440f8278351e16350be716dff61f5f786f7060 |
|||
| msg294162 | Author: Xiang Zhang (xiang.zhang) (Python committer) | Date: 2017-05-22 17:03 | |
New changeset 54af41d42eebbe4c6afe6b34ebb0fb550de1e7ba by Xiang Zhang in branch '3.6': bpo-30003: Fix handling escape characters in HZ codec (#1556) (#1719) https://github.com/python/cpython/commit/54af41d42eebbe4c6afe6b34ebb0fb550de1e7ba |
|||
| msg294163 | Author: Xiang Zhang (xiang.zhang) (Python committer) | Date: 2017-05-22 17:04 | |
New changeset 6e1b832a6c0c8f32962a196ab631ccc17471d32b by Xiang Zhang in branch '2.7': bpo-30003: Fix handling escape characters in HZ codec (#1720) (#1556) https://github.com/python/cpython/commit/6e1b832a6c0c8f32962a196ab631ccc17471d32b |
|||
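With the changesets above applied, the round trip from the original report can be checked directly. The following is a small sketch that assumes an interpreter containing the fix (the branches named in the changesets, or any later release); `hz_round_trips` and the sample strings are illustrative only and do not come from the CPython test suite.

```python
def hz_round_trips(text: str) -> bool:
    """Return True if text survives encode('hz') followed by decode('hz')."""
    return text.encode("hz").decode("hz") == text


# Sample inputs (assumed for illustration): plain ASCII with a tilde,
# consecutive tildes, and GB2312 text followed by a tilde.
samples = ["hi~", "~~", "你好~"]

for sample in samples:
    encoded = sample.encode("hz")
    print(repr(sample), encoded, hz_round_trips(sample))
```

On an unfixed interpreter the same check is expected to fail for any input containing `~`, because the encoder emits the tilde unescaped and the decoder then rejects it, as shown in the traceback in the first message.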
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:45 | admin | set | github: 74189 |
| 2017-05-22 17:04:55 | xiang.zhang | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
| 2017-05-22 17:04:30 | xiang.zhang | set | messages: + msg294163 |
| 2017-05-22 17:03:02 | xiang.zhang | set | messages: + msg294162 |
| 2017-05-22 17:02:35 | xiang.zhang | set | messages: + msg294161 |
| 2017-05-22 15:10:43 | xiang.zhang | set | pull_requests: + pull_request1809 |
| 2017-05-22 14:47:07 | xiang.zhang | set | pull_requests: + pull_request1808 |
| 2017-05-22 14:46:50 | xiang.zhang | set | pull_requests: + pull_request1807 |
| 2017-05-22 14:42:10 | xiang.zhang | set | messages: + msg294150 |
| 2017-05-12 10:32:31 | xiang.zhang | set | title: Remove hz codec -> Fix handling escape characters in HZ codec stage: patch review versions: + Python 2.7, Python 3.5, Python 3.6 |
| 2017-05-12 10:30:41 | xiang.zhang | set | pull_requests: + pull_request1652 |
| 2017-04-08 04:09:51 | malin | set | messages: + msg291315 |
| 2017-04-08 02:32:42 | vstinner | set | messages: + msg291312 |
| 2017-04-07 21:35:28 | terry.reedy | set | nosy: + terry.reedy messages: + msg291298 |
| 2017-04-06 07:54:32 | malin | set | messages: + msg291216 |
| 2017-04-06 07:27:33 | vstinner | set | messages: + msg291214 |
| 2017-04-06 03:42:17 | malin | create | |