Created on 2003-11-29 01:24 by mhammond, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| mbcs_errors.py | mhammond, 2003-11-29 01:24 | Trivial demo of the bug |
| mbcs_errors.patch | mhammond, 2003-11-29 01:38 | Working patch, but with a few issues |
| translate_reason_unicode.patch | vstinner, 2010-06-11 00:54 | |
| mbcs_errors-py3k-3.patch | vstinner, 2010-06-12 00:06 | |
| mbcs_errors-py3k-4.patch | vstinner, 2010-06-16 22:21 | |
| Messages (17) | |||
|---|---|---|---|
| msg19177 | Author: Mark Hammond (mhammond) * (Python committer) | Date: 2003-11-29 01:24 | |
The following snippet:
>>> u'@test-\u5171'.encode("mbcs", "strict")
'@test-?'
Should raise a UnicodeError. The errors param is
completely ignored, and the function always works as
though errors='replace'.
Attaching a test case, and the start of a patch. The
patch has a number of issues:
* I'm not sure what errors are considered 'mandatory'.
I have handled 'strict', 'ignore' and 'replace' -
however, 'ignore' and 'replace' currently are exactly
the same (i.e., replace)
* The Windows functions don't tell us exactly what
character failed in the conversion. Thus, the
exception I raise implies the first character is the
one that failed. For the same reason, I have made no
attempt to support error callbacks.
Comments/guidance appreciated.
|
|||
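To illustrate the behaviour requested in the report above, here is a minimal sketch of how the three common error handlers are expected to differ. It uses `cp1252` as a stand-in because the `mbcs` codec exists only on Windows; that substitution and the helper name `try_encode` are assumptions made for illustration and are not part of the original report or patch.

```python
def try_encode(text, encoding="cp1252"):
    """Encode `text` with each error handler and collect the outcome.

    `cp1252` is an assumed stand-in for the Windows ANSI code page;
    the real `mbcs` codec is only available on Windows.
    """
    results = {}
    for handler in ("strict", "replace", "ignore"):
        try:
            results[handler] = text.encode(encoding, handler)
        except UnicodeEncodeError as exc:
            # Keep the exception so the failing position can be inspected.
            results[handler] = exc
    return results


results = try_encode("@test-\u5171")
for handler, outcome in results.items():
    print(handler, repr(outcome))
```

With a codec that honours `errors`, `strict` raises and reports the position of the failing character, `replace` substitutes `?`, and `ignore` drops the character.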
| msg19178 | Author: Mark Hammond (mhammond) * (Python committer) | Date: 2003-11-29 01:31 | |
Attaching a patch. This patch also attempts to handle Encode, but I haven't worked out how to exercise this code path, i.e., what mbcs-encoded string can I pass that cannot be converted to unicode? As I mentioned, the patch has a few issues. |
|||
| msg19179 | Author: Thomas Heller (theller) * (Python committer) | Date: 2003-11-29 15:18 | |
No idea why this was assigned to me - unicode is certainly not one of my strengths. |
|||
| msg19180 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2003-12-01 21:25 | |
The conventional semantics of "ignore" would be "remove the failing characters from the output". This would be difficult to implement if the Microsoft API provides no detailed error indication.
You could try to get more detailed error indication by re-encoding the resulting string with a NULL buffer, counting the number of characters that have successfully been encoded, at least in the .decode case. In the .encode case, you could try using \0 as the default char. To my knowledge, no ACP ever uses \0 in a multi-byte string.
What is the meaning of the WC_DEFAULTCHAR flag in WideCharToMultiByte, and why are you not using it?
I'm somewhat concerned with backwards compatibility, since the mbcs codec has never returned errors. So this should be applied to 2.4 only, and listed in whatsnew.tex. |
|||
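The "ignore" semantics and the need for a detailed error position discussed above can be sketched with Python's error-callback mechanism. This is only an illustration under stated assumptions: `cp1252` stands in for the Windows-only `mbcs` codec, and the handler name `record_and_ignore` and the helper `make_recording_handler` are hypothetical names, not anything from the patches on this issue.

```python
import codecs


def make_recording_handler(positions):
    """Build an error callback that drops failing characters (like
    "ignore") and records where each failure occurred."""
    def handler(exc):
        if not isinstance(exc, UnicodeEncodeError):
            raise exc
        # exc.start/exc.end delimit the run of unencodable characters.
        positions.append((exc.start, exc.end))
        # Emit nothing and resume encoding after the failing run.
        return ("", exc.end)
    return handler


failed = []
codecs.register_error("record_and_ignore", make_recording_handler(failed))

# cp1252 is an assumed stand-in for the ANSI code page.
encoded = "a\u5171b\u0142c".encode("cp1252", "record_and_ignore")
print(encoded, failed)
```

A callback like this only works when the codec can say which characters failed, which is exactly the information the Windows conversion functions were reported not to provide.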
| msg82015 | Author: Daniel Diniz (ajaksu2) * (Python triager) | Date: 2009-02-14 11:35 | |
Is this behavior still present? If so, is it still interesting to change it? |
|||
| msg82133 | Author: Mark Hammond (mhammond) * (Python committer) | Date: 2009-02-14 22:40 | |
It is still present, but I'm not sure what problems can be seen due to this, so I can't comment on its desirability. It would also introduce a backwards compatibility concern, but I don't have enough experience to know how much of a problem that would be in practice either. |
|||
| msg106277 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-05-22 01:36 | |
I patched py3k with mbcs_errors.patch (only encode_mbcs, not the decoder function) and most tests pass: I opened #8784 for the test_tarfile failure.
I don't think that it's a problem that mbcs only supports a few error handlers, e.g. 'strict', 'replace' and 'errors' (but neither 'ignore' nor 'surrogateescape'). mbcs should be avoided anyway :-) It is kept for backward compatibility (with Python2). Python3 tries to avoid it by using the Unicode functions of the Windows API. I don't know exactly where mbcs is still used in Python3.
If mbcs becomes more strict and raises new errors, I would like to say that the problem comes from the program, not from the encoding, and the program should be fixed (especially if the "program" is the Python standard library).
About the backward compatibility with Python < 3.2: I don't know exactly if this change would be a problem or not. I bet that few people use (directly or indirectly) mbcs with Python 3.1 (on Windows), and few people (or nobody) would notice this change. And as I wrote, if someone notices a problem: the problem should be fixed in the function using mbcs, not in the codec. |
|||
| msg106278 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-05-22 01:38 | |
Since this change breaks backward compatibility, it's a very bad idea to change mbcs codec in Python 2.7: remove this version from this issue. |
|||
| msg106407 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-05-24 23:11 | |
Updated version of the patch for py3k:
- don't accept the "ignore" error handler anymore
- there is a FIXME near "mbcs_decode_error:"
The whole test suite passes with this patch. |
|||
| msg107513 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-11 00:54 | |
I worked again on the patch. I opened new issues to prepare the new mbcs codec:
- #8966: ctypes: remove implicit conversion between unicode and bytes
- #8967: Create PyErr_GetWindowsMessage() function
- #8969: Windows: use (mbcs in) strict mode to encode/decode filenames, and enable os.fsencode()
#8967 can be used to get the translated message of an mbcs encode error. PyErr_GetWindowsMessage() returns a PyUnicodeObject, whereas make_translate_exception() and PyUnicodeTranslateError_SetReason() expect a "char*". Another patch is required: translate_reason_unicode.patch (attached to this issue, not tested). But I don't think that the message is very important for now :-)
#8784 (tarfile/Windows: Don't use mbcs as the default encoding) is still open. |
|||
| msg107517 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-11 01:28 | |
New version of the patch:
- decode_mbcs() calls raise_translate_exception() to set the error (in the previous patch, I'm not sure that the error was set)
- include the #8784 patch (tarfile uses utf-8 as the default encoding)
- ctypes: use mbcs in strict mode instead of ignore mode. This is just a workaround; the real fix is to remove the implicit conversion between bytes and characters: see #8966
The patch requires the #8969 patch (use mbcs in strict mode to encode/decode filenames). |
|||
| msg107611 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-12 00:01 | |
Tim: are you interested in testing this patch? |
|||
| msg107612 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-12 00:06 | |
Updated the patch (I committed the patch on the tarfile module): version 3. |
|||
| msg107957 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-16 22:21 | |
Patch version 4:
- encode_mbcs() uses the WC_NO_BEST_FIT_CHARS flag in strict mode. Examples: ğ and ł are no longer replaced by g and l
- encode_mbcs() doesn't set *repr to NULL on encode error: the caller destroys it anyway
- write more documentation about mbcs, especially about the error handlers and the changes in Python 3.2 |
|||
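The best-fit point in patch version 4 can be illustrated, roughly, with portable codecs. The sketch below assumes `cp1252`, `cp1254` and `cp1250` as stand-ins for Windows ANSI code pages, and the helper name `encodes_strictly` is hypothetical. Python's own charmap codecs never apply best-fit mapping, so this only shows the strict outcome the patch aims for, not the Windows API call itself.

```python
def encodes_strictly(char, encoding):
    """Return True if `char` has an exact representation in `encoding`."""
    try:
        char.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


g_breve = "\u011f"   # ğ
l_stroke = "\u0142"  # ł

# Neither character exists in cp1252, so strict encoding must fail
# instead of silently degrading to "g" or "l".
for char in (g_breve, l_stroke):
    print(repr(char), "cp1252:", encodes_strictly(char, "cp1252"))

# Code pages that do contain the characters encode them exactly.
print("cp1254:", encodes_strictly(g_breve, "cp1254"))
print("cp1250:", encodes_strictly(l_stroke, "cp1250"))
```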
| msg107965 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-16 23:35 | |
I committed the last patch to py3k: r82037. Let's see how the buildbots react :-) |
|||
| msg108018 | Author: Tim Golden (tim.golden) * (Python committer) | Date: 2010-06-17 14:21 | |
I'm unlikely to get to it soon. If there's no urgency I can look at it later. FWIW, it's not something I'm especially familiar with. |
|||
| msg108149 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-06-18 23:24 | |
Close this issue: nothing special on the buildbots. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:01 | admin | set | github: 39624 |
| 2010-06-18 23:24:39 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg108149 |
| 2010-06-17 14:21:56 | tim.golden | set | messages: + msg108018 |
| 2010-06-16 23:35:09 | vstinner | set | messages: + msg107965 |
| 2010-06-16 22:21:04 | vstinner | set | files: + mbcs_errors-py3k-4.patch; messages: + msg107957 |
| 2010-06-12 00:07:08 | vstinner | set | files: - mbcs_errors-py3k-2.patch |
| 2010-06-12 00:07:02 | vstinner | set | files: - mbcs_errors-py3k.patch |
| 2010-06-12 00:06:53 | vstinner | set | files: + mbcs_errors-py3k-3.patch; messages: + msg107612 |
| 2010-06-12 00:01:58 | vstinner | set | nosy: + tim.golden; messages: + msg107611 |
| 2010-06-11 01:28:51 | vstinner | set | files: + mbcs_errors-py3k-2.patch; messages: + msg107517 |
| 2010-06-11 00:54:17 | vstinner | set | files: + translate_reason_unicode.patch; messages: + msg107513 |
| 2010-05-24 23:11:38 | vstinner | set | files: + mbcs_errors-py3k.patch; messages: + msg106407 |
| 2010-05-22 01:38:50 | vstinner | set | messages: + msg106278; versions: - Python 2.7 |
| 2010-05-22 01:36:15 | vstinner | set | messages: + msg106277 |
| 2010-05-22 01:01:38 | vstinner | set | nosy: + vstinner |
| 2010-02-05 16:57:18 | ezio.melotti | set | nosy: + ezio.melotti; versions: + Python 2.7, Python 3.2 |
| 2009-02-14 22:40:35 | mhammond | set | messages: + msg82133 |
| 2009-02-14 12:14:14 | theller | set | nosy: - theller |
| 2009-02-14 11:35:45 | ajaksu2 | set | nosy: + ajaksu2; messages: + msg82015; components: + Unicode; keywords: + patch; type: enhancement; stage: test needed |
| 2003-11-29 01:24:21 | mhammond | create | |