Created on 2010-09-29 00:20 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| pyunicode_aswidecharstring-2.patch | vstinner, 2010-09-29 01:00 | |
| Messages (6) | |||
|---|---|---|---|
| msg117566 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-09-29 00:20 | |
PyUnicode_AsWideChar() doesn't merge surrogate pairs on a system with a 32-bit wchar_t and Python compiled in narrow mode (sizeof(wchar_t) == 4 and sizeof(Py_UNICODE) == 2) => see issue #8670.

It is not easy to fix this problem because the callers of PyUnicode_AsWideChar() assume that the output (wide character) string has the same length (in characters) as the input (PyUnicode) string, i.e. they assume that sizeof(wchar_t) == sizeof(Py_UNICODE). In addition, PyUnicode_AsWideChar() doesn't write a nul character at the end if the output string is truncated.

To prepare this change, a new PyUnicode_AsWideCharString() function would help because it computes the size of the output buffer itself (whereas PyUnicode_AsWideChar() requires the output buffer as an argument). The attached patch implements it:

-------
/* Convert the Unicode object to a wide character string. The output string
   always ends with a nul character. If size is not NULL, write the number of
   wide characters (including the final nul character) into *size.

   Returns a buffer allocated by PyMem_Alloc() (use PyMem_Free() to free it)
   on success. On error, returns NULL and *size is undefined. */

PyAPI_FUNC(wchar_t*) PyUnicode_AsWideCharString(
    PyUnicodeObject *unicode,   /* Unicode object */
    Py_ssize_t *size            /* number of characters of the result */
    );
------- |
|||
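The difference between the two calling conventions described in msg117566 can be sketched in standalone C. This is only an illustrative model, not the code from the patch: `copy_fixed` and `copy_alloc` are hypothetical names, the narrow-build string is modelled as a plain `uint16_t` array, and `malloc()` stands in for `PyMem_Alloc()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical model of the PyUnicode_AsWideChar() convention: the caller
   supplies a buffer of `size` elements. A nul is written only when there is
   room left, so a truncated result is NOT nul-terminated. */
static ptrdiff_t copy_fixed(const uint16_t *src, size_t len,
                            wchar_t *dst, size_t size)
{
    assert(src != NULL && dst != NULL);
    size_t n = len < size ? len : size;
    for (size_t i = 0; i < n; i++)
        dst[i] = (wchar_t)src[i];
    if (n < size)
        dst[n] = L'\0';
    return (ptrdiff_t)n;
}

/* Hypothetical model of the PyUnicode_AsWideCharString() convention: the
   function sizes and allocates the buffer itself, always terminates it, and
   reports the element count including the final nul through *size.
   Assumption: malloc()/free() replace PyMem_Alloc()/PyMem_Free(). */
static wchar_t *copy_alloc(const uint16_t *src, size_t len, size_t *size)
{
    assert(src != NULL);
    if (len > SIZE_MAX / sizeof(wchar_t) - 1)
        return NULL;                      /* size computation would overflow */
    wchar_t *buf = malloc((len + 1) * sizeof(wchar_t));
    if (buf == NULL)
        return NULL;                      /* *size is left untouched on error */
    for (size_t i = 0; i < len; i++)
        buf[i] = (wchar_t)src[i];
    buf[len] = L'\0';
    if (size != NULL)
        *size = len + 1;
    return buf;
}
```

With a three-character input and a two-element buffer, `copy_fixed` fills both elements and leaves no terminator, whereas `copy_alloc` returns a four-element buffer whose last element is the nul. Because the second form decides the length itself, its callers no longer need to assume that input and output have the same number of elements.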
| msg117570 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-09-29 01:00 | |
New version of the patch:
- fix PyUnicode_AsWideCharString() :-)
- replace PyUnicode_AsWideChar() with PyUnicode_AsWideCharString() in most functions using PyUnicode_AsWideChar()
- indicate that PyUnicode_AsWideCharString() raises a MemoryError on error

Keep the call to PyUnicode_AsWideChar() in:
- Modules/getpath.c, because getpath.c uses a global limit of MAXPATHLEN+1 characters
- WCharArray_set_value() and U_set() of ctypes, because the output buffer size is fixed |
|||
| msg117577 | Author: Marc-Andre Lemburg (lemburg) * (Python committer) | Date: 2010-09-29 06:54 | |
STINNER Victor wrote:
> New submission from STINNER Victor <victor.stinner@haypocalc.com>:
>
> PyUnicode_AsWideChar() doesn't merge surrogate pairs on a system with a 32-bit wchar_t and Python compiled in narrow mode (sizeof(wchar_t) == 4 and sizeof(Py_UNICODE) == 2) => see issue #8670.
>
> It is not easy to fix this problem because the callers of PyUnicode_AsWideChar() assume that the output (wide character) string has the same length (in characters) as the input (PyUnicode) string, i.e. they assume that sizeof(wchar_t) == sizeof(Py_UNICODE). In addition, PyUnicode_AsWideChar() doesn't write a nul character at the end if the output string is truncated.
>
> To prepare this change, a new PyUnicode_AsWideCharString() function would help because it computes the size of the output buffer itself (whereas PyUnicode_AsWideChar() requires the output buffer as an argument).

Great idea! |
|||
| msg117578 | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) | Date: 2010-09-29 07:11 | |
+1 from me as well. But shouldn't PyUnicode_AsWideCharString() merge surrogate pairs when it can? The implementation doesn't do this. |
|||
| msg117586 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-09-29 09:14 | |
> But shouldn't PyUnicode_AsWideCharString() merge surrogate pairs when it
> can? The implementation doesn't do this.

I don't want to do two different things at the same time. My plan is:
- create PyUnicode_AsWideCharString()
- use PyUnicode_AsWideCharString() everywhere
- patch unicode_aswidechar() (used by PyUnicode_AsWideChar() and PyUnicode_AsWideCharString()) to convert surrogates when needed

So, do you agree with the API (and the documentation)? |
|||
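The last step of the plan in msg117586, converting surrogates when the source uses 16-bit units and the target uses 32-bit units, might look roughly like the following. This is a sketch under stated assumptions rather than the eventual `unicode_aswidechar()` change: `merge_surrogates` and its helpers are hypothetical names, `uint16_t` models a narrow-build `Py_UNICODE`, and `uint32_t` models a 32-bit `wchar_t`. Passing `dst == NULL` only counts the output elements, which is one way to size the buffer before allocating it.

```c
#include <stddef.h>
#include <stdint.h>

/* High (lead) surrogates are U+D800..U+DBFF, low (trail) are U+DC00..U+DFFF. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: convert `len` 16-bit units to 32-bit code points,
   merging each valid high+low surrogate pair into a single code point.
   Lone surrogates are copied unchanged. If dst is NULL, nothing is written
   and only the number of output elements is returned. */
static size_t merge_surrogates(const uint16_t *src, size_t len, uint32_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t ch = src[i];
        if (is_high_surrogate(src[i]) && i + 1 < len
            && is_low_surrogate(src[i + 1])) {
            ch = 0x10000
                 + (((uint32_t)src[i] - 0xD800) << 10)
                 + ((uint32_t)src[i + 1] - 0xDC00);
            i++;                          /* the low surrogate is consumed too */
        }
        if (dst != NULL)
            dst[out] = ch;
        out++;
    }
    return out;
}
```

Once pairs are merged, the output can be shorter than the input, which is why callers that assume equal lengths have to be moved to a function that reports the resulting size.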
| msg117592 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2010-09-29 10:41 | |
I fixed this issue in multiple commits:
- r85093: create PyUnicode_AsWideCharString()
- r85094: use it in import.c
- r85095: use it for _locale.strcoll()
- r85096: use it for time.strftime()
- r85097: use it in _ctypes module

> So, you agree with the API (and the documentation)?

Well, you can now directly patch the documentation. I think that the API is simple and fine :-) |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:07 | admin | set | github: 54188 |
| 2010-09-29 10:41:57 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg117592 |
| 2010-09-29 09:14:24 | vstinner | set | messages: + msg117586 |
| 2010-09-29 07:11:26 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg117578 |
| 2010-09-29 06:54:20 | lemburg | set | messages: + msg117577 |
| 2010-09-29 01:01:02 | vstinner | set | files: - pyunicode_aswidecharstring.patch |
| 2010-09-29 01:00:57 | vstinner | set | files: + pyunicode_aswidecharstring-2.patch; messages: + msg117570 |
| 2010-09-29 00:28:14 | stutzbach | set | nosy: + lemburg, ezio.melotti |
| 2010-09-29 00:20:49 | vstinner | create | |