Message 117566 - Python tracker



In-reply-to
Author	vstinner
Recipients	vstinner
Date	2010-09-29 00:20:45
Message-id	<1285719650.86.0.122072445011.issue9979@psf.upfronthosting.co.za>

Content

PyUnicode_AsWideChar() doesn't merge surrogate pairs on a system with a 32-bit wchar_t when Python is compiled in narrow mode (sizeof(wchar_t) == 4 and sizeof(Py_UNICODE) == 2): see issue #8670.
It is not easy to fix this problem because the callers of PyUnicode_AsWideChar() assume that the output (wide character) string has the same length (in characters) as the input (PyUnicode) string, that is, they assume that sizeof(wchar_t) == sizeof(Py_UNICODE). In addition, PyUnicode_AsWideChar() doesn't write a nul character at the end if the output string is truncated.
To prepare this change, a new PyUnicode_AsWideCharString() function would help because it computes the size of the output buffer itself (whereas PyUnicode_AsWideChar() requires the caller to pass the output buffer as an argument).
Attached patch implements it:
-------
/* Convert the Unicode object to a wide character string. The output string
 always ends with a nul character. If size is not NULL, write the number of
 wide characters (including the final nul character) into *size.
 Returns a buffer allocated by PyMem_Alloc() (use PyMem_Free() to free it) on
 success. On error, returns NULL and *size is undefined. */
PyAPI_FUNC(wchar_t*) PyUnicode_AsWideCharString(
 PyUnicodeObject *unicode, /* Unicode object */
 Py_ssize_t *size /* number of characters of the result */
 );
-------
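
To illustrate why the output length can differ from the input length, here is a minimal standalone sketch of the idea. It is not the attached patch and it does not use the CPython API: the helper name utf16_to_wide() is hypothetical, uint16_t stands in for a narrow Py_UNICODE, uint32_t stands in for a 32-bit wchar_t, and malloc()/free() stand in for PyMem_Alloc()/PyMem_Free().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of CPython: convert `len` UTF-16 code units
   into a freshly allocated, nul-terminated array of 32-bit code points,
   merging surrogate pairs. If size is not NULL, *size receives the number of
   elements written, including the final nul. Returns NULL on allocation
   failure; free the result with free(). */
static uint32_t *
utf16_to_wide(const uint16_t *units, size_t len, size_t *size)
{
    size_t i, n = 0;
    uint32_t *out;

    assert(units != NULL || len == 0);

    /* First pass: count output characters so the buffer is sized exactly. */
    for (i = 0; i < len; i++, n++) {
        if (units[i] >= 0xD800 && units[i] <= 0xDBFF && i + 1 < len
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF)
            i++;  /* a high/low pair collapses into one output character */
    }

    out = malloc((n + 1) * sizeof(uint32_t));
    if (out == NULL)
        return NULL;

    /* Second pass: copy, merging each surrogate pair into one code point. */
    n = 0;
    for (i = 0; i < len; i++) {
        uint32_t ch = units[i];
        if (ch >= 0xD800 && ch <= 0xDBFF && i + 1 < len
            && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            ch = 0x10000 + ((ch - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            i++;
        }
        out[n++] = ch;
    }
    out[n] = 0;  /* always terminate, whatever the input length */
    if (size != NULL)
        *size = n + 1;
    return out;
}
```

With the four input units 0x0041, 0xD83D, 0xDE00, 0x0042, this sketch produces three characters plus the nul, so a caller that sized its own buffer from the input length would get the length wrong. Letting the function allocate the buffer and report the size avoids that assumption.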

History
Date	User	Action	Args
2010-09-29 00:20:51	vstinner	set	recipients: + vstinner
2010-09-29 00:20:50	vstinner	set	messageid: <1285719650.86.0.122072445011.issue9979@psf.upfronthosting.co.za>
2010-09-29 00:20:48	vstinner	link	issue9979 messages
2010-09-29 00:20:47	vstinner	create
