[Python-Dev] Difference between PyUnicode_IS_ASCII and PyUnicode_IS_COMPACT_ASCII ?

Tue Dec 20 20:26:51 CET 2011

On 20/12/2011 09:54, Antoine Pitrou wrote:
> Hello,
> The include file (unicodeobject.h) seems to imply that some pure ASCII
> strings can be non-compact, but I don't understand how that can happen.

If you create a string from Py_UNICODE* or wchar_t* (using the legacy 
API), PyUnicode_READY() may create a non-compact but ASCII string.
Such a string would be in the following state (extract from unicodeobject.h):
 - legacy string, ready:
     * structure = PyUnicodeObject structure
     * test: !PyUnicode_IS_COMPACT(op) && kind != PyUnicode_WCHAR_KIND
     * kind = PyUnicode_1BYTE_KIND, PyUnicode_2BYTE_KIND or
       PyUnicode_4BYTE_KIND
     * compact = 0
     * ready = 1
     * data.any is not NULL
     * utf8 is shared and utf8_length = length with data.any if ascii = 1
     * utf8_length = 0 if utf8 is NULL

> Besides, the following comment also seems wrong:
>
>  - compact:
>      * structure = PyCompactUnicodeObject
>      * test: PyUnicode_IS_ASCII(op) && !PyUnicode_IS_COMPACT(op)

I added the "test" lines recently because I always forget how to get the 
structure type. The correct test should be:
 - compact:
     * structure = PyCompactUnicodeObject
     * test: PyUnicode_IS_COMPACT(op) && !PyUnicode_IS_ASCII(op)
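
To make the difference between the two macros concrete, here is a rough
standalone sketch. It does not include Python.h: model_state, model_struct
and the model_* functions are hypothetical stand-ins for the state bits and
tests described above, not the real definitions from unicodeobject.h. It
assumes, as the header comments state, that a compact string is always ready.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state bits discussed above; this is NOT
   the real state field of PyASCIIObject. */
typedef struct {
    bool compact;  /* character data is stored right after the struct */
    bool ascii;    /* every character is below 128 */
    bool ready;    /* the canonical representation is available */
} model_state;

typedef enum {
    MODEL_ASCII_OBJECT,            /* stands for PyASCIIObject */
    MODEL_COMPACT_UNICODE_OBJECT,  /* stands for PyCompactUnicodeObject */
    MODEL_UNICODE_OBJECT           /* stands for PyUnicodeObject */
} model_struct;

/* Models PyUnicode_IS_ASCII: only the ascii bit is checked. */
static bool model_is_ascii(model_state s)
{
    return s.ascii;
}

/* Models PyUnicode_IS_COMPACT_ASCII: both bits must be set. */
static bool model_is_compact_ascii(model_state s)
{
    return s.ascii && s.compact;
}

/* Picks the structure type using the tests quoted in this thread. */
static model_struct model_structure_of(model_state s)
{
    /* Assumption of this model: a compact string is always ready. */
    assert(!s.compact || s.ready);

    if (model_is_compact_ascii(s))
        return MODEL_ASCII_OBJECT;
    if (s.compact && !s.ascii)
        return MODEL_COMPACT_UNICODE_OBJECT;
    /* Everything non-compact is a legacy string, ASCII or not. */
    return MODEL_UNICODE_OBJECT;
}
```

With compact = 0, ascii = 1, ready = 1 (the legacy ready case above),
model_is_ascii() is true but model_is_compact_ascii() is false, so the
sketch selects the PyUnicodeObject stand-in rather than PyASCIIObject.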
Victor