[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
Baptiste Carvello
baptiste13z at free.fr
Wed Apr 29 10:43:49 CEST 2009
Lino Mastrodomenico wrote:
> Only for the new utf-8b encoding (if Martin agrees), while the
> existing utf-8 is fine as is (or at least waaay outside the scope of
> this PEP).
>
This is questionable. It would mean that \udcxx in a Python
string sometimes stands for a surrogate and sometimes for raw bytes, depending
on the history of the string.
By contrast, if the new utf-8b codec were to *supersede* the old one, \udcxx would
always mean raw bytes (at least on UCS-4 builds, where surrogates are unused).
The ambiguity would thus be avoided.
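
To make the ambiguity concrete, here is a minimal sketch. It assumes a current
Python 3, where the utf-8b behaviour is exposed as the `surrogateescape` error
handler and an encoded lone surrogate can be decoded with `surrogatepass`.
Neither handler name appears in this thread, and the byte values are chosen
purely for illustration.

```python
# Assumption: a modern Python 3 with the "surrogateescape" and
# "surrogatepass" error handlers (names not taken from this thread).

# History 1: the bytes are the three-byte UTF-8 form of the lone
# surrogate U+DC80, decoded by a codec that lets surrogates through.
from_surrogate = b"\xed\xb2\x80".decode("utf-8", "surrogatepass")

# History 2: a single undecodable raw byte 0x80, escaped utf-8b style.
from_raw_byte = b"\x80".decode("utf-8", "surrogateescape")

# Both histories produce the very same one-character string.
print(ascii(from_surrogate), ascii(from_raw_byte))
print(from_surrogate == from_raw_byte)

# Encoding back therefore depends on which interpretation is assumed.
as_raw_byte = from_raw_byte.encode("utf-8", "surrogateescape")
as_surrogate = from_raw_byte.encode("utf-8", "surrogatepass")
print(as_raw_byte, as_surrogate)
```

If only the escaping codec could ever produce \udcxx, the second
interpretation would never arise and the round trip would be unambiguous.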
Baptiste