[Python-Dev] len(chr(i)) = 2?
Greg Ewing
greg.ewing at canterbury.ac.nz
Wed Nov 24 00:49:42 CET 2010
Alexander Belopolsky wrote:
> """
> Because the most commonly used characters are all in the Basic
> Multilingual Plane, converting between surrogate pairs and the
> original values is often not tested thoroughly. This leads to
> persistent bugs, and potential security holes, even in popular and
> well-reviewed application software.
> """
Maybe Python should have used UTF-8 as its internal Unicode
representation. Then people who were foolish enough to assume
one character per string item would have their programs break
rather soon under only light Unicode testing. :-)
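
A rough sketch of the point, using encoded bytes as a stand-in for the
internal representation (that stand-in is an assumption for illustration,
and code_units is a hypothetical helper, not anything in CPython). It counts
how many code units a single character takes in UTF-8 versus UTF-16:

```python
def code_units(text: str, encoding: str) -> int:
    """Count the code units `text` occupies in `encoding`.

    Hypothetical helper for illustration only. The explicit "-le" codecs
    are used so that no byte order mark is added to the output.
    """
    unit_size = {"utf-8": 1, "utf-16-le": 2}[encoding]
    return len(text.encode(encoding)) // unit_size


# One character each; only the last lies outside the Basic Multilingual Plane.
samples = {
    "ascii": "a",             # U+0061
    "latin": "\u00e9",        # U+00E9, in the BMP
    "cjk": "\u4e2d",          # U+4E2D, in the BMP
    "astral": "\U0001d11e",   # U+1D11E, outside the BMP
}

for name, ch in samples.items():
    utf8 = code_units(ch, "utf-8")
    utf16 = code_units(ch, "utf-16-le")
    print(f"{name:7} utf-8 units: {utf8}  utf-16 units: {utf16}")
```

With UTF-16 only the astral character takes more than one unit, so the
one-item-per-character assumption survives any test data drawn from the BMP.
With UTF-8 it already fails on the first accented letter.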
--
Greg