[Python-Dev] len(chr(i)) = 2?

Wed Nov 24 11:27:30 CET 2010

On Wed, 24 Nov 2010 18:51:49 +0900
"Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> James Y Knight writes:
>
> > But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly
> > superior [...] because it is an ASCII superset, and thus more
> > easily compatible with other software. That also makes it most
> > commonly used for internet communication.
>
> Sure, UTF-8 is very nice as a protocol for communicating text. So
> what? If your application involves shoveling octets real fast, don't
> convert and shovel those octets. If your application involves
> significant text processing, well, conversion can almost always be
> done as fast as you can do I/O so it doesn't cost wallclock time, and
> generally doesn't require a huge percentage of CPU time compared to
> the actual text processing. It's just a specialization of
> serialization, that we do all the time for more complex data
> structures.
>
> So wire protocols are not a killer argument for or against any
> particular internal representation of text.

Agreed. Decoding and encoding UTF-8 is so fast that it should be
dwarfed by any actual processing done on the text.

Regards
Antoine.
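
For anyone who wants to check the claim on their own machine, here is a
rough sketch that times a UTF-8 decode/encode round trip against a
stand-in for "actual processing". The helper names (utf8_roundtrip,
process, compare), the sample string, and the word-counting workload are
all made up for illustration; they are not from the thread, and the
numbers will vary with the interpreter, the hardware, and the text.

```python
import timeit


def utf8_roundtrip(data: bytes) -> bytes:
    """Decode UTF-8 bytes to str, then encode back to UTF-8."""
    return data.decode("utf-8").encode("utf-8")


def process(text: str) -> int:
    """Hypothetical stand-in for text processing: count distinct
    case-folded, whitespace-separated words."""
    return len(set(text.lower().split()))


def compare(sample: str = "naïve café 日本語 text ",
            repeat: int = 20000,
            number: int = 20) -> tuple[float, float]:
    """Return (codec_seconds, processing_seconds) for `number` runs."""
    data = (sample * repeat).encode("utf-8")
    # Cost of the conversion at the I/O boundary alone.
    t_codec = timeit.timeit(lambda: utf8_roundtrip(data), number=number)
    # Cost of decoding plus the processing step on the decoded text.
    t_work = timeit.timeit(lambda: process(data.decode("utf-8")),
                           number=number)
    return t_codec, t_work


if __name__ == "__main__":
    t_codec, t_work = compare()
    print(f"UTF-8 round trip:    {t_codec:.4f} s")
    print(f"decode + processing: {t_work:.4f} s")
```

This only measures one toy workload, so treat it as a way to get a feel
for the relative costs, not as a benchmark of any particular
internal representation.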