[Python-ideas] Python 3000 TIOBE -3%

Tue Feb 14 09:02:16 CET 2012

Nick Coghlan writes:
 > I'd hazard a guess that the non-ASCII compatible encoding most
 > likely to be encountered outside Asia is UTF-16.
In other words, only people who insist on messing with
application/octet-stream files (like Word ;-). They don't deserve the
pain, but they're gonna feel it anyway.
 > The choice is really between "never give me UnicodeErrors, but feel
 > free to silently corrupt the data stream if I do the wrong thing
 > with that data" (i.e. "latin-1")
Yes.
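
A minimal sketch of that failure mode, assuming UTF-8 input that the
program wrongly decodes as latin-1. The sample string and the variable
names are illustrative assumptions, not something from this thread:

```python
# Illustrative sample: UTF-8 bytes that the program wrongly treats as latin-1.
raw = "caf\u00e9".encode("utf-8")       # b'caf\xc3\xa9', 5 bytes for 4 characters

# latin-1 maps every byte to a code point, so this decode cannot fail.
text = raw.decode("latin-1")

# Untouched data still round-trips exactly...
round_trip = text.encode("latin-1")

# ...but emitting it in any other encoding silently produces mojibake.
emitted = text.encode("utf-8")

print(len(text))            # the program sees 5 "characters", not 4
print(round_trip == raw)    # True: verbatim pass-through is safe
print(emitted == raw)       # False: the data stream is now corrupted
```

No exception is raised anywhere, which is exactly the trade-off being
described.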
 > and "correctly handle any ASCII compatible encoding, but still
 > throw UnicodeEncodeError if I'm about to emit corrupted data"
 > ("ascii+surrogateescape").
Not if I correctly understand what ascii+surrogateescape would do.
Yes, you can pass through verbatim, but AFAICS you would have to work
quite hard to do anything to that stream that would cause a
UnicodeError in your program, even though you corrupt it. (Eg, delete
half of a multibyte EUC character.)
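
The EUC case can be sketched roughly as follows, assuming EUC-JP input
and Python's "surrogateescape" error handler. The sample text and the
variable names are illustrative assumptions:

```python
# Illustrative sample: three kanji in EUC-JP, two bytes per character.
raw = "\u65e5\u672c\u8a9e".encode("euc-jp")

# Every non-ASCII byte becomes one lone surrogate (U+DC80..U+DCFF).
text = raw.decode("ascii", errors="surrogateescape")

# Corrupt the stream: drop the first byte of the first two-byte character.
damaged_text = text[1:]

# Re-encoding with the same handler succeeds; no UnicodeError is raised.
damaged = damaged_text.encode("ascii", errors="surrogateescape")

# Only a decoder that knows the real encoding notices the damage.
try:
    damaged.decode("euc-jp")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False

print(len(raw), len(text), len(damaged), still_valid)
```

The slice and the re-encode both go through quietly; the corruption
only shows up once something tries to interpret the bytes as EUC-JP.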
The question is what happens if you run into a validating processor
internally -- then you'll see an error (even though you're just
passing it through verbatim!)
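
As a rough illustration, a strict UTF-8 encode can stand in for such a
validating processor (that stand-in is an assumption made for this
sketch, as are the sample text and variable names):

```python
# Illustrative sample: EUC-JP bytes read with ascii+surrogateescape.
raw = "\u65e5\u672c\u8a9e".encode("euc-jp")
text = raw.decode("ascii", errors="surrogateescape")

# Stand-in for a validating processor: a strict encoder rejects the
# lone surrogates even though the data was never modified.
try:
    text.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# Emitting with the same handler used on input restores the original bytes.
restored = text.encode("ascii", errors="surrogateescape")

print(rejected, restored == raw)
```

So the error depends on which component touches the string, not on
whether the data was actually changed.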