[Python-3000] Console encoding detection broken

"Martin v. Löwis" martin at v.loewis.de
Fri Aug 10 08:17:32 CEST 2007


Georg Brandl wrote:
> Well, subject says it all. While 2.5 sets sys.std*.encoding correctly to
> UTF-8, 3k sets it to 'latin-1', breaking output of Unicode strings.

And not surprisingly so: io.py says

    if encoding is None:
        # XXX This is questionable
        encoding = sys.getfilesystemencoding() or "latin-1"

First, at the point where this call is made, sys.getfilesystemencoding()
still returns None, so the "latin-1" fallback is always taken. Second, the
code is broken even apart from that, as the file system encoding is not
the correct value for sys.stdout.encoding. Instead, the way it should
be computed is:
1. On Unix, use the same value that sys.getfilesystemencoding will get,
 namely the result of nl_langinfo(CODESET); if that is not available,
 fall back - to anything, but the most logical choices are UTF-8
 (if you want output to always succeed) and ASCII (if you don't want
 to risk mojibake).
2. On Windows, if output is to a terminal, use GetConsoleOutputCP.
 Else fall back, probably to CP_ACP (i.e. "mbcs").
3. On OS X, I don't know. If output is to a terminal, UTF-8 may be
 a good bet (although some people operate their Terminal.apps
 not in UTF-8; there is no way to find out). Otherwise, use the
 locale's encoding - not sure how to find out what that is.
Regards,
Martin

