First: reload(sys) and setting some random default encoding just to satisfy the needs of an output terminal stream is about the worst thing you can do. reload often changes things in sys that have been put in place depending on the environment - e.g. the sys.stdin/stdout/stderr streams, sys.excepthook, etc.

Solving the encode/decode problem on stdout

The best solution I know for the encode/decode problems of print'ing unicode strings and beyond-ascii str literals to sys.stdout is to install a sys.stdout file(-like) object that is capable of the terminal's encoding and, optionally, tolerant of characters it cannot represent.

  • If sys.stdout.encoding for some reason is None, missing, or erroneously false or "less" than what the stdout terminal / stream is really capable of, try to provide a correct .encoding attribute - if necessary by replacing sys.stdout/sys.stderr with a translating file-like object.

  • If the terminal / stream still cannot encode all occurring unicode characters, and you don't want your program to break just because of that, you can add an encode-with-replace behavior to that translating file-like object.

Here is an example that does both:

#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        # encode unicode output; escape chars the stream cannot represent
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        # delegate everything else (flush, isatty, ...) to the real stdout
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()
    us = u'aouäöüфżß2'
    print us
    sys.stdout.flush()
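
As a quick illustration (hypothetical usage, assuming the SmartStdout class above is in scope), forcing an ascii-only encoding shows the backslashreplace fallback: the non-ascii characters come out escaped instead of raising UnicodeEncodeError:

# encoding: utf-8
import sys
sys.stdout = SmartStdout('ascii')   # hypothetical: pretend the terminal only does ascii
print u'aouäöüфżß2'                 # prints: aou\xe4\xf6\xfc\u0444\u017c\xdf2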

Using beyond-ascii plain string literals in Python2/2+3 code

The only good reason I see for changing the global default encoding (and only to UTF-8) is a source-code decision, not an I/O-stream one: being able to write beyond-ascii string literals into code without being forced to always use u'string' style unicode escaping. This can be done rather consistently (despite what anonbadger's article says) by keeping a Python 2 or Python 2+3 code base that uses ascii or UTF-8 plain string literals consistently - at least for those strings that potentially undergo silent unicode conversion, move between modules, or may end up on stdout. For that, prefer "# encoding: utf-8" or plain ascii (no declaration). Change or drop libraries that still fatally rely, in a very dumb way, on ascii default-encoding errors for characters beyond chr(127) (which is rare today).
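
As a minimal sketch (hypothetical, Python 2) of what the changed default encoding buys you: mixing a UTF-8 plain str literal with a unicode string triggers an implicit str-to-unicode conversion via sys.getdefaultencoding(), so it fails under 'ascii' but works under 'utf-8':

# encoding: utf-8
s = 'äöü'               # plain str literal, UTF-8 bytes due to the coding declaration
u = u'prefix: ' + s     # implicit decode of s using sys.getdefaultencoding()
# default 'ascii'  -> UnicodeDecodeError
# default 'utf-8'  -> works: u'prefix: \xe4\xf6\xfc'
print repr(u)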

In addition to the SmartStdout scheme above, do something like this at application start (and/or via sitecustomize.py) - without using reload(sys):

...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    # load a second, fresh copy of the sys module; unlike reload(sys) this
    # leaves the original module (and its environment setup) untouched,
    # and the fresh copy still exposes setdefaultencoding
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally()
    s = 'aouäöüфżß2'
    print s

This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, almost as if you were on Python 3 only. File I/O of course always needs special care regarding encodings - as it does in Python 3.
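
For the file I/O part, a minimal sketch using io.open (available since Python 2.6), which takes an explicit encoding just like Python 3's open; the filename data.txt is only an example:

# encoding: utf-8
import io

# write and read text with an explicit encoding instead of relying on any default;
# io.open deals in unicode strings on the Python side
with io.open('data.txt', 'w', encoding='utf-8') as f:
    f.write(u'aouäöüфżß2\n')
with io.open('data.txt', 'r', encoding='utf-8') as f:
    text = f.read()     # unicode
print repr(text)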
