edited Apr 27, 2017 at 10:53

First: calling reload(sys) and setting some arbitrary default encoding merely to satisfy an output terminal stream is bad practice. reload often resets things in sys that were put in place depending on the environment, e.g. the sys.stdin/sys.stdout streams, sys.excepthook, etc.

Solving the encode problem on stdout

The best solution I know for the encode problem when printing unicode strings and beyond-ASCII str's (e.g. from literals) to sys.stdout is to make sure that sys.stdout is a file-like object which is capable and, optionally, tolerant with regard to these needs:

When sys.stdout.encoding is None for some reason, is missing, is wrongly set, or is "less" than what the stdout terminal or stream is really capable of, try to provide a correct .encoding attribute, as a last resort by replacing sys.stdout and sys.stderr with a translating file-like object.
When the terminal or stream still cannot encode all occurring unicode characters, and you don't want print statements to break just because of that, you can introduce an encode-with-replace behavior in the translating file-like object.

Here is an example:

#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            # if sys.stdout is already wrapped, reuse the stream it wraps
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'

    def write(self, s):
        # unencodable characters become backslash escapes instead of raising
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))

    def __getattr__(self, name):
        # delegate everything else (flush, isatty, ...) to the original stream
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()

    us = u'aouäöüфżß2'
    print us
    sys.stdout.flush()
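
To see what the 'backslashreplace' handler in write does, independent of any terminal, here is a minimal sketch. The helper name encode_tolerant is hypothetical and not part of the class above; the sample string is the same one as in the example, spelled with escapes so that the snippet runs unchanged under Python 2 and Python 3.

```python
from __future__ import print_function

def encode_tolerant(text, encoding):
    """Encode unicode text; characters the target encoding cannot
    represent are written as backslash escapes instead of raising."""
    return text.encode(encoding, 'backslashreplace')

# Same characters as the sample string in the example above, written with escapes.
us = u'aou\xe4\xf6\xfc\u0444\u017c\xdf2'

ascii_bytes = encode_tolerant(us, 'ascii')      # every non-ASCII character is escaped
latin1_bytes = encode_tolerant(us, 'latin-1')   # only the two non-Latin-1 characters are escaped
utf8_bytes = encode_tolerant(us, 'utf-8')       # nothing needs escaping

print(repr(ascii_bytes))
print(repr(latin1_bytes))
print(repr(utf8_bytes))
```

The narrower the stream encoding, the more characters show up as escapes, but the print itself never fails with a UnicodeEncodeError.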

Using beyond-ascii plain string literals in Python 2 / 2 + 3 code

I think the only good reason to change the global default encoding (and then only to UTF-8) is an application source code decision, not I/O stream encoding issues: it allows writing beyond-ASCII string literals in code without being forced to always use the u'string' unicode literal style. This can be done rather consistently (despite what anonbadger's article says) by maintaining a Python 2 or Python 2 + 3 code base which uses ASCII or UTF-8 plain string literals consistently, as far as those strings potentially undergo silent unicode conversion, move between modules, or go to stdout. For that, prefer "# encoding: utf-8" or ASCII (no declaration). Change or drop libraries which still fatally rely on ASCII default encoding errors for characters beyond chr #127 (which is rare today).

In addition to the SmartStdout scheme above, do the following at application start (and/or via sitecustomize.py), without using reload(sys):

...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    # site.py removes sys.setdefaultencoding(); get a module object that still exposes it
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8')
    s = 'aouäöüфżß2'
    print s

This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were only Python 3. File I/O of course always needs special care regarding encodings, as it does in Python 3.
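
The "except character iteration" caveat can be illustrated with a small sketch. It assumes the plain string holds UTF-8 bytes, as a Python 2 str literal does under "# encoding: utf-8"; the variable names are only for illustration, and the snippet runs under Python 2 and Python 3.

```python
from __future__ import print_function

text = u'\xe4\xf6\xfc'        # three characters (a, o, u with umlauts)
raw = text.encode('utf-8')    # the bytes a plain UTF-8 str literal would hold

# Length, indexing and iteration on the byte string count bytes, not characters.
print(len(text), len(raw))

# Slicing after one byte cuts a two-byte sequence in half.
try:
    raw[:1].decode('utf-8')
    half_char_ok = True
except UnicodeDecodeError:
    half_char_ok = False

# Decoding first gives character-wise iteration again.
chars = list(raw.decode('utf-8'))
print(half_char_ok, len(chars))
```

So whenever code needs to walk through a string character by character, decode it to unicode first, regardless of the default encoding.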

Note: plain strings are then implicitly converted from UTF-8 to unicode in SmartStdout before being converted to the output stream encoding.
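
Spelled out explicitly, that implicit step looks roughly like the sketch below. The function name to_output_bytes and its parameters are hypothetical; it only mimics what Python 2 does when .encode() is called on a plain str with the default encoding set to UTF-8, and it runs under Python 2 and Python 3.

```python
def to_output_bytes(s, default_encoding='utf-8', out_encoding='ascii'):
    """Mimic the two-step conversion: plain string -> unicode -> output bytes."""
    if isinstance(s, bytes):
        # Step 1 (implicit in Python 2): decode the plain string
        # using the default encoding.
        s = s.decode(default_encoding)
    # Step 2: encode for the output stream, escaping what does not fit.
    return s.encode(out_encoding, 'backslashreplace')

plain = u'\u0444'.encode('utf-8')    # a plain UTF-8 string holding one Cyrillic letter
as_ascii = to_output_bytes(plain)                           # escaped for an ASCII stream
as_utf8 = to_output_bytes(plain, out_encoding='utf-8')      # passes through unchanged
from_unicode = to_output_bytes(u'\u0444')                   # unicode input skips step 1
```

With the default encoding left at ascii, step 1 is where the familiar UnicodeDecodeError would be raised for such a string.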

answered Feb 9, 2017 at 20:18

kxr
