First: using `reload(sys)` and setting some arbitrary default encoding merely to satisfy the needs of an output terminal stream is bad practice. `reload` often reverts things in `sys` that were set up depending on the environment - e.g. the `sys.stdin`/`sys.stdout` streams, `sys.excepthook`, etc.
Solving the encode problem on stdout
---
The best solution I know for the encode problem of `print`'ing unicode strings and beyond-ascii `str`'s (e.g. from literals) on `sys.stdout` is to ensure a `sys.stdout` (file-like object) which is capable and, optionally, tolerant regarding the needs:
- When `sys.stdout.encoding` is `None` for some reason, or non-existent, or erroneously false, or "less" than what the stdout terminal or stream really is capable of, then try to provide a correct `.encoding` attribute - if necessary by replacing `sys.stdout` and `sys.stderr` with a translating file-like object.
- When the terminal or stream still cannot encode all occurring unicode chars, and you don't want `print`'s to break just because of that, you can introduce encode-with-replace behavior in the translating file-like object.
Here is an example:
```python
#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:

    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
                        getattr(org_stdout, 'encoding', None) or 'utf-8'

    def write(self, s):
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))

    def __getattr__(self, name):
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()
    us = u'aouäöüфżß²'
    print us
    sys.stdout.flush()
```
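As a sanity check of the `'backslashreplace'` error handler that `SmartStdout.write` relies on, this small sketch (runnable unchanged on Python 3) shows how unencodable characters degrade to escape sequences instead of raising `UnicodeEncodeError`:

```python
# -*- coding: utf-8 -*-
# 'backslashreplace' substitutes escape sequences for characters the
# target encoding cannot represent, instead of raising UnicodeEncodeError.
text = u'aou\xe4\xf6\xfc\u0444'      # umlauts plus Cyrillic U+0444
encoded = text.encode('latin-1', 'backslashreplace')
# latin-1 can encode the umlauts; U+0444 is replaced by the text '\u0444'
print(repr(encoded))
```

So a `print` through `SmartStdout` never dies on an exotic character - at worst it shows the escape sequence.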
Using beyond-ascii plain string literals in Python 2 / 2 + 3 code
---
The only good reason I see to change the global default encoding (and then to UTF-8 only) is an application **source code** decision - not I/O stream encoding issues: namely, writing beyond-ascii string literals into code without being forced to always use `u'string'` style unicode escaping. This can be done rather consistently (despite what *anonbadger*'s article says) by taking care of a Python 2 or Python 2 + 3 code basis which uses ascii or UTF-8 plain string literals consistently - as far as those strings potentially undergo silent unicode conversion, move between modules, or go to stdout. For that, prefer "`# encoding: utf-8`" or ascii (no declaration), and change or drop libraries which still fatally rely, in a very dumb way, on ascii default encoding errors beyond chr #127 (which is rare today).
Then do it like this at application start (and/or via `sitecustomize.py`), in addition to the `SmartStdout` scheme above, without using `reload(sys)`:
```python
...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8')
    s = 'aouäöüфżß²'
    print s
```
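A quick way to verify the effect: on an unmodified Python 2 interpreter this prints `ascii`; after the call above - and on any Python 3, whose behavior the scheme emulates - it prints `utf-8`:

```python
import sys

# Reports the codec used for implicit str <-> unicode conversion
# (Python 2) resp. the documented default encoding (Python 3).
print(sys.getdefaultencoding())
```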
This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if the code were running under Python 3 only.
File I/O, of course, always needs special care regarding encodings - as it does in Python 3.
Note: plain strings are then implicitly converted from utf-8 to unicode in `SmartStdout` before being converted to the output stream encoding.
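Regarding file I/O: Python 3's explicit-encoding `open` exists on Python 2 as `io.open`, so such code can be written identically for both versions. A minimal round-trip sketch (file name and temp location are arbitrary):

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# io.open on Python 2 is the same function as the built-in open on
# Python 3: it encodes/decodes transparently with the codec you name.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'aou\xe4\xf6\xfc\u0444')
with io.open(path, 'r', encoding='utf-8') as f:
    data = f.read()
```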