kxr


First: reload(sys) and setting some random default encoding merely to satisfy the needs of an output terminal stream is bad practice. reload often changes things in sys which have been put in place depending on the environment - e.g. the sys.stdin/stdout streams, sys.excepthook, etc.

Solving the encode/decode problem on stdout

The best solution I know of for solving the encode/decode problem of print'ing unicode strings and beyond-ASCII str's (e.g. from literals) on sys.stdout is to take care of a sys.stdout file-like object which is capable, and optionally tolerant, regarding the needs:

  • When sys.stdout.encoding is None for some reason, non-existent, or erroneously reports less than what the stdout terminal or stream is really capable of, try to provide a correct .encoding attribute; as a last resort, replace sys.stdout and sys.stderr with a translating file-like object.

  • When the terminal or stream still cannot encode all occurring unicode characters, and you don't want print's to break just because of that, you can introduce encode-with-replace behavior in the translating file-like object.

Here is an example which does both:

#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            # unwrap an already-wrapped stdout instead of stacking wrappers
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
            getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        # escape characters which the target encoding cannot represent
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()
    us = u'aouäöüфżß2'
    print us
    sys.stdout.flush()
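For comparison only (not part of the original answer): in Python 3 the same tolerant-output idea is built into the io layer. The sketch below uses an in-memory buffer to simulate an ASCII-only terminal; make_tolerant_writer is a made-up helper name for illustration.

```python
import io

def make_tolerant_writer(raw, encoding='ascii'):
    # Text layer that backslash-escapes characters the target encoding
    # cannot represent, instead of raising UnicodeEncodeError.
    return io.TextIOWrapper(raw, encoding=encoding,
                            errors='backslashreplace')

# Simulate a limited (ASCII-only) terminal with an in-memory buffer.
raw = io.BytesIO()
out = make_tolerant_writer(raw)
out.write(u'aouäöü2\n')
out.flush()
data = raw.getvalue()  # the umlauts come out as \xe4 \xf6 \xfc escapes
```

On Python 3.7+ a real terminal stream can be made tolerant in place with sys.stdout.reconfigure(errors='backslashreplace'), with no wrapper class needed.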

Using beyond-ASCII plain string literals in Python 2 / 2+3 code

The only good reason to change the global default encoding (to UTF-8 only), I think, is regarding an application source code decision - and not because of I/O stream encoding issues: namely, for writing beyond-ASCII string literals into code without being forced to always use u'string' style unicode escaping. This can be done rather consistently (despite what anonbadger's article says) by taking care of a Python 2 or Python 2+3 source code basis which uses ASCII or UTF-8 plain string literals consistently - as far as those strings potentially undergo silent unicode conversion and move between modules or potentially go to stdout. For that, prefer "# encoding: utf-8" or ASCII (no declaration). Change or drop libraries which still fatally rely on ASCII default encoding errors beyond chr #127 (which is rare today).

And do it like this at application start (and/or via sitecustomize.py), in addition to the SmartStdout scheme above, without using reload(sys):

...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    import imp
    # load a fresh, private copy of the sys module whose
    # setdefaultencoding() has not been deleted by site.py
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally()
    s = 'aouäöüфżß2'
    print s

Note: plain strings are then implicitly converted from utf-8 to unicode in SmartStdout before being converted to the output stream encoding.

This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were only Python 3. File I/O of course always needs special care regarding encodings - as it does in Python 3.
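A minimal sketch of that explicit care for file I/O: always name the encoding instead of relying on the interpreter default. io.open behaves the same in Python 2 and 3; the file name below is made up for the demo.

```python
import io
import os
import tempfile

# Write and read a text file with an explicitly named encoding, so the
# result does not depend on the platform or interpreter default.
path = os.path.join(tempfile.gettempdir(), 'smartstdout_demo.txt')  # made-up name
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'aouäöüфżß2')
with io.open(path, 'r', encoding='utf-8') as f:
    text = f.read()
os.remove(path)
```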
