
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author: ncoghlan
Recipients: ncoghlan
Date: 2014-08-21 12:35:20
Message-id: <1408624521.11.0.680226004171.issue22016@psf.upfronthosting.co.za>
Content:
Stephen Turnbull suggested on python-dev that this was a bad idea, and after reconsidering the current behaviour in Python 2, I realised that setting the surrogateescape error handler on sys.stdout and letting the terminal deal with the consequences is exactly what we want.
What confused me is that, in the C locale, ls replaces each byte it can't display with a question mark:
$ ls
ニコラス.txt
$ LANG=C ls
????????????.txt
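ls applies its own substitution rules; as a rough sketch only (not how ls is implemented), the C-locale output above can be reproduced by mapping every byte outside printable ASCII to "?". Each of the four katakana characters is three bytes in UTF-8, hence the twelve question marks. `ls_style` is a hypothetical helper, not part of any library.

```python
def ls_style(raw: bytes) -> str:
    """Hypothetical helper: approximate ls in the C locale by showing
    every byte outside printable ASCII (0x20-0x7E) as '?'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


# 4 katakana characters x 3 UTF-8 bytes each, followed by ".txt"
raw_name = "ニコラス.txt".encode("utf-8")
print(ls_style(raw_name))
```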
Python 2 passes the bytes through, regardless of locale:
$ python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
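A minimal Python 3 sketch of the same byte pass-through, assuming a POSIX-like filesystem that accepts UTF-8 byte names: passing a bytes path to os.listdir returns undecoded bytes, and writing them to a binary stream (sys.stdout.buffer by default) skips the text-encoding step, so the locale never gets involved. `list_raw` and `print_raw` are hypothetical helper names; the demo writes to an in-memory io.BytesIO so the result can be checked without a terminal.

```python
import io
import os
import sys
import tempfile


def list_raw(directory: bytes) -> list:
    # A bytes argument makes os.listdir return bytes names, with no
    # decoding step, so the locale's encoding is never consulted.
    return os.listdir(directory)


def print_raw(name: bytes, stream=None) -> None:
    # Write the bytes unchanged to a binary stream (sys.stdout.buffer by
    # default), which is effectively what Python 2's print did with a str.
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(name + b"\n")
    stream.flush()


raw_name = "ニコラス.txt".encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    tmp_b = os.fsencode(tmp)
    # Create an empty file whose name is the raw UTF-8 bytes
    open(os.path.join(tmp_b, raw_name), "wb").close()
    captured = io.BytesIO()
    print_raw(list_raw(tmp_b)[0], captured)

print(captured.getvalue())
```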
Current Python 3 fails when the C locale is set: the encoding on sys.stdout is set to "ascii" with the default "strict" error handler, so the file name can't be round-tripped back to its original bytes:
$ python3 -c "import os; print(os.listdir('.')[0])"
ニコラス.txt 
$ LANG=C python3 -c "import os; print(os.listdir('.')[0])"
Traceback (most recent call last):
 File "<string>", line 1, in <module> 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)
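The failure can be reproduced without changing the locale, assuming the pre-3.5 behaviour on Linux where the C locale makes both the filesystem encoding and the sys.stdout encoding "ascii". os.listdir decodes names with the surrogateescape handler (PEP 383), turning each undecodable byte 0xNN into the lone surrogate U+DCNN; the strict "ascii" encoder then rejects all twelve surrogates, which is where "position 0-11" comes from.

```python
raw_name = "ニコラス.txt".encode("utf-8")

# Assumed C-locale decoding of the file name: "ascii" plus
# "surrogateescape", so each byte >= 0x80 becomes U+DC80..U+DCFF.
decoded = raw_name.decode("ascii", "surrogateescape")

# Assumed pre-3.5 sys.stdout behaviour: "ascii" with the "strict"
# handler, which cannot encode lone surrogates.
error = None
try:
    decoded.encode("ascii", "strict")
except UnicodeEncodeError as exc:
    error = exc
    print(exc)
```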
However, Python 3.5 will already set the "surrogateescape" error handler on sys.stdout by default in the C locale, reproducing the behaviour of *Python 2*, rather than the behaviour of ls:
$ LANG=C ~/devel/py3k/python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
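A sketch of why the 3.5 default round-trips: on the encoding side, surrogateescape maps each U+DCNN surrogate back to the byte 0xNN, so the terminal receives exactly the bytes Python 2 would have written. The io.TextIOWrapper over an io.BytesIO is only a stand-in for sys.stdout under the C locale (assumed "ascii" encoding), not how CPython actually constructs its standard streams.

```python
import io

raw_name = "ニコラス.txt".encode("utf-8")
# File name as os.listdir would return it under the assumed C locale
decoded = raw_name.decode("ascii", "surrogateescape")

# Stand-in for sys.stdout: ascii encoding, but "surrogateescape"
# instead of "strict"; newline="\n" disables newline translation.
buffer = io.BytesIO()
out = io.TextIOWrapper(buffer, encoding="ascii",
                       errors="surrogateescape", newline="\n")
print(decoded, file=out)
out.flush()

# The escaped surrogates are turned back into the original bytes.
print(buffer.getvalue() == raw_name + b"\n")
```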
History
Date                 User      Action  Args
2014-08-21 12:35:21  ncoghlan  set     recipients: + ncoghlan
2014-08-21 12:35:21  ncoghlan  set     messageid: <1408624521.11.0.680226004171.issue22016@psf.upfronthosting.co.za>
2014-08-21 12:35:21  ncoghlan  link    issue22016 messages
2014-08-21 12:35:20  ncoghlan  create
