This issue tracker has been migrated to GitHub, and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2014-07-20 11:19 by ncoghlan, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Messages (3) | |||
|---|---|---|---|
| msg223508 | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2014-07-20 11:19 | |
This would be along the same lines as xmlcharrefreplace and backslashreplace, but would only affect surrogate-escaped characters. Unlike surrogateescape, which reproduces the escaped bytes directly in the data stream, this would follow the 'replace' error handler and insert an appropriately encoded '?' character in the output stream. The use case would be any context where losing the escaped characters is preferred to potentially injecting arbitrary binary data into the output (surrogateescape), failing with an exception (strict), or using any of the other existing error handlers. It would differ from 'replace' in that normal code points that can't be encoded at all would still trigger an error.
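A minimal sketch of the proposed behaviour as a custom error handler registered with `codecs.register_error()`. The name `surrogatereplace` is hypothetical: no such handler was ever added to Python. The sketch assumes that "surrogate-escaped" means the lone surrogates U+DC80 to U+DCFF that the surrogateescape handler (PEP 383) produces for undecodable bytes 0x80 to 0xFF.

```python
import codecs

# Assumption: only the PEP 383 escape range counts as "surrogate escaped".
ESCAPE_FIRST, ESCAPE_LAST = 0xDC80, 0xDCFF


def surrogatereplace(exc):
    """Hypothetical handler: '?' for escaped bytes, an error for anything else."""
    if not isinstance(exc, UnicodeEncodeError):
        # Output-only handler: decoding errors are not handled here.
        raise exc
    pos = exc.start
    # Consume the leading run of surrogate-escaped code points.
    while pos < exc.end and ESCAPE_FIRST <= ord(exc.object[pos]) <= ESCAPE_LAST:
        pos += 1
    if pos == exc.start:
        # A genuine code point the codec cannot encode: fail as 'strict' would.
        raise exc
    # One '?' per escaped byte, as 'replace' does; encoding resumes at pos.
    return "?" * (pos - exc.start), pos


# Hypothetical name; Python ships no handler called this.
codecs.register_error("surrogatereplace", surrogatereplace)

raw = "ニコラス.txt".encode("utf-8")            # bytes stored on disk
name = raw.decode("ascii", "surrogateescape")   # lone surrogates for non-ASCII bytes
print(name.encode("ascii", "surrogateescape" if False else "surrogatereplace"))  # b'????????????.txt'
```

Unlike the built-in 'replace', this handler still raises `UnicodeEncodeError` for an ordinary character such as `"ニ".encode("ascii", "surrogatereplace")`, which is the distinction the proposal draws.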
| msg225607 | Author: Alyssa Coghlan (ncoghlan) * (Python committer) | Date: 2014-08-21 12:35 | |
Stephen Turnbull suggested on python-dev that this was a bad idea, and after reconsidering the current behaviour in Python 2, I realised that setting surrogateescape and letting the terminal deal with the consequences is exactly what we want.
What confused me is that ls replaces the undecodable bytes with question marks (one per byte) in the C locale:
```
$ ls
ニコラス.txt
$ LANG=C ls
????????????.txt
```
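For this particular name, Python's built-in 'replace' error handler gives the same result as ls once the bytes have been decoded with 'surrogateescape'. This is a sketch, not ls's own logic: it assumes the name is stored as UTF-8 and that the filesystem encoding is ASCII, as it is under the C locale.

```python
# Assumption: the file name is stored on disk as UTF-8 bytes.
raw = "ニコラス.txt".encode("utf-8")

# Under an ASCII filesystem encoding, os.listdir() decodes each non-ASCII
# byte to a lone surrogate via 'surrogateescape' (PEP 383).
name = raw.decode("ascii", "surrogateescape")

# 'replace' swaps every unencodable code point for '?', one per escaped byte.
shown = name.encode("ascii", "replace")
print(shown.decode("ascii"))  # ????????????.txt
```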
Python 2 passes the bytes through, regardless of locale:
```
$ python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
```
Current Python 3 fails if the C locale is set: the encoding on sys.stdout is set to "ascii" with the "strict" error handler, so the surrogate-escaped file name can't be written back out, which breaks round-tripping:
```
$ python3 -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
$ LANG=C python3 -c "import os; print(os.listdir('.')[0])"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)
```
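The failure can be reproduced without changing the locale. This sketch assumes a UTF-8 file name read under an ASCII filesystem encoding, and encodes it the way an ASCII sys.stdout with the 'strict' handler would.

```python
# Assumption: a UTF-8 file name read under an ASCII filesystem encoding.
raw = "ニコラス.txt".encode("utf-8")
name = raw.decode("ascii", "surrogateescape")  # what os.listdir('.') returns

# sys.stdout under the C locale: ASCII encoding, 'strict' error handler.
try:
    name.encode("ascii", "strict")
except UnicodeEncodeError as exc:
    error = exc  # keep a reference; 'exc' is unbound after the except block
else:
    error = None

# The 12 escaped bytes form one unencodable run, matching the traceback.
print(error)  # 'ascii' codec can't encode characters in position 0-11: ordinal not in range(128)
```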
However, Python 3.5 will already use the "surrogateescape" error handler on sys.stdout when the C locale is set, reproducing the behaviour of *Python 2* rather than the behaviour of ls:
```
$ LANG=C ~/devel/py3k/python -c "import os; print(os.listdir('.')[0])"
ニコラス.txt
```
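A stream configured that way can be mimicked with `io.TextIOWrapper` over an in-memory buffer. This sketch assumes an ASCII stream encoding and a UTF-8 file name; it shows that 'surrogateescape' writes the original bytes back out unchanged, leaving the terminal to interpret them.

```python
import io

raw = "ニコラス.txt".encode("utf-8")            # bytes stored on disk
name = raw.decode("ascii", "surrogateescape")   # lone surrogates for non-ASCII bytes

# Stand-in for sys.stdout under the C locale: ASCII encoding, but
# 'surrogateescape' instead of 'strict'. newline="\n" disables platform
# newline translation so the written bytes are predictable.
buffer = io.BytesIO()
fake_stdout = io.TextIOWrapper(buffer, encoding="ascii",
                               errors="surrogateescape", newline="\n")
print(name, file=fake_stdout)
fake_stdout.flush()

# The escaped bytes come back verbatim, so a UTF-8 terminal shows the name.
print(buffer.getvalue() == raw + b"\n")  # True
```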
| msg308565 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2017-12-18 14:36 | |
Follow-up: PEP 538 (coercing the legacy C locale to a UTF-8 based locale, bpo-28180) and PEP 540 (UTF-8 Mode, bpo-29240) have been accepted and implemented in Python 3.7!
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:06 | admin | set | github: 66215 |
| 2017-12-18 14:36:37 | vstinner | set | nosy: + vstinner, messages: + msg308565 |
| 2014-08-21 12:35:21 | ncoghlan | set | status: open -> closed, type: enhancement, messages: + msg225607, resolution: rejected, stage: resolved |
| 2014-07-20 11:19:14 | ncoghlan | create | |