Message109702
| Author |
ezio.melotti |
| Recipients |
Rhamphoryncus, amaury.forgeotdarc, ezio.melotti, lemburg, loewis, vstinner |
| Date |
2010-07-09.09:49:04 |
| Message-id |
<1278668949.82.0.0997720973536.issue9198@psf.upfronthosting.co.za> |
| In-reply-to |
| Content |
Here is a patch to "fix" sys_displayhook. Note that the patch is just a proof of concept: it seems to work fine, but I still have to clean it up, add comments, rename and reorganize some variables, and add tests.
This is an example of the output when using iso-8859-1 as the IO encoding:
wolf@linuxvm:~/dev/py3k$ PYTHONIOENCODING=iso-8859-1 ./python
Python 3.2a0 (py3k:82643:82644M, Jul 9 2010, 11:39:25)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding, sys.stdin.encoding
('iso-8859-1', 'iso-8859-1')
>>> 'ascii string'
'ascii string' # works fine
>>> 'some accented chars: öäå'
'some accented chars: öäå' # works fine - these chars are encodable
>>> 'a snowman: \u2603'
'a snowman: \u2603' # non-encodable - the char is escaped instead of raising an error
>>> 'snowman: \u2603, and accented öäå'
'snowman: \u2603, and accented öäå' # only non-encodable chars are escaped
>>> # the behavior of print is still the same:
>>> print('some accented chars: öäå')
some accented chars: öäå
>>> print('a snowman: \u2603')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2603' in position 11: ordinal not in range(256)
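For illustration, the behavior shown in the session above can be approximated in pure Python. This is only a rough sketch and not the attached patch, which changes the C function; the names `escape_unencodable` and `escaping_displayhook` are made up for this example. It relies on the standard `backslashreplace` error handler to escape only the characters that the output encoding cannot represent.

```python
import builtins
import sys


def escape_unencodable(text, encoding):
    """Return text with only the non-encodable characters backslash-escaped."""
    # Encodable characters survive the round trip unchanged; the others
    # are replaced by \xNN, \uNNNN or \UNNNNNNNN escape sequences.
    return text.encode(encoding, "backslashreplace").decode(encoding)


def escaping_displayhook(value):
    """Hypothetical displayhook that escapes instead of raising an error."""
    if value is None:
        return
    # Clear _ first so that repr(value) cannot observe a stale result.
    builtins._ = None
    text = repr(value)
    try:
        sys.stdout.write(text)
    except UnicodeEncodeError:
        # Assumption: fall back to ASCII if stdout reports no encoding.
        encoding = sys.stdout.encoding or "ascii"
        sys.stdout.write(escape_unencodable(text, encoding))
    sys.stdout.write("\n")
    builtins._ = value
```

To try it interactively, one could assign `sys.displayhook = escaping_displayhook`. As in the session above, print() is unaffected and still raises UnicodeEncodeError for non-encodable characters.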
-------------------------------------
While testing the patch with PYTHONIOENCODING=iso-8859-1 I also found a weird issue. It is *not* related to the patch, since I managed to reproduce it on a clean py3k build using PYTHONIOENCODING=iso-8859-1:
>>> 'òàùèì óáúéí öäüëï'
'ò�\xa0ùèì óáúé�\xad öäüëï'
>>> 'òàùèì óáúéí öäüëï'.encode('iso-8859-1')
b'\xc3\xb2\xc3\xa0\xc3\xb9\xc3\xa8\xc3\xac \xc3\xb3\xc3\xa1\xc3\xba\xc3\xa9\xc3\xad \xc3\xb6\xc3\xa4\xc3\xbc\xc3\xab\xc3\xaf'
>>> 'òàùèì'.encode('utf-8')
b'\xc3\x83\xc2\xb2\xc3\x83\xc2\xa0\xc3\x83\xc2\xb9\xc3\x83\xc2\xa8\xc3\x83\xc2\xac'
I think there might be some conflict between the IO encoding that I specified and the one that my terminal actually uses, but I couldn't figure out exactly what's going on (it's also weird that only 'à' and 'í' are not displayed correctly). Unless this behavior is expected, I'll open another issue about it. |
|
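One possible explanation for the garbled echo, assuming the terminal really sends and displays UTF-8 while Python is told to use iso-8859-1, can be simulated as below. The function name `simulate_repl_echo` and its parameters are hypothetical, and the terminal encoding is an assumption. Under that assumption each typed character arrives as two bytes that Python decodes as two Latin-1 characters. For 'à' and 'í' the second byte is 0xA0 (non-breaking space) and 0xAD (soft hyphen) respectively. repr() escapes those because they are not printable, which leaves a lone 0xC3 byte that a UTF-8 terminal cannot decode.

```python
def simulate_repl_echo(typed, io_encoding="iso-8859-1", terminal_encoding="utf-8"):
    """Simulate echoing a string literal when terminal and IO encodings differ."""
    # Bytes the terminal sends for the typed characters.
    raw = typed.encode(terminal_encoding)
    # What Python believes was typed, decoding with the configured IO encoding.
    decoded = raw.decode(io_encoding)
    # The interactive prompt shows repr(), which escapes non-printable characters.
    shown = repr(decoded)
    # Bytes written to stdout, again using the configured IO encoding.
    out_bytes = shown.encode(io_encoding)
    # What the terminal renders; undecodable bytes become U+FFFD.
    return out_bytes.decode(terminal_encoding, errors="replace")


print(simulate_repl_echo('òàùèì óáúéí öäüëï'))
```

Under these assumptions the simulation reproduces the output reported above, including the two replacement characters followed by the literal `\xa0` and `\xad` escapes.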