

Author ezio.melotti
Recipients Rhamphoryncus, amaury.forgeotdarc, ezio.melotti, lemburg, loewis, vstinner
Date 2010-07-09 09:49:04
Message-id <1278668949.82.0.0997720973536.issue9198@psf.upfronthosting.co.za>
In-reply-to
Content
Here is a patch to "fix" sys_displayhook (note: the patch is just a proof of concept -- it seems to work fine but I still have to clean it up, add comments, rename and reorganize some vars and add tests).
This is an example output while using iso-8859-1 as IO encoding:
wolf@linuxvm:~/dev/py3k$ PYTHONIOENCODING=iso-8859-1 ./python
Python 3.2a0 (py3k:82643:82644M, Jul 9 2010, 11:39:25)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding, sys.stdin.encoding
('iso-8859-1', 'iso-8859-1')
>>> 'ascii string'
'ascii string' # works fine
>>> 'some accented chars: öäå'
'some accented chars: öäå' # works fine - these chars are encodable
>>> 'a snowman: \u2603'
'a snowman: \u2603' # non-encodable - the char is escaped instead of raising an error
>>> 'snowman: \u2603, and accented öäå'
'snowman: \u2603, and accented öäå' # only non-encodable chars are escaped
>>> # the behavior of print is still the same:
>>> print('some accented chars: öäå') 
some accented chars: öäå
>>> print('a snowman: \u2603')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2603' in position 11: ordinal not in range(256)
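The escaping behavior shown in the session above can be approximated in pure Python. This is only a sketch of the idea, not the attached patch: the helper names `escape_unencodable` and `lenient_displayhook` are hypothetical, and it assumes that the `backslashreplace` error handler is an acceptable way to produce the escapes.

```python
import builtins
import sys


def escape_unencodable(text, encoding):
    """Return text with characters that `encoding` cannot represent
    replaced by backslash escapes; encodable characters are unchanged."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


def lenient_displayhook(value):
    """Hypothetical displayhook: escapes non-encodable characters in the
    repr instead of letting the write raise UnicodeEncodeError."""
    if value is None:
        return
    builtins._ = None
    # Assumption: fall back to ASCII when stdout reports no encoding.
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    sys.stdout.write(escape_unencodable(repr(value), encoding) + "\n")
    builtins._ = value


# To try it in an interactive session:
#     sys.displayhook = lenient_displayhook
```

Because only the repr passes through the helper, `print()` is unaffected and would still raise for non-encodable characters, as in the session above.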
-------------------------------------
While testing the patch with PYTHONIOENCODING=iso-8859-1 I also found a weird issue. It is *not* related to the patch, since I managed to reproduce it on a clean py3k build using PYTHONIOENCODING=iso-8859-1:
>>> 'òàùèì óáúéí öäüëï'
'ò�\xa0ùèì óáúé�\xad öäüëï'
>>> 'òàùèì óáúéí öäüëï'.encode('iso-8859-1')
b'\xc3\xb2\xc3\xa0\xc3\xb9\xc3\xa8\xc3\xac \xc3\xb3\xc3\xa1\xc3\xba\xc3\xa9\xc3\xad \xc3\xb6\xc3\xa4\xc3\xbc\xc3\xab\xc3\xaf'
>>> 'òàùèì'.encode('utf-8')
b'\xc3\x83\xc2\xb2\xc3\x83\xc2\xa0\xc3\x83\xc2\xb9\xc3\x83\xc2\xa8\xc3\x83\xc2\xac'
I think there might be a conflict between the IO encoding that I specified and the one that my terminal actually uses, but I couldn't figure out exactly what's going on (it's also weird that only 'à' and 'í' are not displayed correctly). Unless this behavior is expected, I'll open another issue about it.
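One possible explanation can be checked with a small simulation. It assumes, without confirmation from the report, that the terminal sends and renders UTF-8 while Python decodes and encodes with iso-8859-1; the function name `simulate_session` is hypothetical. Under that assumption every typed character becomes two Latin-1 characters, and `repr()` escapes the two that are non-printable (U+00A0 and U+00AD), which leaves a lone `\xc3` byte that a UTF-8 terminal cannot render.

```python
def simulate_session(typed, terminal_encoding="utf-8", io_encoding="iso-8859-1"):
    """Simulate typing `typed` at the prompt when the terminal and
    Python disagree about the encoding (assumed scenario)."""
    # Bytes the terminal sends for the typed text.
    raw = typed.encode(terminal_encoding)
    # Python decodes stdin with the configured IO encoding.
    value = raw.decode(io_encoding)
    # repr() escapes non-printable characters; stdout then encodes the result.
    written = repr(value).encode(io_encoding)
    # The terminal interprets the written bytes with its own encoding.
    shown = written.decode(terminal_encoding, errors="replace")
    return value, shown


value, shown = simulate_session("òàùèì óáúéí öäüëï")
print(shown)
print(value.encode("iso-8859-1"))
```

If the assumption holds, the string Python actually received is the mis-decoded one, which would also account for the doubled bytes in the `.encode('utf-8')` output.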
History
Date User Action Args
2010-07-09 09:49:10 ezio.melotti set recipients: + ezio.melotti, lemburg, loewis, amaury.forgeotdarc, Rhamphoryncus, vstinner
2010-07-09 09:49:09 ezio.melotti set messageid: <1278668949.82.0.0997720973536.issue9198@psf.upfronthosting.co.za>
2010-07-09 09:49:08 ezio.melotti link issue9198 messages
2010-07-09 09:49:06 ezio.melotti create
