
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: IDLE shows traceback when printing non-BMP character
Type: behavior
Stage: resolved
Components: IDLE
Versions: Python 3.8, Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: THRlWiTi, belopolsky, martin.panter, serhiy.storchaka, terry.reedy
Priority: normal
Keywords:

Created on 2014-10-27 16:18 by belopolsky, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (8)
msg230078 - Author: Alexander Belopolsky (belopolsky) (Python committer) Date: 2014-10-27 16:18
>>> print("\N{ROCKET}")
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    print("\N{ROCKET}")
  File "idlelib/PyShell.py", line 1352, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f680' in position 0: Non-BMP character not supported in Tk
Shouldn't IDLE replace non-encodable characters with "\uFFFD"?
I think
>>> "\N{ROCKET}"
�
is user-friendlier than the traceback.
See also #14304.
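The replacement proposed above can be illustrated with a minimal sketch. The helper name `replace_non_bmp` is hypothetical and is not part of idlelib; it only shows the idea of substituting U+FFFD for every code point that a UCS-2-only Tk build cannot hold.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every code point above U+FFFF replaced.

    Hypothetical helper for illustration only; not IDLE's actual code.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


sample = "launch \N{ROCKET} now"
# ascii() keeps the demonstration printable on any console encoding.
print(ascii(replace_non_bmp(sample)))
```

The trade-off of this approach is that the original character is lost: once U+FFFD is written to the shell, the text cannot be copied back out intact.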
msg230416 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2014-11-01 00:36
I think Idle should consistently display astral chars with their \U escape. It sometimes does, just not always.
>>> s='\U0001f680'
>>> s
'\U0001f680'
>>> str(s)
'\U0001f680'
>>> repr(s)
"'\U0001f680'"
>>> print(s) # gives error above.
>>> print(str(s)) #ditto
I thought that implicit print of expression and overt print of the same expression were supposed to be the same.
#21084 is also about this general issue.
msg340675 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2019-04-22 19:05
On my puzzlement above: repr(s) is a string of 3 characters -- s bracketed by quote characters. print(repr(s)) fails. I am not sure how s gets expanded to the full escape in IDLE. ascii(s) expands all non-ascii and adds extra quotes. Need to check Shell code.
In the python REPL, astral chars are not expanded to escape sequences.
>>> s='\U0001f603'
>>> s
'😃' # Windows REPL shows two replacement boxes instead of 😃
#36698 is about astral chars in exceptions messages.
>>> raise Exception(s)
results in the Exception traceback, 3 Unicodedecode tracebacks, and a restart.
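The difference between repr() and ascii() described in this message can be checked directly. This is a small sketch using only built-ins; the variable names are arbitrary.

```python
s = "\U0001f603"

r = repr(s)   # the character itself between two quote characters
a = ascii(s)  # every non-ASCII character expanded to an escape

print(len(r))  # 3
print(a)       # '\U0001f603'
print(len(a))  # 12: two quotes, backslash, U, and eight hex digits
```

Because repr(s) still contains the astral character, printing it hits the same encoding limit as printing s itself, while the output of ascii(s) is pure ASCII.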
msg340820 - Author: Martin Panter (martin.panter) (Python committer) Date: 2019-04-25 00:55
I haven’t looked at the code, but I suspect Idle implements a custom "sys.displayhook":
>>> help(sys.displayhook)
Help on function displayhook in module idlelib.rpc:
displayhook(value)
 Override standard display hook to use non-locale encoding
>>> sys.displayhook('\N{ROCKET}')
'\U0001f680'
>>> sys.__displayhook__('\N{ROCKET}')
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    sys.__displayhook__('\N{ROCKET}')
  File "/usr/lib/python3.5/idlelib/PyShell.py", line 1344, in write
    return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
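To illustrate how a custom display hook could produce the escaped echo seen above, here is a rough sketch. It is not the code in idlelib.rpc; the names `escape_astral` and `bmp_safe_displayhook` are hypothetical, and the real hook may work differently.

```python
import builtins
import sys


def escape_astral(text):
    """Replace each code point above U+FFFF with an 8-digit backslash-U escape."""
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U%08x" % ord(ch) for ch in text
    )


def bmp_safe_displayhook(value):
    """Echo repr(value) with astral characters escaped (illustrative only)."""
    if value is None:
        return
    builtins._ = None  # same precaution the default hook takes
    sys.stdout.write(escape_astral(repr(value)) + "\n")
    builtins._ = value


# Call the hook directly rather than installing it globally.
bmp_safe_displayhook("\N{ROCKET}")
```

Assigning the function to `sys.displayhook` would apply it to expression echoes only. An explicit print() call bypasses the hook, which is consistent with the different behavior of `s` and `print(s)` reported earlier in this issue.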
msg353926 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-10-04 11:46
Fixed by PR 16545 (see issue13153).
msg353931 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-10-04 11:59
It was fixed for all valid Unicode characters, but you can still get an error when printing a surrogate character to stderr on Linux:
>>> import sys
>>> print('\ud800', file=sys.stderr)
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    print('\ud800', file=sys.stderr)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
In the Python REPL you get an escape sequence.
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
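The escaped output in the regular interpreter matches the documented default for sys.stderr, which uses the 'backslashreplace' error handler. The sketch below shows the effect of the error handler at the codec level; it encodes to bytes directly rather than writing to a real stream, so it only approximates what the streams do.

```python
surrogate = "\ud800"

# Strict encoding (the default) rejects lone surrogates.
try:
    surrogate.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# 'backslashreplace' substitutes an escape sequence instead of raising.
escaped = surrogate.encode("utf-8", errors="backslashreplace")
print(escaped)

# 'replace' substitutes a question mark when encoding.
replaced = surrogate.encode("utf-8", errors="replace")
print(replaced)
```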
msg353963 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2019-10-04 17:55
Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I expect from the python docs. I could see it as a bug. In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout.
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
whereas in IDLE on Windows the surrogate is displayed as a box with diagonal lines ([X] compressed into one char) in both cases. When copied and pasted into Firefox, the pasted surrogate shows as a square box containing mini D 8 0 0 chars.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�
I consider it the proper thing for tk to put the undisplayable codepoint, rather than a replacement character, into the editor buffer (however tcl encodes it), so that IDLE can retrieve it without loss of information. IDLE can then potentially identify the character to the user.
===
An oddity though. With
>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')
the box is an empty square, not crossed. But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is what is being used in text t.
msg354193 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-10-08 12:05
And with PR 16583 it is now completely fixed. I.e. it can only fail in cases when the regular interactive interpreter fails too.
History
Date | User | Action | Args
2022-04-11 14:58:09 | admin | set | github: 66931
2019-10-08 12:05:45 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; messages: + msg354193; stage: needs patch -> resolved
2019-10-04 17:55:38 | terry.reedy | set | messages: + msg353963; stage: needs patch
2019-10-04 11:59:57 | serhiy.storchaka | set | status: closed -> open; resolution: fixed -> (no value); messages: + msg353931; stage: resolved -> (no value)
2019-10-04 11:46:09 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg353926; resolution: fixed; stage: needs patch -> resolved
2019-04-25 00:55:41 | martin.panter | set | nosy: + martin.panter; messages: + msg340820
2019-04-22 19:09:13 | terry.reedy | link | issue36698 superseder
2019-04-22 19:05:27 | terry.reedy | set | messages: + msg340675; versions: + Python 3.8, - Python 3.6
2017-06-19 19:06:18 | terry.reedy | set | assignee: terry.reedy; components: + IDLE, - Library (Lib); versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.4, Python 3.5
2015-12-06 13:00:03 | THRlWiTi | set | nosy: + THRlWiTi
2014-11-01 00:36:20 | terry.reedy | set | versions: + Python 2.7, Python 3.4, Python 3.5; nosy: + terry.reedy; messages: + msg230416; stage: needs patch
2014-10-27 16:18:24 | belopolsky | create |
