Created on 2014-10-27 16:18 by belopolsky, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Messages (8) | |||
|---|---|---|---|
| msg230078 - (view) | Author: Alexander Belopolsky (belopolsky) * (Python committer) | Date: 2014-10-27 16:18 | |
>>> print("\N{ROCKET}")
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
print("\N{ROCKET}")
File "idlelib/PyShell.py", line 1352, in write
return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode character '\U0001f680' in position 0: Non-BMP character not supported in Tk
Shouldn't IDLE replace non-encodable characters with "\uFFFD"?
I think
>>> "\N{ROCKET}"
�
is user-friendlier than the traceback.
See also #14304.
|
|||
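The replacement proposed in the message above can be sketched in a few lines. This is only an illustration of the idea, not the change that was eventually made to IDLE; the helper name `replace_non_bmp` is hypothetical.

```python
def replace_non_bmp(text, replacement="\ufffd"):
    """Return text with every character above U+FFFF swapped for replacement.

    Hypothetical helper: it mirrors the suggestion to show U+FFFD
    instead of raising when Tk cannot handle a non-BMP character.
    """
    return "".join(ch if ord(ch) <= 0xFFFF else replacement for ch in text)


if __name__ == "__main__":
    # ascii() keeps the demo output printable on any terminal.
    print(ascii(replace_non_bmp("launch \N{ROCKET}!")))
```

The drawback of this approach, compared with an escape sequence, is that the original code point can no longer be recovered from what is displayed.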
| msg230416 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2014-11-01 00:36 | |
I think Idle should consistently display astral chars with their \U escape. It sometimes does, just not always.
>>> s='\U0001f680'
>>> s
'\U0001f680'
>>> str(s)
'\U0001f680'
>>> repr(s)
"'\U0001f680'"
>>> print(s)  # gives error above.
>>> print(str(s))  # ditto
I thought that implicit print of an expression and overt print of the same expression were supposed to be the same. #21084 is also about this general issue.
|
|||
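A minimal sketch of the consistent \U-escape display suggested above, assuming the escaping is applied to the text just before it is written to the shell widget. The function name `escape_astral` is hypothetical and is not taken from idlelib.

```python
def escape_astral(text):
    """Replace each character above U+FFFF with its \\UXXXXXXXX escape.

    BMP characters are passed through unchanged.
    """
    return "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in text
    )


if __name__ == "__main__":
    s = "\U0001f680"
    # Both an echoed expression and an explicit print would show the same text.
    print(escape_astral(s))
    print(escape_astral(repr(s)))
```

Unlike a replacement character, the escape keeps the code point visible to the user.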
| msg340675 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2019-04-22 19:05 | |
On my puzzlement above: repr(s) is a string of 3 characters -- s bracketed by quote characters. print(repr(s)) fails. I am not sure how s gets expanded to the full escape in IDLE. ascii(s) expands all non-ascii and adds extra quotes. Need to check Shell code.
In the python REPL, astral chars are not expanded to escape sequences.
>>> s='\U0001f603'
>>> s
'😃'  # Windows REPL shows two replacement boxes instead of 😃
#36698 is about astral chars in exception messages.
>>> raise Exception(s)
results in the Exception traceback, 3 UnicodeDecodeError tracebacks, and a restart.
|
|||
| msg340820 - (view) | Author: Martin Panter (martin.panter) * (Python committer) | Date: 2019-04-25 00:55 | |
I haven’t looked at the code, but I suspect Idle implements a custom "sys.displayhook":
>>> help(sys.displayhook)
Help on function displayhook in module idlelib.rpc:
displayhook(value)
Override standard display hook to use non-locale encoding
>>> sys.displayhook('\N{ROCKET}')
'\U0001f680'
>>> sys.__displayhook__('\N{ROCKET}')
Traceback (most recent call last):
File "<pyshell#20>", line 1, in <module>
sys.__displayhook__('\N{ROCKET}')
File "/usr/lib/python3.5/idlelib/PyShell.py", line 1344, in write
return self.shell.write(s, self.tags)
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk
|
|||
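To illustrate how a custom display hook could produce the escaped output seen above, here is a rough sketch. It is not the code in idlelib.rpc; the name `escaping_displayhook` and its optional `stream` parameter are assumptions made so the example can be run without touching the real `sys.displayhook`.

```python
import builtins
import io
import sys


def escaping_displayhook(value, stream=None):
    """Echo repr(value), escaping characters above U+FFFF as \\UXXXXXXXX.

    Follows the documented contract of sys.displayhook: do nothing for
    None, otherwise write the repr and store the value in builtins._.
    """
    if value is None:
        return
    builtins._ = None  # avoid recursion problems while computing the repr
    stream = sys.stdout if stream is None else stream
    text = "".join(
        ch if ord(ch) <= 0xFFFF else "\\U{:08x}".format(ord(ch))
        for ch in repr(value)
    )
    stream.write(text + "\n")
    builtins._ = value


if __name__ == "__main__":
    buffer = io.StringIO()
    escaping_displayhook("\N{ROCKET}", buffer)
    print(buffer.getvalue(), end="")
```

A hook like this only affects echoed expressions, which would explain why an explicit print() of the same string took a different path and failed.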
| msg353926 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2019-10-04 11:46 | |
Fixed by PR 16545 (see issue13153). |
|||
| msg353931 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2019-10-04 11:59 | |
It was fixed for all valid Unicode characters, but you can still get an error when printing a surrogate character to stderr on Linux:
>>> import sys
>>> print('\ud800', file=sys.stderr)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
print('\ud800', file=sys.stderr)
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
In the Python REPL you get an escape sequence.
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
|
|||
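The escape sequence printed by the regular interpreter is consistent with sys.stderr using the 'backslashreplace' error handler, while a stream with the 'strict' handler raises. The sketch below reproduces both behaviours on plain strings; `encode_for_stream` is a hypothetical helper, and nothing here inspects how IDLE configures its own streams.

```python
def encode_for_stream(text, errors):
    """Encode text as UTF-8 using the given error handler name."""
    return text.encode("utf-8", errors=errors)


if __name__ == "__main__":
    surrogate = "\ud800"

    # 'backslashreplace' turns the lone surrogate into a readable escape.
    print(encode_for_stream(surrogate, "backslashreplace"))

    # 'strict' refuses to encode it, as in the traceback above.
    try:
        encode_for_stream(surrogate, "strict")
    except UnicodeEncodeError as exc:
        print("strict handler failed:", exc.reason)
```

On a running interpreter, `sys.stderr.errors` and `sys.stdout.errors` report which handler each stream actually uses.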
| msg353963 - (view) | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2019-10-04 17:55 | |
Printing the unquoted escape representation rather than a replacement char is a bit strange and not what I expect from the python docs. I could see it as a bug. In any case, on Windows, it is the Python REPL that raises, but only for sys.stdout.
>>> import sys
>>> print('\ud800', file=sys.stderr)
\ud800
>>> print('\ud800', file=sys.stdout)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
whereas in IDLE on Windows the surrogate is displayed as a box with diagonal lines ([X] compressed in one char) in both cases. When copied and pasted into Firefox, the pasted surrogate shows as a square box containing mini D 8 0 0 chars.
>>> print('\ud800', file=sys.stdout)
�
>>> print('\ud800', file=sys.stderr)
�
I consider it the proper thing for tk to put the undisplayable codepoint, rather than a replacement character, into the editor buffer (however tcl encodes it), so that IDLE can retrieve it without loss of information. IDLE can then potentially identify the character to the user.
===
An oddity though. With
>>> import tkinter as tk
>>> r = tk.Tk()
>>> t = tk.Text(r)
>>> t.pack()
>>> t.insert('insert', 'a\ud800b')
the box is an empty square, not crossed. But when I copy-paste 'a�b' into the font sample (Serhiy, making this editable was a great idea), it is crossed for every font I tried, even for Courier, which is what is being used in text t.
|
|||
| msg354193 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2019-10-08 12:05 | |
And with PR 16583 it is now completely fixed. I.e. it can only fail in cases when the regular interactive interpreter fails too. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:09 | admin | set | github: 66931 |
| 2019-10-08 12:05:45 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: + msg354193 stage: needs patch -> resolved |
| 2019-10-04 17:55:38 | terry.reedy | set | messages: + msg353963 stage: needs patch |
| 2019-10-04 11:59:57 | serhiy.storchaka | set | status: closed -> open resolution: fixed -> (no value) messages: + msg353931 stage: resolved -> (no value) |
| 2019-10-04 11:46:09 | serhiy.storchaka | set | status: open -> closed nosy: + serhiy.storchaka messages: + msg353926 resolution: fixed stage: needs patch -> resolved |
| 2019-04-25 00:55:41 | martin.panter | set | nosy: + martin.panter messages: + msg340820 |
| 2019-04-22 19:09:13 | terry.reedy | link | issue36698 superseder |
| 2019-04-22 19:05:27 | terry.reedy | set | messages: + msg340675 versions: + Python 3.8, - Python 3.6 |
| 2017-06-19 19:06:18 | terry.reedy | set | assignee: terry.reedy components: + IDLE, - Library (Lib) versions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.4, Python 3.5 |
| 2015-12-06 13:00:03 | THRlWiTi | set | nosy: + THRlWiTi |
| 2014-11-01 00:36:20 | terry.reedy | set | versions: + Python 2.7, Python 3.4, Python 3.5 nosy: + terry.reedy messages: + msg230416 stage: needs patch |
| 2014-10-27 16:18:24 | belopolsky | create | |