
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: characters with ord above 65535 fail to display in IDLE
Type: behavior Stage: resolved
Components: Tkinter Versions: Python 3.3
process
Status: closed Resolution: duplicate
Dependencies:  Superseder: Idle shell crash on printing non-BMP unicode character (issue #14200)
Assigned To: asvetlov Nosy List: Ramchandra Apte, asvetlov, eric.smith, ezio.melotti, flox, kbk, loewis, ned.deily, python-dev, r.david.murray, roger.serwy, terry.reedy, vstinner, wujek.srujek
Priority: normal Keywords: patch

Created on 2011-06-15 20:59 by wujek.srujek, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name  Uploaded  Description
tcl_unicode_range.patch  vstinner, 2011-11-03 19:54
Messages (22)
msg138389 - Author: wujek (wujek.srujek) Date: 2011-06-15 20:59
The following code produces an exception:
print('{:c}'.format(65536))
when executed in Idle 3.2. The stack trace:
>>> print('{:c}'.format(65536))
Traceback (most recent call last):
 File "<pyshell#149>", line 1, in <module>
 print('{:c}'.format(65536))
 File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
 self.shell.write(s, self.tags)
 File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
 OutputWindow.write(self, s, tags, "iomark")
 File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
 self.text.insert(mark, s, tags)
 File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
 self.top.insert(index, chars, tags)
 File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
 self.delegate.insert(index, chars, tags)
 File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
 UndoDelegator.insert(self, index, chars, tags)
 File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
 self.addcmd(InsertCommand(index, chars, tags))
 File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
 cmd.do(self.delegate)
 File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
 text.insert(self.index1, self.chars, self.tags)
 File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
 self.delegate.insert(index, chars, tags)
 File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
 return self.tk_call(self.orig_and_operation + args)
ValueError: unsupported character
Seems to work fine in a terminal (Gnome-terminal in this case):
>>> print('{:c}'.format(0x10000))
𐀀
(my font doesn't have the glyph, but otherwise it works)
Python version:
>>> print(sys.version)
3.2 (r32:88445, Mar 25 2011, 19:56:22) 
[GCC 4.5.2]
Os:
wujek@home:~$ uname -a
Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
wujek@home:~$ cat /etc/issue
Ubuntu 11.04
msg138390 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-06-15 21:10
Judging from the stack trace, it isn't str.format that's failing, it's tk failing to display it.
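You can confirm that the formatting step itself works by checking the same call without involving Tk at all. A minimal sketch in plain Python 3:

```python
# '{:c}' converts an integer to the character with that code point.
s = '{:c}'.format(65536)

# The result is a single character, U+10000, the same as chr() returns.
assert s == chr(0x10000) == '\U00010000'
assert len(s) == 1

# ascii() escapes the character, so this prints safely on any console.
print(ascii(s))  # prints '\U00010000'
```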
msg138392 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 21:47
U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new emoticons are in this range, but I don't know if your Ubuntu setup includes a font supporting this range.
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf 
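To check a string for characters in that range before passing it to Tk, a small helper is enough. This is a sketch, and `non_bmp_chars` is a hypothetical name that is not part of tkinter or IDLE:

```python
def non_bmp_chars(text):
    """Return (index, code point) pairs for characters outside the BMP.

    The BMP is U+0000-U+FFFF. Everything from U+10000 to U+10FFFF lies outside it.
    """
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ord(ch) > 0xFFFF]


# U+1F600 (GRINNING FACE) is in the emoticons block linked above.
sample = "smile \U0001F600, linear b \U00010000, euro \u20ac"
print(non_bmp_chars(sample))  # [(6, 'U+1F600'), (18, 'U+10000')]
```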
msg138395 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 21:59
From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that.
msg138397 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-15 22:01
> From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl
> 8.5 (and earlier) does not support Unicode code points outside
> the BMP range as in this example.
Extract of http://wiki.tcl.tk/1364 :
"RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."
> I don't think there is anything practical IDLE
> or tkinter can do about that.
We might raise an error with better error message than ValueError('unsupported character'), but it's maybe overkill.
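The surrogate-pair representation mentioned in the wiki quote is simple arithmetic on the code point. The sketch below shows the standard UTF-16 split and checks it against Python's own `utf-16-be` codec. The function name is only illustrative:

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a non-BMP code point")
    offset = cp - 0x10000            # a 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x10000)
print(hex(high), hex(low))  # 0xd800 0xdc00

# Cross-check against Python's UTF-16 encoder (big-endian, no BOM).
units = '\U0001F600'.encode('utf-16-be')
assert to_surrogate_pair(0x1F600) == (
    int.from_bytes(units[:2], 'big'),
    int.from_bytes(units[2:], 'big'),
)
```

A Tcl build limited to 16-bit characters would have to store and process such pairs consistently to support non-BMP text. That is the "deeper rework" the quote describes.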
msg138402 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-06-15 22:17
It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990
I suppose it could be slightly more informative but it seems pretty unambiguous to me. Martin, any opinions?
msg138497 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-06-17 10:54
Instead of
 ValueError: unsupported character
I suggest:
 ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range
What do you think?
msg138541 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-17 18:31
>ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range
Slightly shorter and without the double :s.
ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.
I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault.
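Here is a sketch of a pre-check that produces the wording proposed above. `check_tcl_range` is hypothetical, and the 0xFFFF limit assumes a Tcl build with a 16-bit `Tcl_UniChar`. The real check is C code inside `_tkinter.c`:

```python
TCL_MAX = 0xFFFF  # assumption: Tcl built with a 16-bit Tcl_UniChar


def check_tcl_range(text):
    """Hypothetical pre-check: raise ValueError for the first non-BMP character."""
    for ch in text:
        if ord(ch) > TCL_MAX:
            raise ValueError(
                f"character U+{ord(ch):04X} is above the range "
                f"(U+0000-U+FFFF) allowed by Tcl/Tk."
            )


try:
    check_tcl_range('{:c}'.format(65536))
except ValueError as exc:
    print(exc)  # character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.
```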
msg146663 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-10-30 21:46
(Merging CC list from duplicate Issue13265.)
msg146665 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-30 22:33
Changing the error message sounds fine to me.
People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally.
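Python's UTF-16 and UTF-32 codecs can illustrate the difference between a 16-bit and a 32-bit `Tcl_UniChar`. This only models how many storage units each character needs. It does not describe how Tcl actually lays out strings internally. `code_units` is a hypothetical helper:

```python
def code_units(text, width):
    """Count how many 16-bit or 32-bit units are needed to store text."""
    encoding = {16: 'utf-16-le', 32: 'utf-32-le'}[width]  # '-le' variants add no BOM
    return len(text.encode(encoding)) // (width // 8)


for label, text in [('BMP', '\u20ac'), ('non-BMP', '\U00010000')]:
    print(label, code_units(text, 16), code_units(text, 32))
# BMP 1 1
# non-BMP 2 1
```

With 32-bit units, each character maps to exactly one unit. With 16-bit units, a non-BMP character needs two.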
msg146965 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 19:54
Here is the patch as a .patch file.
msg146983 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-03 21:39
I'm not sure whether the wording is good English, but apart from that, the patch looks fine.
msg146984 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-11-03 21:49
The patch implements my suggestion. Looking again, I think the English is fine ;-).
msg146987 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-11-03 22:14
You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing).
msg146991 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-03 23:42
New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/9a07b73abdb1
New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/5aea95d41ad2 
msg146992 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-03 23:49
_tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").
> You could say "Unicode character ..." in the error to make clear
> what kind of range is U+0000-U+FFFF (people that are not familiar
> with Unicode and BMP chars might wonder if that's some tcl/tk thing).
I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in range [0x0000; 0xFFFF]) ;-)
msg146994 - Author: Florent Xicluna (flox) * (Python committer) Date: 2011-11-04 00:27
Failed to build these modules: (3.3 on Snow Leopard)
_tkinter
./cpython/Modules/_tkinter.c: In function ‘AsObj’:
./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
./cpython/Modules/_tkinter.c:996: error: invalid use of void expression
msg146999 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-04 08:49
New changeset 5f49b496d161 by Victor Stinner in branch 'default':
Issue #12342: Fix compilation on Mac OS X
http://hg.python.org/cpython/rev/5f49b496d161 
msg154966 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-03-05 17:59
In responding to #14200, it occurred to me that, rather than raising an exception, it would be better to do what the interpreter does in a Command Prompt window, which is to expand high chars to the '\U0001xxxx' escaped form.
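Here is a sketch of the suggested behaviour: escape only characters above U+FFFF and leave BMP text unchanged. `escape_non_bmp` is a hypothetical helper, not the fix that went into #14200. It uses lowercase hex, which matches the `\U` escapes Python's `ascii()` produces:

```python
import re

# Character class covering every code point outside the BMP (U+10000-U+10FFFF).
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')


def escape_non_bmp(text):
    """Replace each non-BMP character with a \\Uxxxxxxxx escape; keep BMP text as-is."""
    return NON_BMP.sub(lambda m: f"\\U{ord(m.group()):08x}", text)


print(escape_non_bmp('smile: \U0001F600'))  # smile: \U0001f600
```

Because BMP characters pass through unchanged, text that Tk could already display looks the same as before.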
msg155414 - Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-11 22:11
I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF.
msg155804 - Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-03-14 21:48
Fixed in #14200 
msg155809 - Author: Roger Serwy (roger.serwy) * (Python committer) Date: 2012-03-14 22:15
Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bike shedding.
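One point in favour of this suggestion is that `UnicodeEncodeError` is a subclass of `ValueError`, so existing `except ValueError` handlers would keep working. The sketch below uses a hypothetical check. The `'ucs-2'` encoding label is an assumption, since `UnicodeEncodeError` accepts any encoding name string:

```python
def tcl_encode_check(text, limit=0xFFFF):
    """Hypothetical check: raise UnicodeEncodeError at the first character above limit."""
    for i, ch in enumerate(text):
        if ord(ch) > limit:
            raise UnicodeEncodeError(
                'ucs-2', text, i, i + 1,
                f"character U+{ord(ch):04X} is above the range "
                f"(U+0000-U+{limit:04X}) allowed by Tcl",
            )
    return text


# Existing "except ValueError" handlers would still catch it.
assert issubclass(UnicodeEncodeError, ValueError)

try:
    tcl_encode_check('ok \U0001F600')
except ValueError as exc:
    print(type(exc).__name__, exc.start, exc.reason)
```

The subclass also records where the problem is (`start`, `end`) and which text caused it (`object`), which a plain `ValueError` does not.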
History
Date  User  Action  Args
2022-04-11 14:57:18  admin  set  github: 56551
2012-03-14 22:15:48  roger.serwy  set  messages: + msg155809
2012-03-14 21:48:11  asvetlov  set  status: open -> closed
    assignee: asvetlov
    versions: - Python 2.7, Python 3.2
    messages: + msg155804
    superseder: Idle shell crash on printing non-BMP unicode character
    resolution: fixed -> duplicate
    stage: commit review -> resolved
2012-03-12 18:52:35  asvetlov  set  nosy: + asvetlov
2012-03-11 22:11:41  roger.serwy  set  messages: + msg155414
2012-03-05 17:59:56  terry.reedy  set  messages: + msg154966
2011-11-04 08:49:30  python-dev  set  messages: + msg146999
2011-11-04 00:27:38  flox  set  status: closed -> open
    nosy: + flox
    messages: + msg146994
2011-11-03 23:49:35  vstinner  set  status: open -> closed
    resolution: fixed
    messages: + msg146992
2011-11-03 23:42:25  python-dev  set  nosy: + python-dev
    messages: + msg146991
2011-11-03 22:14:50  ezio.melotti  set  messages: + msg146987
2011-11-03 21:49:02  terry.reedy  set  messages: + msg146984
    stage: commit review
2011-11-03 21:39:33  loewis  set  messages: + msg146983
2011-11-03 19:54:37  vstinner  set  files: + tcl_unicode_range.patch
    keywords: + patch
    messages: + msg146965
2011-10-30 22:33:31  loewis  set  messages: + msg146665
2011-10-30 21:46:55  ned.deily  set  nosy: + kbk, ezio.melotti, roger.serwy, Ramchandra Apte
    messages: + msg146663
2011-10-30 21:45:57  ned.deily  link  issue13265 superseder
2011-06-17 18:31:10  terry.reedy  set  messages: + msg138541
    components: + Tkinter, - IDLE, IO
    versions: + Python 2.7, Python 3.3
2011-06-17 10:54:44  vstinner  set  messages: + msg138497
2011-06-16 02:20:54  eric.smith  set  nosy: + eric.smith
2011-06-15 22:17:20  ned.deily  set  nosy: + loewis
    messages: + msg138402
2011-06-15 22:01:49  vstinner  set  messages: + msg138397
2011-06-15 21:59:06  ned.deily  set  nosy: + ned.deily
    messages: + msg138395
2011-06-15 21:47:17  vstinner  set  nosy: + vstinner
    messages: + msg138392
2011-06-15 21:10:07  r.david.murray  set  nosy: + r.david.murray, terry.reedy
    messages: + msg138390
    title: characters with ord above 65535 fail conversion with str.format for '{:c}' in IDLE -> characters with ord above 65535 fail to display in IDLE
2011-06-15 20:59:59  wujek.srujek  create
