Created on 2011-06-15 20:59 by wujek.srujek, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | ||
|---|---|---|
| File name | Uploaded | Description |
| tcl_unicode_range.patch | vstinner, 2011-11-03 19:54 | |
| Messages (22) | |||
|---|---|---|---|
| msg138389 | Author: wujek (wujek.srujek) | Date: 2011-06-15 20:59 | |
The following code produces an exception:
print('{:c}'.format(65536))
when executed in IDLE 3.2. The stack trace:
>>> print('{:c}'.format(65536))
Traceback (most recent call last):
  File "<pyshell#149>", line 1, in <module>
    print('{:c}'.format(65536))
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1231, in write
    self.shell.write(s, self.tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 1213, in write
    OutputWindow.write(self, s, tags, "iomark")
  File "/usr/lib/python3.2/idlelib/OutputWindow.py", line 40, in write
    self.text.insert(mark, s, tags)
  File "/usr/lib/python3.2/idlelib/Percolator.py", line 25, in insert
    self.top.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/PyShell.py", line 316, in insert
    UndoDelegator.insert(self, index, chars, tags)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 81, in insert
    self.addcmd(InsertCommand(index, chars, tags))
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 116, in addcmd
    cmd.do(self.delegate)
  File "/usr/lib/python3.2/idlelib/UndoDelegator.py", line 219, in do
    text.insert(self.index1, self.chars, self.tags)
  File "/usr/lib/python3.2/idlelib/ColorDelegator.py", line 79, in insert
    self.delegate.insert(index, chars, tags)
  File "/usr/lib/python3.2/idlelib/WidgetRedirector.py", line 104, in __call__
    return self.tk_call(self.orig_and_operation + args)
ValueError: unsupported character
Seems to work fine in a terminal (Gnome-terminal in this case):
>>> print('{:c}'.format(0x10000))
𐀀
(my font doesn't have the glyph, but otherwise it works)
Python version:
>>> print(sys.version)
3.2 (r32:88445, Mar 25 2011, 19:56:22)
[GCC 4.5.2]
OS:
wujek@home:~$ uname -a
Linux studio 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
wujek@home:~$ cat /etc/issue
Ubuntu 11.04
|
|||
| msg138390 | Author: R. David Murray (r.david.murray) * (Python committer) | Date: 2011-06-15 21:10 | |
Judging from the stack trace, it isn't str.format that's failing; it's Tk failing to display the character. |
|||
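The diagnosis above can be checked without any GUI. The following is a minimal sketch, assuming Python 3.3 or later (where every code point is a single `str` element; on the narrow 3.2 builds of the time the length could differ). It only shows that the formatting step succeeds on its own, which is consistent with the failure coming from the Tk display layer.

```python
# Format code point 0x10000 with the 'c' presentation type; no Tk involved.
s = '{:c}'.format(0x10000)

# The result is an ordinary string holding one non-BMP character.
print(len(s), hex(ord(s)))

# It is the same string that chr() or a \U escape would produce.
print(s == chr(0x10000) == '\U00010000')
```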
| msg138392 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-06-15 21:47 | |
U+10000 is not the most common character in fonts. You should try another character in the U+10000-U+10FFFF range (non-BMP characters). The new emoticons are in this range, but I don't know whether your Ubuntu setup includes a font supporting them. http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf |
|||
| msg138395 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2011-06-15 21:59 | |
From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl 8.5 (and earlier) does not support Unicode code points outside the BMP range as in this example. I don't think there is anything practical IDLE or tkinter can do about that. |
|||
| msg138397 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-06-15 22:01 | |
> From the discussions here, http://wiki.tcl.tk/1364, it appears that Tcl
> 8.5 (and earlier) does not support Unicode code points outside
> the BMP range as in this example.

Extract from http://wiki.tcl.tk/1364:

"RS 2008-07-09: Unicode out of BMP (> U+FFFF) requires a deeper rework of Tcl and Tk: we'd need 32 bit chars and/or surrogate pairs. UTF-8 at least can deal with 31-bit Unicodes by principle."

> I don't think there is anything practical IDLE
> or tkinter can do about that.

We might raise an error with a better message than ValueError('unsupported character'), but that may be overkill. |
|||
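To make the "32 bit chars and/or surrogate pairs" remark concrete, here is a small sketch of how a non-BMP code point looks in UTF-16 and UTF-8. It uses only Python's standard codecs; the helper name `to_surrogates` is made up for this illustration and says nothing about how Tcl stores strings internally.

```python
def to_surrogates(code_point):
    """Split a non-BMP code point into its UTF-16 high/low surrogates.

    Hypothetical helper for illustration only.
    """
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

ch = '\U00010000'

# A single 16-bit unit cannot hold U+10000; UTF-16 needs two units.
utf16 = ch.encode('utf-16-be')
units = [int.from_bytes(utf16[i:i + 2], 'big') for i in range(0, len(utf16), 2)]
print([hex(u) for u in units])

# UTF-8 represents the same character in four bytes.
utf8 = ch.encode('utf-8')
print(utf8)

# The manual computation agrees with the codec.
print(to_surrogates(0x10000) == tuple(units))
```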
| msg138402 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2011-06-15 22:17 | |
It looks like that error message has been in _tkinter.c since 2002: http://svn.python.org/view/python/trunk/Modules/_tkinter.c?r1=28989&r2=28990& I suppose it could be slightly more informative but it seems pretty unambiguous to me. Martin, any opinions? |
|||
| msg138497 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-06-17 10:54 | |
Instead of

ValueError: unsupported character

I suggest:

ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

What do you think? |
|||
| msg138541 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011-06-17 18:31 | |
> ValueError: unsupported character (U+10000): Tcl doesn't support characters outside U+0000-U+FFFF range

Slightly shorter and without the double colons:

ValueError: character U+10000 is above the range (U+0000-U+FFFF) allowed by Tcl/Tk.

I agree with a change like this. People are going to increasingly use non-BMP chars and need to find out that the problem is not our fault. |
|||
| msg146663 | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2011-10-30 21:46 | |
(Merging CC list from duplicate Issue13265.) |
|||
| msg146665 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-10-30 22:33 | |
Changing the error message sounds fine to me. People in need of the feature should lobby their system vendors to provide a Tcl build that uses a 32-bit Tcl_UniChar. Not sure whether it would actually render the string correctly, but at least it would be able to represent it correctly internally. |
|||
| msg146965 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-11-03 19:54 | |
Here is the patch as a .patch file. |
|||
| msg146983 | Author: Martin v. Löwis (loewis) * (Python committer) | Date: 2011-11-03 21:39 | |
I'm not sure whether the wording is good English, but apart from that, the patch looks fine. |
|||
| msg146984 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2011-11-03 21:49 | |
The patch implements my suggestion. Looking again, I think the English is fine ;-). |
|||
| msg146987 | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2011-11-03 22:14 | |
You could say "Unicode character ..." in the error to make clear what kind of range is U+0000-U+FFFF (people that are not familiar with Unicode and BMP chars might wonder if that's some tcl/tk thing). |
|||
| msg146991 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-11-03 23:42 | |
New changeset 9a07b73abdb1 by Victor Stinner in branch '3.2':
Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/9a07b73abdb1

New changeset 5aea95d41ad2 by Victor Stinner in branch 'default':
(Merge 3.2) Issue #12342: Improve _tkinter error message on unencodable character
http://hg.python.org/cpython/rev/5aea95d41ad2 |
|||
| msg146992 | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2011-11-03 23:49 | |
_tkinter now raises ValueError("character U+10ffff is above the range (U+0000-U+FFFF) allowed by Tcl").
> You could say "Unicode character ..." in the error to make clear
> what kind of range is U+0000-U+FFFF (people that are not familiar
> with Unicode and BMP chars might wonder if that's some tcl/tk thing).
I consider that U+10ffff in "character U+10ffff" is enough to specify that it is a Unicode character. Even if you don't understand Unicode, you can at least compare numbers (0x10ffff is not in range [0x0000; 0xFFFF]) ;-)
|
|||
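The committed change lives in C, in `_tkinter.c`, and is not reproduced in this thread. As a rough Python illustration of the behaviour described above, the sketch below scans a string and raises the same message for the first code point above U+FFFF. The function name `check_tcl_range` is hypothetical, and the code assumes Python 3.3 or later so that iterating a string yields whole code points.

```python
def check_tcl_range(text):
    """Raise ValueError for the first character a 16-bit Tcl cannot hold.

    Hypothetical helper: it mimics the message quoted in the thread and
    is not the actual _tkinter implementation.
    """
    for ch in text:
        if ord(ch) > 0xFFFF:
            raise ValueError(
                "character U+{:x} is above the range "
                "(U+0000-U+FFFF) allowed by Tcl".format(ord(ch)))
    return text

# BMP-only text passes through unchanged.
print(check_tcl_range('caf\u00e9'))

# A non-BMP character triggers the descriptive error.
try:
    check_tcl_range('\U0010ffff')
except ValueError as exc:
    message = str(exc)
    print(message)
```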
| msg146994 | Author: Florent Xicluna (flox) * (Python committer) | Date: 2011-11-04 00:27 | |
Failed to build these modules: (3.3 on Snow Leopard)
_tkinter

./cpython/Modules/_tkinter.c: In function ‘AsObj’:
./cpython/Modules/_tkinter.c:996: warning: dereferencing ‘void *’ pointer
./cpython/Modules/_tkinter.c:996: error: invalid use of void expression |
|||
| msg146999 | Author: Roundup Robot (python-dev) (Python triager) | Date: 2011-11-04 08:49 | |
New changeset 5f49b496d161 by Victor Stinner in branch 'default':
Issue #12342: Fix compilation on Mac OS X
http://hg.python.org/cpython/rev/5f49b496d161 |
|||
| msg154966 | Author: Terry J. Reedy (terry.reedy) * (Python committer) | Date: 2012-03-05 17:59 | |
In responding to #14200, it occurred to me that, rather than raising an exception, it would be better to do what the interpreter does in a Command Prompt window: expand high chars to the '\U0001xxxx' escaped form. |
|||
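The escaping idea can be sketched in a few lines. This is an illustration only, not the change made for #14200: the helper name `escape_non_bmp` is made up here, and the code assumes Python 3.3 or later so the regular expression can match whole non-BMP code points.

```python
import re

# Matches any single code point outside the Basic Multilingual Plane.
_NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def escape_non_bmp(text):
    """Replace each non-BMP character with its \\UXXXXXXXX escape.

    Hypothetical helper for illustration; BMP characters are left alone.
    """
    return _NON_BMP.sub(lambda m: '\\U{:08x}'.format(ord(m.group())), text)

# The result contains only characters a 16-bit Tcl build can hold.
escaped = escape_non_bmp('smile: \U0001F600!')
print(escaped)
```

A widget that receives the escaped string shows the escape sequence as text instead of raising, at the cost of not displaying the real glyph.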
| msg155414 | Author: Roger Serwy (roger.serwy) * (Python committer) | Date: 2012-03-11 22:11 | |
I agree with Terry. The current behavior of raising ValueError will lead to problems in application code in the future if Tkinter gets fixed such that it can render Unicode properly beyond 0xFFFF. |
|||
| msg155804 | Author: Andrew Svetlov (asvetlov) * (Python committer) | Date: 2012-03-14 21:48 | |
Fixed in #14200 |
|||
| msg155809 | Author: Roger Serwy (roger.serwy) * (Python committer) | Date: 2012-03-14 22:15 | |
Rather than raising a ValueError, would UnicodeEncodeError be more appropriate? I admit that this suggestion may be bikeshedding. |
|||
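One point relevant to the question above: in Python's built-in exception hierarchy, `UnicodeEncodeError` is a subclass of `ValueError`, so code written as `except ValueError` would keep working if the exception type were narrowed. The sketch below demonstrates only that relationship; the `'tcl'` encoding label and the reason string are placeholders chosen for the example and do not come from `_tkinter`.

```python
# UnicodeEncodeError -> UnicodeError -> ValueError in the built-in hierarchy.
print(issubclass(UnicodeEncodeError, ValueError))

try:
    # Arguments: encoding, object, start, end, reason.
    # 'tcl' is a placeholder label, not a registered codec.
    raise UnicodeEncodeError(
        'tcl', '\U00010000', 0, 1,
        'character is above the range allowed by Tcl')
except ValueError as exc:
    # An existing ValueError handler still catches the narrower type.
    caught = type(exc).__name__
    print(caught)
```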
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:18 | admin | set | github: 56551 |
| 2012-03-14 22:15:48 | roger.serwy | set | messages: + msg155809 |
| 2012-03-14 21:48:11 | asvetlov | set | status: open -> closed; assignee: asvetlov; versions: - Python 2.7, Python 3.2; messages: + msg155804; superseder: Idle shell crash on printing non-BMP unicode character; resolution: fixed -> duplicate; stage: commit review -> resolved |
| 2012-03-12 18:52:35 | asvetlov | set | nosy: + asvetlov |
| 2012-03-11 22:11:41 | roger.serwy | set | messages: + msg155414 |
| 2012-03-05 17:59:56 | terry.reedy | set | messages: + msg154966 |
| 2011-11-04 08:49:30 | python-dev | set | messages: + msg146999 |
| 2011-11-04 00:27:38 | flox | set | status: closed -> open; nosy: + flox; messages: + msg146994 |
| 2011-11-03 23:49:35 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg146992 |
| 2011-11-03 23:42:25 | python-dev | set | nosy: + python-dev; messages: + msg146991 |
| 2011-11-03 22:14:50 | ezio.melotti | set | messages: + msg146987 |
| 2011-11-03 21:49:02 | terry.reedy | set | messages: + msg146984; stage: commit review |
| 2011-11-03 21:39:33 | loewis | set | messages: + msg146983 |
| 2011-11-03 19:54:37 | vstinner | set | files: + tcl_unicode_range.patch; keywords: + patch; messages: + msg146965 |
| 2011-10-30 22:33:31 | loewis | set | messages: + msg146665 |
| 2011-10-30 21:46:55 | ned.deily | set | nosy: + kbk, ezio.melotti, roger.serwy, Ramchandra Apte; messages: + msg146663 |
| 2011-10-30 21:45:57 | ned.deily | link | issue13265 superseder |
| 2011-06-17 18:31:10 | terry.reedy | set | messages: + msg138541; components: + Tkinter, - IDLE, IO; versions: + Python 2.7, Python 3.3 |
| 2011-06-17 10:54:44 | vstinner | set | messages: + msg138497 |
| 2011-06-16 02:20:54 | eric.smith | set | nosy: + eric.smith |
| 2011-06-15 22:17:20 | ned.deily | set | nosy: + loewis; messages: + msg138402 |
| 2011-06-15 22:01:49 | vstinner | set | messages: + msg138397 |
| 2011-06-15 21:59:06 | ned.deily | set | nosy: + ned.deily; messages: + msg138395 |
| 2011-06-15 21:47:17 | vstinner | set | nosy: + vstinner; messages: + msg138392 |
| 2011-06-15 21:10:07 | r.david.murray | set | nosy: + r.david.murray, terry.reedy; messages: + msg138390; title: characters with ord above 65535 fail conversion with str.format for '{:c}' in IDLE -> characters with ord above 65535 fail to display in IDLE |
| 2011-06-15 20:59:59 | wujek.srujek | create | |