
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: curses implementation of Unicode is wrong in Python 3
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Nicholas.Cole, akuchling, cben, eric.araujo, gpolo, inigoserna, jcea, john.feuerstein, nadeem.vawda, ned.deily, petri.lehtinen, pitrou, python-dev, r.david.murray, schodet, vstinner, zeha
Priority: normal
Keywords: patch

Created on 2011-07-14 22:33 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
getkey.patch | vstinner, 2011-07-14 23:09
curses_unicode.patch | vstinner, 2011-07-19 00:19
Messages (37)
msg140375 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-14 22:33
curses functions accepting strings implicitly encode character strings to UTF-8. This is wrong. We should add a function to set the encoding (see issue #6745) or use the wide character C functions. I don't think that UTF-8 is the right default encoding; I suppose that the locale encoding is a better choice.
Accepting characters (and character strings) but calling byte functions is wrong. For example, addch('é') doesn't work with UTF-8 locale encoding. It calls waddch(0xE9) (é is U+00E9), whereas waddch(0xC3)+waddch(0xA9) should be called. Workaround in Python:
    for byte in 'é'.encode('utf-8'):
        win.addch(byte)
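To make the mismatch concrete, here is a small check in plain Python (no curses needed) of the two values involved: the code point that addch('é') ends up passing to waddch(), and the byte sequence a UTF-8 terminal expects. The variable names are only illustrative.

```python
ch = 'é'

# The value waddch() receives today: the Unicode code point.
code_point = ord(ch)

# What a UTF-8 terminal expects instead: one waddch() call per encoded byte.
utf8_bytes = list(ch.encode('utf-8'))

print(hex(code_point))               # 0xe9
print([hex(b) for b in utf8_bytes])  # ['0xc3', '0xa9']
```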
I see two possible solutions:
A) Add new functions accepting only characters, and stop accepting characters in the existing functions
B) The function should be fixed to call the right C function depending on the input type. For example, Python addch(10) and addch(b'\n') would call waddch(10), whereas addch('é') would call wadd_wch(233).
I prefer solution (B) because addch('é') would just work as expected.
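A rough sketch of the dispatch that solution (B) describes, written in Python for readability. The helper name pick_c_function is hypothetical; the real decision would be made in the C code of the _curses module, and this sketch only returns the name of the C function together with the value it would receive.

```python
def pick_c_function(ch):
    """Return (C function name, value) for an addch() argument.

    Hypothetical illustration of solution (B), not the actual patch.
    """
    if isinstance(ch, int):
        # Integers are passed through to the byte function.
        return ('waddch', ch)
    if isinstance(ch, bytes):
        if len(ch) != 1:
            raise TypeError('expected a single byte')
        return ('waddch', ch[0])
    if isinstance(ch, str):
        if len(ch) != 1:
            raise TypeError('expected a single character')
        # Characters go to the wide character function.
        return ('wadd_wch', ord(ch))
    raise TypeError('expected int, bytes or str')
```

With this rule, pick_c_function(10) and pick_c_function(b'\n') both select waddch with the value 10, while pick_c_function('é') selects wadd_wch with the value 233.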
msg140379 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-14 23:09
getkey.patch fixes window.getkey(): it uses get_wch() instead of getch() to correctly handle non-ASCII characters. I tested with the key é (U+00E9) with ISO-8859-1 and UTF-8 locale encoding: getkey() gives the expected result (but addstr is unable to display it, because addstr encodes the string to UTF-8 instead of the locale encoding).
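For code that has to run on builds with and without wide character support, a defensive pattern at the Python level might look like the following. It assumes a Python version where window.get_wch() exists when the module is linked to a wide character library (see #6755, referenced in this thread); the helper name read_key is hypothetical.

```python
def read_key(win):
    """Read one key from a curses window, preferring the wide character API.

    `win` is assumed to be a curses window object. get_wch() is only
    present on builds with wide character support, so fall back to getkey().
    """
    get_wch = getattr(win, 'get_wch', None)
    if get_wch is not None:
        # Returns a str for characters and an int for function keys.
        return get_wch()
    # Narrow fallback: non-ASCII input may not come back intact.
    return win.getkey()
```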
msg140405 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-15 13:33
Oh, by the way: do all platforms have wide character functions? I don't see any failure on our Python 3.x buildbots, but test_curses is skipped on many buildbots.
msg140406 - Author: Nicholas Cole (Nicholas.Cole) Date: 2011-07-15 13:56
I think that some platforms do not have wide character support, though I could be wrong. The FAQ here: http://invisible-island.net/ncurses/ncurses.faq.html has a list of those that do and those that don't, but I don't know how up to date it is.
msg140411 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-15 14:39
> by the way: do all platforms have wide character functions?
See msg140408 and msg140409: Antoine Pitrou (OS=Mageia 1) and some buildbots don't have get_wch().
msg140637 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-19 00:19
Patch the _curses module to improve Unicode support:
 - add an encoding attribute to a window (only visible in C): read the locale encoding
 - encode a character and a character string to the window encoding if the ncursesw library is NOT used
 - addch(), addstr(), addnstr(), insstr() and insnstr() use the wide character functions if the ncursesw library is used
 - PyCurses_ConvertToChtype() checks for integer overflow and rejects values outside [0; 255]
The check on the ncursesw library availability is done in setup.py because the library linked to _curses depends on the readline library (see issues #7384 and #9408).
I don't know if wide character functions can be available in the curses or ncurses libraries.
Details:
 - locale encoding: use GetConsoleOutputCP() on Windows, nl_langinfo(CODESET) if available, or "utf-8"
 - don't encode a character to the window encoding if its code is in [0; 127] (use the Unicode code point): all encodings are compatible with ASCII... except some encodings like JIS X 0201. In JIS, 0x5C is decoded to the yen sign (U+00A5) instead of a backslash (U+005C).
 - if an encoded character is longer than 1 byte, raise an OverflowError. For example, U+00E9 (é) encoded to UTF-8 gives b'\xC3\xA9' (two bytes).
 - copy the encoding when creating a subwindow.
 - use a global variable, screen_encoding, in PyCurses_UnCtrl() and PyCurses_UngetCh()
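The two conversion rules in this list (ASCII passes through unencoded, anything longer than one byte is rejected) can be sketched in Python as follows. The helper char_to_byte is hypothetical; the patch implements the equivalent logic in C, and its exact error handling may differ.

```python
def char_to_byte(ch, encoding):
    """Convert one character to the single byte passed to the narrow C API.

    Hypothetical sketch of the non-ncursesw path described in the list above.
    """
    code = ord(ch)
    if code < 128:
        # ASCII range: use the code point directly, skip the encoder.
        return code
    encoded = ch.encode(encoding)
    if len(encoded) != 1:
        raise OverflowError(
            'character %r needs %d bytes in %s' % (ch, len(encoded), encoding))
    return encoded[0]
```

For example, char_to_byte('é', 'iso-8859-1') gives 0xE9, while char_to_byte('é', 'utf-8') raises OverflowError because the encoded form is two bytes long.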
It's not possible to specify an encoding.
GetConsoleOutputCP() is maybe not the right code on Windows if a text application doesn't run in a Windows console (e.g. if it uses its own terminal emulator). GetOEMCP() is maybe a better choice, or a function should be added to specify the encoding used by the _curses module (override the "locale encoding").
If a function is added to specify the encoding, I think that it is better to add a global function instead of adding an argument to functions creating a new window object (initscr(), getwin(), subwin(), derwin(), newpad()).
msg140638 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-19 00:26
Using curses_unicode.patch:
 - without ncursesw: addch('é') raises an OverflowError because 'é'.encode('UTF-8') is 2 bytes and not 1 byte
 - with ncursesw: the set of displayable characters depends on the locale encoding (e.g. € cannot be printed with ISO-8859-1 locale encoding)
 - with ncursesw: any character can be printed with a UTF-8 locale encoding (including non-BMP characters: U+10000..U+10FFFF)
It would be possible to support multibyte encoded characters (like é in UTF-8) for addch() by calling addch() multiple times, once per byte, but I would prefer to keep _curses simple and not work around libncurses limitations (bugs).
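The dependence on the locale encoding can be checked without a terminal by asking whether a character is encodable at all. This is only a necessary condition (the terminal and font still have to cooperate), and the helper name displayable is hypothetical.

```python
def displayable(ch, encoding):
    """Return True if `ch` can be encoded in `encoding`."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(displayable('€', 'iso-8859-1'))        # False
print(displayable('é', 'iso-8859-1'))        # True
print(displayable('€', 'utf-8'))             # True
print(displayable('\U00010000', 'utf-8'))    # True
```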
msg140639 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-19 00:28
See also #6755 (curses.get_wch).
msg141462 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-31 13:06
New changeset d98b5e0f0862 by Nadeem Vawda in branch 'default':
Fix build error in _curses module when not using libncursesw.
http://hg.python.org/cpython/rev/d98b5e0f0862 
msg141465 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-07-31 13:18
Following d98b5e0f0862, I have been able to successfully build the curses
module with curses_unicode.patch applied.
msg141466 - Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-07-31 13:19
Ack sorry, forgot to give context - my machine doesn't have libncursesw,
so the curses module failed to build before that commit (with or without
the patch applied).
msg141771 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-08 11:22
See also #10570.
msg142283 - Author: Nicholas Cole (Nicholas.Cole) Date: 2011-08-17 15:49
There are now several bugs dealing with related issues here. Are we any closer to a solution to any of them? The suggested patches look like a good idea - what needs to happen for them to move forward?
msg142289 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-08-17 18:35
> what needs to happen for them to move forward?
I would like a review of curses_unicode.patch.
msg143574 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-05 23:53
New changeset b1e03d10391e by Victor Stinner in branch 'default':
Issue #12567: Add curses.unget_wch() function
http://hg.python.org/cpython/rev/b1e03d10391e 
msg143575 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-06 00:05
I'm not sure that it is correct to call nl_langinfo(CODESET) to get the locale encoding. The LC_CTYPE locale should maybe be set temporarily to the current locale (""), as locale.getpreferredencoding() does. Or maybe better, locale.getpreferredencoding() should be called.
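At the Python level the two options discussed here look roughly like this. It is a sketch only: the function names are hypothetical, the "utf-8" fallback mirrors the default mentioned earlier in the thread, and the values returned depend on the platform and on whether setlocale() has been called.

```python
import locale

def preferred_encoding():
    """Option 1: let the locale module decide.

    On some systems getpreferredencoding() temporarily calls setlocale()
    to read the user's preference.
    """
    return locale.getpreferredencoding()

def codeset_or_default(default='utf-8'):
    """Option 2: query nl_langinfo(CODESET) directly, if the platform has it."""
    nl_langinfo = getattr(locale, 'nl_langinfo', None)
    codeset = getattr(locale, 'CODESET', None)
    if nl_langinfo is None or codeset is None:
        # nl_langinfo() is not available everywhere (for example on Windows).
        return default
    # Without a prior setlocale(LC_CTYPE, ""), this reports the "C" locale.
    return nl_langinfo(codeset) or default
```

The comment in codeset_or_default is the crux of the concern: the second function can report the "C" locale's codeset rather than the user's, unless the locale has been initialised first.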
msg143576 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-06 00:06
> The LC_CTYPE locale should maybe be set temporary to
> the current locale (""), as does locale.getpreferredencoding().
See also issue #6203.
msg143589 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-06 08:08
New changeset 786668a4fb6b by Victor Stinner in branch 'default':
Issue #12567: Fix curses.unget_wch() tests
http://hg.python.org/cpython/rev/786668a4fb6b 
msg148361 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-25 21:08
New changeset c3581ca21a57 by Victor Stinner in branch 'default':
Issue #12567: The curses module uses Unicode functions for Unicode arguments
http://hg.python.org/cpython/rev/c3581ca21a57 
msg148365 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 22:38
This broke several Gentoo buildbots.
msg148429 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-26 23:21
New changeset 919259054621 by Victor Stinner in branch 'default':
Issue #13415: Help to locate curses.h when _curses module is linked to ncursesw
http://hg.python.org/cpython/rev/919259054621
(Oops, wrong issue number, again)
msg148430 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-26 23:26
> This broke several Gentoo buildbots.
setup.py is unable to correctly locate curses.h. I added a hack to always search in /usr/include/ncursesw/. The hack is needed on Ubuntu 11.10 if you only have libncursesw5-dev but not libncursesw-dev, for example.
msg148452 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-11-27 15:04
I am still concerned about the compilation warning in OpenIndiana buildbots :-(
msg148468 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-28 06:31
Compile output on OpenSolaris:
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/readline.so
collect2: ld returned 1 exit status
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:279: error: expected declaration specifiers or '...' before 'cchar_t'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_ConvertToCchar_t':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: 'wch' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: (Each undeclared identifier is reported only once
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:298: error: for each function it appears in.)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddCh':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: 'cchar_t' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:584: error: expected ';' before 'wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: 'wch' undeclared (first use in this function)
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:618: error: too many arguments to function 'PyCurses_ConvertToCchar_t'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:623: warning: implicit declaration of function 'mvwadd_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:625: warning: implicit declaration of function 'wadd_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:702: warning: implicit declaration of function 'mvwaddwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:704: warning: implicit declaration of function 'waddwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_AddNStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:779: warning: implicit declaration of function 'mvwaddnwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:781: warning: implicit declaration of function 'waddnwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_Get_WCh':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1187: warning: implicit declaration of function 'wget_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1194: warning: implicit declaration of function 'mvwget_wch'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1468: warning: implicit declaration of function 'mvwins_wstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1470: warning: implicit declaration of function 'wins_wstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCursesWindow_InsNStr':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1546: warning: implicit declaration of function 'mvwins_nwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:1548: warning: implicit declaration of function 'wins_nwstr'
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c: In function 'PyCurses_Unget_Wch':
/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Modules/_cursesmodule.c:3130: warning: implicit declaration of function 'unget_wch'
ld: fatal: file /usr/local/lib/libpanelw.so: wrong ELF class: ELFCLASS32
ld: fatal: file /usr/local/lib/libncursesw.so: wrong ELF class: ELFCLASS32
ld: fatal: file processing errors. No output written to build/lib.solaris-2.11-i86pc-3.3-pydebug/_curses_panel.so
collect2: ld returned 1 exit status
msg148469 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-28 06:33
New changeset bf51e32b2a81 by Victor Stinner in branch 'default':
Issue #13415: test_curses skips unencodable characters
http://hg.python.org/cpython/rev/bf51e32b2a81
(Oops, I copy-pasted the issue number from my previous commit, and the issue number was wrong...)
msg149008 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 00:53
> I am still concerned about the compilation warning in OpenIndiana buildbots :-(
I'm unable to reproduce the issue in my OpenIndiana VM: the compilation of the _curses module fails, not because of Unicode, but because the mvwchgat() function is missing => see issue #3786. I don't know how to install ncursesw on OpenIndiana; I didn't find an official package using pkg search.
curses issues on OpenIndiana are serious enough to have their own issue: I opened the issue #13552.
msg149012 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-08 01:07
The code has been committed. The remaining task is to fix OpenIndiana issues: see #13552.
msg149110 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-09 17:16
Victor, I have these notes that I wrote down when I set up the OpenIndiana buildbots. Maybe they can be useful to you: (compiling from source)
"""
 * ncurses 5.7: Standard installation "./configure --with-shared --without-normal --enable-widec --without-cxx-binding". The curses that comes with OpenIndiana lacks a couple of functions: "mvwchgat" and "wchgat".
"""
I installed ncurses because of the lack of "mvwchgat" and "wchgat".
When compiling Python, I add export "CFLAGS=-I/usr/local/include/ncursesw" to help it find the right lib.
Hope this is useful.
msg149111 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-09 17:31
> I wrote down when I set up the OpenIndiana buildbots
Hum, please use issue #13552 for curses issues on OpenIndiana/Solaris.
> ... of functions: "mvwchgat" and "wchgat"
See issues #3786 and #13552 for this problem.
> I installed ncurses ... I add export "CFLAGS=-I/usr/local/include/ncursesw"
The curses module is compiled by setup.py, not by the Makefile. It looks like setup.py ignores CFLAGS. I don't know if setup.py permits specifying such an option.
msg154477 - Author: Nicholas Cole (Nicholas.Cole) Date: 2012-02-27 13:04
It looks to me as if the documentation in the release candidates for 2.7.3 and 3.2.3 hasn't been updated to include the new curses fixes. Is that correct?
msg154478 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-27 13:13
Yes, it was only fixed for 3.3.
msg157627 - Author: Nicholas Cole (Nicholas.Cole) Date: 2012-04-05 21:45
Testing the Python3.3a2 build on OS X - the exception 
AttributeError: '_curses.curses window' object has no attribute 'get_wch'
is still being raised. I don't have a Linux build I can easily test with. Is this a particular problem with the OS X build?
msg157628 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2012-04-05 21:48
> AttributeError: '_curses.curses window' object has no attribute 'get_wch'
> is still being raised.
"still"? Did it work before my last changes?
Unicode functions of the (n)curses library are only available if the Python curses module is linked to libncursesw.
Is libncursesw available? Is libreadline linked to libncurses or libncursesw? If libreadline is linked to libncurses, the Python curses module is also linked to libncurses.
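A quick way to see which situation a given build is in is to probe for one of the wide character functions. The sketch below assumes, as this message implies, that curses.unget_wch() is only compiled in when the module is linked to libncursesw; the helper name curses_has_wide_support is hypothetical.

```python
def curses_has_wide_support():
    """Best-effort check for the wide character curses functions.

    Returns False when the curses module is missing altogether or was
    built without the wide character API.
    """
    try:
        import curses
    except ImportError:
        return False
    return hasattr(curses, 'unget_wch')
```

If this returns False, window.get_wch() will not be available either, which matches the AttributeError quoted above.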
msg157636 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-04-06 02:28
Nicholas, please open a new issue documenting which Python 3.3 you are using, from which python.org installer or the ./configure parameters if you built it yourself (and whether you supplied a version of GNU readline or used the Apple default of BSD libedit) and an example of how to reproduce the error. Please don't add to closed issues. Note also there is a known open issue with the 32-bit-only OS X installer for 3.3 where the _curses module does not build (Issue14225) with an older version of GNU readline.
msg163306 - Author: Roundup Robot (python-dev) (Python triager) Date: 2012-06-21 06:48
New changeset 2035c5ad4239 by Ned Deily in branch 'default':
Issue #14225: Fix Unicode support for curses (#12567) on OS X:
http://hg.python.org/cpython/rev/2035c5ad4239 
msg163308 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-21 07:10
It turns out that the Unicode support introduced by this issue didn't build correctly on OS X, either silently failing to build (explaining the problem seen by Nicholas) or causing a compile error (as seen in Issue14225). This should be working OK (as of 3.3.0b1).
BTW, a test of the wide char functions would be nice and might have caught this.
msg163362 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2012-06-21 20:13
See also Issue15037 which documents a broken curses.unget_wch and, hence, test_curses when Python is built with ncurses 5.7 or earlier.
History
Date | User | Action | Args
2022-04-11 14:57:19 | admin | set | github: 56776
2012-06-21 20:13:09 | ned.deily | set | messages: + msg163362
2012-06-21 07:10:52 | ned.deily | set | messages: + msg163308
2012-06-21 06:48:40 | python-dev | set | messages: + msg163306
2012-04-06 02:28:27 | ned.deily | set | nosy: + ned.deily; messages: + msg157636
2012-04-05 21:48:23 | vstinner | set | messages: + msg157628
2012-04-05 21:45:44 | Nicholas.Cole | set | messages: + msg157627
2012-02-27 13:13:24 | eric.araujo | set | nosy: + eric.araujo; messages: + msg154478; stage: resolved
2012-02-27 13:04:57 | Nicholas.Cole | set | messages: + msg154477
2011-12-09 17:31:45 | vstinner | set | messages: + msg149111
2011-12-09 17:16:19 | jcea | set | messages: + msg149110
2011-12-08 01:07:21 | vstinner | set | status: open -> closed; resolution: fixed
2011-12-08 01:07:13 | vstinner | set | messages: + msg149012
2011-12-08 00:53:27 | vstinner | set | messages: + msg149008
2011-11-28 06:33:47 | vstinner | set | messages: + msg148469
2011-11-28 06:31:05 | vstinner | set | messages: + msg148468
2011-11-27 15:04:29 | jcea | set | messages: + msg148452
2011-11-26 23:26:08 | vstinner | set | messages: + msg148430
2011-11-26 23:21:19 | vstinner | set | messages: + msg148429
2011-11-25 22:38:22 | pitrou | set | nosy: + pitrou; messages: + msg148365
2011-11-25 21:08:33 | python-dev | set | messages: + msg148361
2011-11-07 13:01:33 | john.feuerstein | set | nosy: + john.feuerstein
2011-10-28 08:17:40 | petri.lehtinen | set | nosy: + petri.lehtinen
2011-09-09 19:57:35 | jcea | set | nosy: + jcea
2011-09-06 08:08:41 | python-dev | set | messages: + msg143589
2011-09-06 00:06:34 | vstinner | set | messages: + msg143576
2011-09-06 00:05:08 | vstinner | set | messages: + msg143575
2011-09-05 23:53:32 | python-dev | set | messages: + msg143574
2011-08-17 18:35:00 | vstinner | set | messages: + msg142289
2011-08-17 15:49:38 | Nicholas.Cole | set | messages: + msg142283
2011-08-08 11:22:09 | vstinner | set | messages: + msg141771
2011-07-31 13:19:27 | nadeem.vawda | set | messages: + msg141466
2011-07-31 13:18:34 | nadeem.vawda | set | nosy: + nadeem.vawda; messages: + msg141465
2011-07-31 13:06:29 | python-dev | set | messages: + msg141462
2011-07-19 00:28:16 | vstinner | set | messages: + msg140639
2011-07-19 00:26:28 | vstinner | set | messages: + msg140638
2011-07-19 00:19:51 | vstinner | set | files: + curses_unicode.patch; messages: + msg140637
2011-07-15 14:39:08 | vstinner | set | messages: + msg140411
2011-07-15 13:56:47 | Nicholas.Cole | set | messages: + msg140406
2011-07-15 13:33:55 | vstinner | set | messages: + msg140405
2011-07-15 04:21:16 | Arfrever | set | nosy: + Arfrever
2011-07-14 23:09:33 | vstinner | set | files: + getkey.patch; keywords: + patch; messages: + msg140379
2011-07-14 22:33:36 | vstinner | create |