This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Tkinter: handle the null character
Type: behavior
Stage: resolved
Components: Extension Modules, Tkinter
Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: gpolo, kbk, loewis, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2014-01-23 15:15 by serhiy.storchaka, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
tkinter_null_character.patch serhiy.storchaka, 2014-01-23 15:15
tkinter_null_character_2.patch serhiy.storchaka, 2014-02-03 10:08
Messages (7)
msg208954 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-23 15:15
Tcl/Tk uses a modified UTF-8 encoding to represent strings as C strings (char*). Because C strings are NUL-terminated, the null character is represented as the illegal (overlong) UTF-8 sequence \xc0\x80.
The current Tkinter code is not fully aware of this. It has special handling for the "\xc0\x80" string (i.e. a single encoded null character) in one place, but doesn't handle an encoded null character contained in a larger string. As a result, Tkinter may truncate strings containing the null character, or return a wrong result.
The proposed patch fixes many issues with the null character (when converting from Tcl to Python strings). NUL is still forbidden in string arguments of many methods.
The patch also enhances error handling for variable-related commands.
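To make the encoding described above concrete, here is a minimal Python sketch. It is not the code from the patch: the helper names to_tcl_utf8, from_tcl_utf8 and c_string_view are hypothetical, and c_string_view only imitates how a NUL-terminated C API would read a buffer.

```python
TCL_NUL = b'\xc0\x80'  # overlong two-byte form Tcl uses for U+0000


def to_tcl_utf8(text):
    """Encode a str the way Tcl stores it: no raw 0x00 bytes (hypothetical helper)."""
    return text.encode('utf-8').replace(b'\x00', TCL_NUL)


def from_tcl_utf8(data):
    """Decode modified UTF-8, mapping the overlong NUL back first (hypothetical helper)."""
    # 0xC0 never occurs in valid UTF-8, so this replace cannot hit real text.
    return data.replace(TCL_NUL, b'\x00').decode('utf-8')


def c_string_view(data):
    """Imitate a NUL-terminated C API: everything before the first 0x00 byte."""
    return data.split(b'\x00', 1)[0]


encoded = to_tcl_utf8('a\x00b')
assert encoded == b'a\xc0\x80b'
assert c_string_view(encoded) == encoded   # survives intact as a C string
assert c_string_view(b'a\x00b') == b'a'    # plain UTF-8 would be truncated
assert from_tcl_utf8(encoded) == 'a\x00b'  # round trip restores the NUL
```

A strict decoder rejects the overlong form, so b'a\xc0\x80b'.decode('utf-8') raises UnicodeDecodeError. That is why the conversion from Tcl to Python needs the extra replacement step.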
msg210012 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-02 20:44
If there are no objections I'll commit this patch tomorrow.
msg210075 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-03 02:21
The core of the patch is a wrapper that traps UnicodeDecodeErrors, corrects the strings, and re-decodes. A Python version might look like:
def unicodeFromTclStringAndSize(s, size):
    # Python stand-in for the C helper; s is a bytes object received from Tcl.
    s = s[:size]
    try:
        # Corresponds to PyUnicode_DecodeUTF8(s, size, NULL) in the C code.
        return s.decode('utf-8')
    except UnicodeDecodeError:
        if b'\xc0\x80' in s:
            # bytes.replace() returns a new object, so the result must be rebound.
            s = s.replace(b'\xc0\x80', b'\x00')
            return s.decode('utf-8')
        else:
            raise
This is used in a couple of additional wrappers and all direct decode calls are replaced with wrappers. New tests are added. Overall, a great idea, and I want to see this patch in 3.4. But, how many of the replacement sites are exercised by the tests?
There are a few changes that seem unrelated to nulls, which might have been left for another patch. Example:
-#if TCL_UTF_MAX==3
 return PyUnicode_FromKindAndData(
- PyUnicode_2BYTE_KIND, Tcl_GetUnicode(value),
+ sizeof(Tcl_UniChar), Tcl_GetUnicode(value),
 Tcl_GetCharLength(value));
-#else
- return PyUnicode_FromKindAndData(
- PyUnicode_4BYTE_KIND, Tcl_GetUnicode(value),
- Tcl_GetCharLength(value));
-#endif
Do you know if this code block is tested?
msg210106 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-03 10:08
> But, how many of the replacement sites are exercised by the tests?
I added tests for most of the replacement sites, and the updated patch has even more tests.
split() and splitlist() -- tested. Unfortunately they are tested only with bytes arguments, because these methods reject a unicode string argument containing NUL.
Tcl_Obj.string, Tcl_Obj.typename and Tcl_Obj.__str__() -- not tested. There are no explicit tests for these properties and methods. It seems that Tcl_Obj.typename can't be tested for NUL.
eval(), evalfile() -- tested.
Variable's methods -- tested.
exprstring() -- tested. I added tests for exprstring(), exprdouble(), exprlong(), exprboolean() in the patch.
record() -- not tested. There are no explicit tests for record() and I have no idea how it can be used in Python.
C functions:
FromObj() and Tkapp_CallResult() -- implicitly tested in a lot of tests, in particular in test_passing_values and test_user_command.
PythonCmd() -- tested in test_user_command.
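The kind of check listed above for eval() can be sketched against a live Tcl interpreter. This is only an illustration, not one of the tests from the patch: the helper name null_round_trip is hypothetical, and it assumes Python 3 with a working tkinter/Tcl installation and a build that includes the fix (it returns None when Tcl is unavailable).

```python
def null_round_trip():
    """Return Tcl's result for a string with an embedded NUL, or None without Tcl."""
    try:
        import tkinter
    except ImportError:
        return None  # assumption: tkinter may be missing on this build
    try:
        tcl = tkinter.Tcl()  # Tcl interpreter only; no Tk window is created
    except tkinter.TclError:
        return None
    # Tcl receives: set a "a\0b"  and its backslash substitution turns \0 into U+0000.
    return tcl.eval('set a "a\\0b"')


result = null_round_trip()
if result is not None:
    # With the fix, the embedded null character is neither dropped nor truncating.
    assert result == 'a\x00b'
```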
> There are a few changes that seem unrelated to nulls, which might have been left for another patch.
They just make the code more robust. For example, Tcl can be compiled with TCL_UTF_MAX=6. In this case Python will work correctly most of the time but can work incorrectly or crash on specific rare data. With the proposed changes it will raise SystemError early. Yes, it is worth a separate issue.
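The "raise SystemError early" behaviour can be illustrated with a rough Python model of PyUnicode_FromKindAndData, which accepts only 1, 2 or 4 as the kind (bytes per character) and raises SystemError for anything else. The function from_kind_and_data below is a hypothetical stand-in written for this note, not the C API itself, and it says nothing about which width a particular Tcl build actually uses.

```python
import struct

# Bytes per code unit -> struct format character (standard sizes with '=').
_UNIT_FORMATS = {1: 'B', 2: 'H', 4: 'I'}


def from_kind_and_data(kind, data, length):
    """Build a str from `length` native-order code units of `kind` bytes each."""
    fmt = _UNIT_FORMATS.get(kind)
    if fmt is None:
        # Mirrors the C function: an unexpected character width fails at once
        # instead of silently misreading the buffer.
        raise SystemError('invalid kind')
    units = struct.unpack('=%d%s' % (length, fmt), data[:length * kind])
    return ''.join(chr(u) for u in units)


two_byte = struct.pack('=3H', 0x61, 0x20AC, 0x62)
assert from_kind_and_data(2, two_byte, 3) == 'a\u20acb'

try:
    from_kind_and_data(3, b'\x00' * 9, 3)
except SystemError:
    pass  # rejected up front, as described above
```

Passing the size of the character type as the kind, as the diff does with sizeof(Tcl_UniChar), means a width the code was not written for is reported immediately rather than producing wrong strings later.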
> Do you know if this code block is tested.
It is implicitly tested in many tests that exercise non-ASCII strings.
msg210118 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-03 12:14
With the additional tests, it seems reasonable to apply.
msg210155 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-03 19:39
New changeset a6ba6db9edb4 by Serhiy Storchaka in branch '2.7':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/a6ba6db9edb4
New changeset 825c8db8b1e2 by Serhiy Storchaka in branch '3.3':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/825c8db8b1e2
New changeset 28ec384e7dcc by Serhiy Storchaka in branch 'default':
Issue #20368: Add tests for Tkinter methods exprstring(), exprdouble(),
http://hg.python.org/cpython/rev/28ec384e7dcc
New changeset 65c29c07bb31 by Serhiy Storchaka in branch '2.7':
Issue #20368: The null character now correctly passed from Tcl to Python (in
http://hg.python.org/cpython/rev/65c29c07bb31
New changeset 08e3343f01a5 by Serhiy Storchaka in branch '3.3':
Issue #20368: The null character now correctly passed from Tcl to Python.
http://hg.python.org/cpython/rev/08e3343f01a5
New changeset 321b714653e3 by Serhiy Storchaka in branch 'default':
Issue #20368: The null character now correctly passed from Tcl to Python.
http://hg.python.org/cpython/rev/321b714653e3 
msg210278 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-02-04 23:31
New changeset d83ce3a2d954 by Christian Heimes in branch '3.3':
Issue #20515: Fix NULL pointer dereference introduced by issue #20368
http://hg.python.org/cpython/rev/d83ce3a2d954
New changeset 145032f626d3 by Christian Heimes in branch 'default':
Issue #20515: Fix NULL pointer dereference introduced by issue #20368
http://hg.python.org/cpython/rev/145032f626d3 
History
Date User Action Args
2022-04-11 14:57:57 admin set github: 64567
2014-02-04 23:31:42 python-dev set messages: + msg210278
2014-02-03 21:49:03 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2014-02-03 19:39:30 python-dev set nosy: + python-dev; messages: + msg210155
2014-02-03 12:14:21 terry.reedy set messages: + msg210118
2014-02-03 10:08:28 serhiy.storchaka set files: + tkinter_null_character_2.patch; messages: + msg210106
2014-02-03 02:21:06 terry.reedy set messages: + msg210075
2014-02-02 20:44:03 serhiy.storchaka set assignee: serhiy.storchaka; messages: + msg210012
2014-01-23 15:15:02 serhiy.storchaka create
