Issue 9242: unicodeobject.c: use of uninitialized values

This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/53488

classification

Title:	unicodeobject.c: use of uninitialized values
Type:	behavior
Stage:	resolved
Components:
Versions:	Python 3.2, Python 2.7

process

Dependencies:
Superseder:
Status:	closed
Resolution:	fixed
Assigned To:	serhiy.storchaka
Nosy List:	Arfrever, lemburg, serhiy.storchaka, skrah, vstinner
Priority:	normal
Keywords:	patch

Created on 2010-07-13 09:24 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
uninitialized.py	skrah, 2010-07-16 11:10
issue9242.patch	serhiy.storchaka, 2013-01-06 20:27

Messages (8)
msg110165	Author: Stefan Krah (skrah) * (Python committer)	Date: 2010-07-13 09:24
Not sure if this is valid or if there is some internal trickery that Valgrind isn't aware of. If it's the latter, perhaps an entry to Misc/valgrind.supp could be added.

test_bug1175396 (__main__.UTF32Test) ...
==26861== Conditional jump or move depends on uninitialised value(s)
==26861==    at 0x48A2DD: PyUnicodeUCS2_DecodeUTF32Stateful (unicodeobject.c:2282)
==26861==    by 0x50E25C: utf_32_le_decode (_codecsmodule.c:420)
==26861==    by 0x52E727: PyCFunction_Call (methodobject.c:81)
==26861==    by 0x4B4EB7: call_function (ceval.c:4012)
==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
==26861==    by 0x4B1402: PyEval_EvalFrameEx (ceval.c:2665)
==26861==    by 0x4B31DA: PyEval_EvalCodeEx (ceval.c:3252)
==26861==    by 0x4B52C0: fast_function (ceval.c:4108)
==26861==    by 0x4B4FE1: call_function (ceval.c:4033)
==26861==
==26861==
==26861== ---- Attach to debugger ? --- [Return/N/n/Y/y/C/c] ---- y
==26861== starting debugger with cmd: /usr/bin/gdb -nw /proc/26862/fd/1014 26862
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...
Attaching to program: /proc/26862/fd/1014, process 26862
0x000000000048a2dd in PyUnicodeUCS2_DecodeUTF32Stateful (s=0x5bd6a54 "\n", size=1, errors=0x5add144 "strict", byteorder=0x7feffb18c, consumed=0x7feffb170) at Objects/unicodeobject.c:2282
2282	            if (qq[iorder[2]] != 0 || qq[iorder[3]] != 0)
(gdb) p s
$1 = 0x5bd6a54 "\n"
(gdb) p qq
$2 = (const unsigned char *) 0x5bd6a54 "\n"
(gdb) p iorder[2]
$3 = 2
(gdb) p iorder[3]
$4 = 3
(gdb) p bo
$5 = -1
(gdb) p *byteorder
$6 = -1
msg110168	Author: Marc-Andre Lemburg (lemburg) * (Python committer)	Date: 2010-07-13 10:45
Could you check whether the report goes away when using the following definition of iorder in that function:

#ifdef BYTEORDER_IS_LITTLE_ENDIAN
    const int iorder[4] = {0, 1, 2, 3};
#else
    const int iorder[4] = {3, 2, 1, 0};
#endif

I guess that the missing dimension is causing the Valgrind report. From a code perspective, everything is in order; in fact, I'd expect the compiler to optimize the array away.

It may even be a good idea to replace the array with symbols:

#ifdef BYTEORDER_IS_LITTLE_ENDIAN
# define BYTEORDER_0 0
# define BYTEORDER_1 1
# define BYTEORDER_2 2
# define BYTEORDER_3 3
#else
# define BYTEORDER_0 3
# define BYTEORDER_1 2
# define BYTEORDER_2 1
# define BYTEORDER_3 0
#endif
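The role of the iorder offsets can be illustrated with a small Python model. This is only a sketch: the names IORDER_LE, IORDER_BE and assemble_code_point are hypothetical and not part of CPython; only the offset tables {0, 1, 2, 3} and {3, 2, 1, 0} come from the message above. It shows that the table decides which input byte ends up in which position of the 32-bit value, and that all four offsets of a unit are read.

```python
# Hypothetical Python model of the iorder offset tables; not CPython code.
IORDER_LE = (0, 1, 2, 3)  # offsets used for little-endian input
IORDER_BE = (3, 2, 1, 0)  # offsets used for big-endian input


def assemble_code_point(unit, iorder):
    """Combine one 4-byte UTF-32 unit into an integer via the offset table."""
    if len(unit) != 4:
        raise ValueError("a UTF-32 code unit is exactly 4 bytes")
    return (
        (unit[iorder[3]] << 24)
        | (unit[iorder[2]] << 16)
        | (unit[iorder[1]] << 8)
        | unit[iorder[0]]
    )


newline_le = "\n".encode("utf-32-le")  # the newline as four little-endian bytes
newline_be = "\n".encode("utf-32-be")  # the same character, big-endian
```

With the matching table both encodings give code point 10; with the wrong table the low byte lands in the top position of the value.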
msg110171	Author: Stefan Krah (skrah) * (Python committer)	Date: 2010-07-13 11:40
> const int iorder[4] = {0, 1, 2, 3};

const isn't possible; iorder is modified later on. Adding the array dimension did not change anything. Making everything const (see below) did not change anything either. I presume that Valgrind regards qq[2] or qq[3] as uninitialized.

Index: Objects/unicodeobject.c
===================================================================
--- Objects/unicodeobject.c	(revision 82816)
+++ Objects/unicodeobject.c	(working copy)
@@ -2216,10 +2216,12 @@
     int bo = 0;       /* assume native ordering by default */
     const char *errmsg = "";
     /* Offsets from q for retrieving bytes in the right order. */
+    const int iorder_le[] = {0, 1, 2, 3};
+    const int iorder_be[] = {3, 2, 1, 0};
 #ifdef BYTEORDER_IS_LITTLE_ENDIAN
-    int iorder[] = {0, 1, 2, 3};
+    const int *iorder = iorder_le;
 #else
-    int iorder[] = {3, 2, 1, 0};
+    const int *iorder = iorder_be;
 #endif
     PyObject *errorHandler = NULL;
     PyObject *exc = NULL;
@@ -2262,17 +2264,11 @@
     if (bo == -1) {
         /* force LE */
-        iorder[0] = 0;
-        iorder[1] = 1;
-        iorder[2] = 2;
-        iorder[3] = 3;
+        iorder = iorder_le;
     }
     else if (bo == 1) {
         /* force BE */
-        iorder[0] = 3;
-        iorder[1] = 2;
-        iorder[2] = 1;
-        iorder[3] = 0;
+        iorder = iorder_be;
     }
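The presumption that qq[2] or qq[3] lie outside the initialized input can be modelled in Python. This is a sketch under assumptions, not the CPython code or the eventual patch: it assumes the check at line 2282 runs inside a loop that walks the input four bytes at a time, and the function names scan_unbounded and scan_bounded are hypothetical. Where C would silently read past the end of a 1-byte buffer, Python raises IndexError, which makes the problem visible.

```python
IORDER_LE = (0, 1, 2, 3)  # little-endian offsets, as in the thread


def scan_unbounded(data, iorder=IORDER_LE):
    """Assumed model of the reported access: loop while the start is in range.

    For a 1-byte input the indices start+2 and start+3 are out of range.
    """
    hits = 0
    start = 0
    while start < len(data):
        if data[start + iorder[2]] != 0 or data[start + iorder[3]] != 0:
            hits += 1
        start += 4
    return hits


def scan_bounded(data, iorder=IORDER_LE):
    """Same check, but only complete 4-byte units are visited."""
    hits = 0
    start = 0
    while len(data) - start >= 4:
        if data[start + iorder[2]] != 0 or data[start + iorder[3]] != 0:
            hits += 1
        start += 4
    return hits


try:
    scan_unbounded(b"\n")  # size=1, as in the gdb session
    out_of_range = False
except IndexError:
    out_of_range = True
```

The bounded variant simply ignores a trailing partial unit, which leaves it to later code to treat as incomplete or truncated input.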
msg110427	Author: Stefan Krah (skrah) * (Python committer)	Date: 2010-07-16 11:10
Here is a minimal example of how to reproduce this issue, extracted from UTF32LETest.

valgrind --db-attach=yes --suppressions=Misc/valgrind-python.supp ./python uninitialized.py

It seems that in Lib/codecs.py the equivalent of "\x00".decode('utf-32-le') is called (line 477). This leads to:

PyUnicodeUCS2_DecodeUTF32Stateful (s=0x5b0fc0c "", size=1, errors=0x5add144 "strict", byteorder=0x7fefff39c, consumed=0x0)

So we have a string of size 1, but s[2] and s[3] will be accessed in the function.
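The expected handling of such a one-byte input can also be probed from Python without Valgrind. A minimal sketch, assuming Python 3, where the Python 2 expression "\x00".decode('utf-32-le') corresponds to decoding b"\x00"; codecs.utf_32_le_decode is the Python-level entry point to the utf_32_le_decode function seen in the traceback, and the variable names are illustrative only. It does not detect the invalid read itself; it only shows what the decoder reports for an incomplete unit.

```python
import codecs

one_byte = b"\x00"  # a single byte, shorter than one 4-byte UTF-32 unit

# Non-final call: the incomplete unit is left undecoded and unconsumed,
# so a stream reader can retry once more bytes have arrived.
text, consumed = codecs.utf_32_le_decode(one_byte, "strict", False)

# Final call: no more data will come, so the truncated input is an error.
try:
    codecs.utf_32_le_decode(one_byte, "strict", True)
    truncated_reason = None
except UnicodeDecodeError as exc:
    truncated_reason = exc.reason
```

Here text is the empty string and consumed is 0, while the final call raises UnicodeDecodeError.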
msg179219	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-01-06 20:27
3.2 is affected too. The PEP 393 changes fix this bug in 3.3+. Here is a simple patch.
msg179371	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-01-08 20:51
I forgot to mention the issue number in the commit messages. See changeset 3570e04f4ea9 and changeset bf347198fbaf. Is the issue fixed now?
msg179372	Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)	Date: 2013-01-08 20:51
changeset 3570e04f4ea9 and changeset bf347198fbaf
msg179376	Author: Stefan Krah (skrah) * (Python committer)	Date: 2013-01-08 22:01
The utf_32_le_decode invalid access is gone; for Valgrind the issue is fixed (I didn't look at the patch, no time ATM, sorry).

History
Date	User	Action	Args
2022-04-11 14:57:03	admin	set	github: 53488
2013-01-08 22:22:10	serhiy.storchaka	set	status: open -> closed; resolution: fixed; stage: patch review -> resolved
2013-01-08 22:01:25	skrah	set	messages: + msg179376
2013-01-08 20:51:57	serhiy.storchaka	set	messages: + msg179372
2013-01-08 20:51:21	serhiy.storchaka	set	messages: + msg179371
2013-01-06 20:27:04	serhiy.storchaka	set	files: + issue9242.patch; assignee: serhiy.storchaka; versions: + Python 3.2; keywords: + patch; nosy: + serhiy.storchaka; messages: + msg179219; stage: patch review
2013-01-04 23:40:25	Arfrever	set	nosy: + Arfrever
2011-07-07 10:44:13	vstinner	set	nosy: + vstinner
2010-07-16 11:10:59	skrah	set	files: + uninitialized.py; messages: + msg110427
2010-07-13 11:40:08	skrah	set	messages: + msg110171
2010-07-13 10:45:45	lemburg	set	messages: + msg110168
2010-07-13 09:24:47	skrah	create