classification
Title: Initialization of globals in unicodeobject.c
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, Gregory.Andersen, amaury.forgeotdarc, franck, georg.brandl, lemburg, ncoghlan, pitrou, python-dev, serhiy.storchaka, skrah, stutzbach, vstinner
Priority: critical
Keywords: patch

Created on 2010-10-20 17:27 by skrah, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
unicode_init_globals.patch | skrah, 2010-10-22 14:39 |
unicode_init_globals2.patch | skrah, 2010-10-24 10:21 |
unicode-leak.patch | skrah, 2011-04-11 07:31 | patch by Daniel Stutzbach
unicode_globals-2.7.patch | serhiy.storchaka, 2013-01-07 11:28 |
unicode_globals-3.2.patch | serhiy.storchaka, 2013-01-07 11:28 |
unicode_globals-3.3.patch | serhiy.storchaka, 2013-01-07 11:28 |
unicode_globals-3.4.patch | serhiy.storchaka, 2013-01-07 11:28 |
unicode_globals-2.7_2.patch | serhiy.storchaka, 2013-01-24 20:21 |
unicode_globals-3.2_2.patch | serhiy.storchaka, 2013-01-24 20:21 |
unicode_globals-3.3_2.patch | serhiy.storchaka, 2013-01-24 20:21 |
unicode_globals-3.4_2.patch | serhiy.storchaka, 2013-01-24 20:21 |
unicode_globals-2.7_3.patch | serhiy.storchaka, 2013-01-25 19:51 |
unicode_globals-3.2_3.patch | serhiy.storchaka, 2013-01-25 19:51 |
unicode_globals-3.3_3.patch | serhiy.storchaka, 2013-01-25 19:51 |
unicode_globals-3.4_3.patch | serhiy.storchaka, 2013-01-25 19:51 |
Messages (29)
msg119226 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 17:27
This is one of two remaining "definitely lost" leaks in py3k. It started
to appear in r70459. How to reproduce:
make distclean && ./configure OPT="-O0 -g" --without-pymalloc && make
valgrind --leak-check=full --suppressions=Misc/valgrind-python.supp ./python > VGOUT 2>&1
Then search for 'definitely'. This leak is not present in release-2.7.
==2058== 56 bytes in 1 blocks are definitely lost in loss record 918 of 2,136
==2058== at 0x4C2412C: malloc (vg_replace_malloc.c:195)
==2058== by 0x4167DE: _PyObject_New (object.c:243)
==2058== by 0x42C278: _PyUnicode_New (unicodeobject.c:341)
==2058== by 0x4306BD: PyUnicodeUCS2_DecodeUTF8Stateful (unicodeobject.c:2100)
==2058== by 0x430671: PyUnicodeUCS2_DecodeUTF8 (unicodeobject.c:2065)
==2058== by 0x42C8F7: PyUnicodeUCS2_FromStringAndSize (unicodeobject.c:541)
==2058== by 0x42C973: PyUnicodeUCS2_FromString (unicodeobject.c:559)
==2058== by 0x50B432: PyDict_SetItemString (dictobject.c:2088)
==2058== by 0x4258DF: PyType_Ready (typeobject.c:3844)
==2058== by 0x517B64: PyStructSequence_InitType (structseq.c:522)
==2058== by 0x4F3B4F: _PyFloat_Init (floatobject.c:1905)
==2058== by 0x4813CE: Py_InitializeEx (pythonrun.c:217)
msg119237 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-10-20 22:10
The stack corresponds to the allocation of type(sys.float_info).__doc__.
Why would only this object appear as a memory leak? It is certainly not deallocated, but all other types are in the same situation.
For example, sys.int_info is very similar, and happens to be defined in r70459. Why doesn't it appear in the report?
msg119238 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 22:24
To add to the mystery, the leak disappears if the key value is not
interned in PyDict_SetItemString:
Index: Objects/dictobject.c
===================================================================
--- Objects/dictobject.c (revision 70459)
+++ Objects/dictobject.c (working copy)
@@ -2088,7 +2088,6 @@
     kv = PyUnicode_FromString(key);
     if (kv == NULL)
         return -1;
-    PyUnicode_InternInPlace(&kv); /* XXX Should we really? */
     err = PyDict_SetItem(v, kv, item);
     Py_DECREF(kv);
     return err;
msg119239 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-20 22:29
Stefan Krah wrote:
> 
> Stefan Krah <stefan-usenet@bytereef.org> added the comment:
> 
> To add to the mystery, the leak disappears if the key value is not
> interned in PyDict_SetItemString:
I'm not sure how you determine what is a leak and what not.
Interned Unicode objects stay alive until the interpreter
is finalized.
Are you suggesting that the finalization does not free the
interned Unicode strings, or not all of them?
> Index: Objects/dictobject.c
> ===================================================================
> --- Objects/dictobject.c (revision 70459)
> +++ Objects/dictobject.c (working copy)
> @@ -2088,7 +2088,6 @@
>      kv = PyUnicode_FromString(key);
>      if (kv == NULL)
>          return -1;
> -    PyUnicode_InternInPlace(&kv); /* XXX Should we really? */
>      err = PyDict_SetItem(v, kv, item);
>      Py_DECREF(kv);
>      return err;
msg119241 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-20 22:47
Marc-Andre Lemburg <report@bugs.python.org> wrote:
> I'm not sure how you determine what is a leak and what not.
> Interned Unicode objects stay alive until the interpreter
> is finalized.
> 
> Are you suggesting that the finalization does not free the
> interned Unicode strings, or not all of them?
No, Valgrind's "definitely lost" category means that all pointers
to an allocated region have been lost, so it would not be possible
to free the area. [1]
There are hundreds of "possibly lost" warnings as well, but I did
not report those.
My experience is that Valgrind is usually correct with "definitely
lost", see e.g. #10153. That said, of course it _could_ be a false
alarm.
[1] Last category from:
 http://mail.python.org/pipermail/python-dev/2002-October/029758.html 
msg119356 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-22 00:03
Re disabling interning in PyDict_SetItemString:
A comment in unicodeobject.c says that globals should not be used
before calling _PyUnicode_Init. But in Py_InitializeEx (pythonrun.c)
_PyUnicode_Init is called after _Py_ReadyTypes, _PyFrame_Init,
_PyLong_Init, PyByteArray_Init and _PyFloat_Init.
In fact, when I move _PyUnicode_Init up, the error concerning
_PyFloat_Init disappears.
Problem is, PyType_Ready also uses PyDict_SetItemString, but I
presume that _Py_ReadyTypes has to be called before anything else.
In that case it would be unavoidable that PyDict_SetItemString is
used before _PyUnicode_Init, and it might be a good idea to disable
interning.
msg119387 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-22 14:39
I've verified the leak manually. The cause is that global variables in
unicodeobject.c, e.g. free_list, are used before _PyUnicode_Init() is
called. Later on _PyUnicode_Init() sets these variables to NULL, losing
the allocated memory.
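The failure mode described here can be reduced to a few lines of C. The sketch below is not the code from unicodeobject.c: `node`, `release` and `late_init_buggy` are hypothetical stand-ins for the cached object type, the deallocation path and `_PyUnicode_Init()`. It only illustrates how an unconditional reset of a free list discards whatever was cached before the initializer ran.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached object; not the real PyUnicodeObject. */
typedef struct node {
    struct node *next;
} node;

static node *free_list = NULL;  /* cache of released objects */
static int numfree = 0;         /* number of objects in the cache */

/* Releasing an object pushes it onto the free list instead of free()ing it. */
void release(node *n)
{
    n->next = free_list;
    free_list = n;
    numfree++;
}

/* Buggy late initializer: it resets the globals unconditionally, so any
   objects cached before this call are no longer reachable from the list. */
void late_init_buggy(void)
{
    free_list = NULL;
    numfree = 0;
}

/* Returns the number of cached objects dropped by the late initializer,
   or -1 if allocation fails. */
int demo_lost_objects(void)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;

    release(n);                     /* cache used before "initialization" */
    int cached_before = numfree;

    late_init_buggy();              /* the list forgets about n */
    assert(free_list == NULL);
    int lost = cached_before - numfree;

    /* The demo keeps its own pointer only so that it can clean up;
       in the reported leak no such pointer existed. */
    free(n);
    return lost;
}
```

Calling `demo_lost_objects()` reports one dropped object. In the situation described in this message, the equivalent of `n` was referenced only from the free list, which is why the block could not be freed afterwards.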
Here is an example of the earliest use of free_list during
_Py_ReadyTypes(), well before _PyUnicode_Init():
Breakpoint 1, unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
392 switch (PyUnicode_CHECK_INTERNED(unicode)) {
(gdb) bt
#0 unicode_dealloc (unicode=0x1b044c0) at Objects/unicodeobject.c:392
#1 0x000000000044fc69 in PyUnicode_InternInPlace (p=0x7fff303852b8) at Objects/unicodeobject.c:9991
#2 0x000000000044fed3 in PyUnicode_InternFromString (cp=0x568861 "__len__") at Objects/unicodeobject.c:10025
#3 0x00000000004344d0 in init_slotdefs () at Objects/typeobject.c:5751
#4 0x0000000000434840 in add_operators (type=0x7be260) at Objects/typeobject.c:5905
#5 0x000000000042eec8 in PyType_Ready (type=0x7be260) at Objects/typeobject.c:3810
#6 0x000000000042edfc in PyType_Ready (type=0x7bde60) at Objects/typeobject.c:3774
#7 0x000000000041aa5f in _Py_ReadyTypes () at Objects/object.c:1514
#8 0x00000000004992ff in Py_InitializeEx (install_sigs=1) at Python/pythonrun.c:232
#9 0x000000000049957f in Py_Initialize () at Python/pythonrun.c:321
#10 0x00000000004b289f in Py_Main (argc=1, argv=0x1afa010) at Modules/main.c:590
#11 0x0000000000417dcc in main (argc=1, argv=0x7fff30385758) at ./Modules/python.c:59
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&
(gdb) 
414 if (unicode->length >= KEEPALIVE_SIZE_LIMIT) {
(gdb) 
419 if (unicode->defenc) {
(gdb) 
423 *(PyUnicodeObject **)unicode = free_list;
(gdb) n
424 free_list = unicode;
(gdb) n
425 numfree++;
(gdb) n
411 if (PyUnicode_CheckExact(unicode) &&
A possible fix could be to initialize the globals right at the start
in main.c. Note that there are still several Unicode API functions in
main.c before PyType_Ready has been called on the Unicode type.
With the patch, Valgrind does not show the leak any longer.
msg119396 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-10-22 18:04
About the patch: why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?
msg119503 - Author: Stefan Krah (skrah) * (Python committer) Date: 2010-10-24 10:21
> why should _PyUnicode_Init() try to call _PyUnicode_InitGlobals() again?
For the embedding scenario (when only Py_Initialize() is called) I wanted
to preserve the old behavior of _PyUnicode_Init().
But this is not really enough. I wrote a new patch that also calls 
_PyUnicode_InitGlobals() at the beginning of Py_Initialize().
I don't like the fact that even more clutter is added to Py_Main(). Perhaps
Py_Initialize() could be moved up or the Unicode functions could be moved
down.
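The patches themselves are not reproduced on this page, so the following is only a sketch of the general idea: an initializer guarded by a flag can be called from more than one entry point (for example a `main`-style entry and an embedding-style entry), and only the first call has any effect. All names below (`init_globals`, `globals_ready`, `cache_node`, `drain_cache`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cached object type. */
typedef struct node {
    struct node *next;
} node;

static node *free_list;
static int numfree;
static int globals_ready = 0;   /* guard flag; static storage starts at 0 */

/* Idempotent initializer: resets the globals on the first call only. */
void init_globals(void)
{
    if (globals_ready)
        return;
    free_list = NULL;
    numfree = 0;
    globals_ready = 1;
}

/* Pushes a released object onto the free list. */
void cache_node(node *n)
{
    n->next = free_list;
    free_list = n;
    numfree++;
}

/* Frees every cached object and returns how many were freed. */
int drain_cache(void)
{
    int freed = 0;
    while (free_list != NULL) {
        node *n = free_list;
        free_list = n->next;
        free(n);
        freed++;
    }
    assert(freed == numfree);
    numfree = 0;
    return freed;
}

/* Simulates an early initialization, an early use of the cache, and a
   second initialization from a later entry point. Returns the number of
   objects that were still cached (and therefore freeable) afterwards,
   or -1 if allocation fails. */
int demo_double_init(void)
{
    init_globals();             /* e.g. at the very start of the program */

    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    cache_node(n);              /* early use of the cache */

    init_globals();             /* later call: a no-op, nothing is lost */
    return drain_cache();
}
```

With the guard in place the second call leaves the cache untouched, so `demo_double_init()` can still free the one object that was cached early.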
msg133502 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-11 07:31
[Merging with issue 11402]
Daniel's patch is much simpler, but I think that unicode_empty and
unicode_latin1 would need to be protected before _PyUnicode_Init
is called.
Is the module initialization procedure documented somewhere? I get
the impression that unicodeobject.c depends on dict.c and dict.c
depends on unicodeobject.c.
msg133504 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-04-11 07:36
Stefan Krah <report@bugs.python.org> wrote:
> Is the module initialization procedure documented somewhere? I get
> the impression that unicodeobject.c depends on dict.c and dict.c
> depends on unicodeobject.c.
s/dict.c/dictobject.c/
msg144749 - Author: Stefan Krah (skrah) * (Python committer) Date: 2011-10-01 20:59
The PEP-393 changes apparently fix this leak; at least I can't reproduce
it in default any longer (but still in 3.2).
msg172103 - Author: Stefan Krah (skrah) * (Python committer) Date: 2012-10-05 17:24
See also #16143.
msg179221 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-06 20:33
Daniel's patch looks good to me. 2.7 looks affected too.
msg179223 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-06 20:43
unicode-leak.patch doesn't fix #16143 though. unicode_empty and
unicode_latin1 need to be initialized, too.
Actually we could close this in favor of #16143.
msg179234 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-06 22:29
> unicode-leak.patch doesn't fix #16143 though. unicode_empty and
> unicode_latin1 need to be initialized, too.
Indeed. I'll upload patches tomorrow.
msg179256 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-07 11:28
Here are patches for all four Python versions. They fix possible use of the
following uninitialized global variables: free_list, numfree, interned,
unicode_empty, static_strings, unicode_latin1, bloom_linebreak,
unicode_default_encoding.
msg179494 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-09 22:54
Nick, I'm adding you to the nosy list since this issue seems related to PEP 432.
Quick summary: Globals are used in unicodeobject.c before they are initialized.
Also, Unicode objects are created before PyType_Ready(&PyUnicode_Type) has been
called.
This happens during startup:
_Py_InitializeEx_Private():

    _Py_ReadyTypes():

        PyType_Ready(&PyType_Type);

        [...]

        Many Unicode objects like "" or "__add__" are created. Uninitialized
        globals have led to a crash (#16143). This is fixed by Serhiy's patch,
        which always dynamically checks all globals for NULL before using them.
        However, Unicode objects are still created at this point.

        [...]

        PyType_Ready(&PyUnicode_Type); /* Called for the first time */

        [...]

    _PyUnicode_Init:

        for (i = 0; i < 256; i++)       /* Could leak if latin1 strings */
            unicode_latin1[i] = NULL;   /* have already been created.   */

        PyType_Ready(&PyUnicode_Type); /* Called a second time! */
So, considering PEP 432: Are these "pre-type-ready" Unicode objects
safe to use, or should something be done about it?
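The "check for NULL before use" approach mentioned in the summary can be pictured as lazy initialization. The sketch below is an illustration only, with hypothetical names (`empty_singleton`, `get_empty`, `late_init`, `finalize`); it is not the code from Serhiy's patch. It relies on the C rule that objects with static storage duration start out zero-initialized, so a pointer global is NULL until first use without any explicit reset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared singleton; NULL until first use because objects
   with static storage duration are zero-initialized. */
static char *empty_singleton;

/* Returns the shared empty string, creating it on first use.
   Returns NULL only if allocation fails. */
const char *get_empty(void)
{
    if (empty_singleton == NULL)
        empty_singleton = calloc(1, 1);     /* one zero byte: "" */
    return empty_singleton;
}

/* Late initializer: it does not reset the global. At most it makes sure
   the singleton exists. Returns 0 on success, -1 on failure. */
int late_init(void)
{
    return get_empty() != NULL ? 0 : -1;
}

/* Finalizer: releases the singleton and restores the initial state. */
void finalize(void)
{
    free(empty_singleton);
    empty_singleton = NULL;
}

/* Returns 1 if the object created before late_init() is the same one
   that is handed out afterwards, 0 if not, -1 on allocation failure. */
int demo_lazy_singleton(void)
{
    const char *before = get_empty();       /* early use */
    if (before == NULL || late_init() != 0)
        return -1;

    const char *after = get_empty();
    int same = (before == after);
    assert(after[0] == '\0');

    finalize();
    return same;
}
```

Because nothing overwrites the pointer after the first use, an object created early stays reachable, and the order in which the initializer and the first user run no longer matters for this global.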
msg179498 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-01-09 23:10
There should still be a check in tp_new (IIRC) that calls PyType_Ready on
unready types.
While doing something systematic about this kind of problem is part of the
rationale of PEP 432, that won't help earlier versions.
msg179504 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-10 00:21
Nick Coghlan <report@bugs.python.org> wrote:
> There should still be a check in tp_new (IIRC) that calls PyType_Ready on
> unready types.
Indeed there is one in type_new(), but that isn't used here AFAICS. If
you apply this patch and start up python, there are many "str: not ready"
instances:
diff --git a/Objects/unicodeobject.c b/Objects/unicodeobject.c
--- a/Objects/unicodeobject.c
+++ b/Objects/unicodeobject.c
@@ -14282,6 +14282,10 @@
 PyUnicode_InternFromString(const char *cp)
 {
     PyObject *s = PyUnicode_FromString(cp);
+
+    fprintf(stderr, "%s: %s\n", PyUnicode_Type.tp_name,
+            (PyUnicode_Type.tp_flags & Py_TPFLAGS_READY) ? "ready" : "not ready");
+
     if (s == NULL)
         return NULL;
     PyUnicode_InternInPlace(&s);
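The difference between a guarded constructor and a path that bypasses it can be modelled in a few lines. This is a simplified model with hypothetical names (`type_desc`, `FLAG_READY`, `type_ready`, `new_instance`, `is_ready`), not CPython's actual `type_new()` or `PyType_Ready()`: the guarded path readies the type on demand, while a direct check made before that point still sees the type as not ready.

```c
#include <assert.h>

#define FLAG_READY (1u << 0)    /* hypothetical analogue of a "ready" flag */

/* Hypothetical, heavily simplified type descriptor. */
typedef struct {
    const char *name;
    unsigned int flags;
    int ready_calls;            /* how many times the setup actually ran */
} type_desc;

/* One-time setup; later calls return immediately. */
int type_ready(type_desc *t)
{
    if (t->flags & FLAG_READY)
        return 0;
    t->ready_calls++;
    t->flags |= FLAG_READY;
    return 0;
}

/* Guarded constructor path: readies the type on demand. */
int new_instance(type_desc *t)
{
    if (!(t->flags & FLAG_READY) && type_ready(t) < 0)
        return -1;
    return 0;
}

/* Direct path that skips the guard: only reports the flag. */
int is_ready(const type_desc *t)
{
    return (t->flags & FLAG_READY) != 0;
}

/* Returns 1 if the type was unready before the guarded path ran and
   ready afterwards, with the setup having run exactly once. */
int demo_ready_check(void)
{
    type_desc str_type = { "str", 0, 0 };

    int ready_before = is_ready(&str_type);  /* direct path: not ready yet */
    if (new_instance(&str_type) != 0)
        return -1;
    type_ready(&str_type);                   /* second call is a no-op */
    assert(str_type.ready_calls == 1);

    return !ready_before && is_ready(&str_type);
}
```

In this model, objects created through a path like `is_ready`'s caller (one that never goes through `new_instance`) exist before the type is ready, which is the situation the diagnostic patch above makes visible.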
msg180546 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-24 20:21
Here is a set of updated patches.
msg180547 - Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-01-24 20:42
Serhiy's general approach here looks good to me (although there seem to be some unrelated changes to the re module in the current 3.2 patch).
For PEP 432, I want to try to rearrange things so that _PyUnicode_Init is one of the *first* calls made in Py_BeginInitialization (even before the general call to Py_ReadyTypes), but that still won't invalidate the work done here.
msg180573 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 12:29
Since Rietveld didn't mail me this time: I left some comments on the 2.7 patch.
msg180579 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 13:40
The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
(without the _sre changes :) looks good to me.
msg180617 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-25 19:51
> The 2.7 comments also apply to the 3.2 patch. Otherwise the 3.2 patch
> (without the _sre changes :) looks good to me.
Patches updated to address Stefan's new comments. Unicode globals are no longer
reinitialized in _PyUnicode_Init(). Note that I have added a consistency check
to the macro in 3.3+.
I hope Rietveld will accept this set.
msg180623 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-25 21:12
Nice. I think the latest patches are commit-ready.
msg180652 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-26 10:21
New changeset 7c8ad0d02664 by Serhiy Storchaka in branch '2.7':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/7c8ad0d02664
New changeset f7eda8165e6f by Serhiy Storchaka in branch '3.2':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/f7eda8165e6f
New changeset 01d4dd412581 by Serhiy Storchaka in branch '3.3':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/01d4dd412581
New changeset cb12d642eed2 by Serhiy Storchaka in branch 'default':
Issue #10156: In the interpreter's initialization phase, unicode globals
http://hg.python.org/cpython/rev/cb12d642eed2 
msg180654 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-26 10:33
Committed. Thank you for the review, Stefan. Close this issue if the work is finished.
msg180721 - Author: Stefan Krah (skrah) * (Python committer) Date: 2013-01-26 23:21
Buildbots etc. look all good. Thanks for fixing this.
History
Date | User | Action | Args
2022-04-11 14:57:07 | admin | set | github: 54365
2013-01-26 23:21:43 | skrah | set | resolution: fixed; stage: patch review -> resolved
2013-01-26 23:21:26 | skrah | set | status: open -> closed
2013-01-26 23:21:19 | skrah | set | messages: + msg180721
2013-01-26 10:33:56 | serhiy.storchaka | set | messages: + msg180654
2013-01-26 10:21:50 | python-dev | set | nosy: + python-dev; messages: + msg180652
2013-01-25 21:12:47 | skrah | set | messages: + msg180623
2013-01-25 19:51:33 | serhiy.storchaka | set | files: + unicode_globals-2.7_3.patch, unicode_globals-3.2_3.patch, unicode_globals-3.3_3.patch, unicode_globals-3.4_3.patch; messages: + msg180617
2013-01-25 13:40:39 | skrah | set | messages: + msg180579
2013-01-25 12:29:15 | skrah | set | messages: + msg180573; versions: + Python 3.3, Python 3.4
2013-01-24 20:42:08 | ncoghlan | set | messages: + msg180547
2013-01-24 20:21:09 | serhiy.storchaka | set | files: + unicode_globals-2.7_2.patch, unicode_globals-3.2_2.patch, unicode_globals-3.3_2.patch, unicode_globals-3.4_2.patch; messages: + msg180546
2013-01-10 00:21:25 | skrah | set | messages: + msg179504
2013-01-09 23:10:52 | ncoghlan | set | messages: + msg179498
2013-01-09 22:54:13 | skrah | set | nosy: + ncoghlan; messages: + msg179494
2013-01-07 23:30:53 | skrah | set | priority: high -> critical
2013-01-07 23:30:38 | skrah | set | nosy: + georg.brandl, pitrou, franck, Gregory.Andersen
2013-01-07 23:29:13 | skrah | link | issue16143 superseder
2013-01-07 11:34:18 | serhiy.storchaka | set | stage: commit review -> patch review
2013-01-07 11:28:43 | serhiy.storchaka | set | files: + unicode_globals-2.7.patch, unicode_globals-3.2.patch, unicode_globals-3.3.patch, unicode_globals-3.4.patch; messages: + msg179256
2013-01-06 22:29:46 | serhiy.storchaka | set | messages: + msg179234
2013-01-06 20:43:48 | skrah | set | messages: + msg179223
2013-01-06 20:33:36 | serhiy.storchaka | set | versions: + Python 2.7, - Python 3.3, Python 3.4; nosy: + serhiy.storchaka; messages: + msg179221; stage: patch review -> commit review
2013-01-04 23:41:18 | Arfrever | set | nosy: + Arfrever
2012-10-05 17:24:52 | skrah | set | messages: + msg172103; versions: + Python 3.3, Python 3.4
2012-04-20 17:58:50 | mark.dickinson | set | nosy: - mark.dickinson
2011-10-01 20:59:19 | skrah | set | messages: + msg144749
2011-04-11 07:36:29 | skrah | set | messages: + msg133504
2011-04-11 07:32:27 | skrah | link | issue11402 superseder
2011-04-11 07:31:07 | skrah | set | files: + unicode-leak.patch; nosy: + stutzbach; messages: + msg133502
2010-10-24 10:21:44 | skrah | set | files: + unicode_init_globals2.patch; messages: + msg119503
2010-10-22 18:04:51 | amaury.forgeotdarc | set | messages: + msg119396
2010-10-22 14:39:03 | skrah | set | files: + unicode_init_globals.patch; priority: normal -> high; title: Memory leak (r70459) -> Initialization of globals in unicodeobject.c; messages: + msg119387; keywords: + patch; stage: patch review
2010-10-22 00:03:30 | skrah | set | messages: + msg119356
2010-10-20 22:47:11 | skrah | set | messages: + msg119241
2010-10-20 22:29:25 | lemburg | set | nosy: + lemburg; messages: + msg119239
2010-10-20 22:24:43 | skrah | set | messages: + msg119238
2010-10-20 22:10:02 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg119237
2010-10-20 17:58:55 | belopolsky | set | nosy: + vstinner
2010-10-20 17:27:05 | skrah | create |
