homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: _elementtree.TreeBuilder broken with a non-C-deriving element_factory
Type: crash Stage: resolved
Components: Extension Modules, XML Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, christian.heimes, eli.bendersky, ezio.melotti, flox, georg.brandl, larry, mitya57, pitrou, python-dev, skrah
Priority: release blocker Keywords: 3.3regression, patch

Created on 2012年09月30日 02:07 by christian.heimes, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue16089-pseudo-fix.diff skrah, 2012年09月30日 11:17
ctree.patch pitrou, 2012年10月01日 18:03 review
ctree2.patch pitrou, 2012年10月01日 19:09 review
c_treebuilder.patch pitrou, 2012年10月02日 19:53 review
c_treebuilder2.patch pitrou, 2012年10月02日 20:13 review
Messages (25)
msg171603 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012年09月30日 02:07
The issue was reported by Arfrever on #python-dev. The test suite of simpletal [1] was segfaulting with Python 3.3. It doesn't segfault without the _elementtree C extension.
I'm able to reproduce the issue. It may take a couple of runs with a debug build of Python but eventually Python segfaults. So far segfaults occur inside a GC collection run. Benjamin suspects the issue is related to the new GC code in _elementtree.
 #0 visit_decref (op=<unknown at remote 0x7fea64fc8ca8>, data=0x0) at Modules/gcmodule.c:361
 #1 0x00000000004bccd5 in subtract_refs (containers=<optimized out>) at Modules/gcmodule.c:386
 #2 collect (generation=0, n_collected=0x7fffe3eedb70, n_uncollectable=0x7fffe3eedb78) at Modules/gcmodule.c:891
 #3 0x00000000004bd7e6 in collect_with_callback (generation=0) at Modules/gcmodule.c:1048
 #4 collect_generations () at Modules/gcmodule.c:1071
 #5 _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1580
 #6 0x00000000004bddcc in _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1567
 #7 _PyObject_GC_New (tp=0x8322a0) at Modules/gcmodule.c:1590
 #8 0x0000000000548a3a in new_dict (values=0x0, keys=0xd02e60) at Objects/dictobject.c:395
 #9 _PyDict_NewPresized (minused=<optimized out>) at Objects/dictobject.c:1044
 #10 0x000000000047a449 in PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:2245
 #11 0x0000000000480b03 in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7fffe3eede40, func=
 <function at remote 0x7fea653ff9e0>) at Python/ceval.c:4150
[1] http://www.owlfish.com/software/simpleTAL/downloads/SimpleTAL-5.1.tar.gz 
msg171605 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012年09月30日 02:56
The issue could be related to #14007.
msg171619 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月30日 08:13
I can't reproduce this, so just a wild guess: Does the segfault still
happen if you replace Py_XDECREF() with Py_CLEAR() in xmlparser_gc_clear()?
msg171625 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012年09月30日 10:59
No, it doesn't make a difference when I replace Py_XDECREF() with Py_CLEAR(). I've also replaced Py_(X)DECREF() in the other *_gc_clear() methods without success.
msg171629 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月30日 11:17
I'm now able to reproduce the issue with a non-debug build. As
Christian says, using Py_CLEAR() does not help (though I wonder
if it shouldn't be used anyway).
Reverting part of 20b8f0ee3d64 "fixes" the segfault, see the
attached diff. I don't know the code well enough to say if this
is a valid possibility.
msg171630 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2012年09月30日 11:45
I can confirm that Stefan's fix works. I ran the SimpleTAL test suite about hundred times in a loop without a segfault. But I don't understand why the change fixes the issue. Could the alteration just lead to another code path so the erroneous code isn't triggered?
+1 for Py_CLEAR()
msg171642 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012年09月30日 14:39
I'm currently somewhat "offline" for a while (cross-continental move), but I'll do my best to try to recreate my setup to test this problem and the proposed solution.
msg171644 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年09月30日 14:55
Do note that the patch is somewhat cargo-cult. I don't understand yet why
it works; it may very well just silence the real problem.
msg171738 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012年10月01日 17:53
Let's make sure this gets into 3.3.1.
msg171739 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 18:03
Here is a collection of assorted small fixes for celementtree. There are probably other problems lurking (the coding style there is quite old).
I cannot say anything about the crasher until there's a simple reproducer :)
msg171742 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 19:09
More assorted celementtree cleanups.
msg171746 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 21:13
By the way, the crash involves an _ElementInterface subclass named SimpleElementTreeVar:
#0 0x0000000000524c0f in visit_decref (op=Traceback (most recent call last):
 File "/home/antoine/cpython/33/python-gdb.py", line 1298, in to_string
 pyop = PyObjectPtr.from_pyobject_ptr(self.gdbval)
 File "/home/antoine/cpython/33/python-gdb.py", line 370, in from_pyobject_ptr
 cls = cls.subclass_from_type(p.type())
 File "/home/antoine/cpython/33/python-gdb.py", line 318, in subclass_from_type
 tp_name = t.field('tp_name').string()
 File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
 return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x95 in position 1: invalid start byte
, data=0x0) at Modules/gcmodule.c:361
#1 0x00000000005bcbc5 in BaseException_traverse (self=0x7f08f390a8b0, visit=0x524b98 <visit_decref>, arg=0x0)
 at Objects/exceptions.c:104
#2 0x00000000004336d9 in subtype_traverse (self=
 <SimpleElementTreeVar(attrib={}, tag='root', ourValue=None, _children=[]) at remote 0x7f08f390a8b0>, 
 visit=0x524b98 <visit_decref>, arg=0x0) at Objects/typeobject.c:837
#3 0x0000000000524cba in subtract_refs (containers=0x8dcf40) at Modules/gcmodule.c:386
#4 0x0000000000525afa in collect (generation=0, n_collected=0x7fffed18be00, n_uncollectable=0x7fffed18bdf8)
 at Modules/gcmodule.c:891
#5 0x00000000005260b2 in collect_with_callback (generation=0) at Modules/gcmodule.c:1048
#6 0x000000000052615c in collect_generations () at Modules/gcmodule.c:1071
#7 0x000000000052707d in _PyObject_GC_Malloc (basicsize=48) at Modules/gcmodule.c:1580
[snip]
msg171748 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 21:36
Ok, the problem is that _elementtree.TreeBuilder expects to receive _elementtree.Element instances, but simpleTAL's element_factory instead gives _ElementInterface instances.
In other words, TreeBuilder is completely broken.
msg171749 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 21:39
Example of this is the following code in treebuilder_handle_start:
 if (this != Py_None) {
 if (element_add_subelement((ElementObject*) this, node) < 0)
 goto error;
(note the overly optimistic cast)
but this is really a pervasive problem, since in many places TreeBuilder is hard-wired to use a Element instance and nothing else (despite the element_factory).
Note that simpleTAL cannot use the _elementtree.Element class, since their subclass also inherits from an exception subclass, and the object layouts conflict with each other (yeah, crappy design).
msg171751 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月01日 21:40
I'll still commit my cleanup patch in the meantime :-)
msg171752 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年10月01日 21:46
New changeset f9224f23f473 by Antoine Pitrou in branch '3.3':
Sanitize and modernize some of the _elementtree code (see issue #16089).
http://hg.python.org/cpython/rev/f9224f23f473
New changeset 9fb0a8fc5d79 by Antoine Pitrou in branch 'default':
Sanitize and modernize some of the _elementtree code (see issue #16089).
http://hg.python.org/cpython/rev/9fb0a8fc5d79 
msg171753 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012年10月01日 21:57
Thank you, Antoine, for looking into this. I wish I could participate in a meaningful way, but alas it will be days or weeks before I can recreate a suitable setup to get back hacking on Python.
msg171778 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012年10月02日 09:00
Nice find. -- The Python version does this:
 _Element = _ElementInterface = Element
So (naively) I would think the same should be done for the C version
after importing Element.
But then one runs into the object layouts conflict that you mentioned.
On the other hand, in the original documentation direct use of
_ElementInterface was discouraged:
http://effbot.org/zone/pythondoc-elementtree-ElementTree.htm#elementtree.ElementTree._ElementInterface-class 
msg171829 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月02日 19:53
Here is a patch. It still needs some tests.
msg171831 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月02日 20:13
Updated patch with tests.
msg171981 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2012年10月04日 17:56
New changeset 7bd9626d8b4f by Antoine Pitrou in branch '3.3':
Issue #16089: Allow ElementTree.TreeBuilder to work again with a non-Element element_factory (fixes a regression in SimpleTAL).
http://hg.python.org/cpython/rev/7bd9626d8b4f
New changeset a8c044188731 by Antoine Pitrou in branch 'default':
Issue #16089: Allow ElementTree.TreeBuilder to work again with a non-Element element_factory (fixes a regression in SimpleTAL).
http://hg.python.org/cpython/rev/a8c044188731 
msg172009 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012年10月04日 22:37
Antoine, Thanks!
You said :
> the coding style there is quite old
It would be great if you could elaborate, however briefly, if there's anything else besides your fixes that is old and should be modernized. I will admit to writing some of that code very recently, but that's mostly because I was trying to follow the existing coding style of the module, which was written a long time ago.
msg172014 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012年10月04日 22:46
> > the coding style there is quite old
> 
> It would be great if you could elaborate, however briefly, if there's
> anything else besides your fixes that is old and should be modernized.
> I will admit to writing some of that code very recently, but that's
> mostly because I was trying to follow the existing coding style of the
> module, which was written a long time ago.
I don't think that's you. I was talking about some of the cruft I
removed (such as list_join() which was going through complicated hoops
instead of calling PyUnicode_Join(), or manual filling of tuples instead
of calling PyTuple_Pack()).
There are still issues with internal functions stealing references and
such.
msg172019 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2012年10月04日 23:07
Ah, OK. I was actually putting off refactorings in that code to after refactoring the test suite. The latter is in progress, it was paused by my vacation but I do plan to resume it eventually. Once the test suite is sufficiently humane (esp. moves off doctest to enable normal testing of multiple modules with the same set of tests) the refactoring should become easier.
msg175235 - (view) Author: Dmitry Shachnev (mitya57) * Date: 2012年11月09日 13:49
There are still some false-positive warnings caused by C code that affect docutils (see http://sourceforge.net/tracker/?func=detail&atid=422030&aid=3555164&group_id=38414).
Look at this traceback for exmaple:
http://paste.ubuntu.com/1345164/
There, _ElementInterfaceWrapper is a subclass of etree._ElementInterface (http://repo.or.cz/w/docutils.git/blob/HEAD:/docutils/docutils/writers/odf_odt/__init__.py#l91).
History
Date User Action Args
2022年04月11日 14:57:36adminsetgithub: 60293
2012年11月09日 13:51:34mitya57setnosy: georg.brandl, pitrou, larry, christian.heimes, ezio.melotti, Arfrever, eli.bendersky, skrah, flox, python-dev, mitya57
2012年11月09日 13:50:42mitya57setnosy: georg.brandl, pitrou, larry, christian.heimes, ezio.melotti, Arfrever, eli.bendersky, skrah, flox, python-dev, mitya57
2012年11月09日 13:49:52mitya57setnosy: + mitya57, larry
messages: + msg175235
2012年10月04日 23:07:07eli.benderskysetmessages: + msg172019
2012年10月04日 22:46:59pitrousetmessages: + msg172014
2012年10月04日 22:37:30eli.benderskysetmessages: + msg172009
2012年10月04日 17:59:30pitrousetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2012年10月04日 17:56:16python-devsetmessages: + msg171981
2012年10月02日 20:13:51pitrousetfiles: + c_treebuilder2.patch

messages: + msg171831
stage: needs patch -> patch review
2012年10月02日 19:53:31pitrousetfiles: + c_treebuilder.patch

messages: + msg171829
title: _elementtree causes segfault in GC -> _elementtree.TreeBuilder broken with a non-C-deriving element_factory
2012年10月02日 09:00:58skrahsetmessages: + msg171778
2012年10月01日 21:57:58eli.benderskysetmessages: + msg171753
2012年10月01日 21:46:40python-devsetnosy: + python-dev
messages: + msg171752
2012年10月01日 21:40:32pitrousetmessages: + msg171751
2012年10月01日 21:39:24pitrousetmessages: + msg171749
2012年10月01日 21:36:24pitrousetmessages: + msg171748
2012年10月01日 21:13:42pitrousetmessages: + msg171746
2012年10月01日 19:09:01pitrousetfiles: + ctree2.patch

messages: + msg171742
2012年10月01日 18:03:17pitrousetfiles: + ctree.patch
nosy: + pitrou
messages: + msg171739

2012年10月01日 17:53:02georg.brandlsetpriority: critical -> release blocker

messages: + msg171738
2012年09月30日 16:47:27ezio.melottisetnosy: + ezio.melotti
2012年09月30日 14:55:51skrahsetmessages: + msg171644
2012年09月30日 14:39:05eli.benderskysetmessages: + msg171642
2012年09月30日 11:45:28christian.heimessetmessages: + msg171630
2012年09月30日 11:35:00georg.brandlsetnosy: + georg.brandl
2012年09月30日 11:17:19skrahsetfiles: + issue16089-pseudo-fix.diff
keywords: + patch
messages: + msg171629
2012年09月30日 10:59:03christian.heimessetmessages: + msg171625
2012年09月30日 08:13:31skrahsetnosy: + skrah
messages: + msg171619
2012年09月30日 08:02:40floxsetnosy: + eli.bendersky, flox
components: + XML
2012年09月30日 02:56:01christian.heimessetmessages: + msg171605
2012年09月30日 02:12:49Arfreversetnosy: + Arfrever
2012年09月30日 02:07:42christian.heimescreate

AltStyle によって変換されたページ (->オリジナル) /