This issue tracker has been migrated to GitHub ,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2012年09月28日 08:54 by einarfd, last changed 2022年04月11日 14:57 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| i16076-cpatch.diff | danielsh, 2012年12月03日 16:48 | review | ||
| i16076-v3-combined.diff | danielsh, 2012年12月07日 22:36 | review | ||
| i16076-v4-combined.diff | danielsh, 2012年12月08日 18:50 | review | ||
| i16076-v5.diff | danielsh, 2012年12月19日 17:19 | review | ||
| transcript-test___all__.txt | danielsh, 2012年12月29日 16:18 | review | ||
| i16076-v6.diff | danielsh, 2012年12月30日 23:57 | review | ||
| i16076-v7.diff | danielsh, 2013年01月08日 06:31 | review | ||
| i16076-v8.diff | danielsh, 2013年01月08日 23:48 | review | ||
| memleak-v1.diff | danielsh, 2013年01月11日 11:07 | review | ||
| Messages (50) | |||
|---|---|---|---|
| msg171418 - (view) | Author: Einar Fløystad Dørum (einarfd) | Date: 2012年09月28日 08:54 | |
The xml.etree.ElementTree.Element class is no longer pickleable in Python 3.3.0rc3 .
This is a regression from Python 3.2 where the Element class was pickleable, while the xml.etree.cElementTree.Element was not. So this is probably related to switching the ElementTree implementation in Python 3.3.
I've looked at the "what's new" documentation in Python 3.3, and there was nothing that indicated that the ElemenTree switchover would give this kind of issues.
I've run into this issues because I use the multiprocessing module to pass parsed XML documents between processes, and when I can't pickle Element objects. This stops working.
You can reproduce the issue with the following code:
import pickle
from xml.etree.ElementTree import Element
print(pickle.dumps(Element("foo")))
|
|||
| msg171737 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) | Date: 2012年10月01日 17:52 | |
In 3.2 repr(xml.etree.ElementTree.Element) is "<class 'xml.etree.ElementTree.Element'>". In 3.3 repr(xml.etree.ElementTree.Element) is "<class 'Element'>". |
|||
| msg172297 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年10月07日 13:19 | |
Will look into it, thanks for the report. |
|||
| msg172613 - (view) | Author: Santoso Wijaya (santoso.wijaya) * | Date: 2012年10月11日 01:37 | |
This is because, in Python 3.3, xml.etree.ElementTree.Element is overridden by the C accelerator version, _elementtree.Element. If you remove the statement, 1708 from _elementtree import * the behavior from Python 3.2 comes back. |
|||
| msg172665 - (view) | Author: Santoso Wijaya (santoso.wijaya) * | Date: 2012年10月11日 17:07 | |
OTOH, xml.etree.cElementTree.Element in Python 3.2 and earlier has never been pickleable, either.
Python 3.2.3+ (3.2:24499eebbc2f, Oct 10 2012, 13:54:45)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree.cElementTree import Element
>>> repr(Element)
'<built-in function Element>'
>>> import pickle
>>> pickle.dumps(Element('foo'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed
|
|||
| msg176854 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月03日 16:48 | |
Attached patch for your consideration. I've tested pickling/unpickling and comparing the resulting object attribute by attribute (.tag, .attrib, .text, .tail for equality; and recursively .getchildren()), and 'make test' --- all seems to work. If the approach is sound, I can submit a revised patch that also updates documentation, regression tests, and potententially updates other type objects in the same manner as this patch updates _elementtree.Element . |
|||
| msg177134 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月07日 22:23 | |
TreeBuilder was also pickleable in 3.2 but now isn't so; updating summary accordingly. Attaching a checkpoint patch. It addresses both Element and TreeBuilder, and adds tests. The added tests fail if test___all__ is included in the test run, so the patch should not be applied as-is. Thanks to Ezio Melotti for preliminary review and suggestions on IRC. |
|||
| msg177136 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月07日 22:36 | |
Reattaching without the unrelated Python-ast.c change. |
|||
| msg177141 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年12月08日 03:15 | |
Thanks for working on it, Daniel. Unfortunately I won't have time to look at it in the near future, but I will definitely look at the patch once I get some free time to hack on Python again. |
|||
| msg177170 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月08日 18:50 | |
@Eli, thanks. In the meantime, attaching a patch addressing Ezio's review; the caveats from msg 177134 still apply. |
|||
| msg177590 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月16日 11:58 | |
Haven't had a chance to look at the test___all__ -related failures yet. Still planning to do so once I have some time. |
|||
| msg177766 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月19日 17:19 | |
Re the problem from msg 177134 ('./python -mtest test___all__ test_xml_etree' failing), it is a a preexisting problem, in the sense that applying just the Lib/test/ part of this patch suffices to trigger it. Adding 'xml.etree' to the blacklist in test___all__.py:50 suffices to address the problem. At a guess that is because when test___all__ imports the 'xml.etree' module, it doesn't filter out the '_elementtree' symbol like test_xml_etree.py:test_main() does. New patch attached; differences to previous version: - Removed debug prints. - Skip test_pickling when testing the Python xml.etree. (The skips could be removed in the future, if/when the test___all__ interaction problem is resolved.) |
|||
| msg178498 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年12月29日 15:00 | |
Daniel, is your patch made vs. the 3.3 branch? I'll need to apply there first, and then merge up to default (3.4). [Also, removing the 3.2 tag here. 3.2 won't be fixed to make _elementtree pickleable - it never was]. |
|||
| msg178499 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月29日 15:04 | |
I wrote the patch against default (3.4), but it applies cleanly to 3.3. |
|||
| msg178500 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年12月29日 15:10 | |
Also, could you explain what makes test___all__ start failing with this patch? What are you adding that makes that happen? P.S. I suspect the root reason is the bad way etree tests are structured in general. See issue 15083 |
|||
| msg178509 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月29日 16:18 | |
Any attempt to pickle an Element object in test_xml_etree.py causes that test to fail if test___all__ had been run before it; see attached transcript. It's against 3.4, with no changes other than the testsuite changes shown within. I don't know what the root cause is. As noted before, adding 'xml.etree' to test___all__.AllTest.test_all.blacklist is a workaround, so I assume the root cause lies within the test framework --- a bad interaction between the import magics in test___all__ and test_xml_etree. |
|||
| msg178546 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年12月30日 00:18 | |
I think I understand what's going on there. Pickle, to make sure an object can be picked, looks at the module it comes from and tries to import its class, to make sure they're the same. What happens since 3.3 is that for the Python Element class, it imports ElementTree. The harness in test_xml_etree carefully sets up to ignore _elementtree so the correct class is pulled. However, if test___all__ runs before it it just does a brute "from ElementTree import *" which places the C version in sys.modules, and this is what pickle finds. So in the meantime, until issue 15083 is resolved I think it's safe to put ET in the ignore list of test___all__. |
|||
| msg178578 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2012年12月30日 14:18 | |
New changeset 71508fc738bb by Eli Bendersky in branch '3.3': For Issue #16076: make sure that pickling of Element objects is tested, and do http://hg.python.org/cpython/rev/71508fc738bb New changeset 5a38f4d7833c by Eli Bendersky in branch 'default': For issue #16076: merge 3.3 http://hg.python.org/cpython/rev/5a38f4d7833c |
|||
| msg178579 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2012年12月30日 14:24 | |
I've added some (currently pyET specific) pickling tests. Daniel, could you re-generate the patch? Note that for the C version of pickling you can now enable the pickle test. Point to consider - can elements pickled in C be unpickled in Python and vice versa? It should probably work (see test_decimal's pickling tests). |
|||
| msg178627 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月30日 23:57 | |
Attached. Differences to previous version: - Avoid the test___all__ issue (#16817) by manipulating sys.modules directly - Support pickling interoperability between the C and Python implementations |
|||
| msg178632 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2012年12月31日 00:54 | |
BTW, maybe I'm just confused, but I'm a little surprised that C pickling -> Python unpickling works: the C __reduce__ returns a 3-element tuple, which according to pickle docs[1] must either be a dictionary or be passed to __setstate__; it's not a dictionary and the Python implementation doesn't have a __setstate__. http://docs.python.org/3/library/pickle#pickle.object.__reduce__ |
|||
| msg178721 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月01日 00:32 | |
A couple of high-level questions about the C code (this isn't a detailed review yet): 1. Why did you choose to implement __reduce__ and not __getstate__? 2. A lot of code appears to be shared between element_setstate_from_attributes and the init method implementation, can it be refactored? 3. I've been thinking about the C<->Python pickle compatibility, and it may be too much trouble at this point for no very strong benefit. What could make more sense at some point is implementing a __getstate__/__setstate__ in both the Python and C versions that are compatible (i.e. produce the same things). But this may mean changing the way the Python Element is pickled in 3.3, which may not be acceptable. So we may prefer to forego the compatibility in 3.3 and then for 3.4 make the change. I'm still not sure what is best here. |
|||
| msg178736 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月01日 15:53 | |
Other thoughts. I'm not sure why you're surprised the C->Python pickle/unpickle works. You've changed the type name from Element to _elementtree.Element, so I would guess Python always uses the C version to unpickle as well. Can you debug to verify what actually goes on under the hood? Why did you change the class name, by the way, I don't think it's a valid change at least for 3.3 in terms of backwards compatibility. Regarding that compatibility, and even easier idea would be for the C pickle to return the same __dict__ implicitly gathered from the Python version, and then only one version of the unpickle is required. |
|||
| msg178745 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2013年01月01日 18:42 | |
Yes, currently the C version is also used for unpickling. Actually
this problem was one of the reasons why _decimal sets its name to
"decimal".
from test.support import import_fresh_module
import pickle, sys
C = import_fresh_module('xml.etree.ElementTree', fresh=['_elementtree'])
P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
e = C.Element('foo', bar=42)
e.text = "text goes here"
e.tail = "opposite of head"
C.SubElement(e, 'child').append(C.Element('grandchild'))
e.append(C.Element('child'))
e.findall('.//grandchild')[0].set('attr', 'other value')
sys.modules['xml.etree.ElementTree'] = C
s = pickle.dumps(e)
s
b'\x80\x03c_elementtree\nElement\nq\x00X\x03\x00\x00\x00fooq\x01}q\x02X\x03\x00\x00\x00barq\x03K*s\x86q\x04Rq\x05X
\x0e\x00\x00\x00text goes hereq\x06X\x10\x00\x00\x00opposite of headq\x07h\x00X\x05\x00\x00\x00childq\x08\x85q\tRq
\nNNh\x00X\n\x00\x00\x00grandchildq\x0b}q\x0cX\x04\x00\x00\x00attrq\rX\x0b\x00\x00\x00other valueq\x0es\x86q\x0fRq
\x10NNN\x87q\x11b\x85q\x12\x87q\x13bh\x00h\x08\x85q\x14Rq\x15NNN\x87q\x16b\x86q\x17\x87q\x18b.'
sys.modules['xml.etree.ElementTree'] = P
x = pickle.loads(s)
type(x)
<class '_elementtree.Element'>
|
|||
| msg178770 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月01日 22:40 | |
Thank you for the input Stefan. I was actually glancing at _decimal as an example of implementing pickling and inter-compatibility between the C and Py versions of pickles. You've chosen compatibility by having the same class name and __reduce__ returning the exact same tuple for both implementation. This is a pretty good idea. Unfortunately, in the case of Element it's more difficult, because the existing Py implementation does not have __reduce__, so pickle does its thing by looking at __dict__. Changing the Py version to have a __reduce__ seems risky as it may break compatibility between the different Py versions of Element (say, from 3.2) and this is bad. |
|||
| msg178776 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月01日 22:56 | |
Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000:
> Why did you change the class name, by the way, I don't think it's
> a valid change at least for 3.3 in terms of backwards compatibility.
>
With unmodified tip of 3.4:
>>> import pickle, xml.etree.ElementTree as ET
>>> pickle.dumps(ET.Element('foo'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class 'Element'>: attribute lookup builtins.Element failed
I added the "_elementtree" to the tp_name in order to bypass the above
error. Module-qualified names were in use elsewhere (including by
_elementtree._element_iterator) so it seemed reasonable. I'll defer to
you about compatibility implications of this change.
> Regarding that compatibility, and even easier idea would be for the
> C pickle to return the same __dict__ implicitly gathered from the
> Python version, and then only one version of the unpickle is required.
That makes sense. But going forward it might be even better to define
an explicit __reduce__/__getstate__ for the Python version too, so if
the instance dict grows new members, they won't get serialised
unintentionally. (This consideration is also why C code in the latest
patch makes some effort to notice unknown args in the dict.)
|
|||
| msg178783 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月01日 23:11 | |
Eli Bendersky wrote on Tue, Jan 01, 2013 at 00:32:51 +0000: > 1. Why did you choose to implement __reduce__ and not __getstate__? Maybe I was simply imitating what some other extension module was doing ;) By using __reduce__ with the type as first return value, the __setstate__ code becomes simpler because it doesn't need to address 'tag' and 'attrib'. This consideration is somewhat moot now that the unpickle-Python-pickled-Element's code path has to think about all instance attributes, including those settable by the constructor. > 2. A lot of code appears to be shared between > element_setstate_from_attributes and the init method implementation, > can it be refactored? It seems there might be room for code reuse --- both functions set tag,text,tail,attrib (but the unpickler sets the children too). I'll look into that in the next iteration (once the class name and pickle output format issues are settled). |
|||
| msg178788 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月01日 23:43 | |
Also, the class inheritance in the tests should be amended to follow #16835. |
|||
| msg178946 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月03日 14:44 | |
On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote: > > Daniel Shahaf added the comment: > > Eli Bendersky wrote on Tue, Jan 01, 2013 at 15:54:00 +0000: > > Why did you change the class name, by the way, I don't think it's > > a valid change at least for 3.3 in terms of backwards compatibility. > > > > With unmodified tip of 3.4: > > >>> import pickle, xml.etree.ElementTree as ET > >>> pickle.dumps(ET.Element('foo')) > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > _pickle.PicklingError: Can't pickle <class 'Element'>: attribute > lookup builtins.Element failed > > I added the "_elementtree" to the tp_name in order to bypass the above > error. Module-qualified names were in use elsewhere (including by > _elementtree._element_iterator) so it seemed reasonable. I'll defer to > you about compatibility implications of this change. I asked on pydev, but this is a key point to resolve. Can pickling/unpickling be made to work correctly without such (or similar) change at all? How will the unpickler know which module to load when it sees a pickled object of Element type? If this change is required (even if we choose to name it "xml.etree.ElementTree.Element" for Py compatibility to fix the pickling regression, we may find ourselves in a need to change it between 3.3 and 3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be resolved conclusively. Danial, could you investigate if such a change is absolutely required to make pickling/unickling of Element work? |
|||
| msg179002 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月04日 02:09 | |
Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000: > On Tue, Jan 1, 2013 at 2:56 PM, Daniel Shahaf <report@bugs.python.org>wrote: > > I added the "_elementtree" to the tp_name in order to bypass the above > > error. Module-qualified names were in use elsewhere (including by > > _elementtree._element_iterator) so it seemed reasonable. I'll defer to > > you about compatibility implications of this change. > > I asked on pydev, but this is a key point to resolve. Can > pickling/unpickling be made to work correctly without such (or similar) > change at all? How will the unpickler know which module to load when it > sees a pickled object of Element type? Is there a requirement that it loads a particular module? Would etree users notice the difference if pickle.load() returns an instance of the "other" Element implementation than the one they pickled? > Danial, could you investigate if such a change is absolutely required to > make pickling/unickling of Element work? Yes, I'll look into that. |
|||
| msg179011 - (view) | Author: Stefan Krah (skrah) * (Python committer) | Date: 2013年01月04日 10:09 | |
Daniel Shahaf <report@bugs.python.org> wrote: > Is there a requirement that it loads a particular module? Would etree > users notice the difference if pickle.load() returns an instance of the > "other" Element implementation than the one they pickled? Yes: If you send an "_elementtree.Element" pickle to another machine that doesn't have a working C version, it can't be unpickled. As far as I know the only way around that is to pickle as "xml.etree.ElementTree.Element". |
|||
| msg179032 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月04日 14:59 | |
On Jan 4, 2013 2:09 AM, "Stefan Krah" <report@bugs.python.org> wrote: > > > Stefan Krah added the comment: > > Daniel Shahaf <report@bugs.python.org> wrote: > > Is there a requirement that it loads a particular module? Would etree > > users notice the difference if pickle.load() returns an instance of the > > "other" Element implementation than the one they pickled? > > Yes: If you send an "_elementtree.Element" pickle to another machine that > doesn't have a working C version, it can't be unpickled. As far as I know > the only way around that is to pickle as "xml.etree.ElementTree.Element". > > ---------- > Yes this is a good point. Another thing to consider is that if both report the same name then it will be possible to unpickle on a machine running 3.2 > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue16076> > _______________________________________ |
|||
| msg179243 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月07日 04:58 | |
Eli Bendersky wrote on Thu, Jan 03, 2013 at 14:44:02 +0000: > If this change is required (even if we choose to name it > "xml.etree.ElementTree.Element" for Py compatibility to fix the pickling > regression, we may find ourselves in a need to change it between 3.3 and > 3.3.1 and I'm not sure if that's valid. I hope my question on pydev will be > resolved conclusively. > > Danial, could you investigate if such a change is absolutely required to > make pickling/unickling of Element work? There are a few options: - Change c-Element's tp_name to "xml.etree.ElementTree.Element". - Register a reduce function for c-Element's that serialises them by constructing an equivalent py-Element and returning the latter's .__dict__ via the third return value: http://docs.python.org/3/library/copyreg#copyreg.pickle - Add an entry mapping "builtins.Element" to "xml.etree.ElementTree.Element" to _compat_pickle.IMPORT_MAPPING (which is used by Lib/pickle.py:_Pickler.find_class()). I haven't tried to implement either of these approaches. |
|||
| msg179244 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月07日 05:16 | |
I think that changing the _elementtree's Element's name to match the Python version, and then making sure the same serialized object is returned - is a worthy option to try. Then it will hopefully "just work". When pickle deserializes a user-defined object that says it's a xml.etree.ElementTree.Element, it will try to import Element from xml.etree.ElementTree and should get the accelerated class. On machines without acceleration it will get the Python class. |
|||
| msg179314 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月08日 05:30 | |
In the meantime, unrelated question: should TreeBuilder be pickleable too, and interchangeably (pickle|unpickle)able between the Python and C implementations? Asking because the Python TreeBuilder has a _factory member which it initializes to 'Element', and I'm not quite sure how cross-pickling should handle that. |
|||
| msg179315 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月08日 06:31 | |
New iteration. Open issues: - Share code with the init method. The issue with sharing code with either element_init() or create_new_element() would be malloc+realloc: unlike either of these methods, we know both the attributes and the number of children at allocation time, so we can allocate directly the right number of children. - C<->Py Interchangeable pickling of TreeBuilder (per above msg). Differences to previous version: - Skip C<->Py interchangeability testing of TreeBuilder --- because that one started failing with: _pickle.UnpicklingError: state is not a dictionary It can probably be fixed, but I'd like to address the above question about pickling the factory first. - Use __getstate__ rather than __reduce__ for Element. - Make Element pickling interchangeable between c/py Element (set tp_name to "xml.etree.ElementTree.Element" and match the pickled format). |
|||
| msg179354 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月08日 14:39 | |
Daniel, thanks for this patch, it looks very good. I had some comments in the code review. As for TreeBuilder, let's split it into a separate issue - I think it's much less critical. We can discuss it later once we're done with Element. If anything, it's more important to handle ElementTree itself first - it should be very easy (much simpler than Element). With the factory method I'm not sure how the Py pickling happens. I need to dig a bit in pickling for that, because I don't know how method reference pickling works. |
|||
| msg179356 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月08日 15:00 | |
P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
print(pickle.dumps(tb))
Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
|
|||
| msg179386 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月08日 22:51 | |
Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000:
>
> Eli Bendersky added the comment:
>
> P = import_fresh_module('xml.etree.ElementTree', blocked=['_elementtree'])
> tb = P.TreeBuilder(element_factory=lambda a, b: [a, b])
> print(pickle.dumps(tb))
>
> Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Is that with or without the patch?
|
|||
| msg179393 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月08日 23:13 | |
On Tue, Jan 8, 2013 at 2:51 PM, Daniel Shahaf <report@bugs.python.org>wrote: > > Daniel Shahaf added the comment: > > Eli Bendersky wrote on Tue, Jan 08, 2013 at 15:00:42 +0000: > > > > Eli Bendersky added the comment: > > > > P = import_fresh_module('xml.etree.ElementTree', > blocked=['_elementtree']) > > tb = P.TreeBuilder(element_factory=lambda a, b: [a, b]) > > print(pickle.dumps(tb)) > > > > Gives: _pickle.PicklingError: Can't pickle <class 'function'>: attribute > lookup builtins.function failed > > Is that with or without the patch? > Without. Pickle can't handle functions that are not top-level, so bound methods, internal functions, lambdas are all out. So pickling probably wasn't a first priority for the Python version of TreeBuilder as well... |
|||
| msg179395 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月08日 23:24 | |
Dissociating TreeBuilder from this issue per recent comments. |
|||
| msg179400 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月08日 23:48 | |
v8 removes TreeBuilder handling (code + tests) and removes memcpy per review. (Outstanding issues in the review can be fixed post-v8, if needed.) |
|||
| msg179544 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月10日 14:07 | |
New changeset 8d6dadfecf22 by Eli Bendersky in branch '3.3': Issue #16076: make _elementtree.Element pickle-able in a way that is compatible http://hg.python.org/cpython/rev/8d6dadfecf22 New changeset 4c268b7c86e6 by Eli Bendersky in branch 'default': Issue #16076: make _elementtree.Element pickle-able in a way that is compatible http://hg.python.org/cpython/rev/4c268b7c86e6 |
|||
| msg179546 - (view) | Author: Eli Bendersky (eli.bendersky) * (Python committer) | Date: 2013年01月10日 14:08 | |
Fixed in 3.3 and default. Thanks for the good work, Daniel. A separate issue can be opened for TreeBuilder. |
|||
| msg179553 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月10日 14:36 | |
New changeset c46054b49b6c by Eli Bendersky in branch '3.3': Update Misc/NEWS for issue #16076 http://hg.python.org/cpython/rev/c46054b49b6c |
|||
| msg179642 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2013年01月11日 07:37 | |
The fix introduced some refleaks:
$ ./python -m test -R3:2 test_xml_etree_c
test_xml_etree_c leaked [56, 56] references, sum=112
One seems to be in __getstate__:
diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c
--- a/Modules/_elementtree.c
+++ b/Modules/_elementtree.c
@@ -859,8 +859,10 @@
PICKLED_ATTRIB, self->extra->attrib,
PICKLED_TEXT, self->text,
PICKLED_TAIL, self->tail);
- if (instancedict)
+ if (instancedict) {
+ Py_DECREF(children);
return instancedict;
+ }
else {
for (i = 0; i < PyList_GET_SIZE(children); i++)
Py_DECREF(PyList_GET_ITEM(children, i));
I'm still looking for the other.
|
|||
| msg179651 - (view) | Author: Ezio Melotti (ezio.melotti) * (Python committer) | Date: 2013年01月11日 08:44 | |
This is what I found out.
I used an easily copy/pastable one-liner that creates 3 variables: e (no children), e2 (3 children), e3 (5 children).
Original leaky code (test_xml_etree_c leaked [56, 56] references, sum=112):
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### e has no children and leaks 1 ref:
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76791 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76792 refs]
>>> e.__getstate__()
{'tag': 'foo', 'attrib': {}, 'text': None, 'tail': None, '_children': []}
[76793 refs]
>>> ### e2 has 3 children and leaks 4 refs:
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76798 refs]
>>> e2.__getstate__()
{'tag': 'body', 'attrib': {}, 'text': None, 'tail': None, '_children': [<Element 'tag' at 0xb735cef4>, <Element 'tag' at 0xb7368034>, <Element 'section' at 0xb73688f4>]}
[76802 refs]
The leaked refs seems to be 1 for the children list + 1 for each children.
The diff I pasted in the previous *seems* to fix this (i.e. leaks in __gestate__ are gone, tests pass), but I had it crash once (couldn't reproduce after that, so it might be unrelated*), and I'm not sure it's correct.
With that patch applied we go down to test_xml_etree_c leaked [6, 6] references, sum=12.
The remaining leaks seem to be in __setstate__.
Patched code:
>>> from xml.etree import ElementTree as ET; e = ET.Element('foo'); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e2 = ET.XML(SAMPLE_XML); SAMPLE_XML = "<body><tag class='a'>text</tag><tag class='b' /><tag class='b' /><tag class='b' /><section><tag class='b' id='inner'>subtext</tag></section></body>"; e3 = ET.XML(SAMPLE_XML)
[76773 refs]
>>> ### no more leaks for getstate:
>>> p = e.__getstate__()
[76787 refs]
>>> p = e.__getstate__()
[76787 refs]
>>> ### also no leaks when there are no child:
>>> e.__setstate__(p)
[76788 refs]
>>> e.__setstate__(p)
[76788 refs]
>>> ### no more leaks for getstate with children:
>>> p2 = e2.__getstate__()
[76807 refs]
>>> p2 = e2.__getstate__()
[76807 refs]
>>> ### one ref leaked for every child in __setstate__:
>>> e2.__setstate__(p2)
[76810 refs]
>>> e2.__setstate__(p2)
[76813 refs]
>>> e2.__setstate__(p2)
[76816 refs]
I'm not working on this anymore now, so someone more familiar with the code can take a look, see if my patch is correct, and fix the remaining leaks.
* maybe I'm doing something wrong, but ISTM that ``make -j2`` doesn't always work right away, and sometimes I get different results if I do it again without touching the code.
|
|||
| msg179667 - (view) | Author: Daniel Shahaf (danielsh) | Date: 2013年01月11日 11:07 | |
Ezio Melotti wrote on Fri, Jan 11, 2013 at 08:44:43 +0000: > >>> ### one ref leaked for every child in __setstate__: > >>> e2.__setstate__(p2) > [76810 refs] > >>> e2.__setstate__(p2) > [76813 refs] > >>> e2.__setstate__(p2) > [76816 refs] > > I'm not working on this anymore now, so someone more familiar with the > code can take a look, see if my patch is correct, and fix the > remaining leaks. > Thank for narrowing it down so much. Attached patch incorporates your getstate fix and a fix for setstate. It passes the -R 3:2 tests. |
|||
| msg179790 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月12日 13:21 | |
New changeset 5b36768b9a11 by Eli Bendersky in branch '3.3': Issue #16076: fix refleak in pickling of Element. http://hg.python.org/cpython/rev/5b36768b9a11 New changeset 848738d3c40f by Eli Bendersky in branch 'default': Close #16076: fix refleak in pickling of Element. http://hg.python.org/cpython/rev/848738d3c40f |
|||
| msg179792 - (view) | Author: Roundup Robot (python-dev) (Python triager) | Date: 2013年01月12日 13:43 | |
New changeset 4501813ea676 by Eli Bendersky in branch '3.3': Issue #16076: check for return value of PyTuple_New for args (following http://hg.python.org/cpython/rev/4501813ea676 New changeset 7313096e0bad by Eli Bendersky in branch 'default': Issue #16076: check for return value of PyTuple_New for args (following http://hg.python.org/cpython/rev/7313096e0bad |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022年04月11日 14:57:36 | admin | set | github: 60280 |
| 2013年01月12日 13:43:32 | python-dev | set | messages: + msg179792 |
| 2013年01月12日 13:21:35 | python-dev | set | status: open -> closed messages: + msg179790 stage: needs patch -> resolved |
| 2013年01月11日 11:07:02 | danielsh | set | files:
+ memleak-v1.diff messages: + msg179667 |
| 2013年01月11日 08:44:43 | ezio.melotti | set | status: closed -> open messages: + msg179651 stage: resolved -> needs patch |
| 2013年01月11日 07:37:32 | ezio.melotti | set | messages: + msg179642 |
| 2013年01月10日 14:36:27 | python-dev | set | messages: + msg179553 |
| 2013年01月10日 14:08:46 | eli.bendersky | set | status: open -> closed resolution: fixed messages: + msg179546 stage: needs patch -> resolved |
| 2013年01月10日 14:07:34 | python-dev | set | messages: + msg179544 |
| 2013年01月08日 23:48:35 | danielsh | set | files:
+ i16076-v8.diff messages: + msg179400 |
| 2013年01月08日 23:24:44 | danielsh | set | messages:
+ msg179395 title: xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable -> xml.etree.ElementTree.Element is no longer pickleable |
| 2013年01月08日 23:13:59 | eli.bendersky | set | messages: + msg179393 |
| 2013年01月08日 22:51:40 | danielsh | set | messages: + msg179386 |
| 2013年01月08日 15:00:42 | eli.bendersky | set | messages: + msg179356 |
| 2013年01月08日 14:39:43 | eli.bendersky | set | messages: + msg179354 |
| 2013年01月08日 06:31:49 | danielsh | set | files:
+ i16076-v7.diff messages: + msg179315 |
| 2013年01月08日 05:30:34 | danielsh | set | messages: + msg179314 |
| 2013年01月07日 05:16:22 | eli.bendersky | set | messages: + msg179244 |
| 2013年01月07日 04:58:09 | danielsh | set | messages: + msg179243 |
| 2013年01月04日 14:59:18 | eli.bendersky | set | messages: + msg179032 |
| 2013年01月04日 10:09:41 | skrah | set | messages: + msg179011 |
| 2013年01月04日 02:09:37 | danielsh | set | messages: + msg179002 |
| 2013年01月03日 14:44:02 | eli.bendersky | set | messages: + msg178946 |
| 2013年01月01日 23:43:54 | danielsh | set | messages: + msg178788 |
| 2013年01月01日 23:11:48 | danielsh | set | messages: + msg178783 |
| 2013年01月01日 22:56:48 | danielsh | set | messages: + msg178776 |
| 2013年01月01日 22:40:00 | eli.bendersky | set | messages: + msg178770 |
| 2013年01月01日 18:42:36 | skrah | set | nosy:
+ skrah messages: + msg178745 |
| 2013年01月01日 15:53:59 | eli.bendersky | set | messages: + msg178736 |
| 2013年01月01日 00:32:50 | eli.bendersky | set | messages: + msg178721 |
| 2012年12月31日 00:54:58 | danielsh | set | messages: + msg178632 |
| 2012年12月30日 23:57:34 | danielsh | set | files:
+ i16076-v6.diff messages: + msg178627 |
| 2012年12月30日 14:24:17 | eli.bendersky | set | messages: + msg178579 |
| 2012年12月30日 14:18:33 | python-dev | set | nosy:
+ python-dev messages: + msg178578 |
| 2012年12月30日 00:30:55 | flox | set | nosy:
+ flox |
| 2012年12月30日 00:18:33 | eli.bendersky | set | messages: + msg178546 |
| 2012年12月29日 16:18:13 | danielsh | set | files:
+ transcript-test___all__.txt messages: + msg178509 |
| 2012年12月29日 15:10:33 | eli.bendersky | set | messages: + msg178500 |
| 2012年12月29日 15:04:46 | danielsh | set | messages: + msg178499 |
| 2012年12月29日 15:00:01 | eli.bendersky | set | messages:
+ msg178498 versions: - Python 3.2 |
| 2012年12月19日 17:19:39 | danielsh | set | files:
+ i16076-v5.diff messages: + msg177766 |
| 2012年12月16日 11:58:39 | danielsh | set | messages: + msg177590 |
| 2012年12月08日 18:50:45 | danielsh | set | files:
+ i16076-v4-combined.diff messages: + msg177170 |
| 2012年12月08日 03:15:19 | eli.bendersky | set | messages: + msg177141 |
| 2012年12月07日 22:42:12 | Arfrever | set | priority: high -> release blocker nosy: + larry, georg.brandl |
| 2012年12月07日 22:40:15 | danielsh | set | files: - i16076-v2-combined.diff |
| 2012年12月07日 22:36:39 | danielsh | set | files:
+ i16076-v3-combined.diff messages: + msg177136 |
| 2012年12月07日 22:23:23 | danielsh | set | files:
+ i16076-v2-combined.diff messages: + msg177134 title: xml.etree.ElementTree.Element is no longer pickleable -> xml.etree.ElementTree.Element and xml.etree.ElementTree.TreeBuilder are no longer pickleable |
| 2012年12月03日 16:48:11 | danielsh | set | files:
+ i16076-cpatch.diff nosy: + danielsh messages: + msg176854 components: + Extension Modules keywords: + patch |
| 2012年11月01日 20:44:12 | serhiy.storchaka | set | nosy:
- serhiy.storchaka |
| 2012年10月11日 17:07:17 | santoso.wijaya | set | messages:
+ msg172665 components: + Library (Lib) versions: + Python 3.2 |
| 2012年10月11日 01:37:46 | santoso.wijaya | set | nosy:
+ santoso.wijaya messages: + msg172613 |
| 2012年10月07日 13:19:26 | eli.bendersky | set | messages: + msg172297 |
| 2012年10月03日 02:35:07 | jcea | set | priority: normal -> high stage: needs patch |
| 2012年10月03日 02:34:30 | jcea | set | priority: high -> normal nosy: + jcea stage: needs patch -> (no value) |
| 2012年10月01日 17:52:06 | serhiy.storchaka | set | messages:
+ msg171737 versions: + Python 3.4 |
| 2012年10月01日 16:38:41 | pitrou | set | priority: normal -> high stage: needs patch |
| 2012年09月30日 22:38:21 | Arfrever | set | nosy:
+ Arfrever |
| 2012年09月30日 16:46:21 | ezio.melotti | set | nosy:
+ ezio.melotti |
| 2012年09月28日 10:52:14 | einarfd | set | components: - Library (Lib) |
| 2012年09月28日 09:59:15 | serhiy.storchaka | set | nosy:
+ eli.bendersky, serhiy.storchaka |
| 2012年09月28日 09:58:32 | serhiy.storchaka | set | keywords:
+ 3.3regression components: + XML |
| 2012年09月28日 08:54:59 | einarfd | create | |