homepage

This issue tracker has been migrated to GitHub , and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: deepcopying an xml.dom.minidom.Document generates an invalid XML document
Type: behavior Stage: resolved
Components: XML Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Marian.Ganisin, alexandre.vassalotti, bkabrda, flox, genij.math, pitrou, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2010年10月17日 13:42 by flox, last changed 2022年04月11日 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mganisin-Issue10131-Python-2.7.2.patch Marian.Ganisin, 2011年07月29日 09:08 patch based on python-2.7.2
copying_NodeList.patch serhiy.storchaka, 2015年11月26日 20:42 review
Messages (8)
msg118942 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010年10月17日 13:42
>>> import copy
>>> from xml.dom import minidom
>>> doc = minidom.parseString('<root/>')
>>> doc2 = copy.deepcopy(doc)
>>> doc.toxml()
u'<?xml version="1.0" ?><root/>'
>>> doc2.toxml()
u'<?xml version="1.0" ?><root/><root/>'
>>> minidom.parseString(doc2.toxml())
Traceback (most recent call last):
...
ExpatError: junk after document element: line 1, column 2
The workaround is to use doc.cloneNode(True).
msg118943 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010年10月17日 13:47
It works fine with 2.5 and 2.6.
msg141338 - (view) Author: Marian Ganisin (Marian.Ganisin) Date: 2011年07月29日 09:08
xml.dom.minicompat.NodeList provides __reduce__/__reduce_ex__ methods, they return "state" as well as "list iterator". However one of those seems to be absolutely enough for reconstruction of instance (__reduce__/__reduce_ex__ are inherited, methods __setstate/__getstate__ aka state provider/consumer are defined in NodeList).
deepcopy (actually its helper function _reconstruct) uses both, state and list iterator for copying, first it calls __setstate__(state) on new copy, then it goes through iterator and calls append(item) (as it is probably common for lists). This is it! (state and list iterator contain same information, list of items)
Prior to some minor 2.7 update deepcopy used list iterator and state in opposite order, first it run append through iterator, then __setstate__ (this overwrote new content at all, no unwanted copies appeared). So the issue is there all the time, just with no impact in the past.
Issue also occurs with simple copy.copy() on NodeList:
>>> import copy
>>> from xml.dom import minidom
>>> doc = minidom.parseString('<root/>')
>>> print copy.copy(doc.childNodes)
[<DOM Element: root at 0xb73fbbcc>, <DOM Element: root at 0xb73fbbcc>]
I am strongly convinced NodeList doesn't require __setstate__/__getstate__ methods (even more I believe they are not desired in subclass of list). Therefore I am proposing patch with removal of these methods.
If I am wrong in my final decision, somebody smarter has to find a solution. This is the reason why I described issue in deep. :)
msg146175 - (view) Author: Yevgen Yampolskiy (genij.math) Date: 2011年10月22日 17:29
You can create object copy using both copy and pickle modules. 
I assume that identical results should be produced.
pickle handles minidom.Document correctly, however copy does not. 
Even if patch to NodeList will be applied, copy or pickle modules need to be adjusted to produce identical results.
From this view point copy module needs to be adjusted.
Here is the test code:
import copy, pickle
from xml.dom.minidom import NodeList
obj = NodeList()
obj.append('a')
obj2 = copy.deepcopy(obj)
print(obj2)
obj2 = pickle.loads(pickle.dumps(obj))
print(obj2)
Output (python 2.7-3.2):
['a', 'a']
['a']
msg148291 - (view) Author: Marian Ganisin (Marian.Ganisin) Date: 2011年11月24日 22:55
Code from msg146175 with attached patch applied:
>>> import sys
>>> print(sys.version_info)
sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
>>> import copy, pickle
>>> from xml.dom.minidom import NodeList
>>> 
>>> obj = NodeList()
>>> obj.append('a')
>>> 
>>> obj2 = copy.deepcopy(obj)
>>> print(obj2)
['a']
>>> 
>>> obj2 = pickle.loads(pickle.dumps(obj))
>>> print(obj2)
['a']
Result is same in both cases.
msg148473 - (view) Author: Yevgen Yampolskiy (genij.math) Date: 2011年11月28日 10:56
Marian, I have no doubts that the patch cures THIS issue. 
The problem is that it cures the wrong thing.
It is _reconstruct function that needs to be fixed (somebody changed order of 'if' statements between 2.6 and 2.7 releases)
msg255436 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015年11月26日 21:53
New changeset 0eea57ddb75f by Serhiy Storchaka in branch '2.7':
Issue #10131: Fixed deep copying of minidom documents. Based on patch
https://hg.python.org/cpython/rev/0eea57ddb75f
New changeset aa304ad32292 by Serhiy Storchaka in branch '3.4':
Issue #10131: Fixed deep copying of minidom documents. Based on patch
https://hg.python.org/cpython/rev/aa304ad32292
New changeset 5d6b2dc7e3d0 by Serhiy Storchaka in branch '3.5':
Issue #10131: Fixed deep copying of minidom documents. Based on patch
https://hg.python.org/cpython/rev/5d6b2dc7e3d0
New changeset 9fcfdb53e8af by Serhiy Storchaka in branch 'default':
Issue #10131: Fixed deep copying of minidom documents. Based on patch
https://hg.python.org/cpython/rev/9fcfdb53e8af 
msg255437 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015年11月26日 21:56
Thank you Marian for your patch. Committed modified patch with tests. For compatibility with data pickled by unpatched Python we have to keep __setstate__.
The discrepancy between pickle and copy module is different not easy issue.
History
Date User Action Args
2022年04月11日 14:57:07adminsetgithub: 54340
2015年11月26日 21:58:42serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: test needed -> resolved
2015年11月26日 21:56:25serhiy.storchakasetmessages: + msg255437
2015年11月26日 21:53:04python-devsetnosy: + python-dev
messages: + msg255436
2015年11月26日 20:42:22serhiy.storchakasetfiles: + copying_NodeList.patch
assignee: serhiy.storchaka

nosy: + pitrou, alexandre.vassalotti, serhiy.storchaka
versions: + Python 3.4, Python 3.5, Python 3.6, - Python 3.2, Python 3.3
2013年07月24日 11:56:48bkabrdasetnosy: + bkabrda
2012年03月23日 12:42:21eli.benderskysetnosy: - eli.bendersky
2012年02月24日 10:52:07ezio.melottisetnosy: + eli.bendersky
stage: needs patch -> test needed

versions: + Python 3.3, - Python 3.1
2011年11月28日 10:56:46genij.mathsetmessages: + msg148473
2011年11月24日 22:55:51Marian.Ganisinsetmessages: + msg148291
2011年10月22日 17:29:14genij.mathsetnosy: + genij.math
messages: + msg146175
2011年07月29日 09:08:21Marian.Ganisinsetfiles: + mganisin-Issue10131-Python-2.7.2.patch

nosy: + Marian.Ganisin
messages: + msg141338

keywords: + patch
2010年10月17日 13:47:10floxsetmessages: + msg118943
2010年10月17日 13:42:19floxcreate

AltStyle によって変換されたページ (->オリジナル) /