This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title: ElementTree won't parse comments
Type: enhancement
Stage: resolved
Components: XML
Versions: Python 3.2

Process
Status: closed
Resolution: wont fix
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, effbot, flox, poke, scoder
Priority: normal
Keywords:

Created on 2010-04-01 01:37 by poke, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg102051 - (view) Author: Patrick Westerhoff (poke) Date: 2010-04-01 01:37
When using xml.etree.ElementTree to parse external XML files, all XML comments within that file are being stripped out. I guess that happens because there is no comment handler in the expat parser.
Example:
test.xml
--------
<example>
 <nodeA />
 <!-- some comment -->
 <nodeB />
</example>
test.py
-------
from xml.etree import ElementTree
with open( 'test.xml', 'r' ) as f:
 xml = ElementTree.parse( f )
ElementTree.dump( xml )
Result
------
<example>
 <nodeA />
 <nodeB />
</example>
msg102078 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-04-01 09:01
ElementTree does parse comments; it just omits them from the tree.
A quick search led me to this page: http://effbot.org/zone/element-pi.htm
which can be further simplified:
from xml.etree import ElementTree
class MyTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)
with open('c:/temp/t.xml', 'r') as f:
    xml = ElementTree.parse(
        f, parser=ElementTree.XMLParser(target=MyTreeBuilder()))
ElementTree.dump(xml)
Now, should ElementTree do this by default? It's not certain, see how effbot's sample needs to wrap the entire file into another 'document' element.
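
As a rough illustration of the point that the parser itself does see comments, the sketch below feeds a document to XMLParser with a minimal target object that only implements comment() and close(). The CommentCollector class and the collect_comments() helper are hypothetical names made up for this example, and it assumes ElementTree 1.3 or later (Python 2.7 / 3.2 and up).

```python
import xml.etree.ElementTree as ET


class CommentCollector:
    """Hypothetical parser target that records comment text and nothing else."""

    def __init__(self):
        self.comments = []

    def comment(self, data):
        # Called by XMLParser for every XML comment it encounters.
        self.comments.append(data)

    def close(self):
        # Whatever close() returns becomes the result of parser.close().
        return self.comments


def collect_comments(text):
    """Return the text of every comment found in an XML string."""
    parser = ET.XMLParser(target=CommentCollector())
    parser.feed(text)
    return parser.close()


if __name__ == "__main__":
    sample = "<example><nodeA /><!-- some comment --><nodeB /></example>"
    print(collect_comments(sample))
```

The comments only disappear because the default TreeBuilder target does not add them to the tree; any target that defines comment() receives them.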
msg102110 - (view) Author: Patrick Westerhoff (poke) Date: 2010-04-01 17:24
Thanks for your reply, Amaury. That page really might mean that ElementTree was not intended to parse such things by default. Still, it would be nice if there were an easy way to simply enable it, instead of having to hack it in and depend on details of internal code (which might change in the future).
Your code didn't work for me, by the way, but based on it and on that effbot page I came up with the following solution, which works fine.
test.py
-------
from xml.etree import ElementTree
class CommentedTreeBuilder ( ElementTree.XMLTreeBuilder ):
    def __init__ ( self, html = 0, target = None ):
        ElementTree.XMLTreeBuilder.__init__( self, html, target )
        self._parser.CommentHandler = self.handle_comment

    def handle_comment ( self, data ):
        self._target.start( ElementTree.Comment, {} )
        self._target.data( data )
        self._target.end( ElementTree.Comment )
with open( 'test.xml', 'r' ) as f:
    xml = ElementTree.parse( f, parser = CommentedTreeBuilder() )
ElementTree.dump( xml )
msg102112 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-04-01 17:29
Yes, my code uses the newer version of ElementTree, which will be included with 2.7 and 3.2.
msg113322 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-08 21:06
IIUC it works like that by design.
ElementTree 1.3 (which is part of Python 2.7 and 3.2) allows you to define your own parser that keeps comments (see previous comments).
Close as "won't fix"?
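
For readers on current Python versions: the "easy way to simply enable it" asked for above exists in later releases. Assuming Python 3.8 or newer, TreeBuilder accepts an insert_comments keyword argument, so no subclass is needed. The following is a minimal sketch under that assumption; parse_with_comments() is a hypothetical helper name, not part of the library.

```python
import xml.etree.ElementTree as ET


def parse_with_comments(text):
    """Parse an XML string, keeping comments in the tree.

    Assumes Python 3.8+, where TreeBuilder takes insert_comments.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(text, parser=parser)


if __name__ == "__main__":
    sample = "<example><nodeA /><!-- some comment --><nodeB /></example>"
    root = parse_with_comments(sample)
    # Comment nodes are elements whose tag is the ET.Comment factory.
    for child in root:
        kind = "comment" if child.tag is ET.Comment else "element"
        print(kind, child.text if child.tag is ET.Comment else child.tag)
```

On Python 2.7 through 3.7 the keyword is not available, and the TreeBuilder subclass shown earlier in this thread remains the way to go.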
History
Date                 User                Action  Args
2022-04-11 14:56:59  admin               set     github: 52524
2011-10-29 02:35:58  flox                set     status: open -> closed
2010-08-08 21:06:58  flox                set     type: behavior -> enhancement; versions: + Python 3.2, - Python 3.1; nosy: + scoder; messages: + msg113322; resolution: wont fix; stage: resolved
2010-04-01 17:29:22  amaury.forgeotdarc  set     messages: + msg102112
2010-04-01 17:24:05  poke                set     messages: + msg102110
2010-04-01 13:35:26  brian.curtin        set     nosy: + flox
2010-04-01 09:01:50  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc, effbot; messages: + msg102078
2010-04-01 01:37:44  poke                create
