
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: minidom parses comments wrongly
Type:
Stage: resolved
Components: XML
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Wanat, ned.deily
Priority: normal
Keywords:

Created on 2015-05-14 22:09 by Wanat, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg243222 - Author: Paweł (Wanat) Date: 2015-05-14 22:09
from xml.dom import minidom
html = """<html>
 <body>
 <!-- <img src="/images/obraz--super.jpg"/> -->
 </body>
</html>"""
minidom.parseString(html)
Result:
Traceback (most recent call last):
  File "minidom.py", line 10, in <module>
    minidom.parseString(html)
  File "/usr/lib/python2.7/xml/dom/minidom.py", line 1928, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 3, column 34
Tested versions:
2.7.6, 2.7.3
Reason:
The "--" between "obraz" and "super" inside the comment.
msg243241 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-05-15 02:31
Thanks for your report. Alas, according to the W3C XML 1.0 specification:
"For compatibility, the string "--" (double-hyphen) MUST NOT occur within comments."
So, it appears minidom (and other XML parsers) are correct in rejecting your example as not well-formed XML.
http://www.w3.org/TR/xml/#sec-comments 
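
For readers who hit the same error and cannot fix the markup at its source, here is a minimal sketch of one possible workaround: rewrite each comment body so that it no longer contains "--" before handing the text to minidom. The helpers `sanitize_comments` and `parse_lenient` are hypothetical names made up for this illustration and are not part of minidom. The sketch assumes that changing the comment text is acceptable (the path becomes "obraz- -super.jpg") and that the input has no CDATA sections containing comment-like text, since the regular expression does not understand XML structure.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

html = """<html>
 <body>
 <!-- <img src="/images/obraz--super.jpg"/> -->
 </body>
</html>"""

# Matches one comment: "<!--", the shortest possible body, then "-->".
_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def _fix_comment(match):
    body = match.group(1)
    # Put a space after every hyphen that is directly followed by another
    # hyphen, so no "--" remains inside the comment body.
    body = re.sub(r"-(?=-)", "- ", body)
    # A body ending in "-" would produce "--->", which is also rejected.
    if body.endswith("-"):
        body += " "
    return "<!--" + body + "-->"


def sanitize_comments(text):
    """Hypothetical helper: break up '--' inside XML comments."""
    return _COMMENT.sub(_fix_comment, text)


def parse_lenient(text):
    """Hypothetical helper: parse strictly, retry with sanitized comments."""
    try:
        return minidom.parseString(text)
    except ExpatError:
        return minidom.parseString(sanitize_comments(text))


doc = parse_lenient(html)
```

Trying the strict parse first means well-formed input is never modified. Only documents that expat already rejects go through the rewrite, and any error that is not caused by a comment is still raised by the second parse.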
History
Date                 User       Action  Args
2022-04-11 14:58:16  admin      set     github: 68385
2015-05-15 02:31:39  ned.deily  set     status: open -> closed
                                        type: crash ->
                                        nosy: + ned.deily
                                        messages: + msg243241
                                        resolution: not a bug
                                        stage: resolved
2015-05-14 22:09:14  Wanat      create
