This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Document Object Model API - validation
Type: behavior
Stage:
Components: XML
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Kyle.Keating, jocassid, martin.panter, pdeep5693, terry.reedy
Priority: normal
Keywords:

Created on 2011-05-20 22:02 by Kyle.Keating, last changed 2022-04-11 14:57 by admin.

Files
File name | Uploaded | Description
xmlNameVerification.py | jocassid, 2013-07-28 02:49 | code to validate xml element/attribute names
Messages (7)
msg136402 - Author: Kyle Keating (Kyle.Keating) Date: 2011-05-20 22:02
While testing this library, I noticed that XML element and attribute names are written out verbatim, so names containing special characters produce malformed XML: characters that can break validation are neither escaped nor converted from their literal forms. Only the attribute values are escaped, not the names.
For example
import xml.dom
...
doc.createElement("p></p>") 
...
will just embed a pair of p tags in the XML result. I thought the XML spec did not permit <, >, &, \n, etc. in an element name or attribute name. Could I get some clarification on this? Thanks!
msg137142 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-05-28 18:35
I suspect you are right, but do not know the rules, and have never used the module. There is no particular person maintaining xml.dom.X at present.
Could you please fill in the ... after the import to give a complete minimal example that fails? Someone could then test it on 3.2.
msg137487 - Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:10
This breaks pretty thoroughly. I confirmed it on 3.0; I'm guessing 3.2 behaves the same.
import sys
import xml.dom
doc = xml.dom.getDOMImplementation().createDocument(None, 'xml', None)
doc.firstChild.appendChild(doc.createElement('element00'))
element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)
element02 = doc.createElement("script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element02)
element03 = doc.createElement("new line \n")
element03.setAttribute('attribute-name','new line \n')
doc.firstChild.appendChild(element03)
print(doc.toprettyxml(indent=" "))
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
output:
<?xml version="1.0" ?>
<xml>
 <element/>
 <element01 attribute="script&gt;&lt;![CDATA[alert('script!');]]&gt;&lt;/script
&gt;"/>
 <script><![CDATA[alert('script!');]]></script>/>
 <new line
 attribute-name="new line
"/>
</xml>
msg137488 - Author: Kyle Keating (Kyle.Keating) Date: 2011-06-02 17:13
Oops, the first XML element in the output should read "<element00/>", not "<element/>".
Just a typo, don't get confused!
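The output above can also be checked mechanically by feeding the serialized document back to a parser. The following is a minimal sketch, assuming Python 3 and xml.dom.minidom; the helper name round_trip is hypothetical and only for illustration. It serializes a document containing one element with the given name, then re-parses the result.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def round_trip(tag_name):
    """Serialize a document holding one element named tag_name, then re-parse it."""
    doc = minidom.getDOMImplementation().createDocument(None, "xml", None)
    doc.documentElement.appendChild(doc.createElement(tag_name))
    return minidom.parseString(doc.toxml())


# The name is written out verbatim, so the parser no longer sees a single
# element called "p></p>"; print what it sees instead.
reparsed = round_trip("p></p>")
children = reparsed.documentElement.childNodes
print([child.nodeName for child in children])

# A name containing whitespace yields output the parser rejects outright.
try:
    round_trip("new line \n")
except ExpatError as exc:
    print("not well-formed:", exc)
```

Note that the first case still parses; the damage there is a silently different tree rather than a parse error.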
msg193804 - Author: John Cassidy (jocassid) Date: 2013-07-28 02:49
I added the line print(str(doc)) after the call to getDOMImplementation and verified that the errors I'm seeing come from the xml.dom.minidom implementation of xml.dom. Checking minidom.py, I did not see any validation of the tagName that gets passed to createElement. http://www.w3.org/TR/xml11/#NT-NameStartChar lists the format of allowed names. Attached is a file containing the functions I was working on. My thinking is that if the tagName is not valid, a ValueError should be raised.
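To make the idea concrete, here is a minimal sketch of such a check. It is not the attached xmlNameVerification.py; the function name validate_xml_name and the regular expression are assumptions written from the NameStartChar and NameChar productions at the URL above, and the sketch assumes Python 3.

```python
import re

# Character ranges transcribed from the NameStartChar production
# (assumption: copied by hand from the XML 1.1 recommendation).
_NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
    "\U00010000-\U000EFFFF"
)
# NameChar adds "-", ".", digits, and a few combining/extender ranges.
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_NAME_RE = re.compile("[%s][%s]*" % (_NAME_START, _NAME_CHAR))


def validate_xml_name(name):
    """Return name unchanged if it matches the Name production, else raise ValueError."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError("not a valid XML name: %r" % (name,))
    return name


for candidate in ["element01", "attribute-name", "p></p>", "new line \n"]:
    try:
        validate_xml_name(candidate)
        print(repr(candidate), "accepted")
    except ValueError as exc:
        print(repr(candidate), "rejected:", exc)
```

fullmatch is used rather than match with a trailing "$" so that a name ending in a newline is rejected too.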
msg258344 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-16 00:57
My limited understanding is that xml.dom and minidom are supposed to implement particular interfaces. So do these DOM interfaces specify if this validation should be done? If so, this would be a bug. Or is it just a question of whether Python should do extra validation not specified by the underlying DOM API?
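For reference when weighing that question: as far as I can tell, the DOM Core description of createElement lists an INVALID_CHARACTER_ERR exception for names containing illegal characters, and Python's xml.dom package defines a matching InvalidCharacterErr class. The sketch below shows what raising it could look like. The wrapper create_element_checked is hypothetical, it is not how minidom behaves today, and the ASCII-only pattern is a deliberate simplification of the full Name production.

```python
import re
import xml.dom
from xml.dom import minidom

# Simplification (assumption): ASCII-only subset of the XML Name production.
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def create_element_checked(doc, tag_name):
    """Hypothetical wrapper: reject bad names before calling doc.createElement."""
    if not _ASCII_NAME.fullmatch(tag_name):
        raise xml.dom.InvalidCharacterErr("invalid element name: %r" % (tag_name,))
    return doc.createElement(tag_name)


doc = minidom.getDOMImplementation().createDocument(None, "xml", None)
doc.documentElement.appendChild(create_element_checked(doc, "element00"))

try:
    create_element_checked(doc, "p></p>")
except xml.dom.DOMException as exc:
    # The exception carries the numeric DOM error code as well.
    print(type(exc).__name__, exc.code == xml.dom.INVALID_CHARACTER_ERR)
```

Raising a DOMException subclass instead of ValueError would keep the behavior closer to the DOM interface, at the cost of rejecting input that minidom currently accepts.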
msg283873 - Author: Pradeep (pdeep5693) Date: 2016-12-23 08:39
xml minidom.py needs extra validation in setAttribute for certain special characters, depending on the attribute name. Attribute values cannot have special characters like < and > and can't be nested, as in the example below:
element01 = doc.createElement('element01')
element01.setAttribute('attribute', "script><![CDATA[alert('script!');]]></script>")
doc.firstChild.appendChild(element01)
script shouldn't be allowed as a value for an attribute, and I feel it should throw an exception (Value Exception). As described above, < and > shouldn't be allowed, as attributes are more like key-value pairs. Could someone tell me if this is right? If it is, then minidom.py needs this extra level of validation.
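One way to test that assumption is to check whether the attribute value from the example survives a serialize-and-parse round trip. This is a minimal sketch, assuming Python 3 and xml.dom.minidom; the variable names are illustrative only.

```python
from xml.dom import minidom

doc = minidom.getDOMImplementation().createDocument(None, "xml", None)
element01 = doc.createElement("element01")
value = "script><![CDATA[alert('script!');]]></script>"
element01.setAttribute("attribute", value)
doc.documentElement.appendChild(element01)

# Serialize, then parse the result back and read the attribute again.
serialized = doc.toxml()
reparsed = minidom.parseString(serialized)
round_tripped = reparsed.documentElement.firstChild.getAttribute("attribute")

print(serialized)
print("value preserved:", round_tripped == value)
```

If the value is escaped on output, as the output in msg137487 suggests, the re-parsed attribute equals the original string and no markup is injected, which would mean the problem is limited to element and attribute names.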
History
Date | User | Action | Args
2022-04-11 14:57:17 | admin | set | github: 56338
2019-04-27 11:39:42 | scoder | unlink | issue5166 dependencies
2016-12-23 08:39:36 | pdeep5693 | set | nosy: + pdeep5693; messages: + msg283873
2016-01-16 00:57:27 | martin.panter | set | versions: + Python 3.5, Python 3.6; nosy: + martin.panter; messages: + msg258344; components: + XML, - Library (Lib); stage: test needed ->
2016-01-16 00:44:53 | martin.panter | link | issue5166 dependencies
2013-07-28 02:49:49 | jocassid | set | files: + xmlNameVerification.py; nosy: + jocassid; messages: + msg193804
2011-06-02 17:13:17 | Kyle.Keating | set | messages: + msg137488
2011-06-02 17:10:39 | Kyle.Keating | set | messages: + msg137487
2011-05-28 18:35:29 | terry.reedy | set | nosy: + terry.reedy; messages: + msg137142; stage: test needed
2011-05-20 22:02:10 | Kyle.Keating | create |
