
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: In xml.etree.ElementTree findall() can't search all elements in a namespace
Type: enhancement
Stage: resolved
Components: Library (Lib), XML
Versions: Python 3.8
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: scoder
Nosy List: eli.bendersky, py.user, scoder, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2016-09-21 12:38 by py.user, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 12997 merged scoder, 2019-04-28 18:13
Messages (5)
msg277130 - (view) Author: py.user (py.user) * Date: 2016-09-21 12:38
In this example, one document contains two namespaces, but there is no way to select all elements that belong to only one of them:
>>> import xml.etree.ElementTree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element '{http://def}a' at 0xb73961bc>, <Element '{http://x}b' at 0xb7396c34>]
>>>
>>> root.findall('{http://def}*')
[]
>>>
The same attempt with the third-party package lxml works as expected:
>>> import lxml.etree as etree
>>>
>>> s = '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
>>>
>>> root = etree.fromstring(s)
>>>
>>> root.findall('*')
[<Element {http://def}a at 0xb70ab11c>, <Element {http://x}b at 0xb70ab144>]
>>>
>>> root.findall('{http://def}*')
[<Element {http://def}a at 0xb70ab11c>]
>>>
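On Python versions without namespace wildcard support, the same selection can be approximated by filtering children on the "{namespace}" prefix of their tags. This is only a minimal sketch of a workaround, not part of the report; the helper name children_in_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET


def children_in_namespace(parent, namespace):
    """Return the direct children of `parent` whose tag lies in `namespace`.

    Hypothetical helper, not an ElementTree API.
    """
    prefix = "{" + namespace + "}"
    return [
        child for child in parent
        # Comments and processing instructions have non-string tags.
        if isinstance(child.tag, str) and child.tag.startswith(prefix)
    ]


root = ET.fromstring(
    '<feed xmlns="http://def" xmlns:x="http://x"><a/><x:b/></feed>'
)
matches = children_in_namespace(root, "http://def")
print([el.tag for el in matches])  # ['{http://def}a']
```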
msg340301 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-15 18:51
lxml has a couple of nice features here:
- all tags in a namespace: "{namespace}*"
- a local name 'tag' in any (or no) namespace: "{*}tag"
- a tag without namespace: "{}tag"
- all tags without namespace: "{}*"
"{*}*" is also accepted but is the same as "*". Note that "*" is actually allowed as an XML tag name by the spec, but rare enough to hijack it for this purpose. I've actually never seen it used anywhere in the wild.
lxml's implementation isn't applicable to ElementTree (searching has been subject to excessive optimisation), but it shouldn't be hard to extend the one in ET's ElementPath.py module, as well as Element.iter() in ElementTree.py, to support this kind of tag comparison.
PR welcome.
lxml's tests are here (and in the following test methods):
https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/tests/test_etree.py#L2911
Note that they actually test the deprecated .getiterator() method for historical reasons. They should probably call .iter() instead these days. lxml's ElementPath implementation is under src/lxml/_elementpath.py, but the tag comparison itself is done elsewhere in Cython code (here, in case it matters:)
https://github.com/lxml/lxml/blob/359f693b972c2e6b0d83d26a329d2d20b7581c48/src/lxml/apihelpers.pxi#L921-L1048 
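The tag comparison described in this message can be sketched as a small predicate. This is an illustration of the matching rules listed above, assuming tags are written in the "{namespace}local" form; it is neither lxml's nor ElementTree's actual implementation, and the name tag_matches is hypothetical.

```python
def split_tag(name):
    """Split '{ns}local' into (ns, local); ns is '' when there is no namespace."""
    if name.startswith("{"):
        ns, _, local = name[1:].partition("}")
        return ns, local
    return "", name


def tag_matches(pattern, tag):
    """Hypothetical predicate: does `tag` match the wildcard `pattern`?"""
    if not isinstance(tag, str):
        # Comments and processing instructions have non-string tags.
        return False
    if pattern == "*":
        # Treated here as "any Element", like "{*}*".
        return True
    p_ns, p_local = split_tag(pattern)
    t_ns, t_local = split_tag(tag)
    ns_ok = p_ns == "*" or p_ns == t_ns
    local_ok = p_local == "*" or p_local == t_local
    return ns_ok and local_ok


print(tag_matches("{http://def}*", "{http://def}a"))  # True
print(tag_matches("{*}b", "{http://x}b"))             # True
print(tag_matches("{}*", "{http://x}b"))              # False
```

Splitting both the pattern and the tag into a namespace part and a local part keeps the four wildcard forms down to two independent comparisons.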
msg341030 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-28 18:15
PR submitted, feedback welcome.
msg341043 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-04-29 05:31
BTW, I found that lxml and ET differ in their behaviour when searching for '*'. ET takes it as meaning "any tree node", whereas lxml interprets it as "any Element". Since ET's parser does not create comments and processing instructions by default, this does not make a difference in most cases, but when the tree contains comments or PIs, then they will be found by '*' in ET but not in lxml.
At least for "{*}*", they now both return only Elements. Changing either behaviour for '*' is probably not a good idea at this point.
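The difference can be observed in ElementTree by building a tree that contains a comment by hand, since the default parser does not create one. This is a small sketch that assumes Python 3.8 or later, where "{*}*" is supported.

```python
import xml.etree.ElementTree as ET

# Build the tree manually so that it contains a comment node.
root = ET.Element("root")
root.append(ET.Comment("a comment"))
ET.SubElement(root, "child")

all_nodes = root.findall("*")         # any tree node, including the comment
only_elements = root.findall("{*}*")  # Elements only

print(len(all_nodes), len(only_elements))  # 2 1
```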
msg341351 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-05-03 18:58
New changeset 47541689ccea79dfcb055c6be5800b13fcb6bdd2 by Stefan Behnel in branch 'master':
bpo-28238: Implement "{*}tag" and "{ns}*" wildcard tag selection support for ElementPath, and extend the surrounding tests and docs. (GH-12997)
https://github.com/python/cpython/commit/47541689ccea79dfcb055c6be5800b13fcb6bdd2
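With this changeset (Python 3.8 and later), the wildcard forms can be tried directly in ElementTree. The document below is an illustrative variation of the one in the original report, extended with an element that has no namespace; the namespace URIs are placeholders.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<feed xmlns:d="http://def" xmlns:x="http://x">'
    '<d:a/><x:a/><x:b/><c/>'
    '</feed>'
)


def tags(path):
    """Return the tags of all children of root matched by `path`."""
    return [el.tag for el in root.findall(path)]


print(tags("{http://def}*"))  # ['{http://def}a']
print(tags("{*}a"))           # ['{http://def}a', '{http://x}a']
print(tags("{}*"))            # ['c']
```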
History
Date User Action Args
2022-04-11 14:58:37 admin set github: 72425
2019-05-03 18:59:05 scoder set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2019-05-03 18:58:21 scoder set messages: + msg341351
2019-04-29 05:31:03 scoder set messages: + msg341043
2019-04-28 18:15:07 scoder set assignee: scoder
    type: behavior -> enhancement
    messages: + msg341030
2019-04-28 18:13:52 scoder set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request12919
2019-04-15 18:51:42 scoder set messages: + msg340301
    stage: needs patch
2019-04-15 16:00:07 xtreak set nosy: + scoder, eli.bendersky, serhiy.storchaka
    versions: + Python 3.8, - Python 3.6
2016-09-21 12:38:12 py.user create
