Using Python's lxml, I need to read an XML file and print the name and email text from each "basic" and "expert" tag. I've written a script that works, but I don't think it is the best way of doing this. Is there a better (simpler) way to get the data from the XML without having to make two iterations over it?
Python so far:
from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
root = tree.getroot()

for node in root:
    if node.tag == "basic" or node.tag == "expert":
        # user = [name position, name text, email position, email text]
        user = [None] * 4
        for i, child in enumerate(node):
            if child.tag == "name":
                user[0] = i
                user[1] = child.text
            if child.tag == "email":
                user[2] = i
                user[3] = child.text
        print(user)
        if user[3].startswith('_'):
            # do some other things with data if email begins with _ ...
            pass
Will print:
[0, 'f.bar', 1, '[email protected]']
[0, 'm.bob', 3, '[email protected]']
[0, 'm.bab', 3, '[email protected]']
XML sample:
<?xml version="1.0"?>
<users>
    <id>11111</id>
    <checked>True</checked>
    <version>A12</version>
    <basic>
        <name>f.bar</name>
        <email>[email protected]</email>
        <forename>Foo</forename>
        <surname>Bar</surname>
    </basic>
    <expert>
        <name>m.bob</name>
        <forename>Mak</forename>
        <surname>Bob</surname>
        <email>[email protected]</email>
    </expert>
    <expert>
        <name>m.bab</name>
        <forename>Mak</forename>
        <surname>Bab</surname>
        <email>[email protected]</email>
    </expert>
    <guru>
        <name>e.guru</name>
        <forename>Nick</forename>
        <email>[email protected]</email>
        <surname>Gru</surname>
    </guru>
</users>
1 Answer
Currently, you are overlooking one of the advantages of using lxml: its fully compliant W3C XPath 1.0 (and even XSLT 1.0) language modules. Right now, your code really follows the syntax of Python's built-in etree, without any xpath() calls that can select nodes dynamically by name.
The code below iterates through all <basic> and <expert> tags and retrieves their child <name> and <email> in a single loop or list comprehension. To retrieve each child's position, we count its preceding siblings with count(preceding-sibling::*). Note that XPath's count() returns a float, hence the 0.0, 1.0 and 3.0 values in the output.
from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)

user = []

# FOR LOOP
for i in tree.xpath("//*[name()='basic' or name()='expert']"):
    user.append([i.xpath("count(name/preceding-sibling::*)"),
                 i.find("name").text,
                 i.xpath("count(email/preceding-sibling::*)"),
                 i.find("email").text])

print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]

# LIST COMPREHENSION
user = [[i.xpath("count(name/preceding-sibling::*)"),
         i.find("name").text,
         i.xpath("count(email/preceding-sibling::*)"),
         i.find("email").text]
        for i in tree.xpath("//*[name()='basic' or name()='expert']")]

print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]
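If integer positions are preferred, or if some nodes may lack an <email> child, the same idea can be wrapped in a small helper. The following is only a sketch under stated assumptions: the helper names child_info and collect_users, the inline SAMPLE document, and the example.com addresses are all made up for illustration and do not come from the question's data.

```python
from lxml import etree

# Hypothetical inline sample shaped like the question's XML
# (the addresses are placeholders, not the original data).
SAMPLE = b"""<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>_f.bar@example.com</email>
    <forename>Foo</forename>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@example.com</email>
  </expert>
  <expert>
    <name>no.mail</name>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>e.guru@example.com</email>
  </guru>
</users>"""


def child_info(node, tag):
    """Return (position, text) of the first child named `tag`, or (None, None)."""
    child = node.find(tag)
    if child is None:
        return None, None
    # XPath count() yields a float, so cast it to get an integer index.
    position = int(child.xpath("count(preceding-sibling::*)"))
    return position, child.text


def collect_users(root):
    """Collect [name_pos, name, email_pos, email] for each basic/expert node."""
    users = []
    for node in root.xpath("//*[self::basic or self::expert]"):
        name_pos, name = child_info(node, "name")
        email_pos, email = child_info(node, "email")
        users.append([name_pos, name, email_pos, email])
    return users


users = collect_users(etree.fromstring(SAMPLE))
print(users)

# The question's follow-up step: act only on emails that begin with "_".
flagged = [u for u in users if u[3] is not None and u[3].startswith("_")]
print(flagged)
```

Checking for None before calling startswith() avoids an AttributeError on nodes without an <email> child, which the original loop would raise.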
- But how can I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. As the XML format is not the same for basic and expert, I need this information. – Ñhosko, Jan 29, 2018 at 11:48
- Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*). – Parfait, Jan 29, 2018 at 15:40