
Using Python's lxml, I must read an XML file and print the name and email text from each "basic" and "expert" tag. I've written a script that works, but I don't think it is the best way of doing this. Is there a better (simpler) way of getting the data out of the XML without having to iterate twice (once over the nodes and again over each node's children)?

Python so far:

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
root = tree.getroot()
for node in root:
    if node.tag == "basic" or node.tag == "expert":
        user = [None] * 4
        for i, child in enumerate(node):
            if child.tag == "name":
                user[0] = i
                user[1] = child.text
            if child.tag == "email":
                user[2] = i
                user[3] = child.text
        print(user)
        if user[3].startswith('_'):
            # do some other things with data if email begins with _ ...
            pass

Will print:

[0, 'f.bar', 1, '[email protected]']
[0, 'm.bob', 3, '[email protected]']
[0, 'm.bab', 3, '[email protected]']

XML sample:

<?xml version="1.0"?>
<users>
  <id>11111</id>
  <checked>True</checked>
  <version>A12</version>
  <basic>
    <name>f.bar</name>
    <email>[email protected]</email>
    <forename>Foo</forename>
    <surname>Bar</surname>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>[email protected]</email>
  </expert>
  <expert>
    <name>m.bab</name>
    <forename>Mak</forename>
    <surname>Bab</surname>
    <email>[email protected]</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <forename>Nick</forename>
    <email>[email protected]</email>
    <surname>Gru</surname>
  </guru>
</users>
Jamal
asked Jan 17, 2018 at 21:06

1 Answer


Currently, you are overlooking one of the advantages of using lxml: its fully compliant W3C XPath 1.0 (and even XSLT 1.0) support.

Right now, your code only uses the API shared with Python's built-in etree. It makes no xpath() calls, which can select nodes by their names in a single expression.
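The following is a minimal sketch of what selecting by name looks like in lxml. It uses a trimmed, inline copy of the sample so that it runs without data.xml. The e-mail addresses are hypothetical placeholders, because the originals are obfuscated on this page. The union `basic | expert` is used here instead of the `name()` predicate. It is relative to the root, so it only matches direct children of `<users>`, which is where these elements sit in the sample.

```python
from lxml import etree

# Trimmed copy of the question's sample; the e-mail addresses are
# hypothetical placeholders (the originals are obfuscated on this page).
XML = b"""<?xml version="1.0"?>
<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>f.bar@example.com</email>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@example.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>e.guru@example.com</email>
  </guru>
</users>"""

root = etree.fromstring(XML)

# One XPath union selects both element types in document order;
# <id> and <guru> are filtered out by the expression, not by Python code.
users = root.xpath("basic | expert")
print([u.tag for u in users])  # ['basic', 'expert']

# findtext() returns the text of the first matching child (or None).
pairs = [(u.findtext("name"), u.findtext("email")) for u in users]
print(pairs)  # [('f.bar', 'f.bar@example.com'), ('m.bob', 'm.bob@example.com')]
```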

The code below iterates through all <basic> and <expert> elements and retrieves their child <name> and <email> in a single loop or list comprehension. To retrieve each child's position, it counts that child's preceding siblings with count(preceding-sibling::*).

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
user = []

# FOR LOOP
# count() returns an XPath number, so the positions come back as floats.
for i in tree.xpath("//*[name()='basic' or name()='expert']"):
    user.append([i.xpath("count(name/preceding-sibling::*)"),
                 i.find("name").text,
                 i.xpath("count(email/preceding-sibling::*)"),
                 i.find("email").text])
print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]

# LIST COMPREHENSION
user = [[i.xpath("count(name/preceding-sibling::*)"),
         i.find("name").text,
         i.xpath("count(email/preceding-sibling::*)"),
         i.find("email").text]
        for i in tree.xpath("//*[name()='basic' or name()='expert']")]
print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]
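For comparison, here is a sketch of the same rows built without XPath, using only the standard library's xml.etree.ElementTree. It still makes one pass over the top-level nodes. Each child's position is looked up with `list(node).index(child)` rather than counted through XPath. The helper name `user_rows` and the placeholder addresses are hypothetical. It assumes every matching element has exactly one `<name>` and one `<email>`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample; the e-mail addresses are
# hypothetical placeholders (the originals are obfuscated on this page).
XML = """<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>f.bar@example.com</email>
    <forename>Foo</forename>
    <surname>Bar</surname>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@example.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>e.guru@example.com</email>
  </guru>
</users>"""


def user_rows(root, tags=("basic", "expert")):
    """Hypothetical helper: [name_pos, name, email_pos, email] per matching child.

    Assumes every matching element has exactly one <name> and one <email>;
    list.index() raises ValueError if find() returned None.
    """
    rows = []
    for node in root:
        if node.tag not in tags:
            continue
        children = list(node)  # build the child list once, reuse it for both lookups
        name = node.find("name")
        email = node.find("email")
        rows.append([children.index(name), name.text,
                     children.index(email), email.text])
    return rows


rows = user_rows(ET.fromstring(XML))
print(rows)
# [[0, 'f.bar', 1, 'f.bar@example.com'], [0, 'm.bob', 3, 'm.bob@example.com']]
```

Unlike XPath count(), this yields integer positions, matching the question's original output format. lxml elements also provide an `index(child)` method, so `node.index(name)` should work there as well.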
answered Jan 24, 2018 at 22:21
  • But how do I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. As the XML format is not the same for basic and expert, I need this information. Commented Jan 29, 2018 at 11:48
  • Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*). Commented Jan 29, 2018 at 15:40
