Finding data on XML using Python's LXML

Question 1

Using Python's lxml, I need to read an XML file and print the name and email text from each "basic" and "expert" tag. I've written a script that works, but I don't think it's the best way to do this. Is there a better (simpler) way to get the data out of the XML without iterating over it twice?

Python so far:

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
root = tree.getroot()
for node in root:
    if node.tag == "basic" or node.tag == "expert":
        # user = [name position, name text, email position, email text]
        user = [None] * 4
        for i, child in enumerate(node):
            if child.tag == "name":
                user[0] = i
                user[1] = child.text
            if child.tag == "email":
                user[2] = i
                user[3] = child.text
        print(user)
        if user[3].startswith('_'):
            # do some other things with data if email begins with _ ...
            pass

Will print:

[0, 'f.bar', 1, '[email protected]']
[0, 'm.bob', 3, '[email protected]']
[0, 'm.bab', 3, '[email protected]']

XML sample:

<?xml version="1.0"?>
<users>
    <id>11111</id>
    <checked>True</checked>
    <version>A12</version>
    <basic>
        <name>f.bar</name>
        <email>[email protected]</email>
        <forename>Foo</forename>
        <surname>Bar</surname>
    </basic>
    <expert>
        <name>m.bob</name>
        <forename>Mak</forename>
        <surname>Bob</surname>
        <email>[email protected]</email>
    </expert>
    <expert>
        <name>m.bab</name>
        <forename>Mak</forename>
        <surname>Bab</surname>
        <email>[email protected]</email>
    </expert>
    <guru>
        <name>e.guru</name>
        <forename>Nick</forename>
        <email>[email protected]</email>
        <surname>Gru</surname>
    </guru>
</users>

Answer 1

Currently, you are overlooking one of the advantages of lxml: its fully compliant W3C XPath 1.0 (and even XSLT 1.0) support.

Right now, your code follows the API of Python's built-in etree and makes no xpath() calls, which can select nodes dynamically by name.

The code below iterates through all <basic> and <expert> tags and retrieves their child <name> and <email> elements in a single loop or list comprehension. To retrieve each child's position, we count its preceding siblings with count(preceding-sibling::*).

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)

# FOR LOOP
user = []
for i in tree.xpath("//*[name()='basic' or name()='expert']"):
    user.append([i.xpath("count(name/preceding-sibling::*)"),
                 i.find("name").text,
                 i.xpath("count(email/preceding-sibling::*)"),
                 i.find("email").text])
print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]

# LIST COMPREHENSION
user = [[i.xpath("count(name/preceding-sibling::*)"),
         i.find("name").text,
         i.xpath("count(email/preceding-sibling::*)"),
         i.find("email").text]
        for i in tree.xpath("//*[name()='basic' or name()='expert']")]
print(user)
# [[0.0, 'f.bar', 1.0, '[email protected]'],
#  [0.0, 'm.bob', 3.0, '[email protected]'],
#  [0.0, 'm.bab', 3.0, '[email protected]']]
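
Because XPath's count() returns a float, the positions above come back as 0.0, 1.0 and 3.0 rather than the integers the original script produced. The following is a minimal, self-contained sketch of the same technique with the counts cast to int. It assumes a trimmed inline copy of the sample XML; the e-mail addresses and the helper name collect_users are placeholders introduced for illustration, not values from the post.

```python
from lxml import etree

# Trimmed copy of the sample XML. The e-mail addresses are placeholders,
# not the values from the original post.
SAMPLE = b"""<?xml version="1.0"?>
<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>f.bar@example.com</email>
    <forename>Foo</forename>
    <surname>Bar</surname>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>_m.bob@example.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <forename>Nick</forename>
    <email>e.guru@example.com</email>
    <surname>Gru</surname>
  </guru>
</users>"""


def collect_users(root):
    """Return [name_pos, name, email_pos, email] for each basic/expert node."""
    rows = []
    # The union selects both tag names in document order; <guru> is skipped.
    for node in root.xpath("/users/basic | /users/expert"):
        rows.append([
            # count() yields a float in XPath 1.0, so cast it to int.
            int(node.xpath("count(name/preceding-sibling::*)")),
            node.findtext("name"),
            int(node.xpath("count(email/preceding-sibling::*)")),
            node.findtext("email"),
        ])
    return rows


root = etree.fromstring(SAMPLE)
users = collect_users(root)
print(users)
# [[0, 'f.bar', 1, 'f.bar@example.com'], [0, 'm.bob', 3, '_m.bob@example.com']]
```

The union path is just an alternative spelling of the name() predicate used above; either should select the same nodes for this layout.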

Comments

But how do I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. Because the XML layout is not the same for basic and expert, I need this information.

Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*).
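
If an integer position is all that is needed, one possible alternative to the XPath count is lxml's Element.index(), which returns a child's offset within its parent. The sketch below assumes a one-record inline sample with a placeholder e-mail address, and child_with_position is a hypothetical helper name. Note that index() also counts comments and processing instructions, whereas preceding-sibling::* counts elements only, so the two can differ on documents that contain such nodes.

```python
from lxml import etree

# One-record sample; the e-mail address is a placeholder, not from the post.
SAMPLE = b"""<users>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>_m.bob@example.com</email>
  </expert>
</users>"""


def child_with_position(node, tag):
    """Return (position, text) of the first child named tag, or (None, None)."""
    child = node.find(tag)
    if child is None:
        # A missing child is reported explicitly instead of raising.
        return None, None
    return node.index(child), child.text


root = etree.fromstring(SAMPLE)
expert = root.find("expert")
name_pos, name = child_with_position(expert, "name")
email_pos, email = child_with_position(expert, "email")
print([name_pos, name, email_pos, email])
# [0, 'm.bob', 3, '_m.bob@example.com']

# The follow-up check from the question, guarded against a missing email.
needs_extra_handling = email is not None and email.startswith("_")
```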

Parfait · Answer 1 · 2018-01-24 22:21:12Z

