Here is a part of an XML file that describes forms:
<?xml version="1.0" encoding="utf-8"?>
<ArrayOfHouse>
<XmlForm>
<houseNum>1</houseNum>
<plan1>
<coord>
<X> 1.2 </X>
<Y> 2.1 </Y>
<Z> 3.0 </Z>
</coord>
<color>
<R> 255 </R>
<G> 0 </G>
<B> 0 </B>
</color>
</plan1>
<plan2>
<coord>
<X> 21.2 </X>
<Y> 22.1 </Y>
<Z> 31.0 </Z>
</coord>
<color>
<R> 255 </R>
<G> 0 </G>
<B> 0 </B>
</color>
</plan2>
</XmlForm>
<XmlForm>
<houseNum>2</houseNum>
<plan1>
<coord>
<X> 11.2 </X>
<Y> 12.1 </Y>
<Z> 13.0 </Z>
</coord>
<color>
<R> 255 </R>
<G> 255 </G>
<B> 0 </B>
</color>
</plan1>
<plan2>
<coord>
<X> 211.2 </X>
<Y> 212.1 </Y>
<Z> 311.0 </Z>
</coord>
<color>
<R> 255 </R>
<G> 0 </G>
<B> 255 </B>
</color>
</plan2>
</XmlForm>
</ArrayOfHouse>
Here is my code to retrieve the coordinates of each plan for houses 1 and 2. The problem is in the line that starts with coord=tree.findall("XmlForm/[houseNum=str(houseindex)]. The same problem occurs when using houseindex.__str__().
import pandas as pd
import numpy as np
from lxml import etree
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
tree =etree.parse("myexample.xml")
# retrieve the column names for the pandas DataFrame
# (planlist[1:] skips the first child, houseNum)
planlist=tree.findall("XmlForm/[houseNum='1']/")
columns=[]
for el in planlist[1:]:
    columns.append(el.tag)
#Declare pandas dataFrame
df=pd.DataFrame(columns=list('XYZ'),dtype=float)
for houseindex in range(0,2):
    for index in range(len(columns)):
        coord=tree.findall("XmlForm/[houseNum=str(houseindex)]/"+columns[index]+"/coord/")
        XYZ=[]
        for cc in coord:
            XYZ.append(cc.text)
        df.loc[index]=XYZ
print(df)
2 Answers
You clearly want str(houseindex) to be interpreted in Python before constructing your XPath expression. (Your error message is telling you that str() isn't an XPath function.)
Therefore, change the argument of coord=tree.findall() from
"XmlForm/[houseNum=str(houseindex)]/"+columns[index]+"/coord/"
to
"XmlForm/[houseNum="+str(houseindex)+"]/"+columns[index]+"/coord/"
Two more fixes to that XPath:
- Remove the / before the predicate on XmlForm.
- Add quotes around the equality test of houseNum.
Final XPath with no further syntax errors
The following XPath has all three fixes combined and has no further syntax errors:
"XmlForm[houseNum='"+str(houseindex)+"']/"+columns[index]+"/coord/"
You don't inject the value of houseindex into your string. Also be careful with the houseindex loop: you currently use range(0, 2), which yields 0 and 1. Based on your XML example, you want range(1, 3) instead.
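To see both points in isolation, here is a minimal sketch. It assumes the standard-library xml.etree.ElementTree instead of lxml (so it runs without extra packages), a trimmed copy of the XML with only plan1, and a hypothetical helper named count_matches. It counts how many coordinate elements each house index finds once the value is formatted into the path.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's XML (plan1 only), inlined for the example.
XML = """<ArrayOfHouse>
  <XmlForm>
    <houseNum>1</houseNum>
    <plan1><coord><X> 1.2 </X><Y> 2.1 </Y><Z> 3.0 </Z></coord></plan1>
  </XmlForm>
  <XmlForm>
    <houseNum>2</houseNum>
    <plan1><coord><X> 11.2 </X><Y> 12.1 </Y><Z> 13.0 </Z></coord></plan1>
  </XmlForm>
</ArrayOfHouse>"""

root = ET.fromstring(XML)


def count_matches(root, house_index):
    """Hypothetical helper: number of X/Y/Z elements found for one house."""
    # The f-string inserts the *value* of house_index into the path,
    # so the path passed to findall() contains '1' or '2', not Python code.
    path = f"XmlForm[houseNum='{house_index}']/plan1/coord/*"
    return len(root.findall(path))


for house_index in range(0, 3):
    print(house_index, count_matches(root, house_index))
# 0 0   (no house is numbered 0, so range(0, 2) misses house 2 and wastes a pass)
# 1 3
# 2 3
```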
I believe you want to have something like this (I slightly refactored your code to improve readability):
import pandas as pd
from lxml import etree
tree = etree.parse("myexample.xml")
# retrieve the column names for the pandas DataFrame
plan_list = tree.findall("XmlForm/[houseNum='1']/")
columns = [el.tag for el in plan_list[1:]]
# collect one [X, Y, Z] row per (house, plan) pair
data = list()
for house_index in range(1, 3):
    for column in columns:
        element_text = "XmlForm/[houseNum='{index}']/{column}/coord/".format(index=house_index, column=column)
        coord = tree.findall(element_text)
        row = [cc.text for cc in coord]
        data.append(row)
# build the DataFrame once, after all rows are collected
df = pd.DataFrame(data, columns=list('XYZ'), dtype=float)
print(df)
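If hard-coding range(1, 3) feels fragile, one possible variation is to walk the XmlForm elements directly and read houseNum from each one, so no path string has to be assembled at all. The sketch below is an assumption-laden alternative, not the code from the answer above: it uses the standard-library xml.etree.ElementTree rather than lxml, inlines a copy of the XML without the color elements, and introduces a hypothetical helper named collect_rows.

```python
import xml.etree.ElementTree as ET

# Copy of the question's XML with the <color> elements left out for brevity.
XML = """<ArrayOfHouse>
  <XmlForm>
    <houseNum>1</houseNum>
    <plan1><coord><X> 1.2 </X><Y> 2.1 </Y><Z> 3.0 </Z></coord></plan1>
    <plan2><coord><X> 21.2 </X><Y> 22.1 </Y><Z> 31.0 </Z></coord></plan2>
  </XmlForm>
  <XmlForm>
    <houseNum>2</houseNum>
    <plan1><coord><X> 11.2 </X><Y> 12.1 </Y><Z> 13.0 </Z></coord></plan1>
    <plan2><coord><X> 211.2 </X><Y> 212.1 </Y><Z> 311.0 </Z></coord></plan2>
  </XmlForm>
</ArrayOfHouse>"""


def collect_rows(root):
    """Hypothetical helper: one dict per (house, plan) with float coordinates."""
    rows = []
    for form in root.findall("XmlForm"):
        # The house number comes from the file, not from a range().
        house = int(form.findtext("houseNum"))
        for plan in form:
            coord = plan.find("coord")
            if coord is None:
                # Children without a <coord> (here: houseNum) are skipped.
                continue
            rows.append({
                "house": house,
                "plan": plan.tag,
                # float() tolerates the surrounding spaces in " 1.2 ".
                "X": float(coord.findtext("X")),
                "Y": float(coord.findtext("Y")),
                "Z": float(coord.findtext("Z")),
            })
    return rows


rows = collect_rows(ET.fromstring(XML))
for row in rows:
    print(row)
```

A list of dicts like this can be handed to pd.DataFrame(rows), which would also keep the house and plan labels next to each coordinate row.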