I have this XML from a SOAP call:
```xml
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header/>
  <soapenv:Body>
    <SessionID xmlns="http://www.gggg.com/oog">5555555</SessionID>
    <QueryResult xmlns="http://www.gggg.com/oog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Code>testsk</Code>
      <Records>
        <Record>
          <dim_id>1</dim_id>
          <resource_full_name>Administrator, Sir</resource_full_name>
          <resource_first_name>Sir</resource_first_name>
          <resource_last_name>Administrator</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>admin</resource_user_name>
        </Record>
        <Record>
          <dim_id>2</dim_id>
          <resource_full_name>scheduler, scheduler</resource_full_name>
          <resource_first_name>scheduler</resource_first_name>
          <resource_last_name>scheduler</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>scheduler</resource_user_name>
        </Record>
      </Records>
    </QueryResult>
  </soapenv:Body>
</soapenv:Envelope>
```
My goal: parse each `<Record>`'s child elements, from `<dim_id>` through `<resource_user_name>`, and save each record as a row in a CSV file.
My code:
```python
import csv
import xml.etree.ElementTree as et

dim_id_list = []
full_name_list = []
first_name_list = []
last_name_list = []
resource_email_list = []
resource_user_name_list = []

root = et.parse('xml_stuff.xml').getroot()

for dim_id in root.iter('{http://www.gggg.com/oog/Query}dim_id'):
    dim_id_list.append(dim_id.text)
for resource_full_name in root.iter('{http://www.gggg.com/oog/Query}resource_full_name'):
    full_name_list.append(resource_full_name.text)
for resource_first_name in root.iter('{http://www.gggg.com/oog/Query}resource_first_name'):
    first_name_list.append(resource_first_name.text)
for resource_last_name in root.iter('{http://www.gggg.com/oog/Query}resource_last_name'):
    last_name_list.append(resource_last_name.text)
for resource_email in root.iter('{http://www.gggg.com/oog/Query}resource_email'):
    resource_email_list.append(resource_email.text)
for resource_user_name in root.iter('{http://www.gggg.com/oog/Query}resource_user_name'):
    resource_user_name_list.append(resource_user_name.text)

rows = zip(dim_id_list, full_name_list, first_name_list, last_name_list, resource_email_list, resource_user_name_list)

with open('test.csv', "w", encoding='utf16', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
```
Is there a better way to loop through the Records? This code is terribly verbose. I tried this:
```python
for record in root.findall('.//{http://www.gggg.com/oog/Query}Record'):
    dim_id = record.find('dim_id').text
    # Extract each attribute, save to list. etc.
```
But I get an `AttributeError` when I try to access the `.text` property of each record's child elements.
Answer
It makes little sense to slice the data into "vertical" lists and then transpose them back into rows using `zip()`. Not only is that cumbersome, it's also fragile. If, for example, one record is missing its `resource_email` child element, then every subsequent row will be misaligned, and `zip()` will silently drop the leftover values at the end!
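To make the failure mode concrete, here is a minimal sketch with two hypothetical records (trimmed to two fields, parsed from a string rather than a file) in which the first record lacks `<resource_email>`. The column-wise extraction from the question pairs the second record's email with the first record's `dim_id`, whereas reading each record's own children with `findtext()` keeps every value with its record.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record document; the first <Record> has no <resource_email>.
XML = """<Records xmlns="http://www.gggg.com/oog/Query">
  <Record><dim_id>1</dim_id></Record>
  <Record><dim_id>2</dim_id><resource_email>b@example.com</resource_email></Record>
</Records>"""

NS = '{http://www.gggg.com/oog/Query}'
root = ET.fromstring(XML)

# Column-wise extraction, as in the question: one flat list per field.
dim_ids = [e.text for e in root.iter(NS + 'dim_id')]
emails = [e.text for e in root.iter(NS + 'resource_email')]
rows = list(zip(dim_ids, emails))
print(rows)  # [('1', 'b@example.com')]; record 2's email lands on record 1, record 2 is dropped

# Row-wise extraction: each value is looked up inside its own <Record>.
row_wise = [
    (r.findtext(NS + 'dim_id'), r.findtext(NS + 'resource_email', default=''))
    for r in root
]
print(row_wise)  # [('1', ''), ('2', 'b@example.com')]
```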
You can use `writer.writerows(rows)` instead of the `for row in rows: writer.writerow(row)` loop. Furthermore, you can pass `writerows()` a generator expression, so that records are extracted on the fly as the CSV writer consumes them.
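A small sketch of that pattern, assuming some hypothetical already-parsed records and writing to an in-memory `io.StringIO` buffer instead of a file so the result is easy to inspect:

```python
import csv
import io

# Hypothetical parsed records; in the real script these would come from the XML.
records = [
    {'dim_id': '1', 'resource_user_name': 'admin'},
    {'dim_id': '2', 'resource_user_name': 'scheduler'},
]

buf = io.StringIO()  # in-memory stand-in for the output file
writer = csv.writer(buf)

# A single writerows() call replaces the explicit for-loop; the generator
# expression builds each row only when writerows() asks for it.
writer.writerows((r['dim_id'], r['resource_user_name']) for r in records)

print(buf.getvalue())
```

No intermediate list of rows is ever built, which matters little for two records but keeps memory flat for large query results.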
It's customary to import `xml.etree.ElementTree` as `ET` rather than as `et`.
Suggested solution
```python
import csv
from xml.etree import ElementTree as ET

fieldnames = [
    'dim_id',
    'resource_full_name',
    'resource_first_name',
    'resource_last_name',
    'resource_email',
    'resource_user_name',
]

# Map the empty prefix to the default namespace so unprefixed names in paths match.
ns = {'': 'http://www.gggg.com/oog/Query'}
xml_records = ET.parse('xml_stuff.xml').find('.//Records', ns)

with open('test2.csv', 'w', encoding='utf16', newline='') as f:
    # One dict per <Record>, keyed by each child's tag with the "{namespace}" prefix stripped.
    csv.DictWriter(f, fieldnames).writerows(
        {
            prop.tag.split('}', 1)[1]: prop.text
            for prop in xr
        }
        for xr in xml_records
    )
```
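The `ns` mapping in this solution is also what the attempt in the question was missing. Because `<QueryResult>` declares a default namespace, its descendants' tags are really `{http://www.gggg.com/oog/Query}dim_id` and so on, so `record.find('dim_id')` matches nothing, returns `None`, and accessing `.text` on it raises `AttributeError`. The sketch below uses a trimmed, hypothetical copy of the XML, parsed from a string, to show the failing lookup next to two working ones.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's <QueryResult>; its default namespace applies to every descendant.
XML = """<QueryResult xmlns="http://www.gggg.com/oog/Query">
  <Records>
    <Record><dim_id>1</dim_id><resource_user_name>admin</resource_user_name></Record>
  </Records>
</QueryResult>"""

ns = {'': 'http://www.gggg.com/oog/Query'}
root = ET.fromstring(XML)
record = root.find('.//Record', ns)

plain = record.find('dim_id')        # None: no child is literally named "dim_id"
mapped = record.find('dim_id', ns)   # resolves to {http://www.gggg.com/oog/Query}dim_id
clark = record.find('{http://www.gggg.com/oog/Query}dim_id')  # explicit {uri}name form

print(plain)        # None
print(mapped.text)  # 1
print(clark.text)   # 1
```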
If you are certain that each `<Record>` always has all of its child elements, in the right order, you can simplify it further by not stating the element/field names explicitly:
```python
import csv
from xml.etree import ElementTree as ET

ns = {
    '': 'http://www.gggg.com/oog/Query',
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
}
records = ET.parse('xml_stuff.xml').find('soapenv:Body/QueryResult/Records', ns)

with open('test2.csv', 'w', encoding='utf16', newline='') as f:
    csv.writer(f).writerows(
        [prop.text for prop in r] for r in records
    )
```
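The trade-off between the two versions shows up when a record lacks a child element. This sketch (hypothetical one-record data, in-memory buffers instead of files) runs both on a `<Record>` with no `<resource_email>`: the `DictWriter` version leaves that column empty, because its `restval` defaults to `''`, while the positional version shifts the user name into the slot where the email belongs.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record with <resource_email> missing.
XML = """<Records xmlns="http://www.gggg.com/oog/Query">
  <Record><dim_id>1</dim_id><resource_user_name>admin</resource_user_name></Record>
</Records>"""

fieldnames = ['dim_id', 'resource_email', 'resource_user_name']
records = ET.fromstring(XML)

# Keyed by local tag name: a missing field is filled with DictWriter's restval ('').
by_name = io.StringIO()
csv.DictWriter(by_name, fieldnames).writerows(
    {prop.tag.split('}', 1)[1]: prop.text for prop in r} for r in records
)

# Positional: values are written in document order, so later columns move left.
by_position = io.StringIO()
csv.writer(by_position).writerows([prop.text for prop in r] for r in records)

print(repr(by_name.getvalue()))      # '1,,admin\r\n'
print(repr(by_position.getvalue()))  # '1,admin\r\n'
```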