
I have this XML from a SOAP call:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Header/>
  <soapenv:Body>
    <SessionID xmlns="http://www.gggg.com/oog">5555555</SessionID>
    <QueryResult xmlns="http://www.gggg.com/oog/Query" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <Code>testsk</Code>
      <Records>
        <Record>
          <dim_id>1</dim_id>
          <resource_full_name>Administrator, Sir</resource_full_name>
          <resource_first_name>Sir</resource_first_name>
          <resource_last_name>Administrator</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>admin</resource_user_name>
        </Record>
        <Record>
          <dim_id>2</dim_id>
          <resource_full_name>scheduler, scheduler</resource_full_name>
          <resource_first_name>scheduler</resource_first_name>
          <resource_last_name>scheduler</resource_last_name>
          <resource_email>[email protected]</resource_email>
          <resource_user_name>scheduler</resource_user_name>
        </Record>

My goal: To parse each Record's sub-elements <dim_id> ... <resource_user_name> and save each record as a row in a CSV.

My Code:

import csv
import xml.etree.ElementTree as et

dim_id_list = []
full_name_list = []
first_name_list = []
last_name_list = []
resource_email_list = []
resource_user_name_list = []
root = et.parse('xml_stuff.xml').getroot()
for dim_id in root.iter('{http://www.gggg.com/oog/Query}dim_id'):
    dim_id_list.append(dim_id.text)
for resource_full_name in root.iter('{http://www.gggg.com/oog/Query}resource_full_name'):
    full_name_list.append(resource_full_name.text)
for resource_first_name in root.iter('{http://www.gggg.com/oog/Query}resource_first_name'):
    first_name_list.append(resource_first_name.text)
for resource_last_name in root.iter('{http://www.gggg.com/oog/Query}resource_last_name'):
    last_name_list.append(resource_last_name.text)
for resource_email in root.iter('{http://www.gggg.com/oog/Query}resource_email'):
    resource_email_list.append(resource_email.text)
for resource_user_name in root.iter('{http://www.gggg.com/oog/Query}resource_user_name'):
    resource_user_name_list.append(resource_user_name.text)
rows = zip(dim_id_list, full_name_list, first_name_list, last_name_list, resource_email_list, resource_user_name_list)
with open('test.csv', "w", encoding='utf16', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)

Is there a better way to loop through the Records? This code is terribly verbose. I tried this:

for record in root.findall('.//{http://www.gggg.com/oog/Query}Record'):
    dim_id = record.find('dim_id').text
    # Extract each attribute, save to list. etc.

But I am getting attribute errors trying to access each record's text property.

200_success
asked Jul 28, 2022 at 16:57

1 Answer


It makes little sense to slice the data into "vertical" lists, then transpose them back into rows using zip(). Not only is it cumbersome to do it that way, it's also fragile. If, for example, one record is missing its resource_email child element, then that row and all subsequent rows will be off!
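To make the fragility concrete, here is a minimal sketch. The sample XML and its element names (id, name, email) are made up for illustration and are not taken from your data. The first record lacks an email, so the column-wise approach pairs the second record's address with the first record's ID and silently drops a row.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record sample; the first record has no <email> child.
SAMPLE = """
<Records>
  <Record><id>1</id><name>Ann</name></Record>
  <Record><id>2</id><name>Bob</name><email>bob@example.com</email></Record>
</Records>
"""

root = ET.fromstring(SAMPLE)

# Column-wise extraction, as in the question: one list per element name.
ids = [e.text for e in root.iter('id')]
emails = [e.text for e in root.iter('email')]

# zip() pairs values by position and stops at the shortest list, so
# Bob's address ends up next to Ann's id and record 2 disappears.
column_rows = list(zip(ids, emails))

# Record-wise extraction keeps each value with its own record;
# findtext() returns None when the child element is absent.
record_rows = [(r.findtext('id'), r.findtext('email')) for r in root.iter('Record')]

print(column_rows)
print(record_rows)
```

Walking the tree one record at a time avoids the problem, because a missing child only affects its own row.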

You can use writer.writerows(rows) instead of the for row in rows: writer.writerow(row) loop. Furthermore, you can pass a generator expression so that the CSV writer extracts records on the fly as needed.
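As a rough illustration of that point, the sketch below feeds a generator expression straight into writerows(). The records list and the io.StringIO buffer are stand-ins I made up so the snippet runs without any files; in your code they would be the parsed XML records and the open CSV file.

```python
import csv
import io

# Hypothetical in-memory records standing in for parsed XML data.
records = [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob'}]

buffer = io.StringIO()  # stands in for the open CSV file
writer = csv.writer(buffer)

# writerows() accepts any iterable of rows, so a generator expression
# lets each row be built only when the writer asks for it.
writer.writerows((rec['id'], rec['name']) for rec in records)

output = buffer.getvalue()
print(output)
```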

It's customary to import xml.etree.ElementTree as ET rather than as et.

Suggested solution

import csv
from xml.etree import ElementTree as ET

fieldnames = [
    'dim_id',
    'resource_full_name',
    'resource_first_name',
    'resource_last_name',
    'resource_email',
    'resource_user_name',
]
ns = {'': 'http://www.gggg.com/oog/Query'}
xml_records = ET.parse('xml_stuff.xml').find('.//Records', ns)
with open('test2.csv', 'w', encoding='utf16', newline='') as f:
    csv.DictWriter(f, fieldnames).writerows(
        {
            prop.tag.split('}', 1)[1]: prop.text
            for prop in xr
        }
        for xr in xml_records
    )
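The ns mapping above is also the piece your second attempt was missing. Every child of QueryResult inherits the default namespace, so record.find('dim_id') matches nothing and returns None, and None.text raises the AttributeError you saw. Here is a small sketch with a cut-down document of my own, not your full SOAP envelope. As far as I know, the empty-string key for the default namespace requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS_URI = 'http://www.gggg.com/oog/Query'

# Cut-down, hypothetical document using the same default namespace.
SAMPLE = f"""
<QueryResult xmlns="{NS_URI}">
  <Records>
    <Record><dim_id>1</dim_id></Record>
  </Records>
</QueryResult>
"""

root = ET.fromstring(SAMPLE)
record = root.find(f'.//{{{NS_URI}}}Record')

# An unqualified name matches nothing because the element lives in the
# default namespace; find() returns None instead of an element.
unqualified = record.find('dim_id')

# Either spell out the fully qualified name ...
qualified = record.find(f'{{{NS_URI}}}dim_id')

# ... or pass a prefix map whose empty-string key names the default
# namespace (assumed to need Python 3.8+).
mapped = record.find('dim_id', {'': NS_URI})

print(unqualified, qualified.text, mapped.text)
```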

If you are certain that each <Record> always has its child elements in the right order, you can simplify it further by not explicitly stating the element/field names:

import csv
from xml.etree import ElementTree as ET

ns = {
    '': 'http://www.gggg.com/oog/Query',
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
}
records = ET.parse('xml_stuff.xml').find('soapenv:Body/QueryResult/Records', ns)
with open('test2.csv', 'w', encoding='utf16', newline='') as f:
    csv.writer(f).writerows(
        [prop.text for prop in r] for r in records
    )
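To show why the caveat about element order matters, here is a sketch comparing the two writers on made-up records (the element names and values are hypothetical, and there is no namespace, to keep it short). The second record has its children in a different order and no email.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records: the second one has its children in a different
# order and lacks <email>.
SAMPLE = """
<Records>
  <Record><id>1</id><name>Ann</name><email>ann@example.com</email></Record>
  <Record><name>Bob</name><id>2</id></Record>
</Records>
"""
records = ET.fromstring(SAMPLE)

# Positional rows follow document order, so the columns shift.
positional = io.StringIO()
csv.writer(positional).writerows([child.text for child in r] for r in records)

# DictWriter places each value by key; keys missing from a row are
# filled with restval, which defaults to the empty string.
keyed = io.StringIO()
csv.DictWriter(keyed, ['id', 'name', 'email']).writerows(
    {child.tag: child.text for child in r} for r in records
)

print(positional.getvalue())
print(keyed.getvalue())
```

With the positional writer, the second row comes out as Bob,2. The DictWriter version keeps the columns aligned and leaves the missing email blank.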
answered Jul 28, 2022 at 17:47
