This reads in data from a .csv file and generates an .xml file with respect to the .csv data. Can you spot any parts where re-factoring can make this code more efficient?
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as etree
def main():
reader = read_csv()
generate_xml(reader)
def read_csv():
with open ('1250_12.csv', 'r') as data:
return list(csv.reader(data))
def generate_xml(reader):
root = Element('Solution')
root.set('version','1.0')
tree = ElementTree(root)
head = SubElement(root, 'DrillHoles')
head.set('total_holes', '238')
description = SubElement(head,'description')
current_group = None
i = 0
for row in reader:
if i > 0:
x1,y1,z1,x2,y2,z2,cost = row
if current_group is None or i != current_group.text:
current_group = SubElement(description, 'hole',{'hole_id':"%s"%i})
collar = SubElement (current_group, 'collar',{'':', '.join((x1,y1,z1))}),
toe = SubElement (current_group, 'toe',{'':', '.join((x2,y2,z2))})
cost = SubElement(current_group, 'cost',{'':cost})
i+=1
def indent(elem, level=0):
i = "\n" + level*" "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
if not elem.tail or not elem.tail.strip():
elem.tail = i
for elem in elem:
indent(elem, level+1)
if not elem.tail or not elem.tail.strip():
elem.tail = i
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = i
indent(root)
tree.write(open('holes1.xml','w'))
2 Answers 2
Minor stuff:
- Remove the last
import
-etree
is not used anywhere. - Merge the two first
import
s
Possibly speed-improving stuff:
- Avoid converting the
csv.reader
output beforereturn
ing unless absolutely necessary. - Skip
indent
unless the output must be readable by a human with a non-formatting editor. - If you need to indent the output, existing solutions are probably very efficient.
- Use
reader.next()
to skip the header line ingenerate_xml
, then you don't need to keep checking the value ofi
.
Don't use something like for elem in elem
at some point with larger for loops you will miss that elem is different variable before and in/after the for loop:
for subelem in elem:
indent(subelem, level+1)
if not subelem.tail or not elem.tail.strip():
subelem.tail = i
Since indent(subelem...) already sets the tail, you probably do not need to do that again.
for elem in elem
that reassigns elem to the last element of elem, and then you access elem again. Not sure if that's what you want to do or not, but anyway that's confusing. \$\endgroup\$