2
\$\begingroup\$

I have some XML that I am parsing in Python and I need to insert the data on the database. This is current implementation of my code that is working.

Actual XML:

<JobTechData xmlns="http://tempuri.org/JobTechData.xsd">
 <TechDataParams>
 <ProcessType>IONMILL1</ProcessType>
 <RunNumber>125229</RunNumber>
 <RunType>522</RunType>
 <ToolID>HC_VIM05</ToolID>
 <ProcessTS>2016年01月25日-08.06.53.000000</ProcessTS>
 <Recipe>S1280</Recipe>
 <PalletPosition>1</PalletPosition>
 <Target>1280</Target>
 <CumulTarget>0</CumulTarget>
 <CTarget>0</CTarget>
 <SITarget>0</SITarget>
 </TechDataParams>
</JobTechData>

And below is the code im using to parse the XML above.

def split(iterable, where):
 def splitter(acc, item, where=where):
 if item == where:
 acc.append([])
 else:
 acc[-1].append(item)
 return acc
 return reduce(splitter, iterable, [[]])
def insert_runspec(spec):
 cnx = mysql.connector.connect(user = USER,
 password = PASSWORD,host = HOST,
 database = DB, port = DB_PORT)
 cursor = cnx.cursor() 
 params = ['?' for item in spec]
 sql = 'INSERT INTO runspecdata(param_value,param_name,ics_runnum,recipe,batch,tool) VALUES (%s)' % ','.join(params)
 cursor.execute(sql,spec)
 cursor.close()
 cnx.close()
spec = []
rc = doc.getElementsByTagName('Recipe')[0].firstChild.nodeValue
rid = doc.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
tid = doc.getElementsByTagName('ToolID')[0].firstChild.nodeValue
# append specification details to spec list, delimit by '|'
spec.append(int(doc.getElementsByTagName('Target')[0].firstChild.nodeValue))
spec.append('target')
spec.append('|')
spec.append(int(doc.getElementsByTagName('CumulTarget')[0].firstChild.nodeValue))
spec.append('cumul')
spec.append('|')
spec.append(int(doc.getElementsByTagName('CTarget')[0].firstChild.nodeValue))
spec.append('c-target')
spec.append('|')
spec.append('sio-target')
spec.append('|')
spec.append(int(doc.getElementsByTagName('SITarget')[0].firstChild.nodeValue))
spec.append('si-target')
splitspec = split(spec,'|')
ics = 1069
for i in splitspec:
 if any(i):
 i.append(ics)
 i.append(rc)
 i.append(rid)
 i.append(tid)
 t = tuple(i) 
 insert_runspec(t)

This is the example data inserted to database. enter image description here

This is working fine. But as you can see the use of the delimiter isn't so Pythonic. Is there a better way to parse the XML?

asked Jan 25, 2016 at 1:40
\$\endgroup\$
0

1 Answer 1

2
\$\begingroup\$

You're right, the delimiter doesn't seem pythonic. The whole logic behind your split function either.

Let's start by analyzing the problem: you want to extract parameters by name in your XML, associate it to an other name, add default values and insert it in a database.

There is nowhere a need to fetch every values before storing a row in the database, so we could write a function that does all that at once:

def connect(user, host, pwd, db, port):
 c = mysql.connector.connect(user=user, password=pwd, host=host, database=db, port=port)
 return c
def get_default_values(doc):
 rc = doc.getElementsByTagName('Recipe')[0].firstChild.nodeValue
 rid = doc.getElementsByTagName('RunNumber')[0].firstChild.nodeValue
 tid = doc.getElementsByTagName('ToolID')[0].firstChild.nodeValue
 return 1069, rc, rid, tid
def from_xml_to_db(doc, xml_name, db_name, connexion, default_values):
 param_value = int(doc.getElementsByTagName(xml_name)[0].firstChild.nodeValue)
 db_row = (param_value, param_name) + default_values
 param_binding = ', '.join('?' * len(db_row))
 sql_query = 'INSERT INTO runspecdata(param_value, param_name, ics_runnum, '\
 'recipe, batch, tool) VALUES ({})'.format(param_binding)
 cursor = connexion.cursor()
 cursor.execute(sql_query, db_row)
 cursor.close()
ics_rc_rid_tid = get_default_values(doc)
connexion = connect(USER, HOST, PASSWORD, DB, DB_PORT)
from_xml_to_db(doc, 'Target', 'target', connexion, ics_rc_rid_tid)
from_xml_to_db(doc, 'CumulTarget', 'cumul', connexion, ics_rc_rid_tid)
from_xml_to_db(doc, 'CTarget', 'c-target', connexion, ics_rc_rid_tid)
from_xml_to_db(doc, 'SITarget', 'si-target', connexion, ics_rc_rid_tid)
connexion.close()

I didn't considered the case of 'sio-target' as there is no value associated and it doesn't appear in the DB screenshot. If you want to support adding a row with no parameter associated in the XML, you may want to consider using a default value (say None for xml_name will lead to param_value = 0 or something along the lines).

However, this code is still not ideal. There are repetitions at several levels and, more importantly, there is a tight coupling between reading from XML and writing to DB. You can see that I moved the the DB connection out of the writing function to lessen coupling, let's do more of those.

To start with doc.getElementsByTagName(...)[0].firstChild.nodeValue need to be factorized. Let's add handling of default values when names does not exist:

def read_parameter(doc, name=None, default=0):
 if name is None:
 return default
 try:
 value = doc.getElementsByTagName(name)[0].firstChild.nodeValue
 except IndexError:
 value = default
 return int(value)

Here we handle the IndexError in case there is no tag named name in the document and the returned list is empty. If there is a possibility that a tag does not have elements within it, firstChild may be None and you might have to deal with an AttributeError as well (speculations, I’m not that versed into minidom).

This let us simplify:

def get_default_values(doc, ics=1069):
 return (
 ics,
 read_parameter(doc, 'Recipe'),
 read_parameter(doc, 'RunNumber'),
 read_parameter(doc, 'ToolID'),
 )

Next we need to separate reading from XML from writing to DB. Let's start with the writing:

def store_target(connexion, db_row):
 param_binding = ', '.join('?' * len(db_row))
 sql_query = 'INSERT INTO runspecdata(param_value, param_name, ics_runnum, '\
 'recipe, batch, tool) VALUES ({})'.format(param_binding)
 with connexion.cursor() as cursor:
 cursor.execute(sql_query, db_row)

Much simpler. I also changed the cursor creation to ease its management, the context manager should be implemented as part of the DB API 2.0 but if it doesn't, just use connexion.cursor() and cursor.close() as previously.

Now the XML reading is already handled by read_parameter, so let's bring it all together:

ics_rc_rid_tid = get_default_values(doc)
connexion = connect(USER, HOST, PASSWORD, DB, DB_PORT)
row = read_parameter(doc, 'Target'), 'target'
store_target(connexion, row + ics_rc_rid_tid)
row = read_parameter(doc, 'CumulTarget'), 'cumul'
store_target(connexion, row + ics_rc_rid_tid)
row = read_parameter(doc, 'CTarget'), 'c-target'
store_target(connexion, row + ics_rc_rid_tid)
row = read_parameter(doc, None), 'sio-target'
store_target(connexion, row + ics_rc_rid_tid)
row = read_parameter(doc, 'SITarget'), 'si-target'
store_target(connexion, row + ics_rc_rid_tid)
connexion.close()

Well... no... this seems not right. Let's improve that as well.

XML_TO_DB_TABLE = (
 ('Target', 'target'),
 ('CumulTarget', 'cumul'),
 ('CTarget', 'c-target'),
 (None, 'sio-target'),
 ('SITarget', 'si-target'),
)
def read_parameter(doc, name=None, default=0):
 if name is None:
 return default
 try:
 value = doc.getElementsByTagName(name)[0].firstChild.nodeValue
 except IndexError:
 value = default
 return int(value)
def get_default_values(doc, ics=1069):
 return (
 ics,
 read_parameter(doc, 'Recipe'),
 read_parameter(doc, 'RunNumber'),
 read_parameter(doc, 'ToolID'),
 )
def connect(user, host, pwd, db, port):
 c = mysql.connector.connect(user=user, password=pwd, host=host, database=db, port=port)
 return c
def store_target(connexion, db_row):
 param_binding = ', '.join('?' for _ in db_row)
 sql_query = 'INSERT INTO runspecdata(param_value, param_name, ics_runnum, '\
 'recipe, batch, tool) VALUES ({})'.format(param_binding)
 with connexion.cursor() as cursor:
 cursor.execute(sql_query, db_row)
def xml_to_db(doc, connexion, table):
 common_values = get_default_values(doc)
 with connexion:
 for xml_name, db_name in table:
 specific_values = read_parameter(doc, xml_name), db_name
 store_target(connexion, specific_values + common_values)
if __name__ == '__main__':
 doc = #Initialize it somehow
 cnx = connect(USER, HOST, PASSWORD, DB, DB_PORT)
 xml_to_db(doc, cnx, XML_TO_DB_TABLE)
 cnx.close()

Here the with connexion part is supposed to commit or rollback the transactions depending if anything went well of if there was an error. Again, if it doesn't work for MySQL, just use the management method you're used to.


A few other comments to finish with:

  • Your indentation is very weird on your connect call;
  • Using global variables inside a function can impair reusability, try to pass them as parameters instead;
  • The if __name__ == '__main__' part lets you import your module and tests functions without having any code executed beforehand;
  • You're building the binding list ((?, ?, ?, ?, ?, ?)) based on the length of the row you want to insert but you are actually expecting 6 arguments (by looking at the name of the columns involved); why not enforce that anyway and let the DB handle wrong number of arguments itself?

That last point would lead to a much simpler store_target:

def store_target(connexion, db_row):
 with connexion.cursor() as cursor:
 cursor.execute(
 'INSERT INTO runspecdata(param_value, '
 'param_name, ics_runnum, recipe, batch, tool) '
 'VALUES (?, ?, ?, ?, ?, ?)', db_row)
answered Jan 25, 2016 at 13:26
\$\endgroup\$
5
  • \$\begingroup\$ thank you very much for a detailed revision @Mathias, although i do not quite get what this line does specific_values = read_parameter(doc, xml_name), db_name \$\endgroup\$ Commented Jan 26, 2016 at 23:34
  • \$\begingroup\$ @anon a, b builds the pair (a, b)... so specific_values is a tuple \$\endgroup\$ Commented Jan 26, 2016 at 23:39
  • \$\begingroup\$ question again , does not this take longer to process? because we are inserting one by one and not as a whole? @Mathias \$\endgroup\$ Commented Jan 29, 2016 at 2:17
  • \$\begingroup\$ @anon Why do you think you were doing it all at once? for i in splitspec: ... insert_runspec(t) also did it one by one... \$\endgroup\$ Commented Jan 29, 2016 at 7:39
  • \$\begingroup\$ oh yea sorry, I overlooked that @Mathias \$\endgroup\$ Commented Jan 29, 2016 at 7:46

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.