\$\begingroup\$

I just needed a quick way to convert an ElementTree element to a dict. I don't care whether attribute and element names clash, nor about namespaces. The XML files are small enough. If an element has multiple children with the same name, a list is created out of them:

def elementtree_to_dict(element):
    d = dict()
    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text
    d.update(element.items()) # element's attributes
    for c in list(element): # element's children
        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))
    return d

Thoughts? I'm particularly not fond of the not isinstance check in the last if.

asked Mar 28, 2012 at 13:07
\$\endgroup\$

1 Answer

\$\begingroup\$
def elementtree_to_dict(element):
    d = dict()

I'd avoid the name d; it's not very helpful.

    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text

getattr has a third parameter, default, which is returned when the attribute is missing. That should allow you to simplify this piece of code a bit.
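
As a minimal sketch of that point (the NoText class and the text_of helper are hypothetical names used only for illustration), a single getattr call can stand in for the hasattr check plus the attribute access:

```python
import xml.etree.ElementTree as ET


class NoText:
    """Hypothetical stand-in for an object that has no 'text' attribute."""


def text_of(obj):
    # The third argument to getattr is returned when the attribute is
    # missing, so no separate hasattr check is needed.
    return getattr(obj, 'text', None)


elem = ET.fromstring('<a>hello</a>')
empty = ET.fromstring('<a/>')

print(text_of(elem))
print(text_of(empty))
print(text_of(NoText()))
```

The caller still has to test the result against None, since an element without text and an object without the attribute both produce None here.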

    d.update(element.items()) # element's attributes
    for c in list(element): # element's children

The list does nothing, except waste memory.
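
A short sketch of iterating an element directly, assuming the children are not added or removed during the loop (a copy would only matter if they were):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo><bar/><baz/></foo>')

# Iterating an Element yields its direct children in document order,
# so no intermediate list is required.
tags = [child.tag for child in root]
print(tags)
```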

        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))

Yeah, this whole block is a mess. Two notes:

  1. Put everything in lists to begin with, and then take the single items back out at the end.
  2. Call elementtree_to_dict in one place per child, rather than in three separate branches.

    return d
    

This whole piece of code looks like a bad idea.

<foo>
 <bar id="42"/>
</foo>

Becomes

{"bar": {"id": "42"}}

Whereas

<foo>
 <bar id="42"/>
 <bar id="36"/>
</foo>

Becomes

{"bar": [{"id": "42"}, {"id": "36"}]}

The XML schema is the same, but the Python "schema" will be different: the same key holds a dict in one case and a list of dicts in the other. It'll be annoying to write code that correctly handles both of these cases.
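
To illustrate the kind of workaround the consuming code ends up needing, here is a rough sketch with a hypothetical as_list helper (not part of the original code) that normalizes both shapes before use:

```python
def as_list(value):
    """Return value unchanged if it is a list, else wrap it in a one-item list."""
    return value if isinstance(value, list) else [value]


# The two shapes produced for one <bar> child and for two <bar> children.
one = {"bar": {"id": "42"}}
many = {"bar": [{"id": "42"}, {"id": "36"}]}

ids_one = [bar["id"] for bar in as_list(one["bar"])]
ids_many = [bar["id"] for bar in as_list(many["bar"])]

print(ids_one)
print(ids_many)
```

Every access to a possibly repeated child has to go through such a helper, which is the annoyance described above.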

Having said that, here's my cleanup of your code:

def elementtree_to_dict(element):
    node = dict()
    text = getattr(element, 'text', None)
    if text is not None:
        node['text'] = text
    node.update(element.items()) # element's attributes
    child_nodes = {}
    for child in element: # element's children
        # group the converted children by tag name
        child_nodes.setdefault(child.tag, []).append(elementtree_to_dict(child))
    # convert all single-element lists into non-lists
    for key, value in child_nodes.items():
        if len(value) == 1:
            child_nodes[key] = value[0]
    node.update(child_nodes.items())
    return node
answered Mar 28, 2012 at 19:16
\$\endgroup\$
  • \$\begingroup\$ Yep, having stuff which is 0,1+ makes stuff be sometimes lists and sometimes not. Of course this can only be fixed by either having lists always (not nice) or knowing about the XML schema and forcing stuff to be lists even though there's only one (or zero!) elements. \$\endgroup\$ Commented Mar 29, 2012 at 8:08
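
One possible way to act on that comment, sketched under the assumption that the caller knows which tags may repeat: the force_list parameter below is hypothetical and not part of the code reviewed above. Tags named in it stay lists even when only one child is present. The zero-children case is not handled; the key is simply absent.

```python
import xml.etree.ElementTree as ET


def elementtree_to_dict(element, force_list=()):
    """Convert an Element to a dict; tags in force_list always map to lists."""
    node = {}
    if element.text is not None:
        node['text'] = element.text
    node.update(element.items())  # element's attributes
    child_nodes = {}
    for child in element:
        # Group the converted children by tag name.
        child_nodes.setdefault(child.tag, []).append(
            elementtree_to_dict(child, force_list))
    for tag, values in child_nodes.items():
        # Unwrap single children unless the caller asked for a list.
        if len(values) == 1 and tag not in force_list:
            child_nodes[tag] = values[0]
    node.update(child_nodes)
    return node


single = ET.fromstring('<foo><bar id="42"/></foo>')
double = ET.fromstring('<foo><bar id="42"/><bar id="36"/></foo>')

print(elementtree_to_dict(single))
print(elementtree_to_dict(single, force_list={'bar'}))
print(elementtree_to_dict(double, force_list={'bar'}))
```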
