Convert elementtree to dict

Question 1

I just needed a quick way to convert an ElementTree element to a dict. I don't care if attributes and elements clash in name, nor about namespaces. The XML files are small enough. If an element has multiple children with the same name, create a list out of them:

def elementtree_to_dict(element):
    d = dict()
    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text
    d.update(element.items()) # element's attributes
    for c in list(element): # element's children
        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))
    return d

Thoughts? I'm particularly un-fond of the not isinstance part of the last if.

Answer

def elementtree_to_dict(element):
    d = dict()

I'd avoid the name d; it's not very helpful.

    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text

getattr takes an optional third argument, a default that is returned when the attribute is missing. That should let you simplify this piece of code a bit.
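A small illustration of the getattr default, assuming the standard-library xml.etree.ElementTree module (the question doesn't say which ElementTree implementation is in use). Note that Element objects always have a text attribute, set to None when there is no text, so for real elements the hasattr half of the check never fails anyway.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<item>hello</item>')

# getattr(obj, name, default) returns default instead of raising AttributeError,
# folding the hasattr() check and the attribute lookup into one call
text = getattr(elem, 'text', None)

# 'no_such_attr' is a made-up name used only to show the fallback
missing = getattr(elem, 'no_such_attr', None)

# An element without text still has a .text attribute; it is simply None
empty_text = ET.fromstring('<item/>').text
```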

    d.update(element.items()) # element's attributes
    for c in list(element): # element's children

The list() call does nothing except waste memory; iterating over the element directly already yields its children.
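A quick check with the standard-library xml.etree.ElementTree module that iterating an Element directly gives its direct children in document order, with no copy needed:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo><bar/><baz/><bar/></foo>')

# Iterating an Element yields its direct children, in document order
tags = [child.tag for child in root]

# len() on an Element counts its direct children
count = len(root)
```

A list(element) copy would only matter if the loop body added or removed children of element while iterating over it, which this function never does.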

        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))

Yeah, this whole block is a mess. Two notes:

- Put every child in a list to begin with, and then take single items back out of their lists at the end.
- Call elementtree_to_dict only once per child.
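The two notes can be sketched in isolation as a grouping step. group_children and its convert parameter are hypothetical names used only for illustration, and collections.defaultdict(list) stands in for "lists to begin with":

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

def group_children(element, convert):
    """Hypothetical helper: group the converted children of element by tag."""
    groups = defaultdict(list)
    for child in element:
        # every child goes into a list, and convert runs exactly once per child
        groups[child.tag].append(convert(child))
    # take single items back out of their lists at the end
    return {tag: items[0] if len(items) == 1 else items
            for tag, items in groups.items()}

root = ET.fromstring('<foo><bar id="1"/><baz/><bar id="2"/></foo>')
grouped = group_children(root, lambda child: dict(child.attrib))
```

Here grouped comes out as {'bar': [{'id': '1'}, {'id': '2'}], 'baz': {}}: no isinstance check is needed because the list/non-list decision happens once, after all children are seen.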
```
return d
```

This whole approach looks like a bad idea, though. For example:

<foo>
 <bar id="42"/>
</foo>

Becomes

{"bar": {"id": "42"}}

Whereas

<foo>
 <bar id="42"/>
 <bar id="36"/>
</foo>

Becomes

{"bar": [{"id": "42"}, {"id": "36"}]}

The XML schema is the same, but the Python "schema" is different. It'll be annoying to write code that correctly handles both of these cases.
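One common way to cope on the consuming side is to normalize before iterating. as_list is a hypothetical helper, not part of ElementTree; the dicts below are shaped like the results in the two examples above.

```python
def as_list(value):
    """Hypothetical helper: wrap a lone value in a list; map a missing value to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

single = {"bar": {"id": "42"}}
several = {"bar": [{"id": "42"}, {"id": "36"}]}
empty = {}

# The same loop now works whether there were zero, one, or many <bar> elements
ids_single = [bar["id"] for bar in as_list(single.get("bar"))]
ids_several = [bar["id"] for bar in as_list(several.get("bar"))]
ids_empty = [bar["id"] for bar in as_list(empty.get("bar"))]
```

This only moves the isinstance check to every call site, which is exactly the annoyance described above.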

Having said that, here's my cleanup of your code:

def elementtree_to_dict(element):
    node = dict()
    text = getattr(element, 'text', None)
    if text is not None:
        node['text'] = text
    node.update(element.items())  # element's attributes
    child_nodes = {}
    for child in element:  # element's children
        # group by the child's tag name, not by the Element object itself
        child_nodes.setdefault(child.tag, []).append(elementtree_to_dict(child))
    # convert all single-element lists into non-lists
    for key, value in child_nodes.items():
        if len(value) == 1:
            child_nodes[key] = value[0]
    node.update(child_nodes.items())
    return node
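When running the cleanup code on indented XML like the examples above, keep in mind that ElementTree keeps the whitespace between tags as text and returns attribute values as strings. A quick check with the standard-library module:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo>\n  <bar id="42"/>\n</foo>')

# The newline and indentation before <bar> become foo's text,
# so `text is not None` is true and a 'text' key would be added
foo_text = root.text

# Attribute values are always strings, never ints
bar_attrs = root.find('bar').attrib

# If whitespace-only text is unwanted, a check like this (an assumption about
# the desired behavior, not part of the code above) filters it out
keep_text = foo_text is not None and foo_text.strip() != ''
```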

Comment

Yep, having elements that can occur zero, one, or more times makes them sometimes lists and sometimes not. Of course, this can only be fixed either by always using lists (not nice) or by knowing the XML schema and forcing such elements into lists even when there is only one (or zero!) of them.
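A sketch of the schema-aware option, assuming the schema knowledge can be written down as a mapping from a parent tag to the child tags it may repeat. list_tags is a hypothetical parameter, not anything ElementTree provides; children listed there always become lists, including an empty list when none are present.

```python
import xml.etree.ElementTree as ET

def elementtree_to_dict(element, list_tags=None):
    """Convert an Element to a dict.

    list_tags (hypothetical) maps a parent tag to the set of child tags the
    schema allows to repeat; those children always become lists.
    """
    list_tags = list_tags or {}
    node = {}
    if element.text is not None:
        node['text'] = element.text
    node.update(element.items())  # the element's attributes
    child_nodes = {}
    for child in element:  # the element's direct children
        child_nodes.setdefault(child.tag, []).append(
            elementtree_to_dict(child, list_tags))
    forced = list_tags.get(element.tag, set())
    for tag in forced:
        child_nodes.setdefault(tag, [])  # zero occurrences -> empty list
    # unwrap single-item lists, except for tags the schema says can repeat
    for tag, values in child_nodes.items():
        if len(values) == 1 and tag not in forced:
            child_nodes[tag] = values[0]
    node.update(child_nodes)
    return node

schema = {'foo': {'bar'}}
one = elementtree_to_dict(ET.fromstring('<foo><bar id="42"/></foo>'), schema)
zero = elementtree_to_dict(ET.fromstring('<foo/>'), schema)
unforced = elementtree_to_dict(ET.fromstring('<foo><bar id="42"/></foo>'))
```

Because the mapping is keyed by parent tag, bar is only forced into a list under foo, where the (assumed) schema says it can repeat, rather than everywhere a bar element appears.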

Answer above by Winston Ewert (accepted), 2012-03-28 19:16:55Z
