Just needed a quick way to convert an ElementTree element to a dict. I don't care if attributes and elements clash in name, nor about namespaces. The XML files are small enough. If an element has multiple children with the same name, create a list out of them:
def elementtree_to_dict(element):
    d = dict()
    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text
    d.update(element.items()) # element's attributes
    for c in list(element): # element's children
        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))
    return d
Thoughts? I'm particularly un-fond of the not isinstance part of the last if.
def elementtree_to_dict(element):
    d = dict()
I'd avoid the name d; it's not very helpful.
    if hasattr(element, 'text') and element.text is not None:
        d['text'] = element.text
getattr has a third parameter, default. That should let you simplify this piece of code a bit.
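A minimal illustration of the getattr default, wrapped in a hypothetical helper named text_entry. Note that with xml.etree.ElementTree every element already has a text attribute (possibly None), so the default mainly guards against objects that lack the attribute entirely:

```python
import xml.etree.ElementTree as ET


def text_entry(element):
    """Return {'text': ...} when the element has text, otherwise {}."""
    # getattr(obj, name, default) returns `default` instead of raising
    # AttributeError, so no separate hasattr() check is needed.
    text = getattr(element, 'text', None)
    return {} if text is None else {'text': text}


print(text_entry(ET.fromstring('<bar>hello</bar>')))  # {'text': 'hello'}
print(text_entry(ET.fromstring('<bar id="42"/>')))    # {}
print(text_entry(object()))                           # {} (no .text attribute at all)
```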
    d.update(element.items()) # element's attributes
    for c in list(element): # element's children
The list() call does nothing except waste memory: iterating over an element already yields its children.
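A quick check of that claim with xml.etree.ElementTree, using a made-up document: iterating the element directly gives the same children as iterating a list() copy of it.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<foo><bar id="42"/><baz/></foo>')

# Iterating over an Element yields its direct children in document order,
# so wrapping it in list() only builds a throwaway copy.
tags = [child.tag for child in root]
print(tags)  # ['bar', 'baz']
```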
        if c.tag not in d:
            d[c.tag] = elementtree_to_dict(c)
        # an element with the same tag was already in the dict
        else:
            # if it's not a list already, convert it to a list and append
            if not isinstance(d[c.tag], list):
                d[c.tag] = [d[c.tag], elementtree_to_dict(c)]
            # append to the list
            else:
                d[c.tag].append(elementtree_to_dict(c))
Yeah, this whole block is a mess. Two notes:
- Put everything in lists to begin with, and then take them out at the end.
- Call elementtree_to_dict once per child, rather than in two separate branches.

    return d
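A minimal sketch of the second note on its own, keeping the question's structure but computing each child's dict once and testing isinstance positively, which removes the "not isinstance" branch. This is an illustration, not the cleanup given further down; it assumes xml.etree.ElementTree elements, whose text attribute always exists.

```python
import xml.etree.ElementTree as ET


def elementtree_to_dict(element):
    d = {}
    if element.text is not None:
        d['text'] = element.text
    d.update(element.items())  # element's attributes
    for child in element:  # element's children
        value = elementtree_to_dict(child)  # recurse exactly once per child
        if child.tag not in d:
            d[child.tag] = value
        elif isinstance(d[child.tag], list):
            d[child.tag].append(value)  # third or later child with this tag
        else:
            d[child.tag] = [d[child.tag], value]  # second child: start a list
    return d


root = ET.fromstring('<foo><bar id="42"/><bar id="36"/></foo>')
print(elementtree_to_dict(root))  # {'bar': [{'id': '42'}, {'id': '36'}]}
```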
This whole approach looks like a bad idea, because the shape of the result depends on how many children there are. For example,
<foo>
    <bar id="42"/>
</foo>
Becomes
{"bar": {"id": "42"}}
Whereas
<foo>
    <bar id="42"/>
    <bar id="36"/>
</foo>
Becomes
{"bar": [{"id": "42"}, {"id": "36"}]}
The XML schema is the same, but the Python "schema" is different. Writing code that correctly handles both of these cases will be annoying.
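To make the cost concrete, here is a sketch of what every consumer ends up writing. as_list is a hypothetical helper, and the dicts are hand-written stand-ins for the two outputs above plus the case where no bar element is present at all:

```python
def as_list(value):
    """Wrap a single value in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


one = {"bar": {"id": "42"}}
two = {"bar": [{"id": "42"}, {"id": "36"}]}
zero = {}

for parsed in (one, two, zero):
    # Every consumer of a repeatable tag has to normalize like this,
    # including the case where the tag is missing entirely.
    ids = [bar["id"] for bar in as_list(parsed.get("bar", []))]
    print(ids)
# ['42']
# ['42', '36']
# []
```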
Having said that, here's my cleanup of your code:
def elementtree_to_dict(element):
    node = dict()
    text = getattr(element, 'text', None)
    if text is not None:
        node['text'] = text
    node.update(element.items()) # element's attributes
    child_nodes = {}
    for child in element: # element's children
        # group by tag name (not by the Element object itself)
        child_nodes.setdefault(child.tag, []).append(elementtree_to_dict(child))
    # convert all single-element lists into non-lists
    for key, value in child_nodes.items():
        if len(value) == 1:
            child_nodes[key] = value[0]
    node.update(child_nodes.items())
    return node
Yep, having elements that can occur zero, one or more times makes values sometimes lists and sometimes not. Of course this can only be fixed by either always using lists (not nice) or knowing about the XML schema and forcing things to be lists even when there's only one (or zero!) element. – alex, Commented Mar 29, 2012 at 8:08
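A sketch of the schema-aware option from that comment. The list_children mapping (parent tag to the child tags that may repeat under it) is a hypothetical, hand-written stand-in for real schema knowledge; nothing here reads an actual XSD. Declared tags always become lists, including an empty list when no such child is present, and undeclared tags fall back to the cleanup's behavior.

```python
import xml.etree.ElementTree as ET


def elementtree_to_dict(element, list_children=None):
    """Convert an Element to a dict, forcing schema-declared tags to lists.

    list_children is a hypothetical hand-written mapping from a parent tag
    to the child tags that may repeat under it, e.g. {'foo': {'bar'}}.
    """
    list_children = list_children or {}
    repeatable = list_children.get(element.tag, set())
    node = {}
    if element.text is not None:
        node['text'] = element.text
    node.update(element.items())  # element's attributes
    # Pre-seed repeatable tags so that zero occurrences still give [].
    grouped = {tag: [] for tag in repeatable}
    for child in element:  # group children by tag name
        grouped.setdefault(child.tag, []).append(
            elementtree_to_dict(child, list_children))
    for tag, values in grouped.items():
        if tag in repeatable or len(values) > 1:
            node[tag] = values     # declared repeatable, or repeated anyway
        else:
            node[tag] = values[0]  # exactly one, not declared repeatable
    return node


schema = {'foo': {'bar'}}
print(elementtree_to_dict(ET.fromstring('<foo><bar id="42"/></foo>'), schema))
# {'bar': [{'id': '42'}]}
print(elementtree_to_dict(ET.fromstring('<foo/>'), schema))
# {'bar': []}
```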