I have this XML-to-JSON function based on ElementTree. It looks very simple, but so far it does what it's supposed to do: give a JSON description of the document's ElementTree.
```python
import xml.etree.ElementTree as ET

def dirtyParser(node):
    '''dirty xml parser
    parses tag, attributes, text, children
    recursive
    returns a nested dict'''
    # mapping with recursive call
    # (iterating over the node yields its children; getchildren() was removed in Python 3.9)
    res = {'tag': node.tag,
           'attributes': node.attrib,
           'text': node.text,
           'children': [dirtyParser(c) for c in node]}
    # remove blanks and empties; build a new dict, because popping keys
    # while iterating over res.items() raises RuntimeError in Python 3
    return {k: v for k, v in res.items()
            if v not in ('', '\n', [], {}, None)}
```
Usage:

```python
>>> some_xml = ET.fromstring(u'<?xml version="1.0" encoding="UTF-8" ?><records><record><him>Maldonado, Gavin G.</him><her>Veda Parks</her></record></records>')
>>> dirtyParser(some_xml)
{'tag': 'records', 'children': [{'tag': 'record', 'children': [{'tag': 'him', 'text': 'Maldonado, Gavin G.'}, {'tag': 'her', 'text': 'Veda Parks'}]}]}
```
Is it really that reliable?
It's probably not reliable unless your XML data is simple.
1. XML is tricky!
   - You forgot the `.tail` attribute, which contains any text that follows an element's closing tag.
   - Whitespace is significant, so you won't be able to go back to the same XML document.
   - And everything else I don't know about.
2. The way Python represents dictionaries is different from JSON. For example, JSON only allows `"` for quoting, not `'`. You can use `json.dumps` to solve this problem.
3. More obviously, if you were representing this data using JSON, your data would look like:

   ```json
   {"records": [{"him": "Maldonado, Gavin G.", "her": "Veda Parks"}]}
   ```

   or something like that. That's very different from what you're outputting: your program does not really represent your data using JSON, it uses JSON to represent the XML that represents your data. Converting to "real JSON" is much more difficult except for some very specific XML, and would not be useful as a general-purpose converter.
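A minimal sketch of points 1 and 2, using only the standard library. The `<p>` sample is made up for illustration: the `!` after `</b>` is stored in the `tail` of the `b` element, which is neither the tag, the attributes, the text nor a child, so a dict built only from those fields (as `dirtyParser` does) silently drops it. The last two lines show that printing a dict gives Python syntax, while `json.dumps` gives actual JSON text.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mixed-content sample: "!" follows the closing </b> tag.
p = ET.fromstring('<p>Hello <b>world</b>!</p>')
b = p[0]

print(repr(p.text))  # 'Hello '  (text before the first child)
print(repr(b.text))  # 'world'   (text inside <b>)
print(repr(b.tail))  # '!'       (text after </b>, only reachable via .tail)

# A dict built from tag/text/children only, like dirtyParser's output.
lossy = {'tag': p.tag, 'text': p.text,
         'children': [{'tag': c.tag, 'text': c.text} for c in p]}

# Python repr uses single quotes; json.dumps emits double-quoted JSON.
print(lossy)
print(json.dumps(lossy))  # the "!" appears nowhere in this output
```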
This program may be useful to you in some specific scenarios, but you'd better explicitly state what kind of data you accept and reject anything else. Also, what's the point of this?
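Following the advice to state explicitly which data is accepted and to reject the rest, here is a sketch of a purpose-specific converter for the sample document in the question. The name `records_to_json`, the constant `EXPECTED_FIELDS`, and the rule that every `<record>` holds exactly one `<him>` and one `<her>` leaf element are hypothetical, inferred from that single example; this is not a general-purpose converter.

```python
import json
import xml.etree.ElementTree as ET

EXPECTED_FIELDS = ('him', 'her')  # hypothetical schema, inferred from the sample

def records_to_json(xml_text):
    """Convert <records><record><him/><her/></record>...</records> to JSON text.

    Raises ValueError if the element structure deviates from that shape.
    """
    root = ET.fromstring(xml_text)
    if root.tag != 'records':
        raise ValueError('expected <records> root, got <%s>' % root.tag)
    records = []
    for record in root:
        if record.tag != 'record':
            raise ValueError('unexpected element <%s>' % record.tag)
        fields = {}
        for field in record:
            # Fields must be known leaf elements without attributes.
            if field.tag not in EXPECTED_FIELDS or len(field) or field.attrib:
                raise ValueError('unexpected content in <%s>' % field.tag)
            if field.tag in fields:
                raise ValueError('duplicate <%s>' % field.tag)
            fields[field.tag] = field.text or ''
        if set(fields) != set(EXPECTED_FIELDS):
            raise ValueError('record is missing a field')
        records.append(fields)
    return json.dumps({'records': records})
```

With the question's sample input, this returns `{"records": [{"him": "Maldonado, Gavin G.", "her": "Veda Parks"}]}`, i.e. the data itself rather than a description of the XML tree. Anything with another shape raises `ValueError` instead of producing a misleading result.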
#3 caught me: the code works more or less, but it doesn't deal with the conceptual difference between XML and JSON, so it's stupid and useless. I'd better extract my data from raw XML with a purpose-specific function! – outforawhile, Sep 22, 2014 at 9:00