I have data that looks like the "Input" below and need to convert it into JSON. My solution works by parsing the text to find a level for each data point. Then I use a recursive structure to build a JSON tree (or maybe its not JSON, but its much more useful than the original format).
First, I transform the input in the following way.
Input:
person:
address:
street1: 123 Bar St
street2:
city: Madison
state: WI
zip: 55555
web:
email: [email protected]
First-step output:
[{'name':'person','value':'','level':0},
{'name':'address','value':'','level':1},
{'name':'street1','value':'123 Bar St','level':2},
{'name':'street2','value':'','level':2},
{'name':'city','value':'Madison','level':2},
{'name':'state','value':'WI','level':2},
{'name':'zip','value':55555,'level':2},
{'name':'web','value':'','level':1},
{'name':'email','value':'[email protected]','level':2}]
This is easy to accomplish with split(':')
and by counting the number of leading tabs:
def tab_level(astr):
"""Count number of leading tabs in a string
"""
return len(astr)- len(astr.lstrip('\t'))
Then I feed the first-step output into the following function:
def ttree_to_json(ttree,level=0):
result = {}
for i in range(0,len(ttree)):
cn = ttree[i]
try:
nn = ttree[i+1]
except:
nn = {'level':-1}
# Edge cases
if cn['level']>level:
continue
if cn['level']<level:
return result
# Recursion
if nn['level']==level:
dict_insert_or_append(result,cn['name'],cn['value'])
elif nn['level']>level:
rr = ttree_to_json(ttree[i+1:], level=nn['level'])
dict_insert_or_append(result,cn['name'],rr)
else:
dict_insert_or_append(result,cn['name'],cn['value'])
return result
return result
where:
def dict_insert_or_append(adict,key,val):
"""Insert a value in dict at key if one does not exist
Otherwise, convert value to list and append
"""
if key in adict:
if type(adict[key]) != list:
adict[key] = [adict[key]]
adict[key].append(val)
else:
adict[key] = val
The approach is redundant and therefore inefficient. I also wonder whether the solution is robust (for example, I had to modify the code to accommodate repeated tags). Think of the Input above as a formatting for SGML. Any suggestions for improvement would be greatly appreciated!
2 Answers 2
I have not programmed in python but you should be able to do this in one shot. In pseudo-code it should be something like so:
function parseJsonInput (file)
var parsedJson= {};
var parentStack = [parsedJson]
for each line in file
var data = parseLine(line) //return key,value,level. null if not avail for each field
//leaf item process it by adding it to its current parent
if data.value is not null
var currentParent = parentStack.getLastElement()
currentParent[data.key] = data.value
var nextLineLevel = parseLine( peekNextLine() ).level; //peek next line level
if nextLineLevel = data.level - 1
parentStack.pop() //done processing child, about to go back to parent
else
//group node, push it as the new parent and keep on processing.
//created more variable than needed for clarity
var currentParent = parentStack.getLastElement()
currentParent[data.key] = {}
var newParent = currentParent[data.key]
parentStack.push( newParent )
endfor
return parsedJson;
end function
I haven't tried that but that should work give or take few bugs. But the basic idea as I mentioned in the comment is to transverse the file iteratively as a tree structure using iterative post-tree-transversal method. I'm not sure if the 'peekNextLine' is available to you, if not then you would need another variable to keep track of the last level processed and strategically insert that logic there -- I started doing this, but figured it might be confusing at first. Let me know if there is some issues. I can help. If I have time I can even write a quick javascript version for you to learn from.
-
\$\begingroup\$ Thank you user\d+. Using a stack is probably a better solution... and more transparent! Still, I am curious about an improved recursive solution. Is it possible to achieve the same level of efficiency with a recursive solution? \$\endgroup\$kalu– kalu2014年07月27日 02:32:42 +00:00Commented Jul 27, 2014 at 2:32
-
\$\begingroup\$ Can you please convert it to Javascript code \$\endgroup\$Laxmikant Dange– Laxmikant Dange2015年04月25日 11:48:55 +00:00Commented Apr 25, 2015 at 11:48
-
\$\begingroup\$ @LaxmikantDange sorry bud no incentive for me \$\endgroup\$dchhetri– dchhetri2015年05月25日 02:33:25 +00:00Commented May 25, 2015 at 2:33
Practical tips
Instead of:
try: nn = ttree[i+1] except: nn = {'level': -1}
This should have been:
if i + 1 < len(ttree):
nn = ttree[i + 1]
else:
nn = {'level': -1}
try/except is for handling anomalies: things that shouldn't happen under normal circumstances. Take for example an inventory service, with an API method get_item(item_id)
. Callers of the API are expected to use valid item ids. If a user asks for a non-existent item that's an anomaly. So when loading the item with item_id
from the backend storage, you would assume the query will work, and handle the case of non-existent items with a try/except.
In your program, the case of i + 1 >= len(ttree)
happens during normal operations, when processing the last line of input. This is not an anomaly, as it happens for perfectly legal inputs too. This is a case for checking with an if
instead of try/except.
Abusing try/except can sometimes hurt performance too. One time in the past I misunderstood the "ask forgiveness not permission" rule and changed many of my ifs to try/except in a program where this would happen a few dozens of times. During my tests, the speed difference was noticeable even without a timer. try/except is the right tool for handling anomalies, things that are not supposed to happen during normal operations.
Don't check for the type of an object like this:
if type(adict[key]) != list:
Use the isinstance
built-in function instead:
if isinstance(adict[key], list):
Instead of range(0, n)
you can write range(n)
, it's the same thing.
-
\$\begingroup\$ Thanks for your feedback @janos! I was not aware of the
isinstance()
method, thank you for pointing that out. Regarding your first tip, I was under the impression that asking for forgiveness is more pythonic. See, for example, this question link. Alex Martelli does it too, see link. Does try-except really make programs run slower? Do you have a source to support that assertion? \$\endgroup\$kalu– kalu2014年07月27日 02:29:33 +00:00Commented Jul 27, 2014 at 2:29 -
\$\begingroup\$ Alex Martelli says try/except is for "anomalous situations", and that's exactly what I meant with "for handling exceptional events that shouldn't happen under normal circumstances". If I understand your program correctly, it will pass through the
except
during normal operations, for example when you parse the example input, which is a valid input, and there is no anomaly here. \$\endgroup\$janos– janos2014年07月27日 04:51:54 +00:00Commented Jul 27, 2014 at 4:51 -
-
\$\begingroup\$ I suppose you're talking about the implementation of the
take
function? That's still an anomalous situation: trying to taken
items from a sequence that has less thann
items doesn't really make sense, so that's kind of an anomaly. \$\endgroup\$janos– janos2014年08月10日 06:43:14 +00:00Commented Aug 10, 2014 at 6:43
name=person
withvalue=''
\$\endgroup\$