I need to parse a simple JSON string (flat JSON, no hierarchy) for keys and values, and there is a system constraint that I cannot use any built-in JSON library and can only read a string once due to latency requirements. I need to use Python 2.7.x series and cannot use a higher version.
Note: This question is version 3 of my code to solve this problem.
Version 1 - answered by @holroy
Version 2 - answered by @Tersosauros
This is a new version of code based on guidance from @Tersosauros' post of the previous version. Note that I still do not handle some corner cases. I do not want to use two state variables as with @Tersosauros' code, but rewrote using only one state variable. I am not 100% confident if my code is correct logic and your advice is highly appreciated.
READY = 0
COLON = 1
COMMA = 2
STRING = 3
NUMBER = 4
numbers='0123456789.'
NoOps=' {'
def parse(strJSON):
state = READY
lastKey=''
lastValue=''
temp=''
result={}
for c in strJSON:
if state == READY:
if c in NoOps:
continue
elif c == '"':
state = STRING
elif c == ':':
lastKey=temp
temp=''
state = READY
elif c == ',' or c=='}':
lastValue=temp
temp=''
if lastKey:
result[lastKey]=lastValue
lastValue=''
lastKey=''
state=READY
elif c in numbers:
state=NUMBER
elif state == STRING:
if c == '"':
state = READY
else:
temp += c
if state == NUMBER:
if c in NoOps or c==',' or c=='}':
lastValue=temp
temp=''
if lastKey:
result[lastKey]=lastValue
lastValue=''
lastKey=''
state=READY
else:
temp+=c
if lastKey:
result[lastKey]=lastValue
return result
if __name__ == "__main__":
#JSONString = '{ "id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"]}'
JSONString1 = '{ "id": "1", "name": "A green door", "price": "12.50", "tags": "home green"}'
JSONString2 = '{ "id": 1, "name": "A green door", "price": 12.50, "tags": "home green"}'
print parse(JSONString1)
print parse(JSONString2)
1 Answer 1
Variable Names
Apart from the odd (non PEP8 compliant) variable casing, you also have a temp
variable. Names like temp
are frowned upon as they convey nothing about what that variable represents.
Repeated Tests
You do seem to repeat the tests c==',' or c=='}':
a couple of times (lines 26 and 42. I think this could be refactored into using a test string, similar to your NoOps
variable.
For example, on line 42, instead of:
if c in NoOps or c==',' or c=='}':
You could just do
if c in NoOps or c in CloseOps:
Assuming you had a variable CloseOps
defined in the preamble, like this:
CloseOps = ",}"
The same thing would apply for the repeated use of this test condition on line 26.
Repeated Logic
The following code (lines 45 - 49 inclusive):
if lastKey: result[lastKey]=lastValue lastValue='' lastKey='' state=READY
Appears to be an (almost) verbatim repeat of the logic contained upon lines 29 - 33 (inclusive):
if lastKey: result[lastKey]=lastValue lastValue='' lastKey='' state=READY
Functional Decomposition would dictate that such repeated logic be refactored into it's own function.
Numbers aren't Strings!
There exists a flaw in your code (which was not present in the answer code to your last question), where-by JSON-encoded numbers are turned into strings in your resulting dict
.
If you (using your parse
function implementation from this question) do:
>>> type(parse('{"foo":"1"}')['foo'])
This yields:
<type 'str'>
(Which is fine, since here "1"
should be a string. However:
>>> type(parse('{"foo":1}')['foo'])
Also yields:
<type 'str'>
Which indicates an error, as this should be <type 'int'>
if the JSON were decoded properly.
-
\$\begingroup\$ Thanks Tersosauros, agree with all of your comments. I want to confirm in general you think using one state variable is fine, comparing using two state variables for the state machine? \$\endgroup\$Lin Ma– Lin Ma2016年03月14日 08:05:06 +00:00Commented Mar 14, 2016 at 8:05
-
1\$\begingroup\$ Yes, as I said in my previous comment on the question: "you should generally only ever need to track one state per FSM. So since all your logic is contained within one, you would need only one
state
variable." The only BIG problem with this code is the broken integer handling - the other issues (variable names, a repeated test condition, and one 4-line repeated block of code) are fairly minor. Although, as a review site, they are warranted for this. \$\endgroup\$Tersosauros– Tersosauros2016年03月14日 15:33:52 +00:00Commented Mar 14, 2016 at 15:33 -
\$\begingroup\$ Thanks Tersosauros, I accepted all of your comments, and to fix the integer type issue, I think I can fix by using
int(lastValue)
to convert the value directly? How do you think? Thanks. \$\endgroup\$Lin Ma– Lin Ma2016年03月15日 03:01:47 +00:00Commented Mar 15, 2016 at 3:01 -
1\$\begingroup\$ Yes, casting the value to an integer is what you need. There was code to do this in my answer to the last version (#2) of this question. The cast will need to be inside a try/except block to ensure that non-integer values are caught and treated as some other type (float for example) - this was the behaviour of the code you based this new code from. \$\endgroup\$Tersosauros– Tersosauros2016年03月17日 23:36:42 +00:00Commented Mar 17, 2016 at 23:36
Explore related questions
See similar questions with these tags.
temp
, as this conveys nothing about what that variable represents. \$\endgroup\$state
variable. \$\endgroup\$