I want to read a Python dictionary string using Java. Example string:
{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}
This is not valid JSON. I want to convert it into proper JSON using Java code.
- Interesting assignment. And what is your question? And I agree with the following comment: why spend energy parsing a non-standard format instead of making sure you emit JSON on the Python side?! – GhostCat, Apr 26, 2017 at 12:33
- As this is not proper JSON, I am not able to load it in Java. Basically I am using Scala and the json4s library. – Devavrata, Apr 26, 2017 at 12:34
- @GhostCat It is not possible in my case. These strings are saved in a DB. – Devavrata, Apr 26, 2017 at 12:34
- @Devavrata then convert them to JSON as they get into the database. Saving non-standard formats into a DB spells trouble. – Cruncher, Apr 26, 2017 at 12:36
- Perhaps you should use Jython to allow you to pass values to a Python interpreter within Java and let it return that JSON to you. – RealSkeptic, Apr 26, 2017 at 12:45
3 Answers
Well, the best way would be to pass it through a Python script that reads the data and outputs valid JSON:
>>> json.dumps(ast.literal_eval("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"))
'{"name": "Shivam", "otherInfo": [[0], [1]], "isMale": true}'
So you could create a script that contains only:
import sys, json, ast; print(json.dumps(ast.literal_eval(sys.argv[1])))
Or you can make it a Python one-liner like so:
python -c "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))" "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"
You can run that from your shell, which means you can run it from within Java the same way:
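import java.io.BufferedReader;
import java.io.InputStreamReader;

String PythonData = "{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}";
String[] cmd = {
    "python", "-c", "import sys, ast, json ; print(json.dumps(ast.literal_eval(sys.argv[1])))",
    PythonData
};
// exec() only starts the process; read its stdout to get the JSON string
Process process = Runtime.getRuntime().exec(cmd);
BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
String json = reader.readLine();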
And as output you'll have a proper JSON string.
This solution is the most reliable way I can think of, as it safely parses any Python literal (it uses Python's own parser to do so) without opening a window for code injection.
But I wouldn't recommend using it, because you'd be spawning a Python process for each string you parse, which would be a performance killer.
As an improvement on that first approach, you could use Jython to run the Python code inside the JVM for a bit more performance.
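import org.python.core.PyObject;
import org.python.core.PyString;
import org.python.util.PythonInterpreter;
PythonInterpreter interpreter = new PythonInterpreter();
// exec() runs statements; eval() only accepts expressions, so it cannot define to_json
interpreter.exec("import json, ast");
interpreter.exec("to_json = lambda d: json.dumps(ast.literal_eval(d))");
PyObject toJson = interpreter.get("to_json");
PyObject result = toJson.__call__(new PyString(PythonData));
String realResult = (String) result.__tojava__(String.class);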
The above is untested (so it's likely to fail and spawn dragons 👹) and I'm pretty sure you can make it more elegant. It's loosely adapted from this answer. I'll leave it up to you as an exercise to see how you can include the Jython environment in your Java runtime ☺.
P.S.: Another solution would be to try to fix every pattern you can think of using one gigantic regexp or several smaller ones. Even if that might work on simpler cases, I would advise against it, because regex is the wrong tool for the job: it isn't expressive enough and you'll never be comprehensive. It's only a good way to plant the seed of a bug that'll bite you at some point in the future.
P.S.2: Whenever you need to parse code from an external source, always make sure that data is sanitized and safe. Never forget about Little Bobby Tables.
10 Comments
- exec of python: the code run is exactly the one-liner written above, and the variable is passed as an argv argument to the literal_eval function, so this code is pretty safe against the usual exploits.
In conjunction with the other answer: it is straightforward to simply invoke that Python one-liner to "translate" a Python dict string into a standard JSON string.
But spawning a new process for each row in your database might quickly turn into a performance killer.
Thus there are two options that you should consider on top of that:
- establish a small "Python server" that keeps running; its only job is to do that translation for JVMs that connect to it
- look into Jython, meaning: simply enable your JVM to run Python code. In other words, instead of writing your own Python dict string parser, you add "Python powers" to your JVM and rely on existing components to do that translation for you.
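The first option can be approximated without any networking by keeping a single Python child process alive and exchanging one line per request over its stdin/stdout. What follows is only a sketch under stated assumptions: the names LineWorker, ask and PY_LOOP are made up for illustration, a python executable is assumed to be on the PATH, and every dict string is assumed to fit on one line.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: keeps one child process alive and talks to it line by line.
class LineWorker implements Closeable {
    // Python loop (an assumption, not from the answers): one literal in, one JSON line out.
    static final String PY_LOOP =
        "import sys, ast, json\n" +
        "for line in iter(sys.stdin.readline, ''):\n" +
        "    print(json.dumps(ast.literal_eval(line.strip())))\n" +
        "    sys.stdout.flush()\n";

    private final Process process;
    private final BufferedWriter toWorker;
    private final BufferedReader fromWorker;

    LineWorker(String... command) throws IOException {
        // The child's stderr goes to this JVM's stderr so Python tracebacks stay visible.
        process = new ProcessBuilder(command)
            .redirectError(ProcessBuilder.Redirect.INHERIT)
            .start();
        toWorker = new BufferedWriter(
            new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8));
        fromWorker = new BufferedReader(
            new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8));
    }

    // Sends one line and waits for exactly one line in reply.
    synchronized String ask(String line) throws IOException {
        toWorker.write(line);
        toWorker.newLine();
        toWorker.flush();
        String reply = fromWorker.readLine();
        if (reply == null) throw new IOException("worker process exited");
        return reply;
    }

    @Override
    public void close() throws IOException {
        toWorker.close();   // EOF on stdin ends the loop in the worker
        process.destroy();
    }
}
```

It would be used as new LineWorker("python", "-c", LineWorker.PY_LOOP), calling ask(row) once per database row and close() at the end. In this sketch a malformed row makes the Python loop exit, so the next ask throws an IOException; a real service would want to report errors per line instead.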
Hacky solution
Do a string replace ' -> ", True -> true, False -> false, and None -> null (and drop any u prefix in front of strings), then parse the result as JSON. If you are lucky (and are willing to bet on remaining lucky in the future), this can actually work in practice.
See rh-messaging/cli-rhea/blob/main/lib/formatter.js#L240-L249 (in JavaScript), which applies the same trick in the opposite direction, from JSON to Python literals:
static replaceWithPythonType(strMessage) {
  return strMessage.replace(/null/g, 'None').replace(/true/g, 'True').replace(/false/g, 'False').replace(/undefined/g, 'None').replace(/\{\}/g, 'None');
}
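In Java, the replacements described above could look roughly like this. It is a minimal sketch, not a robust converter: the class name PyDictHack and the method toJson are made up for illustration, and it assumes no string value contains a quote character or the words True, False or None (a value such as 'It's True' would be corrupted).

```java
// Hypothetical helper illustrating the "hacky" textual rewrite.
class PyDictHack {
    static String toJson(String pythonDict) {
        return pythonDict
            .replaceAll("\\bu'", "'")            // drop the Python 2 unicode prefix: u'x' -> 'x'
            .replace('\'', '"')                  // single quotes -> double quotes
            .replaceAll("\\bTrue\\b", "true")    // Python keywords -> JSON keywords
            .replaceAll("\\bFalse\\b", "false")
            .replaceAll("\\bNone\\b", "null");
    }

    public static void main(String[] args) {
        System.out.println(toJson("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"));
    }
}
```

The word boundaries (\b) at least keep identifiers such as NoneType or Truename from being rewritten, but nothing here knows whether a match sits inside a string, which is exactly why this stays a bet on remaining lucky.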
Skylark solution
Skylark (since renamed Starlark) is a subset (data-definition) language based on Python. There are parsers in Go, Java, Rust, C, and Lua listed on the project's page. The problem is that the Java artifacts aren't published anywhere, as discussed in Q: How do I include a Skylark configuration parser in my application?
Graal Python
Possibly the approach described in this comment: https://github.com/oracle/graalpython/issues/96#issuecomment-1662566214
DIY Parsers
I was not able to find a parser specific to the Python literal notation. The ANTLR samples contain a Python grammar that could plausibly be cut down to work for you: https://github.com/antlr/grammars-v4/tree/master/python/python3
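For data as simple as the example in the question, a hand-written recursive-descent converter is also within reach. The sketch below is an illustration only, not a full implementation of Python's literal grammar: the class name PyLiteralToJson and its convert method are invented here, and it assumes dict keys are strings and that values are limited to dicts, lists, tuples, plain or u-prefixed strings with common escapes, simple decimal numbers, True, False and None.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a small subset of Python literal syntax to JSON text.
class PyLiteralToJson {
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(\\.\\d+)?([eE][+-]?\\d+)?");
    private final String s;
    private int i = 0;

    private PyLiteralToJson(String s) { this.s = s; }

    static String convert(String src) {
        PyLiteralToJson p = new PyLiteralToJson(src);
        StringBuilder out = new StringBuilder();
        p.value(out);
        p.ws();
        if (p.i != src.length()) throw p.err("trailing input");
        return out.toString();
    }

    private IllegalArgumentException err(String msg) {
        return new IllegalArgumentException(msg + " at offset " + i);
    }

    private void ws() {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
    }

    private char peek() {
        if (i >= s.length()) throw err("unexpected end of input");
        return s.charAt(i);
    }

    private void value(StringBuilder out) {
        ws();
        char c = peek();
        if (c == '{') container(out, '{', '}', '}', true);
        else if (c == '[') container(out, '[', ']', ']', false);
        else if (c == '(') container(out, '[', ']', ')', false);   // tuple becomes a JSON array
        else if (c == '\'' || c == '"') string(out);
        else if ((c == 'u' || c == 'U') && i + 1 < s.length() && "'\"".indexOf(s.charAt(i + 1)) >= 0) {
            i++;                                                   // skip the unicode prefix
            string(out);
        }
        else if (s.startsWith("True", i)) { i += 4; out.append("true"); }
        else if (s.startsWith("False", i)) { i += 5; out.append("false"); }
        else if (s.startsWith("None", i)) { i += 4; out.append("null"); }
        else number(out);
    }

    // Parses a dict, list or tuple; keys are assumed to be strings (not checked).
    private void container(StringBuilder out, char open, char close, char srcClose, boolean dict) {
        i++;                                   // consume the opening bracket
        out.append(open);
        boolean first = true;
        ws();
        while (peek() != srcClose) {
            if (!first) out.append(", ");
            first = false;
            value(out);
            if (dict) {
                ws();
                if (peek() != ':') throw err("expected ':'");
                i++;
                out.append(": ");
                value(out);
            }
            ws();
            if (peek() == ',') { i++; ws(); }  // also accepts Python's trailing comma
            else if (peek() != srcClose) throw err("expected ',' or '" + srcClose + "'");
        }
        i++;                                   // consume the closing bracket
        out.append(close);
    }

    private void string(StringBuilder out) {
        char quote = s.charAt(i++);
        out.append('"');
        while (true) {
            char c = peek();
            i++;
            if (c == quote) break;
            if (c == '\\') {
                char e = peek();
                i++;
                if (e == '\'') out.append('\'');                   // \' needs no escape in JSON
                else if ("\"\\bfnrtu".indexOf(e) >= 0) out.append('\\').append(e);
                else throw err("unsupported escape \\" + e);
            } else if (c == '"') out.append("\\\"");               // bare " must be escaped in JSON
            else out.append(c);
        }
        out.append('"');
    }

    private void number(StringBuilder out) {
        Matcher m = NUMBER.matcher(s).region(i, s.length());
        if (!m.lookingAt()) throw err("unexpected character '" + s.charAt(i) + "'");
        out.append(m.group());
        i = m.end();
    }

    public static void main(String[] args) {
        System.out.println(convert("{'name': u'Shivam', 'otherInfo': [[0], [1]], 'isMale': True}"));
    }
}
```

Unlike the string-replace hack, this tracks whether it is inside a string, so values containing quotes or the word True survive intact. Anything outside the assumed subset (sets, byte strings, \x escapes, long integers with an L suffix, and so on) raises an IllegalArgumentException instead of silently producing bad JSON.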