I am implementing a json parser in C++. I have written some code and wanted to share it on Stack Overflow.
While writing this program I assumed that the json will only have objects and arrays are only string arrays.
Also I used a stack to find corresponding closing ']' or '}'.
I used vectors to store arrays so that I could get specific elements using their index.
Below is my code:
#include<iostream>
#include<stack>
#include<vector>
using namespace std;
class JsonHelper {
wstring json_wstr;
stack<wstring> wstack;
vector<wstring> v;
public:
JsonHelper(wstring input_json);
JsonHelper operator[](wstring wstr);
JsonHelper operator[](long index);
};
JsonHelper::JsonHelper(wstring input_json):json_wstr(input_json) {
//Validate Json Here
}
JsonHelper JsonHelper::operator [](std::wstring wstr){
if(json_wstr.empty()) {
throw L"Json Is Empty";
}
long pos = json_wstr.find(wstr);
if(pos == std::wstring::npos) {
throw L"Object not found in json";
}
pos = json_wstr.find(L":",pos);
pos++;
long last_pos = pos;
while(1){
if(json_wstr[pos]== L'}' && wstack.empty()) {
break;
}
if(json_wstr[pos] == L',' && wstack.empty()){
break;
}
if(json_wstr[pos] == L'{') {
wstack.push(L"{");
}
if(json_wstr[pos] == L'}'){
if(wstack.top().find(L"{") == wstring::npos)
throw L"Invalid Json";
wstack.pop();
}
if(json_wstr[pos] == L'[') {
wstack.push(L"[");
}
if(json_wstr[pos] == L']'){
if(wstack.top().find(L"[") == wstring::npos)
throw L"Invalid Json";
wstack.pop();
}
pos++;
}
wstring json_object = json_wstr.substr(last_pos,(pos-last_pos));
json_object.erase(json_object.find_last_not_of(L" \n\r\t")+1);
json_object.erase(0,json_object.find_first_not_of(L" \n\r\t"));
return JsonHelper(json_object);
}
JsonHelper JsonHelper::operator [](long index) {
wstring tmp;
if(json_wstr.empty()) {
throw L"Json Is Empty";
}
long pos = json_wstr.find_first_of(L"[");
if(pos == wstring::npos)
throw L"Invalid Json";
long last_pos = ++pos;
while(1) {
if(json_wstr[pos] == L',' && wstack.empty()) {
tmp = json_wstr.substr(last_pos,pos-last_pos);
tmp.erase(tmp.find_last_not_of(L" \n\r\t")+1);
tmp.erase(0,tmp.find_first_not_of(L" \n\r\t"));
v.push_back(tmp);
last_pos= pos+1;
}
if(json_wstr[pos] == L'{') {
wstack.push(L"{");
}
if(json_wstr[pos] == L'}'){
if(wstack.top().find(L"{") == wstring::npos)
throw L"Invalid Json";
wstack.pop();
}
if(json_wstr[pos] == L'[') {
wstack.push(L"[");
}
if(json_wstr[pos] == L']' && wstack.empty()) {
tmp = json_wstr.substr(last_pos,pos-last_pos);
tmp.erase(tmp.find_last_not_of(L" \n\r\t")+1);
tmp.erase(0,tmp.find_first_not_of(L" \n\r\t"));
v.push_back(tmp);
break;
}
if(json_wstr[pos] == L']'){
if(wstack.top().find(L"[") == wstring::npos)
throw L"Invalid Json";
wstack.pop();
}
pos++;
}
if(v.size() == 0) {
throw "Invalid Array object or array is empty";
}
return v.at(index-1);
}
int main() {
JsonHelper j(L"{\"ID\": \"SGML\",\"SortAs\": \"SGML\",\"GlossTerm\": \"Standard Generalized Markup Language\",\"Acronym\": \"SGML\",\"Abbrev\": \"ISO 8879:1986\",\"GlossDef\": {\"para\": \"A meta-markup language, used to create markup languages such as DocBook.\",\"GlossSeeAlso\": [\"XML\", \"GML\"]},\"GlossSee\": \"markup\"}");
JsonHelper j1 = j[L"GlossDef"][L"GlossSeeAlso"][2];
return 0;
}
Please suggest improvements for my code. A few improvements I am looking for are
- Validate the Json, I am thinking of using regx in c++
- Get a specific value like Float, Bool, String etc.
- Improved Performance
- Should be able to handle huge json
-
3\$\begingroup\$ xkcd.com/1171 \$\endgroup\$Loki Astari– Loki Astari2017年06月01日 06:14:29 +00:00Commented Jun 1, 2017 at 6:14
-
2\$\begingroup\$ github.com/Loki-Astari/ThorsSerializer/blob/master/README.md \$\endgroup\$Loki Astari– Loki Astari2017年06月01日 06:15:24 +00:00Commented Jun 1, 2017 at 6:15
1 Answer 1
##Questions
Validate the Json, I am thinking of using regx in c++
Bad Idea. Regular expressions are not designed for this. You should implement your own parser.
Get a specific value like Float, Bool, String etc.
Yes. You need this. Currently you can't get a value out of the object or array.
Improved Performance
Sure.
Should be able to handle huge json
For this you will need a real parser. Regular expressions will just make this worse.
##Code Review
###using namespace
Don't do this:
using namespace std;
There is a reason that the standard namespace is called std
and not standard
. Its to make it easy to prefix things without bloating the code to much.
Importing everything from the standard namesapce into the current context is very error prone and can cause problems in larger programs. Don't get into bad habits now.
For details read: Why is "using namespace std" considered bad practice?
###Wide Strings
wstring json_wstr;
Are you sure you want to use wide strings?
From: rfc7159
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8
###Interface
// Access
JsonHelper operator[](wstring wstr);
JsonHelper operator[](long index);
That's fine as far as it goes. If your JSON is an object or an array. But what happens if it is just a string on integer? There is currently no way to access it.
##Object Accesses
JsonHelper JsonHelper::operator [](std::wstring wstr){
// STUFF
This finds the key.
long pos = json_wstr.find(wstr);
if(pos == std::wstring::npos) {
throw L"Object not found in json";
}
But it also finds keys that don't match the string exactly and also values. Say I was searching for "Key"
. This would also match "KeyValue"
or "LongKey"
. You should check that the proceeding and following character are '"'
long pos = 0L;
bool found;
do {
found = true;
pos = json_wstr.find(wstr, pos);
if(pos == std::wstring::npos) {
break;
}
if ((pos == 0) || json_wstr[pos - 1] != '"' || pos + wstr.size() >= json_wstr.size() || json_wstr[pos + wstr.size()] != '"') {
pos += 1;
found = false;
continue;
}
pos += wstr.size() + 1;
while(pos < json_wstr.size() && std::is_space(json_wstr[pos])) {
++pos;
}
if (json_wstr[pos] != ':') {
found = false;
}
}
while(!found);
if (pos == std::wstring::npos) {
throw stuff;
}
###Throwing Objects
Though it is not required. You should generally throw an object derived from std::exception. Preferably something derived from std::runtime_error.
throw std::runtime_error("Key not found in JSON");
###DRY Code
DRY: Don't repeat yourself.
There is a huges section of code that is repeated in both access functions. This should be refactored into its own method.
###String Literals in C++1
Literal string in C++ have been improved.
You don't need to escape all those "
characters in your input string.
wchar_t data[] = LR"JSON({"PLOP":"PLOP"})JSON";
-
\$\begingroup\$ Thanks Loki i will surely improve things and Update my question,Bcoz their will be more things to improve \$\endgroup\$Dipak– Dipak2017年06月01日 11:55:00 +00:00Commented Jun 1, 2017 at 11:55
-
\$\begingroup\$ @Dipak FYI, if you make improvements and want them reviewed you need to ask a new question instead of updating this question. \$\endgroup\$Null– Null2017年06月01日 17:32:55 +00:00Commented Jun 1, 2017 at 17:32
-
1\$\begingroup\$ I agree that regex is not a good fit for this kind of application. However, your comments and answer may leave the reader with the impression that
std::regex
should never be used, which is not really sound advice. Instead, I'd say to use them where extreme performance is not required, when recursion is not required and when using them makes the code simpler. \$\endgroup\$Edward– Edward2017年06月02日 13:08:41 +00:00Commented Jun 2, 2017 at 13:08