The below is a C++ parser for a reasonably simple INI grammar (although my understanding is that there isn't an official spec as such).
My grammar is roughly:
comments: (optional whitespace){;#}
section: (optional whitespace)[{printable ASCII}]
keyval: (optional whitespace){nows printable ASCII}={printable ASCII}
ini_parser.h
#pragma once
#include <map>
#include <string>
class IniParser {
public:
IniParser(const std::string &path);
template<typename T>
T get(const std::string &key, const std::string §ion);
//get values in 'default' section
template<typename T>
T get(const std::string &key) { return get<T>(key, ""); }
private:
using SectionMap = std::map<std::string, std::string>;
using IniMap = std::map<std::string, SectionMap>;
IniMap inimap;
};
ini_parser.cpp
#include <cctype>
#include <cstdint>
#include <fstream>
#include <limits>
#include <vector>
#include <sstream>
#include "ini_parser.h"
namespace {
bool iskey(char c){
return isalnum(c) || ispunct(c);
}
bool isval(char c){
return isalnum(c) || ispunct(c) || isblank(c);
}
}
IniParser::IniParser(const std::string &path) {
enum class State {
init,
section,
key,
value,
skipline
} state = State::init;
std::ifstream f{path, std::ios::in};
if(!f.is_open()){
throw std::runtime_error("file doesn't exist");
}
std::vector<char> buf;
std::string section, key;
char c;
auto err_helper = [&](const std::string §ion){
std::ostringstream ss;
ss << "invalid character '" << c << "'(" << static_cast<int>(c) << ") in parsestate=" << section << ", at " << path << ":" << f.tellg();
throw std::runtime_error(ss.str());
};
while(f.get(c)){
switch (state){
case State::skipline:
if (c == '\n'){
state = State::init;
}
break;
case State::init:
if (c == ';' || c == '#'){
state = State::skipline;
} else if (c == '[') {
state = State::section;
} else if (std::isspace(c)){
//pass
} else if (iskey(c)){
state = State::key;
buf.push_back(c);
} else {
err_helper("init");
}
break;
case State::section:
if (c == ']'){
section = std::string(buf.begin(), buf.end());
buf.clear();
state = State::skipline;
} else if (isval(c)){
buf.push_back(c);
} else {
err_helper("section");
}
break;
case State::key:
if (c == '='){
key = std::string(buf.begin(), buf.end());
buf.clear();
state = State::value;
} else if (iskey(c)) { //disallow whitespace in key
buf.push_back(c);
} else {
err_helper("key");
}
break;
case State::value:
if (c == '\n'){
const std::string token = std::string(buf.begin(), buf.end());
buf.clear();
inimap[section][key] = token;
state = State::init;
} else if (isval(c)){
buf.push_back(c);
} else {
err_helper("value");
}
break;
}
}
if(state != State::init){
err_helper("eof");
}
}
template<>
std::string IniParser::get(const std::string &key, const std::string §ion){
return inimap.at(section).at(key);
}
template<>
int IniParser::get(const std::string &key, const std::string §ion){
return std::stoi(inimap.at(section).at(key));
}
template<>
std::uint16_t IniParser::get(const std::string &key, const std::string §ion){
auto val = std::stoul(inimap.at(section).at(key));
if(val > std::numeric_limits<std::uint16_t>::max()){
throw std::overflow_error("value too large for type");
}
return static_cast<std::uint16_t>(val);
}
template<>
double IniParser::get(const std::string &key, const std::string §ion){
return std::stod(inimap.at(section).at(key));
}
//add more specializations as neccessary
After completing this it occurred to me that a regex approach might have been superior (or at least, terser), but perhaps the error reporting is better here.
2 Answers 2
I see some things that may help you improve your code.
Consider altering the grammar
Right now the grammar will accept a line like this:
[user]Not a comment, but acts like one
But explicitly reject lines like this one:
key = value
The reason is that the space between the key and the =
causes the parser to throw
a key error. That's a bit inconvenient and could be remedied by adding an additional state between the key
and =
that would exlicitly allow whitespace there.
Consider reordering slightly
Instead of this series of statements:
const std::string token = std::string(buf.begin(), buf.end());
buf.clear();
inimap[section][key] = token;
One might instead consider this:
inimap[section][key] = std::string(buf.begin(), buf.end());
buf.clear();
A smart compiler might be able to generate the same code, but the latter version means that the compiler might better be able to do a move of the string rather than a create/copy/delete.
Reconsider the interface
While I can see the appeal of a generic get
interface, everything is stored internally as a std::string
, so it might make sense to simplify the interface of this class to only return a std::string
and let the recieving code do any conversions. This has the advantage that the conversion routines are no longer part of the class and may be customized by usage or perhaps by key. For example, a custom date class might benefit from having, essentially, a custom converter from std::string
which may well be useful outside the IniParser
class (as with retrieving the data from a user and then converting).
Consider finer grained error handling
The int
version of get
has this single line as its body:
return std::stoi(inimap.at(section).at(key));
This single line has three different ways to throw
a std::out_of_range
error. The section
lookup could fail, or the key
lookup could fail or stoi
could fail. It will not be easy for the calling code to determine which of these failures occurred if one does. It's possible that the calling code only needs to know that an error occured and not which one, but a finer-grained error reporting mechanism (e.g. with custom error classes) might help if the calling code would benefi from knowing the difference between a key lookup error and a section lookup error.
Use std::regex
For an approach which uses std::regex
, see ini file parser in C++
I hope it's not poor form to review my own code, but here are some thoughts I had on this sample after looking back on it after a couple of days:
Work on ifstreams, not paths
This class does too much: it checks for the existence of a file as well as operating on it. It would better to work on an ifstream and leave the file checks to the call site. This also gives flexibility to work on other streams, not just those from files.
Check for duplicates
Duplicate keys within sections will overwrite previous values - although there's no concrete spec specifically forbidding this, it's probably not the intended behaviour.
Const members where possible
The get
methods should be const.
Explore related questions
See similar questions with these tags.