Please review my ini file parser (and potentially some general config file formats).
The data structure used is a section, see below. Does that seem a good approach?
I was wondering about the name. It will parse ini files ok, but unix config files are so variable that calling the class config might be over selling it a bit.
I don't support mid-line comments. Only comments on a line on their own are well just ignored.
All values are stored as strings. I thought that only the user of the config file will know if a value should be used in his or her application as a string or integer or whatever and onus is on user to convert.
Any comments would be much appreciated. Its intended initial use is for Windows ini files. Parsing the configuration into sections.
config.hpp
#ifndef CONFIG_HPP_
#define CONFIG_HPP_
#include <string>
#include <unordered_map>
#include <list>
struct section
{
std::string name;
std::unordered_map<std::string, std::string> keyvalues;
};
class config
{
public:
config(const std::string& filename);
section* get_section(const std::string& sectionname);
std::list<section>& get_sections();
std::string get_value(const std::string& sectionname, const std::string& keyname);
private:
void parse(const std::string& filename);
std::list<section> sections;
};
#endif // CONFIG_HPP_
config implementation - config.cpp
#include "config.hpp"
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <unordered_map>
#include <list>
// trim leading white-spaces
static std::string& ltrim(std::string& s) {
size_t startpos = s.find_first_not_of(" \t\r\n\v\f");
if (std::string::npos != startpos)
{
s = s.substr(startpos);
}
return s;
}
// trim trailing white-spaces
static std::string& rtrim(std::string& s) {
size_t endpos = s.find_last_not_of(" \t\r\n\v\f");
if (std::string::npos != endpos)
{
s = s.substr(0, endpos + 1);
}
return s;
}
config::config(const std::string& filename) {
parse(filename);
}
section* config::get_section(const std::string& sectionname) {
std::list<section>::iterator found = std::find_if(sections.begin(), sections.end(), [sectionname](const section& sect) {
return sect.name.compare(sectionname) == 0; });
return found != sections.end() ? &*found : NULL;
}
std::list<section>& config::get_sections() {
return sections;
}
std::string config::get_value(const std::string& sectionname, const std::string&keyname) {
section* sect = get_section(sectionname);
if (sect != NULL) {
std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);
if (it != sect->keyvalues.end())
return it->second;
}
return "";
}
void config::parse(const std::string& filename) {
section currentsection;
std::ifstream fstrm;
fstrm.open(filename);
if (!fstrm)
throw std::invalid_argument(filename + " could not be opened");
for (std::string line; std::getline(fstrm, line);)
{
// if a comment
if (!line.empty() && (line[0] == ';' || line[0] == '#')) {
// allow both ; and # comments at the start of a line
}
else if (line[0] == '[') {
/* A "[section]" line */
size_t end = line.find_first_of(']');
if (end != std::string::npos) {
// this is a new section so if we have a current section populated, add it to list
if (!currentsection.name.empty()) {
sections.push_back(currentsection); // copy
currentsection.name.clear(); // clear section for re-use
currentsection.keyvalues.clear();
}
currentsection.name = line.substr(1, end - 1);
}
else {
// section has no closing ] char
}
}
else if (!line.empty()) {
/* Not a comment, must be a name[=:]value pair */
size_t end = line.find_first_of("=:");
if (end != std::string::npos) {
std::string name = line.substr(0, end);
std::string value = line.substr(end + 1);
ltrim(rtrim(name));
ltrim(rtrim(value));
currentsection.keyvalues[name] = value;
}
else {
// no key value delimitter
}
}
} // for
// if we are out of loop we add last section
// this is a new section so if we have a current section populated, add it to list
if (!currentsection.name.empty()) {
sections.push_back(currentsection); // copy
currentsection.name = "";
currentsection.keyvalues.clear();
}
}
some code to do a simple test
#include "config.hpp"
#include <iostream>
#include <fstream>
#include <string>
void generate_config(const std::string& filename) {
std::ofstream ostrm;
ostrm.open(filename);
if (ostrm) {
ostrm << "[protocol]\nversion = 6 \n\n[user]\nname = Bob Smith \nemail = [email protected] \nactive = true\n\npi = 3.14159";
}
}
int main() {
// generate test file
generate_config("test1.ini");
// retrieve some information from config file
config cfg("test1.ini");
section* usersection = cfg.get_section("user");
if (usersection != NULL) {
std::cout << "section name: " << usersection->name << std::endl;
std::cout << "email=" << cfg.get_value("user", "email") << '\n';
}
}
3 Answers 3
Looks good to me. A couple remarks:
section* get_section(const std::string& sectionname); std::list<section>& get_sections();
Not so sure it is a good idea returning mutable pointers/references to the data, since changing a section
in memory has no effect to the underlaying file representation. It could lead to misunderstandings. It is probably best to keep the data immutable by returning a const
pointer and const ref
instead. That will also allow const qualifying the methods themselves.
const section* get_section(const std::string& sectionname) const;
const std::list<section>& get_sections() const;
Use auto
for these overly verbose template instantiations:
std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);
vs:
const auto it = sect->keyvalues.find(keyname);
This will also make life easier if the container or contents thereof change in the future. You just have to update the keyvalues
declaration, not everywhere it was used.
Also on the subject of using the current feature set of C++, you should prefer nullptr
over NULL
. NULL
implicitly converts to integer and messes up function overload resolution because of that, so get in the habit of using nullptr
always.
std::list
still relevant? Some people say you should just forget that linked-lists exist, in a world where CPU caches and data locality play such an important role in code performance. std::vector
is an excellent default for the majority of cases, so I would start with a vector instead. To help you decide, take a look at: Efficiency with Algorithms, Performance with Data Structures.
I see some things that may help you improve your program.
Use all of the required #include
s
The function strncpy
is used but its declaration is in #include <cstring>
which is not actually in the list of includes.
Don't expose class internals
The sections
member data is a private data member, which is fine and appropriate, but then the get_sections()
returns an unfettered reference to it, rendering the private
classification nearly meaningless. The way it's currently written, one can arbitrarily modify all internal data via that one function which is not good object-oriented design. Better would be to either return a copy or at least to return a const reference.
Prefer std::istream
to file names
The current interface gets passed a file name, but that's a relatively inflexible and unsatisfactory solution. For one thing, as your parse
code comment notes, if there's a problem opening a file (or doing anything else with it) one would normally expect that either the code handles it or that it returns some indication of error. This code does neither. Secondly, as the test code illustrates, it would have been nice if the interface had been able to accomodate a std::istream
instead of a filename. This would have allowed usage like this:
std::stringstream ss{"[protocol]\nversion = 6 \n\n[user]\nname = Bob Smith \n"
"email = [email protected] \nactive = true\n\npi = 3.14159"};
config cfg(ss);
Use auto
to simplify your code
Within the get_value
member function we have this line:
std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);
This could be made much simpler by the use of auto
:
const auto it = sect->keyvalues.find(keyname);
Use const
where practical
Several of your member functions are lookup functions which do not and should not alter the underlying config
object. Those functions, such as get_value
, should be declared const
:
std::string get_value(const std::string& sectionname, const std::string& keyname) const;
Reconsider the data structures
The basic usage of the config
object is to be able to fetch individual items by section and key name. The interface right now looks like this when used:
std::cout << "email=" << cfg.get_value("user", "email") << '\n';
For this reason, it seems to me that it would make more sense to use a different data structure than the current std::list
. Instead, one could use a std::unordered_map
of std::unordered_map
s. That would make a number of things a bit simpler.
Support write operations
It may be useful to allow the user to modify settings and then write the updated configuration file back out, but the current interface allows no way to do this.
Add better error handling
There are various comments within the code indicating an intention of adding error handling or at least error signalling. This would be useful, but there are other things that might go wrong that should be handled, such as an incomplete section name at the end of a file or items not in any named section.
Consider the use of regular expressions
Many of the things in this code would be much smaller and more concise if they were written using the C++11 regex
library. For one thing, the ltrim
and rtrim
functions would become obsolete. As an example, here's what the parse
function looks like with regex and also the data structure change I mentioned (I renamed the private variable map
):
void config::parse(std::istream& in) {
static const std::regex comment_regex{R"x(\s*[;#])x"};
static const std::regex section_regex{R"x(\s*\[([^\]]+)\])x"};
static const std::regex value_regex{R"x(\s*(\S[^ \t=]*)\s*=\s*((\s?\S+)+)\s*$)x"};
std::string current_section;
std::smatch pieces;
for (std::string line; std::getline(in, line);)
{
if (line.empty() || std::regex_match(line, pieces, comment_regex)) {
// skip comment lines and blank lines
}
else if (std::regex_match(line, pieces, section_regex)) {
if (pieces.size() == 2) { // exactly one match
current_section = pieces[1].str();
}
}
else if (std::regex_match(line, pieces, value_regex)) {
if (pieces.size() == 4) { // exactly enough matches
map[current_section][pieces[1].str()] = pieces[2].str();
}
}
}
}
-
\$\begingroup\$ You mean sections could be: typedef std::unordered_map< std::string, std::unordered_map<std::string, std::string> > sections; \$\endgroup\$arcomber– arcomber2016年05月10日 16:34:43 +00:00Commented May 10, 2016 at 16:34
-
\$\begingroup\$ In the version I created, I didn't actually use a
typedef
but yes, that's the type I meant. \$\endgroup\$Edward– Edward2016年05月10日 17:31:18 +00:00Commented May 10, 2016 at 17:31 -
1\$\begingroup\$ After reading this reply, I just had to say something like "Well done @Edward!". It's probably one of the most useful replies I've seen on Stack Exchange over the years. \$\endgroup\$Chris– Chris2018年11月06日 19:49:10 +00:00Commented Nov 6, 2018 at 19:49
Explicit constructors with single arguments
In order to avoid implicit single arg conversions by the compiler, consider declaring the constructor:
config(const std::string& filename);
as:
explicit config(const std::string& filename);