ini file parser in C++

Question 1

Please review my ini file parser (and potentially some general config file formats).

The data structure used is a section, see below. Does that seem a good approach?

I was wondering about the name. It will parse ini files ok, but unix config files are so variable that calling the class config might be over selling it a bit.

I don't support mid-line comments. Only comments on a line on their own are well just ignored.

All values are stored as strings. I thought that only the user of the config file will know if a value should be used in his or her application as a string or integer or whatever and onus is on user to convert.

Any comments would be much appreciated. Its intended initial use is for Windows ini files. Parsing the configuration into sections.

config.hpp

#ifndef CONFIG_HPP_
#define CONFIG_HPP_
#include <string>
#include <unordered_map>
#include <list>
struct section
{
 std::string name;
 std::unordered_map<std::string, std::string> keyvalues;
};
class config
{
public:
 config(const std::string& filename);
 section* get_section(const std::string& sectionname);
 std::list<section>& get_sections();
 std::string get_value(const std::string& sectionname, const std::string& keyname);
private:
 void parse(const std::string& filename);
 std::list<section> sections;
};
#endif // CONFIG_HPP_

config implementation - config.cpp

#include "config.hpp"
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <unordered_map>
#include <list>
// trim leading white-spaces
static std::string& ltrim(std::string& s) {
 size_t startpos = s.find_first_not_of(" \t\r\n\v\f");
 if (std::string::npos != startpos)
 {
 s = s.substr(startpos);
 }
 return s;
}
// trim trailing white-spaces
static std::string& rtrim(std::string& s) {
 size_t endpos = s.find_last_not_of(" \t\r\n\v\f");
 if (std::string::npos != endpos)
 {
 s = s.substr(0, endpos + 1);
 }
 return s;
}
config::config(const std::string& filename) {
 parse(filename);
}
section* config::get_section(const std::string& sectionname) {
 std::list<section>::iterator found = std::find_if(sections.begin(), sections.end(), [sectionname](const section& sect) { 
 return sect.name.compare(sectionname) == 0; });
 return found != sections.end() ? &*found : NULL;
}
std::list<section>& config::get_sections() {
 return sections;
}
std::string config::get_value(const std::string& sectionname, const std::string&keyname) {
 section* sect = get_section(sectionname);
 if (sect != NULL) {
 std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);
 if (it != sect->keyvalues.end())
 return it->second;
 }
 return "";
}
void config::parse(const std::string& filename) {
 section currentsection;
 std::ifstream fstrm;
 fstrm.open(filename);
 if (!fstrm)
 throw std::invalid_argument(filename + " could not be opened");
 for (std::string line; std::getline(fstrm, line);)
 {
 // if a comment
 if (!line.empty() && (line[0] == ';' || line[0] == '#')) {
 // allow both ; and # comments at the start of a line
 }
 else if (line[0] == '[') {
 /* A "[section]" line */
 size_t end = line.find_first_of(']');
 if (end != std::string::npos) {
 // this is a new section so if we have a current section populated, add it to list
 if (!currentsection.name.empty()) {
 sections.push_back(currentsection); // copy
 currentsection.name.clear(); // clear section for re-use
 currentsection.keyvalues.clear();
 }
 currentsection.name = line.substr(1, end - 1);
 }
 else {
 // section has no closing ] char
 }
 }
 else if (!line.empty()) {
 /* Not a comment, must be a name[=:]value pair */
 size_t end = line.find_first_of("=:");
 if (end != std::string::npos) {
 std::string name = line.substr(0, end);
 std::string value = line.substr(end + 1);
 ltrim(rtrim(name));
 ltrim(rtrim(value));
 currentsection.keyvalues[name] = value;
 }
 else {
 // no key value delimitter
 }
 }
 } // for
 // if we are out of loop we add last section
 // this is a new section so if we have a current section populated, add it to list
 if (!currentsection.name.empty()) {
 sections.push_back(currentsection); // copy
 currentsection.name = "";
 currentsection.keyvalues.clear();
 }
}

some code to do a simple test

#include "config.hpp"
#include <iostream>
#include <fstream>
#include <string>
void generate_config(const std::string& filename) {
 std::ofstream ostrm;
 ostrm.open(filename);
 if (ostrm) {
 ostrm << "[protocol]\nversion = 6 \n\n[user]\nname = Bob Smith \nemail = [email protected] \nactive = true\n\npi = 3.14159";
 }
}
int main() {
 // generate test file
 generate_config("test1.ini");
 // retrieve some information from config file
 config cfg("test1.ini");
 section* usersection = cfg.get_section("user");
 if (usersection != NULL) {
 std::cout << "section name: " << usersection->name << std::endl;
 std::cout << "email=" << cfg.get_value("user", "email") << '\n';
 }
}

Question 2

Looks good to me. A couple remarks:

section* get_section(const std::string& sectionname);
std::list<section>& get_sections();

Not so sure it is a good idea returning mutable pointers/references to the data, since changing a section in memory has no effect to the underlaying file representation. It could lead to misunderstandings. It is probably best to keep the data immutable by returning a const pointer and const ref instead. That will also allow const qualifying the methods themselves.

 const section* get_section(const std::string& sectionname) const;
 const std::list<section>& get_sections() const;

Use auto for these overly verbose template instantiations:

std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);

vs:

const auto it = sect->keyvalues.find(keyname);

This will also make life easier if the container or contents thereof change in the future. You just have to update the keyvalues declaration, not everywhere it was used.

Also on the subject of using the current feature set of C++, you should prefer nullptr over NULL. NULL implicitly converts to integer and messes up function overload resolution because of that, so get in the habit of using nullptr always.

std::list still relevant? Some people say you should just forget that linked-lists exist, in a world where CPU caches and data locality play such an important role in code performance. std::vector is an excellent default for the majority of cases, so I would start with a vector instead. To help you decide, take a look at: Efficiency with Algorithms, Performance with Data Structures.

Question 3

I see some things that may help you improve your program.

Use all of the required `#include`s

The function strncpy is used but its declaration is in #include <cstring> which is not actually in the list of includes.

Don't expose class internals

The sections member data is a private data member, which is fine and appropriate, but then the get_sections() returns an unfettered reference to it, rendering the private classification nearly meaningless. The way it's currently written, one can arbitrarily modify all internal data via that one function which is not good object-oriented design. Better would be to either return a copy or at least to return a const reference.

Prefer `std::istream` to file names

The current interface gets passed a file name, but that's a relatively inflexible and unsatisfactory solution. For one thing, as your parse code comment notes, if there's a problem opening a file (or doing anything else with it) one would normally expect that either the code handles it or that it returns some indication of error. This code does neither. Secondly, as the test code illustrates, it would have been nice if the interface had been able to accomodate a std::istream instead of a filename. This would have allowed usage like this:

std::stringstream ss{"[protocol]\nversion = 6 \n\n[user]\nname = Bob Smith \n"
 "email = [email protected] \nactive = true\n\npi = 3.14159"};
config cfg(ss);

Use `auto` to simplify your code

Within the get_value member function we have this line:

std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);

This could be made much simpler by the use of auto:

const auto it = sect->keyvalues.find(keyname);

Use `const` where practical

Several of your member functions are lookup functions which do not and should not alter the underlying config object. Those functions, such as get_value, should be declared const:

std::string get_value(const std::string& sectionname, const std::string& keyname) const;

Reconsider the data structures

The basic usage of the config object is to be able to fetch individual items by section and key name. The interface right now looks like this when used:

std::cout << "email=" << cfg.get_value("user", "email") << '\n';

For this reason, it seems to me that it would make more sense to use a different data structure than the current std::list. Instead, one could use a std::unordered_map of std::unordered_maps. That would make a number of things a bit simpler.

Support write operations

It may be useful to allow the user to modify settings and then write the updated configuration file back out, but the current interface allows no way to do this.

Add better error handling

There are various comments within the code indicating an intention of adding error handling or at least error signalling. This would be useful, but there are other things that might go wrong that should be handled, such as an incomplete section name at the end of a file or items not in any named section.

Consider the use of regular expressions

Many of the things in this code would be much smaller and more concise if they were written using the C++11 regex library. For one thing, the ltrim and rtrim functions would become obsolete. As an example, here's what the parse function looks like with regex and also the data structure change I mentioned (I renamed the private variable map):

void config::parse(std::istream& in) {
 static const std::regex comment_regex{R"x(\s*[;#])x"};
 static const std::regex section_regex{R"x(\s*\[([^\]]+)\])x"};
 static const std::regex value_regex{R"x(\s*(\S[^ \t=]*)\s*=\s*((\s?\S+)+)\s*$)x"};
 std::string current_section;
 std::smatch pieces;
 for (std::string line; std::getline(in, line);)
 {
 if (line.empty() || std::regex_match(line, pieces, comment_regex)) {
 // skip comment lines and blank lines 
 }
 else if (std::regex_match(line, pieces, section_regex)) {
 if (pieces.size() == 2) { // exactly one match
 current_section = pieces[1].str();
 }
 }
 else if (std::regex_match(line, pieces, value_regex)) {
 if (pieces.size() == 4) { // exactly enough matches
 map[current_section][pieces[1].str()] = pieces[2].str();
 }
 }
 }
}

Question 4

You mean sections could be: typedef std::unordered_map< std::string, std::unordered_map<std::string, std::string> > sections;

Question 5

In the version I created, I didn't actually use a typedefbut yes, that's the type I meant.

Question 6

After reading this reply, I just had to say something like "Well done @Edward!". It's probably one of the most useful replies I've seen on Stack Exchange over the years.

Question 7

Explicit constructors with single arguments

In order to avoid implicit single arg conversions by the compiler, consider declaring the constructor:

config(const std::string& filename);

as:

explicit config(const std::string& filename);

glampert 17.3k4 gold badges31 silver badges89 bronze badges · Accepted Answer · 2016-05-08 18:29:20Z

Looks good to me. A couple remarks:

section* get_section(const std::string& sectionname);
std::list<section>& get_sections();

Not so sure it is a good idea returning mutable pointers/references to the data, since changing a section in memory has no effect to the underlaying file representation. It could lead to misunderstandings. It is probably best to keep the data immutable by returning a const pointer and const ref instead. That will also allow const qualifying the methods themselves.

 const section* get_section(const std::string& sectionname) const;
 const std::list<section>& get_sections() const;

Use auto for these overly verbose template instantiations:

std::unordered_map<std::string, std::string>::const_iterator it = sect->keyvalues.find(keyname);

vs:

const auto it = sect->keyvalues.find(keyname);

This will also make life easier if the container or contents thereof change in the future. You just have to update the keyvalues declaration, not everywhere it was used.

Also on the subject of using the current feature set of C++, you should prefer nullptr over NULL. NULL implicitly converts to integer and messes up function overload resolution because of that, so get in the habit of using nullptr always.

std::list still relevant? Some people say you should just forget that linked-lists exist, in a world where CPU caches and data locality play such an important role in code performance. std::vector is an excellent default for the majority of cases, so I would start with a vector instead. To help you decide, take a look at: Efficiency with Algorithms, Performance with Data Structures.

Stack Exchange Network

ini file parser in C++

3 Answers 3

Use all of the required `#include`s

Don't expose class internals

Prefer `std::istream` to file names

Use `auto` to simplify your code

Use `const` where practical

Reconsider the data structures

Support write operations

Add better error handling

Consider the use of regular expressions

You must log in to answer this question.

Linked

Hot Network Questions

ini file parser in C++

3 Answers 3

Use all of the required #includes

Don't expose class internals

Prefer std::istream to file names

Use auto to simplify your code

Use const where practical

Reconsider the data structures

Support write operations

Add better error handling

Consider the use of regular expressions

You must log in to answer this question.

Linked

Related

Hot Network Questions

Use all of the required `#include`s

Prefer `std::istream` to file names

Use `auto` to simplify your code

Use `const` where practical