Preliminaries
I am working on a software that controls CAEN digitizer. The current step is to provide an ability to save/load configuration to/from an external source (hereafter - "config file"). I decided config file to be a simple text file and consisting of lines like this one:
PARAMETER VALUE
For example, this is how the trigger threshold might have been set:
CH_THRESHOLD 100
To compile a configuration for a digitizer from a config file my program should process a config file line by line and extract a pair parameter-value from each line if it is possible.
Some notes
Here are some principles I was following when writing the code:
First of all, performance is not the point. The parsing is not intended to be fast because it is a single operation, i.e. it is performed only once in a relative large time interval. I mainly refer it to the
find_first_*
function calls below (I believe it is expensive to search for a character in thatconst string
).The parsing of a config file is the first step in the loading of a config file. It is about syntax not semantics; that is why at this stage all the values are strings (though semantically they could be numbers).
Program
struct ConfigUnit
This structure represents a single pair parameter-value, so called a config unit.
struct ConfigUnit
{
std::string parameter;
std::string value;
void Print() const
{
std::cout << "Parameter: '" << parameter << "'" "\tValue: '" << value << "'" << std::endl;
}
};
Algorithm
This is actually a little bit more generic algorithm than I need because it searches for all words in a line.
const std::string alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
ConfigUnit GetConfigUnit( const std::string &line )
{
ConfigUnit unit;
std::vector< std::string > words;
//Find all alphanumerical words in the line
for( size_t startWord = line.find_first_of( alphanumeric ); startWord != std::string::npos; )
{
std::size_t endWord = line.find_first_not_of( alphanumeric, startWord );
words.push_back( line.substr( startWord, (endWord - startWord) ) );
startWord = line.find_first_of( alphanumeric, endWord );
}
if( words.size() == 2 )
{
unit.parameter = words.at(0);
unit.value = words.at(1);
}
return unit;
}
Test
#include <iostream>
#include <vector>
#include <string>
#include <fstream>
struct ConfigUnit
{
std::string parameter;
std::string value;
void Print() const
{
std::cout << "Parameter: '" << parameter << "'" "\tValue: '" << value << "'" << std::endl;
}
};
const std::string alphanumeric = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
ConfigUnit GetConfigUnit( const std::string &line )
{
ConfigUnit unit;
std::vector< std::string > words;
//Find all alphanumerical words in the line
for( size_t startWord = line.find_first_of( alphanumeric ); startWord != std::string::npos; )
{
std::size_t endWord = line.find_first_not_of( alphanumeric, startWord );
words.push_back( line.substr( startWord, (endWord - startWord) ) );
startWord = line.find_first_of( alphanumeric, endWord );
}
if( words.size() == 2 )
{
unit.parameter = words.at(0);
unit.value = words.at(1);
}
return unit;
}
int main()
{
std::ifstream file;
file.exceptions( std::fstream::failbit | std::fstream::badbit );
try
{
file.open( "dummy.txt" );
std::string line;
while( std::getline( file, line ) )
{
ConfigUnit unit = GetConfigUnit( line );
unit.Print();
}
file.close();
}
catch( const std::ifstream::failure &e )
{
if( file.eof() )
{
std::cout << "SUCCESS!\n";
}
else
{
std::cerr << "FAILURE! " << e.what() << std::endl;
}
}
return 0;
}
As always, please critique, suggest, correct.
1 Answer 1
General
Please ensure code compiles when posting it here. Or state a specific justification why it doesn't. (e.g. Print()
was missing a <<
)
Putting it all together
There are quite a few specific comments under your question, which I have incorporated and added a few more in a proposed alternative solution below. It is quite different in approach, technique and style. None of these are "must have", but you can take your pick.
Parsing - how flexible
The most challenging part of any parsing is the question, how flexible / tolerant does it have to be? How consistent is the file format?
Probably the simplest way to parse anything in C++ is to use std::getline
. You were already using it, but were you aware of the second version?
If your parsing requirements are not very pedantic or in need of lightspeed performance you could use the overloaded version of std::getline
which accepts a delimiter. I have shown below how that can be used.
It makes the code much simpler, but it's slightly less tolerant. e.g. double spaced gap is not good.
Printing
The common idiom for making objects "printable" in C++ is to implement the <<
operator. This is best done using a friend
function inside the class as shown.
You should almost always use '\n'
and not std::endl
, because the latter flushes the stream buffer, and that is often not needed. The '\n'
line ending will also adapt to the local machine.
Return "nothing"
In case of parsing failures, you were previously returning an "empty" ConfigUnit
(correct term is value initialized). I have shown a, probably cleaner alternative, using C++17 std::optional
.
Note how in main()
I call get_config_unit()
assign it to maybe_unit
(a common idiom to use maybe_*
) then check for the -> bool
conversion in the same if statement parens. This is using the C++17 if statement "init-statement". Then I "dereference" the std::optional<config_unit> maybe_unit
using the *
operator.
This all makes for very terse syntax and stops variables and the "maybeness" leaking out.
Use of exceptions
We can have a long debate about exceptions, there are several schools of thought. However, something I think most people agree on is that exceptions shouldn't be thrown during the "normal, expected case". They should be, well... "exceptional".
Your code was throwing and catching the eof
flag. i.e. every invocation of the programme was throwing. This is probably bad.
I double checked the "file bits" and they didn't seem very useful to me, e.g. bad_bit
doesn't get set if file is not found (perhaps the most common mode of failure), only fail_bit
gets set. But that also gets set for eof
. So I did it differently without exceptions.
RAII
Look it up: "Resource acquisition is initialisation" (and what is not stated, is that when objects go out of scope they clean up after themselves). It is a key C++ feature. It means you don't have to call file.close()
. When the object goes out of scope at end of function its destructor will be called, which will clean up.
This also means we don't have to have "one common exit point". Maybe that's what you were trying to achieve with your try .. catch
block? It's not required. The right stuff just happens (because the std::
classes are well written).
This was my test file "music.txt":
CH_THRESHOLD 100
ba pram 314
SOME-PARAM2 3000
Too_many_spaces 999
And here is the suggested code:
#include <fstream>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
struct config_unit {
std::string parameter;
std::string value;
friend std::ostream& operator<<(std::ostream& ostream, const config_unit& cu) {
return ostream << "Parameter: '" << cu.parameter << "'\t"
<< "Value: '" << cu.value << "'" << '\n';
}
};
std::optional<config_unit> get_config_unit(const std::string& line) {
std::istringstream ss(line);
std::string field;
std::vector<std::string> fields;
while (getline(ss, field, ' ')) fields.push_back(field);
if (fields.size() != 2) return std::nullopt;
return config_unit{fields[0], fields[1]};
}
int main() {
std::string filename{"music.txt"};
std::ifstream fstream(filename);
if (fstream.fail()) {
std::cerr << "couldn't open file '" << filename << "\n";
return EXIT_FAILURE;
}
std::string line;
while (std::getline(fstream, line)) {
if (auto maybe_unit = get_config_unit(line); maybe_unit) {
std::cout << *maybe_unit;
}
}
return EXIT_SUCCESS;
}
And here is the output produced given the above input:
Parameter: 'CH_THRESHOLD' Value: '100'
Parameter: 'SOME-PARAM2' Value: '3000'
#include
lines. \$\endgroup\$if( words.size() == 2 )
? Seems to me you return a value initializedConfigUnit
. Is that acceptable? Have you tested it? Consider usingstd::isalpha
en.cppreference.com/w/cpp/string/byte/isalpha andstd::isdigit
en.cppreference.com/w/cpp/string/byte/isdigit. \$\endgroup\$ConfigUnit
. Then I do not process empty units. \$\endgroup\$std::isalpha
andstd::isdigit
functions, but isn't a fact I should explicitly loop over each character in a string then? I think in this case the code grows. Is something wrong with my approach using thatconst string
? \$\endgroup\$