Parsing a comma-separated text line with a known number of fields and known field types
I am trying to parse a line with a fixed number of fields of known types.
using Text = FieldType<std::string>;
using Int = FieldType<int>;
using Name = Text;
using TextSpecificField = Text;
using IntSpecificField = Int;
struct LineData
{
Name name;
TextSpecificField textField;
IntSpecificField intField;
// LineData contains 12 fields in total, types are known
};
struct InvalidLine { std::string line, err; };
using Line = std::variant<LineData, InvalidLine>;
With that, I can write a parsing function for each specific field type:
template<Field T>
constexpr auto parse(std::string_view field) -> detail::ReturnOptErr<T>
{
if constexpr (std::is_same_v<Text, T>)
{
return {decltype(T::value){field}};
}
else if constexpr (std::is_same_v<Int, T>)
{
auto [value, err] = toNum<decltype(T::value)>(field); // Why typename T::value doesn't work?
return {T{value}, std::move(err)};
}
else
{
[]<bool flag = false> { static_assert(flag, "no match"); }();
}
}
My problem starts with the parse function below. I am struggling to simplify this piece of code: as you can see, there is a repeating pattern that I would like to abstract:
auto parse(std::string_view line) -> Line
{
// Parsing a string and returning string is done intentionally to show repeating pattern
auto [nameStr, rest] = split(line, ',');
auto name = parse<Name>(nameStr);
if (name.errorText)
{
return InvalidLine{std::string{line}, std::move(name.errorText.value())};
}
auto [textStr, rest2] = split(rest, ',');
auto textField = parse<TextSpecificField>(textStr);
if (textField.errorText)
{
return InvalidLine{std::string{line}, std::move(textField.errorText.value())};
}
auto [intStr, rest3] = split(rest2, ',');
auto intField = parse<IntSpecificField>(intStr);
if (intField.errorText)
{
return InvalidLine{std::string{line}, std::move(intField.errorText.value())};
}
return LineData{std::move(name.value), std::move(textField.value), intField.value};
}
Is there an elegant way to approach this? I thought about using exceptions, but I don't like that solution. I also tried to approach this problem using template recursion, but I could not come up with a way of returning Line from such a parser.
#include <algorithm>
#include <array>
#include <functional>
#include <iostream>
#include <optional>
#include <string>
#include <string_view>
#include <tuple>
#include <type_traits>
#include <utility>
#include <variant>
// - Utility ---------------------------------------------------------------------------------------
namespace detail
{
template <typename FieldType>
struct ReturnOptErr
{
FieldType value;
std::optional<std::string> errorText;
};
} // namespace detail
template<typename T>
constexpr auto toNum(std::string_view valueStr) -> detail::ReturnOptErr<T>
{
// Implementation of number parsing using <charconv>
return detail::ReturnOptErr<T>{123};
}
constexpr auto split(std::string_view input, char delim) noexcept -> std::pair<std::string_view, std::string_view>
{
auto const delimPos = input.find(delim);
if (delimPos == std::string_view::npos)
{
return {input, ""};
}
auto const token = input.substr(0, delimPos);
auto const rest = input.substr(delimPos + 1);
return {token, rest};
}
// - Data ------------------------------------------------------------------------------------------
template<typename T>
struct FieldType
{
T value;
};
template<typename T>
concept Field = std::is_same_v<FieldType<decltype(T::value)>, T>;
auto operator<<(std::ostream& os, Field auto const& field) -> std::ostream&
{
return os << field.value;
}
using Text = FieldType<std::string>;
using Int = FieldType<int>;
using Name = Text;
using TextSpecificField = Text;
using IntSpecificField = Int;
// more field types
// ------------------------------------------------------------------------------------------------
struct LineData
{
Name name;
TextSpecificField textField;
IntSpecificField intField;
// LineData contains 12 fields in total, types are known
};
struct InvalidLine { std::string line, err; };
using Line = std::variant<LineData, InvalidLine>;
// - Parser ----------------------------------------------------------------------------------------
template<Field T>
constexpr auto parse(std::string_view field) -> detail::ReturnOptErr<T>
{
if constexpr (std::is_same_v<Text, T>)
{
return {decltype(T::value){field}};
}
else if constexpr (std::is_same_v<Int, T>)
{
auto [value, err] = toNum<decltype(T::value)>(field); // Why typename T::value doesn't work?
return {T{value}, std::move(err)};
}
else
{
[]<bool flag = false> { static_assert(flag, "no match"); }();
}
}
auto parse(std::string_view line) -> Line
{
// Parsing a string and returning string is done intentionally to show repeating pattern
auto [nameStr, rest] = split(line, ',');
auto name = parse<Name>(nameStr);
if (name.errorText)
{
return InvalidLine{std::string{line}, std::move(name.errorText.value())};
}
auto [textStr, rest2] = split(rest, ',');
auto textField = parse<TextSpecificField>(textStr);
if (textField.errorText)
{
return InvalidLine{std::string{line}, std::move(textField.errorText.value())};
}
auto [intStr, rest3] = split(rest2, ',');
auto intField = parse<IntSpecificField>(intStr);
if (intField.errorText)
{
return InvalidLine{std::string{line}, std::move(intField.errorText.value())};
}
return LineData{std::move(name.value), std::move(textField.value), intField.value};
}
int main()
{
std::string lineStr = "name, text, 123456";
auto lineVar = parse(lineStr);
auto& line = std::get<LineData>(lineVar);
std::cout << line.name << ", " << line.textField << ", " << line.intField << "\n";
}
- If you haven't already, you might want to take a look at how this sort of thing is handled by the combination of boost::fusion and boost::qi. boost.org/doc/libs/1_79_0/libs/spirit/doc/html/spirit/qi/… – Jerry Coffin, Aug 8, 2022 at 20:53
- @JerryCoffin Thank you, that is an interesting approach to writing a parser, probably even a high-quality one, but I would like to stick to a hand-written one. I am satisfied with what I got after applying G. Sliepen's suggestion. – Norbert, Aug 10, 2022 at 17:12
1 Answer
Create a helper function
To reduce the code repetition, a classic solution is to write a helper function. You can do that here as well. First let's look at how we want parse() to look:
auto parse(std::string_view line) -> Line
{
LineData data;
// Members are passed by reference, matching parse_field(std::string_view&, auto&)
parse_field(line, data.name);
parse_field(line, data.textField);
parse_field(line, data.intField);
return data;
}
The function parse_field() gets a reference to the member of data, and by making it a template it can deduce the type for us. It could look like:
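void parse_field(std::string_view& line, auto& member)
{
// A structured binding cannot reuse the name `line`, so assign the remainder explicitly
auto [field, rest] = split(line, ',');
line = rest;
// decltype(member) is a reference type; strip it before selecting the parser
member = parse<std::remove_reference_t<decltype(member)>>(field).value;
}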
Note the reference to line in the latter function. I have also omitted all the error checking here; that should be added back, of course. You can do that in a way that still preserves the simple structure of parse() above. For example, you could pass a reference to an InvalidLine to parse_field(), and have parse_field() return true on error, so you can chain the calls to parse_field() with ||.
You still need to repeat parse_field(line, data.member) for each member. You can reduce even that by creating a variadic parse_fields() function, so that in parse() you only have to write:
auto parse(std::string_view line) -> Line
{
LineData data;
// Members are passed by reference, not by pointer
parse_fields(line, data.name, data.textField, data.intField);
return data;
}
A fully generic solution
It would be nice to create a fully generic parse() function that would work with any struct, not just LineData. Unfortunately, C++ doesn't have introspection yet; ideally you would write a function like:
// Hypothetical: C++ cannot iterate over the members of a struct, so this does not compile
template<typename LineType>
requires std::is_class_v<LineType>
auto parse(std::string_view line) {
LineType result;
for (member: LineType) {
auto [field, rest] = split(line, ',');
line = rest;
result.member = parse<decltype(result.member)>(field).value;
}
return result;
}
However, you can approach this hypothetical solution by using a std::tuple instead of a struct to hold all the fields of a line. A solution could then look like so:
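template <typename T>
constexpr bool is_tuple = false;
template<typename ... Ts>
constexpr bool is_tuple<std::tuple<Ts...>> = true;
template<typename TupleType>
requires is_tuple<TupleType>
auto parse(std::string_view line) {
TupleType result;
auto parse_one = [&](auto& arg) {
// Assign the remainder back to line; a structured binding cannot reuse that name
auto [field, rest] = split(line, ',');
line = rest;
// decltype(arg) is a reference type; strip it before selecting the parser
arg = parse<std::remove_reference_t<decltype(arg)>>(field).value;
};
// The comma fold calls parse_one on each tuple element, left to right
std::apply([&](auto&... args) {
(parse_one(args), ...);
}, result);
return result;
}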
Then you could do this:
using LineData = std::tuple<Name, TextSpecificField, IntSpecificField>;
...
LineData data = parse<decltype(data)>(line);
You can avoid having to pass the type as a template parameter by not using a return value and instead passing the result through a reference parameter. Of course, the drawback is that accessing the fields of a tuple is not very nice.
- Thank you for your help. Regarding the first part of the answer, "helper function": when error checking is added back, we are back to repeated ifs; basically only split will be abstracted. Also, in parse_field the line is reassigned; too bad there is no way to make it work with structured bindings. As for the second part of the answer, "generic solution": it is a nice solution. This one actually addresses the problem, but I would like to stick with the struct for my data model. – Norbert, Aug 6, 2022 at 20:20
- It's still doable without falling back to repeated ifs. I've added a hint at a possible solution to the answer. – G. Sliepen, Aug 6, 2022 at 21:38
- I see now, thank you. I'm going to accept this answer. However, I am still not satisfied with the general approach using in-out parameters. – Norbert, Aug 7, 2022 at 9:05
- You could use a lambda function that captures [&] to avoid the in-out parameters, but under the hood it's all the same. Also, if someone comes up with a better idea, you can always change the accepted answer. – G. Sliepen, Aug 7, 2022 at 9:24