So I've been trying to implement a Token class in C++. At first I wanted to use a simple enum class to store the Kind of the Token.
But then I came across a problem. Some Tokens (such as keywords, identifiers, numeric literals, etc.) would need to store an external value along with their Kind, while others wouldn't. I tried to get around the problem by creating a struct for each Token Kind and then using an std::variant for the Kind type. Here's the code I ended up with:
#include <cstdint>
#include <string>
#include <variant>

class Token
{
public:
    struct Num
    {
        std::int64_t val;
    };
    struct Float
    {
        double val;
    };
    struct String
    {
        std::string val;
    };
    struct Keyword
    {
        std::string val;
    };
    struct Identity
    {
        std::string val;
    };
    struct LParen
    {
    };
    struct RParen
    {
    };
    struct LBrace
    {
    };
    struct RBrace
    {
    };
    struct LBracket
    {
    };
    struct RBracket
    {
    };
    struct Semicolon
    {
    };
    struct Colon
    {
    };
    struct Dot
    {
    };
    struct Operator
    {
        enum class Name : std::uint_fast8_t
        {
            Add = 0,
            Sub,
            Mult,
            Div,
            Pow
        };
        Name val;
    };
    struct EndOfFile
    {
    };
    struct Invalid
    {
    };

    using Kind =
        std::variant<Num, Float, String, Keyword, Identity, LParen, RParen, LBrace, RBrace,
                     LBracket, RBracket, Semicolon, Colon, Dot, Operator, EndOfFile, Invalid>;

    struct Span
    {
        std::uint_fast64_t row;
        std::uint_fast64_t col;
    };

    Token() = delete;
    Token(Kind const& kind, Span span) : m_kind(kind), m_span(span) {}

    auto kind() const -> Kind { return m_kind; }
    auto span() const -> Span { return m_span; }
    auto isEOF() const -> bool { return std::holds_alternative<EndOfFile>(m_kind); }
    auto isValid() const -> bool { return !std::holds_alternative<Invalid>(m_kind); }

private:
    Kind m_kind;
    Span m_span;
};
Is this a valid way to do what I originally intended to do? Or is it way too complicated and there's a simpler way?
1 Answer
Seems great to me — a nice way to simulate algebraic data types in C++.
The only improvement I can think of:
struct Operator
{
    enum class Name : std::uint_fast8_t
    {
        Add = 0,
        Sub,
        Mult,
        Div,
        Pow
    };
    Name val;
};
Why not make Operator itself an enum class?
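As a rough sketch of that suggestion, Operator could become a scoped enum that sits directly in the variant. The reduced Kind (only Num, Operator and EndOfFile), the spelling helper, and the choice of '^' for Pow are assumptions made for illustration; they are not part of the original code.

```cpp
#include <cstdint>
#include <variant>

// Reduced set of alternatives, for illustration only.
struct Num { std::int64_t val; };
struct EndOfFile {};

// Operator is now a scoped enum rather than a struct wrapping one.
enum class Operator : std::uint_fast8_t { Add, Sub, Mult, Div, Pow };

using Kind = std::variant<Num, Operator, EndOfFile>;

// Hypothetical helper: maps an operator to a one-character spelling.
constexpr char spelling(Operator op) {
    switch (op) {
        case Operator::Add:  return '+';
        case Operator::Sub:  return '-';
        case Operator::Mult: return '*';
        case Operator::Div:  return '/';
        case Operator::Pow:  return '^';  // assumed spelling
    }
    return '?';  // only reached for values outside the enumerators
}
```

With this layout the operator value is read with std::get<Operator>(kind) directly, without going through a .val member.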
I might consider changing
Token(Kind const& kind, Span span) : m_kind(kind), m_span(span) {}
to
template <typename T>
Token(T&& kind, Span span)
    : m_kind(std::forward<T>(kind)) // #include <utility>
    , m_span(span)
{
}
to avoid the overhead of constructing an extra variant.
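Here is a minimal sketch of the forwarding constructor inside a cut-down Token. The enable_if constraint (which rejects arguments a Kind cannot be built from), the two-alternative Kind, and returning kind() by const reference are assumptions added for illustration, not something the original code or answer prescribes.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>

// Reduced set of alternatives, for illustration only.
struct String { std::string val; };
struct EndOfFile {};
using Kind = std::variant<String, EndOfFile>;

struct Span {
    std::uint_fast64_t row;
    std::uint_fast64_t col;
};

class Token {
public:
    // A template constructor in a non-template class: T is deduced per call,
    // and the argument is forwarded straight into m_kind.
    template <typename T,
              typename = std::enable_if_t<std::is_constructible_v<Kind, T&&>>>
    Token(T&& kind, Span span)
        : m_kind(std::forward<T>(kind)), m_span(span) {}

    const Kind& kind() const { return m_kind; }
    Span span() const { return m_span; }

private:
    Kind m_kind;
    Span m_span;
};
```

A call such as Token(String{"hello"}, Span{1, 2}) then moves the String directly into the stored variant instead of first building a temporary Kind.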
The OP mentioned grouping the symbols together into an enum class in the comments. Whether this is beneficial depends on how the symbols will be used. It is easier to handle all the symbols together:
enum class Symbol {
    LParen, RParen, LBrace, RBrace, LBracket, RBracket, Semicolon, Colon, Dot,
};
// ...
std::ostream& operator<<(std::ostream& os, Symbol symbol) {
    static const std::unordered_map<Symbol, char> table {
        { Symbol::LParen, '(' },
        { Symbol::RParen, ')' },
        // etc.
    };
    return os << table.at(symbol);
}
std::ostream& operator<<(std::ostream& os, const Token& token) {
    std::visit(Overloaded {
        // ...
        [&](Symbol symbol) { os << symbol; }, // uniform handling
        // ...
    }, token.kind()); // kind() is a member function
    // ...
}
versus
struct LParen {};
struct RParen {};
struct LBrace {};
struct RBrace {};
// etc.
// ...
std::ostream& operator<<(std::ostream& os, const Token& token) {
    std::visit(Overloaded {
        // ...
        [&](LParen) { os << '('; },
        [&](RParen) { os << ')'; },
        [&](LBrace) { os << '{'; },
        [&](RBrace) { os << '}'; }, // manual handling
        // ...
    }, token.kind()); // kind() is a member function
    // ...
}
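The snippets above use Overloaded without defining it. A common way to write such a helper is sketched below, with a small to_text function showing it in use. The reduced Kind and the name to_text are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <variant>

// Inherits operator() from every lambda it is built from,
// so std::visit can pick the overload matching the active alternative.
template <typename... Ts>
struct Overloaded : Ts... {
    using Ts::operator()...;
};
// Deduction guide, required in C++17.
template <typename... Ts>
Overloaded(Ts...) -> Overloaded<Ts...>;

// Reduced set of alternatives, for illustration only.
struct Num { long long val; };
struct LParen {};
struct RParen {};
using Kind = std::variant<Num, LParen, RParen>;

// Hypothetical helper: renders a kind as text.
inline std::string to_text(const Kind& kind) {
    std::ostringstream os;
    std::visit(Overloaded{
        [&](const Num& n) { os << n.val; },
        [&](LParen) { os << '('; },
        [&](RParen) { os << ')'; },
    }, kind);
    return os.str();
}
```

If a lambda for one alternative is left out, the std::visit call fails to compile, which is a useful check when new token kinds are added.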
But it is harder to check if a token is a given symbol:
enum class Symbol {
    LParen, RParen, LBrace, RBrace, LBracket, RBracket, Semicolon, Colon, Dot,
};
// ...
bool is_symbol(const Token& token, Symbol symbol) {
    const auto kind = token.kind(); // kind() returns by value, so keep a local copy
    if (auto p = std::get_if<Symbol>(&kind)) { // get_if takes a pointer to the variant
        return *p == symbol;
    } else {
        return false;
    }
}
versus simply a call to std::holds_alternative.
The same applies to the operators.
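To make the trade-off concrete, here is a small sketch that puts the two checks side by side on plain variants. The names FlatKind, GroupedKind and is_lparen, and the reduced sets of alternatives, are assumptions for illustration.

```cpp
#include <variant>

// Flat layout: every symbol is its own empty alternative.
struct LParen {};
struct RParen {};
using FlatKind = std::variant<LParen, RParen>;

// Grouped layout: one enum alternative covers all symbols.
enum class Symbol { LParen, RParen };
struct EndOfFile {};
using GroupedKind = std::variant<Symbol, EndOfFile>;

// Flat: the active type alone answers the question.
inline bool is_lparen(const FlatKind& kind) {
    return std::holds_alternative<LParen>(kind);
}

// Grouped: check the alternative first, then compare the stored value.
inline bool is_symbol(const GroupedKind& kind, Symbol symbol) {
    const Symbol* p = std::get_if<Symbol>(&kind);
    return p != nullptr && *p == symbol;
}
```

The grouped check costs one small helper, and in exchange a single visitor lambda can handle every symbol.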
- Ah yeah, changing operator itself to an enum class is something I should do. Also, I don't quite get what you did with the constructor. I don't even know if it's possible to make a template constructor for a non-template class. – Famiu, May 30, 2021 at 4:22
- @Famiu It is indeed possible - I edited the answer to clarify. The difference here is that the argument is directly forwarded to the constructor of std::variant, whereas in your version, a temporary std::variant is constructed and then copied to m_kind. – L. F., May 30, 2021 at 4:25
- Do you think I should group all symbols together inside a symbols enum, or is it fine how it is? – Famiu, May 30, 2021 at 4:29
- @Famiu Ultimately, it depends on the usage pattern. Grouping them together is beneficial when the code handles all symbols in a uniform manner, but it would be slightly more work (a helper function, that is) to check if a token is a specific symbol (get_if + ==). – L. F., May 30, 2021 at 4:33
- Does the same apply for the operators, then? Should I flatten them out as well or keep them as an enum? – Famiu, May 30, 2021 at 4:36