I created a simple parser in Rust and defined the AST like this:
enum Expression {
    Number(i32),
    BinaryOperator(Box<Expression>, Operator, Box<Expression>),
    Identifier(String),
}
enum ProgramLine {
    SetIdentifier(String, Expression),
    Print(Expression),
}
struct Program {
    lines: Vec<ProgramLine>,
}
I would like to keep information about where these nodes come from, so when there is an error, I can also report the line number.
How is this usually solved in parsers? My first idea is to add something like a line number to each variant, like this:
enum Expression {
    Number(i32, i32),
    BinaryOperator(Box<Expression>, Operator, Box<Expression>, i32),
    Identifier(String, i32),
}
But it does not seem right, because the AST should only represent the syntax tree of the language. But where should I store a context like this?
1 Answer
Yes, you have described the standard approach.
Creating a raw text node type which has a line number, and then having others inherit from that might be attractive.
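Rust has no inheritance, so the closest equivalent is composition: a generic wrapper that pairs any node with its line number. The following is only a minimal sketch of that idea. The names `Located`, `ExpressionKind`, and `number`, and the two `Operator` variants, are assumptions made for illustration and do not come from the question.

```rust
// Hypothetical wrapper: pairs any node with the line it came from.
#[derive(Debug, Clone, PartialEq)]
struct Located<T> {
    node: T,
    line: u32, // 1-based source line (an assumption of this sketch)
}

// Placeholder operator set; the question does not show its definition.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    Add,
    Mul,
}

// The "pure" syntax: no position fields inside the variants.
#[derive(Debug, Clone, PartialEq)]
enum ExpressionKind {
    Number(i32),
    BinaryOperator(Box<Expression>, Operator, Box<Expression>),
    Identifier(String),
}

// What the parser passes around: syntax plus position.
type Expression = Located<ExpressionKind>;

// Small helper that builds a located number literal.
fn number(value: i32, line: u32) -> Expression {
    Located { node: ExpressionKind::Number(value), line }
}
```

With this layout the enum still describes only the syntax, while every child expression carries its own line through the wrapper.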
Error messages will typically want three items:
- file path
- source line number
- offset within line
With your approach, the first is probably easy to get from context, so it need not be stored in every node.
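As a rough illustration of those three items, the sketch below stores the line and the offset within the line per node and takes the file path from the caller. `Position`, `report`, and the `path:line:column: error: message` layout are assumptions of this example, not something the question prescribes.

```rust
use std::fmt;

// Hypothetical per-node position: the two items that vary between nodes.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    line: u32,   // 1-based source line number
    column: u32, // 1-based offset within the line
}

impl fmt::Display for Position {
    // Renders the position as "line:column".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}:{}", self.line, self.column)
    }
}

// Builds a diagnostic string. The file path comes from the caller's
// context (for example, the file currently being parsed).
fn report(file: &str, position: Position, message: &str) -> String {
    format!("{file}:{position}: error: {message}")
}
```

For example, `report("main.toy", Position { line: 4, column: 9 }, "unknown identifier")` yields `main.toy:4:9: error: unknown identifier`.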
> the AST should only represent the syntax tree of the language.
Ok, that's a fair perspective.
We don't have to store such details in AST nodes. We could say that "diagnostics happen seldom", and commit to re-parsing the file when reporting a diagnostic. That approach races with the developer's editor, which might save updated text between the original parse and the re-parse, so the reported position could be wrong.
A middle approach would consult a global boolean, and only inflate the size of nodes when line tracking is enabled.
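In Rust the size of a type is fixed at compile time, so a runtime flag cannot literally change the size of a node. One way to approximate the idea is an optional, heap-allocated position that is only filled in when tracking is on. This is a sketch under that assumption; `TRACK_POSITIONS`, `Position`, `Identifier`, and `make_identifier` are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical global switch, e.g. set once from a command-line flag.
static TRACK_POSITIONS: AtomicBool = AtomicBool::new(false);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position {
    file_id: u32,
    line: u32,
    column: u32,
}

#[derive(Debug, PartialEq)]
struct Identifier {
    name: String,
    // None when tracking is off. Option<Box<T>> is pointer-sized, so the
    // node pays for one pointer and allocates only when tracking is on.
    position: Option<Box<Position>>,
}

// Consults the global flag to decide whether to record the position.
fn make_identifier(name: &str, position: Position) -> Identifier {
    let position = if TRACK_POSITIONS.load(Ordering::Relaxed) {
        Some(Box::new(position))
    } else {
        None
    };
    Identifier { name: name.to_string(), position }
}
```

If the choice can be made at compile time instead, a generic metadata parameter on the node type would avoid the runtime check entirely.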
... rustc does it or talk to its developers?
... &str). You can recalculate the line + column number from that later. If you have multiple source files, the span must contain a reference/name/ID for the file.
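A minimal sketch of the span idea, assuming a span is a pair of byte offsets into the source text plus a file ID. `Span`, `file_id`, and `line_col` are names invented for this example. The line and column are recomputed only when a diagnostic is actually reported.

```rust
// Hypothetical span: a half-open byte range into one source file.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    file_id: u32, // index into a table of source files (an assumption)
    start: usize, // byte offset of the first character
    end: usize,   // byte offset one past the last character
}

// Recomputes a 1-based (line, column) pair for a byte offset.
// The column counts characters since the previous newline.
// Panics if `offset` is past the end or not on a character boundary.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let before = &source[..offset];
    let line = before.matches('\n').count() + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    (line, column)
}
```

For the source `"a = 1\nprint b\n"`, the identifier `b` starts at byte offset 12, which `line_col` maps to line 2, column 7. The nodes themselves carry only the small `Span`, and the scan over the source happens once per reported error.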