I am trying to parse code into an AST. I want to keep as few keywords and delimiters in the AST as possible while preserving the semantics.
In the Python function definition def foo():, the trailing : is syntactically required but semantically redundant, so it can be omitted from the AST and reconstructed from it later.
But I also notice that the : in a slice expression start:end is semantically relevant, so it can't be omitted from the AST. Removing it makes the meaning ambiguous; take index: and :index, for example.
I wonder whether there are more keywords or delimiters in the Python grammar that share this property with the : in slice expressions.
Tree-sitter is what I am using for parsing.
1 Answer
Function definitions in general contain a couple of bits of punctuation that are operators in other contexts:
def f(a, /, b, *, c, **kwargs): ...
This function definition has one positional-only argument (a), one normal argument (b), one keyword-only argument (c), and a kwargs dictionary. Here /, *, and ** act as markers in the parameter list, but in expressions the same tokens are ordinary operators.
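As a rough illustration of how those markers end up in a tree, here is a small sketch using Python's own ast module (not Tree-sitter, whose node types differ). The variable names are only for illustration. The /, *, and ** tokens do not appear as nodes; only the grouping they imply is kept.

```python
import ast

# Parse the signature from above; the body is just `...`.
source = "def f(a, /, b, *, c, **kwargs): ..."
func = ast.parse(source).body[0]
args = func.args

# The marker tokens are gone; each parameter lands in a separate field.
positional_only = [a.arg for a in args.posonlyargs]  # before `/`
regular = [a.arg for a in args.args]                 # between `/` and `*`
keyword_only = [a.arg for a in args.kwonlyargs]      # after `*`
var_keyword = args.kwarg.arg                         # the `**` parameter

print(positional_only, regular, keyword_only, var_keyword)
```

So for these markers the information survives as structure rather than as punctuation, which is the same situation as the slice colon.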
Python 3.10 also introduced soft keywords: match, case, and _ act as keywords only in specific contexts, and Python 3.12 added type in the same way. None of them are reserved words, so they can still be used as ordinary names:
def f(cls: type[Foo]): ...
type = Foo
f(type)
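If you want to check this yourself, a minimal sketch with the standard keyword module is below. The exact contents of keyword.softkwlist depend on the Python version, so the sketch only prints it and does not rely on it.

```python
import keyword

# Soft keywords are not reserved, so iskeyword() reports False for them.
reserved = {name: keyword.iskeyword(name) for name in ("def", "match", "case", "type")}

# Because they are not reserved, they still work as ordinary identifiers.
namespace = {}
exec("match = 1\ncase = 2\ntype = 3\ntotal = match + case + type", namespace)
total = namespace["total"]

print(reserved)
print(getattr(keyword, "softkwlist", []))  # contents vary by Python version
print(total)
```

A parser therefore has to decide from context whether match or type is a keyword or a plain name, which matters if you plan to drop keywords from your tree.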
Note that an AST will not necessarily preserve punctuation at all. For example, you can see how the ast module parses a simple function call:
>>> import ast
>>> s = """f(type, "x")"""
>>> print(ast.dump(ast.parse(s), indent=4))
Module(
    body=[
        Expr(
            value=Call(
                func=Name(id='f', ctx=Load()),
                args=[
                    Name(id='type', ctx=Load()),
                    Constant(value='x')]))])
Notice that this representation doesn't include the parentheses or comma from the function call, and the exact quoting syntax of the "x" parameter isn't preserved. That's pretty typical of language parsers.
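One way to see that nothing semantic is lost is to regenerate source from the tree. The sketch below assumes Python 3.9 or later, where ast.unparse is available; the variable names are illustrative.

```python
import ast

source = 'f(type, "x")'
tree = ast.parse(source)

# ast.unparse rebuilds source text from the tree alone, so the
# parentheses and comma are reconstructed rather than remembered.
regenerated = ast.unparse(tree)
print(regenerated)

# The regenerated text parses back to an identical tree.
same_tree = ast.dump(ast.parse(regenerated)) == ast.dump(tree)
print(same_tree)
```

The regenerated call uses whatever quoting style the unparser prefers, not the one in the original source, which is the sense in which the quoting isn't preserved.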
: wouldn't be in the AST either way - e.g. in Python's own AST, :index and index: would both be represented as a Slice, with None values for lower and upper respectively.
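A small sketch of that point, again with Python's ast module rather than Tree-sitter. The helper slice_bounds is a made-up name for illustration, and it only handles bounds that are plain names.

```python
import ast

def slice_bounds(expr):
    """Return the (lower, upper) names of a slice such as x[a:b], or None where absent."""
    node = ast.parse(expr, mode="eval").body.slice
    if not isinstance(node, ast.Slice):
        raise ValueError("expected a slice expression")
    # A missing bound is stored as None; a present one is a Name node here.
    lower = None if node.lower is None else node.lower.id
    upper = None if node.upper is None else node.upper.id
    return lower, upper

print(slice_bounds("x[:index]"))
print(slice_bounds("x[index:]"))
```

The position of the colon is encoded by which field is None, so the token itself does not need to be stored.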