
I am trying to parse code into an AST. I want to keep as few keywords and delimiters as possible in the AST while preserving the semantics.

In the Python function definition def foo():, the trailing : is syntactically required but semantically redundant, so it can be omitted from the AST and reconstructed from it.

But I also notice that the : in a slice expression start:end is semantically relevant, so it can't simply be dropped from the AST. Removing it makes the semantics ambiguous; compare index: and :index, for example.

Are there other keywords or delimiters in the Python grammar that share this property with the : in slice expressions?

Tree-sitter is what I am using for parsing.

jonrsharpe
asked Sep 4 at 10:37
  • The : wouldn't be in the AST either way - e.g. in Python's own AST :index and index: would both be represented as a Slice, with None values for lower and upper respectively. Commented Sep 4 at 10:45
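The point in that comment can be checked directly with the standard ast module. The following is a minimal sketch; the helper name slice_bounds is hypothetical and only exists for this illustration. It reports which bounds of a slice are present, showing that the position of the colon is encoded by which field is None rather than by a colon token.

```python
import ast

def slice_bounds(expr):
    """Return (has_lower, has_upper) for a subscript expression such as 'a[i:]'."""
    subscript = ast.parse(expr, mode="eval").body  # an ast.Subscript node
    sl = subscript.slice                           # an ast.Slice node for 'x:y' forms
    return (sl.lower is not None, sl.upper is not None)

index_then_colon = slice_bounds("a[index:]")   # lower bound present, upper absent
colon_then_index = slice_bounds("a[:index]")   # lower bound absent, upper present
print(index_then_colon, colon_then_index)
```

The two expressions produce the same node type and differ only in which field is filled, so no delimiter needs to be stored to tell them apart.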

1 Answer


Function definitions in general contain a couple of bits of punctuation that are operators in other contexts:

def f(a, / , b, *, c, **kwargs): ...

This function definition has one positional-only parameter, one normal parameter, one keyword-only parameter, and a kwargs dictionary. Yet /, *, and ** are also ordinary operators in expressions.
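As a rough illustration of how those markers end up in the tree, the sketch below parses the same definition with the standard ast module and reads the parameter kinds back from the arguments node. The summary dictionary and its key names are made up for this example; the field names posonlyargs, args, kwonlyargs, and kwarg are the ones ast provides.

```python
import ast

source = "def f(a, /, b, *, c, **kwargs): ..."
func = ast.parse(source).body[0]   # the ast.FunctionDef node
params = func.args                 # an ast.arguments node

# The / and * markers are not stored as tokens; they only decide
# which list each parameter lands in.
summary = {
    "positional_only": [p.arg for p in params.posonlyargs],
    "positional_or_keyword": [p.arg for p in params.args],
    "keyword_only": [p.arg for p in params.kwonlyargs],
    "var_keyword": params.kwarg.arg if params.kwarg else None,
}
print(summary)
```

So here the delimiters carry meaning, but the meaning is kept as structure (separate fields) rather than as punctuation.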

Python also has soft keywords: match and case (added in Python 3.10) and type (added in Python 3.12) act as keywords in specific contexts but are not reserved words, so they can still be used as ordinary names.

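class Foo: ...

def f(cls: type[Foo]): ...  # here type is the builtin, an ordinary name
type = Foo                  # rebinding it is legal, since type is only a soft keyword
f(type)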

Note that an AST will not necessarily preserve punctuation at all. For example, you can see how the ast module parses a simple function call:

>>> import ast
>>> s = """f(type, "x")"""
>>> print(ast.dump(ast.parse(s), indent=4))
Module(
    body=[
        Expr(
            value=Call(
                func=Name(id='f', ctx=Load()),
                args=[
                    Name(id='type', ctx=Load()),
                    Constant(value='x')]))])

Notice that this representation doesn't include the parentheses or comma from the function call, and the exact quoting syntax of the "x" argument isn't preserved. That's pretty typical of language parsers.
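The reverse direction can be sketched with ast.unparse, available in the standard library from Python 3.9. It regenerates source text from the tree alone, which suggests the dropped punctuation really is recoverable. The variable names below are illustrative.

```python
import ast

call_src = 'f(type, "x")'
def_src = "def foo() : pass"   # unusual spacing, body on the same line

# Parentheses, commas, and colons are regenerated from node types.
# The original quote style and spacing are not kept in the tree.
call_roundtrip = ast.unparse(ast.parse(call_src))
def_roundtrip = ast.unparse(ast.parse(def_src))

print(call_roundtrip)
print(def_roundtrip)
```

The regenerated text is equivalent code, not the original text: formatting details such as the double quotes around x come back in the unparser's own style.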

answered Sep 4 at 10:55