0
$\begingroup$

The reference manual for Lua contains a complete EBNF grammar for the language. However, it's simplified compared to the grammar used by the actual Lua parser.

In particular, the simplified grammar is ambiguous. For example, one of the rules is exp = exp binop exp, which means that an expression like a - b - c could be parsed as either (a - b) - c or a - (b - c). To be able to generate a parse tree, a parser needs either an unambiguous grammar or at some out-of-band information - for example, a table that specifies the precedence and associativity of each operator.

So, the ambiguous simplified grammar is not enough to implement a parser for Lua. But could it be used to implement a recognizer - a program that determines whether a sequence of tokens is a valid Lua program, without worrying about the exact parse tree? Or does a recognizer also require an unambiguous grammar?

asked Nov 19, 2024 at 12:53
$\endgroup$

2 Answers 2

2
$\begingroup$

It depends on the parsing technique used.

General context-free parsing techniques (such as GLR and Earley) can handle arbitrary grammars just fine.

In practice, we often see more limited parsers that require deterministic grammars, e.g., LALR(1) or LL(1). Popular parser generators such as Yacc and ANTLR are limited in such a way. So if you pick an existing parser generator, pay attention to how general it is.

answered Nov 19, 2024 at 15:35
$\endgroup$
0
$\begingroup$

You need a recogniser (better yet a parser) for the language. An unambiguous context-free grammar with a parser that can handle any such grammar is fine. Another method is operator precedence where you specify which tokens are operators, whether they are unary or binary, left or right associative and what precedence. That will handle many languages just fine.

Or you can add a catch-all rule: Just state that any program that has two parse trees according to your parsing algorithm is not part of the language. In your case a - b - c would not be part of the language (nor would a + b + c actually), so your parsing algorithm is now unambiguous. Operator precedence where +, - are left associative would work (in FORTRAN the ** operator is right assocative).

Many or most practical languages have edge cases that are solved by some ad hoc rule. Some will use a parsing algorithm that always produces the same parse of multiple trees first, and you define the language according to the first parse tree generated.

answered Nov 19, 2024 at 14:24
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.