Does a language recognizer require an unambiguous context-free grammar?

Question 1

The reference manual for Lua contains a complete EBNF grammar for the language. However, it's simplified compared to the grammar used by the actual Lua parser.

In particular, the simplified grammar is ambiguous. For example, one of the rules is exp = exp binop exp, which means that an expression like a - b - c could be parsed as either (a - b) - c or a - (b - c). To be able to generate a parse tree, a parser needs either an unambiguous grammar or at some out-of-band information - for example, a table that specifies the precedence and associativity of each operator.

So, the ambiguous simplified grammar is not enough to implement a parser for Lua. But could it be used to implement a recognizer - a program that determines whether a sequence of tokens is a valid Lua program, without worrying about the exact parse tree? Or does a recognizer also require an unambiguous grammar?

Question 2

It depends on the parsing technique used.

General context-free parsing techniques (such as GLR and Earley) can handle arbitrary grammars just fine.

In practice, we often see more limited parsers that require deterministic grammars, e.g., LALR(1) or LL(1). Popular parser generators such as Yacc and ANTLR are limited in such a way. So if you pick an existing parser generator, pay attention to how general it is.

Question 3

You need a recogniser (better yet a parser) for the language. An unambiguous context-free grammar with a parser that can handle any such grammar is fine. Another method is operator precedence where you specify which tokens are operators, whether they are unary or binary, left or right associative and what precedence. That will handle many languages just fine.

Or you can add a catch-all rule: Just state that any program that has two parse trees according to your parsing algorithm is not part of the language. In your case a - b - c would not be part of the language (nor would a + b + c actually), so your parsing algorithm is now unambiguous. Operator precedence where +, - are left associative would work (in FORTRAN the ** operator is right assocative).

Many or most practical languages have edge cases that are solved by some ad hoc rule. Some will use a parsing algorithm that always produces the same parse of multiple trees first, and you define the language according to the first parse tree generated.

reinierpost 6,3391 gold badge24 silver badges40 bronze badges · Answer 1 · 2024-11-19 15:35:21Z

It depends on the parsing technique used.

General context-free parsing techniques (such as GLR and Earley) can handle arbitrary grammars just fine.

In practice, we often see more limited parsers that require deterministic grammars, e.g., LALR(1) or LL(1). Popular parser generators such as Yacc and ANTLR are limited in such a way. So if you pick an existing parser generator, pay attention to how general it is.

gnasher729 32.6k36 silver badges58 bronze badges · Answer 2 · 2024-11-19 14:24:59Z

You need a recogniser (better yet a parser) for the language. An unambiguous context-free grammar with a parser that can handle any such grammar is fine. Another method is operator precedence where you specify which tokens are operators, whether they are unary or binary, left or right associative and what precedence. That will handle many languages just fine.

Or you can add a catch-all rule: Just state that any program that has two parse trees according to your parsing algorithm is not part of the language. In your case a - b - c would not be part of the language (nor would a + b + c actually), so your parsing algorithm is now unambiguous. Operator precedence where +, - are left associative would work (in FORTRAN the ** operator is right assocative).

Many or most practical languages have edge cases that are solved by some ad hoc rule. Some will use a parsing algorithm that always produces the same parse of multiple trees first, and you define the language according to the first parse tree generated.

Stack Exchange Network

Does a language recognizer require an unambiguous context-free grammar?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Does a language recognizer require an unambiguous context-free grammar?

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions