Operator-precedence grammar
An operator precedence grammar is a kind of grammar for formal languages.
Technically, an operator precedence grammar is a context-free grammar that has the property (among others)[1] that no production has either an empty right-hand side or two adjacent nonterminals in its right-hand side. These properties allow precedence relations to be defined between the terminals of the grammar. A parser that exploits these relations is considerably simpler than more general-purpose parsers, such as LALR parsers. Operator-precedence parsers can be constructed for a large class of context-free grammars.
Precedence relations
[edit ]Operator precedence grammars rely on the following three precedence relations between the terminals:[2]
Relation | Meaning |
---|---|
{\displaystyle a\lessdot b} | a yields precedence to b |
{\displaystyle a\doteq b} | a has the same precedence as b |
{\displaystyle a\gtrdot b} | a takes precedence over b |
These operator precedence relations allow to delimit the handles in the right sentential forms: {\displaystyle \lessdot } marks the left end, {\displaystyle \doteq } appears in the interior of the handle, and {\displaystyle \gtrdot } marks the right end. Contrary to other shift-reduce parsers, all nonterminals are considered equal for the purpose of identifying handles.[3] The relations do not have the same properties as their un-dotted counterparts; e. g. {\displaystyle a\doteq b} does not generally imply {\displaystyle b\doteq a}, and {\displaystyle b\gtrdot a} does not follow from {\displaystyle a\lessdot b}. Furthermore, {\displaystyle a\doteq a} does not generally hold, and {\displaystyle a\gtrdot a} is possible.
Let us assume that between the terminals ai and ai+1 there is always exactly one precedence relation. Suppose that $ is the end of the string. Then for all terminals b we define: {\displaystyle \$\lessdot b} and {\displaystyle b\gtrdot \$}. If we remove all nonterminals and place the correct precedence relation: {\displaystyle \lessdot }, {\displaystyle \doteq }, {\displaystyle \gtrdot } between the remaining terminals, there remain strings that can be analyzed by an easily developed bottom-up parser.
Example
[edit ]For example, the following operator precedence relations can be introduced for simple expressions:[4]
- {\displaystyle {\begin{array}{c|cccc}&\mathrm {id} &+&*&\$\\\hline \mathrm {id} &&\gtrdot &\gtrdot &\gtrdot \\+&\lessdot &\gtrdot &\lessdot &\gtrdot \\*&\lessdot &\gtrdot &\gtrdot &\gtrdot \\\$&\lessdot &\lessdot &\lessdot &\end{array}}}
They follow from the following facts:[5]
- + has lower precedence than * (hence {\displaystyle +\lessdot *} and {\displaystyle *\gtrdot +}).
- Both + and * are left-associative (hence {\displaystyle +\gtrdot +} and {\displaystyle *\gtrdot *}).
The input string[4]
- {\displaystyle \mathrm {id} _{1}+\mathrm {id} _{2}*\mathrm {id} _{3}}
after adding end markers and inserting precedence relations becomes
- {\displaystyle \$\lessdot \mathrm {id} _{1}\gtrdot +\lessdot \mathrm {id} _{2}\gtrdot *\lessdot \mathrm {id} _{3}\gtrdot \$}
Operator precedence parsing
[edit ]Having precedence relations allows to identify handles as follows:[4]
- scan the string from left until seeing {\displaystyle \gtrdot }
- scan backwards (from right to left) over any {\displaystyle \doteq } until seeing {\displaystyle \lessdot }
- everything between the two relations {\displaystyle \lessdot } and {\displaystyle \gtrdot }, including any intervening or surrounding nonterminals, forms the handle
It is generally not necessary to scan the entire sentential form to find the handle.
Operator precedence parsing algorithm
[edit ]The algorithm below is from Aho et al.:[6]
If $ is on the top of the stack and ip points to $ then return else Let a be the top terminal on the stack, and b the symbol pointed to by ip if a {\displaystyle \lessdot } b or a {\displaystyle \doteq } b then push b onto the stack advance ip to the next input symbol else if a {\displaystyle \gtrdot } b then repeat pop the stack until the top stack terminal is related by {\displaystyle \lessdot } to the terminal most recently popped else error() end
Precedence functions
[edit ]An operator precedence parser usually does not store the precedence table with the relations, which can get rather large. Instead, precedence functions f and g are defined.[7] They map terminal symbols to integers, and so the precedence relations between the symbols are implemented by numerical comparison: {\displaystyle f(a)<g(b)} must hold if {\displaystyle a\lessdot b} holds, etc.
Not every table of precedence relations has precedence functions, but in practice for most grammars such functions can be designed.[8]
Algorithm for constructing precedence functions
[edit ]The below algorithm is from Aho et al.:[9]
- Create symbols fa and ga for each grammar terminal a and for the end of string symbol;
- Partition the created symbols in groups so that fa and gb are in the same group if {\displaystyle a\doteq b} (there can be symbols in the same group even if their terminals are not connected by this relation);
- Create a directed graph whose nodes are the groups. For each pair {\displaystyle (a,b)} of terminals do: place an edge from the group of gb to the group of fa if {\displaystyle a\lessdot b}, otherwise if {\displaystyle a\gtrdot b} place an edge from the group of fa to that of gb;
- If the constructed graph has a cycle then no precedence functions exist. When there are no cycles, let {\displaystyle f(a)} be the length of the longest path from the group of fa and let {\displaystyle g(a)} be the length of the longest path from the group of ga.
Example
[edit ]Consider the following table (repeated from above):[10]
- {\displaystyle {\begin{array}{c|cccc}&\mathrm {id} &+&*&\$\\\hline \mathrm {id} &&\gtrdot &\gtrdot &\gtrdot \\+&\lessdot &\gtrdot &\lessdot &\gtrdot \\*&\lessdot &\gtrdot &\gtrdot &\gtrdot \\\$&\lessdot &\lessdot &\lessdot &\end{array}}}
Using the algorithm leads to the following graph:
gid \ fid f* \ / g* / f+ | \ | g+ | | g$ f$
from which we extract the following precedence functions from the maximum heights in the directed acyclic graph:
id + * $ f 4 2 4 0 g 5 1 3 0
Operator-precedence languages
[edit ]The class of languages described by operator-precedence grammars, i.e., operator-precedence languages, is strictly contained in the class of deterministic context-free languages, and strictly contains visibly pushdown languages.[11]
Operator-precedence languages enjoy many closure properties: union, intersection, complementation,[12] concatenation,[11] and they are the largest known class closed under all these operations and for which the emptiness problem is decidable. Another peculiar feature of operator-precedence languages is their local parsability,[13] that enables efficient parallel parsing.
There are also characterizations based on an equivalent form of automata and monadic second-order logic.[14]
Notes
[edit ]- ^ Aho, Sethi & Ullman 1988, p. 203
- ^ Aho, Sethi & Ullman 1988, pp. 203–204
- ^ Aho, Sethi & Ullman 1988, pp. 205–206
- ^ a b c Aho, Sethi & Ullman 1988, p. 205
- ^ Aho, Sethi & Ullman 1988, p. 204
- ^ Aho, Sethi & Ullman 1988, p. 206
- ^ Aho, Sethi & Ullman 1988, pp. 208–209
- ^ Aho, Sethi & Ullman 1988, p. 209
- ^ Aho, Sethi & Ullman 1988, pp. 209–210
- ^ Aho, Sethi & Ullman 1988, p. 210
- ^ a b Crespi Reghizzi & Mandrioli 2012
- ^ Crespi Reghizzi, Mandrioli & Martin 1978
- ^ Barenghi et al. 2015
- ^ Lonati et al. 2015
References
[edit ]- Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. (1988). Compilers — Principles, Techniques, and Tools. Addison-Wesley.
- Crespi Reghizzi, Stefano; Mandrioli, Dino (2012). "Operator precedence and the visibly pushdown property". Journal of Computer and System Sciences. 78 (6): 1837–1867. doi:10.1016/j.jcss.201112006 .
- Crespi Reghizzi, Stefano; Mandrioli, Dino; Martin, David F. (1978). "Algebraic Properties of Operator Precedence Languages". Information and Control. 37 (2): 115–133. doi:10.1016/S0019-9958(78)90474-6 .
- Barenghi, Alessandro; Crespi Reghizzi, Stefano; Mandrioli, Dino; Panella, Federica; Pradella, Matteo (2015). "Parallel parsing made practical". Science of Computer Programming. 112 (3): 245–249. doi:10.1016/j.scico.201509002 . hdl:11311/971391 .
- Lonati, Violetta; Mandrioli, Dino; Panella, Federica; Pradella, Matteo (2015). "Operator Precedence Languages: Their Automata-Theoretic and Logic Characterization". SIAM Journal on Computing. 44 (4): 1026–1088. doi:10.1137/140978818. hdl:2434/352809 .
Further reading
[edit ]- Floyd, R. W. (July 1963). "Syntactic Analysis and Operator Precedence". Journal of the ACM. 10 (3): 316–333. doi:10.1145/321172.321179 . S2CID 19785090.
External links
[edit ]- Nikolay Nikolaev: IS53011A Language Design and Implementation, Course notes for CIS 324, 2010.