Generalized context-free grammar

Abstract language theory concept

Generalized context-free grammar (GCFG) is a grammar formalism that expands on context-free grammars by adding potentially non-context-free composition functions to rewrite rules.^[1] Head grammar (and its weak equivalents) is an instance of such a GCFG which is known to be especially adept at handling a wide variety of non-CF properties of natural language.

Description

[edit ]

A GCFG consists of two components: a set of composition functions that combine string tuples, and a set of rewrite rules. The composition functions all have the form $f(\langle x_{1},...,x_{m}\rangle ,\langle y_{1},...,y_{n}\rangle ,...)=\gamma$ {\displaystyle f(\langle x_{1},...,x_{m}\rangle ,\langle y_{1},...,y_{n}\rangle ,...)=\gamma }, where $\gamma$ {\displaystyle \gamma } is either a single string tuple, or some use of a (potentially different) composition function which reduces to a string tuple. Rewrite rules look like $X\to f(Y,Z,...)$ {\displaystyle X\to f(Y,Z,...)}, where $Y$ {\displaystyle Y}, $Z$ {\displaystyle Z}, ... are string tuples or non-terminal symbols.

The rewrite semantics of GCFGs is fairly straightforward. An occurrence of a non-terminal symbol is rewritten using rewrite rules as in a context-free grammar, eventually yielding just compositions (composition functions applied to string tuples or other compositions). The composition functions are then applied, successively reducing the tuples to a single tuple.

Example

[edit ]

A simple translation of a context-free grammar into a GCFG can be performed in the following fashion. Given the grammar in (1 ), which generates the palindrome language $\{ww^{R}:w\in \{a,b\}^{*}\}$ {\displaystyle \{ww^{R}:w\in \{a,b\}^{*}\}}, where $w^{R}$ {\displaystyle w^{R}} is the string reverse of $w$ {\displaystyle w}, we can define the composition function conc as in (2a ) and the rewrite rules as in (2b ).

S\to \epsilon ~|~aSa~|~bSb

{\displaystyle S\to \epsilon ~|~aSa~|~bSb} 1

conc(\langle x\rangle ,\langle y\rangle ,\langle z\rangle )=\langle xyz\rangle

{\displaystyle conc(\langle x\rangle ,\langle y\rangle ,\langle z\rangle )=\langle xyz\rangle } 2a

S\to conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle )~|~conc(\langle a\rangle ,S,\langle a\rangle )~|~conc(\langle b\rangle ,S,\langle b\rangle )

{\displaystyle S\to conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle )~|~conc(\langle a\rangle ,S,\langle a\rangle )~|~conc(\langle b\rangle ,S,\langle b\rangle )} 2b

The CF production of abbbba is

S

aSa

abSba

abbSbba

abbbba

and the corresponding GCFG production is

S\to conc(\langle a\rangle ,S,\langle a\rangle )

{\displaystyle S\to conc(\langle a\rangle ,S,\langle a\rangle )}

conc(\langle a\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle a\rangle )}

conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,S,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}

conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle ),\langle b\rangle ),\langle b\rangle ),\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,conc(\langle \epsilon \rangle ,\langle \epsilon \rangle ,\langle \epsilon \rangle ),\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}

conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,\langle \epsilon \rangle ,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,conc(\langle b\rangle ,\langle \epsilon \rangle ,\langle b\rangle ),\langle b\rangle ),\langle a\rangle )}

conc(\langle a\rangle ,conc(\langle b\rangle ,\langle bb\rangle ,\langle b\rangle ),\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,conc(\langle b\rangle ,\langle bb\rangle ,\langle b\rangle ),\langle a\rangle )}

conc(\langle a\rangle ,\langle bbbb\rangle ,\langle a\rangle )

{\displaystyle conc(\langle a\rangle ,\langle bbbb\rangle ,\langle a\rangle )}

\langle abbbba\rangle

{\displaystyle \langle abbbba\rangle }

Linear Context-free Rewriting Systems (LCFRSs)

[edit ]

Weir (1988)^[1] describes two properties of composition functions, linearity and regularity. A function defined as $f(x_{1},...,x_{n})=...$ {\displaystyle f(x_{1},...,x_{n})=...} is linear if and only if each variable appears at most once on either side of the =, making $f(x)=g(x,y)$ {\displaystyle f(x)=g(x,y)} linear but not $f(x)=g(x,x)$ {\displaystyle f(x)=g(x,x)}. A function defined as $f(x_{1},...,x_{n})=...$ {\displaystyle f(x_{1},...,x_{n})=...} is regular if the left hand side and right hand side have exactly the same variables, making $f(x,y)=g(y,x)$ {\displaystyle f(x,y)=g(y,x)} regular but not $f(x)=g(x,y)$ {\displaystyle f(x)=g(x,y)} or $f(x,y)=g(x)$ {\displaystyle f(x,y)=g(x)}.

A grammar in which all composition functions are both linear and regular is called a Linear Context-free Rewriting System (LCFRS). LCFRS is a proper subclass of the GCFGs, i.e. it has strictly less computational power than the GCFGs as a whole.

On the other hand, LCFRSs are strictly more expressive than linear-indexed grammars and their weakly equivalent variant tree adjoining grammars (TAGs).^[2] Head grammar is another example of an LCFRS that is strictly less powerful than the class of LCFRSs as a whole.

LCFRS are weakly equivalent to (set-local) multicomponent TAGs (MCTAGs) and also with multiple context-free grammar (MCFGs [1]).^[3] and minimalist grammars (MGs). The languages generated by LCFRS (and their weakly equivalents) can be parsed in polynomial time.^[4]

References

[edit ]

^ ^a ^b Weir, David Jeremy (Sep 1988). Characterizing mildly context-sensitive grammar formalisms (PDF) (Ph.D.). Paper. Vol. AAI8908403. University of Pennsylvania Ann Arbor.
^ Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 33. ISBN 978-3-642-14846-0.
^ Laura Kallmeyer (2010). Parsing Beyond Context-Free Grammars. Springer Science & Business Media. p. 35-36. ISBN 978-3-642-14846-0.
^ Johan F.A.K. van Benthem; Alice ter Meulen (2010). Handbook of Logic and Language (2nd ed.). Elsevier. p. 404. ISBN 978-0-444-53727-0.

v
t
e

Automata theory: formal languages and formal grammars

Chomsky hierarchy Grammars Languages Abstract machines

Type-0
—
Type-1
—
—
—
—
—
Type-2
—
—
Type-3
—
—

Turing machine
Decider
Linear-bounded
PTIME Turing Machine
Nested stack
Thread automaton
restricted Tree stack automaton
Embedded pushdown
Nondeterministic pushdown
Deterministic pushdown
Visibly pushdown
Finite
Counter-free (with aperiodic finite monoid)
Acyclic finite

Each category of languages, except those marked by a ^*, is a proper subset of the category directly above it. Any language in each category is generated by a grammar and by an automaton in the category in the same line.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Generalized_context-free_grammar&oldid=1064807261#Linear_Context-free_Rewriting_Systems_(LCFRSs)"

Description

Example

Linear Context-free Rewriting Systems (LCFRSs)

See also

References