Production (computer science)

Method of symbol substitution

Not to be confused with Deployment environment § Production.

For other uses, see Production.

This article needs additional citations for verification . Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.
Find sources: "Production" computer science – news · newspapers · books · scholar · JSTOR (November 2012) (Learn how and when to remove this message)

Formal languages
Part of a series on
Key concepts Formal system Alphabet Syntax Formal semantics Semantics (programming languages) Formal grammar Formation rule Well-formed formula Automata theory Regular expression Production Ground expression Atomic formula
Applications Formal methods Propositional calculus Predicate logic Mathematical notation Natural language processing Programming language theory Mathematical linguistics Computational linguistics Syntax analysis Formal verification Automated theorem proving
v t e

In computer science, a production or production rule is a rewrite rule that replaces some symbols with other symbols.^[1] A finite set of productions $P$ {\displaystyle P} is the main component in the specification of a formal grammar (specifically a generative grammar).

In such grammars, a set of productions is a special case of relation on the set of strings $V^{*}$ {\displaystyle V^{*}} (where ${}^{*}$ {\displaystyle {}^{*}} is the Kleene star operator) over a finite set of symbols $V$ {\displaystyle V} called a vocabulary that defines which non-empty strings can be substituted with others. The set of productions is thus a special kind subset

P\subset V^{*}\times V^{*}

{\displaystyle P\subset V^{*}\times V^{*}}

and productions are then written in the form $u\to v$ {\displaystyle u\to v} to mean that $(u,v)\in P$ {\displaystyle (u,v)\in P} (not to be confused with $\to$ {\displaystyle \to } being used as function notation, since there may be multiple rules for the same $u$ {\displaystyle u}). Given two subsets $A,B\subset V^{*}$ {\displaystyle A,B\subset V^{*}}, productions can be restricted to satisfy $P\subset A\times B$ {\displaystyle P\subset A\times B}, in which case productions are said "to be of the form $A\to B$ {\displaystyle A\to B}. Different choices and constructions of $A,B$ {\displaystyle A,B} lead to different types of grammars. In general, any production of the form

u\to \epsilon ,

{\displaystyle u\to \epsilon ,}

where $\epsilon$ {\displaystyle \epsilon } is the empty string (sometimes also denoted $\lambda$ {\displaystyle \lambda }), is called an erasing rule, while productions that would produce strings out of nowhere, namely of the form

\epsilon \to v,

{\displaystyle \epsilon \to v,}

are never allowed.

In order to allow the production rules to create meaningful sentences, the vocabulary is partitioned into (disjoint) sets $\Sigma$ {\displaystyle \Sigma } and $N$ {\displaystyle N} providing two different roles:

$\Sigma$ {\displaystyle \Sigma } denotes the terminal symbols known as an alphabet containing the symbols allowed in a sentence;
$N$ {\displaystyle N} denotes nonterminal symbols, containing a distinguished start symbol $S\in N$ {\displaystyle S\in N}, that are needed together with the production rules to define how to build the sentences.

In the most general case of an unrestricted grammar, a production $u\to v$ {\displaystyle u\to v}, is allowed to map arbitrary strings $u$ {\displaystyle u} and $v$ {\displaystyle v} in $V$ {\displaystyle V} (terminals and nonterminals), as long as $u$ {\displaystyle u} is not empty. So unrestricted grammars have productions of the form

V^{*}\setminus \{\epsilon \}\to V^{*}

{\displaystyle V^{*}\setminus \{\epsilon \}\to V^{*}}

or if we want to disallow changing finished sentences

V^{*}NV^{*}=(V^{*}\setminus \Sigma ^{*})\to V^{*}

{\displaystyle V^{*}NV^{*}=(V^{*}\setminus \Sigma ^{*})\to V^{*}},

where $V^{*}NV^{*}$ {\displaystyle V^{*}NV^{*}} indicates concatenation and forces a non-terminal symbol to always be present in of the left-hand side of the productions, $\cup$ {\displaystyle \cup } denotes set union, and $\setminus$ {\displaystyle \setminus } denotes set minus or set difference. If we do not allow the start symbol to occur in $v$ {\displaystyle v} (the word on the right side), we have to replace $V^{*}$ {\displaystyle V^{*}} by $(V\setminus \{S\})^{*}$ {\displaystyle (V\setminus \{S\})^{*}} in the right-hand side.^[2]

The other types of formal grammar in the Chomsky hierarchy impose additional restrictions on what constitutes a production. Notably in a context-free grammar, the left-hand side of a production must be a single nonterminal symbol. So productions are of the form:

N\to V^{*}

{\displaystyle N\to V^{*}}

Grammar generation

[edit ]

To generate a string in the language, one begins with a string consisting of only a single start symbol, and then successively applies the rules (any number of times, in any order) to rewrite this string. This stops when a string containing only terminals is obtained. The language consists of all the strings that can be generated in this manner. Any particular sequence of legal choices taken during this rewriting process yields one particular string in the language. If there are multiple different ways of generating this single string, then the grammar is said to be ambiguous.

For example, assume the alphabet consists of $a$ {\displaystyle a} and $b$ {\displaystyle b}, with the start symbol $S$ {\displaystyle S}, and we have the following rules:

1.

S\rightarrow aSb

{\displaystyle S\rightarrow aSb}

2.

S\rightarrow ba

{\displaystyle S\rightarrow ba}

then we start with $S$ {\displaystyle S}, and can choose a rule to apply to it. If we choose rule 1, we replace $S$ {\displaystyle S} with $aSb$ {\displaystyle aSb} and obtain the string $aSb$ {\displaystyle aSb}. If we choose rule 1 again, we replace $S$ {\displaystyle S} with $aSb$ {\displaystyle aSb} and obtain the string $aaSbb$ {\displaystyle aaSbb}. This process is repeated until we only have symbols from the alphabet (i.e., $a$ {\displaystyle a} and $b$ {\displaystyle b}). If we now choose rule 2, we replace $S$ {\displaystyle S} with $ba$ {\displaystyle ba} and obtain the string $aababb$ {\displaystyle aababb}, and are done. We can write this series of choices more briefly, using symbols: $S\Rightarrow aSb\Rightarrow aaSbb\Rightarrow aababb$ {\displaystyle S\Rightarrow aSb\Rightarrow aaSbb\Rightarrow aababb}. The language of the grammar is the set of all the strings that can be generated using this process: $\{ba,abab,aababb,aaababbb,\dotsc \}$ {\displaystyle \{ba,abab,aababb,aaababbb,\dotsc \}}.

References

[edit ]

^ "Formal Grammars" (PDF). Standford.edu.
^ See Klaus Reinhardt: Prioritatszahlerautomaten und die Synchronisation von Halbspursprachen Archived 2018年01月17日 at the Wayback Machine; Fakultät Informatik der Universität Stuttgart; 1994 (German)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Production_(computer_science)&oldid=1319651885"

Grammar generation

See also

References