Context-sensitive language

Language defined by context-sensitive grammar

"Context-dependent" redirects here. For the type of memory, see Context-dependent memory.

In formal language theory, a context-sensitive language is a formal language that can be defined by a context-sensitive grammar, where the applicability of a production rule may depend on the surrounding context of symbols. Unlike context-free grammars, which can apply rules regardless of context, context-sensitive grammars allow rules to be applied only when specific neighboring symbols are present, enabling them to express dependencies and agreements between distant parts of a string.

These languages correspond to type-1 languages in the Chomsky hierarchy and are equivalently defined by noncontracting grammars (grammars where production rules never decrease the total length of a string). Context-sensitive languages can model natural language phenomena such as subject-verb agreement, cross-serial dependencies, and other complex syntactic relationships that cannot be captured by simpler grammar types,^{[citation needed ]} making them important for computational linguistics and natural language processing.

Computational properties

[edit ]

Computationally, a context-sensitive language is equivalent to a linear bounded nondeterministic Turing machine, also called a linear bounded automaton. That is a non-deterministic Turing machine with a tape of only $kn$ {\displaystyle kn} cells, where $n$ {\displaystyle n} is the size of the input and $k$ {\displaystyle k} is a constant associated with the machine. This means that every formal language that can be decided by such a machine is a context-sensitive language, and every context-sensitive language can be decided by such a machine.

This set of languages is also known as NLINSPACE or NSPACE(O(n)), because they can be accepted using linear space on a non-deterministic Turing machine.^[1] The class LINSPACE (or DSPACE(O(n))) is defined the same, except using a deterministic Turing machine. Clearly LINSPACE is a subset of NLINSPACE, but it is not known whether LINSPACE = NLINSPACE.^[2]

Examples

[edit ]

One of the simplest context-sensitive but not context-free languages is $L=\{a^{n}b^{n}c^{n}:n\geq 1\}$ {\displaystyle L=\{a^{n}b^{n}c^{n}:n\geq 1\}}: the language of all strings consisting of n occurrences of the symbol "a", then n "b"s, then n "c"s (abc, aabbcc, aaabbbccc, etc.). A superset of this language, called the Bach language,^[3] is defined as the set of all strings where "a", "b" and "c" (or any other set of three symbols) occurs equally often (aabccb, baabcaccb, etc.) and is also context-sensitive.^[4]^[5]

L can be shown to be a context-sensitive language by constructing a linear bounded automaton which accepts L. The language can easily be shown to be neither regular nor context-free by applying the respective pumping lemmas for each of the language classes to L.

Similarly:

$L_{\textit {Cross}}=\{a^{m}b^{n}c^{m}d^{n}:m\geq 1,n\geq 1\}$ {\displaystyle L_{\textit {Cross}}=\{a^{m}b^{n}c^{m}d^{n}:m\geq 1,n\geq 1\}} is another context-sensitive language; the corresponding context-sensitive grammar can be easily projected starting with two context-free grammars generating sentential forms in the formats $a^{m}C^{m}$ {\displaystyle a^{m}C^{m}} and $B^{n}d^{n}$ {\displaystyle B^{n}d^{n}} and then supplementing them with a permutation production like $CB\rightarrow BC$ {\displaystyle CB\rightarrow BC}, a new starting symbol and standard syntactic sugar.

$L_{MUL3}=\{a^{m}b^{n}c^{mn}:m\geq 1,n\geq 1\}$ {\displaystyle L_{MUL3}=\{a^{m}b^{n}c^{mn}:m\geq 1,n\geq 1\}} is another context-sensitive language (the "3" in the name of this language is intended to mean a ternary alphabet); that is, the "product" operation defines a context-sensitive language (but the "sum" defines only a context-free language as the grammar $S\rightarrow aSc|R$ {\displaystyle S\rightarrow aSc|R} and $R\rightarrow bRc|bc$ {\displaystyle R\rightarrow bRc|bc} shows). Because of the commutative property of the product, the most intuitive grammar for $L_{\textit {MUL3}}$ {\displaystyle L_{\textit {MUL3}}} is ambiguous. This problem can be avoided considering a somehow more restrictive definition of the language, e.g. $L_{\textit {ORDMUL3}}=\{a^{m}b^{n}c^{mn}:1<m<n\}$ {\displaystyle L_{\textit {ORDMUL3}}=\{a^{m}b^{n}c^{mn}:1<m<n\}}. This can be specialized to $L_{\textit {MUL1}}=\{a^{mn}:m>1,n>1\}$ {\displaystyle L_{\textit {MUL1}}=\{a^{mn}:m>1,n>1\}} and, from this, to $L_{m^{2}}=\{a^{m^{2}}:m>1\}$ {\displaystyle L_{m^{2}}=\{a^{m^{2}}:m>1\}}, $L_{m^{3}}=\{a^{m^{3}}:m>1\}$ {\displaystyle L_{m^{3}}=\{a^{m^{3}}:m>1\}}, etc.

$L_{REP}=\{w^{|w|}:w\in \Sigma ^{*}\}$ {\displaystyle L_{REP}=\{w^{|w|}:w\in \Sigma ^{*}\}} is a context-sensitive language. The corresponding context-sensitive grammar can be obtained as a generalization of the context-sensitive grammars for $L_{\textit {Square}}=\{w^{2}:w\in \Sigma ^{*}\}$ {\displaystyle L_{\textit {Square}}=\{w^{2}:w\in \Sigma ^{*}\}}, $L_{\textit {Cube}}=\{w^{3}:w\in \Sigma ^{*}\}$ {\displaystyle L_{\textit {Cube}}=\{w^{3}:w\in \Sigma ^{*}\}}, etc.

$L_{\textit {EXP}}=\{a^{2^{n}}:n\geq 1\}$ {\displaystyle L_{\textit {EXP}}=\{a^{2^{n}}:n\geq 1\}} is a context-sensitive language.^[6]

$L_{\textit {PRIMES2}}=\{w:|w|{\mbox{ is prime }}\}$ {\displaystyle L_{\textit {PRIMES2}}=\{w:|w|{\mbox{ is prime }}\}} is a context-sensitive language (the "2" in the name of this language is intended to mean a binary alphabet). This was proved by Hartmanis using pumping lemmas for regular and context-free languages over a binary alphabet and, after that, sketching a linear bounded multitape automaton accepting $L_{PRIMES2}$ {\displaystyle L_{PRIMES2}}.^[7]

$L_{\textit {PRIMES1}}=\{a^{p}:p{\mbox{ is prime }}\}$ {\displaystyle L_{\textit {PRIMES1}}=\{a^{p}:p{\mbox{ is prime }}\}} is a context-sensitive language (the "1" in the name of this language is intended to mean a unary alphabet). This was credited by A. Salomaa to Matti Soittola by means of a linear bounded automaton over a unary alphabet^[8] (pages 213–214, exercise 6.8) and also to Marti Penttonen by means of a context-sensitive grammar also over a unary alphabet (See: Formal Languages by A. Salomaa, page 14, Example 2.5).

An example of recursive language that is not context-sensitive is any recursive language whose decision is an EXPSPACE-hard problem, say, the set of pairs of equivalent regular expressions with exponentiation.

Properties of context-sensitive languages

[edit ]

The union, intersection, concatenation of two context-sensitive languages is context-sensitive, also the Kleene plus of a context-sensitive language is context-sensitive.^[9]
The complement of a context-sensitive language is itself context-sensitive^[10] a result known as the Immerman–Szelepcsényi theorem.
Membership of a string in a language defined by an arbitrary context-sensitive grammar, or by an arbitrary deterministic context-sensitive grammar, is a PSPACE-complete problem.

References

[edit ]

^ Rothe, Jörg (2005), Complexity theory and cryptology, Texts in Theoretical Computer Science. An EATCS Series, Berlin: Springer-Verlag, p. 77, ISBN 978-3-540-22147-0, MR 2164257 .
^ Odifreddi, P. G. (1999), Classical recursion theory. Vol. II, Studies in Logic and the Foundations of Mathematics, vol. 143, Amsterdam: North-Holland Publishing Co., p. 236, ISBN 978-0-444-50205-6, MR 1718169 .
^ Pullum, Geoffrey K. (1983). Context-freeness and the computer processing of human languages. Proc. 21st Annual Meeting of the ACL.
^ Bach, E. (1981). "Discontinuous constituents in generalized categorial grammars" Archived 2014年01月21日 at the Wayback Machine. NELS, vol. 11, pp. 1–12.
^ Joshi, A.; Vijay-Shanker, K.; and Weir, D. (1991). "The convergence of mildly context-sensitive grammar formalisms". In: Sells, P., Shieber, S.M. and Wasow, T. (Editors). Foundational Issues in Natural Language Processing. Cambridge MA: Bradford.
^ Example 9.5 (p. 224) of Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley
^ J. Hartmanis and H. Shank (Jul 1968). "On the Recognition of Primes by Automata" (PDF). Journal of the ACM . 15 (3): 382–389. doi:10.1145/321466.321470. hdl:1813/5864 . S2CID 17998039.
^ Salomaa, Arto (1969), Theory of Automata, ISBN 978-0-08-013376-8, Pergamon, 276 pages. doi:10.1016/C2013-0-02221-9
^ John E. Hopcroft; Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation . Addison-Wesley. ISBN 9780201029888.; Exercise 9.10, p.230. In the 2000 edition, the chapter on context-sensitive languages has been omitted.
^ Immerman, Neil (1988). "Nondeterministic space is closed under complementation" (PDF). SIAM J. Comput. 17 (5): 935–938. CiteSeerX 10.1.1.54.5941 . doi:10.1137/0217058. Archived (PDF) from the original on 2004年06月25日.

Sipser, M. (1996), Introduction to the Theory of Computation, PWS Publishing Co.

v
t
e

Automata theory: formal languages and formal grammars

Chomsky hierarchy Grammars Languages Abstract machines

Type-0
—
Type-1
—
—
—
—
—
Type-2
—
—
Type-3
—
—

Turing machine
Decider
Linear-bounded
PTIME Turing Machine
Nested stack
Thread automaton
restricted Tree stack automaton
Embedded pushdown
Nondeterministic pushdown
Deterministic pushdown
Visibly pushdown
Finite
Counter-free (with aperiodic finite monoid)
Acyclic finite

Each category of languages, except those marked by a ^*, is a proper subset of the category directly above it. Any language in each category is generated by a grammar and by an automaton in the category in the same line.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Context-sensitive_language&oldid=1320713219"

Computational properties

Examples

Properties of context-sensitive languages

See also

References