In Michael Sipser's book, "Introduction to the Theory of Computation," regular expressions are defined as follows (Definition 1.52): say that $R$ is a **regular expression** if $R$ is
1. $a$ for some $a$ in the alphabet $\Sigma$,
2. $\varepsilon$,
3. $\emptyset$,
4. $(R_1 \cup R_2)$, where $R_1$ and $R_2$ are regular expressions,
5. $(R_1 \circ R_2)$, where $R_1$ and $R_2$ are regular expressions, or
6. $(R_1^*)$, where $R_1$ is a regular expression.
Based on this definition, how can I formally prove that any regular expression can be written as a finite formula in which each element is one of the following:
- Parentheses,
- Union operation, concatenation operation, or Kleene star operation,
- One of the basic regular languages, i.e., a letter from the alphabet, the empty string, or the empty language.
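For concreteness, here is one illustrative example (mine, not from the book): the regular expression $((a \cup b)^{*} \circ \emptyset)$ is exactly the finite sequence of symbols

$$\underbrace{(}_{\text{paren}}\;\underbrace{(}_{\text{paren}}\;\underbrace{a}_{\text{atom}}\;\underbrace{\cup}_{\text{op}}\;\underbrace{b}_{\text{atom}}\;\underbrace{)}_{\text{paren}}\;\underbrace{{}^{*}}_{\text{op}}\;\underbrace{\circ}_{\text{op}}\;\underbrace{\emptyset}_{\text{atom}}\;\underbrace{)}_{\text{paren}}$$

and the claim to prove is that *every* regular expression admits such a finite representation.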
- (May be difficult: I see no requirement of finiteness in Definition 1.52.) – greybeard, Jun 11, 2024 at 10:53
1 Answer
What you have there is a recursive definition. Every regular expression is generated by a finite number of applications of rules 1–6, so you can simply do mathematical induction over the number of steps required to derive your expression.¹
So start like this:
- If $R$ was derived in 1 step, then it must have the form $R = a \in \Sigma$, $R = \varepsilon$ or $R = \emptyset$. Therefore...
- Assume that all regular expressions derived in $\leq n$ steps have the desired property.
- If $R$ was derived in $n + 1$ steps, then $R$ must have the form $R = (R_1 \cup R_2), R = (R_1 \circ R_2)$ or $R = R_1^*$ where $R_1, R_2$ were derived in $\leq n$ steps. Using the induction hypothesis...
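The induction skeleton above can be mirrored in code. The sketch below (names and AST are my own, not from Sipser) represents regular expressions as a tree following rules 1–6 and renders each one as a finite formula; the recursion terminates for the same reason the induction works, since every recursive call is on a strictly smaller subexpression:

```python
from dataclasses import dataclass

# A hypothetical AST mirroring Sipser's six rules (illustrative names).
@dataclass(frozen=True)
class Sym:      # rule 1: a single letter a in Sigma
    ch: str

@dataclass(frozen=True)
class Eps:      # rule 2: the empty string
    pass

@dataclass(frozen=True)
class Empty:    # rule 3: the empty language
    pass

@dataclass(frozen=True)
class Union:    # rule 4: (R1 ∪ R2)
    left: object
    right: object

@dataclass(frozen=True)
class Concat:   # rule 5: (R1 ∘ R2)
    left: object
    right: object

@dataclass(frozen=True)
class Star:     # rule 6: (R1*)
    inner: object

def formula(r) -> str:
    """Render r as a finite formula of parentheses, operators, and atoms.

    The recursion mirrors the induction: each base case contributes one
    symbol, and each constructor adds finitely many symbols to the
    (finite, by the induction hypothesis) formulas of its subexpressions.
    """
    if isinstance(r, Sym):
        return r.ch
    if isinstance(r, Eps):
        return "ε"
    if isinstance(r, Empty):
        return "∅"
    if isinstance(r, Union):
        return f"({formula(r.left)}∪{formula(r.right)})"
    if isinstance(r, Concat):
        return f"({formula(r.left)}∘{formula(r.right)})"
    if isinstance(r, Star):
        return f"({formula(r.inner)}*)"
    raise TypeError("not a regular expression")

# (a ∪ b)*, built from two base cases, one union, and one star
r = Star(Union(Sym("a"), Sym("b")))
print(formula(r))  # ((a∪b)*)
```

The structure of `formula` is exactly the structure of the proof: one branch per rule, with the three recursive branches corresponding to the inductive step.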
Why is every regular expression generated in a finite number of steps?
The recursive definition as written is a bit ambiguous: it only tells us which objects are regular expressions, not which objects aren't. This is (sadly) often left implicit, but when speaking about a recursively defined set, we usually mean the smallest set satisfying the requirements.
So given the definition above, we would formally define the set of regular expressions as the smallest set $\mathcal{R}$ satisfying 1. and 2. below:
- $a, \varepsilon, \emptyset \in \mathcal{R}$ for all $a \in \Sigma$
- $(R_1 \cup R_2), (R_1 \circ R_2), (R_1^*) \in \mathcal{R}$ if $R_1, R_2 \in \mathcal{R}$
Now define the function $S: \mathbb{N} \to \wp(\mathcal{R})$ (by the recursion theorem) as follows
- $S(0) = \{\varepsilon, \emptyset\} \cup \Sigma$
- $S(n + 1) = S(n) \cup \{(R_1 \cup R_2), (R_1 \circ R_2), (R_1^*) : R_1, R_2 \in S(n)\}$
Intuitively, think about $S(n)$ as the set of regular expressions derivable in $n$ steps or less. Define $$\mathcal{R}' := \bigcup_{n \in \mathbb{N}} S(n)$$
and note that $\mathcal{R}' \subseteq \mathcal{R}$ (by induction on $n$, every $S(n) \subseteq \mathcal{R}$). Conversely, $\mathcal{R}'$ itself satisfies 1. and 2. above, so $\mathcal{R} \subseteq \mathcal{R}'$ because $\mathcal{R}$ is the smallest set satisfying 1. and 2.
So $\mathcal{R} = \mathcal{R}'$: every regular expression lies in some $S(n)$, i.e., is derivable in finitely many steps.
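The stratification $S(0) \subseteq S(1) \subseteq \dots$ can be computed directly for a tiny alphabet. The sketch below (my own, representing expressions as strings over $\Sigma = \{a\}$) shows that each $S(n)$ is finite and that the levels are nested:

```python
# Compute S(n) as a set of formula strings over the one-letter alphabet {"a"}.
SIGMA = {"a"}

def S(n: int) -> set:
    """S(0) = Σ ∪ {ε, ∅}; S(n+1) adds one application of rules 4-6."""
    if n == 0:
        return SIGMA | {"ε", "∅"}
    prev = S(n - 1)
    out = set(prev)  # keeping S(n) gives the nesting S(n) ⊆ S(n+1)
    for r1 in prev:
        out.add(f"({r1}*)")
        for r2 in prev:
            out.add(f"({r1}∪{r2})")
            out.add(f"({r1}∘{r2})")
    return out

# Each level is finite: |S(0)| = 3, |S(1)| = 3 + 3 stars + 9 unions + 9 concats = 24.
print(len(S(0)), len(S(1)))
```

Each $S(n)$ is finite because it is built by finitely many rule applications from a finite previous level; the union $\mathcal{R}'$ of all levels is what the answer identifies with $\mathcal{R}$.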
¹ You can also use structural induction; you might have already encountered it in a similar context in a formal logic class.
- Thank you for your response. I have a follow-up question regarding your assumption that every regular expression is composed of a finite number of steps. You mentioned that each regular expression is generated by a finite number of applications of the rules (1–6). Could you please clarify or formally justify this assumption? Specifically, how does the recursive definition ensure that the construction process will always terminate after a finite number of steps? I'm looking for a rigorous explanation or proof based solely on the definition provided. – Vegetal605, Jun 14, 2024 at 22:38
- @Vegetal605 Sorry, I totally forgot about this 😅 I'll add an addition to my answer later today/tomorrow. – Knogger, Jun 19, 2024 at 10:34
- Thanks so much for the detailed explanation and the addition about the recursion theorem. Your clarification really helped me understand why every regular expression is generated in a finite number of steps. – Vegetal605, Jun 23, 2024 at 7:39