ToC_slides_ch1
View on GitHub | Download Local
Extracted Content (for search)
Click to view slide text
Theory of Computation Regular Languages Dimitris Diochnos School of Computer Science University of Oklahoma
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
1 / 100
Outline
1
Finite Automata
2
Nondeterminism
3
Regular Expressions
4
Non-Regular Languages
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
2 / 100
Finite Automata
Table of Contents
1
Finite Automata
2
Nondeterminism
3
Regular Expressions
4
Non-Regular Languages
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
3 / 100
Finite Automata
Finite Automata
Simplest computational model: Finite state machine or finite automaton Good model for computers with extremely limited amount of memory (e.g., automatic door). Markov chains are probabilistic counterparts of finite automata.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
4 / 100
Finite Automata
Finite Automata – example Example 1 (A simple finite automaton M1 )
M1 recognizes strings that: have at least one 1, end in a 1, end in an even number of zeros, following the last 1 (i.e., there is at least one 1) D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
5 / 100
Finite Automata
Formal definition of a finite automaton
Finite automaton A finite automaton is a 5-tuple (Q, Σ, δ, q0 , F ), where: Q is a finite set called the states, Σ is a finite set called the (input) alphabet, δ : Q × Σ → Q is the transition function (it includes the rules for moving), q0 ∈ Q is the start state, and F ⊆ Q is the set of accept states (final states).
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
6 / 100
Finite Automata
Language
Language A Language of machine M is the set of strings that M accepts. Notation: L(M) = A We say that M recognizes A or that M accepts A. In the example 1, with M1 , above, L(M1 ) = A where A = {w|w contains at least one 1 and an even number of zeros follow the last 1}.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
7 / 100
Finite Automata
Examples of finite automata (a) Example 2 (Automaton M2 )
M2 = ({q1 , q2 }, {0, 1}, δ, q1 , {q2 })
δ 0 1 q1 q1 q2 q2 q1 q2 State diagram and formal description contain the same information. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
8 / 100
Finite Automata
Remark
A good way to understand a machine: Try it on some sample input strings. Example 3 String 1101 is accepted, String 110 is rejected. Then: M2 accepts all strings that end with a 1. Thus, L(M2 ) = {w|w ends in a 1}.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
9 / 100
Finite Automata
Examples of finite automata (b)
Example 4 (Automaton M3 )
L(M3 ) = {w|w is the empty string ε or ends in a 0}
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
10 / 100
Finite Automata
Examples of finite automata (c) Example 5 (Automaton M4 )
L(M4 ) = {w|w starts and ends with the same symbol} D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
11 / 100
Finite Automata
Formal definition of computation Computation Let M = (Q, Σ, δ, q0 , F ) be a finite automaton and let w = w1 w2 · · · wn be a string where each wi is a member of the alphabet Σ. Then M accepts w if a sequence of states r0 , r1 , … , rn in Q exists with three conditions: 1
r0 = q0 ,
2
δ(ri , wi+1 ) = ri+1 , for i = 0, … , n − 1, and
3
rn ∈ F .
Condition 1 says that the machine starts in the start state. Condition 2 says that the machine goes from state to state according to the transition function. Condition 3 says that the machine accepts its input if it ends up in an accept state. We say that M recognizes language A if A = {w|M accepts w}. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
12 / 100
Finite Automata
Definition of regular language
Regular language A language is called a regular language if some finite automaton recognizes it.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
13 / 100
Finite Automata
Designing Finite Automata Example recognizing odd number of 1s:
Example recognizing 001 as a substring:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
14 / 100
Finite Automata
Regular Operations
Definition Let A and B be languages. We define the regular operations union, concatenation, and star as follows: Union: A ∪ B = {x|x ∈ A or x ∈ B}. Concatenation: A ◦ B = {xy|x ∈ A and y ∈ B}. Star: A∗ = {x1 x2 … xk |k ≥ 0 and each xi ∈ A}. Remark: The empty string ε is always a member of A∗ , no matter what A is.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
15 / 100
Finite Automata
Closeness under Operations A collection of objects is closed under some operation if applying that operation to members of the collection returns an object still in the collection. The collection of regular languages is closed under all three of the regular operations. Example 6 (Closeness) Let the alphabet Σ be the standard 26 letters {a, b, … , z}. If A = {good, bad} and B = {boy, girl}, then A ∪ B = {good, bad, boy, girl}, A ◦ B = {goodboy, goodgirl, badboy, badgirl}, and A∗ = {ε, good, bad, goodgood, goodbad, badgood, badbad, goodgoodgood, goodgoodbad, goodbadgood, goodbadbad, … }.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
16 / 100
Finite Automata
Closeness under Union Operation
Theorem The class of regular languages is closed under the union operation. In other words, if A1 and A2 are regular languages, so is A1 ∪ A2 . Proof Idea Construct M from M1 (that recognizes A1 ) and M2 (that recognizes A2 ). Simulate both M1 and M2 simultaneously, and accept if either of the simulations accept. Remark: We can not “rewind the input tape” to try the simulation on M2 .
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
17 / 100
Finite Automata
Closeness under Union Operation (cont.) Proof Let M1 recognize A1 , where M1 = (Q1 , Σ, δ1 , q1 , F1 ), and M2 recognize A2 , where M2 = (Q2 , Σ, δ2 , q2 , F2 ). Construct M to recognize A1 ∪ A2 , where M = (Q, Σ, δ, q0 , F ). 1
Q = {(r1 , r2 )|r1 ∈ Q1 and r2 ∈ Q2 }. This set is the Cartesian product of sets Q1 and Q2 and is written Q1 × Q2 .
2
Σ is the same as in M1 and M2 .
3
δ((r1 , r2 ), a) = (δ1 (r1 , a), δ2 (r2 , a)).
4
q0 = (q1 , q2 ).
5
F = {(r1 , r2 )|r1 ∈ F1 or r2 ∈ F2 }.
Correctness is straightforward. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
18 / 100
Finite Automata
Closeness under Union Operation (cont.)
Remarks on the proof: The last expression, F = {(r1 , r2 )|r1 ∈ F1 or r2 ∈ F2 }, is the same as F = (F1 × Q2 ) ∪ (Q1 × F2 ), which is not the same as F = F1 × F2 obviously. F = F1 × F2 proves closure for intersection.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
19 / 100
Nondeterminism
Table of Contents
1
Finite Automata
2
Nondeterminism
3
Regular Expressions
4
Non-Regular Languages
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
20 / 100
Nondeterminism
Motivation for Nondeterminism
If we wanted to prove closure under the concatenation operator ◦ for two regular languages A1 and A2 (i.e., A1 ◦ A2 ), then a main problem is that for a given w ∈ A1 ◦ A2 we do not know where to break w in w1 , w2 words such that w1 ∈ A1 , w2 ∈ A2 and w = w1 ◦ w2 . To solve this problem, we introduce nondeterminism. Thus we have: DFA: Deterministic Finite Automaton NFA: Nondeterministic Finite Automaton Note: The plural for Automaton is Automata or Automatons.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
21 / 100
Nondeterminism
Difference between DFAs and NFAs NFAs: may have zero, one, or many arrows exiting from each state for each alphabet symbol, may have zero, one, or many arrows exiting from each state with the label ε. Example:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
22 / 100
Nondeterminism
How do NFAs compute? We reach a state with multiple ways to proceed. ⇒ the machine (automaton) splits into multiple copies of itself and follows all the possibilities in parallel (each copy takes one of the possible ways to proceed). We reach a state where no branch of the computation exiting from it can compute the current symbol ⇒ the particular copy of the machine dies (along with the branch of the computation associated with it. If any of the copies of the machine is in an accept state at the end of the input, the NFA accepts the input string. States with an ε symbol on an exiting arrow is encountered ⇒ splits into multiple copies.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
23 / 100
Nondeterminism
Summing up Nondeterminism may be seen: – as a parallel computation with multiple (independent) “processes” running in parallel, – as a tree of possibilities.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
24 / 100
Nondeterminism
Example
Example 7 (Computation by automaton) Input: 010110 is given to the following automaton. What is the computation that takes place?
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
25 / 100
Nondeterminism
Example (cont.)
The computation of the automaton above on input 010110 D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
26 / 100
Nondeterminism
Example Example 8 (Automaton design) Let A be the language consisting of all strings over {0, 1} containing a 1 in the third position from the end (e.g., 000100 ∈ A but 0011 ∈ / A). Design an automaton that recognizes A and describe it. The following NFA recognizes A.
Idea It stays in q1 until it “guesses” that it is three places from the end. If the input symbol is 1, it branches to q2 and uses q3 and q4 to “check” on whether its guess was correct. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
27 / 100
Nondeterminism
Example (cont.) Example 9 (Conversion of NFA to DFA) Convert the above NFA into an equivalent DFA.
This is the smallest DFA for language A, containing 8 states. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
28 / 100
Nondeterminism
Nondeterministic Finite Automaton Definition A nondeterministic finite automaton is a 5-tuple (Q, Σ, δ, q0 , F ), where 1
Q is a finite set of states,
2
Σ is a finite alphabet,
3
δ : Q × Σε → P(Q) is the transition function,
4
q0 ∈ Q is the start state, and
5
F ⊆ Q is the set of accept states.
Note: Σε = Σ ∪ {ε} P(Q) = powerset of Q
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
29 / 100
Nondeterminism
Example
Example 10 Recall the NFA below that accepts all strings that contain either 101 or 11 as a substring. Provide its formal description.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
30 / 100
Nondeterminism
Example (cont.) The formal description of the above NFA is (Q, Σ, δ, q1 , F ), where Q = {q1 , q2 , q3 , q4 }, Σ = {0, 1}, δ is given as q1 q2 q3 q4
0 {q1 } {q3 } ∅ {q4 }
1 {q1 , q2 } ∅ {q4 } {q4 }
ε ∅ {q3 } ∅ ∅
q1 is the start state, F = {q4 }.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
31 / 100
Nondeterminism
NFA accepts w
We say that an NFA N accepts w if we can write w as w = y1 y2 · · · ym , where each yi is a member of Σε and a sequence of states r0 , r1 , … , rm exists in Q with three conditions: 1
r0 = q0 ,
2
ri+1 ∈ δ(ri , yi+1 ), for i = 0, … , m − 1, and
3
rm ∈ F .
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
32 / 100
Nondeterminism
Equivalence of NFAs and DFAs
The equivalence between NFAs and DFAs is: Surprising because NFAs appear to have more power than DFAs, Useful because NFAs are usually much easier to describe than DFAs for a given language. We say that two machines are equivalent if they recognize the same language.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
33 / 100
Nondeterminism
Equivalence of NFAs and DFAs
Theorem 11 Every nondeterministic finite automaton has an equivalent deterministic finite automaton. Proof idea Construct a DFA M that simulates the NFA N that recognizes some language A. We need to keep track of the set of states that are active during the computation performed by N. ⇒ If N has k states ⇒ the DFA will have 2k states (one for each subset of the k states).
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
34 / 100
Nondeterminism
Equivalence of NFAs and DFAs Proof Let N = (Q, Σ, δ, q0 , F ) be the NFA recognizing some language A. Construct DFA M = (Q ′ , Σ, δ ′ , q0′ , F ′ ) recognizing A. (Assume for now that N has no ε arrows.) 1 2
Q ′ = P(Q). For R ∈ Q ′ and a ∈ Σ, let δ ′ (R, a) = {q ∈ Q|q ∈ δ(r, a) for some r ∈ R}. Another way S of writing it down: ′ δ (R, a) = r∈R δ(r, a).
3
q0′ = {q0 }.
4
F ′ = {R ∈ Q ′ |R contains an accept state of N}.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
35 / 100
Nondeterminism
Equivalence of NFAs and DFAs
Proof (cont.) For R ⊆ Q let Λ(R) = {q ∈ Q | q can be reached from R by traveling along 0 or more ε arrows}. Thus, δ ′ (R, a) = {q ∈ Q|q ∈ Λ(δ(r, a)) for some r ∈ R}. Additionally, q0′ = Λ({q0 }) are all the possible states that can be reached from the start state of N (i.e., q0 ) after following ε arrows (and not reading yet the input). The construction of M is correct and thus we can simulate the NFA N.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
36 / 100
Nondeterminism
Equivalence of NFAs and DFAs
Corollary 12 A language is regular if and only if some nondeterministic finite automaton recognizes it.
(Discuss both directions of “if and only if” by recalling the definition and the previous theorem.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
37 / 100
Nondeterminism
Example
Example 13 (Converting an NFA to a DFA) Convert the NFA below to a DFA:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
38 / 100
Nondeterminism
Example (cont.) Solution:
The ∅ state performs the garbage collection.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
39 / 100
Nondeterminism
Example (cont.)
We can simplify the DFA by eliminating states {1} and {1, 2}:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
40 / 100
Nondeterminism
Closure under the Regular Operations: Union
Theorem 14 The class of regular languages is closed under the union operation. Proof idea. Create an NFA N from N1 and N2 that recognize the languages A1 and A2 respectively.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
41 / 100
Nondeterminism
Closure under the Regular Operations: Union (cont.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
42 / 100
Nondeterminism
Closure under the Regular Operations: Union (cont.) Proof. Let N1 = (Q1 , Σ, δ1 , q1 , F1 ) recognize A1 , and N2 = (Q2 , Σ, δ2 , q2 , F2 ) recognize A2 . Construct N = (Q, Σ, δ, q0 , F ) to recognize A1 ∪ A2 . 1
Q = {q0 } ∪ Q1 ∪ Q2 .
2
q0 is the start state of N.
3
F = F1 ∪ F2 .
4
Define δ so that for any q ∈ Q and any a ∈ Σε : δ1 (q, a), q ∈ Q1 δ (q, a), q ∈ Q 2 2 δ(q, a) = {q1 , q2 }, q = q0 and a = ε ∅, q = q0 and a ̸= ε. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
43 / 100
Nondeterminism
Closure under the Regular Operations: Concatenation
Theorem 15 The class of regular languages is closed under concatenation. Proof idea. Nondeterministically guess where to make the split. (In the figure below, the originally terminal states in N1 get connected to the start state of N2 .)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
44 / 100
Nondeterminism
Closure under the Regular Operations: Concatenation
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
45 / 100
Nondeterminism
Closure under the Regular Operations: Concatenation Proof. Let N1 = (Q1 , Σ, δ1 , q1 , F1 ) recognize A1 , and N2 = (Q2 , Σ, δ2 , q2 , F2 ) recognize A2 . Construct N = (Q, Σ, δ, q1 , F2 ) to recognize A1 ◦ A2 . 1
Q = Q1 ∪ Q2 .
2
q1 is the start state of N as well.
3
Accept states for N are those of N2 so F2 .
4
Define δ so that for any q ∈ Q and any a ∈ Σε : δ1 (q, a), q ∈ Q1 and q ∈ / F1 δ (q, a), q ∈ F1 and a ̸= ε 1 δ(q, a) = δ1 (q, a) ∪ {q2 }, q ∈ F1 and a = ε δ (q, a), q ∈ Q2 . 2 D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
46 / 100
Nondeterminism
Closure under the Regular Operations: Star
Theorem 16 The class of regular languages is closed under the star operation. Proof idea. Construct an NFA N that accepts its input whenever it can be broken into several pieces and N1 accepts each such piece. Note: It is a bad idea to make accepting the original starting state of N1 .
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
47 / 100
Nondeterminism
Closure under the Regular Operations: Star
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
48 / 100
Nondeterminism
Closure under the Regular Operations: Star Proof. Let N1 = (Q1 , Σ, δ1 , q1 , F1 ) recognize A1 . Construct N = (Q, Σ, δ, q0 , F ) to recognize A∗1 . 1
Q = q0 ∪ Q1 .
2
q0 is the new start state.
3
F = {q0 } ∪ F1 .
4
Define δ so that for any q ∈ Q and any a ∈ Σε : δ1 (q, a), q ∈ Q1 and q ∈ / F1 q ∈ F1 and a ̸= ε δ1 (q, a), δ(q, a) = δ1 (q, a) ∪ {q1 }, q ∈ F1 and a = ε {q1 }, q = q0 and a = ε ∅, q = q0 and a ̸= ε. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
49 / 100
Regular Expressions
Table of Contents
1
Finite Automata
2
Nondeterminism
3
Regular Expressions
4
Non-Regular Languages
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
50 / 100
Regular Expressions
Introduction to Regular Expressions In arithmetic, we can use the operations + and × to build up expressions such as (5 + 3) × 4. Similarly, we can use the regular operations to build up expressions describing languages, which are called regular expressions. An example is: (0 + 1)0∗ . The value of this expression is the language. (0 + 1) means ({0} ∪ {1}) which is {0, 1} 0∗ means {0}∗ (0 + 1)0∗ is shorthand for (0 + 1) ◦ 0∗ (i.e., usually drop/imply the concatenation symbol)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
51 / 100
Regular Expressions
Examples of Regular Expressions
Example 17 (Simple Regular Expressions) (0 + 1)∗ : all possible binary strings (i.e., strings of 0s and 1s). Let Σ = {0, 1}. Then Σ is a shorthand for the regular expression (0 + 1), i.e., Σ is all the possible strings of one character (length 1). Σ∗ 1: all strings that end with a 1. (0Σ∗ ) + (Σ∗ 1): strings that start with a 0 or end with a 1.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
52 / 100
Regular Expressions
Regular Expressions
Definition 18 (Regular expression) We say that R is a regular expression if R is 1
a for some a ∈ Σ,
2
ε,
3
∅,
4
(R1 + R2 ), where R1 and R2 are regular expressions,
5
(R1 ◦ R2 ), where R1 and R2 are regular expressions, or
6
(R1∗ ), where R1 is a regular expression.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
53 / 100
Regular Expressions
Regular Expressions
Some remarks on the definition: L(R) is the language of a regular expression R. In (1) and (2), a and ε represent the languages {a} and {ε} respectively. (2) is different from (3): (2) is a language with one string: {ε} (3) is the empty language: {}
When there are no parentheses, evaluate in the precedence order: star, concatenation, union. R+ = RR∗ . In other words, R+ + ε = R∗ .
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
54 / 100
Regular Expressions
Identities for Regular Expressions
R + ∅ = R. R ◦ ε = R (attaching the empty string to any string, will not change the string). In principle R + ε ̸= R. For example, if R = 0, then L(R) = {0} but L(R + ε) = {ε, 0}. R ◦ ∅ ̸= R. For example, if R = 0, then L(R) = {0} but L(R ◦ ∅) = L(∅) = ∅. (Concatenating the empty set to any set yields the empty set.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
55 / 100
Regular Expressions
Equivalence with Finite Automata
Theorem 19 A language is regular if and only if some regular expression describes it. Lemma 20 (‘⇐’ direction of Thm 19) If a language is described by a regular expression, then it is regular. Proof idea. Show how to convert R into an NFA recognizing A.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
56 / 100
Regular Expressions
Equivalence with Finite Automata (cont.) Proof. Let R be a regular expression. Let’s convert R into an NFA N. We consider the six cases in the formal definition of regular expressions. 1
R = a for some a ∈ Σ. Then L(R) = {a}, and the following NFA recognizes L(R).
Formally, N = ({q1 , q2 }, Σ, δ, q1 , {q2 }), where δ(q1 , a) = {q2 } and δ(r, b) = ∅ for r ̸= q1 or b ̸= a. 2
R = ε. Then L(R) = {ε}, and the following NFA recognizes L(R).
(δ(r, b) = ∅ for any r and b) D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
57 / 100
Regular Expressions
Equivalence with Finite Automata (cont.) Proof (cont.) 3
R = ∅. Then L(R) = {∅}, and the following NFA recognizes L(R).
(δ(r, b) = ∅ for any r and b) 4
R = R1 + R2 .
5
R = R1 ◦ R2 .
6
R = R1∗ .
For the last three cases, we use the fact that regular languages are closed under the regular operations (and we use the constructions given in the corresponding proofs). In other words, we construct the NFA for R from the NFAs for R1 and R2 . D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
58 / 100
Regular Expressions
Example Example 21 Convert the regular expression (ab + a)∗ to an NFA in a sequence of stages.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
59 / 100
Regular Expressions
Example Example 22 Convert the regular expression (a + b)∗ aba to an NFA.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
60 / 100
Regular Expressions
Generalized Nondeterministic Finite Automata Definition A Generalized Nondeterministic Finite Automaton (GNFA) is an NFA wherein the transition arrows may have any regular expression as labels (instead of only members of the alphabet or ε). The GNFA reads blocks of symbols from the input, not necessarily just one symbol at a time (as in an ordinary NFA). The GNFA moves along a transition arrow connecting two states by reading a block of symbols from the input; that string is a string that corresponds to the regular expression on that arrow. A GNFA, being nondeterministic, may have several different ways to process the same input string.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
61 / 100
Regular Expressions
Example Example 23 (A generalized nondeterministic finite automaton)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
62 / 100
Regular Expressions
Conditions for GNFAs
For convenience, we require that GNFAs always have a special form that meets the following conditions: (i) The start state has transition arrows going to every other state, but no arrows coming in from any other state. (ii) There is only a single accept state, and it has: arrows coming in from every other state no arrows going to any other state qaccept ̸= qstart
(iii) Except for qstart and qaccept , for every state: one arrow goes from it to every other state, one arrow goes from it to itself.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
63 / 100
Regular Expressions
Conversion of a DFA into a GNFA
Add a new start state with an ε arrow to the old start state. Add a new accept state with ε arrows coming in from every old accept state. Arrows with multiple labels are replaced each with a single arrow whose label is the union of the previous labels. Add arrows labeled ∅ between states that had no arrows. Note: ∅ does not recognize any string; not even ε. ⇒ Such arrows can never be used.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
64 / 100
Regular Expressions
Constructing an equivalent GNFA with one fewer state starting with k > 2 states Select a state, other than start or accept, rip it out of the machine, repair the remainder so that the same language is still recognized. Call the removed state qrip . (This state exists because k > 2.) New labels on arrows should compensate for the absence of qrip by adding back the lost computations.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
65 / 100
Regular Expressions
Constructing an equivalent GNFA with one fewer state starting with k > 2 states (cont.)
So, ultimately, we can construct a GNFA with just two states: starting and accept. The label on the arrow connecting these two states is the regular expression that the original GNFA was accepting.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
66 / 100
Regular Expressions
Generalized nondeterministic finite automaton (GNFA)
Definition A generalized nondeterministic finite automaton (GNFA) is a 5-tuple, (Q, Σ, δ, qstart , qaccept ), where: 1
Q is the finite set of states,
2
Σ is the input alphabet,
3
δ : (Q \ {qaccept }) × (Q \ {qstart }) → R is the transition function, (R is the collection of all regular expressions over the alphabet Σ)
4
qstart is the start state, and
5
qaccept is the accept state.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
67 / 100
Regular Expressions
Computation of a GNFA
Definition A GNFA accepts a string w in Σ∗ if w = w1 w2 · · · wk , where each wi is in Σ∗ and a sequence of states q0 , q1 , … , qk exists such that 1
q0 = qstart is the start state,
2
qk = qaccept is the accept state, and
3
for each i, we have wi ∈ L(Ri ), where Ri = δ(qi−1 , qi ) (in other words, Ri is the expression on the arrow from qi−1 to qi ).
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
68 / 100
Regular Expressions
Regular Languages
Now let’s turn to the other direction of the proof of Theorem 19. Lemma 24 If a language is regular, then it is described by a regular expression. Proof idea Let A be a regular language. Because A is regular, a DFA accepts it. We convert DFAs into equivalent regular expressions. First we convert DFAs into GNFAs, and then GNFAs into regular expressions.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
69 / 100
Regular Expressions
Regular Languages
Proof. Let M be the DFA for language A. Create a GNFA G starting from M (as discussed earlier). Use the procedure CONVERT(G), which takes as input a GNFA G, and outputs the equivalent regular expression.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
70 / 100
Regular Expressions
Regular Languages Proof (cont.) CONVERT(G): 1
Let k be the number of states of G.
2
If k = 2, then G has a start state and an accept state, and a single arrow connecting them, labeled with a regular expression R. Return the expression R.
3
If k > 2, select any state qrip ∈ Q \ {qstart , qaccept }. Let G′ be the GNFA (Q ′ , Σ, δ ′ , qstart , qaccept ), where Q ′ = Q \ {qrip }, and for any qi ∈ Q ′ \ {qaccept } and any qj ∈ Q ′ \ {qstart }, let δ ′ (qi , qj ) = (R1 )(R2 )∗ (R3 ) + (R4 ), for R1 = δ(qi , qrip ), R2 = δ(qrip , qrip ), R3 = δ(qrip , qj ), and R4 = δ(qi , qj ).
4
Return CONVERT(G′ ).
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
71 / 100
Regular Expressions
Regular Languages
Claim 72 For any GNFA G, CONVERT(G) is equivalent to G. Proof By induction on k, the number of states of the GNFA. Basis (k = 2): G can only have a single arrow connecting the start state with the accept state. The regular expression label R on this arrow describes all strings that allow G to get to the accept state. Hence this expression is equivalent to G.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
72 / 100
Regular Expressions
Regular Languages Proof (cont.) Induction step: Assume that the claim is true for k − 1 states and use this assumption to prove that the claim is true for k states. First show that G (k states) and G′ (k − 1 states) recognize the same language. Suppose that G accepts an input w. In an accepting computation, G enters the sequence of states: σ = (qstart , q1 , q2 , q3 , … , qaccept ). If qrip does not appear in σ, then G′ also accepts w. The reason is that each of the new regular expressions labeling the arrows of G′ contains the old regular expression as part of a union. If qrip does appear in σ, removing each run of consecutive qrip states forms an accepting computation for G′ . The states qi and qj bracketing a run have a new regular expression on the arrow between them that describes all strings taking qi to qj via qrip on G. So G′ accepts w. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
73 / 100
Regular Expressions
Regular Languages Proof (cont.) Conversely, suppose that G′ accepts an input w. An arrow between any two states qi and qj in G′ describes the strings taking qi to qj in G, either directly or via qrip . So G must also accept the input w. ⇒ Thus G and G′ are equivalent. The induction hypothesis states that when the algorithm calls itself recursively on input G′ , the result is a regular expression that is equivalent to G′ (since G′ has k − 1 states). Hence this regular expression also is equivalent to G, and the algorithm is proved correct. This concludes the proof of Claim 72, Lemma 24, and Theorem 19. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
74 / 100
Regular Expressions
Example Example 25 Convert the DFA below into a regular expression, using the preceding algorithm.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
75 / 100
Regular Expressions
Example (cont.) Add a new start state (s) and a new accept state (a):
We avoid drawing arrows with labels ∅, even though the arrows are there. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
76 / 100
Regular Expressions
Example (cont.) Remove state 2: Note that the label b(a + b)∗ from 1 to a is consistent with CONVERT(G): R1 = b R2 = a + b R3 = ε R4 = ∅ so we get (b)(a + b)∗ (ε) = b(a + b)∗
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
77 / 100
Regular Expressions
Example (cont.) Remove state 1:
Follow the same procedure. ⇒ The label on this last arrow joining them is the regular expression that is equivalent to the original DFA.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
78 / 100
Regular Expressions
Example Example 26 Convert the DFA below into a regular expression, using the preceding algorithm.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
79 / 100
Regular Expressions
Example
Introduce states s and a:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
80 / 100
Regular Expressions
Example (cont.) Remove state 1:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
81 / 100
Regular Expressions
Example (cont.) Remove state 2:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
82 / 100
Regular Expressions
Example (cont.)
Remove state 3:
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
83 / 100
Non-Regular Languages
Table of Contents
1
Finite Automata
2
Nondeterminism
3
Regular Expressions
4
Non-Regular Languages
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
84 / 100
Non-Regular Languages
Non-Regular Languages
Certain languages cannot be recognized by any finite automaton. Consider the language B = {0n 1n |n ≥ 0}. Issue: The machine seems to need to remember how many 0s have been seen so far as it reads the input. ⇒ Because the number of 0s isn’t limited ⇒ the machine will need to keep track of an unlimited number of possibilities. ⇒ Not possible with a finite number of states. (Not a valid argument in general; see below.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
85 / 100
Non-Regular Languages
Non-Regular Languages
Consider C = {w|w has an equal number of 0s and 1s}, and D = {w|w has an equal number of occurrences of 01 and 10 as substrings}. At first glance, a recognizing machine appears to need to count in each case, and therefore neither language appears to be regular. As expected, C is not regular, but surprisingly D is regular! We need a formal technique to be able to prove non-regularity.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
86 / 100
Non-Regular Languages
Pumping Lemma for Regular Languages
All regular languages have a special property. ⇒ If a language does not have this property, then it is guaranteed that it is not regular. Property: All strings in the language can be “pumped” if they are at least as long as a certain special value, called the pumping length. (That means each such string contains a section that can be repeated any number of times with the resulting string remaining in the language.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
87 / 100
Non-Regular Languages
Pumping Lemma Theorem 27 (Pumping Lemma) If A is a regular language, then there is a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into three pieces, s = xyz, satisfying the following conditions: 1
for each i ≥ 0, xy i z ∈ A,
2
|y| > 0, and
3
|xy| ≤ p.
Recall: |s|: the length of string s, y i : i copies of y are concatenated together (hence y 0 = ε). D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
88 / 100
Non-Regular Languages
Pumping Lemma
Remarks on the Pumping Lemma: Either x or z may be ε, but y ̸= ε (due to (2)). Theorem trivially true without condition (2). Condition (3) is an extra technical condition that might be useful when proving certain languages to be nonregular.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
89 / 100
Non-Regular Languages
Pumping Lemma
Proof idea of the Pumping Lemma. Let the pumping length p be the number of states of M. Let n be the length of input s, such that n ≥ p. This means that s has n symbols ⇒ dictates n transitions ⇒ If we write down the states visited by s, we will have a repetition. Here is why (pigeonhole principle):
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
90 / 100
Non-Regular Languages
Pumping Lemma
It is s = xyz, so: M also accepts xy i z, for any i > 0, as well as xy 0 z = xz, |y| > 0, satisfying condition (2), The first p + 1 states in the sequence must contain a repetition (by pigeonhole principle). Therefore, |xy| ≤ p. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
91 / 100
Non-Regular Languages
Pumping Lemma Proof. Let M = (Q, Σ, δ, q1 , F ) be a DFA recognizing A and p be the number of states of M. Let s = s1 s2 · · · sn be a string in A of length n, where n ≥ p. Let r1 , … , rn+1 be the sequence of states that M enters while processing s, so ri+1 = δ(ri , si ), for 1 ≤ i ≤ n. This sequence has length n + 1 ≥ p + 1 ⇒ Among the first p + 1 states in the sequence, at least two must be the same (by pigeonhole principle). Call rj the first of these (j ̸= l) and rl the second (l ≤ p + 1). Now let x = s1 · · · sj−1 , y = sj · · · sl−1 , and z = sl · · · sn . Since x takes M from r1 to rj , y takes M from rj to rj , and z takes M from rj to rn+1 , which is an accept state, M must accept xy i z for i ≥ 0. We know that j ̸= l, so |y| > 0; and l ≤ p + 1, so |xy| ≤ p. Thus we have satisfied all conditions of the pumping lemma. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
92 / 100
Non-Regular Languages
Use of Pumping Lemma (to show non-regularity)
Steps: 1
Assume a language A is regular, in order to obtain a contradiction, with pumping length p such that all strings of length p or greater in A can be pumped.
2
Find a string s ∈ A such that |s| ≥ p but s cannot be pumped, no matter how we split s into xyz.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
93 / 100
Non-Regular Languages
Example Example 28 Let B = {0n 1n |n ≥ 0}. Use the pumping lemma to prove that B is not regular. Proof. Assume to the contrary that B is regular. Let p be the pumping length given by the pumping lemma. Consider s = 0p 1p . (It is s = xyz, with |y| > 0.) We consider 3 cases to show that this result is impossible. If string y consists only of 0s, then xyyz has more 0s than 1s and thus xyyz ∈ / B, violating condition (1) of pumping lemma ⇒ contradiction. If string y consists only of 1s ⇒ contradiction as above. If string y consists of both 0s and 1s, then xyyz may have the same number of 0s and 1s, but some 1s will appear before the 0s, so xyyz ∈ /B again ⇒ contradiction. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
94 / 100
Non-Regular Languages
Example (cont.)
Note: We could have avoided cases 2 and 3 by applying condition (3) of the pumping lemma that requires that |xy| ≤ p, thus simplifying the above argument.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
95 / 100
Non-Regular Languages
Example Example 29 Let C = {w|w has an equal number of 0s and 1s}. Use the pumping lemma to prove that C is not regular. Proof. The proof is by contradiction. Consider s = 0p 1p . By condition (3) of the pumping lemma |xy| ≤ p ⇒ So y contains only 0s ⇒ So xyyz ∈ / C ⇒ So s cannot be pumped. The string selection is important. For example, s′ = (01)p can be pumped, even taking condition (3) into account. Can you see how to pump it? One way to do so is by setting: x = ε, y = 01, and z = (01)p−1 . Then xy i z ∈ C for every i. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
96 / 100
Non-Regular Languages
Example (cont.)
Alternative proof. Assume C is regular. The language 0∗ 1∗ is also regular. Then C ∩ 0∗ 1∗ also has to be regular (due to closure under intersection). But C ∩ (0∗ 1∗ ) = B, and we know that B is nonregular from Example 28.
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
97 / 100
Non-Regular Languages
Example Example 30 Let F = {ww|w ∈ {0, 1}∗ }. Show that F is nonregular, using the pumping lemma. Proof. Assume to the contrary that F is regular. Consider s = 0p 10p 1. Because s ∈ F and |s| > p, the pumping lemma guarantees that s can be split into three pieces, s = xyz, satisfying the three conditions of the lemma, hence |xy| ≤ p. We show that this outcome is impossible. With condition (3) the proof follows because y must consist only of 0s, and then xyyz ∈ F . D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
98 / 100
Non-Regular Languages
Example Example 31 2
Let D = {1n |n ≥ 0}. Use the pumping lemma to prove that D is not regular. Proof. The proof is by contradiction. Assume to the contrary that D is regular. 2 Let s be the string 1p . Now consider the two strings xyz and xy 2 z. The lengths of these strings differ by at most p (i.e., the length of y). By condition (3) of the pumping lemma, |xy| ≤ p and thus |y| ≤ p. We have |xyz| = p2 and so |xyyz| ≤ p2 + p < p2 + 2p + 1 = (p + 1)2 . Moreover, condition (2) implies that |y| > 0 and so |xy 2 z| > p2 . Therefore, the length of p2 < |xyyz| < (p + 1)2 . So we arrive at the contradiction xyyz ∈ / D and conclude that D is not regular. D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
99 / 100
Non-Regular Languages
Example
Example 32 Let E = {0i 1j |i > j}. Use the pumping lemma to prove that E is not regular. Proof. The proof is by contradiction. Assume that E is regular. Let s = 0p+1 1p . We reach a contradiction by pumping down. The pumping lemma states that xy i z ∈ E even when i = 0, so the string xy 0 z = xz should also be in E. But |xy| ≤ p so y only has 0s. Thus removing string y decreases the number of 0s in s, and xz cannot have more 0s than 1s. (Recall that s has just one more 0 than 1.)
D. Diochnos (OU - CS)
Theory of Computation: Regular Languages
100 / 100