CS106 Compiler Principles and Construction Fall 2011 Lecture 4 Regular Expression
CS106
Compiler Principles and Construction
Fall 2011
Lecture 4 Regular Expression
MUST FIT
Dr. Zhiyao Liang
Regular Expression and NFA Dr. Zhiyao Liang
1
Regular
Languages, Expressions, and Grammars
There are three equivalent ways to say that L is a regular language:
L is a regular language if and only if:
There is a finite automata, say M, such that L=L(M)
There is a regular expression for L (that denotes L)
There is a regular grammar that generates L.
We will study regular expressions and regular grammars.
Regular Expression and NFA Dr. Zhiyao Liang
2
Definition: Regular Expression
Let Σ be a given alphabet. Then
∅ , λ, and a (for each a∈Σ), are regular expressions. These are called primitive regular expressions.
If r1 and r2 are regular expressions, so are
r1+r2 , r1∙r2 , r1* , and (r1).
A string is a regular expression if and only if it can be derived from the primitive regular expressions by a finite number of applications of the rules in 2.
Regular Expression and NFA Dr. Zhiyao Liang
3
Regular Expression
Example: For Σ = {a, b, c},
(a + b ∙ c)*∙(c + ∅) is a regular expression,
(a + b +) is not.
What is the meaning of a regular expression
Every regular expression represents a language.
Why is ( )needed?
To avoid ambiguity.
Regular Expression and NFA Dr. Zhiyao Liang
4
Definition: L(r)
The language L(r) denoted by any regular expression r is defined by the following rules.
∅ => { } /* empty set */
λ => {λ}
a => {a}, for every a ∈ Σ
If r1 and r2 are regular expressions, then
L(r1 + r2 ) = L(r1) ∪ L(r2)
L(r1∙r2 ) = L(r1)L(r2)
L((r1)) = L(r1)
L(r1*) = (L(r1))*
Regular Expression and NFA Dr. Zhiyao Liang
5
Regular Expression …
Precedence rules are used to avoid ambiguity when there is no ( ):
* ∙ +
a+b∙c = a+(b∙c) ¹ (a+b)∙c
a+b* ¹ (a+b)*
r+Æ = r = r∙l ¹ r+l
r1∙r2 is the same as r1r2
In some context, when c and d are two strings, then cd is a single string, not a language.
Example: For S = {a, b, c},
r = (a + b)*(a + bb) is regular
L(r) = L((a + b)*)L((a + bb)) = L((a+b)*) L(a+bb)
If w Î L((a + b)*) and w’ Î L(a + bb), then ww’ Î L(r)
Regular Expression and NFA Dr. Zhiyao Liang
6
Examples
For S = {a, b}, and r = (aa)*(bb)*b
how to denote L(r) by some set notation?
L(r) = {a2nb2m+1 : n ≥ 0, m ≥ 0}
For S = {0, 1}, give a regular expression r such that
L(r) = {w ∈ Σ*: w has at least one pair of consecutive zeros}
r = (0+1)*00(0+1)*
Regular Expression and NFA Dr. Zhiyao Liang
7
Examples
Find a regular expression for the language
L(r) = {w ∈ {0, 1}* : w has no pair of consecutive zeros}.
One answer: (class discussion)
r = (1* + (01))* + (1* + (01))*0
Other answers:
r = (1 + 01)*(0 + λ)
r = (1*01*)*(0 + λ) + 1*(0 + λ )
Maybe an easier way:
make an NFA M for L(r), then translate M to r.
Regular Expression and NFA Dr. Zhiyao Liang
8
Between Regular Expressions and NFA’s
Theorem: Let r be a regular expression. Then there exists some NFA that accepts L(r). Consequently, L(r) is a regular language.
Proof: We can construct an NFA for any r. Since regular expression are defined recursively, we build the NFA with structures, starting from the parts for the primitive regular expressions.
Correctness: r constructed from smaller expressions by operations. For each operation (structural induction) we can show that the constructed NFA represents the expression made by the operation.
Regular Expression and NFA Dr. Zhiyao Liang
9
Regular Expression NFA
The NFA’s for the primitive regular expressions: ∅, λ, and a:
Note that the NFA’s have a uniform style
One final state, and the final state is different from the starting state.
Regular Expression and NFA Dr. Zhiyao Liang
10
A normal form of NFA
For every NFA M, there is an NFA M’, such that L(M) = L(M’), and that in M’
there is only one final state,
and the final state is different from the starting state.
Why?
Add one extra state q’ to M.
In M, if q is final state, then make a λ edge from q to q’.
Make all final states in M non-final, and make q’ final.
The new automata is M’ .
Regular Expression and NFA Dr. Zhiyao Liang
11
The NFA for r1+r2
Regular Expression and NFA Dr. Zhiyao Liang
12
The NFA for r1r2
Regular Expression and NFA Dr. Zhiyao Liang
13
The NFA for r1*
Regular Expression and NFA Dr. Zhiyao Liang
14
NFA RE
Note that each edge of a NFA can be labeled with a regular expression.
λ
a
Label a,b can be represented as a + b
Given a transition graph G for an NFA. We want to make a generalized transition graph, say G’, so that if in G’, from q to q’ there is an edge labeled with a regular expression r, then it must be true that in G, for any string w ∈ L(r), It is true that
δ*(q, w) = q’.
Regular Expression and NFA Dr. Zhiyao Liang
15
The extended transition graph, which is translated from any NFA, has only two different states, the starting state and the final state .
The general form of such a NFA looks like:
Note that if there is no edge from q to q’, then we can consider an edge from q to q’ labeled with ∅.
What is the regular expression for this NFA?
r1*r2(r4 + r3r1*r2)*
r1*r2(r4* + r3r1*r2)*
NFA -> RE
Regular Expression and NFA Dr. Zhiyao Liang
16
NFA RE, the procedure
Starting from the transition graph G of an NFA, note that G is also a extended transition graph.
Make sure the NFA is in the normal form (just discussed): F = {qfinal}, and qfinal≠q0
Choose a a state q (it will be removed), for each q such that q≠q0 and q≠qfinal
Assume q has a self-loop edge labeled with r.
If q has no self-loop edge, we can consider it has one labeled with λ.
For each ordered pair of states (q1, q2) — it is possible q1=q2 — if there is an edge in G from q1 to q labeled with r1, and an edge in G from q to q2 labeled with r2, then
If there is an existing edge from q1 to q2 labeled with r’, change its label to r’+r1r*r2.
Otherwise, make a new edge from q1 to q2, labeled with r’+r1r*r2.
Remove q and all edges connected with q.
When G has only two states, qfinal and q0 , get the RE from G, as showed earlier, that is the RE for the NFA.
Regular Expression and NFA Dr. Zhiyao Liang
17
NFA RE: example
Graph made by jflap
Find a regular expression for the language:
L = {w ∈ {a, b}*: number of a is even and number of b is odd}
Starting
Regular Expression and NFA Dr. Zhiyao Liang
18
Removing the state OE
Regular Expression and NFA Dr. Zhiyao Liang
19
Removing OO
Regular Expression and NFA Dr. Zhiyao Liang
20
Practical Usage of Regular Expressions
Regular expressions are very useful.
Unix editors: emacs, vi
Commands: grep
Compilers, notation and syntax analysis
To check if a string agrees with a regular expression,
A simple Implementation:
Translate the RE to NFA, then to DFA.
Then implement the DFA into a program.
Run the program with the string.
Regular Expression and NFA Dr. Zhiyao Liang
21
Exercises
Find a regular expression for
L = {w ∈ {0, 1}*: w has exactly one pair of consecutive zeros}
Find the NFA for the language {w ∈ {0, 1}* : w has no pair of consecutive zeros}. Then translate the NFA to RE.
Regular Expression and NFA Dr. Zhiyao Liang
22
Exercise Cont.
What is the language accepted by the following generalized transition graph? What is the RE?
Regular Expression and NFA Dr. Zhiyao Liang
23
/docProps/thumbnail.jpeg