Nondeterministic Finite
Automata (NFA)
Copyright By PowCoder代写 加微信 powcoder
Previous Lecture
Scanner: converts a sequence of characters to
a sequence of tokens
Scanner implemented using FSMs
FSM: DFA or NFA
This Lecture
NFAs from a formal perspective
Theorem: NFAs and DFAs are equivalent
Regular languages and Regular expressions
Creating a Scanner
Scanner Generator
NFAs, formally
finite set of states
the alphabet
(characters) start state
final states
transition function 0 1
s1 {s1} {s1,
To check if string is in L(M) of NFA M, simulate
set of choices it could make
1 1 1
s1 s2 st st
s1 s1 s2 st
s1 s1 s1 s2
s1 s1 s1 s1
At least one sequence of transitions that:
Consumes all input (without getting stuck)
Ends in one of the final states
NFA and DFA are Equivalent
Two automata M and M’ are equivalent iff L(M) = L(M’)
Lemmas to be proven
Lemma 1: Given a DFA M, one can construct an NFA
M’ that recognizes the same language as M, i.e., L(M’)
Lemma 2: Given an NFA M, one can construct a DFA
M’ that recognizes the same language as M, i.e., L(M’)
Proving Lemma 2
Lemma 2: Given an NFA M, one can construct a DFA M’
that recognizes the same language as M, i.e., L(M’) =
Part 1: Given an NFA M without ε–transitions, one can
construct a DFA M’ that recognizes the same language as M
Part 2: Given an NFA M with ε–transitions, one can construct an
NFA M’ without ε–transitions that recognizes the same
language as M
NFA w/ ε NFA w/o ε DFA
Part 2 Part 1
NFA w/o ε–Transitions to
NFA M to DFA M’
Intuition: Use a single state in M’ to simulate
sets of states in M
M has |Q| states
M’ can have only up to 2|Q| states
Defn: let succ(s,c) be the set of choices the NFA
could make in state s with character c
succ(A,x) = {A,B}
succ(A,y) = {A}
succ(B,x) = {C}
succ(B,y) = {C}
succ(C,x) = {D}
succ(C,y) = {D}
A {A, B} {A}
NFA w/o ε–Transitions to
Build new DFA M’ where Q’ = 2Q
succ(A,x) = {A,B}
succ(A,y) = {A}
succ(B,x) = {C}
succ(B,y) = {C}
succ(C,x) = {D}
succ(C,y) = {D}
To build DFA: Add an edge from state S on character c to state S’
if S’ represents the set of all states that a state in
S could possibly transition to on input c
A {A, B} {A}
Proving Lemma 2
Lemma 2: Given an NFA M, one can construct a DFA M’
that recognizes the same language as M, i.e., L(M’) =
Part 1: Given an NFA M without ε–transitions, one can
construct a DFA M’ that recognizes the same language as M
Part 2: Given an NFA M with ε–transitions, one can construct an
NFA M’ without ε–transitions that recognizes the same
language as M
ɛ-transitions
E.g.: xn, where n is even or divisible by 3
Useful for taking union of two FSMs
In example, left side accepts even n;
right side accepts n divisible by 3
Eliminating ɛ-transitions
Definition: Epsilon Closure
eclose(s) = set of all states reachable from s using
zero or more epsilon transitions
We want to construct ɛ-free NFA M’ that is equivalent to M
P {P, Q, R}
Proving Lemma 2
Lemma 2: Given an NFA M, one can construct a DFA M’
that recognizes the same language as M, i.e., L(M’) =
Part 1: Given an NFA M without ε–transitions, one can
construct a DFA M’ that recognizes the same language as M
Part 2: Given an NFA M with ε–transitions, one can construct an
NFA M’ without ε–transitions that recognizes the same
language as M
Summary of FSMs
DFAs and NFAs are equivalent
An NFA can be converted into a DFA, which can be
implemented via the table-driven approach
ɛ-transitions do not add expressiveness to NFAs
Algorithm to remove ɛ-transitions
Regular Languages and
Regular Expressions
Regular Language
Any language recognized by an FSM is a
regular language
• Single-line comments beginning with //
• Integer literals
• {ε, ab, abab, ababab, abababab, …. }
• C/C++ identifiers
Regular Expression
A pattern that defines a regular language
Regular language: set of (potentially infinite) strings
Regular expression: represents a set of (potentially
infinite) strings by a single pattern
{ε, ab, abab, ababab, abababab, …. } ⇔ (ab)*
Why do we need them?
Each token in a programming language can be
defined by a regular language
Scanner-generator input: one regular
expression for each token to be recognized by
Regular expressions are inputs to a scanner
Regular Expression
operands: single characters, epsilon
operators: from low to high precedence
“or”: a | b
“followed by”: a.b, ab
“Kleene star”: a* (0 or more a-s)
Regular Expression
Conventions:
aa is a . a
letter is a|b|c|d|…|y|z|A|B|…|Z
digit is 0|1|2|…|9
not(x) all characters except x
. is any character
parentheses for grouping, e.g., (ab)* is {ɛ, ab, abab, ababab, … }
Regexp, example
Precedence: * > . > |
digit | letter letter
(digit) | (letter . letter)
one digit, or two letters
digit | letter letter*
(digit) | (letter . (letter)*)
one digit, or one or more letters
digit | letter+
Regexp, example
Hex strings
start with 0x or 0X
followed by one or more hexadecimal digits
optionally end with l or L
0(x|X)hexdigit+(L|l|ɛ)
where hexdigit = digit|a|b|c|d|e|f|A|…|F
Regexp, example
Integer literals: sequence of digits preceded by
optional +/-
Example: -543, +15, 0007
Regular expression
(+|-|ε)digit+
Regexp, example
Single-line comments
Example: // this is a comment
Regular expression
//(not(‘\n’))*’\n’
Regexp, example
C/C++ identifiers: sequence of letters/digits/
underscores; cannot begin with a digit; cannot
end with an underscore
Example: a, _bbb7, cs_536
Regular expression
letter | (letter|_)(letter|digit|_)*(letter|digit)
Regular Languages
Languages recognized/defined by FSMs
Regular Expressions
Single-pattern representations of regular languages
Used for defining tokens in a scanner generator
Creating a Scanner
Scanner Generator
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com