CS计算机代考程序代写 c/c++ compiler assembly Finite-State

Finite-State
Machines (FSMs)
CS 536

Some announcements
P1

TA office hours

3

A compiler is a
recognizer of language S (Source)
a translator from S to T (Target)
a program in language H (Host)

For example, gcc: S is C, T is x86, H is C

Last time

Why do we need a compiler?
• Processors can execute only binaries

(machine-code/assembly programs)

• Writing assembly programs will make you
lose your mind

• Write programs in a nice(ish) high-level
language like C; compile to binaries

Last time

5

front end = understand source code S
IR = intermediate representation
back end = map IR to T

Last time

6

front end

back end

Symbol
table

P1

P2

P3

P4, P5

P6

Last time

Special linkage between scanner
and parser in most compilers

Source
Program

lexical analyzer
(scanner)

syntax analyzer
(parser)

Sequence of characters

Sequence of tokens

Conceptual
organization

syntax analyzer
(parser)

next token,
please

source code
a < = p lexical analyzer (scanner) The scanner Translates sequence of chars into a sequence of tokens (ignoring whitespace) Each time the scanner is called it should: • find the longest prefix (lexeme) of the remaining input that corresponds to a token • return that token 8 a = 2 * b + abs(-71) ident (a) asgn int lit (2) times ident (b) plus ident (abs) lparens int lit (-71) rparens How to create a scanner? • For every possible lexeme that can occur in source program, return corresponding token • Inefficient • Error-prone Scanner generator • Generates a scanner • Inputs: - one regular expression for each token - one regular expressions for each item to ignore (comments, whitespace, etc.) • Output: scanner program • How does a scanner generator work? - Finite-state machines (FSMs) 10 FSMs: Finite State Machines (A.k.a. finite automata, finite-state automata, etc.) Input: string (sequence of chars) Output: accept / reject i.e., input is legal in language Language defined by an FSM is the set of strings accepted by the FSM 11 Example 1 Language: single line comments with // 12 • Nodes are states • Edges are transitions • Start state has an arrow (only one start state) • Final states are double circles (one or more) Example 1 Language: single line comments with // 13 1. “// this is a comment.” 2. “/ / this is not.” 3. “// \n” 4. “Not // a comment” Example 2 Language: Integer literals with an optional + or – (token: int-lit) e.g., -543, +15, 0007 14 1 ‘+’ ‘-’ 2 3 digit digit digit FSMs, formally 15 finite set of states the alphabet (characters) start state final states transition function ‘+’ ‘-’ digit 1 2 2 3 2 3 3 3 M ≡ L(M) = set of integer literals FSM example, formally 16 anything else, machine is stuck s1 s0 a b c s0 s1 M ≡ What is L(M)? L(M) = {ε, ab, abab, ababab, abababab, …. } Coding an FSM curr_state = start_state done = false while (!done) ch = nextChar() next = table[curr_state][ch] if (next == stuck || ch == EOF) done = true else curr_state = next return final_states.contains(curr_state) && next!=stuck 17 FSM types: DFA & NFA Deterministic no state has >1 outgoing edge with same label

Nondeterministic
states may have multiple outgoing edges with same label
edges may be labelled with special symbol ɛ (empty string)

ɛ-transitions can happen without reading input

18

NFA Example
Language: Integer literals with an optional + or –
(token: int-lit)

e.g., -543, +15, 0007

19

1
‘+’

‘-’
2

3

digit
digit

digit

1
‘+’, ‘-’

2
‘ε’

3

digit

digit

A string is accepted by an NFA if there exists a
sequence of transitions leading to a final state

Why NFA?
Simpler and more intuitive than DFA
Language: sequence of 0s and 1s, ending with 00

20

Extra example

A C/C++ identifier is a sequence of one or more
letters, digits, or underscores. It cannot start
with a digit.

21

Extra Example – Part 1
A C/C++ identifier is a sequence of one or more
letters, digits, or underscores. It cannot start
with a digit.

22

1
‘_’, letter

2

digit, letter, ‘_’

Extra example

A C/C++ identifier is a sequence of one or more
letters, digits, or underscores. It cannot start
with a digit.

What if you wanted to add the restriction that it
can’t end with an underscore?

23

Extra Example – Part 2
What if you wanted to add the restriction that it
can’t end with an underscore?

24

1
‘_’, letter

3

digit, letter,
‘_’

2
digit, letter

letter

Recap

The scanner reads a stream of characters and
tokenizes it (i.e., finds tokens)

Tokens are defined using regular expressions,
scanners are implemented using FSMs

FSMs can be non-deterministic

Next time: understand connection between
DFA and NFA, regular languages and regular
expressions

25

automatatutor.com

Loris D’Antoni

Play with automata!

http://automatatutor.com