程序代写代做代考 compiler assembly c/c++ Finite-state machines

Finite-state machines

Finite-State Machines (FSMs)
CS 536

Some announcements
P1
TA office hours

3

A compiler is a
recognizer of language S (Source)
a translator from S to T (Target)
a program in language H (Host)

For example, gcc: S is C, T is x86, H is C
Last time

Why do we need a compiler?
Processors can execute only binaries (machine-code/assembly programs)
Writing assembly programs will make you lose your mind
Write programs in a nice(ish) high-level language like C; compile to binaries
Last time

5

front end = understand source code S
IR = intermediate representation
back end = map IR to T
Last time

6

front end
back end

Symbol
table
P1
P2
P3
P4, P5
P6
Last time

Special linkage between scanner and parser in most compilers
7
Source Program
lexical analyzer (scanner)
syntax analyzer (parser)
Sequence of characters
Sequence of tokens

Conceptual
organization

syntax analyzer (parser)

next token,
please

source code
a < = p lexical analyzer (scanner) The scanner Translates sequence of chars into a sequence of tokens (ignoring whitespace) Each time the scanner is called it should: find the longest prefix (lexeme) of the remaining input that corresponds to a token return that token 8 a = 2 * b + abs(-71) ident (a) asgn int lit (2) times ident (b) plus ident (abs) lparens int lit (-71) rparens How to create a scanner? For every possible lexeme that can occur in source program, return corresponding token Inefficient Error-prone Scanner generator Generates a scanner Inputs: - one regular expression for each token - one regular expressions for each item to ignore (comments, whitespace, etc.) Output: scanner program How does a scanner generator work? - Finite-state machines (FSMs) 10 FSMs: Finite State Machines (A.k.a. finite automata, finite-state automata, etc.) Input: string (sequence of chars) Output: accept / reject i.e., input is legal in language Language defined by an FSM is the set of strings accepted by the FSM 11 Example 1 Language: single line comments with // 12 Nodes are states Edges are transitions Start state has an arrow (only one start state) Final states are double circles (one or more) Example 1 Language: single line comments with // 13 “// this is a comment.” “/ / this is not.” “// \n” “Not // a comment” Example 2 Language: Integer literals with an optional + or – (token: int-lit) e.g., -543, +15, 0007 14 1 ‘+’ ‘-’ 2 3 digit digit digit FSMs, formally 15 finite set of states the alphabet (characters) start state final states transition function ‘+’ ‘-’ digit 1 2 2 3 2 3 3 3 M ≡ L(M) = set of integer literals FSM example, formally 16 anything else, machine is stuck s1 s0 a b c s0 s1 M ≡ What is L(M)? L(M) = {ε, ab, abab, ababab, abababab, …. } Coding an FSM curr_state = start_state done = false while (!done) ch = nextChar() next = table[curr_state][ch] if (next == stuck || ch == EOF) done = true else curr_state = next return final_states.contains(curr_state) && next!=stuck 17 FSM types: DFA & NFA Deterministic no state has >1 outgoing edge with same label
Nondeterministic
states may have multiple outgoing edges with same label
edges may be labelled with special symbol ɛ (empty string)

ɛ-transitions can happen without reading input
18

NFA Example
Language: Integer literals with an optional + or – (token: int-lit)
e.g., -543, +15, 0007

19
1
‘+’
‘-’
2
3

digit
digit
digit
1
‘+’, ‘-’
2
‘ε’
3

digit
digit
A string is accepted by an NFA if there exists a
sequence of transitions leading to a final state

Why NFA?
Simpler and more intuitive than DFA
Language: sequence of 0s and 1s, ending with 00

20

Extra example
A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit.
21

Extra Example – Part 1
A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit.

22
1
‘_’, letter
2

digit, letter, ‘_’

Extra example
A C/C++ identifier is a sequence of one or more letters, digits, or underscores. It cannot start with a digit.
What if you wanted to add the restriction that it can’t end with an underscore?
23

Extra Example – Part 2
What if you wanted to add the restriction that it can’t end with an underscore?

25
1
‘_’, letter
3

digit, letter, ‘_’
2
digit, letter
letter

Recap
The scanner reads a stream of characters and tokenizes it (i.e., finds tokens)
Tokens are defined using regular expressions, scanners are implemented using FSMs
FSMs can be non-deterministic
Next time: understand connection between DFA and NFA, regular languages and regular expressions
27

automatatutor.com

Loris D’Antoni

Play with automata!