程序代写代做代考 Fortran Java flex RegExps & DFAs

RegExps & DFAs

RegExps & DFAs
CS 536

Pre-class warm up
Write the regexp for Fortran real literals
An optional sign (‘+’ or ‘-‘)
An integer or:
1 or more digits followed by a ‘.’ followed by 0 or more digits
or: A ‘.’ followed by one or more digits

(‘+’|’-’|ε) (digit+(‘.’|ε) | (digit*’.’digit+))
2

Last time
Explored NFAs
for every NFA there is an equivalent DFA
epsilon edges add no expressive power
Introduce regular languages / expressions
3

Today
Convert regexps to DFAs
From language recognizers to tokenizers
4

Regexp to NFAs
Literals/epsilon correspond to simple DFAs
Operators correspond to methods of joining DFAs
5

x^n, where n is even or divisible by 3

Regexp to NFA rules
Rules for operands
6

Regexp to NFA rules
Rules for alternation A|B
7

Make new start state q’ and new final state f’
Make original final states non-final
Add to δ:
q’,ε → qA
q’,ε → qB
Fa,ε→f’
Fb,ε→f’

Regexp to NFA rules
Rule for catenation A.B
8
Make new start state q’ and new final state f’
Make original final states non-final
Add to δ:
q’,ε → qA
fA,ε → qB
fb,ε→f’

Regexp to NFA rules
Rule for iteration A*
9
Make new start state q’ and new final state f’
Make original final states non-final
Add to δ:
q’,ε → qA
q’,ε→f’
f’,ε→qA

Regexp operator precedence
10
Operator Precedence Analogous math operator
| low addition
. medium multiplication
* high exponentiation

Tree representation of a regexp
11

Operator Precedence
| low
. medium
* high

Bottom-up conversion
12

Bottom-up conversion
13

Bottom-up conversion
14

Bottom-up conversion
15

Bottom-up conversion
16

Bottom-up conversion
17

Bottom-up conversion
18

Bottom-up conversion
19

Bottom-up conversion
20

Regexp to DFAs
We now have an NFA
We need to go to DFA
21

But what’s so great about DFAs?

Table-driven DFAs
Recall that δ can be expressed as a table
This leads to a very efficient array representation
22

s = start state
while (more input){
c = read char
s = table[s][c]
}
if s is final, accept

FSMs for tokenization
FSMs only check for language membership of a string
the scanner needs to recognize a stream of many different tokens using the longest match
the scanner needs to know what was matched
Idea: imbue states with actions that will fire when state is reached
23

A first cut at actions
Consider the language of Pascal identifiers
24

BAD: not longest match
Accounting for longest matches

BAD: maybe we needed that
character

A second take at actions
Give our FSMs ability to put chars back
25

Our first scanner
Consider a language with two statements
assignments: ID = expr
increments: ID += expr
where expr is of the form
ID + ID
ID ^ ID
ID < ID ID <= ID Identifiers ID follow C conventions 26 Combined DFA 27 29 do{ read char perform action / update state if (action was to return a token){ start again in start state } } (while not EOF or stuck); Lexical analyzer generators aka scanner generators The transformation from regexp to scanner is formally defined Can write tools to synthesize a lexer automatically Lex: unix scanner generator Flex: fast lex JLex: Java version of Lex 30 JLex Declarative specification tell it what you want scanned, it will figure out the rest Input: set of regexps + associated actions xyz.jlex file Output: Java source code for a scanner xyz.jlex.java source code of scanner 31 jlex format 3 sections separated by %% user code section directives regular expressions + actions 32 33 Rules section Format is {code} where regex is a regular expression for a single token
can use macros from the directive sections in regex, surround with curly braces
Conventions
chars represent themselves (except special characters)
chars inside “” represent themselves (except \)
Regexp operators
| * + ? () .
Character class operators
– range
^ not
\ escape
34

35