程序代写代做代考 Fortran Java go C flex RegExps & DFAs

RegExps & DFAs
CS 536

Pre-class warm up
Write the regexp for Fortran real literals
An optional sign (¡®+¡¯ or ¡®-¡®)
An integer or:
1 or more digits followed by a ¡®.¡¯ followed by 0 or more digits or: A ¡®.¡¯ followed by one or more digits
(¡®+¡¯|¡¯-¡¯|¦Å) (digit+(¡®.¡¯|¦Å) | (digit*¡¯.¡¯digit+))
2

Last time
Explored NFAs
for every NFA there is an equivalent DFA epsilon edges add no expressive power
Introduce regular languages / expressions
3

Today
Convert regexps to DFAs
From language recognizers to tokenizers
4

Regexp to NFAs
Literals/epsilon correspond to simple DFAs
Operators correspond to methods of joining
x^n, where n is even or divisible by 3
DFAs
5

Regexp to NFA rules
Rules for operands
6

Regexp to NFA rules
Rules for alternation A|B
Make new start state q¡¯ and new final state f¡¯
Make original final states non-final
Add to ¦Ä:
q¡¯,¦Å ¡ú qA q¡¯,¦Å ¡ú qB Fa,¦Å¡úf¡¯ Fb,¦Å¡úf¡¯
7

Regexp to NFA rules
Rule for catenation A.B
Make new start state q¡¯ and new final state f¡¯
Make original final states non-final
Add to ¦Ä:
q¡¯,¦Å ¡ú qA fA,¦Å ¡ú qB fb,¦Å¡úf¡¯
8

Regexp to NFA rules
Rule for iteration A*
Make new start state q¡¯ and new final state f¡¯
Make original final states non-final
Add to ¦Ä:
q¡¯,¦Å ¡ú qA q¡¯,¦Å¡úf¡¯ f¡¯,¦Å¡úqA
9

Regexp operator precedence
Operator Precedence
Analogous math operator
|
low
addition
.
medium multiplication
*
high
exponentiation
10

Tree representation of a regexp
Operator Precedence
| *
low
.
medium
high
11

Bottom-up conversion
12

Bottom-up conversion
13

Bottom-up conversion
14

Bottom-up conversion
15

Bottom-up conversion
16

Bottom-up conversion
17

Bottom-up conversion
18

Bottom-up conversion
19

Bottom-up conversion
20

Regexp to DFAs
We now have an NFA We need to go to DFA
But what¡¯s so great about DFAs?
21

Table-driven DFAs
Recall that ¦Ä can be expressed as a table
This leads to a very efficient array representation
s = start state
while (more input){
c = read char
s = table[s][c]
}
if s is final, accept
22

FSMs for tokenization
FSMs only check for language membership of a string
the scanner needs to recognize a stream of many different tokens using the longest match
the scanner needs to know what was matched
Idea: imbue states with actions that will fire when state is reached
23

A first cut at actions
Consider the language of Pascal identifiers
BAD: not longest match
Accounting for longest matches
BAD: maybe we needed that character
24

A second take at actions
Give our FSMs ability to put chars back
25

Our first scanner
Consider a language with two statements assignments: ID = expr
increments: ID += expr
where expr is of the form ID + ID
ID ^ ID
ID < ID ID <= ID Identifiers ID follow C conventions 26 Combined DFA 27 do{ read char perform action / update state if (action was to return a token){ start again in start state } } (while not EOF or stuck); 29 Lexical analyzer generators aka scanner generators The transformation from regexp to scanner is formally defined Can write tools to synthesize a lexer automatically Lex: unix scanner generator Flex: fast lex JLex: Java version of Lex 30 JLex Declarative specification tell it what you want scanned, it will figure out the rest Input: set of regexps + associated actions xyz.jlex file Output: Java source code for a scanner xyz.jlex.java source code of scanner 31 jlex format 3 sections separated by %% user code section directives regular expressions + actions 32 33 Rules section Format is {code} where regex is a regular expression for a single token
can use macros from the directive sections in regex, surround with curly braces
Conventions
chars represent themselves (except special characters) chars inside ¡°¡± represent themselves (except \)
Regexp operators | * + ? () .
Character class operators – range
^ not
\ escape
34

35