程序代写 RegExps & DFAs

RegExps & DFAs

Pre-class warm up

Copyright By PowCoder代写 加微信 powcoder

Write the regexp for Fortran real literals
An optional sign (‘+’ or ‘-‘)

An integer or:

1 or more digits followed by a ‘.’ followed by 0 or more digits

or: A ‘.’ followed by one or more digits

(‘+’|’-’|ε) (digit+(‘.’|ε) | (digit*’.’digit+))

Explored NFAs
for every NFA there is an equivalent DFA

epsilon edges add no expressive power

Introduce regular languages / expressions

Convert regexps to DFAs

From language recognizers to tokenizers

Regexp to NFAs

Literals/epsilon correspond to simple DFAs

Operators correspond to methods of joining

x^n, where n is even or divisible by 3

Regexp to NFA rules

Rules for operands

Regexp to NFA rules
Rules for alternation A|B

Make new start state q’ and new final state f’

Make original final states non-final

Regexp to NFA rules
Rule for catenation A.B

Make new start state q’ and new final state f’

Make original final states non-final

Regexp to NFA rules
Rule for iteration A*

Make new start state q’ and new final state f’

Make original final states non-final

Regexp operator
precedence

Operator Precedence Analogous math operator

| low addition

. medium multiplication

* high exponentiation

Tree representation of a

Operator Precedence

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Bottom-up conversion

Regexp to DFAs

We now have an NFA

We need to go to DFA

But what’s so great about DFAs?

Table-driven DFAs

Recall that δ can be expressed as a table

This leads to a very efficient array representation

s = start state
while (more input){
c = read char
s = table[s][c]

if s is final, accept

FSMs for tokenization

FSMs only check for language membership of a

the scanner needs to recognize a stream of many different
tokens using the longest match

the scanner needs to know what was matched

Idea: imbue states with actions that will fire
when state is reached

A first cut at actions

Consider the language of Pascal identifiers

BAD: not longest match

Accounting for longest matches

BAD: maybe we needed that

A second take at actions

Give our FSMs ability to put chars back

Our first scanner

Consider a language with two statements
assignments: ID = expr
increments: ID += expr

where expr is of the form

Identifiers ID follow C conventions

Combined DFA

perform action / update state

if (action was to return a token){
start again in start state

} (while not EOF or stuck);

Lexical analyzer generators

aka scanner generators

The transformation from regexp to scanner is
formally defined

Can write tools to synthesize a lexer
automatically

Lex: unix scanner generator

Flex: fast lex

JLex: Java version of Lex

J specification
tell it what you want scanned, it will figure out the rest

Input: set of regexps + associated actions
xyz.jlex file

Output: Java source code for a scanner
xyz.jlex.java source code of scanner

jlex format

3 sections separated by %%
user code section

directives

regular expressions + actions

Rules section

Format is {code} where regex is a regular expression for a single

can use macros from the directive sections in regex, surround with curly braces

Conventions
chars represent themselves (except special characters)

chars inside “” represent themselves (except \)

Regexp operators
| * + ? () .

Character class operators

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com