CS计算机代考程序代写 compiler flex FIT2014 Theory of Computation Lecture 6 Regular Expressions

FIT2014 Theory of Computation Lecture 6 Regular Expressions

Monash University
Faculty of Information Technology

FIT2014 Theory of Computation

Lecture 6
Regular Expressions

slides by Graham Farr
based in part on previous slides by David Albrecht

COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969

Warning
This material has been reproduced and communicated to you by or on behalf of Monash University

in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.

Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.

1 / 22

Overview

I Some Problems

I Applications of Regular Expressions

I Regular Expressions

I Regular Languages

2 / 22

Some Problems

I Find all the files which contain old subject course codes.

I Find all the e-mail addresses in a set of mail files.

I Change the way comments in programs are formatted in your web pages.

I Using web server access files, record how many times each page is visited, and
how many times each link is used.

3 / 22

Applications of Regular Expressions

I Useful way to describe simple patterns.

I Used in several programs:

I Editors: vi, emacs

I Filters: egrep, sed, gawk

I Programming languages: JFlex, CUP, Perl

4 / 22

Filters

egrep

I A program which searches a file for a pattern described by a regular expression.

sed

I A program which enables stream editing of files.

awk, nawk, gawk

I Programming languages which enable text manipulation.

5 / 22

Programming Languages

JFlex, flex, lex

I Languages used to generate lexical analysers.

CUP, bison, yacc

I Languages used to generate compilers.

Perl

I A powerful scripting language, developed in the 1980s by Larry Wall.

6 / 22

Regular Expressions for Small Languages

The empty language ∅

Language {ε} consisting only of the empty word ε

Language {w} consisting only of the single word w w

E.g.:
the language {abbab} abbab

7 / 22

Alternatives, Grouping

Alternatives
are indicated by ∪.
E.g.:

1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9
is a regular expression for:
{1, 2, 3, 4, 5, 6, 7, 8, 9}

Grouping
is indicated by ( ).
E.g.:

(ab ∪ ba)(e ∪ g)
is a regular expression for:
{abe, abg, bae, bag}

8 / 22

Finite Languages

consist of finite number of words.

E.g.:
{abaaba, abbbba, abbaba}

Regular Expression:
abaaba ∪ abbbba ∪ abbaba

. . . or:
ab(aa ∪ bb ∪ ba)ba

9 / 22

Kleene Star

Matching an expression zero or more times is indicated by ∗

For example:

a∗ represents {ε, a, aa, aaa, aaaa, . . .}

(ab)∗ represents {ε, ab, abab, ababab, . . .}

ab∗ represents {a, ab, abb, abbb, abbbb, . . .}

Note: ab∗ 6= (ab)∗

10 / 22

Kleene Star

(aa ∪ bb)∗ = (aa ∪ bb)0 ∪ (aa ∪ bb)1 ∪ (aa ∪ bb)2 ∪ · · ·
= ε ∪ (aa ∪ bb) ∪ (aa ∪ bb)(aa ∪ bb) ∪ · · ·

represents:

{ ε, aa, bb, aaaa, aabb, bbaa, bbbb, aaaaaa, aaaabb, aabbaa, . . . }

11 / 22

Parse Tree

(a ∪ ε)b∗

(a ∪ ε) b∗

a ∪ ε b

a ε

12 / 22

Definition

1. ∅ and ε are regular expressions

2. All letters in the alphabet are regular expressions.

3. If R and S are regular expressions, then so are:

(i) (R)

(ii) RS

(iii) R ∪ S

(iv) R∗

13 / 22

Regular Language

A language which can be described by a regular expression is called a regular
language.

If a word belongs to the language described by a regular expression, then we say it is
matched by the regular expression.

14 / 22

Example: EVEN-EVEN

Recall:

EVEN-EVEN = {All strings in which a and b each occur an even number of times }
= {ε, aa, bb, aaaa, aabb, abab, abba, . . .}.

Regular Expression:

( aa ∪ bb ∪ (ab ∪ ba)(aa ∪ bb)∗(ab ∪ ba) )∗

15 / 22

Things to think about . . .

Is the set of all English words (in some standard dictionary) a regular language?

Is DOUBLEWORD (see Lecture 1) a regular language?

Is PALINDROME a regular language?

Is the set of all grammatical English sentences a regular language?

How would you determine, for a given string and regular expression, whether the string
matches the regular expression?

16 / 22

Example: Floating Point Number

A floating point number has one or more digits, which may begin with a minus sign (−),
and which may contain a decimal point.

E.g.,
0 1.2 − 3 − 4.675 002 023.50

17 / 22

Sequence of Digits

One Digit
0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9

Two Digits

(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)

Three Digits

(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)

One or more Digits

(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)∗

18 / 22

Sequences of Digits

Digit
D = (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)

Two Digits
DD or D2

Three Digits
DDD or D3

One or more Digits
DD∗

19 / 22

Numbers

One Digit
D = (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)

Nonnegative Integers

N = DD∗ e.g.: 1 123 1209 002 020

Integers
Z = N ∪ (−N)

Floating Point Number

F = Z ∪ (Z .) ∪ (.N) ∪ (−.N) ∪ (Z .N)

20 / 22

Other Notations

alternative notation

R ∪ S R | S
0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9 [0-9]
any letter a to z [a-z]
RR∗ R+

ε ∪ R R?

Be careful! Many variations exist. Tools that use regexps differ in many details:

I whether or not they use the above special meanings of + and ? ;
I how they handle the parentheses and vertical bar for alternatives

I sometimes (· · · |· · · )
I sometimes \(· · · \|· · · \)

I how they use full stop (often represents any non-newline character);
I how they represent newline characters
I . . .

21 / 22

Revision

I Regular Expressions

I Definitions

I How to use them to define languages

I Regular languages

Read Sipser, §1.3, pp 63–66.

Additional Reading

I Jeffrey E.F. Friedl, Mastering Regular Expressions: Powerful Techniques for Perl
and Other Tools, O’Reilly, 1997.

22 / 22