FIT2014 Theory of Computation Lecture 6 Regular Expressions
Monash University
Faculty of Information Technology
FIT2014 Theory of Computation
Lecture 6
Regular Expressions
slides by Graham Farr
based in part on previous slides by David Albrecht
COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969
Warning
This material has been reproduced and communicated to you by or on behalf of Monash University
in accordance with s113P of the Copyright Act 1968 (the Act).
The material in this communication may be subject to copyright under the Act.
Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice.
1 / 22
Overview
I Some Problems
I Applications of Regular Expressions
I Regular Expressions
I Regular Languages
2 / 22
Some Problems
I Find all the files which contain old subject course codes.
I Find all the e-mail addresses in a set of mail files.
I Change the way comments in programs are formatted in your web pages.
I Using web server access files, record how many times each page is visited, and
how many times each link is used.
3 / 22
Applications of Regular Expressions
I Useful way to describe simple patterns.
I Used in several programs:
I Editors: vi, emacs
I Filters: egrep, sed, gawk
I Programming languages: JFlex, CUP, Perl
4 / 22
Filters
egrep
I A program which searches a file for a pattern described by a regular expression.
sed
I A program which enables stream editing of files.
awk, nawk, gawk
I Programming languages which enable text manipulation.
5 / 22
Programming Languages
JFlex, flex, lex
I Languages used to generate lexical analysers.
CUP, bison, yacc
I Languages used to generate compilers.
Perl
I A powerful scripting language, developed in the 1980s by Larry Wall.
6 / 22
Regular Expressions for Small Languages
The empty language ∅
Language {ε} consisting only of the empty word ε
Language {w} consisting only of the single word w w
E.g.:
the language {abbab} abbab
7 / 22
Alternatives, Grouping
Alternatives
are indicated by ∪.
E.g.:
1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9
is a regular expression for:
{1, 2, 3, 4, 5, 6, 7, 8, 9}
Grouping
is indicated by ( ).
E.g.:
(ab ∪ ba)(e ∪ g)
is a regular expression for:
{abe, abg, bae, bag}
8 / 22
Finite Languages
consist of finite number of words.
E.g.:
{abaaba, abbbba, abbaba}
Regular Expression:
abaaba ∪ abbbba ∪ abbaba
. . . or:
ab(aa ∪ bb ∪ ba)ba
9 / 22
Kleene Star
Matching an expression zero or more times is indicated by ∗
For example:
a∗ represents {ε, a, aa, aaa, aaaa, . . .}
(ab)∗ represents {ε, ab, abab, ababab, . . .}
ab∗ represents {a, ab, abb, abbb, abbbb, . . .}
Note: ab∗ 6= (ab)∗
10 / 22
Kleene Star
(aa ∪ bb)∗ = (aa ∪ bb)0 ∪ (aa ∪ bb)1 ∪ (aa ∪ bb)2 ∪ · · ·
= ε ∪ (aa ∪ bb) ∪ (aa ∪ bb)(aa ∪ bb) ∪ · · ·
represents:
{ ε, aa, bb, aaaa, aabb, bbaa, bbbb, aaaaaa, aaaabb, aabbaa, . . . }
11 / 22
Parse Tree
(a ∪ ε)b∗
(a ∪ ε) b∗
a ∪ ε b
a ε
12 / 22
Definition
1. ∅ and ε are regular expressions
2. All letters in the alphabet are regular expressions.
3. If R and S are regular expressions, then so are:
(i) (R)
(ii) RS
(iii) R ∪ S
(iv) R∗
13 / 22
Regular Language
A language which can be described by a regular expression is called a regular
language.
If a word belongs to the language described by a regular expression, then we say it is
matched by the regular expression.
14 / 22
Example: EVEN-EVEN
Recall:
EVEN-EVEN = {All strings in which a and b each occur an even number of times }
= {ε, aa, bb, aaaa, aabb, abab, abba, . . .}.
Regular Expression:
( aa ∪ bb ∪ (ab ∪ ba)(aa ∪ bb)∗(ab ∪ ba) )∗
15 / 22
Things to think about . . .
Is the set of all English words (in some standard dictionary) a regular language?
Is DOUBLEWORD (see Lecture 1) a regular language?
Is PALINDROME a regular language?
Is the set of all grammatical English sentences a regular language?
How would you determine, for a given string and regular expression, whether the string
matches the regular expression?
16 / 22
Example: Floating Point Number
A floating point number has one or more digits, which may begin with a minus sign (−),
and which may contain a decimal point.
E.g.,
0 1.2 − 3 − 4.675 002 023.50
17 / 22
Sequence of Digits
One Digit
0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9
Two Digits
(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)
Three Digits
(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)(0∪1∪2∪3∪4∪5∪6∪7∪8∪9)
One or more Digits
(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)(0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)∗
18 / 22
Sequences of Digits
Digit
D = (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)
Two Digits
DD or D2
Three Digits
DDD or D3
One or more Digits
DD∗
19 / 22
Numbers
One Digit
D = (0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9)
Nonnegative Integers
N = DD∗ e.g.: 1 123 1209 002 020
Integers
Z = N ∪ (−N)
Floating Point Number
F = Z ∪ (Z .) ∪ (.N) ∪ (−.N) ∪ (Z .N)
20 / 22
Other Notations
alternative notation
R ∪ S R | S
0 ∪ 1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6 ∪ 7 ∪ 8 ∪ 9 [0-9]
any letter a to z [a-z]
RR∗ R+
ε ∪ R R?
Be careful! Many variations exist. Tools that use regexps differ in many details:
I whether or not they use the above special meanings of + and ? ;
I how they handle the parentheses and vertical bar for alternatives
I sometimes (· · · |· · · )
I sometimes \(· · · \|· · · \)
I how they use full stop (often represents any non-newline character);
I how they represent newline characters
I . . .
21 / 22
Revision
I Regular Expressions
I Definitions
I How to use them to define languages
I Regular languages
Read Sipser, §1.3, pp 63–66.
Additional Reading
I Jeffrey E.F. Friedl, Mastering Regular Expressions: Powerful Techniques for Perl
and Other Tools, O’Reilly, 1997.
22 / 22