Compiler Theory
Assignment 3, January 29, 2019
Regular Expressions
Reading: Chapter 3.3: Regular Expressions
Chapter 3.4: Finite State Automata
Chapter 3.5: From Regular Expressions to Finite State Automata
A “regular expression” is a string of characters that specifies a pattern that can
be test-matched to strings made up of characters from a given alphabet. Regular
expressions are constructed as follows:
Atomic expressions:
single character from the given alphabet: matches the character itself
e (epsilon) the empty string – matches a zero-length string
[a-z] character class; matches a single character in the range or ranges given
inside the square brackets, i.e. [a-zA-Z0-9] is one alphanumeric character.
Compound expressions (r and s are regular expressions here)
rs concatenation, r followed by s; matches a string matching r immediately
followed by a string matching s.
r|s r or s; matches a string matching r or matching s.
r* Kleene closure; matches the concatenation of zero or more strings
matching r; may be different strings.
r+ positive closure; matches the concatenation of one or mere strings
matching r; may be different strings.
(r) parentheses for controlling the order of operations.
Precedence: In the absence of parentheses, * and + have highest precedence, then
concatenation, then |.
Quoting: If the alphabet contains any of the characters that have special meaning
in regular expression syntax, such as | * + ( ) [ ] -, then those special
characters can be used as ordinary characters by quoting them, i.e. ‘+’.
Assignment:
Write regular expressions and draw finite state automata for the following
languages. The alphabet for all is {a,b,c}. Empty strings are legal unless
stated otherwise. A string is accepted if the statement is true about it,
regardless of what else is true about the string.
1. String contains exactly one a. ( cabcc is ok, but not cabbac or bbcc )
2. String contains exactly two a’s. ( aa and babbca are ok, but not aaaa )
OVER
3. String contains an even number of a’s (zero counts as even).
( abaabba is ok, but not abccaabb )
4. The first b (if any) occurs before the first c (if any).
( aababbcab, aaacc, and abba are ok, but not aaacabcaa )
5. All b’s occur before all c’s.
( aababbaccaa, acca, and bba are ok, but not aacab or abcab )
6. The pair ab does not occur.
( acacba is ok, but not aacbcabb )