Coursework 1
This coursework is worth 5% and is due on 18 October at 18:00. You are asked
to implement a regular expression matcher and submit a document containing
the answers for the questions below. You can do the implementation in any
programming language you like, but you need to submit the source code with
which you answered the questions, otherwise a mark of 0% will be awarded.
You can submit your answers in a txt‑file or pdf. Code send as code. Please
package everything inside a zip‑file that creates a directory with the name
YournameYourfamilyname
on my end. Thanks!
Disclaimer
It should be understood that the work you submit represents your own effort.
You have not copied from anyone else. An exception is the Scala code I showed
during the lectures or uploaded to KEATS, which you can freely use.
If you have any questions, please send me an email in good time.
Task
The task is to implement a regular expression matcher based on derivatives of
regular expressions. The implementation should be able to deal with the usual
(basic) regular expressions
0, 1, c, r1 + r2, r1 · r2, r∗
but also with the following extended regular expressions:
[c1, c2, . . . , cn] a set of characters—for character ranges
r+ one or more times r
r? optional r
r{n} exactly n‑times
r{..m} zero or more times r but no more than m‑times
r{n..} at least n‑times r
r{n..m} at least n‑times r but no more than m‑times
∼ r not‑regular‑expression of r
You can assume that n and m are greater or equal than 0. In the case of r{n,m}
you can also assume 0 ≤ n ≤ m.
Important! Your implementation should have explicit case classes for the ba‑
sic regular expressions, but also explicit case classes for the extended regular
1
expressions.1 That means do not treat the extended regular expressions by just
translating them into the basic ones. See also Question 3, where you are asked
to explicitly give the rules for nullable and der for the extended regular expres‑
sions. Something like
der c (r+)
def
= der c (r · r∗)
is not allowed as answer in Question 3 and not allowed in your code.
The meanings of the extended regular expressions are
L([c1, c2, . . . , cn])
def
= {[c1], [c2], . . . , [cn]}
L(r+)
def
=
∪
1≤i . L(r)
i
L(r?)
def
= L(r) ∪ {[]}
L(r{n})
def
= L(r)n
L(r{..m})
def
=
∪
0≤i≤m . L(r)
i
L(r{n..})
def
=
∪
n≤i . L(r)
i
L(r{n..m})
def
=
∪
n≤i≤m . L(r)
i
L(∼ r) def= Σ∗ − L(r)
whereby in the last clause the set Σ∗ stands for the set of all strings over the
alphabet Σ (in the implementation the alphabet can be just what is represented
by, say, the type Char). So ∼ r means in effect “all the strings that r cannot
match”.
Be careful that your implementation of nullable and der satisfies for every regu‑
lar expression r the following two properties (see also Question 3):
• nullable(r) if and only if [] ∈ L(r)
• L(der c r) = Der c (L(r))
Question 1 (Unmarked)
What is your King’s email address (you will need it in Question 5)? Also could
you please let me know from where you will be mainly studying? (online /
in‑person, in London / somewhere else) Thanks!
Question 2 (Unmarked)
Can you please also list all programming languages in which you have already
written programs (include only instances where you have spent at least a good
working day fiddling with a program)? This is just for my curiosity to estimate
what your background is.
1Please call them RANGE, PLUS, OPTIONAL, NTIMES, UPTO, FROM and BETWEEN.
2
Question 3
From the lectures you have seen the definitions for the functions nullable and
der for the basic regular expressions. Implement and write down the rules for
the extended regular expressions:
nullable([c1, c2, . . . , cn])
def
= ?
nullable(r+) def= ?
nullable(r?) def= ?
nullable(r{n}) def= ?
nullable(r{..m}) def= ?
nullable(r{n..}) def= ?
nullable(r{n..m}) def= ?
nullable(∼ r) def= ?
der c ([c1, c2, . . . , cn])
def
= ?
der c (r+)
def
= ?
der c (r?)
def
= ?
der c (r{n})
def
= ?
der c (r{..m})
def
= ?
der c (r{n..})
def
= ?
der c (r{n..m})
def
= ?
der c (∼ r) def= ?
Remember your definitions have to satisfy the two properties
• nullable(r) if and only if [] ∈ L(r)
• L(der c r)) = Der c (L(r))
Given the definitions of nullable and der, it is easy to implement a regular ex‑
pression matcher. Test your regular expression matcher with (at least) the ex‑
amples:
string a? ∼ a a{3} (a?){3} a{..3} (a?){..3} a{3..5} (a?){3..5}
[]
a
aa
aaa
aaaaa
aaaaaa
Does your matcher produce the expected results? Make sure you also test
corner‑cases, like a{0}!
3
Question 4
As you can see, there are a number of explicit regular expressions that deal with
single or several characters, for example:
c matches a single character
[c1, c2, . . . , cn] matches a set of characters—for character ranges
ALL matches any character
The latter is useful for matching any string (for example by using ALL∗). In
order to avoid having an explicit constructor for each case, we can generalise
all these cases and introduce a single constructorCFUN( f )where f is a function
from characters to booleans. In Scala code this would look as follows:
abstract class Rexp
…
case class CFUN(f: Char => Boolean) extends Rexp
The idea is that the function f determineswhich character(s) arematched, namely
those where f returns true. In this question implement CFUN and define
nullable(CFUN( f )) def= ?
der c (CFUN( f )) def= ?
in your matcher and then also give definitions for
c
def
= CFUN(?)
[c1, c2, . . . , cn]
def
= CFUN(?)
ALL def= CFUN(?)
You can either add the constructor CFUN to your implementation in Question
3, or you can implement this questions first and then use CFUN instead of
RANGE and CHAR in Question 3. In an ideal world one would do this task first,
but this might confuse you with what you need to do in the previous question.
Question 5
Suppose [a‑z0‑9_ .‑] stands for the regular expression
[a, b, c, . . . , z, 0, . . . , 9, _, ., ‑] .
Define in your code the following regular expression for email addresses
([a‑z0‑9_ .−]+) · @ · ([a‑z0‑9 .−]+) · . · ([a‑z .]{2,6})
and calculate the derivative according to your own email address. When cal‑
culating the derivative, simplify all regular expressions as much as possible by
applying the following 7 simplification rules:
4
r · 0 7→ 0
0 · r 7→ 0
r · 1 7→ r
1 · r 7→ r
r + 0 7→ r
0+ r 7→ r
r + r 7→ r
Write down your simplified derivative in a readable notation using parentheses
where necessary. That means you should use the infix notation +, ·, ∗ and so
on, instead of raw code.
Question 6
Implement the simplification rules in your regular expression matcher. Con‑
sider the regular expression / · ∗ · (∼ (ALL∗ · ∗ · / · ALL∗)) · ∗ · / and decide
whether the following four strings are matched by this regular expression. An‑
swer yes or no.
1. “/**/”
2. “/*foobar*/”
3. “/*test*/test*/”
4. “/*test/*test*/”
Question 7
Let r1 be the regular expression a · a · a and r2 be (a{19,19}) · (a?).
Decidewhether the following three strings consisting of as only can bematched
by (r+1 )
+. Similarly test them with (r+2 )
+. Again answer in all six cases with
yes or no.
These are strings are meant to be entirely made up of as. Be careful when copy‑
and‑pasting the strings so as to not forgetting any a and to not introducing any
other character.
5. “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa”
6. “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa”
7. “aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa”
5