Overview
在本作业中,您将重构一个文本匹配正则表达式引擎以使用 一个更强类型的接口。这些更强的类型不仅减少了可能性 对于错误,同时也提高了代码库的可读性和清晰度。在上面 强类型版本,您将实现许多额外的便利组合 nators 用于处理正则表达式。这个任务会给你经验 使用 Haskell 类型系统扩展(特别是 GADTs),有编程经验 使用 monads 和 applicative functors,并向您介绍一些概念,例如 alternatives。
1Provided Code
提供的代码由多个模块组成:
Hare.hs contains the stubs of the code you should implement.
Untyped.hs contains an untyped, unextended version of the regex engine.
HareMonad.hs contains the monadic type used to write the matching algorithm.
Tests.hs contains the main function for running tests.
Tests/Support.hs contains the support code for running tests.
Tests/UnitTests.hs contains properties for the basic regular expressions.
Tests/Transcript.hs contains some acceptance tests for your combinators, for analysing
a UNSW transcript.
Tests/Examples.hs contains all examples from this spec, as unit tests.
Note: 只能修改‘Hare.hs’, so make sure your submission compiles
with the original versions of all other fifiles.
Regular Expressions
正则表达式是描述和操作字符串模式的流行方式。
它们通常用于搜索与特定模式匹配的子字符串,并且然后从该子字符串中提取信息。
许多语言都将正则表达式作为内置功能或在其标准中图书馆。如果你做过COMP2041,你会接触过好几个这样的局域网规。不幸的是,Haskell 不是这样的语言。为了解决这个问题,海伦野兔
开始设计 H.A.R.E,一个 Haskell 的正则表达式库。她首先将正则表达式的基本构建块编码为 Haskell数据类型。
data RE = Empty – 匹配空 string ‘’ ‘’
| Fail – 匹配没有string
| Char [Char] — Matches a single character from the list
| Seq RE RE — Matches the two expressions in sequence
| Choose RE RE — Matches either the first, or the second
| Star RE — Matches the expression zero or more times
注意:如果你之前看过正则表达式,你可能会习惯更多的功能 比上面那些可用。一般来说,这些特征可以转化为 上面给出的特征。 下表给出了这些正则表达式的一些简单示例。
2Regular Expression Matches
Star (Char [‘0’..’9′]) `Seq` Char [‘0’..’9′] “1234”,”039″,”0″,…
Star (Char [‘a’]) `Choose` Star (Char [‘b’]) “”,”a”,”b”,”aa”,”bb”..
Fail — nothing
Empty “”
为了定义正则表达式的匹配算法,海伦定义了一个类型 称为 Hare(可在 HareMonad.hs 中找到),它与 State String 类似,不同之处在于 返回类型包含在 monad f 中
:
newtype Hare f a = Hare { runHare :: String -> f (String, a) }
The hare function can be used to “run” a Hare computation on a string:
hare :: Functor f => Hare f a -> String -> f a
hare a s = fmap snd (runHare a s)
Helen has parameterised the Hare type by f so that it could be instantiated to Maybe
if the user only wishes to fifind the fifirst match for a particular regular expression, or to
the list type constructor [] if the user wishes to fifind all possible matches of the regular
expression. This requires us to make use of the Alternative type class, which has two
built in methods:
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
The empty method denotes failure. For the list instance, it is the empty list [], whereas
for the Maybe instance it is Nothing. The (<|>) method denotes alternation. For the
list instance it is the same as concatentation (++), whereas for the Maybe instance, it is
a left-biased choice, like so:
instance Alternative Maybe where
empty = Nothing
Just a <|> b = Just a
Nothing <|> b = b
3To avoid confusing empty with the Empty constructor for regular expressions, Helen also
defifined an alias for empty, called failure:
failure :: (Alternative f, Monad f) => Hare f a
failure = empty
The readCharacter action is also of note, which removes a character from the String
state and returns it. If there are no characters left in the string, it will fail:
readCharacter :: (Alternative f, Monad f) => Hare f Char
readCharacter = Hare $ \s -> case s of
[] -> empty
(x:xs) -> pure (xs,x)
Armed with just readCharacter and the Monad, Applicative and Alternative in
stances for Hare, Helen defifines the matching algorithm, in a function called match (found
in Untyped.hs):
match :: (Monad f, Alternative f) => RE -> Hare f Results
Here the Results type describes the string that was matched by each part of the regular
expression:
data Results = None
| Character Char
| Tuple Results Results
| Repetition [Results]
Seeing as the Empty expression matches any string, Helen simply returns None in that
case:
match Empty = pure None
Conversely, because Fail matches no strings at all, Helen always calls failure if at
tempting to match this expression:
match Fail = failure
To defifine the case for Char, Helen uses the built-in function guard, which will fail if the
given condition evaluates to false:
match (Char cs) = do
x <- readCharacter
guard (x `elem` cs)
pure (Character x)
To defifine the case for Seq, Helen makes use of the Monad instance, fifirst matching on the
fifirst expression a, then the second b:
4match (Seq a b) = do
ra <- match a
rb <- match b
pure (Tuple ra rb)
Because there is no dependencies between ra and rb, this could also have been defifined
using the Applicative instance, as follows:
match (Seq a b) = Tuple <$> match a <*> match b
For the case for Choose, Helen makes use of the Alternative (<|>) method:
match (Choose a b) =
match a
<|> match b
For Star, Helen observed the following bijection
1
:
Star r === (r `Seq` Star r) `Choose` Empty
For this reason, Helen implements Star a by fifirst matching the expression a, then
matching on Star a recursively. If either match fails, it falls back to returning the
empty match:
match (Star a) =
addFront <$> match a <*> match (Star a)
<|> pure (Repetition [])
where
addFront x (Repetition xs) = Repetition (x:xs)
addFront _ _ = error “(should be) impossible!”
Note that Helen defifined Star to choose the longer alternative fifirst so that if used with
Maybe, the match function will return the longest match rather than the empty string.
Having defifined match, Helen can now defifine the regex matching operator (=~), which
searches through a string for a match to the expression:
(=~) :: (Monad f, Alternative f) => String -> RE -> f Results
str =~ re = hare matchAnywhere str
where matchAnywhere = match re <|> (readCharacter >> matchAnywhere)
If we look at some examples, we can see that invoking it with the Maybe monad returns
just the fifirst result, whereas the list version returns all results:
*> “COMP3141” =~ (Char [‘0’..’9′]) :: Maybe Results
Just (Character ‘3’)
*> “COMP3141” =~ (Char [‘0’..’9′]) :: [Results]
[Character ‘3’,Character ‘1’,Character ‘4’,Character ‘1’]
1
While they return difffferent results, there is a bijection between them
5A Typed Implementation (12 Marks)
Helen’s untyped implementation works, but it’s not very clean. For example, we know
that the Results returned by the regular expression
Char [‘0’..’9′] `Seq` Star (Char [‘0’..’9′])
will be of the format:
Tuple (Character c) (Repetition [Character x, Character y, …])
but any function designed to process the Results from this regular expression would
have to be partial, because we did not encode the format of the results returned into the
type system.
Instead of having a Results type, your fifirst task will be to defifine a type-indexed
version of the RE type and associated match function, in Hare.hs.
data RE :: * -> * where
Empty :: RE ()
Fail :: RE a
— Your constructors here
Each constructor has an additional type argument which specififies the type of the result
returned from each expression. We have defifined the fifirst two constructors for you. You
will have to fifind appropriate defifinitions for the rest of them.
For implementing match, feel free to look at Helen’s original implementation in the
Untyped.hs module as a guide. The test cases provided may also be instructive in
getting your implementation type-checking and correct.
Marking Criteria
Marks Description
2 Empty type correct; match implementation correct.
2 Fail type correct; match implementation correct.
2 Char type correct; match implementation correct.
2 Seq type correct; match implementation correct.
2 Choose type correct; match implementation correct.
2 Star type correct; match implementation correct.
12 Total
6Adding Actions (1 Marks)
Next, we will add an additional constructor to RE that does not change the set of strings
matched, but allows us to run an arbitrary transformation on the results returned from
the match:
Action :: (a -> b) -> RE a -> RE b
Your task in this part is to implement the match function for this constructor. You may
fifind the Functor instance for Hare useful.
Examples
*> “***” =~ Action length (Star (Char [‘*’])) :: [Int]
[3,2,1,0,2,1,0,1,0,0]
*> “AXax” =~ Action isUpper (Char [‘a’,’A’]) :: [Bool]
[True, False]
*> let f (x,y) = [x,y]
*> let atoz = Char [‘a’..’z’]
*> “ab01cd20” =~ Action f (atoz `Seq` atoz) :: [String]
[“ab”,”cd”]
Marking Criteria
Marks Description
1 Action clause in match correct.
1 Total
7Utility Combinators (7 Marks)
Using our newly minted Action constructor, you must now defifine a series of combinator
functions that allow us to defifine a lot of the features people normally expect to fifind in
a regular expressions system.
cons function
The cons function matches a regular expression for an element of type a, followed by a
regular expression for a list [a], and returns that element prepended to the list.
cons :: RE a -> RE [a] -> RE [a]
Examples:
*> “10100” =~ cons (Char [‘1’]) (Star (Char [‘0’])) :: [String]
[“10″,”1″,”100″,”10″,”1”]
*> “10100” =~ cons (Char [‘1’]) (Action (const []) Empty) :: [String]
[“1″,”1”]
plus function
The plus function is like Star, but requires the expression given to occur at least once.
Thus, this function can be defifined using Action, Seq and Star.
plus :: RE a -> RE [a]
Examples:
*> “10100” =~ plus (Char [‘0’]) :: [String]
[“0″,”00″,”0″,”0”]
*> let atoz = Char [‘a’..’z’]
*> let digits = Char [‘0’..’9′]
*> “ab1c3” =~ plus (atoz `Seq` digits) :: [[(Char,Char)]]
[[(‘b’,’1′),(‘c’,’3′)],[(‘b’,’1′)],[(‘c’,’3′)]]
string function
The string function produces a regex that only matches the given string, and returns
it. You should be able to implement it using Char, cons, Empty and Action.
string :: String -> RE String
Examples:
*> let comp3141 = string “COMP3141”
*> “My favourite subject is COMP3141” =~ comp3141 :: Maybe String
Just “COMP3141”
*> “My favourite subject is MATH1141” =~ comp3141 :: Maybe String
Nothing
8choose function
The choose function is like the Choose constructor, but generalised to lists. That is,
choose [a, b] is equivalent to a ‘Choose‘ b. What should choose [] be?
choose :: [RE a] -> RE a
Examples:
*> let re = choose [string “COMP”, string “MATH”, string “PHYS”]
*> “COMP3141, MATH1081, PHYS1121, COMP3121” =~ re :: [String]
[“COMP”,”MATH”,”PHYS”,”COMP”]
*> “abc” =~ choose [] :: Maybe String
Nothing
option function
The option function matches the given expression zero or one times. In other words,
if the expression matches, it returns Just that match, otherwise Nothing. You can
implement this combinator with Action, Choose and Empty.
option :: RE a -> RE (Maybe a)
Examples:
*> let digits = Char [‘0’..’9′]
*> let sign = Action (fromMaybe ‘+’) (option (Char [‘-‘])
*> “-30 5 3” =~ (sign `cons` plus digits) :: [String]
[“-30″,”-3″,”+30″,”+3″,”+0″,”+5″,”+3″]
*> “foo” =~ option (Char [‘a’]) :: [Maybe Char]
[Nothing,Nothing,Nothing,Nothing]
Note: As with Star, prefer longer matches over shorter ones, so that the results appear
as in the above example.
rpt function
The rpt combinator allows a regular expression to be repeated a fifixed number of times.
The expression rpt n x is equivalent to repeating x in sequence n times, returning the
results in a list.
rpt :: Int -> RE a -> RE [a]
Examples:
9*> let digits = Char [‘0’..’9′]
*> let programs = choose [string “COMP”, string “PHYS”, string “MATH”]
*> let courseCode = programs `Seq` rpt 4 digits
*> “COMP3141, MATH1081, and PHYS1121” =~ courseCode :: [(String,String)]
[(“COMP”,”3141″),(“MATH”,”1081″),(“PHYS”,”1121″)]
*> “foo” =~ rpt 0 (Char [‘a’]) :: Maybe [Char]
Just “”
rptRange function
Lastly, the rptRange function takes a range of numbers (x, y). You may assume that
x ≤ y. It will match the given regex at least x times and at most y times.
rptRange :: (Int, Int) -> RE a -> RE [a]
Examples:
*> “1234” =~ rptRange (2,4) (Char [‘0’..’9′]) :: [String]
[“1234″,”123″,”12″,”234″,”23″,”34”]
*> “1234” =~ rptRange (3,3) (Char [‘0’..’9′]) :: [String]
[“123″,”234”]
Note: As with Star and option, prefer longer matches to shorter ones.
Marking Criteria
Marks Description
1 cons implementation correct.
1 plus implementation correct.
1 string implementation correct.
1 choose implementation correct.
1 option implementation correct.
1 rpt implementation correct.
1 rptRange implementation correct.
7 Total
10