COMP90045 Programming Language Implementation
Parsing with Haskell
Harald Søndergaard Lecture 8
Semester 1, 2019
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 1 / 23
Recursive Descent
Recursive descent parsers are top down parsers that consist of a set of mutually recursive functions, one per nonterminal.
The code of each of these functions can be derived automatically from the grammar, but it can also be written by hand without much difficulty.
In either case, the function bodies can be augmented with error recovery code specialized to given situations, as well as with code to build parse trees.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 2 / 23
Recursive Descent: Functions
All functions consistently maintain fixed lookahead, usually one token.
Each function first decides which production to apply, using the lookahead and the LOOK sets of the nonterminal’s productions.
The processing of a production consists of a sequence of steps, one step per symbol on the right-hand side of the production:
If the symbol is a terminal then check that the next token is that terminal, and consume it; otherwise, signal an error.
If the symbol is a nonterminal, call its function.
This is easily implemented in a functional language: A parser takes a list of tokens as input and returns a pair, namely the AST fragment it has synthesized, and the list of remaining tokens—those this parser
did not need. The unneeded tokens are passed to the next parser.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 3 / 23
Parsing with Haskell
On the LMS, under ”Videos”, you will find some small videos showing detailed Haskell code for some small parsing tasks.
Some of the parsers make use of the parser library Parsec.
On the LMS, under ”Other Resources”, you will find various Parsec
introductions and manuals.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 4 / 23
Parsec
Parsec is a library that supports construction of recursive descent parsing, with backtracking.
It offers a large number of useful parser combinators, functions that can be used to combine basic parsers into more complex parsers. The combinators implement typical patterns of combination. They make programming easier and programs more readable.
In the Parsec view, a parser is a state transformer and is best thought of as an instance of the state monad.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 5 / 23
The Type of a Parser
A parser takes a list of tokens and returns a “result” of some type a. Usually a is a type of abstract syntax trees. When the parser runs, it will, as a side-effect, consume some of the tokens, and return the corresponding result (or signal an error).
Apart from manipulating the list of tokens, the parser may want to do other stateful computation, such as updating a symbol table or keep track of the current line number.
With a monadic formulation, we avoid having parsers (overtly) passing values around, of some complicated UserState.
type Parser a = Parsec Token UserState a
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 6 / 23
Running a Parser
type Parser a = Parsec [Char] UserState a
This gives us a type of parser which
1 Considers tokens to be simply strings;
2 Can manipulate an internal state of type UserState;
3 Returns results of type a.
An element p of type Parser Exp denotes a parser.
The function runParser allows us to run p and extract its result:
do
let output = runParser p initState “” input print output
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 7 / 23
Tokens as Strings
Actually, runParser returns a result of an Either type; the result is either “Right result” or “Left err”.
Taking tokens to be simply snippets of string is common amongst Parsec users—basically we can let the parser do the work of the lexer.
The module Text.Parsec.Char offers low-level “character parsers” of type CharParser u a, such as digit which parses a digit and returns the parsed character, and satisfy which takes a predicate p and parses any character satisfying p.
digit = satisfy isDigit letter = satisfy isAlpha
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 8 / 23
Tokens as Strings
Here are some other simple but useful parsers and parser constructors:
char :: Char -> CharParser u Char string :: String -> CharParser u String whiteSpace :: CharParser u ()
comma = char ’,’
colon = char ’:’ pWhere = string “where”
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 9 / 23
Parser Combinators
We will want to “sequence” parsers:
lexeme p =do
e <- p
whiteSpace
return e
The infix combinator “<|>” is for choice: divLike
= string “div” <|> string “mod” <|> string “rem”
Alternatively,
divLike
= choice [string “div”, string “mod”, string “rem”]
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 10 / 23
Parser Combinators
balancedParens :: CharParser u () balancedParens
=do
char ’(’
balancedParens
char ’)’
balancedParens
<|>
return ()
PLI (Sem 1, 2019)
Parsing with Haskell
⃝c University of Melbourne
11 / 23
Choice and Backtracking
The parser p1 <|> p2 will commit to behaving like p1, unless p1 cannot get started. If p1 fails after having consumed some input, there is no rolling back of the input.
testOr
= string “(a)” <|> string “(b)”
So (perhaps unexpectedly) testOr will fail on input (b).
The combinator “try” allows for roll-back: try p behaves like p but
pretends not to have consumed input if p fails. testOr
= try (string “(a)”) <|> string “(b)”
Of course the use of “try” incurs a performance penalty.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 12 / 23
Repetition
The “many” combinator will apply its argument parser zero or more times. Its type, roughly: Parser a -> Parser [a].
Similarly, “many1” applies its argument parser one or more times. We could define them (if Parsec didn’t do so already):
many p = many1 p <|> return []
many1 p = do x <- p
xs <- many p
return (x:xs) word = many1 letter
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 13 / 23
Separators
A common pattern is that we have a syntactic category for “sequences” of things of category A, separated (or perhaps terminated) by some string s.
declList = sepBy1 decl comma
The parser declList acts like repeated use of the decl parser,
interspersed with use of the comma parser.
Related combinators are sepBy (allows 0 uses of the argument parser), endBy, and endBy1. The combinators skipMany and skipMany1 are like many and many1, except they don’t return the values from the argument parser.
Also check out chainl which allows you to control the associativity of binary operators.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 14 / 23
Separators
separator :: Parser ()
separator = skipMany1 (space <|> char ’,’)
sentence :: Parser [String] sentence
=do
words <- sepBy1 word separator oneOf ".?!"
return words
“oneOf cs” is the parser that looks for any one of the characters cs.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 15 / 23
The Error Combinator
The error combinator > attaches a string s to a parser; with the intention that failure should result in a message “expecting s”:
digit = satisfy isDigit > “digit” letter = satisfy isAlpha > “letter” space = satisfy isSpace > “space”
pStmt
= pAsg <|> pIf <|> pWhile <|> pCall
> “statement”
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 16 / 23
Error Messages
Parsers built with Parsec have this error messaging built-in.
? parseTest sentence “That name, Bond!” [“That”,”name”,”Bond”]
? parseTest sentence “A surprise, 007?” parse error at (line 1, column 13) unexpected “0”
expecting space, “,” or letter
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 17 / 23
Adjusting Error Messages
In this example, it might have been nicer to simply say that a “word” was expected.
Let us ask the word parser to express what we are looking for: word = many1 letter > “word”
We can suppress messages from the separator parser like so: separator = skipMany1 (space <|> char ’,’ > “”)
? parseTest sentence “A surprise, 007?” parse error at (line 1, column 13): unexpected “0”
expecting word
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 18 / 23
Expression Parsing
The task of parsing expressions that involve lots of unary and binary operators, with different precedence rules, and possibly different associativity, can be tedious.
Parsec offers support, through its Text.Parsec.Expr module. import Text.Parsec.Expr
pExp :: Parser Expr
pExp = buildExpressionParser table pFac
pFac = choice [parens pExp, pNum, pIdent]
Here table is a Haskell table that gives information about the
operators, their precedence and associativity.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 19 / 23
Expression Parsing
table = [ [ prefix “-” UnaryMinus ] , [ binary “*” Mul ]
, [ binary “+” Add, binary “-” Sub ]
, [ relation “=” Eq ] ]
prefix name fun
= Prefix (do { reservedOp name; return fun })
binary name op
= Infix (do { reservedOp name; return op }) AssocLeft
relation name rel
= Infix (do { reservedOp name; return rel }) AssocNone
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 20 / 23
Expression Parsing
table = [ [ prefix “-” UnaryMinus ] , [ binary “*” Mul ]
, [ binary “+” Add, binary “-” Sub ] , [ relation “=” Eq ]
]
The table says that unary minus binds tighter than multiplication, and so on. Addition and subtraction have the same precedence.
The other details on the previous slide specify that all binary operators used are left-associative, and equality is non-associative (that is, e1 = e2 = e3 is not well-formed).
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 21 / 23
Lexeme Parsing
Another useful Parsec module is Text.Parsec.Token.
It offers a function, makeTokenParser which makes it very easy to specify the lexical details of the source language. For this reason, parsers using Parsec rarely interact with a separate lexer.
The same module offers a raft of useful basic parsers, such as natural, identifier, and parens.
See the Kid parser for more examples.
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 22 / 23
Student Rep Wanted!
We are being asked to find a student rep for COMP90045.
One task will be to take student feedback to a student-staff liaison meeting on 15 April, 1–2pm (yes, that means pizza).
PLI (Sem 1, 2019) Parsing with Haskell ⃝c University of Melbourne 23 / 23