# Parser
A video on this section can be found [here](https://bham.cloud.panopto.eu/Panopto/Pages/Viewer.aspx?id=2b11842f-958c-48c6-bf16-ac85012ed0f0).
Recall our concrete syntax in BNF notation:
“`
Program ::= Identifier := Expr; — assignment
| { [Program] } — block
| while (Expr) Program — whileStatement
| If (Expr) Program — ifStatement
| If (Expr) Program else Program
Expr ::= Expr1 | Expr1 OrOp Expr — lowest precedence
Expr1 ::= Expr2 | Expr2 AndOp Expr1
Expr2 ::= Expr3 | Expr3 EqOp Expr2
Expr3 ::= Expr4 | Expr4 CompOp Expr3
Expr4 ::= Expr5 | Expr5 AddOp Expr4
Expr5 ::= Expr6 | Expr6 MulOp Expr5
Expr6 ::= Expr7 | NotOp Expr6
Expr7 ::= Constant | Identifier | (Expr) — highest precedence
OrOp ::= ||
AndOp ::= &&
EqOp ::= ==
CompOp ::= <= | < | >= | >
AddOp ::= + | –
MulOp ::= * | / | %
NotOp ::= !
“`
The following is a direct translation of BNF to monadic [parsing combinators](/LectureNotes/Sections/monads.md#monadic-parsing):
“`haskell
module Parser where
import Data.Char
import Control.Monad
import AbstractSyntax
import Parsing
“`
The function `program` parses a program according to the above BNF definition. The result is a `Program` tree in the `Parser` monad:
“`haskell
program :: Parser Program
“`
Similarly, the various `expr` functions below parse expressions, with an expression tree in the `Parser` monad:
“`haskell
expr, expr1, expr2, expr3, expr4, expr5, expr6, expr7 :: Parser Expr
“`
And the following parse and return `Expr` constructors:
“`haskell
orOp, andOp, eqOp, compOp, addOp, mulOp, notOp :: Parser ([Expr] -> Expr)
“`
Based on the BNF production rule for programs, we define:
“`haskell
program =
assignment
<|> block
<|> whileStatement
<|> ifStatement
“`
where the functions `assignment`, `block`, `whileStatement` and `ifStatement` are defined below. The production rule for assignments is `Identifier := Expr;`, which we translate as follows to monadic parsing:
“`haskell
assignment =
do
i <- identif
symbol ":="
e <- expr
symbol ";"
return (i := e)
```
This works as follows:
* We parse an identifier with `identif` (defined below), which is bound to the variable `i`.
* We then parse the string `":="`.
* We then parse an expression with `expr`, which is bound to the variable `e`.
* We then parse the string `";"`.
* We finally return the program tree `i := e`.
The production rule for blocks is `{ [Program] }` (a list of programs enclosed in curly braces):
```haskell
block =
do
symbol "{"
ps <- many program
symbol "}"
return (Block ps)
```
The function `many`, applied to the function `program`, parses a list of programs. It is predefined in the type class `Alternative` (see the [monadic parsing](/LectureNotes/Sections/monads.md#monadic-parsing) lecture notes, the textbook or [hoogle](http://hackage.haskell.org/package/base-4.10.0.0/docs/Control-Applicative.html#v:many)). This works because the type constructor `Parser` is in the `Alternative` type class.
The production rule for while-statements is `while (Expr) Program`:
```haskell
whileStatement =
do
symbol "while"
symbol "("
e <- expr
symbol ")"
p <- program
return (While e p)
```
The production rule for if-statements is
```
If (Expr) Program
| If (Expr) Program else Program
```
This becomes, in factorized form:
```haskell
ifStatement =
do
symbol "if"
symbol "("
e <- expr
symbol ")"
p1 <- program
((do
symbol "else"
p2 <- program
return (IfElse e p1 p2))
<|>
(return (If e p1)))
“`
That is:
* Parse `”if”` and then `”(“` and then an expression `e` and then `”)”` and then a program `p1`.
* Then parse one of
* `”else”` and then a program `p2` (then return the tree `IfElse e p1 p2`), or
* nothing (then return the tree `If e p1`).
The following function is used in order to implement parsing of BNF definitions of the form
“`
expr := expr’ | expr’ op expr
“`
in factorized form:
“`haskell
binExpr :: Parser e -> Parser ([e] -> e) -> Parser e -> Parser e
binExpr expr’ op expr =
do
e’ <- expr'
((do
o <- op
e <- expr
return (o [e',e]))
<|>
return e’)
“`
Then the definitions of expressions become:
“`haskell
expr = binExpr expr1 orOp expr
expr1 = binExpr expr2 andOp expr1
expr2 = binExpr expr3 eqOp expr2
expr3 = binExpr expr4 compOp expr3
expr4 = binExpr expr5 addOp expr4
expr5 = binExpr expr6 mulOp expr5
expr6 = expr7
<|>
do
op <- notOp
e <- expr6
return (op [e])
expr7 = constant
<|> do
i <- identif
return (Var i)
<|> do
symbol “(”
e <- expr
symbol ")"
return e
```
The above use the following helper functions:
```haskell
parseOp :: String -> OpName -> Parser ([Expr] -> Expr)
parseOp s op = do
symbol s
return (Op op)
orOp = parseOp “||” Or
andOp = parseOp “&&” And
eqOp = parseOp “==” Eq
compOp = parseOp “<=" Leq
<|> parseOp “<" Less
<|> parseOp “>=” Geq
<|> parseOp “>” Greater
addOp = parseOp “+” Add
<|> parseOp “-” Sub
mulOp = parseOp “*” Mul
<|> parseOp “/” Div
<|> parseOp “%” Mod
notOp = parseOp “!” Not
“`
This is to parse numerical constants:
“`haskell
constant :: Parser Expr
constant = do
n <- integer
return (Constant(toInteger n))
```
Notice that `integer` (defined in the textbook code [Parsing.hs](Parsing.hs)) parses an `Int` rather than an `Integer`, and this is why we need the type cast `toInteger`.
This is to parse identifiers, which are defined like in the textbook but excluding our keywords:
```haskell
keywords = ["if", "else", "while"]
identif :: Parser String
identif =
do
cs <- token identifier
guard (not (elem cs keywords))
return cs
```
Parsing a program can cause an error:
```haskell
parseProgram :: String -> Program
parseProgram xs = case parse program xs of
[(p , [])] -> p
[(_ , s)] -> error (“syntax: unparsed string ” ++ s)
_ -> error “syntax: failed to parse program”
“`
There is no error when we get a singleton list (unambiguous parsing) consisting of a pair `(p,[])` where `p` is a program syntax tree and where `[]` indicates that there was nothing left unparsed after the program.
#### Next: [Interpreter](Interpreter.md)