CS 3EA3: Example Haskell Code for Recursive Descent Parsing

Wolfram Kahl September 17, 2009

The module Data.Char provides us with the functions isDigit, isLetter, and ord :: Char → Int. module ExprParse where

import Data.Char
An extremely simple expression datatype (abstract syntax):

data Expr
= Var String
| Num Integer
| Add Expr Expr | Mul Expr Expr

For the concrete syntax, we encode the different precedence levels into separate nonterminals to achieve a context-free grammar that allows recursive descent parsing:

Expr    ::= Term + Expr
Term    ::= Factor * Term
Factor  ::= Number

|   Term
|   Factor
|   Identifier   |    ( Expr )

The precedence levels can easil be accommodated by the unparsing function1; whether parentheses are added is decided depending on the precedence argument, which stands for the precedence of the surrounding constructor. Adding parentheses uses an auxiliary function paren:

paren :: String → String paren s = ’(’ : s + “)”

showPrecExpr :: Int → Expr → String
showPrecExpr p (Var s) = s
showPrecExpr p (Num k) = show k
showPrecExpr p (Add e1 e2) = (if p > 4 then paren else id)

(showPrecExpr 4 e1 + ” + ” + showPrecExpr 4 e2 )
1Idiomatic Haskell would define the more efficient showsPrecExpr :: Int → Expr → String → String.

showPrecExpr p (Mul e1 e2) = (if p > 5 then paren else id) (showPrecExpr 5 e1 + ” * ” + showPrecExpr 5 e2 )

For installing this as standard show function for Expr values, we choose lowest precedence by default:

instance Show Expr where show = showPrecExpr 0

For recursive descent parsing see also http://en.wikipedia.org/wiki/Recursive_descent; for an extremely “low-tech” Haskell version we use a parser type that takes a String as argument, and returns a pair consisting of a parsed Expr, and the not-yet-parsed rest of the String:

parseExpr , parseTerm , parseFactor :: String → (Expr , String )
This is a very simple choice; it forces us to use Haskell run-time errors for reacting to parse errors.

The grammar rules can now straight-forwardly be translated into Haskell, always continuing to work on the rest strings:

parseExpr cs = let (t,cs′) = parseTerm cs in case cs′ of ’+’ : cs′′ → let (e, cs′′′) = parseExpr cs′′ in (Add t e, cs′′′)

→ (t,cs′)

parseTerm cs = let (f , cs′) = parseFactor cs in case cs′ of ’*’:cs′′ →let(t,cs′′′)=parseTerm cs′′ in(Mul f t,cs′′′)

→ (f , cs′)

parseFactor (’(’ : cs) = let (e, cs′) = parseExpr cs in case cs′ of ’)’:cs′′ →(e,cs′′)

→ error “closing parenthesis expected” parseFactor (c : cs)

| isLetter c = let (ident,cs′) = parseIdentRest [c] cs in (Var ident,cs′)

| isDigit c = let (num, cs′) = parseNumberRest (numFromDigit c) cs in (Num num, cs′)

parseFactor cs = error (“factor expected; ” + show cs + ” found”)
Since we allow multi-letter dentifiers and multi-digit numbers, we use auxiliary parser to complete

them after we recognised the first character:

parseIdentRest :: String → String → (String , String ) parseIdentRest s (c : cs ) | isLetter c = parseIdentRest (s + [ c ]) cs parseIdentRest s cs = (s,cs)

parseNumberRest :: Integer → String → (Integer , String ) parseNumberRest n (c : cs)

| isDigit c = parseNumberRest (10 ∗ n + numFromDigit c) cs parseNumberRest n cs = (n,cs)

numFromDigit :: Char → Integer numFromDigit c = toInteger (ord c − ord ’0’)

Related Posts