CS 3EA3: Example Haskell Code for Recursive Descent Parsing
Wolfram Kahl September 17, 2009
The module Data.Char provides us with the functions isDigit, isLetter, and ord :: Char → Int. module ExprParse where
import Data.Char
An extremely simple expression datatype (abstract syntax):
data Expr
= Var String
| Num Integer
| Add Expr Expr | Mul Expr Expr
For the concrete syntax, we encode the different precedence levels into separate nonterminals to achieve a context-free grammar that allows recursive descent parsing:
Expr ::= Term + Expr Term ::= Factor * Term Factor ::= Number
| Term | Factor | Identifier | ( Expr )
The precedence levels can easil be accommodated by the unparsing function1; whether parentheses are added is decided depending on the precedence argument, which stands for the precedence of the surrounding constructor. Adding parentheses uses an auxiliary function paren:
paren :: String → String paren s = ’(’ : s + “)”
showPrecExpr :: Int → Expr → String
showPrecExpr p (Var s) = s
showPrecExpr p (Num k) = show k
showPrecExpr p (Add e1 e2) = (if p > 4 then paren else id)
(showPrecExpr 4 e1 + ” + ” + showPrecExpr 4 e2 )
1Idiomatic Haskell would define the more efficient showsPrecExpr :: Int → Expr → String → String.
1
showPrecExpr p (Mul e1 e2) = (if p > 5 then paren else id) (showPrecExpr 5 e1 + ” * ” + showPrecExpr 5 e2 )
For installing this as standard show function for Expr values, we choose lowest precedence by default:
instance Show Expr where show = showPrecExpr 0
For recursive descent parsing see also http://en.wikipedia.org/wiki/Recursive_descent; for an extremely “low-tech” Haskell version we use a parser type that takes a String as argument, and returns a pair consisting of a parsed Expr, and the not-yet-parsed rest of the String:
parseExpr , parseTerm , parseFactor :: String → (Expr , String )
This is a very simple choice; it forces us to use Haskell run-time errors for reacting to parse errors.
The grammar rules can now straight-forwardly be translated into Haskell, always continuing to work on the rest strings:
parseExpr cs = let (t,cs′) = parseTerm cs in case cs′ of ’+’ : cs′′ → let (e, cs′′′) = parseExpr cs′′ in (Add t e, cs′′′)
→ (t,cs′)
parseTerm cs = let (f , cs′) = parseFactor cs in case cs′ of ’*’:cs′′ →let(t,cs′′′)=parseTerm cs′′ in(Mul f t,cs′′′)
→ (f , cs′)
parseFactor (’(’ : cs) = let (e, cs′) = parseExpr cs in case cs′ of ’)’:cs′′ →(e,cs′′)
→ error “closing parenthesis expected” parseFactor (c : cs)
| isLetter c = let (ident,cs′) = parseIdentRest [c] cs in (Var ident,cs′)
| isDigit c = let (num, cs′) = parseNumberRest (numFromDigit c) cs in (Num num, cs′)
parseFactor cs = error (“factor expected; ” + show cs + ” found”)
Since we allow multi-letter dentifiers and multi-digit numbers, we use auxiliary parser to complete
them after we recognised the first character:
parseIdentRest :: String → String → (String , String ) parseIdentRest s (c : cs ) | isLetter c = parseIdentRest (s + [ c ]) cs parseIdentRest s cs = (s,cs)
parseNumberRest :: Integer → String → (Integer , String ) parseNumberRest n (c : cs)
| isDigit c = parseNumberRest (10 ∗ n + numFromDigit c) cs parseNumberRest n cs = (n,cs)
numFromDigit :: Char → Integer numFromDigit c = toInteger (ord c − ord ’0’)
2