COMP90045 Programming Language Implementation
LALR Parser Generation with happy
Harald Søndergaard Lecture 12
Semester 1, 2019
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 1 / 18
Parser Generators
A parser generator takes a description of a context-free language and builds a parser from that.
Often the generated parser is assumed to collaborate with an automatically generated scanner, so parser generators and scanner generators often come as pairs.
Lex and Yacc are the ancestors of most generator tools. They produce C source files containing a scanner and a parser. More modern incarnations are flex and bison.
Similar tools exist to generate scanners and parsers in other programming languages. Here we focus on happy.
Yacc, bison, happy and many similar tools generate LALR parsers.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 2 / 18
LALR: Merging Similar LR States
0S→•aTd$ S →•bUd$ S →•aUe$ S→•bTe$
2S→a•Td$ S→a•Ue$ T→•cd U→•ce
1S→b•Ud$ S→b•Te$ T→•ce U→•cd
4T→c•d U→c•e
a
S→aTd
S→bUd S→aUebc S→bTe
T→c
U→c
c
Many LR(1) parsers contain pairs (or larger sets) of states whose LR(1) items differ only in their right-context tokens. In an LALR parser such states are merged.
3T→c•e U→c•d
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 3 / 18
3+4T→c•de U→c•de
Building LALR Parsers
The straightforward way to construct the tables for an LALR parser: Construct the state machine for an LR(1) parser.
Compute the core of each state. The core of a set of LR(1) items is the set of LR(0) items you get by discarding the right-context tokens.
Merge all states that have the same core, yielding the smaller LALR state machine.
Construct the action table of the LALR parser from this smaller state machine using the same algorithm we use to construct the action table of the LR(1) parser from the original state machine.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 4 / 18
Building LALR Parsers (Continued)
If a state J in the LALR state machine results from the union of the original states I1, …, In, then for all X, the states denoted by goto(I1,X), …, goto(In,X) in the LR(1) parser must all have the same core, and therefore would be merged into one state in the LALR state machine. Makegoto(J,X) go to this state.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 5 / 18
Parser Specifications
A grammar specification (happy input) has this structure:
{ module/import section } declarations
%%
productions (translation rules) { auxiliary Haskell functions }
The first and last part (between braces) will be copied to the output program.
The specification is kept in a file Parse.y and from this, happy produces a parser called Parse.hs.
Say we want to handle the language given by this grammar:
exp ::= exp + exp | exp – exp | exp * exp | exp ^ exp | ( exp ) | number
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 6 / 18
Generating the Parser and Scanner
Let us assume that we want to use alex to generate a scanner function lexer, and then happy can use this function in the generated parser.
Here is a Makefile that will generate an executable called Parse:
Parse: Parse.y Lexer.x alex Lexer.x
happy Parse.y
ghc Parse.hs
Note: The last 3 lines start with a single tab character.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 7 / 18
The Scanner Spec (in Lexer.x)
{
module Lexer ( Token(..), lexer) where }
%wrapper “basic” — Actions will be of type String -> Token
rules :-
$white+ ;
\+ {
\- {
\* {
\^ {
\( {
\) {
[0-9]+ {
\s->PLUS}
\s -> MINUS }
\s -> TIMES }
\s->EXP}
\s->LEFT}
\s -> RIGHT } \s->NUM(reads::Int)}
PLI (Sem 1, 2019)
Parser Generation with happy
⃝c University of Melbourne
8 / 18
The Scanner Spec (in Lexer.x), Continued
{
data Token
= PLUS | MINUS | TIMES | EXP | LEFT | RIGHT | NUM Int deriving (Eq, Show)
lexer = alexScanTokens
}
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 9 / 18
Parser Generator Declarations (in Parse.y)
{
module Main where
import Lexer
}
%name p
%tokentype { Token } %error { parseError }
%left ’+’ ’-’
%left ’*’
%right ’^’
The parse function is to be called p.
%token
’+’ {PLUS} ’-’ { MINUS } ’*’ { TIMES } ’^’ {EXP} ’(’ {LEFT} ’)’ { RIGHT } num {NUM$$}
%%
We want ‘^’ to be right associative and to bind tighter than ‘*’, which is left-associative and binds tighter than +’ and ‘-’.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 10 / 18
The Productions (Parse.y Continued)
Exp :: { Int }
: num
|Exp’+’Exp |Exp’-’Exp |Exp’*’Exp |Exp’^’Exp |’(’Exp’)’
{$1} {$1+$3} {$1-$3} {$1*$3} {$1^$3} {$2}
An action is associated with each production.
In actions, $n refers to the value of the nth symbol on the right-hand side.
The idea is that, when the grammar rule is used (in a reduction), the corresponding expression is evaluated. The calculated value becomes the value associated with production’s left-hand side (a non-terminal).
In this example, values are integers, but we could also make them snippets of abstract syntax trees…
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 11 / 18
The Productions (Parse.y Continued)
Here we define instead a type Exp for ASTs, and we make the parser generate values of this type:
data Exp
= Num Int
| Plus Exp Exp
| Minus Exp Exp
| Times Exp Exp
| Power Exp Exp
deriving (Eq,Show)
Exp :: { Exp }
: num
| Exp ’+’ Exp
| Exp ’-’ Exp
| Exp ’*’ Exp
| Exp ’^’ Exp
| ’(’ Exp ’)’
{Num$1} {Plus$1$3} {Minus$1$3} {Times$1$3} {Power$1$3} {$2}
PLI (Sem 1, 2019)
Parser Generation with happy
⃝c University of Melbourne 12 / 18
Last Bits (Parser.y)
{
parseError :: [Token] -> a parseError toks
= error (“Parse error at token ” ++ show (take 5 toks))
main :: IO ()
main
= do
progname <- getProgName
args <- getArgs
checkArgs progname args input <- readFile (head args) let output = p (lexer input) putStrLn (show output)
}
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 13 / 18
Associativity and Precedence Again
You can give a token an associativity by specifying %left, %right or %nonassoc. Tokens declared later have higher precedence.
The associativity and precedence are there to help happy resolve ambiguities in the grammar.
%nonassoc EQ NE LT GT LE GE %left PLUS MINUS
%left TIMES
%right EXP
This says, for example, that exponentiation associates to the right
(so 3^2^3 is 6561, not 729) exponentiation binds tighter than addition
(so 4+6^2 is 40, not 100)
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 14 / 18
Context-Dependent Precedence
It is not uncommon that an operator lexeme plays different roles, with different precedences.
For example, we may want to extend our grammar to allow for unary minus:
exp ::= exp + exp | exp - exp | exp * exp
| exp ^ exp | ( exp ) | - exp | number
But we want unary minus to bind tighter than ‘*’, ‘+’, and tighter than the binary ‘-’.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 15 / 18
Context-Dependent Precedence
%left ’+’ ’-’
%left ’*’
%nonassoc NEG
%right ’^’
Exp :: { Exp }
: num
{Num$1} {Plus$1$3} {Minus$1$3} {Times$1$3} {Power$1$3} {$2} {Neg$2}
Here we have introduced a dummy token NEG and we use the %prec annotation to override the usual precedence of the minus.
| Exp
| Exp
| Exp
| Exp
| ’(’
|’-’Exp %precNEG
’+’ Exp
’-’ Exp
’*’ Exp
’^’ Exp
Exp ’)’
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 16 / 18
More Support
Under “Other Resources” on the LMS is a “Happy User Guide”.
Unfortunately parser generators are notorious for their bad error handling and reporting. Also, debugging parser specifications is virtually impossible without an understanding of the technology (shift-reduce parsing) on which the generated parser is based.
If you run happy with the ‘-i’ option, it will generate a file Parse.info which contains information about how the parser was generated (in fact the whole sets-of-items collection is in there).
If you run happy with the ‘-adi’ options, the parser it generates will also provide a trace of all its actions.
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 17 / 18
Next Up
Next week (Week 7) we cover semantic analysis, then turn to intermediate code generation.
Please note that we lose the Week 7 Friday tutes to Good Friday.
A replacement tute is scheduled for Thursday 18 April 17:20 to 18:10 in David Caro 206. If that time does not suit you, feel free to drop in to one of the other Thursday tutes (15:20 and 16:20 in David Caro 205).
PLI (Sem 1, 2019) Parser Generation with happy ⃝c University of Melbourne 18 / 18