CS202 ASSIGNMENT CODE
ASSIGNMENT 2
SOLVE THE EXERCISE(S) BY MODIFYING THE Compiler.hs FILE IN THE a2 DIRECTORY OF THE ASSIGNMENTS REPOSITORY AND SUBMIT YOUR Compiler.hs FILE ON THE COURSE BLACKBOARD UNDER “ASSIGNMENT 2”.
DUE: MONDAY, FEB 10, 11:59PM
Complete Compiler.hs so that it implements a compiler from a subset of the R1 language to x86 assembly. The required subset is defined by the grammar below.
R1Exp ::=
|
| R1Exp + R1Exp
| let
You will need to complete the definitions of the following passes:
– uniquify
– remove-complex-opera
– explicate-control
– select-instructions
– assign-homes
– patch-instructions
This assignment corresponds roughly to the following exercises in the textbook:
– Exercise 2 (page 29)
– Exercise 3 (page 29)
– Exercise 4 (page 31)
– Exercise 5 (page 33)
– Exercise 6 (page 34)
– Exercise 7 (page 35)
Challenge Exercise
Implement a random expression generator that generates correct expressions. Write a function called randomTest which generates 100 test cases and runs your compiler on them.
NOTE: if you are an undergraduate student, and you solve the challenge exercise for extra credit, please note this in a comment at the top of your file.
Online Compiler
An online compiler for this assignment is available here.
Setting up Haskell and Stack
We will implement our compilers in Haskell, and build them using Stack, a build system for Haskell.
Stack is available for Windows, MacOS, Linux, BSD, and more. To install Stack, following the instructions here. If you run into problems installing Stack, please post to Piazza or email the instructor.
I recommend using Atom, emacs, or Vim to edit Haskell code. You may find Hoogle useful for looking up Haskell library functions.
Building the Runtime
The second step is to build the runtime system that our compiled programs will use. The runtime is implemented in C, in the file runtime.c. You can compile the runtime into an object file (runtime.o) as follows:
gcc -c -g -std=c99 runtime.c
This will produce a file named runtime.o. The -g flag is to tell the compiler to produce debug information that you may need to use the gdb (or lldb) debugger.
Next, suppose your compiler has produced the x86 assembly program file foo.s (the .s filename extension is the standard one for assembly programs). To produce an executable program, you can then run:
gcc -g runtime.o foo.s
which will produce the executable program named a.out by linking your program with the runtime.
Running the Compiler & Tests
To compile a program into an assembly file, navigate to the directory containing the compiler implementation and run Main.hs. For example:
cd a1/
stack runghc Main.hs tests/test1.r0
will produce the file tests/test1.s.
To run your compiled program, first use GCC to assemble it into a binary, then run the resulting binary (which will be called a.out):
gcc -g ../runtime.o tests/test1.s
./a.out
The process above runs a single program and allows you to view its output.
To run all of the tests, navigate to the directory containing the compiler implementation and run RunTests.hs. For example:
cd a1/
stack runghc RunTests.hs
This process allows you to quickly verify that all of the test cases pass, but does not print out the output of each compiler pass.
Assignment Submission
This repository contains the skeleton of each assignment’s solution. The only file you will need to change is Compiler.hs. When you submit your assignment solution on Blackboard, you should upload _only_ the Compiler.hs file. Please do not change any other files; I won’t have access to changes you make to other files when grading your assignments. module AST where
data R0Expr = IntE Int | PlusE R0Expr R0Expr deriving (Eq, Ord, Show) {-# LANGUAGE Strict #-} module Compiler where
import Data.List import Data.Maybe import Text.Pretty.Simple (pPrint, pPrintNoColor)
import Gensym import AST
type Binding = (String, R0Expr)
————————
— select-instructions
————————
data X86Arg = VarXE String | DerefE String Int | RegE String | IntXE Int deriving (Eq, Ord, Show)
data X86Instr = MovqE X86Arg X86Arg | AddqE X86Arg X86Arg | CallqE String | RetqE deriving (Eq, Ord, Show)
siInt :: R0Expr -> X86Arg siInt (IntE i) = undefined
siTail :: R0Expr -> [X86Instr] siTail e = undefined
————–
— print-x86
————–
macos :: Bool macos = False
printFun :: String -> String printFun s = case macos of True -> “_” ++ s False -> s
printX86Arg :: X86Arg -> String printX86Arg e = undefined
printX86Instr :: X86Instr -> String printX86Instr e = undefined
printX86 :: [X86Instr] -> String printX86 ss = undefined
——————-
— compile / main
——————-
compile :: R0Expr -> String compile = printX86 . siTail
logOutput :: Show b => String -> (a -> b) -> (a -> IO b) logOutput name f = x -> do let result = f x putStrLn “————————————————–” putStrLn $ “Output of pass ” ++ name ++ “:” putStrLn “————————————————–” pPrintNoColor result putStrLn “” return result
compileLog :: R0Expr -> IO String compileLog e = (logOutput “input” id) e >>= (logOutput “siTail” siTail) >>= (logOutput “printX86” printX86)
{-# LANGUAGE Strict #-} module Interpreter where
import System.Environment import AST import Parser hiding (main)
eval :: R0Expr -> Int eval e = case e of IntE i -> i PlusE e1 e2 -> (eval e1) + (eval e2)
main :: IO () main = do [fileName] <- getArgs putStrLn "============================================================" putStrLn $ "Interpreting the file: " ++ (show fileName) putStrLn "============================================================" programAST <- parseFile (fileName) putStrLn $ show $ eval programAST {-# LANGUAGE Strict #-} module Main where import System.Environment import Parser hiding (main) import Compiler hiding (main) main :: IO () main = do [fileName] <- getArgs putStrLn "============================================================" putStrLn $ "Compiling the file: " ++ (show fileName) putStrLn "============================================================" programAST <- parseFile (fileName) compiledASM <- compileLog programAST writeFile (fileName ++ ".s") compiledASM module Parser where import Text.Parsec import Text.Parsec.String (Parser) import qualified Text.Parsec.Expr as Ex import qualified Text.Parsec.Token as Tok import Data.Functor.Identity import AST langDef :: Tok.LanguageDef () langDef = Tok.LanguageDef { Tok.commentStart = "{-" , Tok.commentEnd = "-}" , Tok.commentLine = "--" , Tok.nestedComments = True , Tok.identStart = letter , Tok.identLetter = alphaNum <|> oneOf “_'” , Tok.opStart = oneOf “:!#%&*+./<=>?@\^|-~” , Tok.reservedNames = [“let”, “in”] , Tok.reservedOpNames = [“+”, “=”] , Tok.caseSensitive = True }
lexer :: Tok.TokenParser () lexer = Tok.makeTokenParser langDef
parens :: Parser a -> Parser a parens = Tok.parens lexer
whiteSpace :: Parser () whiteSpace = Tok.whiteSpace lexer
reserved :: String -> Parser () reserved = Tok.reserved lexer
semiSep :: Parser a -> Parser [a] semiSep = Tok.semiSep lexer
reservedOp :: String -> Parser () reservedOp = Tok.reservedOp lexer
prefixOp :: String -> (a -> a) -> Ex.Operator String () Identity a prefixOp s f = Ex.Prefix (reservedOp s >> return f)
intLit :: Parser R0Expr intLit = Tok.integer lexer >>= return . IntE . fromIntegral
identifier :: Parser String identifier = Tok.identifier lexer
— Operator table table :: Ex.OperatorTable String () Identity R0Expr table = [ [infixOp “+” (PlusE) Ex.AssocLeft ] ]
infixOp name fun assoc = Ex.Infix (do{ reservedOp name; return fun }) assoc
expr :: Parser R0Expr expr = Ex.buildExpressionParser table factor
factor :: Parser R0Expr factor = intLit <|> parens expr
contents :: Parser a -> Parser a contents p = do Tok.whiteSpace lexer r <- p eof return r parseExpr :: String -> Either ParseError R0Expr parseExpr s = parse (contents expr) “” s
parseFile :: String -> IO R0Expr parseFile fileName = do s <- readFile fileName case parseExpr s of Right e -> return e Left err -> error $ “Failed to parse ” ++ fileName ++ “: ” ++ (show err)
main :: IO () main = do putStrLn $ show $ parseExpr “0 + (1 + x)” putStrLn $ show $ parseExpr “let x = 5 + z in x + y” e <- parseFile "testfile.r1" putStrLn $ show e module RunTests where import System.Directory import Data.List import System.Process import Parser hiding (main) import Compiler hiding (main) import Interpreter hiding (main) logging :: Bool logging = False logOut :: String -> IO () logOut s | logging = putStrLn s | otherwise = return ()
readInt :: String -> Int readInt = read
runTest :: String -> IO () runTest fileName = do putStrLn “============================================================” putStrLn fileName putStrLn “============================================================”
e <- parseFile ("tests/" ++ fileName) let interpResult = eval e putStrLn $ "Interpreter result: " ++ (show interpResult) -- Compile the program let r = compile e if logging then compileLog e else return r -- Write out the assembly code to a .s file let assemblyFileName = "tests/" ++ fileName ++ ".s" writeFile assemblyFileName r -- Compile the assembly to an executable gccOutput <- readProcess "gcc" ["-g", "../runtime.o", assemblyFileName] "" putStrLn $ "GCC Output: " ++ gccOutput -- Run the executable x86Result <- readProcess "./a.out" [] "" putStrLn $ "x86 Result: " ++ x86Result -- Check that executable result was the same let x86ResultInt = readInt x86Result case x86ResultInt == interpResult of True -> putStrLn “Passed!” False -> putStrLn “FAIL: Results don’t match”
— Remove the .s file and compiled files removeFile assemblyFileName removeFile “a.out”
putStrLn “”
main :: IO () main = do sourceFiles <- listDirectory "tests" let files = sort $ filter (".r0" isSuffixOf) sourceFiles putStrLn $ "Test files: " ++ (show files) mapM_ runTest files putStrLn "done" module AST where data R1Expr = IntE Int | VarE String | PlusE R1Expr R1Expr | LetE String R1Expr R1Expr deriving (Eq, Ord, Show) {-# LANGUAGE Strict, ScopedTypeVariables #-} module Compiler where import Data.List import Data.Maybe import Text.Pretty.Simple (pPrint, pPrintNoColor) import Gensym import AST -- a Type Alias for variables -- Variables are represented by Strings type Variable = String ------------- -- uniquify ------------- -- A Variable Environment maps variables to variables type VEnv = [(Variable, Variable)] -- The uniquify pass, for an expression -- Input: -- - an R1 expression -- - a variable environment -- Output: an R1 expression, with variables renamed to be unique uniquifyExp :: R1Expr -> VEnv -> R1Expr uniquifyExp e env = undefined
— The uniquify pass, for an R1 program — Input: an R1 expression — Output: an R1 expression, with variables renamed to be unique — This function simply calls uniquifyExp with the empty variable environment uniquify :: R1Expr -> R1Expr uniquify e = uniquifyExp e []
————————-
— remove-complex-opera
————————-
— a Binding maps a variable to an expression type Binding = (Variable, R1Expr)
— The remove-complex-operand pass on an expression in TAIL POSITION — input: COMPLEX EXPRESSION — output: COMPLEX EXPRESSION in A-Normal Form rcoExp :: R1Expr -> R1Expr rcoExp e = undefined
— The remove-complex-operand pass on an expression in ARGUMENT POSITION — input: COMPLEX EXPRESSION — output: pair: SIMPLE EXPRESSION and LIST OF BINDINGS — — the LIST OF BINDINGS maps variables to SIMPLE EXPRESSIONS rcoArg :: R1Expr -> (R1Expr, [Binding]) rcoArg e = undefined
— Make a “LET” expression from a list of bindings and a final “body” expression mkLet :: [Binding] -> R1Expr -> R1Expr mkLet [] body = body mkLet ((x, e) : bs) body = LetE x e (mkLet bs body)
———————-
— explicate-control
———————-
data C0Arg = IntC0 Int | VarC0 Variable deriving (Eq, Ord, Show)
data C0Basic = C0ArgE C0Arg | C0PlusE C0Arg C0Arg deriving (Eq, Ord, Show)
data C0Stmt = AssignC0 Variable C0Basic deriving (Eq, Ord, Show)
data C0Tail = ReturnC0 C0Basic | SeqC0 C0Stmt C0Tail deriving (Eq, Ord, Show)
— Compile a R1 argument (integer or variable) into a C0Arg expression ecArg :: R1Expr -> C0Arg ecArg e = case e of IntE i -> undefined VarE x -> undefined
— Compile a BASIC R1 Expression into a C0Basic Expression ecBasic :: R1Expr -> C0Basic ecBasic e = case e of PlusE e1 e2 -> undefined _ -> undefined
— The explicate-control pass on an expression in TAIL POSITION — input: a COMPLEX EXPRESSION in A-Normal Form — output: a C0 Tail expression ecTail :: R1Expr -> C0Tail ecTail e = case e of IntE _ -> undefined VarE _ -> undefined PlusE e1 e2 -> undefined LetE x e1 e2 -> undefined
— The explicate-control pass on an expression in ASSIGNMENT POSITION — input: — – the variable being assigned — – the R1 Expression it is being assigned to — – a C0 Tail expression describing what should happen _after_ the assignment — output: a C0 Tail expression ecAssign :: Variable -> R1Expr -> C0Tail -> C0Tail ecAssign x e k = case e of IntE _ -> undefined VarE _ -> undefined PlusE e1 e2 -> undefined LetE x’ e1 e2 -> undefined
————————
— select-instructions
————————
data X86Arg = VarXE Variable | DerefE String Int | RegE String | IntXE Int deriving (Eq, Ord, Show)
data X86Instr = MovqE X86Arg X86Arg | AddqE X86Arg X86Arg | RetqE deriving (Eq, Ord, Show)
siArg :: C0Arg -> X86Arg siArg e = case e of IntC0 i -> undefined VarC0 x -> undefined
— The select-instructions pass on a C0Stmt statement — input: a C0Stmt — output: a list of pseudo-x86 instructions siStmt :: C0Stmt -> [X86Instr] siStmt e = case e of AssignC0 x (C0ArgE a) -> undefined AssignC0 x (C0PlusE a1 a2) -> undefined
— The select-instructions pass on a C0Tail expression — input: a C0 Tail expression — output: a list of pseudo-X86 instructions siTail :: C0Tail -> [X86Instr] siTail e = case e of ReturnC0 a -> undefined SeqC0 s t -> undefined
—————–
— assign-homes
—————–
— Find the variables used in an x86 “arg” varsArg :: X86Arg -> [Variable] varsArg e = case e of VarXE s -> undefined RegE r -> undefined IntXE i -> undefined
— Find the variables used in an x86 instruction varsInstr :: X86Instr -> [Variable] varsInstr e = case e of MovqE a1 a2 -> undefined AddqE a1 a2 -> undefined RetqE -> []
— Given an integer offset and a variable name, — map the variable to a memory location on the stack — (i.e. give the variable a “home”) mkStackLoc :: (Int, Variable) -> (Variable, X86Arg) mkStackLoc (i, x) = (x, DerefE “rbp” (-8 * (i + 1)))
— The assign-homes pass — input: a list of pseudo-x86 instructions — output: a pair — – a list of x86 instructions (without variables) — – the number of stack locations used assignHomes :: [X86Instr] -> ([X86Instr], Int) assignHomes ss = — get a list of variable names without duplicates let localVariables = nub $ concat (map varsInstr ss) — assign each variable a location on the stack stackAssignments = zip [0..] localVariables — make a stack location expression for each assignment homes = map mkStackLoc stackAssignments — replace each use of a variable with a ref to its home newInstructions = map (ahInstr homes) ss — return the new instructions and number of homes used in (newInstructions, length homes)
— The assign-homes pass, for a single instruction — inputs: — – a mapping from variables to their “homes” — – a single pseudo-x86 instruction — output: a single x86 instruction ahInstr :: [(Variable, X86Arg)] -> X86Instr -> X86Instr ahInstr homes e = case e of MovqE a1 a2 -> undefined AddqE a1 a2 -> undefined RetqE -> undefined
— The assign-homes pass, for a single pseudo-x86 “arg” — inputs: — – a mapping from variables to their “homes” — – a single pseudo-x86 “arg” — output: a single x86 “arg” ahArg :: [(Variable, X86Arg)] -> X86Arg -> X86Arg ahArg homes e = undefined
———————–
— patch-instructions
———————–
— The patch-instructions pass — input: a pair — – a list of x86 instructions — – the number of stack locations used — output: a pair — – a list of _patched_ x86 instructions — – the number of stack locations used patchInstructions :: ([X86Instr], Int) -> ([X86Instr], Int) patchInstructions (ss, numHomes) = (concat $ map piInstr ss, numHomes)
— The patch-instructions pass, for a single instruction — input: a pair — – a single x86 instruction — – the number of stack locations used — output: a pair — – a single _patched_ x86 instruction — – the number of stack locations used — Patched instructions contain at most one memory access in each movq or addq instruction piInstr :: X86Instr -> [X86Instr] piInstr e = case e of MovqE (DerefE r1 i1) (DerefE r2 i2) -> undefined MovqE _ _ -> [e] AddqE (DerefE r1 i1) (DerefE r2 i2) -> undefined AddqE _ _ -> [e] RetqE -> [e]
————–
— print-x86
————–
— Set this to True if you’re using macos macos :: Bool macos = False
— Print a function or label name — Add a _ at the front if we’re using macos printFun :: String -> String printFun s = case macos of True -> “_” ++ s False -> s
— Align the size of the stack frame to alignment bytes — Input: — – n: the number of bytes of stack space used on the current stack frame — – alignment: the desired alignment (in bytes) – for x86, usually 16 bytes — Output: the size in bytes of the correctly aligned stack frame align :: Int -> Int -> Int align n alignment = case n mod alignment of 0 -> n _ -> n + (alignment – (n mod alignment))
— The printX86 pass for x86 “args” printX86Arg :: X86Arg -> String printX86Arg e = undefined
— The printX86 pass for x86 instructions printX86Instr :: X86Instr -> String printX86Instr e = undefined
— The printX86 pass for x86 programs — Input: a pair — – a list of instructions — – the number of stack locations used in the program — Output: x86 assembly, as a string printX86 :: ([X86Instr], Int) -> String printX86 (ss, numHomes) = undefined
——————-
— compile / main
——————-
compile :: R1Expr -> String compile = printX86 . patchInstructions . assignHomes . siTail . ecTail . rcoExp . uniquify
logOutput :: Show b => String -> (a -> b) -> (a -> IO b) logOutput name f = x -> do let result = f x putStrLn “————————————————–” putStrLn $ “Output of pass ” ++ name ++ “:” putStrLn “————————————————–” pPrintNoColor result putStrLn “” return result
compileLog :: R1Expr -> IO String compileLog e = (logOutput “input” id) e >>= (logOutput “uniquify” uniquify) >>= (logOutput “rcoExp” rcoExp) >>= (logOutput “ecTail” ecTail) >>= (logOutput “siTail” siTail) >>= (logOutput “assignHomes” assignHomes) >>= (logOutput “patchInstructions” patchInstructions) >>= (logOutput “printX86” printX86)
{-# LANGUAGE Strict #-} module Interpreter where
import System.Environment import AST import Parser hiding (main)
type Env = [(String, Int)]
eval :: R1Expr -> Env -> Int eval e env = case e of IntE i -> i VarE x -> case lookup x env of Just i -> i Nothing -> error $ “Failed to find variable ” ++ (show x) ++ ” in environment ” ++ (show env) PlusE e1 e2 -> (eval e1 env) + (eval e2 env) LetE x e1 e2 -> let v1 = eval e1 env env’ = (x, v1) : env in eval e2 env’
main :: IO () main = do [fileName] <- getArgs putStrLn "============================================================" putStrLn $ "Interpreting the file: " ++ (show fileName) putStrLn "============================================================" programAST <- parseFile (fileName) putStrLn $ show $ eval programAST [] {-# LANGUAGE Strict #-} module Main where import System.Environment import Parser hiding (main) import Compiler hiding (main) main :: IO () main = do [fileName] <- getArgs putStrLn "============================================================" putStrLn $ "Compiling the file: " ++ (show fileName) putStrLn "============================================================" programAST <- parseFile (fileName) compiledASM <- compileLog programAST writeFile (fileName ++ ".s") compiledASM module Parser where import Text.Parsec import Text.Parsec.String (Parser) import qualified Text.Parsec.Expr as Ex import qualified Text.Parsec.Token as Tok import Data.Functor.Identity import AST langDef :: Tok.LanguageDef () langDef = Tok.LanguageDef { Tok.commentStart = "{-" , Tok.commentEnd = "-}" , Tok.commentLine = "--" , Tok.nestedComments = True , Tok.identStart = letter , Tok.identLetter = alphaNum <|> oneOf “_'” , Tok.opStart = oneOf “:!#%&*+./<=>?@\^|-~” , Tok.reservedNames = [“let”, “in”] , Tok.reservedOpNames = [“+”, “=”] , Tok.caseSensitive = True }
lexer :: Tok.TokenParser () lexer = Tok.makeTokenParser langDef
parens :: Parser a -> Parser a parens = Tok.parens lexer
whiteSpace :: Parser () whiteSpace = Tok.whiteSpace lexer
reserved :: String -> Parser () reserved = Tok.reserved lexer
semiSep :: Parser a -> Parser [a] semiSep = Tok.semiSep lexer
reservedOp :: String -> Parser () reservedOp = Tok.reservedOp lexer
prefixOp :: String -> (a -> a) -> Ex.Operator String () Identity a prefixOp s f = Ex.Prefix (reservedOp s >> return f)
intLit :: Parser R1Expr intLit = Tok.integer lexer >>= return . IntE . fromIntegral
identifier :: Parser String identifier = Tok.identifier lexer
var :: Parser R1Expr var = identifier >>= return . VarE
— funCall :: Parser R1Expr — funCall = do — f <- identifier -- parens whiteSpace -- return $ FunCallE f -- Operator table table :: Ex.OperatorTable String () Identity R1Expr table = [ [infixOp "+" (PlusE) Ex.AssocLeft ] ] infixOp name fun assoc = Ex.Infix (do{ reservedOp name; return fun }) assoc expr :: Parser R1Expr expr = Ex.buildExpressionParser table factor letExpr :: Parser R1Expr letExpr = do reserved "let" x <- identifier reservedOp "=" e1 <- expr reserved "in" e2 <- expr return $ LetE x e1 e2 factor :: Parser R1Expr factor = intLit -- <|> try funCall <|> var <|> letExpr <|> parens expr
contents :: Parser a -> Parser a contents p = do Tok.whiteSpace lexer r <- p eof return r parseExpr :: String -> Either ParseError R1Expr parseExpr s = parse (contents expr) “” s
parseFile :: String -> IO R1Expr parseFile fileName = do s <- readFile fileName case parseExpr s of Right e -> return e Left err -> error $ “Failed to parse ” ++ fileName ++ “: ” ++ (show err)
main :: IO () main = do putStrLn $ show $ parseExpr “0 + (1 + x)” putStrLn $ show $ parseExpr “let x = 5 + z in x + y” e <- parseFile "testfile.r1" putStrLn $ show e module RunTests where import System.Directory import Data.List import System.Process import Parser hiding (main) import Compiler hiding (main) import Interpreter hiding (main) logging :: Bool logging = False logOut :: String -> IO () logOut s | logging = putStrLn s | otherwise = return ()
readInt :: String -> Int readInt = read
runTest :: String -> IO () runTest fileName = do putStrLn “============================================================” putStrLn fileName putStrLn “============================================================”
e <- parseFile ("tests/" ++ fileName) let interpResult = eval e [] putStrLn $ "Interpreter result: " ++ (show interpResult) -- Compile the program let r = compile e if logging then compileLog e else return r -- Write out the assembly code to a .s file let assemblyFileName = "tests/" ++ fileName ++ ".s" writeFile assemblyFileName r -- Compile the assembly to an executable gccOutput <- readProcess "gcc" ["-g", "../runtime.o", assemblyFileName] "" putStrLn $ "GCC Output: " ++ gccOutput -- Run the executable x86Result <- readProcess "./a.out" [] "" putStrLn $ "x86 Result: " ++ x86Result -- Check that executable result was the same let x86ResultInt = readInt x86Result case x86ResultInt == interpResult of True -> putStrLn “Passed!” False -> putStrLn “FAIL: Results don’t match”
— Remove the .s file and compiled files removeFile assemblyFileName removeFile “a.out”
putStrLn “”
main :: IO () main = do sourceFiles <- listDirectory "tests" let files = sort $ filter (".r1" isSuffixOf) sourceFiles putStrLn $ "Test files: " ++ (show files) mapM_ runTest files putStrLn "done"