Homework 1 – A Compiler for Simple Expressions ∗
Zhiyao Liang MUST FIT zyliang@must.edu.mo
Wednesday 31 October 2018
Abstract
The task of this assignment is to implement a compiler that can evaluate simple arithmetic expressions. [2] [1].
1 Preparation
You need to have some programming environment to edit, compile and run C programs.
2 The ExprS language
We define a language called Expr–Simple (ExpS for short) in this section. ExpS can be described in English as follows:
∗The picture above is found from the web, which is an example of simple math expres- sion. It is powerful, isn’t it?
1
• An ID (variable) is a sequence of English letters.
• A number is sequence of digits, possibly followed by a negative sign
−. Only integers are allowed.
• An expression is any composition of number, ID, and the 4 arithmetic operators: + – * /, and parenthesis: (). The meaning of an expression is the same as the arithmatic expression that we learned in middle school. I.e., considering precedence, * and / are below (), and above + and -. The associativity of the operators is from left to right. The value of an expression is an integer. Especially a division returns an integer value, just like in C. For example, 7/3 is 2.
• A statement is either an assignment statement or expression statement.
– An assignment as the form x = expr, followed by a carriage return (Re- turn) or end of file (EOF). That is, a variable name and an expression appearing on the LHS and RHS of an assignment operator respectively. This assignment declares a new variable x if x has not been introduced before. Otherwise, if x is introduced earlier, it updates the value of x.
– An expression statement is an expression followed by Return or EOF.
• Excution output. For an expression statement, the value of the expression is printed out. For an assignment statement the output is optional: nothing, or the value of the RHS expression is printed out.
• Error report. When some lexical or grammar error is found, or a divisor is found to be 0, an error message is printed out accordingly.
• Runtime Environment. It covers two parts:
1. During the runtime, a datastructure is maintained that can support the
following operations efficiently:
– Record the names and values of all variables that are declared (appeared on LHS of some assignment).
– Check if a name is declared or not.
– Obtain the value of a name.
– Update the value of a name.
2. Some method to evaluate any expression (obtain the value). It is based on this datastructure (just mentioned), and the parse tree provided by a parser) of the expression to be evaluated.
2
2.1 Lexical rules of ExpS
We use regular expression to describe the tokens. Here is some explanation of notations:
• Alternatives (logical or) is represented by a vertical bar |, instead of the + sign.
• Concatenation is represented by a center dot ·, instead of a space.
• Three dots . . . means a range; E.g., a . . . z means all lowercase letters.
The tokens of ExpS are defined as follows by regular–expressions. A super- script star ∗ represents the kleen star.
Assign Plus Minus Times Divid LP aren RP aren Digit N atural N umber Letter ID
: =
: +
: −
: ∗
: /
: (
: )
: 0…9
: Digit · Digit∗
: N atural | M inus · N atural : a…z | A…Z
: Letter · Letter∗
Table 1: the lexical rules of tokens
2.2 Tokens of ExprS
The token types of Exps could be all of the names on the LHS of the rules in Table 2.1, except Digit and Letter. Some additional token types could be added, for example:
• Return: It represents the carriage return appearing after each state- ment.
• EOF: It means end of file. An extension of the compiler is to read a script file of ExpS, which is a file containing a sequence of ExpS state- ments. When the end of file is reached, an EOF token is generated.
• QUIT: When this token appears, the program finishes. 3
3 Grammar of ExprS
The grammar of ExprS can be described by the following simple grammar. A name starting with a lowercase letter is a variable, otherwise it is a constant (token).
program → statementList → statement → end → assignment → expression →
operator →
statementList
statement | statement · statementList assignment · end | exprssion · end Return | EOF
ID · assign · expression
Number | ID |
LParen · expression · RParen | expression · operator · expression Plus | Minus | Times | Divid
Note that the grammar rules in Table 3 is not ideal for writing a recursive– descendent parser, since it does not represent the precedence and associa- tivity of operators.
4 A sample run of the calculator program
The following is what appears on screen in a sample run. A line starts with > is a response from the caculator program, otherwise, it is a line typed by a user.
> welcome to the caculator x =3
x+3
>6
y=8
x+y/2
>7 ((3+5)/6)*(x+y/x+(3+4)) > 12
3+
> syntax error …
6/0
> runtime error, divide by 0
3+!
4
> scanner error, ! is a problem
x+z
> z is not found
quit
> bye
5
1.
2.
Tasks to do
(5%) Adjust the grammar rules of ExprS so that it can describe the priority and associativity of operators. Such a grammar can support recursive–descendant parsing easily. Describe the grammar rules in a separate text file or as comments in your source code.
Design and implement a compiler that covers the following parts:
• (30%) A scanner of ExprS: Given a statement (just a statement) of ExprS, the scanner generates a linked list of tokens.
• (45%) A parser of ExprS: Given a linked list of tokens, the parser generates a parse tree.
• (15%) The runtime environment of ExprS, which is described in Section 2.
• (10%) Some code to test the compiler and interact with a user, as shown in Section 4.
Note that there is no need to generate some target code.
Write a program that tests and uses this compiler. The program works as an interpretor, like calculator that is described in Section 4.
3.
5.1 Bonus
The following tasks will be considered as extensions and will receive bonus.
• (5%) Using JFlap to draw a DFA that can describe the computation of a scanner of ExprS. Use the notations described in class to mark the edges of the DFA. Attached the JFlap file to the email.
5
•
•
•
6
• • •
•
•
•
•
(5%) Allowing floating point numbers (numbers with decimal points). The value of variables and expressions can be floating points. Espe- cially, you can allow division of floating point numbers, just like C. For example, the value 6/4 is 1, while 6.0/4 is 1.5.
(5%) Allowing = to be an operator, just like the C programming language. For example: The value of x=(y=z=6)*5 is 30. Only variable names can appear on the LHS of =. The precedence of = is below all other operators. You need to adjust the grammar accordingly.
(5%) Allowing a program of ExprS to be saved in some script file. A state- ment of
load filename
will load and run the file with the filename; all statements in the file will be executed in batch. When running a script file, the execution may stop when an error is found.
Submitting your homework
Due time: 10 December 2018 Monday, 7:00pm.
At most 3 students can form a group to do this homework together.
At the top of the every text file that you changed or created (maybe just the .c and .h files), record your name (if mutiple students forma group, provide all the names) and date as comments.
Please attach all of your source code files to the email, possibly zip them into a single file. Any information about how to run and under- stand your code (a readme file) will be helpful.
Attention: Do not attach any binary files (.o, .obj, .exe, .out), since google will consider them virus. Just send the source code and textual documents.
Send your email to:
zyliangstu@gmail.com
The title (subject) of your email should be like:
[name][hmkNum info]
6
Name can be in Chinese or English (Pingying). For example, if a student wants to submit homework 5, whose name Shan Li, then the title of the email should be:
[Li, Shan][hmk1 expression]
If it is a group work, include all members’ name in the title.
References
[1] Andrew W. Appel. Modern Compiler Implementation in C: Basic Tech- niques. Cambridge University Press, New York, NY, USA, 1997.
[2] Kenneth C. Louden. Compiler Construction Principles and Practice. PWS Publishing Company, 1997. ISBN 0-534-93972-4 http://www.cs.sjsu.edu/~louden/cmptext/.
7