程序代写代做代考 compiler C-

C-
Zhiyao Liang
MUST FIT
August, 2012
1/31

C- Overview
􏹩 It is prononced as C Minus.
􏹩 IsasubsetofC.
􏹩 Data includes integer and arrays of integer.
􏹩 Has recursive functions.
􏹩 Has if and while statements.
􏹩 A program includes global data declarations and function definitions.
3/31

C-Minus Program Sample 1
int fact( int x)
/* recursive factorial functions */
{ if (x > 1)
return x * fact(x-1);
else
return 1; }
void main(void) { intx;
x = read();
if (x >0) write (fact(x));
}
5/31

C-Minus Program Sample 2
int x[10];
int minloc (int a[], int low, int hight) { inti;intx;intk;
k = low;
x = a[low]
i = low + 1;
while (i < high) { if (a[i] < x) {x=a[i]; k=i; } i = i + 1; } return k; } 6/31 C-Minus Program Sample 2, cont. void sort(int a[], int low, int hight) { inti; intk; i = low ; while (i < high-1) { intt; k = minloc(a, I, high); t = a[k]; a[k] = a[i]; a[i] = t; i = i + 1; } } 7/31 C-Minus Program Sample 2, cont. void main(void) { inti; i = 0; while (i < 10) { x[i] = input(); i = i + 1; } sort(x, 0, 10); i = 0; while (i < 10) } {output(x[i]); i = i + 1} 8/31 Lexical Conventions of C- (Appendix A) 􏹩 Keywords: else if int return void while 􏹩 Special symbols + - * / < <= > >= == !=
= ; , ( ) [ ] { } /* */
􏹩 Other tokens are ID and NUM, defined by:
ID = letter letter*
NUM = digit digit*
letter = a|…|z|A|…|Z
digit = 0|…|9
􏹩 White space: blanks, newlines, tabs
White space is ignored except those that separate tokens.
􏹩 Comments are like /* …*/
Comments can appear any place where space can appear.
10/31

A DFA for making a C- Scanner (built using jflap)
11/31

BNF Grammar of the Syntax of C-
The following grammar has been significantly updated. See the homework file parse.c.
1. program → declaration-list
2. declaration-list → declaration-list declaration | declaration
3. declaration → var-declaration | fun-declaration
4. var-declaration → type-specifier ID | type-specifier ID [ NUM ]
5. type-specifier → int | void
6. fun-declaration → type-specifier ID ( params ) compound-stmt
7. params → param-list | void
8. param-list → param-list , param | param
9. param → type-specifier ID | type-specifier ID [ ]
10. compound-stmt → local-declarations statement-list
11. local-declarations → local-declarations var-declaration | empty
12. statement-list → statement-list statement |empty
13. statement → expression-stmt | compound-stmt |
selection-stmt | iteration-stmt | return-stmt
14. expression-stmt → expression ; | ;
13/31

BNF Grammar of the Syntax of C-, cont.
15. selection-stmt → if (expression ) statement | if (expression ) statement else statement
16. iteration-stmt → while ( expression ) statement
17. return-stmt → return ; | return expression ;
18. expression → var = expression | simple-exprssion
19. var → ID | ID [expression]
20. simple-expression → additive-expression relop additive-expression | additive-expression
21. relop → <= | < | > | >= | == | !=
22. additive-expression → additive-expression addop term | term
23. addop→+ | –
24. term → term mulop factor | factor
25. mulop→* | /
26. factor → ( expression ) | var | call | NUM
27. call → ID (args)
28. args → arg-list | empty
29. arg-list → arg-list , expression | expression
14/31

Types in C-
The value of expression can have one of three types, as follows:
􏹩 Integer. For example, an constant expression 123, or an arithmetic operation between two integer expressions.
􏹩 Void. For example, a call to a function whose defined return type is void.
􏹩 Address. In C-, each address is an address of an integer. For example, an array name, which could be an argument in a function call, has the address type.
The semantic analyzer with decided the type of each expression in a C- program, and check if there is any violation of types. These violations are defined as errors in the following slides.
16/31

Extension of C-, allowing address expression and operations
􏹩 An array name has the address type. For example, an array name can be passed as an argument in a function.
􏹩 An index operator expression has two allowed forms.
1. e1[e2] where e1 is an array’s name, e2 is an expression with integer type.
2. (e1)[e2] where e1 is an expression with address type, and e2 is an expression with integer type.
􏹩 e1 addop e2 is an address, when one of e1 and e2 is an address, while the other one is an integer, and addop is either + or -.
􏹩 e1 – e2 is an integer, when both e1 and e2 are addresses. The value of this operation should be calculated as
(e1 – e2)/sizeof(int). In C, only minus operation between two addresses of the same type is allowed. In C-, currently we consider each address is an address of an integer.
􏹩 e1 relop e2isaninteger,whilerelopisoneof> < >= <= == !=. Two addresses can be compared. 􏹩 Any other operation involving an address expression is not allowed. 17/31 Errors to be found by a Semantics analyzer of C- For a declaration or parameter of name x: 1. x is already defined in the current scope. 2. x is the same as a keyword. (Actually such an error is impossible to detected by the analyzer, since the parser or scanner must have find it) 3. x is a the name of a variable declaration, or an array declaration, or a paramter (since the parameter has a name, it is not of the kind VOID_PARAM), but the declaration type of x is void (recorded as VOID_TYPE). 4. x is a function declaration with return type integer, but in its body there is a branch of execution that misses a statement of the form return exp;, for some exression exp that has integer type. 5. x is a function declaration with return type void, but in its body there is a branch of execution that contains a statement of the form return exp;, for some expression exp whose type is not void. 18/31 Errors to be found by a Semantics analyzer of C-, cont. For a reference of a name x (using the name): 6. x is used as an ID, (i.e., x is an ID expression). 􏹩 The declaration of the name x cannot be found. 􏹩 Or, the declaration of the name x can be found but is a function. 7. An ID x is used as an array name in an array expression (i.e., an expression of index operator of the form x[e], for some expression e), but no array declaration with the name x can be found. (No need to check this error, why?) 8. x is used as a function name in a function call, but such a function declaration with the name x cannot be found. 9. x is used as a function name (in a function call), but the number of arguments in the function call is different from the number of parameters in the function declaration. 10. x is used as a function name (in a function call), but the type of an argument does not match the type of its corresponding parameter in the declaration of function x. 19/31 Errors to Be Found by a semantics analyzer of C-, cont. 11. For an array expression of the form (e1)[e2], or of the form e1[e2] —note that it is recorded by the parser as an expression of the index operator— e2 does not have the integer type. . 12. For an array expression of the form (e1)[e2], e1 does not have the address type. We don’t need to consider the case of e1[e2] where e1 is an ID but does have the integer type, since this case is covered by Error 3. 13. exp is an expression using a multiplicative operator, which is ∗ or /, but one of its two operands expression does not have the integer type. 20/31 Errors to be found by a semantics analyzer of C-, cont. For an expression exp: 14. exp applies a relational operator, which is one of: <= < > >= == !=,buttheitisnottruethat: thetwo operand expressions both have the integer type, or both have the address type.
15. exp applies an additive operator (+ or −), and one of the operand has the void type.
16. exp applies the + operator, but the two operands are both addresses .
17. exp applies the assignment operator, but its right hand side expression does not have the integer type.
18. exp applies the assignment operator, but its left hand side expression does not have the integer type.
19. In the program there is no definition of a funtion with the prototype int main(void).
21/31

Semantic Analyzer Implementation: Overview
􏹩 No need to use a generic traversal function, which is a mixture of pre-order and post-order traversal, as shown by the textbook.
􏹩 It is logically clearer to separate the preorder and postorder traversal.
􏹩 Semantic analysis is performed in three stages in the following order:
1. Initiate a symbol table for the top-level scope.
2. A pre-order traversal of the syntax tree to build the symbol
table, and find errors related to declaration and reference.
3. A post-order traversal of the syntax tree to do type checking
(assign types to expressions, and find errors related to types).
22/31

Implementation: Preparing the Top-level Symbol Table
Before the pre-order traversal, prepare the symbol table for the top-level scope:
􏹩 Create a symbol table for the top-level scope, and make it the current symbol table.
􏹩 create two tree nodes of function declaration for the two built-in functions read and write. Their line numbers are 0, representing the built-in input and output function. These two nodes are not connected to the syntax tree, but is connected by the symbol table.
􏹩 Insert the two bucket-list-records of the two functions input and output into the current symbol table (the top-level one), where the above two tree nodes are connected.
23/31

Implementation: Behavior of a Pre-order Traversal
For a treenode N:
􏹩 If N is a declaration, or a parameter that is not void (kind.param is not VOID_PARAM):
􏹩 Check and report error 1, 2, 3.
􏹩 If no error, insert a bucket-list-record into the current symbol
table for this declaration. Assign N.something with the address of the corresponding bucket-list-record.
􏹩 if a new block is reached, which means: N is a function declaration, or N is a compound statement that is NOT the body of a function declaration, then,
􏹩 attach a new symbol table newSt to the lower list of the current symbol table, make newSt the current symbol table for handling N’s children nodes.
24/31

Implementation: Behavior of a Pre-order Traversal Cont.
􏹩 if N is a reference of a name, which means a N appears in a function call, index expression, or ID expression:
􏹩 Check and report error 6, 7, 8, and 9.
􏹩 If no error, insert a line-list-record into the buck-list-record
that is the look-up result of this name. Assign the something field of this line-list-record with the address of the corresponding bucket-list-record.
25/31

Implementation: Behavior of a Post-order Traversal, Decide
Expression Type
For an expression node exp, assign a value to exp->type as follows:
􏹩 If exp is a constant (a number), exp->type is INT_TYPE.
􏹩 If exp is a function call, exp->type is the same as the return type of its declaration (int or void). Note that the declaration node should be easily found by a search based on exp.something, which is already assigned with a value during the pre-order traversal.
􏹩 If exp is an ID expression, and the name is an array name (lookinging up the name finds an array declaration), exp->type is ADDR_TYPE.
􏹩 If exp is an ID expression, and the name is a variable (not array, not function), then exp->type is the same as the type of the declaration of the variable. (Although it must be INT_TYPE in C-, we check the declaration for to simulate a real C compiler.)
26/31

Implementation: Behavior of a Post-order Traversal, Decide Type of an Expression, Cont.
􏹩 If exp is an array-element expression (applying the index operator), exp->type is INT_TYPE. We only allow integer arrays in C-, and each address value is an address of an integer.
􏹩 If exp applies a multiplicative operator (∗ or /), or a relational operator (<= < > >= == ! =), then exp->type is INT_TYPE.
􏹩 If exp applies the + operator, then
􏹩 if both operands are integers, then exp->type is INT_TYPE.
􏹩 if one operand is an integer, while the other one is an address,
then exp->type is ADDR_TYPE.
􏹩 if exp applies the − operator, then
􏹩 if both operands are integers, then exp->type is INT_TYPE.
􏹩 if one operand is an integer, while the other one is an address,
then exp->type is ADDR_TYPE.
􏹩 if both operands are addresses, then exp->type is INT_TYPE
􏹩 If exp applies the an assignment operator or comma operator, then exp->type is the same as the type of the RHS expression.
27/31

Semantics Analyzer Implementation: Behavior of a Post-order Traversal, Cont.
􏹩 Check and report errors, 10, 11, 12, 13, 14, 15, 16, 17, and 18. 􏹩 For a function declaration node, check errors 4 and 5.
28/31

Semantic Analyzer Discussions
􏹩 When to check the error of no main function (Error 19)?
􏹩 Any other error can be checked by the semantic analyzer ?
􏹩 The parser requires that for an array declaration, the array size is just a number. So the semantic analyzer does not need to worry about the case when a void expression appear as the size of an array.
􏹩 The scanner does not allow negative number.
􏹩 Two functions, read and write, are built in for a C- program. Although we do not see their definitions in a C- program, these two function names are assumed to be included on the top-level scope.
􏹩 C does not allow two declarations with the same name in the same scope, even the two declarations are different kinds. E.g., some error message will be printed if there are two declarations in a block like: int x; and int a[3].
29/31

Semantic Analyzer Discussions, Cont.
􏹩 Shown by experiement with gcc, its look-up procedure is “lazy”. Looking up a name x stops at the first (closest) declaration with the name x, If the declaration does not match, e.g. An array declaration with the name x is found when the looking up a variable x, then an error is reported, although a declaration of variable x can be found in some further place.
􏹩 In a general C program, the type of some expression can be decided without checking the types of the subexpressions, such as a functio call. In C-, we have similar situations, such as assigning the INT_TYPE to index operator expression, which is the supposed type of such an expression.
􏹩 In a general C program, there are expressions whose type should be decided according to its sub-expressions. In C-, there are similar situations, such as the type of the + or = expressions. That is why type of expressions are decided in a post-order traversal of the parse tree.
30/31

Semantic Analyzer Discussions: Handling Errors
􏹩 For an expression e whose type depends on its sub-expressions, such as +, −, or =, if the type of its sub-expression is wrong, then we assign e.type with VOID_TYPE.
􏹩 In the above case, the analyer should provide an option to stop at the first type error detected, or to allow the type error to propogate. So, when the later is chosen, we expected that analyzer can print a stack of error messages triggered by one type error. For example, consider the expression
x = y = (56)[7].
􏹩 For an expression whose type does not depend on its sub-expressions, we can still assign its type, even though its sub-expressions’ type can be wrong. For example, for an expression e[d], when the type of e or d is void, which is wrong, we can still decide the (supposed) type of e[d] is integer .
31/31