Project 4 Part A – Small C Parser

Project 4 Part A – Small C Parser

Due 11:59pm Oct 24 Oct 29, 2016

Introduction

In this project, you will write a parser for SmallC, a tiny subset of the C programming language. As part of the project, you will write a parser that translates a plain text C program into an abstract syntax tree (AST), and a pretty print function that formats the code in a specific style. If you want to leran C, you can revisit the textbook from CMSC216.

For purposes of this project, we will only test your parser with valid input. Thus your code may do whatever you want on a bad input. We do, however, recommend adding reasonable error handling code to your project for cases of malformed or otherwise incorrect C input, because it will make developing your project easier. As you are testing your program, you may inadvertently create incorrect input data; substantial time may be lost in trying to debug the program, only to find a few mistyped characters in your input data are the source of the problem.

In Project 4 Part B, you will write an interpreter for SmallC. The interpreter executes the code represented as an AST, which is generated by the parser in this project. Imperative programming is not allowed. You can use semicolon seperated print statements. If you do not finish this project, you will not be able to do Project 4 Part B.

If you find any error in the description or in the test files, report it to the instructor. Make sure you check the piazza announcements and errata section of the project periodically.

Getting Started

Download the following archive file p4a.zip and extract its contents.

Along with files used to make direct submissions to the submit server (submit.jar, .submit, submit.rb), you will find the following project files:

Part 1: Parsing SmallC

Put your solution to this part in the to do section of file smallc.ml.

Your next task is to write a parser for C program, which in this case will be a function that turns a string into a C abstract syntax tree (AST). Your parser will take as input a sequence of tokens, produced by a scanner, which are the terminals of the C grammar, and output the C AST.

We’ve supplied you with a function tokenize : string -> token list that acts as a scanner/lexer, converting the string input into a list of tokens, represented by the following data type:

type token =
 | Tok_Id of string
 | Tok_Num of int
 | Tok_String of string
 | Tok_Assign
 | Tok_Greater
 | Tok_Less
 | Tok_Equal
 | Tok_LParen
 | Tok_RParen
 | Tok_Semi
 | Tok_Main
 | Tok_LBrace
 | Tok_RBrace
 | Tok_Int
 | Tok_Sum
 | Tok_Mult
 | Tok_Pow
 | Tok_Print
 | Tok_If
 | Tok_Else
 | Tok_While
 | Tok_END

For example,

int main(){
    int a;
    a = 10;
    int b;
    b = 1;
    int c;
    c = a + b;
    printf(c);
}

becomes a string “int main(){int a; a= 10; int b; b = 1; int c; c = a+b; printf(c);}”.

when called as

tokenize "int main(){int a; a= 10; int b; b = 1; int c; c = a+b; printf(c);}",

the return value is

[Tok_Int; Tok_Main; Tok_LParen; Tok_RParen; Tok_LBrace; Tok_Int;
Tok_Id "a"; Tok_Semi; Tok_Id "a"; Tok_Assign; Tok_Num 10; Tok_Semi;
Tok_Int; Tok_Id "b"; Tok_Semi; Tok_Id "b"; Tok_Assign; Tok_Num 1;
Tok_Semi; Tok_Int; Tok_Id "c"; Tok_Semi; Tok_Id "c"; Tok_Assign;
Tok_Id "a"; Tok_Sum; Tok_Id "b"; Tok_Semi; Tok_Print; Tok_LParen;
Tok_Id "c"; Tok_RParen; Tok_Semi; Tok_RBrace; Tok_END]

What to do: You write a function parse_Function : token list -> ast * token list that takes a list of tokens as input (returned from tokenize) and returns an AST. Once you have done this, you can run it using the code in the file parser.ml. You are not allowed to use parser generator tools.

You should use the idea of a recursive descent parser, as we discussed in class. Thus we suggest you write a function:parse_Function, which parses the non-terminal Function representing a single C function. parse_Function calls other fucntions such as parse_methodBody, parse_Statement to parse the body of the function. The context free grammar for smallC is given next, followed by the OCaml type definition ast of SmallC abstract syntax trees.

SmallC Grammar

The grammar for C you need to support is as follows

  basicType-> 'int'
  mainMethod-> basicType 'main' '(' ')' '{' methodBody '}'
  methodBody->(localDeclaration | statement)*
  localDeclaration->basicType ID ';'
  statement->
    whileStatement
    |ifStatement
    |assignStatement
    |printStatement

  assignStatement->ID '=' exp ';'
  ifStatement -> 'if' '(' exp ')'  '{' ( statement)* '}'  ( 'else' '{'( statement)* '}')?
  whileStatement -> 'while''(' exp ')' '{'(statement )*'}'
  printStatement->'printf' '(' exp ')' ';'
  exp -> additiveExp (('>'  | '<'  | '==' ) additiveExp )*
  additiveExp -> multiplicativeExp ('+' multiplicativeExp)*
  multiplicativeExp-> powerExp ( '*' powerExp  )*
  powerExp->primaryExp ( '^' primaryExp) *
  primaryExp->'(' exp ')' | ID | INITLIT
  ID->( 'a'..'z' | 'A'..'Z') ( 'a'..'z' | 'A'..'Z' | '0'..'9')*
  INTLIT-> ('0' |	('1'..'9') ('0'..'9')* )
  WS-> (' '|'\r'|'\t'|'\n')
      where

      • basicType is the basic data type in SmallC. That means SmallC only supports one data type, int. SmallC does not suppoer pointers, arrays, or strings.
      • mainMethod is the main function in SmallC. SmallC has only one function called “main”. Function “main” returns “int” and takes no arguments. Because SmallC does not support return statement, “main” does not return any value. SmallC does not support preprocessing directive ‘#include’, #define, or global variables. All variables are declared locally inside the main function. SmallC programs always start with “int main(){…”
      • methodBody is the body of the main function. It includes zero or more local variable declarations and statements.
      • localDeclaration declares the identifiers (variables). All variables are int type. Variables must be declared before they can be used and same variable cannot be declared twice. SmallC interpreter throws exception either violation. Checking if a variable has been declared is for Part B of the project. Ignore it in Part A.
      • statement produces all available statements in SmallC. SmallC only has assignment statement, if statement, and while statement.
      • assignStatement assigns the value of the right hand side expression to the left hand side identifier. All expressions have the same data type int.
      • ifStatement is the only brach statement SmallC supports. “Else” branch is optional. The conditional expression return 1 for true, and -1 for false. SmallC supports nested if/else statements.
      • whileStatement is the repetition statement in SmallC. The conditional expression return 1 for true, and -1 for false. While statement body has 0 or more statements. SmallC supports nested while statements.
      • printStatement is the statement to print the value of an expression. “printf(“%d\n”, exp)” is equivalent to “printf(exp)” in SmallC.
      • exp is the expression in SmallC. SmallC expressions can only have +,*, ^ (power), parenthesis “()” operations, and logical operations Greater “>”, Less “<“, and Equal “==”. (Clarification: in C, “^” is xor, not power operator. We use “^” as power in SmallC). The order of precedence from high to low is (), ^, *, +. Logical operations all have same precedence. For example: a=10; b=2; a+b*3^3 evalues to 64. Operators +,*, ^,>,<,== are all right associative. For example: “2 + 3 + 4” is parsed as “Sum(Num 2,Sum(Num 3,Num 4))”, and “5 > 6>1” is parsed as “Greater(Num 5,Greater(Num 6,Num 1))”.
      • additiveExp, multiplicativeExp, powerExp, primaryExp represent the precedence of +,*,^ from low to high.

      • ID is the identifier. Identifiers may contain upper- and lower-case letters, digits, but starts with a letter.
      • INITLIT is the integer number literals.
      • Ws is the white space.

SmallC AST

For this project, a SmallC is represented using an AST (abstract syntax tree) defined using the following OCaml data type:

type ast =
  | Id of string   (*ID "a" *)
  | Num of int     (* NUm 10 *)
  | Define of data_type * ast  (* data type * ID *)
  | Assign of ast * ast        (* ID * expression *)
  | List of ast list
  | Fun of data_type * string * ast * ast   (* return type * function name * argument list * statement list *)
  | Sum of ast * ast       (*   exp * exp *)
  | Greater of ast * ast   (*   exp * exp *)
  | Equal of ast * ast     (*   exp * exp *)
  | Less of ast * ast      (*   exp * exp *)
  | Mult of ast * ast     (*   exp * exp *)
  | Pow of  ast * ast     (*   exp * exp *)
  | Print of ast          (* print (expression ) *)
  | If of ast * ast * ast	(* cond * if brach * else branch *)
  | While of ast * ast       (* cond * while body *)
  | Paren of ast

Part 2: SmallC Pretty Print

In this section, you will impelment pretty_print, a function that prints the SmallC source code in indented format. Pretty print uses following style:

    • Function return type and function name, arguments are in same line. Statements within the braces are indented to the next level, 4 “_” (underscores) further from the margin.
    • Opening brace associated with a control statement is in the same line. Closing brace is indented to the same level as the control statement. Statements within the braces are indented to the next level.
    • One space before and after >,<,==,+,*,^,=.
    • No space before or after “(” and “)”.
    • No empty lines
    • No empty line after the final “}”
    • For example:
int
main(){int x;int y;
while(x == y) {x=x+1;



                             a = 100;}

                   if(a == b){
		     printf(20);
                   }else{
                   printf(10);
}
}

      pretty print output of above code is:
int main(){
____int x;
____int y;
____while(x == y){
________x = x + 1;
________a = 100;
____}
____if(a == b){
________printf(20);
____}else{
________printf(10);
____}
} [print a single "\n" here.]

Testing and Submission

We will test your project by calling your parsing and evaluation functions directly, so be sure to give those functions the types we expect, as given above. You can work on the interpreter and parser in any order, we will test each part independently.

You may assume that all input test cases are syntactically correct. If the input SmallC code is not legal you may perform any action (e.g., exit, throw an exception).

All your code should be in one fils, smallc.ml. You can submit your project in two ways:

  • Submit your smallc.ml files directly to the submit server.
    • You can submit multiple files by putting the files in a .zip archive first. On Windows you can select the two files, then right click to select the “Send to->Compressed (zipped) Folder” option to create a .zip archive. Once your files are in a single zip archive, bring up the upload dialog box by clicking on the submit link in the column “web submission”. Select your archive file using the “Browse” button, then press the “Submit project!” button.
    • The submit server now allows multiple files (from the same directory) to be selected. Bring up the upload dialog box by clicking on the submit link in the column “web submission”. Browse to the directory containing your project files, then click on both SmallCTest.txt and SmallC.ml. Now press the “Submit project!” button.