9/29/21
Compilation
COMP712 Programming Languages
AUT
1
Program words symbols
Every statement turns into a tree, Sn
tokens
Syntax Analyser
A list of trees
A program is transformed into a list of trees
2
Lexical Analyser, so far
Program
…… S1 S2….
12
Program words tokens symbols
Syntax Analyser
A list of trees
An Example
Every statement turns into a tree, Sn
X := Y + 5; Print x;
:=
x+ y5
print While x < 100 do x
Begin
: end
while block
S1 S2....
< x5
3
: :
What do you do if it is a name?
Analysing a sentence at a time
Program words symbols
x:=y+5; x Print x;
Syntax A list of
tokens
avariablename
trees
Analyser
4
34
Analysing a sentence at a time
x := y + 5; x Print x;
: You hash it into the
s.name
Hash Table
Syntax Analyser
What else is passed on to the syntax analyser?
symbol table. What is returned here?
NameTable “x”
P1 P2 flags P3
Symbol Table
5
Analysing a sentence at a time
x := y + 5; Print x;
:
:
x
s.name
Hash Table
Syntax Analyser
NameTable “x”
P1 P2 flags P3
But why point to here?
s.name ?
S.Name and a pointer to the symbol table
Symbol Table
6
56
1
9/29/21
x := y + 5; Print x;
:
:
:
s.colon?
No, you really want to return s.becomes
Analysing a sentence at a time
Read next symbol
Just like reading in a comment /*...*/, when you read in a “/”, you have to check if it is followed by a *. So, reading in a “:”, you have to check if it is followed by a “=“.
7
Analysing a sentence at a time
x := y + 5; := Print x;
Syntax Analyser
NameTable “x”
P1 P2 flags P3
Build the tree for this statement. What does it look like?
: :
Do you need to hash this to the symbol table?
No, but what do you do next?
s.becomes
Hash Table
Symbol Table
8
78
Analysing a sentence at a time
:= x
x := y + 5; Print x;
:
:
:=
Syntax s.becomes Analyser
But, what is x?
9
Hash Table
NameTable “x”
P1 P2 flags P3
Symbol Table
Analysing a sentence at a time
:=
s.name
x := y + 5; := Print x;
:
:
Syntax Analyser
NameTable “x”
P1 P2 flags P3
s.becomes
Hash Table
Symbol Table
10
9 10
Hash Table
NameTable “x”
The symbol table thus holds information about the variable, known as run-time properties. It is organized as a push- down stack. It has 3 pointers, P1, P2, & P3.
Analysing a sentence at a time
P1 P2
flags P3
Symbol Table
Q: why use a pushdown stack and how is it
implemented?
Because you could have different types of x’s.
11
Push-down stack: refresh
Information coming in: 1, 2, 3, 4.
Information going out?
Rationale
A pushdown stack thus prevents access to earlier information until the current information is being processed.
4, 3, 2, 1 i.e. it is a FILO queue
Implementation
1
12
4 3 2 1
A
stack list
A
linked 2
11 12
2
9/29/21
Using a linked list: Push and Pop Top of the stack
Push 3 onto
the stack 2
Pop 3 off 1 the stack
3
A
linked 2
list
1
You use a linked list when the information you push/pop is more complex.
13
Hash Table
NameTable “x”
The symbol table thus holds information about the variable, known as run-time properties. It is organized as a push- down stack. It has 3 pointers, P1, P2, & P3.
14
Lexical Analyser (cont.)
P1 P2
flags P3
Symbol Table
So, where is the PDS?
Which pointer is used to implement the linked list? P1
13 14
Hash Table
NameTable “x”
P1 P2 flags P3
P2? pointer to the variable identity i.e. its actual name
P3? pointer to the code of the procedure, if any.
Again, why do we need a pushdown stack?
15
Lexical Analyser (cont.)
So, which pointer is used to implement the linked list? P1
Symbol Table
To handle multiple occurrences
Begin integer x; :
:
Begin real x;
: :
Hash Table
NameTable “x”
P1 P2 flags P3
Having read the first “x”, your symbol table is as shown on the right. What happens when the lexical analyser reads the second “x”?
16
15 16
To handle multiple occurrences
You compute the runtime properties of the second x and put it into the symbol table. How?
Begin integer x; :
:
Begin real x;
: :
Hash Table
NameTable “x”
P1 P2 flags P3 real x
P1 P2 flags P3 integerx
17
To handle multiple occurrences
Begin integer x; :
:
Begin real x;
: :
Hash Table
NameTable “x”
P1 P2 flags P3 real x
P1 P2 flags P3 integerx
You put the real x at the top of the stack, why?
So that all references to x now refer to it without having to search for it. And you don’t need to refer to integer x until you exit this block.
18
17 18
3
9/29/21
Hash Table
NameTable “x”
P1 P2 flags P3 realx P1 P2 flags P3 intx
NameTable “x”
P1 P2 flags P3 realx P1 P2 flags P3 intx
To handle multiple occurrences
Now, what happens when you exit the block?
You pop out the description for real x so that from now on, any reference to x points to the descriptor for integer x.
19
To handle multiple occurrences
Begin integer x; :
:
Begin real x;
Integer x : :
End;
:
x := 5;
But, where did the descriptor
for real x go? Attached to all the trees that refer
to real x!
Hash Table
NameTable “x”
P1 P2
flags P3
Descriptor for real x removed
20
19 20
Hash Table
NameTable “x”
The symbol table holds the run-time properties of the variables used in the program but what are these properties?
Run-time properties of variables
P1 P2
flags P3
Symbol Table
Note that these properties are what we need to know about the variables when the program is executed.
21
These properties are:
Its string name P2
Declaration status: yes/no boolean
Variable type: e.g. integer, real,.. Is this a reserved word: yes/no If this is a function, its code P3 Block level number number Offset value number
const boolean
But, what are these two?
22
21 22
To understand what the block level number and offset value are for, something is amiss here: why is there no storage allocated for holding the runtime value of a variable? Isn’t its value a runtime property?
“x”
flags
In other words, why don’t we have:
P1 P2
/ “..” value P3
23
Then, we could do: X := 5
This is known as
static allocation
of storage as found in early languages (ex. Fortran 77).
“x” / flags
“..”
P1 P2
5
P3
Disadvantages? No recursion! Why? Because if you allocate value there, what happens
when you call your function again?
You don’t have storage for it or if you use the same store, you over-ride the previous value.
We need to introduce the concept of a data frame. 24
23 24
4
9/29/21
Data Frame
Any part of the program that allows the declaration of variables – procedures, blocks, etc. – needs a data frame
A data frame is thus a collection of memory locations (i.e. storage for the variable) that are created during runtime.
To create a data frame, we need to know how many variables are used in the program.
25
Import java.util.Scanner;
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0)
{ prime = false; break; }
}
if (prime)
Where are the data frames created in this program?
System.out.printf(“%d %s\n”, n, “is a prime number”) else
System.out.printf(“%d %s\n”, n, “is not a prime number”) }
Public static void main (String[]args)
{ Scanner sc = new Scanner (System.in);
Sytem.out.println( “enter a number to check if its prime or not”) int num = sc.nextInt();
isPrime (num);
}
26
25 26
Import java.util.Scanner;
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0)
i: { prime = false; break; }
}
if (prime)
System.out.printf(“%d %s\n”, n, “is a prime number”) else
System.out.printf(“%d %s\n”, n, “is not a prime number”) }
Public static void main (String[]args)
{ Scanner sc = new Scanner (System.in);
Sytem.out.println( “enter a number to check if its prime or not”) int num = sc.nextInt();
isPrime (num);
}
prime:
n:
num:
27
Import java.util.Scanner;
So, whenever you use this code, you need to create2 locations to hold “n” and “prime”.
But you also need one when you enter the for- loop for “i”.
And you need one location for “num”.
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0)
{ prime = false; break; }
}
if (prime)
System.out.printf(....) else
System.out.printf(....) }
Public static void main (String[]args)
{ Scanner sc = new Scanner (System.in);
Sytem.out.println(....) int num = sc.nextInt(); isPrime (num);
}
28
27 28
Data frames are created from a stack and are allocated only when that piece of code is executed.
Imagine if you call: mainàisPrimeàisPrime.
main
4loop isPrime
4loop isPrime
29
To add and remove data frames, you need two pointers – the base (blue) and the top (red) pointers
main isPrime 4loop isPrime 4loop
30
29 30
5
9/29/21
To access the location assigned for each variable, you assigned each variable an offset number. Then do: base+offset
Main: num(0); isPrime: n(0), prime(1) 4loop: i(0).
main
num =
isPrime
+ 0
Offset numbers Pointer’s address
prime = + 1 n= +0
31
But after you exit the for-loop and return to isPrime, what happens?
You need to do the opposite (i.e. shift the pointers down) but how do you know where your pointers go?
4loop isPrime 4loop isPrime main
32
31 32
To restore the stack, you need to store the return pointer:
4loop 4loop 4loop 4loop
In the 3rd version, store 2 pointers: where to return and where is the top
33
Possible flags: “x” /
“..”
Declaration status: declare or not
Declaration type e.g. integer, real,..
P1 P2
flags P3
Variable type: reserved or user-defined
Block level number What about
Offset value
block level
34
33 34
Begin integer x;
:
So, here is block 0
:
Begin integer y;
:
y := y + x;
: here is End; block 1
😡 : = 5 ; h e r e i s block 0
How do we get x?
So, you need another pointer to point to the correct block
4loop
35
Such a method for allocating storage is known as dynamic allocation.
Advantages? Allow recursion in PLs!
How? First, observe no actual locations are reserved in the symbol table (as opposed to static allocation).
What then is recorded in the
symbol table? 2 numbers – block level and offset value
36
35 36
6
9/29/21
S Y M B O L
T A B L E
Create an entry for each variable and compute their runtime properties
Pass runtime properties of variables to the tree
Source Program read
character
Lexical analysis
Lexical terms
Syntax analysis
Parser tree
Simple translation
Object code Loader
Target program
The process of compilation
We are almost here!
Optimise code (optional)
37
The task of lexical analysis is to divide up the input into lexical items and compute the runtime properties of variables used. The data structure needed is a symbol table.
The task of syntax analysis is to combine the lexical items into phrases which eventually lead to legal sentences. The data structure used is a tree.
Syntax analysis, thus, consists of two independent parts: how to recognise a phrase and how to produce a tree node.
38
37 38
But before we move into Syntax Analysis, let’s look at implementing a Lexical Analyser. What basic functions do we need?
A function to read each character from a program and convert it to tokens.
A function to lookup the name in the symbol table and if necessary, compute the runtime properties of the symbol.
39
A function to read each character from the program and convert it to token:
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: ? case ‘-’: ?
case ‘0’...’9’: ? case ‘a’...’z’: ?
: :
return sym
40
39 40
A function to read each character from the program and convert it to token
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: sym := s.plus; endcase case ‘-’: sym := s.minus; endcase
case ‘0’...’9’: ? case ‘a’...’z’: ?
: :
return sym
41
A function to read each character from the Java program and convert it to token:
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: sym := s.plus; endcase case ‘-’: sym := s.minus; endcase
case ‘0’...’9’: a routine to read in a number
case ‘a’...’z’: a routine to read in a name :
:
return sym
42
41 42
7
9/29/21
A function to lookup the name in the symbol table.
Let lookupword() $(
use the hash function to find an entry for the word read
check name-table to find the correct entry.
{how do you do this?}
If there is, retrieve and return its type.
If there isn’t, insert a new entry (what?) and return s.name (why?)
$)
43
The Parsing Process
COMP712 Programming Languages
AUT
44
43 44
A typical view of the compilation process:
parsing
Your Program
Lexical Analyser
Syntax Analyser
What is generated?
Symbol Table Parse Tree Tokens = terminal words
45
A typical view of the compilation process:
parsing
What happens next?
Code generation
Interpretation i.e walk the tree
46
Your Program
Lexical Analyser
Syntax Analyser
45 46
A typical view of the compilation process:
parsing
But the above does not say whether one performs a complete lexical analysis prior to performing syntax analysis or doing both simultaneously.
Your Program
Lexical Analyser
Syntax Analyser
47
Thus, we could have a two-pass parser:
1 pass
Or, a single pass parser: 1 pass
2 pass
In the previous lecture, we look at parsing from the LA standpoint. Now we look at it from the SA standpoint
48
Your Program
Lexical Analyser
Syntax Analyser
Syntax Analyser
Lexical Analyser
Your Program
47 48
8
9/29/21
Consider (single) parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
First, what does parsing means?
It means syntax analyser calls lexical analyser to get tokens and use them to build a parse tree immediately
49
Consider parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “begin”?
LA finds the entry “begin” in the hash table
It returns s.begin to the SA
What happens next?
50
49 50
Consider parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
SA knows what s.begin means. So, what action is taken?
SA ignores s.begin and starts the tree building process.
SA initialises tree:
Tree
51
Consider parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “integer”?
LA finds the entry “integer” in the hash table. Why? What does it return to SA?
s.integer
52
51 52
Consider parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
SA knows what s.integer means. So, what actions are taken?
SA sets declaration = true
SA sets decl_type = s.integer
Global variables
53
Consider parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “x”?
LA looks for the entry “x” in the hash table
But it is not there!
54
53 54
9
9/29/21
LA creates an entry “x” in the symbol table:
Begin integer x, y; x:=5;
y := x;
:
:
Begin real y;
: end;
: End;
/ “x” Isthatall? No P1P2 flags /
Flags:
Declared: yes
Type: s.integer Reserved or user-defined Block level: 0
Offset: 0
55
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “,”?
It returns s.comma to the SA
SA ignores s.comma and knows more declaration coming.
56
55 56
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “y”?
LA looks for the entry “y” in the hash table
Again, it is not there!
57
Begin integer x, y; x := 5;
y := x;
:
:
Begin real y;
: end;
: End;
/
.........
“x”
LA creates another entry in the symbol table.
/ “y” P1P2 flags /
Flags:
Declared: yes
Type: integer Reserved or user-defined Block level: 0
Offset: 1
58
57 58
Block level tells you that these variables are at the same level i.e using the same data frame.
Enter a new block level, you reset offset value to zero and increase block level by 1.
Exit a new block level, you reset offset value to the previous value and decrease block level by 1.
Offset value allows you to index its location on the stack.
59
Recall what is a data frame?
The memory required by your program
In this program, we need 2 locations at the outer block
To get x, you do:
Base address + offset of x
Begin integer x, y; x := 5;
y := x;
:
:
Begin real y;
:
y := 0.5*x; end;
: End;
y:
x:
60
59 60
10
9/29/21
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “;”?
It returns s.semicolon to the SA
SA ignores it and sets decl_type = nil.
61
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
What happens when you read “x”?
LA finds the entry “x” in the hash table
: It returns s.name and End; a pointer to “x” in the
symbol table to SA
62
61 62
SA gets s.name and a pointer to the symbol table:
Flags:
Declared: yes Type: s.integer user-defined
Block level: 0 Offset: 0
/ “x”
P1 P2
flags /
Since SA did not encounter s.integer or s.real or s.procedure, it knows declaration has finished. So, declaration := false. It now expects an operator next.
63
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
After reading x, what is read?
: not := and reading ”:”, what happens next?
It has to check if the next symbol is “=“ and if so, returns?
s.becomes to SA
64
63 64
SA gets s.becomes and starts to build the first sentence which is an assignment expression.
/ “x”
:= .........
/ “y”
s.name
Note where
this arrow is pointing to
P1P2 flags /
65
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “5”?
LA returns the value 5 and its type s.integer to SA
SA knows the right hand side is an arithmetic expression. Call a function to build such a tree.
66
65 66
11
67 68
9/29/21
Continue parsing the simple Algol-like program:
Begin integer x, y; x := 5;
y := x;
:
Begin real y;
: end;
: End;
What happens when you read “;”?
LA returns s.semicolon to SA
SA knows end of statement reached.
67
SA now completes the assignment expression tree. It checks that variable x is of the right type.
/ “x”
.........
/ “y” P1P2 flags /
s.name
:=
s.value 5
68
SA builds the first statement tree in the program
Tree
/ “x”
:= .........
/ “y”
x5
P1 P2 flags /
69
69
12