8/11/21
Hints for drawing an FSM. Consider Q6 in the tutorial:
Construct a deterministic FSM that recognizes the set of all bit strings that contain the string 101.
1. See what the input values are – in this case 0 and 1. In the previous example, it is the alphabets
1
2. See what the result you want. In this case the pattern 101 and in the previous example: HAPY8.
3. Work out the acceptance states.
S 1 S 0 S2 1 S3 01
2
12
34
3. Then, work out the non-acceptance states.
01
101 S0 S1 S2
S3
3
Q5: Consider the grammar G:
S → aS S → bA A→ε A → cA
Derive all the legal strings for this grammar.
4
Q5: Consider the grammar G: S→aS|bA A→ε | cA
Always start with the rule, S: Why?
aS S
This will give you a and then repeat S. So, we have a…..
bA This will give you b and then repeat A. So, we have b…..
BUT no more S.
5
Consider the grammar G:
S→aS|bA
aS S
bA
A→ε | cA
Expanding this, we have abA,
aabA, aaabA, aaaabA
Expanding this, we have
b, bc, bcc, bccc,
Hence we get: {anbcm: n >= 0, m >= 0} but why n, m is >= 0?
6
56
1
0
8/11/21
Compilation
COMP712 Programming Languages
AUT
7
A program that converts your high-level language to machine code is a compiler
De-compiler
Programming language
Machine code
Compiler
8
78
A simple program in Java
import java.util.Scanner; class AddNumbers
{
public static void main(String args[]) {
What does compilation mean?
int x, y, z;
System.out.println(“Enter 2 integers:”); Scanner in = new Scanner(System.in);
x = in.nextInt();
y = in.nextInt();
z = x + y;
System.out.println(“Sum of integers = “+z);
} }
9
What does compilation mean?
import java.util.Scanner; class AddNumbers
{
public static void main(String args[])
{ int x, y, z;
System.out.println(“Enter 2 integers:”); Scanner in = new Scanner(System.in);
x = in.nextInt();
y = in.nextInt();
z=x+y;
System.out.println(“Sum of integers = “+z);
}}
Can’t go from program to machine code directly, why?
10
9 10
What does compilation mean?
import java.util.Scanner; class AddNumbers
{
public static void main(String args[])
{
int x, y, z;
System.out.println(“Enter 2 integers:”); Scanner in = new Scanner(System.in); x = in.nextInt();
y = in.nextInt();
z=x+y;
System.out.println(“Sum of integers = “+z); }
}
Each statement cannot be converted directly to a machine code
11
What does compilation mean?
import java.util.Scanner; class AddNumbers
{
public static void main(String args[])
{
int x, y, z;
System.out.println(“Enter 2 integers:”);
Scanner in = new Scanner(System.in);
x = in.nextInt();
y = in.nextInt();
z = x + y;
System.out.println(“Sum of integers = “+z);
} }
Parse each sentence and check if it is legal AND turn each into a useful intermediate representation
12
11 12
2
8/11/21
What compilation means: X := Y + 5;
1st step: check if sentence is legal – how?
Look up the grammar!
2nd step: turn the string into a useful intermediate representation – what kind of a representation?
A tree!
13
What compilation means: X := Y + 5;
Why is a tree useful?
So that one could walk the tree and perform the task
14
:= X+
Y5
13 14
Walking the tree:
Each tree node is an operator
Recognise this node as an assignment statement
Generate a STO instruction to the address on its LHS
Go down the RHS
It is another tree! Generate the code for this tree and return its value
Complete the STO instruction
15
:= X+
Y5
What compilation means: X := Y + 5;
So, what could one do when walking the tree above?
Generate machine code. This is called a compiler.
16
:= X+
Y5
Move acc Y Add acc 5 :
:
15 16
Interpreting the tree:
Each tree node is an operator
Recognise this node as an assignment statement
Recognise the variable used is “X”
Go down the RHS
It is another tree! Execute the tree and return its value
Assign its value to “X”
:= X+
Y5
17
What interpretation means:
:= X+
Y5
X := Y + 5;
Put result in variable x
So, what could one do when walking the tree above?
Execute the instruction. This is called an interpreter.
18
17 18
3
8/11/21
The full process of compilation
Source Program read
character
Lexical analysis
Lexical terms
Syntax analysis
Can anyone tell me what this is?
Optimise tree
(optional)
Optimise code (optional)
Parser tree
Simple translation
Object code Loader
Target program
19
Lexical Analysis Consider the follow java program:
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
What do you see? The computer must see what you see.
20
19 20
Need to identify:
Public class Helloworld {
comments User defined name
Lexical Analysis
/* The famous hello world in java */
public static void main (String[] args) { System.out.println(“Hello World!”);
}
Right brackets
The bolded words are words/terminals of the Java language. Name another one?
Reserved words
}
21
The job of the lexical analyser: Convert English words to Java words! Example:
/* The famous hello world in java*/
Public
class
Helloworld {{ “Helloworld!” string
comment Public class identifier
22
21 22
But first, what do I mean by a Java word? Is
it an English word?
/* The famous hello world in java*/
Public
class
Helloworld {{ “Helloworld!” string
comment Public class identifier
Java
words
23
No, but then how do you communicate that you have read a comment or the word “public” or a “{“?
/* The famous hello world in java*/
?
Lexical Analyser
Do I return a string “comment”?
Do I return a variable called comment?
Do I return a constant, say, 1 to indicate a comment is read? Yes but why?
24
23 24
4
8/11/21
No, but then how do you communicate that you have read a comment or the word “public” or a “{“?
?
The syntax analyser will have to check what token is read and it would be a pain to use string comparison.
25
/* The famous hello world in java*/
Do I return a constant, say, 1 to indicate a comment is read? Yes but why?
Lexical Analyser
We use constants to represent tokens found in a given PL:
{ constant
comment = 1; public = 2; class = 3; { = 4; etc….}
/* The famous hello world in java*/
Now what is returned?
comment
The constant “comment” which is equivalent to 1.
26
Lexical Analyser
25 26
In other words, you return tokens that you and the Java compiler can understand.
{ constant
comment = 1; public = 2; class = 3; { = 4; etc….}
To avoid confusion, it is better to label them as:
{ constant
s.comment = 1; s.public = 2; s.class = 3; s.leftbracket = 4; s.rightbracket = 5, etc….}
s.xxxàimplies a syntatic class
27
Oops, what about user defined name such as Helloworld?
Helloworld ? Can you use a constant for Helloworld (say,
S.Helloword = 15)?
No, you don’t know what name a user would use! Give another example of such type: numbers
28
Lexical Analyser
27 28
29 30
For numbers and user defined names, use generic names, s.name and s.integer
And return its value too! Helloworld
s.name “Helloword”
Lexical Analyser
Oops again!, you may use the same identifier name with different values!
29
Hence, handling user defined names is not straightforward!
But, what do I mean by not straightforward?
i.e. not simply reading in something and returning a single token
Give some examples as to why handling user defined names can be complicated?
30
5
8/11/21
User defined names can be:
• A variable and it has scope and extend
• A variable that denotes a data structure e.g. an array
• A variable that denotes a function
31
So, how do we handle user defined names?
You need to use a more complex data structure where you can store the user defined names and any other useful information about them.
Such a data structure is referred to as a symbol table.
It is obviously an important data structure used by the compiler for compiling your program and it thus needs to be implemented efficiently.
32
31 32
Think of writing a lexical analyser yourself. What functions do you need?
Your Java
program Symbol
table
tokens
Lexical Analyser
Syntax Analyser
33
So far I have hinted to you what is involved in lexical analysis or if you like, the rationale for what we want to do. Next, we will look at some implementation details about the process.
Before we do that, take a couple of minutes to digest what I have said so far.
34
33 34
S
YM
B O L
T A B L E
Create an entry
Cross reference
Source Program read
character
Lexical analysis
Lexical terms
Syntax analysis
Parser tree
Simple translation
Object code Loader
Target program
The process of compilation again
But, now I hope you know a bit more what is involved.
Optimise tree
(optional)
Optimise code (optional)
35
Lexical Analysis
LA is a process whereby characters in your program are turned into tokens in a programming language.
Technically, this means that the characters are converted into meaningful tokens.
36
35 36
6
37 38
8/11/21
From characters to tokens
Your program
/* The famous hello world in java */
Public class Helloworld
{
What the compiler sees
s.comment
s.public
s.class
s.name +value
s.bracket :
:
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
What happens if it is a name?
37
From characters to tokens
Your program
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
What the compiler sees
s.comment
s.public
s.class
s.name +value s.public
What is the value? The string “Helloworld”
38
From characters to tokens
Your program
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
Where do you put them?
What the compiler sees
s.comment
s.public
s.class
s.name + value s.public
In the symbol table
39
What compilation now means:
s.becomes s.plus
X := Y + 5;
Symbol table
code interpret
40
s.name “x”
s.name ”y”
39 40
41 42
public class Factorial
{
public static void main(String[] args) { final int number = 100;
You have same variable name but use differently!
for(int i = 0; i < number; i++) System.out.println( i + "! is " + factorial(i));
}
public static int factorial(int n) { int number= 1;
for(int i = 2; i <= n; i++) number*= i;
return number; }
}
Can you spot something peculiar in this program from a compiler standpoint?
41
s.becomes s.plus
5
X := Y + 5;
Symbol table
code
interpret
So, having a single entry is not enough!
s.name “x”
s.name ”y”
You could have same names but for different types i.e. you could have 2 different “x’s”.
42
7
8/11/21
But, before we deal with that problem, how do you put “x” in the table? What algorithm?
s.becomes s.plus
5
X := Y + 5;
Symbol table
Use a hash function.
code interpret
43
s.name “x”
s.name ”y”
Hashing
Given a table for storing data, a hash function is a function that computes a unique address for each piece of data.
Data Store with a
Advantage?
Fast access Problem? Clashes
fixed size
44
43 44
Handling collision - Chaining
Store with a fixed size
Data
All data hashed to the same location are linked together. Another solution is to rehash
45
Resolving clashes in a Symbol Table
“y”
?
A clash means you hash to the same address
Resolve by chaining means finding space for it via a chain of cells
Why does this happen?
s.name “x”
s.name “x”
s.name “y”
46
45 46
So far, what is in a Symbol Table?
1. This table is a collection of
s.name “y”
2. The second descriptor holds all information collected about the run-time object (e.g types of variable, array, procedure, etc.). What is shown here is only the variable type, S.Name
47
s.name “x”
Symbol Table
s.becomes s.plus
X := Y + 5;
Symbol table
code interpret
3. Every occurrence of a name is converted to a pointer to the place in the symbol table at which information about the name is stored
48
s.name “x”
s.name ”y”
5
47 48
8
8/11/21
Symbol Table
• This table is used by different phases of the compiler
• Lexical analyzer – keeps a record of names encountered so far
• Later phases – translate from compile- names to run-time objects
• Its design is central; look ups need to be fast
49
Let us “compile” a program:
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
What does the compiler see first?
A comment – but how does it see that?
Design a FSM to read in a comment.
50
49 50
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
FSM:
/**/
But there is a bug!?
What does the compiler do with the comment? Throw it away!
Not *
51
/* The famous hello world in java */
Public class Helloworld
{
public static void main (String[] args) { System.out.println(“Hello World!”);
} }
Next: read “public”, now what?
Put it in the symbol table?
“public”
52
51 52
No, read “public”, check if it is a reserved word. If it is, then you return the appropriate token (in this case: s.public).
However, if you were to read “x”. What happens?
53
You still check if it is a reserved word. If it is, then you return the appropriate token (in this case: it is not).
Since it is not, you put the name “x” in the symbol table using a hash function to calculate its location:
x
Comment on the name
space
s.name “x”
algorithm so far
54
53 54
9
8/11/21
The algorithm is inefficient! Why?
x
Every occurrence of a variable will need to be checked if it is a reserved word!
s.name “x”
name space
55
A better algorithm: Pre-hashed reserved words and names of pre-defined procedures
x
Why?
s.public “public” s.main “main”
s.name “x”
name space
Why do we have s.public and not s.name?
56
55 56
Then, every time you encounter a name, go immediately to the hash table. If not there, put an entry in. Otherwise, retrieve the necessary information
x
s.name “x”
name space
s.public “public” s.main “main”
57
x
What happens name
if x hashed into a location that contains another word?
space
s.public “public” s.main “main”
58
57 58
You do chaining, remember?
s.public “public” s.main “main”
x
name space
s.name “x”
Then what happens when you encounter “x” again in the program?
59
But, variable names are not just names, are they?
They denote run-time memory locations, data structure names denote run-time collections of memory locations, procedures denote ? a piece of code
Encountering them, we must create sufficient information that enables us to allocate such information to these variables.
60
59 60
10
8/11/21
So a lexical analyser needs to collect more information about the names used.
x
name space
s.public “public” ”
s.name “x”
So, what do we
know about ”x”?
s.main “main
what do we know about ”main” and “public”?
61
In fact, the symbol table actually has 3 tables:
x
Hash Table
P1 P2
NameTable
“x” / flags P3
name space
Symbol Table
Note the word “symbol table” is therefore quite misleading
62
61 62
So, what is a hash table?
x
Hash Table
“x”
NameTable
/
name space
P1 P2
flags P3
Symbol Table
63
A list of addresses pointing to variable names
So, what is a name table?
The actual name string
What is this? NameTable
x
Hash Table
“x”
/
name space
P1 P2
flags P3
Symbol Table
A list of variable names hashed to the same entry in the hash table.
64
63 64
Hash Table
What is this?
NameTable
“x”
x
/
name space
P1 P2
flags P3
Symbol Table
Don’t forget – another variable that hashed to the same location
65
What is a symbol table?
x
Hash Table
“x”
NameTable
/
name space
P1 P2
flags P3
Symbol Table
A list of run-time properties for a given variable
66
65 66
11
8/11/21
“x” / “..” What are these run-time
P1 P2 flags P3
properties?
P1 – pointer to some runtime information about the name
P2 – pointer to the variable identity i.e. its actual name
P3 – pointer to the code of the procedure if any
Why no storage is allocated for holding the value of a variable?
67
In other words, why don’t we have:
So that we could do:
X := 5
P1 P2
P1 P2
“x” / “..” flags value P3
“x”
flags 5
/ “..”
P3
68
67 68
This is known as
static allocation
of storage as foundinearly languages (ex. Fortran 77).
Disadvantages?
P1 P2
“x”
flags
/ “..” value P3
No recursion! Why?
Because if you allocate value there, what happens when you call your function again?
69
STOP here on 11/8 Lecture 5. Mid- term test will cover from lecture 1 upto this point. In the next lecture (18/8), I will do a revision lecture before the mid-term test (25/8).
70
69 70
x
Hash Table
P1 P2
NameTable
“x” / “..” flags P3
name space
Symbol Table
Q: Why do we have more than one entry in the symbol table?
Remember, we could have x for different types?
71
An example:
x
name space
Hash Table
“x”
NameTable
/ “main”
/ P2 flags /
P1P2 flags / / flags /
?
72
71 72
12
8/11/21
Possible flags: “x” /
“..”
Declaration status: yes or not
P1 P2
flags P3
Declaration type e.g. integer, real,.. Variable type: reserved or user-defined
Block level number
Offset value are these two?
Hmm..what
73
Block level number Offset value
“x”
P1 P2 flags
/ “..” P3
74
To understand the use of these two flags, we are getting close to the next phase of a compiler. Which phase is that?
73 74
S
YM
B O L
T A B L E
Create an entry
Cross reference
Source Program read
character
Lexical analysis
Lexical terms
Syntax analysis
Parser tree
Simple translation
Object code Loader
Target program
The process of compilation
We are almost here!
Optimise code (optional)
75
The task of lexical analysis is to divide up the input into lexical items. The data structure needed is a symbol table
The task of syntax analysis is to combine the lexical items into phrases which eventually lead to legal sentences.
Thus, syntax analysis consists of two independent parts: how to recognise a phrase and how to produce a tree node.
76
75 76
But before we move into Syntax Analysis, let’s look at implementing a Lexical Analyser. What basic functions do we need?
A function to read each character from a Java program and convert it to tokens.
A function to lookup the name in the symbol table.
77
A function to read each character from the Java program and convert it to token:
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: ? case ‘-’: ?
case ‘0’…’9’: ? case ‘a’…’z’: ?
: :
return sym
78
77 78
13
8/11/21
A function to read each character from the Java program and convert it to token
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: sym := s.plus; endcase case ‘-’: sym := s.minus; endcase
case ‘0’…’9’: ? case ‘a’…’z’: ?
: :
return sym
79
A function to read each character from the Java program and convert it to token:
Let Nextsym()
$(
switchon read-char() into
$( case ‘+’: sym := s.plus; endcase case ‘-’: sym := s.minus; endcase
case ‘0’…’9’: a routine to read in a number
case ‘a’…’z’: a routine to read in a name :
:
return sym
80
79 80
A function to lookup the name in the symbol table?
Let lookupword() $(
use the hash function to find an entry for the word read
check name-table to find the correct entry.
{how do you do this?}
If there is, retrieve and return its type else
insert a new entry and return s.name $)
81
Data Frame
Any part of the program that allows the declaration of variables – procedures, blocks, etc. – creates a data frame
A data frame is collection of memory locations (i.e. storage for the variable) created when needed.
But, isn’t a procedure a piece of code? Why does it need a data frame?
82
81 82
Import java.util.Scanner;
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0)
{ prime = false; break; }
}
if (prime)
Where are the data frames created in this program?
System.out.printf(“%d %s\n”, n, “is a prime number”) else
System.out.printf(“%d %s\n”, n, “is not a prime number”) }
Public static void main (String[]args)
{ Scanner sc = new Scanner (System.in);
Sytem.out.println( “enter a number to check if its prime or not”) int num = sc.nextInt();
isPrime (num);
}
83
Import java.util.Scanner;
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0)
i: { prime = false; break; }
}
if (prime)
System.out.printf(“%d %s\n”, n, “is a prime number”) else
System.out.printf(“%d %s\n”, n, “is not a prime number”) }
Public static void main (String[]args)
{ Scanner sc = new Scanner (System.in);
Sytem.out.println( “enter a number to check if its prime or not”) int num = sc.nextInt();
isPrime (num);
}
prime:
n:
num:
84
83 84
14
85 86
8/11/21
Import java.util.Scanner;
Public class PrimeTest
{ public static void isPrime (int n)
{ boolean prime = true; If (n < 2) prime = false; for (int i=2; i < n; i++)
{ if (n%i==0) }
prime: DF n:
So, what this means is that you create all the storage needed when you enter a block or a procedure.
A data frame is a stack and so to access the value of n, you do DF(0). To access the value of prime, you do DF(1).
85
Possible flags: “x” /
“..”
Declaration status: declare or not
Declaration type e.g. integer, real,..
P1 P2
flags P3
Variable type: reserved or user-defined
Block level number Hence, we
Offset value
have these two values?
86
Such a method for allocating storage is known as dynamic allocation.
Advantages? Allow recursion in PLs!
How? First, observe no actual locations are reserved in the symbol table (as opposed to static allocation).
What then is recorded in the
symbol table? 2 numbers – block level and offset value
87
So, how does it work?
Call Procedure P1
prime:
n:
Create a data frame for it DF
If it calls itself, create a new DF
prime:
n:
DF
....and so on
So, storage is allocated away
prime:
n:
DF
from the Symbol Table and when needed!
88
87 88
15