程序代做CS代考 chain compiler assembly Compiler Design Week 10

Compiler Design Week 10

Detailed content
Weekly program
 Week  Week  Week  Week  Week  Week  Week  Week  Week
1 – Introduction to Compiler Design 2 – Lexical Analysis
3 – CD Programming Language
4 – Syntax Analysis
5 – Top Down Parsing
6 – Symbol Table and Error Recovery
7 – Bottom-Up Parsing
8 – Semantic Analysis
9 – Run-Time Environment: Stack Machine (SM) Architecture
2
 Week 10 – Code Generation
 Week 11 – Code Optimisation
 Week 12 – Revision
 Week 13 – Extra revision (if needed)
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au

Week 10 Lecture Outline
Code Generation
 Code Generation
 Intermediate Representations
 Stack Machine Code Module
 Key to Code Generation
 Recursive Tree Traversal for CG  Processing Declarations
 Order of CG
 Maintaining the Generated Code  Backward and Forward Branches  Backchaining/Backpatching
 Sub-Programs
3
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au

Code Generation
Lexical Analyser
Lexemes (tokens)
Syntax Tree
Syntax Analyser
Symbol Table
Source Code
Object Code
Source Code
Front-End
Lexical error Syntax error Semantic error
Intermediate representation
Back-End
Target machine code
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Code Gen.
4

Intermediate Representations
• Graphical representations:
– Abstract Syntax Tree (AST)
– Directed Acyclic Graph (DAG)
• Postfix notation: operations on values stored on operand stack
– a := b * -c in Postfix notation a b c uminus * assign
• Three-address code: (e.g. triples and quads) t := x op y
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
5

Reason for Separation
• Where should the Code Generator go with respect to the other parts of the Compiler?
– In a separate module that accepts (at least) the entire Syntax Tree from the Semantic Analyser. Why?
– This is the most often and most likely replaced part of the compiler
– Developing a compiler for the same language on a different architecture becomes a simpler task
• A change in any other phase of the compiler is effectively a change in the language being compiled,
• A change/replacement of the Code Generator may simply be the extension of the language onto more hardware platforms.
• If minor changes to the language are required, they can be done
independently of the Code Generator(s), and it is quite possible
that the Code Generator(s) will not need to change
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
6

Aim of Code Generation
• Code produced by compiler must be correct
– Source to target program transformation is semantics preserving
• Code produced by compiler should be of high quality
– Effective use of target machine resources
– Heuristic techniques can generate good but suboptimal code, because generating optimal code is undecidable
• Issues in code generation
– Type of code? Absolute, relocatable, assembly, bytecode
• Dependsontherequirement
– Need to select instruction among alternatives
– Need to consider instruction costs
• Involvesdifferentaddressingmodes
– Efficient utilization of registers
• Optimal register assignment in general is NP-complete
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
7

The SM Code Module
Instr Size
Instructions
Int Size
Integer Constants
Float Size
Float Constants
String Size
Strings’ Characters
• Flat text file of integer (and floating point) values
• Loading of the file is controlled by the 4 word-counts and is StreamTokenizer-based and so it can be basically any format that distinguishes each value as a single whitespace-delimited token
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Note: The instruction space should be word-filled probably with HALT (0) instruction.
Note: Individual String Constants do not have to be word-filled. So individual
strings don’t have to start at word boundary. Only the last one needs to be word-filled.
8

The SM Code Module
• 4 explicit sections
• Each of the 4 sections is preceded by a word-count
– Instructions – word-filled, usually with HALT instructions
– Integer Constants
– Floating Point Constants
– String Constants – last one word-filled, usually with Null chars
• Note: any errors in matching the word counts with their relevant code module sections, will result in the loader not setting up your program properly when it is run on the SM.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
9

SM Code
10
80 0 0 0 80 52 91 0
0 0
80 0
0 0
0 0
88 62
88 62
63 80
96 80 0 0 0 104
0 0 96 11 51 91
0 88 61 43 81 0
96 62 81
65 65 81
65 90 0
0 0 0 0 0 0 0 0 114
0 0 0
0 0 23 24 25 26 27 28
71 70 4
17 12 15
3 0 2
72 11 0
49 50 65 00 67 68 69 00
53 54 55 71 72 73 00 00
88 62 65 0 0 0
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
10

Order of Code Generation
• The order in which code is generated for the main program and sub- programs is also important, i.e. main program first and then functions (or vice-versa).
• Remember that the entry point of the program is worked out from the Code Module by SM simulator.
Entry point (0,0) Entry point (0,0)
fun1
fun2
main
Constant
main
fun1
fun2
Constants
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
Jump
11

The SM Code Module
• Position of Sub routines
• Position of real, larger integer constants or string literals will require forward-branch style code generation
– Manage multiple references to those (forward-branched) constants (back chaining)
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
12

Key to Code Generation
• Code generated for most phrases of a program will be independent of
the context within which the phrase is used.
• Eg. the expression x+2 (within the main program area of a CD
program) has code of:
LV 1,x LB 2 ADD
NADD
• Remember that 1, x means (Base Reg 1, offset of var X)
– The base register number and the offset of var x comes from symbol table
– Works similarly for constants, global variable, local variable, parameters
Var: X
Const 2
• How to do this?
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
NADD
Var: X
Const: 5000
13

Key to Code Generation
Code for expression x+2 :
LV 1, x LB 2 ADD
This will be the same for any of the following:
(x+2) * y
NMUL
Var: y Const 2
y=x+2
NASGN
Var: x
if ( x+2 > 10 )…
NIFTE
NADD
Var: y
NADD
NGRT NADD
LA
IF: stats
Else: stats
Var: x
Const 2
Const: 10 Const 2
code for x+2
LA
1, y
0, “else 1”
LV MUL
05/10/2021
1, y
code for x+2
ST
code for x+2
Var: x
LB 10 SUB
GT
BF
[OR: LV 0, “10”]
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
14

Recursion (again?)
• This implies that a simple recursive approach can be used:
Eg. for any conditional of the form if expr1 > expr2 the code generator can
– invoke itself in expr1 and then
– invokeitselfinexpr2andthen
– generate a SUB and a GT instruction.
NGRT
• A compiler which has a parser that produces a syntax tree makes this very straightforward.
• The code generator will behave like a tree traversal routine.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
e1
e2
15

Recursion (continued … … )
• When the code generator processes an NGRT node it calls a routine to do e1, then the same routine to do e2, and before leaving NGRT, it generates the SUB and GT instructions.
– All the expressions can be done similar way
– In fact all the codes can be generated in similar fashion
• We could have written one big tree traversal routine, but you can see that this would be pretty large and messy – even for a simple language like ours.
• The syntax tree has a fixed structure so we can write a number of small traversal routines which will do the same job and be easier to write and understand (not to mention test & correct).
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
16

Processing Declarations (A)
Consider the instruction LV 1,x
– x is part of the main program and an offset has been alllocated for it
from the start of the base register b1
– Offset of x is a multiple of 8 (meaning each variable has 8 bytes –
64 bits)
For each variable, we need to remember where it is allocated.
• It is given by the name of the base register and the offset of variable’s
location from the base address of the register.
• This (base, offset) information is to be stored in the symbol table for
each variable declaration.
• While generating code, this info can be accessed from the symbol table
record for the variables earlier appearing in the syntax tree.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
17

Processing Declarations (B) Where to get the (base, offset) information
Base Register:
• If the identifier is a constant, then base register is 0
• If it is in the main program then base register is 1, and
• If it is in a procedure or function then base register is 2
Offset:
• For each base register, use a counter
• The counter needs to be incremented by 8, each time a new identifier is encountered for the respective base register.
• The counter for Base register 2 is reset to either –8 or +16 (why not 0 or +8?) every time we enter the declaration section of a new procedure or function.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
18

Constants Section
• Constants are accessed relative to base register 0 (base address is at the beginning of the instruction area)
• Most of the constants are not declared explicitly!
• Small integer constants can use immediate mode addressing, using LB
and LH instructions – e.g., LB 2 (loads value 2 at the top of stack)
• Constants come after the instructions in object code. So, we can’t know about the offsets of constants until after the code generation for all the instructions is over!
• Once the size of the instruction space is known, we can do a pass through the Constants Section of the Symbol Table and simply allocate an offset to each item.
– Handled using backchaining/backpatching
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
19

Global Variables (& Main Program Locals)
• Each will have a Base Register value of 1 set into its Symbol Table record
• The “offset counter” can start at 0 and be incremented in lots of 8 every time a declaration of a new variable is encountered in the Syntax Tree.
• Because this Symbol Table (or the “main program area” of the Symbol Table if scoping is not implemented) is effectively global, it will be possible to process the declarations by way of an iterator over the Symbol Table.
• Processing the declarations via the Syntax Tree may have the advantage of allocating offsets in the order that the variable names were declared.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
20

Declarations in Sub-Programs
• Every variable declared within a sub-program will have a Base Register value of 2
• Each procedure and function will have its own symbol table and there are two distinct types of variables in this symbol table
– Formal Parameters
• have negative offsets, starting at –8 and proceeding –16, –24, etc.
– Local Variables
• have positive offsets, starting at 16, and proceeding 24, 32, etc.
– Remember that (2, 0) is the Call Frame, and (2, 8) is the number of actual parameters, for this call.
• It is important that parameters be allocated in order (why?)
• Having processed the sub-program declarations, every formal parameter and local variable will have a (BR,offset) pair allocated and stored into its relevant Symbol Table record.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
21

The first few instructions …
• Once all the global variables (and main locals) have had offsets allocated and stored into their relevant Symbol Table records, we know how much space needs to be allocated at the beginning of the runtime stack at the start of the program execution to do the actual allocation of memory for these variables.
• Suppose there are 17 variables (136 bytes of offset allocated),
– Then we need to set the stack pointer forward by these 17 words.
– Can be done two ways:
• LB 17 and ALLOC [ LH can be also used instead of LB] OR
• LV 0, ‘17’ and ALLOC
– Be careful that 17 is already one of the integer constants set up in the instruction area.
• We will also need to use ARRAY instruction to actually allocate every array
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
22

Maintaining the Generated Code
• SM expects specific format
• Need to generate code according to this format
• But, code module can not be produced sequentially due to absence of
– the constants area during code generation from syntax tree
– branching to addresses in conditional statements and absent functions.
• So a structure, that maintains the code generated until it can be output, is definitely required.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
23

Maintaining the Generated Code
• We will also need to maintain a “Code Generator Equivalent of the Program Counter” so that we can identify branch targets, etc.
CGPC
0000::80 0 0 080 0005:: 52
0006:: 41 3
0008:: 80 0 0
……. ……..
0 104
• A current Code Generation Position (CGP) or Code Generation Program Counter (CG-PC) is updated globally to the Code Generation Process, while we simply recursively call appropriate routines that will generate instructions required for a particular (small) fragment of code.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
24

Generating Instructions
• Given the Key to Code Generation outlined earlier when augmented by the variable allocation strategy and the Symbol Table, this is now quite straight forward.
• Update CGPC globally to the Code Generation Process
• We simply recursively call appropriate routines that will generate instructions required for a particular (small) fragment of program.
• Some special usages will need to be taken care of:
– is the variable used in reference context or value context?
– do we need to remember this code position for some purpose?
• BUT … code to do similar things is always generated in similar ways.
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
25

Code Generation – Illustration
CGPC=0
• inti,j,k
S/T
i
1, 0
j
1, 8
k
1,16
LB 3 ALLOC
00
08
16
24
32
40
48
….
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
26

Code Generation – Illustration
CGPC=3
• inti,j,k • input i;
S/T
i
1,0
j
1,8
k
1,16
LA “i” [S/T: 1,0] READI
ST
00
08
16
24
32
40
48
….
LB
3
ALLOC
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
27

Code Generation – Illustration
CGPC=10
• inti,j,k
• input i;
• Input j;
S/T
i
1,0
j
1,8
k
1,16
00
LB
3
ALLOC
LA 1
0
0
0
0
08
READI
ST
16
24
32
40
48
….
LA “j” [ST: 1, 8]
READI
ST
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
28

Code Generation – Illustration
CGPC=17
• inti,j,k
• input i;
• Input j; • k=i+j;
S/T
i
1,0
j
1,8
k
1,16
00
LB
3
ALLOC
LA 1
0
0
0
0
08
READI
ST
LA 1
0
0
0
8
READI
16
ST
24
32
40
48
….
LA “k” [ST: 1, 16] LV “i” [sT: 1, 0] LV “j’ [ST: 1, 8] ADD
ST
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
29

Code Generation – Illustration
CGPC=34
• inti,j,k
• input i;
• Input j; • k=i+j; •… •…
S/T
i
1,0
j
1,8
k
1,16
00
LB
3
ALLOC
LA 1
0
0
0
0
08
READI
ST
LA 1
0
0
0
8
READI
16
ST
LA 1
0
0
0
16
LV 1
0
24
0
0
0
LV 1
0
0
0
8
32
ADD
ST
40
48
….
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
30

Array/Record Structures
• In CD arrays can only have elements which are structure types, and structure types can only be used as array elements.
• This can be easily simulated in SM with a SM array
• Each field name maps onto a particular offset within each element of the array – each element has a size equal to the number of fields in the structure type
• Since each field is only a simple type it will occupy a single SM word
• The array size is just the number of elements in the array multiplied by the number of fields in the record
OFF
Z
Z[0]
Z[1]
field
a
b
c
a
Z[2]
b
c
a
b
c
a
b
..
E.g. X is a:real, b:int, c:boolean end /– size of struct is 3 Y is array [10] of X /– length of array is 10 Z : Y /– Z will be an SM array of size 10*3 = 30
00 01 02 03 04 05
A designation like Z[4].b will be evaluated as the (4*3+1)th word of Z. 06 07
05/10/2021
COMP3290 – Semester 2 – 2021 | www.newcastle.edu.au
08 09 10 ..
31



Code Generation for Arrays
X[M+N].a = Y+5

Y+5> ST
will recognise an array
LV 1,x
M+N> INDEX
M+N>
ADD

NASGN
Var
EXP


Var: M
Var: N
ID, X
ARRV EXP
NADD
ID: a

05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
32




Code Generation for Arrays


X[M+N].a = Y+5

Y+5> ST
will recognise an array
LV 1,x
M+N> INDEX
M+N>
ADD


Y = X[M+N].a
X[M+N] ST

LV 1,x
M+N> INDEX
L
Note that identifying when to use an LV, LA, or L instructions will require some thought


05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
33

Some Special Cases
• Take the case of: – X+=Y+Z
+=assign
LA 1,x Simple var Add
– This will have as a Syntax Tree
– Code could be
LV 1,y LV 1,z ADD LV 1,x ADD ST
• BUT .. What else could/should it be? – An introduction to optimization
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
X
Simple var
YZ
Simple var
34

Backward Branches
• How do we cause the program to Branch?
• Or Call a sub-program?
• The instructions are:
• LA 0, “byte position of target”
• B (or BF or BT or JS2)
• Note that this designation is nothing other than a (BR, Offset) pair which can be stored in a Symbol Table record in a similar fashion to any other variable.
• In the case of a Branch target, it will not need to be attached to any hashing structure, it can simply be a lone S/T record.
• When we generate code at a place which we know will be a branch target, we “remember the position” so that it can be referred to later – in exactly the same way that we do when we allocate/use variables.
(Eg: repeat statements in CD)
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
35

Backward Branches
repeat (i = 0)
i = i+1
until i < 10

BT
asgn
NREPT
stmts bool
Remember
10000: Generate the assignment codes
10008: Generate the codes for statements
10021: LA 0, 10008 [ address of the beginning of ]
Generate code for Boolean BT
• How do you identify the backward branch target address?
– Remember CGPC
– It is a bound address
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
36

Backward Branches
• How to manage nesting? – Recursion
– Remember CGPC locally
05/10/2021
repeat (i = 0)
...
repeat (j = 10)
...
j = j -1;
until j > 10;
...
i = i+1;
until i < 10; COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au 37 Forward Branches • These are different in structure: – Do not know the target jump address – Still need to use (remembering) a particular position (offset) within the code module (BR 0) • So can use a similar S/T record set up for this purpose – Also note that even though we don’t know the value of the offset for the target address of the Branch instruction, we still know exactly how many bytes of instruction space is required • One essential difference in function call is that there may be many places (more than one) which require a branch to a (currently unknown) target position • One solution to this is ... Backchaining ... 05/10/2021 COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au 38 Code generation for IF statement • if-then-else NIFTE stmts Remember 1 Remember 2 10000: LA 0, “Blank 1” 10005: 10204: BF
10205: 10405: LA 0, “Blank 2”
10410: B
10411:
10500:
bool
stmts
Jump To Else Block
Jump Over the Else Block
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
39

Code generation for FOR statement
• for loop
20000:
NFOR asgnlst bool
Remember 1
20010: LA 0, “Blank 1”
20015: 20204: BF
20205:
20405: LA 0, “Remember 1”
20410: B
20411:
stmts
Jump over the body Jump back to beginning
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
40

Code Generation for Sub-Programs
• If code is generated for the main program first then (almost) every sub- program call will be a forward branch (JS2)
• Parameters will be special cases:
– How do we differentiate between value , reference and
constant reference parameters?
– What are the implications on code generation of a formal
parameter being either value or reference?
• Other special cases:
– The return statement(s) actually turn(s) out to be straight
forward because they map onto the architecture so well
• Most other things will remain the same, and are taken care of by different BR and offset values in the Symbol Table records.
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
41

Code Generation for Sub-Programs
• The sample code with function calls:
main () ...
fun1()
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
42

Code Generation for Sub-Programs
• First call to fun1() reached while generating code.
00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000:
main () ...
fun1() [LA 0,off-f1]
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
main () ...
fun1()
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
Fun1: Unbound
43

Code Generation for Sub-Programs
• Second call to fun1() reached while generating code.
00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000:
main () ...
fun1() [LA 0,off-f1]
...
fun1() [LA 0,off-f1]
... fun2() ...
fun1()
fun2()
... fun1() ...
main () ...
fun1()
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
Fun1: Unbound
44

Code Generation for Sub-Programs
• The entry-point of fun1 reached and off-f1 is now known/bound.
00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00440: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000:
main () ...
fun1() [LA 0,440]
...
fun1() [LA 0,440]
... fun2() ...
fun1()
fun2()
... fun1() ...
main () ...
fun1()
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
Fun1: Bound, (0, 440 )
45

Code Generation for Sub-Programs
• A third call to fun1() reached afterwards.
00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00440: 00000: 00000: 00000: 00000: 00000: 00000: 00000: 00000:
main () ...
fun1() [LA 0,440]
...
fun1() [LA 0,440]
... fun2() ...
fun1()
fun2()
...
fun1()
...
main () ...
fun1()
...
fun1()
...
fun2()
...
fun1()
fun2()
... fun1() ...
• What you do?
05/10/2021
COMP3290 - Semester 2 - 2021 | www.newcastle.edu.au
Fun1: Bound, (0, 440 )
46