程序代写代做代考 computer architecture compiler scheme Compilers and computer architecture Code-generation (2): register-machines

Compilers and computer architecture Code-generation (2): register-machines
Martin Berger
November 2015

Recall the function of compilers

Plan for this week

Plan for this week
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
In the previous section we introduced the stack machine architecture, and then investigated a simple syntax-directed code-generator for this architecture.
This week we continue looking at code generation, but for register machines, a faster CPU architecture. If time permits, we’ll also look at accumulator machines.

Recall: stack machine architecture

Recall: stack machine architecture
…
…
66
22
…
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15 14 13 12 11 10 9 8 7 6 5
4 3 2 1
0
SP
PC
Current instruction = Jump
Temporary register = 12

Recall: stack machine language

Recall: stack machine language
Nop
PushAbs x CompGreaterThan Jump l
Plus
Times
Negate
Pop x PushImm n CompEq JumpTrue l Minus Divide

Recall: stack machine language
Nop
PushAbs x CompGreaterThan Jump l
Plus
Times
Negate
Pop x PushImm n CompEq JumpTrue l Minus Divide
Important: arguments (e.g. to Plus) are always on top of the stack, and are ’removed’ (by rearranging the stack pointer (SP)). The result of the command is placed on the top of the stack.

Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.

Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.
The stack machine forces us constantly to access memory, even for the simplest operations. It would be nice if the CPU let us store, access and manipulate data directly, rather than only work on the top elements of the stack.
Registers are fast, temporary, addressable storage in the CPU, that let us do this, whence register machines.

Compilation for register machines

Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?

Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
􏹩 Generatecodethathasallrelevantparametersonlyin registers.
􏹩 Generatecodethathassome(maybeevenmost)relevant parameters outside of registers (e.g. main memory or stack).

A simple register machine

A simple register machine
…
…
66
22 …
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15
14
13 PC 12
11
10
SP
Cur. instr. = Jump
R0 = 2
9
8 R1= 42
7
6
5
4
3
2 R6=116 1
0
R2 = 33 R3 =12 R4 = 20 R5= 25
R7 = 123

A simple register machine
…
…
66
22 …
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15
14
13 PC 12
11
10
R2=33 65 R3 =12
4 R4=20 3 R5= 25
2 R6=116
SP
Cur. instr. = Jump
R0 = 2
9
8 R1= 42
Quite similar to the stack machine, but we have additional registers. Important: we can operate on registers, e.g. add, multiply … content of registers.
7
1 0
R7 = 123

Dealing with registers by ’divide-and-conquer’

Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
􏹩 Generatecodeassuminganunlimitedsupplyofregisters.
􏹩 Modifythetranslatortoevaluateexpressionsintheorder which minimises the number of registers needed, while still generating efficient code.

Commands for register machines

Commands for register machines
We assume an unlimited supply of registers R0, R1, R2, … ranged over by r, r’ We call these general purpose registers (as distinct from PC, SP).

Commands for register machines
We assume an unlimited supply of registers R0, R1, R2, … ranged over by r, r’ We call these general purpose registers (as distinct from PC, SP).
Nop Pop r
Push r Load r x
LoadImm r n Store r x
CompGreaterThan r r’
Does nothing
removes the top of the stack and stores it in register r
Pushes the content of the register r on stack
Loads the content of memory location x into register r
Loads integer n into register r
Stores the content of register r in memory location x
Compares the content of register r with the content of register r’. Stores 1 in r if former is bigger than latter, otherwise stores 0

Commands for register machines
CompEq r r’
Jump l JumpTrue r l
JumpFalse r l
Plus r r’ …
Compares the content of register r with the content of register r. Stores 1 in r if both are equal, otherwise stores 0
Jumps to l
Jumps to address/label l if the content of register r is not 0
Jumps to address/label l if the content of register r is 0
Adds the content of r and r’, leaving the result in r Remaining arithmetic operations are similar
Some commands have arguments (called operands). They take two (if the command has one operand) or three units of storage, the others only one.

Source language

Source language
Our source language is unchanged: a really simple imperative language.
M ::= M;M|forx=EtoE{M}|x:=E
E ::= n|x|E+E|E−E|E∗E|E/E|−E
Everything that’s difficult to compile, e.g. procedures, objects, is left out. We come to that later

Code generation for register machines

Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]

Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Important convention 1: The code generated by codeGenExpr( e, i ) can use registers Ri, Ri+1, … upwards, but must leave the other registers R0 … Ri-1 unchanged!
Important convention 2: The result of evaluating the expression is returned in register target.

Code generation for constants

Code generation for constants
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Const ( n ) then
List ( I_LoadImm ( target, n ) ) } }

Code generation for variables

Code generation for variables
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Ident ( x ) then
List ( I_Load ( target, x ) )

Code generation for binary expressions

Code generation for binary expressions
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Binop ( lhs, op, rhs ) then {
codegenExpr ( rhs, target ) ++
codegenExpr ( lhs, target+1 ) ++
codegenBinop ( op, target, target+1 ) }
where
def codegenBinop ( op : Op, r1 : Register,
r2 : Register ) = {
if op is of shape
Plus then List ( I_Plus ( r1, r2 ) )
Minus then List ( I_Minus ( r1, r2 ) )
Times then List ( I_Times ( r1, r2 ) )
Divide then List ( I_Divide ( r1, r2 ) ) } }

Code generation for binary expressions

Code generation for binary expressions
Note that the call codegenExpr(lhs, target+1) in
Binop ( lhs, op, rhs ) then {
codegenExpr ( rhs, target ) ++
codegenExpr ( lhs, target+1 ) ++
codegenBinop ( op, target, target+1 ) }
leaves the result of the first call codegenExpr(rhs, target) in the register target unchanged by our assumptions that codegenExpr never modifies registers below its second argument.

Example x*3+4

Example x*3+4
The expression x*3+4 has the following AST.

Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )

Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )
When we compile it (to target register r17, say), we obtain:

Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )
When we compile it (to target register r17, say), we obtain:
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18

How can this be improved (1)?

How can this be improved (1)?
Let’s use commutativity of addition (a+b = b+a) and compile 4+x*3! When we compile it, we obtain:

How can this be improved (1)?
Let’s use commutativity of addition (a+b = b+a) and compile 4+x*3! When we compile it, we obtain:
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18
How is this better?

Side by side

Side by side
Compilation of x*3+4
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18
Compilation of 4+x*3
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18

Side by side
Compilation of x*3+4
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18
Compilation of 4+x*3
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18
The translation on the left uses 3 registers, while the right only two. We are currently assuming an unbounded number of registers, so there’s no difference, but for realistic CPUs, the number of registers is small, so smart translation strategies that save registers are better. More on this later!

How can this be improved (2)?

How can this be improved (2)?
We used two registers – can we get away with fewer?

How can this be improved (2)?
We used two registers – can we get away with fewer?
Probably not with this machine architecture. But most widely used CPUs can use constants directly in arithmetic operations, e.g.
MulImm r 3
Multiplies the content of register r with 3, storing the result in r
AddImm r 4
Adds the content of register r with 3, storing the result in r

Translation of statements

Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.

Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’? Answer: Yes, because statements may contain expressions, e.g. x := x*y+3.
Question: What register should we use as first register when we start code generation? Answer: doesn’t matter for unrestricted registers, but if registers are finite, use the lowest possible, i.e. R0. Why? Because code generation will not use registers below target, so any register below the initial one will not be used at all (= waste).

Bounded register numbers

Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting? We can combine two strategies:
􏹩 While>1freeregistersremain,usetheregistermachine strategy discussed above for compilation.
􏹩 Whenthelimitisreached(ie.whenthereisoneregister left), revert to the accumulator strategy, using the last register as the accumulator.

Related Posts