Compilers and computer architecture Code-generation (2): register-machines
Martin Berger
November 2015
Recall the function of compilers
Plan for this week
Plan for this week
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
In the previous section we introduced the stack machine architecture, and then investigated a simple syntax-directed code-generator for this architecture.
Plan for this week
Source program
Lexical analysis
Syntax analysis
Semantic analysis, e.g. type checking
Intermediate code generation
Optimisation
Code generation
Translated program
In the previous section we introduced the stack machine architecture, and then investigated a simple syntax-directed code-generator for this architecture.
This week we continue looking at code generation, but for register machines, a faster CPU architecture. If time permits, we’ll also look at accumulator machines.
Recall: stack machine architecture
Recall: stack machine architecture
…
…
66
22
…
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15 14 13 12 11 10 9 8 7 6 5
4 3 2 1
0
SP
PC
Current instruction = Jump
Temporary register = 12
Recall: stack machine language
Recall: stack machine language
Nop
PushAbs x CompGreaterThan Jump l
Plus
Times
Negate
Pop x PushImm n CompEq JumpTrue l Minus Divide
Recall: stack machine language
Nop
PushAbs x CompGreaterThan Jump l
Plus
Times
Negate
Pop x PushImm n CompEq JumpTrue l Minus Divide
Important: arguments (e.g. to Plus) are always on top of the stack, and are ’removed’ (by rearranging the stack pointer (SP)). The result of the command is placed on the top of the stack.
Register machines
Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.
Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.
The stack machine forces us constantly to access memory, even for the simplest operations. It would be nice if the CPU let us store, access and manipulate data directly, rather than only work on the top elements of the stack.
Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.
The stack machine forces us constantly to access memory, even for the simplest operations. It would be nice if the CPU let us store, access and manipulate data directly, rather than only work on the top elements of the stack.
Registers are fast, temporary, addressable storage in the CPU, that let us do this, whence register machines.
Register machines
The problem with the stack machine is that memory access (on modern CPUs) is very slow in comparion with CPU operations.
The stack machine forces us constantly to access memory, even for the simplest operations. It would be nice if the CPU let us store, access and manipulate data directly, rather than only work on the top elements of the stack.
Registers are fast, temporary, addressable storage in the CPU, that let us do this, whence register machines.
But compilation for register machines is more complicated than compilation for stack machines. Can you guess why?
Compilation for register machines
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Generatecodethathasallrelevantparametersonlyin registers.
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Generatecodethathasallrelevantparametersonlyin registers.
Generatecodethathassome(maybeevenmost)relevant parameters outside of registers (e.g. main memory or stack).
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Generatecodethathasallrelevantparametersonlyin registers.
Generatecodethathassome(maybeevenmost)relevant parameters outside of registers (e.g. main memory or stack).
Detectwhichoftheaboveisthecase,andbeableto switch between the two seamlessly.
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Generatecodethathasallrelevantparametersonlyin registers.
Generatecodethathassome(maybeevenmost)relevant parameters outside of registers (e.g. main memory or stack).
Detectwhichoftheaboveisthecase,andbeableto switch between the two seamlessly.
All of this makes compilers more difficult.
Compilation for register machines
Each CPU has only a small finite number of registers (e.g. 16, 32, 128). That can be a problem. Why?
Because for small expressions, we can fit all the relevant parameters into the registers, but for the execution of larger expressions this is no longer the case. Moreover, what registers are available at each point in the computation depends on what other code is being executed. Hence a compiler must be able to do the following things.
Generatecodethathasallrelevantparametersonlyin registers.
Generatecodethathassome(maybeevenmost)relevant parameters outside of registers (e.g. main memory or stack).
Detectwhichoftheaboveisthecase,andbeableto switch between the two seamlessly.
All of this makes compilers more difficult.
A simple register machine
A simple register machine
…
…
66
22 …
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15
14
13 PC 12
11
10
SP
Cur. instr. = Jump
R0 = 2
9
8 R1= 42
7
6
5
4
3
2 R6=116 1
0
R2 = 33 R3 =12 R4 = 20 R5= 25
R7 = 123
A simple register machine
…
…
66
22 …
…
…
…
Add
13
PushAbs
12
PushAbs
…
3
Jump
15
14
13 PC 12
11
10
R2=33 65 R3 =12
4 R4=20 3 R5= 25
2 R6=116
SP
Cur. instr. = Jump
R0 = 2
9
8 R1= 42
Quite similar to the stack machine, but we have additional registers. Important: we can operate on registers, e.g. add, multiply … content of registers.
7
1 0
R7 = 123
Dealing with registers by ’divide-and-conquer’
Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
Generatecodeassuminganunlimitedsupplyofregisters.
Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
Generatecodeassuminganunlimitedsupplyofregisters.
Modifythetranslatortoevaluateexpressionsintheorder which minimises the number of registers needed, while still generating efficient code.
Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
Generatecodeassuminganunlimitedsupplyofregisters.
Modifythetranslatortoevaluateexpressionsintheorder which minimises the number of registers needed, while still generating efficient code.
Inventaschemetohandlecaseswherewerunoutof registers.
Dealing with registers by ’divide-and-conquer’
To solve the difficult problem of generating efficient code for register machines, we split the code generation problem into three simpler sub-problems.
Generatecodeassuminganunlimitedsupplyofregisters.
Modifythetranslatortoevaluateexpressionsintheorder which minimises the number of registers needed, while still generating efficient code.
Inventaschemetohandlecaseswherewerunoutof registers.
Let’s start by looking at register machines with an unlimited number of registers.
Commands for register machines
Commands for register machines
We assume an unlimited supply of registers R0, R1, R2, … ranged over by r, r’ We call these general purpose registers (as distinct from PC, SP).
Commands for register machines
We assume an unlimited supply of registers R0, R1, R2, … ranged over by r, r’ We call these general purpose registers (as distinct from PC, SP).
Nop Pop r
Push r Load r x
LoadImm r n Store r x
CompGreaterThan r r’
Does nothing
removes the top of the stack and stores it in register r
Pushes the content of the register r on stack
Loads the content of memory location x into register r
Loads integer n into register r
Stores the content of register r in memory location x
Compares the content of register r with the content of register r’. Stores 1 in r if former is bigger than latter, otherwise stores 0
Commands for register machines
CompEq r r’
Jump l JumpTrue r l
JumpFalse r l
Plus r r’ …
Compares the content of register r with the content of register r. Stores 1 in r if both are equal, otherwise stores 0
Jumps to l
Jumps to address/label l if the content of register r is not 0
Jumps to address/label l if the content of register r is 0
Adds the content of r and r’, leaving the result in r Remaining arithmetic operations are similar
Commands for register machines
CompEq r r’
Jump l JumpTrue r l
JumpFalse r l
Plus r r’ …
Compares the content of register r with the content of register r. Stores 1 in r if both are equal, otherwise stores 0
Jumps to l
Jumps to address/label l if the content of register r is not 0
Jumps to address/label l if the content of register r is 0
Adds the content of r and r’, leaving the result in r Remaining arithmetic operations are similar
Some commands have arguments (called operands). They take two (if the command has one operand) or three units of storage, the others only one.
Commands for register machines
CompEq r r’
Jump l JumpTrue r l
JumpFalse r l
Plus r r’ …
Compares the content of register r with the content of register r. Stores 1 in r if both are equal, otherwise stores 0
Jumps to l
Jumps to address/label l if the content of register r is not 0
Jumps to address/label l if the content of register r is 0
Adds the content of r and r’, leaving the result in r Remaining arithmetic operations are similar
Some commands have arguments (called operands). They take two (if the command has one operand) or three units of storage, the others only one.
Question: Why do we bother with stack operations?
Commands for register machines
CompEq r r’
Jump l JumpTrue r l
JumpFalse r l
Plus r r’ …
Compares the content of register r with the content of register r. Stores 1 in r if both are equal, otherwise stores 0
Jumps to l
Jumps to address/label l if the content of register r is not 0
Jumps to address/label l if the content of register r is 0
Adds the content of r and r’, leaving the result in r Remaining arithmetic operations are similar
Some commands have arguments (called operands). They take two (if the command has one operand) or three units of storage, the others only one.
Question: Why do we bother with stack operations?
Used for e.g. procedure/method invocation and to store intermediate results. We’ll talk about that later.
Source language
Source language
Our source language is unchanged: a really simple imperative language.
M ::= M;M|forx=EtoE{M}|x:=E
E ::= n|x|E+E|E−E|E∗E|E/E|−E
Everything that’s difficult to compile, e.g. procedures, objects, is left out. We come to that later
Code generation for register machines
Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Important convention 1: The code generated by codeGenExpr( e, i ) can use registers Ri, Ri+1, … upwards, but must leave the other registers R0 … Ri-1 unchanged!
Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Important convention 1: The code generated by codeGenExpr( e, i ) can use registers Ri, Ri+1, … upwards, but must leave the other registers R0 … Ri-1 unchanged!
Important convention 2: The result of evaluating the expression is returned in register target.
Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Important convention 1: The code generated by codeGenExpr( e, i ) can use registers Ri, Ri+1, … upwards, but must leave the other registers R0 … Ri-1 unchanged!
Important convention 2: The result of evaluating the expression is returned in register target.
One way of thinking about this target is that it is used to track where the stack pointer would point.
Code generation for register machines
def codegenExpr ( exp : Expr, target : Register )
: List [ Instruction ]
Important convention 1: The code generated by codeGenExpr( e, i ) can use registers Ri, Ri+1, … upwards, but must leave the other registers R0 … Ri-1 unchanged!
Important convention 2: The result of evaluating the expression is returned in register target.
One way of thinking about this target is that it is used to track where the stack pointer would point.
Similar conventions for codegen for statements.
Code generation for constants
Code generation for constants
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Const ( n ) then
List ( I_LoadImm ( target, n ) ) } }
Code generation for variables
Code generation for variables
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Ident ( x ) then
List ( I_Load ( target, x ) )
Code generation for binary expressions
Code generation for binary expressions
def codegenExpr ( exp : Expr, target : Register ) = {
if exp is of shape
…
Binop ( lhs, op, rhs ) then {
codegenExpr ( rhs, target ) ++
codegenExpr ( lhs, target+1 ) ++
codegenBinop ( op, target, target+1 ) }
where
def codegenBinop ( op : Op, r1 : Register,
r2 : Register ) = {
if op is of shape
Plus then List ( I_Plus ( r1, r2 ) )
Minus then List ( I_Minus ( r1, r2 ) )
Times then List ( I_Times ( r1, r2 ) )
Divide then List ( I_Divide ( r1, r2 ) ) } }
Code generation for binary expressions
Code generation for binary expressions
Note that the call codegenExpr(lhs, target+1) in
Binop ( lhs, op, rhs ) then {
codegenExpr ( rhs, target ) ++
codegenExpr ( lhs, target+1 ) ++
codegenBinop ( op, target, target+1 ) }
leaves the result of the first call codegenExpr(rhs, target) in the register target unchanged by our assumptions that codegenExpr never modifies registers below its second argument.
Code generation for binary expressions
Note that the call codegenExpr(lhs, target+1) in
Binop ( lhs, op, rhs ) then {
codegenExpr ( rhs, target ) ++
codegenExpr ( lhs, target+1 ) ++
codegenBinop ( op, target, target+1 ) }
leaves the result of the first call codegenExpr(rhs, target) in the register target unchanged by our assumptions that codegenExpr never modifies registers below its second argument.
Please convince yourself that each clause of codegenExpr really implements this guarantee!
Example x*3+4
Example x*3+4
The expression x*3+4 has the following AST.
Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )
Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )
When we compile it (to target register r17, say), we obtain:
Example x*3+4
The expression x*3+4 has the following AST.
Binop (
Binop ( Ident ( “x” ), Times, Const ( 3 ) ),
Plus,
Const ( 4 ) )
When we compile it (to target register r17, say), we obtain:
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18
How can this be improved (1)?
How can this be improved (1)?
Let’s use commutativity of addition (a+b = b+a) and compile 4+x*3! When we compile it, we obtain:
How can this be improved (1)?
Let’s use commutativity of addition (a+b = b+a) and compile 4+x*3! When we compile it, we obtain:
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18
How is this better?
Side by side
Side by side
Compilation of x*3+4
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18
Compilation of 4+x*3
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18
Side by side
Compilation of x*3+4
LoadImm r17 4
LoadImm r18 3
Load r19 x
Times r18 r19
Plus r17 r18
Compilation of 4+x*3
LoadImm r17 3
Load r18 x
Times r17 r18
LoadImm r18 4
Plus r17 r18
The translation on the left uses 3 registers, while the right only two. We are currently assuming an unbounded number of registers, so there’s no difference, but for realistic CPUs, the number of registers is small, so smart translation strategies that save registers are better. More on this later!
How can this be improved (2)?
How can this be improved (2)?
We used two registers – can we get away with fewer?
How can this be improved (2)?
We used two registers – can we get away with fewer?
Probably not with this machine architecture. But most widely used CPUs can use constants directly in arithmetic operations, e.g.
MulImm r 3
Multiplies the content of register r with 3, storing the result in r
How can this be improved (2)?
We used two registers – can we get away with fewer?
Probably not with this machine architecture. But most widely used CPUs can use constants directly in arithmetic operations, e.g.
MulImm r 3
Multiplies the content of register r with 3, storing the result in r
AddImm r 4
Adds the content of register r with 3, storing the result in r
How can this be improved (2)?
We used two registers – can we get away with fewer?
Probably not with this machine architecture. But most widely used CPUs can use constants directly in arithmetic operations, e.g.
MulImm r 3
Multiplies the content of register r with 3, storing the result in r
AddImm r 4
Adds the content of register r with 3, storing the result in r Then x*3+4 translates to
Load r17 x
TimesImm r17 3
PlusImm r17 4
This is a different address(ing) mode. We may see more about this later.
Translation of statements
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’?
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’? Answer: Yes, because statements may contain expressions, e.g. x := x*y+3.
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’? Answer: Yes, because statements may contain expressions, e.g. x := x*y+3.
Question: What register should we use as first register when we start code generation?
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’? Answer: Yes, because statements may contain expressions, e.g. x := x*y+3.
Question: What register should we use as first register when we start code generation? Answer: doesn’t matter for unrestricted registers, but if registers are finite, use the lowest possible, i.e. R0. Why? Because code generation will not use registers below target, so any register below the initial one will not be used at all (= waste).
Translation of statements
Similar to stack machine, except that arguments and results of expressions are held in registers. We’ll see this in detail later.
Question: Does the codeGenStatement method need to be passed a ’target register’? Answer: Yes, because statements may contain expressions, e.g. x := x*y+3.
Question: What register should we use as first register when we start code generation? Answer: doesn’t matter for unrestricted registers, but if registers are finite, use the lowest possible, i.e. R0. Why? Because code generation will not use registers below target, so any register below the initial one will not be used at all (= waste).
Now we do something more interesting.
Bounded register numbers
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting?
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting? We can combine two strategies:
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting? We can combine two strategies:
While>1freeregistersremain,usetheregistermachine strategy discussed above for compilation.
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting? We can combine two strategies:
While>1freeregistersremain,usetheregistermachine strategy discussed above for compilation.
Whenthelimitisreached(ie.whenthereisoneregister left), revert to the accumulator strategy, using the last register as the accumulator.
Bounded register numbers
It’s easy to compile to register machine code, when the number of registers is unlimited. Now we look at compilation to register machines with a fixed number of registers.
Let’s go to the other extreme: just one register called accumulator. Operations take one of their arguments from the accumulator, and store the result in the accumulator. Additional arguments are taken from the top of the stack.
Why is this interesting? We can combine two strategies:
While>1freeregistersremain,usetheregistermachine strategy discussed above for compilation.
Whenthelimitisreached(ie.whenthereisoneregister left), revert to the accumulator strategy, using the last register as the accumulator.
The effect is that most expressions get the full benefit of registers, while unusually large expressions are handled correctly.