L6_1 Assembly_Functions
EECS 370 – Introduction to Computer Organization – Fall 2020
EECS 370 – Introduction to Computer Organization – © Bill Arthur 1 The material in this presentation cannot be copied in any form without written permission
Learning Objectives
• Understand how program data, particularly at the granularity of a function, maps to memory
• Identify data passed between functions and the mapping of that data to memory
EECS 370 – Introduction to Computer Organization
2
Schedule Reminders
• Homework 2 is due 9/29
• A homework assignment is usually released soon after the previous one is due
• Project 1s and 1m due 9/24
EECS 370 – Introduction to Computer Organization
3
Call and Return
Order of execution: int x = 5; bar();
int y = 10; return;
x = x + 1;
return;
Notes:
// branch to 3000
return; // branch to 1008
return; // branch to ???
Address:
1000 1004 1008 1012
void foo() {
int x = 5; bar();
x = x + 1; return;
void bar() {
int y = 10;
return;
3000 3004
}
Remember:
There can be many call sites for a function in a program, i.e., more than just foo() will call bar()
void baz() {
bar();
return; }
}
EECS 370 – Introduction to Computer Organization
4
Unconditional Branching Instructions
• There are three types of unconditional branches in the LEGv8 ISA.
• The first (B) is the PC relative branch with the 26-bit offset from the last slide. • The second (BR) jumps to the address contained in a register (X30 above)
• The third (BL) is like our PC relative branch but it does something else.
• It sets X30 (always) to be the current PC+4 before it branches. • Why?
• Function calls – return to next instruction
EECS 370 – Introduction to Computer Organization
5
Branch with Link (BL)
• Branch with Link is the branch instruction used to call functions
• Functions need to know where they were called from so they can return. • In particular they will need to return to right after the function call
• Can use “BR X30”
• Say that we execute the instruction BL #500 when at PC 1000. • What address will be branched to?
• What value is stored in X30?
• How is that value in X30 useful?
EECS 370 – Introduction to Computer Organization
6
Converting function calls to assembly code
C code: factorial(5);
1. Need to pass arguments to the called function (factorial())
2. Need to save return address of the caller
3. Need to save register values
4. Need to jump to factorial (callee)
Execute instructions for factorial() Jump to return address
5. Need to get return value (for non-void return type)
6. Need to restore register values
EECS 370 – Introduction to Computer Organization
7
Task 1: Passing Arguments
• Where should you put all the arguments?
• Registers?
• Fast access but few in number and wrong size for some objects
• Memory?
• Good general solution but slow
• ARMv8 solution—and the usual answer:
• Registers and memory
• Put the first few arguments in registers (if they fit) (X0 – X7)
• Put the rest in memory on the call stack— important concept
• Comment: Make sure you understand the general idea behind a stack data structure—ubiquitous in computing
• As basic concept it is a list in that you can only access at one end by pushing a data item into the top of the stack and popping an item off of the stack—real stacks are a little more complex
EECS 370 – Introduction to Computer Organization
8
Call stack
• ARM conventions (and most other processors) allocate a region of memory for the “call” stack
• This memory is used to manage all the storage requirements to simulate function call semantics
• Parameters (that were not passed through registers)
• Local variables
• Temporary storage (when you run out of registers and need
somewhere to save a value)
• Return address
• etc.
• Sections of memory on the call stack [a.k.a. stack frames] are allocated when you make a function call, and de-allocated when you return from a function—the stack frame is a fixed template of memory locations
EECS 370 – Introduction to Computer Organization
9
An Older ARM (Linux) Memory Map
Stack: starts at 0x0000 007F FFFF FFFC and grows down to lower addresses. Bottom of the stack resides in the SP register
Heap: starts above static data and grows up to higher addresses. Allocation done explicitly with malloc(). Deallocation with free(). Runtime error if no free memory before running into SP address. NB not same as data structure heap—just uninitialized mem.
Static: starts above text. Holds all global variables and those locals explicitly declared as “static”.
Text: starts at 0x0000 0000 0004 0000. Holds all instructions in the program (except for dynamically linked library routines, DLLs)
frames
stack
heap
Static data
Text (instructions)
EECS 370 – Introduction to Computer Organization
10
(start_brk) (start_data)
Assigning variables to memory spaces
stack
heap
static
text
int w;
void foo(int x) {
static int y[4];
char *p;
p = malloc(10);
// more instructions
printf(“%s\n”, p);
return; }
w goes in static, as it is a global
x goes on the stack, as it is a parameter
y goes in static, 1 copy of this!!
p goes on the stack
allocate 10 bytes on heap, p set to the address
string goes in static with a pointer to string on stack, p goes on stack
EECS 370 – Introduction to Computer Organization
11
The stack grows as functions are called
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
inside foo foocallsbar
bar calls printf
bar returns printf returns
foo’s stack frame
bar’s stack frame
printf’s stack frame
EECS 370 – Introduction to Computer Organization
12
The stack grows as functions are called
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
inside foo
bar calls printf
foo’s stack frame
bar’s stack frame
printf’s stack frame
foo’s stack frame
foo calls bar
foo’s stack frame
bar’s stack frame
EECS 370 – Introduction to Computer Organization
13
The stack shrinks as functions return
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
In printf
printf returns to bar
foo’s stack frame
bar’s stack frame
foo’s stack frame
bar’s stack frame
printf’s stack frame
EECS 370 – Introduction to Computer Organization
14
bar returns to foo
foo’s stack frame
Stack frame contents (0)
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
foo’s stack frame
return address to main
x
y[0]
y[1]
spilled registers in foo
EECS 370 – Introduction to Computer Organization
15
Stack frame contents (1)
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
foo calls bar
return address to main
x
y[0]
y[1]
spilled registers in foo
x
return address to foo
a[0]
a[1]
a[2]
spilled registers in bar
EECS 370 – Introduction to Computer Organization
16
bar’s frame foo’s frame
Stack frame contents (2)
void foo() {
int x, y[2];
bar(x); }
void bar(int x) {
int a[3];
printf(); }
bar calls printf
return address to main
x
y[0]
y[1]
spilled registers in foo
x
return address to foo
a[0]
a[1]
a[2]
spilled registers in bar
return address to bar
printf local variables
EECS 370 – Introduction to Computer Organization
17
printf ’s
frame bar’s frame foo’s frame
Recursive function example
return address to …
2
return address to main
x, y[0], y[1]
spills in foo
1
return addr to foo
x, y[0], y[1]
spills in foo
0
return addr to foo
x, y[0], y[1]
spills in foo
void main() {
foo(2); }
void foo(int a) {
int x, y[2];
if (a > 0)
{
foo(a-1); {
}
main calls foo
EECS 370 – Introduction to Computer Organization
18
foo calls foo
foo calls foo
Logistics
• There are 3 videos for lecture 6 • L6_1 – Assembly_Functions
• L6_2 – Registers_Caller/Callee
• L6_3 – Caller/Callee-Examples
• There is one worksheet for lecture 6 1. Caller/Callee-saved registers – wait!
EECS 370 – Introduction to Computer Organization
19
L6_2 Registers_Caller/Callee
EECS 370 – Introduction to Computer Organization – Fall 2020
EECS 370 – Introduction to Computer Organization – © Bill Arthur 20 The material in this presentation cannot be copied in any form without written permission
Learning Objectives
• Understand how program data, particularly at the granularity of a function, maps to registers
• Identify data passed between functions and the mapping of that data to registers
EECS 370 – Introduction to Computer Organization
21
What about values in registers?
• When function “foo” calls function “bar”, function “bar” is, like all assembly code, going to store some values in registers.
• But function “foo” might have some values stored in registers that it wants to use after the call.
• How can “foo” be sure “bar” won’t overwrite those values?
• One answer: “foo” could save those values to memory (on the stack) before it calls “bar”.
• Now “bar” can freely use registers
• And “foo” will have to copy the values back from
memory once “bar” returns.
• In this case the “caller” (foo) is saving the registers to the stack.
foo: addi x0, x0, #1
bl bar
add x1, x0, x2
EECS 370 – Introduction to Computer Organization
22
bar: movz x0, #3000
The process of putting less frequently used variables (or those needed later) into memory is called spilling registers
Register Spilling Example
• The function foo is going to have values a, b, c, and d kept in registers.
• For sake of argument, let’s say that “a” is stored in X1, “b” in X2, etc. • When foo calls bar, bar might end up writing to some of the same
registers that foo is using.
• In this case, if bar were to change the value of X1 (which holds a) or X4 (which holds d) then foo wouldn’t behave correctly.
• What we need is some way to ensure that when we call a function, it will not “trash” values we need after the call
• Definition: a value that is defined before a function call and needed after a function call is said to be “live” across the function call.
void foo(){ int a,b,c,d; a = 5;
b = 6;
c = a+1; d=c-1;
bar();
d = a + d;
return; }
EECS 370 – Introduction to Computer Organization
23
One solution: Caller Save
• Anytime one function calls another function, the caller should first save to the stack any registers whose values it might need later.
• After returning, those values will be copied back from the stack into their original register.
• Now the “callee” function will not overwrite values its “caller” still needs.
• We call this option “caller save”
• Again, let’s assume that a is stored in X1, b in X2, etc.
• If we are using this “caller-save” option
• What registers do we need to save to the stack before
calling bar?
• What registers do we need to restore from the stack?
void foo(){ int a,b,c,d; a = 5;
b = 6;
c = a+1; d=c-1;
save X1 to stack
save X4 to stack
bar();
restore X1 from stack
restore X4 from stack
d = a + d;
return; }
EECS 370 – Introduction to Computer Organization
24
Another solution: Callee Save
• We could instead have each function save every register it is going to use before it does anything else.
• Again, let’s assume a is in X1, b in X2, etc.
• In this case, we’d save X1, X2, X3, and X4 before we did anything else and then restore them at the end of the function.
• Thus whatever function called “foo” would be sure its registers wouldn’t get trashed by foo as foo would save and restore all registers.
• All functions would do the same thing.
• so foo’s values in X1 and X4 are safe from bar trashing them.
void foo(){ int a,b,c,d;
a = 5;
b = 6;
c = a+1; d=c-1;
bar();
d = a + d;
return; }
EECS 370 – Introduction to Computer Organization
25
Another solution: Callee Save
void foo(){ int a,b,c,d;
save X1, X2, X3, X4
to stack
a = 5;
b = 6;
c = a+1; d=c-1;
bar();
d = a + d;
restore X1, X2, X3, X4
from stack
return;
•
•
We could instead have each function save every register it is going to use before it does anything else.
• Again, let’s assume a is in X1, b in X2, etc.
In this case, we’d save X1, X2, X3, and X4 before we did anything else and then restore them at the end of the function.
• Thus whatever function called “foo” would be sure its registers wouldn’t get trashed by foo as foo would save and restore all registers.
• All functions would do the same thing.
• so foo’s values in X1 and X4 are safe from bar
trashing them.
} 26 EECS 370 – Introduction to Computer Organization
“caller-save” vs. “callee-save”
• So we have two basic options:
• Each function can save registers before you make the function call and
restore the registers when you return (caller-save).
• What if the function you are calling doesn’t use that register? No harm done, but wasted
work!!!
• You can save all registers you are going to use at the very start of each
function (callee-save).
• What if the caller function doesn’t need that value? No harm done, but wasted work!!!
• Most common scheme is to have some of the registers be the responsibility of the caller, and others be the responsibility of the callee.
EECS 370 – Introduction to Computer Organization
27
Caller-Callee Save/Restore
Caller Save
Caller save Caller restore
foo() { . ..
bar ();
. .. }
Callee Save
bar() { . .. . .. . ..
}
Callee save
Callee restore
Caller save registers: Callee may change, so caller responsible for saving immediately before call and restoring immediately after call
Callee save registers: Must be the same value as when called. May do this by either not changing the value in a register or by inserting saves at the start of the function and restores at the end
EECS 370 – Introduction to Computer Organization
28
Saving/Restoring Optimizations
• Caller-saved
• Only needs saving if it is “live” across a function call
• Live = contains a useful value: Assign value before function call, use that value after the function call
• In a leaf function (a function that calls no other function), caller saves can be used without saving/restoring
• Callee-saved
• Only needs saving at beginning of function and restoring at end of function • Only save/restore it if function overwrites the register
• Each has its advantages. Neither is always better. EECS 370 – Introduction to Computer Organization
29
Calling Convention
• This is a convention: calling convention
• There is no difference in H/W between caller and callee save registers
• Passing parameters in registers is also a convention
• Allows assembly code written by different people to work together
• Need conventions about who saves regs and where args are passed.
• These conventions collectively make up the ABI or “application binary
interface”
• Why are these conventions important?
• What happens if a programmer/compiler violates them? EECS 370 – Introduction to Computer Organization
30
Caller/Callee Selection
• Select assignment of variables to registers such that the sum of caller/callee costs is minimized
• Execute fewest save/restores
• Each function greedily picks its own assignment ignoring the assignments in other functions
• Calling convention assures all necessary registers will be saved
• 2 types of problems
1.Given a single function→Assume it is called 1 time
2.Set of functions or program→Compute number of times each function is called if it is obvious (i.e., loops with known trip counts or you are told)
EECS 370 – Introduction to Computer Organization
31
Assumptions
• A function can be invoked by many different call sites in different functions.
• Assume no inter-procedural analysis (hard problem)
• A function has no knowledge about which registers are used in either its
caller or callee
• Assume main() is not invoked by another function
• Implication
• Any register allocation optimization is done using function local information
EECS 370 – Introduction to Computer Organization
32
Logistics
• There are 3 videos for lecture 6 • L6_1 – Assembly_Functions
• L6_2 – Registers_Caller/Callee
• L6_3 – Caller/Callee-Examples
• There is one worksheet for lecture 6 1. Caller/Callee-saved registers – wait!
EECS 370 – Introduction to Computer Organization
33
L6_3 Caller/Callee-Examples
EECS 370 – Introduction to Computer Organization – Fall 2020
EECS 370 – Introduction to Computer Organization – © Bill Arthur 34 The material in this presentation cannot be copied in any form without written permission
Learning Objectives
• Identify trade-offs between caller-save, callee-save, and mixed caller/callee-save register spilling
• Understanding of how to apply caller/callee/mixed register spilling to arbitrary functions and programs
EECS 370 – Introduction to Computer Organization
35
Caller-saved vs. callee saved – Multiple function case (0)
void bar(){
int g,h,i,j; .
g = 0; h = 1; i = 2; j = 3; final();
j = g+h+i;
.
.
.
}
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; foo();
d = a+b+c+d; .
.
.
}
void foo(){ int e,f; .
.
e = 2; f = 3; bar();
e = e + f;
.
.
. }
void final(){ int y,z;
.
.
y = 2; z = 3; .
z = y+z;
.
.
. }
Note: assume main does not have to save any callee registers
EECS 370 – Introduction to Computer Organization
36
Caller-saved vs. callee saved – Multiple function case (1)
• Questions:
1.In assembly code, how many registers need to be stored/loaded in total if we
use a caller-save convention ?
2.In assembly code, how many registers need to be stored/loaded in total if we
use a callee-save convention ?
3.In assembly code, how many registers need to be stored/loaded in total if we
use a mixed caller/callee-save convention with 3 callee and 3 caller registers ?
EECS 370 – Introduction to Computer Organization
37
Question 1: Caller-Save
void bar(){
int g,h,i,j; .
g = 0; h = 1; i = 2; j = 3; [3 STURW] final();
[3 LDURSW]
j = g+h+i;
.
.
.
}
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; [4 STURW] foo();
[4 LDURSW]
d = a+b+c+d; .
.
.
}
void foo(){ int e,f; .
.
e = 2; f = 3;
[2 STURW]
bar();
[2 LDURSW]
e = e + f; .
.
.
}
void final(){ int y,z;
.
.
y = 2; z = 3; .
z = y+z;
.
.
. }
EECS 370 – Introduction to Computer Organization
38
Total: 9 STURW / 9 LDURSW
Question 2: Callee-Save
void bar(){
[4 STURW]
int g,h,i,j;
.
g = 0; h = 1;
i = 2; j = 3;
final();
j = g+h+i;
.
[4 LDURSW]
}
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; foo();
d = a+b+c+d; .
.
.
}
void foo(){
[2 STURW]
int e,f;
.
.
e = 2; f = 3; bar();
e = e + f; .
.
.
[2 LDURSW]
}
void final(){
[2 STURW]
int y,z;
.
.
y = 2; z = 3; .
z = y+z;
.
.
[2 LDURSW]
}
Note: assume main does not have to save any callee registers
EECS 370 – Introduction to Computer Organization
39
Total: 8 STURw / 8 LDURSW
Caller-Save and Callee-Save Registers(0)
• Again, what we really tend to do is have some of the registers be the responsibility of the caller and others the responsibility of the callee.
• So if you have six registers, then X0-X2 are caller-save registers and X3-X5 are callee-save registers.
• How does that help?
EECS 370 – Introduction to Computer Organization
40
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
void bar(){
int g,h,i,j;
.
g = 0; h = 1;
i = 2; j = 3;
final();
j = g+h+i; .
}
void main(){
int a,b,c,d;
.
c = 5; d = 6;
a = 2; b = 3;
foo();
d = a+b+c+d; }
void foo(){
int e,f;
.
.
e = 2; f = 3; bar();
e = e + f; .
.
.
}
void final(){
int y,z;
.
.
y = 2; z = 3; .
z = y+z; .
.
}
EECS 370 – Introduction to Computer Organization
41
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
For main, we’d like to put all the variables into callee save registers. But we only have 3 callee save registers (X3-X5), so one variable needs to end up in a caller-save register.
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; [1 STURW] foo();
[1 LDURSW]
d = a+b+c+d; }
1 caller reg. 3 callee reg.
EECS 370 – Introduction to Computer Organization
42
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
void foo(){
[2 STURW]
int e,f;
.
.
e = 2; f = 3; bar();
e = e + f; .
.
.
[2 LDURSW]
}
2 callee reg.
For foo it doesn’t really matter what registers we use. Either way we will have to save and restore 2 values
EECS 370 – Introduction to Computer Organization
43
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
void bar(){
[3 STURW]
int g,h,i,j;
.
g = 0; h = 1;
i = 2; j = 3;
final();
j = g+h+i;
.
[3 LDURSW]
}
1 caller reg. 3 callee reg.
EECS 370 – Introduction to Computer Organization
44
For bar, “j” should be allocated to caller-save register. The others don’t matter.
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
void final(){
int y,z;
.
.
y = 2; z = 3; .
z = y+z; .
.
}
EECS 370 – Introduction to Computer Organization
45
3 caller reg.
X0-X2 are caller-save, X3-X5 are callee-save
Question 3: Mixed 3 Each Caller/Callee-Save
For main, we’d like to put all the variables into callee save registers. But we only have 3 callee save registers (X3-X5), so one variable needs to end up in a caller-save register.
void bar(){
[3 STURW]
int g,h,i,j;
.
g = 0; h = 1;
i = 2; j = 3;
final();
j = g+h+i;
.
[3 LDURSW]
}
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; [1 STURW] foo();
[1 LDURSW]
d = a+b+c+d; }
void foo(){
[2 STURW]
int e,f;
.
.
e = 2; f = 3; bar();
e = e + f; .
.
.
[2 LDURSW]
}
void final(){
int y,z;
.
.
y = 2; z = 3; .
z = y+z; .
.
}
1 caller reg. 3 callee reg.
3 caller reg.
2 callee reg.
1 caller reg. 3 callee reg.
For foo it doesn’t really matter what registers we use. Either way we will have to save and restore 2 values
EECS 370 – Introduction to Computer Organization
46
For bar, “j” should be allocated to caller-save register. The others don’t matter.
Total: 6 STURW / 6 LDURSW
It can get quite a bit more complex than this
• Multiple function calls in a given function will make things more complex
• As will loops
• The video review for caller/callee save found on the class website is quite useful
• Discussion videos also go over some examples
EECS 370 – Introduction to Computer Organization
47
Does Recursion Change Caller/Callee?
• No! Treat foo() just like any normal function and assume you are calling an unknown function.
void main(){ int a,b,c,d; .
c = 5; d = 6; a = 2; b = 3; foo();
d = a+b+c+d; .
.
.
}
void foo(){ int a,b,c; c = 4;
.
a = 2; b = 3;
foo(c-1,b+1);
a = a + b;
.
.
b = 4;
foo(b,9);
b = a – b;
.
}
EECS 370 – Introduction to Computer Organization
48
LEGv8 ABI-–Application Binary Interface
• The ABI is an agreement about how to use the various registers. These can be broken into three groups
• Some registers are reserved for special use Register usage definitions
• (X31) zero register, (X30) link register, (X29) frame pointer, (x28) stack pointer, (X16-18)
reserved for other uses. • Callee save: X19-X27
• Caller save: X0-X15
• In addition, we pass arguments using registers X0-X7 (and memory if there
are more arguments)
• Return value goes into X0
EECS 370 – Introduction to Computer Organization
49
Logistics
• There are 3 videos for lecture 6 • L6_1 – Assembly_Functions
• L6_2 – Registers_Caller/Callee
• L6_3 – Caller/Callee-Examples
• There is one worksheet for lecture 6
1. Caller/Callee-saved registers – complete now!
EECS 370 – Introduction to Computer Organization
50