• Most of a program is written in a high-level language (HLL) such as C
– easier to implement – easier to understand – easier to maintain
• Only a few functions are written in assembly – to improve performance, or
– because it’s difficult or impossible to do in an HLL.
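Creating a Program
[Figure: build flow. The C source code (most of the code, including function main) is compiled, together with the library header files, into object code. The ASM source code (a few small functions) is assembled into object code. Both object files are linked with library code to form the executable program.]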
Consequence
Assembly functions must be compatible with instructions generated by the C compiler to …
1. Call a function
2. Pass parameters
3. Receive return value
4. Use CPU registers
FUNCTION CALL AND RETURN Simple Call-Return in C
Function call, performed by the Branch with Link instruction (BL):
1. Suspend the current sequence
2. Record the return address
3. Transfer control to the function
Execute the function.
Function return, performed by the Branch Indirect instruction (BX):
Use the recorded return address to go back and resume the suspended code.
[Diagram: a C caller invoking void f1(void) and resuming after the call.]
FUNCTION CALL AND RETURN
ARM instructions used to call and return from functions
Instruction         Format      Operation
Branch with Link    BL label    Function call: LR ← return address, PC ← address of label
Branch Indirect     BX LR       Function return: PC ← LR
LR (aka R14) and PC (aka R15) are two of the 16 registers inside the CPU.
LR is the “Link Register” – used with function calls to hold the return address.
PC is the “Program Counter” – holds the address of the next instruction.
FUNCTION CALL AND RETURN Simple Call-Return in Assembly
The “Branch with Link” instruction (BL) saves the address of the instruction immediately following it (the return address) in the Link Register (LR).
The “Branch Indirect” instruction (BX) copies the return address from LR into PC, thus transferring control back to where the function had been called.
[Diagram: the caller executes BL f1; function f1 runs and then returns to the instruction after the BL.]
Consider: There is only one Link Register. What if inside function f1 there is a call to another function?
FUNCTION CALL AND RETURN Nested Call-Return in Assembly
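[Diagram: the caller executes BL f1; f1 calls f2 and then returns. Annotations, in execution order:]
BL f1 (in the caller): Saves the return address in LR.
PUSH {LR} (on entry to f1): Saves the contents of LR on the stack.
BL f2 (inside f1): Modifies the link register (LR), writing over f1’s return address!
POP {LR} (in f1, after f2 returns): Restores the contents of LR from the stack.
BX LR (at the end of f1): Copies the saved return address from LR back into PC.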
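The nested-call problem above can be traced with a small C model. This is only a sketch: the `Cpu` struct, the addresses 0x100, 0x200 and 0x300, and the assumption that every instruction is 4 bytes wide are all made up for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the state involved in call/return (hypothetical, not real hardware). */
typedef struct {
    uint32_t pc;        /* Program Counter (R15) */
    uint32_t lr;        /* Link Register (R14) */
    uint32_t stack[8];  /* tiny stack, one word per slot */
    int      depth;     /* number of words currently pushed */
} Cpu;

/* BL target: LR <- address of the next instruction, PC <- target.
   Assumes 4-byte instructions to keep the arithmetic simple. */
static void bl(Cpu *cpu, uint32_t target) {
    cpu->lr = cpu->pc + 4;
    cpu->pc = target;
}

/* BX LR: PC <- LR */
static void bx_lr(Cpu *cpu) { cpu->pc = cpu->lr; }

/* PUSH {LR}: stack <- LR */
static void push_lr(Cpu *cpu) {
    assert(cpu->depth < 8);
    cpu->stack[cpu->depth++] = cpu->lr;
}

/* POP {LR}: LR <- stack */
static void pop_lr(Cpu *cpu) {
    assert(cpu->depth > 0);
    cpu->lr = cpu->stack[--cpu->depth];
}

/* The caller executes BL f1 at address 0x100; f1 (at 0x200) executes BL f2
   at address 0x204; f2 (at 0x300) returns at once. The result is the PC
   after f1 executes its final BX LR. */
static uint32_t run_nested_call(int save_lr) {
    Cpu cpu = { .pc = 0x100, .lr = 0, .depth = 0 };
    bl(&cpu, 0x200);             /* caller: BL f1  -> LR = 0x104 */
    if (save_lr) push_lr(&cpu);  /* f1:     PUSH {LR} */
    cpu.pc = 0x204;              /* f1 reaches its own BL f2 */
    bl(&cpu, 0x300);             /* f1:     BL f2  -> LR = 0x208 */
    bx_lr(&cpu);                 /* f2:     BX LR  -> PC = 0x208 */
    if (save_lr) pop_lr(&cpu);   /* f1:     POP {LR} */
    bx_lr(&cpu);                 /* f1:     BX LR */
    return cpu.pc;
}
```

In this model `run_nested_call(1)` ends at 0x104, the instruction after the caller's BL, while `run_nested_call(0)` ends at 0x208, back inside f1, because the second BL overwrote the only copy of the return address.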
FUNCTION CALL AND RETURN
ARM instructions used in functions that call other functions
SP is one of the 16 CPU registers (R13) – holds the address of the top of the stack.
[Figure: stack space at addresses 1000, 996, 992, 988, 984, 980, 976, 972, showing pushed data and the top of stack addressed by SP (register R13).]
Instruction                  Format                Operation
Push registers onto stack    PUSH register list    SP ← SP – 4 × #registers, copy registers to mem[SP]
Pop registers from stack     POP register list     Copy mem[SP] to registers, SP ← SP + 4 × #registers
“register list” format: { reg, reg, reg-reg, …}
IMPORTANT! Registers in the list must be listed in numerical order.
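The PUSH and POP operations described above can be sketched in C. The `Machine` struct, the 64-byte memory window and the bit-mask encoding of the register list are assumptions chosen for this illustration; the starting address 1000 follows the stack figure. The sketch stores the lowest-numbered register at the lowest address, which is how the hardware orders a register list.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_TOP   1000u  /* initial SP, as in the stack figure */
#define STACK_BYTES 64u    /* size of the modelled stack space (assumption) */

typedef struct {
    uint32_t r[16];             /* R0..R15; R13 is SP, R14 is LR, R15 is PC */
    uint8_t  mem[STACK_BYTES];  /* models addresses 936..999 */
} Machine;

/* Translates a modelled address into a pointer into mem[]. */
static uint8_t *addr(Machine *m, uint32_t a) {
    assert(a >= STACK_TOP - STACK_BYTES && a + 4 <= STACK_TOP);
    return &m->mem[a - (STACK_TOP - STACK_BYTES)];
}

/* PUSH register list: bit i of mask set means Ri is in the list.
   SP <- SP - 4 * #registers, then copy the registers to mem[SP]. */
static void push(Machine *m, unsigned mask) {
    unsigned count = 0;
    for (int i = 0; i < 16; i++)
        if (mask & (1u << i)) count++;
    m->r[13] -= 4 * count;
    uint32_t a = m->r[13];
    for (int i = 0; i < 16; i++) {
        if (mask & (1u << i)) {
            memcpy(addr(m, a), &m->r[i], 4);
            a += 4;
        }
    }
}

/* POP register list: copy mem[SP] to the registers,
   then SP <- SP + 4 * #registers. SP itself must not be in the list. */
static void pop(Machine *m, unsigned mask) {
    assert((mask & (1u << 13)) == 0);
    uint32_t a = m->r[13];
    for (int i = 0; i < 16; i++) {
        if (mask & (1u << i)) {
            memcpy(&m->r[i], addr(m, a), 4);
            a += 4;
        }
    }
    m->r[13] = a;
}
```

For example, pushing R4, R5 and LR from SP = 1000 leaves SP at 988, and popping the same three words into R4, R5 and PC restores SP to 1000 while loading the saved return address directly into PC.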
Optimizing Function Calls
Before optimization:
f1: PUSH {LR}    // Stack ← LR
    ●●●
    BL f2
    ●●●
    POP {LR}     // LR ← Stack
    BX LR        // PC ← LR

After optimization:
f1: PUSH {LR}    // Stack ← LR
    ●●●
    BL f2
    ●●●
    POP {PC}     // PC ← Stack
Optimizing Function Calls
[Diagram: the call BL f1 and the body of f1, shown twice: first with f1 ending in POP {LR} / BX LR, then in optimized form.]
ARM PROCEDURE CALL STANDARD Registers used as input parameters
Register    AAPCS Name    Usage
R0          A1            1st parameter
R1          A2            2nd parameter
R2          A3            3rd parameter
R3          A4            4th parameter
One register is used for each 8, 16 or 32-bit parameter.
A sequential register pair is used for each 64-bit parameter.
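A minimal sketch of the register assignment just described, written as a C helper. The function name and the array-based interface are hypothetical. It also assumes one detail of the AAPCS that the slide does not spell out: a 64-bit parameter's register pair starts at an even register number (R0 or R2). Parameters that do not fit in R0-R3 are only flagged with -1; how they are passed is outside the scope of these notes.

```c
#include <assert.h>

/* Assigns registers R0-R3 to parameters of the given sizes (in bytes).
   first_reg[i] receives the number of the register holding parameter i
   (the least-significant half for a 64-bit parameter), or -1 if the
   parameter does not fit in R0-R3. */
static void assign_param_registers(const int size_bytes[], int count,
                                   int first_reg[]) {
    int next = 0;  /* next free register number */
    for (int i = 0; i < count; i++) {
        assert(size_bytes[i] == 1 || size_bytes[i] == 2 ||
               size_bytes[i] == 4 || size_bytes[i] == 8);
        int needed = (size_bytes[i] == 8) ? 2 : 1;
        if (needed == 2 && (next % 2) != 0)
            next++;                /* a pair starts at R0 or R2 */
        if (next + needed <= 4) {
            first_reg[i] = next;
            next += needed;
        } else {
            first_reg[i] = -1;     /* R0-R3 exhausted */
            next = 4;              /* later parameters cannot use registers */
        }
    }
}
```

With sizes {1, 4, 8}, as in a function taking (int8_t, int32_t, int64_t), the helper yields R0, R1 and the pair starting at R2. With sizes {4, 8, 4} the 64-bit parameter skips R1 and takes R2 and R3, so the last parameter no longer fits in a register.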
PARAMETER PASSING
void foo(int8_t, int32_t, int64_t) ;

C Function Call:
int8_t  x8 ;
int32_t y32 ;
int64_t z64 ;
●
foo(x8, y32, z64) ;

Compiler Output:
LDR   R0,=x8       // R0 <-- &x8
LDRSB R0,[R0]      // R0 <-- x8
LDR   R1,=y32      // R1 <-- &y32
LDR   R1,[R1]      // R1 <-- y32
LDR   R2,=z64      // R2 <-- &z64
LDRD  R2,R3,[R2]   // R3.R2 <-- z64
BL    foo

C Function Call:
foo(5, -10, 20) ;

Compiler Output:
LDR   R0,=5        // R0 <-- 5
LDR   R1,=-10      // R1 <-- -10
LDR   R2,=20       // R3.R2 <-- 20
LDR   R3,=0
BL    foo

Inside foo, you must use the register copies of the actual arguments.
Why 2 Instructions to load a variable?
• Instructions are either 16 or 32 bits wide.
• All addresses are 32 bits wide.
• Instructions have no room for a 32-bit address
– Some bits are used to specify the operation code
– Some bits are used to specify register operands
• Solution 1: Use an address “displacement”
– Address distance from instruction to operand
– Assumes operand is located near the instruction
Why 2 Instructions to load a variable?
• Instructions and constants are stored in a region of memory that is read-only.
• Variables are stored in a region of memory that must be writeable.
• Those two regions are too far apart for the operand to be specified using an address displacement.
– The magnitude of the displacement could require too many bits to fit inside the instruction.
Why 2 Instructions to load a variable?
• Solution 2: Use solution 1 to copy the (constant) address of the operand into a register, then use that register to provide the address in the second instruction:
// All of the following is in read-only memory
// First LDR uses an address displacement
        LDR   R0,.temp   // R0 ← content of .temp
        LDR   R1,[R0]    // R1 ← content of x
        ...              // location .temp is near the first instruction
.temp:  .word x          // a constant (address of x)
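To put rough numbers on "too far apart", the following sketch counts the bits a displacement would need. The addresses and the field width used in the note after the code are illustrative assumptions, not values from the slides: code is taken to sit near 0x08000000, variables near 0x20000000, and the instruction is assumed to have a 12-bit offset field.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent an unsigned magnitude. */
static unsigned bits_needed(uint32_t magnitude) {
    unsigned bits = 0;
    while (magnitude != 0) {
        bits++;
        magnitude >>= 1;
    }
    return bits;
}

/* Distance in bytes between an instruction and its operand. */
static uint32_t displacement_magnitude(uint32_t instr_addr,
                                       uint32_t operand_addr) {
    return operand_addr > instr_addr ? operand_addr - instr_addr
                                     : instr_addr - operand_addr;
}

/* Whether a magnitude fits in an offset field of the given width. */
static int fits(uint32_t magnitude, unsigned field_bits) {
    assert(field_bits < 32);
    return magnitude < (1u << field_bits);
}
```

Under those assumptions, an instruction at 0x08000100 that refers to a variable at 0x20000000 needs a 29-bit displacement, which cannot fit in a 12-bit field. A nearby constant such as .temp, say 64 bytes away, fits easily, which is why the first LDR can use a displacement and the second one uses the register instead.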
ARM PROCEDURE CALL STANDARD Registers used to return function result
Register    AAPCS Name    Usage
R0          A1            8, 16 or 32-bit result, or the least-significant half of a 64-bit result
R1          A2            The most-significant half of a 64-bit result
PREPARING THE RETURN VALUE Functions that return an 8, 16 or 32-bit result
Functions must provide a full 32-bit representation of the return value, even for 8 and 16-bit results.
Review Summary
• Call functions using Branch with Link (BL)
– Saves the return address in Link Register (LR)
– Copies the target address into Program Counter (PC)
• Return from a function using BX LR
• Writing functions that call other functions:
– Using BL to call another function changes LR
– Use PUSH {LR} / POP {LR} to preserve LR
• Pass parameters using registers R0-R3
– 64-bit parameters in consecutive register pairs
• Return 8, 16 and 32 bit results using R0
– 64-bit result returned in R1.R0 register pair
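Review Summary

Returning a value:
int32_t f1(void)
{
return 10 ;
}
f1: LDR   R0,=10
    BX    LR

Calling a function with no parameters:
void f2(void) ;
void f1(void)
{
f2() ;
}
f1: PUSH  {LR}
    .
    BL    f2
    .
    POP   {PC}

Calling a function with parameters:
void f2(int32_t, int32_t) ;
void f1(void)
{
f2(1, 2) ;
}
f1: PUSH  {LR}
    .
    LDR   R0,=1
    LDR   R1,=2
    BL    f2
    .
    POP   {PC}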
HOW COMPILER HANDLES RETURN VALUES Promoting a return value from 8 to 16, 32 or 64 bits
8, 16 and 32-bit functions must always provide a 32-bit result.
Promotion to 64 bits requires extending the result after the call.
HOW COMPILER HANDLES RETURN VALUES Functions that return 64-bit result
Functions that return a 64-bit result leave it in R0 and R1 before returning.
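A brief C sketch of how a 64-bit value maps onto the R1.R0 pair, and of what "extending the result after the call" amounts to when a 32-bit result in R0 is promoted to 64 bits. The helper names are hypothetical, and the registers are modelled as plain uint32_t variables.

```c
#include <assert.h>
#include <stdint.h>

/* Splits a 64-bit result into the halves a function leaves in R0 and R1. */
static void split64(uint64_t value, uint32_t *r0, uint32_t *r1) {
    assert(r0 != 0 && r1 != 0);
    *r0 = (uint32_t) value;          /* least-significant half */
    *r1 = (uint32_t) (value >> 32);  /* most-significant half */
}

/* Caller-side promotion of a signed 32-bit result in R0 to R1.R0:
   R1 is filled with copies of the sign bit of R0. */
static uint64_t promote_signed(uint32_t r0) {
    uint32_t r1 = (r0 & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return ((uint64_t) r1 << 32) | r0;
}

/* Caller-side promotion of an unsigned 32-bit result: R1 <- 0. */
static uint64_t promote_unsigned(uint32_t r0) {
    return (uint64_t) r0;
}
```

The same bit pattern gives different 64-bit values depending on the declared return type: 0xFFFFFFF6 in R0 becomes -10 when promoted as signed, and 4294967286 when promoted as unsigned.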
PREPARING THE RETURN VALUE Functions that return an 8 or 16-bit result
uint8_t Add1(uint8_t x)
{
return x + 1 ;
}

uint8_t u8 ;
// What value is assigned?
u8 = Add1(255) ;
Answer: u8 = 0 // 0 ≤ Add1 ≤ 255

uint16_t u16 ;
uint32_t u32 ;
uint64_t u64 ;
// What value is assigned?
u16 = (uint16_t) Add1(255) ;
u32 = (uint32_t) Add1(255) ;
u64 = (uint64_t) Add1(255) ;
Answer: Always 0 // 0 ≤ Add1 ≤ 255

Incorrect:
Add1: ADD   R0,R0,1
      BX    LR
This implementation would return 255+1 = 256

Correct:
Add1: ADD   R0,R0,1
      UXTB  R0,R0
      BX    LR
UXTB: Unsigned eXTend (zero-extend) Byte
PREPARING THE RETURN VALUE Functions that return an 8 or 16-bit result
Instruction           Operation
UXTB Rd, operand2     Rd ← Zero extend operand2<7..0>
UXTH Rd, operand2     Rd ← Zero extend operand2<15..0>
SXTB Rd, operand2     Rd ← Sign extend operand2<7..0>
SXTH Rd, operand2     Rd ← Sign extend operand2<15..0>

operand2 options:
1. Rm (a register)
2. Rm,ROR constant (constant = 8, 16 or 24)
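The four extend instructions can be modelled in C as shown below. This is a sketch of their arithmetic only; the function names mirror the mnemonics, and a rotation argument of 0 stands for the plain Rm form of operand2.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right by 0, 8, 16 or 24 bits (0 models the plain Rm operand). */
static uint32_t ror32(uint32_t x, unsigned n) {
    assert(n == 0 || n == 8 || n == 16 || n == 24);
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* UXTB: zero-extend bits 7..0 of the (rotated) operand. */
static uint32_t uxtb(uint32_t rm, unsigned ror) {
    return ror32(rm, ror) & 0xFFu;
}

/* UXTH: zero-extend bits 15..0 of the (rotated) operand. */
static uint32_t uxth(uint32_t rm, unsigned ror) {
    return ror32(rm, ror) & 0xFFFFu;
}

/* SXTB: sign-extend bits 7..0 of the (rotated) operand. */
static uint32_t sxtb(uint32_t rm, unsigned ror) {
    uint32_t b = ror32(rm, ror) & 0xFFu;
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

/* SXTH: sign-extend bits 15..0 of the (rotated) operand. */
static uint32_t sxth(uint32_t rm, unsigned ror) {
    uint32_t h = ror32(rm, ror) & 0xFFFFu;
    return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
}
```

Applied to the Add1 example, `uxtb(255 + 1, 0)` gives 0, the value a uint8_t function must return. The rotation selects which byte or halfword is extended: `uxtb(0x12345678, 8)` extracts 0x56.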
REGISTER USAGE CONVENTIONS ARM PROCEDURE CALL STANDARD
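Register    AAPCS Name    Usage                                      Notes
R0          A1            Argument / result / scratch register 1     Do not have to preserve original contents
R1          A2            Argument / result / scratch register 2     Do not have to preserve original contents
R2          A3            Argument / scratch register 3              Do not have to preserve original contents
R3          A4            Argument / scratch register 4              Do not have to preserve original contents
R4          V1            Variable register 1                        Must preserve original contents
R5          V2            Variable register 2                        Must preserve original contents
R6          V3            Variable register 3                        Must preserve original contents
R7          V4            Variable register 4                        Must preserve original contents
R8          V5            Variable register 5                        Must preserve original contents
R9          V6            Variable register 6                        Must preserve original contents
R10         V7            Variable register 7                        Must preserve original contents
R11         V8            Variable register 8                        Must preserve original contents
R12         IP            Intra-Procedure-call scratch register      Do not have to preserve
R13         SP            Stack Pointer                              Reserved, DO NOT USE
R14         LR            Link Register                              Reserved, DO NOT USE
R15         PC            Program Counter                            Reserved, DO NOT USE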
FUNCTION CODING CONVENTIONS Functions that modify only registers R0 – R3, R12
Using only R0-R3, R12 and no function call
f1: ●                // OK to modify R0-R3 and R12
    BX    LR

Using only R0-R3, R12 and calling another function
f2: PUSH  {LR}
    ●                // OK to modify R0-R3, R12
    BL    f3         // Function f3 might modify R0-R3, R12
    ●                // OK to modify R0-R3, R12
    POP   {PC}
FUNCTION CODING CONVENTIONS Functions that modify for example registers R4 and R5
Using R4 and R5 and no function call
f1: PUSH  {R4,R5}
    ●                // OK to modify R0-R5 and R12
    POP   {R4,R5}
    BX    LR

Using R4 and R5 and calling another function
f2: PUSH  {R4,R5,LR}
    ●                // OK to modify R0-R5 and R12
    BL    f3         // Function f3 might modify R0-R3 and R12, but preserves R4-R11
    ●                // OK to modify R0-R5 and R12
    POP   {R4,R5,PC}
FUNCTION CODING CONVENTIONS Two functions with parameters, one calling the other
Assembly Version
int32_t f1(int32_t x)
{
return f2(4) + x ;
}

Assembly Version
f1: PUSH  {R4,LR}      // Preserve R4
    MOV   R4,R0        // Keep x safe in R4
    LDR   R0,=4        // R0 <-- f2’s arg
    BL    f2           // R0 <-- f2(4)
    ADD   R0,R0,R4     // R0 <-- f2(4) + x
    POP   {R4,PC}      // Restore R4

Point in execution           R0                 R4
On entry to function         parameter x        some value
After MOV instruction        parameter x        parameter x
After LDR instruction        constant 4         parameter x
After the BL instruction     f2 return value    parameter x
After the ADD instruction    f1 return value    parameter x
After the POP instruction    f1 return value    some value
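The register trace can be replayed with a small C model. The `Regs` struct is an assumption made for this sketch, and so is the body of f2, which the slides do not define: here it doubles its argument and deliberately overwrites R1-R3 and R12, which the convention allows.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t r[13]; } Regs;  /* models R0..R12 only */

/* Hypothetical f2: returns 2 * argument in R0 and trashes R1-R3 and R12.
   It leaves R4-R11 untouched, as the convention requires. */
static void f2(Regs *cpu) {
    assert(cpu != 0);
    cpu->r[0] = cpu->r[0] * 2;
    cpu->r[1] = cpu->r[2] = cpu->r[3] = cpu->r[12] = -1;
}

/* Step-by-step model of the assembly version of f1. */
static void f1(Regs *cpu) {
    assert(cpu != 0);
    int32_t saved_r4 = cpu->r[4];  /* PUSH {R4,LR}  (LR is not modelled) */
    cpu->r[4] = cpu->r[0];         /* MOV  R4,R0    keep x safe in R4 */
    cpu->r[0] = 4;                 /* LDR  R0,=4    f2's argument */
    f2(cpu);                       /* BL   f2       R0 <-- f2(4) */
    cpu->r[0] += cpu->r[4];        /* ADD  R0,R0,R4 R0 <-- f2(4) + x */
    cpu->r[4] = saved_r4;          /* POP  {R4,PC}  restore R4 */
}
```

Starting with x = 10 in R0 and an arbitrary value in R4, f1 leaves 18 in R0 (with this particular f2) and the original value back in R4. Keeping x in R1 instead would not work here, because f2 is free to overwrite R1.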
Sample Program
void InitializeHardware(char *, char *) ;
• Must be first executable statement
• Initializes processor, CPU clock cycle counter, the display and user push button.
• Creates stack and heap and initializes static variables.
• Formats display area
uint32_t GetClockCycleCount(void) ;
Returns the number of processor clock cycles since initialization.
start = GetClockCycleCount() ;
// do some calculation here
stop = GetClockCycleCount() ;
printf("Cycles=%u\n", (unsigned) (stop-start)) ;
[Figure: STM32F429I-DISCO display titled "GetClockCycles" showing Cycles=1788]
void WaitForPushButton(void) ;
Causes program to pause and wait for user to push the blue push button on the board.
void ClearDisplay(void) ;
Erases the scrollable area of the display.
[Figure: STM32F429I-DISCO display titled "Sine Function"]