LOAD/STORE ARCHITECTURE
Arithmetic & Logic Unit (ALU)
All computations (add, subtract, etc.) are performed in the ALU.
Instruction operands and results are held in registers.
Main Memory
But variables reside in main memory.
“Load”: Copy the contents of a variable from memory into a register.
“Store”: Copy the contents of a register back into a variable in memory.
INSTRUCTIONS FOR COPYING DATA
Constant to Register:
LDR* (MOV, MVN, MOVW, MOVT)
* This LDR is a “pseudo” instruction
Register to Register: MOV
Memory to Register (“Load Register” instructions):
LDR, LDRD, LDRB, LDRH, LDRSB, LDRSH
This LDR is a real instruction
Register to Memory (“Store Register” instructions):
STR, STRD, STRB, STRH
REGISTER←CONSTANT
MOV  R0,100    // R0 ← 100          16-bit instruction: 0-255
MVN  R0,100    // R0 ← ~100 (-101)  16-bit instruction: 0-255
MOV  R0,-100   // R0 ← -100
               // Assembler replaces this by MVN R0,99
MOVW R0,1000   // R0 ← 1000                                   32-bit instruction: 0-65535
MOVT R0,1000   // R0 bits 16-31 ← 1000 (bits 0-15 unchanged)  32-bit instruction: 0-65535
An arbitrary 32-bit constant:
MOVW R0,100000 & 0xFFFF   // LS 16 bits
MOVT R0,100000 >> 16      // MS 16 bits
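The MOVW/MOVT pair for an arbitrary 32-bit constant can be modeled in C. This is only a sketch: the function names movw, movt, and load_constant are made up for illustration, and the model assumes MOVW clears bits 16-31 while MOVT leaves bits 0-15 untouched.

```c
#include <stdint.h>

/* Model of MOVW: places a 16-bit constant in bits 0-15, zeroes in bits 16-31. */
uint32_t movw(uint16_t imm16) {
    return (uint32_t)imm16;
}

/* Model of MOVT: places a 16-bit constant in bits 16-31, keeps bits 0-15. */
uint32_t movt(uint32_t rd, uint16_t imm16) {
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Builds a 32-bit constant in two steps, like the MOVW/MOVT pair. */
uint32_t load_constant(uint32_t value) {
    uint32_t r0 = movw((uint16_t)(value & 0xFFFFu)); /* LS 16 bits */
    return movt(r0, (uint16_t)(value >> 16));        /* MS 16 bits */
}
```

Calling load_constant(100000) goes through the same two halves (100000 & 0xFFFF, then 100000 >> 16) as the instruction pair.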
The LDR Pseudo-Instruction
A “pseudo-instruction” is not a real ARM instruction. When used, the assembler replaces it with an equivalent operation using a real instruction.
Format: LDR Rd,=constant
The equals sign distinguishes this pseudo-instruction from a real LDR instruction.
The pseudo-instruction is replaced by a MOV, MVN, or MOVW instruction if possible.
Else it is replaced by a real LDR that loads the constant from memory:
        LDR R0,=100000
is replaced by
        LDR R0,.temp
        ⋯
.temp:  .word 100000
WRITING INTEGER CONSTANTS
• Decimal: 123
• Binary: 0b10110111
• Octal: 0123
• Hexadecimal: 0xFACE
• ASCII character: 'a (8 bits)
C-style character constants also work ('a'), but not all escape sequences ('\0 or '\0') do. It's safer to use hex (0x00) instead.
REGISTER←MEMORY (32-BITS) Load Register from Word
LDR R0,[R1]
// Copies a 32-bit word
// from the memory location
// whose address is in R1
// into register R0.
Used with data of type int32_t, uint32_t, and all pointers
REGISTER PAIR←MEMORY (64-BITS) Load Register Pair from DoubleWord
LDRD R0,R1,[R2]
// Copies the lower
// half of the value
// held in the 64-bit
// memory location whose
// address is in R2
// into register R0,
// and the upper half
// into R1.
The 64-bit operand must be word aligned (located at a mod 4 address) or an address fault will occur.
Used with data of type int64_t and uint64_t
Copying variables < 32 bits wide
(32-bit register copy must have same value)
Unsigned
Zero-Extend: Add leading 0’s
4-bit example: 1101₂ = 13₁₀ becomes 00⋯001101₂ = 13₁₀
Signed (2’s complement)
Sign-Extend: Replicate sign bit
4-bit example: 1101₂ = -3₁₀ becomes 11⋯111101₂ = -3₁₀
Instructions that zero-extend: UXTB, UXTH, LDRB, LDRH
Instructions that sign-extend: SXTB, SXTH, LDRSB, LDRSH
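Zero-extension and sign-extension can be sketched in C as follows. The function names are hypothetical; each one takes the raw 8-bit or 16-bit pattern found in memory and returns the 32-bit value that the corresponding load would leave in the register.

```c
#include <stdint.h>

/* What LDRB / UXTB produce: bits 8-31 are filled with 0's. */
uint32_t zero_extend8(uint8_t byte) {
    return (uint32_t)byte;
}

/* What LDRH / UXTH produce: bits 16-31 are filled with 0's. */
uint32_t zero_extend16(uint16_t half) {
    return (uint32_t)half;
}

/* What LDRSB / SXTB produce: bits 8-31 are copies of bit 7. */
int32_t sign_extend8(uint8_t byte) {
    return (int32_t)(byte ^ 0x80) - 0x80;
}

/* What LDRSH / SXTH produce: bits 16-31 are copies of bit 15. */
int32_t sign_extend16(uint16_t half) {
    return (int32_t)(half ^ 0x8000) - 0x8000;
}
```

The XOR-and-subtract form is used instead of a cast to int8_t or int16_t so that the result does not rely on implementation-defined conversions.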
REGISTER←MEMORY (8-BITS UNSIGNED) Load Register from (Unsigned) Byte
LDRB R0,[R1]
// Copies the unsigned
// value held in the
// 8-bit memory location
// whose address is in R1
// into bits 0-7 of register
// R0 and 0’s into bits 8-31.
Used with data of type uint8_t
REGISTER←MEMORY (16-BITS UNSIGNED) Load Register from (Unsigned) HalfWord
LDRH R0,[R1]
// Copies the unsigned
// value held in the
// 16-bit memory location
// whose address is in R1
// into bits 0-15 of register
// R0 and 0’s into bits 16-31.
Used with data of type uint16_t
REGISTER←MEMORY (8-BITS SIGNED) Load Register from Signed Byte
LDRSB R0,[R1]
// Copies the signed
// value held in the
// 8-bit memory location
// whose address is in R1
// into bits 0-7 of
// register R0 and 24
// copies of bit 7 of
// the byte into bits 8-31.
Used with data of type int8_t
REGISTER←MEMORY (16-BITS SIGNED) Load Register from Signed HalfWord
LDRSH R0,[R1]
// Copies the signed
// value held in the
// 16-bit memory location
// whose address is in R1
// into bits 0-15 of R0 and
// 16 copies of bit 15 of
// the halfword into bits 16-31.
Used with data of type int16_t
REGISTER←REGISTER Move Instruction
MOV R0,R1
// Copies all 32 bits
// of the value held
// in register R1 into
// the register R0
REGISTER→MEMORY (32-BITS) Store Register to Word
STR R0,[R1]
// Copies all 32 bits
// of the value held
// in register R0 into
// the 32-bit memory
// location whose address
// is in register R1
Used with data of type int32_t, uint32_t, and all pointers
REGISTER PAIR→MEMORY (64-BITS) Store Register Pair to DoubleWord
STRD R0,R1,[R2]
// Copies the contents
// of register R0 into
// the lower half, and
// register R1 into the
// upper half, of the
// 64-bit memory location
// whose address is in
// register R2.
The 64-bit operand must be word aligned (located at a mod 4 address) or an address fault will occur.
Used with data of type int64_t and uint64_t
REGISTER→MEMORY (8-BITS) Store Register to Byte
STRB R0,[R1]
// Copies bits 0-7 of
// the value held in
// register R0 into
// the 8-bit memory
// location whose address
// is in register R1
Register bits 8-31 are not copied.
Used with data of type int8_t and uint8_t
REGISTER→MEMORY (16-BITS) Store Register to HalfWord
STRH R0,[R1]
// Copies bits 0-15
// of the value held
// in register R0
// into the 16-bit
// memory location
// whose address is
// in register R1.
Register bits 16-31 are not copied.
Used with data of type int16_t and uint16_t
Common Coding Mistake: Variable Y ← Variable X (32 bits)
// Assume R2=&x, R3=&y
LDR R0,[R2]
LDR R1,[R3]
MOV R1,R0

After instruction   Register R0   Register R1   x (in memory)   y (in memory)
(initially)         ?             ?             1000            ?
LDR R0,[R2]         1000          ?             1000            ?
LDR R1,[R3]         1000          ?             1000            ?
MOV R1,R0           1000          1000          1000            ?

The second LDR instruction doesn’t move Y into R1; it merely makes a copy of its value. Thus the MOV doesn’t change Y; it only changes the copy in R1.
Correct Coding Solution: Variable Y ← Variable X (32 bits)
// Assume R2=&x, R3=&y
LDR R0,[R2]
STR R0,[R3]

After instruction   Register R0   x (in memory)   y (in memory)
(initially)         ?             1000            ?
LDR R0,[R2]         1000          1000            ?
STR R0,[R3]         1000          1000            1000
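The difference between the mistaken and the correct sequence can be mimicked in C by treating local variables as registers. This is an illustrative model only; the function names copy_mistake and copy_correct are invented here.

```c
#include <stdint.h>

/* Mistaken sequence: the value of y is copied into a "register",
   and only that copy is overwritten. */
void copy_mistake(const int32_t *x, int32_t *y) {
    int32_t r0 = *x;  /* LDR R0,[R2] */
    int32_t r1 = *y;  /* LDR R1,[R3]: r1 holds a copy of y */
    r1 = r0;          /* MOV R1,R0: changes the copy, not y */
    (void)r1;         /* the copy is never written back to memory */
}

/* Correct sequence: the register is stored back to memory. */
void copy_correct(const int32_t *x, int32_t *y) {
    int32_t r0 = *x;  /* LDR R0,[R2] */
    *y = r0;          /* STR R0,[R3] */
}
```

Only copy_correct contains a store, so only it can change the variable in memory.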
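DATA COPYING INSTRUCTIONS
Pointers are always 32 bits wide. Copy with LDR and STR.
Data type in memory          Load/Store to and from 32-bit register(s)
int8_t                       LDRSB/STRB
uint8_t                      LDRB/STRB
int16_t                      LDRSH/STRH
uint16_t                     LDRH/STRH
int32_t, uint32_t, pointer   LDR/STR
int64_t, uint64_t            LDRD/STRD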
EXAMPLES OF COPYING DATA
Source: Constant
  8-bit destination:   LDR R0,=5          STRB R0,[⋯]
  16-bit destination:  LDR R0,=5          STRH R0,[⋯]
  32-bit destination:  LDR R0,=5          STR R0,[⋯]
  64-bit destination:  LDR R0,=5          LDR R1,=0 [1]     STRD R0,R1,[⋯]
Source: 8-bit Variable
  8-bit destination:   LDRB R0,[⋯]        STRB R0,[⋯]
  16-bit destination:  LDRB R0,[⋯] [2]    STRH R0,[⋯]
  32-bit destination:  LDRB R0,[⋯] [2]    STR R0,[⋯]
  64-bit destination:  LDRB R0,[⋯] [2]    LDR R1,=0 [3]     STRD R0,R1,[⋯]
Source: 16-bit Variable
  8-bit destination:   LDRB R0,[⋯]        STRB R0,[⋯]
  16-bit destination:  LDRH R0,[⋯]        STRH R0,[⋯]
  32-bit destination:  LDRH R0,[⋯] [4]    STR R0,[⋯]
  64-bit destination:  LDRH R0,[⋯] [4]    LDR R1,=0 [3]     STRD R0,R1,[⋯]
Source: 32-bit Variable
  8-bit destination:   LDRB R0,[⋯]        STRB R0,[⋯]
  16-bit destination:  LDRH R0,[⋯]        STRH R0,[⋯]
  32-bit destination:  LDR R0,[⋯]         STR R0,[⋯]
  64-bit destination:  LDR R0,[⋯]         LDR R1,=0 [3]     STRD R0,R1,[⋯]
Source: 64-bit Variable
  8-bit destination:   LDRB R0,[⋯]        STRB R0,[⋯]
  16-bit destination:  LDRH R0,[⋯]        STRH R0,[⋯]
  32-bit destination:  LDR R0,[⋯]         STR R0,[⋯]
  64-bit destination:  LDRD R0,R1,[⋯]     STRD R0,R1,[⋯]
[1] Replace with LDR R1,=-1 if source operand is a negative constant.
[2] Replace with LDRSB if source operand is signed.
[3] Replace with ASR R1,R0,31 if source operand is signed.
[4] Replace with LDRSH if source operand is signed.
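Widening a 32-bit value into a 64-bit register pair, as in the 64-bit destination cases above, can be sketched in C. The reg_pair type and the function names are assumptions made for this illustration; lo stands for R0 and hi for R1.

```c
#include <stdint.h>

/* Hypothetical model of the register pair R0 (lo) and R1 (hi). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} reg_pair;

/* Unsigned source: the upper half is zero (LDR R1,=0). */
reg_pair widen_unsigned(uint32_t r0) {
    reg_pair p = { r0, 0u };
    return p;
}

/* Signed source: the upper half is 32 copies of the sign bit,
   which is the value that ASR R1,R0,31 leaves in R1. */
reg_pair widen_signed(int32_t r0) {
    reg_pair p;
    p.lo = (uint32_t)r0;
    p.hi = (r0 < 0) ? 0xFFFFFFFFu : 0u;
    return p;
}

/* Reassembles the 64-bit pattern that STRD would write to memory. */
uint64_t pair_value(reg_pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}
```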
Contents versus Address
LDR R0,x   ; Copies the contents of variable ‘x’ from memory into a
           ; register, but x must be within ±4095 bytes of the instruction
LDR R0,=x  ; Copies the address of a variable ‘x’ into a register
ADR R0,x   ; Also copies the address of a variable ‘x’ into a register,
           ; BUT x must be within ±4095 bytes of the instruction.
Function call in C
void f1(int32_t *) ;
int32_t s32 ;
f1(&s32) ;
Code produced by the compiler
LDR R0,=s32   // load R0 with &s32
BL  f1        // call function f1
PC-Relative Addressing
LDR R0,x   // To the assembler, the symbol
           // “x” represents the address
           // of the variable. The address
           // is a constant, determined
           // before execution begins.
Address of x is a constant, determined before execution.
Address of x = PC + displacement constant (PC-relative: the distance from this instruction).
Number of bits depends on the instruction.
PC-Relative addressing only works when the data is near the instruction!
Pointer Dereferencing
int32_t *p ; // ‘p’ is a pointer to a 32-bit integer
The address of p is a constant, determined during assembly, but the content of p is a variable, possibly modified during execution.
Address of *p must be determined at run-time:
LDR R0,=p     // R0 ← &p
LDR R0,[R0]   // R0 ← p (adrs of *p)
LDR R1,[R0]   // R1 ← *p
[Figure: encoding of LDR R1,[R0] (imm. offset), with fields for LDR, Offset, R0, and R1]
Determining an Operand Address
// ‘a’ is an array of four 32-bit integers
int32_t a[4] ;
Address of a[2] is a constant, determined during assembly: The symbol ‘a’ is a constant (the adrs of a[0]), and thus the expression ‘a+8’ is the address of a[2].
LDR R0,a+8    // R0 ← a[2]
[Figure: encoding of LDR R0,a+8, where the address of a[2] is held as a displacement constant (PC-relative: the distance from this instruction)]
PC-Relative addressing only works when the data is near the instruction!
Address of a[k] must be computed at run-time because ‘k’ is a variable:
LDR R0,=a              // R0 ← &a[0] (a constant)
LDR R1,=k              // R1 ← &k (a constant)
LDR R1,[R1]            // R1 ← k (a variable)
LDR R2,[R0,R1,LSL 2]   // adrs = R0 + 4 x R1
[Figure: encoding of LDR (Register Offset Mode), with fields for R0, R2, LSL 2, and R1]
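The run-time address computation for a[k] (base address plus 4 times the subscript) can be written out by hand in C. This is a sketch under the assumption that k is non-negative; load_element is a hypothetical name, and memcpy stands in for the 32-bit load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mimics LDR R2,[R0,R1,LSL 2]: address = base + (k << 2). */
int32_t load_element(const int32_t *a, int32_t k) {
    const unsigned char *base = (const unsigned char *)a; /* R0 = &a[0] */
    size_t offset = (size_t)k << 2;                       /* R1,LSL 2 = 4*k */
    int32_t value;
    memcpy(&value, base + offset, sizeof value);          /* 32-bit load */
    return value;
}
```

For any valid subscript the result equals a[k], which is what the compiler generates the scaled offset for.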
ADDRESSING MODES
(Calculating a Memory Address)
Immediate Offset Mode:
[R0] [R0,4]
R0 is the ‘base’; the optional constant 4 is the ‘offset’ from the base.
Register Offset Mode:
[R0,R1] [R0,R1,LSL 2]
R0 is the ‘base’;
R1 or R1,LSL 2 is the ‘offset’ from the base.
Pre-Indexed Mode: [R0,4]!
1. R0 ← R0 + 4
2. R0 provides address
Post-Indexed Mode: [R0],4
1. R0 provides address
2. R0 ← R0 + 4
Use the pre-indexed and post-indexed modes in loops to reduce the number of instructions.
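The ordering difference between the immediate offset, pre-indexed, and post-indexed modes can be modeled with a small C sketch. The mem_access type and the function names are invented for this illustration; each function returns the address used for the access and the value left in the base register afterwards.

```c
#include <stdint.h>

/* Hypothetical result of one address calculation. */
typedef struct {
    uint32_t address;     /* address used for the memory access */
    uint32_t base_after;  /* contents of the base register afterwards */
} mem_access;

/* Immediate offset: address = base + offset, base unchanged. */
mem_access immediate_offset(uint32_t base, int32_t offset) {
    mem_access m = { base + (uint32_t)offset, base };
    return m;
}

/* Pre-indexed: base is updated first, then provides the address. */
mem_access pre_indexed(uint32_t base, int32_t offset) {
    uint32_t updated = base + (uint32_t)offset;
    mem_access m = { updated, updated };
    return m;
}

/* Post-indexed: base provides the address, then is updated. */
mem_access post_indexed(uint32_t base, int32_t offset) {
    mem_access m = { base, base + (uint32_t)offset };
    return m;
}
```

With a base of 1000 and an offset of 4, the post-indexed mode accesses address 1000 and leaves 1004 in the base, which is why it suits stepping through an array in a loop.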
Review: Pointer Arithmetic
int16_t a16[5] ;
Note: Each member of the array is an object consisting of 2 bytes.
A pointer holds an address.
An address is always 32-bits. Thus all pointers are 32 bits wide.
int16_t *p16 ;
p16 = &a16[0] ;
p16 = p16 + 1 ;
The data type (int16_t) indicates the size and signedness of the objects that the pointer points to.
Adding 1 to a pointer causes it to point to the next object. Since each object is 2 bytes, the address must increase by 2.
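The scaling of pointer arithmetic can be checked directly in C by measuring how many bytes a pointer advances when 1 is added to it. The helper names below are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes the address changes when 1 is added to an int16_t pointer. */
size_t step_bytes_int16(void) {
    int16_t a16[5] = { 0 };
    int16_t *p16 = &a16[0];
    int16_t *next = p16 + 1;
    return (size_t)((const char *)next - (const char *)p16);
}

/* Same measurement for an int32_t pointer. */
size_t step_bytes_int32(void) {
    int32_t a32[5] = { 0 };
    int32_t *p32 = &a32[0];
    int32_t *next = p32 + 1;
    return (size_t)((const char *)next - (const char *)p32);
}
```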
IMMEDIATE OFFSET MODE
Syntax: [Rn{,constant}]
Address: Rn + constant
Examples:
1. [R5,100]
2. [R5]
[Figure: the constant held in the instruction is added to the register to form the address]
IMMEDIATE OFFSET: POINTERS & ARRAYS
Function in C
void f1(int32_t *p32)
{
*p32 = 0 ;
*(p32 + 1) = 0 ;
}
Function in assembly
f1: LDR R1,=0       // R1 <-- 0
    STR R1,[R0]     // R1 --> memory[R0]
    STR R1,[R0,4]   // R1 --> memory[R0+4]
    BX  LR          // return
Pointer arithmetic! Adding 1 to p32 adds 4 to address.
Function in C
void f2(int32_t a32[]) {
a32[0] = 0 ;
a32[1] = 0 ;
}
Function in assembly
f2: LDR R1,=0
    STR R1,[R0]
    STR R1,[R0,4]
    BX  LR
Array and pointer parameters are treated the same
REGISTER OFFSET MODE
Syntax: [Rn,Rm,LSL constant]
Address: Rn + (Rm << constant)
Example: [R4,R5,LSL 2]
#bits to shift left: Only 0, 1, 2, or 3 (Multiply by 1, 2, 4, or 8)
[Figure: Rm passes through a left shifter controlled by the constant in the instruction, and the result is added to Rn to form the address]
Example: LDRH R0,[R1,R2,LSL 1]
Subscripting: a16[k] = 0
LDR  R0,=0               // R0 ← 0 (value to store)
LDR  R1,=a16             // R1 ← starting address of array (&a16[0] = 1240)
LDR  R2,=k               // R2 ← address of the subscript (k)
LDR  R2,[R2]             // R2 ← subscript (assume k=3 here)
STRH R0,[R1,R2,LSL 1]    // R0 → a16[k]
[Figure: R2 (subscript) passes through the left shifter (#bits to shift left = 1, giving 2 x R2) and is added to R1 (starting address) to select a16[3] from a16[0] through a16[6]]
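The register offset calculation, including the restriction on the shift count, can be modeled numerically in C. The function name and its return convention are assumptions for this sketch; it works on plain 32-bit numbers rather than real pointers.

```c
#include <stdint.h>

/* Computes Rn + (Rm << shift). Returns 1 and writes the address when the
   shift count is one the mode allows (0, 1, 2, or 3); returns 0 otherwise. */
int register_offset_address(uint32_t rn, uint32_t rm, unsigned shift,
                            uint32_t *address) {
    if (shift > 3u) {
        return 0;               /* not encodable in this addressing mode */
    }
    *address = rn + (rm << shift);
    return 1;
}
```

With the values from the subscripting example (starting address 1240, subscript 3, shift 1) the computed address is 1240 + 2 x 3 = 1246.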
REGISTER OFFSET: POINTERS & ARRAYS
Function in C
void f1(int8_t *p8, int16_t *p16, int32_t k32)
{
*(p8 + k32) = 0 ;
*(p16 + k32) = 0 ;
}
Function in assembly
f1: LDR  R3,=0
    STRB R3,[R0,R2]
    STRH R3,[R1,R2,LSL 1]
    BX   LR
Pointer arithmetic! R2,LSL 1 = 2*k32.
Function in C
void f2(int8_t a8[], int16_t a16[], int32_t k32) {
a8[k32] = 0 ;
a16[k32] = 0 ;
}
Function in assembly
f2: LDR  R3,=0
    STRB R3,[R0,R2]
    STRH R3,[R1,R2,LSL 1]
    BX   LR
PHYSICAL MEMORY DESIGN
[Figure: memory drawn as 32-bit physical words, each made of four 8-bit bytes, at word addresses 1008 through 1024 (bits 0 & 1 of a word address = 00). Address bits 2-31 select one 32-bit physical word; address bits 0 & 1 select 1 of 4 bytes.]
Memory Addressing
• Logically, memory is organized into bytes
• Every byte has its own 32-bit address.
• Every memory read retrieves a 32-bit physical word.
• Accessing a byte:
– Most-significant 30 bits of address select the physical word
– Least-significant 2 bits of address select one of 4 bytes within the word
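Splitting a byte address into its word-select and byte-select parts can be sketched in C with two masks. The function names are hypothetical.

```c
#include <stdint.h>

/* Most-significant 30 bits: the address of the physical word
   (bits 0 and 1 forced to 00). */
uint32_t physical_word_address(uint32_t address) {
    return address & ~(uint32_t)3u;
}

/* Least-significant 2 bits: which of the 4 bytes within the word. */
unsigned byte_within_word(uint32_t address) {
    return (unsigned)(address & 3u);
}
```

For example, byte address 1013 lies in the physical word at 1012 and is byte 1 within that word.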
POINTERS AND STRUCTURES
struct {
uint32_t x32 ; // 4 bytes
uint16_t y16 ; // 2 bytes
uint64_t z64 ; // 8 bytes
} s ;
Memory layout, 4 bytes (32 bits) per row:
100C – 100F   s.z64 (bits 63..32)
1008 – 100B   s.z64 (bits 31..0)
1004 – 1007   s.y16 (2 bytes, followed by 2 unused bytes)
1000 – 1003   s.x32
To optimize speed, C places each member of a structure in memory so that it can be retrieved using the minimum number of memory accesses:
• 16-bit data is placed on an even (mod 2) address.
• 32 and 64-bit data is placed on a mod 4 address.
So even though this structure only contains 14 bytes of data, it occupies 16 bytes of memory.
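The padding can be observed with sizeof and offsetof. This is a sketch: the struct tag sample and the helper functions are invented here, and the exact numbers depend on the compiler's alignment rules, although common ABIs give the same layout as described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Same members as the structure above, with a hypothetical tag. */
struct sample {
    uint32_t x32;  /* 4 bytes */
    uint16_t y16;  /* 2 bytes */
    uint64_t z64;  /* 8 bytes */
};

/* Total size, including any padding the compiler inserts. */
size_t sample_size(void) {
    return sizeof(struct sample);
}

/* Offset of y16 from the start of the structure. */
size_t sample_offset_y16(void) {
    return offsetof(struct sample, y16);
}

/* Offset of z64; the gap after y16 is padding. */
size_t sample_offset_z64(void) {
    return offsetof(struct sample, z64);
}
```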
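POINTERS AND STRUCTURES
Optimized for speed (default):
Members of 2, 4, 2, and 4 bytes occupy 16 bytes, i.e. four rows of 4 bytes (32 bits) at 1000 – 1003, 1004 – 1007, 1008 – 100B, and 100C – 100F.
Optimized to conserve memory (declared between #pragma pack(1) and #pragma pack()):
The same members occupy 12 bytes, i.e. three rows of 4 bytes (32 bits) at 1000 – 1003, 1004 – 1007, and 1008 – 100B. A 32-bit member may then be split across two physical words: s1.b32 bits 15..0 share a word with the first 2-byte member, and s1.b32 bits 31..16 are in the next word.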
POINTERS AND STRUCTURES
Accessing s2.d32
#pragma pack(1)
struct {
uint16_t a16 ;
uint32_t b32 ;
uint16_t c16 ;
uint32_t d32 ;
} s2 ;
#pragma pack()
f:  // R0 = &s2
    ...
    LDR R1,[R0,8]    // R1 = s2->d32
    …
Accessing s1.d32
struct {
uint16_t a16 ;
uint32_t b32 ;
uint16_t c16 ;
uint32_t d32 ;
} s1 ;
f:  // R0 = &s1
    …
    LDR R1,[R0,12]   // R1 = s1->d32
    …
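The two offsets used in the LDR instructions (8 for the packed structure, 12 for the default layout) can be reproduced with offsetof. This sketch assumes a compiler that supports #pragma pack, such as GCC, Clang, or MSVC; the struct tags and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(1)
struct packed_layout {   /* hypothetical tag for the s2 layout */
    uint16_t a16;
    uint32_t b32;
    uint16_t c16;
    uint32_t d32;
};
#pragma pack()

struct default_layout {  /* hypothetical tag for the s1 layout */
    uint16_t a16;
    uint32_t b32;
    uint16_t c16;
    uint32_t d32;
};

/* Offset of d32 when members are packed with no padding. */
size_t packed_offset_d32(void) {
    return offsetof(struct packed_layout, d32);
}

/* Offset of d32 when the compiler aligns each member. */
size_t default_offset_d32(void) {
    return offsetof(struct default_layout, d32);
}
```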
Address Alignment
Word-Alignment = Address is multiple of 4
• Word-Aligned 32-bit operand:
– Fully contained within a single physical word
– Only 1 memory cycle to read (load) or write (store)
• Unaligned 32-bit operand:
– Split across 2 physical words
– Requires 2 memory cycles to read or write
Halfword-Alignment = Address is multiple of 2
The .align assembler directive
• Format: .align constant
– constant is the number of least-significant bits
of the next address that must be zeroes.
• Examples:
.align 1   // Next address is a multiple of 2 (halfword-aligned)
.align 2   // Next address is a multiple of 4 (word-aligned)
.align     // Default (no constant)→word-aligned
• Inserts a 16-bit NOP (“No Operation”) instruction if needed to force word-alignment.
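The effect of the directive on the next address can be sketched in C as rounding up to the next address whose low bits are all zero. The names align_up and padding_bytes are hypothetical; zero_bits plays the role of the directive's constant.

```c
#include <stdint.h>

/* Rounds address up so that its zero_bits least-significant bits are 0. */
uint32_t align_up(uint32_t address, unsigned zero_bits) {
    uint32_t mask = ((uint32_t)1u << zero_bits) - 1u;
    return (address + mask) & ~mask;
}

/* Number of filler bytes needed to reach the aligned address. */
uint32_t padding_bytes(uint32_t address, unsigned zero_bits) {
    return align_up(address, zero_bits) - address;
}
```

For a halfword-aligned address such as 1002, word alignment needs 2 filler bytes, which is exactly the size of one 16-bit NOP.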
Instruction Fetch
• Processor requires instructions to be halfword-aligned
– Every instruction starts on a mod 2 address (even address)
• Every instruction fetch retrieves 32 bits from memory:
– One 32-bit instruction, or
– Two 16-bit instructions, or
– One 16-bit instruction and first half of a 32-bit instruction, or
– Second half of a 32-bit instruction and a 16-bit instruction
Optimizing Instruction Fetch: Use a .align directive at the entry point of every function to force the function to start on a word-aligned (mod 4) address.
– Each instruction is either 2 or 4 bytes in length.
– N 32-bit or 2N 16-bit instructions = 4N bytes
– Word-alignment allows 4N instruction bytes to be fetched in N memory read cycles rather than N+1.
        .global Func1
        .align
Func1:  PUSH {R4,LR}
        ⋯
        POP  {R4,PC}
        .global Func2
        .align
Func2:  PUSH {R4,LR}
        ⋯
        POP  {R4,PC}
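The N versus N+1 claim can be checked with a short C model that counts how many 32-bit physical words a run of instruction bytes touches. The function name fetch_cycles is an assumption, and the model counts one memory read cycle per word touched.

```c
#include <stdint.h>

/* Counts the 32-bit words spanned by code_bytes bytes starting at
   start_address, i.e. the memory read cycles needed to fetch them. */
uint32_t fetch_cycles(uint32_t start_address, uint32_t code_bytes) {
    uint32_t first_word;
    uint32_t last_word;
    if (code_bytes == 0u) {
        return 0u;
    }
    first_word = start_address >> 2;
    last_word = (start_address + code_bytes - 1u) >> 2;
    return last_word - first_word + 1u;
}
```

With 16 instruction bytes (N = 4), a word-aligned start needs 4 reads, while a start that is only halfword-aligned needs 5.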
Optimizing Instruction Fetch: Use a .align directive at the top of every loop to force the loop to start on a word-aligned address.
• Word-alignment allows a loop of 4N instruction bytes that is repeated R times to be fetched in R*N rather than R*(N+1) memory read cycles.
• OK to place .align between two instructions
– No effect if already word-aligned
– Otherwise inserts a NOP
(NOP = “No Operation” instruction)
        .global Func3
Func3:  PUSH {R4,LR}
        B    L1
        .align
L2:     ⋯
        POP  {R4,PC}
int32_t f1(int8_t s8)
{
return s8 + 1 ;
}
1. 8 and 16-bit ints are promoted to native CPU word size before use in expressions, thus…
2. Functions receive 8 and 16-bit parameters as 32-bit ints on our processor (Cortex-M4F).
// R0 = s8 (sign-extended to 32-bits)
    .global f1
    .align
    .thumb_func
f1: ADD R0,R0,1   // R0 = s8 + 1
    BX  LR
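The promotion rule can be observed in plain C. In this sketch add_one has the same body as f1 (renamed here to keep the example self-contained), and add_one_truncated is a made-up contrast case that forces the result back into 8 bits.

```c
#include <stdint.h>

/* Same computation as f1: s8 is promoted to int before the addition,
   so the result does not wrap at 8 bits. */
int32_t add_one(int8_t s8) {
    return s8 + 1;
}

/* Contrast: converting the promoted result back to 8 bits
   discards everything above bit 7. */
uint8_t add_one_truncated(uint8_t u8) {
    return (uint8_t)(u8 + 1);
}
```

Passing 127 to add_one yields 128, which would not fit in an int8_t; the addition happens at the 32-bit width, matching the single ADD on R0.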
int32_t f2(int8_t *ps8)
{
return *ps8 + 1 ;
}
// R0 = ps8 (a 32-bit ptr to int8_t)
    .global f2
    .align
    .thumb_func
f2: LDRSB R0,[R0]   // R0 = *ps8
    ADD   R0,R0,1   // R0 = *ps8 + 1
    BX    LR
int32_t f3(int16_t *ps16)
{
return *(ps16 + 1) ;
}
// R0 = ps16 (a 32-bit ptr to int16_t)
    .global f3
    .align
    .thumb_func
f3: ADD   R0,R0,2   // R0 = ps16 + 1
    LDRSH R0,[R0]   // R0 = *(ps16 + 1)
    BX    LR
Shorter version using immediate offset mode:
    .global f3
    .align
    .thumb_func
f3: LDRSH R0,[R0,2]
    BX    LR
int32_t f4(int32_t a32[])
{
return a32[1] ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
    .global f4
    .align
    .thumb_func
f4: ADD R0,R0,4   // R0 = a32 + 1
    LDR R0,[R0]   // R0 = a32[1]
    BX  LR
Shorter version using immediate offset mode:
    .global f4
    .align
    .thumb_func
f4: LDR R0,[R0,4]
    BX  LR
int32_t f5(int32_t a32[], int32_t k32)
{
return a32[k32] ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
// R1 = k32 (a 32-bit int)
    .global f5
    .align
    .thumb_func
f5: LSL R1,R1,2    // R1 = 4*k32 (scaled)
    ADD R0,R0,R1   // R0 = a32 + 4*k32
    LDR R0,[R0]    // R0 = a32[k32]
    BX  LR
Shorter version using register offset mode:
f5: LDR R0,[R0,R1,LSL 2]
    BX  LR
int32_t f6(int32_t a32[], int32_t k32) {
return (a32+k32)[0] ;
}
Equivalent form:
int32_t f6(int32_t a32[], int32_t k32)
{
return *(a32+k32) ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
// R1 = k32 (a 32-bit int)
    .global f6
    .thumb_func
f6: LSL R1,R1,2    // R1 = 4*k32 (scaled)
    ADD R0,R0,R1   // R0 = a32 + 4*k32
    LDR R0,[R0]    // R0 = *(a32 + k32) = (a32+k32)[0]
    BX  LR
Shorter version using register offset mode:
f6: LDR R0,[R0,R1,LSL 2]
    BX  LR
int16_t *f7(int16_t *ps16) {
return ps16 + 1 ;
}
// R0 = ps16 (a 32-bit ptr to int16_t)
    .global f7
    .align
    .thumb_func
f7: ADD R0,R0,2   // R0 = ps16 + 1
    BX  LR
int32_t f8(int16_t **pps16)
{
return **pps16 ;
}
1. pps16 is a pointer to a pointer to an int16_t.
2. *pps16 is a pointer to an int16_t.
3. **pps16 is an int16_t
// R0 = pps16 (a 32-bit ptr to int16_t *)
    .global f8
    .align
    .thumb_func
f8: LDR   R0,[R0]   // R0 = *pps16
    LDRSH R0,[R0]   // R0 = **pps16
    BX    LR
int32_t f9(int16_t **pps16)
return **(pps16 + 1) ;
f9: LDR LDRSH