
LOAD/STORE ARCHITECTURE
Arithmetic & Logic Unit (ALU)
All computations (add, subtract, etc.) are performed in the ALU.
Instruction operands and results are held in registers.


But variables reside in main memory.
“Load”: Copy the contents of a variable from memory into a register.
“Store”: Copy the contents of a register back into a variable in memory.

INSTRUCTIONS FOR COPYING DATA
Constant to Register:
LDR* (MOV, MVN, MOVW, MOVT)
* This LDR is a “pseudo” instruction
Register to Register:
MOV
Memory to Register (“Load Register” Instructions):
LDR, LDRD, LDRB, LDRH, LDRSB, LDRSH
This LDR is a real instruction
Register to Memory (“Store Register” Instructions):
STR, STRD, STRB, STRH

REGISTER ← CONSTANT
MOV R0,100     // R0 ← 100           16-bit instruction: 0-255
MVN R0,100     // R0 ← ~100 (-101)   16-bit instruction: 0-255
MOV R0,-100    // R0 ← -100
               // Assembler replaces this by MVN R0,99
MOVW R0,1000   // R0 ← 1000          32-bit instruction: 0-65535
MOVT R0,1000   // R0 ← 1000 << 16    32-bit instruction: 0-65535
               // (MOVT writes bits 16-31 only; bits 0-15 keep their value)
An arbitrary 32-bit constant:
MOVW R0,100000 & 0xFFFF   // LS 16 bits
MOVT R0,100000 >> 16      // MS 16 bits
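The MOVW/MOVT pair can be modelled in C to check that the two halves really rebuild the constant. This is only a sketch: the helper names low_half, high_half and movw_movt are made up for illustration and are not part of the slides.

```c
#include <stdint.h>

/* Least-significant 16 bits: the operand given to MOVW. */
static uint16_t low_half(uint32_t c)
{
    return (uint16_t)(c & 0xFFFFu);
}

/* Most-significant 16 bits: the operand given to MOVT. */
static uint16_t high_half(uint32_t c)
{
    return (uint16_t)(c >> 16);
}

/* Models MOVW followed by MOVT on one register. */
static uint32_t movw_movt(uint16_t lo, uint16_t hi)
{
    uint32_t r = lo;                               /* MOVW: bits 16-31 become 0 */
    r = (r & 0x0000FFFFu) | ((uint32_t)hi << 16);  /* MOVT: bits 0-15 are kept  */
    return r;
}
```

For the constant 100000 (0x000186A0) the low half is 0x86A0 and the high half is 0x0001.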

The LDR Pseudo-Instruction
A “pseudo-instruction” is not a real ARM instruction. When used, the assembler replaces it with an equivalent operation using a real instruction.
Format: LDR Rd,=constant
The equals sign distinguishes this pseudo-instruction from a real LDR instruction.
The pseudo-instruction is replaced by a single MOV, MVN, or MOVW if the constant fits one of those instructions.
Else it is replaced by a real LDR that loads the constant from memory:
LDR R0,=100000
is replaced by
LDR R0,.temp
⋯
.temp: .word 100000
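The replacement rule can be sketched as a small decision function. This is a simplification that uses only the ranges quoted on these slides (0-255 for MOV and MVN, 0-65535 for MOVW); a real assembler accepts additional immediate forms, and the names load_choice and choose_load are hypothetical.

```c
#include <stdint.h>

typedef enum {
    USE_MOV,              /* constant fits in 0-255                 */
    USE_MVN,              /* bitwise complement fits in 0-255       */
    USE_MOVW,             /* constant fits in 0-65535               */
    USE_LDR_FROM_MEMORY   /* fall back to a real LDR of a .word     */
} load_choice;

/* Simplified model of how "LDR Rd,=constant" might be replaced. */
static load_choice choose_load(uint32_t c)
{
    if (c <= 255u)    return USE_MOV;
    if (~c <= 255u)   return USE_MVN;   /* e.g. -100 becomes MVN Rd,99 */
    if (c <= 65535u)  return USE_MOVW;
    return USE_LDR_FROM_MEMORY;
}
```

Under these assumptions, 100 selects MOV, -100 selects MVN, 1000 selects MOVW, and 100000 falls back to loading the constant from memory.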

WRITING INTEGER CONSTANTS
• Decimal: 123
• Binary: 0b10110111
• Octal: 0123
• Hexadecimal: 0xFACE
• ASCII Character: ‘a (8 bits)
C-style character constants also work (‘a’), but not all escape sequences (‘\0 or ‘\0’) do. It’s safer to use hex (0x00) instead.

REGISTER ← MEMORY (32-BITS)
Load Register from Word
LDR R0,[R1]   // Copies a 32-bit word
              // from the memory location
              // whose address is in R1
              // into register R0.
Used with data of type int32_t, uint32_t, and all pointers

REGISTER PAIR ← MEMORY (64-BITS)
Load Register Pair from DoubleWord
LDRD R0,R1,[R2]   // Copies the lower half (bits 0-31)
                  // of the value held in the 64-bit
                  // memory location whose address
                  // is in R2 into register R0, and
                  // the upper half (bits 32-63)
                  // into R1.
The 64-bit operand must be word aligned (located at a mod 4 address) or an address fault will occur.
Used with data of type int64_t and uint64_t
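The split that LDRD performs between the two registers can be written out in C. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Bits 0-31: the half that LDRD R0,R1,[R2] places in R0. */
static uint32_t lower_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* Bits 32-63: the half that LDRD R0,R1,[R2] places in R1. */
static uint32_t upper_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* The reverse direction, as STRD R0,R1,[R2] would write it back. */
static uint64_t join_halves(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For the value 0x1122334455667788, R0 would receive 0x55667788 and R1 would receive 0x11223344.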

Copying variables < 32 bits wide (32-bit register copy must have same value)
Unsigned
Zero-Extend: Add leading 0’s
4-bit example: 1101₂ (13₁₀) extends to 00⋯001101₂ = 13₁₀
Signed (2’s complement)
Sign-Extend: Replicate sign bit
4-bit example: 1101₂ (-3₁₀) extends to 11⋯111101₂ = -3₁₀
Instructions that zero-extend: UXTB, UXTH, LDRB, LDRH
Instructions that sign-extend: SXTB, SXTH, LDRSB, LDRSH

REGISTER ← MEMORY (8-BITS UNSIGNED)
Load Register from (Unsigned) Byte
LDRB R0,[R1]   // Copies the unsigned
               // value held in the
               // 8-bit memory location
               // whose address is in R1
               // into bits 0-7 of register
               // R0 and 0’s into bits 8-31.
Used with data of type uint8_t

REGISTER ← MEMORY (16-BITS UNSIGNED)
Load Register from (Unsigned) HalfWord
LDRH R0,[R1]   // Copies the unsigned
               // value held in the
               // 16-bit memory location
               // whose address is in R1
               // into bits 0-15 of register
               // R0 and 0’s into bits 16-31.
Used with data of type uint16_t

REGISTER ← MEMORY (8-BITS SIGNED)
Load Register from Signed Byte
LDRSB R0,[R1]  // Copies the signed
               // value held in the
               // 8-bit memory location
               // whose address is in R1
               // into bits 0-7 of
               // register R0 and 24
               // copies of bit 7 (the
               // sign bit) into bits 8-31.
Used with data of type int8_t

REGISTER ← MEMORY (16-BITS SIGNED)
Load Register from Signed HalfWord
LDRSH R0,[R1]  // Copies the signed
               // value held in the
               // 16-bit memory location
               // whose address is in R1
               // into bits 0-15 of R0 and
               // 16 copies of bit 15 (the
               // sign bit) into bits 16-31.
Used with data of type int16_t

REGISTER ← REGISTER
Move Instruction
MOV R0,R1      // Copies all 32 bits
               // of the value held
               // in register R1 into
               // the register R0

REGISTER → MEMORY (32-BITS)
Store Register to Word
STR R0,[R1]    // Copies all 32 bits
               // of the value held
               // in register R0 into
               // the 32-bit memory
               // location whose address
               // is in register R1
Used with data of type int32_t, uint32_t, and all pointers

REGISTER PAIR → MEMORY (64-BITS)
Store Register Pair to DoubleWord
STRD R0,R1,[R2]   // Copies the contents
                  // of register R0 into
                  // the lower half (bits 0-31), and
                  // register R1 into the
                  // upper half (bits 32-63), of the
                  // 64-bit memory location
                  // whose address is in
                  // register R2.
The 64-bit operand must be word aligned (located at a mod 4 address) or an address fault will occur.
Used with data of type int64_t and uint64_t

REGISTER → MEMORY (8-BITS)
Store Register to Byte
STRB R0,[R1]   // Copies bits 0-7 of
               // the value held in
               // register R0 into
               // the 8-bit memory
               // location whose address
               // is in register R1
Register bits 8-31 are not copied.
Used with data of type int8_t and uint8_t

REGISTER → MEMORY (16-BITS)
Store Register to HalfWord
STRH R0,[R1]   // Copies bits 0-15
               // of the value held
               // in register R0
               // into the 16-bit
               // memory location
               // whose address is
               // in register R1.
Register bits 16-31 are not copied.
Used with data of type int16_t and uint16_t

Variable Y ← Variable X (32 bits)
Common Coding Mistake
// Assume R2=&x, R3=&y
LDR R0,[R2]
LDR R1,[R3]
MOV R1,R0
Instruction     Register R0   Register R1   x (in memory)   y (in memory)
(initially)     ?             ?             1000            ?
LDR R0,[R2]     1000          ?             1000            ?
LDR R1,[R3]     1000          ?             1000            ?
MOV R1,R0       1000          1000          1000            ?
The second LDR instruction doesn’t move Y into R1; it merely makes a copy of its value. Thus the MOV doesn’t change Y; it only changes the copy in R1.
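The same pitfall can be sketched in C by treating local variables as registers. The function names copy_wrong and copy_right are hypothetical; the comments show the instruction each statement stands for, and the MOV operands are inferred from the explanation above.

```c
#include <stdint.h>

/* Mistake: only the register copy of y is changed. */
static void copy_wrong(const int32_t *px, int32_t *py)
{
    int32_t r0 = *px;   /* LDR R0,[R2]                          */
    int32_t r1 = *py;   /* LDR R1,[R3]  (r1 is a copy of y)     */
    r1 = r0;            /* MOV R1,R0    (changes the copy only) */
    (void)r1;           /* y in memory was never written        */
}

/* Fix: the register must be stored back to memory. */
static void copy_right(const int32_t *px, int32_t *py)
{
    int32_t r0 = *px;   /* LDR R0,[R2] */
    *py = r0;           /* STR R0,[R3] */
}
```

After copy_wrong the variable y still holds its old value; only a store changes memory.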
Variable Y ← Variable X (32 bits)
Correct Coding Solution
// Assume R2=&x, R3=&y
LDR R0,[R2]
STR R0,[R3]
Instruction     Register R0   x (in memory)   y (in memory)
(initially)     ?             1000            ?
LDR R0,[R2]     1000          1000            ?
STR R0,[R3]     1000          1000            1000

DATA COPYING INSTRUCTIONS
Data type                      Load / Store
int8_t                         LDRSB / STRB
uint8_t                        LDRB / STRB
int16_t                        LDRSH / STRH
uint16_t                       LDRH / STRH
int32_t, uint32_t, pointer     LDR / STR
int64_t, uint64_t              LDRD / STRD (uses two 32-bit registers)
Pointers are always 32 bits wide. Copy with LDR and STR.

EXAMPLES OF COPYING DATA
Source: Constant
  8-bit destination:    LDR R0,=5      STRB R0,[⋯]
  16-bit destination:   LDR R0,=5      STRH R0,[⋯]
  32-bit destination:   LDR R0,=5      STR R0,[⋯]
  64-bit destination:   LDR R0,=5      LDR¹ R1,=0      STRD R0,R1,[⋯]
Source: 8-bit Variable
  8-bit destination:    LDRB R0,[⋯]    STRB R0,[⋯]
  16-bit destination:   LDRB² R0,[⋯]   STRH R0,[⋯]
  32-bit destination:   LDRB² R0,[⋯]   STR R0,[⋯]
  64-bit destination:   LDRB² R0,[⋯]   LDR³ R1,=0      STRD R0,R1,[⋯]
Source: 16-bit Variable
  8-bit destination:    LDRB R0,[⋯]    STRB R0,[⋯]
  16-bit destination:   LDRH R0,[⋯]    STRH R0,[⋯]
  32-bit destination:   LDRH⁴ R0,[⋯]   STR R0,[⋯]
  64-bit destination:   LDRH⁴ R0,[⋯]   LDR³ R1,=0      STRD R0,R1,[⋯]
Source: 32-bit Variable
  8-bit destination:    LDRB R0,[⋯]    STRB R0,[⋯]
  16-bit destination:   LDRH R0,[⋯]    STRH R0,[⋯]
  32-bit destination:   LDR R0,[⋯]     STR R0,[⋯]
  64-bit destination:   LDR R0,[⋯]     LDR³ R1,=0      STRD R0,R1,[⋯]
Source: 64-bit Variable
  8-bit destination:    LDRB R0,[⋯]    STRB R0,[⋯]
  16-bit destination:   LDRH R0,[⋯]    STRH R0,[⋯]
  32-bit destination:   LDR R0,[⋯]     STR R0,[⋯]
  64-bit destination:   LDRD R0,R1,[⋯]   STRD R0,R1,[⋯]
¹ Replace with LDR R1,=-1 if source operand is a negative constant.
² Replace with LDRSB if source operand is signed.
³ Replace with ASR R1,R0,31 if source operand is signed.
⁴ Replace with LDRSH if source operand is signed.

Contents versus Address
LDR R0,x    ; Copies the contents of variable ‘x’ from memory into a
            ; register, but x must be within ±4095 bytes of the instruction
LDR R0,=x   ; Copies the address of a variable ‘x’ into a register
ADR R0,x    ; Also copies the address of a variable ‘x’ into a register,
            ; BUT x must be within ±4095 bytes of the instruction.
Function call in C:
void f1(int32_t *) ;
int32_t s32 ;
⋯
f1(&s32) ;
⋯
Code produced by the compiler:
⋯
LDR R0,=s32   // load R0 with &s32
BL f1         // call function f1
⋯

PC-Relative Addressing
Address of x is a constant, determined before execution.
PC-Relative addressing only works when the data is near the instruction!
LDR R0,x   // To the assembler, the symbol
           // “x” represents the address
           // of the variable. The address
           // is a constant, determined
           // before execution begins.
The instruction does not hold the address of x itself. It holds a displacement constant (PC-relative): the distance from this instruction to x. The number of bits available for the displacement depends on the instruction.

Pointer Dereferencing
int32_t *p ; // ‘p’ is a pointer to a 32-bit integer
The address of p is a constant, determined during assembly, but the content of p is a variable, possibly modified during execution.
Address of *p must be determined at run-time:
LDR R0,=p     // R0 ← &p
LDR R0,[R0]   // R0 ← p (adrs of *p)
LDR R1,[R0]   // R1 ← *p
[Figure: 16-bit encoding of LDR R1,[R0] (immediate offset mode), with fields for the LDR opcode, the offset, R0 and R1.]

Determining an Operand Address
int32_t a[4] ; // ‘a’ is an array of four 32-bit integers
Address of a[2] is a constant, determined during assembly:
The symbol ‘a’ is a constant (the adrs of a[0]), and thus the expression ‘a+8’ is the address of a[2].
LDR R0,a+8   // R0 ← a[2]
The instruction holds a displacement constant (PC-relative): the distance from this instruction to a[2]. PC-Relative addressing only works when the data is near the instruction!
Address of a[k] must be computed at run-time because ‘k’ is a variable:
LDR R0,=a              // R0 ← &a[0] (a constant)
LDR R1,=k              // R1 ← &k (a constant)
LDR R1,[R1]            // R1 ← k (a variable)
LDR R2,[R0,R1,LSL 2]   // adrs = R0 + 4 × R1
[Figure: 32-bit encoding of LDR (Register Offset Mode), with fields for R0, R2, LSL 2 and R1.]

ADDRESSING MODES (Calculating a Memory Address)
Immediate Offset Mode: [R0]  [R0,4]
R0 is the ‘base’; the optional constant 4 is the ‘offset’ from the base.
Register Offset Mode: [R0,R1]  [R0,R1,LSL 2]
R0 is the ‘base’; R1 or R1,LSL 2 is the ‘offset’ from the base.
Pre-Indexed Mode: [R0,4]!
1. R0 ← R0 + 4
2. R0 provides address
Post-Indexed Mode: [R0],4
1. R0 provides address
2. R0 ← R0 + 4
Use the pre-indexed and post-indexed modes in loops to reduce the number of instructions.

Review: Pointer Arithmetic
int16_t a16[5] ;
Note: Each member of the array (a16[0], a16[1], a16[2], a16[3], ⋯) is an object consisting of 2 bytes.
A pointer holds an address. An address is always 32-bits.
Thus all pointers are 32 bits wide.
int16_t *p16 ;
The data type (int16_t) indicates the size and signedness of the objects that the pointer points to.
p16 = &a16[0] ;
p16 = p16 + 1 ;
Adding 1 to a pointer causes it to point to the next object. Since each object is 2 bytes, the address must increase by 2.

IMMEDIATE OFFSET MODE
[Rn{,constant}]
Address = Rn + constant (register + immediate offset held in the instruction)
Examples:
1. [R5,100]
2. [R5]

IMMEDIATE OFFSET: POINTERS & ARRAYS
Function in C
void f1(int32_t *p32)
{
*p32 = 0 ;
*(p32 + 1) = 0 ;
}
Function in assembly
f1: LDR R1,=0       // R1 <-- 0
    STR R1,[R0]     // R1 --> memory[R0]
    STR R1,[R0,4]   // R1 --> memory[R0+4]
    BX LR           // return
Pointer arithmetic! Adding 1 to p32 adds 4 to address.
Function in C
void f2(int32_t a32[])
{
a32[0] = 0 ;
a32[1] = 0 ;
}
Function in assembly
f2: LDR R1,=0       // R1 <-- 0
    STR R1,[R0]     // R1 --> memory[R0]
    STR R1,[R0,4]   // R1 --> memory[R0+4]
    BX LR           // return
Array and pointer parameters are treated the same
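Both claims on this slide can be checked from C itself. In the sketch below, f1 and f2 are the functions from the slide; step_in_bytes is a hypothetical helper that measures how far the address moves when 1 is added to an int32_t pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointer version from the slide. */
void f1(int32_t *p32)
{
    *p32 = 0;
    *(p32 + 1) = 0;
}

/* Array version from the slide; the parameter is still a pointer. */
void f2(int32_t a32[])
{
    a32[0] = 0;
    a32[1] = 0;
}

/* Byte distance between p32 and p32 + 1 (the "4" in STR R1,[R0,4]). */
static ptrdiff_t step_in_bytes(int32_t *p32)
{
    return (char *)(p32 + 1) - (char *)p32;
}
```

Calling either function on a two-element array clears both elements, and step_in_bytes reports 4.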

REGISTER OFFSET MODE
[Rn,Rm,LSL constant]
Address = Rn + (Rm << constant) (register + shifted register offset)
Example: [R4,R5,LSL 2]
#bits to shift left: Only 0, 1, 2, or 3 (Multiply by 1, 2, 4, or 8)
Example: LDRH R0,[R1,R2,LSL 1]

Subscripting: a16[k] = 0
LDR R0,=0               // R0 ← 0 (value to store)
LDR R1,=a16             // R1 ← starting address of array (&a16[0] = 1240)
LDR R2,=k               // R2 ← address of the subscript (k)
LDR R2,[R2]             // R2 ← subscript (assume k=3 here)
STRH R0,[R1,R2,LSL 1]   // R0 → a16[k]
#bits to shift left = 1, so the address is R1 + (2 x R2) = 1240 + 6 = 1246, the address of a16[3].

REGISTER OFFSET: POINTERS & ARRAYS
Function in C
void f1(int8_t *p8, int16_t *p16, int32_t k32)
{
*(p8 + k32) = 0 ;
*(p16 + k32) = 0 ;
}
Function in assembly
f1: LDR R3,=0
    STRB R3,[R0,R2]         // *(p8 + k32) = 0
    STRH R3,[R1,R2,LSL 1]   // *(p16 + k32) = 0
    BX LR
Pointer arithmetic! R2,LSL 1 = 2*k32.
Function in C
void f2(int8_t a8[], int16_t a16[], int32_t k32)
{
a8[k32] = 0 ;
a16[k32] = 0 ;
}
Function in assembly
f2: LDR R3,=0
    STRB R3,[R0,R2]         // a8[k32] = 0
    STRH R3,[R1,R2,LSL 1]   // a16[k32] = 0
    BX LR

PHYSICAL MEMORY DESIGN
[Figure: memory drawn as rows of four 8-bit bytes, one 32-bit physical word per row, at word addresses 1008, 1012, ⋯, 1020, 1024 (bits 0 & 1 of a word address = 00). The upper address bits select one 32-bit physical word; address bits 0 & 1 select 1 of 4 bytes.]

Memory Addressing
• Logically, memory is organized into bytes
• Every byte has its own 32-bit address.
• Every memory read retrieves a 32-bit physical word.
• Accessing a byte:
– Most-significant 30 bits of address select the physical word
– Least-significant 2 bits of address select one of 4 bytes within the word

POINTERS AND STRUCTURES
struct {
uint32_t x32 ; // 4 bytes
uint16_t y16 ; // 2 bytes
uint64_t z64 ; // 8 bytes
} s ;
Memory layout, 4 bytes (32 bits) per row:
100C – 100F   s.z64 (bits 63..32)
1008 – 100B   s.z64 (bits 31..0)
1004 – 1007   s.y16 followed by 2 unused bytes
1000 – 1003   s.x32
To optimize speed, C places each member of a structure in memory so that it can be retrieved using the minimum number of memory accesses:
• 16-bit data is placed on an even (mod 2) address.
• 32 and 64-bit data is placed on a mod 4 address.
So even though this structure only contains 14 bytes of data, it occupies 16 bytes of memory.

POINTERS AND STRUCTURES
Optimized for speed (default)
struct {
uint16_t a16 ; // 2 bytes
uint32_t b32 ; // 4 bytes
uint16_t c16 ; // 2 bytes
uint32_t d32 ; // 4 bytes
} s1 ;
Memory layout, 4 bytes (32 bits) per row:
100C – 100F   d32
1008 – 100B   c16 followed by 2 unused bytes
1004 – 1007   b32
1000 – 1003   a16 followed by 2 unused bytes
Optimized to conserve memory
#pragma pack(1)
struct {
uint16_t a16 ; // 2 bytes
uint32_t b32 ; // 4 bytes
uint16_t c16 ; // 2 bytes
uint32_t d32 ; // 4 bytes
} s2 ;
#pragma pack()
Memory layout, 4 bytes (32 bits) per row:
1008 – 100B   d32
1004 – 1007   b32 (bits 31..16), then c16
1000 – 1003   a16, then b32 (bits 15..0)

POINTERS AND STRUCTURES
Accessing s2.d32 (packed)
Function Call in C
f(&s2) ;
f: // R0 = &s2
⋯
// R1 = s2->d32
LDR R1,[R0,8]
⋯
Accessing s1.d32 (default)
Function Call in C
f(&s1) ;
f: // R0 = &s1
⋯
// R1 = s1->d32
LDR R1,[R0,12]
⋯
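The offsets 12 and 8 used in the two LDR instructions can be checked with offsetof. This is a sketch that assumes a compiler supporting #pragma pack (GCC, Clang and MSVC do); the tag names s1_t and s2_t are added here only so the types can be named.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: padding keeps b32 and d32 on mod 4 addresses. */
struct s1_t {
    uint16_t a16;
    uint32_t b32;
    uint16_t c16;
    uint32_t d32;
};

/* Packed layout: no padding between members. */
#pragma pack(1)
struct s2_t {
    uint16_t a16;
    uint32_t b32;
    uint16_t c16;
    uint32_t d32;
};
#pragma pack()

/* Offsets used as the immediate in LDR R1,[R0,offset]. */
static size_t s1_d32_offset(void) { return offsetof(struct s1_t, d32); }
static size_t s2_d32_offset(void) { return offsetof(struct s2_t, d32); }
```

With the default layout d32 sits 12 bytes from the start of a 16-byte structure; packed, it sits 8 bytes from the start of a 12-byte structure.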

Address Alignment
Word-Alignment = Address is multiple of 4
• Word-Aligned 32-bit operand:
– Fully contained within a single physical word
– Only 1 memory cycle to read (load) or write (store)
• Unaligned 32-bit operand:
– Split across 2 physical words
– Requires 2 memory cycles to read or write
Halfword-Alignment = Address is multiple of 2
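The alignment rules reduce to a few bit operations on the address. The following sketch uses hypothetical helper names and assumes the 4-byte physical word described in these slides.

```c
#include <stdint.h>

/* Which physical word holds the byte at addr (the upper 30 address bits). */
static uint32_t word_index(uint32_t addr)   { return addr >> 2; }

/* Which of the 4 bytes inside that word (address bits 0 and 1). */
static uint32_t byte_in_word(uint32_t addr) { return addr & 3u; }

static int is_word_aligned(uint32_t addr)     { return (addr & 3u) == 0; }
static int is_halfword_aligned(uint32_t addr) { return (addr & 1u) == 0; }

/* True when an operand of size bytes (size >= 1) starting at addr is split
   across 2 physical words and so needs 2 memory cycles instead of 1. */
static int crosses_word_boundary(uint32_t addr, uint32_t size)
{
    return word_index(addr) != word_index(addr + size - 1u);
}
```

A 32-bit operand at 0x1000 stays inside one word, while the same operand at 0x1002 is split across two.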

The .align assembler directive
• Format: .align constant
– constant is the number of least-significant bits
of the next address that must be zeroes.
• Examples:
.align 1   // Next address is a multiple of 2 (halfword-aligned)
.align 2   // Next address is a multiple of 4 (word-aligned)
.align     // Default (no constant) → word-aligned
• Inserts a 16-bit NOP (“No Operation”) instruction if needed to force word-alignment.
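The effect of .align on the next address can be modelled with a mask. A minimal sketch with hypothetical names, where bits is the .align constant and is assumed to be smaller than 32:

```c
#include <stdint.h>

/* Next address at or above addr whose `bits` least-significant bits are 0. */
static uint32_t align_up(uint32_t addr, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;   /* e.g. bits = 2 gives mask = 3 */
    return (addr + mask) & ~mask;
}

/* Number of filler bytes needed to reach that address. */
static uint32_t padding_bytes(uint32_t addr, unsigned bits)
{
    return align_up(addr, bits) - addr;
}
```

Since instructions are halfword-aligned, word-aligning code needs either 0 or 2 bytes of padding, which is the single 16-bit NOP mentioned above.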

Instruction Fetch
• Processor requires instructions to be halfword-aligned
– Every instruction starts on a mod 2 address (even address)
• Every instruction fetch retrieves 32 bits from memory:
– One 32-bit instruction, or
– Two 16-bit instructions, or
– One 16-bit instruction and first half of a 32-bit instruction, or
– Second half of a 32-bit instruction and a 16-bit instruction

Optimizing Instruction Fetch
• Place a .align directive at the entry point of every function to force the function to start on a word-aligned (mod 4) address.
– Each instruction is either 2 or 4 bytes in length.
– N 32-bit or 2N 16-bit instructions = 4N bytes
– Word-alignment allows 4N instruction bytes to be fetched in N memory read cycles rather than N+1.
       .global Func1
       .align
Func1: PUSH {R4,LR}
       ⋯
       POP {R4,PC}
       .global Func2
       .align
Func2: PUSH {R4,LR}
       ⋯
       POP {R4,PC}
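The N versus N+1 count follows from how many 32-bit physical words the instruction bytes touch. A sketch under the slide's model of one read cycle per physical word; the function name is hypothetical:

```c
#include <stdint.h>

/* Read cycles needed to fetch nbytes (nbytes >= 1) of instructions that
   start at address start, when each cycle retrieves one aligned 32-bit word. */
static uint32_t fetch_cycles(uint32_t start, uint32_t nbytes)
{
    uint32_t first_word = start >> 2;
    uint32_t last_word  = (start + nbytes - 1u) >> 2;
    return last_word - first_word + 1u;
}
```

For N = 3 (12 instruction bytes), a function starting at the word-aligned address 0x1000 needs 3 cycles, while the same code starting at 0x1002 needs 4.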

Optimizing Instruction Fetch
• Place a .align directive at the top of every loop to force the loop to start on a word-aligned address.
• Word-alignment allows a loop of 4N instruction bytes that is repeated R times to be fetched in R*N rather than R*(N+1) memory read cycles.
• OK to place .align between two instructions
– No effect if already word-aligned
– Otherwise inserts a NOP
(NOP = “No Operation” instruction)
       .global Func3
Func3: PUSH {R4,LR}
       ⋯
       B L1
       .align
L2:    ⋯
       ⋯
       POP {R4,PC}

int32_t f1(int8_t s8)
{
return s8 + 1 ;
}
1. 8 and 16-bit ints are promoted to native CPU word size before use in expressions, thus…
2. Functions receive 8 and 16-bit parameters as 32-bit ints on our processor (Cortex-M4F).
// R0 = s8 (sign-extended to 32-bits)
    .global f1
    .align
    .thumb_func
f1: ADD R0,R0,1   // R0 = s8 + 1
    BX LR
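Because of the promotion described in point 1, the addition in f1 is carried out on a full-width int rather than on 8 bits. The function below is the one from the slide, compiled as ordinary C:

```c
#include <stdint.h>

/* s8 is promoted to int before the addition, so the sum is not
   limited to the 8-bit range -128..127. */
int32_t f1(int8_t s8)
{
    return s8 + 1;
}
```

Passing the largest int8_t value, 127, therefore returns 128 rather than wrapping around.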

int32_t f2(int8_t *ps8)
{
return *ps8 + 1 ;
}
// R0 = ps8 (a 32-bit ptr to int8_t)
    .global f2
    .align
    .thumb_func
f2: LDRSB R0,[R0]   // R0 = *ps8
    ADD R0,R0,1     // R0 = *ps8 + 1
    BX LR
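Since *ps8 is a signed byte, the load has to sign-extend it to 32 bits before the ADD. The difference between the two kinds of byte load can be sketched in C; load_unsigned_byte and load_signed_byte are hypothetical names that model LDRB and LDRSB.

```c
#include <stdint.h>

/* Models LDRB: bits 8-31 of the result are 0. */
static uint32_t load_unsigned_byte(const void *mem)
{
    return *(const uint8_t *)mem;
}

/* Models LDRSB: bits 8-31 of the result are copies of bit 7. */
static int32_t load_signed_byte(const void *mem)
{
    return *(const int8_t *)mem;
}

/* The function from the slide. */
int32_t f2(int8_t *ps8)
{
    return *ps8 + 1;
}
```

The byte 0xFD reads as 253 through the unsigned load and as -3 through the signed load, mirroring the 4-bit example 1101₂ earlier in the slides.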

int32_t f3(int16_t *ps16)
{
return *(ps16 + 1) ;
}
// R0 = ps16 (a 32-bit ptr to int16_t)
    .global f3
    .align
    .thumb_func
f3: ADD R0,R0,2     // R0 = ps16 + 1
    LDRSH R0,[R0]   // R0 = *(ps16 + 1)
    BX LR
Or, in a single load using immediate offset mode:
    .global f3
    .align
    .thumb_func
f3: LDRSH R0,[R0,2]   // R0 = *(ps16 + 1)
    BX LR

int32_t f4(int32_t a32[])
{
return a32[1] ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
    .global f4
    .align
    .thumb_func
f4: ADD R0,R0,4   // R0 = a32 + 1
    LDR R0,[R0]   // R0 = a32[1]
    BX LR
Or, in a single load using immediate offset mode:
    .global f4
    .align
    .thumb_func
f4: LDR R0,[R0,4]   // R0 = a32[1]
    BX LR

int32_t f5(int32_t a32[], int32_t k32)
{
return a32[k32] ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
// R1 = k32 (a 32-bit int)
    .global f5
    .align
    .thumb_func
f5: LSL R1,R1,2    // R1 = 4*k32 (scaled)
    ADD R0,R0,R1   // R0 = a32 + 4*k32
    LDR R0,[R0]    // R0 = a32[k32]
    BX LR
Or, in a single load using register offset mode:
f5: LDR R0,[R0,R1,LSL 2]   // R0 = a32[k32]
    BX LR

int32_t f6(int32_t a32[], int32_t k32)
{
return (a32+k32)[0] ;
}
The same function written with a pointer dereference:
int32_t f6(int32_t a32[], int32_t k32)
{
return *(a32+k32) ;
}
// R0 = a32 (a 32-bit ptr to int32_t)
// R1 = k32 (a 32-bit int)
    .global f6
    .align
    .thumb_func
f6: LSL R1,R1,2    // R1 = 4*k32 (scaled)
    ADD R0,R0,R1   // R0 = a32 + 4*k32
    LDR R0,[R0]    // R0 = *(a32 + k32) = (a32+k32)[0]
    BX LR
Or, in a single load using register offset mode:
f6: LDR R0,[R0,R1,LSL 2]
    BX LR
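f5 and f6 compile to the same code because a32[k32], *(a32+k32) and (a32+k32)[0] all name the same object. The sketch below adds a hypothetical f5_by_address that spells out the R0 + (R1 << 2) address calculation by hand; it assumes k32 is not negative.

```c
#include <stdint.h>

int32_t f5(int32_t a32[], int32_t k32) { return a32[k32]; }
int32_t f6(int32_t a32[], int32_t k32) { return *(a32 + k32); }

/* Same load with the register offset address computed explicitly. */
static int32_t f5_by_address(int32_t a32[], int32_t k32)
{
    uintptr_t base = (uintptr_t)a32;                 /* R0               */
    uintptr_t adrs = base + ((uintptr_t)k32 << 2);   /* R0 + (R1 LSL 2)  */
    return *(int32_t *)adrs;                         /* LDR R0,[adrs]    */
}
```

All three functions return the same element for the same subscript.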

int16_t *f7(int16_t *ps16)
{
return ps16 + 1 ;
}
// R0 = ps16 (a 32-bit ptr to int16_t)
    .global f7
    .align
    .thumb_func
f7: ADD R0,R0,2   // R0 = ps16 + 1
    BX LR

int32_t f8(int16_t **pps16)
{
return **pps16 ;
}
1. pps16 is a pointer to a pointer to an int16_t.
2. *pps16 is a pointer to an int16_t.
3. **pps16 is an int16_t
// R0 = pps16 (a 32-bit ptr to int16_t *)
    .global f8
    .align
    .thumb_func
f8: LDR R0,[R0]     // R0 = *pps16
    LDRSH R0,[R0]   // R0 = **pps16
    BX LR

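int32_t f9(int16_t **pps16)
{
return **(pps16 + 1) ;
}
// R0 = pps16 (a 32-bit ptr to int16_t *)
f9: LDR R0,[R0,4]   // R0 = *(pps16 + 1); each pointer is 4 bytes
    LDRSH R0,[R0]   // R0 = **(pps16 + 1)
    BX LR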
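The two levels of indirection in f8 and f9 can be exercised from C with a small table of pointers. The functions are the ones from the slides; the table used to call them is an assumption made for illustration. On the Cortex-M4F each pointer is 4 bytes, so pps16 + 1 is 4 bytes further on; on a 64-bit host the step is 8 bytes, but the C code is unchanged.

```c
#include <stdint.h>

/* First load fetches a pointer; second load fetches the int16_t. */
int32_t f8(int16_t **pps16)
{
    return **pps16;
}

/* Same, but starting from the next pointer in memory. */
int32_t f9(int16_t **pps16)
{
    return **(pps16 + 1);
}
```

Given a table holding the addresses of two int16_t variables, f8 returns the first variable and f9 returns the second.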
