Embedded Systems with ARM Cortex-M Microcontrollers in Assembly Language and C (Dr. Yifeng Zhu)
Chapter 4 ARM Arithmetic and Logic Instructions
ECE 3375b Electrical and Computer Engineering Western University
Winter 2019
Overview:
Arithmetic and Logic Instructions
Shift
LSL (logical shift left), LSR (logical shift right), ASR (arithmetic shift right), ROR (rotate right), RRX (rotate right with extend)
Logic
AND (bitwise and), ORR (bitwise or), EOR (bitwise exclusive or), ORN (bitwise or not), MVN (move not)
Bit set/clear
BFC (bit field clear), BFI (bit field insert), BIC (bit clear), CLZ (count leading zeroes)
Bit/byte reordering
RBIT (reverse bit order in a word), REV (reverse byte order in a word), REV16 (reverse byte order in each half-word independently), REVSH (reverse byte order in the bottom half-word and sign extend)
Addition
ADD, ADC (add with carry)
Subtraction
SUB, RSB (reverse subtract), SBC (subtract with carry)
Multiplication
MUL (multiply), MLA (multiply-accumulate), MLS (multiply-subtract), SMULL (signed long multiply), SMLAL (signed long multiply-accumulate), UMULL (unsigned long multiply), UMLAL (unsigned long multiply-accumulate)
Division
SDIV (signed), UDIV (unsigned)
Saturation
SSAT (signed), USAT (unsigned)
Sign extension
SXTB (signed byte), SXTH (signed half-word), UXTB (unsigned byte), UXTH (unsigned half-word)
Bit field extract
SBFX (signed), UBFX (unsigned)
Example: Add
Unified Assembler Language (UAL) Syntax
ADD r1, r2, r3 ; r1 = r2 + r3
ADD r1, r2, #4 ; r1 = r2 + 4
Traditional Thumb Syntax
ADD r1, r3 ; r1 = r1 + r3
ADD r1, #15 ; r1 = r1 + 15
Commonly Used Arithmetic Operations
ADD {Rd,} Rn, Op2              Add. Rd ← Rn + Op2
ADC {Rd,} Rn, Op2              Add with carry. Rd ← Rn + Op2 + Carry
SUB {Rd,} Rn, Op2              Subtract. Rd ← Rn – Op2
SBC {Rd,} Rn, Op2              Subtract with carry. Rd ← Rn – Op2 + Carry – 1
RSB {Rd,} Rn, Op2              Reverse subtract. Rd ← Op2 – Rn
MUL {Rd,} Rn, Rm               Multiply. Rd ← (Rn × Rm)[31:0]
MLA Rd, Rn, Rm, Ra             Multiply with accumulate. Rd ← (Ra + (Rn × Rm))[31:0]
MLS Rd, Rn, Rm, Ra             Multiply and subtract. Rd ← (Ra – (Rn × Rm))[31:0]
SDIV {Rd,} Rn, Rm              Signed divide. Rd ← Rn / Rm
UDIV {Rd,} Rn, Rm              Unsigned divide. Rd ← Rn / Rm
SSAT Rd, #n, Rm {,shift #s}    Signed saturate
USAT Rd, #n, Rm {,shift #s}    Unsigned saturate
S: Set Condition Flags
Example:
start
    LDR r0, =0xFFFFFFFF
    LDR r1, =0x00000001
    ADDS r0, r0, r1
stop B stop
• For most instructions, appending the suffix S makes the instruction update the N, Z, C, and V flags of the APSR register.
• In this example the result is 0x00000000 with a carry out of bit 31, so the Z and C bits are set.
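The flag updates in the ADDS example can be modeled in C. This is a minimal sketch for illustration, not a description of the hardware; the type flags_t and the function adds32 are hypothetical names that do not come from the slides.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the four APSR condition flags. */
typedef struct {
    bool n;  /* Negative: bit 31 of the result */
    bool z;  /* Zero: result equals 0 */
    bool c;  /* Carry: unsigned carry out of bit 31 */
    bool v;  /* Overflow: signed overflow */
} flags_t;

/* Model of "ADDS Rd, Rn, Op2": returns the 32-bit sum and fills in the flags. */
static uint32_t adds32(uint32_t rn, uint32_t op2, flags_t *f)
{
    uint32_t rd = rn + op2;            /* unsigned addition wraps modulo 2^32 */
    f->n = (rd >> 31) != 0;
    f->z = (rd == 0);
    f->c = (rd < rn);                  /* the sum wrapped, so there was a carry out */
    /* Signed overflow: both operands differ in sign from the result. */
    f->v = (((rn ^ rd) & (op2 ^ rd)) >> 31) != 0;
    return rd;
}
```

Calling adds32(0xFFFFFFFF, 1, &f) gives a result of 0 with Z and C set and N and V clear, which matches the slide.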
Program Status Register
Application PSR (APSR), Interrupt PSR (IPSR), Execution PSR (EPSR)
Register   Bits     Field
APSR       31       N
APSR       30       Z
APSR       29       C
APSR       28       V
APSR       27       Q
APSR       19:16    GE
IPSR       8:0      ISR number
EPSR       26:25    ICI/IT
EPSR       24       T
EPSR       15:10    ICI/IT
All other bits of each register are reserved.
Note:
• GE flags are only available on Cortex-M4 and M7
Combine the APSR, IPSR, and EPSR into one register (PSR):
Bits     Field
31       N
30       Z
29       C
28       V
27       Q
26:25    ICI/IT
24       T
23:20    Reserved
19:16    GE
15:10    ICI/IT
9        Reserved
8:0      ISR number
Note:
• GE flags are only available on Cortex-M4 and M7
• Use PSR in code
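To make the bit positions concrete, the following C sketch extracts a few fields from a 32-bit PSR value using shifts and masks. The function names are hypothetical; on a real Cortex-M target the value would typically be read with an MRS instruction, whereas here it is simply passed in as an argument.

```c
#include <stdint.h>

/* Each helper isolates one PSR field from the bit positions listed above. */
static uint32_t psr_n(uint32_t psr)          { return (psr >> 31) & 1u; }
static uint32_t psr_z(uint32_t psr)          { return (psr >> 30) & 1u; }
static uint32_t psr_c(uint32_t psr)          { return (psr >> 29) & 1u; }
static uint32_t psr_v(uint32_t psr)          { return (psr >> 28) & 1u; }
static uint32_t psr_q(uint32_t psr)          { return (psr >> 27) & 1u; }
static uint32_t psr_t(uint32_t psr)          { return (psr >> 24) & 1u; }
static uint32_t psr_ge(uint32_t psr)         { return (psr >> 16) & 0xFu; }  /* bits 19:16 */
static uint32_t psr_isr_number(uint32_t psr) { return psr & 0x1FFu; }        /* bits 8:0 */
```

For example, the value 0x6100000F decodes to Z = 1, C = 1, T = 1, and ISR number 15, with N, V, Q, and GE all zero.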
Example: 64-bit Addition
      Most-significant (upper) 32 bits    Least-significant (lower) 32 bits
      00000002                            FFFFFFFF
  +   00000004                            00000001
  =   00000007                            00000000
The carry out of the lower 32-bit addition is added into the upper 32 bits.
• A register can only store 32 bits
• A 64-bit integer needs two registers
• Split 64-bit addition into two 32-bit additions
Example: 64-bit Addition
start
    ; C = A + B
    ; Two 64-bit integers A (r1,r0) and B (r3,r2).
    ; Result C (r5,r4)
    ; A = 00000002FFFFFFFF
    ; B = 0000000400000001
    LDR r0, =0xFFFFFFFF ; A's lower 32 bits
    LDR r1, =0x00000002 ; A's upper 32 bits
    LDR r2, =0x00000001 ; B's lower 32 bits
    LDR r3, =0x00000004 ; B's upper 32 bits
    ; Add A and B
    ADDS r4, r2, r0 ; C[31..0] = A[31..0] + B[31..0], update Carry
    ADC r5, r3, r1 ; C[63..32] = A[63..32] + B[63..32] + Carry
stop B stop
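The same two-step addition can be written in C with explicit 32-bit halves. This is a sketch of the technique only; the function name add64 and its parameter names are assumptions made for illustration.

```c
#include <stdint.h>

/* Adds two 64-bit values that are each held as a (hi, lo) pair of 32-bit words. */
static void add64(uint32_t a_lo, uint32_t a_hi,
                  uint32_t b_lo, uint32_t b_hi,
                  uint32_t *c_lo, uint32_t *c_hi)
{
    uint32_t lo = a_lo + b_lo;                 /* ADDS: add the lower words */
    uint32_t carry = (lo < a_lo) ? 1u : 0u;    /* the sum wrapped, so Carry = 1 */
    *c_lo = lo;
    *c_hi = a_hi + b_hi + carry;               /* ADC: add the upper words plus Carry */
}
```

With A = 0x00000002FFFFFFFF and B = 0x0000000400000001 the lower words produce 0 with a carry, and the upper words give 2 + 4 + 1 = 7.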
Example: 64-bit Subtraction
start
    ; C = A - B
    ; Two 64-bit integers A (r1,r0) and B (r3,r2).
    ; Result C (r5,r4)
    ; A = 00000002FFFFFFFF
    ; B = 0000000400000001
    LDR r0, =0xFFFFFFFF ; A's lower 32 bits
    LDR r1, =0x00000002 ; A's upper 32 bits
    LDR r2, =0x00000001 ; B's lower 32 bits
    LDR r3, =0x00000004 ; B's upper 32 bits
    ; Subtract B from A
    SUBS r4, r0, r2 ; C[31..0] = A[31..0] – B[31..0], update Carry (Carry = 1 means no borrow)
    SBC r5, r1, r3 ; C[63..32] = A[63..32] – B[63..32] + Carry – 1
stop B stop
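The borrow handling is easy to get backwards, so here is a C sketch of the same subtraction. It assumes the ARM convention that SUBS sets Carry to 1 when no borrow occurs; the name sub64 is hypothetical.

```c
#include <stdint.h>

/* Subtracts B from A, where each 64-bit value is a (hi, lo) pair of 32-bit words. */
static void sub64(uint32_t a_lo, uint32_t a_hi,
                  uint32_t b_lo, uint32_t b_hi,
                  uint32_t *c_lo, uint32_t *c_hi)
{
    uint32_t carry = (a_lo >= b_lo) ? 1u : 0u; /* SUBS: Carry = 1 means no borrow */
    *c_lo = a_lo - b_lo;                       /* wraps modulo 2^32 on borrow */
    *c_hi = a_hi - b_hi + carry - 1u;          /* SBC: Rn - Op2 + Carry - 1 */
}
```

For the values on the slide the lower words give 0xFFFFFFFE with no borrow, and the upper words give 2 - 4 = 0xFFFFFFFE, so C = 0xFFFFFFFEFFFFFFFE.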
Example: Short Multiplication and Division
; MUL: Multiply. The low 32 bits of the product are the same for signed and
; unsigned operands, so MUL serves both; there is no separate UMUL instruction.
MUL r6, r4, r2 ; r6 = LSB32( r4 × r2 )
; MLA: Multiply with accumulation
MLA r6, r4, r1, r0 ; r6 = LSB32( r0 + r4 × r1 )
; MLS: Multiply with subtract
MLS r6, r4, r1, r0 ; r6 = LSB32( r0 – r4 × r1 )
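In C, unsigned 32-bit arithmetic already keeps only the low 32 bits, so the short multiply instructions can be sketched as below. The function names are hypothetical, and the operand order follows the table entries MLA Rd, Rn, Rm, Ra and MLS Rd, Rn, Rm, Ra.

```c
#include <stdint.h>

/* MUL: low 32 bits of the product. */
static uint32_t mul32(uint32_t rn, uint32_t rm)
{
    return rn * rm;
}

/* MLA: low 32 bits of Ra + Rn * Rm. */
static uint32_t mla32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra + rn * rm;
}

/* MLS: low 32 bits of Ra - Rn * Rm. */
static uint32_t mls32(uint32_t rn, uint32_t rm, uint32_t ra)
{
    return ra - rn * rm;
}
```

Because only the low 32 bits are kept, the result has the same bit pattern whether the operands are interpreted as signed or unsigned: mul32((uint32_t)-3, 5) yields the bit pattern of -15.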
Example: Long Multiplication
UMULL RdLo, RdHi, Rn, Rm    Unsigned long multiply. RdHi,RdLo ← unsigned(Rn × Rm)
SMULL RdLo, RdHi, Rn, Rm    Signed long multiply. RdHi,RdLo ← signed(Rn × Rm)
UMLAL RdLo, RdHi, Rn, Rm    Unsigned multiply with accumulate. RdHi,RdLo ← unsigned(RdHi,RdLo + Rn × Rm)
SMLAL RdLo, RdHi, Rn, Rm    Signed multiply with accumulate. RdHi,RdLo ← signed(RdHi,RdLo + Rn × Rm)
UMULL r3, r4, r0, r1 ; r4:r3 = r0 × r1, r4 = MSB bits, r3 = LSB bits
SMULL r3, r4, r0, r1 ; r4:r3 = r0 × r1
UMLAL r3, r4, r0, r1 ; r4:r3 = r4:r3 + r0 × r1
SMLAL r3, r4, r0, r1 ; r4:r3 = r4:r3 + r0 × r1
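The long multiply instructions correspond to 64-bit arithmetic in C. The sketch below uses hypothetical function names and returns the 64-bit result as one value; hi32 and lo32 show how it would be split into RdHi and RdLo.

```c
#include <stdint.h>

/* UMULL: unsigned 32 x 32 -> 64-bit product. */
static uint64_t umull(uint32_t rn, uint32_t rm)
{
    return (uint64_t)rn * rm;
}

/* SMULL: signed 32 x 32 -> 64-bit product. */
static int64_t smull(int32_t rn, int32_t rm)
{
    return (int64_t)rn * rm;
}

/* UMLAL: 64-bit accumulator plus unsigned product, modulo 2^64. */
static uint64_t umlal(uint64_t acc, uint32_t rn, uint32_t rm)
{
    return acc + (uint64_t)rn * rm;
}

/* SMLAL: 64-bit accumulator plus signed product.
   The sum is formed in unsigned arithmetic so that wraparound is well defined. */
static int64_t smlal(int64_t acc, int32_t rn, int32_t rm)
{
    return (int64_t)((uint64_t)acc + (uint64_t)((int64_t)rn * rm));
}

/* Split a 64-bit result into the RdHi and RdLo halves. */
static uint32_t hi32(uint64_t x) { return (uint32_t)(x >> 32); }
static uint32_t lo32(uint64_t x) { return (uint32_t)x; }
```

The signed and unsigned forms differ only in how the upper word is produced: 0xFFFFFFFF × 0xFFFFFFFF is 0xFFFFFFFE00000001 when unsigned, but (-1) × (-1) = 1 when signed.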
Bitwise Logic
AND {Rd,} Rn, Op2           Bitwise logic AND. Rd ← Rn & Op2
ORR {Rd,} Rn, Op2           Bitwise logic OR. Rd ← Rn | Op2
EOR {Rd,} Rn, Op2           Bitwise logic exclusive OR. Rd ← Rn ^ Op2
ORN {Rd,} Rn, Op2           Bitwise logic NOT OR. Rd ← Rn | (NOT Op2)
BIC {Rd,} Rn, Op2           Bit clear. Rd ← Rn & (NOT Op2)
BFC Rd, #lsb, #width        Bit field clear. Rd[(width+lsb–1):lsb] ← 0
BFI Rd, Rn, #lsb, #width    Bit field insert. Rd[(width+lsb–1):lsb] ← Rn[(width–1):0]
MVN Rd, Op2                 Move NOT, logically negate all bits. Rd ← 0xFFFFFFFF EOR Op2
Example: AND r2, r0, r1 (bitwise logic AND, 32 bits)
r0  1101 0101 0101 0101 0101 0101 0101 0101
r1  1010 1010 1010 1011 1010 1010 1010 1011
r2  1000 0000 0000 0001 0000 0000 0000 0001
Example: ORR r2, r0, r1 (bitwise logic OR, 32 bits)
r0  1101 0101 0101 0101 0101 0101 0101 0101
r1  1010 1010 1010 1011 1010 1010 1010 1011
r2  1111 1111 1111 1111 1111 1111 1111 1111
Example: BIC r2, r0, r1 (Bit Clear)
r2 = r0 & NOT r1
Step 1: invert r1
r1      0000 0000 0000 0000 0000 0000 0000 1111
NOT r1  1111 1111 1111 1111 1111 1111 1111 0000
Step 2: AND r0 with NOT r1
r0      1111 1111 1111 1111 1111 1111 1111 1111
NOT r1  1111 1111 1111 1111 1111 1111 1111 0000
r2      1111 1111 1111 1111 1111 1111 1111 0000
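The two steps of BIC, and the related ORN and MVN operations, map directly onto C bitwise operators. The function names below are hypothetical and chosen to mirror the mnemonics.

```c
#include <stdint.h>

/* BIC: clear in rn every bit that is set in op2. */
static uint32_t bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* ORN: OR rn with the complement of op2. */
static uint32_t orn(uint32_t rn, uint32_t op2) { return rn | ~op2; }

/* MVN: complement every bit of op2. */
static uint32_t mvn(uint32_t op2) { return ~op2; }
```

With r0 = 0xFFFFFFFF and r1 = 0x0000000F as in the example, bic(r0, r1) gives 0xFFFFFFF0.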
Example: BFC and BFI
Bit Field Clear (BFC) and Bit Field Insert (BFI).
Syntax
BFC Rd, #lsb, #width
BFI Rd, Rn, #lsb, #width
Examples:
BFC R4, #8, #12 ; Clear bit 8 to bit 19 (12 bits) of R4 to 0
BFI R9, R2, #8, #12 ; Replace bit 8 to bit 19 (12 bits) of R9 with bit 0 to bit 11 from R2
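Both bit field instructions reduce to building a mask of width ones starting at bit lsb. The C sketch below illustrates this; field_mask, bfc, and bfi are hypothetical helper names, and the code assumes 1 ≤ width and lsb + width ≤ 32.

```c
#include <stdint.h>

/* Mask with `width` consecutive ones whose lowest bit is at position `lsb`.
   Assumes 1 <= width and lsb + width <= 32. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFC Rd, #lsb, #width: clear the selected field of rd. */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* BFI Rd, Rn, #lsb, #width: copy the low `width` bits of rn into the field of rd. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t mask = field_mask(lsb, width);
    return (rd & ~mask) | ((rn << lsb) & mask);
}
```

For lsb = 8 and width = 12 the mask is 0x000FFF00, so bfc(0xFFFFFFFF, 8, 12) gives 0xFFF000FF and bfi(0xFFFFFFFF, 0xABC, 8, 12) gives 0xFFFABCFF.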
Bit Operators (&, |, ~) vs Boolean Operators (&&, ||, !)
A && B    Boolean and      A & B    Bitwise and
A || B    Boolean or       A | B    Bitwise or
!B        Boolean not      ~B       Bitwise not
The Boolean operators treat each operand as a single truth value (zero is false, nonzero is true) and yield 0 or 1; they do not operate bit by bit.
For example,
"0x10 & 0x01" = 0x00, but "0x10 && 0x01" = 0x01.
"~0x01" = 0xFFFFFFFE, but "!0x01" = 0x00.
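The difference is easy to confirm with a few one-line C functions. The function names are hypothetical, and the values assume 32-bit unsigned operands as in the examples above.

```c
#include <stdint.h>

static uint32_t bit_and(uint32_t a, uint32_t b)  { return a & b; }   /* bit by bit */
static uint32_t bool_and(uint32_t a, uint32_t b) { return a && b; }  /* yields 0 or 1 */
static uint32_t bit_or(uint32_t a, uint32_t b)   { return a | b; }
static uint32_t bool_or(uint32_t a, uint32_t b)  { return a || b; }
static uint32_t bit_not(uint32_t a)              { return ~a; }      /* flips all 32 bits */
static uint32_t bool_not(uint32_t a)             { return !a; }      /* 1 only if a == 0 */
```

A common mistake is writing if (flags & MASK) as if (flags && MASK), which is true whenever both values are nonzero, regardless of whether they share any set bit.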
Check a Bit in C
Example: k = 5
a 1 << k a & (1<
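SSAT saturates a signed value to the signed range −2^(n−1) ≤ x ≤ 2^(n−1) − 1.
\[
\mathrm{SSAT}(x) =
\begin{cases}
2^{n-1} - 1 & \text{if } x > 2^{n-1} - 1 \\
-2^{n-1} & \text{if } x < -2^{n-1} \\
x & \text{otherwise}
\end{cases}
\]
USAT saturates a signed value to the unsigned range 0 ≤ x ≤ 2^n − 1.
\[
\mathrm{USAT}(x) =
\begin{cases}
2^{n} - 1 & \text{if } x > 2^{n} - 1 \\
0 & \text{if } x < 0 \\
x & \text{otherwise}
\end{cases}
\]
Examples:
SSAT r2, #11, r1 ; output range: −2^10 ≤ r2 ≤ 2^10 − 1
USAT r2, #11, r3 ; output range: 0 ≤ r2 ≤ 2^11 − 1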
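Saturation is a clamp to the range of an n-bit number. The following C sketch shows the clamping only; it does not model the Q flag or the optional shift operand, and the function names ssat and usat are simply borrowed from the mnemonics.

```c
#include <stdint.h>

/* SSAT with saturation width n (assumed 1 <= n <= 32):
   clamp x to the range [-2^(n-1), 2^(n-1) - 1]. */
static int32_t ssat(int32_t x, unsigned n)
{
    int64_t max = ((int64_t)1 << (n - 1u)) - 1;
    int64_t min = -((int64_t)1 << (n - 1u));
    if (x > max) return (int32_t)max;
    if (x < min) return (int32_t)min;
    return x;
}

/* USAT with saturation width n (assumed 0 <= n <= 31):
   clamp the signed value x to the range [0, 2^n - 1]. */
static uint32_t usat(int32_t x, unsigned n)
{
    int64_t max = ((int64_t)1 << n) - 1;
    if (x < 0) return 0u;
    if (x > max) return (uint32_t)max;
    return (uint32_t)x;
}
```

With n = 11, ssat clamps to the range -1024 to 1023 and usat clamps to the range 0 to 2047.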
Example of Saturation
Assume data are limited to 16 bits.
[Figure: the same signal without saturation and with saturation]
Reverse Order
RBIT Rd, Rn     Reverse bit order in a word. for (i = 0; i < 32; i++) Rd[i] ← Rn[31–i]
REV Rd, Rn      Reverse byte order in a word. Rd[31:24] ← Rn[7:0], Rd[23:16] ← Rn[15:8], Rd[15:8] ← Rn[23:16], Rd[7:0] ← Rn[31:24]
REV16 Rd, Rn    Reverse byte order in each half-word. Rd[15:8] ← Rn[7:0], Rd[7:0] ← Rn[15:8], Rd[31:24] ← Rn[23:16], Rd[23:16] ← Rn[31:24]
REVSH Rd, Rn    Reverse byte order in bottom half-word and sign extend. Rd[15:8] ← Rn[7:0], Rd[7:0] ← Rn[15:8], Rd[31:16] ← Rn[7] replicated 16 times
Example: RBIT Rd, Rn
LDR r0, =0x12345678 ; r0 = 0x12345678
RBIT r1, r0 ; Reverse bits, r1 = 0x1E6A2C48
Example: REV Rd, Rn
LDR R0, =0x12345678 ; R0 = 0x12345678
REV R1, R0 ; R1 = 0x78563412
Example: REV16 Rd, Rn
LDR R0, =0x12345678 ; R0 = 0x12345678
REV16 R2, R0 ; R2 = 0x34127856
Example: REVSH Rd, Rn
LDR R0, =0x33448899 ; R0 = 0x33448899
REVSH R1, R0 ; R1 = 0xFFFF9988
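The four reverse instructions can be expressed with shifts and masks in portable C. This is one possible sketch; the function names mirror the mnemonics and are not a library API. The final conversion in revsh relies on the usual two's-complement behavior when converting to int16_t.

```c
#include <stdint.h>

/* RBIT: reverse the order of all 32 bits. */
static uint32_t rbit(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);   /* move the lowest bit of x into r */
        x >>= 1;
    }
    return r;
}

/* REV: reverse the order of the four bytes. */
static uint32_t rev(uint32_t x)
{
    return (x << 24) | ((x & 0x0000FF00u) << 8) |
           ((x >> 8) & 0x0000FF00u) | (x >> 24);
}

/* REV16: swap the two bytes inside each half-word. */
static uint32_t rev16(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* REVSH: swap the bytes of the bottom half-word, then sign extend to 32 bits. */
static int32_t revsh(uint32_t x)
{
    uint16_t h = (uint16_t)(((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu));
    return (int32_t)(int16_t)h;
}
```

REV is the typical way to convert a 32-bit value between little-endian and big-endian byte order.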
Sign and Zero Extension
int8_t a = -1;   // a signed 8-bit integer, a = 0xFF
int16_t b = -2;  // a signed 16-bit integer, b = 0xFFFE
int32_t c;       // a signed 32-bit integer
c = a;           // sign extension required, c = 0xFFFFFFFF
c = b;           // sign extension required, c = 0xFFFFFFFE
Sign and Zero Extension
SXTB {Rd,} Rm {,ROR #n}    Sign extend a byte. Rd[31:0] ← Sign Extend((Rm ROR (8 × n))[7:0])
SXTH {Rd,} Rm {,ROR #n}    Sign extend a half-word. Rd[31:0] ← Sign Extend((Rm ROR (8 × n))[15:0])
UXTB {Rd,} Rm {,ROR #n}    Zero extend a byte. Rd[31:0] ← Zero Extend((Rm ROR (8 × n))[7:0])
UXTH {Rd,} Rm {,ROR #n}    Zero extend a half-word. Rd[31:0] ← Zero Extend((Rm ROR (8 × n))[15:0])
LDR R0, =0x55AA8765
SXTB R1, R0 ; R1 = 0x00000065
SXTH R1, R0 ; R1 = 0xFFFF8765
UXTB R1, R0 ; R1 = 0x00000065
UXTH R1, R0 ; R1 = 0x00008765
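In C, the same extensions fall out of casts through the narrow integer types. The sketch below is illustrative only: the function names mirror the mnemonics, and the parameter rot_bits is an assumption of this sketch that gives the total rotation in bits (0, 8, 16, or 24) applied before extraction. The narrowing casts rely on the usual two's-complement conversion behavior.

```c
#include <stdint.h>

/* Rotate right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Sign extend the low byte of the rotated value. */
static int32_t sxtb(uint32_t rm, unsigned rot_bits)
{
    return (int32_t)(int8_t)(ror32(rm, rot_bits) & 0xFFu);
}

/* Sign extend the low half-word of the rotated value. */
static int32_t sxth(uint32_t rm, unsigned rot_bits)
{
    return (int32_t)(int16_t)(ror32(rm, rot_bits) & 0xFFFFu);
}

/* Zero extend the low byte of the rotated value. */
static uint32_t uxtb(uint32_t rm, unsigned rot_bits)
{
    return ror32(rm, rot_bits) & 0xFFu;
}

/* Zero extend the low half-word of the rotated value. */
static uint32_t uxth(uint32_t rm, unsigned rot_bits)
{
    return ror32(rm, rot_bits) & 0xFFFFu;
}
```

With R0 = 0x55AA8765 and no rotation these reproduce the four results above. Rotating by 16 bits first selects byte 0xAA instead, which sign extends to 0xFFFFFFAA.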
Move Data between Registers
MOV                 Rd ← operand2
MVN                 Rd ← NOT operand2
MRS Rd, spec_reg    Move from special register to general register
MSR spec_reg, Rm    Move from general register to special register
MOV r4, r5 ; Copy r5 to r4
MVN r4, r5 ; r4 = bitwise logical NOT of r5
MOV r1, r2, LSL #3 ; r1 = r2 << 3
MOV r0, PC ; Copy PC (r15) to r0
MOV r1, SP ; Copy SP (r13) to r1
Move Immediate Number to Register
MOVW Rd, #imm16    Move Wide. Rd ← #imm16 (upper half-word is zeroed)
MOVT Rd, #imm16    Move Top. Rd[31:16] ← #imm16 (lower half-word is unchanged)
MOV Rd, #const     Move. Rd ← const
Example: Load a 32-bit number into a register
MOVW r0, #0x4321 ; r0 = 0x00004321
MOVT r0, #0x8765 ; r0 = 0x87654321
Order does matter!
• MOVW will zero the upper halfword
• MOVT won't zero the lower halfword
MOVT r0, #0x8765 ; r0 = 0x8765xxxx
MOVW r0, #0x4321 ; r0 = 0x00004321
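The ordering rule follows directly from what each instruction writes. A small C model, with hypothetical function names movw and movt, makes this explicit.

```c
#include <stdint.h>

/* MOVW: write the low half-word and zero the upper half-word. */
static uint32_t movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* MOVT: write the upper half-word and keep the lower half-word of rd. */
static uint32_t movt(uint32_t rd, uint16_t imm16)
{
    return (rd & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}
```

Applying movw and then movt builds 0x87654321. In the opposite order the final movw discards whatever movt wrote and leaves 0x00004321.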
Barrel Shifter
The second operand of the ALU passes through dedicated hardware called the barrel shifter, so it can be shifted or rotated as part of the same instruction.
Example:
ADD r1, r0, r0, LSL #3 ; r1 = r0 + (r0 << 3) = 9 × r0
The Barrel Shifter
Logical Shift Left (LSL)
Arithmetic Shift Right (ASR)
Logical Shift Right (LSR)
Rotate Right (ROR)
Rotate Right Extended (RRX)
Why is there rotate right but no rotate left?
Rotate left can be replaced by a rotate right with a different rotate offset.
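For a 32-bit word, rotating left by n gives the same result as rotating right by 32 - n. The C sketch below shows this equivalence; ror32 and rol32 are hypothetical helper names.

```c
#include <stdint.h>

/* Rotate right by n bits (n is reduced modulo 32). */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Rotate left by n bits, expressed as a rotate right by (32 - n) mod 32. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return ror32(x, (32u - (n & 31u)) & 31u);
}
```

For example, rotating 0x12345678 left by 8 is the same as rotating it right by 24, and both give 0x34567812.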
Implementation of Barrel Shifter
Typically, barrel shifters are implemented as a cascade of parallel 2-to-1 multiplexers.
S1 S0 | Y3 Y2 Y1 Y0
0  0  | D3 D2 D1 D0
0  1  | D0 D3 D2 D1
1  0  | D1 D0 D3 D2
1  1  | D2 D1 D0 D3
Example four-bit barrel shifter that performs rotate right
Example: S1S0 = 00 gives Y3 Y2 Y1 Y0 = D3 D2 D1 D0 (no rotation)
Example: S1S0 = 01 gives Y3 Y2 Y1 Y0 = D0 D3 D2 D1 (rotate right by 1)
Example: S1S0 = 10 gives Y3 Y2 Y1 Y0 = D1 D0 D3 D2 (rotate right by 2)
Example: S1S0 = 11 gives Y3 Y2 Y1 Y0 = D2 D1 D0 D3 (rotate right by 3)
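The four-bit rotate-right barrel shifter can be simulated in C as two stages of four parallel 2-to-1 multiplexers. This is a sketch under one assumed arrangement, in which S0 selects a rotate by 1 in the first stage and S1 selects a rotate by 2 in the second; mux2 and barrel_ror4 are hypothetical names.

```c
/* 2-to-1 multiplexer on single bits: returns in1 when sel is 1, else in0. */
static unsigned mux2(unsigned sel, unsigned in0, unsigned in1)
{
    return sel ? in1 : in0;
}

/* Four-bit rotate right. Bit i of d is input Di, and bit i of the result is Yi. */
static unsigned barrel_ror4(unsigned d, unsigned s1, unsigned s0)
{
    unsigned stage1 = 0, y = 0;

    /* Stage 1: each output bit is either Di (pass) or D(i+1) (rotate right by 1). */
    for (unsigned i = 0; i < 4; i++) {
        unsigned pass = (d >> i) & 1u;
        unsigned rot  = (d >> ((i + 1u) & 3u)) & 1u;
        stage1 |= mux2(s0, pass, rot) << i;
    }

    /* Stage 2: each output bit is either the stage-1 bit i or bit i+2 (rotate by 2). */
    for (unsigned i = 0; i < 4; i++) {
        unsigned pass = (stage1 >> i) & 1u;
        unsigned rot  = (stage1 >> ((i + 2u) & 3u)) & 1u;
        y |= mux2(s1, pass, rot) << i;
    }
    return y;
}
```

With only D0 set (d = 0x1), the selections 01, 10, and 11 move that bit to Y3, Y2, and Y1 respectively, matching the truth table. A 32-bit shifter built the same way would need five stages, for rotations of 1, 2, 4, 8, and 16.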
Barrel Shifter
Examples:
ADD r1, r0, r0, LSL #3 ; r1 = r0 + (r0 << 3) = r0 + 8 × r0
ADD r1, r0, r0, LSR #3 ; r1 = r0 + (r0 >> 3) = r0 + r0/8 (unsigned)
ADD r1, r0, r0, ASR #3 ; r1 = r0 + (r0 >> 3) = r0 + r0/8 (signed, rounds toward negative infinity)
Use the barrel shifter to speed up the application. The single instruction
ADD r1, r0, r0, LSL #3
is equivalent to
MOV r2, #9 ; r2 = 9
MUL r1, r0, r2 ; r1 = r0 * 9
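The equivalence between the shift-and-add form and the multiply can be checked in C. The function names below are hypothetical. Note the parentheses: in C, + binds more tightly than <<, so x + x << 3 would be parsed as (x + x) << 3.

```c
#include <stdint.h>

/* Shift-and-add form, as in ADD r1, r0, r0, LSL #3. */
static uint32_t times9_shift(uint32_t x)
{
    return x + (x << 3);
}

/* Multiply form, as in MOV r2, #9 followed by MUL r1, r0, r2. */
static uint32_t times9_mul(uint32_t x)
{
    return x * 9u;
}
```

Both forms keep only the low 32 bits, so they agree for every input, including values whose product overflows.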