DESN2000: Engineering Design & Professional Practice (EE&T)
Control flow & conditional operations
School of Electrical Engineering & Telecommunications Graduate School of Biomedical Engineering
Copyright By PowCoder代写 加微信 powcoder
Biomedical Microsystems Lab
Introduction: controlling execution flow
Condition code flags & conditional execution
Branch instructions
Selection structures: if … else … Repetition structures
Jump tables
© 2022 UNSW Sydney
Introduction: controlling execution
• High-level programming constructs: 1. Sequence
2. Selection
3. Repetition
4. Function
if {…} then {…} else {…} while {…}
int func(…) {…}
• Mapping these into assembly code:
Sequence: PC automatically incremented by 4 bytes on each instruction fetch. Selection: two possibilities in ARM:
branch instructions
conditional execution (of most instructions) Repetition: using branch instructions.
Functions: achieved through branch instructions.
© 2022 UNSW Sydney
Conditional flags & conditional execution
Condition flags in CPSR:
N (negative)
N=1whenthemostsignificantbit(MSB)oftheALUoutputis’1’. s31 Z (zero)
a31 …a0 b31 …b0
Z = 1 when the ALU output is zero.
s31 ^s30 ^…^s0
C = 1 when there is a carry out from the MSB. The C flag is also
changed by the shifting operations. cout
V (overflow)
V = 1 when the result cannot be represented in 32 bits following a signed arithmetic operation. cin cout
Program Status Register
• Both CPSR and SPSR have the following format
27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8
Do not modify / Read as Zero
© 2022 UNSW Sydney • Bit [31:28] – conditional flags
– N – Negative
Conditional flags & conditional execution
• Condition flags can be updated by:
• The ‘S’ option of data processing instructions: ADDS, MOVS, …
• Flag-setting instructions: CMP, CMN, TST, TEQ.
• Shift operations (only update C flag).
• Special instructions to edit the CPSR bits.
Program Status Register
• Both CPSR and SPSR have the following format
27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8
Do not modify / Read as Zero
© 2022 UNSW Sydney • Bit [31:28] – conditional flags
– N – Negative
Flag setting instructions
• CMP (compare): compares 1st and 2nd operands by performing a subtraction, while
setting the flags.
CMP R1, #10 ; (R1-10) and sets the flags CMP R1, R0 ; (R1-R0) and sets the flags CMP R1, R0, LSL #3 ; (R1-R0×23) and sets the flags
• CMN (compare negative): Compares 1st operand with the negative of the 2nd operand, while setting the flags.
CMN R1, R0 ; (R1+R0) and sets the flags CMN R1, R0, LSL #3 ; (R1+R0×23) and sets the flags
• Assembler will replace CMP with CMN when appropriate, e.g. CMP R1, #-10 will be replaced by CMN R1, #10.
© 2022 UNSW Sydney
Flag setting instructions
• TST (test bits): logical AND between operands. Affects all but the V flag.
TST R1, #0x14 TST R1, R0
TST R1, R0, LSL#3
; (R1 AND 0x14) and sets the flags ; (R1 AND R0) and sets the flags
; (R1 AND R0×23) and sets the flags
• TEQ (test equivalence): logical XOR between operands. Affects all but the V flag. TEQ can be used to check whether two values are the same.
TEQ R1, R0 ; (R1 XOR R0) and sets the flags TEQ R1, R0, LSL #3 ; (R1 XOR R0×23) and sets the flags
© 2022 UNSW Sydney
Flag setting instructions
Example 1. What are the condition flags after the last instruction?
LDR R0, =#0xFFFFFFFF LDR R1, =#0xFFFFFFFF ADDS R1, R1, R0
; 1111 … 1111 ; 1111 … 1111 ; ———– ; 1 1111 … 1110
N=1 Z=0 C=1 V=0
Example 2. What are the condition flags after the last instruction?
Result characteristics:
MSB=1, non-zero, has carry-out, Carry-in ⊗ carry-out = 0
LDR R1, =0x7FFFFFFF MVN R2, #0x0 SUBSR1,R1,R2,LSL#2
; R1: 0111 … 1111 ; R2: 1111 … 1111 ;R2LSL#2:1111…1100 ;11 ; R1: 0111 … 1111 ; R2 1’comp: 0000 … 0011
R1-R2=R1+not(R2) +1
Result characteristics:
MSB=1, non-zero, no carry-out, Carry-in ⊗ carry-out = 1
———– 0 1000 … 0011
N=1 Z=0 C=0 V=1
© 2022 UNSW Sydney
Using flags for conditional execution
• ARM instructions can execute conditionally.
• The condition is
• specified with a two-letter suffix (next slide), e.g. EQ, CC, …
• tested against the CPSR flags.
• The instruction is a no-op if the conditions are not met.
Often avoids the need for branch (B, BL), so no pipeline stalls (will see why) and increasing throughput. It also increase code density.
• Data processing instructions can set condition flags by suffixing S. The comparison instructions CMP and TST do this implicitly.
• Example: repeat while ≠ 1
MOV r1, #10
SUBSr1,r1,#1 ; i=i-1 BNE loop ; while (i != 0)
© 2022 UNSW Sydney
Using flags for conditional execution
• Example: if-statement. The last two instructions are executed only if a == 5
CMP r0,#5 ;if(a==5) MOVEQ r0, #10
BLEQ fn ; fn(10)
• Example: if … else … statement.
CMP r0,#0 MOVLEr0,#0 MOVGT r0, #1
;if(x<=0) ; x=0 ; else
• Example: if (... || ...) statement.
CMP r0, #'A’ ; if (c == 'A’ CMPNE r0, #'B’ ; || c == 'B’) MOVEQr1,#1 ; y=1
C evaluates logical operators left-to-right with short-circuiting (only check 2nd possibility if 1st is false).
© 2022 UNSW Sydney
Using flags for conditional execution
Conditional codes, status flags ...
Field Mnemonic
Conditional code
Flags status
Not Equal Unsigned ≥
Unsigned <
Positive or zero
No overflow
C set and Z clear
Unsigned >
C clear and Z set
Unsigned ≤
Z clear, N = V
Z set, N ≠ V
© 2022 UNSW Sydney
Using flags for conditional execution
• Can combine the S bit with conditional execution:
ADDEQS r0, r1, r2
• Branching vs conditional execution: the CPU’s branch penalty is at least 3 cycles.
• The ARMCC (not GCC) compiler make extensive use of condition codes in an optimization step called branch removal.
© 2022 UNSW Sydney
Branch instructions
Two branch instructions:
B (branch): Regular branch. Often required to implement loops.
BL (branch and link): Branch to a new location, then return to the original location. The link register (LR or R14) is used to hold the return address. Used to implement subroutines and function calls.
The branch instruction bit fields:
Condition flags Address offset (PC + offset x 22)
31 28 27 24 23 0 XXXX101L
24-bit signed offset
© 2022 UNSW Sydney
Branch instructions
ARM address bus is 32-bit wide. How to specify 32-bit branch target address with 32- bit instruction? PC-relative addressing.
Branch address = (current PC) + (offset x 22).
The 24-bit signed offset specifies the word offset from the current PC. The offset is multiplied by 4 during target address computation.
24-bits signed offset ⇒ ± 32MB limit for branch target. Problem: cannot branch more than 32 MB away from PC.
31 28 27 24 23 0 XXXX101L
24-bit signed offset
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
12345 FDELA
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
12 45 FD LA
0x8008 XXX F
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
123 5 FDE A
0x8FEC ADD F
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
123 5 FDE A
0x8FEC ADD F
1. Discard two instr
.. wasted work/power
2. Decoder & Executor circuit idle
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
0x8FF0 SUB
1234 F D E L
F: : Decode E: Execute L: Linkret A: Adjust
Executor circuit idle
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
FDELA F:Fetch
E: Execute
L: Linkret
Executing circuit working again, after sitting out 2 cycles
0x8FF4 MOV
© 2022 UNSW Sydney
Pipeline full
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
3 clock cycles
0x8FF4 MOV
12345 FDELA
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Branch instructions: 3-stage pipeline
Instruc,on (Opcode)
12345 F D E
F: : Decode E: Execute L: Linkret A: Adjust
© 2022 UNSW Sydney
Additional house-keeping
done during pipeline stall (optimisation)
Branch instruction: summary
We have seen:
In cycle 3, the BL instruction is at the execution stage while instructions at 0x8004 and 0x8008 are being decoded and fetched, respectively. If the branch is taken, these two instructions should not proceed to execute and decode during cycle 4.
So whenever a branch is taken, the instructions already fetched and decoded are thrown away.
Once BL is executed, CPU fetches the next instruction from the branch target address 0x8FEC.
Additional house-keeping for linking:
In cycle 4, current PC value 0x8008 is stored in the LR (R14) However, this is not the proper return address.
In cycle 5, LR is adjusted as LR – 4, thereby pointing to the correct return address 0x8004.
© 2022 UNSW Sydney
Conditional branching
B
Selection structures
• Example: with the following integer variables and register assignments: f →V1, g→V2, h→V3, i→V4, j→V5
Consider the following example.
if (i == j)
f = g + h;
elsef = g – h; true i == j
ARM assembly:
CMP V4, V5
SUB V1, V2, V3
ADD V1, V2, V3
• Q: how many clock cycles does this take?
i == j: 1 + 3 + 1 = 5 i!=j: 1+1+1+3=6
© 2022 UNSW Sydney
Selection structures
• Example: with the following integer variables and register assignments: f →V1, g→V2, h→V3, i→V4, j→V5
Consider the following example.
if (i == j)
f = g + h;
elsef = g – h;
ARM assembly:
CMP V4, V5
SUB V1, V2, V3
ADD V1, V2, V3
• Q: can you make it more efficient?
© 2022 UNSW Sydney
ADDEQV1,V2,V3 ; f=g+h SUBNEV1,V2,V3 ;elsef=g-h
; if (i==j)
Repetition structures
• while loops:
loop …
eval xxxx ; evaluate cond
loop xxxx ; evaluate cond BNE exit
while (cond==true) { …
© 2022 UNSW Sydney
Repetition structures
• do … while loops:
loop …
xxxx ; eval cond BEQ loop
} while (cond==true)
• for loops:
MOV R1, #10
loop …
SUBS R1, R1, #1 BNE loop
MOV R1, #0 loop CMP R1, #10
ADD R1, R1, #1
B loop exit
for (j=0; j<10; j++) { ...
Shorter, Starting from 10 and counts down to 0
© 2022 UNSW Sydney
Repetition structures
• Example: translate the following c program to assembly.
for (i = 0; i < 8; i++) {
a[i] = b[7-i];
• Arrays a[] and b[] contain bytes. Array a[] has base address 0x40000000, located in the SRAM. Array b[] is place within the program memory using DCB directives.
• Two considerations: the algorithms, memory arrangement.
• Algorithm: swapping elements between a[] and b[].
© 2022 UNSW Sydney
Repetition structures
• Index upwards
AREA Progloop, CODE, READONLY
SRAM_BASE EQU 0x40000000 ENTRY
Define location of a[]
MOV r0, #0
ADR r1, arrayb MOV r2, #SRAM_BASE
loop CMP r0, #8 BGE done
RSB r3, r0, #7 LDRB r5, [r1,r3] STRB r5, [r2,r0] ADD r0, r0, #1 B loop
done B done
; load base addr of b[] into r1 ; load base addr of a[] into r2
; if i >= 10, finish ; index = 7-i
; load b[7-i]
; store to a[i]
arrayb DCB 0xA, 0x9, 0x8, 0x7, 0x6, 0x5, 0x4, 0x3 END
at end of program space
© 2022 UNSW Sydney
Repetition structures
• Index upwards
AREA Progloop, CODE, READONLY SRAM_BASE EQU 0x40000000
Pseudo instr. to copy mem addr to reg
MOV r0, #0
ADR r1, arrayb MOV r2, #SRAM_BASE
loop CMP r0, #8 BGE done
RSB r3, r0, #7 LDRB r5, [r1,r3] STRB r5, [r2,r0] ADD r0, r0, #1 B loop
done B done
; load base addr of b[] into r1 ; load base addr of a[] into r2
; if i >= 10, finish ; index = 7-i
; load b[7-i]
; store to a[i]
arrayb DCB 0xA, 0x9, 0x8, 0x7, 0x6, 0x5, 0x4, 0x3 END
Set up mem. addresses
© 2022 UNSW Sydney
Repetition structures
• Index upwards
AREA Progloop, CODE, READONLY SRAM_BASE EQU 0x40000000
MOV r0, #0
ADR r1, arrayb MOV r2, #SRAM_BASE
loop CMP r0, #8 BGE done
RSB r3, r0, #7 LDRB r5, [r1,r3] STRB r5, [r2,r0] ADD r0, r0, #1 B loop
done B done
; load base addr of b[] into r1 ; load base addr of a[] into r2
arrayb DCB 0xA, 0x9, 0x8, 0x7, 0x6, 0x5, 0x4, 0x3 END
; if i >= 10, finish ; index = 7-i
; load b[7-i]
; store to a[i]
© 2022 UNSW Sydney
Repetition structures
• Example: translate the following c program to assembly.
for (i = 0; i < 6; i++) {
sum += a[i]; }
• The array a[] should be declared within the code space (program memory) using appropriate directives. All variables are integers.
© 2022 UNSW Sydney
Repetition structures
• Indexing upwards
AREA Progloop, CODE, READONLY ENTRY
MOVr0,#0 ;i MOV r1, #0 ; sum ADR r2, arraya
loop CMP r0, #6 BGE done
LDR r3, [r2, r0, LSL #2] ADD r1, r1, r3
ADD r0, r0, #1
done B done
arraya DCD -1,-2,-3,-4,-5,-6 END
Summing loop
Place a[] at end of program space, note ’D’ for 32-bit width
© 2022 UNSW Sydney
Repetition structures
• Indexing upwards
AREA Progloop, CODE, READONLY ENTRY
MOVr0,#0 ;i MOV r1, #0 ; sum ADR r2, arraya
loop CMP r0, #6 BGE done
4 x offset (
LDR r3, [r2, r0, LSL #2] ADD r1, r1, r3
ADD r0, r0, #1
done B done
arraya DCD -1,-2,-3,-4,-5,-6 END
32-bit wide)
© 2022 UNSW Sydney
Repetition structures
• Indexing upwards
AREA Progloop, CODE, READONLY ENTRY
MOVr0,#0 1;i MOVr1,#0 1;sum ADR r2, arraya 1 CMP r0, #6 1
3 LDR r3, [r2, r0, LSL #2] 1 ADD r1, r1, r3
1 ADD r0, r0, #1
DCD -1,-2,-3,-4,-5,-6 END
1 BGE done 3
7th - exit
done arraya
• Q: how many CPU cycles does this summation code use?
Cycles = 3 + (10 * 6 repetitions) + 4 = 67
Most instr.: 1 cycle B/BL: 3 cycles
Memory access: 3 cycles
© 2022 UNSW Sydney
More branch removal
Example: translate the following c program to assembly.
if (var == ‘!’ || var == ‘?’) found++;
The variables var and found correspond to registers R0 and R1, respectively.
Standard implementation:
Optimized implementation:
TEQ R0, #'!'
TEQ R0, #'?'
true ADD R1, R1, #1
© 2022 UNSW Sydney
≥ 5 cycles
TEQ R0, #'!'
TEQNE R0, #'?'
ADDEQ R1, R1, #1
Case selection structure
• Switch case statements:
switch (k) {
f=i+j; break; case 1:
f=g+h; break; case 2:
f=g-h; break; case 3:
f=i-j; break;
• Nested if ... else ladder:
elseif (k==1) f=g+h;
elseif (k==2) f=g-h;
elseif (k==3) f=i-j;
Assuming f →V1, i→V2, j→V3, g→V4, h→V5 , k→V6
CMP v6, #0
ADD v1, v2, v3
L1 CMP v6, #1 BNE L2
ADD v1, v4, v5
L2 CMP v6, #2
SUB v1, v4, v5
L3 CMP v6, #3 BNE exit
SUB v1, v2, v3
If no match: 4 branches... very inefficient
© 2022 UNSW Sydney
Efficient case selection structure: jump table
• General strategy: avoid branch instruction, index to target code by changing the PC. • Implementation:
1. Check whether the controlling variable is within range (0 ≤ V6 ≤ 3): CMP V6, #0
CMP V6, #3
2. Implement the code for each case as separate code block with a label.
3. Put these labels into a table using the DCD directive and load the starting address of
this table into a base register using the ADR instruction.
4. Change the PC using LDR with this base register and the controlling variable as the
© 2022 UNSW Sydney
Efficient case selection structure: jump table
CMP V6, #3
ADR R0, JUMP_TABLE
LDR PC, [R0, V6, LSL #2]
JUMP_TABLE DCD L0, L1, L2, L3 L0 .
. Case 0 code
exit B exit
JUMP_TABLE
&L0 &L1 &L2
© 2022 UNSW Sydney
Case 1 code
Case 2 code
Case 3 code
L0 code &L3 L1 code
Ensuring 0 ≤ V6 ≤ 3
Branch to case code directly
Table of case labels (label = code address)
Program mem. space
Efficient case selection structure: jump table
AREA jumpexample, CODE, READONLY EQU 2
MOV r0, #0
MOV r1, #3
MOV r2, #2
BL arithfunc
B stop
CMP r0, #num
MOVHS pc, lr
ADR r3, jumptable
LDR pc, [r3, r0, LSL #2] DCD doAdd, doSub
ADD r0, r1, r2
MOV pc, lr
SUB r0, r1, r2
MOV pc, lr
; case selection
; put ret addr in lr (r14)
; return if r0 >= #num
switch (r0) {
return r1+r2;
return r1-r2;
© 2022 UNSW Sydney
Introduction: controlling execution flow
Condition code flags & conditional execution
Branch instructions
Selection structures: if … else … Repetition structures
Jump tables
In Moodle:
• Start working on Lab (Due: end of your 3-hr lab)
• Start doing Week 4 exercise
© 2022 UNSW Sydney
References
[1] , ARM Assembly Language: Fundamentals and Techniques, CRC Press, 2015 (2nd Edition).
[2] ARM Architecture Reference Manual.
© 2022 UNSW Sydney
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com