CS代写 CS2305: Computer Architecture

CS2305: Computer Architecture
Instruction-level
Parallelism
(Computer Architecture: Chapter 3 & Appendix C)

Department of Computer Science and Engineering
Shanghai University

Chapter 3: ILP and Its Exploitation
p 3.1 Concepts and Challenges
p 3.2 Basic Compiler Techniques
p 3.3 Reducing Branch Costs with Advanced Branch Prediction
p 3.4 Overcoming Data Hazards with Dynamic Scheduling
p 3.5 Hardware-based Speculation

Chapter 3: ILP and Its Exploitation
Background
p “Instruction Level Parallelism (ILP)” refers to overlap execution of instructions
m Pipelining become universal technique in 1985 p There are two main approaches to exploit ILP
m Hardware-based dynamic approaches n Used in server and desktop processors
m Compiler-based static approaches
n Not as successful outside of scientific applications
The goal of this chapter:
1) to look at the limitation imposed by data and control hazards;
2) Investigate the topic of increasing the ability of the compiler (software) and the processor (hardware) to exploit ILP

Chapter 3: ILP and Its Exploitation
Pipeline CPI
When exploiting instruction-level parallelism, the goal is to minimize CPI
Pipeline CPI = Ideal pipeline CPI + Structural Stalls + Data hazard stalls + Control Stalls
Ideal pipeline CPI is a measure of the maximum performance attainable by the implementation
By reducing each of the terms in the right-hand side, we decrease the overall pipeline CPI or improve IPC (instructions per clock)

Chapter 3: ILP and Its Exploitation
Major Techniques for Improving Pipeline CPI

Chapter 3: ILP and Its Exploitation
What is Instruction-Level Parallelism
p Basic block: a straight-line code sequence with no branches in except to the entry and no branches out except at the exit
p Parallelism with a basic block is limited
m The average dynamic branch frequency is often between
15% and 25%
m Thus, typical size of basic block = 3-6 instructions
p Conclusion: Must optimize across branches

Chapter 3: ILP and Its Exploitation
Dependences and Hazards
Dependences: Hazards:
When there are hazards, the pipeline may be stopped in order to resolve the hazards
Determining how one instruction depends on another is critical to determining
1) how much parallelism exists in a program and
2) how that parallelism can be exploited!

Chapter 3: ILP and Its Exploitation
Three Types of Dependences
Data dependences (also called true data dependences) Data flow between instructions
Name dependences (anti- dependence and output dependence)
Control dependence

Chapter 3: ILP and Its Exploitation
Data Dependences
p Dependencies are a property of programs
p Pipeline organization determines if a dependence results in an actual hazard being detected and if the hazard causes a stall
p Data dependence conveys:
m Possibility of a hazard
m Order in which results must be calculated
m Upper bound on exploitable instruction level parallelism

Chapter 3: ILP and Its Exploitation
Name Dependences
p Two instructions use the same register or memory location but no flow of information
m Not a true data dependence, but is a problem when reordering instructions
mAntidependence: instructionjwritesaregisteror memory location that instruction i reads
n Initial ordering (i before j) must be preserved
m Output dependence: instruction i and instruction j write the same register or memory location
n Ordering must be preserved
p To resolve, use renaming techniques

Chapter 3: ILP and Its Exploitation
Control Dependence
p Ordering of instruction i with respect to a branch instruction
m Instruction control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch
m An instruction not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch

Chapter 3: ILP and Its Exploitation
Data Hazards
Read after write (RAW)
• j tries to read a source before i writes it • Program order must be preserved!
Write after write (WAW)
• j tries to write an operand before it is written by i Write after read (WAR)
• j tries to write a destination before it is read by i, so i incorrectly gets the new value

Chapter 3: ILP and Its Exploitation
Latencies of FP operations
To avoid a pipeline stall, the execution of a dependent instruction must be separated from the source instruction by a distance in clock cycles equal to the pipeline latency of that source instruction
A compiler’s ability to perform pipeline scheduling is dependent both on the amount of ILP available in the program and on the latencies of the functional units in the pipeline

Chapter 3: ILP and Its Exploitation
Pipeline Stalls
for (i=999; i>=0; i=i-1) x[i] = x[i] + s;
L.D F0,0(R1)
ADD.D F4,F0,F2
S.D F4,0(R1)
DADDUI R1,R1,#-8
stall (assume integer load latency is 1) BNE R1,R2,Loop

Chapter 3: ILP and Its Exploitation
Pipeline Scheduling
Scheduled code:
Loop:L.D F0,0(R1) DADDUI R1,R1,#-8
ADD.D F4,F0,F2
S.D F4,8(R1) BNE R1,R2,Loop

Chapter 3: ILP and Its Exploitation
Loop Unrolling
Unroll by a factor of 4 (assume # elements is divisible by 4) Eliminate unnecessary instructions
L.D F0,0(R1)
ADD.D F4,F0,F2
S.D F4,0(R1) ;drop DADDUI & BNE L.D F6,-8(R1)
ADD.D F8,F6,F2
S.D F8,-8(R1) ;drop DADDUI & BNE L.D F10,-16(R1)
ADD.D F12,F10,F2
S.D F12,-16(R1) ;drop DADDUI & BNE L.D F14,-24(R1)
ADD.D F16,F14,F2
S.D F16,-24(R1)
DADDUI R1,R1,#-32
BNE R1,R2,Loop

Chapter 3: ILP and Its Exploitation
Loop Unrolling/Pipeline Scheduling
L.D F0,0(R1)
L.D F6,-8(R1)
L.D F10,-16(R1) L.D F14,-24(R1) ADD.D F4,F0,F2 ADD.D F8,F6,F2 ADD.D F12,F10,F2 ADD.D F16,F14,F2 S.D F4,0(R1)
S.D F8,-8(R1) DADDUI R1,R1,#-32 S.D F12,16(R1)
S.D F16,8(R1)
BNE R1,R2,Loop

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts