CS3350B Computer Organization Chapter 4: Instruction-Level Parallelism Part 1: Pipelining
Alex Brandt
Department of Computer Science University of Western Ontario, Canada
Monday March 8, 2021
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining
Monday March 8, 2021
1 / 30
Outline
1 Overview
2 Pipelining: An Analogy
3 Pipelining For Performance
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 2 / 30
Instruction-Level Parallelism
For a computer architecture, its instruction-level parallelism (ILP) is a measure of the number of instructions it can perform simultaneously.
ILP is usually achieved dynamically—after compile time—by the processor itself manipulating program execution.
Circuitry (and appropriate control signals) needs to be added to the processor to handle the execution of many instructions simultaneously and to handle the dynamic nature of ILP.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 3 / 30
Achieving ILP
ILP can be achieved in many ways. Some topics we will look at:
Pipelining
Superscalar execution
VLIW – very long instruction word Register renaming
Branch prediction
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 4 / 30
“Pipelining” in Combinational Circuits
Break up a combinational circuit, reduce propagation delay, insert a register to store intermediate results, increase clock frequency.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 5 / 30
Pipe, Pipeline, Pipelining
Unix pipe: pass data from one program to another. ls -la | grep “foo.txt”
Data pipeline: a sequential series of processing elements (CPUs, circuits, programs, etc.) where the output of one is passed as the input to another. Buffer storage is needed between elements to store temporary data.
Pipelining: a technique for instruction-level parallelism where each stage of the datapath is always kept busy. Instructions are overlapped.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 6 / 30
Pipelining the RISC Datapath
Each stage is executing a different instruction. 5 stages ⇒ 5 instructions executed at once.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 7 / 30
Outline
1 Overview
2 Pipelining: An Analogy
3 Pipelining For Performance
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 8 / 30
Doing Laundry
We have 4 loads of laundry to do: A, B, C, D.
To process each load we need to: ë Wash
ë Dry
ë Fold
ë Put-away
Each stage of doing laundry takes 30 minutes.
Could process each load sequentially or use pipelining.
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 9 / 30
Doing Laundry: Sequentially
Each load of laundry is done one at a time: ë Wash A, Dry A, Fold A, Put-away A.
ë Wash B, Dry B, Fold B, Put-away B.
⋮
Takes 8 hours in total. There has to be a better way.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 10 / 30
Doing Laundry: Pipeline
Each stage of doing laundry must process each load sequentially. But each load of laundry can overlap.
No dependency between drying load A and washing load B, etc. Put-away A while Folding B while drying C while washing D. Takes 3.5 hours in total.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 11 / 30
Pipelining Terms via Analogy
Pipelining: many tasks (loads of laundry) being executed simultaneously using different resources (washer, dryer, etc.).
Time to complete a single task (latency) does not change.
ë Each load by itself still takes 2 hours. Number of tasks that can be completed in
one unit of time (throughput) increases. Potential speed up via pipelining equals the
number of stages in pipeline.
Actual speed-up never exactly equals potential.
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 12 / 30
Pipelining Terms via Analogy
Actual speed-up never exactly equals potential.
Fill time: time taken to “fill” the pipeline. Initially, not every stage is used.
Drain time: time taken to “empty” the pipeline. Not all stages are used once the last task begins.
Imagine a new washing machine takes only 20 minutes. This does not increase pipeline speed.
ë Dryer still takes 30 minutes.
ë Washer must wait for dryer to finish before
laundry can move from washer to dryer.
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 13 / 30
Outline
1 Overview
2 Pipelining: An Analogy
3 Pipelining For Performance
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 14 / 30
The RISC Datapath
IF ID EXMEMWB
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 15 / 30
Review: Single Cycle Datapath
Clock cycle is long enough to handle critical path through datapath. Time for data to pass through entire datapath.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 16 / 30
Performance of Single Cycle Datapath
Let’s assume that accessing memory takes 200ps and ALU propagation delay is 200ps.
ë IF stage, EX stage, MEM stage.
Let’s assume accessing registers takes 100ps.
ë ID stage, WB stage.
What is the minimum clock cycle?
ë Sum of all stages since some instructions use all stages. ë 200+100+200+200+100=800ps.
Instr. IF ID
R-type 200ps 100ps Branch 200ps 100ps sw 200ps 100ps
EX MEM WB
200ps – 100ps 200ps – – 200ps 200ps –
Total
600ps 500ps 700ps
lw 200ps 100ps 200ps 200ps 100ps 800ps
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 17 / 30
Improving Performance of Datapath
Clock frequency
Parallel execution of instructions via overlap: pipelining.
Superscalar, VLIW (to come later).
Branch prediction (to come later).
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 18 / 30
Review: Multi-Cycle for Combinational Circuits
Break up a combinational circuit, reduce propagation delay, insert a register to store intermediate results, increase clock frequency.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 19 / 30
Multi-Cycle Datapath for MIPS
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 20 / 30
Multi-Cycle Datapath
Clock cycle is long enough to handle slowest stage of the pipeline. Time for data to pass through one (the slowest) stage of pipeline.
Example: Minimum clock cycle is 200ps.
Instr. IF R-type 200ps
Branch 200ps
sw 200ps
lw 200ps 100ps 200ps 200ps 100ps 800ps
ID EX 100ps 200ps
MEM WB – 100ps
Total
100ps 200ps 100ps 200ps
– – 200ps –
600ps 500ps 700ps
Alex Brandt
Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 21 / 30
Pipelining for Performance
Further increase clock frequency?
Could break up datapath into more and more stages but…
ë More registers.
ë More complexity in datapath and controller design ⇒ overhead. ë Still limited by slowest stage (memory).
Leverage the parallelism gained by pipelining.
Parallelism in execution of instructions yields fewer cycles per
instruction (CPI)
The Classic Performance Equation
CPU time = Instruction_count × CPI × clock_cycle or
CPU time = Instruction_count × CPI ⇑ clock_rate
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 22 / 30
RISC Pipeline Performance
Overlap instructions, start the next before the former completes.
Some instructions will “waste” a cycle as they flow through unused stages.
lw sw add
Cycle 1 IFetch
Cycle 2 Dec IFetch
Cycle 3 Exec Dec IFetch
Cycle 4 Mem Exec Dec
Cycle 5 WB Mem Exec
Cycle 6 Cycle 7 Cycle 8
WB
Mem WB
Latency: time to complete one instruction. Does not decrease with pipelining. May actually increase slightly if each stage originally did not take the same amount of time.
Throughput: number of instructions that can be completed in some amount of time. Increases with pipelining.
Once pipeline is full CPI is 1.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 23 / 30
Performance: With and Without Pipelining
𝑇𝑐 = clock cycle time
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 24 / 30
Pipeline Parallelism
time to drain pipeline
Potential speed-up via parallelism is equal to the number of stages. 5 stages ⇒ 5x potential speed up.
A pipeline is “full” when every stage is occupied by an instruction (every stage does not have to necessarily be doing work).
Pipeline fill time and drain time reduce actual speed up.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 25 / 30
Quantifying Pipelined Speedup
If the time for each stage is the same:
Ideal Speedup = Number of Stages
If the time for each stage is not the same:
Ideal Speedup = Time between instructionsnon-pipelined
Time between instructionspipelined
Actual Speedup = Time to completenon-pipelined Time to completepipelined
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 26 / 30
Calculating Speedup
From previous example:
Single-cycle datapath: 800ps clock cycle.
Pipelined: 200ps clock cycle.
Uneven time for each stage. ID and WB only 100ps. 3 lw instructions.
Ideal Speedup = 800 = 4 Actual Speedup = 2400 = 1.714 200 1400
If we have 1000000 lw instructions?
Actual Speedup = 1000000 × 800 ≈ 4
1000000 × 200 + 800
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 27 / 30
Calculating Pipelined Time
Classic Performance Equation:
CPU time = Instruction_count × CPI × clock cycle
Time for pipelined execution:
Timepipelined = Fill time + (IC × clock cycle)
(Assuming no stalls or hazards.)
Once pipeline is full, one instr. completes every cycle ⇒ CPI is 1.
ë Gives IC × 1 × clock cycle
Pipeline is only not full during fill or drain time.
Fill time = Drain time = (number of stages – 1) × clock cycle
ë Assuming number of instructions > number of stages.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 28 / 30
Calculating Pipelined Time
time to drain pipeline
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 29 / 30
Summary
Pipelining is the simultaneous execution of multiple instructions each in a different stage of the datapath.
Pipelining gives increased clock frequency by multi-cycle datapath. Limited by the slowest stage.
Pipelining gives essentially a CPI of 1.
Speed-up must account for fill time and drain time.
All of the discussion so far assumed there is no conflicts between instructions, hardware, circuits, etc.
ë Pipeline hazards severely impact performance and potential speed-up. ë Chapter 4: Part 2: Pipeline hazards.
Alex Brandt Chapter 4: ILP , Part 1: Pipelining Monday March 8, 2021 30 / 30