PowerPoint 演示文稿
CO101
Principle of Computer
Organization
Lecture 11: Pipelined MIPS
Processor 1
Liang Yanyan
澳門科技大學
Macau of University of Science and Technology
Pipelining: Laundry example
• Four persons: A, B, C, and D.
• Each has one load of clothes
to wash, dry, fold, and stash.
• Assume
• “Washer” takes 30 minutes.
• “Dryer” takes 30 minutes.
• “Folder” takes 30 minutes.
• “Stasher” takes 30 minutes
to put clothes into drawers.
2
Wash: 30 mins
Dry: 30 mins
Fold: 30 mins
Stash: 30 mins
Sequential Laundry
3
• If each load is done sequentially: 8 hours for 4 loads.
• Question: can B washes after A has finished washing?
30
Time
30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
6 PM 7 8 9 10 11 12 1 2 AM
A
B
C
D
Pipelined Laundry
4
• Start work as soon as possible.
• It takes 3.5 hours for 4 loads!
• Speedup = 8/3.5 = 2.3
• What is the speedup if there are 10000 loads?
12 2 AM 6 PM 7 8 9 10 11 1
Time 30 30 30 30 30 30 30
A
B
C
D
Pipelining Lessons
5
• Multiple tasks operate
simultaneously using distinct
resources.
• Pipelining doesn’t improve latency
of individual task, it improves
throughput of entire workload.
• Potential speedup = Number of
pipeline stages.
• Time to “fill” pipeline and time to
“drain” it reduces speedup.
• Pipeline rate (throughput) limited
by slowest pipeline stage (i.e.
slowest task)
• Wash takes 300 mins?
• Unbalanced lengths of
pipeline stages reduces
throughput.
6 PM 7 8 9
Time
30 30 30 30 30 30 30
A
B
C
D
Use pipelining to improve speedup by increasing
throughput.
Question:
• Can we improve the performance of processor using
pipelining?
• Increase the speedup (throughput) by pipelining?
6
Review: Instruction Critical Paths
• Calculate cycle time assuming negligible delays (for
muxes, control unit, sign extend, PC access, shift left 2,
wires) except:
• Instruction and Data Memory (4 ns)
• ALU and adders (2 ns)
• Register File access (reads or writes) (1 ns)
7
LW instruction: contains 5 steps
• IF: Instruction fetch from instruction memory.
• ID: Instruction decode and register fetch.
• EX: Calculate the memory address.
• MEM: Read the data from the data memory.
• WB: Write the data back to the register file.
8
IF ID EX MEM WB LW
Observation of single cycle processor
• Each step of LW uses distinct functional unit.
• Most of functional units are idle during the execution.
• E.g. ALU is idle during IF, ID, etc.
• Put slackers back to work.
9
How Can We Make It Faster?
• Start fetching and executing the next instruction before
the current one has completed.
• Pipelining – modern processors are almost pipelined for
performance.
• Remember the performance equation:
• CPU time = IC * CPI * Clock Cycle
• Under ideal conditions and with a large number of
instructions, the speedup from pipelining is
approximately equal to the number of pipe stages.
10
Single cycle processor: partitioning
11
MIPS Pipeline Datapath Additions
• State registers between each pipeline stage to isolate them.
12
Pipeline Design
• Can we fetch the 2nd LW instruction when decoding the
1st LW instruction?
13
IF and ID together
14
Pipelining LW instructions
• Pipeline depth is the number of stages, in here, 5.
• Cycle 1 – 4 : the pipeline is filing, as there are unused functional
units.
• Cycle 5: the pipeline is full, all functional units are used.
• Cycle 6 – 9: the pipeline is flushing (emptying).
15
Execution
sequence
Pipeline execution diagram.
Pipelined processor
• Time on filling the pipeline (assume cycle time = 200ps)
• 4 cycles = 4 x 200ps
• Time on pipeline is full and emptying
• 10000 cycles = 10000 x 200ps
• Total time
• 10004 x 200ps ≈ 2M ps
• Speedup? Why cycle time is 200ps?
16
Once the
pipeline is full,
one instruction
is completed
every cycle, so
CPI = 1.
A Pipelined MIPS Processor
• Start the next instruction before the current one has
completed.
• improves throughput – total amount of work done in a given time.
• instruction latency (execution time, delay time, response time –
time from the start of an instruction to its completion) is not
reduced.
• clock cycle (pipeline stage time) is limited by the slowest
stage.
• for some stages don’t need the whole clock cycle (e.g., WB).
• for some instructions, some stages are wasted cycles (i.e.,
nothing is done during MEM and WB cycle for beq instruction).
17
IF ID EX do nothing do nothing beq
Graphically Representing MIPS Pipeline
• Can help with answering questions like:
• How many cycles does it take to execute this code?
• What is the ALU doing during cycle 4?
• Is there a hazard, why does it occur, and how can it be fixed?
18
Other Pipeline Structures Are Possible
• What about the (slow) multiply operation?
• Make the clock twice as slow or …
• let it take two cycles (since it doesn’t use the DM stage)
• What if the data memory access is twice as slow as the
instruction memory?
• make the clock twice as slow or …
• let data memory access take two cycles (and keep the same
clock rate)
19
Other Sample Pipeline Alternatives
• ARM7
• XScale
20
Can Pipelining Get Us Into Trouble?
• Yes: Pipeline Hazards
• structural hazards:
• a required resource is busy
• data hazards:
• attempt to use data before it is ready
• control hazards:
• deciding on control action depends on previous instruction
• Can usually resolve hazards by waiting
• pipeline control must detect the hazard
• and take action to resolve hazards
21
Laundry example
• What happen if we only have one machine which
combines washing and drying functions?
22
= +
12 2 AM 6 PM 7 8 9 10 11 1
Time 30 30 30 30 30 30 30
A
B
C
D
B cannot start wash, as A is using the machine
for drying, this is called structural hazard!
Structural Hazards
23
Attempt to use the same resource in two different ways at the same time.
However, the hardware cannot support two writes simultaneously.
Resolve Structural Hazards 1
24
Structural Hazards
• There are other conflicts for use of a
resource.
• In MIPS pipeline with a single memory
• Load/store requires data access
• Hence, pipeline datapaths require separate
instruction/data memories, or separate
instruction/data cachesinstruction/data memories.
• Harvard Model.
• Instruction fetch requires instruction access
• Since Register File.
25
Resolve Structural Hazard 2
26
Resolve Structural Hazard 3
27
Pipelining the MIPS ISA
• What makes it easy
• all instructions are the same length (32 bits)
• can fetch in the 1st stage and decode in the 2nd stage
• few instruction formats (three) with symmetry across formats
• can begin reading register file in 2nd stage
• memory operations occur only in loads and stores
• can use the execute stage to calculate memory addresses
• each instruction writes at most one result (i.e., changes the
machine state) and does it in the last few pipeline stages (MEM
or WB)
• operands must be aligned in memory so a single data transfer
takes only one data memory access
28
CO101�Principle of Computer Organization
Pipelining: Laundry example
Sequential Laundry
Pipelined Laundry
Pipelining Lessons
Question:
Review: Instruction Critical Paths
LW instruction: contains 5 steps
Observation of single cycle processor
How Can We Make It Faster?
Single cycle processor: partitioning
MIPS Pipeline Datapath Additions
Pipeline Design
IF and ID together
Pipelining LW instructions
Pipelined processor
A Pipelined MIPS Processor
Graphically Representing MIPS Pipeline
Other Pipeline Structures Are Possible
Other Sample Pipeline Alternatives
Can Pipelining Get Us Into Trouble?
Laundry example
Structural Hazards
Resolve Structural Hazards 1
Structural Hazards
Resolve Structural Hazard 2
Resolve Structural Hazard 3
Pipelining the MIPS ISA