程序代写代做代考 cache mips PowerPoint 演示文稿

PowerPoint 演示文稿

CO101
Principle of Computer

Organization
Lecture 11: Pipelined MIPS

Processor 1

Liang Yanyan

澳門科技大學
Macau of University of Science and Technology

Pipelining: Laundry example
• Four persons: A, B, C, and D.

• Each has one load of clothes
to wash, dry, fold, and stash.

• Assume
• “Washer” takes 30 minutes.
• “Dryer” takes 30 minutes.
• “Folder” takes 30 minutes.
• “Stasher” takes 30 minutes

to put clothes into drawers.

2

Wash: 30 mins

Dry: 30 mins

Fold: 30 mins

Stash: 30 mins

Sequential Laundry

3

• If each load is done sequentially: 8 hours for 4 loads.
• Question: can B washes after A has finished washing?

30
Time

30 30 30 30 30 30 30 30 30 30 30 30 30 30 30

6 PM 7 8 9 10 11 12 1 2 AM

A

B
C
D

Pipelined Laundry

4

• Start work as soon as possible.
• It takes 3.5 hours for 4 loads!

• Speedup = 8/3.5 = 2.3
• What is the speedup if there are 10000 loads?

12 2 AM 6 PM 7 8 9 10 11 1

Time 30 30 30 30 30 30 30
A

B
C

D

Pipelining Lessons

5

• Multiple tasks operate
simultaneously using distinct
resources.

• Pipelining doesn’t improve latency
of individual task, it improves
throughput of entire workload.

• Potential speedup = Number of
pipeline stages.

• Time to “fill” pipeline and time to
“drain” it reduces speedup.

• Pipeline rate (throughput) limited
by slowest pipeline stage (i.e.
slowest task)
• Wash takes 300 mins?
• Unbalanced lengths of

pipeline stages reduces
throughput.

6 PM 7 8 9
Time

30 30 30 30 30 30 30
A

B
C

D

Use pipelining to improve speedup by increasing
throughput.

Question:
• Can we improve the performance of processor using

pipelining?
• Increase the speedup (throughput) by pipelining?

6

Review: Instruction Critical Paths
• Calculate cycle time assuming negligible delays (for

muxes, control unit, sign extend, PC access, shift left 2,
wires) except:
• Instruction and Data Memory (4 ns)
• ALU and adders (2 ns)
• Register File access (reads or writes) (1 ns)

7

LW instruction: contains 5 steps

• IF: Instruction fetch from instruction memory.
• ID: Instruction decode and register fetch.
• EX: Calculate the memory address.
• MEM: Read the data from the data memory.
• WB: Write the data back to the register file.

8

IF ID EX MEM WB LW

Observation of single cycle processor
• Each step of LW uses distinct functional unit.
• Most of functional units are idle during the execution.

• E.g. ALU is idle during IF, ID, etc.
• Put slackers back to work.

9

How Can We Make It Faster?
• Start fetching and executing the next instruction before

the current one has completed.
• Pipelining – modern processors are almost pipelined for

performance.
• Remember the performance equation:

• CPU time = IC * CPI * Clock Cycle

• Under ideal conditions and with a large number of
instructions, the speedup from pipelining is
approximately equal to the number of pipe stages.

10

Single cycle processor: partitioning

11

MIPS Pipeline Datapath Additions
• State registers between each pipeline stage to isolate them.

12

Pipeline Design
• Can we fetch the 2nd LW instruction when decoding the

1st LW instruction?

13

IF and ID together

14

Pipelining LW instructions
• Pipeline depth is the number of stages, in here, 5.
• Cycle 1 – 4 : the pipeline is filing, as there are unused functional

units.
• Cycle 5: the pipeline is full, all functional units are used.
• Cycle 6 – 9: the pipeline is flushing (emptying).

15

Execution
sequence

Pipeline execution diagram.

Pipelined processor

• Time on filling the pipeline (assume cycle time = 200ps)
• 4 cycles = 4 x 200ps

• Time on pipeline is full and emptying
• 10000 cycles = 10000 x 200ps

• Total time
• 10004 x 200ps ≈ 2M ps

• Speedup? Why cycle time is 200ps?
16

Once the
pipeline is full,
one instruction
is completed
every cycle, so
CPI = 1.

A Pipelined MIPS Processor
• Start the next instruction before the current one has

completed.
• improves throughput – total amount of work done in a given time.
• instruction latency (execution time, delay time, response time –

time from the start of an instruction to its completion) is not
reduced.

• clock cycle (pipeline stage time) is limited by the slowest
stage.
• for some stages don’t need the whole clock cycle (e.g., WB).
• for some instructions, some stages are wasted cycles (i.e.,

nothing is done during MEM and WB cycle for beq instruction).

17

IF ID EX do nothing do nothing beq

Graphically Representing MIPS Pipeline

• Can help with answering questions like:
• How many cycles does it take to execute this code?
• What is the ALU doing during cycle 4?
• Is there a hazard, why does it occur, and how can it be fixed?

18

Other Pipeline Structures Are Possible
• What about the (slow) multiply operation?

• Make the clock twice as slow or …
• let it take two cycles (since it doesn’t use the DM stage)

• What if the data memory access is twice as slow as the

instruction memory?
• make the clock twice as slow or …
• let data memory access take two cycles (and keep the same

clock rate)

19

Other Sample Pipeline Alternatives
• ARM7

• XScale

20

Can Pipelining Get Us Into Trouble?
• Yes: Pipeline Hazards

• structural hazards:
• a required resource is busy

• data hazards:
• attempt to use data before it is ready

• control hazards:
• deciding on control action depends on previous instruction

• Can usually resolve hazards by waiting

• pipeline control must detect the hazard
• and take action to resolve hazards

21

Laundry example
• What happen if we only have one machine which

combines washing and drying functions?

22

= +

12 2 AM 6 PM 7 8 9 10 11 1

Time 30 30 30 30 30 30 30
A

B
C

D
B cannot start wash, as A is using the machine
for drying, this is called structural hazard!

Structural Hazards

23

Attempt to use the same resource in two different ways at the same time.
However, the hardware cannot support two writes simultaneously.

Resolve Structural Hazards 1

24

Structural Hazards
• There are other conflicts for use of a

resource.
• In MIPS pipeline with a single memory

• Load/store requires data access
• Hence, pipeline datapaths require separate

instruction/data memories, or separate
instruction/data cachesinstruction/data memories.

• Harvard Model.
• Instruction fetch requires instruction access

• Since Register File.

25

Resolve Structural Hazard 2

26

Resolve Structural Hazard 3

27

Pipelining the MIPS ISA
• What makes it easy

• all instructions are the same length (32 bits)
• can fetch in the 1st stage and decode in the 2nd stage

• few instruction formats (three) with symmetry across formats
• can begin reading register file in 2nd stage

• memory operations occur only in loads and stores
• can use the execute stage to calculate memory addresses

• each instruction writes at most one result (i.e., changes the
machine state) and does it in the last few pipeline stages (MEM
or WB)

• operands must be aligned in memory so a single data transfer
takes only one data memory access

28

CO101�Principle of Computer Organization
Pipelining: Laundry example
Sequential Laundry
Pipelined Laundry
Pipelining Lessons
Question:
Review: Instruction Critical Paths
LW instruction: contains 5 steps
Observation of single cycle processor
How Can We Make It Faster?
Single cycle processor: partitioning
MIPS Pipeline Datapath Additions
Pipeline Design
IF and ID together
Pipelining LW instructions
Pipelined processor
A Pipelined MIPS Processor
Graphically Representing MIPS Pipeline
Other Pipeline Structures Are Possible
Other Sample Pipeline Alternatives
Can Pipelining Get Us Into Trouble?
Laundry example
Structural Hazards
Resolve Structural Hazards 1
Structural Hazards
Resolve Structural Hazard 2
Resolve Structural Hazard 3
Pipelining the MIPS ISA