程序代写代做代考 mips cache Chapter ...

Chapter …

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 1

Chapter 4 — The Processor — 58

Pipelining Analogy

 Pipelined laundry: overlapping execution
 Parallelism improves performance

§
4

.5
A

n
O

v
e

rv
ie

w
o

f P
ip

e
lin

in
g Four loads:

 Speedup

= 8/3.5 = 2.3

 Non-stop:

 Speedup

= 2n/0.5n + 1.5 ≈ 4

= number of stages

Pop Quiz

 On the previous slide, we had 4 stages.
Therefore, each load of laundry should
finish in 25% of the time.

 A: True

 B: False

Chapter 4 — The Processor — 59

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 2

Pipelining Impacts Throughput

 … not latency (time it takes for a single
instruction to complete)

 Another example: can nine women
produce a baby in a single month?

 Nine women can produce nine babies in
nine months (avg. 1 baby / month)

Chapter 4 — The Processor — 60

Chapter 4 — The Processor — 61

MIPS Pipeline

 Five stages, one step per stage
1. IF: Instruction fetch from memory

2. ID: Instruction decode & register read

3. EX: Execute operation or calculate
address

4. MEM: Access memory operand

5. WB: Write result back to register

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 3

Chapter 4 — The Processor — 62

Pipeline Performance

 Assume time for stages is
 100ps for register read or write

 200ps for other stages

 Compare pipelined datapath with single-cycle
datapath

Instr Instr fetch Register

read

ALU op Memory

access

write

Total time

lw 200ps 100 ps 200ps 200ps 100 ps 800ps

sw 200ps 100 ps 200ps 200ps 700ps

R-format 200ps 100 ps 200ps 100 ps 600ps

beq 200ps 100 ps 200ps 500ps

Chapter 4 — The Processor — 63

Pipeline Performance

Single-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 4

Chapter 4 — The Processor — 64

Pipeline Speedup

 If all stages are balanced
 i.e., all take the same time

 Time between instructionspipelined
= Time between instructionsnonpipelined

Number of stages

 If not balanced, speedup is less

 Speedup due to increased throughput
 Latency (time for each instruction) does

not decrease

Chapter 4 — The Processor — 65

Pipelining and ISA Design

 MIPS ISA designed for pipelining
 All instructions are 32-bits

 Easier to fetch and decode in one cycle
 c.f. x86: 1- to 17-byte instructions

 Few and regular instruction formats
 Can decode and read registers in one step

 Load/store addressing
 Can calculate address in 3rd stage, access

memory in 4th stage

 Alignment of memory operands
 Memory access takes only one cycle

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 5

Chapter 4 — The Processor — 66

Hazards

 Situations that prevent starting the
next instruction in the next cycle

 Structure hazards
 A required resource is busy

 Data hazard
 Need to wait for previous instruction to

complete its data read/write

 Control hazard
 Deciding on control action depends on

previous instruction

Chapter 4 — The Processor — 67

Structure Hazards

 Conflict for use of a resource

 In MIPS pipeline with a single memory
 Load/store requires data access

 Instruction fetch would have to stall for
that cycle
 Would cause a pipeline “bubble”

 Hence, pipelined datapaths require
separate instruction/data memories
 Or separate instruction/data caches

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 6

Chapter 4 — The Processor — 68

Data Hazards

 An instruction depends on completion of
data access by a previous instruction
 add $s0, $t0, $t1
sub $t2, $s0, $t3

Chapter 4 — The Processor — 69

Forwarding (aka Bypassing)

 Use result when it is computed
 Don’t wait for it to be stored in a register

 Requires extra connections in the datapath

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 7

Chapter 4 — The Processor — 70

Load-Use Data Hazard

 Can’t always avoid stalls by forwarding
 If value not computed when needed

 Can’t forward backward in time!

Chapter 4 — The Processor — 71

Code Scheduling to Avoid Stalls

 Reorder code to avoid use of load result
in the next instruction

 C code for A = B + E; C = B + F;

lw $t1, 0($t0)

lw $t2, 4($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

lw $t4, 8($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

stall

lw $t1, 0($t0)

lw $t2, 4($t0)

lw $t4, 8($t0)

add $t3, $t1, $t2

sw $t3, 12($t0)

add $t5, $t1, $t4

sw $t5, 16($t0)

11 cycles13 cycles

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 8

Chapter 4 — The Processor — 72

Control Hazards

 Branch determines flow of control
 Fetching next instruction depends on

branch outcome
 Pipeline can’t always fetch correct

instruction
 Still working on ID stage of branch

 In MIPS pipeline
 Need to compare registers and compute

target early in the pipeline
 Add hardware to do it in ID stage

Chapter 4 — The Processor — 73

Stall on Branch

 Wait until branch outcome determined
before fetching next instruction

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 9

Chapter 4 — The Processor — 74

Branch Prediction

 Longer pipelines can’t readily determine
branch outcome early
 Stall penalty becomes unacceptable

 Predict outcome of branch
 Only stall if prediction is wrong

 In MIPS pipeline
 Can predict branches not taken

 Fetch instruction after branch, with no
delay

Chapter 4 — The Processor — 75

MIPS with Predict Not Taken

Prediction

correct

Prediction

incorrect

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 10

Chapter 4 — The Processor — 76

More-Realistic Branch Prediction

 Static branch prediction
 Based on typical branch behavior

 Example: loop and if-statement branches
 Predict backward branches taken

 Predict forward branches not taken

 Dynamic branch prediction
 Hardware measures actual branch behavior

 e.g., record recent history of each branch

 Assume future behavior will continue the trend
 When wrong, stall while re-fetching, and update history

Pop Quiz

 We have approaches to mitigate pipeline
stalls from:

 A: structural hazards

 B: structural and control hazards

 C: data and control hazards

 D: structural, data, and control hazards

Chapter 4 — The Processor — 77

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 11

Chapter 4 — The Processor — 78

Pipeline Summary

 Pipelining improves performance by
increasing instruction throughput
 Executes multiple instructions in parallel

 Each instruction has the same latency

 Subject to hazards
 Structure, data, control

 Instruction set design affects
complexity of pipeline implementation

The BIG Picture

Chapter 4 — The Processor — 79

MIPS Pipelined Datapath

§
4

.6
P

ip
e

lin
e

d
D

a
ta

p
a

th
a

n
d

C
o

n
tro

MEM

Right-to-left

flow leads to

hazards

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 12

Chapter 4 — The Processor — 80

Pipeline registers

 Need registers between stages
 To hold information produced in previous

cycle

Chapter 4 — The Processor — 81

Pipeline Operation

 Cycle-by-cycle flow of instructions
through the pipelined datapath
 “Single-clock-cycle” pipeline diagram

 Shows pipeline usage in a single cycle

 Highlight resources used

 c.f. “multi-clock-cycle” diagram
 Graph of operation over time

 We’ll look at “single-clock-cycle”
diagrams for load & store

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 13

Chapter 4 — The Processor — 82

IF for Load, Store, …

Chapter 4 — The Processor — 83

ID for Load, Store, …

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 14

Chapter 4 — The Processor — 84

EX for Load

Chapter 4 — The Processor — 85

MEM for Load

Morgan Kaufmann Publishers 4 November, 2018

Chapter 4 — The Processor 15

Chapter 4 — The Processor — 86

WB for Load

Wrong

number

Chapter 4 — The Processor — 87

Corrected Datapath for Load

Related Posts