
Digital System Design 4 Lecture 11 – Processor Architecture 3
Computer Architecture Dr Chang Liu

Course Outline

Week | Lecture | Topic                                   | Chapter | Tutorial
1    | 1       | Introduction                            | 1       |
1    | 2       | A Historical Perspective                |         |
2    | 3       | Modern Technology and Types of Computer | 2       |
2    | 4       | Computer Performance                    | 1       |
3    | 5       | Digital Logic Review                    | C       | 3
3    | 6       | Instruction Set Architecture 1          | 2       |
4    | 7       | Instruction Set Architecture 2          | 2       | 4
4    | 8       | Processor Architecture 1                | 4       |
5    | 9       | Instruction Set Architecture 3          | 2       | 5
5    | 10      | Processor Architecture 2                | 4       |
     |         | Festival of Creative Learning           |         |
6    | 11      | Processor Architecture 3                | 4       | 6
6    | 12      | Processor Architecture 4                | 4       |

Processor Architecture 3 – Chang Liu 2

This Lecture
• Pipelining
• Hazards

Pipelining: The Laundry Analogy
• Pipelined laundry: overlapping execution
– Parallelism improves performance
• Four loads:
– Speedup = 8/3.5 ≈ 2.3
• Non-stop (n loads):
– Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
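The slide's two speedup figures can be checked with a small Python sketch, assuming the usual textbook timings: four laundry stages of 0.5 hours each, so one load takes 2 hours sequentially.

```python
def laundry_speedup(n_loads: int) -> float:
    """Speedup of a 4-stage pipelined laundry (0.5 h per stage) over doing
    loads strictly one after another."""
    sequential = 2.0 * n_loads        # each load occupies 4 * 0.5 h on its own
    pipelined = 0.5 * n_loads + 1.5   # one load finishes every 0.5 h once the
                                      # pipe is full; 1.5 h to fill it
    return sequential / pipelined

# Four loads: 8 / 3.5 ≈ 2.3
print(round(laundry_speedup(4), 1))

# As n grows, speedup approaches the number of stages (4)
print(round(laundry_speedup(10_000), 2))
```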

MIPS Pipeline
Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

MIPS Pipelined Datapath
[Datapath figure: the five pipeline stages left to right; results flow back right to left in the MEM and WB stages]
Right-to-left flow leads to hazards

Pipeline Performance
• Assume time for stages is
– 100 ps for register read or write
– 200 ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr    | Instr fetch | Register read | ALU op | Memory access | Register write | Total time
lw       | 200 ps      | 100 ps        | 200 ps | 200 ps        | 100 ps         | 800 ps
sw       | 200 ps      | 100 ps        | 200 ps | 200 ps        |                | 700 ps
R-format | 200 ps      | 100 ps        | 200 ps |               | 100 ps         | 600 ps
beq      | 200 ps      | 100 ps        | 200 ps |               |                | 500 ps
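The totals in the table, and both clock periods on the next slide, follow mechanically from the stage times; a small Python sketch:

```python
# Stage times from the slide: 200 ps, except register read/write (100 ps)
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Which stages each instruction class actually uses
USES = {
    "lw":       ["IF", "ID", "EX", "MEM", "WB"],
    "sw":       ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq":      ["IF", "ID", "EX"],
}

# Total time for each instruction = sum of the stages it uses
single_cycle = {i: sum(STAGE_PS[s] for s in stages) for i, stages in USES.items()}
print(single_cycle)               # matches the Total time column

# A single-cycle clock must fit the slowest instruction;
# a pipelined clock only has to fit the slowest stage.
print(max(single_cycle.values())) # 800 ps
print(max(STAGE_PS.values()))     # 200 ps
```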

Pipeline Performance
[Timing diagrams comparing the two datapaths]
Single-cycle: Tc = 800 ps
Pipelined: Tc = 200 ps

Pipeline Speedup
• If all stages are balanced
– i.e., all take the same time
– Time between instructions (pipelined)
  = Time between instructions (non-pipelined) / Number of stages
• If not balanced, speedup is less
• Speedup is due to increased throughput
– Latency (time for each instruction) does not decrease
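A minimal sketch of this relationship, using the MIPS stage times from the previous slide to show the unbalanced case:

```python
def pipelined_instruction_time(stage_times_ps):
    """Time between instruction completions: set by the slowest stage."""
    return max(stage_times_ps)

def speedup(stage_times_ps):
    """Throughput speedup of pipelined over non-pipelined execution."""
    nonpipelined = sum(stage_times_ps)  # all stages run in series per instruction
    return nonpipelined / pipelined_instruction_time(stage_times_ps)

balanced   = [160] * 5                  # all five stages take the same time
unbalanced = [200, 100, 200, 200, 100]  # the slide's MIPS stage times

print(speedup(balanced))    # 5.0 -> equals the number of stages
print(speedup(unbalanced))  # 4.0 -> 800/200, less than 5 stages
```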

Hazards
• Situations that prevent starting the next instruction in the next cycle
• Structure hazards
– A required resource is busy
• Data hazards
– Need to wait for a previous instruction to complete its data read/write
• Control hazards
– Deciding on a control action depends on a previous instruction

Structure Hazards
• Conflict for use of a resource
• In the MIPS pipeline with a single memory
– Load/store requires data access
– Instruction fetch would have to stall for that cycle
• Would cause a pipeline “bubble”
• Hence, pipelined datapaths require separate instruction/data memories
– Or separate instruction/data caches

Data Hazards
• An instruction depends on completion of data access by a previous instruction
– add $s0, $t0, $t1
  sub $t2, $s0, $t3

Forwarding (aka Bypassing)
• Use result when it is computed
– Don’t wait for it to be stored in a register
– Requires extra connections in the datapath
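A hedged sketch of the forwarding test such a datapath performs, here for the ALU's first operand. The record and field names (`reg_write`, `rd`) are illustrative stand-ins for pipeline-register contents, not names from the slide:

```python
def forward_a(ex_mem, mem_wb, id_ex_rs):
    """Select the ALU's first operand source:
    0 = register file, 1 = forward from EX/MEM, 2 = forward from MEM/WB."""
    # Newest result wins: the instruction one ahead (EX/MEM) takes priority
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == id_ex_rs:
        return 1
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == id_ex_rs:
        return 2
    return 0

# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3:
# when sub reaches EX, add's result sits in EX/MEM -> forward (case 1)
S0 = 16  # register number of $s0
print(forward_a({"reg_write": True, "rd": S0},
                {"reg_write": False, "rd": 0}, id_ex_rs=S0))  # 1
```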

Load-Use Data Hazard
• Can’t always avoid stalls by forwarding
– If value not computed when needed
– Can’t forward backward in time!

MIPS…or is it?
• “…Data hazards can be detected quite easily when the program’s machine code is written by the compiler.
• The original Stanford RISC machine relied on the compiler to add the NOP instructions in this case, rather than having the circuitry to detect and (more taxingly) stall the first two pipeline stages. Hence the name MIPS: Microprocessor without Interlocked Pipeline Stages.
• It turned out that the extra NOP instructions added by the compiler expanded the program binaries enough that the instruction cache hit rate was reduced. The stall hardware, although expensive, was put back into later designs to improve instruction cache hit rate…
• …at which point the acronym no longer makes sense.”
http://en.wikipedia.org/wiki/Classic_RISC_pipeline#Solution_B._Pipeline_interlock

Code Scheduling to Avoid Stalls
• Reorder code to avoid use of a load result in the next instruction
• C code for A = B + E; C = B + F;

Original (two stalls, 13 cycles):
lw   $t1, 0($t0)
lw   $t2, 4($t0)
(stall)
add  $t3, $t1, $t2
sw   $t3, 12($t0)
lw   $t4, 8($t0)
(stall)
add  $t5, $t1, $t4
sw   $t5, 16($t0)

Reordered (no stalls, 11 cycles):
lw   $t1, 0($t0)
lw   $t2, 4($t0)
lw   $t4, 8($t0)
add  $t3, $t1, $t2
sw   $t3, 12($t0)
add  $t5, $t1, $t4
sw   $t5, 16($t0)
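The 13- versus 11-cycle counts can be reproduced with a toy cycle counter, assuming full forwarding and a one-cycle stall only for a load followed immediately by a user of its result (per the previous slides). The tuple encoding of instructions is an illustrative simplification:

```python
def cycles(instrs, stages=5):
    """Cycles for a 5-stage pipeline with forwarding: n instructions take
    n + (stages - 1) cycles, plus one stall per load-use pair.
    Each instruction is (op, dest, sources)."""
    stalls = 0
    for prev, cur in zip(instrs, instrs[1:]):
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1  # load result needed by the very next instruction
    return len(instrs) + (stages - 1) + stalls

original = [
    ("lw",  "$t1", []), ("lw", "$t2", []),
    ("add", "$t3", ["$t1", "$t2"]), ("sw", None, ["$t3"]),
    ("lw",  "$t4", []),
    ("add", "$t5", ["$t1", "$t4"]), ("sw", None, ["$t5"]),
]
scheduled = [
    ("lw",  "$t1", []), ("lw", "$t2", []), ("lw", "$t4", []),
    ("add", "$t3", ["$t1", "$t2"]), ("sw", None, ["$t3"]),
    ("add", "$t5", ["$t1", "$t4"]), ("sw", None, ["$t5"]),
]
print(cycles(original))   # 13  (7 instructions + 4 to fill pipe + 2 stalls)
print(cycles(scheduled))  # 11  (no load-use stalls)
```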

Control Hazards
• Branch determines flow of control
– Fetching next instruction depends on branch outcome
– Pipeline can’t always fetch correct instruction
• Still working on ID stage of branch
• In MIPS pipeline
– Need to compare registers and compute target early in the pipeline
– Add hardware to do it in ID stage

Stall on Branch
• Wait until branch outcome determined before fetching next instruction

Branch Prediction
• Longer pipelines can’t readily determine branch outcome early
– Stall penalty becomes unacceptable
• Predict outcome of branch
– Only stall if prediction is wrong
• In MIPS pipeline
– Can predict branches not taken
– Fetch instruction after branch, with no delay
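To see why mispredictions matter, a sketch of the effective cycles per instruction under predict-not-taken. The branch frequency, taken rate, and penalty below are illustrative numbers, not values from the slide:

```python
def avg_cycles_per_instruction(branch_frac, taken_frac, penalty=1):
    """Effective CPI with predict-not-taken: the penalty is paid only on
    branches that are actually taken (i.e., the prediction was wrong)."""
    return 1 + branch_frac * taken_frac * penalty

# Illustrative: 17% of instructions are branches, 60% of them taken,
# 1-cycle penalty to discard the wrongly fetched fall-through instruction
print(round(avg_cycles_per_instruction(0.17, 0.6), 3))  # 1.102
```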

MIPS with Predict Not Taken
[Pipeline diagrams for the two cases: prediction correct; prediction incorrect (wrongly fetched instruction discarded)]

More-Realistic Branch Prediction
• Static branch prediction
– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
– Hardware measures actual branch behavior
• e.g., record recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history
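A common hardware scheme for recording recent history is a 2-bit saturating counter per branch; the hardware keeps a small table of these indexed by branch address. A Python sketch (the class and its interface are illustrative):

```python
class TwoBitPredictor:
    """One 2-bit saturating counter, as used in simple dynamic predictors."""

    def __init__(self):
        self.state = 0  # 0,1 = predict not taken; 2,3 = predict taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool):
        # Record recent behaviour: step toward taken/not taken, saturating,
        # so one atypical outcome does not immediately flip the prediction
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch taken 9 times, then falling through at loop exit:
p = TwoBitPredictor()
wrong = 0
for taken in [True] * 9 + [False]:
    wrong += (p.predict() != taken)
    p.update(taken)
print(wrong)  # 3: two warm-up misses plus one at loop exit
```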

Next Lecture
• Pipelined Datapath
• Pipeline Control