Digital System Design 4
Lecture 11 – Processor Architecture 3

Computer Architecture

Dr Chang Liu

Course Outline
Week  Lecture  Topic                                     Chapter  Tutorial
1     1        Introduction
1     2        A Historical Perspective
2     3        Modern Technology and Types of Computer
2     4        Computer Performance                      1
3     5        Digital Logic Review                      C
3     6        Instruction Set Architecture 1            2
4     7        Instruction Set Architecture 2            2
4     8        Processor Architecture 1                  4
5     9        Instruction Set Architecture 3            2
5     10       Processor Architecture 2                  4
               Festival of Creative Learning
6     11       Processor Architecture 3                  4
6     12       Processor Architecture 4                  4

Processor Architecture 3 – Chang Liu

This Lecture

• Pipelining

• Hazards


Pipelining: The Laundry Analogy

• Pipelined laundry: overlapping execution

– Parallelism improves performance

• Four loads:

– Speedup = 8/3.5 = 2.3

• Non-stop (n loads):

– Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
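The speedup figures above can be checked with a short calculation (a sketch assuming four 0.5-hour stages, so one load takes 2 hours sequentially):

```python
def laundry_speedup(n_loads, n_stages=4, stage_time=0.5):
    """Speedup of pipelined over sequential laundry for n_loads loads."""
    sequential = n_loads * n_stages * stage_time             # 2 h per load
    pipelined = (n_stages - 1) * stage_time + n_loads * stage_time
    return sequential / pipelined

print(round(laundry_speedup(4), 1))       # four loads: 8 / 3.5 = 2.3
print(round(laundry_speedup(10_000), 1))  # non-stop: approaches 4 (= stages)
```

As the number of loads grows, the fill time of 1.5 hours is amortised away and the speedup approaches the number of stages.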


MIPS Pipeline

Five stages, one step per stage

1. IF: Instruction fetch from memory

2. ID: Instruction decode & register read

3. EX: Execute operation or calculate address

4. MEM: Access memory operand

5. WB: Write result back to register
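The overlap of the five stages can be sketched as a tiny function mapping (instruction, cycle) to a stage; this assumes an ideal, stall-free pipeline and is an illustration, not part of the lecture material:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr_index, cycle):
    """Stage occupied by instruction instr_index at clock cycle
    (both 0-based) in an ideal, stall-free five-stage pipeline."""
    s = cycle - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None

# Instruction 1 is always exactly one stage behind instruction 0.
print(stage_of(0, 2), stage_of(1, 2))  # EX ID
```

Each later instruction is simply shifted one cycle to the right, which is why five instructions can be in flight at once.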


MIPS Pipelined Datapath

[Figure: the single-cycle datapath divided into the five pipeline stages; the WB and MEM stages carry results from right to left.]

• Right-to-left flow leads to hazards

Pipeline Performance

• Assume time for stages is

– 100ps for register read or write

– 200ps for other stages

• Compare pipelined datapath with single-cycle
datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps

Pipeline Performance

[Figure: instruction timing diagrams comparing the single-cycle datapath (Tc = 800 ps) with the pipelined datapath (Tc = 200 ps).]
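Plugging these cycle times into a short calculation shows where the speedup comes from (a sketch assuming the 5-stage pipeline and the 800 ps / 200 ps cycle times above):

```python
def single_cycle_time(n_instr, tc=800):
    """Every instruction takes one long 800 ps clock cycle."""
    return n_instr * tc

def pipelined_time(n_instr, tc=200, n_stages=5):
    """Fill the pipeline (n_stages - 1 cycles), then finish one per cycle."""
    return (n_stages - 1 + n_instr) * tc

print(single_cycle_time(3), pipelined_time(3))           # 2400 1400
print(single_cycle_time(10**6) / pipelined_time(10**6))  # approaches 800/200 = 4
```

For three instructions the pipeline only wins modestly (1400 ps vs. 2400 ps) because of fill time, but for long instruction streams the speedup approaches the ratio of the clock periods.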


Pipeline Speedup

• If all stages are balanced

– i.e., all take the same time

– Time between instructions (pipelined)
  = Time between instructions (non-pipelined) / Number of stages

• If not balanced, speedup is less

• Speedup due to increased throughput

– Latency (time for each instruction) does not
decrease


Hazards

• Situations that prevent starting the next instruction in the next cycle

• Structural hazards
– A required resource is busy

• Data hazards
– Need to wait for previous instruction to complete its data read/write

• Control hazards
– Deciding on control action depends on previous instruction


Structural Hazards

• Conflict for use of a resource

• In MIPS pipeline with a single memory

– Load/store requires data access

– Instruction fetch would have to stall for that cycle

• Would cause a pipeline “bubble”

• Hence, pipelined datapaths require separate
instruction/data memories

– Or separate instruction/data caches
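With a single shared memory, the conflict can be sketched directly: instruction i fetches (IF) in the same cycle that instruction i−3 reaches MEM, so every load or store three instructions back collides with a fetch (a simplified sketch; the 3-cycle offset follows the 5-stage pipeline above):

```python
def fetch_conflicts(ops):
    """Indices of instructions whose IF collides with an older lw/sw
    reaching MEM (3 stages ahead) when there is only one memory."""
    return [i for i in range(3, len(ops)) if ops[i - 3] in ("lw", "sw")]

print(fetch_conflicts(["lw", "add", "add", "add", "sub"]))  # [3]
```

The fetch of instruction 3 must stall for a cycle, inserting a bubble; separate instruction and data memories (or caches) remove the conflict entirely.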


Data Hazards

• An instruction depends on completion of data access by a previous instruction

– add $s0, $t0, $t1
  sub $t2, $s0, $t3


Forwarding (aka Bypassing)

• Use result when it is computed

– Don’t wait for it to be stored in a register

– Requires extra connections in the datapath
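The forwarding choice for one ALU input can be sketched as a mux-select function (signal names here are illustrative, not the exact textbook control signals):

```python
def forward_select(rs, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Pick the source for ALU operand register rs; the newer
    in-flight result (EX/MEM) takes priority over MEM/WB."""
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == rs:
        return "EX/MEM"   # forward ALU result of the previous instruction
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == rs:
        return "MEM/WB"   # forward the value about to be written back
    return "REG"          # no hazard: use the register-file read

# add $s0, $t0, $t1 followed by sub $t2, $s0, $t3:
# $s0 (register 16) is still in the EX/MEM pipeline register.
print(forward_select(16, ex_mem_rd=16, ex_mem_regwrite=True,
                     mem_wb_rd=0, mem_wb_regwrite=False))  # EX/MEM
```

The `rd != 0` guard reflects that MIPS register $zero is hard-wired to 0 and must never be forwarded.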


Load-Use Data Hazard

• Can’t always avoid stalls by forwarding

– If value not computed when needed

– Can’t forward backward in time!


MIPS…

• “…Data hazards can be detected quite easily when the program’s machine code is written by the compiler.

• The original Stanford RISC machine relied on the compiler to add the NOP instructions in this case, rather than having the circuitry to detect and (more taxingly) stall the first two pipeline stages. Hence the name MIPS: Microprocessor without Interlocked Pipeline Stages.

• It turned out that the extra NOP instructions added by the compiler expanded the program binaries enough that the instruction cache hit rate was reduced. The stall hardware, although expensive, was put back into later designs to improve instruction cache hit rate…

• …at which point the acronym no longer makes sense.”

or is it?

http://en.wikipedia.org/wiki/Classic_RISC_pipeline#Solution_B._Pipeline_interlock


Code Scheduling to Avoid Stalls

• Reorder code to avoid use of load result in the next instruction

• C code for A = B + E; C = B + F;

Original order (13 cycles):

  lw   $t1, 0($t0)
  lw   $t2, 4($t0)
       (stall)
  add  $t3, $t1, $t2
  sw   $t3, 12($t0)
  lw   $t4, 8($t0)
       (stall)
  add  $t5, $t1, $t4
  sw   $t5, 16($t0)

Reordered (11 cycles):

  lw   $t1, 0($t0)
  lw   $t2, 4($t0)
  lw   $t4, 8($t0)
  add  $t3, $t1, $t2
  sw   $t3, 12($t0)
  add  $t5, $t1, $t4
  sw   $t5, 16($t0)
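The 13-versus-11 cycle counts can be reproduced with a small model: total cycles = pipeline fill + one cycle per instruction + one bubble per load-use pair (a sketch that assumes full forwarding and, for simplicity, ignores the store's address register):

```python
def load_use_stalls(prog):
    """Count one-cycle load-use stalls: a lw immediately followed by an
    instruction that reads the loaded register (forwarding covers the rest)."""
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(prog, prog[1:]):
        if op == "lw" and dest in srcs:
            stalls += 1
    return stalls

def total_cycles(prog, n_stages=5):
    """Fill cycles + one cycle per instruction + stall bubbles."""
    return (n_stages - 1) + len(prog) + load_use_stalls(prog)

# (op, destination, sources) for the sequences above.
original = [
    ("lw",  "$t1", []),
    ("lw",  "$t2", []),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3"]),
    ("lw",  "$t4", []),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5"]),
]
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]
print(total_cycles(original), total_cycles(reordered))  # 13 11
```

Moving the third `lw` up separates both loads from their users, so the reordered version has no load-use pair and saves the two bubbles.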


Control Hazards

• Branch determines flow of control
– Fetching next instruction depends on branch outcome
– Pipeline can’t always fetch correct instruction
  • Still working on ID stage of branch

• In MIPS pipeline
– Need to compare registers and compute target early in the pipeline
– Add hardware to do it in ID stage


Stall on Branch

• Wait until branch outcome determined before
fetching next instruction


Branch Prediction

• Longer pipelines can’t readily determine
branch outcome early

– Stall penalty becomes unacceptable

• Predict outcome of branch

– Only stall if prediction is wrong

• In MIPS pipeline

– Can predict branches not taken

– Fetch instruction after branch, with no delay


MIPS with Predict Not Taken

[Figure: pipeline diagrams for a correct prediction (no bubble) and an incorrect prediction (a bubble while the correct instruction is fetched).]

More-Realistic Branch Prediction

• Static branch prediction

– Based on typical branch behavior

– Example: loop and if-statement branches
• Predict backward branches taken

• Predict forward branches not taken

• Dynamic branch prediction

– Hardware measures actual branch behavior
• e.g., record recent history of each branch

– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history


Next Lecture

• Pipelined Datapath

• Pipeline Control
