程序代写代做代考 assembly mips PowerPoint 演示文稿

PowerPoint 演示文稿

CO101
Principle of Computer

Organization
Lecture 08: Single Cycle

Processor 1

Liang Yanyan

澳門科技大學
Macau of University of Science and Technology

The Big Picture: Where are We Now?
• The Five Classic Components of a Computer

• Today’s Topic: Design a Single Cycle Processor

2

Control

Datapath

Memory

Processor
Input

Output

The Big Picture: The Performance Perspective

• Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction (CPI)

• Processor design (datapath and control) will determine:
• Clock cycle time
• Clock cycles per instruction

• Single cycle processor:
• Advantage: One clock cycle per instruction
• Disadvantage: long cycle time

3

The Processor
• We’re ready to look at an implementation of the MIPS
• Simplified to contain only:

• memory-reference instructions: lw, sw
• arithmetic-logical instructions: add, addu, sub, subu, and, or,

xor, nor, slt, sltu
• arithmetic-logical immediate instructions: addi, addiu, andi, ori,

xori, slti, sltiu
• control flow instructions: beq, j

• Generic implementation:
• use the program counter (PC) to supply
the instruction address and fetch the
instruction from memory (and update the PC)
• decode the instruction (and read registers)
• execute the instruction

4

How to Design a Processor: step-by-step
1. Analyze instruction set => datapath requirements.

• the meaning of each instruction
• datapath must include storage element for ISA registers

• E.g. 32 registers for MIPS
• datapath must support each register transfer

2. Select set of datapath components.
3. Assemble datapath meeting the requirements.
4. Analyze implementation of each instruction to determine

setting of control signals.
5. Assemble the control logic.

5

Abstract Implementation View
• Two types of functional units:

• elements that operate on data values (combinational)
• elements that contain state (sequential)

• Single cycle operation
• Split memory (Harvard) model – one memory for

instructions and one for data

6

Step 1a: The MIPS subset for today
• ADD and SUB

• add rd, rs, rt
• sub rd, rs, rt

• ADDI
Immediate:
• addi rt, rs,

imm16
• LOAD and

STORE word
• lw rt, imm16(rs)
• sw rt,

imm16(rs)
• BRANCH:

• beq rs, rt,
imm16

7

op rs rt rd shamt funct
0 6 11 16 21 26 31

6 bits 6 bits 5 bits 5 bits 5 bits 5 bits

op rs rt immediate
0 16 21 26 31

6 bits 16 bits 5 bits 5 bits

op rs rt immediate
0 16 21 26 31

6 bits 16 bits 5 bits 5 bits

op rs rt immediate
0 16 21 26 31

6 bits 16 bits 5 bits 5 bits

What does it mean?

8

• inst Register Transfers Update program counter

• ADD R[rd] <= R[rs] + R[rt]; PC <– PC + 4 • SUB R[rd] <= R[rs] – R[rt]; PC <– PC + 4 • ADDI R[rt] <= R[rs] + sign_ext(imm16); PC <– PC + 4 • LOAD R[rt] <= MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4 • STORE MEM[ R[rs] + sign_ext(imm16) ] <= R[rt]; PC <– PC + 4 • BEQ if ( R[rs] == R[rt] ) then PC <= PC + 4 + sign_ext(imm16) × 4 else PC <= PC + 4 Step 1b: Datapath requirements • Ability to perform computation • E.g. add & sub • Storage element • Store instruction & data • Store registers (32 x 32) • read RS • read RT • Write RT or RD • Program Counter (PC) • Point to the instruction to be loaded • Add 4 or extended immediate to PC • PC <= PC + 4 • PC <= PC + 4 + [sign_ext(imm16)] × 4 • Extender • Sign extend, e.g. lw t1, imm16(t1) • Zero extend, e.g. addiu t1, t4, 5 9 Step 2: Components of the Datapath • Computation: Arithmetic Logic Unit (ALU) • Store instruction and data: Memory • Store register: Register file • Choose different PCs: Multiplexer (MUX) • if … else … • Sign or Zero extend: Extender 10 Combinational Logic Elements (Basic Building Blocks) 11 • MUX • ALU 32 32 A B 32 Result OP 32 A B 32 Y 32 Select M U X A L U opcode e.g. condition of if statement, true, false zero zero used to indicate whether ALU output is 0 or not. zero = 1 if ALU output is 0, otherwise, Zero = 0. Storage Element: Register (Basic Building Block) • Register • Similar to the D Flip Flop except • N-bit input and output • Write Enable • Write Enable: • 0: register will NOT be updated • 1: content of the register will be updated at the rising edge of clock signal 12 Clock Data In Write Enable N N Data Out Storage Element: Register File • Consists of 32 registers (locations): • Two 32-bit outputs: ReadData1 and ReadData2. • One 32-bit input: WriteData. • RegWrite • Enable writing data to Register File or not. • Register is selected by: • ReadAddr1 (5 bits) selects the register to put on ReadData1. • ReadAddr2 (5 bits) selects the register to put on ReadData2. • WriteAddr (5 bits) selects the register (location) to store data from WriteData when RegWrite is 1. 13 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 RegWrite Storage Element: Memory • Memory • One data input: WriteData. • One data output: ReadData. • Memory word is selected by Address. • Address selects the word to put on ReadData. • When MemWrite == 1: Address selects the memory location to store WriteData. • When MemRead == 1: Address selects the memory location to put on ReadData. • Number of address bits depends on number of memory locations. 14 Data Memory Address Write Data Read Data MemWrite MemRead Clocking Methodologies • Clocking methodology defines when signals can be read and when they can be written clock rate = 1/(clock cycle) e.g.,10 nsec clock cycle = 100 MHz clock rate 1 nsec clock cycle = 1 GHz clock rate • State element design choices • master-slave and edge-triggered flipflops: A clocking methodology defines when signals can be read and written – would NOT want to read a signal at the same time it was being written. 15 Our Implementation • An edge-triggered methodology • Typical execution • read contents of some state elements • send values through some combinational logic • write results to one or more state elements • Assumes state elements are written on every clock cycle; if not, need explicit write control signal • write occurs only when both the write control is asserted and the clock edge occurs 16 Step 3: Datapath Assembly • a. Instruction Fetch • b. Instruction execution • Read Operands • Execute Operation • Register e.g. R[rd] <= R[rs] op R[rt] • Immediate e.g. R[rt] <= R[rs] + sign_ext(Imm16) • Load e.g. R[rt] <= MEM[ R[rs] + sign_ext(Imm16)]; • Store e.g. MEM[ R[rs] + sign_ext(Imm16) ] <= R[rt]; • Branch e.g. PC <= PC + [sign_ext(Imm16)] × 4 • Add datapath step by step 17 3a: Data path for instruction fetch • Fetch the instructions from Instruction Memory. • An address input: ReadAddress. • A data output: Instruction. • ReadAddress selects the instruction to put on Instruction. • Update the program counter (PC): • Sequential Code: PC <= PC + 4 • Branch and Jump: PC <= “something else” • Connect PC to Instruction Memory. 18 Read Address Instruction Instruction Memory Add PC 4 3b: Execute operation: Register • R[rd] <= R[rs] op R[rt] Example: add rd, rs, rt • Register is selected by instruction’s rs, rt, and rd fields. • Data are sent from Register File to ALU for computation. • Result is written back to Register File from ALU. 19 op rs rt rd shamt funct 0 6 11 16 21 26 31 6 bits 6 bits 5 bits 5 bits 5 bits 5 bits Read Address Instruction [31-0] Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero ALUctr RegWrite Instr[25-21] Instr[20-16] Instr[15-11] 3c: Execute operation: Immediate • R[rt] <= R[rs] op SignExt[imm16] ] Example: addi t0, t1, -10 20 11 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits rd? immediate 0 16 15 31 16 bits 16 bits 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Changes compared with 3b: 1. ALU input can come from register or immediate => add
sign extender and Mux

2. Destination can be Rd or Rt

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read
Data 1

Read
Data 2

RegWrite

Sign
Extend 16 32

ALUSrc
RegDst

1

1

0
0

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15-11]

Read
Address

Instruction
[31-0]

Instruction
Memory

Add

PC

4

ALU
zero

ALUctr

SignExt

Consider the slt Instruction
• Remember the R format instruction slt

slt $t0, $s0, $s1 # if $s0 < $s1 # then $t0 = 1 # else $t0 = 0 • Where does the 1 (or 0) come from to store into $t0 in the Register File at the end of the execute cycle? 21 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 RegWrite Sign Extend 16 32 ALUSrc RegDst 1 1 0 0 Instr[15-0] Instr[25-21] Instr[20-16] Instr[15-11] Read Address Instruction [31-0] Instruction Memory Add PC 4 ALU zero ALUctr SignExt 3d: Execute operation: Load • R[rt] <= Mem[R[rs] + SignExt[imm16]] Example: lw rt, imm16(rs) 22 11 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits rd Changes compared with 3c: Data to Register File can come from ALU result or memory output => add memory and Mux

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read
Data 1

Read
Data 2

RegWrite

Sign
Extend 16 32

ALUSrc
RegDst

1

1

0
0

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15-11]

Read
Address

Instruction
[31-0]

Instruction
Memory

Add

PC

4

ALU

zero

ALUctr

Data
Memory

Address

Write Data

Read Data 1

0

MemRead

MemtoReg

MemWrite

SignExt

3e: Execute operation: Store
• Mem[ R[rs] + SignExt[imm16]] <= R[rt] ] Example: sw rt, imm16(rs) 23 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits Changes compared with 3d: 1. Can store data from register to memory => add datapath

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read
Data 1

Read
Data 2

RegWrite

Sign
Extend 16 32

ALUSrc
RegDst

1

1

0
0

Instr[15-0]

Instr[25-21]

Instr[20-16]

Instr[15-11]

Read
Address

Instruction
[31-0]

Instruction
Memory

Add

PC

4

ALU

zero

ALUctr

Data
Memory

Address

Write Data

Read Data 1

0

MemRead

MemtoReg

MemWrite

SignExt

3f: The Branch Instruction

• beq rs, rt, Label

• mem[PC] Fetch the instruction from memory

• EQUAL <= if (R[rs] == R[rt]) ? Calculate the branch condition • EQUAL <= 1 if they are equal, EQUAL <= 0 if they are not equal • if (EQUAL eq 1) Calculate the next instruction’s address • PC <- PC + 4 + ( SignExt(imm16) x 4 ) • else • PC <- PC + 4 24 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits 3f: Execute operation: Branch 25 1. New PC can be (PC+4) or (PC+4+4×imm16), => add adder and Mux

2. Check equal, adjust PCsrc based on Zero

calculate next
instruction’s address

Read
Address

Instr[31-0]

Instruction
Memory

Add

PC

4

Write Data

Read Addr 1

Read Addr 2

Write Addr

Register

File

Read
Data 1

Read
Data 2

ALU

zero

RegWrite

Data
Memory

Address

Write Data

Read Data

MemWrite

MemRead

Sign
Extend 16 32

MemtoReg
ALUSrc

Shift
left 2

Add

PCSrc

RegDst

1

1

1

0
0

0

0

1

Instr[15-0]

Instr[25-21]

Instr[20-16]

ALUctr

Instr[15-11]

SignExt

beq rs, rt, Label

3g: Execute operation: Jump

26

Address of target inst. (target PC) = (upper 4 bits of (PC + 4) : offset) << 2 Read Address Instr[31-0] Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero RegWrite Data Memory Address Write Data Read Data MemWrite MemRead Sign Extend 16 32 MemtoReg ALUSrc Shift left 2 Add PCSrc RegDst 1 1 1 0 0 0 0 1 Instr[15-0] Instr[25-21] Instr[20-16] Shift left 2 0 1 32 Instr[25-0] 26 PC+4[31-28] 28 Jump op 26 bit offset value ALUctr Instr[15-11] SignExt Creating a Single Datapath from the Parts • Assemble the datapath elements • Fetch and execute each instructions in one clock cycle – single cycle design • no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory) • to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection • Cycle time is determined by length of the longest path 27 Putting it All Together: A Single Cycle Datapath 28 Read Address Instr[31-0] Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero RegWrite Data Memory Address Write Data Read Data MemWrite MemRead Sign Extend 16 32 MemtoReg ALUsrc Shift left 2 Add PCsrc RegDst 1 1 1 0 0 0 0 1 Instr[15-0] Instr[25-21] Instr[20-16] Instr[15-11] Shift left 2 0 1 32 Instr[25-0] 26 PC+4[31-28] 28 Jump ALUctr SignExt Clock Distribution 29 What have been done in one clock cycle for lw? • Instruction fetch • Instruction decode • Load data from register Rs • Select immediate (imm16) and put to ALU • Calculate memory address • R[rs] + imm16 • Load data from memory • Write data to register R[rt] • Calculate next PC Long cycle time!! 30 Step 4: Designing control signals • Control signals are used to control the datapath to perform necessary operations. • Need to decide what control signals are required such that we can fully control our processor to perform calculation. • Example: add t0, t1, t2 • Need the ALU to perform addition. • As a result, we need a control signal “ALUctr” to control the ALU to perform necessary operation. • Will be addressed in next lecture. 31 Step 5: Adding control logic • Control logic is a component used to determine the value of control signals based on instruction type. • E.g. ALUctr, RegDst, …… 32 ALUctr RegDst ALUSrc ExtOp MemtoReg MemWr Zero Instruction<31:0>

<21:25>

<16:20>

<11:15>

<0:15>
imm16 Rd Rs Rt

PCsrc

Adr

Inst
Memory

DATA PATH

Control logic

Op

<21:25>

Fun

RegWr

An abstract view of the processor

33

Data
Out

Clk

5

Rw Ra Rb
32 32-bit
Registers

Rd

A
L

U

Clk

Data
In

Data
Address

Data
Memory

Instruction

Instruction
Address

Instruction
Memory

C
lk

P

C

5
Rs

5
Rt

32

32
32 32

A

B

N
ex

t A
dd

re
ss

Control logic

Datapath

Control Signals Conditions

Summary
• 5 steps to design a processor

• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine

setting of control signals
• 5. Assemble the control logic

• MIPS makes it easier
• Instructions same size
• Source registers always in same place, R[rt], R[rs]
• Immediates same size
• ALU input data always come from registers/immediates

• Single cycle datapath => CPI=1, Cycle time => long
• Next time: implementing control logic.

34

CO101�Principle of Computer Organization
The Big Picture: Where are We Now?
The Big Picture: The Performance Perspective
The Processor
How to Design a Processor: step-by-step
Abstract Implementation View
Step 1a: The MIPS subset for today
What does it mean?
Step 1b: Datapath requirements
Step 2: Components of the Datapath
Combinational Logic Elements (Basic Building Blocks)
Storage Element: Register (Basic Building Block)
Storage Element: Register File
Storage Element: Memory
Clocking Methodologies
Our Implementation
Step 3: Datapath Assembly
3a: Data path for instruction fetch
3b: Execute operation: Register
3c: Execute operation: Immediate
Consider the slt Instruction
3d: Execute operation: Load
3e: Execute operation: Store
3f: The Branch Instruction
3f: Execute �operation: Branch
3g: Execute operation: Jump
Creating a Single Datapath from the Parts
Putting it All Together: A Single Cycle Datapath
Clock Distribution
What have been done in one clock cycle for lw?
Step 4: Designing control signals
Step 5: Adding control logic
An abstract view of the processor
Summary