PowerPoint 演示文稿
CO101
Principle of Computer
Organization
Lecture 08: Single Cycle
Processor 1
Liang Yanyan
澳門科技大學
Macau of University of Science and Technology
The Big Picture: Where are We Now?
• The Five Classic Components of a Computer
• Today’s Topic: Design a Single Cycle Processor
2
Control
Datapath
Memory
Processor
Input
Output
The Big Picture: The Performance Perspective
• Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction (CPI)
• Processor design (datapath and control) will determine:
• Clock cycle time
• Clock cycles per instruction
• Single cycle processor:
• Advantage: One clock cycle per instruction
• Disadvantage: long cycle time
3
The Processor
• We’re ready to look at an implementation of the MIPS
• Simplified to contain only:
• memory-reference instructions: lw, sw
• arithmetic-logical instructions: add, addu, sub, subu, and, or,
xor, nor, slt, sltu
• arithmetic-logical immediate instructions: addi, addiu, andi, ori,
xori, slti, sltiu
• control flow instructions: beq, j
• Generic implementation:
• use the program counter (PC) to supply
the instruction address and fetch the
instruction from memory (and update the PC)
• decode the instruction (and read registers)
• execute the instruction
4
How to Design a Processor: step-by-step
1. Analyze instruction set => datapath requirements.
• the meaning of each instruction
• datapath must include storage element for ISA registers
• E.g. 32 registers for MIPS
• datapath must support each register transfer
2. Select set of datapath components.
3. Assemble datapath meeting the requirements.
4. Analyze implementation of each instruction to determine
setting of control signals.
5. Assemble the control logic.
5
Abstract Implementation View
• Two types of functional units:
• elements that operate on data values (combinational)
• elements that contain state (sequential)
• Single cycle operation
• Split memory (Harvard) model – one memory for
instructions and one for data
6
Step 1a: The MIPS subset for today
• ADD and SUB
• add rd, rs, rt
• sub rd, rs, rt
• ADDI
Immediate:
• addi rt, rs,
imm16
• LOAD and
STORE word
• lw rt, imm16(rs)
• sw rt,
imm16(rs)
• BRANCH:
• beq rs, rt,
imm16
7
op rs rt rd shamt funct
0 6 11 16 21 26 31
6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
op rs rt immediate
0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
op rs rt immediate
0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
op rs rt immediate
0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
What does it mean?
8
• inst Register Transfers Update program counter
• ADD R[rd] <= R[rs] + R[rt]; PC <– PC + 4
• SUB R[rd] <= R[rs] – R[rt]; PC <– PC + 4
• ADDI R[rt] <= R[rs] + sign_ext(imm16); PC <– PC + 4
• LOAD R[rt] <= MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4
• STORE MEM[ R[rs] + sign_ext(imm16) ] <= R[rt]; PC <– PC + 4
• BEQ if ( R[rs] == R[rt] ) then PC <= PC + 4 + sign_ext(imm16) × 4
else PC <= PC + 4
Step 1b: Datapath requirements
• Ability to perform computation
• E.g. add & sub
• Storage element
• Store instruction & data
• Store registers (32 x 32)
• read RS
• read RT
• Write RT or RD
• Program Counter (PC)
• Point to the instruction to be loaded
• Add 4 or extended immediate to PC
• PC <= PC + 4
• PC <= PC + 4 + [sign_ext(imm16)] × 4
• Extender
• Sign extend, e.g. lw t1, imm16(t1)
• Zero extend, e.g. addiu t1, t4, 5
9
Step 2: Components of the Datapath
• Computation: Arithmetic Logic Unit (ALU)
• Store instruction and data: Memory
• Store register: Register file
• Choose different PCs: Multiplexer (MUX)
• if … else …
• Sign or Zero extend: Extender
10
Combinational Logic Elements (Basic
Building Blocks)
11
• MUX
• ALU
32
32
A
B
32
Result
OP
32
A
B
32
Y
32
Select
M
U
X
A
L
U
opcode
e.g. condition of if statement,
true, false
zero
zero used to indicate
whether ALU output
is 0 or not.
zero = 1 if ALU
output is 0, otherwise,
Zero = 0.
Storage Element: Register (Basic
Building Block)
• Register
• Similar to the D Flip Flop
except
• N-bit input and output
• Write Enable
• Write Enable:
• 0: register will NOT be
updated
• 1: content of the
register will be updated
at the rising edge of
clock signal
12
Clock
Data In
Write Enable
N N
Data Out
Storage Element: Register File
• Consists of 32 registers (locations):
• Two 32-bit outputs:
ReadData1 and ReadData2.
• One 32-bit input: WriteData.
• RegWrite
• Enable writing data to Register File or not.
• Register is selected by:
• ReadAddr1 (5 bits) selects the register to put on ReadData1.
• ReadAddr2 (5 bits) selects the register to put on ReadData2.
• WriteAddr (5 bits) selects the register (location) to store
data from WriteData when RegWrite is 1.
13
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
RegWrite
Storage Element: Memory
• Memory
• One data input: WriteData.
• One data output: ReadData.
• Memory word is selected by Address.
• Address selects the word to put on ReadData.
• When MemWrite == 1: Address selects the
memory
location to store WriteData.
• When MemRead == 1: Address selects the
memory
location to put on ReadData.
• Number of address bits depends on number of
memory locations.
14
Data
Memory
Address
Write Data
Read Data
MemWrite
MemRead
Clocking Methodologies
• Clocking methodology defines when signals can be read and
when they can be written
clock rate = 1/(clock cycle)
e.g.,10 nsec clock cycle = 100 MHz clock rate
1 nsec clock cycle = 1 GHz clock rate
• State element design choices
• master-slave and edge-triggered flipflops: A clocking methodology
defines when signals can be read and written – would NOT want to
read a signal at the same time it was being written.
15
Our Implementation
• An edge-triggered methodology
• Typical execution
• read contents of some state elements
• send values through some combinational logic
• write results to one or more state elements
• Assumes state elements are written on every clock cycle; if not,
need explicit write control signal
• write occurs only when both the write control is asserted and the clock
edge occurs
16
Step 3: Datapath Assembly
• a. Instruction Fetch
• b. Instruction execution
• Read Operands
• Execute Operation
• Register e.g. R[rd] <= R[rs] op R[rt]
• Immediate e.g. R[rt] <= R[rs] + sign_ext(Imm16)
• Load e.g. R[rt] <= MEM[ R[rs] + sign_ext(Imm16)];
• Store e.g. MEM[ R[rs] + sign_ext(Imm16) ] <= R[rt];
• Branch e.g. PC <= PC + [sign_ext(Imm16)] × 4
• Add datapath step by step
17
3a: Data path for instruction fetch
• Fetch the instructions from Instruction Memory.
• An address input: ReadAddress.
• A data output: Instruction.
• ReadAddress selects the instruction to put on Instruction.
• Update the program counter (PC):
• Sequential Code: PC <= PC + 4
• Branch and Jump: PC <= “something else”
• Connect PC to Instruction Memory.
18
Read
Address
Instruction
Instruction
Memory
Add
PC
4
3b: Execute operation: Register
• R[rd] <= R[rs] op R[rt] Example: add rd, rs, rt
• Register is selected by instruction’s rs, rt, and rd fields.
• Data are sent from Register File to ALU for computation.
• Result is written back to Register File from ALU.
19
op rs rt rd shamt funct
0 6 11 16 21 26 31
6 bits 6 bits 5 bits 5 bits 5 bits 5 bits
Read
Address
Instruction
[31-0]
Instruction
Memory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
ALU
zero
ALUctr RegWrite
Instr[25-21]
Instr[20-16]
Instr[15-11]
3c: Execute operation: Immediate
• R[rt] <= R[rs] op SignExt[imm16] ] Example: addi t0, t1, -10
20
11
op rs rt immediate
0 16 21 26 31
6 bits 16 bits 5 bits 5 bits rd?
immediate
0 16 15 31
16 bits 16 bits
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Changes compared with 3b:
1. ALU input can come from
register or immediate => add
sign extender and Mux
2. Destination can be Rd or Rt
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
RegWrite
Sign
Extend 16 32
ALUSrc
RegDst
1
1
0
0
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15-11]
Read
Address
Instruction
[31-0]
Instruction
Memory
Add
PC
4
ALU
zero
ALUctr
SignExt
Consider the slt Instruction
• Remember the R format instruction slt
slt $t0, $s0, $s1 # if $s0 < $s1 # then $t0 = 1 # else $t0 = 0 • Where does the 1 (or 0) come from to store into $t0 in the Register File at the end of the execute cycle? 21 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 RegWrite Sign Extend 16 32 ALUSrc RegDst 1 1 0 0 Instr[15-0] Instr[25-21] Instr[20-16] Instr[15-11] Read Address Instruction [31-0] Instruction Memory Add PC 4 ALU zero ALUctr SignExt 3d: Execute operation: Load • R[rt] <= Mem[R[rs] + SignExt[imm16]] Example: lw rt, imm16(rs) 22 11 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits rd Changes compared with 3c: Data to Register File can come from ALU result or memory output => add memory and Mux
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
RegWrite
Sign
Extend 16 32
ALUSrc
RegDst
1
1
0
0
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15-11]
Read
Address
Instruction
[31-0]
Instruction
Memory
Add
PC
4
ALU
zero
ALUctr
Data
Memory
Address
Write Data
Read Data 1
0
MemRead
MemtoReg
MemWrite
SignExt
3e: Execute operation: Store
• Mem[ R[rs] + SignExt[imm16]] <= R[rt] ] Example: sw rt, imm16(rs)
23
op rs rt immediate
0 16 21 26 31
6 bits 16 bits 5 bits 5 bits
Changes compared with 3d:
1. Can store data from register
to memory => add datapath
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
RegWrite
Sign
Extend 16 32
ALUSrc
RegDst
1
1
0
0
Instr[15-0]
Instr[25-21]
Instr[20-16]
Instr[15-11]
Read
Address
Instruction
[31-0]
Instruction
Memory
Add
PC
4
ALU
zero
ALUctr
Data
Memory
Address
Write Data
Read Data 1
0
MemRead
MemtoReg
MemWrite
SignExt
3f: The Branch Instruction
• beq rs, rt, Label
• mem[PC] Fetch the instruction from memory
• EQUAL <= if (R[rs] == R[rt]) ? Calculate the branch condition • EQUAL <= 1 if they are equal, EQUAL <= 0 if they are not equal • if (EQUAL eq 1) Calculate the next instruction’s address • PC <- PC + 4 + ( SignExt(imm16) x 4 ) • else • PC <- PC + 4 24 op rs rt immediate 0 16 21 26 31 6 bits 16 bits 5 bits 5 bits 3f: Execute operation: Branch 25 1. New PC can be (PC+4) or (PC+4+4×imm16), => add adder and Mux
2. Check equal, adjust PCsrc based on Zero
calculate next
instruction’s address
Read
Address
Instr[31-0]
Instruction
Memory
Add
PC
4
Write Data
Read Addr 1
Read Addr 2
Write Addr
Register
File
Read
Data 1
Read
Data 2
ALU
zero
RegWrite
Data
Memory
Address
Write Data
Read Data
MemWrite
MemRead
Sign
Extend 16 32
MemtoReg
ALUSrc
Shift
left 2
Add
PCSrc
RegDst
1
1
1
0
0
0
0
1
Instr[15-0]
Instr[25-21]
Instr[20-16]
ALUctr
Instr[15-11]
SignExt
beq rs, rt, Label
3g: Execute operation: Jump
26
Address of target inst. (target PC) = (upper 4 bits of (PC + 4) : offset) << 2 Read Address Instr[31-0] Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero RegWrite Data Memory Address Write Data Read Data MemWrite MemRead Sign Extend 16 32 MemtoReg ALUSrc Shift left 2 Add PCSrc RegDst 1 1 1 0 0 0 0 1 Instr[15-0] Instr[25-21] Instr[20-16] Shift left 2 0 1 32 Instr[25-0] 26 PC+4[31-28] 28 Jump op 26 bit offset value ALUctr Instr[15-11] SignExt Creating a Single Datapath from the Parts • Assemble the datapath elements • Fetch and execute each instructions in one clock cycle – single cycle design • no datapath resource can be used more than once per instruction, so some must be duplicated (e.g., why we have a separate Instruction Memory and Data Memory) • to share datapath elements between two different instruction classes will need multiplexors at the input of the shared elements with control lines to do the selection • Cycle time is determined by length of the longest path 27 Putting it All Together: A Single Cycle Datapath 28 Read Address Instr[31-0] Instruction Memory Add PC 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU zero RegWrite Data Memory Address Write Data Read Data MemWrite MemRead Sign Extend 16 32 MemtoReg ALUsrc Shift left 2 Add PCsrc RegDst 1 1 1 0 0 0 0 1 Instr[15-0] Instr[25-21] Instr[20-16] Instr[15-11] Shift left 2 0 1 32 Instr[25-0] 26 PC+4[31-28] 28 Jump ALUctr SignExt Clock Distribution 29 What have been done in one clock cycle for lw? • Instruction fetch • Instruction decode • Load data from register Rs • Select immediate (imm16) and put to ALU • Calculate memory address • R[rs] + imm16 • Load data from memory • Write data to register R[rt] • Calculate next PC Long cycle time!! 30 Step 4: Designing control signals • Control signals are used to control the datapath to perform necessary operations. • Need to decide what control signals are required such that we can fully control our processor to perform calculation. • Example: add t0, t1, t2 • Need the ALU to perform addition. • As a result, we need a control signal “ALUctr” to control the ALU to perform necessary operation. • Will be addressed in next lecture. 31 Step 5: Adding control logic • Control logic is a component used to determine the value of control signals based on instruction type. • E.g. ALUctr, RegDst, …… 32 ALUctr RegDst ALUSrc ExtOp MemtoReg MemWr Zero Instruction<31:0>
<21:25>
<16:20>
<11:15>
<0:15>
imm16 Rd Rs Rt
PCsrc
Adr
Inst
Memory
DATA PATH
Control logic
Op
<21:25>
Fun
RegWr
An abstract view of the processor
33
Data
Out
Clk
5
Rw Ra Rb
32 32-bit
Registers
Rd
A
L
U
Clk
Data
In
Data
Address
Data
Memory
Instruction
Instruction
Address
Instruction
Memory
C
lk
P
C
5
Rs
5
Rt
32
32
32 32
A
B
N
ex
t A
dd
re
ss
Control logic
Datapath
Control Signals Conditions
Summary
• 5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine
setting of control signals
• 5. Assemble the control logic
• MIPS makes it easier
• Instructions same size
• Source registers always in same place, R[rt], R[rs]
• Immediates same size
• ALU input data always come from registers/immediates
• Single cycle datapath => CPI=1, Cycle time => long
• Next time: implementing control logic.
34
CO101�Principle of Computer Organization
The Big Picture: Where are We Now?
The Big Picture: The Performance Perspective
The Processor
How to Design a Processor: step-by-step
Abstract Implementation View
Step 1a: The MIPS subset for today
What does it mean?
Step 1b: Datapath requirements
Step 2: Components of the Datapath
Combinational Logic Elements (Basic Building Blocks)
Storage Element: Register (Basic Building Block)
Storage Element: Register File
Storage Element: Memory
Clocking Methodologies
Our Implementation
Step 3: Datapath Assembly
3a: Data path for instruction fetch
3b: Execute operation: Register
3c: Execute operation: Immediate
Consider the slt Instruction
3d: Execute operation: Load
3e: Execute operation: Store
3f: The Branch Instruction
3f: Execute �operation: Branch
3g: Execute operation: Jump
Creating a Single Datapath from the Parts
Putting it All Together: A Single Cycle Datapath
Clock Distribution
What have been done in one clock cycle for lw?
Step 4: Designing control signals
Step 5: Adding control logic
An abstract view of the processor
Summary