Single Cycle Processor Design
COE 301
Computer Organization
Dr. Muhamed Mudawar
College of Computer Sciences and Engineering
King Fahd University of Petroleum and Minerals
Single Cycle Processor Design COE 301 – Computer Organization © Muhamed Mudawar – slide ‹#›
Presentation Outline
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
Main, ALU, and PC Control
Here is an outline of today's lecture.
Mainly, we will be building a datapath step by step for a subset of the MIPS instruction set.
Designing a Processor: Step-by-Step
Analyze instruction set => datapath requirements
The meaning of each instruction is given by the register transfers
Datapath must include storage elements for ISA registers
Datapath must support each register transfer
Select datapath components and clocking methodology
Assemble datapath meeting the requirements
Analyze implementation of each instruction
Determine the setting of control signals for register transfer
Assemble the control logic
Review of MIPS Instruction Formats
All instructions are 32-bit wide
Three instruction formats: R-type, I-type, and J-type
Op6: 6-bit opcode of the instruction
Rs5, Rt5, Rd5: 5-bit source and destination register numbers
sa5: 5-bit shift amount used by shift instructions
funct6: 6-bit function field for R-type instructions
immediate16: 16-bit immediate constant or PC-relative offset
address26: 26-bit target address of the jump instruction
R-type: Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
I-type: Op6 | Rs5 | Rt5 | immediate16
J-type: Op6 | address26
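The field layout above can be illustrated with a minimal Python sketch that slices a 32-bit instruction word into its fields. The function name `decode` and the dictionary keys are hypothetical, chosen only for this example; the bit positions follow the standard MIPS R-, I-, and J-type layouts.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    instr &= 0xFFFFFFFF
    return {
        "op": (instr >> 26) & 0x3F,     # Op6: bits 31..26
        "rs": (instr >> 21) & 0x1F,     # Rs5: bits 25..21
        "rt": (instr >> 16) & 0x1F,     # Rt5: bits 20..16
        "rd": (instr >> 11) & 0x1F,     # Rd5: bits 15..11 (R-type only)
        "sa": (instr >> 6) & 0x1F,      # sa5: bits 10..6 (R-type only)
        "funct": instr & 0x3F,          # funct6: bits 5..0 (R-type only)
        "imm16": instr & 0xFFFF,        # immediate16: bits 15..0 (I-type only)
        "addr26": instr & 0x3FFFFFF,    # address26: bits 25..0 (J-type only)
    }


# Encoding of "add $3, $1, $2": op = 0, rs = 1, rt = 2, rd = 3, sa = 0, funct = 0x20
fields = decode(0x00221820)
```

All fields are extracted unconditionally; which of them are meaningful depends on the format selected by the opcode.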
MIPS Subset of Instructions
Only a subset of the MIPS instructions is considered
ALU instructions (R-type): add, sub, and, or, xor, slt
Immediate instructions (I-type): addi, slti, andi, ori, xori
Load and Store (I-type): lw, sw
Branch (I-type): beq, bne
Jump (J-type): j
This subset does not include all the integer instructions
But sufficient to illustrate design of datapath and control
Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers
In today's lecture, I will show you how to implement the following subset of MIPS instructions: add, subtract, or immediate, load, store, branch, and jump.
The add and subtract instructions use the R format. The Op and funct fields together specify the particular add or subtract instruction.
Rs and Rt specify the source registers, and the Rd field specifies the destination register.
The or immediate instruction uses the I format. It uses only one source register, Rs. The other operand comes from the immediate field. The Rt field specifies the destination register. (Note that the destination is the Rt field!)
Both the load and store instructions use the I format, and both add Rs and the immediate field together to form the memory address.
The difference is that the load instruction loads data from memory into Rt, while the store instruction stores the data in Rt into memory.
The branch on equal instruction also uses the I format. Here Rs and Rt specify the registers to compare.
If these two registers are equal, we branch to a location offset by the immediate field.
Finally, the jump instruction uses the J format and always causes the program to jump to the memory location specified in the address field.
I know I went over this rather quickly and you may have missed something. But don't worry, this is just an overview. You will keep seeing these formats all day today.
Details of the MIPS Subset
Instruction | Meaning | Format
add rd, rs, rt | addition | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x20
sub rd, rs, rt | subtraction | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x22
and rd, rs, rt | bitwise and | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x24
or rd, rs, rt | bitwise or | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x25
xor rd, rs, rt | exclusive or | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x26
slt rd, rs, rt | set on less than | op6 = 0, rs5, rt5, rd5, sa5 = 0, funct6 = 0x2a
addi rt, rs, imm16 | add immediate | op6 = 0x08, rs5, rt5, imm16
slti rt, rs, imm16 | slt immediate | op6 = 0x0a, rs5, rt5, imm16
andi rt, rs, imm16 | and immediate | op6 = 0x0c, rs5, rt5, imm16
ori rt, rs, imm16 | or immediate | op6 = 0x0d, rs5, rt5, imm16
xori rt, rs, imm16 | xor immediate | op6 = 0x0e, rs5, rt5, imm16
lw rt, imm16(rs) | load word | op6 = 0x23, rs5, rt5, imm16
sw rt, imm16(rs) | store word | op6 = 0x2b, rs5, rt5, imm16
beq rs, rt, offset16 | branch if equal | op6 = 0x04, rs5, rt5, offset16
bne rs, rt, offset16 | branch not equal | op6 = 0x05, rs5, rt5, offset16
j address26 | jump | op6 = 0x02, address26
Register Transfer Level (RTL)
RTL is a description of data flow between registers
RTL gives a meaning to the instructions
All instructions are fetched from memory at address PC
Instruction RTL Description
ADD Reg(rd) ← Reg(rs) + Reg(rt); PC ← PC + 4
SUB Reg(rd) ← Reg(rs) – Reg(rt); PC ← PC + 4
ORI Reg(rt) ← Reg(rs) | zero_ext(imm16); PC ← PC + 4
LW Reg(rt) ← MEM[Reg(rs) + sign_ext(imm16)]; PC ← PC + 4
SW MEM[Reg(rs) + sign_ext(imm16)] ← Reg(rt); PC ← PC + 4
BEQ if (Reg(rs) == Reg(rt))
PC ← PC + 4 + 4 × sign_ext(offset16)
else PC ← PC + 4
Instruction Fetch/Execute
R-type Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Reg(rt)
Execute operation: ALU_result ← func(data1, data2)
Write ALU result: Reg(rd) ← ALU_result
Next PC address: PC ← PC + 4
I-type Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Extend(imm16)
Execute operation: ALU_result ← op(data1, data2)
Write ALU result: Reg(rt) ← ALU_result
Next PC address: PC ← PC + 4
BEQ Fetch instruction: Instruction ← MEM[PC]
Fetch operands: data1 ← Reg(rs), data2 ← Reg(rt)
Equality: zero ← subtract(data1, data2)
Branch: if (zero) PC ← PC + 4 + 4×sign_ext(offset16)
else PC ← PC + 4
Instruction Fetch/Execute – cont’d
LW Fetch instruction: Instruction ← MEM[PC]
Fetch base register: base ← Reg(rs)
Calculate address: address ← base + sign_extend(imm16)
Read memory: data ← MEM[address]
Write register Rt: Reg(rt) ← data
Next PC address: PC ← PC + 4
SW Fetch instruction: Instruction ← MEM[PC]
Fetch registers: base ← Reg(rs), data ← Reg(rt)
Calculate address: address ← base + sign_extend(imm16)
Write memory: MEM[address] ← data
Next PC address: PC ← PC + 4
Jump Fetch instruction: Instruction ← MEM[PC]
Target PC address: target ← PC[31:28] || address26 || ‘00’
Jump: PC ← target
(|| denotes concatenation)
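As a rough illustration of the two non-sequential next-PC computations in the RTL, here is a small Python sketch. The helper names `sign_ext16`, `branch_target`, and `jump_target` are assumptions made for this example; the arithmetic follows the branch RTL (PC + 4 + 4 × sign_ext(offset16)) and the jump RTL (PC[31:28] || address26 || '00').

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, offset16: int) -> int:
    """PC + 4 + 4 * sign_ext(offset16), wrapped to 32 bits."""
    return (pc + 4 + 4 * sign_ext16(offset16)) & MASK32


def jump_target(pc: int, address26: int) -> int:
    """PC[31:28] || address26 || '00'."""
    return (pc & 0xF0000000) | ((address26 & 0x3FFFFFF) << 2)
```

An offset of -1 (0xFFFF) branches back to the branch instruction itself, because the offset is relative to PC + 4.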
Requirements of the Instruction Set
Memory
Instruction memory where instructions are stored
Data memory where data is stored
Registers
31 × 32-bit general purpose registers (R1 to R31); R0 is always zero and needs no storage
Read source register Rs
Read source register Rt
Write destination register Rt or Rd
Program counter PC register and Adder to increment PC
Sign and Zero extender for immediate constant
ALU for executing instructions
Next . . .
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
Main, ALU, and PC Control
Components of the Datapath
Combinational Elements
ALU, Adder
Immediate extender
Multiplexers
Storage Elements
Instruction memory
Data memory
PC register
Register file
Clocking methodology
Timing of writes
[Figure: datapath components. Instruction Memory (32-bit Address in, 32-bit Instruction out); 2-input mux (inputs 0 and 1, select); Extend (16 bits in, 32 bits out, ExtOp); ALU (two 32-bit inputs, ALUOp, 32-bit ALU result, zero, overflow); PC (32 bits, clk); Registers (5-bit RA, RB, RW; 32-bit BusA, BusB, BusW; RegWrite, clk); Data Memory (32-bit Address, Data_in, Data_out; MemRead, MemWrite, clk)]
Register Element
Register
Similar to the D-type Flip-Flop
n-bit input and output
Write Enable (WE):
Enable / disable writing of register
Negated (0): Data_Out will not change
Asserted (1): Data_Out will become Data_In after clock edge
Edge triggered Clocking
Register output is modified at clock edge
[Figure: n-bit register with inputs Data_In (n bits), Write Enable (WE), and Clock, and output Data_Out (n bits)]
As far as storage elements are concerned, we will need an N-bit register that is similar to the D flip-flop I showed you in class.
The significant difference here is that the register has a Write Enable input.
That is, the content of the register will NOT be updated if Write Enable is not asserted (0).
The content is updated at the clock tick ONLY if the Write Enable signal is asserted (1).
MIPS Register File
Register File consists of 31 × 32-bit registers
BusA and BusB: 32-bit output busses for reading 2 registers
BusW: 32-bit input bus for writing a register when RegWrite is 1
Two registers read and one written in a cycle
Registers are selected by:
RA selects register to be read on BusA
RB selects register to be read on BusB
RW selects the register to be written
Clock input
The clock input is used ONLY during write operation
During read, register file behaves as a combinational logic block
RA or RB valid => BusA or BusB valid after access time
[Figure: Register File block with 5-bit select inputs RA, RB, and RW, 32-bit outputs BusA and BusB, 32-bit input BusW, and RegWrite and Clock inputs]
We will also need a register file that consists of 32 32-bit registers with two output busses (busA and busB) and one input bus.
The register specifiers Ra and Rb select the registers to put on busA and busB respectively.
When Write Enable is 1, the register specifier Rw selects the register to be written via busW.
In our simplified version of the register file, the write operation occurs at the clock tick.
Keep in mind that the clock input is a factor ONLY during the write operation.
During read operation, the register file behaves as a combinational logic block.
That is if you put a valid value on Ra, then bus A will become valid after the register file’s access time.
Similarly if you put a valid value on Rb, bus B will become valid after the register file’s access time. In both cases (Ra and Rb), the clock input is not a factor.
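The read and write behavior described above can be mimicked with a small behavioral model in Python. This is only a sketch: the class name `RegisterFile` and the method names `read` and `clock_edge` are hypothetical, and hardware timing (access time, setup, hold) is not modeled.

```python
class RegisterFile:
    """Behavioral model: combinational reads, clocked write, R0 fixed at zero."""

    def __init__(self) -> None:
        # Index 0 is never written, so reading R0 always returns 0
        self.regs = [0] * 32

    def read(self, ra: int, rb: int) -> tuple:
        """Combinational read: returns (BusA, BusB) for select inputs RA and RB."""
        return self.regs[ra & 31], self.regs[rb & 31]

    def clock_edge(self, rw: int, bus_w: int, reg_write: int) -> None:
        """At the clock edge, write BusW into register RW only if RegWrite is 1."""
        if reg_write and (rw & 31) != 0:
            self.regs[rw & 31] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=123, reg_write=1)   # R5 is written at the clock edge
rf.clock_edge(rw=6, bus_w=456, reg_write=0)   # RegWrite = 0, so R6 is unchanged
bus_a, bus_b = rf.read(5, 6)
```

Reads take no clock argument at all, which mirrors the point that the clock matters only for writes.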
Details of the Register File
[Figure: register file internals. 5-bit RA and RB decoders enable tri-state buffers that drive the 32-bit BusA and BusB from registers R1 to R31, with constant "0" driven when R0 is selected; a 5-bit RW decoder, gated by RegWrite, asserts the write enable (WE) of one register, which loads the 32-bit BusW at the Clock edge. R0 is not used]
Tri-State Buffers
Allow multiple sources to drive a single bus
Two Inputs:
Data_in
Enable (to enable output)
One Output: Data_out
If (Enable) Data_out = Data_in
else Data_out = High Impedance state (output is disconnected)
Tri-state buffers can be used to build multiplexors
[Figure: a tri-state buffer (Data_in, Enable, Data_out) and a 2-input multiplexor built from two tri-state buffers (Data_0, Data_1, Select, Output)]
Building a Multifunction ALU
[Figure: multifunction ALU with 32-bit inputs A and B, a 5-bit Shift Amount, a Shifter, an Adder with carry-in c0, a Logic Unit, and an output mux producing the 32-bit ALU Result, plus sign, zero, and overflow signals]
Shift/Rotate Operation (2 bits): SLL = 00, SRL = 01, SRA = 10, ROR = 11
Arithmetic Operation (1 bit): ADD = 0, SUB = 1
Logical Operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
ALU Selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11
SLT: the ALU does a SUB and checks the sign and overflow
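A behavioral Python sketch of the arithmetic and logic parts of this ALU may help make the SLT rule concrete. The function name `alu` and the string operation names are assumptions for this example, and the shifter is left out. SLT is computed as the sign of A - B XOR-ed with the overflow of that subtraction.

```python
MASK32 = 0xFFFFFFFF


def alu(a: int, b: int, op: str) -> tuple:
    """Return (result, zero, overflow) for 32-bit operands a and b."""
    a &= MASK32
    b &= MASK32
    overflow = False
    if op in ("ADD", "SUB", "SLT"):
        # SUB and SLT add the 1's complement of b with carry-in c0 = 1
        operand = b if op == "ADD" else (~b & MASK32)
        carry_in = 0 if op == "ADD" else 1
        total = (a + operand + carry_in) & MASK32
        # Signed overflow: both adder inputs have a sign that the sum does not
        overflow = ((a ^ total) & (operand ^ total) & 0x80000000) != 0
        if op == "SLT":
            sign = (total >> 31) & 1
            result = sign ^ int(overflow)   # a < b iff sign differs from overflow
        else:
            result = total
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "NOR":
        result = ~(a | b) & MASK32
    elif op == "XOR":
        result = a ^ b
    else:
        raise ValueError(f"unknown ALU operation: {op}")
    return result, result == 0, overflow
```

Checking the sign alone would give the wrong answer when the subtraction overflows, for example when comparing the most negative integer with 1.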
Details of the Shifter
Implemented with multiplexers and wiring
Shift Operation can be: SLL, SRL, SRA, or ROR
Input Data is extended to 63 bits according to Shift Op
The 63 bits are shifted right according to S4S3S2S1S0
[Figure: shifter datapath. An Extender, controlled by the 2-bit Shift op, widens the 32-bit Data to 63 bits; five mux stages controlled by S4, S3, S2, S1, and S0 shift right by 0 or 16 bits, 0 or 8 bits, 0 or 4 bits, 0 or 2 bits, and 0 or 1 bit, narrowing the data from 63 to 47, 39, 35, 33, and finally 32 bits of Data_out; the 5-bit shift amount sa and an SLL signal determine S4 to S0]
Details of the Shifter – cont’d
Input data is extended from 32 to 63 bits as follows:
If shift op = SRL then ext_data[62:0] = 0^31 || data[31:0]
If shift op = SRA then ext_data[62:0] = data[31]^31 || data[31:0]
If shift op = ROR then ext_data[62:0] = data[30:0] || data[31:0]
If shift op = SLL then ext_data[62:0] = data[31:0] || 0^31
(x^31 denotes bit x repeated 31 times)
For SRL, the 32-bit input data is zero-extended to 63 bits
For SRA, the 32-bit input data is sign-extended to 63 bits
For ROR, 31-bit extension = lower 31 bits of data
Then, shift right according to the shift amount
As the extended data is shifted right, the upper bits will be: 0 (SRL), sign-bit (SRA), or lower bits of data (ROR)
Implementing Shift Left Logical
The wiring of the above shifter dictates a right shift
However, we can convert a left shift into a right shift
For SLL, 31 zeros are appended to the right of data
To shift left by 0 is equivalent to shifting right by 31
To shift left by 1 is equivalent to shifting right by 30
To shift left by 31 is equivalent to shifting right by 0
Therefore, for SLL use the 1’s complement of the shift amount
ROL is equivalent to ROR if we use (32 – rotate amount)
ROL by 10 bits is equivalent to ROR by (32–10) = 22 bits
Therefore, software can convert ROL to ROR
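The extend-then-shift-right scheme, including the SLL trick, can be checked with a short Python sketch. The function name `shift` and the string operation names are hypothetical; the model extends the data to 63 bits as described, uses the 1's complement of the shift amount for SLL, and then does a single right shift.

```python
def shift(data: int, sa: int, op: str) -> int:
    """Shift or rotate 32-bit data by sa bits using only a right shift of 63 bits."""
    data &= 0xFFFFFFFF
    sa &= 31
    low31 = (1 << 31) - 1
    if op == "SRL":
        ext = data                                   # 31 zeros || data
    elif op == "SRA":
        sign = (data >> 31) & 1
        ext = ((low31 if sign else 0) << 32) | data  # 31 sign bits || data
    elif op == "ROR":
        ext = ((data & low31) << 32) | data          # data[30:0] || data
    elif op == "SLL":
        ext = data << 31                             # data || 31 zeros
        sa = ~sa & 31                                # 1's complement: 31 - sa
    else:
        raise ValueError(f"unknown shift operation: {op}")
    return (ext >> sa) & 0xFFFFFFFF


x = 0x12345678
rol_by_10 = shift(x, 32 - 10, "ROR")   # ROL by 10 done as ROR by 22
```

Because ROL by n equals ROR by 32 - n, the model needs no separate rotate-left case.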
Instruction and Data Memories
Instruction memory needs only provide read access
Because datapath does not write instructions
Behaves as combinational logic for read
Address selects Instruction after access time
Data Memory is used for load and store
MemRead: enables output on Data_out
Address selects the word to put on Data_out
MemWrite: enables writing of Data_in
Address selects the memory word to be written
The Clock synchronizes the write operation
Separate instruction and data memories
Later, we will replace them with caches
[Figure: Data Memory block with 32-bit Address, Data_in, and Data_out, plus MemRead, MemWrite, and Clock inputs; Instruction Memory block with 32-bit Address input and 32-bit Instruction output]
The last storage element you will need for the datapath is the idealized memory to store your data and instructions.
This idealized memory block has just one input bus (DataIn) and one output bus (DataOut).
When Write Enable is 0, the address selects the memory word to put on the Data Out bus.
When Write Enable is 1, the address selects the memory word to be written via the DataIn bus at the next clock tick.
Once again, the clock input is a factor ONLY during the write operation.
During read operation, it behaves as a combinational logic block.
That is if you put a valid value on the address lines, the output bus DataOut will become valid after the access time of the memory.
Clocking Methodology
Clocks are needed in sequential logic to decide when a state element (register) should be updated
To ensure correctness, a clocking methodology defines when data can be written and read
[Figure: Register 1 feeding Combinational logic feeding Register 2, both registers driven by the same clock, with the rising and falling edges marked]
We assume edge-triggered clocking
All state changes occur on the same clock edge
Data must be valid and stable before arrival of clock edge
Edge-triggered clocking allows a register to be read and written during same clock cycle
Determining the Clock Cycle
With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register
Tcycle ≥ Tclk-q + Tmax_comb + Ts
[Figure: timing diagram for Register 1, Combinational logic, and Register 2, marking Tclk-q, Tmax_comb, Ts, and Th around the writing clock edge]
Tclk-q : clock to output delay through register
Tmax_comb : longest delay through combinational logic
Ts : setup time that input to a register must be stable before arrival of clock edge
Th: hold time that input to a register must hold after arrival of clock edge
Hold time (Th) is normally satisfied since Tclk-q > Th
Clock Skew
Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
Clock skew is the difference in absolute time between when two storage elements see a clock edge
With a clock skew, the clock cycle time is increased
Clock skew is reduced by balancing the clock delays
Tcycle ≥ Tclk-q + Tmax_comb + Ts + Tskew
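To see how the terms add up, here is a tiny Python sketch of the cycle-time bound. The function name `min_cycle_time` and all delay values are hypothetical, picked only to make the arithmetic easy to follow; they do not come from any real design.

```python
def min_cycle_time(t_clk_q: int, t_max_comb: int, t_setup: int, t_skew: int = 0) -> int:
    """Lower bound on the clock cycle: Tclk-q + Tmax_comb + Ts + Tskew."""
    return t_clk_q + t_max_comb + t_setup + t_skew


# Hypothetical delays in picoseconds (illustrative values only)
no_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30)
with_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30, t_skew=20)
max_freq_ghz = 1000 / with_skew   # a period in ps converts to GHz as 1000 / period
```

With these made-up numbers, 20 ps of skew lengthens the minimum cycle from 880 ps to 900 ps.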
Next . . .
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
Main, ALU, and PC Control
Instruction Fetching Datapath
We can now assemble the datapath from its components
For instruction fetching, we need …
Program Counter (PC) register
Instruction Memory
Adder for incrementing PC
[Figure: the 32-bit PC register (clk) drives the Address of the Instruction Memory; an adder adds 4 to produce the next PC]
The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4
Datapath does not handle branch or jump instructions
Improved datapath increments upper 30 bits of PC by 1
[Figure: improved datapath with a 30-bit PC register (clk) incremented by +1; ‘00’ is appended to form the 32-bit instruction address]
Datapath for R-type Instructions
Control signals
ALUOp is the ALU operation as defined in the funct field for R-type
RegWr is used to enable the writing of the ALU result
Rs and Rt fields select two registers to read. Rd field selects register to write
BusA & BusB provide data input to ALU. ALU result is connected to BusW
Same clock updates PC and Rd register
[Figure: R-type datapath. The instruction fields Rs, Rt, and Rd drive RA, RB, and RW of the register file; BusA and BusB feed the ALU, controlled by ALUOp; the 32-bit ALU result returns on BusW and is written when RegWr is asserted]
Datapath for I-type ALU Instructions
Control signals
ALUOp is derived from the Op field for I-type instructions
RegWr is used to enable the writing of the ALU result
ExtOp is used to control the extension of the 16-bit immediate
Second ALU input comes from the extended immediate. RB and BusB are not used
Rt selects register to write, not Rd
Same clock edge updates PC and Rt
[Figure: I-type ALU datapath. Rs drives RA and Rt drives RW; Imm16 passes through the Extender, controlled by ExtOp, to the second ALU input; the 32-bit ALU result returns on BusW and is written when RegWr is asserted]
Combining R-type & I-type Datapaths
Control signals
ALUOp is derived from either the Op or the funct field
RegWr enables the writing of the ALU result
ExtOp controls the extension of the 16-bit immediate
RegDst selects the register destination as either Rt or Rd
ALUSrc selects the 2nd ALU source as BusB or extended immediate
A mux selects RW as either Rt or Rd
Another mux selects 2nd ALU input as either data on BusB or the extended immediate
[Figure: combined datapath. A mux controlled by RegDst selects RW as Rt (0) or Rd (1); a mux controlled by ALUSrc selects the second ALU input as BusB (0) or the extended immediate (1); ALUOp, RegWr, and ExtOp control the ALU, the register write, and the Extender]
Controlling ALU Instructions
For R-type ALU instructions, RegDst is ‘1’ to select Rd on RW and ALUSrc is ‘0’ to select BusB as second ALU input. The active part of datapath is shown in green
For I-type ALU instructions, RegDst is ‘0’ to select Rt on RW and ALUSrc is ‘1’ to select Extended immediate as second ALU input. The active part of datapath is shown in green
[Figure: combined datapath for R-type ALU instructions, with RegDst = 1, ALUSrc = 0, and RegWr = 1]
[Figure: combined datapath for I-type ALU instructions, with RegDst = 0, ALUSrc = 1, and RegWr = 1]
Details of the Extender
Two types of extensions
Zero-extension for unsigned constants
Sign-extension for signed constants
Control signal ExtOp indicates type of extension
Extender Implementation: wiring and one AND gate
ExtOp = 0 → Upper16 = 0
ExtOp = 1 → Upper16 = sign bit
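The "wiring and one AND gate" implementation can be sketched in Python as follows. The function name `extend` is an assumption for this example; the single AND of ExtOp with the sign bit of Imm16 is fanned out to all 16 upper bits.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits: zero-extend if ExtOp = 0, sign-extend if 1."""
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit           # the one AND gate
    upper16 = 0xFFFF if upper_bit else 0    # wiring: the same bit drives all upper lines
    return (upper16 << 16) | imm16          # lower 16 bits pass straight through
```

For example, the immediate 0x8000 becomes 0xFFFF8000 with ExtOp = 1 and 0x00008000 with ExtOp = 0.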
[Figure: extender circuit. Imm16 passes through as the lower 16 bits; the upper 16 bits are all driven by ExtOp combined with the sign bit of Imm16]
Adding Data Memory to Datapath
A data memory is added for load and store instructions
A 3rd mux selects data on BusW as either ALU result or memory data_out
ALU calculates data memory address
BusB is connected to Data_in of Data Memory for store instructions
Additional Control signals
MemRd for load instructions
MemWr for store instructions
WBdata selects data on BusW as ALU result or Memory Data_out
[Figure: datapath with Data Memory added. The ALU result drives the memory Address, BusB drives Data_in, and a mux controlled by WBdata selects BusW as the ALU result (0) or Data_out (1); MemRd and MemWr control the memory]
Controlling the Execution of Load
[Figure: datapath with the control settings for a load, as listed below]
ALUOp = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
ALUSrc = ‘1’ selects extended immediate as second ALU input
MemRd = ‘1’ to read data memory
MemWr = ‘0’ to disable writing of data memory
RegDst = ‘0’ selects Rt as destination register
RegWr = ‘1’ to enable writing of register file
WBdata = ‘1’ places the data read from memory on BusW
ExtOp = ‘1’ to sign-extend Immediate16 to 32 bits
Clock edge updates PC and Register Rt
Controlling the Execution of Store
[Figure: datapath with the control settings for a store, as listed below]
ALUOp = ‘ADD’ to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
ALUSrc = ‘1’ selects extended immediate as second ALU input
MemWr = ‘1’ to write data memory
MemRd = ‘0’ because data memory is not read
RegDst = ‘X’ because no register is written
RegWr = ‘0’ to disable writing of register file
WBdata = ‘X’ because we don’t care what data is put on BusW
ExtOp = ‘1’ to sign-extend Immediate16 to 32 bits
Clock edge updates PC and Data Memory
Adding Jump and Branch to Datapath
Additional Control Signals
PCSrc for PC control: 1 for a jump and 2 for a taken branch
Zero flag for branch control: whether branch is taken or not
New adder for computing branch target address
Adding a mux at the PC input
Jump Target = PC[31:28] ‖ Imm26
[Figure: datapath with jump and branch support. A 3-input mux controlled by PCSrc selects the Next PC Address as the incremented PC (0), the jump target (1), or the Branch Target Address (2) computed by the new adder from the incremented PC and the extended Imm16; the ALU Zero flag is an output used for branch control]
Controlling the Execution of a Jump
If (Opcode == J) then PCSrc = 1 (Jump Target)
MemRd = MemWr = RegWr = 0, Don’t care about other control signals
Clock edge updates PC register only
[Figure: datapath for a jump, with Op = J, PCSrc = 1, RegWr = 0, MemRd = 0, MemWr = 0, and ALUOp = ALUSrc = RegDst = ExtOp = WBdata = Zero = X]
Controlling the Execution of a Branch
If (Opcode == BEQ && Zero == 1) then PCSrc = 2 (Branch Target)
else PCSrc = 0 (Next PC)
ALUSrc = 0, ALUOp = SUB, ExtOp = 1, MemRd = MemWr = RegWr = 0
Clock edge updates PC register only
[Figure: datapath for a taken BEQ, with Op = BEQ, Zero = 1, PCSrc = 2, ALUOp = SUB, ALUSrc = 0, ExtOp = 1, RegWr = 0, MemRd = 0, MemWr = 0, and RegDst = WBdata = X]
Next . . .
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
Main, ALU, and PC Control
Main, ALU, and PC Control
Main Control Input
6-bit opcode field
Main Control Output
Main control signals
ALU Control Input
6-bit opcode field
6-bit function field
ALU Control Output
ALUOp signal for ALU
PC Control Input
6-bit opcode
ALU zero flag
PC Control Output
PCSrc signal
[Figure: the Instruction Memory output feeds three control blocks. Main Control takes Op6 and produces RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and WBdata for the datapath; ALU Control takes Op6 and funct6 and produces ALUOp for the ALU; PC Control takes Op6 and the ALU Zero flag and produces PCSrc for the 3-input mux at the PC]
Single-Cycle Datapath + Control
[Figure: complete single-cycle datapath with control. Main Control (input Op) drives RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and WBdata; ALU Ctrl (inputs Op and func) drives ALUop; PC Ctrl (inputs Op and Zero) drives PCSrc, which selects the Next PC Address from the incremented PC (0), Jump Target = PC[31:28] ‖ Imm26 (1), or the Branch Target Address (2)]
Main Control Signals
Signal | Effect when ‘0’ | Effect when ‘1’
RegDst | Destination register = Rt | Destination register = Rd
RegWr | No register is written | Destination register (Rt or Rd) is written with the data on BusW
ExtOp | 16-bit immediate is zero-extended | 16-bit immediate is sign-extended
ALUSrc | Second ALU operand is the value of register Rt that appears on BusB | Second ALU operand is the value of the extended 16-bit immediate
MemRd | Data memory is NOT read | Data memory is read: Data_out ← Memory[address]
MemWr | Data memory is NOT written | Data memory is written: Memory[address] ← Data_in
WBdata | BusW = ALU result | BusW = Data_out from Memory
Main Control Truth Table
Op | RegDst | RegWr | ExtOp | ALUSrc | MemRd | MemWr | WBdata
R-type | 1 = Rd | 1 | X | 0 = BusB | 0 | 0 | 0 = ALU
ADDI | 0 = Rt | 1 | 1 = sign | 1 = Imm | 0 | 0 | 0 = ALU
SLTI | 0 = Rt | 1 | 1 = sign | 1 = Imm | 0 | 0 | 0 = ALU
ANDI | 0 = Rt | 1 | 0 = zero | 1 = Imm | 0 | 0 | 0 = ALU
ORI | 0 = Rt | 1 | 0 = zero | 1 = Imm | 0 | 0 | 0 = ALU
XORI | 0 = Rt | 1 | 0 = zero | 1 = Imm | 0 | 0 | 0 = ALU
LW | 0 = Rt | 1 | 1 = sign | 1 = Imm | 1 | 0 | 1 = Mem
SW | X | 0 | 1 = sign | 1 = Imm | 0 | 1 | X
BEQ | X | 0 | 1 = sign | 0 = BusB | 0 | 0 | X
BNE | X | 0 | 1 = sign | 0 = BusB | 0 | 0 | X
J | X | 0 | X | X | 0 | 0 | X
X is a don’t care (can be 0 or 1), used to minimize logic
Logic Equations for Main Control Signals
RegDst = R-type
RegWr = NOT (SW + BEQ + BNE + J)
ExtOp = NOT (ANDI + ORI + XORI)
ALUSrc = NOT (R-type + BEQ + BNE)
MemRd = LW
MemWr = SW
WBdata = LW
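A minimal Python sketch of these equations follows, assuming the complemented forms that the truth table implies: RegWr, ExtOp, and ALUSrc are 0 exactly for the opcodes listed in their equations. The function name `main_control` and the use of mnemonic strings in place of a decoded 6-bit opcode are simplifications made for this example.

```python
def main_control(op: str) -> dict:
    """Return the seven main control signals for an instruction mnemonic."""

    def is_one_of(*names: str) -> int:
        # Models the OR of the decoder outputs for the named instructions
        return int(op in names)

    return {
        "RegDst": is_one_of("R-type"),
        "RegWr": 1 - is_one_of("SW", "BEQ", "BNE", "J"),
        "ExtOp": 1 - is_one_of("ANDI", "ORI", "XORI"),
        "ALUSrc": 1 - is_one_of("R-type", "BEQ", "BNE"),
        "MemRd": is_one_of("LW"),
        "MemWr": is_one_of("SW"),
        "WBdata": is_one_of("LW"),
    }
```

Where the truth table has an X, the equations simply settle on whichever value makes the logic smallest, so `main_control("J")["ExtOp"]` comes out as 1 even though the table marks it as a don't care.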
[Figure: a decoder on Op6 produces one signal per instruction (R-type, ADDI, SLTI, ANDI, ORI, XORI, LW, SW, BEQ, BNE, J); the logic equations combine these into RegDst, RegWr, ExtOp, ALUSrc, MemRd, MemWr, and WBdata]
ALU Control Truth Table
Op | funct | ALUOp | 4-bit Coding
R-type | AND | AND | 0001
R-type | OR | OR | 0010
R-type | XOR | XOR | 0011
R-type | ADD | ADD | 0100
R-type | SUB | SUB | 0101
R-type | SLT | SLT | 0110
ADDI | X | ADD | 0100
SLTI | X | SLT | 0110
ANDI | X | AND | 0001
ORI | X | OR | 0010
XORI | X | XOR | 0011
LW | X | ADD | 0100
SW | X | ADD | 0100
BEQ | X | SUB | 0101
BNE | X | SUB | 0101
J | X | X | X
Other bit-coding can be used. The goal is to simplify the ALU control.
The 4-bit Coding defines the binary ALU operations.
Logic equations are derived for the 4-bit coding.
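The ALU control table can be expressed as a lookup in Python. This is a table-driven sketch rather than the derived logic equations; the names `ALU_CODE`, `FUNCT_TO_OP`, `OPCODE_TO_OP`, and `alu_control` are hypothetical, while the opcode and funct values are the ones listed for this MIPS subset.

```python
# 4-bit ALUOp coding from the truth table
ALU_CODE = {"AND": 0b0001, "OR": 0b0010, "XOR": 0b0011,
            "ADD": 0b0100, "SUB": 0b0101, "SLT": 0b0110}

# R-type: the funct field selects the operation
FUNCT_TO_OP = {0x20: "ADD", 0x22: "SUB", 0x24: "AND",
               0x25: "OR", 0x26: "XOR", 0x2A: "SLT"}

# Other instructions: the opcode alone selects the operation
OPCODE_TO_OP = {0x08: "ADD", 0x0A: "SLT", 0x0C: "AND", 0x0D: "OR", 0x0E: "XOR",
                0x23: "ADD", 0x2B: "ADD", 0x04: "SUB", 0x05: "SUB"}


def alu_control(op6: int, funct6: int):
    """Return the 4-bit ALUOp code, or None where the table has a don't care (J)."""
    if op6 == 0x00:
        return ALU_CODE[FUNCT_TO_OP[funct6]]
    if op6 == 0x02:
        return None
    return ALU_CODE[OPCODE_TO_OP[op6]]
```

Instructions outside the subset raise a `KeyError` here; real control logic would need to define some behavior for them.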
PC Control Truth Table
Op | Zero flag | PCSrc
R-type | X | 0 = Increment PC
J | X | 1 = Jump Target Address
BEQ | 0 | 0 = Increment PC
BEQ | 1 | 2 = Branch Target Address
BNE | 0 | 2 = Branch Target Address
BNE | 1 | 0 = Increment PC
Other than Jump or Branch | X | 0 = Increment PC
The ALU Zero flag is used by BEQ and BNE instructions
PC Control Logic
The PC control logic can be described as follows:
if (Op == J) PCSrc = 1;
else if ((Op == BEQ && Zero == 1) ||
(Op == BNE && Zero == 0)) PCSrc = 2;
else PCSrc = 0;
Branch = (BEQ . Zero) + (BNE . NOT Zero)
Branch = 1, Jump = 0 → PCSrc = 2
Branch = 0, Jump = 1 → PCSrc = 1
Branch = 0, Jump = 0 → PCSrc = 0
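The PC control logic can be written as a short Python function that mirrors the Branch and Jump signals. The name `pc_control` and the mnemonic-string opcode are assumptions for this sketch; BNE is taken when Zero is 0, as in the PC control truth table.

```python
def pc_control(op: str, zero: int) -> int:
    """Return PCSrc: 0 = incremented PC, 1 = jump target, 2 = branch target."""
    jump = int(op == "J")
    branch = int((op == "BEQ" and zero == 1) or (op == "BNE" and zero == 0))
    if jump:
        return 1
    if branch:
        return 2
    return 0
```

Jump and Branch can never both be 1 for the same instruction, so the order of the two checks does not matter.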
[Figure: PC control logic. A decoder on Op produces BEQ, BNE, and J; BEQ, BNE, and Zero are combined into Branch, and J drives Jump]
Summary
5 steps to design a processor
Analyze instruction set => datapath requirements
Select datapath components & establish clocking methodology
Assemble datapath meeting the requirements
Analyze implementation of each instruction to determine control signals
Assemble the control logic
MIPS makes Control easier
Instructions are of the same size
Source registers always in the same place
Immediate constants are of same size and same location
Operations are always on registers/immediates