Control Unit
Modern Design (1/3)
Modern design is composed of (1) a datapath and (2) a controller (control unit or control path).
[Figure: high-level block diagram. The control unit takes control inputs and status signals and drives control signals and control outputs; the datapath takes datapath inputs and control signals and drives status signals and datapath outputs.]
Modern Design (2/3)
[Figure: register-transfer-level block diagram of the control unit and the datapath. Legend: one symbol marks combinational circuits, the other marks sequential circuits.]
Modern Design (3/3)
A synthesis example of a case statement
(a) HDL description
    When 1 => X := X+2;
              A := X+5;
    When 2 => A := X+3;
    When others => A := X+W;
    end case;
(b) Control-flow representation
(c) Data-flow representation
[Figure: control-flow and data-flow graphs of the statements above, e.g., X := X+2; A := X+5;]
Summation Problem (1/4)
Calculate S = x1 + x2 + x3 + x4 + x5 with an ASIC chip
1. Sum up five inputs in the same period by using 4 adders
Five inputs must be ready at the same time. Why and how?
a. How many input pins and output pins?
b. What is the resolution of xi?
c. How fast do you need it to be?
d. What is your design cost?
Summation Problem (2/4)
Calculate S = x1 + x2 + x3 + x4 + x5
2. Sum up five inputs in different time units by using only 1 adder
Initial      S = 0
Time unit_1  S <= S + x1
Time unit_2  S <= S + x2
Time unit_3  S <= S + x3
Time unit_4  S <= S + x4
Time unit_5  S <= S + x5
Only one input must be ready at a time. Why?
Cost is lower and the critical path is shorter than in Method_1.
But its working rate is slower than Method_1 (5 clock cycles for 1 summation result).
Summation Problem (3/4)
Calculate S = x1 + x2 + x3 + x4 + ... + x50
[Figure: datapath with inputs xi and yi, a selector (S1), an ALU (S2, S3), and an accumulator (register). The control unit sends out the proper control signals S1, S2, S3, res, and E.]
Selector:
If S1=0, a1=xi
If S1=1, a1=yi
ALU:
If S2=0 S3=0, r = a2
If S2=0 S3=1, r = a1+a2
If S2=1 S3=0, r = a1-a2
If S2=1 S3=1, r = a1
What is the length of the clock period? It is set by the critical (longest) path delay, which also fixes the clock rate.
The input must come now; r must be ready before the next positive edge comes.
Control signals per state:
State  S1 S2 S3 res E
T_0    0  0  0  1   0
T_1    0  1  1  0   0
T_2    0  0  1  0   0
.....
T_51   X  0  0  0   1
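The selector and ALU rules listed for this datapath can be sketched in Verilog roughly as follows. This is only an illustration: the module name, port names, the 8-bit input and 16-bit accumulator widths, the synchronous behaviour of res, and the reading of E as an output enable are all assumptions, not details given in the slides.

```verilog
// Hypothetical sketch of the summation datapath (names and widths assumed).
module sum_datapath (
    input  wire        clk,
    input  wire        S1,    // selector: 0 -> a1 = xi, 1 -> a1 = yi
    input  wire        S2,    // ALU function select, used together with S3
    input  wire        S3,
    input  wire        res,   // clears the accumulator (assumed synchronous)
    input  wire        E,     // output enable (assumed meaning of E)
    input  wire [7:0]  xi,
    input  wire [7:0]  yi,
    output wire [15:0] S      // driven only while E = 1
);
    reg  [15:0] acc;                                 // accumulator register
    wire [15:0] a1 = S1 ? {8'b0, yi} : {8'b0, xi};   // selector output
    wire [15:0] a2 = acc;                            // accumulator feeds a2
    reg  [15:0] r;                                   // ALU result

    // ALU (combinational): the four cases of S2 S3
    always @(*) begin
        case ({S2, S3})
            2'b00: r = a2;
            2'b01: r = a1 + a2;
            2'b10: r = a1 - a2;
            2'b11: r = a1;
        endcase
    end

    // Accumulator (sequential): r must be stable before this edge
    always @(posedge clk) begin
        if (res) acc <= 16'd0;
        else     acc <= r;
    end

    assign S = E ? acc : 16'bz;
endmodule
```

With the control words of the state table, T_1 loads x1 (r = a1), the following states accumulate (r = a1 + a2), and T_51 holds the sum (r = a2) while E enables the output.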
Summation Problem (4/4)
The control unit should send out the proper control signals at each state. There are two ways to generate those control signals:
(1) Microprogramming control
a. Store the control signals of each state in memory (ROM)
b. Read out the control signals one by one
State  S1 S2 S3 res E
T_0    0  0  0  1   0
T_1    0  1  1  0   0
T_2    0  0  1  0   0
.....
T_51   X  0  0  0   1
(2) Hardwired control
Use dedicated logic gates to generate the proper signals state by state (one by one)
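A minimal sketch of the microprogramming option, assuming a state counter that simply steps from T_0 to T_51 and a ROM holding one control word {S1, S2, S3, res, E} per state. The module name, the reset port, the 6-bit counter, and the assumption that every state between T_1 and T_51 uses the word 00100 are illustrative choices, not taken from the slides.

```verilog
// Hypothetical microprogrammed control unit (names and sizes assumed).
module micro_ctrl (
    input  wire clk,
    input  wire reset,               // returns the sequencer to T_0
    output wire S1, S2, S3, res, E
);
    reg [5:0] state;                 // T_0 .. T_51
    reg [4:0] rom [0:51];            // control word {S1,S2,S3,res,E} per state
    integer i;

    // ROM contents, following the control-signal table
    initial begin
        rom[0] = 5'b00010;                   // T_0: clear the accumulator
        rom[1] = 5'b01100;                   // T_1: r = a1 (load first input)
        for (i = 2; i <= 50; i = i + 1)
            rom[i] = 5'b00100;               // assumed: r = a1 + a2
        rom[51] = 5'b00001;                  // T_51: hold, enable output
                                             // (don't-care S1 stored as 0)
    end

    // State counter: read out the control words one by one
    always @(posedge clk or posedge reset) begin
        if (reset)
            state <= 6'd0;
        else if (state < 6'd51)
            state <= state + 6'd1;           // stay in T_51 when finished
    end

    assign {S1, S2, S3, res, E} = rom[state];
endmodule
```

A hardwired version would replace the ROM lookup with logic gates decoding the state, trading the ROM's easy modification for less area on small tables.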
Clock Period (1/4)
Gate:   not    and     or      xor
Delay:  1 ns   2.4 ns  2.4 ns  4.2 ns
[Figure: two-stage circuit with registers R1, R2, and R3 (all positive-edge-triggered). The longest paths of the two stages are 16.6 ns and 11.4 ns.]
16.6 ns > 11.4 ns
Critical path = 16.6 ns, so the clock period must be more than 16.6 ns (e.g., 18 ns). Why?
[Timing diagram: new register values are ready at each positive edge, and the correct result of each stage must be ready before the next positive edge. One stage needs 16.6 ns and the other needs 11.4 ns.]
How to decide the clock period in a system?
Clock Period (2/4)
[Figure: four combinational circuits C_I, C_II, C_III, and C_IV separated by registers. Delays: C_I 10 ns, C_II 25 ns, C_III 12 ns, C_IV 8 ns.]
1. Find the longest delay among the combinational circuits C_I, C_II, C_III, and C_IV.
2. The longest delay is named the critical path (here 25 ns).
3. The clock period can be set a little longer than the critical path. Why?
4. Clock frequency = 1 / clock period (here 1 / 25 ns = 1 / (25 × 10^-9 s) = 40 MHz)
Clock Period (3/4)
Separate C_II into two parts (C_II_a and C_II_b) by inserting proper registers to achieve a faster clock frequency.
[Figure: before the change, C_I (10 ns), C_II (25 ns), C_III (12 ns), and C_IV (8 ns) are separated by registers. After the change, combinational circuit II is split by the inserted registers into C_II_a and C_II_b, each with its own shorter delay.]
Now the clock frequency is 71.4 MHz.
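As a hedged cross-check (not stated in the slides), the quoted frequency can be converted back into a clock period, assuming the period is set equal to the new critical path:

```latex
T_{\text{clk}} = \frac{1}{f_{\text{clk}}}
             = \frac{1}{71.4 \times 10^{6}\ \text{Hz}} \approx 14\ \text{ns},
\qquad
\frac{71.4\ \text{MHz}}{40\ \text{MHz}} \approx 1.79
```

Under that assumption the longer of C_II_a and C_II_b takes about 14 ns, since the other stages (10 ns, 12 ns, 8 ns) are all shorter. The speed-up is about 1.79 rather than 2 because the 25 ns block is not split into two equal halves.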
Clock Period (4/4)
Better HDL style: separate the combinational and sequential circuits.
[Figure: data1 and data2 feed the combinational logic, whose output feeds the sequential logic clocked by clk.]
module EXAMPLE(data1,data2,clk,q);
  input data1, data2, clk;
  output q;
  reg data,q;
  // Combinational logic (com stands for some combinational function)
  always @(data1 or data2) data = com(data1,data2);
  // Sequential logic
  always @(posedge clk) q <= data;
endmodule
Design for Summation Problem (1/7) Method_1
Calculate S = x1 + x2 + x3 + x4 + x5
module adder1(x1, x2, x3, x4, x5, out);
  input x1, x2, x3, x4, x5;
  output [2:0] out;
  reg [2:0] out;
  always @(x1 or x2 or x3 or x4 or x5)
    out=(((x1+x2)+x3)+x4)+x5;
endmodule
Four stages: the four additions are chained one after another.
Assume the adder's delay is k ns.
Unstable output (delay is 4*k ns)
Design for Summation Problem (2/7) Method_2
Three stages (shorter delay): x1+x2 and x3+x4 are computed in parallel.
module adder2(x1, x2, x3, x4, x5, out);
  input x1, x2, x3, x4, x5;
  output [2:0] out;
  reg [2:0] out;
  always @(x1 or x2 or x3 or x4 or x5)
    out=((x1+x2)+(x3+x4))+x5;
endmodule
[Figure: three adder levels, each with a delay of k ns]
Unstable output (delay is 3*k ns, less than Method_1)
Design for Summation Problem (3/7) Method_3
module adder3(x1, x2, x3, x4, x5, clk, out);
  input x1, x2, x3, x4, x5, clk;
  output [2:0] out;
  reg [2:0] out;
  always @(posedge clk)
    out<=((x1+x2)+(x3+x4))+x5;
endmodule
[Figure: three adder levels, each with a delay of k ns, followed by an output register]
Stable output with register (3-bit flip-flop)
Delay is 3*k ns + c ns (c ns is the register assignment delay)
Design for Summation Problem (4/7) Method_4
module adder4(clk, x1, x2, x3, x4, x5, out);
  input clk, x1, x2, x3, x4, x5;
  output [2:0] out;
  reg [2:0] out, temp1, temp2, temp3;
  always @(posedge clk)
  begin
    temp1<=(x1+x2)+x3;
    temp2<=x4;
    temp3<=x5;
    out<=temp1+temp2+temp3;
  end
endmodule
[Figure: registers temp1, temp2, and temp3 sit between the first and second groups of adders]
Delay is 2*k ns + c ns, which is less than Method_1 (4k ns), Method_2 (3k ns), and Method_3 (3k ns + c ns).
So this method can achieve the best (fastest) clock rate because its critical path is the shortest. However, the correct out is generated after two clock cycles, not just one. This is also named datapath pipelining.
Design for Summation Problem (5/7)
Besides the adders, two more delays count:
1. Wire delay
2. Register assignment delay
Without pipelining:
Critical path is about 4k ns.
A correct output is generated every clock cycle.
Event      1    2    3    4    5    6
Completed  4k   8k   12k  16k  20k  24k
With pipelining (faster clock rate):
[Figure: the adders (k ns each) are split by registers temp1, temp2, and temp3 into two stages of about 2.5k ns each]
Critical path is about 2.5k ns. Why?
A correct output is generated after two clock cycles.
Event  1    2     3    4      5    6
Time   5k   7.5k  10k  12.5k  15k  17.5k
Two events are processed in parallel in the unit. Faster clock rate but higher cost (3 extra registers).
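The two completion-time rows can be summarized by a pair of formulas. This is a sketch fitted to the listed values, with n denoting the event number (a symbol introduced here, not used in the slides):

```latex
t_{\text{no pipeline}}(n) = n \cdot 4k\ \text{ns},
\qquad
t_{\text{pipeline}}(n) = (n + 1) \cdot 2.5k\ \text{ns}
```

For n = 1 the pipelined design finishes later (5k versus 4k), because the first result needs two clock cycles. From n = 2 onward it finishes earlier (7.5k versus 8k), and for n = 6 the gap has grown to 17.5k versus 24k.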
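Design for Summation Problem (6/7)
[Figure: timelines of Event 1 to Event 5 for both designs. Without pipelining, the four k ns steps of one event finish before the next event starts, and the critical path is about 4k ns. With pipelining (registers temp1, temp2, temp3), each event passes through two stages of about 2.5k ns each, consecutive events overlap, and the critical path is about 2.5k ns.]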
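Design for Summation Problem (7/7)
[Figure: two ways to place the pipeline registers temp1, temp2, and temp3]
Balanced stages: about 2.5k ns and about 2.5k ns, so the critical path is about 2.5k ns.
Unbalanced stages: about 3.5k ns and about 1.5k ns, so the critical path is about 3.5k ns.
Which one is better? Balance is important.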
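Clock Skew Problem
Clock skew: In general, in a synchronous sequential circuit the clock inputs of all flip-flops are tied together, so all flip-flops should change state at the same instant. In practice, however, delays from placement and routing cause clock skew between the flip-flops (their clocks do not go high and low at the same instant), which may make the circuit malfunction.
Special handling is usually needed so that all flip-flops act at the same instant as far as possible.
[Figure: clock waveforms showing the clock delay between flip-flops]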
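The effect of skew on the clock period can be written with the usual setup and hold constraints. This is a textbook-style sketch and the symbols are introduced here, not taken from the slides: t_cq is the register clock-to-output delay, t_comb the combinational delay between two registers, and t_skew the arrival time of the clock at the capturing register minus its arrival time at the launching register.

```latex
\text{setup:}\quad T_{\text{clk}} \ge t_{cq} + t_{\text{comb,max}} + t_{\text{setup}} - t_{\text{skew}}
\\
\text{hold:}\quad t_{cq} + t_{\text{comb,min}} \ge t_{\text{hold}} + t_{\text{skew}}
```

A negative skew (the capturing clock arrives early) forces a longer clock period, while a large positive skew can violate the hold condition at any clock rate, which is one way skew makes a circuit malfunction.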
Optimization for RTL Design
[Figure: high-level block diagram. The control unit has control inputs, status signals, control signals, and control outputs; the datapath has datapath inputs and datapath outputs.]
Optimization for control unit:
1. As suggested by most textbooks on "Logic System Design"
2. Write good-style HDL descriptions, which are then optimized by EDA tools
Optimization for datapath:
1. Resource optimization
2. Time optimization
Optimization for Control Unit
Traditional Optimization Flow for Control Unit
The control unit is a finite state machine (FSM).
1. Design description or timing diagram
2. Develop state diagram
3. Develop next-state and output tables
4. Minimize states
5. Encode input, states, and outputs
6. Decide the memory elements
7. Derive excitation equation
8. Optimize logic circuit
9. Derive logic schematic and timing diagram
10. Simulation
11. Functional verification and timing analysis
Finite State Machine (1/4)
Moore machine: S → O (output is dependent only on current state)
Mealy machine: S × I → O (output is dependent on input and state)
[Figure: state diagram of a Mealy machine with four states S0, S1, S2, S3. The initial state is marked, and each edge is labeled Input/Output (e.g., 1/0, 0/1).]
[Table: next-state and output tables (I = input), with columns Present State and Next State]
Finite State Machine (2/4)
Moore Machine (state-based machine):
inputs → Next State Logic (combinational) → Current State Register (sequential, with asynchronous reset) → Output Logic (combinational) → Moore output
Mealy Machine (input-based machine):
inputs → Next State Logic (combinational) → Current State Register (sequential, with asynchronous reset) → Output Logic (combinational) → Mealy output
In the Mealy machine the inputs also feed the Output Logic directly.
Finite State Machine (3/4)
For best legibility, describe an FSM using two or three always @ statements:
(1) current state or state register (sequential circuit)
(2) next state logic (combinational circuit)
(3) output logic (combinational circuit)
The two combinational blocks can be merged. Use parameter to describe the state names.
[Figure: state diagram of the example FSM, with labels such as Control=0 and ST1: Y=2]
Finite State Machine (4/4)
module FSM(Clock, Reset, Control, Y);
  input Clock, Reset, Control;
  output [2:0] Y;
  reg [1:0] CurrentState, NextState;
  reg [2:0] Y;
  // State name (parameter)
  parameter [1:0] ST0 = 2'b00, ST1 = 2'b01,
                  ST2 = 2'b10, ST3 = 2'b11;

  // Next state logic (Comb.C.)
  always @(Control or CurrentState) begin
    NextState = ST0;
    case (CurrentState)
      ST0: NextState = ST1;
      ST1: if (Control)
             NextState = ST2;
           else
             NextState = ST3;
      ST2: NextState = ST3;
      ST3: NextState = ST0;
    endcase
  end

  // State register
  always @(posedge Clock or posedge Reset)
    if (Reset)
      CurrentState <= ST0;
    else
      CurrentState <= NextState;

  // Output logic (Comb.C.)
  always @(CurrentState) begin
    case (CurrentState)
      ST0: Y = 1;
      ST1: Y = 2;
      ST2: Y = 3;
      ST3: Y = 4;
    endcase
  end
endmodule
Moore Machine (1/8)
[Figure: optimization flow, the same flow as the traditional optimization flow for the control unit]
Moore machine: S → O (S: state, O: output)
[Figure: state diagram with states including S0, S1, and S2 and edge labels such as 0/1, 1/0, 1/1, and 0/0]
[Table: original state table, i.e., the next-state and output tables (I = input) with columns Present State and Next State]
Assume that we use JK flip-flops for storage.
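Moore Machine (2/8)
Four states need 2 flip-flops (named M and N).
[Table: encoded present-state and next-state table]
Characteristic table of the JK flip-flop:
J K  Next Q
0 0  Q (hold)
0 1  0
1 0  1
1 1  Q' (toggle)
Excitation table of the JK flip-flop:
Q → Next Q  J K
0 → 0       0 X
0 → 1       1 X
1 → 0       X 1
1 → 1       X 0
Resulting excitation equation (example): MJ = I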
Moore Machine (3/8)
[Karnaugh maps, with columns ordered 00 01 11 10, for the J and K inputs of flip-flops M and N and for the output]
Output = M'N + MN'
[Figure: next state logic → state register (Q) → output logic]
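Moore Machine (4/8) Synthesis Result
[Figure: synthesized schematic showing the next state logic, the state register (JK flip-flops with inputs J, K and output Q), and the output logic]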
Moore Machine (5/8)
Implement the circuit with structural HDL
module JK_FF(Clk, J, K, Q, Q_Bar);
  input Clk, J, K;
  output Q, Q_Bar;
  reg Q;
  assign Q_Bar = ~Q;    // complementary output
  always @(posedge Clk) begin
    case({J,K})
      2'b00: Q <= Q;    // hold
      2'b01: Q <= 0;    // reset
      2'b10: Q <= 1;    // set
      2'b11: Q <= ~Q;   // toggle
    endcase
  end
endmodule

module moore_JK(Clk, I, Out_Data);
  input Clk, I;
  output Out_Data;
  wire temp1, temp2, temp3, temp4, temp5, temp6;
  // Next state logic
  assign temp1 = I & temp5;
  assign temp4 = I & temp2;
  // Output logic
  assign Out_Data = (temp3 & temp5) | (temp2 & temp6);
  // State register
  JK_FF M(Clk, I, temp1, temp2, temp3);
  JK_FF N(Clk, temp4, temp3, temp5, temp6);
endmodule
Moore Machine-Bad Example (6/8)
The better way is to write behavioral HDL directly and let the EDA tool do the whole optimization job (including Karnaugh map and logic minimization).
module moore_bad(Clk, Reset, In_Data, Out_Data);
  input Clk, Reset, In_Data;
  output [1:0] Out_Data;
  reg [1:0] Out_Data;
  reg [1:0] State;
  parameter S0=2'b00, S1=2'b01, S2=2'b11, S3=2'b10;
  always @(posedge Clk) begin
    if(Reset) State=S0;
    else begin
      case(State)
        S0: begin
          Out_Data = 0;
          if(In_Data == 1) State = S2;
          else State = S0;
        end
        S1: begin
          Out_Data = 1;
          if(In_Data == 1) State = S2;
          else State = S0;
        end
        S2: begin
          Out_Data = 1;
          if(In_Data == 1) State = S3;
          else State = S2;
        end
        S3: begin
          Out_Data = 0;
          if(In_Data == 1) State = S1;
          else State = S3;
        end
      endcase
    end
  end
endmodule
Note: This is a bad style. Both State and Out_Data are implemented with flip-flops.
Moore Machine-Good Example (7/8)
module moore_good(Clk, Reset, In_Data, Out_Data);
  input Clk, Reset, In_Data;
  output [1:0] Out_Data;
  reg [1:0] Out_Data;
  reg [1:0] State, NextState;
  parameter S0=2'b00, S1=2'b01,
            S2=2'b10, S3=2'b11;

  // Next state logic
  always @(In_Data or State) begin
    case(State)
      S0: begin
        if(In_Data == 1) NextState = S2;
        else NextState = S0;
      end
      S1: begin
        if(In_Data == 1) NextState = S2;
        else NextState = S0;
      end
      S2: begin
        if(In_Data == 1) NextState = S3;
        else NextState = S2;
      end
      S3: begin
        if(In_Data == 1) NextState = S1;
        else NextState = S3;
      end
    endcase
  end

  // State register (flip-flops)
  always @(posedge Clk or posedge Reset)
  begin
    if(Reset) State <= S0;
    else State <= NextState;
  end

  // Output logic
  always @(State) begin
    case(State)
      S0: Out_Data = 0;
      S1: Out_Data = 1;
      S2: Out_Data = 1;
      S3: Out_Data = 0;
    endcase
  end
endmodule
Note: This is a good style (only "State" is implemented with flip-flops).
Moore Machine-Good Example (8/8)
[Figure: synthesis results of the two styles]
Bad style: extra flip-flops are inferred.
Good style: only the state register is built from flip-flops.
Mealy Machine (1/2)
[Figure: state diagram with four states S0, S1, S2, S3. The initial state is marked, and each edge is labeled Input/Output (e.g., 1/0, 0/1).]
[Table: next-state and output tables (I = input), with columns Present State and Next State]
Mealy Machine (2/2)
Please remember to write your Mealy machine in good-style HDL, using three always statements.
[Figure: control unit (Mealy machine) drawn as combinational and sequential blocks. Legend: one symbol marks combinational circuits, the other marks sequential circuits.]
Homework: implement a Mealy machine
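To show what the three-always style looks like for a Mealy machine, here is a small sketch. It is deliberately not the four-state machine from the slides: the two-state rising-edge detector, its state names LOW and HIGH, and all port names are hypothetical choices made only to keep the example short.

```verilog
// Hypothetical Mealy machine: Out_Data = 1 when In_Data rises from 0 to 1.
module mealy_edge (
    input  wire Clock,
    input  wire Reset,       // asynchronous reset
    input  wire In_Data,
    output reg  Out_Data
);
    parameter LOW = 1'b0, HIGH = 1'b1;   // state names
    reg State, NextState;

    // (1) State register (sequential circuit)
    always @(posedge Clock or posedge Reset) begin
        if (Reset) State <= LOW;
        else       State <= NextState;
    end

    // (2) Next state logic (combinational circuit)
    always @(*) begin
        case (State)
            LOW:     NextState = In_Data ? HIGH : LOW;
            HIGH:    NextState = In_Data ? HIGH : LOW;
            default: NextState = LOW;
        endcase
    end

    // (3) Output logic (combinational circuit): uses state AND input
    always @(*) begin
        case (State)
            LOW:     Out_Data = In_Data;   // previous input was 0
            HIGH:    Out_Data = 1'b0;      // previous input was 1
            default: Out_Data = 1'b0;
        endcase
    end
endmodule
```

Only State is stored in a flip-flop. Out_Data stays combinational, so it changes in the same cycle as In_Data, which is the defining difference from a Moore machine.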
Control-Unit Implementation Styles (1/3)
Hardwired Control
Control unit
Control-Unit Implementation Styles (2/3)
Hardwired Control
Control unit with state-register and decoder
Control-Unit Implementation Styles (3/3)
Microprogramming Control
Control unit with state-register and ROM
One's Count Problem (1/2)
One's-counter implementation
Problem: Using a datapath with a 3-port register file (2 read ports and 1 write port), design a one's counter that counts the number of ones in an input data word and returns the result after completion.
Data := Input
Ocount := 0
Mask := 1
while Data ≠ 0 repeat
    Temp := Data AND Mask
    Ocount := Ocount + Temp
    Data := Data >> 1
Outport := Ocount
[Figure: the variables Data, Mask, Ocount, and Temp are assigned to registers of the register file (e.g., R3)]
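One's Count Problem (2/2)
[Figure: the control unit sends control signals to a datapath built around an 8*m register file and a shifter (shift right)]
State diagram of the control unit:
S0: wait for Start=1
S1: Data=Inport
S2: Ocount=0
S3: Mask=1
S4: Temp=Data AND Mask
S5: Ocount=Ocount+Temp
S6: Data=Data>>1; if Data≠0, go back to S4
S7: Outport=Ocount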
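The state sequence S0 to S7 can be sketched as a single behavioral module. This is only an illustration under stated assumptions: the module name, the clk and reset ports, the Done pulse, and the 8-bit widths are not in the slides, and the control and datapath are merged into one clocked block for brevity, whereas the slides keep a separate control unit and register-file datapath.

```verilog
// Hypothetical FSMD sketch of the one's counter (ports and widths assumed).
module ones_counter (
    input  wire       clk,
    input  wire       reset,      // asynchronous reset to S0
    input  wire       Start,
    input  wire [7:0] Inport,
    output reg  [7:0] Outport,
    output reg        Done        // one-cycle pulse after Outport is written
);
    parameter S0 = 3'd0, S1 = 3'd1, S2 = 3'd2, S3 = 3'd3,
              S4 = 3'd4, S5 = 3'd5, S6 = 3'd6, S7 = 3'd7;
    reg [2:0] State;
    reg [7:0] Data, Mask, Ocount, Temp;

    always @(posedge clk or posedge reset) begin
        if (reset) begin
            State   <= S0;
            Done    <= 1'b0;
            Outport <= 8'd0;
        end else begin
            case (State)
                S0: begin Done <= 1'b0; if (Start) State <= S1; end
                S1: begin Data   <= Inport;        State <= S2; end
                S2: begin Ocount <= 8'd0;          State <= S3; end
                S3: begin Mask   <= 8'd1;          State <= S4; end
                S4: begin Temp   <= Data & Mask;   State <= S5; end
                S5: begin Ocount <= Ocount + Temp; State <= S6; end
                S6: begin
                    Data  <= Data >> 1;
                    // Branch on the value Data will hold after the shift
                    State <= ((Data >> 1) != 8'd0) ? S4 : S7;
                end
                S7: begin Outport <= Ocount; Done <= 1'b1; State <= S0; end
            endcase
        end
    end
endmodule
```

Because the loop test sits after S6, an all-zero input still makes one pass through S4 to S6. That pass adds 0 to Ocount, so the result is still correct.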
Datapath of One's-Counter (1/4)
Optimized by EDA tool
module data_path(clock,reset,control_word,inport,outport,data);
  input clock,reset;
  input [19:0] control_word;
  input [7:0] inport;
  output [7:0] outport,data;
  wire [7:0] line1,line2,line3,line4;
  selector O1(.inp_A(inport), .inp_B(data), .select(control_word[19]), .outp(line1));
  register NO2(.clock(clock), .reset(reset),
               .WA(control_word[17:15]), .WE(control_word[18]),
               .RAA(control_word[13:11]), .REA(control_word[14]),
               .RAB(control_word[9:7]), .REB(control_word[10]),
               .Data_in(line1), .Bus_A(line2), .Bus_B(line3));
  alu NO3(.Datain_A(line2), .Datain_B(line3), .select(control_word[6:4]), .outp(line4));
  shifter NO4(.inp(line4), .select(control_word[3:1]), .outp(data));
  buffer NO5(.OE(control_word[0]), .inp(data), .outp(outport));
endmodule
Datapath of One's-Counter (2/4)
module selector(inp_A,inp_B,select,outp);
  input [7:0] inp_A,inp_B;
  input select;
  output [7:0] outp;
  reg [7:0] outp;
  always @(select or inp_A or inp_B) begin
    if(select)
      outp = inp_A;
    else
      outp = inp_B;
  end
endmodule

module alu(Datain_A,Datain_B,select,outp);
  input [7:0] Datain_A,Datain_B;
  input [2:0] select;
  output [7:0] outp;
  reg [7:0] outp;
  always @(select or Datain_A or Datain_B) begin
    case(select)
      3'b000: outp = ~Datain_A;
      3'b001: outp = Datain_A & Datain_B;
      3'b010: outp = Datain_A ^ Datain_B;
      3'b011: outp = Datain_A | Datain_B;
      3'b100: outp = Datain_A - 1;
      3'b101: outp = Datain_A + Datain_B;
      3'b110: outp = Datain_A - Datain_B;
      3'b111: outp = Datain_A + 1;
    endcase
  end
endmodule
Datapath of One's-Counter (3/4)
module shifter(inp,select,outp);
  input [7:0] inp;
  input [2:0] select;
  output [7:0] outp;
  reg [7:0] outp;
  reg temp;
  always @(select or inp) begin
    case(select)
      3'b000: outp = inp;
      3'b001: outp = inp;
      3'b100: outp = inp << 1;        // shift left
      3'b101: begin                   // rotate left
        temp = inp[7];
        outp = inp << 1;
        outp[0] = temp;
      end
      3'b110: outp = inp >> 1;        // shift right
      3'b111: begin                   // rotate right
        temp = inp[0];
        outp = inp >> 1;
        outp[7] = temp;
      end
      default: outp = 8'hxx;
    endcase
  end
endmodule

module buffer(OE,inp,outp);
  input OE;
  input [7:0] inp;
  output [7:0] outp;
  reg [7:0] outp;
  always @(OE or inp) begin
    if(OE)
      outp = inp;
    else
      outp = 8'bz;
  end
endmodule
Datapath of One's-Counter (4/4)
module register(clock,reset,WA,WE,RAA,REA,RAB,REB,Data_in,Bus_A,Bus_B);
  input clock,reset,WE,REA,REB;
  input [2:0] WA,RAA,RAB;
  input [7:0] Data_in;
  output [7:0] Bus_A,Bus_B;
  reg [7:0] reg_array [7:0];
  always @(posedge clock) begin
    reg_array[0]=8'h00;
    reg_array[1]=8'h00;
    reg_array[2]=8'h00;
    reg_array[3]=8'h00;
    reg_array[4]=8'h00;
    reg_arra