
Control Unit

Modern Design (1/3)
Modern design is composed of (1) a datapath and (2) a controller (control unit or control path).

[Figure: high-level block diagram. The control unit takes control inputs and status signals and produces control outputs and control signals. The datapath takes datapath inputs and control signals and produces datapath outputs and status signals.]

Modern Design (2/3)
[Figure: register-transfer-level block diagram of the control unit and datapath. The legend distinguishes combinational circuits from sequential circuits.]

Modern Design (3/3)

A synthesis example of a case statement
(a) HDL description
    when 1 => X := X+2;
              A := X+5;
    when 2 => A := X+3;
    when others => A := X+W;
    end case;
(b) Control-flow representation
(c) Data-flow representation
[Figure: control-flow and data-flow graphs of the case statement. The branch for value 1 performs X := X+2; A := X+5;]
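To make the data-flow view more concrete, the case statement can be sketched in Verilog as follows. This is only an illustration under stated assumptions: the selector name `sel`, the 8-bit operand width, and the split of the variable X into `x_in` and `x_out` are hypothetical and do not come from the slides.

```verilog
// Sketch only: a Verilog rendering of the VHDL-style case statement.
// Assumptions (not in the slides): selector name `sel`, 8-bit operands,
// and separate x_in / x_out ports standing in for the variable X.
module case_sketch(
  input      [1:0] sel,
  input      [7:0] x_in,
  input      [7:0] w,
  output reg [7:0] x_out,
  output reg [7:0] a
);
  always @* begin
    x_out = x_in;                      // X keeps its value unless a branch updates it
    case (sel)
      2'd1: begin
        x_out = x_in + 8'd2;           // X := X+2
        a     = x_in + 8'd2 + 8'd5;    // A := X+5 sees the updated X, so A = x_in + 7
      end
      2'd2:    a = x_in + 8'd3;        // A := X+3
      default: a = x_in + w;           // A := X+W
    endcase
  end
endmodule
```

Read as a data-flow graph, the three branches become adders whose results are selected by `sel`, which is roughly what a synthesis tool would be expected to build.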

Summation Problem (1/4)
Calculate S = x1 + x2 + x3 + x4 + x5 with an ASIC chip.
1. Sum up five inputs in the same period by using 4 adders.
Five inputs must be ready at the same time. Why and how?
a. How many input pins and output pins?
b. What is the resolution of xi?
c. How fast do you need it to be?
d. What is your design cost?

Summation Problem (2/4)
Calculate S = x1 + x2 + x3 + x4 + x5
2. Sum up five inputs in different time units by using only 1 adder.
Initial S= 0
Time unit 1: S <= S + x1
Time unit 2: S <= S + x2
Time unit 3: S <= S + x3
Time unit 4: S <= S + x4
Time unit 5: S <= S + x5
Only one input must be ready at a time. Why?
The cost is lower and the critical path is shorter than in Method_1, but the working rate is slower than Method_1 (5 clock cycles for 1 summation result).

Summation Problem (3/4)
Calculate S = x1 + x2 + x3 + x4 + ... + x50
[Figure: datapath with inputs xi and yi, an accumulator (register), and a control unit that sends out 3 proper control signals S1, S2 and S3.]
If S1=0, a1 = xi
If S1=1, a1 = yi
If S2=0 S3=0, r = a2
If S2=0 S3=1, r = a1+a2
If S2=1 S3=0, r = a1-a2
If S2=1 S3=1, r = a1
What is the length of the clock period? It is set by the critical (longest) path delay: r must be ready before the next positive clock edge comes.

        S1  S2  S3  res  E
T_0     0   0   0   1    0
T_1     0   1   1   0    0
        0   0   1   0    0
.....
T_51    X   0   0   0    1

Summation Problem (4/4)
The control unit should send out proper control signals at each state. There are two ways to generate those control signals:
(1) Microprogramming control
    a. Store the control signals of each state in memory (ROM), one row of the table in Summation Problem (3/4) per state
    b. Read out the control signals one by one
(2) Hardwired control
    Use dedicated logic gates to generate the proper signals state by state (one by one)

Clock Period (1/4)
Gate:   not    and     or      xor
Delay:  1 ns   2.4 ns  2.4 ns  4.2 ns
[Figure: two combinational stages between registers R1, R2 and R3 (all positive-edge triggered). The longest paths of the two stages are 11.4 ns and 16.6 ns.]
16.6 ns > 11.4 ns
Critical path = 16.6 ns, so the clock period must be longer than 16.6 ns (e.g., 18 ns). Why?
[Figure: timing diagram. After each positive edge the new values of R1, R2 and R3 become ready, and the correct result of each stage must be ready before the next positive edge. One stage needs 16.6 ns and the other needs 11.4 ns.]
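One way to approach the "why" is the usual register-to-register setup constraint. The symbols $t_{cq}$ (register clock-to-output delay) and $t_{su}$ (register setup time) are assumptions introduced here for illustration. The sketch also assumes that the 16.6 ns and 11.4 ns figures cover only the combinational gates and that clock skew is ignored.

```latex
T_{clk} \;\ge\; t_{cq} + \max(16.6\,\text{ns},\ 11.4\,\text{ns}) + t_{su}
        \;=\; t_{cq} + 16.6\,\text{ns} + t_{su}
        \;>\; 16.6\,\text{ns}
```

Under these assumptions, a period such as 18 ns leaves 1.4 ns for the register overhead $t_{cq} + t_{su}$.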

Clock Period (2/4)
How to decide the clock period in a system?
[Figure: four combinational circuits C_I to C_IV separated by registers. Delay for C_I: 10 ns; C_II: 25 ns; C_III: 12 ns; C_IV: 8 ns.]
1. Find out the longest delay among combinational circuits C_I, C_II, C_III and C_IV.
2. The longest delay is named the critical path (here it is 25 ns).
3. The clock period can be set a little longer than the critical path. Why?
4. $\text{clock frequency} = \frac{1}{\text{clock period}}$ (here $\frac{1}{25\,\text{ns}} = \frac{1}{25 \times 10^{-9}\,\text{s}} = 40\,\text{MHz}$)

Clock Period (3/4)
Separate C_II into two parts (C_II_a and C_II_b) by inserting proper registers to achieve a faster clock frequency.
[Figure: the same chain of combinational circuits and registers, with combinational circuit II (25 ns) split into C_II_a and C_II_b by inserted registers. Delay for C_I: 10 ns; C_III: 12 ns; C_IV: 8 ns.]
Now, the clock frequency is 71.4 MHz.
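As a quick check of these figures, the quoted frequency can be converted back into a clock period. This assumes the 71.4 MHz value is simply the reciprocal of the new period; the slide does not give the individual delays of C_II_a and C_II_b.

```latex
T_{clk} = \frac{1}{71.4 \times 10^{6}\,\text{Hz}} \approx 14\,\text{ns}
\qquad\text{compared with}\qquad
\frac{1}{40 \times 10^{6}\,\text{Hz}} = 25\,\text{ns}
```

So after the split, no register-to-register stage (C_I, C_II_a, C_II_b, C_III or C_IV) can take longer than about 14 ns.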

Clock Period (4/4)
Better HDL style: separating combinational and sequential circuits
[Figure: data1 and data2 feed a combinational logic block, whose output feeds a sequential logic block clocked by clk.]

module EXAMPLE(data1,data2,clk,q);
input data1, data2, clk;
output q;
reg data, q;
// Combinational logic
always @(data1 or data2)
  data = com(data1,data2);
// Sequential logic
always @(posedge clk)
  q <= data;
endmodule

Design for Summation Problem (1/7)
Calculate S = x1 + x2 + x3 + x4 + x5
Method_1: four stages

module adder1(x1, x2, x3, x4, x5, out);
input x1, x2, x3, x4, x5;
output [2:0] out;
reg [2:0] out;
always @(x1 or x2 or x3 or x4 or x5)
  out = (((x1+x2)+x3)+x4)+x5;
endmodule

Assume the adder's delay is k ns.
Unstable output (delay is 4*k ns).

Design for Summation Problem (2/7)
Method_2: three stages (shorter delay)

module adder2(x1, x2, x3, x4, x5, out);
input x1, x2, x3, x4, x5;
output [2:0] out;
reg [2:0] out;
always @(x1 or x2 or x3 or x4 or x5)
  out = ((x1+x2)+(x3+x4))+x5;
endmodule

Unstable output (delay is 3*k ns, less than Method_1).

Design for Summation Problem (3/7)
Method_3

module adder3(x1, x2, x3, x4, x5, clk, out);
input x1, x2, x3, x4, x5, clk;
output [2:0] out;
reg [2:0] out;
always @(posedge clk)
  out <= ((x1+x2)+(x3+x4))+x5;
endmodule

Stable output with a register (3-bit flip-flop).
Delay is 3*k ns + c ns (c is the register assignment delay).

Design for Summation Problem (4/7)
Method_4

module adder4(clk, x1, x2, x3, x4, x5, out);
input clk, x1, x2, x3, x4, x5;
output [2:0] out;
reg [2:0] out, temp1, temp2, temp3;
always @(posedge clk) begin
  temp1 <= (x1+x2)+x3;
  temp2 <= x4;
  temp3 <= x5;
  out <= temp1+temp2+temp3;
end
endmodule

Delay is 2*k ns + c ns, which is less than Method_1 (4k ns), Method_2 (3k ns) and Method_3 (3k ns + c ns).
So this method can achieve the best (fastest) clock rate because its critical path is the shortest. However, the correct out is generated after two clock cycles, not just one (this is also named datapath pipelining).

Design for Summation Problem (5/7)
Extra delays to consider: 1. wire delay, 2. register assignment delay.
Without pipelining, the critical path is about 4k ns and a correct output is generated every clock cycle.
  Event      1    2    3     4     5     6
  Completed  4k   8k   12k   16k   20k   24k
With pipelining (faster clock rate), each of the two stages takes about 2.5k ns, so the critical path is about 2.5k ns. Why? A correct output is generated after two clock cycles.
  Event      1    2     3     4      5     6
  Time       5k   7.5k  10k   12.5k  15k   17.5k
Two events are processed in parallel in the unit.
Faster clock rate but higher cost (3 extra registers: temp1, temp2 and temp3).

Design for Summation Problem (6/7)
[Figure: timing of Events 1 to 5 in the non-pipelined design (critical path about 4k ns) and in the pipelined design with temp2 and temp3 (two stages of about 2.5k ns each, critical path about 2.5k ns).]

Design for Summation Problem (7/7)
[Figure: two ways to place the pipeline registers. Balanced: two stages of about 2.5k ns each, critical path about 2.5k ns. Unbalanced: stages of about 3.5k ns and about 1.5k ns, critical path about 3.5k ns.]
Which one is better? Balance is important.

Clock Skew Problem
In a synchronous sequential circuit, the clock inputs of all flip-flops are generally tied together, so all flip-flops should change state at the same instant. In practice, however, placement and routing delays cause clock skew (the clocks of the flip-flops do not go high and low at the same instant), which may make the circuit behave incorrectly.
Clock skew usually needs special handling so that, as far as possible, every flip-flop acts at the same instant.

Optimization for RTL Design
[Figure: high-level block diagram with control inputs, datapath inputs, control signals, status signals, control unit, control outputs and datapath outputs.]
Optimization for control unit:
1. As suggested by most textbooks on "Logic System Design"
2. Write good-style HDL descriptions, which are optimized by EDA tools
Optimization for datapath:
1. Resource optimization
2. Time optimization

Optimization for Control Unit
Traditional Optimization Flow for Control Unit
The control unit is a finite state machine (FSM). The flow is:
1. Design description or timing diagram
2. Develop state diagram
3. Develop next-state and output tables
4. Minimize states
5. Encode input, states, and outputs
6. Decide the memory elements
7. Derive excitation equation
8. Optimize logic circuit
9. Derive logic schematic and timing diagram
10. Simulation
11. Functional verification and timing analysis

Finite State Machine (1/4)
Moore machine: S → O (output is dependent only on the current state)
Mealy machine: S × I → O (output is dependent on input and state)
[Figure: state diagram of a Mealy machine with four states S0, S1, S2, S3, an initial state, and edges labeled Input/Output (for example 1/0 and 0/1), with its next-state and output tables (I = input; columns Present State and Next State).]

Finite State Machine (2/4)
[Figure: Moore machine (state-based machine). Inputs feed the next state logic (combinational), which feeds the current state register (sequential, with asynchronous reset), which feeds the output logic (combinational) producing the Moore output.]
[Figure: Mealy machine (input-based machine). Same structure, but the output logic (combinational) also takes the inputs and produces the Mealy output.]

Finite State Machine (3/4)
- For best legibility, describe the FSM using two or three always@ statements:
  (1) current state or state register (sequential circuit)
  (2) next state logic (combinational circuit)
  (3) output logic (combinational circuit)
- The two combinational logic blocks can be merged
- Use parameter to describe the state names

Finite State Machine (4/4)
[Figure: state diagram with states ST0 to ST3, for example ST1 with Y=2 and an edge labeled Control=0.]

module FSM(Clock, Reset, Control, Y);
input Clock, Reset, Control;
output [2:0] Y;
reg [1:0] CurrentState, NextState;
reg [2:0] Y;
// State names (parameter)
parameter [1:0] ST0 = 2'b00, ST1 = 2'b01,
                ST2 = 2'b10, ST3 = 2'b11;

// Next state logic (combinational circuit)
always @(Control or CurrentState) begin
  NextState = ST0;
  case (CurrentState)
    ST0: NextState = ST1;
    ST1: if (Control) NextState = ST2;
         else NextState = ST3;
    ST2: NextState = ST3;
    ST3: NextState = ST0;
  endcase
end

// State register (sequential circuit)
always @(posedge Clock or posedge Reset)
  if (Reset) CurrentState <= ST0;
  else CurrentState <= NextState;

// Output logic (combinational circuit)
always @(CurrentState) begin
  case (CurrentState)
    ST0: Y = 1;
    ST1: Y = 2;
    ST2: Y = 3;
    ST3: Y = 4;
  endcase
end
endmodule

Moore Machine (1/8)
Optimization flow: the traditional optimization flow for the control unit, applied to a Moore machine.
S → O (S: state, O: output)
[Figure: state diagram with states S0, S1, S2, S3 and the original state table (next-state and output tables, I = input; columns Present State and Next State).]
Assume that we use JK flip-flops for storage.

Moore Machine (2/8)
Four states need 2 flip-flops (named M and N).
[Tables: encoded present-state and next-state table, JK flip-flop characteristic table, and excitation table.]

Moore Machine (3/8)
[Figure: Karnaugh maps for the J and K inputs of flip-flops M and N and for the output, and the resulting circuit made of next state logic, state register and output logic.]
Output = M'N + MN'

Moore Machine (4/8)
[Figure: synthesis result, showing next state logic, state register and output logic.]

Moore Machine (5/8)
Implement the circuit with structural HDL.

module JK_FF(Clk, J, K, Q, Q_Bar);
input Clk, J, K;
output Q, Q_Bar;
reg Q;
assign Q_Bar = ~Q;   // complement output, read by moore_JK
always @(posedge Clk) begin
  case({J,K})
    2'b00: Q <= Q;
    2'b01: Q <= 0;
    2'b10: Q <= 1;
    2'b11: Q <= ~Q;
  endcase
end
endmodule

module moore_JK(Clk, I, Out_Data);
input Clk, I;
output Out_Data;
wire temp1, temp2, temp3, temp4, temp5, temp6;
// Next state logic
assign temp1 = I & temp5;
assign temp4 = I & temp2;
// Output logic
assign Out_Data = (temp3 & temp5) | (temp2 & temp6);
// State register
JK_FF M(Clk, I, temp1, temp2, temp3);
JK_FF N(Clk, temp4, temp3, temp5, temp6);
endmodule

Moore Machine-Bad Example (6/8)
The better way is to write behavioral HDL directly and let the EDA tool do the whole optimization job (including Karnaugh map and logic minimization).

module moore_bad(Clk, Reset, In_Data, Out_Data);
input Clk, Reset, In_Data;
output [1:0] Out_Data;
reg [1:0] Out_Data;
reg [1:0] State;
parameter S0=2'b00, S1=2'b01, S2=2'b11, S3=2'b10;
always @(posedge Clk) begin
  if(Reset) State=S0;
  else begin
    case(State)
      S0: begin Out_Data = 0; if(In_Data == 1) State = S2; else State = S0; end
      S1: begin Out_Data = 1; if(In_Data == 1) State = S2; else State = S0; end
      S2: begin Out_Data = 1; if(In_Data == 1) State = S3; else State = S2; end
      S3: begin Out_Data = 0; if(In_Data == 1) State = S1; else State = S3; end
    endcase
  end
end
endmodule

Note: This is a bad style. Both State and Out_Data are implemented with flip-flops.

Moore Machine-Good Example (7/8)

module moore_good(Clk, Reset, In_Data, Out_Data);
input Clk, Reset, In_Data;
output [1:0] Out_Data;
reg [1:0] Out_Data;
reg [1:0] State, NextState;
parameter S0=2'b00, S1=2'b01, S2=2'b10, S3=2'b11;

// State register (flip-flops)
always @(posedge Clk or posedge Reset) begin
  if(Reset) State <= S0;
  else State <= NextState;
end

// Next state logic
always @(In_Data or State) begin
  case(State)
    S0: begin if(In_Data == 1) NextState = S2; else NextState = S0; end
    S1: begin if(In_Data == 1) NextState = S2; else NextState = S0; end
    S2: begin if(In_Data == 1) NextState = S3; else NextState = S2; end
    S3: begin if(In_Data == 1) NextState = S1; else NextState = S3; end
  endcase
end

// Output logic
always @(State) begin
  case(State)
    S0: Out_Data = 0;
    S1: Out_Data = 1;
    S2: Out_Data = 1;
    S3: Out_Data = 0;
  endcase
end
endmodule

Note: This is a good style (only "State" is implemented with flip-flops).

Moore Machine-Good Example (8/8)
[Figure: synthesis results of the two styles. Bad style: extra flip-flops are inferred. Good style: only the state register uses flip-flops.]

Mealy Machine (1/2)
[Figure: state diagram with four states S0, S1, S2, S3, an initial state, and edges labeled Input/Output (for example 1/0 and 0/1), with its next-state and output tables (I = input; columns Present State and Next State).]

Mealy Machine (2/2)
Please remember to write your Mealy machine using the good HDL style.
[Figure: control unit (Mealy machine) drawn as combinational and sequential circuit blocks, described with three always statements.]
Homework: implement a Mealy machine.

Control-Unit Implementation Styles (1/3)
Hardwired control
[Figure: control unit.]

Control-Unit Implementation Styles (2/3)
Hardwired control
[Figure: control unit with state register and decoder.]

Control-Unit Implementation Styles (3/3)
Microprogramming control
[Figure: control unit with state register and ROM.]

One’s Count Problem (1/2)
One's-counter implementation
Problem: Using a datapath with a 3-port register file (2 read ports and 1 write port), design a one's counter that counts the number of ones in an input data word and returns the result after completion.
Registers used: Data, Mask, Ocount, Temp
Data := Input
Ocount := 0
Mask := 1
while Data ≠ 0 repeat
    Temp := Data AND Mask
    Ocount := Ocount + Temp
    Data := Data >> 1
Outport := Ocount
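
The one's-count pseudocode can be checked with a small behavioral Verilog model before any datapath or control unit is designed. This is a simulation-only sketch under stated assumptions: the module name `ones_count_sketch`, its port names, the 8-bit data word and the 4-bit count width are illustrative and are not taken from the slides.

```verilog
// Behavioral sketch of the one's-count pseudocode (simulation model only).
// Assumptions: 8-bit data word, 4-bit result, illustrative names.
module ones_count_sketch(
  input      [7:0] inport,
  output reg [3:0] outport
);
  reg [7:0] data, mask, temp;
  reg [3:0] ocount;
  always @(inport) begin
    data   = inport;                 // Data := Input
    ocount = 4'd0;                   // Ocount := 0
    mask   = 8'h01;                  // Mask := 1
    while (data != 8'h00) begin
      temp   = data & mask;          // Temp := Data AND Mask (0 or 1)
      ocount = ocount + temp[3:0];   // Ocount := Ocount + Temp
      data   = data >> 1;            // Data := Data >> 1
    end
    outport = ocount;                // Outport := Ocount
  end
endmodule

// Minimal self-checking testbench for the sketch above.
module ones_count_sketch_tb;
  reg  [7:0] inport;
  wire [3:0] outport;
  ones_count_sketch dut(.inport(inport), .outport(outport));
  initial begin
    #1 inport = 8'b1011_0010;        // four ones
    #1 if (outport !== 4'd4) $display("FAIL: expected 4, got %0d", outport);
    #1 inport = 8'hFF;               // eight ones
    #1 if (outport !== 4'd8) $display("FAIL: expected 8, got %0d", outport);
    #1 inport = 8'h00;               // no ones, loop body never runs
    #1 if (outport !== 4'd0) $display("FAIL: expected 0, got %0d", outport);
    $display("done");
    $finish;
  end
endmodule
```

The loop runs once per remaining bit position until Data becomes zero, which is the iteration the control unit has to sequence one state at a time in the datapath version.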

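One’s Count Problem (2/2)
[Figure: control unit and datapath. The control unit sends control signals to a datapath built around an 8*m register file and a shifter (shift right).]
State diagram of the control unit:
S0: wait for Start=1
S1: Data=Inport
S2: Ocount=0
S3: Mask=1
S4: Temp=Data AND Mask
S5: Ocount=Ocount+Temp
S6: Data=Data>>1 (the loop repeats while Data≠0)
S7: Outport=Ocount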

Datapath of One’s-Counter (1/4)
Optimized by EDA tool

module data_path(clock,reset,control_word,inport,outport,data);
input clock,reset;
input [19:0] control_word;
input [7:0] inport;
output [7:0] outport,data;
wire [7:0] line1,line2,line3,line4;
selector NO1(.inp_A(inport), .inp_B(data), .select(control_word[19]), .outp(line1));
register NO2(.clock(clock), .reset(reset), .WA(control_word[17:15]), .WE(control_word[18]),
             .RAA(control_word[13:11]), .REA(control_word[14]),
             .RAB(control_word[9:7]), .REB(control_word[10]),
             .Data_in(line1), .Bus_A(line2), .Bus_B(line3));
alu NO3(.Datain_A(line2), .Datain_B(line3), .select(control_word[6:4]), .outp(line4));
shifter NO4(.inp(line4), .select(control_word[3:1]), .outp(data));
buffer NO5(.OE(control_word[0]), .inp(data), .outp(outport));
endmodule

Datapath of One’s-Counter (2/4)

module selector(inp_A,inp_B,select,outp);
input [7:0] inp_A,inp_B;
input select;
output [7:0] outp;
reg [7:0] outp;
always @(select or inp_A or inp_B) begin
  if(select)
    outp = inp_A;
  else
    outp = inp_B;
end
endmodule

module alu(Datain_A,Datain_B,select,outp);
input [7:0] Datain_A,Datain_B;
input [2:0] select;
output [7:0] outp;
reg [7:0] outp;
always @(select or Datain_A or Datain_B) begin
  case(select)
    3'b000: outp = ~Datain_A;
    3'b001: outp = Datain_A & Datain_B;
    3'b010: outp = Datain_A ^ Datain_B;
    3'b011: outp = Datain_A | Datain_B;
    3'b100: outp = Datain_A - 1;
    3'b101: outp = Datain_A + Datain_B;
    3'b110: outp = Datain_A - Datain_B;
    3'b111: outp = Datain_A + 1;
  endcase
end
endmodule

Datapath of One’s-Counter (3/4)
module shifter(inp,select,outp);
input [7:0] inp;
input [2:0] select;
output [7:0] outp;
reg [7:0] outp;
reg temp;
always @(select or inp) begin
  case(select)
    3'b000: outp = inp;
    3'b001: outp = inp;
    3'b100: outp = inp << 1;                 // shift left
    3'b101: begin                            // rotate left
      temp = inp[7];
      outp = inp << 1;
      outp[0] = temp;
    end
    3'b110: outp = inp >> 1;                 // shift right
    3'b111: begin                            // rotate right
      temp = inp[0];
      outp = inp >> 1;
      outp[7] = temp;
    end
    default: outp = 8'hxx;
  endcase
end
endmodule

module buffer(OE,inp,outp);
input OE;
input [7:0] inp;
output [7:0] outp;
reg [7:0] outp;
always @(OE or inp) begin
  if(OE)
    outp = inp;
  else
    outp = 8'bz;
end
endmodule

Datapath of One’s-Counter (4/4)
module register(clock,reset,WA,WE,RAA,REA,RAB,REB,Data_in,Bus_A,Bus_B);
input clock,reset,WE,REA,REB;
input [2:0] WA,RAA,RAB;
input [7:0] Data_in;
output [7:0] Bus_A,Bus_B;
reg [7:0] reg_array [7:0];
always @(posedge clock) begin
  reg_array[0]=8'h00; reg_array[1]=8'h00; reg_array[2]=8'h00; reg_array[3]=8'h00; reg_array[4]=8'h00; reg_arra
