The Really Simple RISC Computer (RSRC) V0.811
Copyright William D. Richard, Ph.D.
Updated November 17, 2020
INTRODUCTION
The intellectual development of the Really Simple RISC Computer (RSRC) began on the first day of CSE
260M (Introduction to Digital Logic and Computer Design) with the introduction of Boolean algebra.
Boolean algebra was used to develop the concepts associated with logic gates and circuits, including
optimal covering of logic functions using both two-level Sum-of-Product (SOP) and Product-of-Sum (POS)
circuits and covering using the LUT-based architectures found in modern FPGAs. During the first section
of the course, we investigated addition and subtraction and built a simple Arithmetic Logic Unit (ALU)
that performed these operations as well as bit-wise AND and OR operations on 32-bit words.
Laboratory 1, completed during the first section of the course, focused primarily on the Xilinx Vivado
tool flow. A simple combinational circuit was implemented using switch outputs as inputs and LEDs as
outputs. You were required to constrain the pins on your design and generate a bit file that could be
downloaded to a Xilnix Artix-7 FPGA on a development board for testing.
In the second section of the course, we developed the concept of a rising edge-triggered D-type Flip-
Flop (FF) from first principles starting with the simple SR latch. Flip-flops begat registers, and it was only
a small stretch to the concept of a pipelined difference engine based on registers and adders. The block
diagram of a difference engine capable of computing any fourth order polynomial is shown below in
Figure 1.
Figure 1. A difference engine capable of computing any fourth order polynomial F(x), x = 0, 1, 2, … , once
initialized with the proper difference coefficients.
Register Transfer Notation (RTN) was introduced to describe the behavior of the pipelined difference
engine. For the circuit shown above, the RTN would be:
F F + G : G G +H : H H + I : I I + J ;
In this RTN description, a colon is used to separate register transfers that happen at the same time, and
here four computations are done simultaneously. The semicolon is used to identify clock period
32
CLK
F(x)
I J + H + G + F +
boundaries. From this RTN description, you developed a VHDL model of the pipelined difference engine
and simulated it extensively in homework 4.
The concepts associated with Finite State Machines (FSMs) were introduced next and used to construct
sequence recognizers and similar “traditional” academic sequential machines in state diagram form. The
recommended method for describing FSMs in synthesizable form so they are properly identified as FSMs
by modern synthesis tools was covered in excruciating detail (only FSMs that are identified as FSMs by
the tools can be optimized via selection of the state encoding technique). VHDL was then used to
implement a sequence recognizer presented as an Internet packet sniffer searching for questionable
strings like “bomb.” The concept of state equivalence was introduced, and a tabular approach to state
minimization was developed from basic theorems on sequential machines.
Every digital system can be and should be broken into two portions: Datapath and control. The
separation of control from datapath functionality has long been recognized as the best approach to
digital system design, and the concept of “datapath refinement” is almost universally applied to simplify
the control portion of the design. This concept was introduced and discussed in class.
In “The Difference Engine,” the datapath for a bus-based difference engine was presented as an
alternative to the pipelined difference engine of homework 4. A block diagram of the datapath is shown
below in Figure 2. The datapath of the bus-based difference engine consists of a set of registers, labeled
F, G, H, I, and J, two additional registers (A and C), and an adder connected by a single bus as shown in
Figure 2. The “control” portion of the difference engine was presented as a simple 12-state FSM that
controls the register transfers needed to compute the functional values of the polynomial described by
the difference engine coefficients. The register transfers coordinated by the 12-state FSM are:
1) A F ;
2) C A + G ;
3) F C ;
4) A G ;
5) C A + H ;
6) G C ;
7) A H ;
8) C A + I ;
9) H C ;
10) A I ;
11) C A + J ;
12) I C ;
You were given a VHDL model of the datapath and the VHDL for the 12-state FSM required to
implement the bus-based difference engine. This VHDL implementation of the bus-based difference
engine was discussed extensively in class.
Figure 2. The datapath for the bus-based difference engine. Register transfers across the 32-bit bus are
controlled by a 12-state Mealy-model FSM.
The pipelined difference engine required four different adders and could compute polynomial
coefficients at least 12 times faster than the bus-based difference engine, producing one polynomial
value every clock. Later, we will come to realize that the bus-based difference engine is to the RSRC as
the RSRC described below is to a fully pipelined version of the same processor!
F
f_in
f_out
G
g_in
g_out
H
h_in
h_out
I
i_in
i_out
J
j_in
j_out
A a_in
ADDER
C c_in
c_out
32
THE REALLY SIMPLE RISC COMPUTER (RSRC)
The Really Simple RISC Computer (RSRC) is a stored-program computer based on a simple extension of
the bus-based difference engine. By adding a Program Counter (PC), an Instruction Register (IR), and
Memory Address (MA) and Memory Data (MD) registers to the bus-based difference engine, we can
construct the basic stored-program computer shown below in Figure 3. Here, the program counter holds
the address of the next instruction in memory to be executed, and the instruction register holds the
instruction after it has been fetched from memory during execution of the instruction. The MA register
holds the address of the instruction as it is being fetched from memory, and the “input” MD register
holds the instruction fetched from memory prior to the instruction being moved to the instruction
register. The “Input” MD register also holds data as it is being loaded from memory during execution of
the load instruction (discussed later), and the “output” MD register holds data that is being stored to
memory during the execution of the store instruction (discussed later).
In addition to adding PC, IR, MA, and MD registers, the RSRC expands the set of registers available from
five to thirty-two and uses an Arithmetic Logic Unit (ALU) instead of a simple adder. The ALU is capable
of performing add, subtract, AND, OR, increment by four, and shift operations as described in detail
later.
The stored-program concept used with the RSRC that is so ubiquitous today was not so obvious in the
early days of computing, and many well-written histories are available to the budding young computer
architect. Since history, as they say, tends to repeat itself, learning about the history of computing is
important to anyone seriously considering a career in computer engineering.
The concept of the “fetch-execute” cycle is one of the first things introduced in most books on computer
architecture because it is one of the most fundamental notions on which modern CPUs are based. Here
in the RSRC, there are three steps involved in the “fetch” portion of the fetch-execute cycle, and each
step takes a single clock cycle:
1) Move the value in PC to MA while simultaneously incrementing the PC value by four and storing
it in the C register;
2) Read the next instruction from memory into the “input” MD register while simultaneously
moving the incremented PC value back to the PC;
3) Move the instruction fetched from memory to the IR.
The register transfers associated with each of these three steps are:
1) MA PC : C PC +4 ;
2) MD M[MA] : PC C ;
3) IR MD ;
Here, we have introduced a new functionality to our RTN notation: M[MA] implies the contents of
memory at address MA, much like we notate the element of an array in many programming languages.
The control signals produced by the FSM to cause these register transfers to happen are:
1) PCout, MAin, INC4, Cin
2) Cout, PCin, READ, MDrd
3) MDout, IRin
Figure 3. The RSRC is a simple extension of the bus-based difference engine created by adding PC, IR,
MA, and MD registers and extending the register set from five to 32 registers.
C1_out
C2_out
m
32
pc_in
md_wr
md_out
md_rd
ma_in
r0_in
r1_in
r31_in
IR
ir_in
R0
r0_out
R1
r1_out
R31
r31_out
A a_in
A ALU B
(+ – AND OR SHIFT INC4)
C c_in
c_out
32
MA
md_bus
MD
MD
FSM
32
32
A<31..0> TO MEMORY
D<31..0> TO/FROM MEMORY
READ.H
WRITE.H
DONE.H
PC
pc_out
IR<31..0>
REG FILE
The PC value is incremented not by one but by four because RSRC instructions are four bytes wide, and,
almost universally (at least in the Personal Computer (PC) world), memory is considered to be byte-
addressable. So, RSRC instructions are four bytes or 32-bits wide, and common terminology is to say the
RSRC “word size” is 32-bits. These details will be discussed in considerable depth later, but for now it is
enough to know that instructions start in memory every four byte addresses, that is, at addresses 0, 4, 8,
12, etc.
While every instruction fetch takes three clock ticks (we computer engineers call clock cycles “ticks” in
reference to how an old mechanical clock would go tick, tick, tick, counting off the seconds of the day),
the instruction execution phase takes a variable number of clock ticks. An ADD instruction, for example,
takes three clock ticks to execute:
1) Move one operand from the source register in the register file to the A register;
2) Place the second operand on the bus and add the two operands, saving the sum in the C
register;
3) Move the sum back to the target register in the register file.
While the ADD instruction takes three clock ticks to execute, some instructions like load and store take
more. The NOP instruction, that does nothing, takes only one clock tick to execute. But, no matter how
many clock ticks it takes to execute an instruction, after instruction execution is complete, the FSM
returns to the first state of the instruction fetch, and the fetch-execute cycle repeats.
Every clock tick of the instruction fetch and execute phases is associated with one state of the
controlling FSM, and that FSM has eight states. So, while the bus-based difference engine had a 12-state
FSM and was only capable of doing a single, predefined calculation, the RSRC FSM has only eight states
and is capable of performing a large number of different calculations using the stored-program concept!
RSRC (AND SRC) INSTRUCTION FORMAT
RSRC instructions are binary-compatible with the instructions of the Simple RISC Computer (SRC) used to
study computer architecture in CSE 362M. The RSRC is to the SRC as the Intel Core i7 is to the Intel Core
i3: Central Processing Units (CPUs) that have a different architectures but that execute the same
machine code are said to be binary-compatible. And, yes, the RSRC is actually faster (in terms of number
of clock ticks to execute instructions) than the SRC, although one could argue that the SRC is more
elegant and (probably) uses fewer resources on a custom IC or inside a commercial FPGA.
The instruction formats for the RSRC and SRC are shown below in Table 1. For the moment, the SRC
assembly guide can be used when writing RSRC assembly programs.
TABLE 1. RSRC and SRC instruction formats.
The “abstract” RTN for the RSRC and SRC instructions is given below in Table 2. “Abstract” RTN defines
very specifically what an instruction does, but it does not include the clock-tick-by-clock-tick details of
how the instruction is executed.
The abstract RTN below defines the displacement-based address, disp, and the relative address, rel, are
calculated. Displacement-based and relative addresses are used with the load, store, load relative, and
store relative instructions as explained in further detail below.
The abstract RTN also defines cond, the FSM input that is equal to logic 1 when a conditional branch
instruction branch condition has been met, and n, the shift count used by the shift instructions. Both
cond and n are discussed further below in the context of datapath refinement.
31 27 26 0
(8) NOP, STOP
31 27 26 22 21 17 16 12 11 0
ra rb rc UNUSED (6) ADD, SUB, AND, OR
31 27 26 22 21 17 16 0
ra rb c2 (1) ADDI, ANDI, ORI, LD, ST, LA
31 27 26 22 21 17 16 12 11 0
(3) NEG, NOT
31 27 26 22 21 17 16 12 11 3 2 0
(4) BR
31 27 26 22 21 17 16 12 11 3 2 0
(5) BRL
31 27 26 22 21 17 16 12 11 5 4 0
(7a) SHR, SHRA, SHL, SHC
31 27 26 22 21 17 16 12 11 5 4 0
(7b) SHR, SHRA, SHL, SHC
OP
OP
OP UNUSED
OP ra UNUSED rc UNUSED
OP UNUSED rb rc UNUSED
OP rb rc UNUSED ra
COND
OP rb UNUSED ra COUNT
OP rb rc UNUSED ra 00000
31 27 26 22 21 0
OP ra c1
COND
(2) LDR, STR, LAR
nop(:=op=0) → :
disp<31..0> := ((rb = 0) → c2<16..0>{sign extend} :
(rb ≠ 0) → R[rb] + c2<16..0>{sign extend, 2’s comp.}):
rel<31..0> := PC<31..0> + c1<21..0> {sign extend, 2’s comp.}:
ld (:= op= 1) → R[ra] ←M[disp] :
ldr (:= op= 2) → R[ra] ←M[rel] :
st (:= op= 3) → M[disp] ← R[ra] :
str (:= op= 4) → M[rel] ← R[ra] :
la (:= op= 5 ) → R[ra] ← disp :
lar (:= op= 6) → R[ra] ← rel :
cond := (c3<2..0> = 0 → 0 :
c3<2..0> = 1 → 1 :
c3<2..0> = 2 → R[rc] = 0 :
c3<2..0> = 3 → R[rc] ≠ 0 :
c3<2..0> = 4 → R[rc]<31> = 0 :
c3<2..0> = 5 → R[rc]<31> = 1) :
br (:= op= 8) → (cond → PC ← R[rb]) :
brl (:= op= 9) → (R[ra] ← PC : cond →(PC ← R[rb])) :
add (:= op=12) → R[ra] ← R[rb] + R[rc] :
addi (:= op=13) → R[ra] ← R[rb] + c2<16..0>{2’s comp. sign ext.} :
sub (:= op=14) → R[ra] ← R[rb] – R[rc] :
neg (:= op=15) → R[ra] ← -R[rc] :
and (:= op=20) → R[ra] ← R[rb] ∧ R[rc] :
andi (:= op=21) → R[ra] ← R[rb] ∧ c2<16..0>{sign extend} :
or (:= op=22) → R[ra] ← R[rb] ∨ R[rc] :
ori (:= op=23) → R[ra] ← R[rb] ∨ c2<16..0>{sign extend} :
not (:= op=24) → R[ra] ← ¬R[rc] :
n := ((c3<4..0> ==0) → R[rc]<4..0> :
(c3<4..0> ≠ 0) → c3<4..0>) :
shr (:= op=26) → R[ra]<31..0> ← (n @ 0) # R[rb]<31..n> :
shra (:= op=27) → R[ra]<31..0> ← (n @ R[rb]<31>) # R[rb]<31..n> :
shl (:= op=28) → R[ra]<31..0> ← R[rb]<31-n..0> # (n @ 0) :
shc (:= op=29) → R[ra]<31..0> ← R[rb]<31-n..0> # R[rb]<31..32-n> :
Table 2. RSRC and SRC abstract RTN.
WRITING RSRC ASSEMBLY LANGUAGE PROGRAMS
Writing RSRC (and SRC) assembly language programs is different than writing programs in C, or C++, or
Java. For example, whereas one does not worry about the address at which a C program will be loaded
in memory prior to execution, this address must be specified when writing an assembly language
program so the assembler, which “assembles” assembly language code into machine code (the 0s and 1s
of Table 1), knows how to resolve addresses implied by your code. The starting address of an assembly
program is specified using the “.ORG” assembler directive. Let’s take a look at an example RSRC
assembly program and see how the “.ORG” and some of the other features of the assembler are used.
.org 0 ; Program starts at address 0
la r31,TOP ; r31 holds the loop address
la r30,MYD ; r30 points to the output data array
la r1,1 ; Initialize difference engine coefficients
la r2,-1
la r3,0
la r4,1
la r5,1
TOP: st r1,0(r30) ; Store a value of F(x)
addi r30,r30,4 ; Point to next element of the output array
add r1,r1,r2 ; Update the difference engine values
add r2,r2,r3
add r3,r3,r4
add r4,r4,r5
br r31 ; Branch unconditionally to TOP
stop
.org 4096 ; The output array is located at address 4096
MYD: .dw 16
Figure 4. Example RSRC assembly language program that computes the values of an example
polynomial.
The program of Figure 4 computes the difference engine values for a polynomial f(x) for x = 0, 1, 2, … ,
and it never stops writing results to memory. Eventually, this would cause the program to overwrite
itself, but for now let us ignore this detail (you will fix this in a homework problem) and focus on how to
write RSRC assembly language programs.
In almost every case, RSRC assembly language programs are specified to be loaded into memory starting
at address 0 using a “.ORG 0” statement at the beginning of the program. The “.ORG” statement is an
assembler directive, and it therefore does not result in the generation of any machine language code. A
second “.ORG” directive is used in the example of Figure 4 to allocate 16 (32-bit) words of memory for a
table of output values starting at address 409610.
In CSE 362M, Computer Architecture, you will learn that multiple programs running in a virtual memory
system, like Windows 10, all were assembled to run from address 0. This is done via use of a Memory
Management Unit (MMU), and it vastly simplifies the steps required to compile and assemble a program
(and threads) to actually execute on a modern Operating System (OS). But, we’ll have to save these
interesting details for CSE 362M.
Another important feature of assembly language is the concept of a label. In Figure 4, both “TOP” and
“MYD” are labels. Labels are associated with memory addresses in assembly language, and the
assembler replaces “TOP” and “MYD” in lines 2 and 3 of the assembly language program with the values
2810 and 409610 during the second pass through the code (they are not known on the initial pass through
the code, which is why assemblers are always “two-pass” assemblers). Do you see why “TOP” is
associated with the value 2810? It’s because the store instruction “ST r1,0(r30)” will be loaded into
memory at address 2810!
Using the SRC simulator used in CSE 362M, one can simulate the execution of the RSRC assembly
language program of Figure 4. When a program has an infinite loop like the one being discussed here, it
is best to “step” through the program one instruction at a time using the “Step” button on the Graphical
User Interface (GUI). The register values and the contents of memory at address 4096 are shown below
in Figure 5 right after the store instruction executes that writes the 16th output value to the memory
array.
The program of Figure 4 uses five different RSRC assembly instructions:
1) LA, which here loads an immediate value into the register pointed to by ra
2) ADDI, which adds an immediate value to the register pointed to by rb and stores the result in
the register pointed to by ra
3) ADD, which adds the values in the registers pointed to by rb and rc and stores the result in
the register pointed to by ra
4) ST, which stores the value in the register pointed to by ra to the memory location pointed to
by rb
5) BR, which unconditionally branches to the address in the register pointed to by rc
It is important to keep in mind while writing assembly language programs, for any processor, that
assembly language instructions are specifying one or more register transfers as part of the execution
phase of the fetch-execute cycle. The ADD instruction specifies three register transfers as discussed
above, and the BR instruction specifies a register transfer from a register in the register file to the PC!
Figure 5. The status of registers and memory after the store instruction writes the 16th value to the
memory array.
The unconditional branch instruction used in the assembly language program of Figure 4 is the simplest
of the many different branch instructions supported by the RSRC. But, infinite loops are usually not very
useful, so one needs a way to branch conditionally. In total, the RSRC supports six different branch
instructions, including four conditional branch instructions (not counting the branch with linkage
instructions that will not be discussed at this point):
br rb ; Branch unconditionally to the address in the register pointed to by rb
brnv ; Branch never
brzr rb,rc ; Branch to the address in the register pointed to by rb if the register
; pointed to by rc is zero
brnz rb,rc ; Branch to the address in the register pointed to by rb if the register
; pointed to by rc is non-zero
brpl rb,rc ; Branch to the address in the register pointed to by rb if the register
; pointed to by rc is positive or zero
brmi rb,rc ; Branch to the address in the register pointed to by rb if the register
; pointed to by rc is negative
The conditional branch instructions use a 3-bit field (bits 2-0) to specify the branch condition. This 3-bit
field is used by the control block during execution so that the PC is overwritten only if the condition
being tested is true.
As an exercise, you will be asked to modify the program of Figure 4 so that it only writes 16 values to the
memory array. You will need to use a conditional branch to do this along with a counter or index that
you decrement (for example) each time you write a value to memory. Once this exercise is completed,
you will have gone a long way toward learning RSRC assembly language!
You’ve probably figured out by now that the RSRC doesn’t execute ASCII characters but something we
computer engineers call “machine code.” The assembler “assembles” your ASCII text file into the
sequence of 0s and 1s (in the form of 32-bit words) that are loaded into memory at run time. Each
instruction has a unique Operation Code, or OP Code as we computer engineers say, and this is used to
populate the first five (5) bits of each machine instruction during assembly. The ra, rb, and rc fields are
extracted directly from the assembly code using the known format of each instruction. Conceptually, it’s
really not that hard to understand how an assembler generates machine code from your assembly
language programs.
One thing not covered in our discussion so far is how the various constant fields in the instruction
formats of Table 1 are used. In most cases, the constant specified is sign-extended prior to use, i.e., the
upper bit of the constant field is copied repeatedly to form a 32-bit constant. As an example, if you
coded:
ADDI r1,r2,-1
the ra field would be populated with 00001, the rb field with 00010, and the 17-bit c2 field with:
11111111111111111,
or 17 1s. During execution the 17-bit value is extended to a full 32-bit value (32 1s):
11111111111111111111111111111111.
The c2 constant field is sign-extended in exactly the same way during execution of the LD, ST, and LA
instructions to produce the 32-bit value needed.
The shift instructions use a 5-bit field (bits 4-0) to specify the number of positions the source register is
to be shifted. If the 5-bit field is non-zero, this is interpreted as the shift count. If the 5-bit field is zero,
i.e., 000002, this is used to specify that the actual shift count is in the register pointed to by the rc field in
the instruction. The assembler knows how to “assemble” the correct bits from the syntax of the
instruction being assembled:
shr r1,r2,r3 ; This implies shift r2 right by the count in r3 and store the result in r1
shr r1,r2,3 ; This implies shift r2 right by 3 bit positions and store the result in r1
The 5-bit count field is not sign-extended but instead treated as an unsigned value.
REFINING THE DATAPATH
The FSM shown in Figure 3, while theoretically possible to design, would be almost impossible to design
in practice. There are 64 outputs, for example, just to control the behavior of the register file. And, the
entire machine instruction has to be used as input. By refining the data path, the complexity of the FSM
can be reduced to something far more manageable. In the case of the RSRC and the SRC, a simple 8-
state Mealy-model machine is all that is needed when appropriate refinements are made to the
datapath.
One obvious refinement is shown below in Figure 6. Here, a decoder is used along with signals Gra, Grb,
Grc, Rin, Rout, and BAout to reduce the number of control signals needed to control the register file.
These six control signals from the FSM replace the 64 control signals shown in Figure 3, a reduction of 58
control signals!
To output the contents of a register to the bus, the FSM asserts the appropriate grant signal (Gra, Grb,
or Grc) along with Rout.
To enable a value on the bus to be written into the register file, the FSM asserts the appropriate grant
signal (Gra, Grb, or Grc) along with Rin. It is understood that, at this point, only ra specifies the target
register, but this topology is general in that even the register pointed to by the rb or rc fields can be
written to in the future if needed.
BAout is used during the computation of a displacement-based address. BAout functions exactly like
Rout except that during the calculation of a displacement-based address, if the base register specified is
R[0], zero is forced onto the bus. As a reminder, the abstract RTN for the load and store instructions is:
ld(:= op=1) → R[ra] M[disp] :
la(:= op=2) → R[ra] disp :
st(:= op=3) → M[disp] R[ra] :
where disp<31:0> := ((rb=0) → c2<16:0>{sign extended} : (rb=/=0) → R[rb] + c2<16:0>{sign extended}) :
Figure 6. By adding a decoder to the register file, we can greatly simplify the FSM.
Without adding additional logic to the datapath, branches would be extremely difficult to implement in
the control FSM. The FSM would have to look at the branch condition specified by IR(2:0) and specify
the appropriate comparisons, for example, to test the register pointed to by the rc field. By adding the
logic shown below in Figure 7, the FSM is required to look at a single input bit, CON, to decide if a branch
should be taken!
Figure 7. The branch logic looks at IR<2..0> and R[rc]. R[rc] and IR<2..0> must be stable during the state
in which the FSM uses CON to determine both the next state transition and the FSM outputs.
This process of datapath refinement typically proceeds, during the design of any digital system, in the
manner described here, until the designer feels no further useful simplification can be made. Sometimes
during the process, one refinement leads to the need to modify a previous refinement, and that has
happened here: The logic introduced to simplify branches has resulted in the need to make R[rc]
available to the branch logic. This results in the need to add a multiplexor to the register file, controlled
by the rc field, that always outputs R[rc]. This modification to the register file is shown below in Figure
6A.
Figure 6A. The register file of Figure 6, as a result of the progression of datapath refinement, needs to be
modified so that R[rc] is always available to the branch logic of Figure 7 and the shift logic of Figure 9
discussed below.
Another datapath refinement that simplifies the overall design of the FSM is shown below in Figure 8.
Here, logic has been added so that the constants c1 and c2 can be placed on the bus by the FSM via the
assertion of control signals C1out and C2out, respectively. Careful analysis of the logic in Figure 8 reveals
that the constants c1 and c2 are output in sign-extended form! The upper five bits of IR (the OP Code)
are shown connecting to the FSM.
Figure 8. The instruction register refinement has logic that allows the c1 and c2 constants to be placed
on the bus via controls signals C1out and C2out, respectively.
The ALU in Figure 3 has to be designed so that it can shift the B input right, left, circularly, or right
arithmetically by 0 to 31 bit positions in order to support the RSRC shift instructions. If only a constant
shift could be specified, via the bottom 5 bits in the instruction, these five bits could be connected
directly to the ALU control inputs. But, the RSRC/SRC supports both constant shifts and shifts specified in
the R[rc] register: R[rc]<4..0> is used as the shift count when the bottom five bits of the instruction are
zero. The logic shown below in Figure 9 produces the correct 5-bit shift value n that is used by the ALU.
Figure 9. The shift logic used to produce the correct shift value for shift instructions. This logic again uses
the R[rc] value output by the modified register file.
When we finish with the process of datapath refinement, during which we would have been
documenting our design using RTN, the next step is implementation. Before beginning the
implementation step, however, it is important to carefully review the RTN for the design and document
fully what is needed for each component. The ALU can serve as an example. In this design, our ALU will
end up having the following control inputs.
add
sub
and
c=b
inc4
neg
shr
shl
shc
shra
or
not
n
Each control input comes from the FSM, and n comes from the shift logic shown in Figure 9.
While we have simplified the FSM via datapath refinement considerably, we are still left with 33 control
signals that must be generated by the FSM! The block diagram of the final design is shown below in
Figure 10.
Figure 10. The RSRC architecture after datapath refinement to support a simple control FSM.
CON
add
sub
and
c=b
inc4
neg
shr
shl
shc
shra
or
not
…
5
12
5
IR<4..0>
15
C1_out
C2_out
5
pc_in
md_wr
md_out
md_rd
ma_in
ra
rb
rc
r_in
r_out
ba_out
gra
grb
grc
R[rc]
IR
ir_in
Register
File
A a_in
A ALU B
(+ – AND OR SHIFT INC4)
C c_in
c_out
32
MA
md_bus
MD
MD
FSM
32
32
A<31:0> TO MEMORY
D<31:0> TO/FROM MEMORY
READ.H
WRITE.H
DONE.H
PC
pc_out IR<31..27> = OP CODE
SHIFT/BRANCH
LOGIC n
n
DISPLACEMENT-BASED ADDRESSING
The RSRC/SRC load and store instructions use DISPLACEMENT-BASED ADDRESSING. The abstract RTN for
the load and store instructions is:
ld (:= op= 1) → R[ra] ←M[disp] :
st (:= op= 3) → M[disp] ← R[ra] :
where
disp<31..0> := ((rb = 0) → c2<16..0>{sign extend} :
(rb ≠ 0) → R[rb] + c2<16..0>{sign extend, 2’s comp.}):
The la instruction, not shown above, is like ld except memory is not accessed and the address is placed
in R[ra].
There are several valid ways to code la, ld, and st:
While they LOOK different, the results are the same (almost):
DISPLACEMENT-BASED ADDRESSING is common in RISC CPUs, not just something made up for the
purpose of the RSRC/SRC as an academic exercise (this is from Wikipedia):
We should be careful with notation (this is from Heuring and Jordan):
REALY SIMPLE RISC COMPUTER CONCRETE RTN AND CONTROL SEQUENCES
_______________________________________________________________________
Instruction FETCH Concrete RTN and Control Sequence
STEP RTN Control Sequence
T0 MA <- PC : C <- PC + 4 ; PCout,MAin,INC4,Cin T1 MD <- M[MA] : PC <- C ; Cout,PCin,MDrd,Read,Wait on Done T2 IR <- MD ; MDout,IRin _______________________________________________________________________ ADD Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A + R[rc] ; Grc,Rout,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ADDI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ AND Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A ^ R[rc] ; Grc,Rout,AND,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ANDI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A ^ c2{sign-extended} ; C2out,AND,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ BR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 CON -> PC <- R[rb] ; Grb,Rout,CON -> PCin,End
_______________________________________________________________________
BRL Instruction Concrete RTN and Control Sequence
STEP RTN Control Sequence
T3 R[ra] <- PC ; Gra,Rin,PCout T4 CON -> PC <- R[rb] ; Grb,Rout,CON -> PCin,End
Note: While rc and ra are different instruction _FIELDS_, they can refer to
the same register.
For example, brlzr r1,r2,r1
would specify a branch if rc=r1 is zero _AND_ storing the PC in ra=r1.
With this implementation, we have to stipulate that the ra and rc fields
cannot be equal for the BRL instruction. A fine point, but also one that
can cause a great deal of pain during system debug.
_______________________________________________________________________
LA Instruction Concrete RTN and Control Sequence
STEP RTN Control Sequence
T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain
T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ LAR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ LD Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain
T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- M[MA] ; MDrd,Read,Wait on Done T7 R[ra] <- MD ; MDout,Gra,Rin,End _______________________________________________________________________ LDR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- M[MA] ; MDrd,Read,Wait on Done T7 R[ra] <- MD ; MDout,Gra,Rin,End _______________________________________________________________________ NEG Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- 0 - R[rc] ; Grc,Rout,NEG,Cin T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ NOP Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 ; End _______________________________________________________________________ NOT Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- ┌(R[rc]) ; Grc,Rout,NOT,Cin T4 R[ra] <- C ; Cout,Gra,Rin,End Note: You must use R[rc] in T3. _______________________________________________________________________ OR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A V R[rc] ; Grc,Rout,OR,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ORI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A V c2{sign-extended} ; C2out,OR,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ SHC Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- R[rb]<31-n..0>#R[rb]<31..32-n> ; Grb,Rout,SHC,Cin
T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHL Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- R[rb]<31-n..0>#(n@0) ; Grb,Rout,SHRL,Cin
T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- (n@0)#R[rb]<31..n> ; Grb,Rout,SHR,Cin
T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHRA Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- (n@R[rb]<31>)#R[rb]<31..n> ; Grb,Rout,SHRA,Cin
T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ ST Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain
T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- R[ra] ; Gra,Rout,MDbus T7 M[MA] <- MD ; MDwr,Write,Wait on Done,End _______________________________________________________________________ STOP Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 ; Wait Forever _______________________________________________________________________ STR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- R[ra] ; Gra,Rout,MDbus T7 M[MA] <- MD ; MDwr,Write,Wait on Done,End _______________________________________________________________________ SUB Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A - R[rc] ; Grc,Rout,SUB,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: YOU CANNOT SWAP R[rb] AND R[rc]! THE ALU SUBTRACTS THE B INPUT FROM THE A INPUT! _______________________________________________________________________ REALLY SIMPLE RISC COMPUTER VHDL A full VHDL implementation of the RSRC is available from the instructor. SIMULATION The figure below shows a simulation of a VHDL implementation of the RSRC based on the datapath shown above. The RSRC was included in a “testbench” along with a non-volatile ROM model and a static RAM model. The non-volatile ROM held machine code that implemented a difference engine that computed the values of an example polynomial. Register “reg1” in the simulation shows the functional values of f(x) as the difference engine values are calculated starting with the value for f(x=0).