CS计算机代考程序代写 Java computer architecture assembly assembler The Really Simple RISC Computer (RSRC) V0.811

The Really Simple RISC Computer (RSRC) V0.811
Copyright William D. Richard, Ph.D.

Updated November 17, 2020

INTRODUCTION

The intellectual development of the Really Simple RISC Computer (RSRC) began on the first day of CSE

260M (Introduction to Digital Logic and Computer Design) with the introduction of Boolean algebra.

Boolean algebra was used to develop the concepts associated with logic gates and circuits, including

optimal covering of logic functions using both two-level Sum-of-Product (SOP) and Product-of-Sum (POS)

circuits and covering using the LUT-based architectures found in modern FPGAs. During the first section

of the course, we investigated addition and subtraction and built a simple Arithmetic Logic Unit (ALU)

that performed these operations as well as bit-wise AND and OR operations on 32-bit words.

Laboratory 1, completed during the first section of the course, focused primarily on the Xilinx Vivado

tool flow. A simple combinational circuit was implemented using switch outputs as inputs and LEDs as

outputs. You were required to constrain the pins on your design and generate a bit file that could be

downloaded to a Xilnix Artix-7 FPGA on a development board for testing.

In the second section of the course, we developed the concept of a rising edge-triggered D-type Flip-

Flop (FF) from first principles starting with the simple SR latch. Flip-flops begat registers, and it was only

a small stretch to the concept of a pipelined difference engine based on registers and adders. The block

diagram of a difference engine capable of computing any fourth order polynomial is shown below in

Figure 1.

Figure 1. A difference engine capable of computing any fourth order polynomial F(x), x = 0, 1, 2, … , once

initialized with the proper difference coefficients.

Register Transfer Notation (RTN) was introduced to describe the behavior of the pipelined difference

engine. For the circuit shown above, the RTN would be:

F  F + G : G  G +H : H  H + I : I  I + J ;

In this RTN description, a colon is used to separate register transfers that happen at the same time, and

here four computations are done simultaneously. The semicolon is used to identify clock period

32

CLK

F(x)
I J + H + G + F +

boundaries. From this RTN description, you developed a VHDL model of the pipelined difference engine

and simulated it extensively in homework 4.

The concepts associated with Finite State Machines (FSMs) were introduced next and used to construct

sequence recognizers and similar “traditional” academic sequential machines in state diagram form. The

recommended method for describing FSMs in synthesizable form so they are properly identified as FSMs

by modern synthesis tools was covered in excruciating detail (only FSMs that are identified as FSMs by

the tools can be optimized via selection of the state encoding technique). VHDL was then used to

implement a sequence recognizer presented as an Internet packet sniffer searching for questionable

strings like “bomb.” The concept of state equivalence was introduced, and a tabular approach to state

minimization was developed from basic theorems on sequential machines.

Every digital system can be and should be broken into two portions: Datapath and control. The

separation of control from datapath functionality has long been recognized as the best approach to

digital system design, and the concept of “datapath refinement” is almost universally applied to simplify

the control portion of the design. This concept was introduced and discussed in class.

In “The Difference Engine,” the datapath for a bus-based difference engine was presented as an

alternative to the pipelined difference engine of homework 4. A block diagram of the datapath is shown

below in Figure 2. The datapath of the bus-based difference engine consists of a set of registers, labeled

F, G, H, I, and J, two additional registers (A and C), and an adder connected by a single bus as shown in

Figure 2. The “control” portion of the difference engine was presented as a simple 12-state FSM that

controls the register transfers needed to compute the functional values of the polynomial described by

the difference engine coefficients. The register transfers coordinated by the 12-state FSM are:

1) A  F ;

2) C  A + G ;

3) F  C ;

4) A  G ;

5) C  A + H ;

6) G  C ;

7) A  H ;

8) C  A + I ;

9) H  C ;

10) A  I ;

11) C  A + J ;

12) I  C ;

You were given a VHDL model of the datapath and the VHDL for the 12-state FSM required to

implement the bus-based difference engine. This VHDL implementation of the bus-based difference

engine was discussed extensively in class.

Figure 2. The datapath for the bus-based difference engine. Register transfers across the 32-bit bus are

controlled by a 12-state Mealy-model FSM.

The pipelined difference engine required four different adders and could compute polynomial

coefficients at least 12 times faster than the bus-based difference engine, producing one polynomial

value every clock. Later, we will come to realize that the bus-based difference engine is to the RSRC as

the RSRC described below is to a fully pipelined version of the same processor!

F
f_in

f_out

G
g_in

g_out

H
h_in

h_out

I
i_in

i_out

J
j_in

j_out

A a_in

ADDER

C c_in

c_out

32

THE REALLY SIMPLE RISC COMPUTER (RSRC)

The Really Simple RISC Computer (RSRC) is a stored-program computer based on a simple extension of

the bus-based difference engine. By adding a Program Counter (PC), an Instruction Register (IR), and

Memory Address (MA) and Memory Data (MD) registers to the bus-based difference engine, we can

construct the basic stored-program computer shown below in Figure 3. Here, the program counter holds

the address of the next instruction in memory to be executed, and the instruction register holds the

instruction after it has been fetched from memory during execution of the instruction. The MA register

holds the address of the instruction as it is being fetched from memory, and the “input” MD register

holds the instruction fetched from memory prior to the instruction being moved to the instruction

register. The “Input” MD register also holds data as it is being loaded from memory during execution of

the load instruction (discussed later), and the “output” MD register holds data that is being stored to

memory during the execution of the store instruction (discussed later).

In addition to adding PC, IR, MA, and MD registers, the RSRC expands the set of registers available from

five to thirty-two and uses an Arithmetic Logic Unit (ALU) instead of a simple adder. The ALU is capable

of performing add, subtract, AND, OR, increment by four, and shift operations as described in detail

later.

The stored-program concept used with the RSRC that is so ubiquitous today was not so obvious in the

early days of computing, and many well-written histories are available to the budding young computer

architect. Since history, as they say, tends to repeat itself, learning about the history of computing is

important to anyone seriously considering a career in computer engineering.

The concept of the “fetch-execute” cycle is one of the first things introduced in most books on computer

architecture because it is one of the most fundamental notions on which modern CPUs are based. Here

in the RSRC, there are three steps involved in the “fetch” portion of the fetch-execute cycle, and each

step takes a single clock cycle:

1) Move the value in PC to MA while simultaneously incrementing the PC value by four and storing

it in the C register;

2) Read the next instruction from memory into the “input” MD register while simultaneously

moving the incremented PC value back to the PC;

3) Move the instruction fetched from memory to the IR.

The register transfers associated with each of these three steps are:

1) MA  PC : C  PC +4 ;

2) MD  M[MA] : PC  C ;

3) IR  MD ;

Here, we have introduced a new functionality to our RTN notation: M[MA] implies the contents of

memory at address MA, much like we notate the element of an array in many programming languages.

The control signals produced by the FSM to cause these register transfers to happen are:

1) PCout, MAin, INC4, Cin

2) Cout, PCin, READ, MDrd

3) MDout, IRin

Figure 3. The RSRC is a simple extension of the bus-based difference engine created by adding PC, IR,

MA, and MD registers and extending the register set from five to 32 registers.

C1_out
C2_out

m

32

pc_in

md_wr

md_out

md_rd

ma_in

r0_in

r1_in

r31_in

IR
ir_in

R0

r0_out

R1

r1_out

R31

r31_out

A a_in

A ALU B

(+ – AND OR SHIFT INC4)

C c_in

c_out

32

MA

md_bus
MD

MD

FSM

32

32

A<31..0> TO MEMORY

D<31..0> TO/FROM MEMORY

READ.H

WRITE.H

DONE.H

PC

pc_out
IR<31..0>

REG FILE

The PC value is incremented not by one but by four because RSRC instructions are four bytes wide, and,

almost universally (at least in the Personal Computer (PC) world), memory is considered to be byte-

addressable. So, RSRC instructions are four bytes or 32-bits wide, and common terminology is to say the

RSRC “word size” is 32-bits. These details will be discussed in considerable depth later, but for now it is

enough to know that instructions start in memory every four byte addresses, that is, at addresses 0, 4, 8,

12, etc.

While every instruction fetch takes three clock ticks (we computer engineers call clock cycles “ticks” in

reference to how an old mechanical clock would go tick, tick, tick, counting off the seconds of the day),

the instruction execution phase takes a variable number of clock ticks. An ADD instruction, for example,

takes three clock ticks to execute:

1) Move one operand from the source register in the register file to the A register;

2) Place the second operand on the bus and add the two operands, saving the sum in the C

register;

3) Move the sum back to the target register in the register file.

While the ADD instruction takes three clock ticks to execute, some instructions like load and store take

more. The NOP instruction, that does nothing, takes only one clock tick to execute. But, no matter how

many clock ticks it takes to execute an instruction, after instruction execution is complete, the FSM

returns to the first state of the instruction fetch, and the fetch-execute cycle repeats.

Every clock tick of the instruction fetch and execute phases is associated with one state of the

controlling FSM, and that FSM has eight states. So, while the bus-based difference engine had a 12-state

FSM and was only capable of doing a single, predefined calculation, the RSRC FSM has only eight states

and is capable of performing a large number of different calculations using the stored-program concept!

RSRC (AND SRC) INSTRUCTION FORMAT

RSRC instructions are binary-compatible with the instructions of the Simple RISC Computer (SRC) used to

study computer architecture in CSE 362M. The RSRC is to the SRC as the Intel Core i7 is to the Intel Core

i3: Central Processing Units (CPUs) that have a different architectures but that execute the same

machine code are said to be binary-compatible. And, yes, the RSRC is actually faster (in terms of number

of clock ticks to execute instructions) than the SRC, although one could argue that the SRC is more

elegant and (probably) uses fewer resources on a custom IC or inside a commercial FPGA.

The instruction formats for the RSRC and SRC are shown below in Table 1. For the moment, the SRC

assembly guide can be used when writing RSRC assembly programs.

TABLE 1. RSRC and SRC instruction formats.

The “abstract” RTN for the RSRC and SRC instructions is given below in Table 2. “Abstract” RTN defines

very specifically what an instruction does, but it does not include the clock-tick-by-clock-tick details of

how the instruction is executed.

The abstract RTN below defines the displacement-based address, disp, and the relative address, rel, are

calculated. Displacement-based and relative addresses are used with the load, store, load relative, and

store relative instructions as explained in further detail below.

The abstract RTN also defines cond, the FSM input that is equal to logic 1 when a conditional branch

instruction branch condition has been met, and n, the shift count used by the shift instructions. Both

cond and n are discussed further below in the context of datapath refinement.

31 27 26 0

(8) NOP, STOP

31 27 26 22 21 17 16 12 11 0

ra rb rc UNUSED (6) ADD, SUB, AND, OR

31 27 26 22 21 17 16 0

ra rb c2 (1) ADDI, ANDI, ORI, LD, ST, LA

31 27 26 22 21 17 16 12 11 0

(3) NEG, NOT

31 27 26 22 21 17 16 12 11 3 2 0

(4) BR

31 27 26 22 21 17 16 12 11 3 2 0

(5) BRL

31 27 26 22 21 17 16 12 11 5 4 0

(7a) SHR, SHRA, SHL, SHC

31 27 26 22 21 17 16 12 11 5 4 0

(7b) SHR, SHRA, SHL, SHC

OP

OP

OP UNUSED

OP ra UNUSED rc UNUSED

OP UNUSED rb rc UNUSED

OP rb rc UNUSED ra
COND

OP rb UNUSED ra COUNT

OP rb rc UNUSED ra 00000

31 27 26 22 21 0

OP ra c1

COND

(2) LDR, STR, LAR

nop(:=op=0) → :

disp<31..0> := ((rb = 0) → c2<16..0>{sign extend} :

(rb ≠ 0) → R[rb] + c2<16..0>{sign extend, 2’s comp.}):

rel<31..0> := PC<31..0> + c1<21..0> {sign extend, 2’s comp.}:

ld (:= op= 1) → R[ra] ←M[disp] :

ldr (:= op= 2) → R[ra] ←M[rel] :

st (:= op= 3) → M[disp] ← R[ra] :

str (:= op= 4) → M[rel] ← R[ra] :

la (:= op= 5 ) → R[ra] ← disp :

lar (:= op= 6) → R[ra] ← rel :

cond := (c3<2..0> = 0 → 0 :

c3<2..0> = 1 → 1 :

c3<2..0> = 2 → R[rc] = 0 :

c3<2..0> = 3 → R[rc] ≠ 0 :

c3<2..0> = 4 → R[rc]<31> = 0 :

c3<2..0> = 5 → R[rc]<31> = 1) :

br (:= op= 8) → (cond → PC ← R[rb]) :

brl (:= op= 9) → (R[ra] ← PC : cond →(PC ← R[rb])) :

add (:= op=12) → R[ra] ← R[rb] + R[rc] :

addi (:= op=13) → R[ra] ← R[rb] + c2<16..0>{2’s comp. sign ext.} :

sub (:= op=14) → R[ra] ← R[rb] – R[rc] :

neg (:= op=15) → R[ra] ← -R[rc] :

and (:= op=20) → R[ra] ← R[rb] ∧ R[rc] :
andi (:= op=21) → R[ra] ← R[rb] ∧ c2<16..0>{sign extend} :
or (:= op=22) → R[ra] ← R[rb] ∨ R[rc] :
ori (:= op=23) → R[ra] ← R[rb] ∨ c2<16..0>{sign extend} :
not (:= op=24) → R[ra] ← ¬R[rc] :

n := ((c3<4..0> ==0) → R[rc]<4..0> :

(c3<4..0> ≠ 0) → c3<4..0>) :

shr (:= op=26) → R[ra]<31..0> ← (n @ 0) # R[rb]<31..n> :

shra (:= op=27) → R[ra]<31..0> ← (n @ R[rb]<31>) # R[rb]<31..n> :

shl (:= op=28) → R[ra]<31..0> ← R[rb]<31-n..0> # (n @ 0) :

shc (:= op=29) → R[ra]<31..0> ← R[rb]<31-n..0> # R[rb]<31..32-n> :

Table 2. RSRC and SRC abstract RTN.

WRITING RSRC ASSEMBLY LANGUAGE PROGRAMS

Writing RSRC (and SRC) assembly language programs is different than writing programs in C, or C++, or

Java. For example, whereas one does not worry about the address at which a C program will be loaded

in memory prior to execution, this address must be specified when writing an assembly language

program so the assembler, which “assembles” assembly language code into machine code (the 0s and 1s

of Table 1), knows how to resolve addresses implied by your code. The starting address of an assembly

program is specified using the “.ORG” assembler directive. Let’s take a look at an example RSRC

assembly program and see how the “.ORG” and some of the other features of the assembler are used.

.org 0 ; Program starts at address 0

la r31,TOP ; r31 holds the loop address

la r30,MYD ; r30 points to the output data array

la r1,1 ; Initialize difference engine coefficients

la r2,-1

la r3,0

la r4,1

la r5,1

TOP: st r1,0(r30) ; Store a value of F(x)

addi r30,r30,4 ; Point to next element of the output array

add r1,r1,r2 ; Update the difference engine values

add r2,r2,r3

add r3,r3,r4

add r4,r4,r5

br r31 ; Branch unconditionally to TOP

stop

.org 4096 ; The output array is located at address 4096

MYD: .dw 16

Figure 4. Example RSRC assembly language program that computes the values of an example

polynomial.

The program of Figure 4 computes the difference engine values for a polynomial f(x) for x = 0, 1, 2, … ,

and it never stops writing results to memory. Eventually, this would cause the program to overwrite

itself, but for now let us ignore this detail (you will fix this in a homework problem) and focus on how to

write RSRC assembly language programs.

In almost every case, RSRC assembly language programs are specified to be loaded into memory starting

at address 0 using a “.ORG 0” statement at the beginning of the program. The “.ORG” statement is an

assembler directive, and it therefore does not result in the generation of any machine language code. A

second “.ORG” directive is used in the example of Figure 4 to allocate 16 (32-bit) words of memory for a

table of output values starting at address 409610.

In CSE 362M, Computer Architecture, you will learn that multiple programs running in a virtual memory

system, like Windows 10, all were assembled to run from address 0. This is done via use of a Memory

Management Unit (MMU), and it vastly simplifies the steps required to compile and assemble a program

(and threads) to actually execute on a modern Operating System (OS). But, we’ll have to save these

interesting details for CSE 362M.

Another important feature of assembly language is the concept of a label. In Figure 4, both “TOP” and

“MYD” are labels. Labels are associated with memory addresses in assembly language, and the

assembler replaces “TOP” and “MYD” in lines 2 and 3 of the assembly language program with the values

2810 and 409610 during the second pass through the code (they are not known on the initial pass through

the code, which is why assemblers are always “two-pass” assemblers). Do you see why “TOP” is

associated with the value 2810? It’s because the store instruction “ST r1,0(r30)” will be loaded into

memory at address 2810!

Using the SRC simulator used in CSE 362M, one can simulate the execution of the RSRC assembly

language program of Figure 4. When a program has an infinite loop like the one being discussed here, it

is best to “step” through the program one instruction at a time using the “Step” button on the Graphical

User Interface (GUI). The register values and the contents of memory at address 4096 are shown below

in Figure 5 right after the store instruction executes that writes the 16th output value to the memory

array.

The program of Figure 4 uses five different RSRC assembly instructions:

1) LA, which here loads an immediate value into the register pointed to by ra

2) ADDI, which adds an immediate value to the register pointed to by rb and stores the result in

the register pointed to by ra

3) ADD, which adds the values in the registers pointed to by rb and rc and stores the result in

the register pointed to by ra

4) ST, which stores the value in the register pointed to by ra to the memory location pointed to

by rb

5) BR, which unconditionally branches to the address in the register pointed to by rc

It is important to keep in mind while writing assembly language programs, for any processor, that

assembly language instructions are specifying one or more register transfers as part of the execution

phase of the fetch-execute cycle. The ADD instruction specifies three register transfers as discussed

above, and the BR instruction specifies a register transfer from a register in the register file to the PC!

Figure 5. The status of registers and memory after the store instruction writes the 16th value to the

memory array.

The unconditional branch instruction used in the assembly language program of Figure 4 is the simplest

of the many different branch instructions supported by the RSRC. But, infinite loops are usually not very

useful, so one needs a way to branch conditionally. In total, the RSRC supports six different branch

instructions, including four conditional branch instructions (not counting the branch with linkage

instructions that will not be discussed at this point):

br rb ; Branch unconditionally to the address in the register pointed to by rb

brnv ; Branch never

brzr rb,rc ; Branch to the address in the register pointed to by rb if the register

; pointed to by rc is zero

brnz rb,rc ; Branch to the address in the register pointed to by rb if the register

; pointed to by rc is non-zero

brpl rb,rc ; Branch to the address in the register pointed to by rb if the register

; pointed to by rc is positive or zero

brmi rb,rc ; Branch to the address in the register pointed to by rb if the register

; pointed to by rc is negative

The conditional branch instructions use a 3-bit field (bits 2-0) to specify the branch condition. This 3-bit

field is used by the control block during execution so that the PC is overwritten only if the condition

being tested is true.

As an exercise, you will be asked to modify the program of Figure 4 so that it only writes 16 values to the

memory array. You will need to use a conditional branch to do this along with a counter or index that

you decrement (for example) each time you write a value to memory. Once this exercise is completed,

you will have gone a long way toward learning RSRC assembly language!

You’ve probably figured out by now that the RSRC doesn’t execute ASCII characters but something we

computer engineers call “machine code.” The assembler “assembles” your ASCII text file into the

sequence of 0s and 1s (in the form of 32-bit words) that are loaded into memory at run time. Each

instruction has a unique Operation Code, or OP Code as we computer engineers say, and this is used to

populate the first five (5) bits of each machine instruction during assembly. The ra, rb, and rc fields are

extracted directly from the assembly code using the known format of each instruction. Conceptually, it’s

really not that hard to understand how an assembler generates machine code from your assembly

language programs.

One thing not covered in our discussion so far is how the various constant fields in the instruction

formats of Table 1 are used. In most cases, the constant specified is sign-extended prior to use, i.e., the

upper bit of the constant field is copied repeatedly to form a 32-bit constant. As an example, if you

coded:

ADDI r1,r2,-1

the ra field would be populated with 00001, the rb field with 00010, and the 17-bit c2 field with:

11111111111111111,

or 17 1s. During execution the 17-bit value is extended to a full 32-bit value (32 1s):

11111111111111111111111111111111.

The c2 constant field is sign-extended in exactly the same way during execution of the LD, ST, and LA

instructions to produce the 32-bit value needed.

The shift instructions use a 5-bit field (bits 4-0) to specify the number of positions the source register is

to be shifted. If the 5-bit field is non-zero, this is interpreted as the shift count. If the 5-bit field is zero,

i.e., 000002, this is used to specify that the actual shift count is in the register pointed to by the rc field in

the instruction. The assembler knows how to “assemble” the correct bits from the syntax of the

instruction being assembled:

shr r1,r2,r3 ; This implies shift r2 right by the count in r3 and store the result in r1

shr r1,r2,3 ; This implies shift r2 right by 3 bit positions and store the result in r1

The 5-bit count field is not sign-extended but instead treated as an unsigned value.

REFINING THE DATAPATH

The FSM shown in Figure 3, while theoretically possible to design, would be almost impossible to design

in practice. There are 64 outputs, for example, just to control the behavior of the register file. And, the

entire machine instruction has to be used as input. By refining the data path, the complexity of the FSM

can be reduced to something far more manageable. In the case of the RSRC and the SRC, a simple 8-

state Mealy-model machine is all that is needed when appropriate refinements are made to the

datapath.

One obvious refinement is shown below in Figure 6. Here, a decoder is used along with signals Gra, Grb,

Grc, Rin, Rout, and BAout to reduce the number of control signals needed to control the register file.

These six control signals from the FSM replace the 64 control signals shown in Figure 3, a reduction of 58

control signals!

To output the contents of a register to the bus, the FSM asserts the appropriate grant signal (Gra, Grb,

or Grc) along with Rout.

To enable a value on the bus to be written into the register file, the FSM asserts the appropriate grant

signal (Gra, Grb, or Grc) along with Rin. It is understood that, at this point, only ra specifies the target

register, but this topology is general in that even the register pointed to by the rb or rc fields can be

written to in the future if needed.

BAout is used during the computation of a displacement-based address. BAout functions exactly like

Rout except that during the calculation of a displacement-based address, if the base register specified is

R[0], zero is forced onto the bus. As a reminder, the abstract RTN for the load and store instructions is:

ld(:= op=1) → R[ra]  M[disp] :
la(:= op=2) → R[ra]  disp :
st(:= op=3) → M[disp]  R[ra] :

where disp<31:0> := ((rb=0) → c2<16:0>{sign extended} : (rb=/=0) → R[rb] + c2<16:0>{sign extended}) :

Figure 6. By adding a decoder to the register file, we can greatly simplify the FSM.

Without adding additional logic to the datapath, branches would be extremely difficult to implement in

the control FSM. The FSM would have to look at the branch condition specified by IR(2:0) and specify

the appropriate comparisons, for example, to test the register pointed to by the rc field. By adding the

logic shown below in Figure 7, the FSM is required to look at a single input bit, CON, to decide if a branch

should be taken!

Figure 7. The branch logic looks at IR<2..0> and R[rc]. R[rc] and IR<2..0> must be stable during the state

in which the FSM uses CON to determine both the next state transition and the FSM outputs.

This process of datapath refinement typically proceeds, during the design of any digital system, in the

manner described here, until the designer feels no further useful simplification can be made. Sometimes

during the process, one refinement leads to the need to modify a previous refinement, and that has

happened here: The logic introduced to simplify branches has resulted in the need to make R[rc]

available to the branch logic. This results in the need to add a multiplexor to the register file, controlled

by the rc field, that always outputs R[rc]. This modification to the register file is shown below in Figure

6A.

Figure 6A. The register file of Figure 6, as a result of the progression of datapath refinement, needs to be

modified so that R[rc] is always available to the branch logic of Figure 7 and the shift logic of Figure 9

discussed below.

Another datapath refinement that simplifies the overall design of the FSM is shown below in Figure 8.

Here, logic has been added so that the constants c1 and c2 can be placed on the bus by the FSM via the

assertion of control signals C1out and C2out, respectively. Careful analysis of the logic in Figure 8 reveals

that the constants c1 and c2 are output in sign-extended form! The upper five bits of IR (the OP Code)

are shown connecting to the FSM.

Figure 8. The instruction register refinement has logic that allows the c1 and c2 constants to be placed

on the bus via controls signals C1out and C2out, respectively.

The ALU in Figure 3 has to be designed so that it can shift the B input right, left, circularly, or right

arithmetically by 0 to 31 bit positions in order to support the RSRC shift instructions. If only a constant

shift could be specified, via the bottom 5 bits in the instruction, these five bits could be connected

directly to the ALU control inputs. But, the RSRC/SRC supports both constant shifts and shifts specified in

the R[rc] register: R[rc]<4..0> is used as the shift count when the bottom five bits of the instruction are

zero. The logic shown below in Figure 9 produces the correct 5-bit shift value n that is used by the ALU.

Figure 9. The shift logic used to produce the correct shift value for shift instructions. This logic again uses

the R[rc] value output by the modified register file.

When we finish with the process of datapath refinement, during which we would have been

documenting our design using RTN, the next step is implementation. Before beginning the

implementation step, however, it is important to carefully review the RTN for the design and document

fully what is needed for each component. The ALU can serve as an example. In this design, our ALU will

end up having the following control inputs.

add

sub

and

c=b

inc4

neg

shr

shl

shc

shra

or

not

n

Each control input comes from the FSM, and n comes from the shift logic shown in Figure 9.

While we have simplified the FSM via datapath refinement considerably, we are still left with 33 control

signals that must be generated by the FSM! The block diagram of the final design is shown below in

Figure 10.

Figure 10. The RSRC architecture after datapath refinement to support a simple control FSM.

CON

add
sub
and
c=b
inc4
neg
shr
shl
shc
shra
or
not

5

12

5

IR<4..0>

15

C1_out
C2_out

5

pc_in

md_wr

md_out

md_rd

ma_in

ra
rb
rc

r_in
r_out
ba_out
gra
grb
grc

R[rc]

IR
ir_in

Register

File

A a_in

A ALU B

(+ – AND OR SHIFT INC4)

C c_in

c_out

32

MA

md_bus
MD

MD

FSM

32

32

A<31:0> TO MEMORY

D<31:0> TO/FROM MEMORY

READ.H

WRITE.H

DONE.H

PC

pc_out IR<31..27> = OP CODE

SHIFT/BRANCH

LOGIC n

n

DISPLACEMENT-BASED ADDRESSING

The RSRC/SRC load and store instructions use DISPLACEMENT-BASED ADDRESSING. The abstract RTN for

the load and store instructions is:

ld (:= op= 1) → R[ra] ←M[disp] :

st (:= op= 3) → M[disp] ← R[ra] :

where

disp<31..0> := ((rb = 0) → c2<16..0>{sign extend} :

(rb ≠ 0) → R[rb] + c2<16..0>{sign extend, 2’s comp.}):

The la instruction, not shown above, is like ld except memory is not accessed and the address is placed

in R[ra].

There are several valid ways to code la, ld, and st:

While they LOOK different, the results are the same (almost):

DISPLACEMENT-BASED ADDRESSING is common in RISC CPUs, not just something made up for the

purpose of the RSRC/SRC as an academic exercise (this is from Wikipedia):

We should be careful with notation (this is from Heuring and Jordan):

REALY SIMPLE RISC COMPUTER CONCRETE RTN AND CONTROL SEQUENCES

_______________________________________________________________________

Instruction FETCH Concrete RTN and Control Sequence

STEP RTN Control Sequence

T0 MA <- PC : C <- PC + 4 ; PCout,MAin,INC4,Cin T1 MD <- M[MA] : PC <- C ; Cout,PCin,MDrd,Read,Wait on Done T2 IR <- MD ; MDout,IRin _______________________________________________________________________ ADD Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A + R[rc] ; Grc,Rout,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ADDI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ AND Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A ^ R[rc] ; Grc,Rout,AND,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ANDI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A ^ c2{sign-extended} ; C2out,AND,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ BR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 CON -> PC <- R[rb] ; Grb,Rout,CON -> PCin,End

_______________________________________________________________________

BRL Instruction Concrete RTN and Control Sequence

STEP RTN Control Sequence

T3 R[ra] <- PC ; Gra,Rin,PCout T4 CON -> PC <- R[rb] ; Grb,Rout,CON -> PCin,End

Note: While rc and ra are different instruction _FIELDS_, they can refer to

the same register.

For example, brlzr r1,r2,r1

would specify a branch if rc=r1 is zero _AND_ storing the PC in ra=r1.

With this implementation, we have to stipulate that the ra and rc fields

cannot be equal for the BRL instruction. A fine point, but also one that

can cause a great deal of pain during system debug.

_______________________________________________________________________

LA Instruction Concrete RTN and Control Sequence

STEP RTN Control Sequence

T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain

T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ LAR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ LD Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain

T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- M[MA] ; MDrd,Read,Wait on Done T7 R[ra] <- MD ; MDout,Gra,Rin,End _______________________________________________________________________ LDR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- M[MA] ; MDrd,Read,Wait on Done T7 R[ra] <- MD ; MDout,Gra,Rin,End _______________________________________________________________________ NEG Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- 0 - R[rc] ; Grc,Rout,NEG,Cin T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ NOP Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 ; End _______________________________________________________________________ NOT Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- ┌(R[rc]) ; Grc,Rout,NOT,Cin T4 R[ra] <- C ; Cout,Gra,Rin,End Note: You must use R[rc] in T3. _______________________________________________________________________ OR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A V R[rc] ; Grc,Rout,OR,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: It would be ok to swap R[rb] and R[rc], that is, the results would be the same; Grb and Grc would need to swap, too. _______________________________________________________________________ ORI Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A V c2{sign-extended} ; C2out,OR,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: T3 must use R[rb] since this is the instruction definition. _______________________________________________________________________ SHC Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- R[rb]<31-n..0>#R[rb]<31..32-n> ; Grb,Rout,SHC,Cin

T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHL Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- R[rb]<31-n..0>#(n@0) ; Grb,Rout,SHRL,Cin

T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- (n@0)#R[rb]<31..n> ; Grb,Rout,SHR,Cin

T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ SHRA Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 C <- (n@R[rb]<31>)#R[rb]<31..n> ; Grb,Rout,SHRA,Cin

T4 R[ra] <- C ; Cout,Gra,Rin,End _______________________________________________________________________ ST Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- ((rb=0) -> 0 : (rb /=0) -> R[rb]) ; Grb,BAout,Ain

T4 C <- A + c2{sign-extended} ; C2out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- R[ra] ; Gra,Rout,MDbus T7 M[MA] <- MD ; MDwr,Write,Wait on Done,End _______________________________________________________________________ STOP Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 ; Wait Forever _______________________________________________________________________ STR Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- PC ; PCout,Ain T4 C <- A + c1{sign-extended} ; C1out,ADD,Cin T5 MA <- C ; Cout,MAin T6 MD <- R[ra] ; Gra,Rout,MDbus T7 M[MA] <- MD ; MDwr,Write,Wait on Done,End _______________________________________________________________________ SUB Instruction Concrete RTN and Control Sequence STEP RTN Control Sequence T3 A <- R[rb] ; Grb,Rout,Ain T4 C <- A - R[rc] ; Grc,Rout,SUB,Cin T5 R[ra] <- C ; Cout,Gra,Rin,End Note: YOU CANNOT SWAP R[rb] AND R[rc]! THE ALU SUBTRACTS THE B INPUT FROM THE A INPUT! _______________________________________________________________________ REALLY SIMPLE RISC COMPUTER VHDL A full VHDL implementation of the RSRC is available from the instructor. SIMULATION The figure below shows a simulation of a VHDL implementation of the RSRC based on the datapath shown above. The RSRC was included in a “testbench” along with a non-volatile ROM model and a static RAM model. The non-volatile ROM held machine code that implemented a difference engine that computed the values of an example polynomial. Register “reg1” in the simulation shows the functional values of f(x) as the difference engine values are calculated starting with the value for f(x=0).