代写代考 Project 2.2 – Computer Architecture I – ShanghaiTech University

Project 2.2 – Computer Architecture I – ShanghaiTech University

Copyright By PowCoder代写 加微信 powcoder

Project 2.2: CPU

Computer Architecture I ShanghaiTech University

Project 2.1 Project 2.2

IMPORTANT INFO – PLEASE READ

The projects are part of your design project worth 2 credit points. As such they run in parallel to the actual course. So be aware that the due date for project and homework might be very close to each other! Start early and do not procrastinate.

In this project you will be using logisim-evolution to implement a 32-bit two-cycle processor based on RISC-V. Once you’ve completed this
project, you’ll know essentially everything you need to build a computer from scratch with nothing but transistors.

You are allowed to use any of Logisim’s built-in blocks for all parts of this project.
Save frequently and commit frequently! Try to save your code in Logisim every 5 minutes or so, and commit every time you produce a new feature, even if it is small.
Tests for a completed CPU have been included with the lab starter code. You can run them with the command ./cpu-sanity.sh. See the Testing section for information on how to interpret your test output.
Don’t move around the given inputs and outputs in your circuit; this could lead to issues with the autograder.
Because the files you are working on are actually XML files, they’re quite difficult to merge properly. Do not work on the project in two places and attempt to merge your changes! If you are working separately from your partner, make sure that only one person is working on the project at any given time. We highly recommend pair programming on this project; understanding the nuts and bolts should help you experience your magical software-meets-hardware-I-actually-know-how-a-computer-works-now moment (and will prepare you to tackle datapath problems on exams).

Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

0) Obtaining the Files

Clone your p2.2 repository from gitlab.
In the repository add a remote repo that contains the framework files:
git remote add framework https://autolab.sist.shanghaitech.edu.cn/gitlab/cs110_22s/p2.2_framework.git
Go and fetch the files:
git fetch framework
Now merge those files with your master branch:
git merge framework/master
The rest of the git commands work as usual.

RISC-V Green Card

RISC-V ISA Manual

What do the different wire colors mean?

Some common sources of Logisim errors, for your debugging convenience:

1) Processor

We have provided a skeleton for your processor in cpu.circ. You will be using your own implementations of the ALU and RegFile as you
construct your datapath. If you find somewhere need to change or find any errors in either of these components, you should definitely fix them. You are responsible for
constructing the entire datapath and control from scratch. Your completed processor should implement the ISA detailed below in the section
Instruction Set Architecture (ISA) using a two-cycle pipeline, with IF in the first stage and ID, EX, MEM, and WB in the second stage.

Your processor will get its program from the processor harness run.circ. Your processor will output the address of an instruction,
and accept the instruction at that address as an input. Inspect run.circ to see exactly what’s going on. (This same harness will
be used to test your final submission, so make sure your CPU fits in the harness before submitting your work!) Your processor has 2 inputs
that come from the harness:

Input Name Bit Width Description
INSTRUCTION 32 Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below).
CLOCK 1 The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

Your processor must provide the following outputs to the harness:

Output Name Bit Width Description
ra 32 Driven with the contents of ra. (FOR TESTING)
sp 32 Driven with the contents of sp. (FOR TESTING)
t0 32 Driven with the contents of t0. (FOR TESTING)
t1 32 Driven with the contents of t1. (FOR TESTING)
t2 32 Driven with the contents of t2. (FOR TESTING)
s0 32 Driven with the contents of s0. (FOR TESTING)
s1 32 Driven with the contents of s1. (FOR TESTING)
a0 32 Driven with the contents of a0. (FOR TESTING)
fetch_addr 32 This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

Just like with the ALU and RegFile, be careful NOT to move the input or output pins!

1.5) Memory

The memory unit should be implemented by you! Here’s a quick summary of its inputs and outputs:

Output Name In- or Out-put? Bit Width Description
A: ADDR In 32 Address to read/write to in Memory
D: WRITE DATA In 32 Value to be written to Memory
En: WRITE ENABLE In 1 Equal to one on any instructions that write to memory, and zero otherwise
Clock In 1 Driven by the clock input to cpu.circ
D: READ DATA Out 32 Driven by the data stored at the specified address.

Note that the memory is word-addressable, meaning given an address, it will return 4-byte data.

Note: Do not change the property of given RAM in mem.circ.

2) The Instruction Set Architecture

Your CPU will support the instructions listed below. Most of the instructions should behave the same as the RISC-V you learned in class.
Notice that this is not all of the instructions on the green card! These are the only instructions you will be tested on implementing,
but you will not be punished for implementing other instructions (the autograder will cover ONLY the instructions below). If anything surprises you,
it is likely that we made a mistake. Please make a Piazza post about it.

Instruction Formats

The Instructions

Instruction Type Opcode Funct3 Funct7/IMM Operation
add rd, rs1, rs2 R 0x33 0x0 0x00 R[rd] ← R[rs1] + R[rs2]
mul rd, rs1, rs2 0x0 0x01 R[rd] ← (R[rs1] * R[rs2])[31:0]
sub rd, rs1, rs2 0x0 0x20 R[rd] ← R[rs1] – R[rs2]
sll rd, rs1, rs2 0x1 0x00 R[rd] ← R[rs1] << R[rs2] mulh rd, rs1, rs2 0x1 0x01 R[rd] ← (R[rs1] * R [rs2])[63:32] mulhu rd, rs1, rs2 0x3 0x01 (unsigned) R[rd] ← (R[rs1] * R[rs2])[63:32] xor rd, rs1, rs2 0x4 0x00 R[rd] ← R[rs1] ^ R[rs2] div rd, rs1, rs2 0x4 0x01 R[rd] ← R[rs1] / R[rs2] divu rd, rs1, rs2 0x5 0x01 (unsigned) R[rd] ← R[rs1] / R[rs2] srl rd, rs1, rs2 0x5 0x00 R[rd] ← R[rs1] >> R[rs2]
or rd, rs1, rs2 0x6 0x00 R[rd] ← R[rs1] | R[rs2]
remu rd, rs1, rs2 0x7 0x01 (unsigned) R[rd] ← R[rs1] % R[rs2]
and rd, rs1, rs2 0x7 0x00 R[rd] ← R[rs1] & R[rs2]
lb rd, offset(rs1) I 0x03 0x0 R[rd] ← SignExt(Mem(R[rs1] + offset, byte))
lh rd, offset(rs1) 0x1 R[rd] ← SignExt(Mem(R[rs1] + offset, half))
lw rd, offset(rs1) 0x2 R[rd] ← Mem(R[rs1] + offset, word)
lbu rd, offset(rs1) 0x4 (unsigned) R[rd] ← Mem(R[rs1] + offset, byte)
lhu rd, offset(rs1) 0x5 (unsigned) R[rd] ← Mem(R[rs1] + offset, half)
addi rd, rs1, imm 0x13 0x0 R[rd] ← R[rs1] + imm
slli rd, rs1, imm 0x1 0x00 R[rd] ← R[rs1] << imm slti rd, rs1, imm 0x2 R[rd] ← (R[rs1] < imm) ? 1 : 0 xori rd, rs1, imm 0x4 R[rd] ← R[rs1] ^ imm srli rd, rs1, imm 0x5 0x00 R[rd] ← R[rs1] >> imm
ori rd, rs1, imm 0x6 R[rd] ← R[rs1] | imm
andi rd, rs1, imm 0x7 R[rd] ← R[rs1] & imm

sw rs2, offset(rs1) S 0x23 0x2 Mem(R[rs1] + offset) ← R[rs2]
beq rs1, rs2, offset SB 0x63 0x0
if(R[rs1] == R[rs2])

 PC ← PC + {offset, 1b’0}

blt rs1, rs2, offset 0x4
if(R[rs1] less than R[rs2] (signed))

 PC ← PC + {offset, 1b’0}

bltu rs1, rs2, offset 0x6
if(R[rs1] less than R[rs2] (unsigned))

 PC ← PC + {offset, 1b’0}

bne rs1, rs2, offset 0x1
if(R[rs1] != R[rs2])

  PC ← PC + {offset, 1b’0}

bgeu rs1, rs2, offset 0x7
if(R[rs1] >= R[rs2] (unsigned))

 PC ← PC + {offset, 1b’0}

auipc rd, offset U 0x17 R[rd] ← PC + {offset, 12b’0}
lui rd, offset 0x37 R[rd] ← {offset, 12b’0}
jal rd, imm UJ 0x6f
R[rd] ← PC + 4

PC ← PC + {imm, 1b’0}

jalr rd, rs, imm I 0x67 0x0
R[rd] ← PC + 4

PC ← R[rs] + {imm}

For I, S, B, U, and J type instructions, you will have to extract the appropriate immediate from the instruction. You may want to create an
Immediate Generator subcircuit to do this. Remember that in RISC-V, all immediates that leave the immediate generator are 32-bits and sign-extended!
You may find the following diagram helpful for thinking about how to create the appropriate immediate for each type:

3) Controls

You can probably guess that control signals will play a very large part in this project. Figuring out all of the control signals may seem
intimidating. Remember that there is not a definitive set of control
signals–walk through the datapath with different types of instructions, and when you see a mux or other component think about what
selector/enable value you will need for that instruction. We suggest you create a “Control Module” subcircuit that takes in an instruction and outputs the
control signals for that instruction.

There are two major approaches to implementing the Control so that it can translate the opcode/funct3/funct7 to the corresponding instruction and then set the
control signals. One way to do, ROM control was outlined in lecture. Every instruction implemented by a processor maps to an address in a ROM (read-only memory) unit.
At that address in the ROM is the control word for that instruction. An address decoder takes in an instruction and outputs the address of the control word
for that instruction. This approach is common in CISC architectures like Intel’s x86-64, and offers some flexibility because it can be re-programmed by changing the contents of the ROM.

The other method is hard-wired control, which is usually the preferred approach for RISC architectures like MIPS and RISC-V.
Hard-wired control uses “AND”, “OR”, and “NOT” gates (along with the various components we’ve learned can be built from these gates, like
MUXes and DEMUXes) to produce the appropriate control signals. The instruction decoder takes in an instruction and outputs all of the control signals
for that instruction.

The method you choose to implement your control unit is one of the major design decisions you get to make in this project. If you have a partner,
be sure to discuss which approach you both believe is best.

4) Pipelining

Your processor will have a 2-stage pipeline:

Instruction Fetch: An instruction is fetched from the instruction memory. (Note: while you can, please do not calculate jump address
in this stage. Instead, you should try to deal with the jump control hazard.)
Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal
five-stage RISC-V pipeline.

First, make sure you understand what hazards you will have to deal with.

The instruction immediately after a branch or jump is not executed if a branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is
in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need
to “kill” instruction that is being fetched if the instruction under execution is a jump or a taken branch. Instruction
kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute
stage instead of using the fetched instruction. Notice that 0x00000013, or addi x0, x0, 0 is a nop instruction; you should only kill if a branch is taken (do not kill otherwise). Do kill on every type of jump.

Do not solve this issue by calculating branch offsets in the IF stage. Because we test your output against the reference every cycle,
and the reference returns a nop, while it may be a conceptually correct solution, this will cause you to fail our tests.

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a
single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-stage
pipeline design. Some things to consider:

Will the IF and EX stages have the same or different PC values?
Do you need to store the PC between the pipelining stages?
To MUX a nop into the instruction stream, do you place it before or after the instruction register?
What address should be requested next while the EX stage executes a nop? Is this different than normal?

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won’t
contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the
instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to
Simulate –> Reset Simulation (Ctrl+R) to reset your processor.

Getting Started: A Guide

We know that trying to build a CPU with a blank slate might be intimidating, so we want to guide you through how to think about this project
by implementing a simple I-type instruction, addi.

Recall the five stages of the CPU pipeline:

Instruction Fetch
Instruction Decode
Write Back

This guide will help you work through each of these stages, as it pertains to the add instruction. Each section will contain
questions for you to think through and pointers to important details, but it won’t tell you exactly how to implement the instruction.
You may need to read and understand each question before going to the next one, and you can see the answers by clicking
on the question. During your implementation, feel free to place things in subcircuits as you see fit.

Stage 1: Instruction Fetch

The main thing we are concerned about in this stage is: how do we get the current instruction? From lecture, we know that instructions are
stored in the instruction memory, and each of these instructions can be accessed through an address.

Which file in the project holds your instruction memory? How does it connect to your cpu.circ file?
The instruction memory is the ROM module in run.circ. It provides an input into your CPU named “Instruction” and takes an
output from your CPU named “fetch_addr”.

In your CPU, how would changing the address you output to fetch_addr affect the instruction input?
The instruction that run.circ outputs to your CPU should be the instruction at address fetch_addr in instruction memory.

How do you know what the fetch_addr should be? (Hint: it is also known as PC)
fetch_addr is the address of the current instruction being executed, so it is saved in the PC register. For this project,
it’s fine for the PC to start at 0, and that is the default value for registers.

For this project, does your PC hold an address of a byte or a word?
If you look in run.circ, you will see that the address coming from your CPU goes straight into the memory module; the bits that
cut off by the splitter are the upper 18 bits of the address. So fetch_addr is a word address. Whether PC is a word or a byte address
is up to you as the hardware designer, but keep in mind that instructions that write PC + 4 back to the RegFile are treating PC like a byte address.

For basic programs without any jumps or branches, how will the PC change from line to line?
The PC must increment by 1 instruction in order to go to the next instruction, as the address held by the PC register represents what
instruction to execute.

We have provided the PC register in the cpu.circ file. Please implement the PC’s behavior for simple programs – ignoring jumps
and branches. You will have to add in the latter two in the project, but for now we are only concerned with being able to run strings
of addi instructions. Where should the output of the PC register go? Remember to connect the clock! Remember that we’re implementing
a 2-stage pipelined processor, so the IF stage is separate from the remaining stages. What circuitry separates the different stages of a pipeline? Do
you need to add anything?

Stage 2: Instruction Decode

Now that we have our instruction coming from the instruction input, we have break it down in the Instruction Decode step,
according to the RISC-V instruction formats you have learned.

What type of instruction is addi? What are the different bit fields and which bits are needed for each?
I type. The fields are:
imm [31-20]
rs1 [19-15]
funct3 [14-12]
opcode [6-0]

In Logisim, what tool would you use to split out different groups of bits?

Implement the instruction field decode stage using the instruction input. You should use tunnels to label and group the bits.

Now we need to get the data from the corresponding registers, using the register file. Which instruction fields should be connected
to the register file? Which inputs of the register file should it connect to?
Instruction field rs1 will need to connect to read register 1.

Implement reading from the register file. You will have to bring in your RegFile from Project 2-1. Remember to connect
the clock!

What does the Immediate Generator need to do?
For addi, the immediate generator takes in 12 bits and produces a signed 32-bit immediate. We highly suggest you create an Immediate Generator subcircuit!

Stage 3: Execute

The Execute stage, also known as the ALU stage, is where the computation of most instructions is performed. This is also where we will introduce
the idea of using a Control Module.

For the add instruction, what should be your inputs in to the ALU?
Read Data 1 and the immediate produced by the Immediate Generator.

In the ALU, what is the purpose of ALU_Sel?
It chooses which operation the ALU should perform.

Although it is possible for now to just put a constant as the ALUSel, why would this be infeasible as you need to implement more instructions?
With more instructions, the input to the ALU might need to change, so you would need to have some sort of

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com