代写 MIPS NAME: NET ID:

NAME: NET ID:
Instructions:
CS-GY-6133 Mid-term Examination Fall 2016
1. The total duration for this examination is 2 hours. The exam contains three questions with a total value of 100 points.
2. Please write your name and net id number in the space above, and on the top of each page.
3. Write your solutions in the blank space provided after each question. It would be useful for you to clearly mark out your final answer.
4. Wherever possible, show your work and thought process. This will allow us to give you partial credit.
Question
Points
Total
1
2
3

Q1. ISA Comparison and Single-cycle Processor Design [40 Points]
In this problem, you will compare the micro-architecture of two ISAs: ISA-1 and ISA-2.
Both ISAs have the following properties:
􏰀 4 general purpose 8-bit registers.
􏰀 Address up to 256B of Byte addressable memory, therefore, each address is 8 bits
􏰀 Each instruction is 8 bits long
􏰀 The ISAs are 3-address ISAs
The instruction format is below
7 6 5 4 3 21 0 ISA-1 supports the following two instructions:
Recall that ldi is an example of an indexed load instruction. ISA-2 supports the following two instructions:
Recall that ldr is an example of a register indirect load instruction.
Opcode
Rs
rt
rd
Instruction
Opcode
Semantics
add rs, rt, rd
00
R[rd] <- R[rs] + R[rt] ldi rs, rt, rd 11 R[rd] <- Mem[R[rs] + R[rt]] Instruction Opcode Semantics add rs, rt, rd 00 R[rd] <- R[rs] + R[rt] ldr rs, rt, rd 11 R[rd] <- Mem[R[rs]] In this problem, you will design the micro-architecture of a processors that implement ISA-1 and ISA-2. Your designs must be a synchronous single-cycle micro-architectures. In other words, 􏰀 One instruction is fetched and executed in every clock cycle. 􏰀 Updates to architectural state (PC, and writes to register file) are synchronized to positive clock edges. In your implementation, you are allowed to use the following hardware blocks. 1. Flip-flops: the flip-flop 􏰁􏰂ample􏰂􏰃 its input on every positive clock edge, and is stable otherwise. 2. Instruction memory (IM): the IM reads an 8-bit address and returns the corresponding 8- bit instruction at that address. 3. Register File (RF): the RF reads two 2-bit addresses, Rd Reg. 1 and Rd Reg. 2, and returns the 8-bit values stored in the respective registers. It also reads a 2-bit Wrt Reg. address and updates the corresponding register with 8-bit Wrt Data. If there is a write and read attempt to/from the same register, you can assume the write happens first. The Write Enable control signal must be set to 1 to enable the write operation. 4. Adder: reads two 8-bit values, Op 1 and Op2, and outputs Res = Op1+Op2 5. Data Memory (DM): reads an 8-bit address and outputs the 8-bit value stored at that address. In addition, to the blocks above, you might need additional hardware blocks like MUXes and a decoder to generate control signals. If you use a decoder, please specify the decoder􏰄s functionality using pseudo-code. IM, RF, Adders, DM and Decoders are assumed to have a 1 ns execution latency. MUXes have a 0.2 ns execution latency. 1.1 [10 Points] Design a synchronous, single-cycle architecture that implements ISA-1. Draw the resulting micro-architecture in the space below. 1.2 [5 Points] Determine the minimum clock period at which the ISA-1 micro-architecture operates. Assume that IM, RF, Adders, DM and Decoders are assumed to have a 1 ns execution latency. MUXes have a 0.2 ns execution latency. 1.3 [10 Points] Design a synchronous, single-cycle architecture that implements ISA-2. In your implementation, you must exploit the fact that ISA-2 i􏰂 􏰁􏰂impler􏰃 (note that it uses the simpler register indirect addressing mode vs. ISA-1 that uses indexed addressing) to reduce the clock period of your ISA-2 micro-architecture compared to ISA-1. Draw the resulting micro- architecture in the space below. 1.4 [5 Points] Determine the minimum clock period at which the ISA-2 micro-architecture operates. As before, assume that IM, RF, Adders, DM and Decoders are assumed to have a 1 ns execution latency. MUXes have a 0.2 ns execution latency. 1.5 [5 Points] Emulate the ldi instruction of ISA-1 using ISA-2. That is, re-implement the following ISA-1 instruction: using add and ldr instructions. ldi rs, rt, rd 1.6 [5 Points] Assume that for an ISA-1 program, x% of instructions are adds and the rest are ldi instructions. The same program is implemented in ISA-2 by emulating ldi instructions using ISA-2 instructions, based on your implementation from Problem 1.5. Based on your answers to problems 1.4 and 1.5, determine the range of values of x for which ISA-2 programs execute faster than ISA-1 programs. Q2. Pipelined MIPS [30 Points] In cla􏰂􏰂 􏰅e 􏰂􏰆􏰇died a 􏰁􏰂􏰆andard􏰃 5-stage MIPS pipeline. Now imagine a 6-stage implementation in which the execute stage (EX) is split into two pipeline stages, EX1 and EX2. The first execute stage, EX1, takes as input both operands and the computation finishes in EX2. The 6-stage MIPS pipeline is shown below for 3 MIPS instructions: add, lw, and sw. 2.1 [10 Points] Consider the following two dependent add instructions with a RAW hazard:. add $1, $2, $3 add $3, $4, $5 Draw a timing diagram that shows how the RAW hazard can be resolved using forwarding and, if required, stalling. Note the forwarding path (i.e., which stage to which stage) that you used to resolve the dependency. 2.2 [10 Points] Now consider the same two dependent instructions as above, except this time with m independent instruction in between the two dependent instructions. add $1, $2, $3 Inst 1 Inst 2 ... Inst m add $3, $4, $5 In the sequence above, Inst 1, Inst 2, .., Inst m are m MIPS instructions that are independent of the two add instructions and independent of each other. They execute without any stalling. What is the largest value of m for which there is a RAW Hazard between the two dependent add instructions? Draw a timing diagram that shows how the RAW hazard can be resolved using forwarding and, if required, stalling. Your diagram only needs to illustrate the two dependent add instructions. Note the forwarding path (i.e., which stage to which stage) that you used to resolve the dependency. 2.3 [10 Points] Give an example of RAW hazard involving a lw instruction immediately followed by a sw instruction. Draw a timing diagram that shows how the RAW hazard can be resolved using forwarding and, if required, stalling. Note the forwarding path (i.e., which stage to which stage) that you used to resolve the dependency. Q3. Caches [30 Points] A 16-Byte, 4-way set-associative cache with 2-Byte blocks is used as the L1-cache for a processor with a Byte addressable 4GB main memory. 3.1 [5 Points] Determine the number of offset bits, index bits, and tag bits for the cache. 3.2 [15 Points] Assume that the cache uses a true LRU replacement policy, i.e., it evicts the block that was least recently used/accessed. The ways of the cache are labeled 0,1,2,3. You can assume that when a cache block is fetched into a set with one or more empty ways, it is placed in the empty way with the lowest numbered label. For example, if all four ways are empty and a cache block is fetched, it would be placed in Way 0. For the sequence of accesses below, determine if the access is a Hit or a Miss. Also indicate the index of the set and which way (way 0,1,2, or 3) the block is placed in. All accesses are assumed to be read accesses. Rd Access (Address in decimal) Hit/Miss Set Index Way # (0,1,2,3) 0 2 4 8 12 1 16 5 3 8 3.3 [5 Points] A processor consist of a two-level cache hierarchy: an L1 cache with a single cycle access time and a 5% miss rate, and an L2 cache with a 10 cycle access time and a 1% miss rate. Accessing main memory requires 100 clocks. What is the average memory access time (AMAT), measured in clock cycles, for this processor? 3.4 [5 Points] Consider two caches that have the same capacity, but different block sizes. Cache-1 has a small block size while Cache-2 has a large block size. Name one advantage and one disadvantage of Cache-1 vs. Cache-2.