CS计算机代考程序代写 mips c++ cache algorithm Student ID: __

Student ID: __
Question 1. Understanding MIPS Program (10 Marks)
Consider the following MIPS procedure. The numbers at the left are the line numbers of the code.
1) procedure:
2) label1:
3)
4) label2: 5)
addi $t0,$zero,1
add $v0,$zero,$zero
slt $t7,$t0,$a1
beq $t7,$zero,label3
sll $t1,$t0,2
add $t1,$t1,$a0
lw $t2, 0($t1)
lw $t3, -4($t1)
slt $t7,$t3,$t2
beq $t7,$zero,label4
addi $t0,$t0, 1
j label2
6)
7)
8)
9)
10)
11)
12)
13)
14) label3:
15) label4: jr $ra
addi $v0,$zero,1
a) Assume that register $a0 holds the base address of an integer array A, and register $a1 holds an integer indicating the number of elements in array A (i.e. if array A has 6 elements, $a1 will be 6). What will be the return value of the procedure in $v0, if A={1,5,3}? You just need to write a single value. (3 marks)
Answer: 0
b) By referring to part (a) or otherwise, explain what the return value in $v0 indicates with respect to the elements in array A using one sentence only. (4 marks)
Answer: The procedure checks to see if the elements of array A are stored in ascending order. The procedure returns 1 in $v0 if the elements are in ascending order, it returns 0 (in $v0) otherwise. Or similar explanations.
c) The procedure does not preserve the $ra register value. Explain in one sentence whether this would create potential problem. (3 marks)
Answer: There is no nested jal call, so the returning address in $ra will have no danger of being overwritten, therefore there is no problem here.
2

Student ID: __ The Euclidean Algorithm is a recursive algorithm that computes the greatest common divisor
Question 2. MIPS recursions (18 Marks)
(GCD) of two positive integers, x and y. The C++ code of the Euclidean Algorithm is given below:
int euclidean(int x, int y){
if (y==0){
return x; }
else{
return euclidean(y,x%y));
} }
The mod􏰑l􏰑􏰒 o􏰓e􏰔a􏰕o􏰔, 􏰖%􏰗, abo􏰘e 􏰓e􏰔fo􏰔m􏰒 􏰕he di􏰘i􏰒ion o􏰓e􏰔a􏰕ion (x/y), and returns the integer remainder resulting from the division operation.
Implement the above C++ function using a recursive MIPS procedure. When the procedure is being called, assume that:
􏰙 $a0containsx,
􏰙 $a1containsy,
􏰙 preserve registers whenever necessary,
􏰙 you are not allowed to write any extra helper procedure.
Hint: you can do the division of x/y using the div instruction (see the appendix for the details).
Write you code together with comments on the blank lines provided below. The blank lines are more than enough. You can add as many labels as you wish. Marks will be deducted if you do not follow the rules stated.
3

Student ID: __
euclidean:
base_case:
#if base case reached, return. Otherwise continue in non_base_case
________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ #
non_base_case:
#back up appropriate register(s) to the stack
________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ #
#do the division operation, set up new x, y for the next recursive call ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ #
#done with recursion, restore appropriate register(s) from the stack ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ # ________________________________________ #
return:
jr $ra
4

non_base_case:
addi $sp, $sp, -4
#push
#
#
#
#recursive call
#pop
#return to the caller function
sw $ra, 0($sp)
div $a0, $a1
add $a0, $a1, $zero
mfhi $a1
jal euclidean
lw $ra, 0($sp)
addi $sp, $sp, 4
return:
2 points
jr $ra
5
Student ID:
__
Answer:
euclidean:
base_case:
slti $t0,$a1,1 #test whether to terminate
beq $t0,$zero, non_base_case #non base-case
j return #return to the caller function
2 points 2 points 2 points
2 points
2 points 2 points 2 points 2 points

Student ID: __
Q􏰑e􏰒􏰕ion 3: Com􏰓􏰑􏰕e􏰔 A􏰔i􏰕hme􏰕ic (16 Ma􏰔k􏰒)
a)
Two arbitrary unsigned 2-bit binary integers a1a0 and b1b0 are multiplied together to give the unsigned binary integer r3r2r1r0. From the observation of the paper-and-pencil example below, the hardware carrying out the operation can be implemented directly using only AND gates and 1-bit Adders.
𝑏􏰚 𝑏0 􏰛 𝑎􏰚𝑎0
______________________________________________________________ 0 𝑎0.𝑏􏰚 𝑎0.𝑏0
􏰜 𝑎􏰚.𝑏􏰚 𝑎􏰚.𝑏0 0 ______________________________________________________________
𝑟𝑟𝑟𝑟 􏰝􏰞􏰚0
Based on the above, complete the connections of the following circuit so that it can perform the multiplication described above. Some of the connections have been done for you. The input/output pins of the 1-bit adder employed are shown in the right. (6 marks)
Hint: there might be carry-in among neighboring bits.
6

Student ID: __
Answer: 2 marks for r1, r2 and r3 each
b) Given unsigned dividend 11100, and divisor 00101, compute the division result (quotient and remainder). The division is executed by the refined version of the division hardware (5-bit) as shown below.
Fill as many rows below as needed to produce the 􏰔e􏰒􏰑l􏰕. E􏰟􏰓lain 􏰕he o􏰓e􏰔a􏰕ion􏰒 in 􏰕he 􏰖Ac􏰕ion􏰗 column. (10 marks)
Hin􏰕: O􏰓e􏰔a􏰕ion􏰒 can be 􏰖􏰒􏰑b􏰕􏰔ac􏰕ion􏰗, 􏰖􏰒hif􏰕 lef􏰕, 􏰒e􏰕 LSB 0􏰗, 􏰖􏰒hif􏰕 lef􏰕, 􏰒e􏰕 LSB 1􏰗, 􏰖􏰑ndo􏰗 and 􏰖adj􏰑􏰒􏰕 􏰔emainde􏰔􏰗, 􏰠hich ca􏰔􏰔􏰡 􏰕he 􏰒ame meaning a􏰒 in lec􏰕􏰑􏰔e no􏰕e􏰒. LSB = lea􏰒􏰕 significant bit.
7

Iteration 0
1
2
Divisor (D)
Remainder (R) 00000 11100 00001 11000 11100 11000 00001 11000 00011 10000 11110 10000 00011 10000 00111 00000 00010 00000 00100 00001
11111 00001 00100 00001 01000 00010 00011 00010 00110 00101
00011 00101
3
00101
4
5
6
1.5 marks each iteration, total 7.5 marks 2.5 mark for correct final result
8
Student ID:
Action Initial State Shift left subtraction undo
Shift left, set LSB0 Subtraction undo
Shift left, set LSB0 Subtraction Shift left, set LSB1
subtraction undo
Shift left, set LSB0 subtraction Shift left, set LSB1
Adjust reminder
__

Student ID: __
Q􏰑e􏰒􏰕ion 4. Single-c􏰡cle Da􏰕a􏰓a􏰕h and Con􏰕􏰔ol (15 Ma􏰔k􏰒) Given the following single-cycle implementation
a) Highlight all datapath (datapath wires?) involved in each step during the execution of lw instruction on the above diagram. Draw only after clear thoughts. (8 marks)
Instruction fetch: 2 marks
Read register: 2 marks
Memory address calculation: 2 marks Memory read: 1 mark
Write back: 1 marks
9

b) Write down the values of control signals required for the lw instruction. (4 marks)
Note:
1. A 􏰖don􏰢􏰕 ca􏰔e􏰗 con􏰕􏰔ol 􏰒ignal i􏰒 􏰔e􏰓􏰔e􏰒en􏰕ed b􏰡 an 􏰖􏰟􏰗. If 􏰕he 􏰒ignal i􏰒 a 􏰖don􏰢􏰕 ca􏰔e􏰗, onl􏰡
􏰖􏰟􏰗 i􏰒 con􏰒ide􏰔ed co􏰔􏰔ec􏰕 (i.e. 􏰖0􏰗 , 􏰖1􏰗 a􏰔e bo􏰕h inco􏰔􏰔ec􏰕).
2. Below table asks for 2-bit ALUOp but not 4-bit ALU control
RegDst ALUsrc MemtoReg RegWrite MemRead MemWrite ALUOp1 ALUOp0
Answer: RegDst = 0, RegWrite = 1, ALUSrc = 1, ALUOp = 00, MemRead = 1, MemWrite = 0, MemToReg = 1
0.5 mark for each control signal
c) Assume that memories and the ALU have 2ns delays, the register file has a 1ns delay. The delay of all other components, e.g. multiplexers, control unit, PC accesses, sign extension unit, and wires can be ignored. What would be the minimum required cycle time for lw? Briefly explain your answer. Answer with no verification will receive zero mark. (3 marks)
Answer: Total needs 8ns [1 mark],
2ns for instruction fetch, 1ns for reading registers, 2ns for effective address computation, 2ns to read memory, and 1ns for write back. [2 marks]
Student ID: __
10

Student ID: __
Q􏰑e􏰒􏰕ion 5. M􏰑l􏰕i-c􏰡cle Da􏰕a􏰓a􏰕h and Con􏰕􏰔ol (16 Ma􏰔k􏰒)
Consider implementing a new imaginary 5-cycle MIPS R-type instruction: sub3. The instruction
is as follows:
sub3 $rd, $rs, $rt, $ru #Reg[rd] = Reg[ru] – Reg[rs] – Reg[rd] The instruction is still in R-type format, where the shamt field is used to hold ru.
Field op rs rt rd shamt/ru func Bits 31-26 25-21 20-16 15-11 10-6 5-0
a) Modify the datapath from lecture in the following diagram to support the execution of sub3. The modification should finish sub3 execution in as small number of cycles as possible. Hint:
1. rd = ru-rs-rt can be calculated as rd = ru-(rt+rs).
2. The modification may include adding wires and/or adding/extending multiplexors to
the datapath. New/modified control signals may be needed correspondingly.
3. Do not modify the main functional units (i.e. the ALU, Memory, and register file). (5
marks)
11

1. 2.
Cycle
Cycle 2 operations are the same for all MIPS instructions.
You should decide how many cycles are needed for sub3. The table below gives more than enough number of rows for cycles.
Operations
Answer:
Student ID: __
b)
Describe the operations executed in each cycle for the new sub3 instruction. (5 marks) Hint:
Cycle 1
– Fetch
– PC=PC+4
Cycle 2
– Read $rs and $rt [0.5 mark]
– Calculate the PC-relative address [0.5 mark]
Cycle 3
– Calculate the sum of the values in $rs and $rt: ALUOut = $rs + $rt [1 mark]
– Read $ru [1 mark]
Cycle 4
– Subtract $ru with $rs+$rt: ALUOut = $ru – ALUOut [1 mark]
12

Student ID: __
– Store the value from AluOut to $rd [1 mark]
c) Complete the finite state machine (FSM) diagram for sub3 instruction from its 3rd cycle. Remember to fill all the control signals that you have added or modified in part (a). (6 marks)
Note: Control signals appeared in states (e.g. the cycles) follow similar rules as explained during the lecture.
Cycle 5
Cycle 6 – Cycle 7 –
13

Student ID: __
14

Student ID: __
Answer:
15

Student ID: __
16

Student ID: __ a) Consider a program with 2500 instructions. With the following instruction mix:
Q􏰑e􏰒􏰕ion 6. Pi􏰓eline (15 Ma􏰔k􏰒)
Instruction class R-type Load Store Branch Jump
Mix 38% 26% 12% 20% 4%
Given the cycle times and the CPI per instruction type in the table below for three processor design alternatives, calculate the average CPI in executing the program and the program execution time for each of the three alternatives (consider 1 ns =10-9 second). Show all your steps. (6 marks)
Processor Designs
Single-cycle 5-stage multi-cycle 5-stage pipelined
Cycle time
CPI per instruction type R-type Load Store Branch Jump
10ns 1 2ns 4 2ns (assume no stalls) 4
1 1 1 1 5 4 3 3 5 4 3 3
Answer:
i) Single-cycle implementation
CPI = 1 Time = 2500 * 1 * 10 ns = 25 μs
ii) 5-stage multi-cycle implementation
CPI = (0.38*4 + 0.26*5 + 0.12*4 + 0.20*3 + 0.04*3) = 4.02
Time = 2500*4.02*2 ns = 20.1 μs
iii) 5-stage pipelining implementation
CPI = (4+2500)/2500 = 1.0016 Time = 2500*1.0016*2 ns = 5.008 μs
2 marks for each alternative, 1 mark for CPI, 1 mark for execution time
b) The following sequence of instructions is executed in a basic five-stage (IF, ID, EXE, MEM, WB) pipelined processor.
Assumption:
1. No structural hazards.
2. Full set of forwarding hardware is available, from any stage to any stage (provided it
does not violate the timing consistency).
Show how the instructions below would progress through pipeline. Draw forwarding path among stages if used. (9 marks)
Proper pipeline is 5 marks, 1 mark for each instruction
4 forwarding paths, 1 mark each
17

Student ID: __ 1 2 3 4 5 6 7 8 9 10 11 12
or $8, $1, $4
and $7, $8, $4
lw $8, 4($7)
add $1, $2, $8
sw $1, 4($7)
IF ID E IF ID IF
M W B
E M W B
ID E M W B
BU IF ID E M W BB
BU IF ID E M W BB
18

Student ID: __
Question 7. Memory and Cache (10 Marks)
a) Consider a memory system with 2 levels of caches as shown in the figure below:
The latencies for L1 cache, L2 cache, and the memory are 2 cycles, 4 cycles and n cycles respectively. Assume there is no latency for bus and cache lookups. Assume the hit rate in level 1 cache is 95%, and the hit rate in the level 2 cache is 90%, what is the maximum value for the memory latency n such that the average latency of a single memory access is no more than 3 cycles? (4 marks)
Answer:
2 + 0.05*4+ 0.05*0.1*n <= 3 0.005*n <= 0.8 n <= 160 Therefore the memory latency should be less than or equal to 160 cycles. b) In running a program, the CPU generates the following list of 32-bit memory addresses to access: 0xA0000004, 0xD0000030, 0xA000004C, 0xD0000038 Assume the memory is divided into blocks of 16 bytes in size, consider a direct-mapped cache with 4 cache frames and a block size of 16 bytes. The cache contains no data before the first memory access (at address 0xA0000004). Fill out the following table. The first row has been done for you. (6 marks) Memory (byte) address being referenced 0xA0000004 0xD0000030 0xA000004C 0xD0000038 Block Cache block assigned address 0xA000000 0xA000000 mod 4 = 002 Hit or Miss in cache Miss 19 Answer: 0xA0000004 0xD0000030 0xA000004C 0xD0000038 0xA000000 0xD000003 0xA000004 0xD000003 0xA000000 mod 4 = 002 0xD000003 mod 4 = 112 0xA000004 mod 4 = 002 0xD000003 mod 4 = 112 Miss Miss Miss Hit Student ID: __ Memory (byte) address being referenced Block address Cache block assigned Hit or Miss in cache ------ End of Exam Paper ------ 20 Student ID: __ A􏰓􏰓endi􏰟 1: MIPS in􏰒􏰕􏰔􏰑c􏰕ion􏰒 1 21 Student ID: __ A􏰓􏰓endi􏰟 2: MIPS in􏰒􏰕􏰔􏰑c􏰕ion􏰒 2 22 Student ID: __ --------Draft paper-------- 23