CS计算机代考程序代写 CSE 141: Homework #2

CSE 141: Homework #2
Due: July 15, Thursday by 11:59pm on Gradescope
• Please either type in or if handwritten, make your answers as neat as possible.
• Enter your answers completely in the text box right below the question. Answers that appear outside of
the box might not be graded. Do not move the boxes.
• Q1-Q14 are compulsory questions and carry 5 points each.
• Q15-Q19 are optional questions for your practice and will not be graded.
For Q1 and Q2, assume for a given processor the CPI of arithmetic instructions is 1, the CPI of load/store instructions is 10, and the CPI of branch instructions is 3. Assume a program has the following instruction breakdowns: 600 million arithmetic instructions, 200 million load/store instructions, 100 million branch instructions.
Q1. Suppose that new, more powerful arithmetic instructions are added to the instruction set. On average, through the use of these more powerful arithmetic instructions, we can reduce the number of arithmetic instructions needed to execute a program to 80% of original value, and the cost of increasing the clock cycle time to 1.1 times of the earlier cycle time. Is this a good design choice? Why?

Q2. Suppose that we find a way to double the performance of arithmetic instructions. What is the overall speedup of our machine? What if we find a way to improve the performance of arithmetic instructions by 15 times?
For Q3 and Q4, assume that for a given program 70% of the executed instructions are arithmetic, 10% are load/store, and 20% are branch.
Q3. Given this instruction mix and the assumption that an arithmetic instruction requires 2 cycles, a load/store instruction takes 7 cycles, and a branch instruction takes 3 cycles, find the average CPI.

Q4. For a speedup of 1.25, how many cycles, on average, may an arithmetic instruction take if load/store and branch instructions are not improved at all?
For Q5 and Q6, consider the following two processors:
P1 has a clock rate of 3 GHz, average CPI of 0.7, and requires the execution of 1.0E9 instructions. P2 has a clock rate of 4 GHz, an average CPI of 0.9, and requires the execution of 5.0E9 instructions.
Q5. One usual fallacy is to consider the computer with the largest clock rate as having the largest performance. Check if this is true for P1 and P2. Explain your result.

Q6. Another fallacy is to consider that the processor executing the largest number of instructions will need a larger CPU time. Considering that processor P2 is executing a sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not change, determine the number of instructions that P1 can execute in the same time that P2 needs to execute 1.0E9 instructions.
Q7. [Amdahl’s Law] Assume we have a program for which 15% of the execution time is inherently serial, and 85% is parallelizable. What is the maximum speedup expected (over single-core execution) when running on :
i) a 4-core multicore processor ii) an 8-core multicore processor

Q8. Assume 0xAC430014 has just been fetched within a single cycle processor with a register file containing: r0 = 0, r1=-2, r2=-1, r3=4, r4=-5, r5=8, r6=4, r8=2, r12=7, r31=-12. What are the outputs of the sign-extend and the jump “Shift left 2” units (Ref Fig 4.24, P&H) for this instruction?

Q9. For the processor state and instruction in Q8, list the data input values into the core ALU and the two add units. (Ref Fig 4.24, P&H). Assume the value of the PC is 0x200000.

Q10. Assume a single-cycle processor as in Fig 4.11(P&H) with clock cycle time of 10ns, of which the ALU takes 1 ns. Suppose we add a multiply instruction, so now the ALU takes 3 ns. By having a multiply instruction this reduces the total number of instructions executed by 8%. What is the speedup from adding multiply? Is that a good tradeoff?
Q11. The PowerPC ISA supports the load_indexed instruction, as in load_indexed $rd, $rs, $rt which means R[$rd] = M[$rs+$rt]. In what ways (if any!) would we need to change the processor (e.g., Fig 4.15, P&H) to support that instruction? Just consider the datapath, not control logic, and describe the changes.

Q12. For a single cycle processor, assume the latencies of the logic blocks Inst-mem, Add, Mux (any mux), ALU, Regfile, Data-Mem, Sign Extend, Shift-Left-2 are 250ps, 80ps, 30ps, 100ps, 120ps, 250ps, 20ps, 0ps respectively. If the only thing that the processor needed to do is fetch consecutive instructions what would be the cycle time? (eg, Fig 4.6, P&H).

Q13. Using the same latencies from Q12, what would be the cycle time of the processor (data path similar to Fig 4.11) if it now only had to support one instruction — beq.

Q14. Referencing the following figure from class, how difficult would it be to add support for the instruction addi to our datapath? What would I need to add (if anything!) ? You can just address the datapath, not the control logic (but if you need new control signals, give them names). If changes are needed, you can draw them as modifications to this figure.

Optional Questions
The questions below will not be graded, they are only for your practice.
For questions Q15 and Q17, consider a computer running a program that requires 250 s, with 75 s spent executing FP instructions, 80 s executing Ld/St instructions, 55s executing INT instructions and 40 s spent executing branch instructions.
Q15. What is the overall speedup if the time for FP operations is reduced by 20%?
Q16. By how much is the time for INT operations reduced if the overall speed up is 1.15? Assume the execution times for other instructions remain the same.
Q17. Assume a program requires the execution of 50 × 10^6 FP instructions, 110 × 10^6 INT instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch instructions. The CPI for each type of instruction is 10, 1, 4, and 2 respectively. Assume that the processor has a 2 GHz clock rate.
What should be the CPI of FP instructions if we want the program to run three times faster?

Q18. Machine B runs at 2 GHz and has a CPI of 1.5 for a particular program. Machine C, which runs at 5 GHz, has a CPI of 2.5 for that program, while executing 20% more instructions. Which machine is faster? What is the speedup over the slower machine?

Q19. Referencing the following figure from class, how difficult would it be to add support for the instruction jr to our datapath? What would I need to add? You can just address the datapath, not the control logic (but if you need new control signals, give them names). If changes are needed, you can draw them as modifications to this figure.