程序代写代做代考 cache 1 of 2

1 of 2

0907432 Computer Design (Spring 2010)

Midterm Exam Solution

:رقم الشعبة :رقم التسجيل :السما
==================================================================================

Instructions: Time 50 min. Closed books & notes. No calculators or mobile phones. No questions are allowed. Show

your work clearly. Every problem is for 6 marks.

==================================================================================

Q1. Assume that the instructions executed by a processor are broken down as follows:

Type ALU Control Memory

Frequency 50% 20% 30%

If you can improve the performance of only one type of these three instruction types by a factor of 2, answer the

following two questions.

(a) What type would you improve? ______ALU____________

(b) Using Amdahl’s law, what is the resulting overall speedup? ___4/3____

Speedup = 1 / ( 1-0.5 + 0.5/2 ) = 1/.75 = 4/3

Q2. Describe how the branch history table (BHT) and the branch target buffer (BTB) are used in branch prediction.

____Using the branch instruction address,

______1) The BHT is consulted to predict the branch direction

______2) The BTB is consulted to predict the branch target address

__________________________________________________________________________________________

Q3. Assume that you have a typical 5-stage pipelined processor that uses forwarding and stalls to solve data

hazards, resolves branches in the decode stage, and has one branch delay slot. After you do pipeline

scheduling (rearrange instructions) for the following code sequence, the minimum number of cycles

needed to execute this code sequence is: ____9______ (You must show your rearranged code)

Original Code Sequence Rearranged Code Sequence

L1: lw R5, 0(R1)

lw R6, 0(R5)

add R7, R7, R6

addi R1, R1, #-4

bne R1, R2, L1

nop

L1: lw R5, 0(R1)

addi R1, R1, #-4

lw R6, 0(R5)

bne R1, R2, L1

add R7, R7, R6

2 of 2

Q4. Assume that the following code sequence is executed by a speculative superscalar processor of degree

2. This processor uses reservation stations and reorder buffer. The integer latency is 1 cycle and the load

latency is 2 cycles. The processor has one address calculation unit, one memory access unit, one integer

ALU unit, and one branch unit. Assume that the processor predicts that the branch instruction is not

taken and the branch is actually not taken. Using pipeline diagram in the space below, the number of

cycles needed to fetch and commit these instructions is: ___10_____

1 2 3 4 5 6 7 8 9 0 1 2 3 4 5

lw R2, 0(R1) F I A M W C

lw R3, 4(R1) F I A M W C

add R4, R2, R3 F I E W C

beq R2, R3, Skip F I E W C

sub R5, R2, R3 F I E W C

Skip:

Q5. For a direct-mapped cache design with 32-bit address, the following bits of the address are used to access the

cache.

Tag Index Offset

31-12 11-5 4-0

(a) What is the cache line size (in words)? _____2
5
/4 = 32/4 = 8 words_________

(b) How many lines does the cache have? __2
11-5+1

= 2
7
= 128___________

(c) What is the ratio between total bits required for such a cache implementation over the data storage bits?

Ratio = [ 128 * (32*8 + 20 + 1) ] / [ 128 * (32*8) ] = 277/256

Starting from power on, the following byte addressed cache references are recorded.

Address Block Address Cache Index Hit or Miss

0 0 0 Miss
4 0 0 Hit
16 0 0 Hit

128 4 4 Miss
224 7 7 Miss
160 5 5 Miss
4100 128 0 Miss
30 0 0 Miss
140 4 4 Hit
3100 96 96 Miss

(d) How many blocks are replaced? __two blocks___

(e) What is the hit ratio? ___3/10_______