Name:_______________________________
ECE 437: Introduction to Digital Computer Design & Prototyping Midterm #1
Your exam should have 14 (fourteen) pages.
Page 13 is intentionally left blank.
Write your name on this page and at least one other page.
Closed book, closed notes.
Switch off and put away cell-phones/pagers/calculators.
Numerical expressions do NOT have to evaluated. The correct expression is
sufficient to get full credit.
Apportion your time carefully.
For multiple choice questions, please choose the most appropriate choice if you
think more than one choice is acceptable.
Numbers in brackets represent points for that question. Points add up to 100.
You may request additional copies of Figure 1. If you use additional copies,
you must unambiguously mark the copy that you want graded. If you submit multiple attempts without marking one copy as your solution, you will get zero credit for that problem.
Good luck.
Prob.
Max.
Score
I
30
II
20
III
25
IV
25
Total
100
Page 1 of 12
I.
(30 points) Multiple Choice Questions (3 points each)
a. Which of the following instructions is not likely to be in a RISC instruction set?
(1) div reg3, reg2, reg1 // reg3 <- reg2/reg1
(2) move reg1, reg2 // move contents of reg1 to reg2
(3) max reg2, reg1, imm-n // find the maximum of “imm-n” integers beginning at
address contained in reg1 and copy it to reg2.
(4) add reg3, reg1, reg2 // place sum of reg1 and reg2 in reg3
(5) Choices (3) and (4) are both unlikely to be in a RISC instruction set.
b. What happens if the register file reads and writes happen on the same clock edge. (In the lecture we assumed that writes happen in the first half-clock and the values can be read in the second half.)
(1) It is a structural hazard that increases CPI.
(2) It necessitates changes in forwarding logic but does not affect performance. (3) It introduces true data-hazards which require stalling.
(4) Either (2) or (3) could occur.
(5) None of the above.
c. Which of the following is not correct?
(1) The key benefit of pipelining is that it offers a huge improvement in clock cycle
with a modest loss in CPI.
(2) The control signal generation in a pipelined processor is identical to that of a
single-cycled processor.
(3) The x86 ISA is RISC-like ISA.
(4) If delay slots cannot be filled with useful work, it is perfectly okay to fill them with
NOOPs.
(5) Harmonic means should be used to average rates.
d. What is the CPI on a pipelined processor if 30% of instructions are load instructions and 50% of loads are followed by an immediate use. (Assume an implementation like the one shown in Fig 2 and assume that there are no other RAW hazards.) Tricky!
(1) 1.05
(2) 1.10
(3) 1.15
(4) Cannot be determined without additional information (5) None of the above
e. Choose the correct expression for overall speedup when applying three non- overlapping fractions f1, f2 and f3 when each fraction fi achieves a speedup of si. (1) 1/( (1-f1-f2-f3) + (f1/s1+f2/s2+f3/s3) )
(2) (f1+f2+f3)/(s1+s2+s2)
(3) (f1/s1 + f2/s2 + f3/s3)
(4) 1/(1–f1 +f1/s1)+1/(1–f2 +f2/s2)+1/(1–f3 +f3/s3) (5) None of the above
Name:_______________________________
Page 2 of 12
Name:_______________________________
f. Characterize the following statements as TRUE or FALSE.
(1) A four stage pipeline with the work distributed in 25%:30%:20%:25% proportions
is 20% faster than an evenly divided four-stage pipeline. (Assuming no pipeline
hazards.)
(2) The “jal” instruction in the MIPS instruction set requires the Register Write Enable
control signal to be asserted
(3) An optimization that makes 75% of execution 3 times faster is better than an
optimization that makes 25% of execution 10 times faster.
(4) MFLOPS is a meaningful comparison metric if the same workload is used.
(5) Averaging performance across programs using unweighted arithmetic means
implicitly assumes that all programs are equally frequent.
g. If machine A is p times, q times and r times as fast as machine B on three different workloads (say P,Q, and R), on average machine A is faster than machine B by:
(1) HM(p,q,r)
(2) (p+q+r)/3
(3) GM(p,q,r)
(4) AM(p,q,r)
(5) Options (2) and (4) are both correct.
h. In general, increasing the pipeline depth increases
(1) The performance penalty of branch misprediction
(2) Probability of data hazards
(3) Clockspeed
(4) All of the above
(5) None of the above
i. Why is the stack necessary?
(1) To enable function calls within function calls.
(2) To enable floating point operations.
(3) It is not really necessary. It is a software convention that can be redesigned to
use FIFO queues.
(4) It is necessary only for specific algorithms that require a stack data-structure.
(5) None of the above
j. Which of the three terms in the execution time expression do compiler optimizations improve?
(1) CPI
(2) Clock period
(3) Number of Instructions
(4) More than one of the above
(5) None of the above
Page 3 of 12
II.
(20 points) Short Questions (5 points each. ) There are two pairs of related questions (a and b are related. c and d are related.) So it may help to read both before you attempt either.
a. Consider a pipeline with branch resolution in the ID stage as shown in Fig 2. Though the comparator input values are shown as coming from the register file, they may need to be forwarded/bypassed from the pipeline registers of older instructions that have not yet written their results back to the register file. Provide logic equations to handle bypassing to the ID-stage branch resolution for cases that do not require stalling
b. For the same branch resolution hardware, provide logic equations to detect hazards that do require a stall. (Your solution must also identify cases which require stalling.)
Name:_______________________________
Page 4 of 12
c. (5 points) Consider two pipelined processor implementations. One (say, implementation A) resolves branches in the MEM stage. The other (implementation B) resolves branches in the ID stage. Assume that there are no pipeline hazards other than branch-related control flow hazards. Also, assume that there the branch hazards are handled by introducing bubbles (i.e., no branch prediction, no delay slots). Finally, assume that the two implementations operate at the same clock speed. If we assume a workload with 20% branches, which implementation is faster and by what factor?
d. (5 points) Provide an expression for CPI for an arbitrary n-stage pipeline assuming that branches are resolved in the kth (1 < k < n) stage. Assume that there are no other hazards, branches are 20% of all instructions, and branch prediction accuracy is x (0
4. ALUSrc (0 Selects 2nd register, 1 selects sign-extended immediate as second ALU operand)
5. RegWrite (Asserted to enable writes to the register file.)
6. Branch (Asserted if instruction is a “beq” instruction.)
7. MemRead (Asserted to enable data memory read.)
8. MemWrite (Asserted to enable data memory write.)
Name:_______________________________
Page 6 of 12
Name:_______________________________
Fig 1. Single Cycle Datapath
Page 7 of 12
Name:_______________________________
Fig 2. Pipelined Datapath
Page 8 of 12
IV. (25 points) Pipelined Processor Implementation.
This question is used to satisfy the pipeline ABET outcome. To successfully satisfy the ABET outcome, you must score at least 70% on this question.
Consider the code below
outer:
inner:
mov $3, 3
mov $4, 2
subi $4,$4,1
bne $4, $0, inner
subi $3 $3, 1
bne $3, $0, outer
Name:_______________________________
(a) (10 points) Compute the overall branch prediction accuracy for the code shown above assuming a single-bit branch predictor. Assume that the two branches map to different entries in the branch predictor table and that both predictors are initially in the “Predict Taken” state. Your solution must list whether each occurrence of each of the two branches is predicted correctly or not.
# branch 1
# branch 2
Page 9 of 12
Name:_______________________________
(b) (10 points) Now compute the branch prediction accuracy assuming a 2-bit branch predictor. As before, assume that the two branches map to different entries in the branch predictor table and that both predictors are initially in the “Strongly Predict Taken” state. Your solution must list whether each occurrence of each of the two branches is predicted correctly or not.
Page 10 of 12
Name:_______________________________
(c) (5 points) Assume that there are two machines, one with a single bit-predictor and another with a 2-bit predictor. If the branch prediction accuracy computed in parts (a) and (b) are the average prediction accuracies for workloads of interest, which machine is faster and by what factor? (Assume that branch resolution is at the end of MEM stage in both cases.).
Assume the CPI ignoring branches is 1 and every 6th instruction is a branch.
Page 11 of 12
Powers of 2
Name:_______________________________
0
1
1
2
2
4
3
8
4
16
5
32
6
64
7
128
8
256
9
512
10
1,024
11
2,048
12
4,096
13
8,192
14
16,384
15
32,768
16
65,536
17
131,072
18
262,144
19
524,288
20
1,048,576
21
2,097,152
22
4,194,304
23
8,388,608
24
16,777,216
25
33,554,432
26
67,108,864
27
134,217,728
28
268,435,456
29
536,870,912
30
1,073,741,824
31
2,147,483,648
32
4,294,967,296
Glossary:
CPI: Cycles per Instruction
CC: Clock cycletime
IC: Instruction Count
MIPS: (a) Millions of instructions per second (b) The MIPS ISA (depending on context) FSM: Finite State Machine
ALU: Arithmetic Logic Unit
RISC: Reduced Instruction Set Computer
CISC: Complex Instruction Set Computer
OS: Operating System
RTL: Register Transfer Language
Page 12 of 12