1. [20 marks] Amdahl’s Law.
On a uniprocessor, portion A of program P consumes x% of the time, and portion
B consumes the remaining (100 – x)%. On a parallel computer, portion A speeds
Copyright By PowCoder代写 加微信 powcoder
up by a factor of 1.375, while portion B speeds up by the number of processors.
The theoretical maximum speedup is 400. What is the minimum integral number
of processors required to achieve at least 41% of the maximum speedup? What
is the minimum integral number of processors required to achieve at least 83%
of the maximum speedup? In each case, show ‘P’, ‘P_frac’, and the final
arithmetic operation used to compute ‘P_frac’. Hint: Use fractions.
– at least 41% of the maximum speedup:
a) P = ________ processors
b) Computed P_frac as: [ ]
– at least 83% of the maximum speedup:
c) P = ________ processors
d) Computed P_frac as: [ ]
2. [20 marks] Scalar In-Order Pipelines
+————————————————————————-+ P
|
| | o
|
| +-+ +-+ +-+ +-+ | e
|
| +-+ +-+ +-+ +-+ | s
| f/d d/x x/m m/w | o
+————————————————————————-+ r
Below, memrefs have 2 m-boxes, and floating-point multiplies have 3 x-boxes.
The pipeline has 4 latches. Consider the following program:
loop: 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6
l.d f0,0(r5) f d x m m w
l.d f2,0(r1) f d x m m w
l.d f4,0(r2) f d x m m w
mul.d f6,f0,f4 . . f d x x x n w
s.d f6,0(r5) . f d x m m
mul.d f8,f4,f6 f d x x x n w
addi r5,r5,8 f d x n w
bne r5,r6,loop . f d
When a box processing an instruction retrieves two operands, answer in the
order left operand, right operand. The proper order is _required_.
a) From which latch or latches does the x-box in cycle 8 receive its
operand or operands?
latches: ____, ____
b) From which latch or latches does the m-box in cycle 11 receive its
operand or operands? Ignore integer operands.
latches: ____, ____
c) From which latch or latches does the x-box in cycle 11 receive its
operand or operands?
latches: ____, ____
d) From which latch or latches does the d-box in cycle 13 receive its
operand or operands?
latches: ____, ____
3. [20 marks] Dynamic Instruction Scheduling I
reservation station # ____
+————————————————-+ +————+
| +—-+ | | |
>>| left rs | | +—-+ | | |
| ear: / +—-+ flag test: | | | | |
| flag +—-+ | | |
| \ +—————-+ | | | |
| value | | v | | |
| +—————-+ +——-+ |—->| scalar |
| instr. opcode: | | | | functional |
| +—-+ +——-+ |<----| unit |
>>|right rs | | +–+ | | |
|ear: / +—-+ rs_id: | | | | |
| flag +–+ | | |
| \ +—————-+ +–+ | | |
| value | | free: | | | | |
| +—————-+ +–+ | | |
+————————————————-+ +————+
Consider the following program that walks through two equal-length arrays from
right to left, does element-wise multiplication, adds a constant, and stores
the result in the corresponding element of the first array. Assume multiple
functional units of every type. The loop below is the entire program.
l.d f0,0(r1)
l.d f2,0(r2)
mul.d f4,f0,f2
add.d f4,f4,f6
s.d f4,0(r1)
subi r1,r1,8
subi r2,r2,8
bne r1,r3,loop
We will dispatch—in program order—the ten “floating-point” instructions
from the first two iterations of the loop into ten reservation stations.
Do _not_ assume that all instructions are dispatched before any are issued.
l.d f0,0(r1) l1 [fdg] alpha [rsg]
l.d f2,0(r2) \ \
mul.d f4,f0,f2 m1 –> a1 –> s1 gamma –> delta –> epsilon
add.d f4,f4,f6 / /
s.d f4,0(r1) l2 beta
l.d f0,-8(r1) l3 [fdg] zeta [rsg]
l.d f2,-8(r2) \ \
mul.d f4,f0,f2 m2 –> a2 –> s2 theta –> iota –> kappa
add.d f4,f4,f6 / /
s.d f4,-8(r1) l4 eta
Once instructions start issuing there will be concurrency, and all completion
times are completely arbitrary.
a) Consider a faulty implementation of Tommasulo’s algorithm in which none of
the registers have ears. In the unrolled loop above, could ‘m1’ and ‘m2’
receive bad operand input? _____ (both/neither/only ‘m1’/only ‘m2’) Explain.
b) In another faulty implementation, reservation stations set themselves to
‘free’ as soon as they have issued. In general, could this be a problem?
___ (yes/no) Explain.
c) Why do reservation stations have an ‘rs_id’ field?
d) The program above contains a memory-mediated antidependence on Mem[r1].
Assume the load-store buffers don’t notice this. Could this be a problem?
___ (yes/no) Explain.
4. [25 marks] Dynamic Instruction Scheduling II
a) Both ‘l1’ and ‘l2’ are pending at the time ‘m1’ is dispatched. Show the
resulting field values indicated below after ‘m1’ dispatch.
gamma.left.rs = ____; gamma.right.rs = ____;
b) After ‘m1’ is dispatched, both ‘l1’ and ‘l2’ complete. Show the resulting
field values indicated below after ‘l1’ and ‘l2’ completion.
gamma.left.value = ____; gamma.right.value = ____;
c) In program P, there is a flow dependence from earlier instruction ‘alpha’
to later instruction ‘beta’. Why is it necessary that ‘alpha’ be dispatched
before ‘beta’? Explain.
d) In program P, there is a left-operand antidependence from earlier
instruction ‘alpha’ to later instruction ‘beta’. A hardware malfunction
prevents the left ear from being turned off. Could this cause a problem?
___ (yes/no) Explain.
e) In program P, there is an output dependence on ‘f6’ from earlier
instruction ‘alpha’ to later instruction ‘beta’. Both instructions have
issued, but neither has completed. How many updates does ‘f6’ receive
and in what order? Explain.
5. [15 marks] Multicycle Operations and Loop Timing
loop: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6
l.d f0,0(r1) | | | | | | | | | | | | | | |
————————————————————–
l.d f2,24(r1) | | | | | | | | | | | | | | |
————————————————————–
mul.d f4,f0,f2 | | | | | | | | | | | | | | |
————————————————————–
mul.d f8,f4,f6 | | | | | | | | | | | | | | |
————————————————————–
s.d f8,0(r1) | | | | | | | | | | | | | | |
————————————————————–
subi r1,r1,8 | | | | | | | | | | | | | | |
————————————————————–
bne r1,r2,loop | | | | | | | | | | | | | | |
————————————————————-
Assume: 1) memrefs have one x-box and 3 m-boxes, 2) FP arithmetic operations
have 3 x-boxes and one (never-used) m-box, 3) there are multiple pipelined
functional units, 4) we use “predict not taken”, and 5) the loop iterates 43
times. As scratch work, fill in the space-time diagram above, but do _not_
return it to me. You may use a computer for this purpose, or pen.
a) What is the gap between two iterations? ____
b) What is the time for the whole loop? ____
c) How many “fresh” stalls (i.e., dots) are there, respectively, in each of
the last five instructions of the loop body (i.e., from ‘mul.d’ to ‘bne’)?
___________
_______________________________________________
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com