CODE: 4-Octyl itaconate Date: November 15, 2021
Name: _____________ ID: __________
Copyright By PowCoder代写 加微信 powcoder
1. [15 marks] Amdahl’s Law
On a uniprocessor, portion A of program P consumes 12 s of the run time, while
portion B consumes 88 s of the run time. Unusually, on a parallel computer,
portion A _slows down_ by a multiplicative factor x > 1. More commonly, on a
parallel computer, portion B speeds up by the number of processors. The
theoretical maximum speedup ‘R’ on program P is 6.1937 1462.
a) What is the value of ‘x’? (answer to five decimal places)
b) What real number of notional processors is required to achieve a speedup
of precisely ‘R/2′? (answer to five decimal places)
c) What real number of notional processors is required to achieve a speedup
of precisely ’99R/100’? (answer to five decimal places)
2. [20 marks] RISC 1.0 Scalar In-Order Pipeline
+————————————————————————-+ P
|
| | o
|
| +-+ +-+ +-+ +-+ | e
| | | | | | | | | | s
|
| +-+ +-+ +-+ +-+ | o
| f/d d/x x/m m/w | r
+————————————————————————-+
During decoding, when the d-box senses trouble, it shuts the pipeline down,
and then reschedules it for a later cycle to complete unfinished work.
“localize” = transfer data from the register file; “retrieve” = transfer data
from a latch. Consider both integer and floating-point register operands.
loop: 1
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6
fld f0,0(r1) f | d | xi| m1| m2| w | | | | | | | | | |
—————————————————————
addi r3,r3,8 | f | d | xi| n | n | w | | | | | | | | |
—————————————————————
fld f4,0(r3) | | f | d | xi| m1| m2| w | | | | | | | |
—————————————————————
fmul f6,f0,f4 | | | . | . | f | d | x1| x2| x3| n | w | | | |
—————————————————————
fmul f8,f0,f6 | | | | | | . | . | f | d | x1| x2| x3| n | w |
—————————————————————
fsd f6,0(r1) | | | | | | | | | f | d | xi| m1| m2| |
—————————————————————
addi r1,r1,8 | | | | | | | | | | f | d | xi| n | n | w
—————————————————————
bne r1,r4,loop | | | | | | | | | | | . | f | d | |
—————————————————————
a) For the second ‘fld’ above, what localization does the d-box perform in
cycle 4? Which operands, if any, does it retrieve in cycle 4? Which other
boxes, if any, retrieve operands for ‘fld’?
b) For the first ‘fmul’ above, what localization does the d-box perform in
either cycle 5 or cycle 7? Which operands, if any does it retrieve in cycle
7? Which other boxes, if any, retrieve operands for ‘fmul’?
c) For the second ‘fmul’ above, what localization does the d-box perform in
either cycle 8 or cycle 10? Which operands, if any does it retrieve in cycle
10? Which other boxes, if any, retrieve operands for ‘fmul’?
d) For ‘fsd’ above, what localization does the d-box perform in cycle 11?
Which operands, if any, does it retrieve in cycle 11? Which other boxes, if
any, retrieve operands for ‘fsd’? Note: hard question.
e) For ‘bne’ above, what localization does the d-box perform in either cycle
13 or cycle 14? Which operands, if any, does it retrieve in cycle 14? Which
other boxes, if any, retrieve operands for ‘bne’?
3. [25 marks] Dynamic Instruction Scheduling I
| reservation station # ____
| +————————————————-+ +————+
| | +—-+ +—–+ | | |
|—->*) left rs | | flag test: | | | | |
| | ear: / +—-+ +—–+ | | |
| | flag | | | |
| | \ +—————-+ v | | |
| | value | |———————–>| |
| | +—————-+ +——–+ | | scalar |
| | instr. opcode: | | |<--->| functional |
| | +—-+ +——–+ | | unit |
|—->*) right rs | | +–+ +–+ | | |
| | ear: / +—-+ rs_id: | | free: | | | | |
| | flag +–+ +–+ | | |
| | \ +—————-+ | | |
| | value | |———————–>| |
| | +—————-+ | | |
| +————————————————-+ +————+
| reservation station for f-register
| +———————————-+
| | +—-+ | This RRS is indexed with the name
|—->*) ear: rs | | | of the f-register it contains, and
| | +—-+ | is therefore easily accessible.
| | |
| | +—————-+ | Syntax: RRS[f2], for example
| | value | | |
| | +—————-+ |
| +———————————-+
The following program walks through two arrays from right to left, does
element-wise multiplication, adds a constant, and stores the result in the
corresponding element of the first array. Assume multiple functional units
of every type.
fld f0,0(r1) // honorary floating-point instruction
fld f2,0(r2) // honorary floating-point instruction
fmul f4,f0,f2
fadd f4,f4,f6
fsd f4,0(r1)
subi r1,r1,8
subi r2,r2,8
bne r1,r3,loop
In program order, we dispatch the ten “floating-point” instructions from the
first two iterations of the loop into ten reservation stations. We do not
assume that all instructions are dispatched before any are issued. All
completion times are arbitrary.
fld f0,0(r1) l1 [flow graph] alpha [RSs]
fld f2,0(r2) \ \
fmul f4,f0,f2 m1 –> a1 –> s1 gamma –> delta –> epsilon
fadd f4,f4,f6 / /
fsd f4,0(r1) l2 beta
fld f0,-8(r1) l3 [flow graph] zeta [RSs]
fld f2,-8(r2) \ \
fmul f4,f0,f2 m2 –> a2 –> s2 theta –> iota –> kappa
fadd f4,f4,f6 / /
fsd f4,-8(r1) l4 eta
a) Both instruction ‘l1’ and ‘l2’ are pending. Instruction ‘m1’ has not been
dispatched yet. What are the current contents of gamma.left.rs and
gamma.right.rs?
b) Consider a faulty hardware implementation of Tomasulo’s algorithm in which
RRS[f4].rs gets stuck at the first value assigned to it. In the unrolled loop
above, could ‘s1’ receive bad operand input? Could ‘s2’ receive bad operand
c) Suppose we dispatched the second iteration first (in local program order)
and then dispatched the first iteration second (in local program order).
What, if anything, would change in the final contents of the register file?
d) Suppose a reservation station had incorrect information about its own name.
What problems, if any, might occur when the pipeline is running?
e) The program above contains a memory-mediated antidependence on Mem[r1].
In this program, what problems, if any, could result from this antidependence?
4. [25 marks] Dynamic Instruction Scheduling II
a) Both ‘l1’ and ‘l2’ are pending at the time ‘m1’ is dispatched. Show the
resulting field values indicated below.
gamma.left.rs = ____; gamma.right.rs = ____;
b) After ‘m1’ is dispatched, both ‘l1’ and ‘l2’ complete. Show the resulting
field values indicated below. Use the syntax ‘Mem[address]’ to denote any
memory values.
gamma.left.value = ____; gamma.right.value = ____;
Below, if I say there is an X-dependence from ‘i’ to ‘j’, it is understood
that ‘i’ is earlier and ‘j’ is later.
c) There is a flow dependence from producer ‘i’ to consumer ‘j’. ‘i’ has
issued and is pending; ‘j’ has been dispatched. Due to hardware malfunction,
some instruction ‘k’ (later than ‘i’) has been dispatched into i’s reservation
station. What is the necessary and sufficient condition for ‘k’ to be allowed
to issue without violating the flow dependence between ‘i’ and ‘j’? You may
need to augment a condition you already know.
d) There is an antidependence from instruction ‘i’ to instruction ‘j’. Due
to hardware malfunction, the relevant ear in ‘i’ cannot be turned off or
otherwise disabled. The producer ‘p’ for instruction ‘i’ is still pending.
Find two independent temporal conditions either of which would protect this
antidependence.
e) There is an output dependence from instruction ‘i’ (writing value ‘pi’) to
instruction ‘j’ (writing value ‘pi^2’). Both ‘i’ and ‘j’ have issued. What
is the necessary and sufficient condition for both ‘pi’ and ‘pi^2’ to be
physically written to the common destination register?
5. [15 marks] Multicycle Operations and Loop Timing
loop: 1
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6
fld f0,0(r1) | | | | | | | | | | | | | | |
————————————————————–
fld f2,24(r1) | | | | | | | | | | | | | | |
————————————————————–
fmul f4,f0,f2 | | | | | | | | | | | | | | |
————————————————————–
fmul f8,f4,f6 | | | | | | | | | | | | | | |
————————————————————–
subi r1,r1,8 | | | | | | | | | | | | | | |
————————————————————–
fsd f8,8(r1) | | | | | | | | | | | | | | |
————————————————————–
bne r1,r2,loop | | | | | | | | | | | | | | |
————————————————————-
Assume: 1) memrefs have one x-box and 3 m-boxes, 2) FP arithmetic operations
have 3 x-boxes and one m-box, 3) there are multiple pipelined functional
units, 4) we use “predict not taken”, and 5) the loop iterates 43 times. As
scratch work, fill in the space-time diagram above, but do _not_ return it to
a) What is the gap between two iterations? ____
b) What is the time for the whole loop? ____
c) How many “fresh” stalls (i.e., dots) are there, respectively, in each of
the last five instructions of the loop body (‘fmul’ to ‘bne’)?
_______________________________________________
comp326-f21 mailing list
https://mailman.encs.concordia.ca/mailman/listinfo/comp326-f21
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com