Name: ___________________ ID: ____________
COMP326 Assignment 3 Fall 2021
Issued: October 25, 2021 Due: November 8, 2021
Copyright By PowCoder代写 加微信 powcoder
Submit electronically to Moodle. No extension will be granted.
1. Loop Unrolling [40 marks]
Consider the following loop:
loop: fld f4,0(r1) l1
fld f6,0(r2) l2
fmul f4,f4,f0 m1
fmul f6,f6,f2 m2
fadd f4,f4,f6 a1
fsd f4,0(r1) s1
subi r1,r1,8 sub1
subi r2,r2,8 sub2
bnez r1,loop br
a) [6 marks] Using the names ‘l1’ to ‘s1’ for the first six instructions in
the body of this loop, draw the flow-dependence graph for these instructions.
Label each arrow with the flow-dependence gap between the producer and the
consumer instruction. In 1 a), assume all floating-point arithmetics have
4 x-boxes. Finally, also label arrows with any operands retrieved from the
register file, i.e., when no producer is visible.
For b), focus on three flow-dependence instances: i) FP arith to FP arith,
ii) FP arith to FP store, and iii) FP load to FP arith. Denote the number
of m-boxes in memory references by ‘#m’, and the number of x-boxes in
FP arithmetics by ‘#x’.
b) [6 marks] For each of the three designated flow-dependence instances,
indicate the number of stalls in adjacent producer-consumer pairs as functions
of ‘#m’ and ‘#x’.
c) [10 marks] Suppose #m = 1 and #x = 4. How many stalls occur in one
iteration of the loop (all nine instructions) if it is executed exactly as
written? Ignore the branch stall.
d) [10 marks] Unroll the loop twice. If one reschedules the unrolled loop
optimally, how many stalls are left? (Keep the branch as the last
instruction, but feel free to pull any ‘subi’ instructions up for gap padding.
Show the optimally rescheduled code using the _short_ names). Ignore the
branch stall.
e) [8 marks] Three instructions in the original code are memory references.
After unrolling the loop twice, and scheduling, some memory references may
change. Show all memory references in the unrolled code. Since register
renaming becomes rather tedious in medium-size problems, use the syntax:
‘fld …,24(r3)’ or ‘fsd …,8(r5)’.
2. Dynamic Instruction Scheduling I [30 marks]
Imagine that reservation stations only track whether floating-point operands
are valid, and that integer operands appear by magic whenever needed, with one
exception, noted below. Here, we write, for example, ‘rs1(l1)’, to indicate
that the reservation station ‘rs1’ contains the instruction ‘l1’. Let’s
give different names to the same instruction in different iterations. With
this convention, we might also have ‘rs7(l4)’.
a) [10 marks] Unroll the loop twice. Dispatch instructions ‘l1’ to ‘s2’ to
the pipeline’s 12 reservation stations, viz., ‘rs1’ to ‘rs12’. List the
contents of reservation stations ‘rs7’ to ‘rs12’, including the instructions
they contain. Ignore integer operands. The ‘rs’, ‘value’, ‘test’, busy’, and
‘opcode’ fields are the most important contents, with a distinction between
left and right. When listing the contents of ‘rs7’, write “[rs7]:” followed
by the contents.
For now, we treat the reservation stations for loads in an unrealistic way:
we imagine that ears play no role, and that a load may issue as soon as it
is dispatched. But store has two operands: the value stored, and the
(integer) memory address. We use the left ear for the value stored, and
pretend the other ear isn’t there.
We need a convention how to display textually the “ear” contents of a
reservation station. For each ear, precisely one field is valid. The
contents are either: i) a value from the “old” register file, ii) the result
of a completed producer (yes, this is redundant notation; the result is also
sitting in a register), or iii) the name of a pending producer. Use,
respectively, the syntax: i) ‘r5’, ii) ‘result[rs3]’, or iii) ‘rs3’. We
display the issue status by writing “may issue” or “may not issue”. We
display the busy status by writing “in use” or “free”.
Using these conventions, show the reservation-station contents of ‘rs7’ to
‘rs12’ before any instruction has issued.
b) [10 marks] After both loads of the second iteration complete, but no other
action has taken place, show the contents of reservation stations ‘rs7’
through ‘rs12’.
c) [10 marks] At this point, the d-box begins to dispatch the third iteration.
When it has dispatched as much as possible, show the contents of the
reservation stations ‘rs7’ through ‘rs12’. No other action takes place.
3. Dynamic Instruction Scheduling II [30 marks]
a) [10 marks] There is a flow dependence from earlier instruction ‘alpha’
to later instruction ‘beta’. Assume that impersonation is impossible and
that the register file is protected against corruption. Case I: ‘alpha’
completes _before_ ‘beta’ is dispatched. Case II: ‘alpha’ completes _after_
‘beta’ is dispatched. Show that the flow dependence is upheld in both cases.
b) [10 marks] There is an anti-dependence from earlier instruction ‘alpha’
to later instruction ‘beta’. Assume that impersonation is impossible and
that the register file is protected against corruption. Alpha’s producer
is the even earlier ‘gamma’. Case I: ‘gamma’ completes _before_ ‘beta’ is
dispatched. Case II: ‘gamma’ completes _after_ ‘beta’ is dispatched. Show
that the anti-dependence is upheld in both cases.
c) [10 marks] There is an output dependence from earlier instruction ‘alpha’
to later instruction ‘beta’. Assume that impersonation is impossible and
that the register file is protected against corruption. Case I: ‘alpha’
completes _before_ ‘beta’ is dispatched. Case II: ‘alpha’ completes _after_
‘beta’ is dispatched. Show that the output dependence is upheld in both cases.
_______________________________________________
comp326-f21 mailing list
https://mailman.encs.concordia.ca/mailman/listinfo/comp326-f21
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com