COMP326 Assignment 4 Fall 2020
Issued: November 16, 2020 Due: December 7, 2020
Submit electronically to Moodle. No extension will be granted.
1. Vector Performance (40 marks)
Consider the following code fragment:
vld v1,r5 // load vector x
vld v2,r6 // load vector y
vmul v3,v1,f0 // z1 = a * x
vadd v4,v2,v3 // z2 = y + z1
vmul v5,v2,f2 // z3 = b * y
vst v4,r6 // store z2 as vector y
vst v5,r5 // store z3 as vector x
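For reference, the element-wise effect of this fragment can be written as a scalar C loop. The sketch assumes a mapping the handout does not state: r5 and r6 hold the base addresses of double arrays x and y, f0 and f2 hold the scalars a and b, and n is the number of elements processed. The function name is hypothetical. Because v2 is loaded before either store, z3 is computed from the old y, not from the newly stored z2.

```c
#include <stddef.h>

/* Hypothetical scalar equivalent of the vector fragment.
   Assumed mapping: r5 -> x, r6 -> y, f0 -> a, f2 -> b, vector length -> n. */
void fragment_scalar(double *x, double *y, double a, double b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double xi = x[i];      /* vld  v1,r5     */
        double yi = y[i];      /* vld  v2,r6     */
        double z1 = a * xi;    /* vmul v3,v1,f0  */
        double z2 = yi + z1;   /* vadd v4,v2,v3  */
        double z3 = b * yi;    /* vmul v5,v2,f2 (uses the old y) */
        y[i] = z2;             /* vst  v4,r6     */
        x[i] = z3;             /* vst  v5,r5     */
    }
}
```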
Make the following assumptions:
– this is a four-lane vector processor; this affects the running time of
vector instructions
– there are two copies of each arithmetic vector functional unit and two
copies of the vector load-store functional unit; this affects the number
of vector instructions that can run in parallel in a gang
– a vector store completes as soon as the last element is sent to memory
– the start-up penalties (in clock cycles) of the functional units are: load/store = 12, add = 6, mul = 7
– the vector register length is 256
a) In the absence of vector chaining, determine the execution time of this
code fragment by direct summation of the start-up penalties and the value of
‘n’. Sum the gang times to get the total time.
b) In the presence of vector chaining, determine the execution time of this
code fragment by direct summation of the start-up penalties and the value of
‘n’. Sum the gang times to get the total time.
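The arithmetic in parts a) and b) can be organized with two small helpers. This is a sketch of one common textbook timing model, assumed rather than taken from the handout: with `lanes` lanes, streaming n elements costs ceil(n / lanes) cycles after start-up; without chaining, a gang holds only independent instructions, so its start-up is the largest penalty among them; with chaining, the penalties along the dependent chain add up. The function names are hypothetical, and choosing the gangs is left to the reader.

```c
#include <stddef.h>

/* Assumed model: n elements split evenly over the lanes, ceil(n / lanes). */
static long stream_cycles(long n, long lanes)
{
    return (n + lanes - 1) / lanes;
}

/* No chaining (assumed): independent instructions in a gang run side by
   side on separate units, so the largest start-up penalty dominates. */
long gang_time_unchained(const long *startups, size_t count, long n, long lanes)
{
    long worst = 0;
    for (size_t i = 0; i < count; i++)
        if (startups[i] > worst)
            worst = startups[i];
    return worst + stream_cycles(n, lanes);
}

/* Chaining (assumed): a dependent instruction starts when its producer's
   first result arrives, so start-ups along the critical chain add up. */
long gang_time_chained(const long *chain_startups, size_t count, long n, long lanes)
{
    long startup = 0;
    for (size_t i = 0; i < count; i++)
        startup += chain_startups[i];
    return startup + stream_cycles(n, lanes);
}
```

Summing the per-gang results over all gangs gives the total time. If n equals the vector register length of 256, each gang contributes 64 element cycles on top of its start-up on a four-lane machine.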
2. Vectorizing Compiler (30 marks)
Consider this familiar loop:
loop: l.d f4,0(r1) l1
l.d f6,0(r2) l2
mul.d f4,f4,f0 m1
mul.d f6,f6,f2 m2
add.d f4,f4,f6 a1
s.d f4,0(r1) s1
subi r1,r1,8 sub1
subi r2,r2,8 sub2
bnez r1,loop br
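Read element by element, each iteration computes x[i] = a*x[i] + b*y[i]. The C sketch below assumes r1 and r2 point into double arrays x and y, f0 and f2 hold a and b, and a hypothetical element count n abstracts the start and end addresses implied by `subi` and `bnez r1`. Since the assembly walks downward through memory, the index runs from n-1 down to 0.

```c
#include <stddef.h>

/* Hypothetical scalar reading of the loop.
   Assumed mapping: r1 -> &x[i], r2 -> &y[i], f0 -> a, f2 -> b. */
void loop_scalar(double *x, const double *y, double a, double b, size_t n)
{
    for (size_t i = n; i-- > 0; ) {   /* subi r1/r2 step down one double */
        double t4 = x[i] * a;         /* l1, m1 */
        double t6 = y[i] * b;         /* l2, m2 */
        x[i] = t4 + t6;               /* a1, s1 */
    }
}
```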
Rewrite the code using vector instructions. Draw the flow-dependence graph
of the vector instructions. Using vector chaining, make a rough estimate of
the running time of the program, using the data from question 1.
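A vectorized version must also cope with a trip count that need not be a multiple of the vector register length (256 in question 1). One standard technique, strip-mining, is sketched below in C: each outer iteration handles at most MVL elements, and the inner loop stands in for one pass of the vector sequence run with the vector length set to that strip's size. MVL, the function name, and the count n are assumptions for illustration; the sketch does not show the vector code or the dependence graph itself.

```c
#include <stddef.h>

#define MVL 256  /* maximum vector length, taken from question 1 */

/* Hypothetical strip-mined form of x[i] = a*x[i] + b*y[i].
   Returns the number of strips, i.e. how many times the vector
   sequence would be issued. */
size_t loop_strip_mined(double *x, const double *y, double a, double b, size_t n)
{
    size_t strips = 0;
    for (size_t lo = 0; lo < n; lo += MVL) {
        /* vector length for this strip: MVL, or the leftover at the end */
        size_t vl = (n - lo < MVL) ? n - lo : MVL;
        /* models one pass of loads, two multiplies, add, store with VL = vl */
        for (size_t i = lo; i < lo + vl; i++)
            x[i] = a * x[i] + b * y[i];
        strips++;
    }
    return strips;
}
```

With n = 600 the strips have lengths 256, 256 and 88, so the vector sequence would be issued three times; a rough estimate can then multiply a per-strip time by the strip count, keeping in mind that the last strip is shorter.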
3. Vectorizing Loop Nests (30 marks)
Consider the following loop nest that computes the lengths of many
strings.
for( i=0; i