COMP322101 Exam
Question 1
Inspect the segment of code in Fig. 1. This shows a function addValue that takes a floating-point argument x.
This is then added to element index of a global array data. The calculation itself is performed
by a separate routine that does not alter the value of x.
1  void addValue( float x, int index )
2  {
3      float value = data[index] + performCalculation( x );
4      data[index] = value;
5  }
Figure 1: Code for Question 1
(a) Consider first Amdahl's law, which states that the maximum speedup S for a
program with a fraction f left in serial is
$$ S \le \frac{1}{f + \frac{1-f}{p}} \qquad (*) $$
where p is the number of processing units, e.g. threads.
(i) Define the speedup in terms of the serial execution time $t_s$ and the parallel execution time $t_p$.
(ii) Derive Amdahl's law, equation (*).
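A worked instance may help fix the notation. This is only a sketch: it assumes equation (*) takes the standard form of Amdahl's law, and the values f = 0.1 and p = 8 are purely illustrative and do not appear in the question.

$$ S \le \frac{1}{f + \frac{1-f}{p}} = \frac{1}{0.1 + \frac{0.9}{8}} = \frac{1}{0.2125} \approx 4.71 $$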
(b) Within a multi-threaded context, the function addValue() in Fig. 1 may be called
simultaneously by two or more threads.
(i) Define a data race, and the conditions under which one may occur.
(ii) Data races are known to sometimes lead to non-deterministic behaviour. Describe
what this means and how it can occur, making specific reference to the code in Fig. 1.
(iii) At which line number in Fig. 1 does the data race potentially arise?
(c) It is suggested that in order to make the function thread-safe, lines 3 and 4 should both
be contained within a single critical region, such as that implemented in OpenMP as
#pragma omp critical { ... }.
(i) It is found that making this change does indeed result in a thread-safe addValue().
Explain this observation. However, the performance is significantly reduced, even for a
large number of threads p. With reference to Amdahl's law, explain why.
(ii) Suggest one way in which moving the start and/or end of the critical region can
improve performance while maintaining thread-safety. Give your reasoning.
(d) It is further suggested that using multiple locks to control access to the array
should result in a further performance benefit. Outline how this might be implemented,
and why the performance might improve.
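The single critical region proposed in part (c) might look something like the sketch below. It rests on several assumptions: the body of addValue is inferred from the description in the question, the array size N is hypothetical, and performCalculation is a made-up stand-in, since the real routine is not given.

```c
#define N 8                 /* hypothetical array size, not from the question */

float data[N];              /* global array, as described in the question */

/* Hypothetical stand-in for the separate calculation routine; the real
   routine is not given. It does not modify its argument. */
static float performCalculation(float x)
{
    return 2.0f * x;
}

void addValue(float x, int index)
{
    /* Only one thread at a time may execute this block, so the read of
       data[index] and the write back to it cannot interleave between
       threads. */
    #pragma omp critical
    {
        float value = data[index] + performCalculation(x);
        data[index] = value;
    }
}
```

When compiled with OpenMP enabled (for example with the -fopenmp flag in GCC), the pragma takes effect; without it, the pragma is ignored and the function behaves as ordinary serial code.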
Question 2
A reduction operation can be defined as when a collection of elements is reduced to a smaller
collection by the repeated application of a combiner function. This combiner function is
typically a binary operator $\otimes$ that acts on two elements, returning a single element, i.e. $c = a \otimes b$.
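The definition above can be illustrated with a minimal serial sketch in C. The names combine and reduce are hypothetical, and addition is chosen as the combiner only for illustration; the question does not fix a particular operator.

```c
#include <stddef.h>

/* Hypothetical combiner: any binary operator could be used here; addition
   is chosen purely for illustration. */
static int combine(int a, int b)
{
    return a + b;
}

/* Serial reduction: repeatedly apply the combiner, left to right, until the
   n elements have been reduced to a single value. Assumes n >= 1. */
int reduce(const int *values, size_t n)
{
    int result = values[0];
    for (size_t i = 1; i < n; i++)
        result = combine(result, values[i]);
    return result;
}
```

For instance, calling reduce on the four elements 1, 2, 3, 4 applies the combiner three times and returns their sum.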
(a) For parallel reduction it is important that the operator $\otimes$ is associative, i.e. $(a \otimes b) \otimes c = a \otimes (b \otimes c)$.
Why is this?
(b) Which of the following operations are commutative, i.e. obey $a \otimes b = b \otimes a$? Which
are exactly associative, and which only approximately so? If approximately associative,
explain why and the possible consequence in relation to the equivalent serial reduction.
(i) Integer multiplication (ignoring overflow).
(ii) Taking the average of two float variables, i.e. 0.5f*(a+b).
(c) Fig. 2 shows a binary tree as might be employed for reduction. Suppose you were
asked to implement this pattern in parallel on a multi-core CPU. How would you ensure
the correct computation every time it is run? How would your implementation differ
for a distributed memory architecture, and why? You
(d) Reduction on a GPU introduces new difficulties but also potential benefits. Suppose
you need to reduce a data set that is very large (but not so large as to exceed memory).
(i) What challenge do large data sets pose when trying to perform an operation such
as reduction on a GPU, and what can be done to resolve it? You do not need to
provide any implementation details specific to reduction.
(ii) Near the end of the reduction, when the number of remaining calculations to be
performed is small, what feature of a typical GPU can be exploited to improve
performance?
(e) Look again at Fig. 2, and note this can be interpreted as a task graph; that is, a directed
acyclic graph describing the dependencies in the calculations.
(i) In which levels of the tree are the reductions between processing units actually
being performed?
(ii) Regard each node of the tree in which calculations are being performed as a task,
and assume each task takes equal time. What is the work and span of this graph?
(iii) Consider an arbitrary binary tree that has $n = 2^m$ nodes in the uppermost row, so
the version in the figure corresponds to a specific value of m. Again assuming that each task
takes equal time, what is the work and span now?
(iv) For both of the cases (ii) and (iii), what is the maximum speedup according to the
work-span model?
[Question 2 total: 25 marks]