Module Code: COMP322101 Module Title: Parallel Computation School of Computing
© UNIVERSITY OF LEEDS Semester 2 2018/2019
Calculator instructions:
– You are allowed to use a non-programmable calculator only from the following list of approved models in this examination: Casio FX-82, Casio FX-83, Casio FX-85.
Dictionary instructions:
– You are not allowed to use your own dictionary in this examination. A basic English dictionary is available to use: raise your hand and ask an invigilator, if you need it.
Examination Information
– There are 5 pages to this exam.
– You have 2 hours to complete the exam.
– Answer both questions.
– The number in brackets [ ] indicates the marks available for each question or part question.
– You are reminded of the need for clear presentation in your answers.
– The total number of marks for this examination paper is 50.

Question 1
(a) Let ts be the serial execution time for an algorithm, and suppose that the equivalent parallel algorithm takes a time tp when executed on p processing units.
(i) Define the speedup S in terms of ts and tp.
(ii) Suppose a fraction f of the parallel execution time is actually executed in serial. Derive
the Gustafson-Barsis law that the maximum speedup is S = f + p(1 − f).
(iii) What is the difference between weak and strong scaling? Which is the Gustafson-Barsis law related to?
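As an illustrative sketch of the derivation asked for in (a)(ii), using the notation above: if a fraction f of the parallel time tp is spent in serial and the remaining (1 − f)tp runs on all p processing units, then the equivalent serial time is ts = f tp + (1 − f) p tp, so S = ts / tp = f + p(1 − f).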
(b) Fig. 1 shows part of an MPI program that sends data to the process with the next higher rank, with wrap-around so the process with rank p − 1 sends to rank 0. It is found that this cyclic communication pattern causes deadlock for large messages.
// Initialise MPI.
int numProcs, rank;
MPI_Comm_size( MPI_COMM_WORLD, &numProcs );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );

// Initialise arrays.
… // (not shown)

// Send data to process with one higher rank, with wrap-around.
MPI_Send( sendData, n, MPI_FLOAT, (rank==numProcs-1 ? 0 : rank+1),
          0, MPI_COMM_WORLD );
MPI_Recv( recvData, n, MPI_FLOAT, (rank==0 ? numProcs-1 : rank-1),
          0, MPI_COMM_WORLD, MPI_STATUS_IGNORE );

// Deallocate arrays and call MPI_Finalize().
… // (not shown)
Figure 1: Code for question 1(b).
(i) What is deadlock, and why does it occur in this situation?
(ii) One way to resolve this is to change the program logic so that some processes call MPI_Send() before MPI_Recv(), while others call MPI_Recv() before MPI_Send(). Give one other solution. You do not need to provide any MPI function names.
[1 mark]

(iii) Someone changes the program in Fig. 1 so that rank 0 calls MPI_Recv() before MPI_Send(). All other ranks 1 to p − 1 inclusive call MPI_Send() before MPI_Recv() as before. This is found to resolve the deadlock. Explain why. You may like to use a diagram to support your answer.
(iv) However, timing runs show that the total communication time is very long. Explain why. How would you expect this total communication time to vary with p?
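As an illustrative sketch of one deadlock-free alternative of the kind part (b)(ii) asks for, using non-blocking communication and assuming the same buffers and message size n as in Fig. 1:

// Post both transfers without blocking, so no process waits on another to start.
MPI_Request reqs[2];
MPI_Irecv( recvData, n, MPI_FLOAT, (rank==0 ? numProcs-1 : rank-1),
           0, MPI_COMM_WORLD, &reqs[0] );
MPI_Isend( sendData, n, MPI_FLOAT, (rank==numProcs-1 ? 0 : rank+1),
           0, MPI_COMM_WORLD, &reqs[1] );
// Wait for both operations to complete before reusing sendData or recvData.
MPI_Waitall( 2, reqs, MPI_STATUSES_IGNORE );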
(c) Someone is writing an application for a shared memory architecture and identifies a critical region with a potential data race. Rather than use a high level construct (such as OpenMP’s #pragma omp critical), they decide to implement their own solution using a lock.
(i) Write some pseudo-code to explain how a lock can be employed to ensure only one thread enters the critical region at a time. You may assume that the parallel framework you are using provides a lock object lock_t with methods lock() and unlock() for locking and unlocking respectively.
(ii) At an even lower level, the lock and unlock routines employ atomic operations to perform their functions. What is an atomic operation, and why are they used by locks?
(iii) The function aCAS(int *x, int cmp, int y) atomically sets *x=y if *x==cmp. It returns the original value of *x regardless of the result of the comparison. Show how this function can be used to implement the acquiring of a lock at the start of the critical region. Second and subsequent threads should just skip the region. You do not need to show how the lock is released.
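As an illustrative sketch for parts (c)(i) and (c)(iii), assuming the lock_t object and the aCAS() function described above:

lock_t l;           // lock object with lock() and unlock() methods, as in (c)(i)
int lockVar = 0;    // 0 = free, 1 = taken; used by the aCAS() version in (c)(iii)

// (i) Pseudo-code: at most one thread holds the lock, so at most one is in the region.
l.lock();
… // critical region
l.unlock();

// (iii) aCAS() returns the old value of lockVar; only the first thread sees 0 and enters.
// Second and subsequent threads see 1 and skip the region.
if( aCAS(&lockVar, 0, 1) == 0 )
{
    … // critical region
}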
[Question 1 Total: 25 marks]

Question 2
Consider the serial code in Fig. 2. This calculates all N values of the floating point array out[N] from known values of the array in[N] of the same size. The calculations are performed in a loop, using the function float performComplexCalculation(float). The details of this function are not known to us.
float in[N], out[N];
… // Initialise all N elements of in[N].
for( i=0; i<N; i++ )
    out[i] = performComplexCalculation( in[i] );

Figure 2: Code for question 2.
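If, as the description suggests, each out[i] depends only on in[i], the loop iterations are independent and can be shared among threads. An illustrative OpenMP sketch, assuming performComplexCalculation() has no side effects:

// Dynamic scheduling hedges against uneven work per call, since the cost of
// performComplexCalculation() is not known to us.
#pragma omp parallel for schedule(dynamic)
for( int i=0; i<N; i++ )
    out[i] = performComplexCalculation( in[i] );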