程序代写 COMP322101 Module Title: Parallel Computation School of Computing

Module Code: COMP322101 Module Title: Parallel Computation School of Computing
Examination Information
– There are 7 pages to this exam.
– Answer all 2 questions.

Copyright By PowCoder代写 加微信 powcoder

©c UNIVERSITY OF LEEDS Semester 2 2020/2021
– The total number of marks for this examination paper is 80.
– The number in brackets [ ] indicates the marks available for each question or part question.
– Your submission should be less than 4000 words in total.
– It is possible to receive full marks using less than 2000 words in total.
– You may lose marks for including extensive discussions that are not relevant to the question.
– You will not receive any marks, and may lose marks, for simply copying material from your notes verbatim.
– You are reminded of the need for clear presentation in your answers.
– You are required to submit a single PDF file via the Minerva submission point.
Page 1 of 7 Turn the page over

Module Code: COMP322101 Question 1
(a) The parallel speedup S is defined as the ratio of the serial execution time ts to the parallel execution time tp, S = ts/tp. Suppose your hardware supports p processing units. What are the maximum and minimum values of S, if such limits exist, in terms of p? Justify your answers by explaining in what type of situation each limit could be approached.
(b) Suppose you need to solve a problem that is comprised of N independent tasks, each of which takes the same time to complete. You want to execute these in parallel using p processing units. However, N is not a multiple of p.
(i) It is suggested that the N tasks can be distributed over the p processing units as follows: p−1 processing units are allocated ⌈N/p⌉ tasks, where ⌈N/p⌉ is N/p rounded up to the nearest integer, and the final processing unit then takes all remaining tasks. What problem or problems can you see with this suggestion?
(ii) It is now proposed to allocate the tasks to processes in a ‘round-robin’ manner, cycling through all processing units, allocating one task to each in turn before moving onto the next unit. Provide some pseudo-code for a loop that implements this allocation of tasks.
(iii) However, the overhead in performing this loop is regarded as too high, and direct expressions are preferred. Give one or two lines of C code that determines the number of tasks numTasks that should be allocated to the processing unit with index unit, where 0<=unit0 is an argument and h is calculated from xmin, xmax and N. For the intended application, f(x) is time-consuming to evaluate for each x. Calculated values are therefore stored in an array store[N] so they can be re-used. For all of question 2, you may assume N is a multiple of p.
You are asked to parallelise area(). Inspect the code, then answer the following questions.
1 // Compute-intensive function of x only.
2 float f( float x ) …
4 // Returns an estimate of the area under f(x) from xmin
5 // to x=xmax. Calculated values of f() are copied to store[N].
6 float area( float xmin, float xmax, int N, float *store )
8 float h = (xmax-xmin)/N; 9
10 // Calculate and store f(x) for each x.
12 for( i=0; iCS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com