CONFIDENTIAL EXAM PAPER
This paper is not to be removed from the exam venue
Information Technologies EXAMINATION
Semester 1 – Main, 2018
COMP5426 Parallel and Distributed Computing
EXAM WRITING TIME:
READING TIME: 10 minutes

EXAM CONDITIONS:
This is a CLOSED book examination – no material permitted.
During reading time – writing is not permitted at all.

For Examiner Use Only
  Q:     1  2  3  4  5  6  7  8  9
  Mark:
  Total: ________

Room Number:    ________
Seat Number:    ________
Student Number: |__|__|__|__|__|__|__|__|__|

ANONYMOUSLY MARKED
(Please do not write your name on this exam paper)
MATERIALS PERMITTED IN THE EXAM VENUE:
(No electronic aids are permitted e.g. laptops, phones)
Calculator – non-programmable
MATERIALS TO BE SUPPLIED TO STUDENTS:
1 x 12-page answer book
INSTRUCTIONS TO STUDENTS:
• This paper comprises FIVE questions, each with multiple short questions. ALL questions must be answered.
• Write your response to questions in the answer booklet(s) provided.
• Note that questions are of unequal value. The total mark of this exam paper is 100.
• This exam paper must be returned with the answer booklets
• Take care to write legibly. Write your final answers in ink, not pencil.
Please tick the box to confirm that your examination paper is complete.
Question 1 (12 Marks)
(This section contains 4 short-answer questions.)
a. Distinguish between implicit and explicit parallelism. [2]
b. Classify parallel computers according to Flynn’s taxonomy in terms of instruction and data streams, giving examples. [4]
c. Describe at least FOUR characteristics of shared-memory multiprocessors that distinguish them from distributed-memory multicomputers on the basis of architecture, data sharing and communication/synchronization. [4]
d. Distinguish between SIMD and SPMD. [2]
Question 2 (16 Marks)
(This section contains 5 short-answer questions)
a. What are the major performance metrics for characterizing an interconnection network, and what are the main purposes of using these metrics? [4]
b. What, in general, is the execution time of the standard all-to-one reduction operation on a two-dimensional torus with a total of p compute nodes? [4]
c. Explain the hypercube network and describe its properties. [4]
d. What is the Arc connectivity of a two-dimensional torus? [2]
e. What is the cost (i.e., the number of links) of a hypercube network? [2]
Question 3 (27 Marks)
(This section contains 5 short-answer questions.)
a. Provide concrete definitions and examples of the following terms: [3]
i. Parallel speedup
ii. Parallel efficiency
iii. Total overhead
b. Provide concrete definitions and examples of Amdahl’s Law and Gustafson-Barsis’s law. What do these two laws tell us? [4]
c. Suppose that 90% of the code of a program is parallel and can be executed simultaneously by 16 homogeneous processors; the rest of the code is sequential. The execution speed of each processor is 4 GFLOPS (Giga Floating Point Operations Per Second). Calculate the effective execution speed, in GFLOPS, of this program execution. [6]
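For reference, a hedged sketch of the standard Amdahl's-law relationship that this calculation rests on (it assumes the 90% is the parallelizable fraction of sequential execution time; this is background, not part of the question):

\[
  S(p) \;=\; \frac{1}{(1-f) + f/p},
  \qquad
  \text{effective speed} \;=\; S(p)\times 4\ \text{GFLOPS},
  \qquad f = 0.9,\; p = 16.
\]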
d. Consider a matrix of size 9×11, which is partitioned into blocks, each being of size 2×2, and distributed onto a 2×2 processor mesh. Label each matrix element with the processor numbers 1, 2, 3 and 4 that the element is mapped to by the two-dimensional block cyclic layout. [4]
e. Consider performing an even-odd transposition sort of a sequence of n numbers on a linear array of p processors. What are the speedup, efficiency and isoefficiency function? Show your work. (You may assume that a local sequential sort of m numbers takes O(m log m) time and a local sequential merge of two sorted sequences of m numbers takes O(m) time.) [10]
Question 4 (19 Marks)
(This section contains 3 questions)
a. List SIX very basic MPI functions that are used in almost every parallel program written with MPI. [3]
b. Explain clearly what each of the following MPI commands does. You need to specify the input and output parameters and what they store before and after the call. [8]
int MPI_Comm_size(MPI_Comm comm, int *size)
int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)
int MPI_Scatterv(const void *sendbuf, const int sendcounts[], const int displs[],
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm)
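For reference, a minimal usage sketch of the three calls above (illustrative only: the communicator name sub_comm, the counts and the assumption of exactly four processes are invented for this example and are not part of the question):

/* Illustrative sketch only: one plausible way the three calls above might
   be used together. Variable names and counts are assumptions. */
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* size now holds the number of processes */

    /* Split MPI_COMM_WORLD into two sub-communicators by rank parity;
       key = rank keeps the original rank ordering inside each new communicator. */
    MPI_Comm sub_comm;
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub_comm);

    /* Scatter variable-sized pieces of data from root 0 to every process
       (assumes exactly 4 processes for these example counts). */
    int sendcounts[4] = {1, 2, 3, 4};
    int displs[4]     = {0, 1, 3, 6};
    int sendbuf[10], recvbuf[4];
    for (int i = 0; i < 10; i++) sendbuf[i] = i;
    MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
                 recvbuf, sendcounts[rank], MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}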
c. Suppose that MPI_COMM_WORLD consists of three processes 0, 1, and 2, and suppose the following code is executed: [8]
int x, y, z;
MPI_Status status;
switch (my_rank) {
case 0: x = 0; y = 1; z = 2;
    MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Send(&y, 1, MPI_INT, 2, 2, MPI_COMM_WORLD);
    MPI_Bcast(&y, 1, MPI_INT, 1, MPI_COMM_WORLD);
    break;
case 1: x = 3; y = 8; z = 5;
    MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Send(&z, 1, MPI_INT, 2, 3, MPI_COMM_WORLD);
    MPI_Bcast(&y, 1, MPI_INT, 1, MPI_COMM_WORLD);
    break;
case 2: x = 6; y = 7; z = 8;
    MPI_Bcast(&z, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Recv(&x, 1, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    MPI_Recv(&y, 1, MPI_INT, 1, 3, MPI_COMM_WORLD, &status);
    MPI_Bcast(&z, 1, MPI_INT, 1, MPI_COMM_WORLD);
    break;
}
What are the values of x, y, and z on each process after the code has been executed?
Question 5 (26 Marks)
(This section contains 6 questions.)
a. What is a race condition? Explain race conditions with an example. [2]
b. Explain mutexes and condition variables and their associated operations. [3]
c. Using a mutex and a condition variable, write a simple barrier synchronization structure, which is used to coordinate the execution of a set of threads. When a thread reaches a synchronization point by calling the barrier, its execution is blocked until all other threads in the set reach the synchronization point. [8]
A barrier type is defined as:
typedef struct {
    pthread_mutex_t count_lock;    /* mutex */
    pthread_cond_t ok_to_proceed;  /* condition variable */
    int num_thrds;                 /* the total number of threads in the set */
    int count;                     /* the number of threads arrived */
} mylib_barrier_t;
You must write the following TWO functions:
mylib_barrier_init(mylib_barrier_t *b, int n_thrds)
mylib_barrier(mylib_barrier_t *b)
/* all threads are awakened when the total number of blocked threads is equal to n_thrds. */
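For illustration only, a minimal sketch of one common way these two functions can be written with the struct above (this is one possible pattern, not necessarily the model answer; error checking is omitted):

/* Sketch only; assumes <pthread.h> and the mylib_barrier_t struct defined above. */
void mylib_barrier_init(mylib_barrier_t *b, int n_thrds) {
    b->num_thrds = n_thrds;
    b->count = 0;
    pthread_mutex_init(&b->count_lock, NULL);
    pthread_cond_init(&b->ok_to_proceed, NULL);
}

void mylib_barrier(mylib_barrier_t *b) {
    pthread_mutex_lock(&b->count_lock);
    b->count++;
    if (b->count == b->num_thrds) {
        /* Last thread to arrive: reset the count and wake all waiting threads. */
        b->count = 0;
        pthread_cond_broadcast(&b->ok_to_proceed);
    } else {
        /* Not all threads have arrived yet: block here (the mutex is released
           while waiting and re-acquired before pthread_cond_wait returns). */
        pthread_cond_wait(&b->ok_to_proceed, &b->count_lock);
    }
    pthread_mutex_unlock(&b->count_lock);
}

A more robust version would re-check a predicate in a while loop around pthread_cond_wait to guard against spurious wakeups, e.g. by keeping a barrier generation counter.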
d. In the context of a GPU programmed using CUDA, what are (i) threads; (ii) blocks; and (iii) grids? [2]
e. Outline how you would allocate and free memory for the arrays on the GPU, and how you would transfer data from and to the host CPU. [3]
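For reference, a minimal sketch of the CUDA runtime calls this part refers to (the names h_input, d_input, d_total and the size N are assumptions for illustration, not part of the question):

/* Illustrative sketch only: typical CUDA runtime calls for allocating,
   transferring and freeing device memory. Names and sizes are assumptions. */
#include <cuda_runtime.h>
#include <stdlib.h>

int main(void)
{
    int N = 1 << 20;
    int *h_input = (int *)malloc(N * sizeof(int));   /* host input array */
    int h_total = 0;
    for (int i = 0; i < N; i++) h_input[i] = 1;      /* fill with sample data */

    int *d_input, *d_total;                          /* device pointers */
    cudaMalloc((void **)&d_input, N * sizeof(int));  /* allocate on the GPU */
    cudaMalloc((void **)&d_total, sizeof(int));

    /* host -> device transfers */
    cudaMemcpy(d_input, h_input, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_total, &h_total, sizeof(int), cudaMemcpyHostToDevice);

    /* ... launch kernels that read d_input and accumulate into d_total ... */

    /* device -> host transfer of the result */
    cudaMemcpy(&h_total, d_total, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_input);                               /* free device memory */
    cudaFree(d_total);
    free(h_input);
    return 0;
}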
f. Consider a CUDA kernel routine plus_reduce() below. This routine performs a parallel add reduction operation, i.e., parallel summation of an array of integers with N elements. [8]
__global__
void plus_reduce(int *input, int N, int *total)
{
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __shared__ int x[BLOCK_SIZE];        /* BLOCK_SIZE: threads per block, a compile-time constant */
    x[tid] = (i < N) ? input[i] : 0;     /* pad the last block with zeros */
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s = s / 2) {   /* tree summation within the block */
        if (tid < s) x[tid] += x[tid + s];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(total, x[0]);   /* thread 0 adds this block's partial sum to *total */
}
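For context, a hedged sketch of how a kernel like plus_reduce is typically launched from the host; the block size of 256 and the variables d_input, d_total and N are assumptions carried over from the sketch in part (e), not part of the question:

/* Fragment only: assumes d_input, d_total and N have been set up as in the
   part (e) sketch, and that BLOCK_SIZE equals the threads-per-block used here. */
int threads_per_block = 256;
int num_blocks = (N + threads_per_block - 1) / threads_per_block;   /* enough blocks to cover N elements */
plus_reduce<<<num_blocks, threads_per_block>>>(d_input, N, d_total);
cudaDeviceSynchronize();   /* wait for the kernel and its atomicAdd()s to complete */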