CS4021
THE UNIVERSITY OF WARWICK
MSc Examinations: Summer 2015
High Performance Computing
Time allowed: 3 hours
Answer FOUR questions.
Calculators may be used.
Read carefully the instructions on the answer book and make sure that the particulars
required are entered on each answer book.
1. (a) Discuss the differences between Cluster systems and Grid systems. [6]
(b) Explain three general approaches to achieving high performance from the
perspective of hardware and give an example for each approach. [6]
(c) What do we mean by the Granularity of Parallelism? Give four types of
parallelism in order of granularity and provide an application example for
each. [9]
(d) Analyse the two “for” loops in Listing 1. Describe whether the iterations of
these two loops can be parallelised automatically by compilers and explain
how you reached your conclusions. [4]
2. (a) The data to be sent in the MPI_Send routine is represented by a triple, (address,
count, datatype), where address is the starting address of the data, count is the
number of data items and datatype is the type of the data. Discuss the
advantages of this representation over using the alternative pair (address,
length) to represent the data to be sent. In (address, length), address is the
starting address of the data and length is the number of bytes that the data
occupies. [6]
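For illustration, a minimal sketch of a send expressed with the (address, count, datatype) triple; the buffer name, destination rank and tag below are hypothetical, and the fragment assumes MPI has already been initialised:

    int buf[10];
    /* address = buf, count = 10, datatype = MPI_INT: send ten integers to rank 1, tag 0 */
    MPI_Send(buf, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);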
(b) The Ready Mode is an MPI communication mode. Explain the circumstance
under which Ready Mode communication can be used and discuss the benefits
of using Ready Mode communication. [5]
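For illustration, a ready-mode send is issued with MPI_Rsend, which takes the same arguments as MPI_Send. The fragment below is a hypothetical sketch; it is correct only because the matching receive is assumed to have been posted already:

    double x[4];
    /* valid only if rank 1 has already posted the matching MPI_Recv/MPI_Irecv */
    MPI_Rsend(x, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);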
(c) A collective communication operation is performed by all relevant processes at
the same time with the same set of parameters. However, different processes
may interpret the parameters differently. Describe, using illustrative examples
if necessary, the operations of the following two MPI collective
communication calls. Further, discuss how different processes interpret
different parameters in MPI_Scatter (an illustrative usage sketch follows the two calls below).
i) MPI_Barrier(MPI_Comm Comm)
ii) MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
void *recvbuf, int recvcount, MPI_Datatype recvtype, int root,
MPI_Comm comm) [8]
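For illustration, a hypothetical sketch of MPI_Scatter in which rank 0 distributes one integer to every process (including itself); sendbuf is significant only at the root, and the fragment assumes MPI has been initialised and <stdlib.h> has been included:

    int nprocs, rank, recvval;
    int *sendbuf = NULL;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        sendbuf = malloc(nprocs * sizeof(int)); /* one element per process, root only */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);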
(d) MPI_Type_vector can be used to construct user-defined (derived) datatypes. The
format of the function is as follows:
MPI_Type_vector ( int count,
int blocklen,
int stride,
MPI_Datatype oldtype,
MPI_Datatype *newtype)
Let oldtype = {(MPI_CHAR, 0), (MPI_INT, 8)} with an extent of 24 bytes.
Give the memory layout of newtype after calling
MPI_Type_vector (2, 2, 10, oldtype, newtype). [6]
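For reference, a minimal sketch of how this call would appear in a program (only the variable name is an assumption); note that the stride argument is measured in multiples of the extent of oldtype:

    MPI_Datatype newtype;
    MPI_Type_vector(2, 2, 10, oldtype, &newtype); /* 2 blocks, 2 oldtypes per block, stride of 10 oldtypes */
    MPI_Type_commit(&newtype);                    /* the type must be committed before use */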
3. (a) Describe what dependencies exist in the following sequence of statements.
Explain how to remove the anti-dependency and the output dependency in these
statements.
if (a == 0) {
    a = b;
    b = 2 * c;
}
else {
    b = a * b;
    d = b + e;
    a = a + c;
    d = c * a;
} [12]
(b) Explain the differences between a “private” variable and a “shared” variable
in OpenMP. [5]
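For illustration, a minimal OpenMP fragment using both clauses (the variable names and loop bound are hypothetical; the loop variable i is made private automatically by the parallel for construct):

    int i, t, sum = 0;
    const int N = 100;
    #pragma omp parallel for private(t) shared(sum)
    for (i = 0; i < N; i++) {
        t = i * i;      /* t is private: every thread has its own copy */
        #pragma omp atomic
        sum += t;       /* sum is shared: one copy, so updates must be synchronised */
    }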
(c) Give three ways of setting the number of threads generated in a parallel region
in OpenMP. [3]
(d) Discuss why OpenMP is a higher-level parallel programming model than MPI. [5]
4. (a) Discuss the drawbacks of using asymptotic analysis to evaluate the performance
of an algorithm. [5]
(b) Amdahl’s law states that if the fraction of a program that has to be run in
sequence is f, then the maximum speedup that the program can achieve is 1/f.
Show how this maximum speedup of 1/f is derived. [5]
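For reference, a sketch of the standard argument (writing T for the sequential execution time): on P processors the serial fraction still costs fT, while the parallelisable part (1 - f)T is divided by P, so

    S(P) = T / (fT + (1 - f)T/P) = 1 / (f + (1 - f)/P)

which tends to the upper bound 1/f as P grows without limit.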
(c) Explain what the iso-efficiency function is. Assume a program has the
following expression for calculating its parallel efficiency, E. Derive the
iso-efficiency function of the program.
E = N^2 / (N^2 + 3P^2 + 5NP^2) [5]
(d) Modelling the execution time of an application is a good way of evaluating the
performance of the application. Discuss how to model the execution time of an
application. The discussion should cover the modelling of both computation
time and communication time, and the discussion should revolve around the
various parameters used to model the execution time. [10]
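As an illustrative starting point only, one common textbook decomposition (the symbols below are conventional, not taken from the paper):

    T = T_comp + T_comm
    T_comp = W * t_c              (W basic operations, each costing t_c seconds)
    T_comm = n * (t_s + m * t_w)  (n messages of m bytes; t_s = startup latency, t_w = per-byte cost)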
5. (a) What aspects of network performance do node degree, bisection width and
diameter in a network represent, respectively? Figure 1 shows the
interconnection topologies of a 1D array and a 1D ring, where “S” represents a
switch, “P” represents a processor, and an edge between two switches
represents a communication link. What are the node degree, bisection width
and diameter of these two topologies? [9]
(b) Describe the general principles of combining MPI and OpenMP to write
parallel programs. [6]
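For illustration only, a minimal hybrid skeleton; the requested thread-support level, MPI_THREAD_FUNNELED, is an assumption reflecting one common style in which MPI calls are made only by the master thread, outside parallel regions:

    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        #pragma omp parallel
        {
            /* OpenMP threads perform the computation inside each MPI process */
        }
        /* MPI communication between processes happens here, outside the parallel region */
        MPI_Finalize();
        return 0;
    }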
(c) What does single system image mean in Cluster systems? Discuss the benefits
of providing a single system image in Cluster systems. [10]
6. (a) Data replication and parity information are two possible solutions to
handling disk failures. Discuss the advantages and disadvantages of these two
solutions. [6]
(b) Discuss the advantages of RAID5 over RAID3. [3]
(c) Explain what a file view is in parallel I/O and describe the three parameters and
their meanings that are used to define a file view. [6]
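For illustration, the file view is set with MPI_File_set_view; the file name and the choice of types below are hypothetical. The three defining parameters are the displacement, the etype and the filetype (“native” names the data representation):

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    /* view = (displacement = 0, etype = MPI_INT, filetype = MPI_INT) */
    MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "native", MPI_INFO_NULL);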
(d) There are three potential methods to implement parallel I/O: 1) One process
performs I/O operations for all other processes; 2) Each process reads or writes
the data from or to a separate file; 3) Different processes access different parts of
a common file. Discuss the advantages and disadvantages of each method.
Which method of parallel I/O is most widely used nowadays? [10]
Supplementary Information.
Listings and Figures to accompany the CS4021 question paper
Segment 1:
for (i = 1; i <= N; i++) A(i) = A(i) * B(i);

Segment 2:
for (i = 3; i <= N; i++) A(i) = A(i-1) * B(i);

Listing 1: Code segments for Question 1(d)

[Figure showing two topologies: (a) 1D array    (b) 1D ring]
Figure 1. Network Topologies for Question 5(a)