CS4021

THE UNIVERSITY OF WARWICK

MSc Examinations: Summer 2015

High Performance Computing

Time allowed: 3 hours

Answer FOUR questions.

Calculators may be used.

Read carefully the instructions on the answer book and make sure that the particulars

required are entered on each answer book.

1. (a) Discuss the differences between Cluster systems and Grid systems. [6]

(b) Explain three general approaches to achieving high performance from the

perspective of hardware and give an example for each approach. [6]

(c) What do we mean by the Granularity of Parallelism? Give four types of

parallelism in order of granularity and provide an application example for

each. [9]

(d) Analyse the two “for” loops in Listing 1. Describe whether the iterations of

these two loops can be parallelised automatically by compilers and explain

how you reached your conclusions. [4]

2. (a) The data to be sent in the MPI_Send routine is represented by a triple, (address, count, datatype), where address is the starting address of the data, count is the number of data items and datatype is the type of the data items. Discuss the advantages of this representation over the alternative pair (address, length), where address is the starting address of the data and length is the number of bytes that the data occupies. [6]
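For illustration, a minimal sketch (not part of the original paper) of how the triple appears in a C MPI program; the datatype element is what allows the implementation to convert data representations between heterogeneous machines:

/* Minimal sketch: sending 100 doubles using the
 * (address, count, datatype) triple (a, 100, MPI_DOUBLE). */
#include <mpi.h>

int main(int argc, char **argv) {
    double a[100];
    int rank, i;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 100; i++) a[i] = i;
    if (rank == 0)
        MPI_Send(a, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(a, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Finalize();
    return 0;
}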

(b) The Ready Mode is an MPI communication mode. Explain the circumstance

under which Ready Mode communication can be used and discuss the benefits

of using Ready Mode communication. [5]
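As a hedged illustration (our own sketch): Ready Mode is only correct when the matching receive has already been posted, which the barrier below guarantees before the sender calls MPI_Rsend:

/* Sketch of Ready Mode: rank 1 posts its receive before rank 0 calls
 * MPI_Rsend, so the send can skip the usual handshake protocol. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, value = 0;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1)
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    MPI_Barrier(MPI_COMM_WORLD);   /* receive is now guaranteed to be posted */
    if (rank == 0) {
        value = 42;
        MPI_Rsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}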

(c) A collective communication operation is performed by all relevant processes at the same time with the same set of parameters. However, different processes may interpret the parameters differently. Describe, using illustrative examples if necessary, the operations of the following two MPI collective communication calls. Further, discuss how different processes interpret the parameters of MPI_Scatter differently.

i) MPI_Barrier(MPI_Comm Comm)

ii) MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,

void *recvbuf, int recvcount, MPI_Datatype recvtype, int root,

MPI_Comm comm) [8]
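By way of illustration, a sketch (not part of the paper) of a call in which rank 0 distributes four integers to every process; the send parameters are significant only at the root, while every rank, the root included, supplies its own receive parameters:

/* Sketch: root scatters 4 ints to each of the P processes. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size, recvbuf[4], *sendbuf = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)                       /* send parameters matter only here */
        sendbuf = malloc(size * 4 * sizeof(int));
    MPI_Scatter(sendbuf, 4, MPI_INT,     /* ignored on non-root ranks */
                recvbuf, 4, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) free(sendbuf);
    MPI_Finalize();
    return 0;
}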

(d) MPI_Type_vector can be used to construct the users’ own data types. The format of the function is as follows:

MPI_Type_vector(int count,
                int blocklen,
                int stride,
                MPI_Datatype oldtype,
                MPI_Datatype *newtype)

Let oldtype = {(MPI_CHAR, 0), (MPI_INT, 8)} with an extent of 24 bytes.

Give the memory layout of newtype after calling

MPI_Type_vector (2, 2, 10, oldtype, newtype). [6]
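A hedged sketch (ours) of how the question’s oldtype and the call could be expressed in C; using MPI_Type_create_resized to force the stated 24-byte extent is one possible construction, not necessarily the one the examiners intended:

/* Sketch: build oldtype = {(MPI_CHAR,0),(MPI_INT,8)} with extent 24 bytes,
 * then apply the vector constructor from the question. */
#include <mpi.h>

int main(int argc, char **argv) {
    int blocklens[2] = {1, 1};
    MPI_Aint disps[2] = {0, 8};
    MPI_Datatype types[2] = {MPI_CHAR, MPI_INT};
    MPI_Datatype s, oldtype, newtype;
    MPI_Init(&argc, &argv);
    MPI_Type_create_struct(2, blocklens, disps, types, &s);
    MPI_Type_create_resized(s, 0, 24, &oldtype);  /* extent = 24 bytes */
    /* 2 blocks of 2 oldtypes each; block starts 10 extents (240 bytes) apart */
    MPI_Type_vector(2, 2, 10, oldtype, &newtype);
    MPI_Type_commit(&newtype);
    MPI_Type_free(&newtype);
    MPI_Type_free(&oldtype);
    MPI_Type_free(&s);
    MPI_Finalize();
    return 0;
}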

3. (a) Describe what dependencies exist in the following sequence of statements.

Explain how to remove the anti-dependency and the output dependency in these

statements.

if (a == 0) {
    a = b;
    b = 2 × c;
}
else {
    b = a × b;
    d = b + e;
    a = a + c;
    d = c × a;
} [12]

(b) Explain the differences between “Private” variables and “Shared” variables in OpenMP. [5]

(c) Give three ways of setting the number of threads generated in a parallel region

in OpenMP. [3]
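For illustration, a sketch (ours) showing three common mechanisms; when several are given, the num_threads clause takes precedence over omp_set_num_threads(), which in turn overrides the OMP_NUM_THREADS environment variable:

/* Sketch: three ways of setting the thread count for a parallel region. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    /* 1) environment variable, set before running: export OMP_NUM_THREADS=4 */

    omp_set_num_threads(8);               /* 2) runtime library routine */

    #pragma omp parallel num_threads(2)   /* 3) num_threads clause wins here */
    printf("thread %d of %d\n", omp_get_thread_num(), omp_get_num_threads());
    return 0;
}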

(d) Discuss why OpenMP is a higher-level parallel programming model than MPI. [5]

4. (a) Discuss the drawbacks of using asymptotic analysis to evaluate the performance

of an algorithm. [5]

(b) Amdahl’s law states that if the fraction of a program that has to be run sequentially is f, then the maximum speedup that the program can achieve is 1/f. Derive this maximum speedup of 1/f. [5]
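A sketch of the standard derivation (the notation T(1) and T(P) is ours): with sequential fraction f, the time on P processors is

    T(P) = f × T(1) + (1 − f) × T(1) / P

so the speedup is

    S(P) = T(1) / T(P) = 1 / (f + (1 − f)/P)

and as P → ∞ the term (1 − f)/P vanishes, giving the maximum speedup S = 1/f.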

(c) Explain what the iso-efficiency function is. Assume a program has the

following expression for calculating its parallel efficiency, E. Derive the

iso-efficiency function of the program.

E = 2N^2 / (2N^2 + 3P + 5NP) [5]

(d) Modelling the execution time of an application is a good way of evaluating the

performance of the application. Discuss how to model the execution time of an

application. The discussion should cover the modelling of both computation

time and communication time, and the discussion should revolve around the

various parameters used to model the execution time. [10]
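As an illustrative outline of such a model (the parameter names t_c, t_s and t_w are ours, following common practice), the execution time on P processes can be written as:

    T_parallel = T_comp + T_comm

    T_comp ≈ (operations per process) × t_c

    T_comm ≈ (number of messages) × t_s + (total words sent) × t_w

where t_c is the time per computational operation, t_s is the start-up (latency) cost of each message and t_w is the transfer time per word; the operation and message counts are themselves functions of the problem size N and the number of processes P.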

5. (a) What aspects of network performance do node degree, bisection width and

diameter in a network represent, respectively? Figure 1 shows the

interconnection topologies of a 1D array and a 1D ring, where “S” represents a switch, “P” represents a processor, and an edge between two switches represents a communication link. What are the node degree, bisection width

and diameter of these two topologies? [9]

(b) Describe the general principles of combining MPI and OpenMP to write

parallel programs. [6]
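As a hedged sketch (ours) of the usual hybrid pattern: MPI decomposes the problem across nodes while OpenMP parallelises the loop-level work within each node, and only the main thread makes MPI calls:

/* Sketch: one MPI process per node, OpenMP threads across its cores. */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided, rank;
    double local = 0.0, global = 0.0;
    /* Request thread support compatible with OpenMP regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel for reduction(+:local)  /* fine-grained, within node */
    for (int i = 0; i < 1000000; i++)
        local += 1.0;                            /* stand-in for real work */

    /* Coarse-grained combination across nodes, by the main thread only. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}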

(c) What does single system image mean in Cluster systems? Discuss the benefits

of providing a single system image in Cluster systems. [10]

6. (a) Data replication and parity information are two possible solutions for handling disk failures. Discuss the advantages and disadvantages of these two solutions. [6]

(b) Discuss the advantages of RAID5 over RAID3. [3]

(c) Explain what a file view is in parallel I/O, and describe the three parameters used to define a file view and their meanings. [6]
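For illustration, a sketch (ours, with a hypothetical file name) in which each process installs a view exposing only its own contiguous block of a shared file; the three defining parameters appear as the displacement, etype and filetype arguments of MPI_File_set_view:

/* Sketch: each rank writes 100 ints to its own region of one shared file. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, i, data[100];
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 100; i++) data[i] = rank;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* displacement: skip rank*100 ints; etype: MPI_INT; filetype: MPI_INT */
    MPI_File_set_view(fh, (MPI_Offset)rank * 100 * sizeof(int),
                      MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_write(fh, data, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}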

(d) There are three potential methods to implement parallel I/O: 1) One process

performs I/O operations for all other processes; 2) Each process reads data from, or writes data to, its own separate file; 3) Different processes access different parts of

a common file. Discuss the advantages and disadvantages of each method.

Which method of parallel I/O is most widely used nowadays? [10]

THE UNIVERSITY OF WARWICK

MSc Examinations: Summer 2015

High Performance Computing

Supplementary Information.

Listings and Figures to accompany the CS4021 question paper

Segment 1:

for (i = 1; i <= N; i++)
    A(i) = A(i) × B(i);

Segment 2:

for (i = 3; i <= N; i++)
    A(i) = A(i-1) × B(i);

Listing 1: Code segments for Question 1(d)

[Figure not reproduced: (a) 1D array; (b) 1D ring]
Figure 1. Network Topologies for Question 5(a)