程序代写代做代考 algorithm compiler THE UNIVERSITY OF WARWICK

THE UNIVERSITY OF WARWICK

CS4022

Continued – 1 –

THE UNIVERSITY OF WARWICK

Fourth Year Examinations: Summer 2015

High Performance Computing

Time allowed: 3 hours

Answer FOUR questions.

Calculators may be used.

Read carefully the instructions on the answer book and make sure that the particulars

required are entered on each answer book.

1. (a) Discuss the differences between vector processors and MPP (Massively

Parallel Processing) machines. [6]

(b) Explain three general approaches to achieving high performance from the

perspective of hardware and give an example for each approach. [6]

(c) What do we mean by the Granularity of Parallelism? Give four types of

parallelism in order of granularity and provide an application example for

each. [9]

(d) Analyse the two “for” loops in Listing 1. Describe whether the iterations of

these two loops can be parallelised automatically by compilers and explain

how you reached your conclusions. [4]

CS4022

Continued – 2 –

2 (a) The data to be sent in the MPI_Send routine is represented by a triple, (address,

count, datatype), where address is the starting address of the data, count is the

number of data items and datatype is the type of the data. Discuss the

advantages of this representation over using the alternative pair (address,

length) to represent the data to be sent. In (address, length), address is the

starting address of the data and length is the number of bytes that the data

occupies. [6]

(b) The Ready Mode is an MPI communication mode. Explain the circumstance

under which Ready Mode communication can be used and discuss the benefits

of using Ready Mode communication. [5]

(c) A collective communication operation is performed by all relevant processes at

the same time with the same set of parameters. However different processes

may interpret the parameters differently. Describe, using illustrative examples

if necessary, the operations of the following two MPI collective

communication calls. Further, discuss how different processes interpret

different parameters in MPI_Scatter.

i) MPI_Barrier(MPI_Comm Comm)

ii) MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,

void *recvbuf, int recvcount, MPI_Datatype recvtype, int root,

MPI_Comm comm) [8]

(d) MPI_Type_struct can be used to construct the users’ own data types. The

format of the function is as follows:

MPI_Type_struct ( int count,

int *array_of_blocklengths,

MPI_Int *array_of_displacements,

MPI_Datatype *array_of_types,

MPI_Datatype *newtype)

Let oldtype1 ={(MPI_CHAR, 0), (MPI_INT, 8)} with the extent of 24 bits (3

bytes) and oldtype2={(MPI_FLOAT, 0), (MPI_INT, 32)} with the extent of 48

bits (6 bytes). Let B=(1, 2), D=(2, 20) and C=(oldtype1, oldtype2). Note that

the parameter *array_of_displacements holds the BYTE displacement of each

block.

Give the memory layout of newtype after calling

MPI_Type_struct (2, B, D, C, newtype). [6]

CS4022

Continued – 3 –

3. (a) Describe what dependencies exist in the following sequence of statements.

Explain how to remove the anti-dependency and the output dependency in these

statements.

if(a==0){

a=b;

b=2× c;

}

else {

b=a×b;

d=b+e;

a=a+c;

d=c×a;

} [12]

(b) Explain the differences between the “Private” variable and the “Reduction”

variable in OpenMP. [5]

(c) Give three ways of setting the number of threads generated in a parallel region

in OpenMP. [3]

(d) Discuss why OpenMP is a higher-level parallelism than MPI. [5]

4. (a) Discuss the drawbacks of using asymptotic analysis to evaluate the performance

of an algorithm. [5]

(b) Amdahl’s law states that if the fraction of the part that has to be run in

sequence in a program is f, then the maximum speed that the program can

achieve is 1/f. Derive the process of obtaining the maximum speed of 1/f. [5]

(c) Explain what the iso-efficiency function is. Assume a program has the

following expression for calculating its parallel efficiency, E, where N is the

problem size and P is the number of processors. Derive the iso-efficiency

function of the program and explain your answer.

2

2 2

2

3 5

N
E

N P NP

 
[5]

(d) Modelling the execution time of an application is a good way of evaluating the

performance of the application. Discuss how to model the execution time of an

application. The discussion should explain the equations used to calculate

computation time and communication time. Also, the discussion should describe

various parameters that should be considered when modelling the execution

time. [10]

CS4022

End – 4 –

5. (a) What aspects of network performance do node degree, bisection width and

diameter in a network represent, respectively? Figure 1 shows the

interconnection topologies of a 2D mesh and a 2D torus, where “S” represents

a switch and “P” represents a processor, and an edge between two switches

represents a communication link. What are the node degree, bisection width

and diameter of these two topologies? [9]

(b) Describe the general principles of combining MPI and OpenMP to write

parallel programs. [6]

(d) What does single system image mean in Cluster systems? Discuss the benefits

of providing a single system image in Cluster systems. [10]

6. (a) Using data replication and parity information are two possible solutions to

handling disk failures. Discuss the advantages and disadvantages of these two

solutions? [6]

(b) Explain the similarities between RAID5 and RAID3, and discuss the advantages

of RAID5 over RAID3. [3]

(c) When one or more processes update the same data, I/O consistency needs to be

maintained. Explain why the I/O accesses may not be consistent. Describe three

methods that can be used to ensure I/O consistency. [6]

(d) There are three potential methods to implement parallel I/O: 1) One process

performs I/O operations for all other processes; 2) Each process reads or writes

the data from or to a separate file; 3) Different processes access different parts of

a common file. Discuss the advantages and disadvantages of each method.

Which method of parallel I/O is most widely used nowadays? [10]

CS4022

End – 1 –

THE UNIVERSITY OF WARWICK

Fourth Year Examinations: Summer 2015

High Performance Computing

Supplementary Information.

Listings and Figures to accompany the CS4022 question paper

Segment 1:

for(i=1; i<=N; i++) { A(i)= B(i-1) ×B(i-2); B(i)= A(i)×B(i); } Segment 2: for(i=3; i<=N; i++) { A(i)= A(i-1) × B(i); B(i)= B(i-2) + B(i-1); } Listing 1: Code segments for Question 1(d) (a) 2D mesh (b) 2D torus Figure 1. Network Topologies for Question 5(a)