XJCO3221 Parallel Computation
University of Leeds
Lecture 13: Load balancing
Previous lectures
Several times in this module we have mentioned the concept of load balancing:
Poor load balancing results in processing units spending time idle.
Usually realised when synchronising threads or processes. First encountered for the Mandelbrot set generator [Lecture 3].
Important for parallel performance for all architectures — shared and distributed memory CPU, and GPU.
Today’s lecture
Today we will look at load balancing more closely, and how to reduce its performance penalty:
Return to the example of the Mandelbrot set generator, this time in MPI.
Understand how heterogeneity in the problem results in poor load balancing.
See how a task scheduler can improve load balancing at runtime.
Go through a concrete example of a work pool.
The Mandelbrot set (cf. Lecture 3)
Code on Minerva: Mandelbrot_MPI.c plus makefile
Domain is −2 ≤ x ≤ 2 and −2 ≤ y ≤ 2.
Calculation performed iteratively for each (x,y).
Pixel coloured according to the number of iterations.
Here, the black region corresponds to a high number of iterations.
No upper bound – some points will iterate indefinitely if allowed.
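To make the per-pixel cost concrete, here is a minimal sketch of the usual escape-time iteration. The form and the name mandelbrotIters() are illustrative assumptions, not necessarily what Mandelbrot_MPI.c uses:

/* Sketch of the per-pixel calculation: iterate z -> z*z + c from z = 0
   and count iterations until |z| > 2, capping at maxIters since points
   inside the set never diverge. */
int mandelbrotIters( double cx, double cy, int maxIters )
{
    double zx = 0.0, zy = 0.0;
    int n = 0;
    while( zx*zx + zy*zy <= 4.0 && n < maxIters ) {
        double tmp = zx*zx - zy*zy + cx;   // Real part of z^2 + c.
        zy = 2.0*zx*zy + cy;               // Imaginary part of z^2 + c.
        zx = tmp;
        n++;
    }
    return n;                              // Colour the pixel by n.
}

The cap maxIters is exactly why the cost varies so much: pixels inside the black region always run to the cap, while pixels far from the set escape after a handful of iterations.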
Strip partitioning
Partition the domain [cf. last lecture] into horizontal strips¹, one per process.

[Figure: the domain divided into p horizontal strips, labelled Process 0, Process 1, Process 2, Process 3, ..., Process p-1.]

¹Equivalent results for partitioning into vertical strips, or blocks.
Load imbalance
Because some pixels take longer than others to calculate their colour, the load is unevenly distributed across the processes.

[Figure: timelines for Process 0 to Process p-1; each finishes at a different point, so the parallel execution time is set by the last process to finish while the others sit idle.]
Load balancing
Parallel execution time determined by the last processing unit to finish.
Poor load balancing results in significant idle time for at least one process/thread.
Inefficient use of available resources.
Definition
The goal of load balancing is for each processing unit (thread or process) to perform a similar volume of computations, and therefore finish at roughly the same time.
Up until now most problems we have encountered have been naturally load balanced.
For example, for vector addition between two n-vectors, assigning each processing unit to equal numbers of vector elements results in good load balancing.
Each unit performs n/p additions.
Note that the Mandelbrot set is a map, i.e. an embarrassingly parallel problem (since there are no data dependencies). Still a challenge to attain good performance.
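As a minimal sketch (assumed names, not the module's code), the naturally balanced static partitioning of the vector addition above might look like this in C, with each processing unit taking a contiguous block of n/p elements:

/* Static, naturally balanced partitioning of c = a + b across nprocs
   processing units: every unit performs exactly n/p additions.
   Assumes n is divisible by nprocs for simplicity. */
void vectorAddLocal( const double *a, const double *b, double *c,
                     int n, int rank, int nprocs )
{
    int chunk = n / nprocs;        // Equal share per processing unit.
    int start = rank * chunk;      // This unit's contiguous block.
    for( int i = start; i < start + chunk; i++ )
        c[i] = a[i] + b[i];
}

Since every element costs the same to compute, equal counts imply equal work; for the Mandelbrot set, equal counts do not imply equal work, which is the whole problem.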
Static load balancing
Definition
Sometimes it is possible to determine (approximately) equal loads at compile time. This is known as static load balancing.
For the Mandelbrot set example, we could assign larger domains to regions where the calculations should be fast.
Should improve load balancing.
However, an exact expression for the calculation time is not available, so any such heuristic can only achieve approximate load balancing.
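As an illustration, a static partitioner might assign each process a contiguous band of rows of (approximately) equal estimated cost. This is a hypothetical sketch: estimatedRowCost() is an assumed heuristic, and firstRow[p] to firstRow[p+1]-1 gives process p's rows:

/* Hypothetical static load balancer: split 'height' image rows into
   nprocs contiguous bands of roughly equal estimated cost. Since no
   exact cost expression exists, the balance is only approximate. */
void partitionRows( int height, int nprocs, int *firstRow,
                    double (*estimatedRowCost)(int) )
{
    double total = 0.0;
    for( int row = 0; row < height; row++ )
        total += estimatedRowCost( row );

    double target = total / nprocs;    // Ideal cost per process.
    double acc    = 0.0;
    int    p      = 0;

    firstRow[0] = 0;
    for( int row = 0; row < height && p < nprocs-1; row++ ) {
        acc += estimatedRowCost( row );
        if( acc >= (p+1)*target )      // Band p has its share; start the next.
            firstRow[++p] = row + 1;
    }
    while( p < nprocs-1 ) firstRow[++p] = height;  // Empty trailing bands, if any.
    firstRow[nprocs] = height;         // Sentinel: one past the last row.
}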
Static load balancing (ideal case)
[Figure: timelines for Process 1 to Process p with ideally balanced static loads; all processes finish at roughly the same time, a saving compared to the unbalanced version.]
Dynamic load balancing
Definition
Dynamic load balancing is performed at runtime. No a priori knowledge of computation times is required.
Basic idea:
1 Break the problem down into small independent tasks.
2 Each processing unit performs one task at a time.
3 When it is complete, it starts/is assigned another task.
4 Repeat 3 until all tasks are complete.
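For shared memory this idea is already built into OpenMP's loop scheduler. A minimal sketch, not this lecture's MPI code: with schedule(dynamic,1), each thread takes one row at a time and grabs another as soon as it finishes, so threads that draw cheap rows simply process more of them. iterateToDivergence() is a placeholder for the per-pixel calculation:

/* Compile with OpenMP enabled, e.g. -fopenmp. */
int iterateToDivergence( int row, int col );   /* Placeholder: per-pixel iteration count. */

void mandelbrotDynamic( int *iters, int width, int height )
{
    /* Each row is a small independent task; the OpenMP runtime hands
       rows to threads one at a time as they become free. */
    #pragma omp parallel for schedule(dynamic,1)
    for( int row = 0; row < height; row++ )
        for( int col = 0; col < width; col++ )
            iters[row*width + col] = iterateToDivergence( row, col );
}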
Full problem broken down into tasks:

[Figure: the full problem divided into many small independent tasks.]

Tasks performed sequentially on different threads/processes:

[Figure: Processing units 1, 2 and 3 each work through tasks one at a time, starting a new task as soon as the previous one completes.]
Functional or task parallelism
Up to now we have largely considered applying the same operation in parallel to a (large) data set.
Known as data parallelism.
When performed in a loop, can also be termed loop parallelism.
Now we are parallelising a number of tasks.
Called task parallelism or functional parallelism.
Be warned that these terms are sometimes used to refer to slightly different concepts.
More on task parallelism in Lecture 19.
Work pools
The scheduler assigns tasks to processing units at runtime.
Often part of the parallel/concurrency runtime system. Introduces a (small) overhead.
Various algorithms exist for efficient task scheduling.
To understand the role of a scheduler, we will look at a simple scheduler implemented in MPI – a centralised work pool.
One process (usually rank 0) performs the scheduling – this is the main process¹.
Remaining processes action the tasks – the workers¹.

¹You may see ‘master’ (for main) and ‘slaves’ (for workers) in the literature.
Worker pseudocode
Function workerProcess() in Mandelbrot_MPI.c
initialise();                // Including MPI_Init().

while( true ) {
    // Wait for message from the main process (rank 0).
    MPI_Recv( message, ... );

    // Is this a termination request?
    if( message == TERMINATE ) break;

    // Else perform calculation and send back to rank 0.
    result = actionTask( message );
    MPI_Send( result, ... );
}

finalise();                  // Including MPI_Finalize().
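Filling in the elided arguments, a concrete worker might look as follows. This is a hedged sketch: the tag values, the row-per-task granularity, and the reuse of mandelbrotIters() from the earlier sketch are assumptions, not necessarily how Mandelbrot_MPI.c does it:

#include <mpi.h>
#include <stdlib.h>

#define TASK_TAG      1   /* Assumed tag values for the message protocol. */
#define TERMINATE_TAG 2
#define RESULT_TAG    3

int mandelbrotIters( double cx, double cy, int maxIters );  /* Earlier sketch. */

void workerProcessSketch( int width, int height, int maxIters )
{
    /* result[0] carries the row index back so rank 0 knows where the
       data belongs; result[1..width] hold the iteration counts. */
    int *result = malloc( (width+1) * sizeof(int) );
    int row;
    MPI_Status status;

    while( 1 ) {
        /* A message is either a row index to compute or a termination
           request, distinguished here by the tag. */
        MPI_Recv( &row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status );
        if( status.MPI_TAG == TERMINATE_TAG ) break;

        result[0] = row;
        for( int col = 0; col < width; col++ ) {
            /* Map the pixel to the domain -2 <= x,y <= 2. */
            double cx = -2.0 + 4.0*col/(width -1);
            double cy = -2.0 + 4.0*row/(height-1);
            result[col+1] = mandelbrotIters( cx, cy, maxIters );
        }
        MPI_Send( result, width+1, MPI_INT, 0, RESULT_TAG, MPI_COMM_WORLD );
    }
    free( result );
}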
Main process pseudocode (1–2)
Function mainProcess() in Mandelbrot_MPI.c
initialiseAndOpenWindow();

// Initialise variable that tracks progress.
int numActive = 0;

// Send initial request to each worker.
for( p=1; p<nprocs; p++ ) {
    MPI_Send( task, ..., p, ... );
    numActive++;
}

// Keep going until no worker has an outstanding task.
while( numActive > 0 ) {
    // Get result from ANY worker process.
    MPI_Recv( result, ..., MPI_ANY_SOURCE, ..., &status );
    numActive--;

    // Send request IMMEDIATELY to the SAME worker.
    if( !finished ) {
        MPI_Send( task, ..., status.MPI_SOURCE, ... );
        numActive++;
    }

    // Action the message.
    actionResult( result );
}
Main process pseudocode (3)
Function idle() in Mandelbrot_MPI.c
// Tell all workers to terminate.
for( p=1; p<nprocs; p++ )
    MPI_Send( TERMINATE, ..., p, ... );
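For completeness, one common arrangement for dispatching the two roles is sketched below. This is an assumption about the surrounding structure; in the lecture's pseudocode, MPI_Init() and MPI_Finalize() are wrapped inside initialise()/finalise() rather than called in main():

#include <mpi.h>

void mainProcess( void );      /* Defined as in the pseudocode above. */
void workerProcess( void );

int main( int argc, char **argv )
{
    int rank;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    if( rank == 0 )
        mainProcess();         // Rank 0 schedules tasks and collects results.
    else
        workerProcess();       // Everyone else actions tasks until told to stop.

    MPI_Finalize();
    return 0;
}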