COMP5426 Distributed

Programming Distributed Memory Platforms

Gaussian Elimination with Partial Pivoting (GEPP)

for ib = 1 to n-1 step b … Process matrix b columns at a time
    end = min(ib + b-1, n) … Point to end of block of b columns
    apply BLAS2 version of GEPP to get A(ib:n , ib:end) = P' * L' * U'
    … let LL denote the strict lower triangular part of A(ib:end , ib:end) + I
    A(ib:end , end+1:n) = LL^(-1) * A(ib:end , end+1:n) … update next b rows of U
    A(end+1:n , end+1:n) = A(end+1:n , end+1:n)
                           - A(end+1:n , ib:end) * A(ib:end , end+1:n)
                           … apply delayed updates with single matrix-multiply
                           … with inner dimension b
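The loop above can be made concrete as a small sequential C sketch. It assumes a row-major n x n array of doubles, 0-based indices, and LAPACK-style pivot bookkeeping (piv[k] is the row swapped with row k); the names blocked_gepp, swap_rows and AT are illustrative only, and the plain loops stand in for the triangular-solve and matrix-multiply (TRSM, GEMM) calls a tuned code would use.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element (i, j) of a row-major n x n matrix stored in a flat array. */
#define AT(A, n, i, j) ((A)[(size_t)(i) * (size_t)(n) + (size_t)(j)])

/* Swap rows r1 and r2 across all n columns. */
static void swap_rows(double *A, int n, int r1, int r2) {
    if (r1 == r2) return;
    for (int j = 0; j < n; j++) {
        double t = AT(A, n, r1, j);
        AT(A, n, r1, j) = AT(A, n, r2, j);
        AT(A, n, r2, j) = t;
    }
}

/* Blocked GEPP sketch: overwrites A with L (unit lower, stored below the
 * diagonal) and U (on and above the diagonal) so that P*A = L*U.
 * Returns 0 on success, -1 if an exactly zero pivot is met. */
int blocked_gepp(double *A, int n, int b, int *piv) {
    assert(n > 0 && b > 0);
    for (int ib = 0; ib < n; ib += b) {
        int end = ib + b - 1;                 /* last column of this block */
        if (end > n - 1) end = n - 1;

        /* 1. BLAS2-style GEPP on the panel A(ib:n-1, ib:end). */
        for (int k = ib; k <= end; k++) {
            int p = k;
            for (int i = k + 1; i < n; i++)
                if (fabs(AT(A, n, i, k)) > fabs(AT(A, n, p, k))) p = i;
            piv[k] = p;
            if (AT(A, n, p, k) == 0.0) return -1;
            swap_rows(A, n, k, p);            /* swap whole rows */
            for (int i = k + 1; i < n; i++) {
                AT(A, n, i, k) /= AT(A, n, k, k);          /* multiplier */
                for (int j = k + 1; j <= end; j++)          /* panel only */
                    AT(A, n, i, j) -= AT(A, n, i, k) * AT(A, n, k, j);
            }
        }

        /* 2. A(ib:end, end+1:n-1) = LL^(-1) * A(ib:end, end+1:n-1):
         *    forward substitution with the unit lower triangle LL. */
        for (int k = ib; k <= end; k++)
            for (int i = k + 1; i <= end; i++)
                for (int j = end + 1; j < n; j++)
                    AT(A, n, i, j) -= AT(A, n, i, k) * AT(A, n, k, j);

        /* 3. Delayed trailing update, one matrix multiply with inner
         *    dimension (end - ib + 1) <= b: A22 = A22 - A21 * A12. */
        for (int i = end + 1; i < n; i++)
            for (int k = ib; k <= end; k++)
                for (int j = end + 1; j < n; j++)
                    AT(A, n, i, j) -= AT(A, n, i, k) * AT(A, n, k, j);
    }
    return 0;
}
```

For example, factoring the 4 x 4 matrix {2,1,1,0, 4,3,3,1, 8,7,9,5, 6,7,9,8} with b = 2 overwrites it with L and U; applying the recorded swaps to a copy of the original gives P*A, which should match L*U up to rounding for any block size b.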

For shared memory machines, the task assignment presented in the previous lecture can be used as is. This is simply because data are not owned by any thread.
For distributed memory machines, data must be distributed among the processes:
- Exchange data using message passing
- Must consider how to …
- Less flexibility for task assignment

Assignment for distributed memory: 2D block-cyclic layout
- Processes are also organized as a 2D mesh/torus
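A minimal sketch of the 2D block-cyclic mapping, assuming b x b blocks dealt round-robin over a Pr x Pc process grid, 0-based indices, and row-major numbering of the grid ranks. The type and function names (ProcCoord, owner_2d_block_cyclic, local_index, grid_rank) are hypothetical, chosen for illustration.

```c
#include <assert.h>

/* Hypothetical type: a process position in the Pr x Pc grid. */
typedef struct { int prow, pcol; } ProcCoord;

/* Owner of global entry (i, j) when b x b blocks are dealt cyclically
 * over a Pr x Pc process grid. */
ProcCoord owner_2d_block_cyclic(int i, int j, int b, int Pr, int Pc) {
    assert(b > 0 && Pr > 0 && Pc > 0 && i >= 0 && j >= 0);
    ProcCoord p = { (i / b) % Pr, (j / b) % Pc };
    return p;
}

/* Local index, on the owning process, of global row (or column) g when
 * that dimension is split over P processes: skip the full rounds of P
 * blocks, then add the offset inside the block. */
int local_index(int g, int b, int P) {
    return (g / b / P) * b + g % b;
}

/* Linear rank of a grid position, assuming row-major rank numbering. */
int grid_rank(ProcCoord p, int Pc) {
    return p.prow * Pc + p.pcol;
}
```

With b = 2 and a 2 x 2 grid, entry (5, 2) lies in block (2, 1), so it is owned by grid position (0, 1) and stored there as local entry (3, 0). Dealing blocks cyclically keeps every process busy as the active trailing matrix of LU shrinks, while using blocks of size b keeps the local updates in matrix-multiply form.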

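Matrix multiply: green = green - blue * pink, i.e., the delayed update A(end+1:n , end+1:n) = A(end+1:n , end+1:n) - A(end+1:n , ib:end) * A(ib:end , end+1:n)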

Avoiding G…
A tall panel W of b columns is split into row blocks W1, W2, W3, W4:
- Factor each block, Wi = Pi·Li·Ui, and choose b pivot rows of Wi, call them Wi' (i = 1, ..., 4)
- Choose b pivot rows of [W1'; W2'], call them W12'; choose b pivot rows of [W3'; W4'], call them W34'
- Choose b pivot rows of [W12'; W34']
- Go back to W and use these b pivot rows (i.e., move them to the top, then do LU without pivoting)
- Not the same pivot rows as chosen by GEPP
- Need to show it is numerically stable (D., Grigori, Xiang, '11)
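The selection steps above (tournament pivoting) can be simulated sequentially in C. This is a sketch under assumptions: W is stored row-major with b columns, it is split into nblk equal contiguous row blocks of mb >= b rows with nblk a power of two, and ties in the pivot search go to the first row found. The helper names gepp_select and tournament_pivots are hypothetical; in a real run each leaf would sit on its own process and each merge would be one step of a reduction tree.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: GEPP on the m x b matrix formed by rows
 * rows[0..m-1] of W (row-major, b columns); writes the global indices of
 * the b pivot rows, in the order chosen, to chosen[0..b-1]. */
static void gepp_select(const double *W, int b, const int *rows, int m,
                        int *chosen) {
    assert(m >= b);
    double *T = malloc((size_t)m * b * sizeof *T);  /* working copy */
    int *id = malloc((size_t)m * sizeof *id);       /* global row ids */
    assert(T != NULL && id != NULL);
    for (int i = 0; i < m; i++) {
        memcpy(&T[i * b], &W[rows[i] * b], (size_t)b * sizeof *T);
        id[i] = rows[i];
    }
    for (int k = 0; k < b; k++) {
        int p = k;                                  /* largest |T(i,k)|, i >= k */
        for (int i = k + 1; i < m; i++)
            if (fabs(T[i * b + k]) > fabs(T[p * b + k])) p = i;
        for (int j = 0; j < b; j++) {               /* swap rows k and p */
            double tmp = T[k * b + j];
            T[k * b + j] = T[p * b + j];
            T[p * b + j] = tmp;
        }
        int tid = id[k]; id[k] = id[p]; id[p] = tid;
        chosen[k] = id[k];
        if (T[k * b + k] == 0.0) continue;          /* nothing to eliminate */
        for (int i = k + 1; i < m; i++) {           /* eliminate below pivot */
            double l = T[i * b + k] / T[k * b + k];
            for (int j = k; j < b; j++) T[i * b + j] -= l * T[k * b + j];
        }
    }
    free(T);
    free(id);
}

/* Hypothetical driver: tournament pivoting on an (nblk*mb) x b panel W
 * split into nblk row blocks of mb rows; writes b winning rows to out. */
void tournament_pivots(const double *W, int nblk, int mb, int b, int *out) {
    assert(b > 0 && mb >= b && nblk > 0 && (nblk & (nblk - 1)) == 0);
    int *cand = malloc((size_t)nblk * b * sizeof *cand);
    int *rows = malloc((size_t)(mb > 2 * b ? mb : 2 * b) * sizeof *rows);
    assert(cand != NULL && rows != NULL);
    /* Leaves: Wi = Pi Li Ui; keep the b pivot rows Wi'. */
    for (int blk = 0; blk < nblk; blk++) {
        for (int i = 0; i < mb; i++) rows[i] = blk * mb + i;
        gepp_select(W, b, rows, mb, &cand[blk * b]);
    }
    /* Tree: stack two candidate sets (2b rows), keep b pivot rows. */
    for (int width = nblk; width > 1; width /= 2)
        for (int g = 0; g < width / 2; g++) {
            memcpy(rows, &cand[2 * g * b], (size_t)2 * b * sizeof *rows);
            gepp_select(W, b, rows, 2 * b, &cand[g * b]);
        }
    memcpy(out, cand, (size_t)b * sizeof *out);
    free(cand);
    free(rows);
}
```

With b = 1 the tournament simply finds the entry of largest magnitude, the same row GEPP would pick; for b > 1 the winners can differ from GEPP's choice, as the slide notes. In a distributed run the panel then needs one reduction over the tree, carrying b candidate rows per message, rather than one reduction per column as in standard GEPP.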


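MPI example: blocking send and receive (communication and computation do not overlap)

/* rank 0: compute, then send each result to rank 1 (buf: array of n doubles) */
for ( int i = 0; i < steps; i++ ) {
    /* computation here */
    MPI_Send(buf, n, MPI_DOUBLE, 1, 42, comm);
}

/* rank 1: receive each message from rank 0, then compute on it */
MPI_Status status;
for ( int i = 0; i < steps; i++ ) {
    MPI_Recv(buf, n, MPI_DOUBLE, 0, 42, comm, &status);
    /* do some computation */
}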

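MPI example: non-blocking send and receive with double buffering, so communication overlaps computation

double buf[2][n];   /* two buffers: one in flight while the other is in use */
MPI_Request req[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };  /* one request per buffer */
MPI_Status status;
int idx = 0;

if (rank == 0) {
    for ( int i = 0; i < steps; i++ ) {
        MPI_Wait(&req[idx], &status);   /* earlier send from buf[idx] done (no-op on MPI_REQUEST_NULL) */
        /* do some computation here: fill buf[idx] */
        MPI_Isend(buf[idx], n, MPI_DOUBLE, 1, 42, comm, &req[idx]);
        idx = (idx + 1) % 2;            /* next step uses the other buffer */
    }
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
} else if (rank == 1) {
    if (steps > 0)                      /* pre-post the first receive */
        MPI_Irecv(buf[idx], n, MPI_DOUBLE, 0, 42, comm, &req[idx]);
    for ( int i = 0; i < steps; i++ ) {
        MPI_Wait(&req[idx], &status);   /* buf[idx] now holds message i */
        int next = (idx + 1) % 2;
        if (i + 1 < steps)              /* post the receive for message i+1 ... */
            MPI_Irecv(buf[next], n, MPI_DOUBLE, 0, 42, comm, &req[next]);
        /* ... and do some computation on buf[idx] while it arrives */
        idx = next;
    }
}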
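To see roughly what the overlap buys, here is an idealized cost model for the receiving side (an assumption, not taken from the slides): s steps, each needing t_comm to receive a message and t_comp to process it, with perfect overlap and a sender that always has the next message ready. The overlapped case is a two-stage pipeline.

```latex
\begin{aligned}
T_{\text{blocking}} &\approx s\,(t_{\text{comm}} + t_{\text{comp}}) \\
T_{\text{overlap}}  &\approx t_{\text{comm}} + t_{\text{comp}} + (s-1)\max(t_{\text{comm}}, t_{\text{comp}}) \\
                    &= s\,\max(t_{\text{comm}}, t_{\text{comp}}) + \min(t_{\text{comm}}, t_{\text{comp}})
\end{aligned}
```

As s grows, the ratio tends to (t_comm + t_comp) / max(t_comm, t_comp), so under this model double buffering can at best halve the time, which happens when communication and computation take equally long.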

On distributed memory machines, data and task assignment across the processes must be considered simultaneously, which makes assignment more restrictive than on shared memory systems. We must seriously consider data locality, load balancing and the efficiency of resource utilization. Because data communication adds cost, we must also seriously consider how to minimize communication overhead.