
MPI types, Scatter and
Scatterv



Logical (6 × 6) view:
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

Physical (row-major) layout in memory:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Logical and physical layout of a
C/C++ array in memory.

int *A = malloc(6*6*sizeof(int));


MPI_Scatter
int MPI_Scatter(
const void *sendbuf, // data to send
int sendcount, // sent to each process
MPI_Datatype sendtype,// type of data sent
void *recvbuf, // where received
int recvcount, // how much to receive
MPI_Datatype recvtype,// type of data received
int root, // sending process
MPI_Comm comm) // communicator

sendbuf, sendcount, and sendtype are significant only at the root
(sending) process
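As a concrete illustration (a minimal sketch, not from the original slides), the following program scatters a 36-element array, 9 ints to each of 4 processes; it matches the example on the next slide:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, A[36], B[9];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        for (int i = 0; i < 36; i++) A[i] = i;     // only the root fills A
    MPI_Scatter(A, 9, MPI_INT,                     // send args: significant only at root
                B, 9, MPI_INT, 0, MPI_COMM_WORLD); // every rank receives 9 ints in B
    MPI_Finalize();
    return 0;
}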


Equal number of elements to all processors

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

MPI_Scatter(A, 9, MPI_INT, B, 9, MPI_INT, 0, MPI_COMM_WORLD);

Each process receives 9 consecutive elements:
P0:  0  1  2  3  4  5  6  7  8
P1:  9 10 11 12 13 14 15 16 17
P2: 18 19 20 21 22 23 24 25 26
P3: 27 28 29 30 31 32 33 34 35

MPI_Scatterv
int MPI_Scatterv(
const void *sendbuf,  // data to send
const int *sendcounts,// #elements sent to each process
const int *displs,    // displacement in sendbuf of each
                      // process's data
MPI_Datatype sendtype,// type of data sent
void *recvbuf,        // where received
int recvcount,        // how much to receive
MPI_Datatype recvtype,// type of data received
int root,             // sending process
MPI_Comm comm)        // communicator

sendbuf, sendcounts, displs, and sendtype are significant only at the
root (sending) process

Specify the number of elements sent to each processor

int counts[4] = {10, 9, 8, 9};
int displs[4] = {0, 10, 19, 27};
MPI_Scatterv(A, counts, displs, MPI_INT,
             rb, counts[rank], MPI_INT, // each process passes its own count
             0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

In memory: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Received in rb:
P0 (10 elements):  0  1  2  3  4  5  6  7  8  9
P1 (9 elements):  10 11 12 13 14 15 16 17 18
P2 (8 elements):  19 20 21 22 23 24 25 26
P3 (9 elements):  27 28 29 30 31 32 33 34 35
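For completeness, a minimal runnable sketch of this MPI_Scatterv example (not from the original slides; it assumes exactly 4 processes):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, A[36], rb[10];                 // rb sized for the largest count
    int counts[4] = {10, 9, 8, 9};
    int displs[4] = {0, 10, 19, 27};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        for (int i = 0; i < 36; i++) A[i] = i;
    MPI_Scatterv(A, counts, displs, MPI_INT,
                 rb, counts[rank], MPI_INT, 0, MPI_COMM_WORLD);
    for (int i = 0; i < counts[rank]; i++)
        printf("P%d got %d\n", rank, rb[i]);
    MPI_Finalize();
    return 0;
}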

What if we want to scatter
columns (C array layout)

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

In memory: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Desired result — each process receives one column:
P0: 0  6 12 18 24 30
P1: 1  7 13 19 25 31
P2: 2  8 14 20 26 32
P3: 3  9 15 21 27 33
P4: 4 10 16 22 28 34
P5: 5 11 17 23 29 35

MPI_Type_vector

int MPI_Type_vector(
int count, // number of blocks
int blocklength, // #elts in a block
int stride, // #elts between block starts
MPI_Datatype oldtype, // type of block elements
MPI_Datatype *newtype // handle for new type
)

Allows a type to be created that assembles equally spaced blocks of
elements of a vector into a new vector type.

Note that a 2-D array in contiguous memory can be treated as
a 1-D vector.

MPI_Type_create_resized

int MPI_Type_create_resized(
MPI_Datatype oldtype, // type being resized
MPI_Aint lb, // new lower bound
MPI_Aint extent, // new extent (“length”)
MPI_Datatype *newtype) // resized type

Allows a new size (or extent) to be assigned to an existing type.

Allows MPI to determine how far from an object O1 the next adjacent
object O2 is. As we will see, this is often needed because we treat a
logically 2-D array as a 1-D vector.
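As a supplementary sketch (not from the slides), the difference between a type's size (bytes of data it describes) and its extent (bytes of memory it spans) can be inspected with MPI_Type_size and MPI_Type_get_extent; the program below builds the col/coltype pair used on the following slides and prints both values:

#include <mpi.h>
#include <stdio.h>

/* Print a datatype's size (bytes of data) and extent (bytes spanned). */
static void show(const char *name, MPI_Datatype t) {
    int size; MPI_Aint lb, extent;
    MPI_Type_size(t, &size);
    MPI_Type_get_extent(t, &lb, &extent);
    printf("%s: size=%d bytes, extent=%ld bytes\n", name, size, (long)extent);
}

int main(int argc, char *argv[]) {
    MPI_Datatype col, coltype;
    MPI_Init(&argc, &argv);
    MPI_Type_vector(6, 1, 6, MPI_INT, &col);        // one column of a 6x6 int array
    MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
    show("col", col);         // 24 bytes of data, extent 124 bytes (31 ints)
    show("coltype", coltype); // still 24 bytes of data, extent 4 bytes (1 int)
    MPI_Type_free(&col);
    MPI_Type_free(&coltype);
    MPI_Finalize();
    return 0;
}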


Using MPI_Type_vector

MPI_Datatype col, coltype;
MPI_Type_vector(6, 1, 6, MPI_INT, &col);
MPI_Type_commit(&col); // not necessary
MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
MPI_Type_commit(&coltype); // needed
MPI_Scatter(A, 1, coltype, rb, 6, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

In memory: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

MPI_Type_vector: defining the type

MPI_Datatype col, coltype;
MPI_Type_vector(6, 1, 6, MPI_INT, &col);
MPI_Type_commit(&col); // not necessary
MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
MPI_Type_commit(&coltype); // needed
MPI_Scatter(A, 1, coltype, rb, 6, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

Each of the 6 blocks consists of 1 int, and each block starts 6
positions after the start of the previous block in the linearized
array.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35


Using MPI_Type_create_resized

MPI_Datatype col, coltype;
MPI_Type_vector(6, 1, 6, MPI_INT, &col);
MPI_Type_commit(&col); // not necessary
MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
MPI_Type_commit(&coltype); // needed
MPI_Scatter(A, 1, coltype, rb, 6, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

MPI_Type_create_resized creates a new type from an existing type and
changes its extent. This makes it easier to compute the offset from
one element of the type to the next element in the original data
structure.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35


MPI_Datatype col, coltype;
MPI_Type_vector(6, 1, 6, MPI_INT, &col);
MPI_Type_commit(&col); // not necessary
MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
MPI_Type_commit(&coltype); // needed
MPI_Scatter(A, 1, coltype, rb, 6, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

MPI_Type_create_resized creates a new type from an existing type and
changes its extent. This makes it easier to compute the offset from
one element of the type to the next element in the original data
structure.

one object of type
col starts here

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

The next starts here, one
sizeof(int) away.


The result of the
communication

MPI_Datatype col, coltype;
MPI_Type_vector(6, 1, 6, MPI_INT, &col);
MPI_Type_commit(&col); // not necessary
MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
MPI_Type_commit(&coltype); // needed
MPI_Scatter(A, 1, coltype, rb, 6, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Each process receives one column in rb:
P0: 0  6 12 18 24 30
P1: 1  7 13 19 25 31
 . . .
P5: 5 11 17 23 29 35
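Putting the pieces together, a minimal runnable sketch of the column scatter above (assuming it is run with exactly 6 processes; the names A and rb follow the slides):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, A[36], rb[6];
    MPI_Datatype col, coltype;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        for (int i = 0; i < 36; i++) A[i] = i;   // root owns the 6x6 array
    MPI_Type_vector(6, 1, 6, MPI_INT, &col);     // one column of the 6x6 array
    MPI_Type_create_resized(col, 0, 1*sizeof(int), &coltype);
    MPI_Type_commit(&coltype);
    MPI_Scatter(A, 1, coltype,                   // root sends one column to each rank
                rb, 6, MPI_INT, 0, MPI_COMM_WORLD);
    printf("P%d: %d %d %d %d %d %d\n", rank,
           rb[0], rb[1], rb[2], rb[3], rb[4], rb[5]);
    MPI_Type_free(&col);
    MPI_Type_free(&coltype);
    MPI_Finalize();
    return 0;
}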


Scattering diagonal blocks

MPI_Datatype block, blocktype;
MPI_Type_vector(2, 2, 6, MPI_INT, &block);
MPI_Type_commit(&block); // not necessary
MPI_Type_create_resized(block, 0, 14*sizeof(int), &blocktype);
MPI_Type_commit(&blocktype); // needed
MPI_Scatter(A, 1, blocktype, B, 4, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Note that 2 × (row length) + (block width) = 2·6 + 2 = 14, the
distance in elements between the starts of consecutive diagonal
blocks, which is the extent used in the resize.

Scattering the blocks

MPI_Datatype block, blocktype;
MPI_Type_vector(2, 2, 6, MPI_INT, &block);
MPI_Type_commit(&block); // not necessary
MPI_Type_create_resized(block, 0, 14*sizeof(int), &blocktype);
MPI_Type_commit(&blocktype); // needed
MPI_Scatter(A, 1, blocktype, B, 4, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

B (each process receives one 2 × 2 diagonal block):
P0:  0  1  6  7
P1: 14 15 20 21
P2: 28 29 34 35

The Type_vector statement
describing this

MPI_Datatype block, blocktype;
MPI_Type_vector(3, 3, 6, MPI_INT, &block);
MPI_Type_commit(&block); // not necessary
MPI_Type_create_resized(block, 0, 3*sizeof(int), &blocktype);
MPI_Type_commit(&blocktype); // needed

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

Each block consists of 3 rows of 3 elements (count 3, blocklength 3),
with consecutive rows starting 6 elements apart (stride 6).

The create_resize statement
for this

MPI_Datatype block, blocktype;
MPI_Type_vector(3, 3, 6, MPI_INT, &block);
MPI_Type_commit(&block); // not necessary
MPI_Type_create_resized(block, 0, 3*sizeof(int), &blocktype);
MPI_Type_commit(&blocktype); // needed

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

The distances between the starts of consecutive blocks (3, 15, and 3
elements) vary, but all are multiples of 3, so resize the type to an
extent of 3 ints and use MPI_Scatterv.

Sending the data
MPI_Datatype block, blocktype;
int displ[4]  = {0, 1, 6, 7};  // in units of blocktype's extent (3 ints)
int scount[4] = {1, 1, 1, 1};  // one blocktype element per process
int rcount = 9;                // every process receives 9 MPI_INTs
MPI_Type_vector(3, 3, 6, MPI_INT, &block);
MPI_Type_commit(&block); // not necessary
MPI_Type_create_resized(block, 0, 3*sizeof(int), &blocktype);
MPI_Type_commit(&blocktype); // needed
MPI_Scatterv(A, scount, displ, blocktype,
             rb, rcount, MPI_INT, 0, MPI_COMM_WORLD);

A (6 × 6, row-major):
 0  1  2  3  4  5
 6  7  8  9 10 11
12 13 14 15 16 17
18 19 20 21 22 23
24 25 26 27 28 29
30 31 32 33 34 35

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

The four blocks start at elements 0, 3, 18, and 21; measured in units
of the resized extent (one block column of 3 ints), the displacements
are 0, 1, 6, and 7.

Matrix Multiply
Cannon’s Algorithm

• Useful for the small project


Parallel Algorithm 2
(Cannon’s Algorithm)

• Associate a primitive task with each matrix element
• Agglomerate tasks responsible for a square (or nearly square) block of C (the result matrix)
• Computation-to-communication ratio rises to n / √p (same total computation, more computation per communication)
• 2n / p < n / √p when p > 4

A simplifying assumption

• Assume that
  • A, B, and (consequently) C are n × n square matrices,
  • √p is an integer, and
  • n = k⋅√p for some integer k (i.e., n is a multiple of √p).

Elements of A and B Needed to
Compute a Process’s Portion of C

[Figure: two panels showing the A and B elements one process needs under Algorithm 1 and under Cannon's Algorithm]

Blocks needed to compute a C element

[Figure: A, B, and C matrices divided into blocks]

Blocks needed to compute a C element

[Figure: A, B, and C matrices divided into blocks]

The processor that owns these blocks fully computes the value of this C
block (but needs more data than this).

Blocks needed to compute a C element

[Figure: A, B, and C matrices divided into blocks; the block owned by processor P3,2 is marked in each]

Processor P3,2 needs, at some point, to simultaneously hold the green
A and B blocks, the red A and B blocks, the blue A and B blocks, and
the cayenne A and B blocks.

With the current data layout it cannot do useful work, because it does
not hold matching A and B blocks (it has a red A block and a blue B block).


Blocks needed to compute a C element

[Figure: A, B, and C matrices divided into blocks]

We need to rearrange the data so that every process has useful work to do.

The initial data configuration does not provide for this.

Change the initial data setup

[Figure: A, B, and C matrices divided into blocks]

Move every element Ai,j left i columns.

Move every element Bi,j up j rows (0-based indices).

Change the initial data setup

Move every element Ai,j left i columns.

Move every element Bi,j up j rows.

[Figure: A, B, and C blocks after these moves]

Every processor now has useful work to
do

Note — this only shows the full data layout for one
processor

[Figure: the A, B, and C blocks now held by one processor]

At each step in the multiplication, shift B elements up within their
column, and A elements left within their row.

[Figure: A, B, and C blocks]

First partial sum

At each step in the multiplication, shift B elements up within their
column, and A elements left within their row.

[Figure: A, B, and C blocks]

Second partial sum

At each step in the multiplication, shift B elements up within their
column, and A elements left within their row.

[Figure: A, B, and C blocks]

Third partial sum

At each step in the multiplication, shift B elements up within their
column, and A elements left within their row.

[Figure: A, B, and C blocks]

Fourth partial sum

Another way to view this

[Figure: block layout before and after the rearrangement]

Another way to view this

[Figure: before/after view with callouts: an A block moves over 2 (= i) positions, and a B block moves up 1 (= j) position]

Yet another way to view
this

[Figure: a 4 × 4 arrangement of blocks; position (i,j) initially holds Ai,j and Bi,j, drawn as two triangles]

Each triangle represents a matrix block.

Only same-color triangles should be multiplied.

Rearrange Blocks

[Figure: the 4 × 4 arrangement of Ai,j and Bi,j blocks after the shifts]

Block Ai,j shifts
left i positions

Block Bi,j shifts
up j positions
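One possible way to express this initial alignment in MPI is sketched below (illustrative only, not from the slides): build a periodic √p × √p Cartesian communicator, then shift with MPI_Cart_shift and MPI_Sendrecv_replace. Here q (= √p), blk, a_blk, and b_blk are assumed to have been set up already; a_blk and b_blk hold this process's blk × blk blocks as doubles.

/* Hypothetical sketch: initial skew on a periodic q x q process grid. */
int rank, coords[2], src, dst;
int dims[2]    = {q, q};   // q = sqrt(p), assumed known
int periods[2] = {1, 1};   // wrap around in both dimensions
MPI_Comm grid;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
MPI_Comm_rank(grid, &rank);
MPI_Cart_coords(grid, rank, 2, coords);  // coords[0] = i, coords[1] = j

/* A(i,j) moves left i columns: send to the process i columns to the left. */
MPI_Cart_shift(grid, 1, -coords[0], &src, &dst);
MPI_Sendrecv_replace(a_blk, blk*blk, MPI_DOUBLE, dst, 0, src, 0,
                     grid, MPI_STATUS_IGNORE);

/* B(i,j) moves up j rows: send to the process j rows above. */
MPI_Cart_shift(grid, 0, -coords[1], &src, &dst);
MPI_Sendrecv_replace(b_blk, blk*blk, MPI_DOUBLE, dst, 0, src, 0,
                     grid, MPI_STATUS_IGNORE);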


Consider Process P1,2

[Figure: row 1 of A and column 2 of B as held across the process grid. Step 1: P1,2 holds A1,3 and B3,2 and computes A1,3 × B3,2. Arrows mark the next communication: A blocks shift left, B blocks shift up.]

Consider Process P1,2

[Figure: Step 2: P1,2 now holds A1,0 and B0,2 and adds A1,0 × B0,2. Arrows mark the next communication.]

Consider Process P1,2

[Figure: Step 3: P1,2 now holds A1,1 and B1,2 and adds A1,1 × B1,2. Arrows mark the next communication.]

Consider Process P1,2

[Figure: Step 4: P1,2 now holds A1,2 and B2,2 and adds A1,2 × B2,2, which completes C1,2.]

Complexity Analysis

• Algorithm has √p iterations
• During each iteration a process multiplies two (n/√p) × (n/√p) matrices: Θ((n/√p)³) = Θ(n³/p^(3/2))
• Overall computational complexity: √p ⋅ Θ(n³/p^(3/2)) = Θ(n³/p)
• During each of the √p iterations a process sends and receives two blocks of size (n/√p) × (n/√p)
• Overall communication complexity: Θ(n²/√p)
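Continuing the sketch from the Rearrange Blocks slide (still illustrative, not the slides' code): each of the q = √p iterations multiplies the local blocks and then shifts A one position left and B one position up around the torus; c_blk is assumed zero-initialized.

/* Neighbours for the per-iteration shifts. */
int left, right, up, down;
MPI_Cart_shift(grid, 1, -1, &right, &left);  // source = right, dest = left
MPI_Cart_shift(grid, 0, -1, &down, &up);     // source = below, dest = above

for (int step = 0; step < q; step++) {
    /* c_blk += a_blk * b_blk : local blk x blk matrix multiply. */
    for (int i = 0; i < blk; i++)
        for (int k = 0; k < blk; k++)
            for (int j = 0; j < blk; j++)
                c_blk[i*blk + j] += a_blk[i*blk + k] * b_blk[k*blk + j];

    /* Shift A one block left and B one block up around the torus. */
    MPI_Sendrecv_replace(a_blk, blk*blk, MPI_DOUBLE, left, 0, right, 0,
                         grid, MPI_STATUS_IGNORE);
    MPI_Sendrecv_replace(b_blk, blk*blk, MPI_DOUBLE, up, 0, down, 0,
                         grid, MPI_STATUS_IGNORE);
}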
