High Performance Computing
Performance I
Dr Ligang He
Copyright By PowCoder代写 加微信 powcoder
Four approaches to modelling the performance of a parallel application
Model execution time (Construct the performance model)
Amdahl’s law
Asymptotic analysis
Modelling execution time
Atmosphere model
p The atmosphere model is used in weather forecast
p Capture the relation among atmospheric attributes, such as
velocity of air, temperature, pressure, moisture, etc.
p The model is established by physical laws and represented
by a set of partial differential equations
an example
Partial differential equations in the atmosphere model
Modelling execution time Atmosphere model
p too complicated to be solved by mathematical derivation
p resort to the numerical method
p Cannot generate the solution for every physical point
p discretize the space, i.e., partition the space into a set of cells and use one point in a cell to represent all points in the cell.
p a continuous space is approximated by a finite set of regularly spaced points in that space
p Calculate the solution at these discrete points
an example
Communication pattern
Each point uses the nine-point stencil to calculate its horizontal motion and uses the three-point stencil to calculate its vertical motion
Fig.2. The 3-point stencil for calculating the vertical velocity
Fig. 1. The 9-point stencil for calculating the horizontal velocity
• When processor 1 wants to calculate the data points allocated to it, how many data items does it have to obtain from processor 2?
Modelling computation time
If we assume a grid of size N*N*Z points, and using 1- D decomposition (along y-axis) to partition the grid among P processors, then
p each task is responsible for a subgrid of size N*(N/P)*Z
Modelling computation time
If we assume a grid of size N*N*Z points, and using 1- D decomposition (along y-axis) to partition the grid among P processors, then
p each task is responsible for a subgrid of size N*(N/P)*Z p then, Tcomp for each subgrid can be calculated as follows,
Tcomp=tc*N*(N/P)*z
where tc is the average time of calculating a single grid point
Modelling the time of sending one message
p Tmsg (the time spent in sending one message) can be modelled: Tmsg=ts+twL
where ts is the message startup time, tw is the transfer time per byte, L is the size of the message
Tmsg is a function of L; ts and tw are constants given a computing platform If we plot the function of Tmsg over L as a graph, the graph is a straight line.
Modelling the time of sending one message
p Tmsg (the time spent in sending one message) can be modelled: Tmsg=ts+twL
where ts is the message startup time, tw is the transfer time per byte, L is the size of the message
Tmsg is a function of L; ts and tw are constants given a computing platform If we plot the function of Tmsg over L, the function is a straight line.
Note how ts and tw are represented in the line.
Modelling communication time
Ø Communication time for calculating a subgrid can be computed as
Tcomm=2(ts+tw2NZ)
Ø Hence, the performance model for the execution time of calculating the velocity of the grid of points is
Tp=Tcomp+Tcomm=tc*N*(N/p)*Z+2(ts+tw2NZ)
Ø From the performance model of the execution time, we can know a lot of information.
p What will happen when we increase p?
• Execution time decreases with increasing P; the
proportion of communication cost increases
p Execution time increases with increasing N, Z, tc, ts, tw
2D decomposition
à If applying 2-D decomposition, then q Execution time can be modelled as:
1D decomposition
Tcomp = tc ⇤ N ⇤ (N/P ) ⇤ z Tcomm = 2(ts + tw2NZ)
2D decomposition
à If applying 2-D decomposition, then q Execution time can be modelled as
Tcomp = tcN2Z/P N
Tcomm = 4(ts + tw2pP Z)
TP = Tcomp + Tcomm = tcN2Z + ts4P + tw8NZpP P
1D decomposition
Tcomp = tc ⇤ N ⇤ (N/P ) ⇤ z Tcomm = 2(ts + tw2NZ)
Decomposition analysis
q Boundary surfaces between sub-grids are shaded.
q Data on boundary surfaces need to be communicated.
q The lower surface-to-volume ratio, the better:
• Surface = communication
• Volume = computation
• means lower proportion of communication time in the total execution time
Decomposition analysis
Consider a 3-D grid and assume the grid is a cube
qVolume of the grid is V,
qThe length of each dimension of the grid is V1/3 qV is partitioned over n processors
q The volume of each partition (i.e., the number of points in each partition) is V/n
Decomposition analysis
Sub-grid Volume
V1/3 / n V1/3 / n1/2
V1/3 / n1/3
V1/3 V1/3 / n1/2 V1/3 / n1/3
V1/3 V1/3 / n1/3
2V2/3 4V2/3 .n1/2 6c2/3
2n / V1/3 4n1/2 / V1/3
6n1/3/ V1/3
Sub-grid Length
Sub-grid Surfaces
Surface to Volume
Decomposition analysis
Sub-grid Volume
V1/3 / n1/2
V1/3 / n1/3
Sub-grid Length
V1/3 / n1/2
V1/3 / n1/3
V1/3 / n1/3
Sub-grid Surfaces
4V2/3 /n1/2
Surface to Volume
4n1/2 / V1/3
6n1/3/ V1/3
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com