编程辅导 COMP5426 Distributed

COMP5426 Distributed
ility of Pa
rallel Systems

Copyright By PowCoder代写 加微信 powcoder

Performance
uential algorithm
The parallel  the input
 the numb
er of processors
 the communication parame
rithm must the
context of the underlying parallel
 Sequential runtime depends on the input size
 The asymptotic runtime of a sequential program is
runtime size
by its runtime
of the machine
e analyzed
architecture

Scalability of
underlying architecture in order to gain
A key question faced by parallel programmers is how to
effectively utiliz
e the processing resources provided by the
Roughly speaking, scalability of a parallel system can be defined
as the ability to continue to achieve good performance with increased problem size and increased processing resources
 Parallel system: The combination of a parallel architectur a parallel algorithm implemented on it
Scalability analysis plays an important role evaluation of large parallel systems
high performan
 Predicting the performance of a parallel algorithm on a parallel architecture for a large number processors from the known performance on fewer processors;
 For a fixed problem size, determining the optimal number of processors to be used and the maximum speedup that can be achieved

Recall speedup (𝑆𝑆) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with 𝑝𝑝 processing elements:
processing
the parallel runtime for the same problem using 𝑝𝑝
processors
goal is to maximi
𝑆𝑆 = 𝑇𝑇𝑠𝑠 𝑇𝑇𝑝𝑝
where 𝑇𝑇 is sequential runtime of a program and 𝑇𝑇 is 𝑠𝑠𝑝𝑝
ze speedup

affecting th
 Algorithm/program related Partitioning (load balancing & task
Synchronization/communication
computation
Computational intensity
 Architecture related
Number of available processors/cores
Communication networks
Topologies
of parallel
architecture
granularity)

Observe that
all processors
he total time collectively spent by
𝑝𝑝𝑇𝑇 (𝑝𝑝 is the nu
for synchronization and communication and idle time.
This is called the total overhead.
The overhead function (𝑇𝑇𝑜𝑜) 𝑇𝑇𝑜𝑜 = 𝑝𝑝𝑇𝑇𝑝𝑝 − 𝑇𝑇𝑠𝑠
𝑇𝑇𝑎𝑎𝑎𝑎𝑎𝑎 − 𝑇𝑇𝑠𝑠 is then
combined in non-useful work, e.g., co
all the proce
ocessors).
total time
𝑆𝑆= 𝑝𝑝𝑇𝑇𝑠𝑠
Speedup can be rewritten
𝑇𝑇+𝑇𝑇 𝑠𝑠𝑜𝑜

Efficiency
Recall efficiency is a measure of the fraction of time
which a processing el
𝐸𝐸==𝑠𝑠=𝑠𝑠 𝑝𝑝 ( 𝑝𝑝 𝑇𝑇𝑝𝑝 ) 𝑇𝑇𝑎𝑎 𝑎𝑎 𝑎𝑎
Efficiency is a function of total overhead 𝑇𝑇𝑜𝑜 𝑠𝑠  The smaller 𝑇𝑇𝑜𝑜, the higher the efficiency
ement is usefully employed
( 𝑇𝑇𝑠𝑠 + 𝑇𝑇𝑜𝑜 ) ( 1 + 𝑇𝑇𝑜𝑜 )
Inmanyapplicationsoverhead𝑇𝑇 isafunction
number of processors 𝑝𝑝
The efficiency will decrease as the number of processors increases if the overhead increases faster than the increase in the number of processors

arallel algorith
arallel system.
Often, using fewer processors improv performance of parallel systems.
Using fewer than the maximum possible number of processing elements to execute a

Granularity
Adding n numbers
processors

Granularity
l runtime 𝑇𝑇𝑠𝑠 = Θ(𝑛𝑛)
If an addition takes constant time, say, 𝑡𝑡 and
𝑐𝑐 communication of a single word takes time 𝑡𝑡 + 𝑡𝑡 we
have the parallel time 𝑇𝑇𝑝𝑝 = Θ(log 𝑛𝑛)
ding 𝑛𝑛 numbers on 𝑛𝑛 processors is
peedup of ad
The efficiency is
𝑆𝑆=𝑛𝑛 log 𝑛𝑛
𝑛𝑛1 log𝑛𝑛 =Θ

Granularity
Conside process are pow
r the problem of adding 𝑛𝑛 numbers on 𝑝𝑝
ing elements such that 𝑝𝑝 < 𝑛𝑛 and both 𝑛𝑛 and 𝑝𝑝 ers of 2. Granularity Conside process are pow f 𝑛𝑛 ≫ 𝑝𝑝, r the problem of adding 𝑛𝑛 numbers on 𝑝𝑝 ing elements such that 𝑝𝑝 < 𝑛𝑛 and both 𝑛𝑛 and 𝑝𝑝 ers of 2. 𝑇𝑇 =𝛩𝛩(𝑛𝑛+log𝑝𝑝) 𝑝𝑝𝑝𝑝𝑆𝑆=𝑛𝑛=𝑝𝑝 The parallel runtime Thespeedupis 𝑛𝑛𝑝𝑝+log𝑝𝑝 1+𝑛𝑛𝑝𝑝log𝑝𝑝 we can achieve good speedup of this algorithm Granularity Conside process are pow r the problem of adding 𝑛𝑛 numbers on 𝑝𝑝 ing elements such that 𝑝𝑝 < 𝑛𝑛 and both 𝑛𝑛 and 𝑝𝑝 ers of 2. Workload Scalin When analyzing the scalabilit know workload scaling models processors are increased  Problem constrain ed (PC): or i.e., a constant workload or fixed problem size  Time constrained (TC): or Fixed-time Model, i.e., there hen th are two well- e number of em Constrained We want to measure the speedup of a parallel program by keeping the problem size (workload) fixed as the size of the parallel machine (number of s als d to o called fixed workload the well-known Amdahl’ (PC) Scalin speedup s law t consider peedup will be 𝑠𝑠 heads, just a that contains two parts, one being naturally sequ 𝛽𝛽 and the other being naturally parallel 1 − 𝛽𝛽 We be obtaine best case, ignoring overhead and communication costs Assume sequential runtime 𝑇𝑇 and # processes 𝑝𝑝 𝑠𝑠𝑇𝑇 𝑇𝑇𝑝𝑝 = 𝛽𝛽𝑇𝑇𝑠𝑠 + (1 − 𝛽𝛽) 𝑝𝑝𝑠𝑠 𝑆𝑆=𝑇𝑇= 1 when𝑝𝑝→∞ 𝑇𝑇𝑝𝑝 𝛽𝛽+(1−𝛽𝛽 ined Scali rgest problem size machine with about the same original problem on a single processor  When increasing the number of processors, we parallelexecutiontime𝑇𝑇 remainsconstant 𝑝𝑝 also increase the problem size such that the xed-time speedup to the well-known Gustafson’s law execution time hl’s Law prevente signers from exploiting elism for many years because even parallel programs contain a small amount of natural serial code. In 1988 and Ed Barsis, two of Sandia Labs researchers derived an alternate to Amdahl’s ey is in observing that 𝛽𝛽 (sequential fraction), 𝑛𝑛 (problem size) and 𝑝𝑝 (number of processors) are not independent of each other in many practical problems. Gustafson interpreted their law to mean that parallelism can be used to increase the size of problem (keeping 𝑝𝑝 con Processors n = 10,000 = 𝑇𝑇𝑠𝑠 = 𝛼𝛼 + 1 − 𝛼𝛼 𝑇𝑇𝑝𝑝 -time speedup is 𝛼𝛼(𝑝𝑝 − 1) In the Gustafson-Barsis derivation of parallelruntime𝑇𝑇 iskeptasaconstant,,i.e.,when increasing the number of processors, we also increase the amount of work 𝑊𝑊′ = 𝛼𝛼𝑊𝑊 + 𝑝𝑝 1 − 𝛼𝛼 𝑊𝑊 ere 𝛼𝛼 is the sequential fraction of the program, parallel rt is now a variable proportional to 𝑝𝑝 The parallel runtime is then 𝑇𝑇𝑝𝑝 = 𝛼𝛼𝑊𝑊 + 1 − 𝛼𝛼 𝑊𝑊 =  𝑇𝑇 is normalized to W, which is a constant Forthesameamountofwork𝑊𝑊 whenusingjustone uential time is 𝑇𝑇𝑠𝑠 = 𝛼𝛼𝑊𝑊 + 𝑝𝑝 1 − 𝛼𝛼 𝑊𝑊 10 processors achieved by  Assume the sequential fraction of a program 𝛼𝛼 = 0.1 number of processors 𝑝𝑝 = 10 𝑆𝑆=𝑝𝑝−𝛼𝛼 𝑝𝑝−1 =10−0.1 10− A nine-fold increase in speed written as 𝐸𝐸 = For many problems in practice, the total The efficiency of a 1+𝑇𝑇𝑜𝑜 𝑇𝑇𝑠𝑠 overhead𝑇𝑇 isanincreasingfunctionof𝑝𝑝. For a given problem size (i.e., the value of 𝑇𝑇𝑠𝑠 remains constant), as we increase the number of processors, 𝑇𝑇 increases and the overall efficiency then goes down This is the case for a Amdahl’s law! allel prog programs – However, Total overhead function of both problem processors 𝑝𝑝 Therefore, high p should be used to n many cases, 𝑇𝑇𝑜𝑜 grows sub- respect to 𝑛𝑛. The efficiency increases if the problem size is increased keeping the number function 𝑇𝑇𝑜𝑜 is a size 𝑛𝑛 and the large size problems on of efficiency: (a) for a fixed prob Metric of of a parallel system: measure of the system’s ability to increase performance as numbe processors increases A scalable system maintain are added  simultaneously increase the problem size and number of processors to keep efficiency constant Question: what is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed?  This rate determines the scalability of The slower this rate, th processors If we kn 𝑊𝑊 and 𝑝𝑝 the amount of Metric of Scalabi asymptotic number of operations associated with the best serial algorithm to solve the size 𝑛𝑛 of th  Note 𝑊𝑊 is a function of problem work 𝑊𝑊 𝑇𝑇 𝑜𝑜 Write efficiency as a function Setting efficiency as a constant, we n between 𝑊𝑊, 𝑝𝑝 and 𝑇𝑇 ow 𝑇𝑇𝑜𝑜, we obtain the relation between and thus the relation between 𝑛𝑛 and 𝑝𝑝 𝑇𝑇𝑝𝑝 = 𝑊𝑊+𝑇𝑇𝑜𝑜(𝑊𝑊,𝑝𝑝) 𝑝𝑝 sulting expression for speed We can write parallel runtime as: = 𝐸𝐸 𝑇𝑇𝑜𝑜 1−𝐸𝐸 Metric of Scalabi 𝑆𝑆=𝑊𝑊= 𝑝𝑝𝑊𝑊 𝑇𝑇𝑝𝑝 𝑊𝑊+𝑇𝑇𝑜𝑜(𝑊𝑊,𝑝𝑝) 𝐸𝐸=𝑆𝑆=𝑊𝑊= 1 𝑝𝑝 𝑊𝑊 + 𝑇𝑇𝑜𝑜 ( 𝑊𝑊 , 𝑝𝑝 ) 1 + 𝑇𝑇𝑜𝑜 ( 𝑊𝑊 , 𝑝𝑝 ) / 𝑊𝑊 the expression for efficiency as 𝑇𝑇 (𝑊𝑊, 𝑝𝑝) = 1 − 𝐸𝐸 𝑜𝑜𝑊𝑊𝐸𝐸 a desired efficiency constant 𝑊𝑊,𝑝𝑝 =𝐾𝐾𝑇𝑇𝑜𝑜(𝑊𝑊,𝑝𝑝) (where𝑘𝑘= 𝐸𝐸 ) 1−𝐸𝐸 problem size since the to 𝑝𝑝 and 𝑊𝑊 and the amount function of 𝑛𝑛 problem size function of p – This function isoefficiency function Metric of Scalabi 𝑛𝑛 of 𝑝𝑝 by ebraic manipulations function of work 𝑊𝑊 is system can maintain a constant efficiency and achieve speedups increasing in proportion to essing element be expressed as is called the hich a parallel Metric of Scalabi lements is by 𝑝𝑝 log 𝑝𝑝, we obtain 𝑛𝑛 = 𝐾𝐾𝑝𝑝log𝑝𝑝  If the number of processing elements is increased from 𝑝𝑝 to 𝑝𝑝𝑝, the problem size (in this case, 𝑊𝑊 = 𝑛𝑛 ) must be increased by a factor of (𝑝𝑝𝑝 log 𝑝𝑝𝑝) / (𝑝𝑝 log 𝑝𝑝) on 𝑝𝑝 processing  The overhead 𝑇𝑇𝑜𝑜 for the problem of adding 𝑛𝑛 numberson𝑝𝑝processinge approximately 𝑝𝑝log𝑝𝑝  Substituting 𝑇𝑇𝑜𝑜 ymptotic Analysi Consider the problem of sorting a list of n numbers. The fastest serial programs for this problem run in time Θ(𝑛𝑛 𝑙𝑙𝑙𝑙𝑙𝑙 𝑛𝑛). Consider four parallel algorith A2, A3, and A4 as ymptotic Analysi If the metric is followed by A3, In terms of followed by ptimal, A1 an It is important analysis and to speed, algorithm A4, and A2 (in or efficiency, A2 A3 and A1. In terms of cost, algorithms to identify the objectives use appropriate metrics! A1 is the best, der of increasing A4 are the best, A number of ot needs of applications her metrics have also  For real-time applications, the objective is to scale up a system to accomplish a task in a specified time bound metrics operate at the limit o estimate performance f memory and processing elements. If scaled speedup is close to li ystem is considered If the isoefficiency is near linear, speedup curve is close to linear as btained when the problem s scaled well. dahl’s law  Time const Gustafson’s law Scalability analysis plays an important role in performance evaluation of large parallel systems mportant performance metrics 𝑜𝑜  Speedup 𝑆𝑆, efficiency 𝐸𝐸, total overhead 𝑇𝑇  𝑆𝑆 = 𝑇𝑇𝑠𝑠 = 𝑇𝑇𝑠𝑠 = 𝑝𝑝 , and 𝐸𝐸 = 𝑆𝑆 = 1 𝑇𝑇𝑝𝑝 ( 𝑇𝑇𝑠𝑠 + 𝑇𝑇𝑜𝑜 ) / 𝑝𝑝 1 + 𝑇𝑇𝑜𝑜 / 𝑇𝑇𝑠𝑠 𝑝𝑝 1 + 𝑇𝑇𝑜𝑜 / 𝑇𝑇𝑠𝑠 Workload scaling models: High performance compu large size problems ined (PC): or Fixed-load Model, TC): or Fixed-time Model, lead is rate deter e slower this efficiency Isoefficiency - what is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed? mines the scalabilit rate, the better he problem size 𝑛𝑛 expressed as a function of the number of processing elements 𝑝𝑝 function: t 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com