COMP5426 Distributed Computing
Scalability of Parallel Systems
Performance
- Sequential runtime depends on the input size; the asymptotic runtime of a sequential program is characterized by its runtime as a function of the input size
- The parallel runtime depends on the input size, the number of processors, and the communication parameters of the machine
- A parallel algorithm must therefore be analyzed in the context of the underlying parallel architecture
Scalability of Parallel Systems
- A key question faced by parallel programmers is how to effectively utilize the processing resources provided by the underlying architecture in order to gain high performance
- Roughly speaking, scalability of a parallel system can be defined as the ability to continue to achieve good performance with increased problem size and increased processing resources
- Parallel system: the combination of a parallel architecture and a parallel algorithm implemented on it
- Scalability analysis plays an important role in the evaluation of large parallel systems, e.g.,
  - Predicting the performance of a parallel algorithm on a parallel architecture for a large number of processors from the known performance on fewer processors
  - For a fixed problem size, determining the optimal number of processors to be used and the maximum speedup that can be achieved
Speedup
- Recall speedup (S) is the ratio of the time taken to solve a problem on a single processor to the time required to solve the same problem on a parallel computer with p processing elements:
  S = T_s / T_p
  where T_s is the sequential runtime of a program and T_p is the parallel runtime for the same problem using p processors
- The goal is to maximize speedup
Factors affecting the performance of parallel programs
- Algorithm/program related
  - Partitioning (load balancing & task granularity)
  - Synchronization/communication overhead
  - Computational intensity (the ratio of computation to communication)
- Architecture related
  - Number of available processors/cores
  - Communication networks and topologies of the parallel architecture
Total Overhead
- Observe that the total time collectively spent by all processors is T_all = p T_p (p is the number of processors)
- T_all - T_s is then the total time collectively spent by all the processors in non-useful work, e.g., for synchronization and communication, and idle time. This is called the total overhead.
- The overhead function:
  T_o = p T_p - T_s
- Speedup can be rewritten as
  S = p T_s / (T_s + T_o)
Efficiency
- Recall efficiency is a measure of the fraction of time for which a processing element is usefully employed:
  E = S / p = T_s / (p T_p) = T_s / (T_s + T_o) = 1 / (1 + T_o / T_s)
- Efficiency is a function of the total overhead T_o: the smaller T_o, the higher the efficiency
- In many applications the overhead T_o is a function of the number of processors p
- The efficiency will decrease as the number of processors increases if the overhead increases faster than the increase in the number of processors
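The definitions of speedup, efficiency, and total overhead can be sketched in Python (a minimal illustration; the function names and sample numbers below are assumed, not from the slides):

```python
# Minimal sketch: speedup, efficiency, and total overhead from measured runtimes.

def speedup(t_s, t_p):
    # S = T_s / T_p
    return t_s / t_p

def efficiency(t_s, t_p, p):
    # E = S / p = T_s / (p * T_p)
    return t_s / (p * t_p)

def total_overhead(t_s, t_p, p):
    # T_o = p * T_p - T_s
    return p * t_p - t_s

# Hypothetical example: T_s = 100 s, T_p = 15 s on p = 8 processors
S = speedup(100.0, 15.0)               # 100/15, about 6.67
E = efficiency(100.0, 15.0, 8)         # about 0.83
T_o = total_overhead(100.0, 15.0, 8)   # 8*15 - 100 = 20

# Consistency check against the rewritten form S = p*T_s / (T_s + T_o)
assert abs(S - 8 * 100.0 / (100.0 + T_o)) < 1e-9
```

The final assertion confirms that the two speedup expressions agree term by term.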
- Often, using fewer processors improves the performance of parallel systems
- Using fewer than the maximum possible number of processing elements to execute a parallel algorithm is called scaling down a parallel system
Granularity
- Consider adding n numbers on n processors
- The sequential runtime is T_s = Θ(n)
- If an addition takes constant time, say, t_c, and communication of a single word takes time t_s + t_w, we have the parallel time T_p = Θ(log n)
- The speedup of adding n numbers on n processors is
  S = n / log n
- The efficiency is
  E = S / n = (1/n)(n / log n) = Θ(1 / log n)
Granularity
- Consider the problem of adding n numbers on p processing elements such that p < n and both n and p are powers of 2
- The parallel runtime of this algorithm is
  T_p = Θ(n/p + log p)
- The speedup is
  S = n / (n/p + log p) = p / (1 + (p/n) log p)
- If n ≫ p, we can achieve good speedup
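The expression S = n / (n/p + log p) can be evaluated numerically to see how the speedup behaves as n grows relative to p (a model sketch, assuming log base 2; the sample values are illustrative):

```python
import math

# Model (not a measurement): speedup of adding n numbers on p processors,
# using S = n / (n/p + log2(p)) with constant factors dropped.

def modeled_speedup(n, p):
    # T_p ~ n/p + log2(p); S = n / T_p
    return n / (n / p + math.log2(p))

# n >> p: near-linear speedup, close to p
big = modeled_speedup(2**20, 32)

# n comparable to p: the log p term dominates and speedup is poor
small = modeled_speedup(64, 32)
```

With n = 2^20 and p = 32 the modeled speedup is very close to 32, whereas with n = 64 it falls well short, illustrating the n ≫ p condition.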
Workload Scaling Models
- When analyzing the scalability of a parallel system, we need to know how the workload scales when the number of processors is increased; there are two well-known workload scaling models:
  - Problem constrained (PC): or Fixed-load Model, i.e., a constant workload or fixed problem size
  - Time constrained (TC): or Fixed-time Model, i.e., the workload grows so that the parallel execution time stays fixed
Problem Constrained (PC) Scaling
- We want to measure the speedup of a parallel program by keeping the problem size (workload) fixed as the size of the parallel machine (number of processors) increases
- This is also called fixed-workload speedup and leads to the well-known Amdahl's law
- We first consider the best speedup that can be obtained, ignoring overheads and communication costs: just a program that contains two parts, one being naturally sequential (fraction β) and the other being naturally parallel (fraction 1 - β)
- Assume sequential runtime T_s and number of processors p; the parallel runtime is
  T_p = β T_s + (1 - β) T_s / p
- The speedup is then
  S = T_s / T_p = 1 / (β + (1 - β)/p) → 1/β when p → ∞
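Amdahl's law is easy to evaluate numerically; the sketch below (parameter values are illustrative) shows how the speedup saturates at 1/β no matter how many processors are added:

```python
# Amdahl's law (fixed problem size): S = 1 / (beta + (1 - beta)/p)

def amdahl_speedup(beta, p):
    # beta: naturally sequential fraction; p: number of processors
    return 1.0 / (beta + (1.0 - beta) / p)

# With beta = 0.1 the speedup can never exceed 1/beta = 10:
s10 = amdahl_speedup(0.1, 10)       # about 5.26
s1000 = amdahl_speedup(0.1, 1000)   # about 9.91, already near the limit
```

Even a thousand processors cannot push the speedup past 10 when 10% of the work is serial.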
Time Constrained (TC) Scaling
- We solve the largest problem size on a parallel machine with about the same execution time as the original problem on a single processor
- When increasing the number of processors, we also increase the problem size such that the parallel execution time T_p remains constant
- The fixed-time speedup leads to the well-known Gustafson's law
- Amdahl's Law prevented designers from exploiting parallelism for many years, because even highly parallel programs contain a small amount of naturally serial code
- In 1988, John Gustafson and Ed Barsis, two Sandia Labs researchers, derived an alternative to Amdahl's law
- The key is in observing that β (sequential fraction), n (problem size) and p (number of processors) are not independent of each other in many practical problems
- Gustafson interpreted their law to mean that parallelism can be used to increase the size of the problem while keeping the execution time constant
[Figure: scaling the problem size with the number of processors, e.g., n = 10,000]
- In the Gustafson-Barsis derivation of fixed-time speedup, the parallel runtime T_p is kept as a constant, i.e., when increasing the number of processors, we also increase the amount of work:
  W' = αW + p(1 - α)W
  where α is the sequential fraction of the program and the parallel part is now a variable proportional to p
- The parallel runtime is then T_p = αW + (1 - α)W = W, i.e., T_p is normalized to W, which is a constant
- For the same amount of work W' when using just one processor, the sequential time is
  T_s = αW + p(1 - α)W
- The fixed-time speedup is
  S = T_s / T_p = α + (1 - α)p = p - α(p - 1)
Example
- Assume the sequential fraction of a program α = 0.1 and the number of processors p = 10; the fixed-time speedup is
  S = p - α(p - 1) = 10 - 0.1 × (10 - 1) = 9.1
- A roughly nine-fold increase in speed is achieved by 10 processors
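The fixed-time speedup formula S = p - α(p - 1) can be checked directly (a minimal sketch using the example's values):

```python
# Gustafson's law (fixed time): S = alpha + (1 - alpha)*p = p - alpha*(p - 1)

def gustafson_speedup(alpha, p):
    # alpha: sequential fraction; p: number of processors
    return p - alpha * (p - 1)

# Example from the slide: alpha = 0.1, p = 10
S = gustafson_speedup(0.1, 10)   # approximately 9.1
```

Unlike Amdahl's law, this speedup grows linearly in p because the parallel portion of the work is scaled up with the machine.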
Efficiency and Overhead
- The efficiency of a parallel program can be written as
  E = 1 / (1 + T_o / T_s)
- For many problems in practice, the total overhead T_o is an increasing function of p
- For a given problem size (i.e., the value of T_s remains constant), as we increase the number of processors, T_o increases and the overall efficiency then goes down
- This is the case for all parallel programs – Amdahl's law!
- However, the total overhead function T_o is a function of both the problem size n and the number of processors p
- In many cases, T_o grows sub-linearly with respect to n; the efficiency then increases if the problem size is increased keeping the number of processors constant
- Therefore, high performance computing should be used to solve large size problems
[Figure: variation of efficiency (a) for a fixed problem size as p increases; (b) for a fixed number of processors as the problem size increases]
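Both trends can be illustrated with a simple model (an assumption for illustration, not a measurement): take T_s = n and T_o = p log2(p), as in the adding-n-numbers example, so E = 1 / (1 + p log2(p)/n):

```python
import math

# Illustrative overhead model: T_s = n, T_o = p*log2(p)
# so E = 1 / (1 + T_o/T_s) = 1 / (1 + p*log2(p)/n)

def model_efficiency(n, p):
    return 1.0 / (1.0 + p * math.log2(p) / n)

# Fixed problem size, growing p: efficiency falls
falling = [model_efficiency(4096, p) for p in (4, 16, 64)]

# Fixed p, growing problem size: efficiency rises
rising = [model_efficiency(n, 64) for n in (4096, 65536, 2**20)]
```

The first list decreases monotonically and the second increases, matching the two panels of the efficiency figure.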
Metric of Scalability
- Scalability of a parallel system: a measure of the system's ability to increase performance as the number of processors increases
- A scalable system maintains efficiency as processors are added
- We can simultaneously increase the problem size and the number of processors to keep the efficiency constant
- Question: what is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed?
- This rate determines the scalability of the parallel system; the slower this rate, the more scalable the system
Metric of Scalability
- Let W be the amount of work: the asymptotic number of operations associated with the best serial algorithm to solve the problem; note W is a function of the problem size n
- Write efficiency as a function of the amount of work W and the overhead T_o
- Setting efficiency to a constant, we obtain the relation between W, p and T_o
- If we know T_o, we obtain the relation between W and p, and thus the relation between n and p
- We can write parallel runtime as:
  T_p = (W + T_o(W, p)) / p
Metric of Scalability
- The resulting expression for speedup:
  S = W / T_p = p W / (W + T_o(W, p))
- and the expression for efficiency:
  E = S / p = W / (W + T_o(W, p)) = 1 / (1 + T_o(W, p) / W)
- Rearranging gives T_o(W, p) / W = (1 - E) / E, i.e., W = (E / (1 - E)) T_o(W, p)
- Keeping a desired efficiency E constant:
  W = K T_o(W, p)   (where K = E / (1 - E))
- Since the total overhead T_o is a function of p and W, and the amount of work W is a function of n, the problem size n can usually be expressed as a function of p by algebraic manipulations
- This function is called the isoefficiency function
- It determines the ease with which a parallel system can maintain a constant efficiency and achieve speedups increasing in proportion to the number of processing elements
Metric of Scalability
- The overhead T_o for the problem of adding n numbers on p processing elements is approximately p log p
- Substituting T_o by p log p, we obtain the isoefficiency function
  W = n = K p log p
- If the number of processing elements is increased from p to p', the problem size (in this case, W = n) must be increased by a factor of (p' log p') / (p log p) to maintain the same efficiency
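The required problem-size growth factor is easy to compute; the constant K cancels (a minimal sketch, assuming log base 2 and illustrative processor counts):

```python
import math

# With isoefficiency W = n = K*p*log2(p), the factor by which the problem
# size must grow when scaling from p to p2 processors is
# (p2*log2(p2)) / (p*log2(p)); K cancels out.

def growth_factor(p, p2):
    return (p2 * math.log2(p2)) / (p * math.log2(p))

# Quadrupling the processors from 8 to 32:
f = growth_factor(8, 32)   # (32*5)/(8*3) = 160/24, about 6.67
```

So quadrupling the machine requires growing the problem by a factor of about 6.67, not 4, to hold efficiency constant.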
Asymptotic Analysis
- Consider the problem of sorting a list of n numbers. The fastest serial programs for this problem run in time Θ(n log n). Consider four parallel algorithms A1, A2, A3, and A4 with the following numbers of processing elements p and parallel runtimes T_p:

  Algorithm   p         T_p
  A1          n^2       1
  A2          log n     n
  A3          n         sqrt(n)
  A4          sqrt(n)   sqrt(n) log n

- If the metric is speed, algorithm A1 is the best, followed by A3, A4, and A2 (in order of increasing T_p)
- In terms of efficiency, A2 and A4 are the best, followed by A3 and A1
- In terms of cost, algorithms A2 and A4 are cost-optimal, while A1 and A3 are not
- It is important to identify the objectives of the analysis and to use appropriate metrics!
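A numeric spot check at one sample size makes the three orderings concrete (the p and T_p choices for A1-A4 are assumed from the standard textbook comparison of these four sorting algorithms, with constant factors dropped; n is arbitrary):

```python
import math

n = 2**16
serial = n * math.log2(n)   # T_s = Theta(n log n), constants dropped

algos = {                   # name: (p, T_p)
    "A1": (n**2, 1),
    "A2": (math.log2(n), n),
    "A3": (n, math.sqrt(n)),
    "A4": (math.sqrt(n), math.sqrt(n) * math.log2(n)),
}

results = {}
for name, (p, t_p) in algos.items():
    s = serial / t_p                        # speedup S = T_s / T_p
    results[name] = {"S": s, "E": s / p, "cost": p * t_p}

# A2 and A4 come out with efficiency 1 and cost Theta(n log n) (cost-optimal),
# while A1 is the fastest but carries by far the highest cost.
```

Printing `results` shows speedup decreasing in the order A1, A3, A4, A2, exactly the "order of increasing T_p" stated above.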
Other Scalability Metrics
- A number of other metrics have also been used, based on the needs of applications
- For real-time applications, the objective is to scale up a system to accomplish a task in a specified time bound
- Some metrics operate at the limit of the memory of the processing elements and estimate performance when the problem size is scaled to fill the available memory
- Scaled speedup is obtained when the problem size is increased with the number of processing elements
- If the scaled speedup is close to linear, the system is considered to have scaled well
- If the isoefficiency function is near linear, the speedup curve is close to linear as well
Summary
- Scalability analysis plays an important role in performance evaluation of large parallel systems
- Important performance metrics: speedup S, efficiency E, total overhead T_o
  S = T_s / T_p = T_s / ((T_s + T_o) / p) = p / (1 + T_o / T_s), and E = S / p = 1 / (1 + T_o / T_s)
- Workload scaling models:
  - Problem constrained (PC): or Fixed-load Model, leads to Amdahl's law
  - Time constrained (TC): or Fixed-time Model, leads to Gustafson's law
- High performance computing should be used to solve large size problems for high efficiency
- Isoefficiency: what is the rate at which the problem size must increase with respect to the number of processing elements to keep the efficiency fixed? This rate determines the scalability; the slower this rate, the better
- Isoefficiency function: the problem size n expressed as a function of the number of processing elements p