COMP5426 Distributed Computing
Efficiency
When using p processors to solve a problem, do we expect a p-times speedup?
Probably not!
Overheads in addition to the computation are introduced in most parallel programs
The overheads include:
Process/thread communication or synchronization
Workload imbalance among available processors/threads
Extra work introduced to manage the computation and parallelism
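These overheads make the parallel time $T_p$ larger than $T_s/p$, so the achieved speedup $S = T_s/T_p$ falls short of $p$; the shortfall is commonly measured by the parallel efficiency (a standard definition, stated here for reference):

$E = \frac{S}{p} = \frac{T_s}{p \, T_p} \le 1$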
Amdahl's Law

Amdahl's law is used to predict the theoretical speedup for parallel processing using multiple processors
It shows that the serial parts of a program impose a hard limit on the potential speedup from parallelizing the program
The total amount of operations for solving a given problem is divided into two parts: one part $\beta$ is purely sequential, and the other is perfectly parallelizable

$S = \frac{T_s}{T_p} = \frac{T_s}{\beta T_s + (1 - \beta)\,T_s / p} = \frac{p}{\beta p + 1 - \beta}$
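As a quick check of the formula, here is a minimal sketch in C (the value of beta and the range of p below are illustrative assumptions, not from the notes): with $\beta = 0.1$, the speedup can never exceed $1/\beta = 10$ no matter how many processors are used.

#include <stdio.h>

/* Amdahl's law: S = p / (beta*p + 1 - beta), where beta is the purely
 * sequential fraction and p is the number of processors. */
double amdahl_speedup(double beta, int p) {
    return (double)p / (beta * p + 1.0 - beta);
}

int main(void) {
    double beta = 0.1;   /* assumed: 10% of the operations are sequential */
    for (int p = 1; p <= 1024; p *= 2)
        printf("p = %4d  speedup = %5.2f\n", p, amdahl_speedup(beta, p));
    /* the printed speedup approaches, but never reaches, 1/beta = 10 */
    return 0;
}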
Parallel Algorithm Design

How a problem specification is translated into an algorithm that displays concurrency, scalability, and locality
Parallel algorithm design is not easily reduced to simple recipes
It requires the sort of integrative thought that is commonly referred to as "creativity"
May need new ideas that have not been studied before
Parallel Algorithm Design

Parallel algorithm design can be divided into two parts
The first part considers the characteristics of the problem to identify the maximum degree of parallelism
Try to find the theoretical potential and applicability – machine independent
The second part considers effective implementation on specific machines
Try to identify the relationship between properties of algorithms and features of architectures
Parallel Algorithm Design

The general design process involves the following stages:
Partitioning: divide a large task into multiple smaller ones which can be executed concurrently – recognizing opportunities for parallel execution
Communication/synchronization: coordinate the execution of concurrent tasks and establish appropriate communication/synchronization structures
Assignment: reorganize tasks and assign them to multiple processes/threads – machine dependent
Partitioning

Expose opportunities for parallel execution
The focus is on defining a large number of small tasks, each of which consists of the computation and the data it operates on
Task partitioning: divide the computation into pieces first, then associate data with each piece of computation
Data partitioning: divide the data into pieces first, then associate computations with the data
Which one should be applied depends on the structure of the given problem
Communication/Synchronization

The tasks generated by a partition are intended to execute concurrently but cannot, in general, execute independently
Data must then be transferred between tasks so as to allow computation to proceed
Communication/synchronization is then required to manage the data transfer and/or coordinate the execution of tasks
Organizing this communication in an efficient manner can be challenging
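A minimal sketch of such coordination, assuming an OpenMP-style shared-memory program (the variable names and thread count are illustrative): tasks first compute independent partial results, then a barrier ensures every result is written before one task reads them all.

#include <omp.h>
#include <stdio.h>

int main(void) {
    double partial[4] = {0.0};
    #pragma omp parallel num_threads(4)
    {
        int t = omp_get_thread_num();
        partial[t] = (double)t;    /* independent, concurrent work */
        #pragma omp barrier        /* synchronization: wait for all tasks */
        #pragma omp single         /* one task reads the others' results */
        {
            double sum = 0.0;
            for (int i = 0; i < 4; i++)
                sum += partial[i];
            printf("sum = %.1f\n", sum);
        }
    }
    return 0;
}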
Partitioning Example – MV Multiplication

Task partitioning: computing each element of the output vector y (the inner product of one row of A with vector b) is a task
Observations:
Task size is uniform
No dependences between tasks
Embarrassingly parallel
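A minimal sketch of this task partitioning in C with OpenMP, assuming y = Ab with an n-by-n matrix A stored in row-major order (the names n, A, b, y are illustrative):

/* Each task computes one element y[i] as the inner product of row i of A
 * with b; tasks are independent, so the loop needs no synchronization. */
void mv_multiply(int n, const double *A, const double *b, double *y) {
    #pragma omp parallel for          /* one row-task per iteration */
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i * n + j] * b[j];
        y[i] = sum;                   /* each task writes only its own y[i] */
    }
}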
Partitioning Example – MV Multiplication

Data partitioning: we can further partition each row of A and vector b for each inner product and then associate one multiplication with each pair of data items as a task
There are also other ways of partitioning
Observations:
Task size is uniform
But there are dependences between tasks for the inner product of two vectors!
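A minimal sketch of the data-partitioned inner product (the names row, b, n are illustrative): each task produces one partial product, and the dependence between tasks is the final summation, handled here with a reduction.

/* Data partitioning of one inner product y_i = row . b: one multiplication
 * per (row[j], b[j]) pair; the partial products must then be combined. */
double inner_product(int n, const double *row, const double *b) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)  /* combines partial products */
    for (int j = 0; j < n; j++)
        sum += row[j] * b[j];
    return sum;
}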
To design parallel algorithms for solving a given problem, typically the first step is to detect the parallel structures of the sequential algorithm
If the algorithm's parallel structure is determined, many subsequent decisions become obvious
Task-Dependency Graph

A task-dependency graph is used to identify a program's parallel structures
In a task-dependency graph each node represents a task and the arrows between nodes indicate dependencies between tasks; the node set and edge set capture features of the computation
It is often enough to present one small graph constructed from a small input
The main purpose is to show the parallel structure and to demonstrate various properties of the algorithm, e.g., regularity of the tasks, for performance prediction
Task-dependency graph example – MV multiplication

Arrows indicate additions (data dependency)
Each vertical arrow–node line represents an output of the dot product $y_i$, computed sequentially
Another example – matrix multiplication: note that all output elements can be computed concurrently without dependency
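A minimal sketch of this observation in C with OpenMP, assuming C = AB with n-by-n row-major matrices (the names n, A, B, C are illustrative): every output element depends only on row i of A and column j of B, so all n*n element-tasks can run concurrently.

/* One independent task per output element C[i][j]. */
void matmul(int n, const double *A, const double *B, double *C) {
    #pragma omp parallel for collapse(2)   /* all output elements in parallel */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}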
Another example:

/* The loop in the notes is cut off after "for (i=0; i"; the body shown
 * here is an assumed completion: independent iterations that can
 * execute concurrently. */
for (i = 0; i < n; i++)
    c[i] = a[i] + b[i];   /* assumed body – no loop-carried dependence */