COMP5426 Distributed Computing
Load Balancing
In practice, the actual speedup for solving a problem is often smaller than the ideal speedup, and it becomes worse as the number of processors increases
[Figure: actual vs. ideal speedup as the number of processors increases]
Typical sources of inefficiency:
Poor single processor performance
Too much parallelism overhead (synchronization, communication)
Load imbalance:
  different amounts of work across processors (computation and communication)
  different speeds of computing resources
Recall Amdahl’s law: a small amount of load imbalance can significantly bound the maximum speedup
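As a quick illustration of this point, here is a minimal numeric sketch (not from the slides) showing how the most heavily loaded worker bounds the achievable speedup; the numbers are made up.

```python
# Minimal numeric sketch (not from the slides): the parallel time is assumed
# to be dominated by the most heavily loaded worker.

def speedup(work_per_worker):
    """Speedup = total work / time of the slowest (most loaded) worker."""
    total = sum(work_per_worker)
    return total / max(work_per_worker)

p = 8
balanced   = [100] * p                # perfectly balanced
imbalanced = [100] * (p - 1) + [200]  # one worker gets twice the work

print(speedup(balanced))    # 8.0, the ideal speedup
print(speedup(imbalanced))  # 4.5, far below the ideal 8
```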
We need strategies to balance the workload across the processes/threads
The strategies can be classified into two categories:
Static task assignment
Dynamic task assignment (runtime assignment, similar to job scheduling)
A minimal sketch contrasting the two categories follows.
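The following is a hedged sketch of the two categories, assuming 12 independent tasks and 4 workers; the names (`static_map`, `work_queue`, `next_task`) and the cyclic distribution are illustrative, not from the lecture.

```python
# A hedged sketch of the two categories of task assignment.
from collections import deque

tasks = list(range(12))   # 12 independent tasks
p = 4                     # 4 processes/threads

# Static assignment: the task-to-worker mapping is fixed before execution
# (here a cyclic distribution), so there is no runtime overhead.
static_map = {rank: tasks[rank::p] for rank in range(p)}

# Dynamic assignment: workers repeatedly grab the next available task at run
# time, similar to job scheduling with a shared work queue.
work_queue = deque(tasks)
def next_task():
    return work_queue.popleft() if work_queue else None

print(static_map[0], next_task())   # [0, 4, 8] 0
```

The static mapping costs nothing at run time, while the queue adds some overhead but adapts to tasks of unequal length.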
Static Task Assignment
In static assignment the assignment of tasks to processes/threads is pre-determined
Simple and zero runtime overhead
Static assignment is applicable when the amounts of work for the tasks are predictable
Static assignment is directly related to the quality of task partitioning
e.g., dense matrix computation and regular meshes can be easily decomposed into a number of tasks of equal size with a regular data dependency pattern – task assignment is relatively easy
Some other problems, e.g., sparse matrices, unstructured meshes, and graphs, are more complicated; we need to consider how to balance the workload, minimize communication overhead, and avoid redundant computation
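For the regular case mentioned above, a static decomposition is straightforward; this is a small sketch assuming a dense matrix distributed by contiguous row blocks (the function name `row_blocks` is illustrative).

```python
# Sketch: static row-block assignment for a dense matrix computation,
# the "easy" regular case mentioned above.

def row_blocks(n_rows, p):
    """Split n_rows as evenly as possible into p contiguous blocks."""
    base, extra = divmod(n_rows, p)
    blocks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

print(row_blocks(10, 4))  # [range(0, 3), range(3, 6), range(6, 8), range(8, 10)]
```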
Graph Partitioning
Graph partitioning: partitioning a graph into a number of smaller graphs
It has a wide range of applications in practice – many problems can be reformulated as a graph problem, e.g., scientific computing, VLSI circuit design, computer graphics, and web graphs
We are interested in large computation/analysis using parallel machines; thus graph partitioning is used as a pre-processing step
Graph Partitioning
Many methods for solving partial differential equations are grid-oriented, e.g., the 2D space is divided into a number of triangles
Each triangle, represented by a vertex in the corresponding dependency graph (or mesh), has a state
The state update of a triangle involves the states of the neighbouring triangles, represented by edges connecting the vertices in the graph
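To make the mesh-to-graph idea concrete, here is a small sketch that builds the dependency graph of a toy triangular mesh; the input format (triangles given as triples of corner point indices) is an assumption for illustration.

```python
# Sketch: triangles become graph vertices; two triangles that share a mesh
# edge become neighbours in the dependency graph.
from itertools import combinations
from collections import defaultdict

triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4)]  # tiny example mesh

edge_to_tris = defaultdict(list)
for t, corners in enumerate(triangles):
    for mesh_edge in combinations(sorted(corners), 2):
        edge_to_tris[mesh_edge].append(t)

graph = defaultdict(set)   # vertex = triangle, edge = shared mesh edge
for tris in edge_to_tris.values():
    for a, b in combinations(tris, 2):
        graph[a].add(b)
        graph[b].add(a)

print(dict(graph))  # {0: {1}, 1: {0, 2}, 2: {1}}
```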
Definition of Graph Partitioning
Given a graph, partition its vertices into a number of roughly equal-weight subsets such that the number (or total weight) of edges connecting different subsets – the edge cut – is minimized

Partitioning with Geometric Information
There are several partitioning methods to solve the problem when geometric information is available
e.g., when the graph is derived from some structure in 2D or 3D space (e.g., PDEs), geometric coordinates can be attached to the vertices and therefore the geometric layout of the graph is known
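A minimal sketch of the edge-cut measure from the definition above, on a made-up 4-vertex graph:

```python
# Sketch: the edge cut of a 2-way partition (illustrative graph and names).

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
part  = {0: 0, 1: 0, 2: 1, 3: 1}   # vertex -> partition id

edge_cut = sum(1 for u, v in edges if part[u] != part[v])
print(edge_cut)  # edges (1,2), (3,0) and (1,3) cross the cut -> 3
```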
Coordinate Bisection
Coordinate bisection is the simplest coordinate-based partitioning method
We may simply draw a hyperplane orthogonal, say, to the x-coordinate which divides the points into two halves
For recursive bisection we alternately bisect along the x-, y- and z-coordinates
This leads to a better aspect ratio and hopefully a lower edge cut size
A small sketch of recursive coordinate bisection follows.
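This is a simplified sketch of recursive coordinate bisection in 2D, alternating the splitting axis at each level; it is only an illustration under the assumption of a point list with (x, y) coordinates, not the exact lecture algorithm.

```python
# Sketch: recursive coordinate bisection, alternating the splitting axis.

def rcb(points, ids, depth, axis=0):
    """Return a dict id -> partition label after `depth` levels of bisection."""
    if depth == 0:
        return {i: 0 for i in ids}
    # Sort by the chosen coordinate and split into two equal halves
    order = sorted(ids, key=lambda i: points[i][axis])
    mid = len(order) // 2
    left  = rcb(points, order[:mid], depth - 1, 1 - axis)
    right = rcb(points, order[mid:], depth - 1, 1 - axis)
    n_left = 1 << (depth - 1)          # number of parts in the left half
    return {**left, **{i: p + n_left for i, p in right.items()}}

points = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.9), (0.8, 0.1)]
print(rcb(points, list(range(len(points))), depth=2))
# each point lands in its own "quadrant" part, e.g. {0: 0, 2: 1, 3: 2, 1: 3}
```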
Coordinate Bisection
Disadvantage – coordinate dependence
The same structure may be partitioned quite differently in different coordinate systems
e.g., the partition in (b) of the example figure is not good as the edge cut set is not minimized in this simple example
Inertial Bisection
In the (x, y) system let point P have polar coordinates (r, α), and (r, α − θ) in the (x′, y′) system rotated by angle θ. We have
x = r cos α and y = r sin α
x′ = r cos(α − θ) = r cos α cos θ + r sin α sin θ = x cos θ + y sin θ
y′ = r sin(α − θ) = r sin α cos θ − r cos α sin θ = −x sin θ + y cos θ
So, rather than explicitly rotating the x- and y-coordinates, we need to find the line L directly, i.e., its unit normal u = (v, w) with v² + w² = 1. We then know the distance dᵢ of each mesh point from L, dᵢ = v(xᵢ − x̄) + w(yᵢ − ȳ), and we have
Σ dᵢ² = v² Σ(xᵢ − x̄)² + 2vw Σ(xᵢ − x̄)(yᵢ − ȳ) + w² Σ(yᵢ − ȳ)² = v² s_xx + 2vw s_xy + w² s_yy
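The minimisation above can be done with a small eigenvalue problem; this is a sketch assuming numpy is available, where the normal (v, w) is taken as the eigenvector of the 2×2 matrix [[s_xx, s_xy], [s_xy, s_yy]] with the smallest eigenvalue, and the points are then split at the median of their projections onto L. The point data is made up.

```python
# Sketch of inertial bisection in 2D following the formula above.
import numpy as np

pts = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.2], [4.0, 0.0]])

c = pts.mean(axis=0)                  # centroid (x̄, ȳ)
d = pts - c
M = d.T @ d                           # [[s_xx, s_xy], [s_xy, s_yy]]

# The normal u = (v, w) minimising Σdᵢ² = v²s_xx + 2vw·s_xy + w²s_yy is the
# eigenvector of M with the smallest eigenvalue; the line L is orthogonal to it.
vals, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
line_dir = vecs[:, np.argmax(vals)]   # direction of L (largest spread)

# Bisect with the hyperplane orthogonal to L through the median projection.
proj = d @ line_dir
part = (proj > np.median(proj)).astype(int)
print(part)                           # roughly half the points on each side
```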
Partitioning without Geometric Information
The coordinate-based methods, such as inertial bisection, require geometric information about the vertices
In many practical problems, however, the vertices of the graph don't have geometric information, e.g., the graph model of the WWW
The Kernighan-Lin (or K-L) algorithm does not create partitions but rather improves them iteratively
It takes a kind of local view of the problem, as it only considers neighbouring vertices to improve an initial partition
Thus it nicely complements algorithms which have a global view of the problem but tend to neglect local characteristics, such as inertial bisection and spectral bisection
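To show the flavour of K-L-style refinement, here is a much-simplified sketch that greedily swaps one vertex pair at a time to reduce the edge cut; the real algorithm uses gain tables and tentative swap sequences, so treat this only as an illustration (graph and names are made up).

```python
# Simplified, K-L-flavoured refinement: repeatedly swap the vertex pair
# (one from each side) that most reduces the edge cut, until no swap helps.

def edge_cut(adj, part):
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])

def kl_like_refine(adj, part):
    improved = True
    while improved:
        improved = False
        base = edge_cut(adj, part)
        best = None
        left  = [v for v in adj if part[v] == 0]
        right = [v for v in adj if part[v] == 1]
        for a in left:
            for b in right:
                part[a], part[b] = 1, 0          # tentative swap
                gain = base - edge_cut(adj, part)
                part[a], part[b] = 0, 1          # undo
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, a, b)
        if best is not None:
            _, a, b = best
            part[a], part[b] = 1, 0              # commit the best swap
            improved = True
    return part

adj  = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}      # poor initial bisection
print(edge_cut(adj, part), "->", edge_cut(adj, kl_like_refine(adj, part)))  # 5 -> 1
```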
Multilevel Partitioning
In many applications graph partitioning is just a first pre-processing step
Many partitioning algorithms have been proposed that find a good partition, e.g., spectral methods; however, these methods are very expensive, especially for large graphs
Multilevel partitioning is used to speed up the computation
It reduces the size of the graph gradually by collapsing vertices and edges, partitions the small graph, and then uncoarsens it to construct a partition for the original graph
Multilevel Partitioning
Multilevel algorithms have three phases:
Coarsening
Initial partition
Refinement
A structural sketch of the three phases follows.
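This structural sketch takes the phase implementations as functions and assumes the graph is a dict of adjacency sets; all names (`multilevel_partition`, `coarsen`, `initial_partition`, `refine`, `coarse_size`) are illustrative, not from a real library.

```python
# Structural sketch of the three multilevel phases.

def multilevel_partition(graph, coarsen, initial_partition, refine, coarse_size=100):
    levels, maps = [graph], []

    # 1. Coarsening: shrink the graph until it is small enough
    while len(levels[-1]) > coarse_size:
        smaller, vertex_map = coarsen(levels[-1])   # vertex_map: fine vertex -> multinode
        levels.append(smaller)
        maps.append(vertex_map)

    # 2. Initial partition of the coarsest graph (any algorithm can be used)
    part = initial_partition(levels[-1])

    # 3. Uncoarsening: project the partition back level by level and refine it
    for finer, vertex_map in zip(reversed(levels[:-1]), reversed(maps)):
        part = {v: part[vertex_map[v]] for v in finer}
        part = refine(finer, part)
    return part
```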
Multilevel Partitioning
In the coarsening phase, a sequence of smaller graphs, each with fewer vertices, is constructed
It can be achieved by finding a maximal matching and collapsing the matched vertices into a multinode
A matching of a graph is a set of edges, no two incident on the same vertex
If a matching contains all possible such edges (desirable for graph coarsening), it is called a maximal matching
Multilevel Partitioning
For the bisection of a coarser graph to be good with respect to the original graph:
The weight of a multinode is set equal to the sum of the weights of the vertices collapsed into the multinode
The edges of a multinode are the union of the edges of the vertices collapsed into the multinode
A sketch of such a coarsening step follows.
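This is a sketch of one coarsening step under these rules: a greedy maximal matching, then collapsing each matched pair into a multinode whose weight is the sum of the pair's weights and whose edges are the union of the pair's edges. The data layout (adjacency dict, weight dict, tuples as multinode names) is an assumption for illustration.

```python
# Sketch: greedy maximal matching + collapse into weighted multinodes.

def coarsen(adj, w):
    matched, pair = set(), {}
    for u in adj:                                   # greedy maximal matching
        if u in matched:
            continue
        for v in sorted(adj[u]):
            if v not in matched:
                pair[u] = pair[v] = (u, v)
                matched |= {u, v}
                break
    group = {u: pair.get(u, (u,)) for u in adj}     # vertex -> its multinode
    coarse_adj = {m: set() for m in set(group.values())}
    coarse_w   = {m: sum(w[x] for x in m) for m in coarse_adj}   # sum of weights
    for u in adj:                                   # union of edges, self-loops dropped
        for v in adj[u]:
            if group[u] != group[v]:
                coarse_adj[group[u]].add(group[v])
    return coarse_adj, coarse_w, group

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
w   = {v: 1 for v in adj}
cadj, cw, group = coarsen(adj, w)
print(cw)   # e.g. {(0, 1): 2, (2, 3): 2}
```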
Multilevel Partitioning
In the partitioning phase, when the graph is coarse enough, it can be partitioned
Any partitioning algorithm can be used
It is fast and efficient as the size of the graph is small
Since the vertex weights and edge weights are preserved in the coarsening phases, the bisection of the coarser graph is relatively easy to be refined with respect to the original graph
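As a stand-in for "any partitioning algorithm", here is a tiny greedy bisection of the coarsest graph that only balances the preserved vertex weights; it ignores the edge cut, so it is purely illustrative.

```python
# Sketch: a weight-balanced initial bisection of the (small) coarsest graph.

def initial_bisection(weights):
    """Greedy: assign heaviest multinodes first to the lighter side."""
    part, load = {}, [0, 0]
    for v in sorted(weights, key=weights.get, reverse=True):
        side = 0 if load[0] <= load[1] else 1
        part[v] = side
        load[side] += weights[v]
    return part

coarse_weights = {"A": 3, "B": 2, "C": 2, "D": 1}   # multinode weights
print(initial_bisection(coarse_weights))  # {'A': 0, 'B': 1, 'C': 1, 'D': 0}
```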
Multilevel Partitioning
Given the partition of the coarser graph, we need to construct a partition of the finer graph
In the refinement phase we now have more degrees of freedom to find a better partition (improved cut size and balance)
The refinement can be done by using one of the variants of the K-L algorithm
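A small sketch of the projection step that precedes refinement, reusing the vertex-to-multinode map from the coarsening sketch above (names and data are illustrative).

```python
# Sketch: project a coarse partition back to the finer graph, then refine.

coarse_part = {(0, 1): 0, (2, 3): 1}          # partition of the coarser graph
group       = {0: (0, 1), 1: (0, 1), 2: (2, 3), 3: (2, 3)}   # vertex -> multinode

finer_part = {v: coarse_part[group[v]] for v in group}
print(finer_part)   # {0: 0, 1: 0, 2: 1, 3: 1}; now refine with a K-L variant
```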
There often is a trade-off between the running time and the quality of the solution
Some algorithms run quite fast but produce only a solution of medium quality
Some others take a long time but deliver excellent solutions; even others can be tuned between both extremes
The choice of time vs. quality depends on the intended application
For VLSI design we prefer quality, as even a slightly better solution can save real money
For sparse matrix computation we are mainly interested in time: the execution time for the graph partitioning has to be less than the time saved by the parallel matrix computation
Thus there is no single best solution for all situations