
COMP5426 Distributed
Load Balancing



In practice, the actual speedup for solving a problem is often smaller than the ideal speedup, and it becomes worse as the number of processors increases

Typical Sources of Inefficiency

 Poor single processor performance
 Too much parallelism overhead
 Synchronization, communication
 Load imbalance
 Different amounts of work across processors/threads
 Different speeds of computing resources

Recall Amdahl’s law: a small amount of load imbalance can significantly bound the maximum speedup
We need strategies for balancing the workload across the processes/threads
The strategies can be classified into two categories:
 Static task assignment
 Dynamic task assignment (similar to job scheduling)

Static Task Assignment

In static assignment the assignment of tasks to processes/threads is pre-determined
 Simple and zero runtime overhead
 Static assignment is applicable when the amount of work per task is predictable
The quality of the assignment is directly related to the quality of task partitioning
 Some problems, e.g., dense matrices and regular meshes, can be easily decomposed into a number of tasks of equal size with a regular data dependency pattern – task assignment is relatively easy
 Some other problems, e.g., sparse matrices, unstructured meshes, and graphs, are more complicated – we need to consider how to balance the workload and minimize communication overhead and redundant computation
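
As a concrete illustration of static assignment in the easy, regular case, the sketch below (the function name `block_ranges` is ours, not from the lecture) computes a block distribution of n equal-size tasks over p workers, so the load differs by at most one task:

```python
# Illustrative sketch: static block assignment of n equal-size tasks to
# p workers. Each worker receives floor(n/p) or ceil(n/p) tasks, so the
# imbalance is at most one task.

def block_ranges(n, p):
    """Worker i handles tasks in the half-open range ranges[i] = (start, end)."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)  # first `extra` workers get one more
        ranges.append((start, start + size))
        start += size
    return ranges

print(block_ranges(10, 4))  # → [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Because the assignment is computed once before execution, it adds zero runtime overhead, which is exactly the appeal of static assignment noted above.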

Graph Partitioning

Graph partitioning divides a graph into a number of smaller graphs
It has a wide range of applications in practice, as many problems can be reformulated as graph problems
 e.g., scientific computing, VLSI circuit design, computer graphics, and web graphs
We are interested in large computation/analysis using parallel machines; thus graph partitioning is a pre-processing step

Graph Partitioning

Many methods for solving partial differential equations are grid-oriented:
 The 2D space is divided into a number of triangles
 Each triangle, represented by a vertex in the corresponding dependency graph (or mesh), has a state
 The state update of a triangle involves the states of the neighbouring triangles, represented by edges connecting the vertices

Definition of Graph Partitioning

Partition the vertices of a graph into a given number of disjoint subsets of (roughly) equal weight such that the number (or total weight) of edges between different subsets – the edge cut – is minimized

Partitioning with Geometric Information

There are several partitioning methods to solve the problem when geometric information is available
 e.g., when the graph is derived from some structure in 2D or 3D space (e.g., PDEs), geometric coordinates can be attached to the vertices and therefore the geometric layout of the graph is known

Coordinate Bisection

Coordinate bisection is the simplest coordinate-based partitioning method
 We may simply draw a hyperplane orthogonal, say, to the x-coordinate which divides the vertices into two halves
 For recursive bisection we alternately bisect along the x-, y- (and z-) coordinates
 This leads to a better aspect ratio and hopefully a lower edge cut size
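
A minimal sketch of recursive coordinate bisection (the function name and point representation are our own choices, not from the lecture). At each level the points are split at the median of the current axis, alternating between x and y, which yields 2**levels parts of (almost) equal size:

```python
# Recursive coordinate bisection sketch: split 2D points at the median
# of the current coordinate axis, alternating axes at each level.

def coordinate_bisection(points, levels, axis=0):
    if levels == 0:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])  # order along the cutting axis
    mid = len(pts) // 2                          # median split keeps the halves balanced
    nxt = 1 - axis                               # alternate between x (0) and y (1)
    return (coordinate_bisection(pts[:mid], levels - 1, nxt) +
            coordinate_bisection(pts[mid:], levels - 1, nxt))

parts = coordinate_bisection([(0, 0), (1, 0), (0, 1), (1, 1)], levels=2)
print([len(part) for part in parts])  # → [1, 1, 1, 1]
```

Splitting at the median guarantees balance; the edge cut, however, depends entirely on how the geometry happens to align with the coordinate axes.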

Coordinate Bisection

Disadvantage – coordinate dependence
 The same structure may be partitioned quite differently in different coordinate systems
 e.g., the partition in (b) is not good as the edge cut set is not minimized in this simple example

In the $(x, y)$ system let point $P$ have polar coordinates $(r, \alpha)$, and $(r, \alpha - \theta)$ in the $(x', y')$ system rotated by angle $\theta$. We have
$$x = r\cos\alpha, \qquad y = r\sin\alpha$$
and
$$x' = r\cos(\alpha - \theta) = r\cos\alpha\cos\theta + r\sin\alpha\sin\theta = x\cos\theta + y\sin\theta$$
$$y' = r\sin(\alpha - \theta) = r\sin\alpha\cos\theta - r\cos\alpha\sin\theta = -x\sin\theta + y\cos\theta$$

Inertial Bisection

Instead of explicitly rotating the x- and y-coordinates, we need to find the line $L$ through the centre of mass $(\bar{x}, \bar{y})$ with unit normal $u = (v, w)$, $v^2 + w^2 = 1$, that minimizes the sum of the squared distances $d_i$ of the mesh points from $L$. With $d_i = v(x_i - \bar{x}) + w(y_i - \bar{y})$ we have
$$\sum_i d_i^2 = v^2\sum_i (x_i - \bar{x})^2 + 2vw\sum_i (x_i - \bar{x})(y_i - \bar{y}) + w^2\sum_i (y_i - \bar{y})^2 = v^2 s_{xx} + 2vw\, s_{xy} + w^2 s_{yy}$$
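
One way to carry out this computation is sketched below in pure Python (the function name and the median split are our own choices). It uses the standard linear-algebra fact that the unit vector minimising the quadratic form $v^2 s_{xx} + 2vw\,s_{xy} + w^2 s_{yy}$ is the eigenvector of the smaller eigenvalue of the 2x2 matrix $[[s_{xx}, s_{xy}], [s_{xy}, s_{yy}]]$:

```python
import math

# Inertial bisection sketch: form s_xx, s_xy, s_yy, take the unit normal
# u = (v, w) minimising v^2*s_xx + 2*v*w*s_xy + w^2*s_yy (eigenvector of
# the smaller eigenvalue of [[s_xx, s_xy], [s_xy, s_yy]]), then split the
# points at the median of their projections onto the line L.

def inertial_bisection(points):
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    syy = sum((y - ybar) ** 2 for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    # Smaller eigenvalue of the 2x2 matrix, in closed form
    lam = (sxx + syy - math.hypot(sxx - syy, 2 * sxy)) / 2
    if abs(sxy) > 1e-12:
        v, w = -sxy, sxx - lam       # solves (sxx - lam)*v + sxy*w = 0
    else:
        v, w = (1.0, 0.0) if sxx < syy else (0.0, 1.0)
    norm = math.hypot(v, w)
    v, w = v / norm, w / norm
    # Sort by projection onto the line direction (-w, v), split at the median
    pts = sorted(points, key=lambda p: -w * (p[0] - xbar) + v * (p[1] - ybar))
    return pts[:n // 2], pts[n // 2:]
```

For points scattered along a roughly horizontal band, the computed line $L$ is close to the x-axis and the split reduces to a balanced cut at the median x-coordinate, independent of how the original axes were oriented.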

Partitioning without Geometric Information

The coordinate-based methods, such as inertial bisection, require geometric information about the graph
In many practical problems, however, the vertices of the graph don’t have geometric information
 e.g., the graph model of the WWW

The Kernighan-Lin Algorithm

The Kernighan-Lin (or K-L) algorithm does not create partitions but rather improves an initial partition iteratively
It takes a kind of local view of the problem, working on neighbouring vertices
Thus it nicely complements algorithms which have a global view of the problem but ignore local characteristics, such as inertial bisection and spectral bisection
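
To make the idea concrete, here is a deliberately simplified swap-based refinement pass in the spirit of K-L; the real algorithm is more sophisticated (it computes vertex gains, locks swapped vertices, and keeps the best prefix of a whole swap sequence, letting it escape some local minima). All names here are illustrative:

```python
# Simplified illustration of K-L-style refinement: repeatedly swap one
# vertex from each side of the bisection whenever the swap reduces the
# edge-cut size. Swapping a pair keeps the two sides balanced.

def cut_size(edges, side):
    return sum(1 for a, b in edges if side[a] != side[b])

def improving_swap(edges, side):
    base = cut_size(edges, side)
    left = [v for v in side if side[v] == 0]
    right = [v for v in side if side[v] == 1]
    for a in left:
        for b in right:
            side[a], side[b] = 1, 0          # tentative swap
            if cut_size(edges, side) < base:
                return True                   # keep the improvement
            side[a], side[b] = 0, 1          # undo
    return False

def kl_like_refine(edges, side):
    while improving_swap(edges, side):
        pass
    return side

# Two triangles {0,1,2} and {3,4,5} joined by edge (2,3); a poor initial
# bisection with cut size 5 is refined down to the optimal cut size 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
side = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}
kl_like_refine(edges, side)
print(cut_size(edges, side))  # → 1
```

Note how the method starts from a given partition and only improves it locally, which is exactly why it pairs well with a global method that produces the initial partition.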

Multilevel Partitioning

In many applications graph partitioning is just a first pre-processing step to speed up the computation
However, the partitioning methods above are very expensive, especially for large graphs
Multilevel algorithms have been developed to find a partition fast
 The idea is to reduce the size of the graph gradually by collapsing vertices and edges, construct a partition for the small graph, and then uncoarsen it to construct a partition for the original graph

Multilevel Partitioning

Multilevel algorithms have three phases:
 Coarsening
 Initial partition
 Refinement

Multilevel Partitioning

In the coarsening phase, a sequence of smaller graphs, each with fewer vertices, is constructed
 It can be achieved by finding a matching and collapsing the matched vertices into multinodes
 A matching of a graph is a set of edges, no two incident on the same vertex
 If a matching contains all possible such edges (desirable for graph coarsening), it is called a maximal matching
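
An illustrative sketch of one coarsening step (all names are ours): greedily build a maximal matching, then collapse each matched pair into a multinode whose weight is the sum of the pair's weights; the coarse edges are the original edges between different multinodes:

```python
# One coarsening step: greedy maximal matching, then contraction of each
# matched pair into a multinode. Unmatched vertices carry over unchanged.

def coarsen(edges, weight):
    mate = {}
    for a, b in edges:                       # greedy maximal matching
        if a not in mate and b not in mate:
            mate[a], mate[b] = b, a
    # Each vertex is represented by the smaller label of its matched pair
    rep = {v: min(v, mate[v]) if v in mate else v for v in weight}
    coarse_weight = {}
    for v, wv in weight.items():             # multinode weight = sum of weights
        r = rep[v]
        coarse_weight[r] = coarse_weight.get(r, 0) + wv
    # Coarse edges: original edges whose endpoints fall in different multinodes
    coarse_edges = {(min(rep[a], rep[b]), max(rep[a], rep[b]))
                    for a, b in edges if rep[a] != rep[b]}
    return sorted(coarse_edges), coarse_weight

# Path 0-1-2-3 with unit weights: pairs (0,1) and (2,3) are matched and
# collapsed, leaving one coarse edge between the two multinodes.
print(coarsen([(0, 1), (1, 2), (2, 3)], {0: 1, 1: 1, 2: 1, 3: 1}))
# → ([(0, 2)], {0: 2, 2: 2})
```

Applying this step repeatedly yields the sequence of ever-smaller graphs used in the multilevel scheme.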

Multilevel Partitioning

For a bisection of a coarser graph to be good with respect to the original graph:
 The weight of a multinode is set equal to the sum of the weights of the vertices collapsed into the multinode
 The edges of a multinode are the union of the edges of the vertices collapsed into the multinode

Multilevel Partitioning

In the partitioning phase, once the graph is coarse enough, it can be partitioned
 Any partitioning algorithm can be used
 It is fast and efficient as the size of the graph is small
Since the vertex weights and edge weights are preserved in the coarsening process, the bisection of the coarser graph is relatively easy to refine with respect to the original graph

Multilevel Partitioning

In the refinement phase, given the partition of the coarser graph, we project it back to a partition of the finer graph
 We now have more degrees of freedom to find a better partition (improved cut size and balance)
 The refinement can be done by using one of the variants of the K-L algorithm
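
The projection step itself is straightforward: every original vertex simply inherits the side of its multinode representative, after which the projected partition can be refined, e.g., with a K-L pass. A tiny sketch (the `rep` map is the fine-to-coarse vertex map built during coarsening; names are illustrative):

```python
# Uncoarsening sketch: project a coarse bisection back onto the finer graph
# by giving each fine vertex the side of its multinode representative.

def project_partition(rep, coarse_side):
    return {v: coarse_side[r] for v, r in rep.items()}

rep = {0: 0, 1: 0, 2: 2, 3: 2}        # two multinodes: {0, 1} and {2, 3}
coarse_side = {0: 0, 2: 1}            # bisection of the coarse graph
print(project_partition(rep, coarse_side))  # → {0: 0, 1: 0, 2: 1, 3: 1}
```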

There often is a trade-off between the running time and the quality of the solution:
 Some algorithms run quite fast but deliver only a solution of medium quality
 Some others take a long time but deliver excellent solutions
 Even others can be tuned between both extremes
The choice of time vs. quality depends on the intended application
 For VLSI design we prefer quality, as even a slightly better solution can save real money
 For sparse matrix computation we are mainly interested in time: the execution time for the graph partitioning has to be less than the time saved by the parallel matrix computation
Thus there is no single best solution for all situations
