Distributed Computing, COMP 4001
Distributed Connections
Evangelos Kranakis, Carleton University, SCS (November 16, 2020)
Outline
• Trees
• Canonical Form
• Distributed Views
• Broadcast
• BFS/DFS
• Flooding
• Convergecast
• Applications
Trees
Trees and Communication
• Trees are everywhere: saplings, rivers, chemical compounds.
– There is something about their efficiency and economy.
• Trees form a natural communication structure in distributed computing.
Main Concepts on Trees
• A tree is a connected graph that has no cycles.
• Start with the tree of one vertex: we can build up any tree we wish by successively adding a new edge and a new vertex.
– At each stage, the number of vertices exceeds the number of edges by 1, so every tree with n vertices has exactly n − 1 edges.
Characterization of Trees
• Let T be a graph with n vertices. Then the following statements are equivalent.
1. T is connected and has no cycles.
2. T has n − 1 edges and has no cycles.
3. T is connected and has n − 1 edges.
4. T is connected and the removal of any edge disconnects T .
5. Any two vertices of T are connected by exactly one path.
6. T contains no cycles, but the addition of any new edge creates a cycle.
Spanning Trees
• Let G be a connected graph. Then a spanning tree in G is a subgraph of G that includes every vertex and is also a tree.
• A graph . . .
• . . . and possible spanning trees
• Spanning trees emerge naturally in communication.
Forests
• A Forest is a collection of vertex disjoint trees. The trees of the forest form its connected components.
• A Spanning Forest is a collection of vertex disjoint spanning trees.
• Forests arise naturally in clustering.
• Example: if a forest {T1, T2, . . . , Tk} consists of vertex-disjoint trees where tree Ti has ti vertices, then the forest has n = t1 + t2 + · · · + tk vertices and (t1 − 1) + (t2 − 1) + · · · + (tk − 1) = n − k edges.
Canonical Form
How Do Nodes Build their Knowledge?
• They learn by exchanging messages in rounds.
– After k stages (rounds), a node has collected information from nodes up to k hops away.
• At the same time, different nodes learn different things!
Information Growth and Knowledge Discovery
• Start from node u
• When does the growth stop?
– It stops when the information has reached the farthest nodes reachable from u.
In a Line Graph
• In a typical synchronous distributed algorithm, each node executes the atomic actions send → receive → process in synchronous rounds.
• Node v, by exchanging messages in rounds, receives information about nodes at distance 1, 2, 3, . . .
• This is how a node builds its knowledge of the graph. The method is not limited to the line: it works in any connected graph.
Canonical Form
• Assume that initially, all nodes only know their own identifier and potentially some additional input.
• Information needs at least r rounds to travel r hops.
• After r rounds, a node v can only learn about other nodes at
distance at most r.
• If message size and local computations are not restricted, it is in fact not hard to see that
– in r rounds, a node v can learn exactly all the node labels and inputs up to distance r from v.
• This allows us to transform every deterministic r-round synchronous algorithm into a simple canonical form.
• Note: the amount of information collected, and hence the number and size of messages, can be exponential in r (e.g., 2^0 + 2^1 + · · · + 2^r = 2^{r+1} − 1).
Cumulative Messages
• The idea is to “simplify communication” with cumulative messages
• A typical synchronous distributed algorithm at each node consists of a sequence of executions send → receive → process in synchronous rounds.
• Often what matters is the source and the destination.
• Can we first do a sequence of r executions “send → receive” followed by a single “process” at the end?
• In other words, can we send “cumulative” messages for r rounds and finally do the processing?
Example: Computing the Sum in a Ring
• Consider a ring of n nodes with identifiers ID_i and weights w_i at each node, for i = 1, 2, . . . , n (indices taken mod n).
• In a typical distributed computation for a node i: for r rounds do
1. i sends the pair (ID_i, w_i) to i + 1, and receives the pair (ID_{i−1}, w_{i−1}) from i − 1,
2. process by adding w_i + w_{i−1}.
• This can be done in a cumulative manner at i as follows:
1. for r rounds, i sends everything it has collected so far, i.e. (ID_i, w_i), (ID_{i−1}, w_{i−1}), . . . , to i + 1, and in total receives (ID_{i−1}, w_{i−1}), (ID_{i−2}, w_{i−2}), . . . , (ID_{i−r}, w_{i−r}) from i − 1,
2. process only at the end, by adding w_{i−1} + w_{i−2} + · · · + w_{i−r}.
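For concreteness, here is a minimal Python simulation of this cumulative variant; the function name ring_cumulative_sum, the list-of-pairs message format, and the explicit round loop are illustrative choices rather than part of the original slide:

def ring_cumulative_sum(weights, r):
    """Simulate r synchronous rounds on a ring of n nodes.

    Each node i repeatedly forwards every (ID, weight) pair it has seen so
    far to node i+1; only after the r rounds does it process (sum) the
    weights collected from nodes i-1, i-2, ..., i-r.
    """
    n = len(weights)
    # knowledge[i] holds the (ID, weight) pairs node i has accumulated,
    # starting with its own pair.
    knowledge = [[(i, weights[i])] for i in range(n)]

    for _ in range(r):
        # All sends happen "simultaneously": take a snapshot of the messages...
        messages = [list(knowledge[i]) for i in range(n)]
        # ...then every node i receives the cumulative message from node i-1.
        for i in range(n):
            for pair in messages[(i - 1) % n]:
                if pair not in knowledge[i]:
                    knowledge[i].append(pair)

    # Processing happens only once, at the end (canonical form).
    return [sum(w for (j, w) in knowledge[i] if j != i) for i in range(n)]

if __name__ == "__main__":
    # After r = 2 rounds, node i knows the weights of nodes i-1 and i-2.
    print(ring_cumulative_sum([1, 2, 3, 4, 5], r=2))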
Synchronous Algorithm: Canonical Form
• Synchronous Algorithm Canonical Form
1. In r rounds: send complete initial state to nodes at distance at most r /* all the communication first */
2. Compute output based on complete information about r-neighborhood /* do all the computation in the end */
• Example: information “moves” in waves: r = 0, 1, 2, 3, 4, . . .
– We are constrained only by memory.
Main Claim on Canonical Form
• Theorem 1 If message size and local computations are not bounded, every deterministic, synchronous r-round algorithm can be transformed into an algorithm having the canonical form (i.e., it is possible to first communicate for r rounds and then do all the computations in the end).
• Notice the importance of being able to transmit messages of arbitrary size:
– this size will depend on the number of rounds r, and
– it can be exponential in r.
• To handle “large size messages” you need “large memory”
• Informally: a distributed system “can be anything I want it to be”, provided that the message size and memory requirements are not exceeded.
Main Argument
• Consider an r-round algorithm A. We want to show that A can be brought to a canonical form.
• First, let the nodes communicate for r rounds.
• Assume that in every round, every node sends its complete
state to all of its neighbors.
• By induction, after i rounds, every node knows the initial state
of all other nodes at distance at most i.
• Hence, after r rounds, a node v has the combined initial
knowledge of all the nodes in its r-neighborhood.
• We can show that this suffices to simulate locally (at node v) enough of Algorithm A to compute all the messages that v receives in the r communication rounds of a regular execution of Algorithm A.
Issues
• It is straightforward to generalize the canonical form to randomized algorithms:
– Every node first computes all the random bits it needs throughout the algorithm.
• The random bits are then part of the initial state of a node.
• In the canonical form of a distributed computation, we aggregate the messages into trees.
Distributed Views
Views: Undirected Networks
• Each node has port labels and can build a view accumulating its knowledge.
• The view of depth k of a node is a tree containing information on all the walks of length k leaving that node.
• Views contain all the information that nodes could obtain by exchanging messages with their neighbors.
r-View (or r-Hop View or r-Neighborhood)
• The collection of initial states of all nodes in the r-neighborhood of a node v is called the r-hop view (or neighborhood) of v.
– For a given graph G, it is denoted by V_r^G(v) or N_r^G(v).
– We usually omit mention of G (when clear from the context) and denote it by V_r(v) or N_r(v).
• A view can be enriched as needed: you can augment it with any information you want, including information
– on node states,
– on the topology up to r hops away from the source v,
– etc.
Example: r-Hop Views for r = 0, 1, 2, 3
Issues
• Assume that initially, every node knows its degree, its label (identifier) and potentially some additional input.
• The r-hop view of a node v then includes
– the complete topology of the r-neighborhood,
– possibly edges between nodes at distance r in the subgraph, and
– the labels and additional inputs of all nodes in the r-neighborhood.
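For intuition, here is a small centralized sketch that extracts an r-hop view from a global description of the graph; the adjacency-dict representation, the inputs dict, and the name r_hop_view are illustrative assumptions (a real node would assemble the same information through r rounds of message exchange):

from collections import deque

def r_hop_view(adj, inputs, v, r):
    """Return the r-hop view of node v: the initial states of all nodes
    within distance r, plus the edges of the induced subgraph.

    adj    : dict mapping node -> iterable of neighbours (undirected graph)
    inputs : dict mapping node -> initial state (label, degree, input, ...)
    """
    # BFS out to distance r to find the r-neighbourhood.
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)

    nodes = set(dist)
    states = {u: inputs[u] for u in nodes}
    # Topology restricted to the neighbourhood. Note this keeps edges between
    # two nodes that are both at distance exactly r; whether a node actually
    # learns those depends on the model, as remarked above.
    edges = {(u, w) for u in nodes for w in adj[u] if w in nodes and u < w}
    return states, edges

# Example usage on the path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
inputs = {u: f"id{u}" for u in adj}
print(r_hop_view(adj, inputs, v=1, r=1))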
The View as a Function
• Theorem 2 A deterministic r-round algorithm A is a function that maps every possible r-hop view to the set of possible outputs.
• By Theorem 1, we know that we can transform Algorithm A to the canonical form.
• After r communication rounds, every node v knows exactly its r-hop view.
• This information suffices to compute the output of node v.
Issues
• Two nodes with equal r-hop views
– have to compute the same output in every r-round
algorithm.
• For coloring algorithms, the only input of a node v is its label.
– The r-hop view of a node therefore is its labeled
r-neighborhood.
Power of Viewsa
• For a graph of n nodes, Norris (1995) proved that if two nodes have the same view of depth n − 1, then they have the same views for all depths.
• Taking the diameter D of the graph into account, the bound n − 1 can be improved to O(D + log(n/D)) for bidirectional graphs with port numberings.
aWe won’t discuss details for these claims.
Views: Directed Networks
• A view can be computed by a node on a network using a
distributed deterministic algorithm
• In directed networks we have “in” and “out” views at a node.
Broadcasting
Broadcasting
• Broadcasting refers to a method of transferring a message to all recipients “at once” in a network.
– It is initiated by a single processor, the source.
– The source sends a message to all other nodes in the system.
• In a typical network it may not be possible to send a message “at once” since there might be multiple hops from the source to the rest of the nodes.
Graph Concepts in Broadcasting
• The distance d(u, v) between nodes u and v in an undirected graph G is the number of hops of a minimum length path between u and v.
• The radius
– of a node u is the maximum distance between u and any other node in the graph: R(u) = max_v d(u, v),
– of a graph is the minimum radius of any node in the graph: R = min_u R(u).
• The radius and diameter of a graph are called graph eccentricities.
• Note that d(u, u) = 0, d(u, v) = d(v, u), and d(u, v) ≤ d(u, w) + d(w, v) for every node w (triangle inequality).
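As a small illustration, here is a centralized sketch that computes R(u), the radius and the diameter by repeated BFS; the adjacency-dict input and the function names are illustrative assumptions, and this is not a distributed algorithm:

from collections import deque

def distances_from(adj, u):
    """Hop distances from u to every node reachable from u (plain BFS)."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def radius_and_diameter(adj):
    """R(u) = max_v d(u, v); radius R = min_u R(u); diameter D = max_u R(u)."""
    ecc = {u: max(distances_from(adj, u).values()) for u in adj}
    return min(ecc.values()), max(ecc.values())

# Example: a path on 4 nodes has radius 2 and diameter 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(radius_and_diameter(path))  # (2, 3)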
Examples: Graph Eccentricities
• Distance
• Radius
• Diameter
Examples: Graph Eccentricities
• Radius, Diameter
• There is a close relationship between the radius R and the diameter D of a graph:
– R ≤ D ≤ 2R (the second inequality follows by routing any two nodes through a node that attains the radius).
Examples: Graph Eccentricities
• What are the Radius and Diameter?
BFS/DFS
BFS Spanning Trees
• Traversal of a graph is performed by visiting all of its vertices in some predefined order.
• Breadth-First-Search Tree. A breadth-first-search tree T of a graph G is a spanning tree of G such that for every node of G, the tree path is a minimum-hop path to the root.
• Of course a root must be specified!
BFS Algorithma
• BFS Algorithm: Input a graph G = (V, E)
Proceed by layers,
1. mark the root r;
2. mark all neighbor vertices that are one hop away from r;
3. mark new vertices that are one hop away from these neighbors (these are two hops away from r);
4. and so on.
• It uses a FIFO queue
• It checks whether a vertex has been discovered before enqueueing the vertex, rather than delaying this check until the vertex is dequeued from the queue.
aInvented in 1945 by Konrad Zuse
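A minimal sequential sketch of this layer-by-layer construction; the adjacency-dict input and the name bfs_tree are illustrative assumptions:

from collections import deque

def bfs_tree(adj, root):
    """Return parent pointers of a BFS spanning tree rooted at `root`.

    Vertices are marked (entered into `parent`) when first discovered,
    i.e. before being enqueued, exactly as described above.
    """
    parent = {root: None}          # the root is marked first
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:    # discovered for the first time
                parent[v] = u      # tree edge (u, v): v is one more hop away
                queue.append(v)
    return parent

# Example: a 4-cycle with a chord.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(bfs_tree(adj, root=0))  # {0: None, 1: 0, 3: 0, 2: 1}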
BFS Algorithm
• How do you construct a BFS tree from a given graph?
• First choose any vertex as a root!
BFS (Distance Computation (1/2))
• It starts by placing the source node s at distance d(s) = 0; the distance of all other nodes starts as d(i) = ∞.
• At the kth step (starting at k = 0), all nodes i at distance d(i) = k are examined, and any neighbors j with d(j) = ∞ (i.e., not yet discovered) have their distance d(j) set to k + 1.
• The process halts when step k finds no such neighbors; d(j) is then the length of the shortest path from s to j, or d(j) = ∞ if there is no such path.
– If no such path exists, j belongs to a different connected component.
BFS (Distance Computation (2/2))
• BFS is the simplest way to search a graph.
• It is suited only for unweighted graphs: it ignores edge weights.
• Example 1:
• Example 2: In a social network, your friends are at level one and your friends of friends are at level two in a BFS starting at your node.
What is BFS Tree Used for?
• Finding the shortest path between two nodes u and v (with path length measured by number of edges)
– BFS by itself is not enough: some message passing is needed!
– u and v could be the nodes initiating BFS trees, respectively.
• Finding all nodes within one connected component
• Testing a graph for bipartiteness
– Construct a BFS tree from a vertex v and look at all other vertices at odd or even distance from v.
• Doing efficient broadcast – from any node.
DFS Spanning Trees
• For a rooted spanning tree T of a graph G, let us denote by
– S(u) all the nodes in the subtree of u, and
– P(u) all the vertices that exist in a path between u and the root.
DFS Spanning Trees
• DFS Algorithm: Input a graph G = (V, E)
1. Start from a vertex r;
2. visit all possible vertices as far as you can reach;
3. when all vertices reachable from the current vertex have been visited, backtrack to the parent node.
DFS (Depth-First Search) (1/2)
• DFS visits the same nodes as BFS but in a different order.
• If it sees an unvisited node j while examining node i, it fully discovers all unvisited nodes reachable from j and then backtracks to node i to consider the remainder of the nodes adjacent to i.
• It is best described recursively.
DFS (Depth-First Search) (2/2)
• All nodes start out unvisited.
• DFS(i):
1. mark i as visited
2. for all nodes j adjacent to i do:
3. if node j is not visited, DFS(j)
• Example: see the sketch below.
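A minimal Python rendering of this recursion; the adjacency-dict input, the visited set, and the parent bookkeeping are illustrative assumptions:

def dfs(adj, i, visited=None, parent=None):
    """Depth-first search from node i, recording the DFS tree in `parent`."""
    if visited is None:
        visited, parent = set(), {i: None}
    visited.add(i)                       # 1. mark i as visited
    for j in adj[i]:                     # 2. for all nodes j adjacent to i
        if j not in visited:             # 3. if j is not visited
            parent[j] = i                #    i becomes j's parent in the DFS tree
            dfs(adj, j, visited, parent)
    return parent

# Example: the same 4-cycle with a chord as before.
adj = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(dfs(adj, 0))  # {0: None, 1: 0, 2: 1, 3: 2}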
What is DFS Tree Used for?
• Finds all of the vertices reachable from a source vertex r in a graph
– unlike BFS it does not need to search the whole graph.
• Topological sorting.
– this is because of the way it traverses a directed graph.
• Finding the bridges of a graph.
– these are edges whose removal disconnects the graph.
• Finding connected components.
– like BFS.
• Finding strongly connected components.
– these are the maximal strongly connected subgraphs of a directed graph.
Lower Bounds
• The message complexity of broadcast in an n node graph is at least n − 1.
– This is because every node must receive the message.
– Which graphs require exactly n − 1 messages?
• The source’s radius is a lower bound for the time complexity.
– This is because it needs that many hops from the source.
• You can use a pre-computed spanning tree to do broadcast with tight message complexity.
• If the spanning tree is a BFS spanning tree (for a given source), then the time complexity is tight as well.
More on BFS and DFS
• Both BFS and DFS describe a tree; i is the parent of j if the unvisited node j is discovered while examining node i.
• The DFS tree has a rich set of mathematical properties.
– For example, if “(i” is printed at the start of DFS(i) and “i)” when it finishes (after traversing all its neighbors j), then the result is an expression with properly nested and matching parentheses.
– The parentheses of two nodes i and j are either nested one within the other, or they are disjoint.
Impact of Knowledge: Clean Graphs
• If the graph is stored in adjacency list form, both BFS and DFS take an amount of time that is linear in the size of the graph: O(|V | + |E|), where |V | and |E| are the number of nodes and edges, respectively.
• If the nodes do not know the topology of the graph (i.e., for a clean network) then the number of edges is a lower bound for the broadcast message complexity.
– If you do not try every edge, you might miss a whole part of the graph behind it.
• Knowledge can affect the message complexity!
• Call a graph (network) clean if the nodes do not know the
topology of the graph.
Flooding
Outline
• Flooding
• FloodMaxID
• OptFloodMax
Flooding
• Used by nodes to identify themselves.
• Flooding Algorithm
1. The source (root) sends the message to all neighbors.
2. Each other node v, upon receiving the message for the first time, forwards the message to all (other) neighbors.
3. Upon later receiving the message again (over other edges), a node can discard the message.
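A minimal round-by-round simulation of flooding on a synchronous network; the adjacency-dict representation and the name flood are illustrative assumptions:

from collections import deque

def flood(adj, source):
    """Simulate synchronous flooding; return each node's parent in the
    resulting spanning tree and the round in which it first got the message."""
    parent = {source: None}
    round_received = {source: 0}
    frontier = deque([source])
    rounds = 0
    while frontier:
        rounds += 1
        next_frontier = deque()
        for u in frontier:
            for v in adj[u]:
                if v not in parent:          # first time v hears the message
                    parent[v] = u            # u becomes v's parent
                    round_received[v] = rounds
                    next_frontier.append(v)
                # otherwise v simply discards the duplicate message
        frontier = next_frontier
    return parent, round_received

# The example below matches the flooding example on the next slide:
# A initiates, and D hears the message from both B and C.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}
print(flood(adj, "A"))

In this synchronous simulation the parent pointers form a BFS tree rooted at the source, which is exactly the claim made on the "Flooding and Trees" slide.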
Flooding Example
• Let A be the initiating node (graph on nodes A, B, C, D, E in the figure).
• Note that node D receives two messages.
Flooding and Trees
• If node v receives the message first from node u, then node v calls node u parent.
– Parent relation defines a spanning tree T (nodes receiving more than one message keep message only from one initiator).
– If flooding algorithm is executed in a synchronous system, then T is a BFS spanning tree (with respect to the root).
• Let R(s) be the radius of the source s in the network.
– In asynchronous systems the flooding algorithm terminates
after R(s) time units.
– However, the spanning tree constructed may not be a BFS spanning tree.
FloodMaxID
• We give a simple algorithm that causes both leaders and non-leaders to identify themselves.
• The algorithm
– requires that the processes know the diameter of the
network;
– floods the maximum ID throughout the network,
⇤ so we call it the FloodMaxID algorithm.
• The algorithm makes leader election possible in a general network.
• FloodMaxID Algorithm
1. Every process maintains a record of the maximum ID it has seen so far (initially its own).
2. At each round, each process propagates this maximum on all of its outgoing edges.
3. After D (diameter) rounds, if the maximum value seen is the process’s own ID, the process elects itself the leader; otherwise, it is a non-leader.
• FloodMax elects the process with the maximum ID.
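A compact synchronous simulation of FloodMaxID; the graph representation and function name are illustrative assumptions, and diam is the known network diameter the algorithm requires:

def flood_max_id(adj, ids, diam):
    """Each node repeatedly propagates the largest ID it has seen; after
    `diam` rounds, the node still holding its own ID elects itself leader."""
    best = dict(ids)                         # current maximum seen by each node
    for _ in range(diam):
        # One synchronous round: snapshot the values, then update everyone.
        incoming = {v: [best[u] for u in adj[v]] for v in adj}
        for v in adj:
            best[v] = max([best[v]] + incoming[v])
    return {v: ("leader" if best[v] == ids[v] else "non-leader") for v in adj}

# Ring of 4 nodes with IDs 7, 3, 9, 1; its diameter is 2.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
ids = {0: 7, 1: 3, 2: 9, 3: 1}
print(flood_max_id(adj, ids, diam=2))  # node 2 (ID 9) becomes the leader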
" end,
00.143
Evangelos Kranakis, Carleton University, SCS (November 16, 2020)
Distributed Computing, COMP 4001 57
Analysis of FloodMax
• Define i_max to be the index of the process with the maximum ID, and u_max to be that user ID.
• Theorem 3 In the FloodMax algorithm, process i_max outputs leader and each other process outputs non-leader, within diameter rounds.
• Main Claim After diameter rounds,
– status of i_max = leader, and
– status of j = non-leader, for every j ≠ i_max.
• The key to the proof of this Claim is the fact that after r rounds, the maximum ID has reached every process that is within distance r of i_max, as measured along directed paths in G.
– Hence, within a number of steps equal to the diameter, every process knows who the leader is.
FloodMax
• The FloodMax algorithm does not extend directly to the asynchronous setting, because there are no rounds in the asynchronous model.
• However, it is possible to simulate the rounds asynchronously.
– We simply require each process that sends a round r message to tag that message with its round number r.
– The recipient waits to receive round r messages from all its neighbors before performing its round r transition.
• By simulating diameter rounds, the algorithm can terminate
correctly.
• In a way, we make the algorithm “message driven”: a process must receive all round r messages from its neighbors before it performs its round r transition and moves on to round r + 1.
OptFloodMax Algorithm
• There is a simple improvement that can be used to decrease the communication complexity in many cases, although it does not decrease the order of magnitude in the worst case.
• Namely, processes can send their current max user ID values only when they first learn about them, not at every round.
Convergecast
Convergecast
• Convergecast is reversed broadcast:
– Instead of a root sending a message to all other nodes, all
other nodes send information to a root.
• Convergecast is useful for input collection.
Echo Algorithm
• Requirement: This algorithm is initiated at the leaves.
• Echo Algorithm
1. A leaf sends a message to its parent.
2. If an inner node has received a message from each child, it sends a message to the parent.
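A minimal sketch of the echo (convergecast) phase on a given rooted tree; the children-dict representation and the recursive formulation are illustrative assumptions, and here each echo message simply reports the size of the finished subtree:

def echo(children, v):
    """Convergecast on a rooted tree: a node reports to its parent only
    after hearing from all of its children. Returns the subtree size of v."""
    if not children.get(v):              # a leaf starts the echo immediately
        return 1
    # An inner node waits for (here: recursively collects) all children's echoes.
    return 1 + sum(echo(children, c) for c in children[v])

# Tree rooted at "r" with two subtrees.
children = {"r": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}
print(echo(children, "r"))  # 5: all nodes have reported back to the root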
Complexity Issues: Broadcast and Convergecast (1/2)
• The echo algorithm is paired with the flooding algorithm, which is used to let the leaves know that they should start the echo process; this is known as flooding/echo.
• One can use convergecast for termination detection.
– If a root wants to know whether all nodes in the system
have finished some task, it initiates a flooding/echo;
⇤ the message in the echo algorithm then means “This subtree has finished the task.”
• Message complexity of the echo algorithm is n − 1,
– but together with flooding it is O(m), where m = |E| is the number of edges in the graph.
Complexity Issues: Broadcast and Convergecast (2/2)
• The time complexity of the echo algorithm is determined by the depth of the spanning tree (i.e., the radius of the root within the tree) generated by the flooding algorithm.
• The flooding/echo algorithm can do much more than collecting acknowledgements from subtrees.
– For instance, one can use it to compute the number of nodes in the system, or the maximum ID (for leader election), or the sum of all values stored in the system.
• By combining results one can compute even fancier aggregations, e.g., with the number of nodes and the sum one can compute the average. With the average one can compute the standard deviation. And so on . . .
Application to Leader Election
• Asynchronous broadcast and convergecast can be used to solve the leader election problem in arbitrary graphs
– without any distinguished source node and
– without the processes having any knowledge of the number of nodes or the diameter of the network.
• The processes need to have user IDs.
Basic Leader Election Algorithm
• Every node can initiate
– first a broadcast, and
– next a convergecast
in order to discover the maximum user ID in the network.
• The node that finds that the maximum is equal to its own ID elects itself as leader.
• This algorithm uses O(n|E|) messages.
(Directed Graph) BFS and Applications
Construction of BFS
• How do we perform breadth-first search (BFS) in a network based on an arbitrary strongly connected directed graph having a distinguished source node?
• We consider how to establish a breadth-first spanning tree for the (di)-graph.
• Motivation for constructing such a tree comes from the desire to have a convenient structure to use as a basis for broadcast communication.
• The BFS tree minimizes the maximum communication time from the process at the distinguished node to all other processes in the network
– To do this, run BFS from each node of the graph and compare the values obtained at each node.
Construction of BFS
• We suppose that the network is strongly connected and that there is a distinguished source node i0.
• The algorithm is supposed to output the structure of a breadth-first directed spanning tree of the network graph, rooted at i0.
• The output should appear in a distributed fashion: each process other than i0 should have a parent component that gets set to indicate the node that is its parent in the tree.
• As usual, processes only communicate over directed edges.
• Processes are assumed to have user IDs but to have no knowledge of the size or diameter of the network.
SynchBFS
• The basic idea for this algorithm is the same as for the standard sequential breadth-first search algorithm.
• SynchBFS Algorithm
1. At any point during execution, there is some set of processes that is marked: initially just i0.
2. Process i0 sends out a search message at round 1, to all of its outgoing neighbors.
3. At any round, if an unmarked process receives a search message, it marks itself and chooses one of the processes from which the search has arrived as its parent.
4. At the first round after a process gets marked, it sends a search message to all of its outgoing neighbors.
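A round-based sketch of SynchBFS on a directed graph; the out-neighbor dict and the name synch_bfs are illustrative assumptions, and, as in the steps above, a newly marked process sends its search messages in the round after it is marked:

def synch_bfs(out, i0):
    """Synchronous BFS on a directed graph given by out-neighbor lists.
    Returns parent pointers of the breadth-first spanning tree rooted at i0."""
    parent = {i0: None}          # i0 is marked initially
    to_send = [i0]               # processes that send `search` this round
    while to_send:
        arrived = {}             # node -> one process its search arrived from
        for u in to_send:
            for v in out[u]:
                arrived.setdefault(v, u)
        # Unmarked processes that received a search mark themselves now and
        # will send their own search messages in the next round.
        to_send = [v for v in arrived if v not in parent]
        for v in to_send:
            parent[v] = arrived[v]
    return parent

out = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
print(synch_bfs(out, 0))  # {0: None, 1: 0, 2: 0, 3: 1}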
Analysis of SynchBFS
• We can prove the invariant that
– after r rounds, every process at distance d from i0 in the graph, 1 ≤ d ≤ r, has its parent pointer defined; moreover, each such pointer points to a node at distance d − 1 from i0.
This invariant can, as usual, be proved by induction on the number of rounds.
• The time complexity is at most diameter rounds.
• The number of messages is just |E|
– a search message is transmitted exactly once on each directed edge.
Applications of BFS
• Message Broadcast
• Global computation
• Electing a leader
• Computing the diameter
Message Broadcast: Piggybacking
• The SynchBFS algorithm can easily be augmented to implement message broadcast.
• If a process has a message M that it wants to communicate to all of the processes in the network,
– it can simply initiate an execution of SynchBFS with itself as the root, piggybacking message M on the search message it sends in round 1.
• Other processes continue to piggyback M on all their search messages as well.
– Since the tree eventually spans all the nodes, message M is eventually delivered to all the processes.
Global Computation
• This means
– collection of information from throughout the network or,
– more generally, the computation of a function based on distributed inputs.
• For example,
– consider the problem in which each process has a nonnegative integer input value and we want to find the sum of all the inputs in the network.
– Using a BFS tree, this can be done easily (and efficiently) as follows.
Global Computation
• Starting from the leaves, “fan in” the results in a convergecast procedure, as follows.
1. Each leaf sends its value to its parent;
2. each parent waits until it gets the values from all its children, adds them to its own input value, and then sends the sum to its own parent.
• The sum calculated by the root of the BFS tree is the final answer.
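A minimal sketch of this fan-in on a rooted BFS tree; the children and value dicts and the recursive formulation are illustrative assumptions:

def tree_sum(children, value, v):
    """Fan-in: each node adds its children's sums to its own input value
    and passes the result to its parent; the root's result is the answer."""
    return value[v] + sum(tree_sum(children, value, c)
                          for c in children.get(v, []))

children = {"r": ["a", "b"], "a": ["c"], "b": [], "c": []}
value = {"r": 1, "a": 2, "b": 3, "c": 4}
print(tree_sum(children, value, "r"))  # 10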
Electing a Leader
• Using SynchBFS, an algorithm can be designed to elect a leader in a network with IDs, even when the processes have no knowledge of n or diameter.
1. Namely, all the processes can initiate breadth-first searches in parallel.
2. Each process i uses the tree thereby constructed and the global computation procedure just described to determine the maximum ID of any process in the network.
3. The process with the maximum ID then declares itself to be the leader, and all others announce that they are not the leader.
• If the graph is undirected, the time is O(diameter) and the number of messages is O(n|E|).
Computing the Diameter
• The diameter of the network can be computed by having all processes initiate breadth-first searches in parallel.
1. Each process i uses the tree thereby constructed to determine max dist, defined to be the maximum distance from i to any other process in the network.
2. Each process i then reuses its breadth-first tree for a global computation to discover the maximum of the max dist values.
• If the graph is undirected, the time is O(diam) and the number of messages is O(n|E|), where diam is the diameter of the graph.
• The diameter thus computed could be used, for example, in the leader-election algorithm FloodMax.
Exercisesa
1. Explain why every tree is a bipartite graph.
2. Let T be a graph with n vertices. Then the following statements are equivalent.
(a) T is connected and has no cycles.
(b) T has n − 1 edges and has no cycles.
(c) T is connected and has n − 1 edges.
(d) T is connected and the removal of any edge disconnects T .
(e) Any two vertices of T are connected by exactly one path.
(f) T contains no cycles, but the addition of any new edge
creates a cycle.
3. Give an algorithm to compute the diameter and radius of a tree.
4. Determine the size of a message which propagates for r hops in “Synchronous Algorithm Canonical Form”. More specifically, consider a complete binary rooted tree with height n. Label the ports at an interior node as L, R (for the Left and Right siblings at a node), and P for its parent. Do the same in an analogous manner for the root and the leaves. For each r ≤ n and each node v construct the r-view at this node.
aNot to hand in!
5. A connected graph is Hamiltonian if there is a cycle, that includes every vertex exactly once (such a cycle is called Hamiltonian). A connected graph is semi-Hamiltonian if there is a path (but not a cycle) that includes every vertex exactly once (such a path is called semi-Hamiltonian). Determine which of the following graphs are semi-Hamiltonian, and write down a corresponding semi-Hamiltonian path where possible:
6. A forest is a graph (not necessarily connected), each of whose components is a tree.
(a) Let G be a forest with 11 vertices and k components. How many edges does G have?
(b) Construct a forest with 12 vertices and 9 edges.
(c) Is it true that every forest with k components has at least
2k vertices of degree 1?
7. A spanning forest in a graph G (not necessarily connected) is obtained by constructing a spanning tree for each component of G.
(a) Find a spanning forest for the following graph.
(b) LetGbeagraph,andletF beasubgraphofG. IfF isa forest which includes all vertices of G, is F necessarily a spanning forest of G?
8. Find three spanning trees in the Petersen graph (depicted below):
9. Prove that trees and forests are bipartite graphs.
10. Prove that, in a bipartite graph, every cycle has an even number of edges. Conversely, prove that, if every cycle of a graph has an even number of edges, then the graph is bipartite. Hint: Consider a connected graph G. Choose a vertex v in G and consider those vertices whose minimum distance from v is even and those whose minimum distance from v is odd. To which vertices are the “odd” vertices adjacent? To which vertices are the “even” vertices adjacent?