Prefix Sums and Pointer Jumping
CMPSC 450
Doubling
• A processing technique in which accesses or actions are governed by increasing powers or 2
• That is, processing proceeds by 1, 2, 4, 8, 16, etc., doubling on each iteration
CMPSC 450
2
Prefix Sum by Doubling
• Overview
• 1. Each a(i) is added to a(i+1)
• 2. Each a(i) is added to a(i+2)
• 3. Each a(i) is added to a(i+4)
• 4. Each a(i) is added to a(i+8) • ETC…..
• At any time if an index exceeds n, the operation is suppressed
CMPSC 450
3
Prefix Sums by Doubling Example
4
9
5
2
10
6
12
8*
4
13
14
7
12
16
18*
20*
4
13
18
20
26*
23*
30*
36*
4
13
18
20
30
36
48
56
T1 = O(n) Tp = O(log n)
CMPSC 450
Prefix Sums by Doubling Example
0#
1
2
3
4
5
6
7*
0#
0,1#
1,2
2,3
3,4
4,5
5,6*
6,7*
0#
0,1#
0,1,2 #
0,1,2, 3#
1,2,3, 4
2,3,4, 5
3,4,5, 6
4,5,6, 7
0
0,1
0,1,2
0,1,2, 3
0,1,2, 3,4
0,1,2, 3,4,5
0,1,2 3,4,5, 6
0,1,2, 3,4,5, 6,7
T1 = O(n) Tp = O(log n)
CMPSC 450
Time Complexity
• O(Log N)
• At each step, the number of sums that are
complete doubles
• 1, 2, 4, 8,…
• Thus, the number of steps is log n
CMPSC 450
6
Total Operations – Work/Cost
• For N data values
• Step 1 = N – 1 additions {-20}
• 1 PC is suppressed
• Step 2 = N – 2 additions {-21}
• 2 PCs are suppressed
• Step 3 = N – 4 additions {-22} • 4 PCs are suppressed
• Etc.
CMPSC 450
7
Consider case of N = 8
• Step 1 = N-1 = 7
• Step 2 = N-2 = 6
• Step 3 = N-4 = 4 • TOTAL = 17
• Generalize
• (N-1) + (N-2) + (N-4) = 3N -7
• Sequential = N-1
CMPSC 450
8
Generalize the Work
Sum(i=0 to (log n) -1:
N – 2i = (N-1) + (N-2) + (N-4) +…+ (N-2log n -1 )
= N*log N – (1+2+4+2log n -1 ) (1+2+4+2log n -1 ) = ???
CMPSC 450
9
Total Work for Prefix Sums
(1+2+4+2log n -1 ) = 2log n – 1
Size = N * Log N – (2log n – 1) = N * Log N – 2log n + 1
T1 = N
CMPSC 450
10
A new technique: Pointer Jumping
• Also called path doubling
• Two example problems • Finding roots of a forest • Parallel prefix
• Tree?
• Directed tree? • Rooted tree? • Forest?
CMPSC 450
Pointer Jumping
Source: https://en.wikipedia.org/wiki/Pointer_jumping
CMPSC 450
Finding roots of a forest: PRAM algorithm
Input: Forest of rooted directed trees, each with self loop at root. Arcs specified by (i, P(i)), where 1≤i≤n.
Output: For each vertex i, the root S(i) of the tree containing i. begin
1. for 1 ≤ i ≤ n pardo Set S(i) := P(i)
while (S(i) ≠ S(S(i))) do Set S(i) := S(S(i))
end
CMPSC 450
Analysis
• Let h be the maximum height of any tree in forest • Number of iterations of while loop is O(log h)
• Each iteration can be executed in O(1) time
• Total work is W(n) = O(n log h)
• Serial time complexity of this problem is θ(n)
• The parallel algorithm is not work-optimal • unless h = O(1)
CMPSC 450
Parallel Prefix
• Generalization of prefix sums
• Each node in forest also has associated weight
• We want to compute prefix sums along path to root
CMPSC 450
Parallel prefix on rooted directed trees: PRAM algorithm
Input: Forest of rooted directed trees, each with self loop at root. Arcs specified by (i, P(i)), each vertex has weight W(i), where 1≤i≤n. For a root r, W(r) = 0.
Output: For each vertex i, W(i) is set equal to the sum of the weights of vertices on the path from I to the root of the tree.
begin
1. for 1 ≤ i ≤ n pardo
Set S(i) := P(i)
while (S(i) ≠ S(S(i))) do
Set W(i) := W(i) + W(S(i))
Set S(i) := S(S(i))
end
CMPSC 450
Analysis
• PRAM algorithm requires O(nlog h) work and O(log n) time
• Not work-optimal unless h = O(1)
• The list ranking problem is a special case: each tree is a linked list.
CMPSC 450
List Ranking
• Input:
S Successor array: represents linked list of n nodes (0 to n-1)
3
4
1
2
5
6
-1
First node • Output:
R Distance array: distance from node i to end of list First node
Last node
6
3
4
5
2
1
0
Last node
CMPSC 450
List ranking PRAM algorithm using pointer jumping
Input: Linked list of n nodes. Successor of each node given by S(i), S value of last node is set to -1.
Output: For each node 1 ≤ i ≤ n, the distance R(i) from node i to end of the list. begin
1. for 1 ≤ i ≤ n pardo
if (S(i) ≠ -1) then set R(i) := 1 else set R(i) := 0
2. for 1 ≤ i ≤ n pardo Set Q(i) := S(i)
while (Q(i) ≠ -1 and Q(Q(i)) ≠ -1) do
Set R(i) := R(i) + R(Q(i))
Set Q(i) := Q(Q(i))
end
CMPSC 450
Analysis
• Serial complexity: θ(n)
• O(nlog n) work and O(log n) time
• Not work optimal due to the additional log n factor. • Requires O(n) processors
• List ranking problem introduced in Wyllie’s PhD thesis, 1979
• Vishkin, 1984: Optimal randomized algorithm
• Cole and Vishkin, 1989: Optimal deterministic algorithm
• Anderson and Miller, 1990: Simpler, optimal randomized algorithm
• How to design work-optimal algorithms? • Pointerjumping+shrinklist
CMPSC 450
Why is list ranking important?
• Subroutine for Euler tours
• Optimal Euler tour computation is a subroutine in several PRAM
parallel graph algorithms
• Used in database search, ordering records
CMPSC 450