
Analysis of Algorithms, I
CSOR W4231

Computer Science Department


Columbia University
More divide & conquer algorithms: fast integer and matrix multiplication

2 Binary search
3 Integer multiplication
4 Fast matrix multiplication (Strassen’s algorithm)


Review of the last lecture
In the last lecture we discussed
- Asymptotic notation (O, Ω, Θ, o, ω)
- The divide & conquer principle:
  - Divide the problem into a number of subproblems that are smaller instances of the same problem.
  - Conquer the subproblems by solving them recursively.
  - Combine the solutions to the subproblems into the solution for the original problem.
- Application: mergesort
- Solving recurrences

mergesort(A, left, right)
  if right == left then return
  mid = left + ⌊(right − left)/2⌋
  mergesort(A, left, mid)
  mergesort(A, mid + 1, right)
  merge(A, left, right, mid)

- Initial call: mergesort(A, 1, n)
- Subroutine merge merges two sorted lists of sizes ⌈n/2⌉, ⌊n/2⌋ into one sorted list of size n in time Θ(n).
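For concreteness, a runnable Python version of the pseudocode above (a sketch only: 0-indexed rather than the slides' 1-indexed arrays, sorting in place):

def merge(A, left, mid, right):
    # Merge the sorted runs A[left..mid] and A[mid+1..right] in Theta(n) time.
    tmp = []
    i, j = left, mid + 1
    while i <= mid and j <= right:
        if A[i] <= A[j]:
            tmp.append(A[i]); i += 1
        else:
            tmp.append(A[j]); j += 1
    tmp.extend(A[i:mid + 1])      # leftover elements of the left run
    tmp.extend(A[j:right + 1])    # leftover elements of the right run
    A[left:right + 1] = tmp

def mergesort(A, left, right):
    if right <= left:
        return
    mid = left + (right - left) // 2
    mergesort(A, left, mid)
    mergesort(A, mid + 1, right)
    merge(A, left, mid, right)

# Usage: A = [5, 2, 4, 7, 1]; mergesort(A, 0, len(A) - 1)  ->  A == [1, 2, 4, 5, 7]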

Running time of mergesort
The running time of mergesort satisfies
T(n) = 2T(n/2) + cn, for n ≥ 2, constant c > 0
This structure is typical of recurrence relations:
- an inequality or equation bounds T(n) in terms of an expression involving T(m) for m < n
- a base case generally says that T(n) is constant for small constant n
- we ignore floor and ceiling notations
- a recurrence does not provide an asymptotic bound for T(n): to this end, we must solve the recurrence

Solving recurrences, method 1: recursion trees
The technique consists of three steps:
1. Analyze the first few levels of the tree of recursive calls
2. Identify a pattern
3. Sum over all levels of recursion
Example: analysis of the running time of mergesort, T(n) = 2T(n/2) + cn, n ≥ 2

A frequently occurring recurrence and its solution
The running time of many recursive algorithms is given by
T(n) = aT(n/b) + cn^k, for a, c > 0, b > 1, k ≥ 0
What is the recursion tree for this recurrence?
- a is the branching factor
- b is the factor by which the size of each subproblem shrinks
  ⇒ at level i, there are a^i subproblems, each of size n/b^i
  ⇒ each subproblem at level i requires c(n/b^i)^k work
- the height of the tree is log_b n levels

⇒ Total work: Σ_{i=0}^{log_b n} a^i · c(n/b^i)^k = cn^k · Σ_{i=0}^{log_b n} (a/b^k)^i
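Evaluating this geometric sum case by case (a standard calculation, depending on the ratio a/b^k) yields the three regimes packaged by the Master theorem below:

\[
cn^k \sum_{i=0}^{\log_b n} \left(\frac{a}{b^k}\right)^i =
\begin{cases}
O(n^k), & \text{if } a < b^k \quad \text{(ratio below 1: the sum is } O(1)\text{)}\\
O(n^k \log n), & \text{if } a = b^k \quad \text{(all } \log_b n + 1 \text{ terms equal } 1\text{)}\\
O(n^{\log_b a}), & \text{if } a > b^k \quad \text{(the last term, } (a/b^k)^{\log_b n} = n^{\log_b a - k}\text{, dominates)}
\end{cases}
\]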

Solving recurrences, method 2: Master theorem
Theorem 1 (Master theorem).
If T(n) = aT(⌈n/b⌉) + O(n^k) for some constants a > 0, b > 1, k ≥ 0, then

         O(n^{log_b a}),  if a > b^k
T(n) =   O(n^k log n),    if a = b^k
         O(n^k),          if a < b^k
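As a quick worked check: the mergesort recurrence T(n) = 2T(n/2) + cn has a = 2, b = 2, k = 1, so a = b^k and the theorem gives T(n) = O(n log n); binary search below will have a = 1, b = 2, k = 0, giving O(log n).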
Binary search
Given a sorted array A of n integers and a number x, determine whether (and where) x occurs in A. Compare x to the middle element A[mid]: if x == A[mid], we are done; if x < A[mid], look for x in A[1, mid − 1]; if x > A[mid], look for x in A[mid + 1, n].
Initially, the entire array is “active”, that is, x might be anywhere in the array.
Suppose x > A[mid]. Since every element of A[1, mid] is ≤ A[mid] and every element of A[mid, n] is ≥ A[mid], the active area of the array, where x might be, is to the right of mid.

Binary search pseudocode
binarysearch(A, left, right)
  mid = left + ⌈(right − left)/2⌉
  if x == A[mid] then
    return mid
  else if right == left then
    return “x not in A”
  else if x > A[mid] then
    left = mid + 1
  else
    right = mid − 1
  binarysearch(A, left, right)
Initial call: binarysearch(A, 1, n)
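A minimal iterative Python version of the same idea (0-indexed; x is passed explicitly, and returning None for an absent x is a choice made here, not in the slides):

def binarysearch(A, x):
    # A is sorted in nondecreasing order; return an index i with A[i] == x, else None.
    left, right = 0, len(A) - 1
    while left <= right:
        mid = left + (right - left) // 2
        if A[mid] == x:
            return mid
        elif x > A[mid]:
            left = mid + 1    # x can only lie to the right of mid
        else:
            right = mid - 1   # x can only lie to the left of mid
    return None

# Usage: binarysearch([1, 3, 5, 7, 9], 7) == 3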

Binary search running time
Observation: At each step there is a region of A where x could be and we shrink the size of this region by a factor of 2 with every probe:
- If n is odd, then we are throwing away ⌈n/2⌉ elements.
- If n is even, then we are throwing away at least n/2 elements.

Hence the recurrence for the running time is T(n) ≤ T(n/2) + O(1)

Sublinear running time
Here are two ways to argue about the running time:
1. Master theorem: b = 2, a = 1, k = 0 ⇒ T(n) = O(log n).
2. We can reason as follows: starting with an array of size n,
   - After k probes, the array has size at most n/2^k (every time we probe an entry, the active portion of the array halves).
   - After k = log n probes, the array has constant size. We can now search linearly for x in the constant-size array.
   - We spend constant work to halve the array (why?). Thus the total work spent is O(log n).

Concluding remarks on binary search
1. The right data structure can improve the running time of the algorithm significantly.
   - What if we used a linked list to store the input?
   - Arrays allow for random access of their elements: given an index, we can read any entry in an array in time O(1) (constant time).
2. In general, we obtain running time O(log n) when the algorithm does a constant amount of work to throw away a constant fraction of the input.

2 Binary search
3 Integer multiplication
4 Fast matrix multiplication (Strassen’s algorithm)

Integer multiplication
- How do we multiply two integers x and y?
- Elementary school method: compute a partial product by multiplying every digit of y separately with x and then add up all the partial products.
- Remark: this method works the same in base 10 or base 2. Examples: (12)_10 · (11)_10 and (1100)_2 · (1011)_2

  1100 × 1011:
        1100
       11000
      000000
   + 1100000
   ---------
    10000100   (= 132 = 12 · 11)

Elementary algorithm running time
A more reasonable model of computation: a single operation on a pair of digits (bits) is a primitive computational step.
Assume we are multiplying n-digit (bit) numbers.
- O(n) time to compute a partial product.
- O(n) time to combine it into a running sum of all partial products so far.
⇒ There are n partial products, each consisting of n bits, hence the total number of operations is O(n^2).
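A sketch of the elementary method in Python, storing base-10 digits least-significant first (the representation and function name are illustrative assumptions, not from the slides):

def school_multiply(x_digits, y_digits):
    # Base-10 digits, least-significant digit first; O(n^2) digit operations.
    result = [0] * (len(x_digits) + len(y_digits))
    for j, yd in enumerate(y_digits):          # one partial product per digit of y
        carry = 0
        for i, xd in enumerate(x_digits):      # multiply every digit of x by yd
            total = result[i + j] + xd * yd + carry
            result[i + j] = total % 10
            carry = total // 10
        result[j + len(x_digits)] += carry     # fold in the final carry
    return result

# Usage: school_multiply([2, 1], [1, 1]) == [2, 3, 1, 0], i.e. 12 * 11 = 132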
Can we do better?

A first divide & conquer approach
Consider n-digit decimal numbers x, y.
x = x_{n−1} x_{n−2} … x_0
y = y_{n−1} y_{n−2} … y_0
Idea: rewrite each number as the sum of the n/2 high-order digits and the n/2 low-order digits:
x = x_{n−1} … x_{n/2} x_{n/2−1} … x_0 = x_H · 10^{n/2} + x_L
y = y_{n−1} … y_{n/2} y_{n/2−1} … y_0 = y_H · 10^{n/2} + y_L
where each of x_H, x_L, y_H, y_L is an n/2-digit number.

- n = 2, x = 12, y = 11:
  12 = 1 · 10^1 + 2   (x_H = 1, x_L = 2)
  11 = 1 · 10^1 + 1   (y_H = 1, y_L = 1)
- n = 4, x = 1000, y = 1110:
  1000 = 10 · 10^2 + 0    (x_H = 10, x_L = 00)
  1110 = 11 · 10^2 + 10   (y_H = 11, y_L = 10)

A first divide & conquer approach
x · y = (x_H · 10^{n/2} + x_L) · (y_H · 10^{n/2} + y_L)
      = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^{n/2} + x_L y_L
In words, we reduced the problem of solving 1 instance of size n (i.e., one multiplication between two n-digit numbers) to the problem of solving 4 instances, each of size n/2 (i.e., computing the products x_H y_H, x_H y_L, x_L y_H and x_L y_L).

This is a divide and conquer solution!
- Recursively solve the 4 subproblems.
- Multiplication by 10^n is easy (shifting): O(n) time.
- Combine the solutions from the 4 subproblems to an overall solution using 3 additions on O(n)-digit numbers: O(n) time.

Karatsuba’s observation
Running time: T(n) ≤ 4T(n/2) + cn
- by the Master theorem: T(n) = O(n^2)
- no improvement

However, if we only needed three n/2-digit multiplications, then by the Master theorem
T(n) ≤ 3T(n/2) + cn = O(n^{1.59}) = o(n^2).

Recall that
x · y = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^{n/2} + x_L y_L
Key observation: we do not need each of x_H y_L, x_L y_H. We only need their sum, x_H y_L + x_L y_H.

Gauss’s observation on multiplying complex numbers
A similar situation: multiply two complex numbers a + bi and c + di:
(a + bi)(c + di) = ac + (ad + bc)i + bd·i^2
Gauss’s observation: this can be done with just 3 multiplications:
(a + bi)(c + di) = ac + ((a + b)(c + d) − ac − bd)i + bd·i^2,
at the cost of a few extra additions and subtractions.
∗ Unlike multiplications, additions and subtractions of n-digit numbers are cheap: O(n) time!
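A quick numeric check of Gauss’s identity in Python (illustrative only):

def gauss_complex_mult(a, b, c, d):
    # Multiply (a + bi)(c + di) using 3 real multiplications instead of 4.
    ac = a * c
    bd = b * d
    cross = (a + b) * (c + d) - ac - bd   # equals ad + bc
    return (ac - bd, cross)               # (real part, imaginary part)

# Sanity check against Python's built-in complex arithmetic:
assert complex(*gauss_complex_mult(2, 3, 4, 5)) == (2 + 3j) * (4 + 5j)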

Karatsuba’s algorithm
x · y = (x_H · 10^{n/2} + x_L) · (y_H · 10^{n/2} + y_L)
      = x_H y_H · 10^n + (x_H y_L + x_L y_H) · 10^{n/2} + x_L y_L
Similarly to Gauss’s method for multiplying two complex numbers, compute only the three products
x_H y_H,  x_L y_L,  (x_H + x_L)(y_H + y_L)
and obtain the sum x_H y_L + x_L y_H from
(x_H + x_L)(y_H + y_L) − x_H y_H − x_L y_L = x_H y_L + x_L y_H.
Combining requires O(n) time, hence
T(n) ≤ 3T(n/2) + cn = O(n^{log_2 3}) = O(n^{1.59})

Pseudocode
Let k be a small constant.

Integer-Multiply(x, y)
  if n == k then return x · y
  write x = x_H · 10^{n/2} + x_L, y = y_H · 10^{n/2} + y_L
  compute x_H + x_L and y_H + y_L
  product = Integer-Multiply(x_H + x_L, y_H + y_L)
  x_H y_H = Integer-Multiply(x_H, y_H)
  x_L y_L = Integer-Multiply(x_L, y_L)
  return x_H y_H · 10^n + (product − x_H y_H − x_L y_L) · 10^{n/2} + x_L y_L
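A runnable Python sketch of Integer-Multiply on ordinary Python ints, splitting on decimal digits (the single-digit cutoff and the names are mine, chosen for illustration):

def karatsuba(x, y):
    # Multiply non-negative integers using 3 recursive multiplications.
    if x < 10 or y < 10:                      # base case: a single-digit factor
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    xH, xL = divmod(x, 10 ** half)            # x = xH * 10^half + xL
    yH, yL = divmod(y, 10 ** half)            # y = yH * 10^half + yL
    hh = karatsuba(xH, yH)
    ll = karatsuba(xL, yL)
    cross = karatsuba(xH + xL, yH + yL) - hh - ll   # = xH*yL + xL*yH
    return hh * 10 ** (2 * half) + cross * 10 ** half + ll

# Sanity check: karatsuba(1234, 5678) == 1234 * 5678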

Concluding remarks
- To reduce the number of multiplications we do a few more additions/subtractions: these are fast compared to multiplications.
- There is no reason to continue the recursion once n is small enough: the conventional algorithm is probably more efficient since it uses fewer additions.
- When we recursively compute (x_H + x_L)(y_H + y_L), each of x_H + x_L, y_H + y_L might be an (n/2 + 1)-digit integer. This does not affect the asymptotics.

2 Binary search
3 Integer multiplication
4 Fast matrix multiplication (Strassen’s algorithm)

Fast matrix multiplication
Matrix multiplication: a fundamental primitive in numerical linear algebra, scientific computing, machine learning and large-scale data analysis.
- Input: m × n matrix A, n × p matrix B
- Output: m × p matrix C = AB

Example:
  A = | 1 0 |    B = | 1 1 |    C = AB = | 1 1 |
      | 0 1 |        | 1 1 |             | 1 1 |

Lower bounds on matrix multiplication algorithms for m, p = Θ(n)?

Conventional matrix multiplication
for 1 ≤ i ≤ m do
  for 1 ≤ j ≤ p do
    for 1 ≤ k ≤ n do
      c_{i,j} += a_{i,k} · b_{k,j}
    end for
  end for
end for
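The same triple loop as runnable Python on plain lists of rows (no libraries), for reference:

def matmul(A, B):
    # A is m x n, B is n x p; returns C = A B.
    # Theta(m * n * p) scalar multiplications, i.e. Theta(n^3) for square matrices.
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C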
- Running time?
- Can we do better?

A first divide & conquer approach: 8 subproblems
Assume square A, B where n = 2^k for some k > 0.
Idea: express A, B as 2 × 2 block matrices and use the conventional algorithm to multiply the two block matrices.
  [ A11 A12 ] [ B11 B12 ]   [ C11 C12 ]
  [ A21 A22 ] [ B21 B22 ] = [ C21 C22 ]

where the n/2 × n/2 blocks of C are
C11 = A11 B11 + A12 B21
C12 = A11 B12 + A12 B22
C21 = A21 B11 + A22 B21
C22 = A21 B12 + A22 B22
Running time?
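Each multiplication of two n × n matrices spawns 8 multiplications of n/2 × n/2 matrices plus Θ(n^2) work for the block additions, so (a standard calculation answering the question above):

\[
T(n) = 8\,T(n/2) + cn^2:\quad a = 8,\ b = 2,\ k = 2,\ a > b^k \;\Rightarrow\; T(n) = O(n^{\log_2 8}) = O(n^3),
\]

no better than the conventional algorithm.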

Strassen’s breakthrough: 7 subproblems suffice (part 1)
Compute the following ten n/2 × n/2 matrices:
1. S1 = B12 − B22
2. S2 = A11 + A12
3. S3 = A21 + A22
4. S4 = B21 − B11
5. S5 = A11 + A22
6. S6 = B11 + B22
7. S7 = A12 − A22
8. S8 = B21 + B22
9. S9 = A11 − A21
10. S10 = B11 + B12
Running time?

Strassen’s breakthrough: 7 subproblems suffice (part 2)
Compute the following seven products of n/2 × n/2 matrices:
1. P1 = A11 · S1
2. P2 = S2 · B22
3. P3 = S3 · B11
4. P4 = A22 · S4
5. P5 = S5 · S6
6. P6 = S7 · S8
7. P7 = S9 · S10

Compute C as follows:
1. C11 = P4 + P5 + P6 − P2
2. C12 = P1 + P2
3. C21 = P3 + P4
4. C22 = P1 + P5 − P3 − P7
Running time?
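A compact Python sketch of the whole scheme for n × n matrices with n a power of 2, mirroring S1–S10 and P1–P7 above (the helper names and the size-1 cutoff are mine; real implementations switch to the conventional algorithm well above n = 1):

def madd(X, Y):
    # Entrywise sum of two equal-size matrices (lists of rows).
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def msub(X, Y):
    # Entrywise difference.
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split A and B into n/2 x n/2 blocks.
    A11 = [row[:h] for row in A[:h]]; A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]; A22 = [row[h:] for row in A[h:]]
    B11 = [row[:h] for row in B[:h]]; B12 = [row[h:] for row in B[:h]]
    B21 = [row[:h] for row in B[h:]]; B22 = [row[h:] for row in B[h:]]
    # The seven products P1..P7, with S1..S10 inlined.
    P1 = strassen(A11, msub(B12, B22))             # A11 * S1
    P2 = strassen(madd(A11, A12), B22)             # S2 * B22
    P3 = strassen(madd(A21, A22), B11)             # S3 * B11
    P4 = strassen(A22, msub(B21, B11))             # A22 * S4
    P5 = strassen(madd(A11, A22), madd(B11, B22))  # S5 * S6
    P6 = strassen(msub(A12, A22), madd(B21, B22))  # S7 * S8
    P7 = strassen(msub(A11, A21), madd(B11, B12))  # S9 * S10
    # Combine into the four blocks of C.
    C11 = madd(msub(madd(P4, P5), P2), P6)
    C12 = madd(P1, P2)
    C21 = madd(P3, P4)
    C22 = msub(msub(madd(P1, P5), P3), P7)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom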

Strassen’s running time and concluding remarks
- Recurrence: T(n) = 7T(n/2) + cn^2
- By the Master theorem:
  T(n) = O(n^{log_2 7}) = O(n^2.81)
- Recently there has been renewed interest in Strassen’s algorithm for high-performance computing: thanks to its lower communication cost (number of bits exchanged between machines in the network or data center), it is better suited than the traditional algorithm for multi-core processors.
