
Algorithms COMP3121/9101

THE UNIVERSITY OF
NEW SOUTH WALES


Algorithms
COMP3121/9101

School of Computer Science and Engineering
University of New South Wales, Sydney

4. FAST LARGE INTEGER MULTIPLICATION


Basics revisited: how do we multiply two numbers?

The primary school algorithm:

            X X X X      <- first input integer
          * X X X X      <- second input integer
          ---------
            X X X X   \
          X X X X      |   O(n^2) intermediate operations:
        X X X X        |   O(n^2) elementary multiplications
      X X X X         /    + O(n^2) elementary additions
    ---------------
    X X X X X X X X      <- result, of length 2n

Can we do it faster than in n^2 many steps?

The Karatsuba trick

Take the two input numbers A and B, and split each of them into two halves of n/2 bits:

A = A1·2^{n/2} + A0;      B = B1·2^{n/2} + B0,

where A1, B1 are the more significant halves and A0, B0 the less significant halves.

AB can now be calculated as follows:

AB = A1B1·2^n + (A1B0 + A0B1)·2^{n/2} + A0B0
   = A1B1·2^n + ((A1 + A0)(B1 + B0) − A1B1 − A0B0)·2^{n/2} + A0B0.

We have saved one multiplication: we now need only three, namely A0B0, A1B1 and (A1 + A0)(B1 + B0).

function Mult(A, B)
    if |A| = |B| = 1 then return AB
    else
        A1 ← MoreSignificantPart(A);
        A0 ← LessSignificantPart(A);
        B1 ← MoreSignificantPart(B);
        B0 ← LessSignificantPart(B);
        U ← A0 + A1;
        V ← B0 + B1;
        X ← Mult(A0, B0);
        W ← Mult(A1, B1);
        Y ← Mult(U, V);
        return W·2^n + (Y − X − W)·2^{n/2} + X
    end if
end function

The Karatsuba trick

How many steps does this algorithm take? (Remember, addition runs in linear time!)

Recurrence:   T(n) = 3·T(n/2) + c·n.

Here a = 3, b = 2, f(n) = c·n and n^{log_b a} = n^{log_2 3}; since 1.5 < log_2 3 < 1.6, we have f(n) = c·n = O(n^{log_2 3 − ε}) for any 0 < ε < 0.5.

Thus, the first case of the Master Theorem applies. Consequently,

T(n) = Θ(n^{log_2 3}) = O(n^{1.585}),

without going through any messy calculations!
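As a concrete illustration, here is a minimal Python sketch of the recursive procedure above; Python's arbitrary-precision integers stand in for the large numbers, and the cut-off threshold of 16 is our own arbitrary choice rather than part of the pseudocode.

    def karatsuba(A, B):
        # Recursive Karatsuba multiplication of non-negative integers.
        # Below a small (arbitrary) threshold we just multiply directly.
        if A < 16 or B < 16:
            return A * B
        n = max(A.bit_length(), B.bit_length())
        h = n // 2                          # split position: roughly n/2 bits
        A1, A0 = A >> h, A & ((1 << h) - 1)
        B1, B0 = B >> h, B & ((1 << h) - 1)
        X = karatsuba(A0, B0)               # A0*B0
        W = karatsuba(A1, B1)               # A1*B1
        Y = karatsuba(A0 + A1, B0 + B1)     # (A0+A1)*(B0+B1)
        # W*2^(2h) + (Y - X - W)*2^h + X, with shifts for the powers of two
        return (W << (2 * h)) + ((Y - X - W) << h) + X

    assert karatsuba(123456789, 987654321) == 123456789 * 987654321

(Of course, CPython's built-in * already uses Karatsuba for large integers; the point of the sketch is only to mirror the pseudocode.)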
Generalizing Karatsuba's algorithm

Can we do better if we break the numbers into more than two pieces?

Let's try breaking the numbers A, B into 3 pieces; then with k = n/3 we obtain

A = A2·2^{2k} + A1·2^k + A0;      B = B2·2^{2k} + B1·2^k + B0,

where each of A2, A1, A0 (and B2, B1, B0) consists of k bits of A (respectively B). Then

AB = A2B2·2^{4k} + (A2B1 + A1B2)·2^{3k} + (A2B0 + A1B1 + A0B2)·2^{2k} + (A1B0 + A0B1)·2^k + A0B0.

The Karatsuba trick

AB = C4·2^{4k} + C3·2^{3k} + C2·2^{2k} + C1·2^k + C0, so we need only 5 coefficients:

C4 = A2B2
C3 = A2B1 + A1B2
C2 = A2B0 + A1B1 + A0B2
C1 = A1B0 + A0B1
C0 = A0B0

Can we get these with 5 multiplications only? Should we perhaps look at

(A2 + A1 + A0)(B2 + B1 + B0) = A0B0 + A1B0 + A2B0 + A0B1 + A1B1 + A2B1 + A0B2 + A1B2 + A2B2 ???

Not clear at all how to get C0, . . . , C4 with 5 multiplications only . . .

The Karatsuba trick: slicing into 3 pieces

We now look for a method for getting these coefficients without any guesswork!

We form the naturally corresponding polynomials:

PA(x) = A2x^2 + A1x + A0;      PB(x) = B2x^2 + B1x + B0.

Then A = A2·2^{2k} + A1·2^k + A0 = PA(2^k) and B = B2·2^{2k} + B1·2^k + B0 = PB(2^k).

The Karatsuba trick: slicing into 3 pieces

If we manage to somehow compute the product polynomial

PC(x) = PA(x)PB(x) = C4x^4 + C3x^3 + C2x^2 + C1x + C0

with only 5 multiplications, we can then obtain the product of the numbers A and B simply as

A·B = PA(2^k)PB(2^k) = PC(2^k) = C4·2^{4k} + C3·2^{3k} + C2·2^{2k} + C1·2^k + C0.

Note that the right hand side involves only shifts and additions.

Since the product polynomial PC(x) = PA(x)PB(x) is of degree 4, we need 5 values to uniquely determine PC(x). We choose the 5 integer values smallest in absolute value, i.e., −2, −1, 0, 1, 2. Thus, we compute

PA(−2), PA(−1), PA(0), PA(1), PA(2)   and   PB(−2), PB(−1), PB(0), PB(1), PB(2).

The Karatsuba trick: slicing into 3 pieces

For PA(x) = A2x^2 + A1x + A0 we have

PA(−2) = A2(−2)^2 + A1(−2) + A0 = 4A2 − 2A1 + A0
PA(−1) = A2(−1)^2 + A1(−1) + A0 = A2 − A1 + A0
PA(0)  = A0
PA(1)  = A2 + A1 + A0
PA(2)  = A2·2^2 + A1·2 + A0 = 4A2 + 2A1 + A0.

Similarly, for PB(x) = B2x^2 + B1x + B0 we have

PB(−2) = 4B2 − 2B1 + B0
PB(−1) = B2 − B1 + B0
PB(0)  = B0
PB(1)  = B2 + B1 + B0
PB(2)  = 4B2 + 2B1 + B0.

These evaluations involve only additions, because 2A = A + A and 4A = 2A + 2A.

The Karatsuba trick: slicing into 3 pieces

Having obtained PA(−2), . . . , PA(2) and PB(−2), . . . , PB(2), we can now obtain PC(−2), . . . , PC(2) with only 5 multiplications of large numbers:

PC(−2) = PA(−2)PB(−2) = (A0 − 2A1 + 4A2)(B0 − 2B1 + 4B2)
PC(−1) = PA(−1)PB(−1) = (A0 − A1 + A2)(B0 − B1 + B2)
PC(0)  = PA(0)PB(0)   = A0B0
PC(1)  = PA(1)PB(1)   = (A0 + A1 + A2)(B0 + B1 + B2)
PC(2)  = PA(2)PB(2)   = (A0 + 2A1 + 4A2)(B0 + 2B1 + 4B2).

The Karatsuba trick: slicing into 3 pieces

Thus, if we represent the product PC(x) = PA(x)PB(x) in coefficient form as PC(x) = C4x^4 + C3x^3 + C2x^2 + C1x + C0, we get

C4(−2)^4 + C3(−2)^3 + C2(−2)^2 + C1(−2) + C0 = PC(−2) = PA(−2)PB(−2)
C4(−1)^4 + C3(−1)^3 + C2(−1)^2 + C1(−1) + C0 = PC(−1) = PA(−1)PB(−1)
C4·0 + C3·0 + C2·0 + C1·0 + C0               = PC(0)  = PA(0)PB(0)
C4·1 + C3·1 + C2·1 + C1·1 + C0               = PC(1)  = PA(1)PB(1)
C4·2^4 + C3·2^3 + C2·2^2 + C1·2 + C0         = PC(2)  = PA(2)PB(2).

Simplifying the left hand sides we obtain

16C4 − 8C3 + 4C2 − 2C1 + C0 = PC(−2)
C4 − C3 + C2 − C1 + C0 = PC(−1)
C0 = PC(0)
C4 + C3 + C2 + C1 + C0 = PC(1)
16C4 + 8C3 + 4C2 + 2C1 + C0 = PC(2)

The Karatsuba trick: slicing into 3 pieces

Solving this system of linear equations for C0, C1, C2, C3, C4 expresses each Ci as a fixed linear combination of PC(−2), . . . , PC(2); for example, C0 = PC(0). (As an exercise, solve this system by hand using Gaussian elimination.)

Note that these expressions do not involve any multiplications of TWO large numbers and thus can be evaluated in linear time.

With the coefficients C0, C1, C2, C3, C4 obtained, we can now form the polynomial

PC(x) = C0 + C1x + C2x^2 + C3x^3 + C4x^4.

We can then compute

PC(2^k) = C0 + C1·2^k + C2·2^{2k} + C3·2^{3k} + C4·2^{4k}

in linear time, because computing PC(2^k) involves only binary shifts of the coefficients plus O(k) additions.

Thus we have obtained A·B = PA(2^k)PB(2^k) = PC(2^k) with only 5 multiplications! Here is the complete algorithm:

function Mult(A, B)
    obtain A0, A1, A2 and B0, B1, B2 such that
        A = A2·2^{2k} + A1·2^k + A0;      B = B2·2^{2k} + B1·2^k + B0
    form the polynomials PA(x) = A2x^2 + A1x + A0 and PB(x) = B2x^2 + B1x + B0
    PA(−2) ← 4A2 − 2A1 + A0          PB(−2) ← 4B2 − 2B1 + B0
    PA(−1) ← A2 − A1 + A0            PB(−1) ← B2 − B1 + B0
    PA(0)  ← A0                      PB(0)  ← B0
    PA(1)  ← A2 + A1 + A0            PB(1)  ← B2 + B1 + B0
    PA(2)  ← 4A2 + 2A1 + A0          PB(2)  ← 4B2 + 2B1 + B0
    PC(−2) ← Mult(PA(−2), PB(−2));   PC(−1) ← Mult(PA(−1), PB(−1));
    PC(0)  ← Mult(PA(0), PB(0));     PC(1) ← Mult(PA(1), PB(1));     PC(2) ← Mult(PA(2), PB(2))
    compute C0, C1, C2, C3, C4 from PC(−2), . . . , PC(2) by solving the linear system above
    form PC(x) = C4x^4 + C3x^3 + C2x^2 + C1x + C0 and compute
        PC(2^k) = C4·2^{4k} + C3·2^{3k} + C2·2^{2k} + C1·2^k + C0
    return PC(2^k) = A · B
end function

The Karatsuba trick: slicing into 3 pieces

How fast is this algorithm? We have replaced a multiplication of two n-bit numbers by 5 multiplications of n/3-bit numbers, with an overhead of additions, shifts and the like, all doable in linear time c·n:

T(n) = 5·T(n/3) + c·n.

We now apply the Master Theorem: we have a = 5, b = 3, so we consider n^{log_b a} = n^{log_3 5} ≈ n^{1.465}. Clearly, the first case of the Master Theorem applies, and we get

T(n) = Θ(n^{log_3 5}) = O(n^{1.47}).

The Karatsuba trick: slicing into 3 pieces

Recall that the original Karatsuba algorithm runs in time n^{log_2 3} ≈ n^{1.58} > n^{1.47}.

Thus, we got a significantly faster algorithm.
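To see the whole 3-slice pipeline in one place, here is a Python sketch of a single level of the scheme (our own illustration, with made-up names): it slices the two inputs, evaluates at −2, −1, 0, 1, 2, performs the five large multiplications (with the built-in * standing in for the recursive calls), interpolates C0, . . . , C4 using exact integer arithmetic, and recombines everything with shifts and additions.

    import random

    def toom3_once(A, B, k):
        # One level of the 3-slice scheme for inputs of at most 3k bits.
        mask = (1 << k) - 1
        A0, A1, A2 = A & mask, (A >> k) & mask, A >> (2 * k)
        B0, B1, B2 = B & mask, (B >> k) & mask, B >> (2 * k)

        # Evaluate PA and PB at the five points -2, -1, 0, 1, 2 (cheap operations only).
        pa = {m: A2 * m * m + A1 * m + A0 for m in (-2, -1, 0, 1, 2)}
        pb = {m: B2 * m * m + B1 * m + B0 for m in (-2, -1, 0, 1, 2)}

        # The only five multiplications of large numbers.
        pc = {m: pa[m] * pb[m] for m in (-2, -1, 0, 1, 2)}

        # Interpolate C0..C4; every division below is exact.
        C0 = pc[0]
        S1 = (pc[1] + pc[-1]) // 2 - C0     # = C2 + C4
        S2 = (pc[2] + pc[-2]) // 2 - C0     # = 4*C2 + 16*C4
        C4 = (S2 - 4 * S1) // 12
        C2 = S1 - C4
        D1 = (pc[1] - pc[-1]) // 2          # = C1 + C3
        D2 = (pc[2] - pc[-2]) // 2          # = 2*C1 + 8*C3
        C3 = (D2 - 2 * D1) // 6
        C1 = D1 - C3

        # Recombine PC(2^k) using only shifts and additions.
        return (C4 << 4 * k) + (C3 << 3 * k) + (C2 << 2 * k) + (C1 << k) + C0

    k = 64
    A, B = random.getrandbits(3 * k), random.getrandbits(3 * k)
    assert toom3_once(A, B, k) == A * B

The closed-form interpolation used above is just one way of solving the 5 × 5 linear system from the previous slides; it comes from adding and subtracting the equations for ±1 and ±2.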

Then why not slice the numbers A and B into an even larger number of slices?
Maybe we can get an even faster algorithm?

The answer is, in a sense, BOTH yes and no, so let's see what happens if
we slice the numbers into p + 1 many (approximately) equal slices, where
p = 1, 2, 3, . . .


Generalizing Karatsuba’s algorithm

The general case – slicing the input numbers A, B into p + 1 many slices

For simplicity, let us assume A and B have exactly (p + 1)k bits
(otherwise one of the slices will have to be shorter);

Note: p is a fixed (smallish) number, a fixed parameter of our design –
p + 1 is the number of slices we are going to make, but k depends on the
input values A and B and can be arbitrarily large!

Slice A and B into p + 1 pieces each:

A = Ap·2^{kp} + A_{p−1}·2^{k(p−1)} + · · · + A1·2^k + A0
B = Bp·2^{kp} + B_{p−1}·2^{k(p−1)} + · · · + B1·2^k + B0

    Ap | A_{p−1} | . . . | A0       <- A divided into p + 1 slices, each slice k bits:
  k bits  k bits   . . .  k bits       (p + 1)k bits in total
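In code, the slicing step itself is only masking and shifting; here is a small Python sketch (with hypothetical names) of cutting A into its p + 1 slices of k bits each:

    def slices(A, p, k):
        # The slices A0, ..., Ap, least significant first, so that
        # A == sum(A_i * 2**(k*i)).
        mask = (1 << k) - 1
        return [(A >> (k * i)) & mask for i in range(p + 1)]

    A = 0xDEADBEEFCAFEBABE                      # a 64-bit example
    parts = slices(A, p=3, k=16)                # four 16-bit slices
    assert sum(a << (16 * i) for i, a in enumerate(parts)) == A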


Generalizing Karatsuba’s algorithm

We form the naturally corresponding polynomials:

PA(x) = Apx^p + A_{p−1}x^{p−1} + · · · + A0
PB(x) = Bpx^p + B_{p−1}x^{p−1} + · · · + B0

As before, we have:

A = PA(2^k);   B = PB(2^k);   AB = PA(2^k)·PB(2^k) = (PA(x)·PB(x)) |_{x=2^k}

So we adopt the following strategy:
we will first figure out how to multiply polynomials fast, to obtain
PC(x) = PA(x)·PB(x);
then we evaluate PC(2^k).

Note that PC(x) = PA(x)·PB(x) is of degree 2p:


Generalizing Karatsuba’s algorithm

(A3x^3 + A2x^2 + A1x + A0)(B3x^3 + B2x^2 + B1x + B0) =
    A3B3·x^6 + (A2B3 + A3B2)·x^5 + (A1B3 + A2B2 + A3B1)·x^4
    + (A0B3 + A1B2 + A2B1 + A3B0)·x^3 + (A0B2 + A1B1 + A2B0)·x^2
    + (A0B1 + A1B0)·x + A0B0

In general: for

PA(x) = Apx^p + · · · + A0
PB(x) = Bpx^p + · · · + B0

we have

PA(x)·PB(x) = Σ_{j=0}^{2p} ( Σ_{i+k=j} AiBk ) x^j = Σ_{j=0}^{2p} Cj x^j.

We need to find the coefficients Cj = Σ_{i+k=j} AiBk without performing the (p + 1)^2
many multiplications necessary to get all products of the form AiBk.
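For concreteness, here is a short Python sketch (our own illustration) of the direct way of computing the coefficients Cj; it performs one multiplication per pair (i, k), i.e. (p + 1)^2 of them for two degree-p polynomials, which is exactly the cost we want to avoid when the coefficients are large numbers.

    def poly_mult_naive(A, B):
        # A and B are coefficient lists, lowest degree first.
        # C[j] = sum of A[i]*B[k] over all i + k == j.
        C = [0] * (len(A) + len(B) - 1)
        for i, a in enumerate(A):
            for k, b in enumerate(B):
                C[i + k] += a * b
        return C

    # (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
    assert poly_mult_naive([1, 2, 3], [4, 5]) == [4, 13, 22, 15]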


A VERY IMPORTANT DIGRESSION:

If you have two sequences A⃗ = (A0, A1, . . . , A_{p−1}, Ap) and
B⃗ = (B0, B1, . . . , B_{m−1}, Bm), and if you form the two corresponding polynomials

PA(x) = A0 + A1x + . . . + A_{p−1}x^{p−1} + Apx^p
PB(x) = B0 + B1x + . . . + B_{m−1}x^{m−1} + Bmx^m

and if you multiply these two polynomials to obtain their product

PA(x)·PB(x) = Σ_{j=0}^{p+m} ( Σ_{i+k=j} AiBk ) x^j = Σ_{j=0}^{p+m} Cj x^j,

then the sequence C⃗ = (C0, C1, . . . , C_{p+m}) of the coefficients of the product
polynomial, with these coefficients given by

Cj = Σ_{i+k=j} AiBk,   for 0 ≤ j ≤ p + m,

is extremely important and is called the LINEAR CONVOLUTION of the
sequences A⃗ and B⃗, denoted by C⃗ = A⃗ ⋆ B⃗.


AN IMPORTANT DIGRESSION:

For example, if you have an audio signal and you want to emphasise the bass
sounds, you would pass the sequence of discrete samples of the signal through
a digital filter which amplifies the low frequencies more than the medium and
the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of
discrete samples of the signal with a sequence of values which correspond to
that filter, called the impulse response of the filter.

This means that the samples of the output sound are simply the coefficients of
the product of two polynomials:

1 the polynomial PA(x) whose coefficients Ai are the samples of the input signal;

2 the polynomial PB(x) whose coefficients Bk are the samples of the so-called
impulse response of the filter (they depend on what kind of filtering you want to do).

Convolutions are the bread and butter of signal processing, and for that reason it
is extremely important to find fast ways of multiplying two polynomials of
possibly very large degrees.
In signal processing these degrees can be greater than 1000.
This is the main reason for us to study methods for the fast computation of
convolutions (aside from finding products of large integers, which is what we are
doing at the moment).
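As a toy illustration of the filtering idea (our own example, with a made-up two-tap impulse response), the output samples are just the linear convolution of the input samples with the impulse response, which NumPy computes directly:

    import numpy as np

    signal = np.array([0.0, 1.0, 0.0, 0.0, 2.0, 0.0])    # discrete input samples
    impulse_response = np.array([0.5, 0.5])               # a 2-tap averaging filter
    output = np.convolve(signal, impulse_response)        # linear convolution
    print(output)    # [0.  0.5 0.5 0.  1.  1.  0. ]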


Coefficient vs value representation of polynomials

Every polynomial PA(x) of degree p is uniquely determined by its values at
any p + 1 distinct input values x0, x1, . . . , xp:

PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}

For PA(x) = Apx^p + A_{p−1}x^{p−1} + . . . + A0, these values can be obtained via a
matrix multiplication:

    [ PA(x0) ]   [ 1  x0  x0^2  . . .  x0^p ]   [ A0 ]
    [ PA(x1) ] = [ 1  x1  x1^2  . . .  x1^p ] · [ A1 ]        (1)
    [  . . . ]   [            . . .         ]   [ .. ]
    [ PA(xp) ]   [ 1  xp  xp^2  . . .  xp^p ]   [ Ap ]

It can be shown that if xi are all distinct then this matrix is invertible.

Such a matrix is called the Vandermonde matrix.
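A quick numerical sanity check of this fact, using NumPy and an arbitrarily chosen polynomial and set of points (our own sketch, not part of the lecture):

    import numpy as np

    coeffs = np.array([2.0, -1.0, 3.0])     # A0, A1, A2, i.e. PA(x) = 3x^2 - x + 2
    xs = np.array([-1.0, 0.0, 1.0])         # p + 1 = 3 distinct points
    V = np.vander(xs, increasing=True)      # rows [1, x_i, x_i^2]: the Vandermonde matrix
    values = V @ coeffs                     # PA(x_i) for each i, as in (1)
    recovered = np.linalg.solve(V, values)  # invertible, so the values determine the coefficients
    assert np.allclose(recovered, coeffs)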


Coefficient vs value representation of polynomials – ctd.

Thus, if the xi are all distinct, then given any values PA(x0), PA(x1), . . . , PA(xp), the
coefficients A0, A1, . . . , Ap of the polynomial PA(x) are uniquely determined:

    [ A0 ]   [ 1  x0  x0^2  . . .  x0^p ]^(−1)   [ PA(x0) ]
    [ A1 ] = [ 1  x1  x1^2  . . .  x1^p ]        [ PA(x1) ]        (2)
    [ .. ]   [            . . .         ]        [  . . . ]
    [ Ap ]   [ 1  xp  xp^2  . . .  xp^p ]        [ PA(xp) ]

• Equations (1) and (2) show how we can convert between:

1 a representation of a polynomial PA(x) via its coefficients
Ap, A_{p−1}, . . . , A0, i.e. PA(x) = Apx^p + . . . + A1x + A0;

2 a representation of a polynomial PA(x) via its values,
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}.


Coefficient vs value representation of polynomials – ctd.

If we fix the inputs x0, x1, . . . , xp, then converting between the representation of
a polynomial PA(x) via its coefficients and its representation via its values at
these points is done via the two matrix multiplications (1) and (2), with matrices
made up entirely of constants.

Thus, for fixed input values x0, . . . , xp this switch between the two kinds of
representations is done in linear time!


Our strategy to multiply polynomials fast:

1 Given two polynomials of degree at most p,

PA(x) = Apx^p + . . . + A0;      PB(x) = Bpx^p + . . . + B0,

convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x_{2p}:

PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x_{2p}, PA(x_{2p}))}
PB(x) ↔ {(x0, PB(x0)), (x1, PB(x1)), . . . , (x_{2p}, PB(x_{2p}))}

Note: since the product of the two polynomials will be of degree 2p,
we need the values of PA(x) and PB(x) at 2p + 1 points, rather than
just p + 1 points!

2 Multiply these two polynomials point-wise, using 2p + 1 multiplications only:

PA(x)PB(x) ↔ {(x0, PA(x0)PB(x0)), (x1, PA(x1)PB(x1)), . . . , (x_{2p}, PA(x_{2p})PB(x_{2p}))}

3 Convert this value representation of PC(x) = PA(x)PB(x) back to coefficient form,

PC(x) = C_{2p}x^{2p} + . . . + C1x + C0.

(A code sketch of this whole pipeline follows below.)
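Here is a compact NumPy sketch of the three-step strategy (our own illustration; the names, the choice of integer evaluation points and the rounding at the end are ours, and a production version would solve the system exactly):

    import numpy as np

    def poly_mult_eval_interp(A, B):
        # A, B: coefficient lists of two degree-p polynomials, lowest degree first.
        p = len(A) - 1
        xs = np.arange(-p, p + 1)                      # 2p+1 evaluation points
        V = np.vander(xs, 2 * p + 1, increasing=True)  # Vandermonde matrix at those points
        PA = V[:, : p + 1] @ np.array(A)               # step 1: values PA(x_i)
        PB = V[:, : p + 1] @ np.array(B)               #         and PB(x_i)
        PC = PA * PB                                   # step 2: 2p+1 pointwise products
        C = np.linalg.solve(V.astype(float), PC.astype(float))   # step 3: interpolate
        return [int(round(c)) for c in C]              # coefficients of PA(x)*PB(x)

    # (1 + 2x + 3x^2)(4 + 5x + 6x^2) = 4 + 13x + 28x^2 + 27x^3 + 18x^4
    assert poly_mult_eval_interp([1, 2, 3], [4, 5, 6]) == [4, 13, 28, 27, 18]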


Fast multiplication of polynomials – continued

What values should we choose for x0, x1, . . . , x_{2p}?

Key idea: use the 2p + 1 integer values smallest in absolute value,

{−p, −(p − 1), . . . , −1, 0, 1, . . . , p − 1, p}.

So we find the values PA(m) and PB(m) for all m such that −p ≤ m ≤ p.
Remember that p + 1 is the number of slices into which we split the input numbers A, B.

Multiplication of a large number with k bits by a constant integer d can be
done in time linear in k, because it is reducible to d − 1 additions:

d·A = A + A + . . . + A   (d copies of A).

Thus, all the values

PA(m) = Apm^p + · · · + A0,   −p ≤ m ≤ p,
PB(m) = Bpm^p + · · · + B0,   −p ≤ m ≤ p,

can be found in time linear in the number of bits of the input numbers!
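For instance, each value PA(m) can be computed by Horner's rule: every step multiplies a large number by the small constant m (linear time, as argued above) and adds the next coefficient. A minimal sketch, not the lecture's pseudocode:

    def eval_at_small_int(coeffs, m):
        # coeffs = [A0, A1, ..., Ap]; returns PA(m) = Ap*m^p + ... + A0.
        result = 0
        for a in reversed(coeffs):
            result = result * m + a     # multiply by a small constant, then add
        return result

    assert eval_at_small_int([2, -1, 3], 2) == 3 * 2**2 - 1 * 2 + 2   # PA(2) = 12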


Fast multiplication of polynomials – ctd.

We now perform 2p + 1 multiplications of large numbers to obtain

PA(−p)PB(−p), . . . , PA(−1)PB(−1), PA(0)PB(0), PA(1)PB(1), . . . , PA(p)PB(p)

For PC(x) = PA(x)PB(x) these products are 2p + 1 values of PC(x):

PC(−p) = PA(−p)PB(−p), . . . , PC(0) = PA(0)PB(0), . . . , PC(p) = PA(p)PB(p)

Let C0, C1, . . . , C_{2p} be the coefficients of the product polynomial PC(x), i.e., let

PC(x) = C_{2p}x^{2p} + · · · + C1x + C0.

We now have:

C_{2p}(−p)^{2p} + C_{2p−1}(−p)^{2p−1} + · · · + C0 = PC(−p)
C_{2p}(−(p−1))^{2p} + C_{2p−1}(−(p−1))^{2p−1} + · · · + C0 = PC(−(p−1))
                                                     . . .
C_{2p}(p−1)^{2p} + C_{2p−1}(p−1)^{2p−1} + · · · + C0 = PC(p−1)
C_{2p}p^{2p} + C_{2p−1}p^{2p−1} + · · · + C0 = PC(p)


Fast multiplication of polynomials – ctd.

This is just a system of linear equations that can be solved for C0, C1, . . . , C_{2p}:

    [ 1  −p      (−p)^2      . . .  (−p)^{2p}     ]   [ C0      ]   [ PC(−p)     ]
    [ 1  −(p−1)  (−(p−1))^2  . . .  (−(p−1))^{2p} ]   [ C1      ]   [ PC(−(p−1)) ]
    [                  . . .                      ] · [  . . .  ] = [   . . .    ]
    [ 1  p−1     (p−1)^2     . . .  (p−1)^{2p}    ]   [ C_{2p−1}]   [ PC(p−1)    ]
    [ 1  p       p^2         . . .  p^{2p}        ]   [ C_{2p}  ]   [ PC(p)      ]

i.e., we can obtain C0, C1, . . . , C_{2p} as

    [ C0      ]   [ 1  −p      (−p)^2      . . .  (−p)^{2p}     ]^(−1)   [ PC(−p)     ]
    [ C1      ]   [ 1  −(p−1)  (−(p−1))^2  . . .  (−(p−1))^{2p} ]        [ PC(−(p−1)) ]
    [  . . .  ] = [                  . . .                      ]        [   . . .    ]
    [ C_{2p−1}]   [ 1  p−1     (p−1)^2     . . .  (p−1)^{2p}    ]        [ PC(p−1)    ]
    [ C_{2p}  ]   [ 1  p       p^2         . . .  p^{2p}        ]        [ PC(p)      ]
But the inverse matrix again involves only constants, which depend on p alone;

Thus the coefficients Ci can be obtained in linear time.

So here is the algorithm we have just described:


function Mult(A, B)
    if |A| = |B| < p + 1 then return AB
    else
        obtain p + 1 slices A0, A1, . . . , Ap and B0, B1, . . . , Bp such that
            A = Ap·2^{pk} + . . . + A1·2^k + A0
            B = Bp·2^{pk} + . . . + B1·2^k + B0
        form the polynomials
            PA(x) = Apx^p + . . . + A0
            PB(x) = Bpx^p + . . . + B0
        for m = −p to p do
            compute PA(m) and PB(m);
            PC(m) ← Mult(PA(m), PB(m))
        end for
        compute C0, C1, . . . , C_{2p} from PC(−p), . . . , PC(p) by multiplying with the
            (constant) inverse of the matrix above
        form PC(x) = C_{2p}x^{2p} + . . . + C0 and compute PC(2^k)
        return PC(2^k) = A · B
    end if
end function

How fast is our algorithm?

It is easy to see that the values of the two polynomials we are multiplying have
at most k + s bits, where s is a constant which depends on p but does NOT depend on k:

PA(m) = Apm^p + A_{p−1}m^{p−1} + · · · + A0,   −p ≤ m ≤ p.

This is because each Ai is smaller than 2^k (each Ai has k bits); thus

|PA(m)| < p^p(p + 1)·2^k   ⇒   log2 |PA(m)| < log2(p^p(p + 1)) + k = s + k.

Thus, we have reduced a multiplication of two (p + 1)k bit numbers to 2p + 1
multiplications of (k + s) bit numbers, plus a linear overhead (additions, splitting
the numbers, etc.).

So we get the following recurrence for the complexity of Mult(A, B):

T((p + 1)k) = (2p + 1)·T(k + s) + c·k.

Let n = (p + 1)k. Then

T(n) = (2p + 1)·T(n/(p + 1) + s) + c·n/(p + 1).

Since s is a constant, its impact can be neglected, and applying the Master Theorem
as before (with a = 2p + 1 and b = p + 1) yields T(n) = Θ(n^{log_{p+1}(2p+1)}).