NEW SOUTH WALES
Algorithms COMP3121/9101
School of Computer Science and Engineering University of New South Wales
4. FAST LARGE INTEGER MULTIPLICATION – part A
COMP3121/9101
1 / 35
Basics revisited: how do we multiply two numbers?
The primary school algorithm:
X X X X <- first input integer
* X X X X <- second input integer
-------
XXXX X XXX
XX XX XXX X
--------------- XXXXXXXX
\
\ O(n^2) intermediate operations:
/ O(n^2) elementary multiplications
/ + O(n^2) elementary additions <- resultoflength2n
COMP3121/9101 2 / 35
Basics revisited: how do we multiply two numbers?
The primary school algorithm:
X X X X <- first input integer
* X X X X <- second input integer
-------
XXXX X XXX
XX XX XXX X
--------------- XXXXXXXX
\
\ O(n^2) intermediate operations:
/ O(n^2) elementary multiplications
/ + O(n^2) elementary additions <- resultoflength2n
Can we do it faster than in n2 many steps??
COMP3121/9101 2 / 35
The Karatsuba trick
Take the two input numbers A and B, and split them into two halves:
COMP3121/9101 3 / 35
The Karatsuba trick
Take the two input numbers A and B, and split them into two
halves:
A=A122 +A0
n
B=B122 +B0
A1 A0
A= XX...XXX...X
n/2 bits n/2 bits
n
COMP3121/9101
3 / 35
The Karatsuba trick
Take the two input numbers A and B, and split them into two
halves:
A=A122 +A0
n
A1 A0
A= XX...XXX...X
n/2 bits n/2 bits
n
B=B122 +B0
AB can now be calculated as follows:
COMP3121/9101
3 / 35
The Karatsuba trick
Take the two input numbers A and B, and split them into two
halves:
A=A122 +A0
n
A1 A0
A= XX...XXX...X
n
B=B122 +B0
AB can now be calculated as follows:
AB = A1B12n + (A1B0 + A0B1)2n2 + A0B0
n/2 bits
n/2 bits
COMP3121/9101
3 / 35
The Karatsuba trick
Take the two input numbers A and B, and split them into two
halves:
A=A122 +A0
n
A1 A0
A= XX...XXX...X
n/2 bits n/2 bits
n
B=B122 +B0
AB can now be calculated as follows:
AB = A1B12n + (A1B0 + A0B1)2n2 + A0B0
= A1B12n + ((A1 + A0)(B1 + B0) − A1B1 − A0B0)2n2 + A0B0
We have saved one multiplication, now we have only three: A0B0, A1B1 and (A1 + A0)(B1 + B0).
COMP3121/9101 3 / 35
AB =
A1B12n+((A1+A0)(B1+B0)−A1B1−A0B0)2n2 +A0B0
1: functionMult(A,B)
2: if |A|=|B|=1thenreturnAB
3: else
4: A1 ← MoreSignificantPart(A);
5: A0 ← LessSignificantPart(A);
6: B1 ← MoreSignificantPart(B);
7: B0 ← LessSignificantPart(B);
8: U←A0+A1;
9: V←B0+B1;
10: X←Mult(A0,B0);
11: W←Mult(A1,B1);
12: Y ← Mult(U, V);
13: return W 2n +(Y −X −W)2n/2 +X
14: end if
15: end function
COMP3121/9101 4 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
COMP3121/9101 5 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
Recurrence: T (n) = 3 T n + c n 2
COMP3121/9101 5 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
Recurrence: T (n) = 3 T n + c n 2
a=3; b=2; f(n)=cn; nlogba =nlog23
COMP3121/9101
5 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
Recurrence: T (n) = 3 T n + c n 2
a=3; b=2; f(n)=cn; nlogba =nlog23 since 1.5 < log2 3 < 1.6 we have
f(n)=cn=O(nlog23−ε) forany 0<ε<0.5
COMP3121/9101
5 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
Recurrence: T (n) = 3 T n + c n 2
a=3; b=2; f(n)=cn; nlogba =nlog23 since 1.5 < log2 3 < 1.6 we have
f(n)=cn=O(nlog23−ε) forany 0<ε<0.5 Thus, the first case of the Master Theorem applies.
COMP3121/9101
5 / 35
The Karatsuba trick
How many steps does this algorithm take? (remember, addition is in linear time!)
Recurrence: T (n) = 3 T n + c n 2
a=3; b=2; f(n)=cn; nlogba =nlog23 since 1.5 < log2 3 < 1.6 we have
f(n)=cn=O(nlog23−ε) forany 0<ε<0.5 Thus, the first case of the Master Theorem applies.
Consequently,
T (n) = Θ(nlog2 3) < Θ(n1.585)
without going through the messy calculations!
COMP3121/9101
5 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces?
COMP3121/9101 6 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces? Lets try breaking the numbers A, B into 3 pieces; then with
k = n/3 we obtain
COMP3121/9101 6 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces? Lets try breaking the numbers A, B into 3 pieces; then with
k = n/3 we obtain
A = XXX ...XX XXX ...XX XXX ...XX
k bitsofA2 k bitsofA1 k bitsofA0
COMP3121/9101
6 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces? Lets try breaking the numbers A, B into 3 pieces; then with
k = n/3 we obtain
i.e.,
A = XXX ...XX XXX ...XX XXX ...XX
k bitsofA2 k bitsofA1 k bitsofA0 A= A222k + A12k + A0
COMP3121/9101
6 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces? Lets try breaking the numbers A, B into 3 pieces; then with
k = n/3 we obtain
i.e.,
A = XXX ...XX XXX ...XX XXX ...XX
k bitsofA2 k bitsofA1 k bitsofA0 A= A222k + A12k + A0
B= B222k + B12k + B0
COMP3121/9101
6 / 35
Generalizing Karatsuba’s algorithm
Can we do better if we break the numbers in more than two pieces? Lets try breaking the numbers A, B into 3 pieces; then with
k = n/3 we obtain
i.e.,
A = XXX ...XX XXX ...XX XXX ...XX
k bitsofA2 k bitsofA1 k bitsofA0 A= A222k + A12k + A0
B= B222k + B12k + B0
So,
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
+ (A1B0 + A0B1)2k + A0B0
COMP3121/9101 6 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
C1 C0
COMP3121/9101
7 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
we need only 5 coefficients:
C1 C0
COMP3121/9101
7 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
C1 C0
we need only 5 coefficients:
C4 =A2B2
C3 =A2B1 +A1B2
C2 =A2B0 + A1B1 + A0B2 C1 =A1B0 +A0B1
C0 =A0B0
COMP3121/9101
7 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
C1 C0
we need only 5 coefficients:
C4 =A2B2
C3 =A2B1 +A1B2
C2 =A2B0 + A1B1 + A0B2 C1 =A1B0 +A0B1
C0 =A0B0
Can we get these with 5 multiplications only?
COMP3121/9101
7 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
C1 C0
we need only 5 coefficients:
C4 =A2B2
C3 =A2B1 +A1B2
C2 =A2B0 + A1B1 + A0B2 C1 =A1B0 +A0B1
C0 =A0B0
Can we get these with 5 multiplications only? Should we perhaps look at
(A2 +A1 +A0)(B2 +B1 +B0) =
A0B0 +A1B0 +A2B0 +A0B1 +A1B1 +A2B1 +A0B2 +A1B2 +A2B2
???
COMP3121/9101
7 / 35
The Karatsuba trick
AB = A2B2 24k + (A2B1 + A1B2)23k + (A2B0 + A1B1 + A0B2)22k+
C4 C3 C2 + (A1B0 + A0B1)2k + A0B0
C1 C0
we need only 5 coefficients:
C4 =A2B2
C3 =A2B1 +A1B2
C2 =A2B0 + A1B1 + A0B2 C1 =A1B0 +A0B1
C0 =A0B0
Can we get these with 5 multiplications only? Should we perhaps look at
(A2 +A1 +A0)(B2 +B1 +B0) =
A0B0 +A1B0 +A2B0 +A0B1 +A1B1 +A2B1 +A0B2 +A1B2 +A2B2
Not clear at all how to get C0 − C4 with 5 multiplications only ... COMP3121/9101
???
7 / 35
The Karatsuba trick: slicing into 3 pieces
We now look for a method for getting these coefficients without any guesswork!
COMP3121/9101 8 / 35
The Karatsuba trick: slicing into 3 pieces
We now look for a method for getting these coefficients without any guesswork!
Let
A=A222k +A12k +A0 B=B222k +B12k +B0
COMP3121/9101
8 / 35
The Karatsuba trick: slicing into 3 pieces
We now look for a method for getting these coefficients without any guesswork!
Let
A=A222k +A12k +A0 B=B222k +B12k +B0
We form the naturally corresponding polynomials: PA(x)=A2x2 +A1x+A0;
PB(x) = B2 x2 + B1 x + B0.
COMP3121/9101
8 / 35
The Karatsuba trick: slicing into 3 pieces
We now look for a method for getting these coefficients without any guesswork!
Let
A=A222k +A12k +A0 B=B222k +B12k +B0
We form the naturally corresponding polynomials: PA(x)=A2x2 +A1x+A0;
Note that
PB(x) = B2 x2 + B1 x + B0.
A =A2 (2k)2 + A1 2k + A0 = PA(2k); B =B2 (2k)2 + B1 2k + B0 = PB(2k).
COMP3121/9101
8 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0,
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0, Note that the right hand side involves only shifts and additions.
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0, Note that the right hand side involves only shifts and additions.
Since the product polynomial PC(x) = PA(x)PB(x) is of degree 4 we need 5 values to uniquely determine PC (x).
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0, Note that the right hand side involves only shifts and additions.
Since the product polynomial PC(x) = PA(x)PB(x) is of degree 4 we need 5 values to uniquely determine PC (x).
We choose the smallest possible 5 integer values (smallest by their absolute value),
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0, Note that the right hand side involves only shifts and additions.
Since the product polynomial PC(x) = PA(x)PB(x) is of degree 4 we need 5 values to uniquely determine PC (x).
We choose the smallest possible 5 integer values (smallest by their absolute value), i.e., −2, −1, 0, 1, 2.
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
If we manage to compute somehow the product polynomial PC(x)=PA(x)PB(x)=C4x4 +C3x3 +C2x2 +C1x+C0,
with only 5 multiplications, we can then obtain the product of numbers A and B simply as
A·B=PA(2k)PB(2k)=PC(2k)=C424k +C323k +C222k +C12k +C0, Note that the right hand side involves only shifts and additions.
Since the product polynomial PC(x) = PA(x)PB(x) is of degree 4 we need 5 values to uniquely determine PC (x).
We choose the smallest possible 5 integer values (smallest by their absolute value), i.e., −2, −1, 0, 1, 2.
Thus, we computePA(−2), PA(−1), PA(0), PA(1), PA(2) PB (−2), PB (−1), PB (0), PB (1), PB (2)
COMP3121/9101 9 / 35
The Karatsuba trick: slicing into 3 pieces
ForPA(x)=A2x2 +A1x+A0 wehave
PA(−2) = A2(−2)2 +A1(−2)+A0 = 4A2 −2A1 +A0 PA(−1) = A2(−1)2 +A1(−1)+A0 = A2 −A1 +A0
PA(0)=A202 +A10+A0 =A0
PA(1) = A212 +A11+A0 = A2 +A1 +A0 PA(2) = A222 +A12+A0 = 4A2 +2A1 +A0.
COMP3121/9101
10 / 35
The Karatsuba trick: slicing into 3 pieces
ForPA(x)=A2x2 +A1x+A0 wehave
PA(−2) = A2(−2)2 +A1(−2)+A0 = 4A2 −2A1 +A0 PA(−1) = A2(−1)2 +A1(−1)+A0 = A2 −A1 +A0
PA(0)=A202 +A10+A0 =A0
PA(1) = A212 +A11+A0 = A2 +A1 +A0 PA(2) = A222 +A12+A0 = 4A2 +2A1 +A0.
Similarly, for PB(x) = B2x2 + B1x + B0 we have
PB(−2)=B2(−2)2 +B1(−2)+B0 =4B2 −2B1 +B0 PB(−1)=B2(−1)2 +B1(−1)+B0 =B2 −B1 +B0
PB(0)=B202 +B10+B0 =B0 PB(1)=B212 +B11+B0 =B2 +B1 +B0 PB(2)=B222 +B12+B0 =4B2 +2B1 +B0.
COMP3121/9101
10 / 35
The Karatsuba trick: slicing into 3 pieces
ForPA(x)=A2x2 +A1x+A0 wehave
PA(−2) = A2(−2)2 +A1(−2)+A0 = 4A2 −2A1 +A0 PA(−1) = A2(−1)2 +A1(−1)+A0 = A2 −A1 +A0
PA(0)=A202 +A10+A0 =A0
PA(1) = A212 +A11+A0 = A2 +A1 +A0 PA(2) = A222 +A12+A0 = 4A2 +2A1 +A0.
Similarly, for PB(x) = B2x2 + B1x + B0 we have
PB(−2)=B2(−2)2 +B1(−2)+B0 =4B2 −2B1 +B0 PB(−1)=B2(−1)2 +B1(−1)+B0 =B2 −B1 +B0
PB(0)=B202 +B10+B0 =B0 PB(1)=B212 +B11+B0 =B2 +B1 +B0 PB(2)=B222 +B12+B0 =4B2 +2B1 +B0.
These evaluations involve only additions because 2A = A + A; 4A = 2A + 2A. COMP3121/9101 10 / 35
The Karatsuba trick: slicing into 3 pieces
Having obtained PA(−2),PA(−1),PA(0),PA(1),PA(2) and PB (−2), PB (−1), PB (0), PB (1), PB (2)
we can now obtain PC (−2), PC (−1), PC (0), PC (1), PC (2) with only 5 multiplications of large numbers:
PC(−2) = PA(−2)PB(−2)
= (A0 −2A1 +4A2)(B0 −2B1 +4B2)
PC(−1) = PA(−1)PB(−1)
= (A0 −A1 +A2)(B0 −B1 +B2)
PC(0) = PA(0)PB(0) = A0B0
PC(1) = PA(1)PB(1)
= (A0 +A1 +A2)(B0 +B1 +B2)
PC(2) = PA(2)PB(2)
= (A0 +2A1 +4A2)(B0 +2B1 +4B2) COMP3121/9101
11 / 35
The Karatsuba trick: slicing into 3 pieces
Thus, if we represent the product C(x) = PA(x)PB(x) in the coefficient form as C(x) = C4x4 +C3x3 +C2x2 +C1x+C0 we get
C4(−2)4 + C3(−2)3 + C2(−2)2 + C1(−2) + C0 = PC(−2) = PA(−2)PB(−2) C4(−1)4 + C3(−1)3 + C2(−1)2 + C1(−1) + C0 = PC(−1) = PA(−1)PB(−1)
C404 +C303 +C202 +C1 ·0+C0 =PC(0)=PA(0)PB(0) C414 +C313 +C212 +C1 ·1+C0 =PC(1)=PA(1)PB(1) C424 +C323 +C222 +C1 ·2+C0 =PC(2)=PA(2)PB(2).
COMP3121/9101 12 / 35
The Karatsuba trick: slicing into 3 pieces
Thus, if we represent the product C(x) = PA(x)PB(x) in the coefficient form as C(x) = C4x4 +C3x3 +C2x2 +C1x+C0 we get
C4(−2)4 + C3(−2)3 + C2(−2)2 + C1(−2) + C0 = PC(−2) = PA(−2)PB(−2) C4(−1)4 + C3(−1)3 + C2(−1)2 + C1(−1) + C0 = PC(−1) = PA(−1)PB(−1)
C404 +C303 +C202 +C1 ·0+C0 =PC(0)=PA(0)PB(0) C414 +C313 +C212 +C1 ·1+C0 =PC(1)=PA(1)PB(1) C424 +C323 +C222 +C1 ·2+C0 =PC(2)=PA(2)PB(2).
Simplifying the left side we obtain
16C4 −8C3 +4C2 −2C1 +C0 = PC(−2) C4 −C3 +C2 −C1 +C0 = PC(−1)
C0 =PC(0) C4 +C3 +C2 +C1 +C0 =PC(1) 16C4 +8C3 +4C2 +2C1 +C0 = PC(2)
COMP3121/9101
12 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
COMP3121/9101
13 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
COMP3121/9101 13 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
With the coefficients C0, C1, C2, C3, C4 obtained, we can now form the polynomial PC (x) = C0 + C1x + C2x2 + C3x3 + C4x4.
COMP3121/9101 13 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
With the coefficients C0, C1, C2, C3, C4 obtained, we can now form the polynomial PC (x) = C0 + C1x + C2x2 + C3x3 + C4x4.
We can now compute PC(2k) = C0 + C12k + C222k + C323k + C424k in linear
time, because computing PC(2k) involves only binary shifts of the coefficients plus O(k) additions.
COMP3121/9101 13 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
With the coefficients C0, C1, C2, C3, C4 obtained, we can now form the polynomial PC (x) = C0 + C1x + C2x2 + C3x3 + C4x4.
We can now compute PC(2k) = C0 + C12k + C222k + C323k + C424k in linear time, because computing PC(2k) involves only binary shifts of the coefficients plus O(k) additions.
Thus we have obtained A · B = PA(2k)PB(2k) = PC(2k) with only 5 multiplications!
COMP3121/9101 13 / 35
The Karatsuba trick: slicing into 3 pieces
Solving this system of linear equations for C0, C1, C2, C3, C4 produces (as an exercise solve this system by hand, using the Gaussian elimination)
C0 =PC(0)
C1 = PC(−2) − 2PC(−1) + 2PC(1) − PC(2)
12 3 3 12
C2 = −PC(−2) + 2PC(−1) − 5PC(0) + 2PC(1) − PC(2)
24 3 4 3 24 C3 = −PC(−2) + PC(−1) − PC(1) + PC(2)
12 6 6 12
C4 = PC(−2) − PC(−1) + PC(0) − PC(1) + PC(2)
24 6 4 6 24
Note that these expressions do not involve any multiplications of TWO large numbers and thus can be done in linear time.
With the coefficients C0, C1, C2, C3, C4 obtained, we can now form the polynomial PC (x) = C0 + C1x + C2x2 + C3x3 + C4x4.
We can now compute PC(2k) = C0 + C12k + C222k + C323k + C424k in linear time, because computing PC(2k) involves only binary shifts of the coefficients plus O(k) additions.
Thus we have obtained A · B = PA(2k)PB(2k) = PC(2k) with only 5
multiplications!Here is the complete algorithm:
COMP3121/9101 13 / 35
1: 2: 3: 4:
function Mult(A, B)
obtain A0,A1,A2 and B0,B1,B2 such that A = A2 22k +A1 2k +A0; B = B2 22k +B1 2k +B0;
5:
6:
form polynomials PA(x) = A2x2 + A1x + A0; PA(−2)←4A2−2A1+A0
PA(−1) ← A2 −A1 +A0 PA (0) ← A0
PA(1) ← A2 +A1 +A0 PA(2) ← 4A2 +2A1 +A0
PC(−2) ← Mult(PA(−2),PB(−2)); PC(0) ← Mult(PA(0),PB(0)); PC(1) ← Mult(PA(1),PB(1));
PB(x) = B2x2 + B1x + B0; PB(−2) ← 4B2 − 2B1 + B0 PB(−1) ← B2 −B1 +B0 PB(0) ← B0
PB(1) ← B2 +B1 +B0 PB(2) ← 4B2 +2B1 +B0
PC(−1) ← Mult(PA(−1),PB(−1));
PC(2) ← Mult(PA(2),PB(2))
C0 ← PC(0); PC (−2)
C1 ←
PC (−2) 2PC (−1) −
2PC (1) PC (2) −
+
12 3 3 12
2PC (−1)
C2←− + − + −
5PC (0) 2PC (1) PC (2) 24 3 4 3 24
7:
8: 9:
PC (−2) PC (−1) PC (1) PC (2) C3←− + − +
12 6 6 12
PC (−2) PC (−1) PC (0) PC (1) PC (2) C4← − + − +
24 6 4 6 24
form PC(x) = C4x4 + C3x3 + C2x2 + C1x + C0; compute PC(2k) = C424k + C323k + C222k + C12k + C0
return PC(2k) = A · B. end function
COMP3121/9101
14 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
COMP3121/9101 15 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
We have replaced a multiplication of two n bit numbers with 5 multiplications of n/3 bit numbers with an overhead of additions, shifts and the similar, all doable in linear time c n;
COMP3121/9101 15 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
We have replaced a multiplication of two n bit numbers with 5 multiplications of n/3 bit numbers with an overhead of additions, shifts and the similar, all doable in linear time c n;
thus,
T (n) = 5T n + c n 3
COMP3121/9101 15 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
We have replaced a multiplication of two n bit numbers with 5 multiplications of n/3 bit numbers with an overhead of additions, shifts and the similar, all doable in linear time c n;
thus,
T (n) = 5T n + c n 3
We now apply the Master Theorem:
COMP3121/9101 15 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
We have replaced a multiplication of two n bit numbers with 5 multiplications of n/3 bit numbers with an overhead of additions, shifts and the similar, all doable in linear time c n;
thus,
T (n) = 5T n + c n 3
We now apply the Master Theorem:
wehavea=5, b=3,soweconsidernlogba =nlog35 ≈n1.465...
COMP3121/9101
15 / 35
The Karatsuba trick: slicing into 3 pieces
How fast is this algorithm?
We have replaced a multiplication of two n bit numbers with 5 multiplications of n/3 bit numbers with an overhead of additions, shifts and the similar, all doable in linear time c n;
thus,
T (n) = 5T n + c n 3
We now apply the Master Theorem:
wehavea=5, b=3,soweconsidernlogba =nlog35 ≈n1.465...
Clearly, the first case of the MT applies and we get T (n) = O(nlog3 5) < O(n1.47).
COMP3121/9101
15 / 35
The Karatsuba trick: slicing into 3 pieces
Recall that the original Karatsuba algorithm runs in time
nlog2 3 ≈ n1.58 > n1.47.
COMP3121/9101 16 / 35
The Karatsuba trick: slicing into 3 pieces
Recall that the original Karatsuba algorithm runs in time
nlog2 3 ≈ n1.58 > n1.47.
Thus, we got a significantly faster algorithm.
COMP3121/9101 16 / 35
The Karatsuba trick: slicing into 3 pieces
Recall that the original Karatsuba algorithm runs in time
nlog2 3 ≈ n1.58 > n1.47.
Thus, we got a significantly faster algorithm.
Then why not slice numbers A and B into even larger number of slices? Maybe we can get even faster algorithm?
COMP3121/9101 16 / 35
The Karatsuba trick: slicing into 3 pieces
Recall that the original Karatsuba algorithm runs in time
nlog2 3 ≈ n1.58 > n1.47.
Thus, we got a significantly faster algorithm.
Then why not slice numbers A and B into even larger number of slices? Maybe we can get even faster algorithm?
The answer is, in a sense, BOTH yes and no, so lets see what happens if we slice numbers into p + 1 many (approximately) equal slices, where
p = 1,2,3,……
COMP3121/9101 16 / 35
Generalizing Karatsuba’s algorithm
The general case – slicing the input numbers A, B into p + 1 many slices
COMP3121/9101 17 / 35
Generalizing Karatsuba’s algorithm
The general case – slicing the input numbers A, B into p + 1 many slices For simplicity, let us assume A and B have exactly (p + 1)k bits
(otherwise one of the slices will have to be shorter);
COMP3121/9101 17 / 35
Generalizing Karatsuba’s algorithm
The general case – slicing the input numbers A, B into p + 1 many slices For simplicity, let us assume A and B have exactly (p + 1)k bits
(otherwise one of the slices will have to be shorter);
Note: p is a fixed (smallish) number, a fixed parameter of our design –
p + 1 is the number of slices we are going to make, but k depends on the input values A and B and can be arbitrarily large!
COMP3121/9101 17 / 35
Generalizing Karatsuba’s algorithm
The general case – slicing the input numbers A, B into p + 1 many slices For simplicity, let us assume A and B have exactly (p + 1)k bits
(otherwise one of the slices will have to be shorter);
Note: p is a fixed (smallish) number, a fixed parameter of our design –
p + 1 is the number of slices we are going to make, but k depends on the input values A and B and can be arbitrarily large!
Slice A, B into p + 1 pieces each:
A= Ap2kp + Ap−12k(p−1) + ···+ A0
B = Bp2kp + Bp−12k(p−1) + ···+ B0
COMP3121/9101
17 / 35
Generalizing Karatsuba’s algorithm
The general case – slicing the input numbers A, B into p + 1 many slices For simplicity, let us assume A and B have exactly (p + 1)k bits
(otherwise one of the slices will have to be shorter);
Note: p is a fixed (smallish) number, a fixed parameter of our design –
p + 1 is the number of slices we are going to make, but k depends on the input values A and B and can be arbitrarily large!
Slice A, B into p + 1 pieces each:
A= Ap2kp + Ap−12k(p−1) + ···+ A0
B = Bp2kp + Bp−12k(p−1) + ···+ B0
k bits k bits … k bits
A divided into p+1 slices each slice k bits = (p+1) k bits in total COMP3121/9101
17 / 35
Ap
Ap-1
…
A0
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
AB = (PA (x) · PB (x)) |x=2k
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
we adopt the following strategy:
AB = (PA (x) · PB (x)) |x=2k
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
AB = (PA (x) · PB (x)) |x=2k we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain PC(x) = PA(x) · PB(x);
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
AB = (PA (x) · PB (x)) |x=2k we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain PC(x) = PA(x) · PB(x);
then we evaluate PC(2k).
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
AB = (PA (x) · PB (x)) |x=2k we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain PC(x) = PA(x) · PB(x);
then we evaluate PC(2k).
Note that PC(x) = PA(x) · PB(x) is of degree 2p:
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
We form the naturally corresponding polynomials:
PA(x)= Apxp + Ap−1xp−1 + ···+ A0 PB(x)= Bpxp + Bp−1xp−1+ ···+ B0
As before, we have:
A=PA(2k); B=PB(2k); AB=PA(2k)PB(2k)=(PA(x)·PB(x))|x=2k
Since
AB = (PA (x) · PB (x)) |x=2k we adopt the following strategy:
we will first figure out how to multiply polynomials fast to obtain PC(x) = PA(x) · PB(x);
then we evaluate PC(2k).
Note that PC(x) = PA(x) · PB(x) is of degree 2p:
2p PC(x) = Cjxj
j=0
COMP3121/9101 18 / 35
Generalizing Karatsuba’s algorithm
Example:
(A3x3 +A2x2 +A1x+A0)(B3x3 +B2x2 +B1x+B0) =
A3B3x6 + (A2B3 + A3B2)x5 + (A1B3 + A2B2 + A3B1)x4
+(A0B3 + A1B2 + A2B1 + A3B0)x3 + (A0B2 + A1B1 + A2B0)x2
+(A0B1 + A1B0)x + A0B0
COMP3121/9101 19 / 35
Generalizing Karatsuba’s algorithm
Example:
(A3x3 +A2x2 +A1x+A0)(B3x3 +B2x2 +B1x+B0) =
A3B3x6 + (A2B3 + A3B2)x5 + (A1B3 + A2B2 + A3B1)x4
+(A0B3 + A1B2 + A2B1 + A3B0)x3 + (A0B2 + A1B1 + A2B0)x2
In general: for
+(A0B1 + A1B0)x + A0B0 PA(x) = Apxp + Ap−1xp−1 + ··· + A0
PB(x)=Bpxp +Bp−1xp−1 +···+B0
COMP3121/9101 19 / 35
Generalizing Karatsuba’s algorithm
Example:
(A3x3 +A2x2 +A1x+A0)(B3x3 +B2x2 +B1x+B0) =
A3B3x6 + (A2B3 + A3B2)x5 + (A1B3 + A2B2 + A3B1)x4
+(A0B3 + A1B2 + A2B1 + A3B0)x3 + (A0B2 + A1B1 + A2B0)x2
In general: for
we have
+(A0B1 + A1B0)x + A0B0 PA(x) = Apxp + Ap−1xp−1 + ··· + A0
PB(x)=Bpxp +Bp−1xp−1 +···+B0
COMP3121/9101 19 / 35
Generalizing Karatsuba’s algorithm
Example:
(A3x3 +A2x2 +A1x+A0)(B3x3 +B2x2 +B1x+B0) =
A3B3x6 + (A2B3 + A3B2)x5 + (A1B3 + A2B2 + A3B1)x4
+(A0B3 + A1B2 + A2B1 + A3B0)x3 + (A0B2 + A1B1 + A2B0)x2
In general: for
we have
+(A0B1 + A1B0)x + A0B0 PA(x) = Apxp + Ap−1xp−1 + ··· + A0
PB(x)=Bpxp +Bp−1xp−1 +···+B0 2p 2p
PA(x)·PB(x)= AiBk
j=0 i+k=j
xj =Cjxj j=0
COMP3121/9101
19 / 35
Generalizing Karatsuba’s algorithm
Example:
(A3x3 +A2x2 +A1x+A0)(B3x3 +B2x2 +B1x+B0) =
A3B3x6 + (A2B3 + A3B2)x5 + (A1B3 + A2B2 + A3B1)x4
+(A0B3 + A1B2 + A2B1 + A3B0)x3 + (A0B2 + A1B1 + A2B0)x2
In general: for
we have
+(A0B1 + A1B0)x + A0B0 PA(x) = Apxp + Ap−1xp−1 + ··· + A0
PB(x)=Bpxp +Bp−1xp−1 +···+B0 2p 2p
PA(x)·PB(x)= AiBk xj =Cjxj
j=0 i+k=j j=0
We need to find the coefficients Cj = AiBk without performing (p + 1)2 i+k=j
many multiplications necessary to get all products of the form AiBk. COMP3121/9101 19 / 35
A VERY IMPORTANT DIGRESSION:
If you have two sequences A⃗ = (A0, A1, . . . , Ap−1, Ap) and
B⃗ = (B0, B1, . . . , Bm−1, Bm), and if you form the two corresponding polynomials
PA(x)=A0 +A1x+…+Ap−1xp−1 +Apxp PB(x)=B0 +B1x+…+Bm−1xm−1 +Bmxm
COMP3121/9101 20 / 35
A VERY IMPORTANT DIGRESSION:
If you have two sequences A⃗ = (A0, A1, . . . , Ap−1, Ap) and
B⃗ = (B0, B1, . . . , Bm−1, Bm), and if you form the two corresponding polynomials
PA(x)=A0 +A1x+…+Ap−1xp−1 +Apxp PB(x)=B0 +B1x+…+Bm−1xm−1 +Bmxm
and if you multiply these two polynomials to obtain their product
p+m
xj =Cjxj j=0
m+p
PA(x)·PB(x)= AiBk j=0 i+k=j
COMP3121/9101
20 / 35
A VERY IMPORTANT DIGRESSION:
If you have two sequences A⃗ = (A0, A1, . . . , Ap−1, Ap) and
B⃗ = (B0, B1, . . . , Bm−1, Bm), and if you form the two corresponding polynomials
PA(x)=A0 +A1x+…+Ap−1xp−1 +Apxp PB(x)=B0 +B1x+…+Bm−1xm−1 +Bmxm
and if you multiply these two polynomials to obtain their product
p+m
xj =Cjxj j=0
then the sequence C⃗ = (C0, C1, . . . , Cp+m) of the coefficients of the product polynomial, with these coefficients given by
Cj=AiBk, for 0≤j≤p+m, i+k=j
is extremely important and is called the LINEAR CONVOLUTION of sequences A⃗ and B⃗ and is denoted by C⃗ = A⃗ ⋆ B⃗ .
m+p
PA(x)·PB(x)= AiBk j=0 i+k=j
COMP3121/9101
20 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
1 polynomial PA(x) whose coefficients Ai are the samples of the input signal;
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
1 polynomial PA(x) whose coefficients Ai are the samples of the input signal;
2 polynomial PB(x) whose coefficients Bk are the samples of the so called impulse response of the filter (they depend of what kind of filtering you want to do).
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
1 polynomial PA(x) whose coefficients Ai are the samples of the input signal;
2 polynomial PB(x) whose coefficients Bk are the samples of the so called impulse response of the filter (they depend of what kind of filtering you want to do).
Convolutions are bread-and-butter of signal processing, and for that reason it is extremely important to find fast ways of multiplying two polynomials of possibly very large degrees.
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
1 polynomial PA(x) whose coefficients Ai are the samples of the input signal;
2 polynomial PB(x) whose coefficients Bk are the samples of the so called impulse response of the filter (they depend of what kind of filtering you want to do).
Convolutions are bread-and-butter of signal processing, and for that reason it is extremely important to find fast ways of multiplying two polynomials of possibly very large degrees.
In signal processing these degrees can be greater than 1000.
COMP3121/9101 21 / 35
AN IMPORTANT DIGRESSION:
For example, if you have an audio signal and you want to emphasise the bass sounds, you would pass the sequence of discrete samples of the signal through a digital filter which amplifies the low frequencies more than the medium and the high audio frequencies.
This is accomplished by computing the linear convolution of the sequence of discrete samples of the signal with a sequence of values which correspond to that filter, called the impulse response of the filter.
This means that the samples of the output sound are simply the coefficients of the product of two polynomials:
1 polynomial PA(x) whose coefficients Ai are the samples of the input signal;
2 polynomial PB(x) whose coefficients Bk are the samples of the so called impulse response of the filter (they depend of what kind of filtering you want to do).
Convolutions are bread-and-butter of signal processing, and for that reason it is extremely important to find fast ways of multiplying two polynomials of possibly very large degrees.
In signal processing these degrees can be greater than 1000.
This is the main reason for us to study methods of fast computation of convolutions (aside of finding products of large integers, which is what we are doing at the moment).
COMP3121/9101 21 / 35
Coefficient vs value representation of polynomials
Every polynomial PA(x) of degree p is uniquely determined by its values at any p + 1 distinct input values x0,x1,…,xp:
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}
COMP3121/9101 22 / 35
Coefficient vs value representation of polynomials
Every polynomial PA(x) of degree p is uniquely determined by its values at any p + 1 distinct input values x0,x1,…,xp:
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}
For PA(x) = Apxp + Ap−1xp−1 + . . . + A0, these values can be obtained via a
matrix multiplication:
1 x0 x20 … xp0 A0 PA(x0)
2p
1 x1 x1 … x1 A1 PA(x1)
. . . . . . = . . . . . . . . .
1 x p x 2p . . . x p p A p P A ( x p )
(1)
COMP3121/9101
22 / 35
Coefficient vs value representation of polynomials
Every polynomial PA(x) of degree p is uniquely determined by its values at any p + 1 distinct input values x0,x1,…,xp:
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}
For PA(x) = Apxp + Ap−1xp−1 + . . . + A0, these values can be obtained via a
matrix multiplication:
1 x0 x20 … xp0 A0 PA(x0)
2p
1 x1 x1 … x1 A1 PA(x1)
. . . . . . = . . . . . . . . .
1 x p x 2p . . . x p p A p P A ( x p )
It can be shown that if xi are all distinct then this matrix is invertible.
(1)
COMP3121/9101
22 / 35
Coefficient vs value representation of polynomials
Every polynomial PA(x) of degree p is uniquely determined by its values at any p + 1 distinct input values x0,x1,…,xp:
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}
For PA(x) = Apxp + Ap−1xp−1 + . . . + A0, these values can be obtained via a
matrix multiplication:
1 x0 x20 … xp0 A0 PA(x0)
2p
1 x1 x1 … x1 A1 PA(x1)
. . . . . . = . . . . . . . . .
1 x p x 2p . . . x p p A p P A ( x p )
It can be shown that if xi are all distinct then this matrix is invertible.
Such a matrix is called the Vandermonde matrix.
(1)
COMP3121/9101
22 / 35
Coefficient vs value representation of polynomials – ctd.
Thus, if all xi are all distinct, given any values PA(x0), PA(x1), . . . , PA(xp) the coefficientsA0,A1,…,Ap ofthepolynomialPA(x)areuniquelydetermined:
(2)
A0 1 x0 x20 … xp0 −1PA(x0) 2p
A11×1 x1 …x1PA(x1) . = . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p )
COMP3121/9101
23 / 35
Coefficient vs value representation of polynomials – ctd.
Thus, if all xi are all distinct, given any values PA(x0), PA(x1), . . . , PA(xp) the coefficientsA0,A1,…,Ap ofthepolynomialPA(x)areuniquelydetermined:
(2)
A0 1 x0 x20 … xp0 −1PA(x0) 2p
A11×1 x1 …x1PA(x1) . = . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p ) • Equations (1) and (2) show how we can commute between:
COMP3121/9101
23 / 35
Coefficient vs value representation of polynomials – ctd.
Thus, if all xi are all distinct, given any values PA(x0), PA(x1), . . . , PA(xp) the coefficientsA0,A1,…,Ap ofthepolynomialPA(x)areuniquelydetermined:
(2)
A0 1 x0 x20 … xp0 −1PA(x0) 2p
A11×1 x1 …x1PA(x1) . = . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p )
• Equations (1) and (2) show how we can commute between:
1 a representation of a polynomial PA(x) via its coefficients
Ap, Ap−1, . . . , A0, i.e. PA(x) = Apxp + . . . + A1x + A0
COMP3121/9101
23 / 35
Coefficient vs value representation of polynomials – ctd.
Thus, if all xi are all distinct, given any values PA(x0), PA(x1), . . . , PA(xp) the coefficientsA0,A1,…,Ap ofthepolynomialPA(x)areuniquelydetermined:
(2)
A0 1 x0 x20 … xp0 −1PA(x0) 2p
A11×1 x1 …x1PA(x1) . = . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p )
• Equations (1) and (2) show how we can commute between:
1 a representation of a polynomial PA(x) via its coefficients
Ap, Ap−1, . . . , A0, i.e. PA(x) = Apxp + . . . + A1x + A0
2 a representation of a polynomial PA(x) via its values
PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (xp, PA(xp))}
COMP3121/9101
23 / 35
Coefficient vs value representation of polynomials- ctd.
If we fix the inputs x0, x1, . . . , xp then commuting between a representation of a polynomial PA(x) via its coefficients and a representation via its values at these points is done via the following two matrix multiplications, with matrices made up from constants:
PA(x0) 1 x0 x20 … xp0 A0 2p
PA(x1) 1 x1 x1 … x1 A1 . = . . . . . . ; .…...
P A ( x p ) 1 x p x 2p . . . x p p A p
COMP3121/9101
24 / 35
Coefficient vs value representation of polynomials- ctd.
If we fix the inputs x0, x1, . . . , xp then commuting between a representation of a polynomial PA(x) via its coefficients and a representation via its values at these points is done via the following two matrix multiplications, with matrices made up from constants:
PA(x0) 1 x0 x20 … xp0 A0 2p
PA(x1) 1 x1 x1 … x1 A1 . = . . . . . . ; .…...
P A ( x p ) 1 x p x 2p . . . x p p A p A0 1 x0 x20 … xp0 −1PA(x0)
2p A11×1 x1 …x1PA(x1)
. = . . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p )
COMP3121/9101
24 / 35
Coefficient vs value representation of polynomials- ctd.
If we fix the inputs x0, x1, . . . , xp then commuting between a representation of a polynomial PA(x) via its coefficients and a representation via its values at these points is done via the following two matrix multiplications, with matrices made up from constants:
PA(x0) 1 x0 x20 … xp0 A0 2p
PA(x1) 1 x1 x1 … x1 A1 . = . . . . . . ; .…...
P A ( x p ) 1 x p x 2p . . . x p p A p A0 1 x0 x20 … xp0 −1PA(x0)
2p A11×1 x1 …x1PA(x1)
. = . . . . . . . .…...
A p 1 x p x 2p . . . x p p P A ( x p )
Thus, for fixed input values x0, . . . , xp this switch between the two kinds of
representations is done in linear time!
COMP3121/9101 24 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
COMP3121/9101 25 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x2p: PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x2p, PA(x2p))}
PB(x) ↔ {(x0,PB(x0)),(x1,PB(x1)),…,(x2p,PB(x2p))}
COMP3121/9101 25 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x2p: PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x2p, PA(x2p))}
PB(x) ↔ {(x0,PB(x0)),(x1,PB(x1)),…,(x2p,PB(x2p))}
Note: since the product of the two polynomials will be of degree 2p we need the values of PA(x) and PB(x) at 2p+1 points, rather than just p + 1 points!
COMP3121/9101 25 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x2p: PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x2p, PA(x2p))}
PB(x) ↔ {(x0,PB(x0)),(x1,PB(x1)),…,(x2p,PB(x2p))}
Note: since the product of the two polynomials will be of degree 2p we need the values of PA(x) and PB(x) at 2p+1 points, rather than just p + 1 points!
2 Multiply these two polynomials point-wise, using 2p + 1 multiplications only.
COMP3121/9101 25 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x2p: PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x2p, PA(x2p))}
PB(x) ↔ {(x0,PB(x0)),(x1,PB(x1)),…,(x2p,PB(x2p))}
Note: since the product of the two polynomials will be of degree 2p we need the values of PA(x) and PB(x) at 2p+1 points, rather than just p + 1 points!
2 Multiply these two polynomials point-wise, using 2p + 1 multiplications only.
PA(x)PB(x) ↔ {(x0,PA(x0)PB(x0)),(x1,PA(x1)PB(x1)),…,(x2p,PA(x2p)PB(x2p))}
PC (x0) PC (x1)
PC (x2p)
COMP3121/9101
25 / 35
Our strategy to multiply polynomials fast:
1 Given two polynomials of degree at most p,
PA(x)=Apxp +…+A0; PB(x)=Bpxp +…+B0
convert them into value representation at 2p + 1 distinct points x0, x1, . . . , x2p: PA(x) ↔ {(x0, PA(x0)), (x1, PA(x1)), . . . , (x2p, PA(x2p))}
PB(x) ↔ {(x0,PB(x0)),(x1,PB(x1)),…,(x2p,PB(x2p))}
Note: since the product of the two polynomials will be of degree 2p we need the values of PA(x) and PB(x) at 2p+1 points, rather than just p + 1 points!
2 Multiply these two polynomials point-wise, using 2p + 1 multiplications only.
PA(x)PB(x) ↔ {(x0,PA(x0)PB(x0)),(x1,PA(x1)PB(x1)),…,(x2p,PA(x2p)PB(x2p))}
PC (x0) PC (x1) PC (x2p)
3 Convert such value representation of PC(x) = PA(x)PB(x) back to coefficient
form
PC(x) = C2px2p + C2p−1x2p−1 + … + C1x + C0;
COMP3121/9101 25 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p??
COMP3121/9101 26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
COMP3121/9101 26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
{−p, −(p−1), … , −1, 0, 1, … , p−1, p}
COMP3121/9101 26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
{−p, −(p−1), … , −1, 0, 1, … , p−1, p}
So we find the values PA(m) and PB(m) for all m such that −p ≤ m ≤ p.
COMP3121/9101
26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
{−p, −(p−1), … , −1, 0, 1, … , p−1, p}
So we find the values PA(m) and PB(m) for all m such that −p ≤ m ≤ p.
Remember that p + 1 is the number of slices we split the input numbers A, B.
COMP3121/9101 26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
{−p, −(p−1), … , −1, 0, 1, … , p−1, p}
So we find the values PA(m) and PB(m) for all m such that −p ≤ m ≤ p. Remember that p + 1 is the number of slices we split the input numbers A, B.
Multiplication of a large number with k bits by a constant integer d can be done in time linear in k because it is reducible to d − 1 additions:
d·A=A+A+…+A
d
COMP3121/9101 26 / 35
Fast multiplication of polynomials – continued
What values should we choose for x0, x1, . . . , x2p?? Key idea: use 2p + 1 smallest possible integer values!
{−p, −(p−1), … , −1, 0, 1, … , p−1, p}
So we find the values PA(m) and PB(m) for all m such that −p ≤ m ≤ p. Remember that p + 1 is the number of slices we split the input numbers A, B.
Multiplication of a large number with k bits by a constant integer d can be done in time linear in k because it is reducible to d − 1 additions:
d·A=A+A+…+A
d
Thus, all the values
PA(m) = Apmp +Ap−1mp−1 +···+A0 :
PB(m)=Bpmp +Bp−1mp−1 +···+B0 :
can be found in time linear in the number of bits of the input numbers!
−p ≤ m ≤ p,
−p≤m≤p.
COMP3121/9101
26 / 35
Fast multiplication of polynomials – ctd.
We now perform 2p + 1 multiplications of large numbers to obtain PA(−p)PB(−p), … , PA(−1)PB(−1), PA(0)PB(0), PA(1)PB(1), … , PA(p)PB(p)
COMP3121/9101 27 / 35
Fast multiplication of polynomials – ctd.
We now perform 2p + 1 multiplications of large numbers to obtain PA(−p)PB(−p), … , PA(−1)PB(−1), PA(0)PB(0), PA(1)PB(1), … , PA(p)PB(p)
For PC(x) = PA(x)PB(x) these products are 2p + 1 many values of PC(x): PC(−p) = PA(−p)PB(−p), … , PC(0) = PA(0)PB(0), … , PC(p) = PA(p)PB(p)
COMP3121/9101 27 / 35
Fast multiplication of polynomials – ctd.
We now perform 2p + 1 multiplications of large numbers to obtain PA(−p)PB(−p), … , PA(−1)PB(−1), PA(0)PB(0), PA(1)PB(1), … , PA(p)PB(p)
For PC(x) = PA(x)PB(x) these products are 2p + 1 many values of PC(x): PC(−p) = PA(−p)PB(−p), … , PC(0) = PA(0)PB(0), … , PC(p) = PA(p)PB(p)
Let C0, C1, . . . , C2p be the coefficients of the product polynomial C(x), i.e., let PC(x) = C2px2p + C2p−1x2p−1 + ··· + C0,
COMP3121/9101 27 / 35
Fast multiplication of polynomials – ctd.
We now perform 2p + 1 multiplications of large numbers to obtain PA(−p)PB(−p), … , PA(−1)PB(−1), PA(0)PB(0), PA(1)PB(1), … , PA(p)PB(p)
For PC(x) = PA(x)PB(x) these products are 2p + 1 many values of PC(x): PC(−p) = PA(−p)PB(−p), … , PC(0) = PA(0)PB(0), … , PC(p) = PA(p)PB(p)
Let C0, C1, . . . , C2p be the coefficients of the product polynomial C(x), i.e., let PC(x) = C2px2p + C2p−1x2p−1 + ··· + C0,
We now have:
C2p(−p)2p + C2p−1(−p)2p−1 + · · · + C0 = PC (−p)
C2p(−(p − 1))2p + C2p−1(−(p − 1))2p−1 + ··· + C0 = PC(−(p − 1))
.
C2p(p−1)2p +C2p−1(p−1)2p−1 +···+C0 = PC(p−1)
C2pp2p + C2p−1p2p−1 + ··· + C0 = PC(p) COMP3121/9101
27 / 35
Fast multiplication of polynomials – ctd.
This is just a system of linear equations, that can be solved for C0, C1, . . . , C2p: 1 −p (−p)2 … (−p)2p C0 PC(−p)
2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) C1 PC(−(p−1))
. . . . . . . …...=., …....
1 p−1 (p−1)2 … (p−1)2p C2p−1 PC(p−1)
1 p p2 … p2p
C2p PC (p)
COMP3121/9101
28 / 35
1 −p (−p)2 … (−p)2p −1 PC(−p)
Fast multiplication of polynomials – ctd.
This is just a system of linear equations, that can be solved for C0, C1, . . . , C2p: 1 −p (−p)2 … (−p)2p C0 PC(−p)
2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) C1 PC(−(p−1))
. . . . . . . …...=., …....
1 p−1 (p−1)2 … (p−1)2p C2p−1 PC(p−1)
1 p p2 … p2p i.e., we can obtain C0,C1,…,C2p as
C2p PC (p)
C0 2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) PC(−(p−1))
C1 …... .=…....
.…...
. 1 p−1 (p−1)2 … (p−1)2p C2p 1 p p2 … p2p
PC(p−1) PC(p)
COMP3121/9101
28 / 35
1 −p (−p)2 … (−p)2p −1 PC(−p)
Fast multiplication of polynomials – ctd.
This is just a system of linear equations, that can be solved for C0, C1, . . . , C2p: 1 −p (−p)2 … (−p)2p C0 PC(−p)
2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) C1 PC(−(p−1))
. . . . . . . …...=., …....
1 p−1 (p−1)2 … (p−1)2p C2p−1 PC(p−1)
1 p p2 … p2p i.e., we can obtain C0,C1,…,C2p as
C2p PC (p)
C0 2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) PC(−(p−1))
C1 …... .=…....
.…... . 1 p−1 (p−1)2 … (p−1)2p PC(p−1)
C2p 1 p p2 … p2p PC(p) But the inverse matrix also involves only constants depending on p only;
COMP3121/9101 28 / 35
1 −p (−p)2 … (−p)2p −1 PC(−p)
. 1 p−1 (p−1)2 … (p−1)2p PC(p−1) C2p 1 p p2 … p2p PC(p)
Fast multiplication of polynomials – ctd.
This is just a system of linear equations, that can be solved for C0, C1, . . . , C2p: 1 −p (−p)2 … (−p)2p C0 PC(−p)
2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) C1 PC(−(p−1))
. . . . . . . …...=., …....
1 p−1 (p−1)2 … (p−1)2p C2p−1 PC(p−1)
1 p p2 … p2p i.e., we can obtain C0,C1,…,C2p as
C2p PC (p)
C0 2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) PC(−(p−1))
C1 …... .=…....
.…...
But the inverse matrix also involves only constants depending on p only; Thus the coefficients Ci can be obtained in linear time.
COMP3121/9101 28 / 35
1 −p (−p)2 … (−p)2p −1 PC(−p)
. 1 p−1 (p−1)2 … (p−1)2p PC(p−1) C2p 1 p p2 … p2p PC(p)
Fast multiplication of polynomials – ctd.
This is just a system of linear equations, that can be solved for C0, C1, . . . , C2p: 1 −p (−p)2 … (−p)2p C0 PC(−p)
2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) C1 PC(−(p−1))
. . . . . . . …...=., …....
1 p−1 (p−1)2 … (p−1)2p C2p−1 PC(p−1)
1 p p2 … p2p i.e., we can obtain C0,C1,…,C2p as
C2p PC (p)
C0 2 2p
1 −(p−1) (−(p−1)) … (−(p−1)) PC(−(p−1))
C1 …... .=…....
.…...
But the inverse matrix also involves only constants depending on p only; Thus the coefficients Ci can be obtained in linear time.
So here is the algorithm we have just described:
COMP3121/9101 28 / 35
1: 2: 3: 4:
5:
6: 7: 8: 9: 10:
function Mult(A, B)
if |A| = |B| < p+1 then return AB else
obtain p + 1 slices A0,A1,...,Ap and B0,B1,...,Bp such that A = Ap2pk +Ap−12(p−1)k +...+A0 B = Bp2pk +Bp−12(p−1)k +...+B0
11: 12: 13: 14:
form polynomials
PA(x) = Apxp + Ap−1x(p−1) + ... + A0
PB(x) = Bpxp + Bp−1x(p−1) + ... + B0
for m = −p to m = p do
compute PA(m) and PB(m); PC(m) ← Mult(PA(m)PB(m))
end for
compute C0, C1, . . . C2p via
C01−p (−p)2...(−p)2p−1PC(−p)
2 2p
C1 1 −(p − 1) (−(p − 1)) . . . (−(p − 1)) PC (−(p − 1))
= .
.. . . . . . ....... .......
C2p 1 p p2 ... p2p
formPC(x)=C2px2p+...+C0 andcomputePC(2k)
return PC (2k ) = A · B end if
end function
PC(p)
COMP3121/9101
29 / 35
How fast is our algorithm?
COMP3121/9101 30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
COMP3121/9101 30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
Thus, we have reduced a multiplication of two k(p + 1) digit numbers to 2p + 1 multiplications of k + s digit numbers plus a linear overhead (of additions splitting the numbers etc.)
COMP3121/9101 30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
Thus, we have reduced a multiplication of two k(p + 1) digit numbers to 2p + 1 multiplications of k + s digit numbers plus a linear overhead (of additions splitting the numbers etc.)
So we get the following recurrence for the complexity of Mult(A, B):
COMP3121/9101 30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
Thus, we have reduced a multiplication of two k(p + 1) digit numbers to 2p + 1 multiplications of k + s digit numbers plus a linear overhead (of additions splitting the numbers etc.)
So we get the following recurrence for the complexity of Mult(A, B):
T ((p + 1)k) = (2p + 1)T (k + s) + c k
COMP3121/9101 30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
Thus, we have reduced a multiplication of two k(p + 1) digit numbers to 2p + 1 multiplications of k + s digit numbers plus a linear overhead (of additions splitting the numbers etc.)
So we get the following recurrence for the complexity of Mult(A, B):
Let n = (p+1)k. Then
T ((p + 1)k) = (2p + 1)T (k + s) + c k
nc T(n)=(2p+1)T +s + n
p+1 p+1
a
b
COMP3121/9101
30 / 35
How fast is our algorithm?
it is easy to see that the values of the two polynomials we are multiplying have at most k + s bits where s is a constant which depends on p but does NOT depend on k:
PA(m) = Apmp +Ap−1mp−1 +···+A0 : −p ≤ m ≤ p. This is because each Ai is smaller than 2k because each Ak has k bits; thus
|PA(m)| < pp(p+1)×2k ⇒ log2 |PA(m)| < log2(pp(p+1))+k = s+k
Thus, we have reduced a multiplication of two k(p + 1) digit numbers to 2p + 1 multiplications of k + s digit numbers plus a linear overhead (of additions splitting the numbers etc.)
So we get the following recurrence for the complexity of Mult(A, B):
Let n = (p+1)k. Then
T ((p + 1)k) = (2p + 1)T (k + s) + c k
nc T(n)=(2p+1)T +s + n
p+1 p+1
Since s is constant, its impact can be neglected. COMP3121/9101
30 / 35
a
b
How fast is our algorithm?
T(n)=(2p+1)T n +s+ c n
a
p+1 p+1
b
Since logb a = logp+1(2p + 1) > 1, we can choose a small ε such that also logb a−ε > 1.
COMP3121/9101
31 / 35
How fast is our algorithm?
T(n)=(2p+1)T n +s+ c n
a
p+1 p+1
b
Since logb a = logp+1(2p + 1) > 1, we can choose a small ε such that also logb a−ε > 1.
Consequently, for such an ε we would have f (n) = c/(p + 1) n = O nlogb a−ε .
COMP3121/9101 31 / 35
How fast is our algorithm?
T(n)=(2p+1)T n +s+ c n
a
p+1 p+1
b
Since logb a = logp+1(2p + 1) > 1, we can choose a small ε such that also logb a−ε > 1.
Consequently, for such an ε we would have f (n) = c/(p + 1) n = O nlogb a−ε . Thus, with a = 2p+1 and b = p+1 the first case of the Master Theorem applies;
COMP3121/9101 31 / 35
How fast is our algorithm?
T(n)=(2p+1)T n +s+ c n
a
p+1 p+1
b
Since logb a = logp+1(2p + 1) > 1, we can choose a small ε such that also logb a−ε > 1.
Consequently, for such an ε we would have f (n) = c/(p + 1) n = O nlogb a−ε . Thus, with a = 2p+1 and b = p+1 the first case of the Master Theorem applies; so we get:
T(n) = Θnlogb a = Θnlogp+1(2p+1)
COMP3121/9101 31 / 35
Note that
nlogp+1(2p+1) < nlogp+1 2(p+1) = nlogp+1 2+logp+1(p+1)
=n1+log
p+1
2 =n1+
1 log2 (p+1)
COMP3121/9101
32 / 35
Note that
nlogp+1(2p+1) < nlogp+1 2(p+1) = nlogp+1 2+logp+1(p+1)
=n1+log
2 =n1+ 1 p+1 log2 (p+1)
Thus, by choosing a sufficiently large p, we can get a run time arbitrarily close to linear time!
COMP3121/9101 32 / 35
Note that
=n1+log
Thus, by choosing a sufficiently large p, we can get a run time arbitrarily
close to linear time!
How large does p have to be, in order to to get an algorithm which runs in time n1.1?
nlogp+1(2p+1) < nlogp+1 2(p+1) = nlogp+1 2+logp+1(p+1)
2 =n1+ 1 p+1 log2 (p+1)
COMP3121/9101 32 / 35
Note that
=n1+log
Thus, by choosing a sufficiently large p, we can get a run time arbitrarily
close to linear time!
How large does p have to be, in order to to get an algorithm which runs
in time n1.1?
1.11+1 1110
log2(p + 1)
nlogp+1(2p+1) < nlogp+1 2(p+1) = nlogp+1 2+logp+1(p+1)
n =n log2(p+1) →
= → p+1=2 10
2 =n1+ 1 p+1 log2 (p+1)
COMP3121/9101
32 / 35
Note that
=n1+log
Thus, by choosing a sufficiently large p, we can get a run time arbitrarily
close to linear time!
How large does p have to be, in order to to get an algorithm which runs
nlogp+1(2p+1) < nlogp+1 2(p+1) = nlogp+1 2+logp+1(p+1)
2 =n1+ 1 p+1 log2 (p+1)
in time n1.1?
1.11+1 1110
n =n log2(p+1) → = → p+1=2 log2(p + 1) 10
Thus, we would have to slice the input numbers into 210 = 1024 pieces!!
COMP3121/9101 32 / 35
We would have to evaluate polynomials PA(x) and PB(x) both of degree p at values up to p.
COMP3121/9101 33 / 35
We would have to evaluate polynomials PA(x) and PB(x) both of degree p at values up to p.
However, p = 210, so evaluating PA(p) = Appp + . . . + A0 involves multiplication of Ap with pp = (210 )210 ≈ 1.27 × 103079 .
COMP3121/9101
33 / 35
We would have to evaluate polynomials PA(x) and PB(x) both of degree p at values up to p.
However, p = 210, so evaluating PA(p) = Appp + . . . + A0 involves multiplication of Ap with pp = (210 )210 ≈ 1.27 × 103079 .
Thus, while evaluations of PA(x) and PB(x) for x = −p...p can theoretically all be done in linear time, T (p) = c p, the constant c is absolutely humongous.
COMP3121/9101 33 / 35
We would have to evaluate polynomials PA(x) and PB(x) both of degree p at values up to p.
However, p = 210, so evaluating PA(p) = Appp + . . . + A0 involves multiplication of Ap with pp = (210 )210 ≈ 1.27 × 103079 .
Thus, while evaluations of PA(x) and PB(x) for x = −p...p can theoretically all be done in linear time, T (p) = c p, the constant c is absolutely humongous.
Consequently, slicing the input numbers in more than just a few slices results in a hopelessly slow algorithm, despite the fact that the asymptotic bounds improve as we increase the number of slices!
COMP3121/9101 33 / 35
We would have to evaluate polynomials PA(x) and PB(x) both of degree p at values up to p.
However, p = 210, so evaluating PA(p) = Appp + . . . + A0 involves multiplication of Ap with pp = (210 )210 ≈ 1.27 × 103079 .
Thus, while evaluations of PA(x) and PB(x) for x = −p...p can theoretically all be done in linear time, T (p) = c p, the constant c is absolutely humongous.
Consequently, slicing the input numbers in more than just a few slices results in a hopelessly slow algorithm, despite the fact that the asymptotic bounds improve as we increase the number of slices!
The moral is: In practice, asymptotic estimates are useless if the size of the constants hidden by the O-notation are not estimated and found to be reasonably small!!!
COMP3121/9101 33 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
This motivates us to consider values of polynomials at inputs which are equally spaced complex numbers all lying on the unit circle.
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
This motivates us to consider values of polynomials at inputs which are equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform (DFT) of the sequence of the coefficients of the polynomial being evaluated.
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
This motivates us to consider values of polynomials at inputs which are equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform (DFT) of the sequence of the coefficients of the polynomial being evaluated.
We will present a very fast algorithm for computing these values, called the Fast Fourier Transform, abbreviated as FFT.
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
This motivates us to consider values of polynomials at inputs which are equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform (DFT) of the sequence of the coefficients of the polynomial being evaluated.
We will present a very fast algorithm for computing these values, called the Fast Fourier Transform, abbreviated as FFT.
The Fast Fourier Transform is the most executed algorithm today and is thus arguably the most important algorithm of all.
COMP3121/9101 34 / 35
Crucial question: Are there numbers x0, x1, . . . , xp such that the size of xpi does not grow uncontrollably?
Answer: YES; they are the complex numbers zi lying on the unit circle, i.e., such that |zi| = 1!
This motivates us to consider values of polynomials at inputs which are equally spaced complex numbers all lying on the unit circle.
The sequence of such values is called the discrete Fourier transform (DFT) of the sequence of the coefficients of the polynomial being evaluated.
We will present a very fast algorithm for computing these values, called the Fast Fourier Transform, abbreviated as FFT.
The Fast Fourier Transform is the most executed algorithm today and is thus arguably the most important algorithm of all.
Every mobile phone performs thousands of FFT runs each second, for example to compress your speech signal or to compress images taken by your camera, to mention just a few uses of the FFT.
COMP3121/9101 34 / 35
PUZZLE!
The warden meets with 23 new prisoners when they arrive. He tells them, “You may meet today and plan a strategy. But after today, you will be in isolated cells and will have no communication with one another. In the prison there is a switch room, which contains two light switches labeled A and B, each of which can be in either the on or the off position. I am not telling you their present positions. The switches are not connected to anything. After today, from time to time whenever I feel so inclined, I will select one prisoner at random and escort him to the switch room. This prisoner will select one of the two switches and reverse its position. He must move one, but only one of the switches. He can’t move both but he can’t move none either. Then he will be led back to his cell. No one else will enter the switch room until I lead the next prisoner there, and he’ll be instructed to do the same thing. I’m going to choose prisoners at random. I may choose the same guy three times in a row, or I may jump around and come back. But, given enough time, everyone would eventually visit the switch room many times. At any time anyone of you may declare to me: “We have all visited the switch room. If it is true, then you will all be set free. If it is false, and somebody has not yet visited the switch room, you will be fed to the alligators.”
What is the strategy the prisoners can devise to gain their freedom?
COMP3121/9101 35 / 35