Contents
1 Introduction 17
1.1 History and Systems . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.1.1 The ‘calculus’ side . . . . . . . . . . . . . . . . . . . . . . 19
1.1.2 The ‘group theory’ side . . . . . . . . . . . . . . . . . . . 20
1.1.3 A synthesis? . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 Expansion and Simplification . . . . . . . . . . . . . . . . . . . . 20
1.2.1 A Digression on “Functions” . . . . . . . . . . . . . . . . 22
1.2.2 Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.2.3 Simplification . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.2.4 An example of simplification . . . . . . . . . . . . . . . . 27
1.2.5 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.3 Algebraic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.3.1 Algebraic Closures . . . . . . . . . . . . . . . . . . . . . . 31
1.4 Some Complexity Theory . . . . . . . . . . . . . . . . . . . . . . 32
1.4.1 Complexity Hierarchy . . . . . . . . . . . . . . . . . . . . 34
1.4.2 Probabilistic Algorithms . . . . . . . . . . . . . . . . . . . 34
1.5 Some Maple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.5.1 Maple polynomials . . . . . . . . . . . . . . . . . . . . . . 35
1.5.2 Maple rational functions . . . . . . . . . . . . . . . . . . . 35
1.5.3 The RootOf construct . . . . . . . . . . . . . . . . . . . . 36
1.5.4 Active and Inert Functions . . . . . . . . . . . . . . . . . 36
1.5.5 The simplify command . . . . . . . . . . . . . . . . . . . . 38
1.5.6 Equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2 Polynomials 41
2.1 What are polynomials? . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1.1 How do we manipulate polynomials? . . . . . . . . . . . . 43
2.1.2 Polynomials in one variable . . . . . . . . . . . . . . . . . 43
2.1.3 A factored representation . . . . . . . . . . . . . . . . . . 48
2.1.4 Polynomials in several variables . . . . . . . . . . . . . . . 49
2.1.5 Other representations . . . . . . . . . . . . . . . . . . . . 51
2.1.6 The Newton Representation . . . . . . . . . . . . . . . . . 54
2.1.7 Representations in Practice . . . . . . . . . . . . . . . . . 56
2.1.8 Comparative Sizes . . . . . . . . . . . . . . . . . . . . . . 57
2.2 Rational Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 58
2.2.1 Canonical Rational Functions . . . . . . . . . . . . . . . . 59
2.2.2 Candidness of rational functions . . . . . . . . . . . . . . 60
2.3 Greatest Common Divisors . . . . . . . . . . . . . . . . . . . . . 61
2.3.1 Polynomials in one variable . . . . . . . . . . . . . . . . . 62
2.3.2 Subresultant sequences . . . . . . . . . . . . . . . . . . . . 66
2.3.3 The Extended Euclidean Algorithm . . . . . . . . . . . . 68
2.3.4 Partial Fractions . . . . . . . . . . . . . . . . . . . . . . . 70
2.3.5 Polynomials in several variables . . . . . . . . . . . . . . . 70
2.3.6 Square-free decomposition . . . . . . . . . . . . . . . . . . 72
2.3.7 Sparse Complexity . . . . . . . . . . . . . . . . . . . . . . 74
2.4 Non-commutative polynomials . . . . . . . . . . . . . . . . . . . 75
3 Polynomial Equations 77
3.1 Equations in One Variable . . . . . . . . . . . . . . . . . . . . . . 77
3.1.1 Quadratic Equations . . . . . . . . . . . . . . . . . . . . . 77
3.1.2 Cubic Equations . . . . . . . . . . . . . . . . . . . . . . . 78
3.1.3 Quartic Equations . . . . . . . . . . . . . . . . . . . . . . 80
3.1.4 Higher Degree Equations . . . . . . . . . . . . . . . . . . 80
3.1.5 Reducible defining polynomials . . . . . . . . . . . . . . . 81
3.1.6 Multiple Algebraic Numbers . . . . . . . . . . . . . . . . . 82
3.1.7 Solutions in Real Radicals . . . . . . . . . . . . . . . . . . 83
3.1.8 Equations of curves . . . . . . . . . . . . . . . . . . . . . 83
3.1.9 How many Real Roots? . . . . . . . . . . . . . . . . . . . 85
3.1.10 Thom’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . 88
3.2 Linear Equations in Several Variables . . . . . . . . . . . . . . . 89
3.2.1 Linear Equations and Matrices . . . . . . . . . . . . . . . 89
3.2.2 Representations of Matrices . . . . . . . . . . . . . . . . . 90
3.2.3 Matrix Inverses: not a good idea! . . . . . . . . . . . . . . 91
3.2.4 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.2.5 Over/under-determined Systems . . . . . . . . . . . . . . 97
3.3 Nonlinear Multivariate Equations: Distributed . . . . . . . . . . 98
3.3.1 Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . 101
3.3.2 How many Solutions? . . . . . . . . . . . . . . . . . . . . 103
3.3.3 Orderings . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3.3.4 Complexity of Gröbner Bases . . . . . . . . . . . . . . . . 107
3.3.5 A Matrix Formulation . . . . . . . . . . . . . . . . . . . . 111
3.3.6 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.3.7 The Gianni–Kalkbrener Theorem . . . . . . . . . . . . . . 114
3.3.8 The Faugère–Gianni–Lazard–Mora Algorithm . . . . . . . 117
3.3.9 The Gröbner Walk . . . . . . . . . . . . . . . . . . . . . . 120
3.3.10 Factorization and Gröbner Bases . . . . . . . . . . . . . . 124
3.3.11 The Shape Lemma . . . . . . . . . . . . . . . . . . . . . . 124
3.3.12 The Hilbert function . . . . . . . . . . . . . . . . . . . . . 126
3.3.13 Comprehensive Gröbner Bases and Systems . . . . . . . . 126
3.3.14 Coefficients other than fields . . . . . . . . . . . . . . . . 128
3.3.15 Non-commutative Ideals . . . . . . . . . . . . . . . . . . . 129
3.4 Nonlinear Multivariate Equations: Recursive . . . . . . . . . . . 130
3.4.1 Triangular Sets and Regular Chains . . . . . . . . . . . . 130
3.4.2 Zero Dimension . . . . . . . . . . . . . . . . . . . . . . . . 131
3.4.3 Positive Dimension . . . . . . . . . . . . . . . . . . . . . . 132
3.4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.4.5 Regular Decomposition . . . . . . . . . . . . . . . . . . . 136
3.5 Equations and Inequalities . . . . . . . . . . . . . . . . . . . . . . 136
3.5.1 Applications . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.5.2 Real Radical . . . . . . . . . . . . . . . . . . . . . . . . . 138
3.5.3 Quantifier Elimination . . . . . . . . . . . . . . . . . . . . 138
3.5.4 Algebraic Decomposition . . . . . . . . . . . . . . . . . . 140
3.5.5 Cylindrical Algebraic Decomposition . . . . . . . . . . . . 143
3.5.6 Computing Algebraic Decompositions . . . . . . . . . . . 146
3.5.7 Describing Solutions . . . . . . . . . . . . . . . . . . . . . 148
3.5.8 Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . 152
3.5.9 Further Observations . . . . . . . . . . . . . . . . . . . . . 153
3.6 Virtual Term Substitution . . . . . . . . . . . . . . . . . . . . . . 155
3.6.1 The Weak Case . . . . . . . . . . . . . . . . . . . . . . . . 155
3.6.2 The Strict Case . . . . . . . . . . . . . . . . . . . . . . . . 156
3.6.3 Nested Quantifiers . . . . . . . . . . . . . . . . . . . . . . 157
3.6.4 Universal quantifiers . . . . . . . . . . . . . . . . . . . . . 158
3.6.5 Complexity of VTS . . . . . . . . . . . . . . . . . . . . . . 159
3.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4 Modular Methods 161
4.1 Matrices: a Simple Example . . . . . . . . . . . . . . . . . . . . . 162
4.1.1 Matrices with integer coefficients: Determinants . . . . . 163
4.1.2 Matrices with polynomial coefficients: Determinants . . . 164
4.1.3 Conclusion: Determinants . . . . . . . . . . . . . . . . . . 165
4.1.4 Linear Equations with integer coefficients . . . . . . . . . 165
4.1.5 Linear Equations with polynomial coefficients . . . . . . . 166
4.1.6 Conclusion: Linear Equations . . . . . . . . . . . . . . . . 166
4.1.7 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . 167
4.2 Gcd in one variable . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.2.1 Bounds on divisors . . . . . . . . . . . . . . . . . . . . . . 168
4.2.2 The modular – integer relationship . . . . . . . . . . . . . 169
4.2.3 Computing the g.c.d.: one large prime . . . . . . . . . . . 171
4.2.4 Computing the g.c.d.: several small primes . . . . . . . . 173
4.2.5 Computing the g.c.d.: early success . . . . . . . . . . . . . 175
4.2.6 An alternative correctness check . . . . . . . . . . . . . . 176
4.2.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.3 Polynomials in two variables . . . . . . . . . . . . . . . . . . . . . 178
4.3.1 Degree Growth in Coefficients . . . . . . . . . . . . . . . . 178
4.3.2 The evaluation–interpolation relationship . . . . . . . . . 180
4.3.3 G.c.d. in Z_p[x, y] . . . . . . . . . . . . . . . . . . . . . . 181
4.3.4 G.c.d. in Z[x, y] . . . . . . . . . . . . . . . . . . . . . . . 184
4.4 Polynomials in several variables . . . . . . . . . . . . . . . . . . . 186
4.4.1 A worked example . . . . . . . . . . . . . . . . . . . . . . 187
4.4.2 Converting this to an algorithm . . . . . . . . . . . . . . . 189
4.4.3 Worked example continued . . . . . . . . . . . . . . . . . 190
4.4.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.5 Further Applications . . . . . . . . . . . . . . . . . . . . . . . . . 195
4.5.1 Resultants and Discriminants . . . . . . . . . . . . . . . . 195
4.5.2 Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . 195
4.6 Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.6.1 General Considerations . . . . . . . . . . . . . . . . . . . 199
4.6.2 The Hilbert Function and reduction . . . . . . . . . . . . 200
4.6.3 The Modular Algorithm . . . . . . . . . . . . . . . . . . . 202
4.6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 203
4.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
5 p-adic Methods 207
5.1 Introduction to the factorization problem . . . . . . . . . . . . . 207
5.2 Modular methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
5.2.1 The Musser test . . . . . . . . . . . . . . . . . . . . . . . 209
5.3 Factoring modulo a prime . . . . . . . . . . . . . . . . . . . . . . 212
5.3.1 Berlekamp’s small p method . . . . . . . . . . . . . . . . . 212
5.3.2 The Cantor–Zassenhaus method . . . . . . . . . . . . . . 214
5.3.3 Berlekamp’s large p method . . . . . . . . . . . . . . . . . 215
5.4 From Z_p to Z? . . . . . . . . . . . . . . . . . . . . . . . . . . 216
5.5 Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
5.5.1 Linear Hensel Lifting . . . . . . . . . . . . . . . . . . . . . 218
5.5.2 Quadratic Hensel Lifting . . . . . . . . . . . . . . . . . . . 220
5.5.3 Quadratic Hensel Lifting Improved . . . . . . . . . . . . . 222
5.5.4 Hybrid Hensel Lifting . . . . . . . . . . . . . . . . . . . . 225
5.6 The recombination problem . . . . . . . . . . . . . . . . . . . . . 226
5.7 Univariate Factoring Solved . . . . . . . . . . . . . . . . . . . . . 227
5.8 Multivariate Factoring . . . . . . . . . . . . . . . . . . . . . . . . 229
5.8.1 A “Good Reduction” Complexity Result . . . . . . . . . . 230
5.8.2 A Sparsity Result . . . . . . . . . . . . . . . . . . . . . . 231
5.8.3 The Leading Coefficient Problem . . . . . . . . . . . . . . 231
5.9 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 233
5.9.1 Factoring Straight-Line Programs . . . . . . . . . . . . . 233
5.9.2 p-adic Greatest Common Divisors . . . . . . . . . . . . . 233
5.9.3 p-adic Gröbner Bases . . . . . . . . . . . . . . . . . . . . 234
5.9.4 p-adic determinants . . . . . . . . . . . . . . . . . . . . . 235
5.10 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
6 Algebraic Numbers and Functions 237
6.1 Representations of Finite Fields . . . . . . . . . . . . . . . . . . . 239
6.1.1 Additive Representation . . . . . . . . . . . . . . . . . . . 239
6.1.2 Multiplicative representation . . . . . . . . . . . . . . . . 240
6.2 Representations of Algebraic Numbers . . . . . . . . . . . . . . . 241
6.3 Factorisation with Algebraic Numbers . . . . . . . . . . . . . . . 242
6.4 The D5 approach to algebraic numbers . . . . . . . . . . . . . . . 243
6.5 Distinguishing roots . . . . . . . . . . . . . . . . . . . . . . . . . 243
7 Calculus 245
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.2 Integration of Rational Expressions . . . . . . . . . . . . . . . . . 247
7.2.1 Integration of Proper Rational Expressions . . . . . . . . 247
7.2.2 Hermite’s Algorithm . . . . . . . . . . . . . . . . . . . . . 248
7.2.3 The Ostrogradski–Horowitz Algorithm . . . . . . . . . . . 249
7.2.4 The Trager–Rothstein Algorithm . . . . . . . . . . . . . . 250
7.3 Theory: Liouville’s Theorem . . . . . . . . . . . . . . . . . . . . . 253
7.3.1 Liouville’s Principle . . . . . . . . . . . . . . . . . . . . . 255
7.3.2 Finding L . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
7.3.3 Risch Structure Theorem . . . . . . . . . . . . . . . . . . 257
7.3.4 Overview of Integration . . . . . . . . . . . . . . . . . . . 257
7.4 Integration of Logarithmic Expressions . . . . . . . . . . . . . . . 259
7.4.1 The Polynomial Part . . . . . . . . . . . . . . . . . . . . . 260
7.4.2 The Rational Expression Part . . . . . . . . . . . . . . . . 260
7.4.3 Conclusion of Logarithmic Integration . . . . . . . . . . . 261
7.5 Integration of Exponential Expressions . . . . . . . . . . . . . . . 263
7.5.1 The Polynomial Part . . . . . . . . . . . . . . . . . . . . . 264
7.5.2 The Rational Expression Part . . . . . . . . . . . . . . . . 264
7.6 Integration of Algebraic Expressions . . . . . . . . . . . . . . . . 268
7.7 The Risch Differential Equation Problem . . . . . . . . . . . . 268
7.8 The Parallel Approach . . . . . . . . . . . . . . . . . . . . . . . . 271
7.8.1 The Parallel Approach: Algebraic Expressions . . . . . . 272
7.9 Definite Integration . . . . . . . . . . . . . . . . . . . . . . . . . . 272
7.10 Other Calculus Problems . . . . . . . . . . . . . . . . . . . . . . 273
7.10.1 Indefinite summation . . . . . . . . . . . . . . . . . . . . . 273
7.10.2 Definite Symbolic Summation . . . . . . . . . . . . . . . . 273
7.10.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
7.10.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
8 Algebra versus Analysis 275
8.1 Functions and Formulae . . . . . . . . . . . . . . . . . . . . . . . 275
8.2 Branch Cuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.2.1 Some Unpleasant Facts . . . . . . . . . . . . . . . . . . . 277
8.2.2 The Problem with Square Roots . . . . . . . . . . . . . . 278
8.2.3 Possible Solutions . . . . . . . . . . . . . . . . . . . . . . 278
8.2.4 Removable Branch Cuts . . . . . . . . . . . . . . . . . . . 281
8.3 Fundamental Theorem of Calculus Revisited . . . . . . . . . . . . 282
8.4 Constants Revisited . . . . . . . . . . . . . . . . . . . . . . . . . 282
8.4.1 Constants can be useful . . . . . . . . . . . . . . . . . . . 283
8.4.2 Constants are often troubling . . . . . . . . . . . . . . . . 283
8.5 Integrating ‘real’ Functions . . . . . . . . . . . . . . . . . . . . . 283
8.6 Logarithms revisited . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.7 Other decision questions . . . . . . . . . . . . . . . . . . . . . . . 285
8.8 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
8.8.1 A Definite Integral . . . . . . . . . . . . . . . . . . . . . . 288
A Algebraic Background 291
A.1 The resultant and friends . . . . . . . . . . . . . . . . . . . . . . 291
A.1.1 Resultant . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
A.1.2 Discriminants . . . . . . . . . . . . . . . . . . . . . . . . . 294
A.1.3 Iterated Operations . . . . . . . . . . . . . . . . . . . . . 294
A.2 Useful Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
A.2.1 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
A.2.2 Coe�cients of a polynomial . . . . . . . . . . . . . . . . . 296
A.2.3 Roots of a polynomial . . . . . . . . . . . . . . . . . . . . 297
A.2.4 Root separation . . . . . . . . . . . . . . . . . . . . . . . . 299
A.2.5 Developments . . . . . . . . . . . . . . . . . . . . . . . . . 299
A.3 Chinese Remainder Theorem . . . . . . . . . . . . . . . . . . . . 302
A.4 Chinese Remainder Theorem for Polynomials . . . . . . . . . . . 303
A.5 Vandermonde Systems . . . . . . . . . . . . . . . . . . . . . . . . 304
A.6 More matrix theory . . . . . . . . . . . . . . . . . . . . . . . . . 306
A.7 Algebraic Structures . . . . . . . . . . . . . . . . . . . . . . . . . 307
B Excursus 309
B.1 The Budan–Fourier Theorem . . . . . . . . . . . . . . . . . . . . 309
B.2 Equality of factored polynomials . . . . . . . . . . . . . . . . . . 310
B.3 Karatsuba’s method . . . . . . . . . . . . . . . . . . . . . . . . . 312
B.3.1 Karatsuba’s method in practice . . . . . . . . . . . . . . . 313
B.3.2 Karatsuba’s method and sparse polynomials . . . . . . . . 314
B.3.3 Karatsuba’s method and multivariate polynomials . . . . 314
B.3.4 Faster still . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
B.3.5 Faster division . . . . . . . . . . . . . . . . . . . . . . . . 315
B.3.6 Faster g.c.d. computation . . . . . . . . . . . . . . . . . . 315
B.4 Strassen’s method . . . . . . . . . . . . . . . . . . . . . . . . . . 316
B.4.1 Strassen’s method in practice . . . . . . . . . . . . . . . . 317
B.4.2 Further developments . . . . . . . . . . . . . . . . . . . . 318
B.4.3 Matrix Inversion . . . . . . . . . . . . . . . . . . . . . . . 318
B.5 Cyclotomic Polynomials . . . . . . . . . . . . . . . . . . . . . . . 319
C Systems 321
C.1 Axiom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
C.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
C.1.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
C.1.3 Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
C.2 Macsyma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
C.2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
C.2.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
C.3 Maple . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
C.3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
C.3.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
C.3.3 Data structures . . . . . . . . . . . . . . . . . . . . . . . . 324
C.3.4 Heuristic GCD . . . . . . . . . . . . . . . . . . . . . . . . 327
C.3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 327
C.4 MuPAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
C.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
C.4.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
C.5 Reduce . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
C.5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
C.5.2 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
D Index of Notation 331
List of Figures
1.1 Converting Monte Carlo to Las Vegas . . . . . . . . . . . . . . . 35
1.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
1.4 An example of Maple’s RootOf construct . . . . . . . . . . . . . 37
2.1 A polynomial SLP . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2 Code fragment A — a graph . . . . . . . . . . . . . . . . . . . . 54
2.3 Code fragment B — a tree . . . . . . . . . . . . . . . . . . . . . . 54
2.4 DAG representation . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Tree representation . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.6 Maple’s Original Polynomials . . . . . . . . . . . . . . . . . . . . 57
2.7 Maple’s New-Style Polynomials . . . . . . . . . . . . . . . . . . . 57
2.8 Subresultant p.r.s. algorithm . . . . . . . . . . . . . . . . . . . . 67
3.1 Program for computing solutions to a cubic . . . . . . . . . . . . 79
3.2 Program for computing solutions to a quartic . . . . . . . . . . . 80
3.3 x^3 - x^2 illustrating Thom's Lemma . . . . . . . . . . . . . . 89
3.4 Gianni–Kalkbrener Algorithm . . . . . . . . . . . . . . . . . . . . 116
3.5 Algorithm 14 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
3.6 Body of Algorithm 15 . . . . . . . . . . . . . . . . . . . . . . . . 123
3.7 Cylindrical Decomposition after Collins . . . . . . . . . . . . . . 146
3.8 y^3 - 7y^2 + 14y - x - 8: Thom's Lemma . . . . . . . . . . . . 149
3.9 y^3 - 7y^2 + 14y - x - 8: indexing . . . . . . . . . . . . . . . . 150
4.1 Diagrammatic illustration of Modular Algorithms . . . . . . . . . 161
4.2 Diagrammatic illustration of Algorithm 17 . . . . . . . . . . . . . 171
4.3 Algorithm 18 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.4 Diagrammatic illustration of Algorithm 18 . . . . . . . . . . . . . 175
4.5 “Early termination” g.c.d. code . . . . . . . . . . . . . . . . . . . 176
4.6 Algorithm 19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
4.7 Diagrammatic illustration of Algorithm 21 . . . . . . . . . . . . . 182
4.8 Algorithm 21 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.9 Diagrammatic illustration of g.c.d.s in Z[x, y] (1) . . . . . . . . . 184
4.10 Diagrammatic illustration of g.c.d.s in Z[x, y] (2) . . . . . . . . . 185
4.11 Diagrammatic illustration of sparse g.c.d. . . . . . . . . . . . . . 190
4.12 Algorithm 22: Sparse g.c.d. . . . . . . . . . . . . . . . . . . . . . 191
4.13 Algorithm 23: Inner sparse g.c.d. . . . . . . . . . . . . . . . . . . 191
4.14 Algorithm 24: Sparse g.c.d. from skeleton . . . . . . . . . . . . . 192
4.15 f from section 4.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . 192
4.16 g from section 4.4.3 . . . . . . . . . . . . . . . . . . . . . . . . . . 193
4.17 Algorithm 26 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
5.1 Diagrammatic illustration of Hensel Algorithms . . . . . . . . . . 207
5.2 Algorithm 27: Berlekamp for small p . . . . . . . . . . . . . . 214
5.3 Algorithm 28: Distinct Degree Factorization . . . . . . . . . . 215
5.4 Algorithm 29: Split a Distinct Degree Factorization . . . . . . 216
5.5 Algorithm 30 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
5.6 Algorithm 31 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
5.7 Algorithm 32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
5.8 Algorithm 33 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
5.9 Algorithm 34 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.10 Algorithm 35: Combine Modular Factors . . . . . . . . . . . . . . 226
5.11 Overview of Factoring Algorithm . . . . . . . . . . . . . . . . . . 228
5.12 Algorithm 37 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
6.1 Non-candidness of algebraics . . . . . . . . . . . . . . . . . . . . 238
6.2 Algebraic numbers in the denominator . . . . . . . . . . . . . . . 238
6.3 An unspecified field in Maple . . . . . . . . . . . . . . . . . . . . 240
6.4 Primitive elements in Maple . . . . . . . . . . . . . . . . . . . . . 241
6.5 An evaluation of Maple’s RootOf construct . . . . . . . . . . . . 242
7.1 Algorithm 41: IntLog–Polynomial . . . . . . . . . . . . . . . . . . 261
7.2 Algorithm 42: IntLog–Rational Expression . . . . . . . . . . . . . 262
7.3 Algorithm 43: IntExp–Polynomial . . . . . . . . . . . . . . . . . 265
7.4 Algorithm 44: IntExp–Rational Expression . . . . . . . . . . . . 266
8.1 A Riemann surface example: log . . . . . . . . . . . . . . . . . . 280
8.2 plot3d(C, x=-4..4, y=-4..4): C from (8.20) . . . . . . . . . 284
8.3 Graph of apparent integral in (8.22) . . . . . . . . . . . . . . . . 286
C.1 Axiom output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
C.2 Axiom type system . . . . . . . . . . . . . . . . . . . . . . . . . . 323
C.3 Macsyma output . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
C.4 Maple output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
C.5 Tree for A, B corresponding to table C.1 . . . . . . . . . . . . . . 326
C.6 Tree for A, B corresponding to table C.2 . . . . . . . . . . . . . . 327
C.7 MuPAD output . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
C.8 Reduce output . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
List of Algorithms
1 Miller–Rabin primality testing . . . . . . . . . . . . . . . . . . . . . 34
2 Euclid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3 General g.c.d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4 Subresultant p.r.s. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5 Extended Euclidean . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 General extended p.r.s. . . . . . . . . . . . . . . . . . . . . . . . . . 69
7 Bivariate g.c.d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
8 Sturm Sequence evaluation . . . . . . . . . . . . . . . . . . . . . . . 86
9 Buchberger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10 Gianni–Kalkbrener . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
11 Gianni–Kalkbrener Step . . . . . . . . . . . . . . . . . . . . . . . . . 116
12 FGLM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
14 Extended Buchberger . . . . . . . . . . . . . . . . . . . . . . . . . . 121
15 Gröbner Walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
16 Modular Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . 166
17 Modular GCD (Large prime version) . . . . . . . . . . . . . . . . . . 171
18 Modular GCD (Small prime version) . . . . . . . . . . . . . . . . . . 174
19 Modular GCD (Alternative small prime version) . . . . . . . . . . . 177
20 Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
21 Bivariate Modular GCD . . . . . . . . . . . . . . . . . . . . . . . . . 183
22 Sparse g.c.d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
23 Inner sparse g.c.d. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
24 Sparse g.c.d. from skeleton . . . . . . . . . . . . . . . . . . . . . . . 192
25 Farey Reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
26 Modular Gröbner base . . . . . . . . . . . . . . . . . . . . . . . . . . 204
27 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
28 Distinct Degree Factorization . . . . . . . . . . . . . . . . . . . . . . 215
29 Split a Distinct Degree Factorization . . . . . . . . . . . . . . . . . . 216
30 Univariate Hensel Lifting (Linear Two Factor version) . . . . . . . . 220
31 Univariate Hensel Lifting (Linear version) . . . . . . . . . . . . . . . 221
32 Univariate Hensel Lifting (Quadratic Two Factor version) . . . . . . 222
33 Univariate Hensel Lifting (Quadratic version) . . . . . . . . . . . . . 223
34 Univariate Hensel Lifting (Improved Quadratic Two Factor version) 224
35 Combine Modular Factors . . . . . . . . . . . . . . . . . . . . . . . . 226
36 Factor over Z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
37 Multivariate Hensel Lifting (Linear version) . . . . . . . . . . . . . . 230
38 Wang’s EEZ Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . 232
39 Trager–Rothstein . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
40 Integration Paradigm . . . . . . . . . . . . . . . . . . . . . . . . . . 255
41 IntLog–Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
42 IntLog–Rational Expression . . . . . . . . . . . . . . . . . . . . . . . 262
43 IntExp–Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
44 IntExp–Rational Expression . . . . . . . . . . . . . . . . . . . . . . . 266
46 resultant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
47 Chinese Remainder . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
48 Chinese Remainder (Polynomial form) . . . . . . . . . . . . . . . . . 302
49 Chinese Remainder for Polynomials . . . . . . . . . . . . . . . . . . 303
50 Chinese Remainder (Multivariate) . . . . . . . . . . . . . . . . . . . 304
51 Vandermonde solver . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
52 Vandermonde variant solver . . . . . . . . . . . . . . . . . . . . . . . 306
List of Open Problems
1 Algebra of O . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2 Sparse gcd (strong) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
3 Sparse gcd (weak) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4 Roots of Sparse Polynomials . . . . . . . . . . . . . . . . . . . . . . 87
5 fg + 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7 Sparse Gröbner Bases . . . . . . . . . . . . . . . . . . . . . . . . . . 108
8 Complexity of the FGLM Algorithm (I) . . . . . . . . . . . . . . . . 118
9 Complexity of the FGLM Algorithm (II) . . . . . . . . . . . . . . . . 118
10 Coefficient growth in the FGLM Algorithm . . . . . . . . . . . . 119
11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
12 Better treatment of division . . . . . . . . . . . . . . . . . . . . . . . 137
13 RAG Formulation 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
14 Not all CADs are outputs of our algorithms . . . . . . . . . . . . . . 154
15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
16 Matrix Determinant costs . . . . . . . . . . . . . . . . . . . . . . . . 165
17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
18 Alternative Route for Bivariate Polynomial g.c.d. . . . . . . . . . . . 179
19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
20 Bad reductions in Zippel’s algorithm . . . . . . . . . . . . . . . . . . 194
21 Modular Gröbner Bases for Inhomogeneous Ideals . . . . . . . . . . 203
22 Reconstructed Bases might not be Gröbner . . . . . . . . . . . . . . 203
23 Evaluate [vH02] against [ASZ00] . . . . . . . . . . . . . . . . . . . . 227
24 Better Choice of ‘Best’ Prime . . . . . . . . . . . . . . . . . . . . . . 229
25 Low-degree Factorization . . . . . . . . . . . . . . . . . . . . . . . . 229
26 p-adic Gröbner bases . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
27 Algebraic Numbers Reviewed . . . . . . . . . . . . . . . . . . . . . . 243
28 Crossings of factored polynomials . . . . . . . . . . . . . . . . . . . . 311
Preface

This text is under active development, especially due to comments from colleagues, notably the Bath Mathematical Foundations seminar that listened to an explanation of section 4.4, and students on the course CM30070 — Computer Algebra at the University of Bath. David Wilson has been a most helpful recorder of errors during this, and he and Matthew England have signalled several problems. David Stoutemyer has been a very helpful reader. I am grateful to John Abbott (Genoa) for the polynomial f on page 208, and for the encouragement of people at CICM 2010 to write Chapter 8. I am grateful to Professor Vorobjov for the material in Excursus B.4.3. Professor Sergei Abramov (Russian Academy of Sciences) has been a helpful reader. As always when I write, I am grateful to David Carlisle for his TeXnical wisdom.

It is probably best cited as “J.H. Davenport, Computer Algebra. http://staff.bath.ac.uk/masjhd/JHD-CA.pdf” with a date.
Mathematical Prerequisites
This book has been used to support teaching at the University of Bath to both Mathematics and Computer Science students. The main requirement is “mathematical maturity” rather than specific facts: some specific pre-requisites are listed in Table 1. If the students are not familiar with the O notation, it would be as well for the lecturer to explain this, at whatever level of sophistication is appropriate: this book is pretty unsophisticated in its use, and all that is needed is in section 1.4.
Table 1: Specific Prerequisites
Chapter(s) Subjects
3.2.1 Matrices
4,5 Modular arithmetic
7, 8 Calculus
Changes in Academic Year 2016–17
4.10.2016 Tweaked Maple’s expand on page 23. Added section 1.5.2.
9.10.2016 Wrote sections 1.2.5 and 1.5.6. Added Observation 3.
24.10.2016 Fix various typos and clarified Definition def:sparsebitsize (thanks
to a vigilant student). Added sections 3.6.4, 3.6.5. Tidied notation in
Appendix A.
Chapter 1
Introduction
Computer algebra, the use of computers to do algebra rather than simply arithmetic, is almost as old as computing itself, with the first theses [Kah53, Nol53] dating back to 1953. Indeed it was anticipated from the time of Babbage, when Ada Augusta, Countess of Lovelace, wrote

We may say most aptly that the Analytical Engine weaves algebraical patterns just as the Jacquard loom weaves flowers and leaves. [Ada43]
In fact, it is not even algebra for which we need software packages: computers by themselves cannot actually do arithmetic, only a limited subset of it. If we ask Excel¹ to compute $e^{\pi\sqrt{163}} - 262537412640768744$, we will be told that the answer is 256. More mysteriously, if we go back and look at the formula in the cell, we see that it is now $e^{\pi\sqrt{163}} - 262537412640768800$. In fact, 262537412640768744 is too large a whole number (or integer, as mathematicians say) for Excel to handle, and it has converted it into floating-point (what Excel terms “scientific”) notation. Excel, or any other software using the IEEE standard [IEE85] representation for floating-point numbers, can only store them to a given accuracy, about² 16 decimal places.³ In fact, it requires twice this precision to show that $e^{\pi\sqrt{163}} \ne 262537412640768744$. Since $e^{\pi\sqrt{163}} = (-1)^{\sqrt{-163}}$, it follows from deep results of transcendental number theory [Bak75] that not only is $e^{\pi\sqrt{163}}$ not an integer, it is not a fraction (or rational number), nor even the solution of a polynomial equation with integer coefficients: essentially it is a ‘new’ number.
¹Or any similar software package.
²We say ‘about’ since the internal representation is binary, rather than decimal.
³In fact, Excel is more complicated even than this, as the calculations in this table show.

i        | 1  | 2   | 3    | 4     | ... | 10    | 11    | 12    | ... | 15    | 16
a = 10^i | 10 | 100 | 1000 | 1...0 | ... | 1...0 | 10^11 | 10^12 | ... | 10^15 | 10^16
b = a-1  | 9  | 99  | 999  | 9999  | ... | 9...9 | 9...9 | 10^12 | ... | 10^15 | 10^16
c = a-b  | 1  | 1   | 1    | 1     | ... | 1     | 1     | 1     | ... | 1     | 0

We can see that the printing changes at 12 decimal digits, but that actual accuracy is not lost until we subtract 1 from 10^16.
Definition 1 A number n, or more generally any algebraic object, is said to be transcendental over a ring R if there is no non-zero polynomial p with coefficients in R such that p(n) = 0.

With this definition, we can say that $e^{\pi\sqrt{163}}$ is transcendental over the integers. Transcendence is a deep mathematical question with implications for the decidability of computer algebra, see page 287.
We will see throughout this book (for an early example, see page 66) that innocent-seeming problems can give rise to numbers far greater than one would expect. Hence there is a requirement to deal with numbers larger, or to greater precision, than is provided by the hardware manufacturers whose chips underlie our computers. Practically every computer algebra system, therefore, has a package for manipulating arbitrary-sized integers (so-called bignums) or real numbers (bigfloats). These arbitrary-sized objects, and the fact that mathematical objects in general are unpredictably sized, mean that computer algebra systems generally need sophisticated memory management, generally garbage collection (see [JHM11] for a general discussion of this topic).

But the fact that the systems can deal with large numbers does not mean that we should let numbers increase without doing anything. If we have two numbers with n digits, adding them requires a time proportional to n, or in more formal language (see section 1.4) a time $O(n)$. Multiplying them⁴ requires a time $O(n^2)$. Calculating a g.c.d., which is fundamental in the calculation of rational numbers, requires $O(n^3)$, or $O(n^2)$ with a bit of care⁵. This implies that if the numbers become 10 times longer, the time is multiplied by 10, or by 100, or by 1000. So it is always worth reducing the size of these integers. We will see later (an early example is on page 161) that much ingenuity has been well spent in devising algorithms to compute “obvious” quantities by “non-obvious” ways which avoid, or reduce, the use of large numbers. The phrase intermediate expression swell denotes the phenomenon where intermediate quantities are much larger than the inputs to, or outputs from, a calculation.
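As a foretaste, here is an illustration of ours (not taken from the cited pages) in Maple: rem computes polynomial remainders over the rationals, and even Knuth's classic small example already shows the coefficient growth that this ingenuity is designed to avoid.

# Small inputs, but the rational coefficients of the remainder
# sequence grow rapidly (see also the discussion on page 66).
f := x^8 + x^6 - 3*x^4 - 3*x^3 + 8*x^2 + 2*x - 5:
g := 3*x^6 + 5*x^4 - 4*x^2 - 9*x + 21:
r1 := rem(f, g, x);    # fractions already appear
r2 := rem(g, r1, x);   # and their numerators and denominators grow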
Notation 1 We write algorithms in an Algol-like notation, with the Rutishauser symbol := to indicate assignment, and = (as opposed to C’s ==) to indicate the equality predicate. We use indentation to indicate grouping⁶, rather than clutter the text with begin . . . end. Comments are introduced by the # character, running to the end of the line.
⁴In principle, $O(n \log n \log\log n)$ is enough [AHU74, Chapter 8], but no computer algebra system routinely uses this, for it is more like $20\,n \log n \log\log n$. However, most systems will use ‘Karatsuba arithmetic’ (see [KO63, and section B.3]), which takes $O(n^{\log_2 3} \approx n^{1.585})$, once the numbers reach an appropriate length, often 16 words [SGV94].
⁵In principle, $O(n \log^2 n \log\log n)$ [AHU74, Chapter 8], but again no system uses it routinely; this combined with Karatsuba gives $O(n^{\log_2 3} \log n)$ (section B.3.6), and this is commonly used.
⁶An idea which was tried in Axiom [JS92], but which turns out to be better in books than in real life.
1.1 History and Systems
The first recorded use of computers to do computations of the sort we envisage was in 1951 [MW51], where $180\,(2^{127}-1)^2 - 1$, a 79-digit number, was shown to be prime. In 1952 the great mathematician Emil Artin had the equally great computer scientist John von Neumann perform an extensive calculation relating to elliptic curves on the MANIAC computer [ST92, p. 119]. In 1953, two theses [Kah53, Nol53] kicked off the ‘calculus’ side of computer algebra with programs to differentiate expressions. In the same year, [Has53] showed that algorithms in group theory could be implemented on computers.
1.1.1 The ‘calculus’ side
The initial work [Kah53, Nol53] consisted of programs to do one thing, but the
focus soon moved on to ‘systems’, capable of doing a variety of tasks. One
early one was Collins’ system SAC [Col71], written in Fortran. Its descendants
continue in SAC-2 [Col85] and QEPCAD [Bro03]⁷. Of course, computer algebra
systems can be written in any computer language, even Cobol [fHN76].
However, many of the early systems were written in LISP, largely because
of its support for garbage collection and large integers. The group at M.I.T.,
very influential in the 1960s and 70s, developed Macsyma [MF71, PW85]. This
system now exists in several versions [Ano07], even on mobile ’phones⁸. At about
the same time, Hearn developed Reduce [Hea05], and shortly after a group at
IBM Yorktown Heights produced SCRATCHPAD [GJY75]. This group then
produced AXIOM [JS92], a system that attempted to match the generality of
mathematics with ‘generic programming’ to allow algorithms to be programmed
in the generality in which they are conventionally (as in this book) stated, e.g.
“polynomials over a ring” as in Definition 22.
These were, on the whole, very large software systems, and attempts were
made to produce smaller ones. muMATH [RS79] and its successor Derive [RS92]
were extremely successful systems on the early PC, and paved the way for the
computer algebra facilities of many high-end calculators. Much of the work on
“compact computer algebra” is described in [Sto11b].
Maple [CGGG83] pioneered a ‘kernel+library’ design.
The basic Maple system, the kernel, is a relatively small collection of
compiled C code. When a Maple session is started, the entire kernel
is loaded. It contains the essential facilities required to run Maple
and perform basic mathematical operations. These components include the Maple programming language interpreter, arithmetic and simplification routines, print routines, memory management facilities, and a collection of fundamental functions. [MGH+03, p. 6]
⁷Also now available on mobile ’phones: https://sites.google.com/site/maximaonandroid/.
⁸https://sites.google.com/site/maximaonandroid/.
1.1.2 The ‘group theory’ side
Meanwhile, those interested in computational group theory, and related topics, had not been idle. One major system developed during the 1970s/80s was CAYLEY [BC90]. This team later looked at Axiom, and built the system MAGMA [BCM94], again with a strong emphasis on genericity. Another popular system is GAP [BL98], whose ‘kernel+library’ design was consciously [Neu95] inspired by Maple.

The reader may well ask ‘why two different schools of thought?’ The author has often asked himself the same question. The difference seems to be one of mathematical attitude, if one can call it that. The designer of a calculus system envisages it being used to compute an integral, factor a polynomial, multiply two matrices, or otherwise operate on a mathematical datum. The designer of a group theory system, while he will permit the user to multiply, say, permutations or matrices, does not regard this as the object of the system: rather, the object is to manipulate whole groups (etc.) of permutations (or matrices, or . . .), i.e. a mathematical structure.
1.1.3 A synthesis?
While it is too early to say that the division has been erased, it can be seen
that MAGMA, for example, while firmly rooted in the group-theory tradition,
has many more ‘calculus like’ features. Conversely, the interest in polynomial
ideals, as in Proposition 33, means that systems specialising in this direction,
such as SINGULAR [Sch03b] or COCOA [GN90], use polynomial algorithms,
but the items of interest are the mathematical structures such as ideals rather
than the individual polynomials.
1.2 Expansion and Simplification
These two words, or the corresponding verbs ‘expand’ and ‘simplify’, are much used (abused?) in computer algebra, and a preliminary explanation may be in order. Computers, of course, do not deal in mathematical objects but, ultimately, in certain bit patterns which are representations of mathematical objects. Just as with ink on paper, a mathematical object may have many representations: most of us would say that x + y and y + x are the same mathematical object.
Definition 2 A correspondence f between a class O of objects (generally we think of the abstract objects of mathematics) and a class R of representations is a representation of O by R if each element of O corresponds to one or more elements of R (otherwise it is not represented) and each element of R corresponds to one and only one element of O (otherwise we do not know which element of O is represented). In other words, “is represented by” is the inverse of a surjective function f, “represents”, from a subset of R (the “legal representations”) to O.
Notation 2 When discussing the difference between abstract objects and computer representations, we will use mathematical fonts, such as x, for the abstract objects and typewriter font, such as x, for the representations.

Hence we could represent any mathematical objects, such as polynomials, by well-formed strings of indeterminates, numbers and the symbols +,-,*,(,). The condition “well-formed” is necessary to prevent nonsense such as )x+++1y(, and would typically be defined by some finite grammar [ALSU07, Section 2.2]. With no simplification rules, such a representation would regard x-x as just that, rather than as zero.
Definition 3 A representation of a monoid (i.e. a set with a 0, and an addition
operation) is said to be normal if the only representation of the object 0 is 0.
If we have a normal representation f, then we can tell if two mathematical objects a and b, represented by a and b, are equal by computing a-b: if this is zero (i.e. 0), then a and b are equal, while if this is not zero, they must be unequal. However, this is an inefficient method, to put it mildly.
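For instance (an illustration of ours), Maple's normal computes such a normal form for rational expressions, so this zero-test is directly available:

normal( (x^2-1)/(x-1) - (x+1) );   # returns 0: the two representations denote equal objects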
Observation 1 Normal representations are very important in practice, since many algorithms contain tests for zero/non-zero of elements. Sometimes these are explicit, as in Gaussian elimination of a matrix, but often they are implicit, as in Euclid’s Algorithm (Algorithm 2, page 62), where we take the remainder after dividing one polynomial by another, which in turn requires that we know the leading non-zero coefficient of the divisor.
Definition 4 A representation is said to be canonical if every object has only one representation, i.e. f is a bijection, or 1–1 mapping.

With a canonical representation, we can say that two objects “are the same if, and only if, they look the same”. For example, we cannot have both (x+1)^2 and x^2+2x+1, since $(x+1)^2 = x^2 + 2x + 1$.

Definition 5 A representation is said to be locally canonical (with respect to a certain context) if every object whose introduction does not change the context has only one representation, i.e. f (restricted to non context-changing expressions) is a bijection, or 1–1 mapping.
It is possible to build any normal representation into a locally canonical one, using an idea due to [Bro69]. We store every computed (top-level) expression $e_1, e_2, \ldots$, and for a new expression E, we compute the normal form of every $E - e_i$. If this is zero, then $E = e_i$, so we return $e_i$; otherwise E is a new expression to be stored. This has various objections, both practical (the storage and computation time) and conceptual (the answer printed depends on the context, i.e. the past history of the session), but nevertheless is worth noting.
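A minimal sketch of this idea in Maple (ours: we assume normal as the normalising function, as is appropriate for rational expressions, and the name remember_expr is ours):

seen := table(): count := 0:
remember_expr := proc(E)     # return a previously stored equal expression, else store E
    global seen, count;
    local i;
    for i to count do
        if normal(E - seen[i]) = 0 then  # normal form of E - e_i is zero
            return seen[i];              # so E = e_i: return the stored form
        end if;
    end do;
    count := count + 1;
    seen[count] := E;                    # E is genuinely new: store it
    return E;
end proc: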
Another example of locally canonical representations is at page 82. Note that no guarantee is made about comparing expressions in different contexts.
Definition 6 ([Sto11a, Definition 3]) A candid expression is one that is not equivalent to an expression that visibly manifests a simpler expression class.

In particular, if the “simpler class” is {0}, “candid” means the same as normal, but the concept is much more general, and useful. In particular, if a candid expression contains a variable v, then it really depends on v. Untyped systems such as Maple and Macsyma could be candid in this sense⁹: typed systems like Axiom have more problems in this area, since subtracting two polynomials in v will produce a polynomial in v, even if the result in a particular case is, say, $v - (v-1) = 1$, which doesn’t in fact depend on v.

⁹See section 2.2.2 for other aspects of candidness.
Candidness depends on the class of expressions considered as ‘simpler’. Consider
$$F : x \mapsto x - \sqrt{x^2} \qquad (1.1)$$
(viewed as a function $\mathbf{R} \to \mathbf{R}$). It might be tempting to consider F to be zero, but in fact $F(-1) = -2$. In fact, F can also be written as $x \mapsto x - |x|$ or
$$x \mapsto \begin{cases} 0 & x \ge 0 \\ 2x & x \le 0. \end{cases}$$
Is (1.1) therefore candid? It depends on the definition of “simpler”.
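One can check this in Maple (our example):

F := x -> x - sqrt(x^2):
F(-1);    # returns -2, so F is not the zero function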
Since practically everyone would agree that {0} is a simple class, candid expressions are automatically normal, but need not be canonical (see (1.1)), so this definition provides a useful intermediate ground, which often coincides with naïve concepts of “simpler”. We refer the reader to [Sto11a] for further, very illustrative, examples, to section 2.2.2 for a discussion of candidness for rational functions, and to sections 7.3.2 and 7.3.3 for a discussion of candidness in integration theory.
We should note that ‘candidness’ is not always achievable. One possible requirement would be “if the final value is real, then no complex numbers should be involved”, but this is not generally possible: at (3.10) we see that the three real roots of $x^3 - x$ can only be computed via complex numbers. In this case, they have simple expressions (0, +1, and −1), but that is not true in general: consider $x^3 - x - \frac{1}{1000}$, whose roots are
$$\frac{1}{60}\sqrt[3]{108 + 12\,i\sqrt{11999919}} + \frac{20}{\sqrt[3]{108 + 12\,i\sqrt{11999919}}},$$
$$-\frac{1}{120}\sqrt[3]{108 + 12\,i\sqrt{11999919}} - \frac{10}{\sqrt[3]{108 + 12\,i\sqrt{11999919}}} + \frac{i\sqrt{3}}{20}\left(\frac{1}{6}\sqrt[3]{108 + 12\,i\sqrt{11999919}} - \frac{200}{\sqrt[3]{108 + 12\,i\sqrt{11999919}}}\right),$$
$$-\frac{1}{120}\sqrt[3]{108 + 12\,i\sqrt{11999919}} - \frac{10}{\sqrt[3]{108 + 12\,i\sqrt{11999919}}} - \frac{i\sqrt{3}}{20}\left(\frac{1}{6}\sqrt[3]{108 + 12\,i\sqrt{11999919}} - \frac{200}{\sqrt[3]{108 + 12\,i\sqrt{11999919}}}\right).$$
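These expressions can be reproduced (a check of ours, not part of the text) by asking Maple's solve for explicit radicals:

_EnvExplicit := true:            # ask solve for radicals rather than RootOf
solve(x^3 - x - 1/1000, x);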
1.2.1 A Digression on “Functions”

In colloquial mathematics, the word “function” has many uses and meanings, and one of the distinctive features of computer algebra is that it plays on these various uses.
In principle, (pure) mathematics is clear on the meaning of “function”.
On dit qu’un graphe F est un graphe fonctionnel si, pour tout x, il
existe au plus un objet correspondant à x par F (I, p. 40). On dit
qu’une correspondance f = (F,A,B) est une fonction si son graphe
F est un graphe fonctionnel, et si son ensemble de départ A est égal
à son ensemble de définition pr1 F [pr1 is “projection on the first
component”].
[Bou70, p. E.II.13]
The present author’s loose translation is as follows.

We say that a graph (i.e. a subset of $A \times B$) F is a functional graph if, for every x, there is at most one pair $(x, y) \in F$ [alternatively $(x, y_1), (x, y_2) \in F$ implies $y_1 = y_2$]. We then say that a correspondance f = (F,A,B) is a function if its graph F is a functional graph and the starting set A is equal to $\mathrm{pr}_1 F = \{x \mid \exists y\,(x, y) \in F\}$.

So for Bourbaki a function includes the definition of the domain and codomain, and is total and single-valued.
Notation 3 We will write $(F,A,B)_B$ for such a function definition. If $C \subset A$, we will write $(F,A,B)_B|_C$, “$(F,A,B)_B$ restricted to C”, for $(G,C,B)_B$ where $G = \{(x, y) \in F \mid x \in C\}$.

Hence the function sin is, formally, $(\{(x, \sin x) : x \in \mathbf{R}\}, \mathbf{R}, \mathbf{R})_B$, or, equally possibly, $(\{(x, \sin x) : x \in \mathbf{C}\}, \mathbf{C}, \mathbf{C})_B$. But what about $x + 1$? We might be thinking of it as a function, e.g. $(\{(x, x+1) : x \in \mathbf{R}\}, \mathbf{R}, \mathbf{R})_B$, but equally we might just be thinking of it as an abstract polynomial, a member of $\mathbf{Q}[x]$. In fact, the abstract view can be pursued with sin as well, as seen in Chapter 7, especially Observation 18, where sin is viewed as $\frac{1}{2i}(\theta - 1/\theta) \in \mathbf{Q}(i, x, \theta \,|\, \theta' = i\theta)$. The relationship between the abstract view and the Bourbakist view is pursued further in Chapter 8.
1.2.2 Expansion
This word is relatively easy to define, as application of the distributive law, as
seen in the first bullet point of Maple’s description of the expand command.
• The expand command distributes products over sums. This is done for all polynomials. For quotients of polynomials, only sums in the numerator are expanded; products and powers [in the denominator¹⁰] are left alone.

• The expand command also expands most mathematical functions, including . . ..

¹⁰See Section 1.5.2.
24 CHAPTER 1. INTRODUCTION
In any given system, the precise meaning of expansion depends on the underlying polynomial representation used (recursive/distributed — see page 49), so Maple, which is essentially distributed, would expand x(y+1) into xy+x, while Reduce, which is recursive, would not, but would expand y(x+1) into xy+y, since its default ordering is ‘x before y’.

Expansion can, of course, cause exponential blow-up in the size of the expression: consider $(a+b)(c+d)(e+f)\cdots$, or $\sin(a+b+c+\cdots)$. The second bullet point of Maple’s description can lead to even more impressive expansion, as in
expand(BesselJ(4,t)^3);
(just where did the number 165888 come from?) or
expand(WeierstrassP(x+y+z,2,3));
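The first kind of blow-up is easy to quantify (our illustration): n two-term factors expand to 2^n terms, and nops counts the terms of the resulting sum:

nops(expand( (a+b)*(c+d)*(e+f) ));   # 8 = 2^3 terms
nops(expand( sin(a+b+c) ));          # 4 terms, products of sines and cosines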
1.2.3 Simplification
This word is much used in algebra, particularly at the school level, and has
been taken over by computer algebra, which has thereby committed the sin of
importing into a precise subject a word without a precise meaning.
Looking at the standard textbooks on Computer Algebra Systems
(CAS) leaves one even more perplexed: it is not even possible to find
a proper definition of the problem of simplification [Car04].
Let us first consider a few examples.
1. Does $\frac{x^2-1}{x-1}$ simplify to $x + 1$? For most people, the answer would be ‘yes’, but some would query “what happens when x = 1”, i.e. would ask whether we are dealing with abstract formulae, or representations of functions. This is discussed further for rational functions on pages 59 and 276, and in item 5 below.

2. Assuming the answer to the previous question is ‘yes’, does $\frac{x^{1000}-1}{x-1}$ simplify to $x^{999} + \cdots + 1$? Here the fraction is much shorter than the explicit polynomial, and we have misled the reader by writing $\cdots$ here¹¹.
3. Does $\sqrt{1-x}\,\sqrt{1+x}$ simplify to $\sqrt{1-x^2}$? Assuming the mathematical validity, which is not trivial [BD02], then the answer is almost certainly yes, since the second operation involves fewer square roots than the first.

4. Does $\sqrt{x-1}\,\sqrt{x+1}$ simplify to $\sqrt{x^2-1}$? This might be thought to be equivalent to the previous question, but consider what happens when $x = -2$. The first one simplifies to $\sqrt{-3}\,\sqrt{-1} = i\sqrt{3} \cdot i = -\sqrt{3}$, while the second simplifies to $\sqrt{3}$. Note that both evaluations result in real numbers, though they proceed via complex ones. Distinguishing this case from the previous one is a subject of active research [BBDP07, CDJW00].
¹¹The construct $\cdots$ is very common in written mathematics, but has had almost (but see [SS06]) no analysis in the computer algebra literature.
5. Assume we are working modulo a prime p, i.e. in the field $\mathbf{F}_p$. Does $x^p - x$ simplify to 0? As polynomials, the answer is no, but as functions $\mathbf{F}_p \to \mathbf{F}_p$, the answer is yes, by Fermat’s Little Theorem (see the illustration after this list). Note that, as functions $\mathbf{F}_{p^2} \to \mathbf{F}_{p^2}$, say, the answer is no.

In terms of Notation 3, we can say that
$$\left(\{(x, x^p - x) : x \in \mathbf{F}_p\}, \mathbf{F}_p, \mathbf{F}_p\right)_B = \left(\{(x, 0) : x \in \mathbf{F}_p\}, \mathbf{F}_p, \mathbf{F}_p\right)_B,$$
but
$$\left(\{(x, x^p - x) : x \in \mathbf{F}_{p^2}\}, \mathbf{F}_{p^2}, \mathbf{F}_{p^2}\right)_B \ne \left(\{(x, 0) : x \in \mathbf{F}_{p^2}\}, \mathbf{F}_{p^2}, \mathbf{F}_{p^2}\right)_B.$$
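Example 5 can be seen directly in Maple (our illustration, taking p = 5 and using the inert Expand ... mod idiom of section 1.5.4):

p := 5:
Expand(x^p - x) mod p;                           # x^5 + 4*x: nonzero as a polynomial
{seq( eval(x^p - x, x = k) mod p, k = 0..p-1 )}; # {0}: zero as a function F_5 -> F_5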
Now we give a few illustrations of what simplification means to different audiences.
Teachers At a course on computer algebra systems for teachers of the French
‘concours’, among the most competitive mathematics examinations in the
western world, there was a round table on the meaning of simplification.
Everyone agreed that the result should be ‘mathematically equivalent’,
though it was less clear, prompted by example 1, exactly what this meant.
The response to example 2 was along the lines of “well, you wouldn’t ask
such a question”. The author wishes he had had examples 3 and 4 to hand
at the time!
The general consensus was that ‘simplification’ meant ‘give me the answer I want’. This answer is not effective, in the sense that it cannot be converted into a set of rules.
Whether this is an appropriate approach to pedagogy is outside the scope
of this book, but we refer the reader to [BDS09, BDS10].
Moses [Mos71] This is a seminal paper, but describes approaches to simplification rather than defining it. Inasmuch as it does define it, it talks about ‘utility for further operations’, which again is not effective as a definition for top-level simplification, though it’s relevant for intermediate operations. However, the principle is important, since a request to factor would find the expression $x^{999} + \cdots + 1$ appropriate¹², whereas $\frac{x^{1000}-1}{x-1}$ is in a field, and factorisation is not a relevant question.

¹²In fact, knowing the expression came from that quotient would be relevant [BD89] to factorisation algorithms, but that’s beside the point here.
Carette [Car04] He essentially defines simplification in terms of the length of
the result, again insisting on mathematical equivalence. This would regard
examples 1 (assuming Q(x), so that the expressions were mathematically
equivalent) and 3 as simplifications, but not 2, since the expression becomes longer, or 4, since we don’t have mathematical equivalence.
Stoutemyer [Sto11a] sets out 10 goals for simplification, one of which (Goal
6) is that
12In fact, knowing the expression came from that quotient would be relevant [BD89] to
factorisation algorithms, but that’s beside the point here.
26 CHAPTER 1. INTRODUCTION
Default simplification should produce candid [Definition 6] re-
sults for rational expressions and for as many other classes as is
practical. Default simplification should try hard even for classes
where candidness cannot be guaranteed for all examples.
Candid expressions tend to be shorter than others ($\frac{x^{10}-1}{x-1}$ etc. being an obvious family of counter-examples), so this view is relatively close to Carette's, but not identical (see also page 60, item 2).
Numerical Analysts A numerical analyst would be shocked at the idea that
$x^2-y^2$ was 'simpler' than $(x+y)(x-y)$. He would instantly quote an
example such as the following [Ham07].
For simplicity, assume decimal arithmetic, with perfect rounding
and four decimal places. Let $x=543.2$ and $y=543.1$. Then
$x^2$ evaluates to 295100 and $y^2$ evaluates to 295000, so $x^2-y^2$
becomes 100, while $(x+y)(x-y)$ evaluates to 108.6, a perfect
rounding of the true answer 108.63.
Furthermore, if we take $x=913.2$ and $y=913.1$, $x^2-y^2$ is
still 100, while the true result is 182.63.
This is part of the whole area of numerical stability: fascinating, but
outside the scope of this text.
One principle that can be extracted from the above is “if the expression is zero,
please tell me”: this would certainly meet both the teachers’ and Carette’s
views. This can be seen as a call for simplification to return a normal form
where possible [Joh71, Ric97].
Maple’s description of the ‘simplify’ command is as follows.
• The simplify command is used to apply simplification rules
to an expression.
• The simplify(expr) calling sequence searches the expression,
expr, for function calls, square roots, radicals, and powers. It
then invokes the appropriate simplification procedures.
• symbolic Specifies that formal symbolic manipulation of expressions is allowed without regard to the analytical issue of branches for multi-valued functions. For example, the expression sqrt(x^2) simplifies to x under the symbolic option. Without this option, the simplified result must take into account the different possible values of the (complex) sign of x.
Maple does its best to return a normal form, but can be fooled: for example
\[\mathrm{RootOf}\left(\_Z^4+b\,\_Z^2+d\right)-\frac12\sqrt{-2b+2\sqrt{b^2-4d}},\]
which is actually zero (applying figure 3.2), does not simplify to zero under
Maple 11.
Because simplification may often require expansion, e.g. to take $(x-1)(x+1)$ to $x^2-1$, the two are often confused, and indeed both Macsyma and Reduce (internally) used ratsimp and *simp (respectively) to denote what we have called expansion.
1.2.4 An example of simplification
This section is inspired by an example in [Sto13]. Consider
\[(\cos x)^3\sin x+\frac12(\cos x)^3\sin x+2(\cos x)^3\cos(2x)\sin x+\frac12(\cos x)^3\cos(4x)\sin x-\frac32\cos x\,(\sin x)^3-2\cos x\cos(2x)(\sin x)^3-\frac12\cos x\cos(4x)(\sin x)^3.\tag{1.2}\]
Typing this into Maple simply collects terms, giving
\[\frac32(\cos x)^3\sin x+2(\cos x)^3\cos(2x)\sin x+\frac12(\cos x)^3\cos(4x)\sin x-\frac32\cos x\,(\sin x)^3-2\cos x\cos(2x)(\sin x)^3-\frac12\cos x\cos(4x)(\sin x)^3.\tag{1.3}\]
combine(%,trig), i.e. using the multiple angle formulae to replace trigonometric powers by $\sin/\cos$ of multiples of the angles, gives
\[\frac38\sin(4x)+\frac14\sin(2x)+\frac14\sin(6x)+\frac1{16}\sin(8x).\tag{1.4}\]
expand(%,trig) (i.e. using the multiple angle formulae in the other direction) gives
\[4\sin x\,(\cos x)^7-4(\cos x)^5(\sin x)^3,\tag{1.5}\]
which is also given by Maple's simplify applied to (1.3) (simplify applied to (1.4) leaves it unchanged). Mathematica's FullSimplify gives
\[2(\sin(3x)-\sin x)(\cos x)^5.\tag{1.6}\]
However, an even simpler form is found by Eureqa^{13}:
\[(\cos x)^4\sin(4x).\tag{1.7}\]
We note that (1.4) and (1.5) are the results of algorithmic procedures, pushing
identities in one direction or the other, applied to (1.3), while (1.6) and (1.7)
are half-way positions, which happen to be shorter than the algorithmic results.
13http://creativemachines.cornell.edu/eureqa. However, we should note that Maple
(resp. Mathematica) has proved that (1.5) (resp. (1.6)) is equivalent to (1.2), whereas Eureqa
merely claims that (1.7) fits the same data points as [a finite sample of] (1.2). Nevertheless,
Eureqa’s capability to find a short equivalent in essentially Carette’s sense [Car04] is impres-
sive, and in any case, knowing what to prove, any algebra system can prove equivalence.
1.2.5 Equality
To understand the difficulties that computer algebra systems have with equality, we need to remember the difference between objects and representations (Definition 2). These difficulties are discussed for computer algebra systems in [Dav02] and for computer proof systems in [GKS15].
Notation 4 We use $=_O$ to stand for equality of (mathematical) objects, and $=_R$ for equality of representations. By slight abuse of notation, we will also regard $=_O$ as a relation on $R$, meaning "the mathematical objects denoted by these two representations are the same". For a given computer algebra system, we use $=_{CA}$ to stand for equality in that system, e.g. $=_{\mathrm{Maple}}$.
Example 1 Hence, in terms of "ink-and-paper" representations, $(x+1)^2=_O x^2+2x+1$ but $(x+1)^2\neq_R x^2+2x+1$.
In terms of relations on $R$, i.e. subsets of $R\times R$, $=_R\subseteq=_O$, i.e. if two representations are equal, the corresponding mathematical objects are equal.
Definition 7 (Definition 4 restated) A representation is said to be canonical if $=_O$ is the same as $=_R$.
What properties might we want $=_{CA}$ to have?
Reflexive Since $=_R$ is reflexive, this will be achieved if $=_{CA}\supseteq=_R$, as it should be.
Symmetric This is pretty easy to achieve.
Transitive This is more difficult. If $=_{CA}$ does more sophisticated processing than $=_R$, it might recognise that $a=_{CA}b$ and $b=_{CA}c$, but fail to recognise that $a=_{CA}c$, especially if $b$ is significantly simpler than $a$ and $c$.
Congruence If $a=_{CA}b$, we would like $f(a)=_{CA}f(b)$. If the system's representation of $f(x)$ is just the uninterpreted $f(x)$, this is relatively easy to achieve. If the implementation of $f$ is more sophisticated, this may be harder to achieve.
Strictly speaking, what we have stated is unary congruence, and we really want binary congruence, i.e. if $a=_{CA}b$ and $c=_{CA}d$, we would like $f(a,c)=_{CA}f(b,d)$, and in general congruence of all arities. The same remarks apply.
Soundness This is "if $=_{CA}$ says that two things are equal, then they are": in symbols $=_{CA}\subseteq=_O$. This is a property we should always have: if there's a counterexample, then two objects, which are mathematically not equal, are declared equal by $=_{CA}$.
Completeness This is "if two things are mathematically equal, then $=_{CA}$ says so": in symbols $=_{CA}\supseteq=_O$.
Unfortunately, completeness is impossible in general, essentially as a conse-
quence of the Gödel Incompleteness Theorem.
Neither =O nor =R is appropriate for “pedagogical equality” as needed in
mathematical education software: see [BDS09, BDS10] for a discussion of this.
1.3 Algebraic Definitions
In this section we give some classic definitions from algebra, which we will return
to throughout this book. Other concepts are defined as they occur, but these
ones are assumed.
Definition 8 A set $R$ is said to be a ring if it has two binary operations $+$ and $*$, a unary operation $-$ and a distinguished element 0, such that, for all $a$, $b$ and $c$ in $R$:
1. $a+(b+c)=(a+b)+c$ (associativity of $+$);
2. $a*(b*c)=(a*b)*c$ (associativity of $*$);
3. $a+b=b+a$ (commutativity of $+$);
4. $a+(-a)=0$;
5. $a+0=a$;
6. $a*(b+c)=(a*b)+(a*c)$ (distributivity of $*$ over $+$);
6'. $(b+c)*a=b*a+c*a$ (right-distributivity);
7. $a*b=b*a$ (commutativity of $*$).
Not every text includes the last clause, and they would call a ‘commutative ring’
what we have called simply a ‘ring’. In the absence of the last clause, we will
refer to a ‘non-commutative ring’. 6’ is unnecessary if we have commutativity,
of course.
Definition 9 If $R$ is a (possibly non-commutative) ring and $\emptyset\neq I\subseteq R$, then we say that $I$ is a (left-)ideal of $R$, written $I\lhd R$, if the following two conditions are satisfied^{14}:
(i) $\forall f,g\in I$, $f-g\in I$;
(ii) $\forall f\in R, g\in I$, $fg\in I$.
Notation 5 If $S\subset R$, we write $(S)$ for $\bigcup_{n\in\mathbb N}\left\{\sum_{i=1}^n f_ig_i\mid f_i\in R, g_i\in S\right\}$: the set of all linear combinations (over $R$) of the elements of $S$. We tend to abuse notation and write $(a_1,\ldots,a_k)$ instead of $(\{a_1,\ldots,a_k\})$. This is called the (left-)ideal generated by $S$, and clearly is a left-ideal.
^{14}We write $f-g\in I$ since then $0=f-f\in I$, and then $f+g=f-(0-g)\in I$.
There are also concepts of right ideal and two-sided ideal, but all concepts agree in the case of commutative rings. Non-trivial ideals (the trivial ideals are $\{0\}$ and $R$ itself) exist in most rings: for example, the set of multiples of $m$ is an ideal in $\mathbb Z$.
Proposition 1 If $I$ and $J$ are ideals, then $I+J=\{f+g:f\in I,g\in J\}$ and $IJ=\{fg:f\in I,g\in J\}$ are themselves ideals.
Definition 10 A ring is said to be noetherian, or to satisfy the ascending chain condition, if every ascending chain $I_1\subset I_2\subset\cdots$ of ideals is finite.
Theorem 1 (Noether) If $R$ is a commutative noetherian ring, then so is $R[x]$ (Notation 12, page 42).
Corollary 1 If $R$ is a commutative noetherian ring, then so is $R[x_1,\ldots,x_n]$.
Definition 11 A ring $R$ is said to be an integral domain if, in addition to the conditions above, there is a neutral element 1 such that $1*a=a$ and, whenever $a*b=0$, at least one of $a$ and $b$ is zero.
Another way of stating the last is to say that $R$ has no zero-divisors (meaning none other than zero itself).
Definition 12 An element $u$ of a ring $R$ is said to be a unit of $R$ if there is an element $u^{-1}\in R$ such that $u*u^{-1}=1$. $u^{-1}$ is called the inverse of $u$. Note that the context $R$ matters: 2 is a unit in $\mathbb Q$ (with inverse $1/2$), but not in $\mathbb Z$.
Definition 13 If $a=u*b$ where $u$ is a unit, we say that $a$ and $b$ are associates.
Proposition 2 If $a$ and $b$ are associates, and $b$ and $c$ are associates, then $a$ and $c$ are associates. Therefore being associates is an equivalence relation, since it's clearly reflexive and symmetric.
For the integers, $n$ and $-n$ are associates, whereas for the rational numbers, any two non-zero numbers are associates.
Definition 14 An ideal $I$ of the form $(f)$, i.e. such that every $h\in I$ is $gf$ for some $g\in R$, is called principal. If $R$ is an integral domain such that every ideal $I$ is principal, then $R$ is called a principal ideal domain, or P.I.D.
Classic P.I.D.s are the integers Z, and polynomials in one variable over a field.
Inside a principal ideal domain, we have the standard concept of a greatest
common divisor (formally defined in Definition 32).
Proposition 3 Let $R$ be a P.I.D., $a,b\in R$ and $(a,b)=(g)$. Then $g$ is a greatest common divisor of $a$ and $b$, in the sense that any other common divisor divides $g$, and $g=ca+db$ for $c,d\in R$.
It is possible to have g.c.d.s without being a P.I.D.: common examples are Z[x]
(where the ideal (2, x) is not principal) and Q[x, y] (where the ideal (x, y) is not
principal).
Definition 15 If F is a ring in which every non-zero element is a unit, F is
said to be a field.
The "language of fields" therefore consists of two constants (0 and 1), four binary operations ($+$, $-$, $*$ and $/$) and two unary operations ($-$ and $^{-1}$, which can be replaced by the binary operations combined with the constants). The rational numbers and real numbers are fields, but the integers are not. For any $m$, the integers modulo $m$ are a ring, but only if $m$ is prime do they form a field. The only ideals in a field are the trivial ones.
Definition 16 If $R$ is an integral domain, we can always form a field from it, the so-called field of fractions, consisting of all formal fractions^{15} $\frac ab$ with $a,b\in R$, $b\neq0$, where $a/b$ is zero if and only if $a$ is zero. Addition is defined by $\frac ab+\frac cd=\frac{ad+bc}{bd}$, and multiplication by $\frac ab*\frac cd=\frac{ac}{bd}$. So $\frac ab=\frac cd$ if, and only if, $ad-bc=0$.
In particular, the rational numbers are the field of fractions of the integers.
Definition 17 If $F$ is a field, the characteristic of $F$, written $\mathrm{char}(F)$, is the least positive number $n$ such that $\underbrace{1+\cdots+1}_{n\text{ times}}=0$. If there is no such $n$, we say that the characteristic is zero.
So the rational numbers have characteristic 0, while the integers modulo $p$ have characteristic $p$, as do, for example, the rational functions whose coefficients are integers modulo $p$.
Proposition 4 The characteristic of a field, if non-zero, is always a prime.
1.3.1 Algebraic Closures
Some polynomial equations have solutions in a given ring/field, and some do not. For example, $x^2-1=0$ always has two solutions: $x=-1$ and $x=1$. The reader may protest that, over a field of characteristic two, there is only one root, since $1=-1$. However, over a field of characteristic two, $x^2-1=(x-1)^2$, so $x=1$ is a root with multiplicity two.
Definition 18 A field F is said to be algebraically closed if every polynomial
in one variable over F has a root in F .
Proposition 5 If $F$ is algebraically closed, then every polynomial of degree $n$ with coefficients in $F$ has, with multiplicity, $n$ roots in $F$.
Theorem 2 ("Fundamental Theorem of Algebra")^{16} $\mathbb C$, the set of complex numbers, is algebraically closed.
15Strictly speaking, it consists of equivalence classes of formal fractions, under the equality
we are about to define.
16The title is in quotation marks, since C (and R) are constructs of analysis, rather than
algebra.
Definition 19 If $F$ is a field, the algebraic closure of $F$, denoted $\overline F$, is the field generated by adjoining to $F$ the roots of all polynomials over $F$.
It follows from proposition 82 that the algebraic closure is in fact algebraically
closed.
1.4 Some Complexity Theory
As is usual in computing, we will use the so-called "Landau notation"^{17} to describe the computing time (or space) of operations.
Notation 6 ("Landau") Let $N$ be some measure of the size of a problem, generally the size of the input to a program, and $f$ some function. If $t(N)$ is the time taken, on a given hardware/software configuration, for a particular program to solve the hardest problem of size $N$, we say that "$t$ is eventually no bigger than $f$", in symbols
\[t(N)=O(f(N)),\tag{1.8}\]
if, from some point onwards as $N$ increases, $t$ is no more than some fixed multiple of $f$, i.e. $\exists C\in\mathbb R,M\in\mathbb N:\forall N>M\;t(N)<Cf(N)$. $C$ (and $M$) are generally referred to as the implicit constants of this notation.
Hardware tends to change in speed, but generally linearly, so O-notation is inde-
pendent of particular hardware choices. If the program implements a particular
algorithm, we will say that the algorithm has this O behaviour.
We will also use “soft O” notation.
Notation 7 ("soft O") Let $N$ be some measure of the size of a problem, and $f$ some function. If $t(N)$ is the time taken, on a given hardware/software configuration, for a particular program to solve the hardest problem of size $N$, we say that
\[t(N)=\tilde O(f(N))\tag{1.9}\]
if $t(N)$ grows "almost no faster" than $f(N)$, i.e. slower than $f(N)^{1+\epsilon}$ for any positive $\epsilon$: in symbols $\forall\varepsilon>0\;\exists C\in\mathbb R,M\in\mathbb N:\forall N>M\;t(N)<Cf(N)^{1+\varepsilon}$.
We should note that "$=O$" and "$=\tilde O$" should really be written with "$\in$" rather than "$=$", and this use of "$=$" is not reflexive, symmetric or transitive. Also, many authors use $\tilde O$ to mean "up to logarithmic factors", which is included in our more general definition [DL08]. One specific use of $\tilde O$ is given in footnote 4, page 163.
The key results of complexity theory for elementary algorithms are that it is possible to multiply two $N$-bit integers in time $\tilde O(N)$, and two degree-$N$ polynomials with coefficients of bit-length at most $\tau$ in time $\tilde O(N\tau)$ [AHU83]. For matrix multiplication, the situation is more complicated: see Notation 43.
^{17}Though apparently first introduced by [Bac94, p. 401]. See [Knu74].
Definition 20 (Following [BFSS06]) We say that an algorithm producing
output of size N is optimal if its running time is O(N), and almost optimal if
its running time is Õ(N).
Addition of numbers and polynomials is therefore optimal, and multiplication
almost optimal. Matrix multiplication is not known to be either.
However, the reader should be warned that $\tilde O$ expressions are often far from the reality experienced in computer algebra, where data are small enough that the limiting processes in equations (1.8) and (1.9) have not really taken hold (see note 4), or are quantised (in practice integer lengths are measured in words, not bits, for example). We therefore often use the phrase "classical arithmetic" to mean $O(n^2)$ integer/polynomial multiplication, and $O(n^3)$ matrix multiplication.
When it comes to measuring the intrinsic difficulty of a problem, rather than the efficiency of a particular algorithm, we need lower bounds rather than the upper bounds implied in (1.8) and (1.9).
Notation 8 (Lower bounds) Consider a problem $P$, and a given encoding, e.g. "dense polynomials (Definition 26) with coefficients in binary". Let $N$ be the size of a problem instance, and $C$ a particular computing paradigm, and way of counting operations. If we can prove that there is a $c$ such that any algorithm solving this problem must take at least $cf(N)$ operations on at least one problem instance of size $N$, then we say that this problem has cost at least of the order of $f(N)$, written
\[P_C=\Omega(f(N))\text{ or loosely }P=\Omega(f(N)).\tag{1.10}\]
Again "$=$" really ought to be "$\in$".
In some instances (sorting is one of these) we can match upper and lower bounds.
Notation 9 ($\Theta$) If $P_C=\Omega(f(N))$ and $P_C=O(f(N))$, then we say that $P_C$ is of order exactly $f(N)$, and write $P_C=\Theta(f(N))$.
For example, if $C$ is the paradigm in which we only count comparison operations, sorting $N$ objects is $\Theta(N\log N)$.
Notation 10 (Further Abuse) We will sometimes abuse these notations further, and write, say, $f(N)=2^{O(N)}$, which can be understood as either of the equivalent forms $\log_2 f(N)=O(N)$ or $\exists C\in\mathbb R,M:\forall N>M\;f(N)<2^{CN}$. Note that $2^{O(N)}$ and $O(2^N)$ are very different things. $4^N=2^{2N}$ is $2^{O(N)}$ but not $O(2^N)$.
Open Problem 1 (Algebra of O) Manipulating such $O$-expressions, especially when they depend on several variables, can be very tedious, as seen in [Col75, pp. 160–163]. Write a computer algebra package to simplify such expressions, so that, for example,
Osimplify(O(N^3)+O(N^2));
would yield $O(N^3)$.
1.4.1 Complexity Hierarchy
If an algorithm has input of size $N$, it will take time $O(N)$ to read its input, so this is generally the least complexity we consider. We then have various complexities, all of which are (bounded by) polynomials in $N$:
\[N\ll N\log N\ll N\log N\log\log N\ll N\log^2N\ll N^{3/2}\ll N^2\ll N^3\ll\cdots.\tag{1.11}\]
We have written $f(N)\ll g(N)$ rather than $f(N)<g(N)$ since it depends on the implicit constants whether for a particular $N$ an algorithm whose time is $O(f(N))$ is actually faster than one whose time is $\Omega(g(N))$: all we know is that eventually, as $N$ grows, the one with the better complexity will be faster.
All the complexities in class (1.11) are referred to as polynomial time complexities, or $P$.
Definition 21 An algorithm is polynomial time, or in the class $P$, if there is a constant $c$ such that the running time of the algorithm is $O(N^c)$.
Beyond this, we have an exponential class (or $EXP$):
\[1.01^N=2^{0.01436N}\ll 2^N\ll 4^N=2^{2N}\ll 2^{N\log N}\ll 2^{N^2}\ll\cdots\tag{1.12}\]
Again, it depends on the implied constants, but eventually any algorithm whose complexity is polynomial will be faster than one whose complexity is in (1.12).
There are complexities which lie between the two: (B.18) shows $N^{\Theta(1/\log\log N)}$, which grows more slowly than any $c^N$.
1.4.2 Probabilistic Algorithms
The traditional definition of an algorithm is “a definite sequence of operations
which terminates and produces the desired result" or words to that effect, so the
title of this section is, taken literally, an oxymoron. Nevertheless, the concept
is very useful, and can be made precise by inserting into that definition “(which
may include calls to random number generators)”. We can then distinguish
various kinds of probabilistic algorithms (where “fast” tends to mean “polyno-
mial time”). In each case, “probably” refers to probability across the possible
outputs of the random number generators.
Monte Carlo (“always fast/probably correct”) The classic example here is the
Miller–Rabin primality testing algorithm [Rab80].
Algorithm 1 (Miller–Rabin primality testing)
Input: a number $N$ of $n$ bits
Output: Either "N is definitely composite" or "N is probably prime".
The algorithm picks a random $a\in[2,N-1]$ and computes $a^{N-1}$ carefully (a Maple sketch follows this list). The running time, with classical arithmetic, is therefore $O(n^3)$. If $N$ is prime, the algorithm always outputs "N is probably prime". If $N$ is composite, the algorithm outputs "N is probably prime" for at most $1/4$ of $a$-values.
Las Vegas (“always correct/probably fast”) If Algorithm A is a Monte Carlo
algorithm for problem P , and Algorithm B is a fast verifier for correctness,
then the process in Figure 1.1 will give a Las Vegas algorithm for problem
P . It is the absence of such a verifier that means that Miller–Rabin pri-
mality testing (as opposed to the deterministic, but much more expensive
AKS [AKS04] test) is only Monte Carlo.
Figure 1.1: Converting Monte Carlo to Las Vegas
do
ans:=Algorithm A(P)
while Algorithm B(ans,P)=false
return ans
Atlantic City (“probably fast/probably correct”) Again, the existence of a
fast verifier for correctness would let the process in Figure 1.1 convert this
to a Las Vegas algorithm. One example of an Atlantic City algorithm
is given in Theorem 3: again we lack a fast verifier, so the algorithm is
“only” Atlantic City.
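To make the Monte Carlo case concrete, here is a minimal Maple sketch of one round of Algorithm 1 for odd $N>3$ (the procedure name MillerRabin is ours; in practice one would simply call Maple's built-in isprime):
MillerRabin := proc(N::posint)
  local a, d, s, x, i;
  d := N - 1;  s := 0;
  while irem(d, 2) = 0 do d := d/2; s := s + 1 end do; # N-1 = 2^s*d, d odd
  a := rand(2 .. N - 2)();           # random base a
  x := a &^ d mod N;                 # "computed carefully": modular powering
  if x = 1 or x = N - 1 then return "N is probably prime" end if;
  for i to s - 1 do                  # the remaining squarings towards a^(N-1)
    x := x^2 mod N;
    if x = N - 1 then return "N is probably prime" end if
  end do;
  return "N is definitely composite"
end proc: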
1.5 Some Maple
1.5.1 Maple polynomials
From Maple 17 onwards, Maple has two possible representations for polynomi-
als.
Original Expressions An expression, whether or not it had been through
expand, was an n-ary sum of n-ary power products with numerical coef-
ficients. Uniqueness of power products (and hence collection of like terms
(8’) on page 42) was enforced through hashing, rather than sorting, and
hence summands and multiplicands could, and did, appear in any order.
Sparse distributed [MP12, MP14] This is as on page 50, and the order is
total degree lexicographic (Section 3.3.3).
There’s a Maple worksheet demonstrating this at http://staff.bath.ac.uk/
masjhd/JHD-CA/MaplePoly.html.
1.5.2 Maple rational functions
Maple’s expand command, as we saw on page 23, only expands numerators of
rational functions, so we get results like Figure 1.2. If we actually want the
denominators expanded, and the whole placed over a common denominator,
the correct tool is normal(...,expanded), as in Figure 1.3. As the name
Figure 1.2:
> expand((x-1)*(x-2)/((x-3)*(x-4)));
\[\frac{x^2}{(x-3)(x-4)}-\frac{3x}{(x-3)(x-4)}+\frac{2}{(x-3)(x-4)}\]
Figure 1.3:
> normal((x-1)*(x-2)/((x-3)*(x-4)),expanded);
\[\frac{x^2-3x+2}{x^2-7x+12}\]
implies, we get a normal representation (Definition 3), and indeed a canonical one, as greatest common divisors are cancelled, and the leading coefficient of the denominator made positive in a consistent way.
1.5.3 The RootOf construct
Maple uses this construct to indicate a solution of a (univariate) equation. It's generally seen in the context of polynomial equations, as in Figure 1.4, where essentially nothing more can be said about these numbers other than that they are the roots of the polynomial $z^5-5z^3+4z-1$. Note that Maple indexes the result of RootOf according to the rules at http://www.maplesoft.com/support/help/Maple/view.aspx?path=RootOf/indexed.
1.5.4 Active and Inert Functions
Notation 11 Maple functions fall into two categories.
active is the usual form: a command that tells Maple to perform a computation, as in gcd(x^2-1, x^3-1), which computes $x-1$, or cos(Pi), which computes $-1$.
inert is a form which tells Maple just to store the unevaluated concept of the computation, as in Gcd(x^2-1, x^3-1), which returns $\mathrm{Gcd}(x^2-1,x^3-1)$.
Some Maple commands have both active and inert forms — in this case the inert form generally begins with a capital letter. An inert form can always be built by prepending the % symbol to the function name, as in %cos(Pi), which returns $\mathrm{\%cos}(\pi)$.
Figure 1.4: An example of Maple's RootOf construct
> solve(z^5-5*z^3+4*z-1, z);
RootOf(_Z^5 - 5*_Z^3 + 4*_Z - 1, index = 1),
RootOf(_Z^5 - 5*_Z^3 + 4*_Z - 1, index = 2),
RootOf(_Z^5 - 5*_Z^3 + 4*_Z - 1, index = 3),
RootOf(_Z^5 - 5*_Z^3 + 4*_Z - 1, index = 4),
RootOf(_Z^5 - 5*_Z^3 + 4*_Z - 1, index = 5)
The value function evaluates the inert forms in its argument, so that value(Gcd(x^2-1, x^3-1)) is equivalent to gcd(x^2-1, x^3-1), and therefore is $x-1$.
Inert forms have many uses (see the Maple documentation): one that is par-
ticularly relevant to this book is their relationship with the mod operator. The
Maple documentation says, describing e mod m,
The mod operator evaluates the expression e over the integers modulo
m.
What may not be apparent here is that the usual Maple rules, that arguments
get evaluated before being passed to functions, still apply. Hence
gcd(x+2,x-3) mod 5
first evaluates gcd(x+2,x-3) (getting 1), then passes this in, effectively calling 1 mod 5, and we get 1. What we probably intended was
Gcd(x+2,x-3) mod 5
so that the unevaluated g.c.d. object is passed to be calculated modulo 5, where the result is $x+2$ (Maple by default uses $0,\ldots,|m|-1$ to store the results modulo $m$). Similarly
factor(x^4+1) mod 2
first evaluates factor(x^4+1), getting $x^4+1$ since this polynomial is irreducible over the integers, effectively calling x^4+1 mod 2, which is $x^4+1$. Had we written
Factor(x^4+1) mod 2
then Maple would be being asked to factor $x^4+1$ modulo 2, and the answer would have been $(x+1)^4$.
Note also that 10^100 mod 7 will first calculate the integer $10^{100}$. The correct syntax in this case is 10&^100 mod 7 to defer the exponentiation.
1.5.5 The simplify command
This Maple command has been discussed elsewhere, especially in section 1.2.4. It is worth noting that simplification in this sense does not commute with substitution, and it can be argued [Zim07] that it is a 'user-oriented' command that should not be used inside Maple programs, where well-specified commands such as expand (Section 1.2.2), or more specific ones such as combine(%,trig), are very appropriate.
1.5.6 Equality
There are four operators in Maple which are relevant here, and the first three
could all be thought of as Maple’s =Maple.
= This actually forms a symbolic equation by default, as in
> 2=3;
2=3
It is only when this is required to be a Boolean value, either because of
the context (if or while statement), or explicitly by the evalb function,
that it is converted into true/false. When this happens, it appears to
the author that this implements =R (Notation 4), i.e. equality of data
structures.
testeq This uses the probabilistic algorithm of [Gon84] and hence might be
unsound, i.e. say that two things are equal when in fact they are not, but
the author has been unable to provoke this. It returns FAIL (rather than
false) if the expressions are of a category that it cannot handle, such as
square roots.
The next two are part of Maple's assume facility [WG93], but can be used without any assumptions.
is typically has the syntax is(x1,prop1) and “returns true if all possible
values of x1 satisfy the property prop1”. It “returns false if any possible
value of x1 does not satisfy the property prop1”, and “returns FAIL if
it cannot determine whether the property is always satisfied”. However,
it can also be used as is(x=y), and in this context would seem to do
significant amounts of simplification: at least the equivalent of normal,
and possibly simplify. However, it cannot really handle square roots,
and returns false (rather than FAIL, annoyingly) for both examples 3
and 4 (page 24).
coulditbe typically has the syntax coulditbe(x1,prop1) and "determines whether there is a value of x1 such that prop1 is satisfied". However, it cannot really handle square roots, and returns true (rather than FAIL, annoyingly) for the negations of both examples 3 and 4 (page 24), even though there is no $x$ for which example 3 is not true.
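The distinctions above can be seen in a session like the following (the is and coulditbe results are those reported above; the testeq result is our own observation, and behaviour may vary between Maple versions):
> testeq((x+1)^2, x^2+2*x+1);
                             true
> is(sqrt(1-x)*sqrt(1+x) = sqrt(1-x^2));
                             false
> coulditbe(sqrt(x-1)*sqrt(x+1) <> sqrt(x^2-1));
                             true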
Chapter 2
Polynomials
Polynomials are fundamental to much of mathematics, even when the objects under discussion are apparently not polynomials, such as differential equations. Equally, polynomials underpin much of computer algebra. But what, precisely, are they?
2.1 What are polynomials?
There are numerous definitions. From our point of view, computer algebra, we will adopt the following definition for commutative^{1} polynomials, leaving non-commutative polynomials to be discussed in section 2.4.
Definition 22 (Polynomials) A (commutative) polynomial is built up from coefficients, which are assumed to form a ring (definition 8), and certain indeterminates (often called variables), by the algebraic operations of addition, subtraction and multiplication. These are subject to the following laws, where $a,b,c$ are polynomials, $m,n$ coefficients, and 0 and 1 certain distinguished coefficients.
1. $a+b=b+a$;
2. $(a+b)+c=a+(b+c)$;
3. $a+0=a$;
4. $a+(-a)=0$;
5. $a*b=b*a$;
6. $a*(b*c)=(a*b)*c$;
^{1}"Commutative" meaning $a*b=b*a$. Strictly speaking, we should also worry whether addition is commutative, i.e. whether $a+b=b+a$, but we will always assume that addition is commutative.
7. $a*1=a$;
8. $a*(b+c)=(a*b)+(a*c)$;
9. $m+n=m\oplus n$;
10. $m*n=m\otimes n$;
where we have used $\oplus$ and $\otimes$ to denote the operations of addition and multiplication on coefficients, which are assumed to be given to us.
The reader can think of the coefficients as being numbers, though they need not be, and may include other indeterminates that are not the "certain indeterminates" of the definition. However, we will use the usual 'shorthand' notation of 2 for $1\oplus1$ etc. The associative laws (2 and 6 above) mean that addition and multiplication can be regarded as $n$-ary operations. A particular consequence of these rules is
8'. $m*a+n*a=(m\oplus n)*a$,
which we can think of as 'collection of like terms'.
Proposition 6 Polynomials over a ring form a ring themselves.
Definition 23 (Free Algebra) If it is the case that a polynomial is only zero if it can be deduced to be zero by rules 1–10 above, and the properties of $\oplus$ and $\otimes$, then we say that we have a free polynomial algebra.
Free algebras are common, but by no means the only ones encountered in computer algebra. For example, trigonometry is often encoded by regarding $\sin\theta$ and $\cos\theta$ as indeterminates, but subject to $\sin^2\theta+\cos^2\theta=1$ [Sto77].
Notice what we have not mentioned: division and exponentiation.
Definition 24 ((Exact) Division) If $a=b*c$, then we say that $b$ divides $a$, and we write $b=\frac ac$.
Note that, for the moment, division is only defined in this context. We note that, if $c$ is not a zero-divisor, $b$ is unique.
Definition 25 (Exponentiation) If $n$ is a natural number and $a$ is a polynomial, then we define $a^n$ inductively by:
• $a^0=1$;
• $a^{n+1}=a*a^n$.
Notation 12 If K is a set of coe�cients, and V a set of variables, we write
K[V ] for the set of polynomials with coe�cients in K and variables in V . We
write K[x] instead of K[{x}] etc.
2.1.1 How do we manipulate polynomials?
We have defined the abstract, almost Platonic, concept of polynomials as math-
ematical objects, and polynomial algebra as rules for these objects. What do
we mean by the representation of these objects in a computer system?
One option would be for a computer algebra system essentially to do nothing,
simply recording the computations requested by the user, so that a+b would
become simply a + b. However, we would not be very happy with a calculator
which computed 1+1 as “1+1”, as we would rather see “2”. In particular, if the
answer is 0, we would like to see that shown, i.e. we would like the representation
to be normal (definition 3).
2.1.2 Polynomials in one variable
We will first consider polynomials in one variable, say $x$. If the coefficients come from a domain $K$, the polynomials in $x$ over $K$ are denoted by $K[x]$ (Notation 12). One obvious route to a canonical representation (definition 4) is to insist that polynomials be expanded, i.e. that multiplication is only applied to coefficients and variables, not to general polynomials. This is achieved by applying distributivity, rule 8 from definition 22, where necessary, ensuring that multiplication is not applied to additions, and rule 8' to collect terms. Once this is done, the polynomial is of the form $\sum_{i=0}^n a_ix^i$, where the $a_i$ are coefficients.
Notation 13 We assume that $a_n\neq0$, which is easy if the coefficients are represented normally (Definition 3). In this case, $n$ is called the degree of the polynomial, denoted $\deg(f)$, or $\deg_x(f)$ if we wish to make it clear which variable is being considered. $a_n$ is called the leading coefficient of $f$, and denoted $\mathrm{lc}(f)$ or $\mathrm{lc}_x(f)$. If $\mathrm{lc}(f)=1$, we say that $f$ is monic. $f-\mathrm{lc}(f)x^{\deg(f)}$, i.e. $f$ minus its leading term, is known as the reductum of $f$, $\mathrm{red}(f)$. The set $\{\mathrm{red}(f),\mathrm{red}(\mathrm{red}(f)),\ldots\}$ is known as the iterated reducta of $f$.
There is then an important distinction, which does not really occur when
doing algebra by hand: does one represent the zero coe�cients, or not?
Definition 26 A representation^{2} is said to be dense if every coefficient, zero or not, is represented, while it is sparse if zero coefficients are not stored.
Hence the polynomial $x^2+0x-1$, normally written as $x^2-1$, would be stored as <1,0,-1> in a dense representation, but <<2,1>,<0,-1>> in a sparse representation. As is implicit in the previous sentence, the normal "human" representation is sparse. Those systems that automatically expand, e.g. Reduce [Hea05], use a sparse representation, since a system would look fairly silly if it was unable to represent $x^{1000000000}+1$ because it could not store the 999,999,999 zeros. However, dense representations are often used internally in some algorithms.
2In the current case, we are dealing with polynomials. But the concept is more general —
see section 3.2.2 for sparse matrices, for example.
Definition 27 For a polynomial $f=\sum_i c_ix^{\alpha_i}\in\mathbb Z[x]$, we define the sparse bit size of $f$ to be $\sum_i\left[1+\log_2(|c_i|)+\log_2(1+\alpha_i)\right]$: the number of bits needed to encode $f$ (the $1+$ term allows for the sign of the coefficient). This definition ignores the awkward practicalities that bits are whole (we should have $\lceil\log_2\ldots\rceil$) and come in bytes/words, but has the right $O$-behaviour. It is also information-theoretically correct, in that the $\alpha_i$ might be 0, hence the $1+$, but the $c_i$ can't be 0, so don't need a corresponding $1+$, but this is unlikely to be taken advantage of in practice. It also ignores the lengths of length fields, to say which bits mean what, and the lengths of the lengths of the length fields . . . .
We say that an algorithm has poly-sparse complexity if the complexity is a polynomial function in the sparse bit size of the inputs and outputs. See also Definition 29.
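For example (our arithmetic), the sparse bit size of $x^{1000000000}+1$ is
\[\left[1+\log_2(1)+\log_2(1+10^9)\right]+\left[1+\log_2(1)+\log_2(1)\right]\approx 30.9+1\approx 32\text{ bits},\]
whereas a dense representation would need to store around $10^9$ coefficients.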
Proposition 7 Both the dense and the sparse expanded representation are canonical (definition 4), provided that:
• the coefficients themselves are canonical (otherwise polynomials of degree 0 would not be canonical);
• leading (in the dense case) or all (in the sparse case) zeros are suppressed;
• (in the sparse case) the individual terms are stored in a specified order, generally^{3} sorted.
^{3}But we should observe that Maple, for example, which uses a hash-based representation, is still canonical, even though it may not seem so to the human eye.
Addition is fairly straight-forward in either representation. In Lisp, we can do addition in a sparse representation as follows. We use a representation in which the CAR of a polynomial is a term, whilst the CDR is another polynomial: the initial polynomial minus the term defined by the CAR, i.e. the reductum (Notation 13). A term is a CONS, where the CAR is the exponent and the CDR is the coefficient. Thus the LISP structure of the polynomial $3x^2+1$ is ((2 . 3) (0 . 1)), and the list NIL represents the polynomial 0, which has no non-zero coefficients, and thus nothing to store. In this representation, we must note that the number 1 does not have the same representation as the polynomial 1 (which is ((0 . 1))), and that the polynomial 0 is represented differently from the other numerical polynomials.
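; ADD-POLY: add sparse polynomials A and B, each a list of
; (exponent . coefficient) pairs in decreasing exponent order.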
(DE ADD-POLY (A B)
(COND ((NULL A) B)
((NULL B) A)
((GREATERP (CAAR A) (CAAR B))
(CONS (CAR A) (ADD-POLY (CDR A) B)))
((GREATERP (CAAR B) (CAAR A))
(CONS (CAR B) (ADD-POLY A (CDR B))))
((ZEROP (PLUS (CDAR A) (CDAR B)))
; We must not construct a zero term
(ADD-POLY (CDR A) (CDR B)))
(T (CONS (CONS (CAAR A) (PLUS (CDAR A) (CDAR B)))
(ADD-POLY (CDR A) (CDR B))))))
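; MULTIPLY-POLY: multiply A by B term by term, merging the partial
; products with ADD-POLY (in effect an insertion sort on exponents).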
(DE MULTIPLY-POLY (A B)
(COND ((OR (NULL A) (NULL B)) NIL)
; If a = a0+a1 and b = b0+b1, then ab = a0b0 + a0b1 + a1b
(T (CONS (CONS (PLUS (CAAR A) (CAAR B))
(TIMES (CDAR A) (CDAR B)))
(ADD-POLY (MULTIPLY-POLY (LIST (CAR A))
(CDR B))
(MULTIPLY-POLY (CDR A) B))))))
If A has $m$ terms and B has $n$ terms, the calculating time (i.e. the number of LISP operations) for ADD-POLY is bounded by $O(m+n)$, and that for MULTIPLY-POLY by $O(m^2n)$ ($(m(m+3)/2-1)n$ to be exact).^{4}
There is a technical, but occasionally important, difficulty with this procedure, reported in [ABD88]. We explain this difficulty in order to illustrate the problems which can arise in the translation of mathematical formulae into computer algebra systems. In MULTIPLY-POLY, we add $a_0b_1$ to $a_1b$. The order in which these two objects are calculated is actually important. Why, since this can affect neither the results nor the time taken? The order can dramatically affect the maximum memory space used during the calculations. If $a$ and $b$ are dense of degree $n$, the order which first calculates $a_0b_1$ must store all these intermediate results before the recursion finishes. Therefore the memory space needed is $O(n^2)$ words, for there are $n$ results of length between 1 and $n$. The other order, $a_1b$ calculated before $a_0b_1$, is clearly more efficient, for the space used at any moment does not exceed $O(n)$. This is not a purely theoretical remark: [ABD88] were able to factorise $x^{1155}-1$ with REDUCE in 2 megabytes of memory^{5}, but they could not remultiply the factors without running out of memory, which appears absurd.
There are multiplication algorithms which are more efficient than this one: roughly speaking, we ought to sort the terms of the product so that they appear in decreasing order, and the use of ADD-POLY corresponds to an insertion sort. We know that the number of coefficient multiplications in such a 'classical' method is $mn$, so this emphasises that the extra cost is a function of the exponent operations (essentially, comparison) and list manipulation. Of course, the use of a better sorting method (such as "quicksort" or "mergesort") offers a more efficient multiplication algorithm, say $O(mn\log m)$ in terms of time. Naïve construction of all the terms followed by sorting would take $O(mn)$ space to build the unsorted list, and we can do better with "heapsort" [Joh74]. But most systems have tended to use an algorithm similar to the procedure given
^{4}It is worth noting the asymmetry in the computing time of what is fundamentally a symmetric operation. Hence we ought to choose A as the one with the fewer terms. If this involves counting the number of terms, many implementors 'cheat' and take A as the one of lower degree, hoping for a similar effect.
5A large machine for the time!
above: [MP08] shows the substantial improvements that can be made using a
better algorithm.
In general, the product of a sparse polynomial with $m$ terms by one with $n$ terms will have $mn$ terms, so an $O(mn\log m)$ algorithm has "nearly optimal" (optimal up to logarithmic factors) complexity in terms of the worst-case output size. However, this worst-case output size is often not achieved in practice. For example, the polynomials may actually be dense in $x^{1000}$, and then the number of terms is $m+n-1$ rather than $mn$.
Notation 14 Define the support of a polynomial $f$, $\mathrm{supp}(f)$, to be the set of exponents $n$ such that $x^n$ occurs in $f$ with a non-zero coefficient. The sparsity of $f$ is the size of $\mathrm{supp}(f)$. Given two polynomials $f$ and $g$, the possible exponent set of $f\cdot g$, $\mathrm{poss}(f,g)$, is $\{e_f+e_g:e_f\in\mathrm{supp}(f),e_g\in\mathrm{supp}(g)\}$.
Hence $\mathrm{supp}(f\cdot g)\subseteq\mathrm{poss}(f,g)$, with strict inclusion occurring when cancellation of coefficients gives us an "unexpected" zero.
Theorem 3 ([AR15, Theorem 1.1]) Given $f,g\in\mathbb Z[x]$ with bounds for the degree $D\ge\deg(f)+\deg(g)$ and height $C\ge\|f\|_\infty+\|g\|_\infty$, and a constant $\mu\in(0,1)$, Algorithm SparseMultZZ of [AR15] correctly computes the product $h=fg$ with probability exceeding $1-\mu$, using expected^{6} $\tilde O(\#\mathrm{poss}(f,g)\log D+\#\mathrm{supp}(f\cdot g)\log C)$ bit operations, where the constants in $\tilde O$ depend on $\mu$.
We can simplify the complexity to $\tilde O(\#\mathrm{poss}(f,g)(\log D+\log C))$, which shows that this algorithm, which is based on interpolation, is nearly optimal in the expected sparsity of the output. However, it is probabilistic, which limits its use in practice.
Maple's original representation (see section 1.5.1), and methods, are rather different. The terms in a Maple sum might appear to be unordered, so we might ask what prevents $2x^2$ from appearing at one point, and $-2x^2$ at another. Maple uses a hash-based representation [CGGG83], so that insertion in the hash table takes amortised^{7} constant time, and the multiplication is $O(mn)$.
In a dense representation, we can use radically different, and more efficient, methods based on Karatsuba's method [KO63, and section B.3], which takes time $O\left(\max(m,n)\min(m,n)^{0.57\ldots}\right)$, or the Fast Fourier Transform [AHU74, chapter 8], where the running time is $O(\max(m,n)\log\min(m,n))$. Since the number of terms in a dense product is $m+n-1=O(\max(m,n))$, these algorithms are nearly optimal.
Division is fairly straight-forward: to divide $f$ by $g$, we keep subtracting appropriate (meaning $c_i=(\mathrm{lc}(f)/\mathrm{lc}(g))x^{\deg f-\deg g}$) multiples of $g$ from $f$ until the degree of $f$ is less than the degree of $g$. If the remaining term (the remainder) is zero, then $g$ divides $f$, and the quotient can be computed by summing the $c_i$. This is essentially the process known to schoolchildren as "long division".
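In Maple, this long-division process is available directly as the built-in quo and rem commands (the example is ours):
> quo(x^3-1, x-1, x);
                         x^2 + x + 1
> rem(x^3-1, x-1, x);
                              0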
6This is an Atlantic City (“probably fast/probably correct”) algorithm: see section 1.4.2.
In fact the authors prove the existence of a Monte Carlo algorithm, but cannot describe it as
we do not know the number-theoretic constants.
7Occasionally the hash table may need to grow, so an individual insertion may be expensive,
but the average time is constant.
However, the complexity of sparse polynomial division is not so straightforward [DC10]. We cannot bound the complexity in terms of just the number of terms in the input, because of the example^{8} of
\[\frac{x^n-1}{x-1}=x^{n-1}+\cdots+x+1,\tag{2.1}\]
where two two-term inputs give rise to an $n$-term output. So imagine that we are dividing $f$ by $g$, to give a quotient of $q$ and a remainder of $r$.
Notation 15 Let a polynomial $f$ have degree $d_f$, and $t_f$ non-zero terms.
Since each step in the division algorithm above generates a term of the quotient, there are $t_q$ steps. Each step generates $t_g$ terms, so there are $O(t_g)$ arithmetic operations. But there are also the comparison operations required to organise the subtraction, and it is hard to find a better bound than $d_f$ for the number of these. Hence the total cost is $O(t_q(d_f+t_g))$ operations. Since we don't know $t_q$ until we have done the division, the best estimate is $O(d_f^2+d_ft_g)$. Again, ideas based on Heapsort can improve this, and the best result is $O(t_f+t_qt_g\log\min(t_q,t_g))=O(t_f+(d_f-d_g)t_g\log\min(d_f-d_g,t_g))$ comparisons [MP11]. This algorithm therefore has poly-sparse complexity (Definition 27).
Example 2 (Bad Mergesort) The reason we need to use Heapsort rather than, say, Mergesort is seen in a case like the following:
\[\frac{x^n}{x^{n/2}+x^{n/2-1}+x^{n/4}+1},\]
where we perform $n/2$ merges of the divisor (4 terms) with polynomials with up to $n/2$ terms, i.e. $O(n^2)$ work. The problem arises because the merges in this case are asymmetrical, and a merge is only efficient when merging things of roughly the same size.
An alternative strategy, known as "geobuckets" for "geometrically increasing buckets", was devised by Yan [Yan98]. Let $d$ be a fixed growth factor (his experiments^{9} suggested $d=4$). The application was Buchberger's Algorithm (9), where we are repeatedly computing $f:=f-c_ig_i$, where $f$ typically is a large polynomial and the $c_ig_i$ are polynomials which may be large or small. His intermediate representation for a polynomial $f$ was
\[f:=f_1\oplus f_2\oplus\cdots\oplus f_l,\tag{2.2}\]
where $\oplus$ signifies a lazy summation that we have yet to perform, and each "bucket" $f_i$ consists of at most $d^i$ terms, sorted normally. Then, when we
8See also Excursus B.5.
^{9}Geobuckets are also used in CoCoA [Abb15], for polynomial multiplication, division and polynomial reduction, and after experimentation they have also settled on $d=4$. SINGULAR also uses geobuckets with $d=4$ [Sch15].
have to subtract $c_ig_i$ from $f$, we subtract it from $f_k$, where $k$ is minimal with $d^k\ge t_{g_i}$. It is conceivable that this will cause $f_k$ to 'overflow', i.e. have more than $d^k$ terms, in which case we add $f_k$ to $f_{k+1}$, and then set $f_k$ to 0. This has the consequence that we (almost) never add (i.e. merge) polynomials of greatly unequal sizes, and also, compared with a heap, the storage requirements are reduced: if $d\ge2$ the redundancy is bound to be less than 50%, even if every monomial in every other $f_i$ is a duplicate of a monomial in $f_l$. At the end, we actually perform the summations implicit in (2.2) to get a polynomial in the usual form, taking care to perform them as
\[(\ldots(f_1\oplus f_2)\oplus\cdots)\oplus f_l\]
so as to preserve the balanced nature of the merges.
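As a toy illustration, here is a Maple sketch of the bookkeeping with $d=4$ (all the names are ours; Maple's own + already merges like terms, so a real implementation would keep each bucket as a sorted term list, and upperbound requires a recent Maple):
tcount := p -> `if`(p = 0, 0, `if`(type(p, `+`), nops(p), 1)): # number of terms
GeoAdd := proc(B::Array, g)
  local k, f;
  k := max(1, ceil(log[4](max(tcount(g), 1)))); # smallest k with 4^k >= t_g
  f := `if`(k <= upperbound(B), B[k], 0) + g;
  while tcount(f) > 4^k do           # bucket overflows: carry upwards
    B(k) := 0;                       # round-bracket indexing grows the Array
    k := k + 1;
    f := `if`(k <= upperbound(B), B[k], 0) + f;
  end do;
  B(k) := f;
  return B;
end proc:
GeoValue := B -> add(B[i], i = 1 .. upperbound(B)): # perform the lazy sum (2.2)
For example:
> B := Array([0]): GeoAdd(B, x^2+x): GeoAdd(B, 3*x): GeoValue(B);
                           x^2 + 4 x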
2.1.3 A factored representation
Instead of insisting that multiplication not be applied to addition, we could
insist that addition not be applied to multiplication. This would mean that a
polynomial was represented as a product of polynomials, each the sum of simple
terms:
\[f=\prod_i f_i=\prod_i\left(\sum_{j=0}^{n_i}a_{i,j}x^j\right).\tag{2.3}\]
In practice, repeated factors are stored explicitly, as in the following format:
\[f=\prod_i f_i^{d_i}=\prod_i\left(\sum_{j=0}^{n_i}a_{i,j}x^j\right)^{d_i}.\tag{2.4}\]
We have a choice of using sparse or dense representations for the $f_i$, but usually sparse is chosen. It is common to insist that the $f_i$ are square-free^{10} and relatively prime^{11} (both of which necessitate only g.c.d. computations^{12} — lemma 3), but not necessarily^{13} irreducible. Hence this representation is generally known as partially factored. In this format, the representation is not canonical, since the polynomial $x^2-1$ could be stored either as that (with 2 terms), or as $(x-1)(x+1)$ (with 4 terms): however, it is normal in the sense of definition 3. For equality testing, see Excursus B.2. Though an extension of Stoutemyer's
^{10}Which almost certainly improves compactness, but see [CD91], where a dense polynomial of degree 12 was produced (13 terms), whose square had only 12 nonzero terms, and the process can be generalised. Twelve is minimal [Abb02].
^{11}Which often improves compactness, but consider $(x^p-1)(x^q-1)$ where $p$ and $q$ are distinct primes, which would have to be represented as $(x-1)^2(x^{p-1}+\cdots+1)(x^{q-1}+\cdots+1)$.
^{12}These g.c.d. computations, if carried out by modular or p-adic methods (pages 161 and 207), should be cheap if the answer is "no simplification", and otherwise should, at least in non-pathological cases, lead to greater efficiency later.
^{13}If we were to insist on irreducibility, we would need to store $x^p-1$ as $(x-1)(x^{p-1}+\cdots+1)$, with $p+2$ terms rather than with 2. Furthermore, irreducibility can be expensive to prove [ASZ00].
original definition of candid (Definition 6), we can say that insisting on square-freeness, so that any repeated part of one factor is explicit in the exponents in (2.4), as in $x^3+x^2-x-1$ having to be written as $(x-1)(x+1)^2$, and on relative primeness, so that a repeated factor cannot be hidden in two different factors, as in $(x^2-1)(x^2+x-2)$ having to be written as $(x-1)^2(x+1)(x+2)$, means that all the repetition is visible in the format of (2.4) and nothing is being hidden.
Multiplication is relatively straight-forward: we check (via g.c.d. computations^{14}) for duplicated factors between the two multiplicands, and then combine the multiplicands. Addition can be extremely expensive, and the result of an addition can be exponentially larger than the inputs: consider
\[(x+1)(x^2+1)\cdots(x^{2^k}+1)+(x+2)(x^2+2)\cdots(x^{2^k}+2),\]
where the input has $4(k+1)$ non-zero coefficients, and the output has $2^{k+1}$ (somewhat larger) ones.
This representation is not much discussed in the general literature, but is
used in Redlog [DS97] and Qepcad [CH91], both of which implement cylindrical
algebraic decomposition (see section 3.5), where a great deal of use can be made
of corollaries 20 and 21.
2.1.4 Polynomials in several variables
Here the first choice is between factored and expanded. The arguments for, and
algorithms handling, factored polynomials are much the same^{15} as in the case
of one variable. The individual factors of a factored form can be stored in any
of the ways described below for expanded polynomials, but recursive is more
common since it is more suited to g.c.d. computations (chapters 4 and 5), which
as we saw above are crucial to manipulating factored representations.
If we choose an expanded form, we have one further choice to make, which we explain in the case of two variables, $x$ and $y$, and illustrate the choices with $x^2y+x^2+xy^2-xy+x-y^2$.
Notation 16 The total degree of a monomial is the sum of the degrees of all
the variables. The total degree of a polynomial is the greatest total degree of
any monomial in it.
Note that this definition implicitly assumes that we have performed any cancellation, since the total degree of $x^2y^3-x-x^2y^3$ is 1, not 5. Put another way, we require the representation to be candid (Definition 6), at least in this respect.
14Again, these should be cheap if there is no factor to detect, and otherwise lead to reduc-
tions in size.
15In one variable, the space requirements for a typical dense polynomial and its factors are
comparable, e.g. 11 terms for a polynomial of degree 10, and 6 each for two factors of degree
5. For multivariates, this is no longer the case. Even for bivariates, we would have 121 terms
for a dense polynomial of degree 10, and 36 each for two factors of degree 5. The gain is
greater as the number of variables increases.
recursive — $C[x][y]$. We regard the polynomials as polynomials in $y$, whose coefficients are polynomials in $x$. Then the sample polynomial would be $(x-1)y^2+(x^2-x)y+(x^2+x)y^0$. We have made the $y^0$ term explicit here: in practice detailed representations in different systems differ on this point.
recursive — $C[y][x]$. We regard the polynomials as polynomials in $x$, whose coefficients are polynomials in $y$. Then the sample polynomial would be $(y+1)x^2+(y^2-y+1)x+(-y^2)x^0$.
distributed — $C[x,y]$. We regard the polynomials as polynomials in $x$ and $y$, whose coefficients are numbers. With 6 terms (as in this example), there are $6!=720$ possible orders. It is usual to impose two additional constraints on the order on terms, or more accurately on the monomials^{16}, i.e. ignoring the coefficients, which we will denote^{17} as $>$.
Definition 28 An ordering is said to be an admissible ordering if it satisfies the following conditions.
• Compatibility with multiplication: if $a>b$ then, for all monomials $c$, $ac>bc$.
• Well-foundedness: for all non-trivial monomials $a$, $a>1$.
These requirements greatly reduce the available orders for our sample polynomial. One possibility would be to sort by total degree (i.e. the sum of the degrees in each variable), using degree in $x$ as a tie-breaker. This would give us $x^2y+xy^2+x^2-xy-y^2+x$ (see the Maple session after this list). There is a fuller discussion of such orderings in Section 3.3.3. However, we should note one important property of admissible orders here.
Theorem 4 (Descending Chain Condition; Dickson’s Lemma)
Any decreasing sequence (with respect to an admissible ordering) of mono-
mials in a finite number of variables is finite. [Dic13]
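In Maple (17 or later, whose sparse distributed representation was mentioned in section 1.5.1), such an ordering can be requested explicitly; a session like the following (our example) reproduces the order just described:
> sort(x^2*y + x^2 + x*y^2 - x*y + x - y^2, order = tdeg(x, y));
                x^2 y + x y^2 + x^2 - x y - y^2 + x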
In general, if there are $n$ variables, there are $n!$ possible recursive representations, but an infinite number of possible distributed representations, though clearly only finitely many different ones for any one given polynomial or finite set of polynomials.
In both cases, we use sparse, rather than dense, representations, since any reasonable multivariate polynomial had better be sparse: degree 6 in each of 6 variables means $7^6=117649$ terms.
^{16}In this book we use the word monomial to mean a product of (possibly repeated) variables, as in $xyz$ or $x^2y$, without any coefficient. Term means a product with a coefficient, as in $3x^2y$. Usage on this point differs.
^{17}We are not making any numerical evaluation of the monomials, merely saying which order we put the monomials in.
Definition 29 We say that an algorithm has poly-sparse complexity if the complexity is a polynomial function in the sparse bit size (see Definition 27) of the inputs and outputs. We say that an algorithm has poly-semisparse complexity if the complexity is a polynomial function in the sparse bit size and the degree of the polynomial. If all that can be said is that the complexity is polynomial in $d^n$ (where $d$ is the total degree and $n$ is the number of indeterminates) we say it has poly-dense complexity.
The same canonicality results as for univariate polynomials apply.
Proposition 8 For a fixed ordering, both recursive and distributed representations are canonical (definition 4). Partially factored representations are normal, but not canonical.
It makes no sense to compare polynomials in different representations, or the same representation but different orderings. We have spoken about 'representations', but in fact the division between recursive and distributed goes deeper. While characterisations 3 and 2 of a Gröbner base (theorem 16) can make sense in either view, characterisations 4 and 1 (the only effective one) only make sense in a distributed view. Conversely, while the abstract definitions of factorisation and greatest common divisors (definition 31) make sense whatever the view, the only known algorithms for computing them (algorithm 2 or the advanced ones in chapters 4 and 5) are inherently recursive^{18}.
2.1.5 Other representations
Sparse representations take up little space if the polynomial is sparse. But shifting the origin from $x=0$ to $x=1$, say, will destroy this sparsity, as might many other operations. The following example, adapted from [CGH+03], illustrates this. Let $\phi(Y,T)$ be
\[\exists X_1\ldots\exists X_n\;(X_1=T+1)\wedge(X_2=X_1^2)\wedge\cdots\wedge(X_n=X_{n-1}^2)\wedge(Y=X_n^2).\tag{2.5}\]
The technology described in section 3.5.3 will convert this to a polynomial equation
\[\psi(Y,T):\;Y=(1+T)^{2^n}.\tag{2.6}\]
Dense or sparse representations have problems with this, in the sense that expression (2.5) has length $O(n)$, but expression (2.6) has length $O(2^n)$ or more. A factored representation could handle the right-hand side, assuming that we are not representing the equations as polynomial $=0$. But changing the last conjunct of $\phi$ to $(Y=(X_n+1)^2)$ changes $\psi$ to
\[Y=\left(1+(1+T)^{2^{n-1}}\right)^2,\tag{2.7}\]
whose factored representation now has length $O(2^n)$.
18At least for commutative polynomials. Factorisation of non-commutative polynomials is
best done in a distributed form.
Factored representations display a certain amount of internal structure, but
at the cost of an expensive, and possibly data-expanding, process of addition.
Are there representations which do not have these 'defects'? Yes, though they
may have other 'defects'.

Expression tree This representation "solves" the cost of addition in the factored representation, by storing addition as such, just as the factored
representation stored multiplication as such. Hence $\left((x+1)^3 - 1\right)^2$ would
be legal, and represented as such. Equation (2.7) would also be stored
compactly provided exponentiation is stored as such, e.g. $Z^2$ requiring
one copy of $Z$, rather than two as in $Z\cdot Z$. This system is not canonical, or even normal: consider $(x+1)(x-1) - (x^2-1)$. This would be
described by Moses [Mos71] as a "liberal" system, and generally comes
with some kind of expand command to convert to a canonical representation. Assuming now that the leaf nodes are constants and variables,
and the tree's internal nodes are (binary, i.e. with two arguments) addition, subtraction and multiplication, then a tree with maximal depth $p$
can represent a polynomial with maximum total degree $2^p$. It would need
to have $2^p - 1$ internal nodes (all multiplication), and $2^p$ leaf nodes. The
degree is easy to bound, by means of a tree-walk, but harder to compute
exactly, especially if cancellation actually occurs. Similarly, the leading coefficient
can be computed via a simple tree-walk if no cancellation occurs.
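The following Python sketch (ours, not part of the original text; all names are illustrative) shows such a tree-walk degree bound, and why the bound can overshoot when cancellation occurs:

from dataclasses import dataclass

@dataclass
class Node:
    op: str                    # '+', '-', '*', 'var' or 'const'
    left: 'Node' = None
    right: 'Node' = None

def degree_bound(t: Node) -> int:
    """Upper bound on total degree; exact only if no cancellation occurs."""
    if t.op == 'const':
        return 0
    if t.op == 'var':
        return 1
    l, r = degree_bound(t.left), degree_bound(t.right)
    return l + r if t.op == '*' else max(l, r)

x, one = Node('var'), Node('const')
# (x+1)*(x-1) - (x^2-1): the bound is 2, but the tree represents 0.
t = Node('-', Node('*', Node('+', x, one), Node('-', x, one)),
               Node('-', Node('*', x, x), one))
print(degree_bound(t))   # 2 -- an upper bound; the tree is really zero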
Expression DAG also known as Straight-Line Program (SLP) [IL80].
This is essentially19 the representation used by Maple — it looks like the
previous representation, but the use of hashing in fact makes it a directed
acyclic graph (DAG). Again, a straight-line program of length $l$ (i.e. a
DAG of depth $l-1$) can store a polynomial of degree $2^{l-1}$. The difference
from the expression tree representation above is that we only need $l$ nodes,
since the nodes can be reused.

This format is essentially immune to the "change of origin" problem mentioned above, since we need merely replace the $x$ node by a tree to compute
$x+1$, thus adding two nodes, and possibly increasing the depth by one,
irrespective of the size of the polynomial. The general 'straight-line' formalism has advantages where multi-valued functions such as square root
are concerned: see the discussions around figures 3.1 and 3.2.

However, there is one important caveat about straight-line programs: we
must be clear what operations are allowed. If the only operations are $+$,
$-$ and $\times$, then evidently a straight-line program computes a polynomial.
Equally, if division is allowed, the program might not compute a polynomial. But might it? If we look at figure 2.1, we see that $p = x^2-1$ and
$q = x-1$, so the result is $\frac{p}{q} = \frac{x^2-1}{x-1} = x+1$. Or is it? If we feed in $x = 1$,
we in fact get $\frac{0}{0}$, rather than 2. This is a singularity of the kind known as
19Maple uses n-ary addition and multiplication, rather than binary, as described in section
C.3.
2.1. WHAT ARE POLYNOMIALS? 53
a removable singularity, because $\lim_{x\to1}\frac{p(x)}{q(x)} = 2$. In fact [IL80, Theorem 3], deciding whether two straight-line programs are equivalent is undecidable if division is allowed.
Figure 2.1: A polynomial SLP — a DAG in which the single node $x$ feeds both a multiplication $x\cdot x$ and an addition; together with two $-1$ nodes these give $p = x\cdot x - 1$ and $q = x - 1$, which feed a final division node $p/q$.
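As an illustration (a sketch of ours, with a hypothetical step encoding), the SLP of figure 2.1 can be evaluated directly; exact rational evaluation succeeds everywhere except at the removable singularity $x = 1$:

from fractions import Fraction

def run_slp(program, x):
    """Evaluate a straight-line program; slot 0 holds the input x."""
    slots = [Fraction(x)]
    for step in program:
        if step[0] == 'const':
            slots.append(Fraction(step[1]))
        else:
            op, i, j = step
            a, b = slots[i], slots[j]
            slots.append(a + b if op == '+' else
                         a * b if op == '*' else
                         a / b)           # '/' raises on division by zero
    return slots[-1]

# Figure 2.1: slot0 = x, slot1 = -1, slot2 = x*x, slot3 = p = x*x + (-1),
# slot4 = q = x + (-1), slot5 = p/q
fig21 = [('const', -1), ('*', 0, 0), ('+', 2, 1), ('+', 0, 1), ('/', 3, 4)]

print(run_slp(fig21, 2))   # 3, i.e. x+1 at x = 2
try:
    run_slp(fig21, 1)
except ZeroDivisionError:
    print("0/0 at x = 1: the removable singularity")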
We said earlier that the only known algorithms for computing greatest
common divisors were recursive. This is essentially true, and means that
the computation of greatest common divisors of straight-line programs is
not a straightforward process [Kal88].

Reconstructing one of the more explicit representations (usually a sparse
one!) from a straight-line program representation is not straightforward.
The current state of the art over general finite fields is the following20.
Proposition 9 ([AGR14a, Theorem 1]) Let $F \in \mathbb{F}_q[z_1,\ldots,z_n]$, and
suppose we are given a division-free straight-line program $S_F$ of length $L$
which evaluates $F$, an upper bound $D = \max_j \deg_{z_j}(F)$, and an upper
bound $T$ on the number of nonzero terms $t$ of $F$. There exists a probabilistic algorithm which interpolates $F$ with probability at least 3/4. The
algorithm requires
$$\tilde{O}\left(Ln(T\log D + n)(\log D + \log q)\log D + n^{\omega-1}T\log D + n^{\omega}\log D\right)$$
bit operations, where $\omega$ is the matrix multiplication exponent.
We note that this is linear in $T$, which is as good as we could hope for.

If we can choose the finite field to be $\mathbb{Z}_p$ where $p-1$ is smooth (has only small
prime divisors) then we can do much better: $\tilde{O}(LTn\log D + n^2T\log^2 D)$
[Kal10, for the algorithm], [AGR14a, for the complexity]. Such fields appear to be common [HB78], and this gives us an algorithm for interpolating $F$ over the integers with time $\tilde{O}\left((n^2T\log D + nLT)(n\log D + \log H)\right)$
[Kal10, for the algorithm], [Roc14, for the complexity]. There are further
improvements to be had in [AGR14b].
20The author is grateful to Dan Roche for explanations here, and for [Roc14].
Figure 2.2: Code fragment A — a graph
p:=x+1;
q:=p;
r:=p*q;
Figure 2.3: Code fragment B — a tree
p:=x+1;
q:=x+1;
r:=p*q;
Additive Complexity This [Ris85, Ris88] is similar to a straight-line program, except that we only count the number of (binary) addition/subtraction nodes, i.e. multiplication and exponentiation are 'free'. Hence the
degree is unbounded in terms of the additive complexity, but for a given
expression (tree/DAG) can be bounded by a tree-walk. A univariate polynomial of additive complexity $a$ has at most $C^{a^2}$ real roots for some absolute constant $C$: conjecturally this can be reduced to $3^a$. These bounds
trivially translate to the straight-line program and expression tree cases.
Specialist There are many possible highly specialist representations of poly-
nomials. One of the most impressive (where it’s applicable) is the graph
representation, used in [AIR14] to represent a polynomial with 317,881,154
monomials in a half-page graph.
"Additive complexity" is more of a measure of the 'difficulty' of a polynomial
than an actual representation. Of the others, the first was used in Macsyma for
its "general expression", and the second is used in Maple21. In fact, Macsyma
would22 allow general DAGs, but would not force them. Consider the two code
fragments in figures 2.2 and 2.3. In the case of figure 2.2, both systems would
produce the structure in figure 2.4. For figure 2.3, Macsyma would produce the
structure23 in figure 2.5. Maple would still produce the structure of figure 2.4,
since the hashing mechanism would recognise that the two x+1 were identical.
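A toy Python sketch (ours; Maple's real mechanism is, of course, more elaborate) of the hash-based sharing just described:

_table = {}

def node(op, *args):
    """Hash-consing: build each distinct (sub)expression exactly once."""
    key = (op,) + args
    if key not in _table:
        _table[key] = key        # the node is the (hashable) tuple itself
    return _table[key]

x   = node('var', 'x')
one = node('const', 1)
p = node('+', x, one)            # p := x+1
q = node('+', x, one)            # q := x+1 -- hashes to the same node
r = node('*', p, q)
print(p is q)                    # True: Figure 2.4 (a DAG), not Figure 2.5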
2.1.6 The Newton Representation
For simplicity, in this subsection we will only consider the case of characteristic
0: finite characteristic has some serious technical difficulties, and we refer the
21Until such time as operations such as expand are used!
22Not explicitly, but rather as a side-effect of the fact that Macsyma is implemented in Lisp,
which cares little for the difference. The basic function EQUAL does not distinguish between
acyclic and cyclic structures.
23The x is shown as shared since the Lisp implementation will store symbols uniquely.
Figure 2.4: DAG representation — single nodes for 1 and $x$ feed one shared addition node, to which both $p$ and $q$ point; $r \to *$ has both of its edges into that same node.

Figure 2.5: Tree representation — $p \to +$ and $q \to +$ are two distinct addition nodes (sharing only the symbol $x$), and $r \to *$ points to both.
reader to [BFSS06]. We will also only consider monic polynomials.
Notation 17 Let $p = x^n + \sum_{i=0}^{n-1} a_i x^i = \prod_{i=1}^{n}(x-\alpha_i)$ be a polynomial of degree
$n$. Let $\sigma_s = \sum_{i=1}^{n}\alpha_i^s$, and define the Newton series of $p$ to be $\mathrm{Newton}(p) = \sum_{s\ge0}\sigma_s T^s$.
It is well-known that the $a_i$ and $\alpha_i$ are related:
$$a_{n-1} = -\sum_{i=1}^{n}\alpha_i,\qquad a_{n-2} = \sum_{i=1}^{n}\sum_{j=i+1}^{n}\alpha_i\alpha_j,\qquad\ldots,\qquad a_0 = (-1)^n\prod_{i=1}^{n}\alpha_i.$$
These are then related to the $\sigma_i$:
$$\sigma_1 = -a_{n-1},\qquad \sigma_1^2 = \sigma_2 + 2a_{n-2},\qquad\ldots$$
Hence, in characteristic 0, the $\sigma_i$ ($i\le n$) form an alternative to the $a_i$, a
fact known since 1840 [LV40]. But how do we convert rapidly between these
representations?
Proposition 10 ([Sch82], see also [BFSS06, Lemma 1]) $\mathrm{Newton}(p) = \frac{\mathrm{rev}(p')}{\mathrm{rev}(p)}$
as a power series about $x = 0$, where $\mathrm{rev}(p) = \sum_{i=0}^{n} a_{n-i}x^i$.
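Proposition 10 can be checked in a few lines of Python (an illustrative sketch of ours, using exact rational arithmetic), here on $p = (x-1)(x-2)$, whose power sums are $\sigma_s = 1 + 2^s$:

from fractions import Fraction

def series_div(num, den, nterms):
    """Coefficients of num/den as a power series about 0 (den[0] != 0)."""
    out, num = [], num + [Fraction(0)] * nterms
    for s in range(nterms):
        c = num[s] / den[0]
        out.append(c)
        for i, d in enumerate(den):      # subtract c * x^s * den
            if s + i < len(num):
                num[s + i] -= c * d
    return out

p      = [Fraction(c) for c in (2, -3, 1)]   # 2 - 3x + x^2, low degree first
dp     = [Fraction(c) for c in (-3, 2)]      # p' = -3 + 2x
rev_p  = p[::-1]                             # 1 - 3x + 2x^2
rev_dp = dp[::-1]                            # 2 - 3x

print(series_div(rev_dp, rev_p, 5))   # [2, 3, 5, 9, 17] = 1 + 2^s, s = 0..4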
2.1.7 Representations in Practice
General-purpose (calculus-side) computer algebra systems have to deal with
many expressions other than polynomials, but tend to regard polynomials as
the basic construct.
2.1.7.1 Representations in Reduce
This is probably one of the most straightforward. The basic Reduce object is
a standard quotient, i.e. a pair (literally a CONS cell in Lisp) of polynomials,
or standard forms. A standard form is a sparse recursive multivariate polynomial, where the "variables" (known as kernels in Reduce) may be variables such
as x, but equally functions and expressions such as (cos x) — a Lisp form
representing cos(x).

If the kernels are genuinely indeterminates, i.e. we have a free algebra (Definition 23), then this is a normal form, and it is canonical subject to the restrictions in Proposition 14. Initially Reduce did not, by default, compute gcds
(clause 3), but this changed as more efficient gcd algorithms were implemented.
2.1.7.2 Representations in Macsyma
Macsyma’s “general representation” is essentially an expression tree one. There
is a special form, known as Canonical Rational Expression24, which again is a
ratio of sparse recursive polynomials.
2.1.7.3 Representations in Maple
Maple's original internal representation25 was an expression tree whose fundamental form was an $n$-ary sum of $n$-ary power products, as in the representation
of $9xy^3z - 4y^3z^2 - 6xy^2z - 8x^3 - 5$ in Figure 2.6. The "variables" might in
fact be other Maple expressions, so that $x^2 - (x-1)(x+1) - 1$ would be a
sum of three terms, one of which was the product of two elements each of which
were themselves sums. The elements in SUM and PROD expressions were stored in
hash-code order, which greatly facilitated combining like terms. The advantages
and disadvantages of this are described in [MP12, MP13].

As described there, for genuine polynomials (which would exclude $x^2 - (x-1)(x+1) - 1$ and expressions involving RootOf etc.), an alternative data structure
was introduced at Maple 17: the POLY data structure, which is a packed sparse
distributed representation: in this case as in Figure 2.7, where the numbers 5131
24See http://www.ma.utexas.edu/maxima/maxima_11.html.
25The author is grateful to Michael Monagan of Simon Fraser University for much of this
information, and for permission to reproduce the images. Note that, in the standard format
for a SUM, all the coefficients are the odd-numbered elements (counting the SUM as number 1)
except for the constant term. This is apparently a historical design decision.
Figure 2.6: Maple’s Original Polynomials
SUM11 means a sum expression occupying 11 words (header plus 5 product/coefficient pairs); similarly PROD7 means a product expression occupying 7
words (header plus 3 variable/exponent pairs).
[Diagram: a SUM 11 node with coefficients 9, −4, −6, −8 and constant −5, whose product slots point at PROD nodes for $xy^3z$, $y^3z^2$, $xy^2z$ and $x^3$.]
Figure 2.7: Maple’s New-Style Polynomials
POLY12 means a poly data structure of a header word, a pointer to the variables,
and five exponent/coefficient pairs.
[Diagram: POLY 12 → SEQ 4 (x, y, z), followed by the packed exponents 5131, 5032, 4121, 3300, 0000 paired with the coefficients 9, −4, −6, −8, −5.]
etc. are in fact numbers base $2^{15}$, so "5131" $= 5\cdot2^{45} + 2^{30} + 3\cdot2^{15} + 1$, meaning
'total degree 5' and then the exponents of the individual variables. This form is
only used if the packed exponent field will fit into a 64-bit integer (or a 32-bit
integer on 32-bit Maple; note that a 64-bit Maple integer actually goes up to
$2^{62}-1$, as given by kernelopts(maximmediate)). In this case, we had four
'variables' (three real ones and the slot for total degree), and $15 = \lfloor62/4\rfloor$ —
with four real variables we would have a base $2^{12}$ expression, as $12 = \lfloor62/5\rfloor$,
and so on. The fact that the total degree is stored, and that the items in a POLY
data structure are stored in decreasing order of this packed exponent, means
that we have a 'graded lexicographic' (see p. 105) ordering.
The difference should be invisible to the casual Maple user, except for performance, but can be seen via the dismantle command.
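The packing itself is easy to mimic (a Python sketch of ours, not Maple's actual kernel code): four 15-bit fields holding the total degree and the exponents of x, y, z. Sorting the packed words in decreasing order is exactly the graded lexicographic comparison.

def pack(ex, ey, ez, width=15):
    total = ex + ey + ez
    return (((total << width | ex) << width) | ey) << width | ez

def unpack(w, nfields=4, width=15):
    fields = []
    for _ in range(nfields):
        fields.append(w & ((1 << width) - 1))
        w >>= width
    return fields[::-1]                  # [total degree, ex, ey, ez]

w = pack(1, 3, 1)                        # the monomial x*y^3*z
print(unpack(w))                         # [5, 1, 3, 1] -- "5131" base 2^15
print(w == 5*2**45 + 2**30 + 3*2**15 + 1)    # True, as in the text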
2.1.8 Comparative Sizes
This section is largely of theoretical interest, since practical systems employ a
variety of techniques for storing polynomials, and are also constrained by the
actual size of machine words etc. Hence an analysis of a practical system will
tend to contain $4\lceil\log_{2^{31}}(n)\rceil$ for the size in bytes, rather than $\log_2(n)$ for the
size in bits.
Notation 18 We assume our polynomials have integer coefficients and are in
$N$ variables, each of which26 occurs to degree at most $D$, and the coefficients are
at most $C$ in absolute value. Furthermore, suppose there are at most $T$ non-zero
terms. Let $n = \log_2 N$, $d = \log_2 D$ and $c = 1 + \log_2 C$ (the "1+" allows for a
sign). We ignore the space needed to store $N$, $D$ etc. themselves.
Dense At least in principle, dense storage, whether recursive or distributed,
just stores the coefficients and no additional structural information. There
are $(D+1)^N$ terms, and hence the storage needed is $s_{\mathrm{dense}} = c(D+1)^N$.

Sparse (Distributed) There are $T$ terms, each needing a coefficient and $N$
exponents, hence $s_{\mathrm{sparse}} = T(c + Nd)$. If $T$ is maximal (i.e. the polynomial is completely dense), $s_{\mathrm{sparse}} = s_{\mathrm{dense}} + (D+1)^N Nd$.
Sparse (Recursive) This is harder to describe, partly because it depends on
the variable order: $x_1^D x_2^D \cdots x_{N-1}^D\left(x_N^D + \cdots + x_N^0\right)$ stores $N + D$ exponents, but if $x_N$ is the main variable, it stores $(D+1)N$ exponents. The
coefficient storage is the same as for $s_{\mathrm{sparse}}$, though.
Expression Tree Since this representation is not canonical, we need to discuss
the size of a particular representation, not just of an abstract polynomial.
Call this $s_{\mathrm{tree}}$. Note that $x^N$ requires $N-1$ multiplications, whether
we use repeated squaring, as in $x^4 = (x*x)*(x*x)$, or iteration, as in
$x^4 = x*(x*(x*x))$.

Expression DAG Again this is not canonical. $x^N$ can now be computed by
recursive squaring with reuse, so $x^{2^n}$ only needs $n$ multiplications. Call
this $s_{\mathrm{DAG}}$. Since every tree is a DAG, we have $s_{\mathrm{DAG}} \le s_{\mathrm{tree}}$.
Though there are unusual counter-examples, if we assume that $c \gg d \gg n$, then
$$s_{\mathrm{dense}} \ge s_{\mathrm{sparse}} \ge s_{\mathrm{tree}} \ge s_{\mathrm{DAG}}. \qquad(2.8)$$
Proposition 11 Each of the gaps in (2.8) can be exponentially big.
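For concreteness (an illustrative computation of ours, with arbitrarily chosen parameters), the formulas above already exhibit the first and last gaps:

from math import log2

N, D, C, T = 10, 10, 2**31, 1000         # hypothetical parameters
c, d = 1 + log2(C), log2(D)

s_dense  = c * (D + 1)**N                # one coefficient per possible monomial
s_sparse = T * (c + N * d)               # coefficient plus N exponents per term
print(f"{s_dense:.3g} vs {s_sparse:.3g} bits")

# The tree/DAG gap on x**(2**n): 2**n - 1 multiplications against n.
n = 40
print(2**n - 1, "tree multiplications vs", n, "in a DAG")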
2.2 Rational Functions
Of course, we want to manipulate a wider class of expressions, and even $\frac1x$ is
not a polynomial.
26Many authors prefer to bound the total degree by $D$. The difference is not great in
practice.
Definition 30 A rational function is built up from coefficients, which are assumed to form an integral domain (definition 11), and certain indeterminates,
by the algebraic operations of addition, subtraction, multiplication and division
(except that division by zero is not permitted). In addition to the laws in definition 22 (but with $a$, $b$ and $c$ interpreted as rational functions), the following
law is obeyed.
11. $a * (1/a) = 1$.
2.2.1 Canonical Rational Functions
Proposition 12 Any rational function f can be put over a common denomina-
tor, i.e. written as n/d where n and d are polynomials, known as the numerator
and denominator respectively. We write num(f) and den(f) respectively, noting
that in fact these are only defined up to units.
Proposition 13 In common denominator format, $\frac{a}{b} = \frac{c}{d}$ if, and only if, $ad - bc = 0$.
We can in fact characterise three simple forms of equality.
common coefficients An example of this would be
$$\frac{x^2 - 2x + 1}{x^2 - 1}\quad\text{versus}\quad\frac{2x^2 - 4x + 2}{2x^2 - 2}.$$
Here we need to remove the g.c.d. of the contents (definition 35) of the
two polynomials.
"up to sign" An example of this would be
$$\frac{-x^2 + 2x - 1}{x^2 - 1}\quad\text{versus}\quad\frac{x^2 - 2x + 1}{-x^2 + 1}.$$
These are 'clearly equal' but "computers don't do clearly". We need a
convention, and the common one27 is 'leading coefficient positive' in the
denominator. However, this does not generalise so easily to other domains
of coefficients [DT90].
common factors An example of this would be
$$\frac{x^2 - 2x + 1}{x^2 - 1}\quad\text{versus}\quad\frac{x - 1}{x + 1}.$$
If we put the difference between the two over a common denominator, we
get $\frac{0}{x^2-1} = 0$. The reader may complain that $\frac{x^2-2x+1}{x^2-1}$ "is undefined when
$x = 1$", whereas $\frac{x-1}{x+1}$ "has the value 0". However, we have not defined
what we mean by such substitutions, and for the purposes of this chapter,
we are concerned with algebraic equality in the sense of proposition 13.
27By no means the only possible one: 'leading coefficient negative' would be equally valid,
as would 'trailing coefficient positive'.
Proposition 14 ([DT90]) A representation $n/d$ where $n$ and $d$ are polynomials is canonical if the following conditions are satisfied:
1. $n$ and $d$ are polynomials from a free (Definition 23) polynomial algebra;
2. these polynomials are themselves represented canonically;
3. any greatest common divisor, whether polynomial or content, is removed
from $n$ and $d$;
4. $n/d$ is canonical with respect to units, typically by insisting that $d$ have a
canonical-up-to-associates (e.g. positive) leading coefficient.
The reader might think that condition 1 was unnecessary in view of condition
2. That this is not so is shown by
$$\sqrt2 - 1 = \frac{1}{\sqrt2 + 1} \qquad(2.9)$$
where each of $\sqrt2 - 1$ and $\sqrt2 + 1$ is represented canonically, but nevertheless we
have an equality here, which wouldn't occur for any value of the "indeterminate"
except $\sqrt2$.
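Python's exact rationals illustrate clauses 3 and 4 for the simplest coefficient domain, the integers (this is standard library behaviour, shown here as an aside of ours):

from fractions import Fraction

print(Fraction(2, -4))       # -1/2: gcd removed, denominator made positive
print(Fraction(0, -7))       # 0: the unique representation of zero
print(Fraction(-1, 2) == Fraction(1, -2))    # True, by the canonical form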
2.2.2 Candidness of rational functions
We have already given (Definition 6) an abstract definition of candidness, which
can also be described as "what you see is what you've got" mathematically.
What would this mean for rational functions (and therefore for polynomials)?
[Sto11a, p. 869] gives the following as a sufficient set of conditions28.
1. there are no compound ratios such as
$$x + \frac{x+1}{1+\frac1x} \qquad(2.10)$$
(note that this is $2x$, and therefore "only" a polynomial, so violates the
general definition of candidness),
2. all ratios that occur are reduced (therefore preferring $x^{999}+\cdots+x+1$ to
$\frac{x^{1000}-1}{x-1}$, so disagreeing with Carette's definition of 'simplification' on page
25),
3. the factors and terms are ordered in an easily discerned traditional way29,
such as lexically by descending degree,
4. all manifestly similar factors and terms are collected,
28He also remarks "There can be other candid forms for rational expressions, including
[appropriately reduced, ruling out (2.10)] continued fractions. However, the complexity of
implementing a candid simplifier increases with the permissiveness of the allowed result forms."
29It is this clause that Maple's sometimes disconcerting hash-based output breaks.
5. for each variable, the actual degree of every variable in a reduced ratio of
an expanded numerator and denominator would be no less than what a
user would predict assuming no cancellations. For example, assuming no
cancellations, we can predict that at most the degree of $x$ will be 3 in the
denominator and 6 in the numerator when
$$x^3 + \frac{1}{x^2-1} + \frac{1}{x+2} \qquad(2.11)$$
is reduced over a common denominator. Those are the resulting degrees,
so (2.11) is a candid representation, even though it's probably not one
of a class of canonical representations. Conversely (2.10) violates this
condition, since we would predict it to be the ratio of a degree 2 and a
degree 1 polynomial.
In particular, clause 2 (or 5) implies that any common factors are cancelled,
which poses the question, answered in the next section: how do we compute
common factors? Given $\frac{f}{g}$, is there an $h$ dividing both $f$ and $g$ that should be
cancelled?
2.3 Greatest Common Divisors
The following definition is valid whenever we have a concept of division.
Definition 31 $h$ is said to be a greatest common divisor, or g.c.d., of $f$ and $g$
if, and only if:
1. $h$ divides both $f$ and $g$;
2. if $h'$ divides both $f$ and $g$, then $h'$ divides $h$.
This definition clearly extends to any number of arguments. The g.c.d. is normally written $\gcd(f,g)$.
Note that we have defined a g.c.d., whereas it is more common to talk of the
g.c.d. However, 'a' is correct. We normally say that 2 is the g.c.d. of 4 and 6,
but in fact $-2$ is equally a g.c.d. of 4 and 6.
Proposition 15 If $h$ and $h'$ are greatest common divisors of $a$ and $b$, they are
associates (definition 13).
Example 3 (Greatest common divisors need not exist) Consider the set
of all integers with $\sqrt{-5}$ adjoined. 2 clearly divides both 6 and $2 + 2\sqrt{-5}$. However,
so does $1 + \sqrt{-5}$ (since $6 = (1+\sqrt{-5})(1-\sqrt{-5})$), yet there is no multiple of
both 2 and $1 + \sqrt{-5}$ which divides both.
Definition 32 An integral domain (definition 11) in which any two elements
have a greatest common divisor is known30 as a g.c.d. domain.
30Normally known as a unique factorisation domain, but, while the existence of greatest
common divisors is equivalent to the existence of unique factorisation, the ability to compute
greatest common divisors is not equivalent to the ability to compute unique factorisations
[FS56, DGT91], and hence we wish to distinguish the two.
If $R$ is a g.c.d. domain, then the elements of the field of fractions (definition 16)
can be simplified by cancelling a g.c.d. between numerator and denominator,
often called "reducing to lowest terms". While this simplifies fractions, it does
not guarantee that they are normal or canonical. One might think that $\frac01$
was the unique representation of zero required for normality, but what of $\frac{0}{-1}$?
Equally $\frac{-1}{2} = \frac{1}{-2}$, and in general we have to remove the ambiguity caused by
units. In the case of rational numbers, we do this automatically by making the
denominator positive, but the general case is more difficult [DT90].
Definition 33 $h$ is said to be a least common multiple, or l.c.m., of $f$ and $g$
if, and only if:
1. both $f$ and $g$ divide $h$;
2. if both $f$ and $g$ divide $h'$, then $h$ divides $h'$.
This definition clearly extends to any number of arguments. The l.c.m. is
normally written $\mathrm{lcm}(f,g)$.
Proposition 16 If $\gcd(f,g)$ exists, then $fg/\gcd(f,g)$ is a least common multiple of $f$ and $g$.
This result is normally written as $fg = \gcd(f,g)\,\mathrm{lcm}(f,g)$, but this is only true
up to associates. We should also note that this result does not extend to any
number of arguments: in general $fgh \ne \gcd(f,g,h)\,\mathrm{lcm}(f,g,h)$.
2.3.1 Polynomials in one variable
For univariate polynomials over a field, we can define a more extended version
of division.
Definition 34 If $a$ and $b \ne 0$ are polynomials in $K[x]$, $K$ a field, and $a = qb + r$
with $\deg(r) < \deg(b)$, then we say that $b$ divides $a$ with quotient $q$ and remainder
$r$, and $q$ and $r$ are denoted $\mathrm{quo}(a,b)$ and $\mathrm{rem}(a,b)$.
It is clear that $q$ and $r$ exist, and are unique. Division in the previous sense
then corresponds to the case $r = 0$.
Theorem 5 (Euclid) If K is a field, the univariate polynomials K[x] form a
g.c.d. domain.
Algorithm 2 (Euclid)
Input: $f, g \in K[x]$.
Output: $h \in K[x]$ a greatest common divisor of $f$ and $g$.
i := 1;
if deg(f) < deg(g)
    then a_0 := g; a_1 := f;
    else a_0 := f; a_1 := g;
while a_i ≠ 0 do
    a_{i+1} := rem(a_{i−1}, a_i);
    # q_i := the corresponding quotient: a_{i+1} = a_{i−1} − q_i a_i
    i := i + 1;
return a_{i−1};
Proof. We must first show that this is an algorithm, i.e. that the potentially
infinite loop actually terminates. But $\deg(a_i)$ is a non-negative integer, strictly
decreasing each time round the loop, and therefore the loop must terminate. So
$a_i = 0$, but $a_i = \mathrm{rem}(a_{i-2}, a_{i-1})$, so $a_{i-1}$ divides $a_{i-2}$. In fact, $a_{i-2} = q_{i-1}a_{i-1}$.
Now $a_{i-1} = a_{i-3} - q_{i-2}a_{i-2}$, so $a_{i-3} = a_{i-1}(1 + q_{i-2}q_{i-1})$, and so on, until we
deduce that $a_{i-1}$ divides $a_0$ and $a_1$, i.e. $f$ and $g$ in some order. Hence the result
of this algorithm is a common divisor. To prove that it is a greatest common
divisor, we must prove that any other common divisor, say $d$, of $f$ and $g$ divides
$a_{i-1}$. $d$ divides $a_0$ and $a_1$. Hence it divides $a_2 = a_0 - q_1a_1$. Hence it divides
$a_3 = a_1 - q_2a_2$, and so on until it divides $a_{i-1}$.
We should note that our algorithm is asymmetric in $f$ and $g$: if they have
the same degree, it is not generally the case that $\gcd(f,g) = \gcd(g,f)$, merely
that they are associates.
Lemma 1 In these circumstances, the result of Euclid's algorithm is a linear
combination of $f$ and $g$, i.e. $a_{i-1} = \lambda_{i-1}f + \mu_{i-1}g$ with $\lambda_{i-1}, \mu_{i-1} \in K[x]$.
Proof. $a_0$ and $a_1$ are certainly such combinations: $a_0 = 1\cdot f + 0\cdot g$ or $1\cdot g + 0\cdot f$,
and similarly for $a_1$. Then $a_2 = a_0 - q_1a_1$ is also such a combination, and so on
until $a_{i-1}$, which is the result.
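Algorithm 2 transcribes directly into Python (a sketch of ours: dense coefficient lists, lowest degree first, exact rational arithmetic). Run on the polynomials $A$ and $B$ studied below, it returns a non-zero constant, confirming $\gcd(A,B) = 1$:

from fractions import Fraction

def degree(p):  return len(p) - 1
def lc(p):      return p[-1]

def rem(a, b):
    """Remainder of a divided by b over Q (b != 0)."""
    a = a[:]
    while degree(a) >= degree(b) and any(a):
        q, shift = lc(a) / lc(b), degree(a) - degree(b)
        for i, bi in enumerate(b):
            a[i + shift] -= q * bi
        while len(a) > 1 and a[-1] == 0:
            a.pop()                      # drop the (now zero) leading term
    return a

def euclid(f, g):
    a0, a1 = (g, f) if degree(f) < degree(g) else (f, g)
    while any(a1):                       # a1 != 0
        a0, a1 = a1, rem(a0, a1)
    return a0

A = [Fraction(c) for c in (-5, 2, 8, -3, -3, 0, 1, 0, 1)]  # x^8+x^6-3x^4-3x^3+8x^2+2x-5
B = [Fraction(c) for c in (-21, -9, -4, 0, 5, 0, 3)]       # 3x^6+5x^4-4x^2-9x-21
print(euclid(A, B))   # a non-zero constant, so gcd(A, B) = 1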
The above theory, and algorithm, are all very well, but we would like to
compute (assuming they exist!) greatest common divisors of polynomials with
integer coefficients, polynomials in several variables, etc. So now let $R$ be any
g.c.d. domain.
Definition 35 If $f = \sum_{i=0}^{n} a_ix^i \in R[x]$, define the content of $f$, written
$\mathrm{cont}(f)$, or $\mathrm{cont}_x(f)$ if we wish to make it clear that $x$ is the variable, as
$\gcd(a_0,\ldots,a_n)$. Technically speaking, we should talk of a content, but in the
theory we tend to abuse language, and talk of the content. Similarly, the primitive part, written $\mathrm{pp}(f)$ or $\mathrm{pp}_x(f)$, is $f/\mathrm{cont}(f)$. $f$ is said to be primitive if
$\mathrm{cont}(f)$ is a unit.
Proposition 17 If $f$ divides $g$, then $\mathrm{cont}(f)$ divides $\mathrm{cont}(g)$ and $\mathrm{pp}(f)$ divides
$\mathrm{pp}(g)$. In particular, any divisor of a primitive polynomial is primitive.
The following result is in some sense a converse of the previous sentence.
Lemma 2 (Gauss) The product of two primitive polynomials is primitive.
Proof. Let $f = \sum_{i=0}^{n} a_ix^i$ and $g = \sum_{j=0}^{m} b_jx^j$ be two primitive polynomials,
and $h = \sum_{i=0}^{m+n} c_ix^i$ their product. Suppose, for contradiction, that $h$ is not
primitive, and $p$ is a prime31 dividing $\mathrm{cont}(h)$. Suppose that $p$ divides all the
coefficients of $f$ up to, but not including, $a_k$, and similarly for $g$ up to but not
including $b_l$. Now consider
$$c_{k+l} = a_kb_l + \sum_{i=0}^{k-1} a_ib_{k+l-i} + \sum_{i=k+1}^{k+l} a_ib_{k+l-i} \qquad(2.12)$$
(where any indices out of range are deemed to correspond to zero coefficients).
Since $p$ divides $\mathrm{cont}(h)$, $p$ divides $c_{k+l}$. By the definition of $k$, $p$ divides
every $a_i$ in $\sum_{i=0}^{k-1} a_ib_{k+l-i}$, and hence the whole sum. Similarly, by definition of
$l$, $p$ divides every $b_{k+l-i}$ in $\sum_{i=k+1}^{k+l} a_ib_{k+l-i}$, and hence the whole sum. Hence $p$
divides every term in equation (2.12) except $a_kb_l$, and hence has to divide $a_kb_l$.
But, by definition of $k$ and $l$, it does not divide either $a_k$ or $b_l$, and hence cannot
divide the product. Hence the hypothesis, that $\mathrm{cont}(h)$ could be divisible by a
prime, is false.
31The reader may complain that, in note 30, we said that the ability to compute g.c.d.s
was not equivalent to the ability to compute unique factors, and hence primes. But we are
not asking to factorise cont(f), merely supposing, for the sake of contradiction, that it is
non-trivial, and therefore has a prime divisor.
Corollary 2 $\mathrm{cont}(fg) = \mathrm{cont}(f)\mathrm{cont}(g)$.
Theorem 6 ("Gauss' Lemma") If $R$ is a g.c.d. domain, and $f, g \in R[x]$,
then $\gcd(f,g)$ exists, and is $\gcd(\mathrm{cont}(f),\mathrm{cont}(g))\gcd(\mathrm{pp}(f),\mathrm{pp}(g))$.
Proof. Since $R$ is an integral domain, its field of fractions, say $K$, is a field.
Hence, in $K[x]$, where theorem 5 is applicable, $\mathrm{pp}(f)$ and $\mathrm{pp}(g)$ have a greatest
common divisor, say $h$. If $c$ is any non-zero element of $R$, then $ch$ is also a
greatest common divisor of $\mathrm{pp}(f)$ and $\mathrm{pp}(g)$. Hence we can assume that $h$ is
in $R[x]$ and, as a polynomial of $R[x]$, is primitive. In $K[x]$, $\mathrm{pp}(f)$ is a multiple
of $h$, say $\mathrm{pp}(f) = hk$ for $k \in K[x]$. We can write $k = dk'$, where $k' \in R[x]$
and is primitive. Then $d^{-1}\mathrm{pp}(f) = hk'$. But $h$ and $k'$ are primitive, so, by
the Lemma, their product is primitive, and $d$ is a unit. Hence $h$ is, in $R[x]$, a
common divisor of $\mathrm{pp}(f)$ and $\mathrm{pp}(g)$.
But, if $\bar h$ is a common divisor of $\mathrm{pp}(f)$ and $\mathrm{pp}(g)$ in $R[x]$, it is certainly
a common divisor of $f$ and $g$ in $K[x]$, hence divides $h$ in $K[x]$, and so $\mathrm{pp}(\bar h)$
divides $h$ in $R[x]$. Hence $h$ is a greatest common divisor of $\mathrm{pp}(f)$ and $\mathrm{pp}(g)$ in
$R[x]$, and the rest of the theorem is obvious.
This gives us one obvious means of computing g.c.d.s in $R[x]$, which can be
described as "compute in $K[x]$ and sort out the contents afterwards". More
formally it would be Algorithm 3.
Algorithm 3 (General g.c.d.)
Input: $f, g \in R[x]$.
Output: $h \in R[x]$ a greatest common divisor of $f$ and $g$.
1. f_c := cont_x(f); f_p := f/f_c; g_c := cont_x(g); g_p := g/g_c.
2. h := gcd(f_p, g_p) computed in K[x] by Algorithm 2.
3. h_p := pp(h × (enough to remove denominators)).
4. return gcd(f_c, g_c) × h_p.
# Correct by the reasoning above.
Certainly this is an algorithm, but is it a good one? Let $k = \max(\deg_x(f), \deg_x(g))$.
In terms of the number of coefficient operations, and in the dense representations, we do $O(k^2)$ operations. But how big do these coefficients get?
Consider the computation of the g.c.d. of the following two polynomials
(this analysis is mostly taken from [Bro71a, Bro71b], but with one change32:
$-21$ instead of $+21$ for the trailing coefficient of $B$):
$$A(x) = x^8 + x^6 - 3x^4 - 3x^3 + 8x^2 + 2x - 5;$$
$$B(x) = 3x^6 + 5x^4 - 4x^2 - 9x - 21.$$
The first elimination gives $A - \left(\frac{x^2}{3} - \frac29\right)B$, that is
$$-\frac59x^4 + \frac{127}{9}x^2 - \frac{29}{3},$$
and the subsequent eliminations give
$$\frac{50157}{25}x^2 - 9x - \frac{35847}{25},$$
$$\frac{93060801700}{1557792607653}x + \frac{23315940650}{173088067517}$$
and, finally,
$$\frac{761030000733847895048691}{86603128130467228900}.$$
Since this is a number, it follows that no polynomial can divide both $A$ and $B$,
i.e. that $\gcd(A,B) = 1$.
It is obvious that these calculations on polynomials with rational coefficients
require several g.c.d. calculations on integers, and that the integers in these
calculations are not always small.
32Originally an error, but it makes the point better.
We can eliminate these g.c.d. calculations by working all the time with
polynomials with integer coefficients, and this gives a generalisation of the $a_i$ of
algorithm 2, known as polynomial remainder sequences or p.r.s., by extending
the definition of division.
Definition 36 Instead of dividing $f$ by $g$ in $K[x]$, we can multiply $f$ by a
suitable power of the leading coefficient of $g$, so that the divisions stay in $R$.
The pseudo-remainder of dividing $f$ by $g$, written $\mathrm{prem}(f,g)$, is the remainder
when one divides $\mathrm{lc}(g)^{\deg(f)-\deg(g)+1}f$ by $g$, conceptually in $K[x]$, but in fact all
the calculations can be performed in $R$, i.e. all divisions are exact in $R$.
33This definition agrees with Maple, but not with all software systems, which often use prem
to denote what Maple calls sprem, i.e. only raising lc(g) to the smallest power necessary.
In some applications (section 3.1.9) it is necessary to keep track of the signs:
we define a signed polynomial remainder sequence or s.p.r.s. of $f_0 = f$ and
$f_1 = g$ to have $f_i$ proportional by a positive constant to $-\mathrm{rem}(f_{i-2}, f_{i-1})$.
This gives us a pseudo-euclidean algorithm, analogous to algorithm 2, where we
replace rem by prem, and fix up the contents afterwards. In the above example,
we deduce the following sequence:
$$-15x^4 + 381x^2 - 261,$$
$$6771195x^2 - 30375x - 4839345,$$
$$500745295852028212500x + 1129134141014747231250$$
and
$$7436622422540486538114177255855890572956445312500.$$
Again, this is a number, so $\gcd(A,B) = 1$. We have eliminated the fractions,
but at a cost of even larger numbers. Can we do better?
2.3.2 Subresultant sequences
One option would be to make the $a_i$ primitive at each step, since we are going to
fix up the content terms later: giving the so-called primitive p.r.s. algorithm,
which in this case would give
$$-5x^4 + 127x^2 - 87;\quad 5573x^2 - 25x - 3983;\quad -4196868317x - 1861216034;\quad 1.$$
This is a perfectly reasonable algorithm when it is a question of polynomials in
one variable, and is essentially equivalent to calculating with rational numbers,
but over a common denominator. However, if we come to polynomials in several
variables, every step of the g.c.d. for polynomials in $n$ variables would involve
the g.c.d. of several polynomials in $n-1$ variables, each step of each of which
would involve the g.c.d. of several polynomials in $n-2$ variables, and so on.
The following, slightly mysterious34, algorithm will do the trick. By the
Subresultant Theorem [Loo82], all divisions involved are exact, i.e. we always
stay in $R[x]$. Furthermore, the factors $\beta_i$ that are cancelled are generically as
large as possible, where by "generically" we mean that, if the coefficients of
$f$ and $g$ were all independent, nothing more could be cancelled35. In the same
34Some of the mystery is explained by corollary 4 on page 95. In particular the various minus
signs, which are irrelevant as far as a strict g.c.d. algorithm is concerned, come from corollary
4.
35The reader may comment that the example, repeated below with this algorithm, shows a
consistent factor of 3 in $a_2$, and this is true however the non-zero coefficients are perturbed.
Indeed, if the leading coefficient of $a_1$ is changed to, say, 4, we get a consistent factor of 4.
However, if the coefficient of $x^7$ in $a_0$ is made non-zero, then the common factor will generally
go away, and that is what we mean by "generically".
Figure 2.8: Subresultant p.r.s. algorithm
Algorithm 4 (Subresultant p.r.s.)
Input: $f, g \in K[x]$.
Output: $h \in K[x]$ a greatest common divisor of pp(f) and pp(g).
Comment: If $f, g \in R[x]$, where $R$ is an integral domain and $K$ is the field of
fractions of $R$, then all computations are exact in $R[x]$. This can therefore fulfil
the rôle of steps 2–3 of Algorithm 3.
i := 1;
if deg(f) < deg(g)
    then a_0 := pp(g); a_1 := pp(f);
    else a_0 := pp(f); a_1 := pp(g);
δ_0 := deg(a_0) − deg(a_1);
β_2 := (−1)^{δ_0+1};
ψ_2 := −1;
while a_i ≠ 0 do
    a_{i+1} := prem(a_{i−1}, a_i)/β_{i+1};
    # q_i := the corresponding quotient: β_{i+1}a_{i+1} = lc(a_i)^{δ_{i−1}+1}a_{i−1} − q_i a_i
    δ_i := deg(a_i) − deg(a_{i+1});
    i := i + 1;
    ψ_{i+1} := (−lc(a_{i−1}))^{δ_{i−2}} ψ_i^{1−δ_{i−2}};
    β_{i+1} := −lc(a_{i−1}) ψ_{i+1}^{δ_{i−1}};
return pp(a_{i−1});
The $a_i$ are referred to as a subresultant polynomial remainder sequence.
example as before, we get the following:
$$a_2 = 15x^4 - 381x^2 + 261,$$
$$a_3 = -27865x^2 + 125x + 19915,$$
$$a_4 = -3722432068x - 8393738634,$$
$$a_5 = 1954124052188.$$
Here the numbers are much smaller, and indeed it can be proved that the
coefficient growth is only linear in the step number. $a_2$ has a content of 3,
which the primitive p.r.s. would eliminate, but this content in fact disappears
later. Similarly $a_3$ has a content of 5, which again disappears later. Hence we
have the following result.
Theorem 7 Let $R$ be a g.c.d. domain. Then there is an algorithm to calculate
the g.c.d. of polynomials in $R[x]$. If the original coefficients have length bounded
by $B$, the length at the $i$-th step is bounded by $iB$. Strictly speaking, this theorem
is true for polynomials in several variables, where "length" is replaced by "degree
in variables other than $x$". Over the integers, we need to add $\log(2i!)$ to allow
for the fact that the sum of two integers can be larger than either.
This algorithm is the best method known for calculating the g.c.d., of all those
based on Euclid's algorithm applied to polynomials with integer coefficients. In
chapter 4 we shall see that if we go beyond these limits, it is possible to find
better algorithms for this calculation.
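Algorithm 4 is short enough to transcribe (a Python sketch of ours, over Z[x] with dense integer coefficient lists, lowest degree first); on the $A$ and $B$ above it reproduces the sequence $a_2,\ldots,a_5$:

def deg(p):  return len(p) - 1
def lc(p):   return p[-1]

def prem(f, g):
    """Pseudo-remainder rem(lc(g)**(deg f - deg g + 1) * f, g), exactly in Z."""
    r, d = f[:], deg(f) - deg(g) + 1
    while deg(r) >= deg(g) and any(r):
        d -= 1
        c, shift = lc(r), deg(r) - deg(g)
        r = [lc(g)*ri for ri in r]            # scale so the cancellation is exact
        for i, gi in enumerate(g):
            r[i + shift] -= c*gi
        while len(r) > 1 and r[-1] == 0:
            r.pop()
    return [ri * lc(g)**d for ri in r]        # unused multipliers, if any

def subresultant_prs(f, g):
    a0, a1 = (g, f) if deg(f) < deg(g) else (f, g)
    seq = [a0, a1]
    dprev = deg(a0) - deg(a1)                 # delta_0
    beta, psi = (-1)**(dprev + 1), -1         # beta_2, psi_2
    while True:
        r = prem(a0, a1)
        if not any(r):
            return seq
        a2 = [ri // beta for ri in r]         # exact, by the Subresultant Theorem
        seq.append(a2)
        dnew = deg(a1) - deg(a2)
        if dprev:                             # psi' = (-lc)^d / psi^(d-1)
            psi = (-lc(a1))**dprev // psi**(dprev - 1)
        beta = -lc(a1) * psi**dnew
        a0, a1, dprev = a1, a2, dnew

A = [-5, 2, 8, -3, -3, 0, 1, 0, 1]            # x^8+x^6-3x^4-3x^3+8x^2+2x-5
B = [-21, -9, -4, 0, 5, 0, 3]                 # 3x^6+5x^4-4x^2-9x-21
for a in subresultant_prs(A, B)[2:]:
    print(a)                                  # a_2 .. a_5, lowest degree first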
2.3.3 The Extended Euclidean Algorithm
We can in fact do more with the Euclidean algorithm. Consider the following
variant of Algorithm 2, where we have added a few extra lines, marked (*),
manipulating $(a,b)$, $(c,d)$ and $(e,e')$.
Algorithm 5 (Extended Euclidean)
Input: $f, g \in K[x]$.
Output: $h \in K[x]$ a greatest common divisor of $f$ and $g$, and $c, d \in K[x]$ such
that $cf + dg = h$.
i := 1;
if deg(f) < deg(g)
    then a_0 := g; a_1 := f;
(*1)     a := 1; d := 1; b := c := 0;
    else a_0 := f; a_1 := g;
(*1)     c := 1; b := 1; a := d := 0;
while a_i ≠ 0 do
(*2)    # Loop invariant: a_i = af + bg; a_{i−1} = cf + dg;
    a_{i+1} := rem(a_{i−1}, a_i);
    q_i := the corresponding quotient; # a_{i+1} = a_{i−1} − q_i a_i
(*3)    e := c − q_i a; e′ := d − q_i b; # a_{i+1} = ef + e′g
    i := i + 1;
(*3)    (c, d) := (a, b);
(*3)    (a, b) := (e, e′);
return (a_{i−1}, c, d);
The comments essentially form the proof of correctness. In particular, if $f$ and
$g$ are relatively prime, there exist $c$ and $d$ such that
$$cf + dg = 1, \qquad(2.13)$$
a result often called Bézout's identity36.
36Often spelled Bezout. But the title page of [Bé79] does have the acute accent.
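A Python sketch of Algorithm 5 (ours; the same dense-list conventions as the sketch of Algorithm 2 above — here the explicit degree test is unnecessary, since a zero first quotient effects the swap):

from fractions import Fraction

def quo_rem(a, b):
    a, q = a[:], [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and any(a):
        c, shift = a[-1] / b[-1], len(a) - len(b)
        q[shift] = c
        for i, bi in enumerate(b):
            a[i + shift] -= c * bi
        while len(a) > 1 and a[-1] == 0:
            a.pop()
    return q, a

def sub_mul(u, v, q):                    # u - q*v, all dense lists
    out = u[:] + [Fraction(0)] * max(0, len(v) + len(q) - len(u) - 1)
    for i, qi in enumerate(q):
        for j, vj in enumerate(v):
            out[i + j] -= qi * vj
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

def ext_euclid(f, g):
    """Return (h, c, d) with h = gcd(f, g) = c*f + d*g."""
    a0, a1 = f, g
    c, d = [Fraction(1)], [Fraction(0)]          # a0 = 1*f + 0*g
    a, b = [Fraction(0)], [Fraction(1)]          # a1 = 0*f + 1*g
    while any(a1):
        q, r = quo_rem(a0, a1)
        a0, a1 = a1, r
        c, a = a, sub_mul(c, a, q)               # e := c - q*a, then shift
        d, b = b, sub_mul(d, b, q)
    return a0, c, d

f = [Fraction(c) for c in (1, 0, 1)]             # x^2 + 1
g = [Fraction(0), Fraction(1)]                   # x
print(ext_euclid(f, g))   # gcd 1, with 1*(x^2+1) + (-x)*x = 1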
For the sake of further developments (section B.3.6), we can express the
marked lines as
$$(*1)\quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} := \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\ \text{or}\ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \qquad(2.14)$$
$$(*2)\quad \begin{pmatrix} a_i \\ a_{i+1} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} f \\ g \end{pmatrix} \qquad(2.15)$$
$$(*3)\quad \begin{pmatrix} a & b \\ c & d \end{pmatrix} := \begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix} \qquad(2.16)$$
It is possible to make similar modifications to algorithm 4, and the same
theory that shows the division by $\beta_{i+1}$ is exact shows that we can perform
the same division of $(e, e')$. However, at the end, we return $\mathrm{pp}(a_{i-1})$, and the
division by $\mathrm{cont}(a_{i-1})$ is not guaranteed to be exact when applied to $(a, b)$.
Algorithm 6 (General extended p.r.s.)
Input: $f, g \in K[x]$.
Output: $h \in K[x]$ a greatest common divisor of pp(f) and pp(g), $h' \in K$ and
$c, d \in K[x]$ such that $cf + dg = h'h$.
Comment: If $f, g \in R[x]$, where $R$ is an integral domain and $K$ is the field of
fractions of $R$, then all computations are exact in $R[x]$, and $h' \in R$, $c, d \in R[x]$.
i := 1;
if deg(f) < deg(g)
    then a_0 := pp(g); a_1 := pp(f);
(*)      a := 1; d := 1; b := c := 0;
    else a_0 := pp(f); a_1 := pp(g);
(*)      c := 1; b := 1; a := d := 0;
δ_0 := deg(a_0) − deg(a_1);
β_2 := (−1)^{δ_0+1};
ψ_2 := −1;
while a_i ≠ 0 do
(*)     # Loop invariant: a_i = af + bg; a_{i−1} = cf + dg;
    a_{i+1} := prem(a_{i−1}, a_i)/β_{i+1};
    # q_i := the corresponding quotient: β_{i+1}a_{i+1} = lc(a_i)^{δ_{i−1}+1}a_{i−1} − q_i a_i
    δ_i := deg(a_i) − deg(a_{i+1});
(*)     e := (c − q_i a)/β_{i+1};
(*)     e′ := (d − q_i b)/β_{i+1}; # a_{i+1} = ef + e′g
    i := i + 1;
(*)     (c, d) := (a, b);
(*)     (a, b) := (e, e′);
    ψ_{i+1} := (−lc(a_{i−1}))^{δ_{i−2}} ψ_i^{1−δ_{i−2}};
    β_{i+1} := −lc(a_{i−1}) ψ_{i+1}^{δ_{i−1}};
return (pp(a_{i−1}), cont(a_{i−1}), c, d);
2.3.4 Partial Fractions
Bézout's Identity (2.13) has a useful consequence. Suppose we have a fraction
$p/q$, and $q = fg$ with $\gcd(f,g) = 1$. Then
$$\frac{p}{fg} = \frac{p(cf+dg)}{fg} = \frac{pc}{g} + \frac{pd}{f}. \qquad(2.17)$$
Even if $p/q$ is proper, i.e. $\deg p < \deg q$, the same may not be true of $pc/g$ or
$pd/f$. However, if the left-hand side of (2.17) is proper, so must the right-hand
side be when collected over a common denominator. Hence the extents to which
$pc/g$ and $pd/f$ are improper must cancel. So
$$\frac{p}{q} = \frac{\mathrm{rem}(pc, g)}{g} + \frac{\mathrm{rem}(pd, f)}{f}. \qquad(2.18)$$
A decomposition of the form of the right-hand side of (2.18) is called a partial
fraction decomposition. This can clearly be extended to any number of factors
of the denominator, i.e.
$$\frac{p}{\prod_{i=1}^{n} f_i} = \sum_{i=1}^{n} \frac{p_i}{f_i}. \qquad(2.19)$$
There is one important caution: even if $p, q \in \mathbb{Z}[x]$, this need not be the case
for the partial fraction decomposition, for example
$$\frac{1}{x^2-1} = \frac{1/2}{x-1} + \frac{-1/2}{x+1}.$$
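For example (an illustrative check of ours, verifying (2.18) at a few sample points rather than symbolically): with $f = x-1$, $g = x+1$ and $p = 1$, the extended Euclidean algorithm gives the Bézout coefficients $c = -\frac12$, $d = \frac12$, since $-\frac12(x-1) + \frac12(x+1) = 1$.

from fractions import Fraction

c, d = Fraction(-1, 2), Fraction(1, 2)
# p = 1, so rem(p*c, g) = c and rem(p*d, f) = d (both already constants):
#   1/(x^2-1) = c/(x+1) + d/(x-1) = (1/2)/(x-1) + (-1/2)/(x+1)
for x in (Fraction(2), Fraction(5), Fraction(-3)):
    assert 1 / (x**2 - 1) == c / (x + 1) + d / (x - 1)
print("partial fraction decomposition of 1/(x^2-1) checked at three points")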
2.3.5 Polynomials in several variables
Here it is best to regard the polynomials as recursive, so that $R[x,y]$ is regarded
as $R[y][x]$. In this case, we now know how to compute the greatest common
divisor of two bivariate polynomials.
Algorithm 7 (Bivariate g.c.d.)
Input: $f, g \in R[y][x]$.
Output: $h \in R[y][x]$ a greatest common divisor of $f$ and $g$.
h_c := the g.c.d. of cont_x(f) and cont_x(g)
    # this is a g.c.d. computation in R[y].
h_p := algorithm 4 (pp_x(f), pp_x(g))
    # replacing R by R[y], which we know, by theorem 7, is a g.c.d. domain.
return h_c h_p
    # which by theorem 6 is a g.c.d. of f and g.
This process generalises.
Theorem 8 If $R$ is a g.c.d. domain, then $R[x_1,\ldots,x_n]$ is also a g.c.d. domain.
Proof. Induction on $n$, with theorem 7 as the building block.
What can we say about the complexity of this process? It is easier to analyse
if we split up the division process which computes $\mathrm{rem}(a_{i-1}, a_i)$ into a series
of repeated subtractions of shifted scaled copies of $a_i$ from $a_{i-1}$. Each such
subtraction reduces $\deg(a_{i-1})$, in general by 1. For simplicity, we shall assume
that the reduction is precisely by 1, and that37 $\deg(a_{i+1}) = \deg(a_i) - 1$. It also
turns out that the polynomial manipulation in $x$ is the major cost (this is not the
case for the primitive p.r.s., where the recursive costs of the content computations
dominate), so we will skip all the other operations (the proof of this is more
tedious than enlightening). Let us assume that $\deg_x(f) + \deg_x(g) = k$, and that
the coefficients have maximum degree $d$. Then the first subtraction will reduce
$k$ by 1, and replace $d$ by $2d$, and involve $k$ operations on the coefficients. The
next step will involve $k-1$ operations on coefficients of size $2d$. The next step
combines one of the original polynomials, with coefficients of degree $d$, with this
polynomial with coefficients of degree $2d$, giving a polynomial with coefficients
of degree $3d$. Combining this with a polynomial with coefficients of degree $2d$
ought to give us coefficients of degree $5d$, but in fact we divide by the previous
leading coefficient (degree $d$), so the answer is a polynomial with coefficients of
degree $4d$, and so on, giving degree $id$ at the $i$-th step (for a matrix view of this,
see Corollary 6 and the discussion after it), and a total cost of $\sum_{i=0}^{k}(k-i)F(id)$,
where $F(d)$ is the cost of operating on coefficients of degree $d$. Let us suppose
that there are $v$ variables in all: $x$ itself and $v-1$ variables in the coefficients
with respect to $x$.
37This assumption is known as assuming that the remainder sequence is normal. Note that
our example is distinctly non-normal, and that, in the case of a normal p.r.s., $\beta_i = \pm\mathrm{lc}(a_{i-2})^2$.
In fact, the sub-resultant algorithm was first developed for normal p.r.s., where it can be seen
as a consequence of the Dodgson–Bareiss Theorem (theorem 15).
v = 2 Here the coefficients are univariate polynomials. If we assume classic
multiplication on dense polynomials, $F(d) = cd^2 + O(d)$. We are then
looking at
$$\sum_{i=0}^{k}(k-i)F(id) \le c\sum_{i=0}^{k}(k-i)i^2d^2 + \sum_{i=0}^{k}kO(id)$$
$$\le ck\sum_{i=0}^{k}i^2d^2 - c\sum_{i=0}^{k}i^3d^2 + k^3O(d)$$
$$= c\left(\frac13k^4 + \frac12k^3 + \frac16k^2\right)d^2 - c\left(\frac14k^4 + \frac12k^3 + \frac14k^2\right)d^2 + k^3O(d)$$
$$= c\left(\frac{1}{12}k^4 - \frac{1}{12}k^2\right)d^2 + k^3O(d),$$
which we can write as $O(k^4d^2)$. We should note the asymmetry here: this
means that we should choose the principal variable (i.e. the $x$ in algorithm
7) to be whichever of $x$ and $y$ minimises
$$\min\left(\max(\deg_x(f),\deg_x(g)),\ \max(\deg_y(f),\deg_y(g))\right).$$
v = 3 Here the coefficients are bivariate polynomials. If we assume classic multiplication on dense polynomials, $F(d) = cd^4 + O(d^3)$. We are then looking
at
$$\sum_{i=0}^{k}(k-i)F(id) \le c\sum_{i=0}^{k}(k-i)i^4d^4 + \sum_{i=0}^{k}kO(i^3d^3)$$
$$\le ck\sum_{i=0}^{k}i^4d^4 - c\sum_{i=0}^{k}i^5d^4 + k^5O(d^3)$$
$$= c\left(\frac15k^6 + \cdots\right)d^4 - c\left(\frac16k^6 + \cdots\right)d^4 + k^5O(d^3)$$
$$= c\left(\frac{1}{30}k^6 + \cdots\right)d^4 + k^5O(d^3),$$
which we can write as $O(k^6d^4)$. The asymmetry is again obvious.
general v The same analysis produces $O(k^{2v}d^{2v-2})$.
We see that the cost is exponential in $v$, even though it is polynomial in $d$ and
$k$. This is not a purely theoretical observation: any experiment with several
variables will bear this out, even when the inputs (being sparse) are quite small:
the reader need merely use his favourite algebra system on
$$a_0 := ax^4 + bx^3 + cx^2 + dx + e;\qquad a_1 := fx^4 + gx^3 + hx^2 + ix + j,$$
treating $x$ as the main variable (which of course one would not do in practice),
to see the enormous growth of the coefficients involved.
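For instance, with sympy as the 'favourite algebra system' (a sketch of ours; sympy's subresultants computes the sequence of remainders):

from sympy import symbols, subresultants

a, b, c, d, e, f, g, h, i, j, x = symbols('a b c d e f g h i j x')
a0 = a*x**4 + b*x**3 + c*x**2 + d*x + e
a1 = f*x**4 + g*x**3 + h*x**2 + i*x + j

for p in subresultants(a0, a1, x):
    print(len(str(p)))     # printed size of successive remainders: it explodes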
2.3.6 Square-free decomposition
Let us revert to the case of polynomials in one variable, $x$, over a field $K$, and let
us assume that $\mathrm{char}(K) = 0$ (see definition 17 — the case of characteristic non-zero is more complicated [DT81], and we really ought to talk about 'separable
decomposition' [Lec08]).
Definition 37 The formal derivative of $f(x) = \sum_{i=0}^{n} a_ix^i$ is written $f'(x)$ and
computed as $f'(x) = \sum_{i=1}^{n} ia_ix^{i-1}$.
This is what is usually referred to as the derivative of a polynomial in calculus
texts, but we are making no appeal to the theory of differentiation here: merely
defining a new polynomial whose coefficients are the old ones (except that $a_0$
disappears) multiplied by the exponents, and where the exponents are decreased
by 1.
Proposition 18 The formal derivative satisfies the usual laws:
$$(f+g)' = f' + g', \qquad (fg)' = f'g + fg'.$$
Proof. By algebra from the definition. This is taken up in more generality in
Proposition 70.
Let us consider the case $f = g^nh$, where $g$ and $h$ have no common factors.
Then $f' = g^nh' + ng^{n-1}g'h$ and is clearly divisible by $g^{n-1}$. $n$ is not zero in
$K$ (by the assumption on $\mathrm{char}(K)$), so $g$ does not divide $f'/g^{n-1} = gh' + ng'h$.
Hence $\gcd(f, f')$ is divisible by $g^{n-1}$ but not by $g^n$. These considerations lead
to the following result.
Proposition 19 Let $f = \prod_{i=1}^{k} f_i^{n_i}$ where the $f_i$ are relatively prime and have
no repeated factors. Then
$$\gcd(f, f') = \prod_{i=1}^{k} f_i^{n_i-1}.$$
Definition 38 The square-free decomposition of a polynomial $f$ is an expression
$$f = \prod_{i=1}^{n} f_i^i \qquad(2.20)$$
where the $f_i$ are relatively prime and have no repeated factors. $f$ is said to be
square-free if $n = 1$.
Note that some of the $f_i$ may be 1.
Lemma 3 Such a decomposition exists for any non-zero $f$, and can be calculated by means of gcd computations and divisions.
Proof. Let $g = \gcd(f, f') = \prod_{i=1}^{n} f_i^{i-1}$ by the previous proposition. Then
$f/g = \prod_{i=1}^{n} f_i$ and $\gcd(g, f/g) = \prod_{i=2}^{n} f_i$. Hence
$$\frac{f/g}{\gcd(g, f/g)} = f_1.$$
Applying the same process to $g$ will compute $f_2$, and so on.
This is not in fact the most efficient way of computing such a decomposition:
a better method was given by Yun [Yun76].
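The proof of Lemma 3 is already an algorithm; here is a direct Python/sympy transcription (ours, and deliberately naive — Yun's method is faster):

from sympy import symbols, gcd, div, diff, expand

x = symbols('x')

def squarefree_decomposition(f):
    """Return [f1, f2, ...] with f = prod(fi**i), the fi square-free, coprime."""
    parts = []
    while f.as_poly(x).degree() > 0:
        g = gcd(f, diff(f, x))           # g = prod fi^(i-1)
        w, _ = div(f, g, x)              # w = prod fi
        f1, _ = div(w, gcd(g, w), x)     # f1 = (f/g) / gcd(g, f/g)
        parts.append(expand(f1))
        f = g                            # apply the same process to g
    return parts

# (x-1)*(x-2)**2: decomposition f1 = x-1, f2 = x-2
print(squarefree_decomposition(expand((x - 1)*(x - 2)**2)))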
2.3.7 Sparse Complexity
So far we have, implicitly, considered dense polynomials. What if the polynomials are sparse? There are then various embarrassing possibilities:
1. the gcd $g$ might be much denser than the inputs $A$ and $B$;
2. even if the gcd $g$ is not much denser than the inputs $A$ and $B$, the cofactors
$A/g$ and $B/g$ computed as part of the verification might be much denser;
3. even if neither of these happens, various intermediate results might be
much denser than $A$ and $B$.
Problem 2 is easy to demonstrate: consider the cofactors in $\gcd(x^p-1, x^q-1)$,
where $p$ and $q$ are distinct primes. Problem 1 is demonstrated by the following
elegant example of [Sch03a] (extended to multivariates in Example 20):
$$\gcd(x^{pq}-1,\ x^{p+q}-x^p-x^q+1) = \frac{(x^p-1)(x^q-1)}{x-1} = \underbrace{x^{p+q-1} + x^{p+q-2} \pm\cdots - 1}_{2\min(p,q)\ \text{terms}}, \qquad(2.21)$$
and indeed just knowing whether two polynomials have a non-trivial gcd is hard,
by the following result.
Theorem 9 ([Pla77]) It is NP-hard to determine whether two sparse polynomials (in the standard encoding) have a non-trivial common divisor.
This theorem, like the examples above, relies on the factorisation of $x^p-1$, and it
is an open question [DC10, Challenge 3] whether this is the only obstacle. More
precisely, we have the following equivalent of the problem solved for division.
Open Problem 2 (Sparse gcd (strong)) Find an algorithm for computing
$h = \gcd(f,g)$ which is polynomial-time in $t_f$, $t_g$ and $t_h$, and independent of the
degrees (or possibly polynomial in the logarithms of the degrees). See [DC10,
Challenge 5].
For multivariate polynomials, we seem to have a solution in practice (page 195)
in the sense that its running time depends on the number of actual terms in
the multivariate g.c.d., rather than the potential number of terms. It is still
polynomial in the degree in the main variable, though.
A weaker problem, but still unsolved, is
Open Problem 3 (Sparse gcd (weak)) Find an algorithm for computing $h =
\gcd(f,g)$ which is polynomial-time in $t_f$, $t_g$, $t_h$ and $t_{f/h}$, $t_{g/h}$.
Observation 2 It follows from (2.21) that square-free decompositions are also
hard, in the sense that the number of terms in the output is unbounded in the
number of terms in the input:
$$\underbrace{x^{pq+p+q} - x^{pq+p} - x^{pq+q} + x^{pq} - x^{p+q} + x^p + x^q - 1}_{8\ \text{terms}} = (x^{pq}-1)(x^{p+q}-x^p-x^q+1) = (x-1)^3\underbrace{\left(x^{p+q-2} + 2x^{p+q-3} \pm\cdots + 1\right)^2}_{p+q-2\ \text{terms}}(\cdots) \qquad(2.22)$$
Furthermore, the largest coefficient in the marked term of multiplicity 2 is
$\min(p,q)$, quashing any hopes that the coefficients of a square-free decomposition
might be bounded in terms only of the coefficients of the input.
2.4 Non-commutative polynomials
The label "non-commutative polynomials" in fact covers three cases.
1. The indeterminates commute, but the coefficients do not. For definiteness,
we will refer to this case as polynomials with non-commuting coefficients.
In this case, rule 5 of definition 22 has to be replaced by
5′ $x * y = y * x$;
where $x$ and $y$ are indeterminates, not general polynomials. This means
that some of the traditional laws of algebra cease to operate: for example
$$(ax+b)(ax-b) = a^2x^2 - b^2$$
becomes
$$(ax+b)(ax-b) = a^2x^2 + (-a*b + b*a)x - b^2.$$
2. The coefficients commute, but the indeterminates do not. For definiteness,
we will refer to this case as polynomials with non-commuting indeterminates. In this case, rule 5 of definition 22 has to be replaced by the
assumption
• $m \otimes n = n \otimes m$.
At this point, many of the traditional laws of algebra cease to operate:
even the Binomial Theorem in the form
$$(x+y)^2 = x^2 + 2xy + y^2$$
has to be replaced by
$$(x+y)^2 = x^2 + (xy + yx) + y^2$$
(see the sketch at the end of this section). A common case of non-commuting indeterminates is in differential algebra,
where the variable $x$ and the differentiation operator $\frac{d}{dx}$ do not commute,
but rather satisfy the equation
$$\frac{d}{dx}(xa) = x\frac{da}{dx} + a. \qquad(2.23)$$
3. Neither can be assumed to commute, in which case rule 5 of definition 22
is just deleted, with no replacement.
Notation 19 If the variables do not commute, it is usual to use the notation
$R\langle x_1,\ldots,x_n\rangle$ for the ring of polynomials with coefficients in $R$ and the non-commuting variables $x_1,\ldots,x_n$.
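The modified Binomial Theorem of case 2 can be seen in sympy (an illustration of ours), which supports non-commuting symbols:

from sympy import symbols, expand

x, y = symbols('x y', commutative=False)
print(expand((x + y)**2))       # x**2 + x*y + y*x + y**2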
Chapter 3
Polynomial Equations
In the first parts of this chapter, we will deal with polynomial equations, either
singly or as sets of equations. A preliminary remark is in order. Any polynomial
equation
$$A = B, \qquad(3.1)$$
where $A$ and $B$ are polynomials, can be reduced to one whose right-hand side is zero, i.e.
$$A - B = 0. \qquad(3.2)$$
Notation 20 Henceforth, all polynomial equations will be assumed to be in the
form of (3.2).
3.1 Equations in One Variable
We may as well assume that the unknown variable is $x$. If the equation is linear
in $x$ then, by the notation above, it takes the form
$$ax + b = 0. \qquad(3.3)$$
The solution is then obvious: $x = -b/a$.
3.1.1 Quadratic Equations
Again, by the notation above, our equation takes the form
$$ax^2 + bx + c = 0. \qquad(3.4)$$
The solutions are well-known to most schoolchildren: there are two of them, of
the form
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \qquad(3.5)$$
However, if $b^2 - 4ac = 0$, i.e. $c = b^2/4a$, then there is only one solution: $x = \frac{-b}{2a}$.
In this case, the equation becomes $ax^2 + bx + \frac{b^2}{4a} = 0$, which can be re-written
as $a\left(x + \frac{b}{2a}\right)^2 = 0$, making it more obvious that there is a repeated root, and
that the polynomial is not square-free (definition 38).
Mathematicians dislike the sort of anomaly in "this equation has two solutions except when $c = b^2/4a$", especially as there are two roots as $c$ tends to
the value $b^2/4a$. We therefore say that, in this special case, $x = \frac{-b}{2a}$ is a double
root of the equation. This can be generalised, and made more formal.
Definition 39 If, in the equation $f = 0$, $f$ has a square-free decomposition
$f = \prod_{i=1}^{n} f_i^i$, and $x = \alpha$ is a root of $f_i$, we say that $x = \alpha$ is a root of $f$ of
multiplicity $i$. When we say we are counting the roots of $f$ with multiplicity,
we mean that $x = \alpha$ should be counted $i$ times.
Proposition 20 The number of roots of a polynomial equation over the complex
numbers, counted with multiplicity, is equal to the degree of the polynomial.
Proof. $\deg(f) = \sum i\deg(f_i)$, and each root of $f_i$ is to be counted $i$ times as a
root of $f$. That each $f_i$ has $\deg(f_i)$ roots is the so-called Fundamental Theorem of Algebra.
In this case, the two roots are given by the two possible signs of the square
root, and $\sqrt0$ is assumed to have both positive and negative signs.
3.1.2 Cubic Equations
There is a formula for the solutions of the cubic equation
$$x^3 + ax^2 + bx + c, \qquad(3.6)$$
albeit less well-known to schoolchildren:
$$\frac16\sqrt[3]{36ba - 108c - 8a^3 + 12\sqrt{12b^3 - 3b^2a^2 - 54bac + 81c^2 + 12ca^3}} - \frac{2b - \frac23a^2}{\sqrt[3]{36ba - 108c - 8a^3 + 12\sqrt{12b^3 - 3b^2a^2 - 54bac + 81c^2 + 12ca^3}}} - \frac13a.$$
We can simplify this by making a transformation1 to equation (3.6): replacing
$x$ by $x - \frac{a}{3}$. This transforms it into an equation
$$x^3 + bx + c \qquad(3.7)$$
(where $b$ and $c$ have changed). This has solutions of the form
$$\frac16\sqrt[3]{-108c + 12\sqrt{12b^3 + 81c^2}} - \frac{2b}{\sqrt[3]{-108c + 12\sqrt{12b^3 + 81c^2}}}. \qquad(3.8)$$
1This is the simplest case of the Tschirnhaus transformation [vT83], which can always
eliminate the $x^{n-1}$ term in a polynomial of degree $n$.
S := $\sqrt{12b^3 + 81c^2}$;
T := $\sqrt[3]{-108c + 12S}$;
return $\frac16T - \frac{2b}{T}$;
Figure 3.1: Program for computing solutions to a cubic
Now a cubic is meant to have three roots, but a naïve look at equation (3.8)
shows two cube roots, each with three values, and two square roots, each with
two values, apparently giving a total of $3\times3\times2\times2 = 36$ values. Even if
we decide that the two occurrences of the square root should have the same
sign, and similarly the cube root should have the same value, i.e. we effectively
execute the program in figure 3.1, we would still seem to have six possibilities.
In fact, however, the choice in the first line is only apparent, since
$$\frac16\sqrt[3]{-108c - 12\sqrt{12b^3 + 81c^2}} = \frac{2b}{\sqrt[3]{-108c + 12\sqrt{12b^3 + 81c^2}}}. \qquad(3.9)$$
In the case of the quadratic with real coefficients, there were two real solutions if $b^2 - 4ac > 0$, and complex solutions otherwise. However, the case of the
cubic is more challenging. If we consider $x^3 - 1 = 0$, we compute (in figure 3.1)
$$S := 9;\quad T := 6;\quad \text{return } 1;$$
(or either of the complex cube roots of unity if we choose different values of $T$).
If we consider $x^3 + 1 = 0$, we get
$$S := 9;\quad T := 0;\quad \text{return } \text{"}\tfrac00\text{"};$$
but we can (and must!) take advantage of equation (3.9) and compute
$$S := -9;\quad T := -6;\quad \text{return } -1;$$
(or either of the complex variants).
For $x^3 + x$, we compute
$$S := \sqrt{12};\quad T := \sqrt{12};\quad \text{return } 0;$$
and the two complex roots come from choosing the complex roots in the computation of $T$, which is really $\sqrt[3]{12\sqrt{12}}$. $x^3 - x$ is more challenging: we compute
$$S := \sqrt{-12};\quad T := \sqrt{-12};\quad \text{return } \{-1, 0, 1\}; \qquad(3.10)$$
i.e. three real roots which can only be computed (at least via this formula)
by means of complex numbers. In fact it is clear that any other formula must
have the same problem, since the only choices of ambiguity lie in the square and
cube roots, and with the cube root, the ambiguity involves complex cube roots
of unity.
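The program of figure 3.1 runs as written in complex floating point (a Python sketch of ours); iterating over the three cube-root choices exhibits (3.10), the three real roots of $x^3 - x$ emerging from complex intermediates:

import cmath

def cubic_roots(b, c):
    """All roots of x**3 + b*x + c via figure 3.1, one per cube-root choice."""
    S = cmath.sqrt(12*b**3 + 81*c**2)
    T0 = (-108*c + 12*S)**(1/3)          # principal cube root (nonzero here)
    w = cmath.exp(2j*cmath.pi/3)         # primitive cube root of unity
    return [T0*w**k/6 - 2*b/(T0*w**k) for k in range(3)]

# x^3 - x: S and T are complex, yet the three roots are the reals 1, -1, 0.
for r in cubic_roots(-1, 0):
    print(round(r.real, 10), round(r.imag, 10))
# (For x^3 + 1 the principal choice gives T = 0, the "0/0" of the text, and
# one must flip the sign of S as licensed by (3.9).)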
3.1.3 Quartic Equations
Here the equation would be $x^4 + ax^3 + bx^2 + cx + d$, but after the Tschirnhaus
transformation $x \to x - \frac{a}{4}$, analogous to that which took equation (3.6) to (3.7),
we can assume that $a = 0$. A truly marvellous solution then looks as follows
(but the page is too small to contain it!).
[Equation (3.11), which fills the rest of the page in the original, is the root written out in full radicals: every occurrence of $S$, $T$ and $U$ in Figure 3.2 below is expanded in place, the radicand $-288db + 108c^2 + 8b^3 + 12\sqrt{-768d^3 + 384d^2b^2 - 48db^4 - 432dbc^2 + 81c^4 + 12c^2b^3}$ appearing over and over again.] (3.11)
We can adopt the same formulation as in Figure 3.1, as shown in figure 3.2. Here
S := $\sqrt{-768d^3 + 384d^2b^2 - 48db^4 - 432dbc^2 + 81c^4 + 12c^2b^3}$;
T := $\sqrt[3]{-288db + 108c^2 + 8b^3 + 12S}$;
U := $\sqrt{\dfrac{-4bT + T^2 + 48d + 4b^2}{T}}$;
return $\dfrac{\sqrt6}{12}U + \dfrac{\sqrt6}{12}\sqrt{\dfrac{-\left(8bTU + UT^2 + 48Ud + 4Ub^2 + 12c\sqrt6\,T\right)}{TU}}$;
Figure 3.2: Program for computing solutions to a quartic
Here the problem of multiple choices is even more apparent, but in this formulation
it turns out that choices cancel, much as in the case of the cubic. We have
the same problem as in the case of the cubic, that real solutions can arise from
complex intermediates, but also that the answer apparently involves $\sqrt6$, even
though it clearly need not do so in reality. For example, with $x^4 - 5x^2 + 4$,
whose solutions are $\pm1, \pm2$, we can evaluate
$$S := 72\sqrt{-3};\quad T := 17 + \sqrt{-3};\quad U := 3\sqrt6;\quad \text{return } 2; \qquad(3.12)$$
taking the other square root at the end gives 1, and taking the other square root
when computing $U$ gives $-1$ or $-2$. We should also note that $T$ was evaluated
as $\sqrt[3]{4760 + 864\sqrt{-3}}$: not entirely obvious.
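Figure 3.2 also runs as written (a Python sketch of ours, using principal branches throughout); for $(b,c,d) = (-5,0,4)$ it returns the root 2 of (3.12):

import cmath

def quartic_root(b, c, d):
    """One root of x**4 + b*x**2 + c*x + d via figure 3.2 (principal branches)."""
    S = cmath.sqrt(-768*d**3 + 384*d**2*b**2 - 48*d*b**4
                   - 432*d*b*c**2 + 81*c**4 + 12*c**2*b**3)
    T = (-288*d*b + 108*c**2 + 8*b**3 + 12*S)**(1/3)
    U = cmath.sqrt((-4*b*T + T**2 + 48*d + 4*b**2)/T)
    r6 = cmath.sqrt(6)
    inner = -(8*b*T*U + U*T**2 + 48*U*d + 4*U*b**2 + 12*c*r6*T)/(T*U)
    return r6/12*U + r6/12*cmath.sqrt(inner)

print(quartic_root(-5, 0, 4))     # approximately (2+0j), as in (3.12)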
3.1.4 Higher Degree Equations
When it comes to higher degree equations, the situation is very di↵erent.
Theorem 10 (Abel, Galois [Gal79]) The general polynomial equation of de-
gree 5 or more is not soluble in radicals (i.e. in terms of k-th roots).
3.1. EQUATIONS IN ONE VARIABLE 81
In fact,² if we choose such a polynomial "at random", the probability of its having a solution that can be expressed in terms of radicals is zero. Of course, any particular quintic, or higher degree equation, may have solutions expressible in radicals, such as $x^{5}-2$, one of whose solutions is $\sqrt[5]{2}$, but this is the exception rather than the rule.
Hence algebra systems, if they handle such concepts, can only regard the roots of such equations as being defined by the polynomial of which they are a root. A Maple example³ is given in figure 1.4, where the Maple operator RootOf is generated. It is normal to insist that the argument to RootOf (or its equivalent) is square-free: then differently-indexed roots are genuinely different. Then $\alpha$, the first root of $f(x)$, satisfies $f(\alpha)=0$; the second root $\beta$ is a root of $f(x)/(x-\alpha)$, and so on. Even if $f$ is irreducible, these later polynomials may not be, but determining the factorisations, if they exist, is a piece of Galois theory which would take us too far out of our way [FM89]. It is, however, comparatively easy to answer the Monte Carlo version of the question: "such factorisations definitely do not exist"/"they probably do exist" [DS00].
3.1.5 Reducible defining polynomials
It should be noted that handling such constructs when the defining polynomial is not irreducible can give rise to unexpected results. For example, in Maple, if $\alpha$ is RootOf(x^2-1,x), then $\frac{1}{\alpha-1}$ is returned unevaluated, but attempting to evaluate this numerically gives infinity, which is right if $\alpha=1$, but wrong if $\alpha=-1$, the other, equally valid, root of $x^{2}-1$. In this case, the mathematical answer to "is $\alpha-1$ zero?" is neither 'yes' nor 'no', but rather 'it depends which $\alpha$ you mean', and Maple is choosing the $\alpha=1$ value (as we can see from $\frac{1}{\alpha+1}$, which evaluates to 0.5). However, the ability to use polynomials not guaranteed to be irreducible can be useful in some cases (see section 3.3.7). In particular, algorithm 11 asks if certain expressions are invertible, and a 'no' answer here entrains a splitting into cases, just as asking "is $\alpha-1$ zero?" entrains a splitting of RootOf(x^2-1,x).
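This behaviour can be reproduced in a short Maple session (a sketch: the printed forms are indicative, and vary between Maple versions):

> alpha := RootOf(x^2-1, x):
> 1/(alpha-1);           # returned unevaluated, still in terms of the RootOf
> evalf(1/(alpha-1));    # Float(infinity): Maple has chosen the root alpha = 1
> evalf(1/(alpha+1));    # 0.5000000000, consistent with that choice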
In general, suppose we are asking if $g(\alpha)$ is invertible, where $\alpha=\mathrm{RootOf}(f(x),x)$, i.e. we are asking for $d(\alpha)$ such that $d(\alpha)g(\alpha)=1$ after taking account of the fact that $\alpha=\mathrm{RootOf}(f(x),x)$. This is tantamount to asking for $d(x)$ such that $d(x)g(x)=1$ modulo $f(x)=0$, i.e. $d(x)g(x)+c(x)f(x)=1$ for some $c(x)$. But applying the Extended Euclidean Algorithm (Algorithm 5) to $f$ and $g$ gives us $c$ and $d$ such that $cf+dg=\gcd(f,g)$. Hence if the gcd is in fact 1, $g(\alpha)$ is invertible, and we have found the inverse.
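For example, to invert $g=x-3$ modulo $f=x^{2}-2$, i.e. to compute $1/(\alpha-3)$ for $\alpha=\sqrt{2}$, we can use Maple's gcdex, which implements the Extended Euclidean Algorithm (a sketch; the exact printed form of d may differ):

> gcdex(x-3, x^2-2, x, 'd', 'c');   # gcd is 1, so x-3 is invertible
> d;                                # -(x+3)/7

Indeed $-(\alpha+3)(\alpha-3)/7=-(\alpha^{2}-9)/7=-(2-9)/7=1$.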
If in fact the gcd is not 1, say some $h(x)\ne1$, then we have split $f$ as $f=h\hat{h}$,
²The precise statement is as follows. For all $n\ge5$, the fraction of polynomials of degree $n$ and coefficients at most $H$ which have a root expressible in radicals tends to zero as $H$ tends to infinity.
³By default, Maple will also use this formulation for roots of most quartics, and the expression in figure 3.2 is obtained by convert(%,radical) and then locating the sub-expressions by hand. This can be seen as an application of Carette's view of simplification (page 25), though historically Carette's paper is a retrospective justification.
where $\hat{h}=f/h$. Now
$$\alpha=\mathrm{RootOf}(f(x),x)\ \Rightarrow\ \alpha=\mathrm{RootOf}(h(x),x)\ \lor\ \alpha=\mathrm{RootOf}(\hat{h}(x),x),$$
and in the first case $g(\alpha)$ is definitely zero, while the second case requires us to consider $\gcd(g,\hat{h})$; since $\hat{h}$ has lower degree than $f$, this splitting process terminates.
3.1.6 Multiple Algebraic Numbers
The situation gets more complicated if we have several such algebraic numbers
in play. By hand, such situations tend to be bypassed almost instinctively: if there are $\sqrt{2}$ around, we replace $\sqrt{8}$ by $2\sqrt{2}$, and if $\sqrt{2}$ and $\sqrt{3}$ are around, we replace $\sqrt{6}$ by $\sqrt{2}\,\sqrt{3}$. For positive radicals, such an approach is good enough if correctly formalised.
Proposition 21 Let the $N_i$ be positive numbers, and the $M_j$ be a square-free basis for the $N_i$, i.e. the $M_j$ are relatively prime and have no repeated factors, and each $N_i=\prod_j M_j^{n_{i,j}}$. Then the $k$-th roots of the $M_j$ form a multiplicative basis for the $k$-th roots of the $N_i$, and the only relations are $\bigl(M_j^{1/k}\bigr)^{k}=M_j$.
If we allow negative numbers, we have to deal with the catch that $\sqrt[4]{-4}=1+i$, which corresponds to the fact that $x^{4}+4=(x^{2}+2x+2)(x^{2}-2x+2)$, but this is all [Sch00a, Theorem 19].
Hence, given a set $S$ of expressions involving non-nested radicals, we can compute the corresponding square-free basis $M_j$ for the $N_i$ occurring in the radicals, let $k$ be the least common multiple of the denominators of the exponents, and express every number as $N_0\prod_j M_j^{\alpha_j/k}$ where $N_0\in\mathbf{Q}$ and $0\le\alpha_j<k$.
This representation is locally canonical (Definition 5): every number has a unique representation until a new radical is introduced. However, it is only locally canonical, not canonical: in different contexts we could have $5^{1/2}6^{1/2}$ and $3^{1/2}10^{1/2}$, which are both valid, but equal. If we had them both in the same context, they would both become $2^{1/2}3^{1/2}5^{1/2}$. We note that this representation is not candid (Definition 6), since, if we have some $n^{1/6}$, thereafter we represent $m^{1/2}$ as $m^{3/6}$. This is easily solved by cancelling common factors in $\alpha_j/k$ on printing, while preserving the common denominator internally.
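Maple's radnormal performs rewriting of this kind over such a basis (a small sketch; the normal form actually chosen is version-dependent):

> radnormal(sqrt(8));                    # 2*2^(1/2), i.e. 2 sqrt(2)
> radnormal(sqrt(2)*sqrt(3) - sqrt(6));  # 0: the relation is recognised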
Integer g.c.d. is an efficient process, and can compute a relatively prime basis of the $N_i$ efficiently. However the square-free aspect is more troublesome, and indeed is believed to be as hard as integer factorisation in general. There are various reasons why we want a square-free basis.

1. We certainly want to avoid expressions like $4^{1/2}$ or $27^{1/3}$, and we would probably want to replace $4^{1/4}$ by $2^{1/2}$. This can be achieved by checking the $M_j$ for being perfect powers, and this can be done efficiently: [BS93] shows $O(\log^{2}n)$. Failing to do this would mean that our expressions were not even locally canonical, or indeed normal: consider $2-4^{1/2}$.
2. Expressions like $z:=72^{1/6}$ are troublesome, as $z^{2}$ will be stored internally as $72^{2/6}$ and printed as $72^{1/3}$, whereas a human being would prefer to see $2\cdot9^{1/3}$, or even better $2\cdot3^{2/3}$. Hence it could be argued that there is scope for non-candid expressions in this case.
There is further discussion of non-nested radicals in [RS13]: in particular they point out that if we are prepared to do complete factorisation of the $N_i$ into primes we can have truly canonical representations.
In terms of the RootOf construct, we see that $\sqrt{2}$ is actually $\alpha=\mathrm{RootOf}(x^{2}-2,x)$ and $\sqrt{8}$ is actually $\beta=\mathrm{RootOf}(x^{2}-8,x)$. Now both $x^{2}-2$ and $x^{2}-8$ are irreducible polynomials over the integers. But, in the presence of $\alpha$, $x^{2}-8$ factors as $(x-2\alpha)(x+2\alpha)$. The "square-free basis" technique of the previous paragraphs spots this factorisation directly for non-nested radicals, but in general we are led to the complications of factorisation in the presence of algebraic numbers (Section 6.3).
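In Maple, the factorisation in the presence of $\alpha$ can be observed with evala (a sketch; output is in Maple's _Z-based RootOf notation):

> alpha := RootOf(x^2-2, x):
> evala(Factor(x^2-8, alpha));
        # (x - 2*RootOf(_Z^2-2)) * (x + 2*RootOf(_Z^2-2))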
3.1.7 Solutions in Real Radicals
We have seen above, both in the case of the cubic, equation (3.10), and the quartic, equation (3.12), that real roots may need to be expressed via complex radicals, even if all the roots are real. Indeed, in the case of the cubic, this is necessary. However, the quartic $x^{4}+4x^{3}+x^{2}-6x+2$, whose roots are
$$\left\{-1+\sqrt{3},\,-1-\sqrt{3},\,-1+\sqrt{2},\,-1-\sqrt{2}\right\}$$
shows that polynomials can have real roots expressible in terms of real radicals, and a slightly less obvious example is given by $x^{4}+4x^{3}-44x^{2}-96x+552$, whose roots are
$$\left\{-1-\sqrt{25+2\sqrt{6}},\,-1+\sqrt{25+2\sqrt{6}},\,-1-\sqrt{25-2\sqrt{6}},\,-1+\sqrt{25-2\sqrt{6}}\right\}.$$
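Maple produces these real radical expressions directly (a sketch; as in the footnote to Figure 3.2, solve returns RootOf constructs for most quartics, which convert(...,radical) then expands):

> sols := [solve(x^4+4*x^3-44*x^2-96*x+552 = 0, x)]:
> convert(sols, radical);   # the four roots -1 +- sqrt(25 +- 2*sqrt(6))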
There is a little-known theorem in this area.
Theorem 11 ([Isa85]) Suppose that all the roots of an irreducible polynomial
f(x) over Q are real. Then if any root of the polynomial is expressible in radicals,
the degree of the polynomial must be a power of two.
3.1.8 Equations of curves
For a fuller description of this topic, see [Ful69]. In particular, we only consider the affine case, whereas the projective case (i.e. allowing for "points at infinity") is in many ways more general.

Definition 40 An (affine) algebraic curve $C(x_1,\ldots,x_n)$ in $n$ dimensions over a field $K$ is the set of solutions $(x_1,\ldots,x_n)$ to $n-1$ independent algebraic equations, i.e. polynomials $g_i(x_1,\ldots,x_n)=0$.
If n = 2 we say that we have a plane algebraic curve.
Of course, the precise curve and equations are often not very interesting: for instance we would like to think that the parabola $x_1^{2}-x_2$ was "the same" as $y_1-y_2^{2}$, and so on.
Definition 41 Two curves $C(x_1,\ldots,x_n)$ and $C'(y_1,\ldots,y_m)$ are said to be birationally equivalent if there are two families of rational functions
$$F=(f_1(x_1,\ldots,x_n),\ldots,f_m(x_1,\ldots,x_n))$$
and
$$G=(g_1(y_1,\ldots,y_m),\ldots,g_n(y_1,\ldots,y_m))$$
such that:

1. for almost all $(x_1,\ldots,x_n)\in C$, $(f_1(x_1,\ldots,x_n),\ldots,f_m(x_1,\ldots,x_n))$ is defined and $\in C'$;

2. for almost all $(y_1,\ldots,y_m)\in C'$, $(g_1(y_1,\ldots,y_m),\ldots,g_n(y_1,\ldots,y_m))$ is defined and $\in C$;

3. almost everywhere, $F$ and $G$ are mutually inverse, i.e.
$$f_i(g_1(y_1,\ldots,y_m),\ldots,g_n(y_1,\ldots,y_m))=y_i$$
and
$$g_j(f_1(x_1,\ldots,x_n),\ldots,f_m(x_1,\ldots,x_n))=x_j.$$

"Almost everywhere" means "on a non-empty Zariski open set" [Ful69], and can be thought of as "except where we get $\frac{0}{0}$ behaviour".
Theorem 12 Every algebraic curve is birationally equivalent to a plane curve.
Proof. If there are more than two variables, there is more than one equation,
and we can use resultants to eliminate one variable and one equation.
We then have the concept [Sen08] of a curve being soluble by radicals. In this case, the generic curve of degree greater than six is not soluble by radicals [Zar26]. However, many "interesting" curves are soluble by radicals.
Proposition 22 [Sen08, Corollary 3.2] Every irreducible plane curve of degree
at most five is soluble by radicals.
Proposition 23 [Sen08, Corollary 3.3] Every irreducible singular plane curve
of degree at most six is soluble by radicals.
Algorithms to compute these expressions are given in [Har11, SS11].
It is also the case⁴ that the offset to a curve soluble by radicals, i.e. the curve defined as the set of points at a fixed distance $d$ from the original curve, is also soluble by radicals.
3.1.9 How many Real Roots?
While a polynomial of degree $n$ has $n$ complex roots, it generally has fewer real ones, though how many, on average, is an interesting question, depending on the definition of "on average". The 'obvious' definitions would be 'normally distributed' or 'uniformly distributed in some range', and for these Kac [Kac43] shows that the average number is $\frac{2}{\pi}\log n$. A definition with better geometric invariance properties gives $\sqrt{\frac{n(n+2)}{3}}$: very different [LL12].
We have seen that it is not obvious how many real roots a polynomial has:
can we answer that question, or more formally the following?
Problem 1 Given a square-free polynomial $f$, determine how many real roots $f$ has, and describe each real root sufficiently precisely to distinguish it from the others. Many authors have asked the same question about non-square-free polynomials, and have laboured to produce better theoretical complexity bounds, since the square-free part of a polynomial may have larger coefficients than the original polynomial. However, in practice it is always better to compute the square-free part first.
The usual description (but see Section 3.1.10) is to enclose the root in an interval,
within which it is the only root.
Definition 36 introduced the concept of a signed polynomial remainder sequence, also called a Sturm–Habicht sequence: $f_i$ is proportional by a positive constant to $-\mathrm{rem}(f_{i-2},f_{i-1})$. The positive constant is normally chosen to keep the coefficients integral and as small as possible.

Definition 42 If $f(x)$ is a square-free polynomial and $a\in\mathbf{R}\cup\{-\infty,\infty\}$, let $V_f(a)$ denote the number of sign changes in the sequence $f_0(a),f_1(a),\ldots,f_n(a)$, where $f_0,\ldots,f_n$ is the Sturm–Habicht sequence of $f$ and $f'$, also known as the Sturm sequence of $f$.

If $f$ is not square-free, we need more careful definitions [BPR06], and to be clear whether we are counting with multiplicity or not.
Theorem 13 (Sturm) If $a<b$ are not zeros of $f$, and $f$ is square-free, then $V_f(a)-V_f(b)$ is the number of zeros of $f$ in $(a,b)$.
⁴Unpublished. Prof. Sendra has supplied this proof for plane curves. Let $(R_1,R_2)$ be a square parametrization of the curve and $K_0=\mathbf{C}(t)\subset K_1\subset\cdots\subset K_s$ be a (radical) field tower such that $R_1,R_2\in K_s$. Considering the formal derivation with respect to $t$, one can deduce (for instance by induction on $s$) that if $R\in K_s$ then its derivative $R'$ is also in $K_s$. Now consider $a=(R_1')^{2}+(R_2')^{2}\in K_s$ and $K_{s+1}=K_s(\sqrt{a})$; then $(O_1,O_2)=(R_1,R_2)\pm (d/\sqrt{a})\,(-(R_2)',(R_1)')\in(K_{s+1})^{2}$. So $(O_1,O_2)$ is radical with the tower $K_0\subset\cdots\subset K_s\subset K_{s+1}$.
$V_f(\infty)$ (which can be regarded as $\lim_{a\to\infty}V_f(a)$) can be computed as the number of sign changes in the sequence of leading coefficients of the Sturm sequence. Similarly, $V_f(-\infty)$ is the number of sign changes in the sequence of leading coefficients of the Sturm sequence with the signs of the odd-degree terms reversed. Hence $V_f(-\infty)-V_f(\infty)$, the total number of real roots, is easily computed from the Sturm sequence.
Example 4 Let $f$ be the polynomial $x^{5}-15x^{4}+85x^{3}-225x^{2}+274x-120$. Then a Sturm–Habicht sequence (just taking pseudo-remainders) is
$$f_0(x)=f=x^{5}-15x^{4}+85x^{3}-225x^{2}+274x-120$$
$$f_1(x)=f'(x)=5x^{4}-60x^{3}+255x^{2}-450x+274$$
$$f_2(x)=-\mathrm{prem}(f_0,f_1)=50x^{3}-450x^{2}+1270x-1110$$
$$f_3(x)=-\mathrm{prem}(f_1,f_2)=17500x^{2}-105000x+147500$$
$$f_4(x)=-\mathrm{prem}(f_2,f_3)=15750000000x-47250000000$$
$$f_5(x)=-\mathrm{prem}(f_3,f_4)=2480625000000000000000000$$
Since all the leading coefficients are positive, the sign sequence at $\infty$ is $+,+,+,+,+,+$ and $V_f(\infty)=0$. Similarly, the sign sequence at $-\infty$ is $-,+,-,+,-,+$, so $V_f(-\infty)=5$ and there are five real roots. Since $f_i(0)$ is just the trailing coefficient of $f_i$, we see that $V_f(0)$ is the number of variations in $-,+,-,+,-,+$, also 5. Hence there are no roots between $-\infty$ and 0, and five roots between 0 and $\infty$. A longer version can be found at http://staff.bath.ac.uk/masjhd/JHD-CA/SHexample.html.
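The classical Maple commands sturmseq and sturm reproduce this count (a sketch; sturmseq normalises the sequence differently from the pseudo-remainder version above, but the sign variations agree):

> f := x^5-15*x^4+85*x^3-225*x^2+274*x-120:
> s := sturmseq(f, x):
> sturm(s, x, 0, infinity);    # 5 real roots in (0, infinity]
> sturm(s, x, -infinity, 0);   # 0 real roots in (-infinity, 0]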
Theorem 14 If $f$ is a square-free polynomial of degree $d$ and coefficients less than $2^{L}$, then the number of subdivisions of $(-\infty,\infty)$ required is $O(d(L+\log d))$ [Dav85a]. If $L>\log d$, then the number of subdivisions required can be $\Omega(d(L+\log d))$ [ESY06]. Hence if $L>\log d$, the number of subdivisions required is $\Theta(d(L+\log d))$.
While the obvious way of computing $V_f(a)$ is by the definition, i.e. evaluating $f_0(a),\ldots$, this turns out not to be the most efficient. Rather, while computing the Sturm sequence $f_0,\ldots$, we should also store the quotients $q_i$, so that $f_i(x)=-\left(f_{i-2}(x)-q_i(x)f_{i-1}(x)\right)$. We then compute as follows.
Algorithm 8 (Sturm Sequence evaluation)
Input: $a$: a number;
$f_n(x)$: last non-zero element of Sturm sequence of $f$;
$q_i(x)$: quotient sequence from Sturm sequence of $f$.
Output: Sequence $L$ of $f_n(a), f_{n-1}(a),\ldots,f_0(a)$.

$L[n] := f_n(a)$;
$L[n-1] := q_{n+1}(a)L[n]$;
for $i = n\ldots2$
  $L[i-2] := q_i(a)L[i-1]-L[i]$;
return $L$
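A minimal Maple realisation of Algorithm 8 (a sketch: we assume the quotients have been saved in a table q indexed 2..n+1 while the Sturm sequence was being computed):

SturmEval := proc(a, fn, q, n, x)
  local L, i;
  L := Array(0..n);
  L[n] := eval(fn, x=a);
  L[n-1] := eval(q[n+1], x=a)*L[n];
  for i from n by -1 to 2 do
    # the recurrence f[i-2] = q[i]*f[i-1] - f[i]
    L[i-2] := eval(q[i], x=a)*L[i-1] - L[i];
  end do;
  L;
end proc: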
If $f$ has degree $d$, coefficients of bit-length at most $\tau$, and $a$ has numerator and denominator of bit-length $\sigma$, this algorithm has asymptotic complexity $\tilde{O}(d^{2}\max(\sigma,\tau))$ [LR01].
Since it is possible to say how big the roots of a polynomial can be (propositions 89, 90 and 91), we can determine, as precisely as we wish, the location of the real roots of a univariate polynomial: every time the Sturm sequence says that there is more than one root in an interval, we divide the interval in two, and re-compute $V(a)-V(b)$ for each half.
This is far from the only way of counting and locating real roots, i.e. solving Problem 1: other methods are based on Descartes'⁵ rule of signs (Theorem 52: the number of roots of $f$ in $(0,\infty)$ is less than or equal to, by an even number, the number of sign changes in the coefficients of $f$) [CA76], its generalisation the Budan–Fourier theorem [Hur12] (Corollaries 33 and 34: the number of roots of $f$ in⁶ $[a,b]$ is less than or equal to, by an even number, the number of sign changes in the derivatives of $f$ evaluated at $a$ (i.e. $f(a),f'(a),f''(a),\ldots$) less the same evaluated at $b$), on continued fractions [TE07], or on numerical methods [Pan02].
All such algorithms take time polynomial in $d$, the degree of the polynomial whose real roots are being counted/determined, i.e. they are algorithms for the dense model. The best results are in [Bur13], and are $\tilde{O}(d^{4}L^{2})$ for coefficients less than $2^{L}$.
In the sparse model, we have theorems, but are short on algorithms.
Proposition 24 A polynomial with $t$ non-zero terms has at most $2t-1$ real roots (not counted with multiplicity).

Since the number of sign changes in a sequence of $t$ terms is at most $t-1$, there are at most $2t-1$ real roots ($t-1$ positive, $t-1$ negative, and zero: consider $x^{3}-x$) for a polynomial with $t$ non-zero terms, irrespective of $d$.
Open Problem 4 (Roots of Sparse Polynomials) Find algorithms for counting the number of real roots whose complexity depends polynomially on $t$ alone, or on $t$ and $\log d$. There is recent progress described in [BHPR11]: notably a probabilistic algorithm when $t=4$. See also [Sag14], which depends polynomially on $t$ and $\log d$, and linearly on the sparse bit size.
In fact, we can say more than Proposition 24.
Proposition 25 ([KPT12, Theorem 9]) Let $g$ be a polynomial of the form
$$g(x)=\sum_{i=1}^{k}a_i\prod_{j=1}^{m}f_j^{\alpha_{i,j}}$$
where $a_i\in\mathbf{R}$, $\alpha_{i,j}\in\mathbf{N}$ and the $f_j$ have at most $t$ non-zero terms. Then the number of real roots of $g$ (not counted with multiplicity) is at most $2k^{2}t^{k^{2}m/2}+2kmt=2^{O(k^{2}m\log t)}$, and in particular is polynomial in $t$.
5This rule is always called after Descartes, though the proof actually seems to be due to
Gauss [BF93].
6We assume neither a nor b are roots.
If all the $\alpha_{i,j}\in\{0,1\}$ the result would be trivial, as $g$ would have at most $kt^{m}$ non-zero terms. If $k=1$ the result is also trivial. The point is that we are allowing a slightly wider range of polynomials. However, Proposition 25 is probably a long way from the truth. The following example⁷ illustrates this.
Open Problem 5 (fg + 1) Suppose $f$ and $g$ each have $t$ terms. Then (Proposition 24) each has at most $2t-1$ real roots (and exactly $2t-1$ only if 0 is included). Hence $fg$, which might have $t^{2}$ terms, has at most $4t-3$ real roots (since we shouldn't count 0 twice). How many real roots does $fg+1$ have? It is not difficult to make it have $4t-2$ roots (since 0 might be a double root of $fg$, and we can split that into two roots), but nothing more is known.
Problem 2 Having solved Problem 1, we may actually wish to know the roots to a given accuracy, say $L$ bits after the point. Again, many authors have asked the same question about non-square-free polynomials, and have laboured to produce better theoretical complexity bounds, since the square-free part of a polynomial may have larger coefficients than the original polynomial. However, in practice it is always better to compute the square-free part first.

The best published solution to this problem currently is that in [KS11]. TO BE COMPLETED
3.1.10 Thom’s Lemma
An alternative approach to the description of real roots is provided by Thom’s
Lemma, and elaborated in [CR88].
Notation 21 A sign condition $\epsilon_i$ is any of the symbols ">0", "<0", "=0". A generalised sign condition is any of these or "$\geq0$", "$\leq0$". If $\epsilon=(\epsilon_0,\ldots,\epsilon_{n-1})$ is any $n$-tuple of generalised sign conditions, we denote by $\overline{\epsilon}$ the relaxation of $\epsilon$, obtained by replacing $<0$ by $\leq0$ and $>0$ by $\geq0$.
Lemma 4 (Thom's Lemma [CR88, Proposition 1.2]) Let $p$ be a polynomial of degree $n$, and $\epsilon$ an $n$-tuple of sign conditions. Let
$$A(\epsilon)=\left\{x\in\mathbf{R}\mid\forall i\ p^{(i)}(x)\,\epsilon_i\right\},$$
where $p^{(i)}$ denotes the $i$-th formal derivative (Definition 37) of $p$. Then:

(a) $A(\epsilon)$ is either empty or connected, i.e. an interval;

(b) if $A(\epsilon)$ is an interval, then $A(\overline{\epsilon})$ is the closure of $A(\epsilon)$, obtained by replacing $<r$ by $\leq r$ and $>l$ by $\geq l$.
The proof is by induction on $\deg(p)$. Item (b) may seem obvious, but is in fact rather subtle. Consider $p=x^{3}-x^{2}$, depicted in Figure 3.3. While $\{x\mid p(x)>0\}$ is the interval $(1,\infty)$, $\{x\mid p(x)\geq0\}$ is not the interval $[1,\infty)$, but rather $\{0\}\cup[1,\infty)$. Thom's Lemma actually talks about $A(\epsilon)$ with all the signs fixed, and the sign of $p''$ distinguishes $\{0\}$ from $[1,\infty)$.

Figure 3.3: $x^{3}-x^{2}$ illustrating Thom's Lemma
Corollary 3 A root $x_0$ of $p$, hence a point where $p^{(0)}(x_0)=0$, is uniquely determined by the signs of all the derivatives of $p$ there.
It is possible to compute with real algebraic numbers defined this way, and answer questions such as "how many roots does $q(x,y)$ have, when $y$ is the (unique) root of $p(y)=0$ with the following sign conditions: $p'(y)<0$, $p''(y)>0$, etc.?": again, for details see [CR88].
3.2 Linear Equations in Several Variables
We now consider the case of several polynomial equations in several (not necessarily the same number of) variables.
Notation 22 The variables will be called x1, . . . , xn, though in specific examples
we may use x, y or x, y, z etc.
3.2.1 Linear Equations and Matrices
A typical set of 3-by-3 linear equations might look like the following.
$$2x+3y-4z=a;$$
7In [KPT12], but the author is grateful to Professor Koiran for explaining its significance
in a June 2015 Dagstuhl seminar.
$$3x-2y-z=b;$$
$$4x-3y-z=c.$$
If we denote by $M$ the matrix $\begin{pmatrix}2&3&-4\\3&-2&-1\\4&-3&-1\end{pmatrix}$, $\mathbf{x}$ the (column) vector $(x,y,z)$ and $\mathbf{a}$ the (column) vector $(a,b,c)$, then this becomes the single matrix equation
$$M.\mathbf{x}=\mathbf{a},\qquad(3.13)$$
which has, assuming $M$ is invertible, the well-known solution
$$\mathbf{x}=M^{-1}.\mathbf{a}.\qquad(3.14)$$
This poses two questions: how do we store matrices, and how do we compute inverses?
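Before turning to these questions, we note that Maple will solve the system directly (a sketch), reproducing the solution deduced by Gaussian elimination in Section 3.2.3 below:

> M := Matrix([[2,3,-4],[3,-2,-1],[4,-3,-1]]):
> LinearAlgebra:-LinearSolve(M, Vector([a,b,c]));
        # the vector (a-15b+11c, a-14b+10c, a-18b+13c)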
3.2.2 Representations of Matrices
The first question that comes to mind here is "dense or sparse?", as in definition 26 (page 43). For a dense representation of matrices, the solution is obvious: we store a two-dimensional array (or a one-dimensional array of one-dimensional arrays if our language does not support two-dimensional arrays) containing the values $m_{i,j}$ of the elements of the matrix. The algorithms for adding and multiplying dense matrices are pretty obvious, though in fact it is possible to multiply two $2\times2$ matrices with seven multiplications of entries rather than the obvious eight [Str69, Win71]: this leads to being able to multiply two $n\times n$ matrices with $O(n^{\log_2 7}\approx n^{2.807})$ element multiplications rather than $O(n^{3})$: see Excursus B.4.
For a sparse representation of matrices, we have more choices.
row-sparse Here $M$ is stored as a one-dimensional array of rows: the $i$-th row consisting of a list of pairs $(j,m_{i,j})$. This representation is equivalent to that of the sparse polynomial $\sum_j m_{i,j}x^{j}$, and this technique has been used in practice [CD85] and has a useful analogue in the case of non-linear equations (see section 3.3.5).

column-sparse Here $M$ is stored as a one-dimensional array of columns: the $j$-th column consisting of a list of pairs $(i,m_{i,j})$.

totally sparse Here $M$ is stored as a list of triples $(i,j,m_{i,j})$.
structured There are a large variety of special structures of matrices familiar to numerical analysts, such as 'banded', 'Toeplitz' etc. Each of these can be stored efficiently according to their special form. For example, circulant (a special form of Toeplitz) matrices, of the form
$$\begin{pmatrix}a_1&a_2&a_3&\ldots&a_{n-1}&a_n\\a_n&a_1&a_2&a_3&\ldots&a_{n-1}\\\vdots&\ddots&\cdots&\cdots&\ddots&\vdots\\a_2&a_3&\ldots&a_{n-1}&a_n&a_1\end{pmatrix},$$
are, strictly speaking, dense, but only have $n$ distinct entries, so are "information-sparse". Recognizing these structures is not just a matter of storage economy: there are also faster algorithms, e.g. [KMS13] can solve an $n\times n$ Toeplitz system (3.13) in time $O(n\log^{2}n)$, rather than the 'obvious' $O(n^{3})$ or better $O(n^{2.\cdots})$ (see Section B.4).

Clearly, if the matrix is structured, we should use the corresponding representation. For randomly sparse matrices, the choice depends on what we are doing with the matrix: if it is row operations, then row-sparse is best, and so on. One big issue with sparse matrices is known as fill-in: the tendency for operations on sparse matrices to yield less sparse matrices. For example, if we multiply two $n\times n$ matrices, each with $e$ non-zero elements per row, with $n\gg e$, we would expect, assuming the non-zero elements are scattered at random, the resulting matrix to have $e^{2}$ non-zero elements per row. This has some apparently paradoxical consequences. Suppose $M$ and $N$ are two such matrices, and we wish to compute $MN\mathbf{v}$ for many vectors $\mathbf{v}$. Clearly, we compute $MN$ once and for all, and multiply this by each $\mathbf{v}$; for dense $M$ and $N$, this is right if there are more than $n$ such vectors $\mathbf{v}$. But, if $n\gg e>2$, this is not optimal, since, once $MN$ is computed, computing $(MN)\mathbf{v}$ requires $ne^{2}$ operations, while computing $N\mathbf{v}$ requires $ne$, as does computing $M(N\mathbf{v})$, totalling $2ne<ne^{2}$.
3.2.3 Matrix Inverses: not a good idea!
The first response to the question "how do we compute matrix inverses" ought to be "are you sure you want to?" Solving (as opposed to thinking about the solution of) equation (3.13) via equation (3.14) is often not the best way to proceed in computer algebra⁸. Gaussian elimination (possibly using some of the techniques described later in this section) directly on equation (3.13) is generally the best way. This is particularly true if $M$ is sparse, since $M^{-1}$ is generally not sparse (an extreme example of fill-in). Indeed, special techniques are generally used for the solution of large sparse systems, particularly those arising in integer factorisation or other cryptographic applications [HD03].
The usual method of solving linear equations, or computing the inverse of a matrix, is via Gaussian elimination, i.e. transforming equation (3.13) into one in which $M$ is upper triangular, and then back-substituting. This transformation is done by row operations, which amount to adding/subtracting multiples of one row from another, since
$$P=Q\ \&\ R=S\text{ implies }P+\lambda R=Q+\lambda S.\qquad(3.15)$$
If we try this on the above example, we deduce successively that $z=a-18b+13c$, $y=a-14b+10c$ and $x=a-15b+11c$. Emboldened by this we might try a
8Or elsewhere: “To most numerical analysts, matrix inversion is a sin” [Hig02, p. 260].
larger matrix:
$$M=\begin{pmatrix}a&b&c&d\\e&f&g&h\\i&j&k&l\\m&n&o&p\end{pmatrix}.\qquad(3.16)$$
After clearing out the first column, we get the matrix
$$\begin{pmatrix}a&b&c&d\\[2pt]0&-\frac{eb}{a}+f&-\frac{ec}{a}+g&-\frac{ed}{a}+h\\[2pt]0&-\frac{ib}{a}+j&-\frac{ic}{a}+k&-\frac{id}{a}+l\\[2pt]0&-\frac{mb}{a}+n&-\frac{mc}{a}+o&-\frac{md}{a}+p\end{pmatrix}.$$
Clearing the second column gives us
$$\begin{pmatrix}a&b&c&d\\[2pt]0&-\frac{eb}{a}+f&-\frac{ec}{a}+g&-\frac{ed}{a}+h\\[2pt]0&0&-\frac{\left(-\frac{ib}{a}+j\right)\left(-\frac{ec}{a}+g\right)}{-\frac{eb}{a}+f}-\frac{ic}{a}+k&-\frac{\left(-\frac{ib}{a}+j\right)\left(-\frac{ed}{a}+h\right)}{-\frac{eb}{a}+f}-\frac{id}{a}+l\\[2pt]0&0&-\frac{\left(-\frac{mb}{a}+n\right)\left(-\frac{ec}{a}+g\right)}{-\frac{eb}{a}+f}-\frac{mc}{a}+o&\frac{\left(\frac{mb}{a}-n\right)\left(-\frac{ed}{a}+h\right)}{-\frac{eb}{a}+f}-\frac{md}{a}+p\end{pmatrix},$$
which we can "simplify" to
$$\begin{pmatrix}a&b&c&d\\[2pt]0&\frac{-eb+af}{a}&\frac{-ec+ag}{a}&\frac{-ed+ah}{a}\\[2pt]0&0&\frac{afk-agj-ebk+ecj+ibg-icf}{-eb+af}&\frac{afl-ahj-ebl+edj+ibh-idf}{-eb+af}\\[2pt]0&0&\frac{afo-agn-ebo+ecn+mbg-mcf}{-eb+af}&\frac{afp-ahn-ebp+edn+mbh-mdf}{-eb+af}\end{pmatrix}.\qquad(3.17)$$
After clearing the third column, the last element of the matrix is
$$-\left(\frac{-\left(-\frac{ib}{a}+j\right)\left(-\frac{ed}{a}+h\right)}{-\frac{eb}{a}+f}-\frac{id}{a}+l\right)\left(\frac{-\left(-\frac{mb}{a}+n\right)\left(-\frac{ec}{a}+g\right)}{-\frac{eb}{a}+f}-\frac{mc}{a}+o\right)\times\left(\frac{-\left(-\frac{ib}{a}+j\right)\left(-\frac{ec}{a}+g\right)}{-\frac{eb}{a}+f}-\frac{ic}{a}+k\right)^{-1}-\frac{\left(-\frac{mb}{a}+n\right)\left(-\frac{ed}{a}+h\right)}{-\frac{eb}{a}+f}-\frac{md}{a}+p.$$
This simplifies to
$$\frac{\begin{array}{l}afkp-aflo-ajgp+ajho+angl-anhk-ebkp+eblo\\{}+ejcp-ejdo-encl+endk+ibgp-ibho-ifcp+ifdo\\{}+inch-indg-mbgl+mbhk+mfcl-mfdk-mjch+mjdg\end{array}}{afk-agj-ebk+ecj+ibg-icf}.\qquad(3.18)$$
The numerator of this expression is in fact the determinant of the original matrix, $|M|$.

In general, for an $n\times n$ matrix, we would perform $O(n^{3})$ computations with rational functions, which would, if we were to simplify, involve g.c.d. computations, often costly.
Can we do better? We could take a leaf out of the calculation on page 66, and not introduce fractions, but rather cross-multiply. If we do this while clearing column one, we get
$$M_2:=\begin{pmatrix}a&b&c&d\\0&-eb+af&-ec+ag&-ed+ah\\0&aj-ib&ak-ic&al-id\\0&-mb+an&ao-mc&ap-md\end{pmatrix}.\qquad(3.19)$$
After clearing column two, we get
$$M_3:=\begin{pmatrix}a&b&c&d\\0&-eb+af&-ec+ag&-ed+ah\\0&0&\begin{array}{c}(-aj+ib)(-ec+ag)\\{}+(-eb+af)(ak-ic)\end{array}&\begin{array}{c}(-aj+ib)(-ed+ah)\\{}+(-eb+af)(al-id)\end{array}\\0&0&\begin{array}{c}(-an+mb)(-ec+ag)\\{}+(-eb+af)(ao-mc)\end{array}&\begin{array}{c}(-an+mb)(-ed+ah)\\{}+(-eb+af)(ap-md)\end{array}\end{pmatrix}.\qquad(3.20)$$
The result of the next step is better contemplated than printed! However, if we do contemplate the result printed above, we see that rows 3 and 4 contain polynomials of degree four, whereas in the "simplified" form (3.17) we only have polynomials of degree three in the numerators. Indeed, if we were to expand the matrix above, we would observe that rows three and four each had a common factor of $a$. Similarly, if we were to (or were to get a computer algebra system to) expand and then factor the last step, we would get $a^{2}(af-eb)|M|$, as in equation (3.18). Such common factors are not a coincidence (indeed, they cannot be, since $M$ is the most general $4\times4$ matrix possible).
Theorem 15 (Dodgson–Bareiss [Bar68, Dod66])⁹ Consider a matrix with entries $m_{i,j}$. Let $m^{(k)}_{i,j}$ be the determinant
$$\begin{vmatrix}m_{1,1}&m_{1,2}&\ldots&m_{1,k}&m_{1,j}\\m_{2,1}&m_{2,2}&\ldots&m_{2,k}&m_{2,j}\\\ldots&\ldots&\ldots&\ldots&\ldots\\m_{k,1}&m_{k,2}&\ldots&m_{k,k}&m_{k,j}\\m_{i,1}&m_{i,2}&\ldots&m_{i,k}&m_{i,j}\end{vmatrix},$$
⁹The Oxford logician Charles Dodgson was better known as Lewis Carroll. Much of this seems to have been known earlier [Chi53].
i.e. that of rows $1\ldots k$ and $i$, with columns $1\ldots k$ and $j$. In particular, the determinant of the matrix of size $n$ whose elements are $(m_{i,j})$ is $m^{(n-1)}_{n,n}$, and $m_{i,j}=m^{(0)}_{i,j}$. Then (assuming $m^{(-1)}_{0,0}=1$):
$$m^{(k)}_{i,j}=\frac{1}{m^{(k-2)}_{k-1,k-1}}\begin{vmatrix}m^{(k-1)}_{k,k}&m^{(k-1)}_{k,j}\\[2pt]m^{(k-1)}_{i,k}&m^{(k-1)}_{i,j}\end{vmatrix}.$$
Proof. By fairly tedious induction on k.
How does this relate to what we have just seen? If we do fraction-free elimination on the matrix $M$ of (3.16), we get (3.19), which we can rewrite as
$$M_2=\begin{pmatrix}a&b&c&d\\[2pt]0&\begin{vmatrix}a&b\\e&f\end{vmatrix}&\begin{vmatrix}a&c\\e&g\end{vmatrix}&\begin{vmatrix}a&d\\e&h\end{vmatrix}\\[2pt]0&\begin{vmatrix}a&b\\i&j\end{vmatrix}&\begin{vmatrix}a&c\\i&k\end{vmatrix}&\begin{vmatrix}a&d\\i&l\end{vmatrix}\\[2pt]0&\begin{vmatrix}a&b\\m&n\end{vmatrix}&\begin{vmatrix}a&c\\m&o\end{vmatrix}&\begin{vmatrix}a&d\\m&p\end{vmatrix}\end{pmatrix},\qquad(3.19')$$
or, in the terminology of Theorem 15,
$$M_2=\begin{pmatrix}m^{(0)}_{1,1}&m^{(0)}_{1,2}&m^{(0)}_{1,3}&m^{(0)}_{1,4}\\0&m^{(1)}_{2,2}&m^{(1)}_{2,3}&m^{(1)}_{2,4}\\0&m^{(1)}_{3,2}&m^{(1)}_{3,3}&m^{(1)}_{3,4}\\0&m^{(1)}_{4,2}&m^{(1)}_{4,3}&m^{(1)}_{4,4}\end{pmatrix}.\qquad(3.19'')$$
The next elimination step is
$$M_3=\begin{pmatrix}m^{(0)}_{1,1}&m^{(0)}_{1,2}&m^{(0)}_{1,3}&m^{(0)}_{1,4}\\0&m^{(1)}_{2,2}&m^{(1)}_{2,3}&m^{(1)}_{2,4}\\0&0&\begin{vmatrix}m^{(1)}_{2,2}&m^{(1)}_{2,3}\\m^{(1)}_{3,2}&m^{(1)}_{3,3}\end{vmatrix}&\begin{vmatrix}m^{(1)}_{2,2}&m^{(1)}_{2,4}\\m^{(1)}_{3,2}&m^{(1)}_{3,4}\end{vmatrix}\\0&0&\begin{vmatrix}m^{(1)}_{2,2}&m^{(1)}_{2,3}\\m^{(1)}_{4,2}&m^{(1)}_{4,3}\end{vmatrix}&\begin{vmatrix}m^{(1)}_{2,2}&m^{(1)}_{2,4}\\m^{(1)}_{4,2}&m^{(1)}_{4,4}\end{vmatrix}\end{pmatrix},\qquad(3.20')$$
and Theorem 15 guarantees that $m^{(0)}_{1,1}$ (i.e. $a$) divides the determinants in rows 3 and 4, so that we have
$$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1/a&0\\0&0&0&1/a\end{pmatrix}M_3=\begin{pmatrix}m^{(0)}_{1,1}&m^{(0)}_{1,2}&m^{(0)}_{1,3}&m^{(0)}_{1,4}\\0&m^{(1)}_{2,2}&m^{(1)}_{2,3}&m^{(1)}_{2,4}\\0&0&m^{(2)}_{3,3}&m^{(2)}_{3,4}\\0&0&m^{(2)}_{4,3}&m^{(2)}_{4,4}\end{pmatrix}.\qquad(3.21)$$
This is a general result.
Corollary 4 (Bareiss' algorithm) When doing fraction-free Gaussian elimination, after clearing column $k$, every element of rows $k+1\ldots n$ is divisible by $m^{(k-2)}_{k-1,k-1}$.
This is actually the 'one-step' variant of Bareiss [Bar68]: there are other variants with more advanced look-ahead, but they do not (and cannot) cancel any more in general. This result accounts for the factor of $a$ observed in rows 3 and 4 above, and for the factors of $a^{2}$ and $af-eb$ in the last step. Cancelling the $a$ in rows 3 and 4 would in fact automatically prevent the $a^{2}$ from even being generated: far better than generating it and then cancelling it!
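The one-step variant is short enough to sketch in Maple (an illustration rather than a tuned implementation; normal performs the exact division guaranteed by Corollary 4):

BareissOneStep := proc(M0::Matrix)
  local M, n, k, i, j, prev;
  n := LinearAlgebra:-RowDimension(M0);
  M := Matrix(M0);   # work on a copy
  prev := 1;         # the previous pivot m^(k-2)_{k-1,k-1}
  for k to n-1 do
    for i from k+1 to n do
      for j from k+1 to n do
        # the 2x2 determinant of Theorem 15, then exact division by prev
        M[i,j] := normal((M[k,k]*M[i,j] - M[i,k]*M[k,j])/prev);
      end do;
      M[i,k] := 0;
    end do;
    prev := M[k,k];
  end do;
  M;
end proc:

Applied to the matrix of (3.16), the final entry M[4,4] is exactly the determinant $|M|$.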
We normally apply Gaussian elimination (fraction-free or not) to solve systems such as (3.13), by transforming them to
$$U.\mathbf{x}=\mathbf{a}',\qquad(3.22)$$
where $U$ is upper-triangular, performing the same operations on the rows of $\mathbf{a}$ to get $\mathbf{a}'$ as we do on $M$ to get $U$. This can be viewed as elimination on the augmented matrix $\overline{M}$: $M$ with $\mathbf{a}$ as an extra column.

Corollary 5 When doing fraction-free Gaussian elimination in the augmented matrix $\overline{M}$, after clearing column $k$, every element of rows $k+1\ldots n$, including the "right-hand side", is divisible by $m^{(k-2)}_{k-1,k-1}$.

The Bareiss–Dodgson calculation pre-supposes that we are indeed doing Gaussian elimination precisely, and building all the sub-determinants specified. It is tempting to assume that we can take "short cuts", and skip zero elements, as after all "they are already zero, and so don't need to be cleared". This is a mistake, as we may miss factors we should cancel. Unfortunately, demonstrating this is not trivial, so there is a Maple worksheet demonstrating it at http://staff.bath.ac.uk/masjhd/JHD-CA/BDwarning.html.
Corollary 6 If the initial entries are integers of length $l$ (resp. polynomials of degree $l$), then after $k$ steps, the entries will have length (resp. degree) $O(kl)$.

This is to be contrasted with the $O(2^{k}l)$ of the naïve fraction-free approach. We note that a similar approach can be applied to the "LU decomposition" of matrices [Jef10].

It is possible to view Euclid's algorithm for polynomials as Gaussian elimination in a matrix (Sylvester's matrix, definition 109) of coefficients, and the factors $\beta_i$ that are cancelled by the sub-resultant variant for normal polynomial remainder sequences (footnote 37 on page 71) are those predicted by Corollary 4 above.
3.2.4 Complexity
Let the elements of an $n\times n$ matrix have size $d$ (degree for polynomials, or bit-length for integers, ignoring¹⁰ the fact that $a+b$ may have greater bit-length

¹⁰If we wanted to take it into account, we would use the Hadamard bounds (Propositions 84 and 85) to bound the size of the intermediate objects, which are $k\times k$ determinants by Theorem 15, and get an extra factor of $\log n$ in the length.
than either $a$ or $b$), and let the cost of multiplying two elements of size $d$, or dividing an element of size $2d$ by an element of size $d$, be $M(d)$. We will ignore the cost of addition/subtraction. Then the cost of clearing columns is as follows.

column 1 For $n-1$ rows, we do $n-1$ calculations of new elements, which are $2\times2$ determinants: cost $2M(d)$ each, making a total of $2(n-1)^{2}M(d)$.

column 2 For $n-2$ rows, we do $n-2$ calculations of new elements, which are $2\times2$ determinants followed by a cancellation: cost $3M(2d)$ each, making a total of $3(n-2)^{2}M(2d)$.

column k For $n-k$ rows, we do $n-k$ calculations of new elements, which are $2\times2$ determinants followed by a cancellation: cost $3M(kd)$ each, making a total of $3(n-k)^{2}M(kd)$.

If we call the first term $3(n-1)^{2}M(d)$ rather than $2(n-1)^{2}M(d)$ (i.e. ignore the fact that there is no cancellation for the first column) we get
$$3\sum_{k=1}^{n-1}(n-k)^{2}M(kd).\qquad(3.23)$$
The value of this depends on $M$ (we'll ignore constant factors, so there is an implied $O(\cdots)$, or $\tilde{O}(\cdots)$, round everything in this analysis).

$M(kd)=(kd)^{2}$ (classical arithmetic on integers, or univariate polynomials with fixed-length coefficients): we get
$$\frac{1}{30}d^{2}n^{5}-\frac{1}{30}d^{2}n=O(d^{2}n^{5}).\qquad(3.24)$$
Allowing for coefficient growth in the integer case¹⁰, we would have
$$O(d^{2}n^{5}\log^{2}n).\qquad(3.25)$$

$M(kd)=kd$ (the dominant effect in FFT-based arithmetic on integers, or univariate polynomials with fixed-length coefficients): we get $\frac{1}{12}dn^{4}-\frac{1}{12}n^{2}d=\tilde{O}(dn^{4})$.
Suppose we have an $n\times n$ matrix, and the elements are polynomials of degree $d$ in $s$ variables. We have two options for computing the determinant:

1. Fraction-free Gaussian elimination, as described above;

2. Expansion by minors (see Appendix A.6), where we compute all $\frac{n(n-1)}{2}$ determinants $\begin{vmatrix}a_{1,i}&a_{1,j}\\a_{2,i}&a_{2,j}\end{vmatrix}$, then all $\frac{n(n-1)(n-2)}{6}$ determinants $\begin{vmatrix}a_{1,i}&a_{1,j}&a_{1,k}\\a_{2,i}&a_{2,j}&a_{2,k}\\a_{3,i}&a_{3,j}&a_{3,k}\end{vmatrix}$, and so on.
The first involves $O(n^{3})$ operations on polynomials, and the second $O(n2^{n})$ such operations. The comparison seems clear-cut: we should use the Bareiss–Dodgson fraction-free method (Corollary 4). Furthermore the intermediate results of Corollary 4 are, apart from the pre-cancellation
$$\begin{vmatrix}m^{(1)}_{3,2}&m^{(1)}_{3,3}\\[2pt]m^{(1)}_{4,2}&m^{(1)}_{4,3}\end{vmatrix}\text{ etc.,}$$
sub-minors of the same size as would be computed in the minors method: see (3.21) for an example.

The experimental evidence [Smi76, Smi79] is rather different, and much depends in practice on the sparsity of the matrix. This is analysed in [GJ76], whose conclusions also depend on the sparsity of the polynomials.

Open Problem 6 The computations in [Smi76, Smi79] should be repeated on modern systems/computers, and the scaling should be re-examined now that larger systems are feasible.
3.2.5 Over/under-determined Systems
So far we have implicitly assumed that there are as many equations as there
are unknowns, and that the equations determine the unknowns precisely (in
other words, that the determinant of the corresponding matrix is non-zero).
What happens if these assumptions do not hold? There are several cases to be
distinguished.
Over-determined and consistent Here the ‘extra’ equations are consistent
with those that determine the solution. A trivial example in one variable
would be the pair 2x = 4, 3x = 6.
Over-determined and inconsistent Here the ‘extra’ equations are not con-
sistent with those that determine the solution. A trivial example in one
variable would be the pair 2x = 4, 3x = 9, where the first implies that
x = 2, but the second that x = 3.
Spuriously over-determined This is a generalisation of "over-determined and consistent" when, after deleting the 'extra' equations that convey no new information, we are left with an under-determined system.
Under-determined and consistent Here there are not enough equations (possibly after deleting spurious ones) to determine all the variables. An example would be $x+y=3$. Here $x$ can be anything, but, once $x$ is chosen, $y$ is fixed as $3-x$. Equally, we could say that $y$ can be anything, but, once $y$ is chosen, $x$ is fixed as $3-y$. The solutions form a $k$-dimensional hyperplane, where $k$ is the number of variables minus the number of (non-spurious) equations.
Under-determined yet inconsistent Here the equations (possibly after deleting spurious ones) are still inconsistent. One example would be $x+2y+3z=1$, $2x+4y+6z=3$.
We are then left with three possibilities for the solutions, which can be categorised in terms of the dimension ('dim').

dim = −1 This is the conventional 'dimension' assigned when there are no solutions, i.e. the equations are inconsistent.

dim = 0 Precisely one solution.

dim > 0 An infinite number of solutions, forming a hyperplane of dimension dim.
3.3 Nonlinear Multivariate Equations: Distributed
Most of this section has its origin in the pioneering work of Buchberger [Buc70]. Some good modern texts are [AL94, BW93, CLO06].

If the equations are nonlinear, equation (3.15) is still available to us. So, given the three equations
$$x^{2}-y=0\qquad x^{2}-z=0\qquad y+z=0,$$
we can subtract the first from the second to get $y-z=0$, hence $y=0$ and $z=0$, and we are left with $x^{2}=0$, so $x=0$, albeit with multiplicity 2 (definition 39). However, we can do more than this. Given the two equations
$$x^{2}-1=0\qquad xy-1=0,\qquad(3.26)$$
there might seem to be no row operation available. But in fact we can subtract $x$ times the second equation from $y$ times the first, to get $x-y=0$. Hence the solutions are $x=\pm1$, $y=x$.
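These hand manipulations are exactly what the Gröbner-base machinery introduced below automates; in Maple, for instance (a sketch):

> with(Groebner):
> Basis([x^2-y, x^2-z, y+z], plex(x,y,z));   # [z, y, x^2]: the first system
> Basis([x^2-1, x*y-1], plex(x,y));          # [y^2-1, x-y]: system (3.26)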
We can generalise equation (3.15) to read as follows: for all polynomials $f$ and $g$,
$$P=Q\ \&\ R=S\text{ implies }fP+gR=fQ+gS.\qquad(3.27)$$

Lemma 5 In equation (3.27), it suffices to consider terms (monomials with leading coefficients) for $f$ and $g$ rather than general polynomials.

Proof. Let $f$ be $\sum a_im_i$ and $g$ be $\sum b_im_i$, where the $m_i$ are monomials and the $a_i$ and $b_i$ coefficients (possibly zero, but for a given $i$, $a_i$ and $b_i$ should not both be zero, since then $m_i$ would be redundant). Then for each $i$, the monomial version of equation (3.27) gives
$$P=Q\ \&\ R=S\text{ implies }a_im_iP+b_im_iR=a_im_iQ+b_im_iS.$$
Then we can use equation (3.15) repeatedly, with $\lambda=1$, to add these together to get the general form of equation (3.27).
Because of equation (3.2), we can regard equations as synonymous with polynomials. Equation (3.27) then motivates the following definition.

Definition 43 Let $S$ be a set of polynomials in the variables $x_1,\ldots,x_n$, with coefficients from $R$. The ideal generated by $S$, denoted $(S)$, is the set of all finite sums $\sum f_is_i$: $s_i\in S$, $f_i\in R[x_1,\ldots,x_n]$. If $S$ generates $I$, we say that $S$ is a basis for $I$.

Observation 3 In fact we can define ideals in infinitely many variables, but the theory diverges somewhat as Theorem 4 is no longer valid. See [?].

Proposition 26 This is indeed an ideal in the sense of definition 9.

Strictly speaking, what we have defined here is the left ideal: there are also concepts of right ideal and two-sided ideal, but all concepts agree in the case of commutative polynomials, which we will assume until section 3.3.15.

Proposition 27 $((S))=(S)$.
Definition 44 Two sets of polynomial equations are equivalent if the polynomials defining the left-hand sides generate the same ideal. We will see how to test this in corollary 7.

Just as an upper triangular matrix is a nice formulation of a set of linear equations, allowing us to "read off" the solutions, so we would like a similarly 'nice' basis for an ideal generated by non-linear equations. In order to do this, we will regard our polynomials in a distributed format, with the terms sorted in some admissible (page 50) ordering $>$. Note that we are not requiring that the polynomials are stored this way in an algebra system, though in fact most algebra systems specialising in this area will do so: we are merely discussing the mathematics of such polynomials. Having fixed such an ordering $>$, we can define the following concepts.

Definition 45 If $f$ is a non-zero polynomial, the leading term of $f$, denoted $\mathrm{lt}(f)$, is that term greatest with respect to $>$. The corresponding monomial is called the leading monomial of $f$, $\mathrm{lm}(f)$. We will sometimes apply lm to sets, where $\mathrm{lm}(S)=\{\mathrm{lm}(s)\mid s\in S\}$.
"Monomial algebra" is a particularly simple form of polynomial algebra: in particular
$$\gcd\left(\prod_{i=1}^{n}x_i^{a_i},\,\prod_{i=1}^{n}x_i^{b_i}\right)=\prod_{i=1}^{n}x_i^{\min(a_i,b_i)},\qquad \mathrm{lcm}\left(\prod_{i=1}^{n}x_i^{a_i},\,\prod_{i=1}^{n}x_i^{b_i}\right)=\prod_{i=1}^{n}x_i^{\max(a_i,b_i)}.$$
Definition 46 If $\mathrm{lm}(g)$ divides $\mathrm{lm}(f)$, then we say that $g$ reduces $f$ to $h=\mathrm{lc}(g)f-(\mathrm{lt}(f)/\mathrm{lm}(g))g$, written $f\rightarrow_g h$. Otherwise we say that $f$ is reduced with respect to $g$. The Maple user should note that Maple's Reduce command actually implements complete reduction: see Definition 47.

If $R$ is a field, division is possible, and so it is more usual to reduce $f$ to $f-(\mathrm{lt}(f)/\mathrm{lt}(g))g$. In the construction of $h$, the leading terms of both $\mathrm{lc}(g)f$ and $(\mathrm{lt}(f)/\mathrm{lm}(g))g$ are $\mathrm{lc}(f)\mathrm{lc}(g)\mathrm{lm}(f)$, and so cancel. Hence $\mathrm{lm}(h)<\mathrm{lm}(f)$. This observation and theorem 4 give us the following result.

Proposition 28 Any chain $f_1\rightarrow_g f_2\rightarrow_g f_3\cdots$ is finite, i.e. terminates in a polynomial $h$ reduced with respect to $g$. We write $f_1\stackrel{*}{\rightarrow}_g h$.
These concepts and results extend to reduction by a set $G$ of polynomials, where $f\rightarrow_G h$ means $\exists g\in G: f\rightarrow_g h$. We must note that a polynomial can have several reductions with respect to $G$ (one for each element of $G$ whose leading monomial divides the leading monomial of $f$). For example, let $G=\{g_1=x-1,\ g_2=y-2\}$ and $f=xy$. Then there are two possible reductions of $f$: $f\rightarrow_{g_1}h_1=f-yg_1=y$, and $f\rightarrow_{g_2}h_2=f-xg_2=2x$. In this case $h_1\rightarrow_{g_2}2$ and $h_2\rightarrow_{g_1}2$, so that $f\stackrel{*}{\rightarrow}_G 2$ uniquely, but even this need not always be the case. If we let $G=\{g_1=x-1,\ g_2=x^{2}\}$ and $f=x^{2}-1$, then $f\rightarrow_{g_2}h_2=f-g_2=-1$, whereas $f\rightarrow_{g_1}f-xg_1=x-1\rightarrow_{g_1}0$: so $f\stackrel{*}{\rightarrow}_G 0$ or $-1$.
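When $G$ is a Gröbner base (Theorem 16 below) the result is unique, and Maple's NormalForm computes it: a sketch with the first $G$ above, which is such a base:

> with(Groebner):
> NormalForm(x*y, [x-1, y-2], plex(x,y));   # 2, whichever reduction is tried first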
This definition deals with reduction of the leading monomial of $f$ by $g$, but it might be that other monomials are reducible. For simplicity we consider the case when $R$ is a field.

Definition 47 If any term $cm$ of $f$ is reducible by $g$, i.e. the leading monomial of $g$ divides $m$, we say that $g$ part-reduces $f$, and write $f\Rightarrow_g f-(cm/\mathrm{lt}(g))g$. We can continue this process (only finitely often, by repeated application of theorem 4), until no monomial of $f$ is reducible by $g$, when we write $f\stackrel{*}{\Rightarrow}_g h$, and say that $f$ is completely reduced by $g$ to $h$. Again, this extends to reduction by a set of polynomials.
Reduction is conceptually fairly easy, but can be expensive if implemented naïvely. Yan [Yan98] observed this, and invented the "geobucket" data structure (see (2.2)), which ensured that we were not repeatedly subtracting polynomials with few terms (as tends to be the case in $G$) from polynomials with many terms (as $f$ often is). In particular, he observed a factor of over 32 in one large computation: 43 hours instead of 8 weeks!
In section 3.2.1, we performed row operations: subtracting a multiple of one row from another, which is essentially what reduction does, except that the 'multiple' can include a monomial factor. It turns out that we require a more general concept, given in the next definition.

Definition 48 Let $f,g\in R[x_1,\ldots,x_n]$. The S-polynomial of $f$ and $g$, written $S(f,g)$, is defined as
$$S(f,g)=\frac{\mathrm{lt}(g)}{\gcd(\mathrm{lm}(f),\mathrm{lm}(g))}f-\frac{\mathrm{lt}(f)}{\gcd(\mathrm{lm}(f),\mathrm{lm}(g))}g.\qquad(3.28)$$
We note that the divisions concerned are exact, and that this generalises reduction in the sense that, if $\mathrm{lm}(g)$ divides $\mathrm{lm}(f)$, then $f\rightarrow_g S(f,g)$. As with reduction, the leading monomials in the two components on the right-hand side of equation (3.28) cancel. Another way of thinking of the S-polynomial (when $R$ is a field) is that it is the difference between what you get by reducing $\mathrm{lcm}(\mathrm{lm}(f),\mathrm{lm}(g))$ by $f$ and by $g$.

Proposition 29 $S(f,g)=-S(g,f)$.

Proposition 30 $S(f,g)\in(\{f,g\})$.
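The row operation applied to (3.26) earlier is precisely an S-polynomial; in Maple (a sketch, the result possibly normalised up to sign):

> with(Groebner):
> SPolynomial(x^2-1, x*y-1, plex(x,y));   # x - y, as computed by hand above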
3.3.1 Gröbner Bases
From now until section 3.3.14, we will assume that $R$ is a field. However, we will continue to use $R$, and not gratuitously make polynomials monic, since this can be expensive.

Theorem 16 [BW93, Proposition 5.38, Theorem 5.48] The following conditions on a set $G\subset R[x_1,\ldots,x_n]$, with a fixed ordering $>$ on monomials, are equivalent.

1. $\forall f,g\in G$, $S(f,g)\stackrel{*}{\rightarrow}_G 0$. This is known as the S-Criterion.

2. If $f\stackrel{*}{\rightarrow}_G g_1$ and $f\stackrel{*}{\rightarrow}_G g_2$, then $g_1$ and $g_2$ differ at most by a multiple in $R$, i.e. $\stackrel{*}{\rightarrow}_G$ is essentially well-defined.

3. $\forall f\in(G)$, $f\stackrel{*}{\rightarrow}_G 0$.

4. $(\mathrm{lm}(G))=(\mathrm{lm}((G)))$, i.e. the leading monomials of $G$ generate the same ideal as the leading monomials of the whole of $(G)$.

If $G$ satisfies these conditions, $G$ is called a Gröbner base (or standard basis).

These are very different kinds of conditions, and the strength of Gröbner theory lies in their interplay. Condition 2 underpins the others: $\stackrel{*}{\rightarrow}_G$ is well-defined. Condition 1 looks technical, but has the great advantage that, for finite $G$, it is finitely checkable: if $G$ has $k$ elements, we take the $k(k-1)/2$ unordered (by proposition 29) pairs from $G$, compute the S-polynomials, and check that they reduce to zero. This gives us either a proof or an explicit counter-example (which is the key to algorithm 9). Since $f\stackrel{*}{\rightarrow}_G 0$ means that $f\in(G)$, condition 3 means that ideal membership is testable if we have a Gröbner base for the ideal. Condition 4 can be seen as a generalisation of "upper triangular": see section 3.3.5.
Now let $G$ and $H$ be Gröbner bases, possibly with respect to different orderings.

Proposition 31 If $\forall g\in G$, $g\stackrel{*}{\rightarrow}_H 0$, then $(G)\subseteq(H)$.

Proof. Let $f\in(G)$. Then $f\stackrel{*}{\rightarrow}_G 0$, so $f=\sum c_ig_i$. But $g_i\stackrel{*}{\rightarrow}_H 0$, so $g_i=\sum_j d_{ij}h_j$. Therefore $f=\sum_j\left(\sum_i c_id_{ij}\right)h_j$, and so $f\in(H)$.
Corollary 7 If $\forall g\in G$, $g\stackrel{*}{\rightarrow}_H 0$, and $\forall h\in H$, $h\stackrel{*}{\rightarrow}_G 0$, then $(G)=(H)$.

Over a field, a particularly useful Gröbner base is a completely reduced Gröbner base (abbreviated crGb) $G$, i.e. one where every element is completely reduced with respect to all the others: in symbols
$$\forall g\in G\quad g\stackrel{*}{\Rightarrow}_{G\setminus\{g\}}g.$$
For a consistent set of linear polynomials, the crGb would be a set of linear polynomials in one variable each, e.g. $\{x-1,y-2,z-3\}$, effectively the solution. In general, a monic crGb is a canonical (definition 4) form for an ideal: two ideals are equal if, and only if, they have the same crGb (with respect to the same ordering, of course).
Theorem 17 (Buchberger) Every polynomial ideal has a Gröbner base: we will show this constructively for finitely-generated¹¹ ideals over noetherian (definition 10) rings.

Algorithm 9 (Buchberger)
Input: finite $G_0\subset R[x_1,\ldots,x_n]$; monomial ordering $>$.
Output: $G$ a Gröbner base for $(G_0)$ with respect to $>$.

$G := G_0$; $n := |G|$;
# we consider $G$ as $\{g_1,\ldots,g_n\}$
$P := \{(i,j) : 1\le i<j\le n\}$
while $P\ne\emptyset$ do
  Pick $(i,j)\in P$;
  $P := P\setminus\{(i,j)\}$;
  Let $S(g_i,g_j)\stackrel{*}{\rightarrow}_G h$
  If $h\ne0$ then
    # $\mathrm{lm}(h)\notin(\mathrm{lm}(G))$
    $g_{n+1} := h$; $G := G\cup\{h\}$;
    $P := P\cup\{(i,n+1) : 1\le i\le n\}$;
    $n := n+1$;
Optionally $G := \mathrm{Interreduce}(G)$
Proof. The polynomials added to $G$ are reductions of S-polynomials of members of $G$, and hence are in the same ideal as $G$, and therefore of $G_0$. If this process terminates, then the result satisfies condition 1, and so is a Gröbner base for some ideal, and therefore the ideal of $G$. By proposition 30 and the properties of $\stackrel{*}{\rightarrow}_G$, $h\in(G)$, so $(G)$ is constant throughout this process and $G$ has to be a Gröbner base for $(G_0)$. Is it possible for the process of adding new $h$ to $G$, which implies increasing $(\mathrm{lm}(G))$, to go on for ever? No: corollary 1 says that the increasing chain of $(\mathrm{lm}(G))$ is finite, so at some point we cannot increase $(\mathrm{lm}(G))$ any further, i.e. we cannot add a new $h$.
11In fact, every polynomial ideal over a noetherian ring is finitely generated. However,
it is possible to encode undecidability results in infinite descriptions of ideals, hence we say
“finitely generated” to avoid this paradox.
Proposition 32 Every finitely generated polynomial ideal over a field $K$ has a completely reduced Gröbner base with respect to any given ordering, and this is unique up to order of elements and multiplication by elements of $K^{*}$.

Hence, for a fixed ordering, a monic crGb is a "fingerprint" of an ideal, uniquely identifying it. This makes definition 44 algorithmic. It also allows ideal arithmetic.

Proposition 33 Let $G_1$ and $G_2$ be Gröbner bases of the ideals $I_1$ and $I_2$ with respect to a fixed ordering. Then:

1. $I_1\subseteq I_2$ iff $\forall g\in G_1$, $g\stackrel{*}{\rightarrow}_{G_2}0$;

2. $I_1+I_2=(G_1\cup G_2)$;

3. $I_1I_2=(\{g_1g_2\mid g_1\in G_1,\ g_2\in G_2\})$.

Furthermore, all these processes are algorithmic.
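For instance (a Maple sketch), the two generating sets met in connection with (3.26) yield the same completely reduced basis, confirming that they generate the same ideal:

> with(Groebner):
> Basis([x^2-1, x*y-1], plex(x,y));   # [y^2-1, x-y]
> Basis([x-y, y^2-1], plex(x,y));     # [y^2-1, x-y] again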
3.3.2 How many Solutions?
Here we will try to give an analysis of the various possibilities for the number of solutions of a set of polynomial equations. We will assume that a crGb for the polynomials has been computed, which therefore cannot be over-determined in the sense of having redundant equations. However, we may still need more equations than variables: see the examples at the start of section 3.3.7.

Unlike section 3.2.5, however, we have to ask ourselves "in which domain are the solutions?" We saw in Theorem 10 that, even for an equation in one variable, the 'solutions' may have no simpler formulation than 'this is a root of $p(x)$'. Fortunately, this is all that we need. We will assume that $K$ is the algebraic closure (definition 19) of (the field of fractions of) $R$.

Definition 49 The set of solutions over $K$ of an ideal $I$ is called the variety of $I$, written $V(I)$. If $S$ is a set of polynomials which generates $I$, so $I=\langle S\rangle$, we will write $V(S)$ as shorthand for $V(\langle S\rangle)$.

We should note that two different ideals can have the same variety, e.g. $(x)$ and $(x^{2})$ both have the variety $x=0$, but the solution has different multiplicity. However, the two ideals $(x^{2}+y^{2})$ and $(x,y)$ both have only the solution $x=y=0$ over the reals, but over the complexes the first has the solutions $x=\pm iy$, and hence the varieties are different. See Section 3.5.2.
Proposition 34 $V(I_1\cdot I_2)=V(I_1)\cup V(I_2)$.

Proposition 35 $V(I_1\cup I_2)=V(I_1)\cap V(I_2)$.

Definition 50 The radical of an ideal $I$, denoted $\sqrt{I}$, is defined as
$$\sqrt{I}=\{p\mid\forall x\in V(I),\ p(x)=0\}.$$
An equivalent definition is $\sqrt{I}=\{p\mid\exists m: p^{m}\in I\}$.
If $I$ is generated by a single polynomial $p$, $\sqrt{I}$ is generated by the square-free part of $p$.

Example 5 The ideal $(x^{2}+y^{2})$ and the ideal $(xy)$ are both radical. However, the ideal $I=(x^{2}+y^{2},xy)$ is not radical, and in fact $\sqrt{I}=(x,y)$. Let us see how this happens. $x^{2}+y^{2}\in I$ and $xy\in I$, therefore $x^{2}+y^{2}+2xy\in I$. But this polynomial is $(x+y)^{2}$, therefore $x+y\in\sqrt{I}$. Since $xy\in\sqrt{I}$ and $x+y\in\sqrt{I}$, we have $xy-y(x+y)\in\sqrt{I}$. But this is just $-y^{2}$, so $y^{2}\in\sqrt{I}$, and hence $y\in\sqrt{I}$. Similarly $x\in\sqrt{I}$.

In terms of varieties, $V(I)$ is $\{x=0,y=0\}$ (which in fact is with multiplicity two, since it is a point of multiplicity 2 in both $(x^{2}+y^{2})$ and $(xy)$). A simpler (indeed the simplest) ideal with this variety is $(x,y)$.
Proposition 36 $\sqrt{I}$ is itself an ideal, and $\sqrt{I}\supseteq I$.
Definition 51 The dimension of an ideal $I$ in $S=k[x_1,\ldots,x_n]$ is the maximum number of elements of the quotient $S/I$ that are algebraically independent over $k$. We can also talk about the dimension of a variety.
No solutions in $K$ Here the crGb will be $\{1\}$, or more generally $\{c\}$ for some non-zero constant $c$. The existence of a solution would imply that this constant was zero, so there are no solutions. The dimension is undefined, but normally written as $-1$.

A finite number of solutions in $K$ There is a neat generalisation of the result that a polynomial of degree $n$ has $n$ roots.

Proposition 37 If it is finite, the number (counted with multiplicity) of solutions of a system with Gröbner basis $G$ is equal to the number of monomials which are not reducible by $G$.

It follows from this that, if (and only if) there are finitely many solutions, every variable $x_i$ must appear alone, to some power, as the leading monomial of some element of $G$. In this case, the dimension is zero. We return to this case in section 3.3.7.

An infinite number of solutions in $K$ Then some variables do not occur alone, to some power, as the leading monomial of any element of $G$. In this case, the dimension is greater than zero.
While 'dimension', as defined above, is a convenient generalisation of the linear case, many more things can happen in the non-linear case. If the dimension of the ideal is $d$, there must be at least $d$ variables which do not occur alone, to some power, as the leading monomial of any element of $G$. However, if $d>0$, there may be more. Consider the ideal $(xy-1)\triangleleft k[x,y]$. $\{xy-1\}$ is already a Gröbner base, and neither $x$ nor $y$ occurs alone, to any power, in a leading term (the only leading term is $xy$). However, the dimension is 1, not 2, because fixing $x$ determines $y$, and vice versa, so there is only one independent variable. In the case of a triangular set (definition 66), we can do much better, as in Proposition 45.

There are other phenomena that occur with nonlinear equations that cannot occur with linear equations. Consider the ideal
$$\langle(x+1-y)(x-6+y),\ (x+1-y)(y-3)\rangle$$
(where the generators we have quoted do in fact form a Gröbner base, at least for plex(x,y) and tdeg(x,y), and the leading monomials are $x^{2}$ and $xy$). $x$ occurs alone, but $y$ does not, so in fact this ideal has dimension greater than 0 but at most 1, i.e. dimension 1. But the solutions are $x=y-1$ (a straight line) and the point $(3,3)$. Such ideals are said to be of mixed dimension, and are often quite tedious to work with [Laz09].
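Maple will compute this dimension for us (a sketch; we assume the Groebner package's HilbertDimension, which accepts a list of generators):

> with(Groebner):
> HilbertDimension([(x+1-y)*(x-6+y), (x+1-y)*(y-3)]);   # 1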
3.3.3 Orderings
In section 2.1.4, we defined an admissible ordering on monomials, and the theory so far is valid for all orderings. What sort of orderings are admissible? We first need an ordering on the variables themselves, which we will also denote $>$, and we will assume that $x_1>\cdots>x_n$ (in examples, $x>y>z$). Suppose the two monomials to be compared are $A=x_1^{a_1}\cdots x_n^{a_n}$ and $B=x_1^{b_1}\cdots x_n^{b_n}$. These monomials have total degree $a=\sum_{i=1}^{n}a_i$ and $b=\sum_{i=1}^{n}b_i$.
purely lexicographic — plex in Maple We first compare $a_1$ and $b_1$. If they differ, this tells us whether $A>B$ ($a_1>b_1$) or $A<B$ ($a_1<b_1$). If they are the same, we go on to look at $a_2$ versus $b_2$, and so on. The order is similar to looking up words in a dictionary/lexicon: we look at the first letter, and after finding this, look at the second letter, and so on. In this order $x^{2}$ is more important than $xy^{10}$.
total degree, then lexicographic — grlex in Maple We first look at the
total degrees: if $a > b$, then $A > B$, and $a < b$ means $A < B$. If
$a = b$, then we use the lexicographic comparison. In this order $xy^{10}$ is
more important than $x^2$, and $x^2y$ more important than $xy^2$.
total degree, then reverse lexicographic — tdeg in Maple This order is
the same as the previous, except that, if the total degrees are equal, we
compare lexicographically and then take the opposite. Many systems, in particular
Maple and Mathematica12, reverse the order of the variables first. The
reader may ask “if the order of the variables is reversed, and we then
reverse the sense of the answer, what's the difference?”. Indeed, for two
variables, there is no difference. However, with more variables it does
indeed make a difference. For three variables, the monomials of degree
three are ordered as
$$x^3 > x^2y > x^2z > xy^2 > xyz > xz^2 > y^3 > y^2z > yz^2 > z^3$$
under grlex, but as
$$x^3 > x^2y > xy^2 > y^3 > x^2z > xyz > y^2z > xz^2 > yz^2 > z^3$$
under tdeg (the sketch after this list reproduces both chains mechanically).
One way of seeing the difference is to say that grlex with
$x > y > z$ discriminates in favour of x, whereas tdeg with $z > y > x$
discriminates against z. This metaphor reinforces the fact that there is
no difference with two variables.
It seems that tdeg is, in general, the most efficient order, but see
Example 6 for a specific counterexample.
12http://reference.wolfram.com/mathematica/tutorial/PolynomialOrderings.html
k-elimination Here we choose any order $>'$ on $x_1,\ldots,x_k$, and use that. If this
cannot decide, we then use a second order $>''$ on $x_{k+1},\ldots,x_n$. Since $>'$
is admissible, the least monomial is $x_1^0\cdots x_k^0$, so this order will eliminate
$x_1,\ldots,x_k$ as far as possible, in the sense that the polynomials in only
$x_{k+1},\ldots,x_n$ in a Gröbner base computed with such an order are all that
can be deduced about these variables. It is common, but by no means
required, to use tdeg for both $>'$ and $>''$. Note that this is not the
same as simply using tdeg, since the exponents of $x_{k+1},\ldots,x_n$ are not
considered unless $x_1,\ldots,x_k$ gives a tie.
weighted orderings Here we compute the total degree with a weighting factor, e.g. we may weight x twice as much as y, so that the total degree of
$x^iy^j$ would be $2i + j$. This can come in lexicographic or reverse lexicographic variants.
matrix orderings These are in fact the most general form of orderings [Rob85].
Let M be a fixed $n \times n$ matrix of reals, and regard the exponents of A as
an n-vector a. Then we compare A and B by computing the two vectors
$Ma$ and $Mb$, and comparing these lexicographically.
lexicographic M is the identity matrix.
grlex $M = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1\\ 1 & 0 & \cdots & 0 & 0\\ 0 & \ddots & & & 0\\ \vdots & & \ddots & & \vdots\\ 0 & \cdots & 0 & 1 & 0 \end{pmatrix}$.
tdeg It would be tempting to say, by analogy with grlex, that the matrix
is $\begin{pmatrix} 1 & 1 & \cdots & 1 & 1\\ 0 & 0 & \cdots & 0 & 1\\ 0 & \cdots & 0 & 1 & 0\\ \vdots & & & & \vdots\\ 0 & 1 & 0 & \cdots & 0 \end{pmatrix}$. However, this is actually grlex with the
variable order reversed, not genuine reverse lexicographic. To get
that, we need the matrix
$\begin{pmatrix} 1 & 1 & \cdots & 1 & 1\\ -1 & 0 & \cdots & 0 & 0\\ 0 & \ddots & & & 0\\ \vdots & & \ddots & & \vdots\\ 0 & \cdots & 0 & -1 & 0 \end{pmatrix}$, or, if we are
adopting the Maple convention of reversing the variables as well,
$\begin{pmatrix} 1 & 1 & \cdots & 1 & 1\\ 0 & 0 & \cdots & 0 & -1\\ 0 & \cdots & 0 & -1 & 0\\ \vdots & & & & \vdots\\ 0 & -1 & 0 & \cdots & 0 \end{pmatrix}$.
k-elimination If the matrices are $M_k$ for $>'$ and $M_{n-k}$ for $>''$, then
$M = \begin{pmatrix} M_k & 0\\ 0 & M_{n-k} \end{pmatrix}$.
weighted orderings Here the first row of M corresponds to the weights,
instead of being uniformly 1.
Most “serious” Gröbner systems13 implement matrix orderings, but have
special case implementations for the more common ones listed above, often
storing the (weighted) total degree as well as the individual degrees to
minimise recomputation.
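As a cross-check of the two displayed degree-3 chains, SymPy's order keys reproduce them directly; a sketch, in which grevlex is assumed to coincide with tdeg (including the variable-reversal convention), as the output confirms:

    from sympy import symbols
    from sympy.polys.orderings import grlex, grevlex

    x, y, z = symbols('x y z')

    # Exponent vectors (i, j, k) of all monomials x**i * y**j * z**k of degree 3.
    monos = [(i, j, k) for i in range(4) for j in range(4) for k in range(4)
             if i + j + k == 3]

    def chain(key):
        ordered = sorted(monos, key=key, reverse=True)  # largest first
        return ' > '.join(str(x**i * y**j * z**k) for (i, j, k) in ordered)

    print(chain(grlex))    # x**3 > x**2*y > x**2*z > x*y**2 > x*y*z > ...
    print(chain(grevlex))  # x**3 > x**2*y > x*y**2 > y**3 > x**2*z > ...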
3.3.4 Complexity of Gröbner Bases
We have proved nothing about the running time of Buchberger's algorithm.
Indeed, “algorithm” is almost an over-statement: we have not specified the
choice of $(i, j) \in P$ at all. It turns out in practice that the complexity, though
not the correctness, of the algorithm is strongly dependent on this choice, and
on the ordering > being used.
Observation 4 There have been many improvements and alternative suggestions for computing Gröbner bases. Some improvements are given below. As of
writing probably the best algorithm is the F5 algorithm [Fau02, HA10]. Section
4.6 discusses ways of reducing the coefficient growth in these computations.
Observation 5 However many improvements have been, or might be, made,
computing Gröbner bases can never be a trivial task. A classic result [MM82]
shows that the output degree can be doubly exponential in the number of variables,
and this is also the worst case [Dub90].
Theorem 18 ([MM82], [MR10, Theorem 4.1]) For any d and k, let $n = 14(k+1)$. There is a set $J_{n,d}$ of polynomials in $k[x_1,\ldots,x_n]$ of degree at most
d such that any Gröbner base of $J_{n,d}$ with respect to a total degree ordering
contains a polynomial of degree at least $\frac{1}{2}d^{2^k} + 4$.
13Such as SINGULAR [Sch03b], CoCoA [Abb04] or Macaulay [BS86].
Therefore this is $O\!\left(d^{2^{n/14}}\right)$. We can improve this to “nearly” $n/2$, at the cost
of being less explicit.
Theorem 19 ([Yap91]) Fix an admissible monomial ordering. Then there is
a family of ideals $I_n \subseteq k[x_1,\ldots,x_n]$ for $n \in \mathbb{N}$, generated by $O(n)$ polynomials
of degree d, such that any Gröbner base $G_n$ of $I_n$ has degree $\ge d^{2^{(1/2-\epsilon)n}}$ for
any $\epsilon > 0$ and sufficiently large d, n.
Theorem 20 ([Dub90]) Whatever the ordering, polynomials of total degree
d in $k[x_1,\ldots,x_n]$ have a reduced Gröbner base with polynomials of degree at most
$$2\left(\frac{d^2}{2} + d\right)^{2^{n-1}}.$$
Open Problem 7 (Sparse Gröbner Bases) Even when k = 1, we are very
far away from being able in practice to compute Gröbner bases in 28 variables.
Also, of course, these polynomials could be very sparse (the example of Theo-
rem 18 involves only binomials), so this result does not necessarily prove that
the complexity of computing Gröbner bases is doubly exponential in a sparse
encoding. What, then, are the more practical implications of Theorems 18, 19?
We should note that the degree of a Gröbner base depends strongly on the di-
mension (Definition 51) of the ideal — see section 3.3.8 for the zero-dimensional
case, and more generally the following result.
Theorem 21 ([MR11, Theorem 8]) 14 Let I be an r-dimensional ideal in
$k[x_1,\ldots,x_n]$ generated by s polynomials of total degree $d_1 \ge \cdots \ge d_s$. Then, for
any admissible ordering, the maximum total degree required in a Gröbner base
of I is at most $2\left(\frac{1}{2}(d_1\cdots d_{n-r})^{2(n-r)} + \frac{1}{2}d_1\right)^{2^r}$.
The worst case ideals are very far from being radical (Definition 50), and indeed
the complexity comes from this property. For radical ideals, we have a much
better degree bound.
Theorem 22 TO BE COMPLETED [Kol88]
However, this doesn't mean that the Gröbner bases are small, as shown by this
example.
Example 6 (van Hoeij [vH15]) Consider 3n variables $V := \{x_1,\ldots,x_n, y_1,\ldots,y_n, z_1,\ldots,z_n\}$. Let us work modulo 2, and also with the equations $S :=
\{c^2 - c : \forall c \in V\}$, so effectively this is the Boolean ring. Let $L := \{x_iy_i - x_i -
y_i - z_i : i \in \{1,\ldots,n\}\}$ and $H := S \cup L \cup \{\prod_{i=1}^n z_i\}$. Intuitively, the last equation
in H says that (at least) one of the $z_i$ is zero, and if $z_i = 0$, the i-th equation
in L says that $x_i = y_i = 0$. If one is familiar with algebraic geometry, it is easy
to see that (H) is a radical ideal.
14This was originally published as [MR10, Corollary 3.21], but with an error corrected in
[MR11, Theorem 8].
Let $T = \{x_iz_i - x_i : i \in \{1,\ldots,n\}\} \cup \{y_iz_i - y_i : i \in \{1,\ldots,n\}\}$ and $P =
\{\prod_{i=1}^n c_i : c_1 \in \{x_1,y_1,z_1\},\ldots,c_n \in \{x_n,y_n,z_n\}\}$. Note that, while all the other
sets have $O(n)$ elements, P has $3^n$ elements. The equations in T can be interpreted as “$z_i = 0 \rightarrow x_i = 0$” and “$z_i = 0 \rightarrow y_i = 0$”. van Hoeij shows
that, with respect to any total degree ordering, the reduced Gröbner base of H is
$G := S \cup L \cup T \cup P$, i.e. with exponentially more elements.
The author has also observed that, with the lexicographic order and the $z_i$
coming first, the Gröbner base only has those elements of P containing no $z_i$,
i.e. $2^n$ of them. So van Hoeij's example is also one where any total degree order
has exponentially more polynomials than a good lexicographic order.
Having got, as it were, the depressing news out of the way, let us look at some
results that are useful in practice.
Proposition 38 (Buchberger's gcd (or First) Criterion [Buc79]) If
$$\gcd(\mathrm{lm}(f), \mathrm{lm}(g)) = 1, \qquad (3.29)$$
then $S(f, g) \xrightarrow{*}_{\{f,g\}} 0$.
In practice, this is implemented by not even adding to P , either in the initial
construction or when augmenting it due to a new h, pairs for which equation
(3.29) is satisfied. This proposition also explains why the more general construct
of an S-polynomial is not relevant to linear equations: when f and g are linear,
if they have the same leading variable, one can reduce the other, and if they do
not, then the S-polynomial reduces to zero.
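Here is a hedged sketch of the gcd Criterion in action, with an S-polynomial built directly from its definition (SymPy; s_poly is our helper, not a library function):

    from sympy import symbols, lcm, expand, reduced, LT

    x, y = symbols('x y')

    def s_poly(f, g, gens, order):
        """S(f, g) = (m/lt(f)) f - (m/lt(g)) g, with m the lcm of the
        leading monomials."""
        ltf, ltg = LT(f, *gens, order=order), LT(g, *gens, order=order)
        m = lcm(ltf, ltg)
        return expand(m/ltf*f - m/ltg*g)

    f = x**2 + y   # leading monomial x**2
    g = y**3 + x   # leading monomial y**3, coprime with x**2
    S = s_poly(f, g, (x, y), 'grevlex')
    _, r = reduced(S, [f, g], x, y, order='grevlex')
    print(S, '->', r)  # y**4 - x**3 -> 0, as Proposition 38 predicts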
Proposition 39 (Buchberger's lcm (or Third) Criterion [Buc79]) If $I =
(B)$ contains f, g, h and the reductions under $\xrightarrow{*}_B$ of $S(f, g)$ and $S(f, h)$, and
if both $\mathrm{lcm}(\mathrm{lm}(f), \mathrm{lm}(g))$ and $\mathrm{lcm}(\mathrm{lm}(f), \mathrm{lm}(h))$ divide $\mathrm{lcm}(\mathrm{lm}(g), \mathrm{lm}(h))$, then
$S(g, h) \xrightarrow{*}_B 0$, and hence need not be computed.
This has been generalised to a chain of polynomials $f_i$ connecting g and h: see
[BF91].
Propositions 38 and 39 are therefore sufficient to say that we need not compute an S-polynomial: the question of whether they are necessary is discussed
by [HP07]. Terminology varies in this area, and some refer to Buchberger's
Second Criterion as well. The more descriptive gcd/lcm terminology is taken
from [Per09].
Whereas applying the gcd Criterion (Proposition 38) to $S(f, g)$ depends only
on f and g, applying the lcm Criterion (Proposition 39 and its generalisations)
to $S(g, h)$ depends on the whole past history of the computation. It might be
that the Criterion is not applicable now, but might become applicable in the
future. Hence we can ask:
which way of picking elements from P in Algorithm 9 will maximise
the effectiveness of the lcm Criterion?
A partial answer was given in [Buc79].
Definition 52 We say that an implementation of Algorithm 9 follows a normal selection strategy if, at each iteration, we pick a pair (i, j) such that
$\mathrm{lcm}(\mathrm{lm}(g_i), \mathrm{lm}(g_j))$ is minimal with respect to the ordering in use.
TO BE COMPLETED(how) This does not quite specify the selection completely: given a tie between $(i, j)$ and $(i', j')$ (with $i < j$, $i' < j'$), we choose the
pair $(i, j)$ if $j < j'$, otherwise $(i', j')$ [GMN+91]. Note that here we are actually
looking at the numerical values of the indices, with larger values meaning “newer”
polynomials. Hence we are picking the pair whose newer element is oldest.
For the sake of simplicity, let us assume that we are dealing with a total
degree ordering, or a lexicographic ordering. The case of weighted orderings,
or so-called multi-graded orderings (e.g. an elimination ordering each of whose
components is a total degree ordering) is discussed in [BCR11].
Definition 53 A polynomial is said to be homogeneous15 if every term has the
same total degree. A set of polynomials is said to be homogeneous if each of
them separately is homogeneous. Note that we are not insisting that all terms
in the set have the same degree, merely that within each polynomial they have
the same total degree.
Definition 54 If $f = \sum_i c_i \prod_{j=1}^n x_j^{a_{i,j}} \in K[x_1,\ldots,x_n]$ is not homogeneous, and
has total degree d, we can define its homogenisation to be $f_0 \in K[x_0, x_1,\ldots,x_n]$
given by
$$f_0 = \sum_i c_i x_0^{\,d - \sum_{j=1}^n a_{i,j}} \prod_{j=1}^n x_j^{a_{i,j}}.$$
Proposition 40 If f and g are homogeneous, so are $S(f, g)$ and any h where $f \rightarrow_g h$.
Corollary 8 If the input to Algorithm 9 is a set of homogeneous polynomials,
then the entire computation is carried out with homogeneous polynomials.
The normal selection strategy is observed to work well with homogeneous poly-
nomials, but can sometimes be very poor on non-homogeneous polynomials.
Hence [GMN+91] introduced the following concept.
Definition 55 The ‘sugar’ $S_f$ of a polynomial f in Algorithm 9 is defined inductively as follows:
1. For an input $f \in G_0$, $S_f$ is the total degree of f (even if we are working
in a lexicographic ordering);
2. If t is a term, $S_{tf} = \deg(t) + S_f$;
3. $S_{f+g} = \max(S_f, S_g)$.
We define the sugar of a pair of polynomials to be the sugar of their S-polynomial,
i.e. (the notation is not optimal here!) $S_{(f,g)} = S_{S(f,g)}$.
15This can be extended to weighted homogeneous orderings, and most of the advantages
carry through [FSEDT14].
The sugar of a polynomial is then the degree it would have had, had we homogenised
all the polynomials before starting Algorithm 9.
Definition 56 We say that an implementation of Algorithm 9 follows a sugar
selection strategy if, at each stage, we pick a pair (i, j) such that $S_{(g_i,g_j)}$ is
minimal.
This does not completely specify what to do, and it is usual to break ties with the
normal selection strategy (Definition 52); “sugar then normal” is generally
just referred to as “sugar”.
3.3.5 A Matrix Formulation
Equation (3.13) showed how a family of linear equations can be represented as
a matrix equation. We can do the same with nonlinear equations: (3.26) can
be written as
$$\begin{pmatrix} 1 & 0 & 0 & 0 & -1\\ 0 & 1 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x^2\\ xy\\ x\\ y\\ 1 \end{pmatrix} = 0. \qquad (3.30)$$
However, this does not give us an obvious solution. Rather, we need to extend
the system, allowing not just the original equations, but also y times the first
and x times the second, to give the following.
$$\begin{pmatrix} 1 & 0 & 0 & 0 & -1 & 0\\ 0 & 1 & 0 & 0 & 0 & -1\\ 1 & 0 & 0 & -1 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x^2y\\ x^2\\ xy\\ x\\ y\\ 1 \end{pmatrix} = 0. \qquad (3.31)$$
Elimination in this gives us
$$\begin{pmatrix} 1 & 0 & 0 & 0 & -1 & 0\\ 0 & 1 & 0 & 0 & 0 & -1\\ 0 & 0 & 0 & -1 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & -1 \end{pmatrix}
\begin{pmatrix} x^2y\\ x^2\\ xy\\ x\\ y\\ 1 \end{pmatrix} = 0, \qquad (3.32)$$
which produces, as the third row, the equation $y - x$, as we do (up to a change
of sign) after (3.26). In pure linear algebra we can go no further, since we really
require y times this equation.
This means considering
$$\begin{pmatrix}
1&0&0&0&0&0&-1&0&0\\ 0&1&0&0&0&0&0&-1&0\\ 0&0&1&0&0&0&0&0&-1\\
1&0&0&0&-1&0&0&0&0\\ 0&1&0&0&0&-1&0&0&0\\ 0&0&0&0&1&0&0&0&-1
\end{pmatrix}
\begin{pmatrix} x^2y^2\\ x^2y\\ x^2\\ xy^2\\ xy\\ x\\ y^2\\ y\\ 1 \end{pmatrix} = 0. \qquad (3.33)$$
Eliminating here (using row 1 to kill the leading term in row 4, and the same
with row 2 against row 5) gives
$$\begin{pmatrix}
1&0&0&0&0&0&-1&0&0\\ 0&1&0&0&0&0&0&-1&0\\ 0&0&1&0&0&0&0&0&-1\\
0&0&0&0&-1&0&1&0&0\\ 0&0&0&0&0&-1&0&1&0\\ 0&0&0&0&1&0&0&0&-1
\end{pmatrix}
\begin{pmatrix} x^2y^2\\ x^2y\\ x^2\\ xy^2\\ xy\\ x\\ y^2\\ y\\ 1 \end{pmatrix} = 0, \qquad (3.34)$$
and now row 4 can kill the leading term in row 6, to give
$$\begin{pmatrix}
1&0&0&0&0&0&-1&0&0\\ 0&1&0&0&0&0&0&-1&0\\ 0&0&1&0&0&0&0&0&-1\\
0&0&0&0&-1&0&1&0&0\\ 0&0&0&0&0&-1&0&1&0\\ 0&0&0&0&0&0&1&0&-1
\end{pmatrix}
\begin{pmatrix} x^2y^2\\ x^2y\\ x^2\\ xy^2\\ xy\\ x\\ y^2\\ y\\ 1 \end{pmatrix} = 0. \qquad (3.35)$$
The last line of this corresponds to $y^2 - 1$. To use this to deduce an equation for
x, we would need to consider x times this equation, which would mean adding
further rows and columns to the matrix.
In this formulation, the statement “G is a Gröbner base for (F)” corresponds
to the existence (not the uniqueness, as reduction is not unique) of matrices X,
Y and R such that
$$X\mathbf{F} = \mathbf{G}, \quad Y\mathbf{G} = \mathbf{F}, \quad R\mathbf{G} = 0, \qquad (3.36)$$
where $\mathbf{F}$ and $\mathbf{G}$ are the matrix versions of F and G: the first two lines say that
F and G generate the same ideal, and the last that G is a Gröbner basis.
No-one would actually suggest doing this in practice, any more than anyone
would compute a g.c.d. in practice by building the Sylvester matrix (which is
actually the univariate case of this process), but the fact that it exists can be
useful in theory, as we will find that the Sylvester matrix formulation of g.c.d.
computation is useful in chapter 4. See section 5.9.3.
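The elimination step from (3.31) to (3.32) can be replayed with ordinary linear algebra; a small sketch with SymPy matrices (columns ordered x²y, x², xy, x, y, 1 as above):

    from sympy import Matrix

    # Rows of (3.31): y*(x**2 - 1), x**2 - 1, x*(x*y - 1), x*y - 1.
    M = Matrix([[1, 0, 0, 0, -1, 0],
                [0, 1, 0, 0, 0, -1],
                [1, 0, 0, -1, 0, 0],
                [0, 0, 1, 0, 0, -1]])

    M[2, :] = M[2, :] - M[0, :]   # row 3 := row 3 - row 1
    print(M)  # third row is now (0, 0, 0, -1, 1, 0), i.e. y - x, as in (3.32)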
3.3.6 Example
Consider the three polynomials below.
$$g_1 = x^3yz - xz^2, \quad g_2 = xy^2z - xyz, \quad g_3 = x^2y^2 - z.$$
The S-polynomials to be considered are $S(g_1, g_2)$, $S(g_1, g_3)$ and $S(g_2, g_3)$. We
use a purely lexicographical ordering with $x > y > z$. The leading terms of
$g_2 = xy^2z - xyz$ and $g_3 = x^2y^2 - z$ are $xy^2z$ and $x^2y^2$, whose l.c.m. is $x^2y^2z$.
Therefore
$$S(g_2, g_3) = xg_2 - zg_3 = (x^2y^2z - x^2yz) - (x^2y^2z - z^2) = -x^2yz + z^2.$$
This polynomial is non-zero and reduced with respect to G, and therefore G is
not a Gröbner basis. Therefore we can add this polynomial (or, to make the
calculations more readable, its negative) to G — call it g4. This means that the
S-polynomials to be considered are S(g1, g2), S(g1, g3), S(g1, g4), S(g2, g4) and
S(g3, g4).
Fortunately, we can make a simplification, by observing that g1 = xg4, and
therefore the ideal generated by G does not change if we suppress g1. This sim-
plification leaves us with two S-polynomials to consider: S(g2, g4) and S(g3, g4).
$$S(g_2, g_4) = xg_2 - yg_4 = -x^2yz + yz^2,$$
and this last polynomial can be reduced (by adding $g_4$), which gives us $yz^2 - z^2$.
As it is not zero, the basis is not Gröbner, and we must enlarge G by adding
this new generator, which we call g5. The S-polynomials to be considered are
S(g3, g4), S(g2, g5), S(g3, g5) and S(g4, g5).
$$S(g_3, g_4) = zg_3 - yg_4 = -z^2 + yz^2,$$
and this can be reduced to zero (by subtracting $g_5$). In fact, this reduction follows
from Buchberger's lcm (third) criterion, proposition 39, as we have already
computed $S(g_2, g_3)$ and $S(g_2, g_4)$.
$$S(g_2, g_5) = zg_2 - xyg_5 = -xyz^2 + xyz^2 = 0.$$
$$S(g_4, g_5) = zg_4 - x^2g_5 = -z^3 + x^2z^2 = x^2z^2 - z^3,$$
where the last rewriting arranges the monomials in decreasing order (with respect to <). This polynomial is already reduced with respect to G; G is
therefore not a Gröbner basis, and we must add this new polynomial to G
— let us call it $g_6$. The S-polynomials to be considered are $S(g_3, g_5)$, $S(g_2, g_6)$,
$S(g_3, g_6)$, $S(g_4, g_6)$ and $S(g_5, g_6)$. The reader can check that G reduces all these
S-polynomials to zero, and that G is therefore a Gröbner basis of the ideal, viz.
$$g_2 = xy^2z - xyz, \quad g_3 = x^2y^2 - z, \quad g_4 = x^2yz - z^2, \quad g_5 = yz^2 - z^2, \quad g_6 = x^2z^2 - z^3.$$
No power of x, y or z occurs alone, so we see that the variety is certainly not
zero-dimensional, even though we started with three equations in three variables,
and z is undetermined. If $z \ne 0$, then $g_5$ can be divided by $z^2$ to give $y = 1$, and
then $g_3$ becomes $x^2 - z$; hence this part of the solution variety is a parabola.
But if $z = 0$, all equations except $g_3$ collapse, and we have $x^2y^2 = 0$. Hence
this part of the solution variety is two straight lines $x = z = 0$ and $y = z = 0$,
each in fact of multiplicity four. Hence the solution is in fact of dimension one,
a fact that was not evident when we started.
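This hand computation can be checked mechanically; a sketch in SymPy (lex order realising x > y > z):

    from sympy import symbols, groebner

    x, y, z = symbols('x y z')
    g1 = x**3*y*z - x*z**2
    g2 = x*y**2*z - x*y*z
    g3 = x**2*y**2 - z

    G = groebner([g1, g2, g3], x, y, z, order='lex')
    for g in G.exprs:
        print(g)
    # Expect, up to the order in which they are listed, the reduced
    # basis g2, ..., g6 computed above.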
3.3.7 The Gianni–Kalkbrener Theorem
In this section, we will consider the case of dimension 0, i.e. finitely many
solutions over $\overline{K}$. We first remark that the situation can be distinctly more
challenging than in the case of linear equations, which we illustrate by means
of two examples.
1. $G = \{x^2 - 1, y^2 - 1\}$. This is a Gröbner base with respect to any ordering. There are four irreducible monomials $\{1, x, y, xy\}$, and hence four
solutions: $x = \pm 1$, $y = \pm 1$.
2. $G = \{x^2 - 1, y^2 - 1, (x - 1)(y - 1)\}$. This is also a Gröbner base with
respect to any ordering. There are three irreducible monomials $\{1, x, y\}$,
and hence three solutions: for $x = 1$ we have $y = \pm 1$, but when $x = -1$,
we only have $y = 1$. The additional polynomial $(x-1)(y-1)$, which rules
out the monomial $xy$, rules out the solution $x = y = -1$. Another way of
looking at this is that, when $x = 1$, the polynomial $(x-1)(y-1)$ vanishes,
but when $x = -1$, it adds an extra constraint.
Can we generalise this? The answer is ‘yes’, at least for purely lexicographical
Gröbner bases of zero-dimensional ideals. If the order is $x_n < x_{n-1} < \cdots < x_1$,
then such a Gröbner base G must have the form
$$p_n(x_n),$$
$$p_{n-1,1}(x_{n-1}, x_n), \ldots, p_{n-1,k_{n-1}}(x_{n-1}, x_n),$$
$$p_{n-2,1}(x_{n-2}, x_{n-1}, x_n), \ldots, p_{n-2,k_{n-2}}(x_{n-2}, x_{n-1}, x_n),$$
$$\cdots$$
$$p_{1,1}(x_1, \ldots, x_{n-1}, x_n), \ldots, p_{1,k_1}(x_1, \ldots, x_{n-1}, x_n),$$
where16
$$\deg_{x_i}(p_{i,j}) \le \deg_{x_i}(p_{i,j+1}) \qquad (3.37)$$
and $p_{i,k_i}$ is monic in $x_i$. Let $G_k = G \cap k[x_k, \ldots, x_n]$, i.e. those polynomials in
$x_k, \ldots, x_n$ only.
Theorem 23 (Gianni–Kalkbrener [Gia89, Kal89a]) Let $\alpha$ be a solution of
$G_{k+1}$. If $\mathrm{lc}_{x_k}(p_{k,i})$ vanishes at $\alpha$, then the whole of $p_{k,i}$ vanishes at $\alpha$. Furthermore,
the lowest degree (in $x_k$) polynomial of the $p_{k,i}$ not to vanish at $\alpha$, say $p_{k,m_\alpha}$,
divides all of the other $p_{k,j}$ at $\alpha$. Hence we can extend $\alpha$ to solutions of $G_k$ by
adding $x_k = \mathrm{RootOf}(p_{k,m_\alpha})$.
This gives us an algorithm (Figure 3.4) to describe the solutions of a zero-dimensional ideal from such a Gröbner base G. It is essentially a generalisation
of back-substitution into triangularised linear equations, except that there
may be more than one solution, since the equations are non-linear, and possibly
more than one equation to substitute into.
In practice, particularly if we are interested in keeping track of multiplicities
of solutions, it may be more efficient to perform a square-free decomposition
(Definition 38) of $p_n$, initialise S to a list of the RootOf of each of its
square-free factors, and also associate the multiplicity to each solution.
In case 2. above, the three equations are $G = \{x^2-1, y^2-1, (x-1)(y-1)\}$.
Taking $x_n = x$, we start off with $S = \{x = \mathrm{RootOf}(x^2-1)\}$, and we call GKstep
on this. The initial value of L is $\mathrm{RootOf}(x^2-1) - 1$, and we ask whether this is
invertible. Adopting a common-sense (i.e. heuristic) approach for the moment,
we see that this depends on which root we take: for $+1$ it is not invertible, and
for $-1$ it is. Hence GKstep makes two recursive calls to itself, on $x = 1$ and
$x = -1$.
GKstep(G, 1, {x = 1}) Here $L := \mathrm{lc}_{x_1}(p_{1,1}(x = 1))$ is 0, so we consider $p_{1,2}$,
whose leading coefficient is 1, so $y = \mathrm{RootOf}(y^2 - 1)$.
GKstep(G, 1, {x = −1}) Here $L := \mathrm{lc}_{x_1}(p_{1,1}(x = -1))$ is $-2$, and $y = 1$.
There is a larger worked example of this later, at equation (3.47), and a generalisation of 2 above at http://staff.bath.ac.uk/masjhd/JHD-CA/WorkedGK.html.
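For a machine check of this case analysis, a short sketch (SymPy's solve stands in for the invertibility heuristic):

    from sympy import symbols, solve

    x, y = symbols('x y')
    F = [x**2 - 1, y**2 - 1, (x - 1)*(y - 1)]

    # Three solutions, matching the three irreducible monomials {1, x, y}:
    for s in solve(F, [x, y], dict=True):
        print(s)   # {x: -1, y: 1}, {x: 1, y: -1}, {x: 1, y: 1}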
Theorem 23 relies on a special case of a more general result.
16It is tempting to write < rather than $\le$ in (3.37). However, this is not always possible,
even though, by the time we come to use the $p_{k,j}$ in Algorithm 11, the non-zero $p_{k,j}(\alpha)$ will
indeed be in a strict order of $\deg_{x_k}$. We can, of course, sort the $p_{k,j}$ by the purely lexicographic
order.
Figure 3.4: Gianni–Kalkbrener Algorithm
Algorithm 10 (Gianni–Kalkbrener) GK(G, n)
Input: A Gröbner base G for a zero-dimensional ideal I in n variables with
respect to lexicographic order.
Output: A list of solutions of G.
S := {x_n = RootOf(p_n)}
for k = n−1, . . . , 1 do
    S := GKstep(G, k, S)
return S
Algorithm 11 (Gianni–Kalkbrener Step) GKstep(G, k, A)
Input: A Gröbner base G for a zero-dimensional ideal I with respect to lexicographic order, an integer k, and A a list of solutions of G_{k+1}.
Output: A list of solutions of G_k.
B := ∅
for each α ∈ A
    i := 1
    while (L := (lc_{x_k}(p_{k,i}))(α)) = 0 do i := i + 1
    if L is invertible with respect to α   # see section 3.1.5
        then B := B ∪ {α ∪ {x_k = RootOf(p_{k,i}(α))}}
        else # α is split as α_1 ∪ α_2
            B := B ∪ GKstep(G, k, {α_1}) ∪ GKstep(G, k, {α_2})
return B
Theorem 24 (Elimination Theorem) Suppose $\{u_1, \ldots, u_k\} \subset \{x_1, \ldots, x_n\}$
and > is an elimination ordering (page 106), only considering the $u_i$ if the
ordering on the rest of the variables is a tie. If G is a Gröbner basis for an ideal
I with respect to >, then $G \cap k[u_1, \ldots, u_k]$ is a Gröbner basis for $I \cap k[u_1, \ldots, u_k]$
[BW93, Proposition 6.15]. $I \cap k[u_1, \ldots, u_k]$ is called an elimination ideal for I
with respect to the variables $\{u_1, \ldots, u_k\}$.
Theorem 23 can be generalised in several ways to non-zero-dimensional ideals, but not completely [FGT01, Examples 3.6, 3.11]. The first is particularly
instructive for us.
Example 7 Let $I = \langle ax^2 + x + y, bx + y\rangle$ with the order $a \succ b \succ y \succ x$. The
Gröbner base is $B_1 \cup B_2$, and there are no polynomials in (a, b) only (so the ideal
is at least two-dimensional); in (a, b, y) we have $B_2 := \{ay^2 + b^2y - by\}$, and
in all variables $B_1 := \{ax^2 + x + y,\ axy - by + y,\ bx + y\}$. We then have the
following situation for values of a and b.
normally $B_2$ determines y (generally as the solution of a quadratic), then $x = -y/b$, except when $b = 0$, when $ay^2 = 0$, so $y = 0$, and $ax^2 + x = 0$, so $x = 0$
or $x = -1/a$.
a = b = 0 $B_2$ vanishes, so we would be tempted, by analogy with Theorem 23,
to deduce that y is undetermined. But in fact $B_1|_{a=b=0} = \{x + y, y, y\}$, so
$y = 0$ (and then $x = 0$).
a = 0, b = 1 Again $B_2$ vanishes. This time, $B_1|_{a=0,b=1} = \{x + y, 0, x + y\}$, and
y is undetermined, with $x = -y$.
This example is taken up again as Example 11.
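The B1/B2 split of Example 7 can be reproduced by filtering a lex basis for polynomials free of x; a sketch (SymPy, with the generator tuple (a, b, y, x) realising a ≻ b ≻ y ≻ x):

    from sympy import symbols, groebner

    a, b, y, x = symbols('a b y x')
    G = groebner([a*x**2 + x + y, b*x + y], a, b, y, x, order='lex')

    # B2: the basis members not involving x.
    B2 = [g for g in G.exprs if x not in g.free_symbols]
    print(B2)  # expect a*y**2 + b**2*y - b*y, as in the text (up to scaling)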
3.3.8 The Faugère–Gianni–Lazard–Mora Algorithm
We have seen in the previous section that, for a zero-dimensional ideal, a purely
lexicographical Gröbner base is a very useful concept. But these are generally
the most expensive to compute, with a worst-case complexity of $O(d^{n^3})$ for
polynomials of degree d in n variables [CGH88]. A total degree, reverse lexicographic Gröbner base, on the other hand, has complexity $O(d^{n^2})$, or $O(d^n)$ if
the number of solutions at infinity is also finite [Laz83]. If one prefers practical
evidence, we can look at (4.27), which has a simple Gröbner base, (4.28),
but 120-digit numbers occur in the intermediate calculations if we compute the
Gröbner base in a total degree order, and numbers with tens of thousands of
digits when computed in a lexicographic order. The computing time17 is 1.02
or 2.04 seconds in total degree (depending on the order of variables), but 40126
seconds (over 11 hours) in lexicographic.
Hence the following algorithm [FGLM93] can be very useful, with $>'$ being
total degree, reverse lexicographic and $>''$ being purely lexicographical, though
it does have uses in other settings as well.
17Axiom 3.4 on a 3GHz Intel P4.
Algorithm 12 (FGLM)
Input: A Gröbner base G for a zero-dimensional ideal I with respect to $>'$; an
ordering $>''$.
Output: A Gröbner base H for I with respect to $>''$.
H := ∅; i := j := 0
Enumerate the monomials irreducible under H in increasing order for $>''$
#When H is a Gröbner base, this is finite by proposition 37
#This enumeration needs to be done lazily — see (*)
for each such m
    Let m $\xrightarrow{*}_G$ v
    if $v = \sum_{k=1}^{j} c_k v_k$
        then $h_{i+1} := m - \sum_{k=1}^{j} c_k m_k$
             H := H ∪ {h_{i+1}}; i := i + 1
             # Changes “irreducible under H” (*)
        else j := j + 1; m_j := m; v_j := v
return H
#It is not totally trivial that H is a Gröbner base, but it is [FGLM93].
Since this algorithm is basically doing linear algebra in the space spanned by the
irreducible monomials under G, whose dimension D is the number of solutions
(proposition 37), it is not surprising that the running time seems to be $O(D^3)$,
whose worst case is $O(d^{3n})$. Strictly speaking, this analysis refers to the number
of arithmetic operations, ignoring any growth in coefficient sizes.
Open Problem 8 (Complexity of the FGLM Algorithm (I)) The complexity of the FGLM algorithm (Algorithm 12) is $O(D^3)$, where D is the number
of solutions. Can faster matrix algorithms such as Strassen–Winograd [Str69,
Win71] (see Notation 43) speed this up? We note that this is not trivial, since
the rows are “arriving one at a time” rather than being presented all at once.
See [FGHR13, FSEDT14] for recent progress: in particular the latter claims
$O(nD^\omega)$.
Open Problem 9 (Complexity of the FGLM Algorithm (II)) The complexity of the FGLM algorithm (Algorithm 12) is $O(D^3)$, where D is the number
of solutions. Can we do any better in practice if the linear algebra is sparse?
Some progress in this direction has been made in [FM13].
As an example of the FGLM algorithm, we take the system Aux from their
paper18, with three polynomials
$$abc + a^2bc + ab^2c + abc^2 + ab + ac + bc,$$
$$a^2bc + a^2b^2c + b^2c^2a + abc + a + c + bc,$$
$$a^2b^2c + a^2b^2c^2 + ab^2c + ac + 1 + c + abc.$$
18The system is obtained as they describe, except that the substitutions are $x_5 = 1/c$,
$x_7 = 1/a$.
The total degree Gröbner basis has fifteen polynomials, whose leading monomials are
$$c^4, bc^3, ac^3, b^2c^2, abc^2, a^2c^2, b^3c, ab^2c, a^2bc, a^3c, b^4, ab^3, a^2b^2, a^3b, a^4.$$
This defines a zero-dimensional ideal ($c^4$, $b^4$ and $a^4$ occur in this list), and we
can see that the irreducible monomials are
$$1, c, c^2, c^3, b, bc, bc^2, b^2, b^2c, b^3, a, ac, ac^2, ab, abc, ab^2, a^2, a^2c, a^2b, a^3:$$
twenty in number (as opposed to the 64 we would have if the basis only had
the polynomials $a^4 + \cdots$, $b^4 + \cdots$, $c^4 + \cdots$). If we wanted a purely lexicographic
base to which to apply Gianni–Kalkbrener, we would enumerate the monomials
in lexicographic order as
1 (irreducible)
c (irreducible)
$c^2$ (irreducible)
$c^3$ (irreducible)
$c^4$ which reduces to $-\frac{185}{14} - \frac{293}{42}a^3 - \frac{1153}{42}a^2b + \frac{509}{7}ab^2 - \frac{323}{42}b^3 - \frac{2035}{42}a^2c - \frac{821}{21}abc + \frac{173}{6}b^2c - \frac{751}{14}ac^2 + \frac{626}{21}bc^2 + \frac{31}{42}c^3 - \frac{449}{14}a^2 + \frac{1165}{14}ab - \frac{772}{21}b^2 + \frac{550}{21}ac - \frac{429}{7}bc + \frac{184}{21}c^2 - \frac{407}{6}a - \frac{281}{42}b - \frac{4799}{42}c$
…
$c^{20}$ which reduces to $-\frac{156473200555876438}{7} + \frac{1355257348062243268}{21}bc^2 - \frac{2435043982608847426}{21}a^2c - \frac{455474473888607327}{3}a - \frac{87303768951017165}{21}b - \frac{5210093087753678597}{21}c + \frac{1264966801336921700}{7}ab - \frac{995977348285835822}{7}bc - \frac{2106129034377806827}{21}abc + \frac{136959771343895855}{3}b^2c + \frac{1119856342658748374}{21}ac + \frac{629351724586787780}{21}c^2 - \frac{774120922299216564}{7}ac^2 - \frac{1416003666295496227}{21}a^2b + \frac{1196637352769448957}{7}ab^2 - \frac{706526575918247673}{7}a^2 - \frac{1536916645521260147}{21}b^2 - \frac{417871285415094524}{21}a^3 - \frac{356286659366988974}{21}b^3 + \frac{373819527547752163}{21}c^3$, which
can be expressed in terms of the previous ones as $p = -1 + 6c + 41c^2 - 71c^3 + 41c^{18} - 197c^{14} - 106c^{16} + 6c^{19} - 106c^4 - 71c^{17} - 92c^5 - 197c^6 - 145c^7 - 257c^8 - 278c^9 - 201c^{10} - 278c^{11} - 257c^{12} - 145c^{13} - 92c^{15}$.
The polynomial $c^{20} - p$ gets added to H: all higher powers of c are
therefore expressible, and need not be enumerated.
Open Problem 10 (Coefficient growth in the FGLM Algorithm) We observe above that the coefficient growth in the expression of $c^{20}$ in terms of the
$>'$ monomials is far greater than in its expression in terms of the $>''$ monomials
(powers of c). Could fraction-free methods as in Theorem 15 do better?
b which can be expressed in terms of the previous ones as
$q = -\frac{9741532}{1645371} - \frac{8270}{343}c + \frac{32325724}{548457}c^2 + \frac{140671876}{1645371}c^3 - \frac{2335702}{548457}c^{18} + \frac{13420192}{182819}c^{14} + \frac{79900378}{1645371}c^{16} + \frac{1184459}{1645371}c^{19} + \frac{3378002}{42189}c^4 - \frac{5460230}{182819}c^{17} + \frac{688291}{4459}c^5 + \frac{1389370}{11193}c^6 + \frac{337505020}{1645371}c^7 + \frac{118784873}{548457}c^8 + \frac{271667666}{1645371}c^9 + \frac{358660781}{1645371}c^{10} + \frac{35978916}{182819}c^{11} + \frac{193381378}{1645371}c^{12} + \frac{553986}{3731}c^{13} + \frac{43953929}{548457}c^{15}$. $b - q$
is added to H, and all multiples of b are therefore expressible, and need
not be enumerated.
a which can be expressed in terms of the previous ones as
$r = \frac{487915}{705159}c^{18} - \frac{4406102}{705159}c - \frac{16292173}{705159}c^{14} - \frac{17206178}{705159}c^2 - \frac{1276987}{235053}c^{16} - \frac{91729}{705159}c^{19} + \frac{377534}{705159} - \frac{801511}{26117}c^3 - \frac{26686318}{705159}c^4 + \frac{4114333}{705159}c^{17} - \frac{34893715}{705159}c^5 - \frac{37340389}{705159}c^6 - \frac{409930}{6027}c^7 - \frac{6603890}{100737}c^8 - \frac{14279770}{235053}c^9 - \frac{15449995}{235053}c^{10} - \frac{5382578}{100737}c^{11} - \frac{722714}{18081}c^{12} - \frac{26536060}{705159}c^{13} - \frac{13243117}{705159}c^{15}$. $a - r$ is added to H, and there are no more monomials to
consider.
These last three give us the Gröbner base in a purely lexicographical order,
which looks like $\{c^{20} + \cdots,\ b + \cdots,\ a + \cdots\}$. As there are twenty solutions in
reasonably general position (the polynomial in c alone does factor, but is square-free), we only need one polynomial per variable, as is often the case. The
complete version of this, and another worked example, are given at
http://staff.bath.ac.uk/masjhd/JHD-CA/FGLMexample.html.
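Recent SymPy versions expose this change of order as GroebnerBasis.fglm (an assumption to check against your installation); a sketch on the Aux system above, whose lex basis should have the c²⁰+···, b+···, a+··· shape just described:

    from sympy import symbols, groebner

    a, b, c = symbols('a b c')
    F = [a*b*c + a**2*b*c + a*b**2*c + a*b*c**2 + a*b + a*c + b*c,
         a**2*b*c + a**2*b**2*c + b**2*c**2*a + a*b*c + a + c + b*c,
         a**2*b**2*c + a**2*b**2*c**2 + a*b**2*c + a*c + 1 + c + a*b*c]

    G = groebner(F, a, b, c, order='grevlex')  # 15 polynomials, per the text
    H = G.fglm('lex')                          # expect 3: a+..., b+..., c**20+...
    print(len(G.exprs), len(H.exprs))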
The existence of this algorithm leads to the following process for ‘solving’ a
zero-dimensional set of polynomial equations.
Algorithm 13
Input: A set S of polynomials
Output: A ‘description of solutions’
G := Buchberger(S, >tdeg)
if G is not zero-dimensional (Proposition 37)
    then return “not zero-dimensional”
    else H := FGLM(G, >plex)
         Use Gianni–Kalkbrener to solve H
In the author’s experience, describing the solutions of a set of polynomial equations when the dimension is not zero is still rather an art form, but much aided
by the computation of Gröbner bases: see the description at the end of Section
3.3.6.
Other ways of computing Gröbner bases over Q will be looked at in section
4.6.
3.3.9 The Gröbner Walk
Though the Gianni–Kalkbrener algorithm only works in dimension 0, we may
still want lexicographical-order Gröbner bases in higher dimensions, again balk
at the cost of computing them directly, and look for an alternative. The FGLM
algorithm intrinsically relies on the dimension being zero. Another method,
which does not, is the Gröbner walk, due originally to [CKM97]; see also
[AGK97, Tra00, FJLT07].
Figure 3.5: Algorithm 14
Algorithm 14 (Extended Buchberger)
Input: finite $G_0 = \{g_1^{(0)}, \ldots, g_m^{(0)}\} \subset R[x_1, \ldots, x_n]$; monomial ordering >.
Output: $G = \{g_1, \ldots, g_k\}$ a Gröbner base for $(G_0)$ with respect to >, and a
matrix $M = (m_{i,j})$ such that $g_i = \sum_j m_{i,j}g_j^{(0)}$.
G := G_0; n := |G|; M := n × n identity matrix
# we consider G as {g_1, . . . , g_n}
P := {(i, j) : 1 ≤ i < j ≤ n}
while P ≠ ∅ do
    Pick (i, j) ∈ P;
    P := P \ {(i, j)};
    Let S(g_i, g_j) $\xrightarrow{*}_G$ h   # tracking M
    If h ≠ 0 then
        # lm(h) ∉ (lm(G))
        g_{n+1} := h; G := G ∪ {h};
        P := P ∪ {(i, n+1) : 1 ≤ i ≤ n};
        n := n + 1;
        Add new row to M
G := Interreduce(G)
We note that we have to keep track of the $m_{i,j}$ during the reduction process as
well as during the S-polynomial computation. We use the version that does the
inter-reduction of the final basis, noting that this also has to update the $m_{i,j}$.
The fundamental observation is in [MR88, Lemma
2.6]: a given ideal has only finitely many possible (across all possible orderings) leading monomial ideals, and hence for this ideal there are only finitely
many essentially different orderings. The space of orderings is then partitioned
into polyhedral sets: a partition known as the Gröbner Fan.
We first need a variant of Algorithm 9, given in Figure 3.5, which is to
Algorithm 9 as Algorithm 5 is to Algorithm 2.
Algorithm 15 (Gröbner Walk)
Input: A Gröbner base G for an ideal I with respect to $>'$; an ordering $>''$.
Output: A Gröbner base H for I with respect to $>''$.
The idea is to construct (generally incrementally) a sequence of orders $>_1 = >'$,
$>_2, >_3, \ldots, >_k = >''$ and corresponding Gröbner bases $G_1 = G, \ldots, G_k = H$.
We use the matrix representation of orderings, so that $>_i$ is given by the matrix
$M_i$ — see page 106. $M_2$ is special — its first row is the first row of $M_1$, and the
remaining rows are the corresponding rows of $M_k$. All subsequent $M_i$ have the
same rows from two onwards, and their first rows are a sequence of ‘hybrids’ between
the first row of $M_2$ and the first row of $M_k$. Let $\omega_i$ be the first row of $M_i$. The
reader who is familiar with homotopy methods may care to view this algorithm
as a homotopy method between $\omega_1 = \omega_2$ and $\omega_k$. There are then two open
questions.
1. Which hybrids, i.e. which intermediate orderings, do we need? The
answer, at least for the simple version of the Gröbner walk, is to let
$\omega_i = (1 - t_i)\omega_1 + t_i\omega_k$, so $t_1 = t_2 = 0$ and $t_k = 1$. The $t_i$ are then
computed by NextCritical.
2. How do we compute $G_i$, given $G_{i-1}$? Naïvely, we could use Algorithm 9
for $>_i$, ignoring the fact that $G_{i-1}$ is a Gröbner base for $>_{i-1}$, an ordering
“fairly similar” to $>_i$. We will see that we can do better.
Definition 57 Let $\omega = (\omega_1, \ldots, \omega_n)$ be a vector of (non-negative) rational numbers. The $\omega$-degree of a monomial $\prod x_i^{a_i}$ is the sum $\sum \omega_ia_i$. A polynomial is said
to be $\omega$-homogeneous if every term has the same $\omega$-degree. A set of polynomials is said to be $\omega$-homogeneous if each of them separately is $\omega$-homogeneous.19
The $\omega$-initial form of a polynomial p, denoted $\mathrm{init}_\omega(p)$, is the sum of all terms
of maximal $\omega$-degree. If > is a monomial ordering the first row of whose matrix
is $\omega$, we write $\mathrm{init}_>$ as well as $\mathrm{init}_\omega$.
With Algorithm 14, we can express step 2 in the Gröbner Walk as “when the
initials change, run Buchberger's algorithm (14) on the new initials, then use
the results to update the whole Gröbner base”, as described in Figure 3.6. This
depends crucially on the fact that we are only making minimal changes to the
order, and the initials: see the results in [AGK97].
The theoretical complexity of the Gröbner walk has not been analysed. The
experimental results in [AGK97] show that, on zero-dimensional ideals, it can be
ten or more times faster than their implementation of FGLM. Their implementation of the walk contains several improvements over the one we have outlined.
There are two main sources of inefficiency in the algorithm as we have outlined
it.
1. If many new terms suddenly appear in $\mathrm{init}_\omega(G)$, the computation of Algorithm 14 can become quite close to a full Gröbner base computation by
Algorithm 9. This happens when our walk passes through the intersection
of several polyhedral cones in the Gröbner fan, and can be avoided by
perturbing the walk [AGK97, §3].
2. In general, the t computed by NextCritical, especially if we perturb the
walk to avoid the previous problem, can become quite complicated rational
numbers. We “solve” this by clearing denominators, but then have to deal
with large integers, as seen in the examples.
19Note that we are not insisting that all terms in the set have the same $\omega$-degree, merely
that within each polynomial they have the same $\omega$-degree.
Figure 3.6: Body of Algorithm 15
i := 2; $>_1$ := $>'$; $\omega$ := first row of $M_{>'}$
$>_2$ := ordering whose first row is $\omega$, rest from $>''$
$\tau$ := first row of $M_{>''}$
while ($>_i \ne >''$)
    G′ := $\mathrm{init}_\omega(G)$
    G″, M := Algorithm 14(G′, $>_i$)
    H := Transform(G, M, $>_i$)
    t := NextCritical(H, $\omega$, $\tau$)
    G := H; i := i + 1; $\omega := \omega + t(\tau - \omega)$
    $>_i$ := ordering with matrix as $>''$ but first row $\omega$
where
• Algorithm 14 is applied to an $\omega$-homogeneous set of polynomials, many
of which will, generically, be monomials. This allows for some efficiency
improvements, see [AGK97].
• Transform(G, M, >) computes $h_i = \sum_j m_{i,j}g_j$ in order >;
• NextCritical(H, $\omega$, $\tau$) computes the least $t > 0$ such that $\mathrm{init}_\omega(H) \ne
\mathrm{init}_{\omega+t(\tau-\omega)}(H)$.
The worked versions of the examples of section 3.3.8 for this algorithm are given
at http://staff.bath.ac.uk/masjhd/JHD-CA/GWalkexample.html. We note
that, at least in Maple, for examples of dimension zero the FGLM process
seems ten times faster than the Gröbner walk, which rather contradicts the
experimental results in [AGK97].
Open Problem 11 Perform more extensive experiments comparing the FGLM
and Gröbner Walk examples.
3.3.10 Factorization and Gröbner Bases
It may happen, either initially or during the computation of a Gröbner basis,
that we observe that a polynomial $f_i$ factors as $f_{i,1}f_{i,2}$ (or more, but for simplicity we consider the case of two factors). Since $f_i(x_1, \ldots, x_n) = 0$ if and only
if one of $f_{i,1}(x_1, \ldots, x_n)$ or $f_{i,2}(x_1, \ldots, x_n)$ is zero, we can reduce the problem
to two, hopefully simpler, ones. For “random” problems, this is unlikely20 to
happen, but people do not generally ask “random” questions.
This observation21 can be powerful, but is not as simple to implement as
might be expected. One ‘clearly’ modifies algorithm 9 so that, if $h = h_1h_2$,
instead of adding h to G, one forks two copies of the algorithm, one with $G \cup \{h_1\}$
and one with $G \cup \{h_2\}$.
TO BE COMPLETED
3.3.11 The Shape Lemma
Let us look again at example 2 of section 3.3.7. Here we needed three equations
to define an ideal in two variables. We note that interchanging the rôles of x
and y does not help (in this case; it might in others). However, using coordinates
other than x and y definitely does. If we write the equations in terms of
$u = x + y$, $v = x - y$ instead, we get the basis
$$[-4v + v^3,\ v^2 - 4 + 2u]: \qquad (3.38)$$
three values for v (0, 2 and $-2$), each with one value of u (2, 0 and 0 respectively),
from which the solutions in x, y can be read off. Note that ordering v before u
would give the basis
$$[u^2 - 2u,\ uv,\ v^2 - 4 + 2u], \qquad (3.39)$$
which is not of this form: it has three polynomials in two variables.
This kind of operation is called, for obvious reasons, a rotation. Almost all
rotations will place the equations “in general position”, and many theoretical
approaches to these problems assume a “generic rotation” has been performed.
In practice, this is a disaster, since sparsity is lost.
20A “random” zero-dimensional system will have a ‘shape basis’ (Definition 58), and the
nonlinear polynomial is a “random” polynomial, and therefore factors with zero probability.
21Another case of (near-)simultaneous discovery: [Dav87] (who could solve in 96 seconds
problems that could not be solved in two hours), [MMN89, NM92], and [Hie92, Hie93] (solving
Quantum Yang–Baxter equations).
Definition 58 ([BMMT94]) A basis for a zero-dimensional ideal is a shape
basis if it is of the form
$$\{g_1(x_1),\ x_2 - g_2(x_1),\ \ldots,\ x_n - g_n(x_1)\}.$$
This is a Gröbner basis for any ordering in which ‘degree in $x_1$’ is the first
criterion: in the terminology of matrix orderings (page 106), any ordering where
the first row of the matrix is $(*, 0, \ldots, 0)$.
For a shape basis, the Gianni–Kalkbrener process is particularly simple: “determine $x_1$ and the rest follows”. Almost all zero-dimensional ideals have shape
bases. The precise criterion (•) in the theorem below is somewhat technical,
but is satisfied if there are no repeated components.
Theorem 25 (Shape lemma) [BMMT94, Corollary 3] After a generic rotation, a zero-dimensional ideal has a shape basis if, and only if,
• each primary component is simple or of local dimension 1.
Furthermore [BMMT94, Lemma 2], such a rotation need only be 1-generic, i.e.
have matrix $\begin{pmatrix} 1 & v\\ 0 & I \end{pmatrix}$ for some generic vector v.
Their paper generalises this to ideals of higher dimension, but the complexity
in notation is not worth it for our purposes.
A word of warning is in order here. The Shape Lemma is a powerful theoretical tool, but its application can be costly. Consider example 2 (page 114):
$G = \{x^2 - 1, y^2 - 1, (x - 1)(y - 1)\}$. This is certainly not a shape basis, since it
has more polynomials than indeterminates. This is inevitable, since the variety
is not equiprojectable (see Definition 68 below) onto either x or y. If we write
$s = x + y$, $t = x - y$, then the basis becomes $\{-4t + t^3, -4 + t^2 + 2s\}$ for the
ordering22 $s > t$, which is a shape basis. However, consider the similar basis
$G' = \{x^{2n} - 1, y^{2n} - 1, (x^n - 1)(y^n - 1)\}$. Similar rotations will work, but t is now
the root of a polynomial of degree $3n^2$ with at least $3n + 1$ nonzero coefficients,
and quite large ones at that, e.g. for $n = 3$
$$\{8469703983104 - 1328571568128t^3 + 56109155544t^6 - 3387236203t^9
+ 149161506t^{12} - 11557977t^{15} + 279604t^{18} - 1053t^{21} - 78t^{24} + t^{27},$$
$$-586877251095044672t + 11229793345003520t^4 - 363020550569195t^7
+ 24557528419410t^{10} - 3328382464425t^{13} + 88786830300t^{16} - 417476125t^{19}
- 23303630t^{22} + 307217t^{25} + 259287804304663680s\}.$$
22But not for t > s, since s defines the other line, besides the x and y axes, going through
two of the points.
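The rotation is easy to replay; a sketch (SymPy), substituting x = (u+v)/2, y = (u−v)/2 so that u = x+y and v = x−y:

    from sympy import symbols, groebner, expand, Rational

    x, y, u, v = symbols('x y u v')
    F = [x**2 - 1, y**2 - 1, (x - 1)*(y - 1)]

    half = Rational(1, 2)
    sub = {x: half*(u + v), y: half*(u - v)}
    Frot = [expand(f.subs(sub)) for f in F]

    G = groebner(Frot, u, v, order='lex')
    print(G.exprs)  # a shape basis, [u + v**2/2 - 2, v**3 - 4*v] up to scaling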
3.3.12 The Hilbert function
Let I be any ideal (other than the whole ring) of $k[x_1, \ldots, x_n]$. Let A be the
algebra $k[x_1, \ldots, x_n]/I$, i.e. the set of all polynomials under the equivalence
relation $f \equiv g$ if there is an $h \in I$ with $f = g + h$.
Proposition 41 If G is a Gröbner base for I, then A is generated, as a k-vector space, by $M := \{m \text{ monomial} \in k[x_1, \ldots, x_n] \mid m \xrightarrow{*}_G m\}$, i.e. the set of
irreducible monomials.
We have already seen (Proposition 37) that the variety corresponding to I is
zero-dimensional if, and only if, M is finite. In this case, |M| is the number of
solutions (counted with multiplicity).
The Hilbert function is a way of measuring M when it is infinite.
Definition 59 Let $A_l$ be the subset of A where there is a representative polynomial of total degree at most l. $A_l$ is a finite-dimensional vector space over k. Let $H_I$
be the function $\mathbb{N} \rightarrow \mathbb{N}$ defined by
$$H_I(l) = \dim_k A_l. \qquad (3.40)$$
Proposition 42 If G is a Gröbner base for I, then $A_l$ is generated, as a k-vector space, by $M := \{m \text{ monomial} \in k[x_1, \ldots, x_n] \mid \mathrm{tdeg}(m) \le l \wedge m \xrightarrow{*}_G m\}$,
i.e. the set of irreducible monomials of total degree at most l.
Note that, while M itself will depend on G (and therefore on the ordering used),
Definition 59 defines an intrinsic property of A, and hence |M| is independent
of the order chosen.
Theorem 26 ([BW93, Lemma 9.21])
$$\binom{l+d}{d} \le H_I(l) \le \binom{l+n}{n}.$$
Theorem 27 ([BW93, Part of Theorem 9.27]) Let G be a Gröbner base
for $I \lhd k[x_1, \ldots, x_n]$ under a total degree order. Let I have dimension d, and
let
$$N = \max\{\deg_{x_i}(\mathrm{lm}(g)) \mid g \in G;\ 1 \le i \le n\}.$$
Then there is a unique polynomial $h_I \in \mathbb{Q}[X]$ such that $H_I(l) = h_I(l)$ for all
$l \ge nN$. $h_I$ is called the Hilbert polynomial of I.
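By Proposition 42, $H_I(l)$ can be computed by brute-force counting of irreducible monomials; a sketch (SymPy; hilbert_fn is our helper) for the one-dimensional ideal $(xy - 1)$:

    from itertools import product
    from sympy import symbols, groebner, LM, prod

    x, y = symbols('x y')
    G = groebner([x*y - 1], x, y, order='grevlex')
    lms = [LM(g, x, y, order='grevlex') for g in G.exprs]

    def hilbert_fn(l, gens=(x, y)):
        """Count monomials of total degree <= l divisible by no leading
        monomial of the basis."""
        count = 0
        for exps in product(range(l + 1), repeat=len(gens)):
            if sum(exps) > l:
                continue
            m = prod(g**e for g, e in zip(gens, exps))
            if not any((m / lm).is_polynomial() for lm in lms):
                count += 1
        return count

    print([hilbert_fn(l) for l in range(5)])  # [1, 3, 5, 7, 9]: h_I(l) = 2l + 1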
3.3.13 Comprehensive Gröbner Bases and Systems
This idea was introduced in [Wei92] (see also [Wei03]).
Example 8 (Comprehensive Gröbner Basis) Consider23 first the example
of $H_1 := \{x + 1, uy + x\} \subset \mathbb{Q}[u, x, y]$. Under any term order with $x < y$,
this forms a (zero-dimensional) Gröbner base in $\mathbb{Q}(u)[x, y]$. However, if we
substitute $u = 0$, we get $\{x + 1, x\}$, which is not a Gröbner base at all. If
we consider instead $H_2 := \{x + 1, uy - 1\}$, which is equivalent in $\mathbb{Q}(u)[x, y]$,
substituting $u = 0$ gives us $\{x + 1, -1\}$, which is a Gröbner basis (admittedly
redundant) equivalent to $\{-1\}$ — no solutions. In fact $H_2$ is what we want —
a Gröbner basis which is comprehensive in the informal sense that it is valid,
not only for symbolic u, but for all values of u.
23I am grateful to John Abbott for discussions about this example.
As a formal definition, we have the following.
Definition 60 (Comprehensive Gröbner Basis) Let K be an integral domain, $R = K[u_1, \ldots, u_m]$ and $T = R[x_1, \ldots, x_n]$, and fix an ordering on
the monomials in $x_1, \ldots, x_n$. Let G be a finite subset of T. G is said to be a
Comprehensive Gröbner basis if, for all fields K′ and all ring homomorphisms
$\sigma: R \rightarrow K'$ (extended to homomorphisms $\sigma: T \rightarrow K'[x_1, \ldots, x_n]$), $\sigma(G)$ is a
Gröbner basis (under the fixed ordering) in $K'[x_1, \ldots, x_n]$.
It is not obvious that these exist, but they do [Wei92, Theorem 2.7]. The construction of these proceeds via the related concept of a Comprehensive Gröbner
System, but we need a preliminary definition.
Definition 61 (Algebraic Partition) Let K be an integral domain, $R =
K[u_1, \ldots, u_m]$ and $S \subseteq K^m$. A finite set $\{S_1, \ldots, S_t\}$ of nonempty subsets
of S is called an algebraic partition of S if it satisfies the following properties:
1. $\bigcup_{i=1}^{t} S_i = S$.
2. $S_i \cap S_j = \emptyset$ if $i \ne j$.
3. For each i, $S_i = V_K(I_i^{(1)}) \setminus V_K(I_i^{(2)})$ for some ideals $I_i^{(1)}, I_i^{(2)}$ of R, where
$V_K(I)$ is $V(I) \cap K^m$.
Each $S_i$ is called a segment.
Definition 62 (Comprehensive Gröbner System) Let $\{S_1, \ldots, S_t\}$ be an
algebraic partition of $S \subseteq K^m$ as in the previous definition, let $T = R[x_1, \ldots, x_n]$,
and fix an ordering on the monomials in $x_1, \ldots, x_n$. Let F be a finite subset of
T. A finite set $G := \{(S_1, G_1), \ldots, (S_s, G_s)\}$ satisfying the following properties
is called a comprehensive Gröbner system (CGS) of F over S with parameters
$u_1, \ldots, u_m$ w.r.t. the chosen ordering:
1. Each $G_i$ is a finite subset of (F);
2. For each $c \in S_i$, $G_i(c) := \{g(c, x_1, \ldots, x_n) \mid g(u_1, \ldots, u_m, x_1, \ldots, x_n) \in G_i\}$
is a Gröbner basis of the ideal $(F(c))$ in $C[x_1, \ldots, x_n]$ with respect to the ordering,
where $F(c) := \{f(c, x_1, \ldots, x_n) \mid f(u_1, \ldots, u_m, x_1, \ldots, x_n) \in F\}$;
3. For each $c \in S_i$, $\mathrm{lc}(g)(c) \ne 0$ for any element g of $G_i$.
In addition, if each $G_i(c)$ is a minimal (reduced) Gröbner basis, G is said to
be minimal (reduced). Being monic is not required. When S is the whole space
$K^m$, the phrase “over S” is usually omitted.
Example 9 (Comprehensive Gröbner System) In the setting of Example
8, we partition $\mathbb{Q}$ as $\{S_1 := \{0\}, S_2 := \mathbb{Q} \setminus S_1\}$. The Gröbner basis corresponding
to $S_2$ is either $H_1$ or $H_2$ (or any other variant), and these are Gröbner bases
by the gcd Criterion (Proposition 38) as long as the leading term of $uy + x$ is
$uy$. Hence $u = 0$ is a special case, and our polynomials are $\underbrace{uy}_{=0} + x$ and $x + 1$,
whose S-polynomial (or indeed reduction) is
$$\Big(\underbrace{uy}_{=0} + x\Big) - (x + 1) = \underbrace{uy}_{=0} - 1.$$
So the Gröbner basis corresponding to $S_1$ is $\{uy - 1\}$.
Computing a Comprehensive Gröbner System is conceptually straightforward:
we start with the trivial partition {S}, and run Buchberger's Algorithm (9).
Every time we have to decide on the zeroness or not of a leading coefficient,
either in the $S(g_i, g_j) \xrightarrow{*}_G h$ step or in deciding whether $h = 0$ (directly or via
the Criteria), and that decision depends on the $u_i$, i.e. on whether a polynomial
p in the $u_i$ is zero or not, we split our set $S_i = V_K(I_i^{(1)}) \setminus V_K(I_i^{(2)})$ into $S_{i'} =
V_K(I_i^{(1)} \cup \{p\}) \setminus V_K(I_i^{(2)})$ and $S_{i''} = V_K(I_i^{(1)}) \setminus V_K(I_i^{(2)} \cup \{p\})$, and continue
Algorithm 9 over each set separately, but keeping the apparently zero terms. In
practice, the same polynomials p keep cropping up, and substantial ingenuity is
needed to reduce or eliminate duplication.
Theorem 28 ([Wei92, Proposition 3.4(i)]) If $G := \{(S_1, G_1), \ldots, (S_s, G_s)\}$
is a Comprehensive Gröbner System for F over S, then $G' := \bigcup_{i=1}^{s} G_i$ is a
Comprehensive Gröbner Basis for F.
3.3.14 Coefficients other than fields
Most of the theory of this section, notably theorem 16, goes over to the case
when R is a P.I.D. (definition 14) rather than a field, as described in [BW93,
section 10.1]. However, when it comes to computation, things are not quite so
obvious. What is the Gröbner base of {2x, 3y} [Pau07]? There are two possible
answers to the question.
• {2x, 3y} [Tri78].
• {2x, 3y, xy} (where xy is computed as $x(3y) - y(2x)$) [Buc84].
We note that $xy = x(3y) - y(2x) \in (2x, 3y)$, so we need xy to reduce to zero.
We therefore modify definition 48 as follows.
Definition 63 Let $f, g \in R[x_1, \ldots, x_n]$. Suppose the leading coefficients of f
and g are $a_f$ and $a_g$, and the leading monomials $m_f$ and $m_g$. Let a be a least
common multiple of $a_f$ and $a_g$, and write $a = a_fb_f = a_gb_g$. Let m be the least
common multiple of $m_f$ and $m_g$. The S-polynomial of f and g, written S(f, g),
is defined as
$$S(f, g) = b_f\frac{m}{m_f}f - b_g\frac{m}{m_g}g. \qquad (3.41)$$
Let $c_fa_f + c_ga_g = \gcd(a_f, a_g)$, and define the G-polynomial of f and g, written
G(f, g), as
$$G(f, g) = c_f\frac{m}{m_f}f + c_g\frac{m}{m_g}g. \qquad (3.42)$$
Note that (3.41) is the same as (3.28) up to a factor of $\gcd(a_f, a_g)$. The S-polynomial is defined up to unit factors, whereas the G-polynomial is much less
well-defined, but it turns out not to matter.
For the example quoted above, $S(2x, 3y) = 0$ (which follows from Proposition
38), while $G(2x, 3y) = xy$. Algorithm 9 goes over to this setting, except that
we have to add G-polynomials as well as S-polynomials, and some care has to
be taken to eliminate G-polynomials first — see [BW93, table 10.1].
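A tiny sketch of Definition 63 for the pair {2x, 3y} over the integers (SymPy; igcdex supplies the Bézout coefficients, and the variable names mirror the definition):

    from sympy import symbols, lcm, expand, Integer
    from sympy.core.numbers import igcdex

    x, y = symbols('x y')
    f, g = 2*x, 3*y
    af, mf = Integer(2), x      # leading coefficient and monomial of f
    ag, mg = Integer(3), y      # ... and of g

    a = lcm(af, ag)             # 6
    bf, bg = a // af, a // ag   # 3, 2
    m = lcm(mf, mg)             # x*y

    S = expand(bf*(m/mf)*f - bg*(m/mg)*g)
    cf, cg, d = igcdex(2, 3)    # cf*2 + cg*3 = d = 1
    G = expand(cf*(m/mf)*f + cg*(m/mg)*g)
    print(S, G)                 # 0  and  x*y (up to sign)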
3.3.15 Non-commutative Ideals
Much of the general mechanism of ideals generalises to the case of non-commutative ideals, provided we are careful to distinguish left, right and two-sided ideals.
However, the theory is notably weaker. In particular we have the following
opposite of theorem 1 and its corollary.
Proposition 43 $K\langle x_1, \ldots, x_n\rangle$ is not noetherian for $n \ge 2$.
Hence Buchberger's algorithm 9 might not terminate, and in general it does not
[Mor86].
In fact, not only does this approach not work, but no approach can, as
demonstrated by the following result.
Proposition 44 ([KRW90]) Ideal membership is insoluble in $\mathbb{Q}\langle x_1, x_2\rangle$.
One case of great interest is when R is some field of (expressions representing)
functions, and the “indeterminates” are differential or difference operators.
Example 10 R is $\mathbb{Q}(x, y)$ and the indeterminates are $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$, so that we
are working in $R[\frac{\partial}{\partial x}, \frac{\partial}{\partial y}]$. Here the “indeterminates” commute with each other,
but not with R, since $\frac{\partial}{\partial x}(xf) = f + x\frac{\partial}{\partial x}f$, i.e. $\frac{\partial}{\partial x}x = 1 + x\frac{\partial}{\partial x}$.
We should note that the result of multiplying a term by an indeterminate is
not necessarily a term, e.g. $\frac{\partial}{\partial x}\left(x\frac{\partial}{\partial x}\right) = \frac{\partial}{\partial x} + x\frac{\partial^2}{\partial x^2}$. This makes characterising
a Gröbner base harder, but the following definition is an appropriate generalisation of the last clause of theorem 16 in the setting where the indeterminates
commute with each other.
Definition 64 [Pau07, Definition 4] A finite subset G of $I \setminus \{0\}$, where I is a
left-ideal, is a Gröbner basis of I iff, for all monomials m, the R-ideal $(\mathrm{lc}(f) \mid f \in
I \wedge \mathrm{lm}(f) = m)$ is generated by $\{\mathrm{lc}(g) \mid g \in G \wedge \mathrm{lm}(g) \text{ divides } m\}$.
If R is a principal ideal domain, it is possible to define S-polynomials and
G-polynomials as in the previous section, but in general we need to consider
more complicated (but still finitely many) combinations [Pau07]. This leads
to an effective test for a Gröbner base in this setting, i.e. we need to check
that finitely many combinations reduce to zero. We also get a generalisation
of Buchberger's algorithm [Pau07, Proposition 10]: if the combination does not
reduce to zero, add it. Termination is non-trivial, however.
3.4 Nonlinear Multivariate Equations: Recursive
Whereas the previous section looked at polynomials as living in $k[x_1, \ldots, x_n]$
thought of (irrespective of implementation) in a distributed way (page 50), with
a certain order on the monomials, it is equally possible to think of them in a
recursive way (page 49), but now with an order on the variables, rather than
the monomials.
3.4.1 Triangular Sets and Regular Chains
An alternative approach to polynomial equation solving is that of characteristic
[Rit32, Wu86] or triangular [Laz91] sets, or regular chains.24 See [ALM99] for
a reconciliation of the various theories, and [AM99] for a practical comparison.
We can regard regular chains as an approach based on recursive views of polynomials, while Gröbner bases are based on a distributed view. We assume that
the variables $x_1, \ldots, x_n$ are ordered as $x_1 < \cdots < x_n$, so that $x_n$ is the most
important variable.
Definition 65 Let p be a polynomial. The main variable of p, denoted mvar(p),
is the most important variable of p. The initial of p, written init(p), is its leading
coefficient when regarded as a univariate polynomial in mvar(p).
If $S = \{p_1, \ldots, p_k\}$ is a finite set of polynomials, we write
$$\mathrm{init}(S) = \mathrm{lcm}_{1\le i\le k}\,\mathrm{init}(p_i).$$
It should be noted that many authors define init(S) as $\prod_{1\le i\le k}\mathrm{init}(p_i)$. Since
we are normally concerned with the zeros of init(S), the two definitions have
the same consequences, and ours leads to smaller polynomials.
Definition 66 A set T of polynomials is said to be triangular if different polynomials have different main variables.
Example 2 of section 3.3.7 shows25 that there may not always be a triangular
set generating a particular ideal. If we have a triangular set, then the structure
of the ideal, and the variety, is relatively obvious.
24Terminology in this area has been confused: we are following a recent reconciliation of
terminology.
25It actually only shows that this set of generators is not triangular. $\{x^2 - 1, (x - 1)(y -
1) + (x + 1)(y^2 - 1)\}$ is triangular [Ron11], but is not a Gröbner basis.
Definition 67 Let T be a triangular set generating an ideal I in $k[x_1, \ldots, x_n]$.
Then every variable $x_i$ which occurs as a main variable is called algebraic, and
the set of such variables is denoted AlgVar(T).
Proposition 45 For a triangular set T, the dimension of I(T) is $n - |\mathrm{AlgVar}(T)|$.
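Definition 65 translates directly into code; a sketch (SymPy; mvar and init are our helpers, and the variable order x < y < z is assumed):

    from sympy import symbols, Poly

    x, y, z = symbols('x y z')   # ordered x < y < z: z is most important
    ORDER = [x, y, z]

    def mvar(p):
        """Main variable: the most important variable occurring in p."""
        return max(p.free_symbols, key=ORDER.index)

    def init(p):
        """Initial: leading coefficient of p viewed as univariate in mvar(p)."""
        return Poly(p, mvar(p)).LC()

    p = (x + 1)*y**2 + (x - 1)*y + 3
    print(mvar(p), init(p))   # y  and  x + 1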
3.4.2 Zero Dimension
Much of the theory applies to positive dimension as well, but we will only
consider in this section the case of zero-dimensional ideals/varieties. Let V be
a zero-dimensional variety, and Vk be its projection onto x1, . . . , xk, i.e.
Vk = {(α1, . . . , αk) : ∃(α1, . . . , αn) ∈ V }.
Definition 68 A zero-dimensional variety V is equiprojectable iff, for all k, the projection Vk → Vk−1 is an nk : 1 mapping for some fixed nk. Note that this definition depends on the order of the xi: a variety might be equiprojectable with respect to one order, but not another, as in (3.38) versus (3.39).
Such an equiprojectable variety will have ∏ nk points (i.e. solutions, not counting multiplicity, to the equations).
The variety V of Example 2 of section 3.3.7 is {(x = −1, y = 1), (x = 1, y = ±1)} and is not equiprojectable. In fact, its equations can be written as {(x^2 − 1), (x − 1)(y − 1) + (x + 1)(y^2 − 1)}, which is a triangular set with y more important than x (main variables x and y respectively). However, the second polynomial sometimes has degree 1 in y (if x = −1), and sometimes degree 2. Hence we need a stronger definition.
Definition 69 A list, or chain, of polynomials f1, . . . , fk is a regular chain if:
1. whenever i < j, mvar(fi) ≺ mvar(fj) (therefore the chain is triangular);
2. init(fi) is invertible modulo the ideal (fj : j < i).
Proposition 46 Every equiprojectable variety corresponds to a zero-dimensional
regular chain, and vice versa.
However, V of Example 2 of section 3.3.7 can be written as V = V1 ∪ V2 where V1 = {(x = −1, y = 1)} and V2 = {(x = 1, y = ±1)}, each of which is equiprojectable. The corresponding regular chains are T1 = {x + 1, y − 1} and T2 = {x − 1, y^2 − 1}.
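Such a decomposition can be computed with Maple's RegularChains package; the following sketch (the exact form and ordering of the output may vary between versions) should recover T1 and T2:

> with(RegularChains):
> R := PolynomialRing([y, x]):
> dec := Triangularize([x^2 - 1, (x-1)*(y-1) + (x+1)*(y^2-1)], R):
> map(rc -> Equations(rc, R), dec);         # expect [[y - 1, x + 1], [y^2 - 1, x - 1]]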
Theorem 29 (Gianni–Kalkbrener (triangular variant)) Every zero-dimensional variety can be written as a union of disjoint equiprojectable varieties — an equiprojectable decomposition.
In fact, each solution description in Algorithm 10 is a description of an equipro-
jectable variety.
This theorem can be, and was, proved independently, and the decomposition
into regular chains (the union of whose varieties is the original variety) can be
computed directly. This gives us an alternative to algorithm 13: compute the
regular chains corresponding to the equiprojectable decomposition, and solve
each one separately [Laz92].
It appears that the triangular decomposition approach is more suitable to
modular methods (chapter 4, especially section 4.6) than the Gröbner-base ap-
proach, but both aspects are areas of active research.
3.4.3 Positive Dimension
Here we consider the case of solution sets of positive dimension (over the alge-
braic closure, e.g. over the complexes). As in Theorem 29, the ultimate aim is
to express a variety as a union (preferably a disjoint union) of “nicer” varieties,
or other sets.
Definition 70 (Quasi-algebraic System) If P and Q are two (finite) sets of polynomials, we call the ordered pair (P, Q) a quasi-algebraic system, and we write Z(P, Q), the zeros of the quasi-algebraic system, for V(P) \ V(∏Q), with the convention that if Q is empty, ∏Q = 1, so V(∏Q) = ∅.

Z(P, Q) = {x ∈ K^n | (∀p ∈ P, p(x) = 0) ∧ (∀q ∈ Q, q(x) ≠ 0)}.

We say that (P, Q) is consistent if Z(P, Q) ≠ ∅.
In Definition 49, we defined the variety of a set of polynomials, but we need some
more concepts, all of which depend on having fixed an order of the variables.
Definition 71 If T is a triangular system, we define the pseudo-remainder of p by T to be the pseudo-remainder of dividing p by each qi ∈ T in turn (turn defined by decreasing order of mvar(qi)), regarded as univariate polynomials in mvar(qi).
This is a generalization of Definition 36 (page 65).
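Maple's prem command implements the pseudo-remainder of Definition 36, so the pseudo-remainder by a triangular set is simply an iterated call; a minimal sketch (the polynomial p and set T are illustrative, listed by decreasing main variable):

> p := x^2*y^2 + y + 1:
> T := [x*y - 1, x^2 - 2]:                  # main variables y, then x
> r := prem(p, T[1], y):                    # first reduce with respect to y
> r := prem(r, T[2], x);                    # then with respect to x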
Definition 72 Let S be a finite set of polynomials. The set of regular zeros of S, written W(S), is Z(S, {init(S)}) = V(S) \ V({init(S)}). For (a1, . . . , an) to be in W(S), where S = {p1, . . . , pk}, we are insisting that all of the pi vanish at this point, but none of the init(pi).
For Example 2 of section 3.3.7, the variety is {(x = −1, y = 1), (x = 1, y = ±1)}. If we take y > x, then the initial of the set of polynomials is lcm(1, x − 1, 1) = x − 1, so only the zero with x = −1, y = 1 is regular. Conversely, if we take x > y, the initial is y − 1 and only the zero with y = −1, x = 1 is regular. This emphasises that W depends on the variable ordering. It is also a property of the precise set S, not just the ideal ⟨S⟩.
In this case, W(S) was in fact a variety (as always happens in dimension 0). In general, this is not guaranteed to happen: consider the (trivial) triangular system S = {(x − 1)y − x + 1} with y > x. Since this polynomial is (x − 1)(y − 1), V(S) is the two lines x = 1 and y = 1. However, W(S) is the line y = 1 except for the point (1, 1). In fact this is the only direct description we can give, though we could say that W(S) is "almost" the line y = 1. This "almost" is made precise as follows.
Definition 73 If W is any subset of K^n, the Zariski closure of W, written26 W̄, is the smallest variety containing it:

W̄ = ∩ {V(F) | W ⊆ V(F)},

which is itself a variety by Proposition 35.
In the example above, the Zariski closure of W(S) is V(y − 1).
3.4.3.1 An example
This example is from [AM99, p. 126]27. Suppose we have, in two dimensions,
a manipulator consisting of an arm of length 1 fixed at the origin, and with
another arm, also of length 1, at its other end. We wish the far end of the
manipulator to reach the point (a, b) in the plane. Let θ1 be the angle that the first arm makes with the x axis, and write c1 = cos θ1, s1 = sin θ1. Let θ2 be the angle that the second arm makes with the first. Then we have the following equations

c1 + cos(θ1 + θ2) = a   (3.43)
s1 + sin(θ1 + θ2) = b   (3.44)
s1^2 + c1^2 = 1   (3.45)
s2^2 + c2^2 = 1,   (3.46)
where the last two equations state that the arms have length 1. We can apply the addition formulae for trigonometric functions to (3.43) and (3.44) to get

c1 + c1c2 − s1s2 = a,   (3.43′)
s1 + c1s2 + c2s1 = b.   (3.44′)
Rewriting these equations as polynomials, assumed to be zero, and using the
order
c2 > s2 > c1 > s1 > b > a,
we get
S = {c2c1 − s2s1 + c1 − a, c2s1 + s2c1 + s1 − b, c1^2 + s1^2 − 1, c2^2 + s2^2 − 1},
26Note that we use the same notation for algebraic closure and Zariski closure.
27The author is grateful to Russell Bradford for explaining the geometric context.
which is not triangular since c2 is the main variable of three different equations.
[AM99] implement the method of [Laz91] to express V(S) as a (disjoint) union

W(T1) ∪ W(T2) ∪ W(T3),

where

T1 = {(b^2 + a^2)(4s1^2 − 4bs1 + b^2 + a^2) − 4a^2, 2ac1 + 2bs1 − b^2 − a^2,
      2as2 + 2(b^2 + a^2)s1 − b^3 − a^2 b, 2c2 − b^2 − a^2 + 2},
T2 = {a, 2s1 − b, 4c1^2 + b^2 − 4, s2 − bc1, 2c2 − b^2 + 2},
T3 = {a, b, c1^2 + s1^2 − 1, s2, c2 + 1}.
3.4.3.2 Another Example
This was also considered as Example 7 (page 117).
Example 11 Let I = ⟨ax^2 + x + y, bx + y⟩ with the order a ≺ b ≺ y ≺ x. The full triangular decomposition (obtained from Maple's RegularChains package with option=lazard) is

{[bx + y, ay + b^2 − b], [x, y], [ax + 1, y, b], [x + y, a, b − 1]}.

The first two components correspond to two two-dimensional surfaces (in fact the second one is a plane), whereas the second two correspond to one-dimensional solutions (hyperbola and straight line).
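The command used is of the following shape (a sketch: the option is spelled output=lazard in current versions of the package, and the ordering of the output components may differ):

> with(RegularChains):
> R := PolynomialRing([x, y, b, a]):        # x > y > b > a
> dec := Triangularize([a*x^2 + x + y, b*x + y], R, output = lazard):
> map(rc -> Equations(rc, R), dec);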
3.4.3.3 Regular Zeros and Saturated Ideals
Definition 74 If T is a triangular system, define the saturated ideal of T to be

sat(T) = {p ∈ K[x1, . . . , xn] | ∃n ∈ N : init(T)^n p ∈ (T)}
       = {p ∈ K[x1, . . . , xn] | prem(p, T) = 0}.

In other words it is the set of polynomials which can be reduced to zero by T after multiplying by enough of the initials of T so that division works. In terms of the more general concept of saturation I : S^∞ of an ideal ([Bou61, p. 90]), this is (T) : (init(T))^∞.
Theorem 30 ([ALM99, Theorem 2.1]) For any non-empty triangular set T, the Zariski closure of W(T) is V(sat(T)).
In the example motivating Definition 73, sat(S) is generated by y − 1, and indeed the closure of W(S) is V(sat(S)).
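This saturation can be checked with Maple's PolynomialIdeals package; a minimal sketch for the motivating example:

> with(PolynomialIdeals):
> J := <(x-1)*y - x + 1>:                   # the ideal (T)
> Saturate(J, x - 1);                       # (T) : (init(T))^infinity; expect <y - 1>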
3.4.4 Conclusion
Whether we follow the Gianni–Kalkbrener approach directly (algorithm 13) or
go via triangular sets, the solutions to a zero-dimensional family of polynomial
equations can be expressed as a union of (equiprojectable) sets, each of which
can be expressed as a generalised RootOf construct. For example, if we take the
ideal
{−3x − 6 + x^2 − y^2 + 2x^3 + x^4, −x^3 + x^2 y + 2x − 2y,
 −6 + 2x^2 − 2y^2 + x^3 + y^2 x, −6 + 3x^2 − xy − 2y^2 + x^3 + y^3},
its Gröbner basis (purely lexicographic, y > x) is
[6� 3×2� 2×3+x5,�x3+x2y+2x� 2 y, 3x+6�x2+ y2� 2×3�x4]. (3.47)
There are seven irreducible monomials: 1, x, x^2, x^3, x^4, y and xy. We know that x satisfies a quintic, and y then satisfies (x^2 − 2)y − x^3 + 2x. When x^2 = 2, this vanishes, so our quintic for x decomposes into (x^2 − 2)(x^3 − 3), and the whole solution reduces to

⟨x^2 − 2, y^2 − x⟩ ∪ ⟨x^3 − 3, y − x⟩.   (3.48)
Unfortunately, we do not have a convenient syntax to express this other than via the language of ideals. We are also very liable to fall into the 'too many solutions' trap, as in equation (3.8): Maple resolves the first component (in radical form) to

{y = ⁴√2, x = √2},   (3.49)

and the second one to

{y = ³√3, x = ³√3},   (3.50)

both of which lose the connections between x and y (x = y^2 in the first case, x = y in the second).
We are also dependent on the choice of order, since with x > y the Gröbner basis is

[6 − 3y^4 − 2y^3 + y^7, 18 − 69y^2 − 9y^4 − 46y + 23y^5 − 2y^6 + 73x],   (3.51)

and no simplification comes to mind, short of factoring the degree seven polynomial in y, which of course is (y^3 − 3)(y^4 − 2), and using the choice here to simplify the equation for x into either x − y or x − y^2.
Maple's RegularChains package, using the technology of section 3.4.1, produces essentially equation (3.48) for the order y > x, and for x > y produces

[[(2y + y^3 + 4y^2 + 2)x − 8 − 2y^2 − 2y^3 − 2y, y^4 − 2],
 [(5y + 3 + 4y^2)x − 12 − 5y^2 − 3y, −3 + y^3]],

essentially the factored form of (3.51).
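Both lexicographic bases above can be reproduced with Maple's Groebner package (a sketch; plex(y, x) makes y the most important variable):

> with(Groebner):
> F := [-3*x - 6 + x^2 - y^2 + 2*x^3 + x^4, -x^3 + x^2*y + 2*x - 2*y,
>       -6 + 2*x^2 - 2*y^2 + x^3 + x*y^2, -6 + 3*x^2 - x*y - 2*y^2 + x^3 + y^3]:
> Basis(F, plex(y, x));                     # the basis (3.47)
> Basis(F, plex(x, y));                     # the basis (3.51)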
3.4.5 Regular Decomposition
TO BE COMPLETED
3.5 Equations and Inequalities
While it is possible to work in more general settings (real closed fields), we will
restrict our attention to solving systems over R. Consider the two equations
x^2 + y^2 = 1   (3.52)
x^2 + y^2 = −1.   (3.53)

Over the complexes, there is little to choose between these two equations: both define a one-dimensional variety. Over R, the situation is very different: (3.52) still defines a one-dimensional variety (a circle), while (3.53) defines the empty set, even though we have only one equation in two variables.
Definition 75 ([ARS+13]) We say that a complex (hyper)surface V := {(x1, . . . , xn) | p1(x1, . . . , xn) = · · · = pk(x1, . . . , xn) = 0} is real if every complex polynomial vanishing over VR := V ∩ R^n also vanishes over V. Algebraic Geometers would say that V is the Zariski closure of VR.
With this definition, (3.52) defines a real surface (in fact a curve), but (3.53) does not, since 1 vanishes over V ∩ R^n (which is the empty set), but not over V.
The above example shows that we can essentially introduce the constraint x ≥ 0 by adding a new variable y and the equation y^2 − x = 0. We can also introduce the constraint x ≠ 0 by adding a new variable z and xz − 1 = 0 (essentially insisting that x be invertible). Hence x > 0 can be introduced.
Having seen that ≥ and > can creep in through the back door, we might as well admit them properly, and deal with the language of real closed fields, i.e. the language of fields (definition 15) augmented with the binary predicate > and the additional laws:
1. Precisely one of a = b, a > b and b > a holds;
2. a > b and b > c imply a > c;
3. a > b implies a+ c > b+ c;
4. a > b and c > 0 imply ac > bc.
This is the domain of real algebraic geometry, a lesser-known, but very important, variant of classical algebraic geometry. Suitable texts on the subject are [BPR06, BCR98]. However, we will reserve the word 'algebraic' to mean a set defined by equalities only, and reserve semi-algebraic for the case when inequalities (or inequations28) are in use. More formally:
28Everyone agrees that an equation a = b is an equality. a > b and its variants are traditionally referred to as inequalities. This only leaves the less familiar inequation for a ≠ b. Some
Definition 76 An algebraic proposition is one built up from expressions of the form pi(x1, . . . , xn) = 0, where the pi are polynomials with integer coefficients, by the logical connectives ¬ (not), ∧ (and) and ∨ (or). A semi-algebraic proposition is the same, except that the building blocks are expressions of the form pi(x1, . . . , xn) σ 0, where σ is one of =, ≠, >, ≥, <, ≤. The language of semi-algebraic propositions is also called the Tarski language L.
This language is in fact redundant, since ≠, ≥, ≤ can be replaced with the help of ¬, but it corresponds more closely to natural usage. The reader will also notice that it is not quite the language of real closed fields described above, since we do not allow division. This is partly for ease of subsequent development, but also allows us to sidestep "division by zero" questions, as raised in problem 1 of section 1.2.3. Hence the proposition p/q > 0 has to be translated as

(q > 0 ∧ p > 0) ∨ (q < 0 ∧ p < 0),   (3.54)

which is not true when q = 0. If this is not what we mean, e.g. when p and q have a common factor, we need to say so.

Open Problem 12 (Better treatment of division) (3.54) is equivalent to pq > 0. However, p/q ≥ 0 is not equivalent to pq ≥ 0, but rather to pq ≥ 0 ∧ q ≠ 0. In general, the polynomial theorists tend to dismiss the problem as above. The logician tends to worry much more about the problem: consider [AP10, Appendix, lines 1–12] for an example of rational function manipulation.
3.5.1 Applications
It turns out that many of the problems one wishes to apply computer algebra
to can be expressed in terms of real semi-algebraic geometry. This is not totally
surprising, since after all, the “real world” is largely real in the sense of R.
Furthermore, even if problems are posed purely in terms of equations, there may well be implicit inequalities as well. For example, it may be implicit that quantities are non-negative, or that concentrations in biochemistry lie in the range [0, 1].
Robot motion planning . . .TO BE COMPLETED
It is also often important to prove unsatisfiability, i.e. that a semi-algebraic formula has no solutions. [Mon09] gives several examples, ranging from program proving to biological systems. The program proving one is as follows. One wishes to prove that I is an invariant (i.e. if it was true at the start, it is true at the end) of a program which moves from one state to another by a transition relation τ. More formally, one wishes to prove that there do not exist two states s, s′ such that s ∈ I, s′ ∉ I, but s →τ s′. Such a pair (s, s′) would be where "the program breaks down", so a proof of unsatisfiability becomes a proof of program correctness. This places stress on the concept of 'proof' — "I can prove that there are no bad cases" is much better than "I couldn't find any bad cases".
treatments ignore inequations, since "a ≠ b" = "a > b ∨ a < b", but in practice it is useful to regard inequations as first-class objects.
3.5.2 Real Radical
We recall the second part of Definition 50: the radical of an ideal I, denoted √I, is defined as

√I = {p | ∃m : p^m ∈ I}.

Definition 77 Let A = R[x1, . . . , xk] (it is possible to be more general, and talk about real closed fields). The real radical of an ideal I ⊂ A, denoted re√I, is defined as

re√I = { p | ∃m, k ∈ N, ri ∈ R^+, gi ∈ A : p^(2m) + Σ_{i=1}^{k} ri gi^2 ∈ I }.
3.5.3 Quantifier Elimination
A fundamental result of algebraic geometry is the following, which follows from
the existence of resultants (section A.1).
Theorem 31 A projection of an algebraic set is itself an algebraic set.
For example, the projection of the set defined by
{ (x − 1)^2 + (y − 1)^2 + (z − 1)^2 − 4,  x^2 + y^2 + z^2 − 4 }   (3.55)

on the x, y-plane is the ellipse

8x^2 + 8y^2 − 7 − 12x + 8xy − 12y.   (3.56)
We can regard equation (3.55) as defining the set

∃z ( (x − 1)^2 + (y − 1)^2 + (z − 1)^2 = 4 ∧ x^2 + y^2 + z^2 = 4 )   (3.57)

and equation (3.56) as the quantifier-free equivalent

8x^2 + 8y^2 − 12x + 8xy − 12y = 7.   (3.58)
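Here the projection is exactly a resultant, and can be checked directly in Maple:

> f := (x-1)^2 + (y-1)^2 + (z-1)^2 - 4:
> g := x^2 + y^2 + z^2 - 4:
> resultant(f, g, z);                       # 8*x^2 + 8*x*y + 8*y^2 - 12*x - 12*y - 7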
Is the same true in real algebraic geometry? If P is a projection operator, and ℜ denotes the real part, then clearly

P(ℜ(U) ∩ ℜ(V)) ⊆ ℜ(P(U ∩ V)).   (3.59)

However, the following example shows that the inclusion can be strict. Consider

{ (x − 3)^2 + (y − 1)^2 + z^2 − 1,  x^2 + y^2 + z^2 − 1 }.

Its projection is (10 − 6x − 2y)^2, i.e. a straight line (with multiplicity 2). If we substitute in the equation for y in terms of x, we get z = √(−10x^2 + 30x − 24), which is never real for real x. In fact ℜ(U) ∩ ℜ(V) = ∅, as is obvious from the geometric interpretation of two spheres of radius 1 centred at (0, 0, 0) and (3, 1, 0).
The example of y^2 − x, whose projection is x ≥ 0, shows that the projection of an algebraic set need not be an algebraic set, but might be a semi-algebraic set. Is even this guaranteed? What about the projection of a semi-algebraic set? In the language of quantified propositions, we are asking whether, when F is an algebraic or semi-algebraic proposition, the proposition

∃y1 . . . ∃ym F(y1, . . . , ym, x1, . . . , xn)   (3.60)

has a quantifier-free equivalent G(x1, . . . , xn), where G is a semi-algebraic proposition. We can generalise this.
Problem 3 (Quantifier Elimination) Given a quantified proposition29

Q1y1 . . . Qmym F(y1, . . . , ym, x1, . . . , xn),   (3.61)

where F is a semi-algebraic proposition and the Qi are each either ∃ or ∀, does there exist a quantifier-free equivalent semi-algebraic proposition G(x1, . . . , xn)? If so, can we compute it?
The fact that there is a quantifier-free equivalent is known as the Tarski–
Seidenberg Principle [Sei54, Tar51]. The first constructive answer to the ques-
tion was given by Tarski [Tar51], but the complexity of his solution was in-
describable30. A better (but nevertheless doubly exponential) solution had
to await the concept of cylindrical algebraic decomposition (CAD) [Col75] de-
scribed in the next section.
Notation 23 Since ∃x∃y is equivalent to ∃y∃x, and similarly for ∀, we extend ∃ and ∀ to operate on blocks of variables, so that, if x = (x1, . . . , xn), ∃x is equivalent to ∃x1 . . . ∃xn. If we use this notation to rewrite equation (3.61) with the fewest quantifiers, the quantifiers then have to alternate, so the formula is (where the yi are sets of variables)

∀y1 ∃y2 ∀y3 . . . F(y1, y2, . . . , x1, . . . , xn),   (3.62)

or

∃y1 ∀y2 ∃y3 . . . F(y1, y2, . . . , x1, . . . , xn).   (3.63)

In either form, the number of (block) quantifiers is one more than the number of alternations.
29Any proposition with quantified variables can be converted into one in this form, so-called
prenex normal form — see any standard logic text.
30In the formal sense, that there was no elementary function which could describe it, i.e. no tower of exponentials of fixed height would suffice!
3.5.4 Algebraic Decomposition
Definition 78 An algebraic decomposition of R^n is an expression of R^n as the disjoint union of non-empty connected sets, known as cells, each defined as

p1(x1, . . . , xn) σ1 0 ∧ · · · ∧ pm(x1, . . . , xn) σm 0,   (3.64)

where the σi are one of =, >, <. Equation (3.64) is known as the defining formula of the cell C, and denoted Def(C).
These should properly be called semi-algebraic decompositions, but this terminology has stuck. Note that (3.64) need not define a non-empty connected set — external information is required to show this. We should note that these definitions are a very restricted form of definition 76. Here are some examples.
1. R^1 can be decomposed as {x < 0} ∪ {x = 0} ∪ {x > 0}.

2. R^1 cannot be decomposed as {x^2 = 0} ∪ {x^2 > 0}, as the second set is not connected. Rather, we need the previous decomposition.

3. R^1 cannot be decomposed as

{(x^2 − 3)^2 − 2 = 0} ∪ {(x^2 − 3)^2 − 2 > 0} ∪ {(x^2 − 3)^2 − 2 < 0},

as the sets are not connected. Rather, we need the decomposition (writing (x^2 − 3)^2 − 2 as f)

{f > 0 ∧ x < −2} ∪ {f = 0 ∧ x < −2} ∪ {f < 0 ∧ x < 0} ∪
{f = 0 ∧ x > −2 ∧ x < 0} ∪ {f > 0 ∧ x > −2 ∧ x < 2} ∪
{f = 0 ∧ x > 0 ∧ x < 2} ∪ {f < 0 ∧ x > 0} ∪
{f = 0 ∧ x > 2} ∪ {f > 0 ∧ x > 2}.

4. R^2 can be decomposed as {(x^2 + y^2) < 0} ∪ {(x^2 + y^2) = 0} ∪ {(x^2 + y^2) > 0}.

5. R^2 cannot be decomposed as {xy < 1} ∪ {xy = 1} ∪ {xy > 1}, as the last two sets are not connected. Rather, we need the more complicated

{xy < 1} ∪ {xy = 1 ∧ x > 0} ∪ {xy = 1 ∧ x < 0} ∪ {xy > 1 ∧ x < 0} ∪ {xy > 1 ∧ x > 0}.

6. R^2 cannot be decomposed as {f < 0} ∪ {f = 0} ∪ {f > 0}, where f = (x^2 + y^2 − 1)((x − 3)^2 + y^2 − 1), as the first two sets are not connected. Rather, we need the more complicated

{f < 0 ∧ x < 3/2} ∪ {f < 0 ∧ x > 3/2} ∪ {f = 0 ∧ x < 3/2}
∪ {f = 0 ∧ x > 3/2} ∪ {f > 0}.
The reader may complain that example 3 is overly complex: can't we just write

{f > 0 ∧ x < −2} ∪ {x = −√(3 + √2)} ∪ {f < 0 ∧ x < 0} ∪
{x = −√(3 − √2)} ∪ {f > 0 ∧ x > −2 ∧ x < 2} ∪ {x = √(3 − √2)} ∪
{f < 0 ∧ x > 0} ∪ {x = √(3 + √2)} ∪ {f > 0 ∧ x > 2}?

In this case we could, but in general theorem 10 means that we cannot31: we need RootOf constructs, and the question then is "which root of . . . ". In example 3, we chose to use numeric inequalities (and we were lucky that they could be chosen with integer end-points). It is also possible [CR88] to describe the roots in terms of the signs of the derivatives of f, i.e.

{f > 0 ∧ x < −2} ∪ {f = 0 ∧ f′ < 0 ∧ f′′′ < 0} ∪ {f < 0 ∧ x < 0} ∪
{f = 0 ∧ f′ > 0 ∧ f′′′ < 0} ∪ {f > 0 ∧ x > −2 ∧ x < 2} ∪
{f = 0 ∧ f′ < 0 ∧ f′′′ > 0} ∪ {f < 0 ∧ x > 0} ∪
{f = 0 ∧ f′ > 0 ∧ f′′′ > 0} ∪ {f > 0 ∧ x > 2}

(as it happens, the sign of f′′ is irrelevant here). This methodology can also be applied to the one-dimensional regions, e.g. the first can also be defined as {f > 0 ∧ f′ < 0 ∧ f′′ > 0 ∧ f′′′ < 0}.
We may ask how we know that we have a decomposition, and where these extra constraints (such as x > 0 in example 5 or x < 3/2 in example 6) come from. This will be addressed in the next section, but the brief answers are:
• we know something is a decomposition because we have constructed it that way;
• x = 0 came from the leading coefficient (with respect to y) of xy − 1, whereas 3/2 in example 6 is a root of Disc_y(f).
We stated in definition 78 that the cells must be non-empty. How do we know this? For the zero-dimensional cells {f = 0 ∧ x > a ∧ x < b}, we can rely on the fact that if f changes sign between a and b, there must be at least one zero, and if f′ does not32, there cannot be more than one: such an interval can be called an isolating interval. In general, we are interested in the following concept.
Definition 79 A sampled algebraic decomposition of Rn is an algebraic de-
composition together with, for each cell C, an explicit point Sample(C) in that
cell.
By ‘explicit point’ we mean a point each of whose coordinates is either a rational
number, or a precise algebraic number: i.e. a defining polynomial33 together
31And equation (3.11) demonstrates that we probably wouldn’t want to even when we could!
32Which will involve looking at f′′ and so on.
33Not necessarily irreducible, though it is normal to insist that it be square-free.
with an indication of which root is meant, an isolating interval, a sufficiently exact34 numerical approximation or a Thom's Lemma [CR88] list of signs of derivatives.
Definition 80 A decomposition D of R^n is said to be sign-invariant for a polynomial p(x1, . . . , xn) if, and only if, for each cell C ∈ D, precisely one of the following is true:
1. ∀x ∈ C p(x) > 0;
2. ∀x ∈ C p(x) < 0;
3. ∀x ∈ C p(x) = 0.
It is sign-invariant for a set of polynomials if, and only if, for each polynomial, one of the above conditions is true for each cell. It therefore follows that, for a sampled decomposition, the sign throughout the cell is that at the sample point.
A stronger concept is provided by the following definition.
Definition 81 A decomposition D of R^n is said to be order-invariant for a polynomial p(x1, . . . , xn) if, and only if, for each cell C ∈ D, precisely one of the following is true:
1. ∀x ∈ C p(x) > 0;
2. ∀x ∈ C p(x) < 0;
3. ∃k ∈ N such that:
(3a) ∀x ∈ C all derivatives of p of order at most k vanish at x; and
(3b) ∀x ∈ C there exists a derivative ∂^(k+1) p / ∂x_{i1} · · · ∂x_{i_{k+1}} which does not vanish at x — note that it may be different order-(k + 1) derivatives at different points of C.
It is order-invariant for a set of polynomials if, and only if, for each polynomial, one of the above conditions is true for each cell. Order-invariance is a strictly stronger concept than sign-invariance.
34By this, we mean an approximation such that the root cannot be confused with any other,
which generally means at least an approximation close enough that Newton’s iteration will
converge to the indicated root. Maple’s RootOf supports such a concept.
3.5.5 Cylindrical Algebraic Decomposition
The idea of Cylindrical Algebraic Decomposition is due to [Col75]. The presentation here is more general (see Observation 6), and largely unpublished.
Notation 24 Let n > m be positive natural numbers, and let R^n have coordinates x1, . . . , xn, with R^m having coordinates x1, . . . , xm.
Definition 82 An algebraic decomposition D of R^n is said to be cylindrical over a decomposition D′ of R^m if the projection onto R^m of every cell of D is a cell of D′. The cells of D which project to C ∈ D′ are said to form the cylinder over C, denoted Cyl(C). For a sampled algebraic decomposition, we also insist that the sample point in C be the projection of the sample points of all the cells in the cylinder over C. This definition is usually stated when m = n − 1, but the greater generality is theoretically worth having, even though we only currently know how to compute these when m = n − 1.
Cylindricity is by no means trivial.
Example 12 Consider the decomposition of R^2 = S1 ∪ S2 ∪ S3 where

S1 = {(x, y) | x^2 + y^2 − 1 > 0},
S2 = {(x, y) | x^2 + y^2 − 1 < 0},
S3 = {(x, y) | x^2 + y^2 − 1 = 0}.

This is an algebraic decomposition, and is sign-invariant for x^2 + y^2 − 1. However, it is not cylindrical over any decomposition of the x-axis R^1. The projection of S2 is (−1, 1), so we need to decompose R^1 as

(−∞, −1) ∪ {−1} ∪ (−1, 1) ∪ {1} ∪ (1, ∞).   (3.65)

S3 projects onto [−1, 1], which is the union of three sets in (3.65). We have to decompose S3 into four sets:

S3,1 = {(−1, 0)}, S3,2 = {(1, 0)},
S3,3 = {(x, y) | x^2 + y^2 − 1 = 0 ∧ y > 0},
S3,4 = {(x, y) | x^2 + y^2 − 1 = 0 ∧ y < 0}.

S1 splits into eight sets, one above each of (−∞, −1) and (1, ∞) and two above each of the other components of (3.65). It is obvious that this is the minimal refinement of the original decomposition to possess a cylindric decomposition. Furthermore in this case no linear transformation of the axes can reduce this. If we wanted a sampled decomposition, we could choose x-coordinates of −2, −1, 0, 1 and 2, and y-coordinates to match, from {0, ±1, ±2}.
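Maple can compute such decompositions via the RegularChains library; the following is a hedged sketch (the CylindricalAlgebraicDecompose calling sequence and its output format vary between versions):

> with(RegularChains): with(SemiAlgebraicSetTools):
> R := PolynomialRing([y, x]):
> cad := CylindricalAlgebraicDecompose([x^2 + y^2 - 1], R);
>     # a CAD of the plane sign-invariant for x^2 + y^2 - 1,
>     # refining (3.65) on the x-axis as in Example 12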
Cylindricity is fundamental to solving problem 3 via the following two proposi-
tions.
Proposition 47 Let

∃xn . . . ∃xm+1 P(x1, . . . , xn)   (3.66)

be an existentially quantified formula, D be a sampled algebraic decomposition of R^n which is sign-invariant for all the polynomials occurring in P, and D′ be a sampled algebraic decomposition of R^m such that D is cylindrical over D′. Then a quantifier-free form of (3.66) is

⋁_{C′ ∈ D′ : ∃C ∈ Cyl(C′) P(Sample(C))} Def(C′).   (3.67)
Proposition 48 Let

∀xn . . . ∀xm+1 P(x1, . . . , xn)   (3.68)

be a universally quantified formula, D be a sampled algebraic decomposition of R^n which is sign-invariant for all the polynomials occurring in P, and D′ be a sampled algebraic decomposition of R^m such that D is cylindrical over D′. Then a quantifier-free form of (3.68) is

⋁_{C′ ∈ D′ : ∀C ∈ Cyl(C′) P(Sample(C))} Def(C′).   (3.69)
These two propositions lead to a solution of problem 3.
Theorem 32 ([Col75]) Let x0, . . . , xk be sets of variables, with xi = (x_{i,1}, . . . , x_{i,ni}), and let Ni = Σ_{j=0}^{i} nj. Let P(x0, . . . , xk) be a semi-algebraic proposition, Di be an algebraic decomposition of R^{Ni} such that each Di is cylindric over Di−1 and Dk is sign-invariant for all the polynomials in P. Then a quantifier-free form of

Qk xk . . . Q1 x1 P(x0, . . . , xk)   (3.70)

(where the Qi are ∀ or ∃) is

⋁_{C′ ∈ D0 : ∀∃ C ∈ Cylk(C′) P(Sample(C))} Def(C′),   (3.71)

where by ∀∃ we mean that we are quantifying across the coordinates of Sample(C) according to the quantifiers in (3.70).
We can use the (sampled) cylindrical algebraic decomposition in example 12 to
answer various questions.
Example 13 ∀y x^2 + y^2 − 1 > 0. For the sampled cells ⟨(−∞, −1), (x = −2, y = 0)⟩ and ⟨(1, ∞), (x = 2, y = 0)⟩, the proposition is true at the sample points, hence true everywhere in the cell. For all the other cells in (3.65), there is a sample point for which it is false (in fact, y = 0 always works). So the answer is (−∞, −1) ∪ (1, ∞).
Example 14 ∃y x^2 + y^2 − 1 > 0. For every cell in (3.65), there is a sample point above it for which the proposition is true, hence we deduce that the answer is (3.65), which can be simplified to true.
We should note (and this is both one of the strengths and weaknesses of this
approach) that the same cylindrical algebraic decomposition can be used to
answer all questions of this form with the same order of (blocks of) quantified
variables, irrespective of what the quantifiers actually are.
Example 15 (∃y x^2 + y^2 − 1 > 0) ∧ (∃y x^2 + y^2 − 1 < 0). This formula is not directly amenable to this approach, since it is not in prenex form. In prenex form, it is ∃y1∃y2 ((x^2 + y1^2 − 1 > 0) ∧ (x^2 + y2^2 − 1 < 0)) and we need an analogous35 decomposition of R^3 cylindric over R^1. Fortunately, (3.65) suffices for our decomposition of R^1, and the answer is (−1 < x < 1), shown by the sample point (x = 0, y1 = 2, y2 = 0), and by the fact that at other sample points of R^1, we do not have y1, y2 satisfying the conditions.
We should note that it is not legitimate to reduce the formula to

∃y ((x^2 + y^2 − 1 > 0) ∧ (x^2 + y^2 − 1 < 0)),

since that is trivially false. It is, however, legitimate to solve two separate problems, ∃y x^2 + y^2 − 1 > 0 (which is true, with y = 2) and ∃y x^2 + y^2 − 1 < 0 (which is −1 < x < 1), and combine to get −1 < x < 1.
Open Problem 13 (RAG Formulation 1) It is pretty obviously simpler to solve two two-dimensional problems than one three-dimensional one, but in general translating an arbitrary statement such as that of Example 15 into the most efficient problem formulation is a hard problem.
However, it could be argued that all we have done is reduce problem 3 to
the following one.
Problem 4 Given a quantified semi-algebraic proposition as in theorem 32,
produce a sign-invariant decomposition Dk cylindrical over the appropriate Di
such that theorem 32 is applicable. Furthermore, since theorem 32 only talks
about “a” quantifier-free form, we would like the simplest possible such Dk (see
[Laz88]).
There is no common term for such a decomposition: we will call it block-cylindrical.
Observation 6 Collins’ original presentation of Cylindrical Algebraic Decom-
position insisted that every decomposition of Rm, generated by x1, . . . , xm was
cylindrical, in the sense of Definition 82, over every Rk generated by x1, . . . , xk.
Currently, this is the only sort of decomposition we know how to compute,
but, since the weaker form of block-cylindrical is all we need for quantifier elim-
ination, we have given the more general definition.
35Easier said than done. Above x = −1 we have nine cells: {y1 < 0, y1 = 0, y1 > 0} × {y2 < 0, y2 = 0, y2 > 0}, and the same for x = 1, whereas above (−1, 1) we have 25, totalling 45.
Figure 3.7: Cylindrical Decomposition after Collins

Sn ⊂ R[x1, . . . , xn]          R^n       R^n decomposed by Dn
  ↓ project                               ↑ lift (Cyl)
Sn−1 ⊂ R[x1, . . . , xn−1]      R^(n−1)   R^(n−1) decomposed by Dn−1
  ↓ project                               ↑ lift (Cyl)
  · · ·                          · · ·     · · ·
S2 ⊂ R[x1, x2]                  R^2       R^2 decomposed by D2
  ↓ project                               ↑ lift (Cyl)
S1 ⊂ R[x1]                      R^1  —(Problem 1)→  R^1 decomposed by D1.
3.5.6 Computing Algebraic Decompositions
Though many improvements have been made to it since, the basic strategy for computing algebraic decompositions is still generally36 that due to Collins [Col75], and is to compute them cylindrically, as illustrated in Figure 3.7. From the original proposition, we extract the set of polynomials Sn. We then project this set into Sn−1 in n − 1 variables, and so on, until we have a set of univariates S1. We then isolate, or otherwise describe, the roots of these polynomials, as described in problem 1, to produce a decomposition D1 of R^1, and then successively lift this to a decomposition D2 of R^2 and so on, each Di being sign-invariant for Si and cylindrical over Di−1.
Note that the projection from Si+1 to Si must be such that a decomposition
Di sign-invariant for Si can be lifted to a decomposition Di+1 sign-invariant for
Si+1. Note also that the decomposition thus produced will be block-cylindric
for every possible blocking of the variables, since it is block-cylindric for the
finest such.
Projection turns out to be a trickier problem than might be expected. One's immediate thought is that one needs the discriminants (with respect to the variable being projected) of all the polynomials in Si+1, since this will give all the critical points. Then one sees that one needs the resultants of all pairs of such polynomials. Example 5 (page 140) shows that one might need leading coefficients. Then there are issues of what happens when leading coefficients vanish. This led Collins [Col75] to consider the following projection operation ProjC for a set A of polynomials in x1, . . . , xn, where xn is the variable being projected.
Notation 25 Define the following sets, where we assume A has k polynomials of degree d.

B is the set of non-constant (with respect to xn) iterated reducta (Notation 13) of all elements of A: kd elements.
L is the set of all leading coefficients of B, which is the same as saying all the coefficients of non-constant terms of A: kd elements.
S1 is the set of all principal subresultant coefficients (Definition 110) psc_j(g, ∂g/∂xn) for all g ∈ B: kd^2 elements.
S2 is the set of all principal subresultant coefficients psc_j(g1, g2) for all g1, g2 ∈ B with g1 ≠ g2: k^2 d^3 elements.

ProjC(A) is L ∪ S1 ∪ S2: O(k^2 d^3) elements.

36But [CMXY09] have an alternative strategy based on triangular decompositions, as in section 3.4, which may well turn out to have advantages.
[Col75, essentially Theorem 5] showed that this projection operator is sufficient for Figure 3.7 to be valid: more precisely the following.

Theorem 33 ([Col75]) If A is a set of polynomials in x1, . . . , xn and D is a cylindrical decomposition of R^(n−1) sign-invariant for ProjC(A), then the polynomials of A are delineable over every cell of D and D can be lifted to a cylindrical decomposition of R^n sign-invariant for A.
The problem with this is the size of ProjC(A). McCallum [McC84, McC88] saw that one could do better if one used order-invariance (Definition 81) instead.

Notation 26 Define the following sets, where we assume A has k polynomials of degree d.

C is the set of all contents of A.
B is a square-free basis for the primitive parts of A.
C′ is the set of all coefficients of B.
D is the set of all discriminants of B.
R is the set of all resultants of B, i.e. {Res_{xn}(Bi, Bj) : 1 ≤ i < j ≤ |B|}.

ProjM(A) is C ∪ C′ ∪ D ∪ R.
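For the circle of Example 12 these ingredients can be computed with standard Maple commands (a sketch; here A is a single primitive square-free polynomial, so the resultant set R is empty):

> A := [x^2 + y^2 - 1]:                     # project the variable y
> map(content, A, y);                       # contents: [1]
> [coeffs(A[1], y)];                        # coefficients of B: x^2 - 1 and 1
> discrim(A[1], y);                         # discriminant: 4 - 4*x^2

The coefficient x^2 − 1 (and the discriminant, which vanishes at the same points x = ±1) gives exactly the section points of (3.65).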
It might seem relatively hard to say how big ProjM(A) is, since the process of square-free decomposition might not decrease d, but might increase k.

Definition 83 ([McC84]) We say that a set of polynomials has the (m, d) property if it can be partitioned into m sets of polynomials such that the product of the elements of any one set has degree at most d (in each variable taken separately).
Proposition 49 If A has the (m, d) property, then so does a square-free basis for it, the set of all discriminants has the (m, 2d^2) property, and the set of all resultants (R above) has the (m(m + 1)/2, 2d^2) property.
In these terms, there is a good bound for ProjM(A).

Lemma 6 ([BDE+14, Lemma 11]37) If A has the (m, d) property, then ProjM(A) has the (⌊(m + 1)^2 / 2⌋, 2d^2) property.
Theorem 34 ([McC84, McC88]) If A is a set of polynomials in x1, . . . , xn and D is a cylindrical decomposition of R^(n−1) order-invariant for ProjM(A), then the polynomials of A are delineable over every cell of D on which they do not vanish identically, and D can be lifted to a cylindrical decomposition of R^n order-invariant for A′: those elements of A that do not vanish identically on some cell of D.
The condition about elements of A not vanishing identically is known as stating that A is well-oriented. Hence if A and all of ProjM(A), ProjM(ProjM(A)), . . . are well-oriented, then Theorem 34 means that we can produce an order-invariant decomposition of R^n for A. Note that, although "order-invariant" is a stronger condition than "sign-invariant", the fact that ProjM is much smaller than ProjC means that we have a much more efficient algorithm when it works, i.e. when it does not detect the failure of well-orientedness.
What do we do when it does fail? In the notation of Figure 3.7, McCallum suggests that, if we detect that a polynomial fk ∈ Si vanishes identically over some cell of Di−1, we should augment Si with all the partial derivatives ∂fk/∂xj (1 ≤ j ≤ i) and project Si again. To the best of the author's knowledge, this has never been implemented due to the control-flow complexities it introduces, and the usual solution is to give up and use ProjC (or a variant due to [Hon90]).
TO BE COMPLETED
3.5.7 Describing Solutions
If describing the roots of a general polynomial in one variable was tricky (Section 3.5.5), it is more so in two or more dimensions: not that the mathematics doesn't exist, but rather that we are, in general, unused to dealing with it. Consider the example f := y^3 − 7y^2 + 14y − x − 8 in Figure 3.8. Above each value of x, y is given by a univariate polynomial, whose roots can be described in terms of the signs of the derivatives, by Thom's Lemma (Lemma 4). The top and bottom branches of the curve are defined by f = 0, f′ > 0, but are distinguished by the signs of f′′. Similarly, the two inflection points are defined by f = f′ = 0, but are again distinguished by the signs of f′′. However, the middle branch, f = 0, f′ < 0, is unhelpfully split by f′′ = 0 into two parts. We could (though doing so algorithmically has never, to the author's knowledge, been solved) remove this distinction and just define the branch as f = 0, f′ < 0.
Conversely, and perhaps more naturally, we can try counting the branches, just as we might count the roots of a univariate polynomial (starting from 1). This is shown in Figure 3.9. Here the top branch is split: it starts off being
Figure 3.8: y^3 − 7y^2 + 14y − x − 8: Thom's Lemma

Figure 3.9: y^3 − 7y^2 + 14y − x − 8: indexing
the third root of f, but as we cross the (x-value of the) inflection point of the other branch, it becomes the first root (passing momentarily through the point at which it is the second root). Unlike the previous description, this split cannot be removed. Note also that "the first root" is not continuous, jumping from the bottom branch to the top branch at this x-value.
Here is an edited version38 of the output from Maple on this problem.
x < R(f3, i1):             y < R(f1, i1);  y = R(f1, i1);  R(f1, i1) < y
x = R(f3, i1):             y < R(f2, i1);  y = R(f2, i1);  R(f2, i1) < y < R(f2, i2);
                           y = R(f2, i2);  R(f2, i2) < y
R(f3, i1) < x < R(f3, i2): y < R(f1, i1);  y = R(f1, i1);  R(f1, i1) < y < R(f1, i2);
                           y = R(f1, i2);  R(f1, i2) < y < R(f1, i3);  y = R(f1, i3);
                           R(f1, i3) < y
x = R(f3, i2):             y < R(f2, i1);  y = R(f2, i1);  R(f2, i1) < y < R(f2, i2);
                           y = R(f2, i2);  R(f2, i2) < y
R(f3, i2) < x:             y < R(f1, i1);  y = R(f1, i1);  R(f1, i1) < y

where

f1 := _Z^3 − 7_Z^2 + 14_Z − x − 8;
f2 := 14_Z^2 + (−9x − 72)_Z + 21x + 70;
f3 := 27_Z^2 + 40_Z − 36

(each cell is also accompanied by a sample point, given in the output as a regular chain together with a pair of isolating intervals for x and y; these have been omitted here).
This splits R^2 into 23 cells, which is the minimum for any cylindrical decomposition with y projected first. In the other direction, we just have three cells: "below the curve", "on the curve" and "above the curve". This sort of variation depending on projection order is not unusual: [BD07] have examples in n variables where the number of cells is constant for some projections, and 2^(2^O(n)) for others. The variation is not universal though: there are also examples with 2^(2^O(n)) cells for all projection orders.
The drawback of this representation is that “the ith branch of” is not semi-
algebraic in the sense of Definition 76, and we need to convert to the other
in order to stay within the semi-algebraic language. One might think that
38Common polynomials have been pulled out, and various abbreviations, such as R(f, i1) for RootOf(f, index = real[1]), have been used.
this involved adding derivatives in order to apply Thom's Lemma (Lemma 4), and that these derivatives would need to enter into the projection phase, thus moving us a long way back to the original Collins projection from the more efficient current ones, but fortunately this is not the case.
3.5.8 Complexity
Let us suppose that there are s polynomials involved in the input formula (3.61), of maximal degree d. Then such a cylindrical algebraic decomposition can be computed in time O((sd)^(2^O(k))).
There are examples [BD07, DH88] which show that this behaviour is best-possible: indeed the projection onto R^1 might have a number of components doubly-exponential in k. This is true even for [BD07], where the polynomials are "only" linear.
While this behaviour is intrinsic to cylindrical algebraic decomposition, it is not necessarily intrinsic to quantifier elimination as such. If a is the number of alternations of quantifiers (Notation 23) in the problem (so a < k), then there are algorithms [Bas99, for example] whose behaviour is singly-exponential in k but doubly-exponential in a; typically O((sd)^(O(k^2) 2^O(a))). The construction of [BD07, DH88] makes extensive use of ∨ symbols, and this is essential, as [RESW14] have shown that, with no ∨ symbols in (3.61), the linear problem can be solved in polynomial time, essentially by a series of reductions to linear programming, which can be solved in polynomial time [Kha79, Kar84]. See also Open Problem 15.
One particular special case is that of no alternations. Hence, using the fact that ∃x (P(x) ∨ Q(x)) is equivalent to (∃x P(x)) ∨ (∃x Q(x)), an existential problem is equivalent to a set39 of problems of the form

∃x ( (⋀_{fi ∈ F} fi(x) ≥ 0) ∧ (⋀_{gi ∈ G} gi(x) = 0) ∧ (⋀_{hi ∈ H} hi(x) ≠ 0) ).   (3.72)
This is generally referred to as the existential theory of the reals. Since the truth of a universal problem is equivalent to the falsity of an existential problem (∀x P(x) ⇔ ¬∃x ¬P(x)), this is all we need to consider.
Given a problem (3.72), cylindrical algebraic decomposition will yield such an x, if one exists, and failure to yield one is a proof that no such x exists. However, this is a somewhat unsatisfactory state of affairs in practice, since, computationally, we are relying not just on the correctness of the theory of cylindrical algebraic decomposition, but also on the absence of bugs in the implementation.
An alternative is provided by the Positivstellensatz approach [Ste74].
39There may be singly-exponential blow-up here as we convert into disjunctive normal form,
but this is small compared to the other exponential issues in play!
Theorem 35 ([PQR09, Theorem 3]) The set of solutions to (3.72) is empty if, and only if, there are:

s ∈ con(F), where con(F), the cone of F, is the smallest set generated by F and the set of squares of all elements of R[x] which is closed under multiplication and addition;
g ∈ (G), the ideal generated by G;
m ∈ mon(H), where mon(H), the (multiplicative) monoid of H, is the set of all products (including 1 = the empty product) of elements of H;

such that s + g + m^2 = 0. Furthermore, there is an algorithm to find such s, g and m (if they exist) in Q[x] provided F, G and H ⊂ Q[x].
Partial Proof. If s + g + m2 = 0 but x is a solution to (3.72), then s(x) +
g(x)+m(x)2 is of the form “non-negative + zero + strictly positive”, so cannot
be zero.
We can think of (s, g,m) as a witness to the emptiness of the set of solutions
to (3.72). Again, failure to find such an (s, g,m) is a proof of the existence of
solutions provided we trust the correctness of Theorem 35 and the correctness
of the implementation.
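As a toy illustration (ours, not from [PQR09]): over R the single equation x^2 + 1 = 0 (i.e. F = H = ∅, G = {x^2 + 1}) has no solutions, and a witness is s = x^2 ∈ con(∅) (a square), g = −(x^2 + 1) ∈ (G) and m = 1 ∈ mon(∅), since then s + g + m^2 = x^2 − x^2 − 1 + 1 = 0.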
3.5.9 Further Observations
1. The methodology outlined in figure 3.7, and indeed that of [CMXY09], has the pragmatic drawback that the decomposition computed, as well as solving the given problem (3.61), solves all other problems of the same form with the same polynomials and the variables in the same order. For example, a decomposition which allows us to write down a quantifier-free equivalent of

∀x4 ∃x3 p(x1, x2, x3, x4) > 0 ∧ q(x1, x2, x3, x4) < 0   (3.73)

will also solve

∃x4 ∀x3 p(x1, x2, x3, x4) > 0 ∧ q(x1, x2, x3, x4) < 0   (3.74)

and even

∃x4 ∃x3 ∀x2 p(x1, x2, x3, x4) < 0 ∧ q(x1, x2, x3, x4) ≥ 0.   (3.75)

The process of Partial Cylindrical Algebraic Decomposition [CH91] can make the lifting process (the right-hand ↑ in Figure 3.7) more efficient, but still doesn't take full account of the structure of the incoming quantified formula.
2. If F is of the form p(y1, . . . , ym, x1, . . . , xn) = 0 ∧ φ̂(y1, . . . , ym, x1, . . . , xn), or can be transformed into this form, then, as observed in [Col98], we are only concerned in Theorem 32 with those cells on which p is zero, and the polynomials in φ̂ do not need to be sign-invariant elsewhere. This was formalised in [McC99], where p was described as an equational constraint. This idea has been further generalised in [BDE+13] to cases where every part of F has an equational constraint, but not necessarily the same one, and in [BDE+14] to cases where only parts of F have equational constraints, and/or where there is more than one equational constraint. [EBD15], building on [McC01], shows how to handle more than one equational constraint, using the observation that, in p1(y1, . . . , ym, x1, . . . , xn) = 0 ∧ p2(y1, . . . , ym, x1, . . . , xn) = 0 ∧ φ̂, Res_{xn}(p1, p2) is also an equational constraint after we project out xn.
3. Yet a further approach to Quantifier Elimination has recently been implemented [FIS15], though based on an idea of [Wei98], that a suitable Comprehensive Gröbner System (see Section 3.3.13) can let one write down the quantifier-free equivalent of a formula.
4. It is tempting to think that "the problem is the data structure", and maybe if we used Straight-Line Programs (page 52), or some other data structure, we could do better. However, this hope is destroyed by [CGH+03] who state:

In this paper we are going to argue that the non-polynomial complexity character of the known symbolic geometric elimination procedures is not a special feature of a particular data structure (like the dense, sparse or arithmetic circuit encoding of polynomials), but rather a consequence of the information encoded by the respective data structure . . .

and their Theorem 4 (which is too complicated to state here in detail) says that any universal elimination procedure for the theory of algebraically closed fields of characteristic zero must have non-polynomial complexity.
Open Problem 14 (Not all CADs are outputs of our algorithms) Consider the H-shaped set

(x = −1 ∧ y ∈ (−1, 1)) ∪ (−1 < x < 1 ∧ y = 0) ∪ (x = 1 ∧ y ∈ (−1, 1)).

Then indeed a cylindrical algebraic decomposition is

x < −1       one cell
x = −1       five cells (y < −1, y = −1, −1 < y < 1, y = 1, y > 1)
−1 < x < 1   three cells (y < 0, y = 0, y > 0)
x = 1        five cells
x > 1        one cell

but not one we know how to compute. Indeed, the author is not sure how to state it formally to QEPCAD or Maple. TO BE COMPLETED
3.6 Virtual Term Substitution
The idea of Virtual Term Substitution was introduced by [Wei94]40: our pre-
sentation owes much to [KSD15].
3.6.1 The Weak Case
Problem 5 Let us consider first a special case of Problem 3, ∃x φ(x, u), with the following additional constraints:

1. φ is positive, i.e. expressed purely in terms of ∧ and ∨, with no ⇒ or ¬ (this can be achieved by standard laws of logic, then converting ¬(a < b) into a ≥ b etc.);
2. each elementary formula is pi(x) σi 0 with σi ∈ {=, ≤, ≥} (we will see later how to lift this restriction);
3. each polynomial in such a formula is at most quadratic in x (this restriction has been investigated in [Wei94] — see also higher below, but most practical uses of the Virtual Term Substitution method seem to be with linear/quadratic formulae).

We wish to rewrite this as a logical formula ψ(u).
We consider the values of u to be fixed. Then in principle (we will need to worry about degeneracies), each pi defines one (linear) or two (quadratic) critical points. Furthermore, because of point 2, it is sufficient to consider the truth of φ at the critical points, since, if φ(x, u) is true, φ(x↓, u) and φ(x↑, u) are also true, where x↓ and x↑ are the critical points immediately ≤ x and ≥ x (we will therefore need to worry later on about the case of x less than, or greater than, any critical point).
We handle the question of degeneracies by considering, not just critical points e, but rather guarded points41 (γ, e), where γ is a logical formula in u, and e is only to be considered if γ is true. We then consider each polynomial pi in φ, depending on its degree in x.
linear Suppose pi is f1x + f0. Then the corresponding guarded point is

( f1 ≠ 0, −f0/f1 ).   (3.76)
quadratic Suppose pi is f2x^2 + f1x + f0, with Δ := f1^2 − 4f2f0. Then the corresponding guarded points are

( f2 ≠ 0 ∧ Δ ≥ 0, (−f1 − √Δ)/(2f2) ),   (3.77)
40Foreshadowed by [Wei88] for the linear case.
41The concept is as old as Virtual Term Substitution, but this terminology is our own.
( f2 ≠ 0 ∧ Δ ≥ 0, (−f1 + √Δ)/(2f2) )   (3.78)

and

( f2 = 0 ∧ f1 ≠ 0, −f0/f1 ).   (3.79)
higher One need merely read the discussion in Section 3.1.2 to see that cubic
equations would generate many more guarded points. Worse, as seen in
(3.10), we may need to involve complex numbers. See [Wei94] for details.
However, the points defined in (3.76)–(3.79) are not defined in the Tarski language L of semi-algebraic propositions (Definition 76), and hence we cannot just substitute them for x in pi(x). It is for this reason that we speak of Virtual Term Substitution. We use the notation [x//t] to denote substituting, in this sense, t for x in the whole proposition pi(x) σi 0, noting that this takes propositions in L to Boolean combinations of propositions in L.
Example 16 (f(x, u) = 0 [x//(g1 + g2√g3)/g4]) Since f is a polynomial, it is clear that f((g1 + g2√g3)/g4, u) = (g1* + g2*√g3)/g4* for suitable expressions g1*, g2* and g4*.

(g1* + g2*√g3)/g4* = 0 ⇒ g1* + g2*√g3 = 0
                       ⇒ |g1*| = |g2*√g3| ∧ (sign(g1*) ≠ sign(g2*) ∨ sign(g1*) = sign(g2*) = 0)
                       ⇒ g1*^2 − g2*^2 g3 = 0 ∧ g1* g2* ≤ 0.
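As a concrete instance (our illustration): for f = x^2 − 2 and the point √2 = (0 + 1·√2)/1, i.e. g1 = 0, g2 = 1, g3 = 2, g4 = 1, we get f((g1 + g2√g3)/g4) = ((g1^2 + g2^2 g3 − 2g4^2) + 2g1g2√g3)/g4^2, so g1* = 0 and g2* = 0, and the test g1*^2 − g2*^2 g3 = 0 ∧ g1* g2* ≤ 0 duly evaluates to true: x = √2 virtually satisfies x^2 − 2 = 0.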
In general, if the guarded points from the pi σi 0 in φ are {(γi,j, ei,j)}, then Virtual Term Substitution (with the restrictions of Problem 5) is

∃x φ(x, u)  =VTS⇒  ⋁_i ⋁_j (γi,j ∧ ((pi σi 0)[x//ei,j])).   (3.80)

Unfortunately the right-hand side of (3.80) does not satisfy the constraints of Problem 5, not least because of the ≠ operations in the guards of (3.76)–(3.79), hence we cannot apply two levels of Virtual Term Substitution to ∃x1∃x2 φ(x1, x2, u) until we can lift restriction 2.
3.6.2 The Strict Case
If we consider a strict inequality (>, < or ≠), i.e. lifting restriction 2, this again defines a certain number of intervals, but the end-points no longer satisfy the strict inequality: we need interior points.
The initial version of Virtual Term Substitution [Wei88] solved this problem by using the point (zi + zj)/2, i.e. the arithmetic mean, for the interval (zi, zj). However, this has two serious drawbacks:
1. we do not know the order of the zi(u), hence we need to add all possible arithmetic means: n(n − 1)/2 of them if there are n such zi;

2. if zi and zj are independent quadratic expressions, then the arithmetic mean is

(1/2) ( (−gi,1 ± √gi,3)/(2gi,4) + (−gj,1 ± √gj,3)/(2gj,4) ),

which is not of the form (g1 + g2√g3)/g4, so the reasoning of Example 16 does not apply.
Hence the alternative approach is to use infinitesimals: for a strict quadratic inequality we would add four points (−gi,1 ± √gi,3)/(2gi,4) ± ε. This is in fact overkill, as it will give us two points in most strict intervals, and we need only add (−gi,1 ± √gi,3)/(2gi,4) − ε, and a +∞ term to allow for upper unbounded intervals where we would otherwise have no point.
However, we need to handle these terms, as is shown in the next two examples.
However, we need to handle these terms, as is shown in the next two exam-
ples.
Example 17 (p(x) < 0[x//t � ✏]) (where p is a polynomial and t is a regular
(non-infinitesimal) term).
p(x) < 0[x//t� ✏] )
✓
p(x) < 0 _ (p(x) = 0 ^
⇣
p0(x) < 0 _
�
p0(x) = 0 ^ (p00(x) < 0 _ · · ·
�
⌘
◆
[x//t]
Example 18 (ax^2 + bx + c < 0 [x//∞])

ax^2 + bx + c < 0 [x//∞] ⇒ a < 0 ∨ ( a = 0 ∧ (b < 0 ∨ (b = 0 ∧ c < 0)) )
With these extensions, equation (3.80) is still valid, with restriction 2 lifted.
3.6.3 Nested Quantifiers
Consider

∃x1 ∃x2 φ(x1, x2, u).   (3.81)

We first consider ∃x2 ψ(x2, v), where v is x1||u (|| signifying concatenation) and ψ is φ with the arguments reshuffled. As in (3.80), this is converted, eliminating x2, into

⋁_i ⋁_j (γi,j(v) ∧ ((pi(x2, v) σi 0)[x2//ei,j(v)])),

which we can write as Ψ(x1, u) after replacing v by x1||u. This satisfies restriction 1, and we have lifted restriction 2. The only problem is restriction 3: there is no guarantee that, even if φ is only quadratic in x1, Ψ will be. In practice, though, Ψ often does, or can be made to (see [KSD15, §5] for a useful technique), satisfy restriction 3. We then apply the same technique to ∃x1 Ψ(x1, u) to obtain

⋁_{i′} ⋁_{j′} ( γ′_{i′,j′}(u) ∧ ((p′_{i′}(x1, u) σ′_{i′} 0)[x1//e′_{i′,j′}(u)]) ),

as desired.
There is an important practical42 point, though. We have suggested applying "the same technique" to

∃x1 ⋁_i ⋁_j (γi,j(x1||u) ∧ ((pi(x2, x1||u) σi 0)[x2//ei,j(x1||u)])),

but we are much better off applying it to the equivalent

⋁_{i=1}^{N} ⋁_{j=1}^{Ni} ∃x1 (γi,j(x1||u) ∧ ((pi(x2, x1||u) σi 0)[x2//ei,j(x1||u)])),   (3.82)

so that we are solving Σ Ni problems each of the complexity, at least informally speaking, of the original, rather than one problem Σ Ni times as long.
3.6.4 Universal quantifiers
So far we have merely treated existential ∃ quantifiers. This is, in one sense, all we need do, since ∀x φ(x) ⇔ ¬∃x ¬φ(x). The formula ¬φ(x) breaks constraint 1, but the standard laws of logic, such as

¬(φ ∨ ψ) ⇔ (¬φ) ∧ (¬ψ),   (3.83)

will deal with this problem. There are two remarks we can make about the complexity of this process.

1. We may as well work with blocks of quantifiers (as in Notation 23), i.e. transform ∀x φ(x) into ¬∃x ¬φ(x).

2. If the innermost block is existential, then after eliminating these quantifiers, we are left with an expression of the form (3.82), i.e. a large disjunction, which we were processing clause-by-clause. When we negate this, we get a large conjunction, which has to be processed in its entirety. Eliminating the ∃x will give another disjunction, but this again negates to a large conjunction which has to be processed in its entirety. It seems to be this process that makes the complexity heavily dependent on the number of alternations.
⁴²And indeed important from the point of view of complexity theory, as pointed out by [Wei88, p. 25].
3.6.5 Complexity of VTS
The general method we have outlined above was foreshadowed by [Wei88] for
the linear case. In particular that paper proved that the complexity of linear
quantifier elimination was singly exponential in the number of variables, but
doubly exponential in the number of alternations (see Notation 23). A similar
claim was made for the quadratic case in [Stu96], but was not proved.
Open Problem 15 Show, either in the quadratic case, or more generally, that
the complexity of quantifier elimination by Virtual Term Substitution is expo-
nential in the number of variables, and only doubly exponential in the number
of alternations.
Sturm wrote to the author as follows. “At the time of writing I had blindly
believed that this holds also for the quadratic case. This is still my intuition,
but there is no proof.
I had furthermore assumed that the argument would remain correct when generalizing to arbitrary degrees. After [C.W. Brown] told me in Timișoara that there is both a combinatorial and a size-of-polynomials aspect of double exponential complexity, I am not at all certain about this anymore.”
3.7 Conclusions
1. The RootOf construct is inevitable (theorem 10), so should be used, as
described in footnote 3 (page 81). Such a notation can avoid the “too
many solutions” trap — see equations (3.49) and (3.50). We should find
a way of extending it to situations such as equation (3.48).
2. While matrix inversion is a valuable concept, it should generally be avoided
in practice.
3. Real algebraic geometry is not simply “algebraic geometry writ real”: it has different problems and needs different techniques.
Chapter 4
Modular Methods
In chapter 2, describing the subresultant method of computing greatest common
divisors, we said the following.
This algorithm is the best method known for calculating the g.c.d., of all those based on Euclid’s algorithm applied to polynomials with integer coefficients. In chapter 4 we shall see that if we go beyond these limits, it is possible to find better algorithms for this calculation.
Now is the time to fulfil that promise, which we do by describing the historically-
first “advanced” algorithm, first with a simple example (section 4.1), and then
its greatest success, g.c.d. calculation, first in one variable (section 4.2), then in
two (section 4.3) and several (section 4.4) variables. We will then look at other
applications (section 4.5), and finally at Gröbner bases (section 4.6).
The basic idea behind these algorithms is shown in Figure 4.1: instead of doing a calculation in some (large) domain R, we do it in several smaller domains R_i, possibly discard some of these, piece the result together to R'_{1···k}, regard this

Figure 4.1: Diagrammatic illustration of Modular Algorithms

    R  - - - - - - - calculation - - - - - - ->  R
    k×reduce |                                   ^  interpret & check
             v                                   |
    R_1  ----calculation---->  R_1  \
     ...                       ...   }  combine (?discard)  --->  R'_{1···k}
    R_k  ----calculation---->  R_k  /

R'_{1···k} indicates that some of the R_i may have been rejected by the compatibility checks, so the combination is over a subset of R_1, ..., R_k.
as being in R and check that it is indeed the right result. The key questions are
then the following.
1. Are there domains R_i for which the behaviour of the computation in R_i is sufficiently close to that in R for us to make deductions about the computation in R from that in R_i? Such R_i will generally be called “good”.
2. Can we tell, either immediately or with hindsight, whether an Ri is
“good”? It will often turn out that we can’t, but may be able to say
that, given Ri and Rj , one of them is definitely “bad”, and preferably
which.
3. How many reductions should we take? In practice we will only count
“good” reductions, so this question is bound up with the previous one.
4. How do we combine the results from the various Ri? The answer to this
will often turn out to be a variant of the Chinese Remainder Theorem.
5. How do we check the result? Can we be absolutely certain that this result
is in fact the answer to the original question? In category speak, does
Figure 4.1 commute?
A common choice for the Ri is given by the following.
Notation 27 Let n be a positive integer. By $\mathbb{Z}_n$ we mean the integers considered modulo n.

Proposition 50 $\mathbb{Z}_n$ is a commutative ring with a multiplicative identity (the number 1 itself).

Example 19 Note that we can’t say that $\mathbb{Z}_n$ is an integral domain (Definition 11), since, for example, in $\mathbb{Z}_6$ we have $2\times3=0$, even though neither 2 nor 3 is zero in $\mathbb{Z}_6$.

Proposition 51 Let p be a prime number. Then $\mathbb{Z}_p$ is a field.

Proof. Since p is a prime, the problem of Example 19 can’t happen, and $\mathbb{Z}_p$ is certainly an integral domain. Let $n\in\mathbb{Z}_p$ be nonzero, and apply the Extended Euclidean Algorithm (in $\mathbb{Z}$) to the integers n and p. Since $\gcd(n,p)=1$, we find a, b such that $an+bp=1$. Then $an\equiv1\pmod p$, so $a=n^{-1}$ and hence inverses exist.
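This proof is effective: the Extended Euclidean Algorithm actually computes the inverse. A minimal sketch in Python (ours; since Python 3.8 the built-in pow(n, -1, p) performs the same computation):

    def inverse_mod(n, p):
        """Inverse of n modulo a prime p, via the Extended Euclidean Algorithm:
        maintain r = a*n + b*p; when r reaches gcd(n, p) = 1, a is the inverse."""
        r0, r1 = n % p, p
        a0, a1 = 1, 0
        while r1 != 0:
            q = r0 // r1
            r0, r1 = r1, r0 - q * r1
            a0, a1 = a1, a0 - q * a1
        assert r0 == 1, "n not invertible: p not prime, or p divides n"
        return a0 % p

    assert 3 * inverse_mod(3, 7) % 7 == 1          # 3^{-1} = 5 in Z_7
    assert inverse_mod(10, 13) == pow(10, -1, 13)  # agrees with the built-in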
4.1 Matrices: a Simple Example
Suppose we wish to calculate the determinant, or possibly the inverse, D of an n×n matrix M, and for simplicity we will assume that n is even. If the size of the entries (degree for polynomials, or number of bits/words for integers) is s, then the determinant will have size at most¹ ns, since it is the sum of n! products of n entries from the matrix.

¹Not quite for integers, since a+b can be bigger than either a or b, but we’ll ignore this complication, which does not affect the general point.
4.1.1 Matrices with integer coefficients: Determinants
If we use the fraction-free methods of Theorem 15, then, just before clearing column k, the entries should have size roughly² bounded by ks, and at the end, the last entry, which is the determinant, is (roughly) bounded by ns. But consider the half-way state, after we have eliminated n/2 columns. The entries are:

row 1: n entries of size s: total ns;
row 2: n−1 entries of size 2s: total 2(n−1)s;
row 3: n−2 entries of size 3s: total 3(n−2)s;
...
row n/2: n/2 entries of size (n/2)s: total (n/2)²s;
total: $\sum_{i=1}^{n/2}i(n+1-i)s=\left(\frac{n^3}{12}+\frac{n^2}{4}+\frac{n}{6}\right)s$;
next n/2 rows: n/2 elements of size (n/2)s in each row, totalling $\frac{n^3}{8}s$;
grand total: $\frac{5}{24}n^3s+O(n^2s)$.

Since the final result has size ns, we are storing, and manipulating, $\frac{5}{24}n^2$ times as much data as are needed — a quantifiable case of intermediate expression swell. Can we do better?
Let the $p_i$ be a family of distinct primes³, chosen so that arithmetic modulo p is fast, typically 31-bit or 63-bit primes⁴. Since D is the sum of products of entries of the matrix, we can evaluate D (mod p) (which we will write as $D_p$) either by computing D, and then reducing it modulo p, or by computing (modulo p) the determinant of the matrix $M_p$: in symbols
$$\det(M)_p=\det(M_p). \tag{4.1}$$
The latter takes $O(n^3)$ operations, by classical arithmetic, to which we should add $O(n^2s)$ for the initial reduction modulo p.
The Hadamard bounds (Propositions 84 and 85: see [AM01] for an analysis of the extent to which they are over-estimates — roughly 0.22n decimal digits) will give us a number $D_0$ such that $|D|\le D_0$, with $\log_2 D_0\le ns+\frac n2\log_2 n$. If we know $D_{p_i}$ for enough primes such that $N:=\prod p_i>2D_0$, we can use the Chinese Remainder Theorem (Theorem 50 and Algorithm 47) to deduce $D_N$. If we choose it in the range $-\frac N2<D_N<\frac N2$, then $D_N=D$, and we are done.

²Again, we need “roughly” in the integer case, since there is again the question of carries, as in note ¹.
³For this example, we do not actually need them to be primes, merely relatively prime, but the easiest way to ensure this is for them to be primes.
⁴We therefore take the cost of arithmetic modulo p to be constant — O(1). Strictly speaking, we should allow for the possibility that our problem is so large that there are not enough small primes. But such a problem would not fit in the computer in the first place, and the computer algebra community systematically (and often silently) ignores this possibility. We can replace O by Õ if we are really worried about it.
The number of primes needed is $O(\log D_0)=O(ns\log n)$. Hence the cost of all the determinants is $O((n^3+n^2s)ns\log n)=O(n^4s\log n+n^3s^2\log n)$. The cost of the Chinese Remainder Algorithm is, by Proposition 95, $O(\log^2 D_0)=O(n^2s^2\log^2 n)$, which is dominated by the second term in the determinant cost. For large n, the first term dominates, but for large s, the second term might dominate. Hence we have proved the following, slightly clumsy, statement.

Proposition 52 The cost of computing the determinant of an n×n matrix with integer entries bounded by $2^s$ is $O\bigl(\max(n^4s\log n,\;n^3s^2\log n)\bigr)$.
If we attempt to simplify this, we can write $O(n^4s^2\log n)$, which doesn’t look much different from (3.25)’s $O(n^5s^2\log^2 n)$, and this is true in terms of worst-case complexity with respect to each of n and s considered separately. But if, for example, s and n are both k, then (3.25) is $O(k^7\log^2 k)$ and Proposition 52 is $O(k^5\log k)$. An even better method is mentioned in Section 5.9.4.
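A minimal sketch of this modular determinant computation in Python (ours: the crude row-norm product stands in for the Hadamard bound of Proposition 84, and a real implementation would generate 31-bit or 63-bit primes rather than use a fixed list):

    from math import isqrt, prod

    def det_mod_p(M, p):
        """Determinant mod p by Gaussian elimination over Z_p."""
        A = [[x % p for x in row] for row in M]
        n, det = len(A), 1
        for k in range(n):
            piv = next((i for i in range(k, n) if A[i][k]), None)
            if piv is None:
                return 0
            if piv != k:
                A[k], A[piv] = A[piv], A[k]
                det = -det % p
            det = det * A[k][k] % p
            inv = pow(A[k][k], -1, p)
            for i in range(k + 1, n):
                f = A[i][k] * inv % p
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[k])]
        return det

    def det_by_crt(M, primes):
        """Combine det(M) mod p_i by the Chinese Remainder Theorem,
        then shift into the symmetric range (-N/2, N/2)."""
        bound = prod(isqrt(sum(x * x for x in row)) + 1 for row in M)  # >= |det|
        d, N = 0, 1
        for p in primes:
            dp = det_mod_p(M, p)
            d += N * ((dp - d) * pow(N, -1, p) % p)   # CRT step: d mod N*p
            N *= p
            if N > 2 * bound:
                break
        return d if d <= N // 2 else d - N

    M = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
    print(det_by_crt(M, [101, 103, 107, 109]))        # -90, as direct expansion gives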
4.1.2 Matrices with polynomial coefficients: Determinants
Assume our matrix entries are polynomials in x, of degree at most d, over a (sufficiently large) ring R which supports the polynomial Chinese Remainder Theorem (Appendix A.4) — in practice $\mathbb{Q}$ or $\mathbb{Q}(y_1,\ldots,y_n)$. One method is to use the fraction-free computations of Theorem 15, but then the same analysis as at the start of the previous section shows that we have intermediate expression swell. Can we do better?
Suppose we know that the degree (in x) of the determinant is less than N, and let $v_1,\ldots,v_N$ be N distinct values in R. Then, using the notation $Z_{x-v}$ to mean “Z, but replacing x by the value v” (which for polynomial Z is the same as the remainder on dividing Z by $x-v$), we have
$$\det(M)_{x-v}=\det(M_{x-v}). \tag{4.2}$$
Proposition 53 If M is an n×n matrix whose entries are polynomials in x of degree at most d, then the degree of det(M) is at most nd.

If we assume no expression growth in the elements of R (unrealistic, but simplifying), the cost of computing $M_{x-v}$ is $O(n^2d)$ operations in R, and the cost of evaluating the determinant is $O(n^3)$. The cost of the Chinese Remainder Algorithm is, by Proposition 96, $O(N^2)=O((nd+1)^2)$.

Proposition 54 The cost of computing the determinant of an n×n matrix with polynomial entries of degree bounded by d is $O\bigl(\max(n^4d,\;n^3d^2)\bigr)$ coefficient operations.
In practice, if we had polynomial entries with integer coefficients, we would reduce the polynomial determinant calculation to many integer ones, and do these by the method of section 4.1.1. The analysis becomes rather tedious, but it is still possible to show that the complexity is always better than fraction-free methods.
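A minimal sketch of (4.2) plus interpolation in Python (ours; Fraction arithmetic over Q, with Lagrange interpolation standing in for the polynomial Chinese Remainder Algorithm):

    from fractions import Fraction

    def det(M):
        """Determinant over Q by Gaussian elimination (called only on the
        scalar matrices M_{x-v}, so fraction growth is not a concern here)."""
        A = [[Fraction(x) for x in row] for row in M]
        n, d = len(A), Fraction(1)
        for k in range(n):
            piv = next((i for i in range(k, n) if A[i][k]), None)
            if piv is None:
                return Fraction(0)
            if piv != k:
                A[k], A[piv] = A[piv], A[k]
                d = -d
            d *= A[k][k]
            for i in range(k + 1, n):
                f = A[i][k] / A[k][k]
                A[i] = [a - f * b for a, b in zip(A[i], A[k])]
        return d

    def poly_eval(c, v):                   # c = [c0, c1, ...] is sum c_i x^i
        return sum(ci * v ** i for i, ci in enumerate(c))

    def det_by_interpolation(M, nd):
        """det(M) for polynomial entries, from nd+1 evaluations (Prop. 53)."""
        vs = [Fraction(v) for v in range(nd + 1)]
        ds = [det([[poly_eval(e, v) for e in row] for row in M]) for v in vs]
        coeffs = [Fraction(0)] * (nd + 1)  # Lagrange interpolation
        for j, vj in enumerate(vs):
            num, den = [Fraction(1)], Fraction(1)
            for i, vi in enumerate(vs):
                if i != j:                 # multiply num by (x - vi)
                    num = [b - vi * a for a, b in
                           zip(num + [Fraction(0)], [Fraction(0)] + num)]
                    den *= vj - vi
            for i, a in enumerate(num):
                coeffs[i] += ds[j] * a / den
        return coeffs

    M = [[[0, 1], [1]], [[1], [0, 1]]]     # [[x, 1], [1, x]]
    print([int(c) for c in det_by_interpolation(M, 2)])  # [-1, 0, 1]: x^2 - 1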
Open Problem 16 (Matrix Determinant costs) In Open Problem 6, we saw that the obvious calculation of costs did not seem to have been borne out by (admittedly old) experiments. What is the practical validity of these calculations?
4.1.3 Conclusion: Determinants
Hence we have a simple answer to the questions on page 162.
1. Are there good reductions from R?: yes, every reduction.
2. How can we tell if Ri is good? — always.
3. How many reductions should we take? This depends on bounding the size
of det(M). In the polynomial case, this is easy from the polynomial form
of the determinant: Proposition 53. In the integer case, the fact that we
are adding n! products makes the estimate more subtle: see Proposition
84.
4. How do we combine the results — one of Algorithms 47 (integer) or 48
(polynomial).
5. How do we check the result: irrelevant.
4.1.4 Linear Equations with integer coefficients
Now suppose that we wish to actually solve a linear system M.x = a with coefficients in $\mathbb{Z}$, rather than simply compute the determinant (which we will assume is nonzero). As before, one option is the fraction-free method (Corollary 4) followed by back-substitution, and it can be shown that the cost of the back-substitution is asymptotically the same as that of the fraction-free triangularization.
Let p be a prime not dividing det(M). We know (3.14) that $x=M^{-1}.a$, and hence $x_p=(M^{-1}.a)_p=(M^{-1})_p.a_p=M_p^{-1}.a_p$. If we knew the $x_p$ for enough p, we ought to be able to recover x. However, the entries of x are rational numbers, not integers. There is a general answer to this problem in Section 4.5.2.3, but in fact we can do better here. We know that the denominators of x all divide det(M), so we shall solve the related problem
$$y=(\det(M))\,M^{-1}.a, \tag{4.3}$$
whose solution y is a vector of integers, again bounded by the Hadamard bounds (Propositions 84 and 85).
Algorithm 16 (Modular Linear Equations)
Input: M an n×n non-singular matrix, a an n-vector over Z
Output: x an n-vector over Q with M.x = a

    1. for enough (N) primes p_i
    2.    reduce M mod p_i to get M_i, and a to get a_i
    3.    triangularize M_i||a_i
    4.    if M_i is non-singular
    5.       back-substitute to solve y_i = (det(M_i)) M_i^{-1}.a_i
    6. Reconstruct det(M) from the det(M_i).
    7. Reconstruct y from the y_i.
    8. Return y/det(M) (after cancellation).
The costs of steps 2 and 3 are $O(n^2s)$ and $O(n^3)$, as in section 4.1.1. Step 5 costs $O(n^2)$ operations. Step 6 costs, by Proposition 95, $O(\log^2|\det(M)|)=O(n^2s^2\log^2 n)$ operations, and step 7 costs n times as much. Step 8 has the same order of complexity as step 7, as we are computing n gcds of numbers of size bounded by the Hadamard bounds. Hence the total cost is
$$O\bigl(N(n^2s+n^3)+n(n^2s^2\log^2 n)\bigr) \tag{4.4}$$
operations. The number of primes we actually need is given by the Hadamard bounds⁵, i.e. $ns\log n$, and the number of primes that we have to discard is (at most) the same, as these must divide the determinant. The total cost is therefore
$$O(\underbrace{n^3s^2\log n}_{\text{evaluation}}+\underbrace{n^4s\log n}_{\text{det/solving}}+\underbrace{n^3s^2\log^2 n}_{\text{CRT/simplify}})=O(n^4s\log n+n^3s^2\log^2 n) \tag{4.5}$$
(where we have annotated the causes of the various summands): almost the same as Proposition 52, and the extra factor of $\log n$ multiplying $n^3s^2$ is caused by the fact that the Chinese Remainder Theorem is no longer negligible, as we have to reconstruct n numbers.
4.1.5 Linear Equations with polynomial coefficients
This is sufficiently similar that we will skip the details.
4.1.6 Conclusion: Linear Equations
Hence we have a simple answer to the questions on page 162.
1. Are there good reductions from R?: yes, all reductions except those that
annihilate the determinant — a finite number.
⁵Though it doesn’t change the asymptotics, these logarithms are $\log_B$, where B is the size of our working primes. Hence if $B=2^{31}$, we have $\log_{2^{31}}$ rather than $\log_e$, saving a factor of 21.
2. How can we tell if Ri is good? — if the determinant is nonzero.
3. How many reductions should we take? This depends on bounding the size
of det(M). In the polynomial case, this is easy from the polynomial form
of the determinant: Proposition 53. In the integer case, the fact that we
are adding n! products makes the estimate more subtle: see Proposition
84.
4. How do we combine the results — one of Algorithms 47 (integer) or 48
(polynomial).
5. How do we check the result: irrelevant.
4.1.7 Matrix Inverses
The same techniques as linear equations apply: indeed we could (though probably should not) compute the columns of the inverse as the solutions of M.x = e_i, where e_i has a 1 in the ith position, and zero elsewhere.
4.2 Gcd in one variable
Let us consider Brown’s example (from which the example on page 65 was modified):
$$A(x)=x^8+x^6-3x^4-3x^3+8x^2+2x-5; \tag{4.6}$$
$$B(x)=3x^6+5x^4-4x^2-9x+21. \tag{4.7}$$
Let us suppose that these two polynomials have a common factor, that is a polynomial P (of non-zero degree) which divides A and B. Then there is a polynomial Q such that A = PQ. This equation still holds if we take each coefficient as an integer modulo 5. If we write $P_5$ to signify the polynomial P considered as a polynomial with coefficients modulo 5, this equation implies that $P_5$ divides $A_5$. Similarly, $P_5$ divides $B_5$, and therefore it is a common factor⁶ of $A_5$ and $B_5$. But calculating the g.c.d. of $A_5$ and $B_5$ is fairly easy:
$$A_5(x)=x^8+x^6+2x^4+2x^3+3x^2+2x;$$
$$B_5(x)=3x^6+x^2+x+1;$$
$$C_5(x)=\mathrm{remainder}(A_5(x),B_5(x))=A_5(x)+3(x^2+1)B_5(x)=4x^2+3;$$
$$D_5(x)=\mathrm{remainder}(B_5(x),C_5(x))=B_5(x)+(x^4+4x^2+3)C_5(x)=x;$$
$$E_5(x)=\mathrm{remainder}(C_5(x),D_5(x))=C_5(x)+xD_5(x)=3.$$
Thus $A_5$ and $B_5$ are relatively prime, which implies that $P_5=1$. As the leading coefficient of P has to be one, we deduce that P = 1.
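A minimal sketch of this modular Euclidean computation in Python (ours: polynomials as coefficient lists, lowest degree first; it reproduces $C_5$, $D_5$ and $E_5$):

    def rem_p(f, g, p):                    # remainder of f by g in Z_p[x]
        f = [c % p for c in f]
        inv = pow(g[-1], -1, p)
        while len(f) >= len(g):
            c, s = f[-1] * inv % p, len(f) - len(g)
            for i, gi in enumerate(g):
                f[i + s] = (f[i + s] - c * gi) % p
            while f and f[-1] == 0:
                f.pop()
        return f

    def gcd_p(f, g, p):                    # monic gcd in Z_p[x] by Euclid
        while g:
            f, g = g, rem_p(f, g, p)
        inv = pow(f[-1], -1, p)
        return [c * inv % p for c in f]

    A = [-5, 2, 8, -3, -3, 0, 1, 0, 1]     # (4.6), lowest degree first
    B = [21, -9, -4, 0, 5, 0, 3]           # (4.7)
    A5, B5 = [c % 5 for c in A], [c % 5 for c in B]
    C5 = rem_p(A5, B5, 5); print(C5)       # [3, 0, 4]: 4x^2 + 3
    D5 = rem_p(B5, C5, 5); print(D5)       # [0, 1]:    x
    E5 = rem_p(C5, D5, 5); print(E5)       # [3]:       3
    print(gcd_p(A, B, 5))                  # [1]: A_5 and B_5 relatively prime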
The concept of modular methods is inspired by this calculation, where there is no possibility of intermediate expression swell, for the integers modulo 5 are bounded (by 4). Obviously, there is no need to use the integers modulo 5: any prime number p will suffice (we chose 5 because the calculation does not work modulo 2, for reasons to be described later, and 3 divides one of the leading coefficients). In this example, the result was that the polynomials are relatively prime. This raises several questions about generalising this calculation to an algorithm capable of calculating the g.c.d. of any pair of polynomials:

1. how do we calculate a non-trivial g.c.d.?

2. what do we do if the modular g.c.d. is not the modular image of the g.c.d. (as in the example in the footnote⁶)?

3. how much does this method cost?

⁶Note that we cannot deduce that $P_5=\gcd(A_5,B_5)$: a counter-example is $A=x-3$, $B=x+2$, where $P=1$, but $A_5=B_5=x+2$, and so $\gcd(A_5,B_5)=x+2$, whereas $P_5=1$.
4.2.1 Bounds on divisors
Before we can answer these questions, we have to be able to bound the coefficients of the g.c.d. of two polynomials.

Theorem 36 (Landau–Mignotte Inequality [Lan05, Mig74, Mig82]) Let $Q=\sum_{i=0}^q b_ix^i$ be a divisor of the polynomial $P=\sum_{i=0}^p a_ix^i$ (where $a_i$ and $b_i$ are integers). Then
$$\max_{i=0}^{q}|b_i|\le\sum_{i=0}^{q}|b_i|\le2^q\left|\frac{b_q}{a_p}\right|\sqrt{\sum_{i=0}^{p}a_i^2}.$$
These results are corollaries of statements in Appendix A.2.2.
If we regard P as known and Q as unknown, this formulation does not quite tell us about the unknowns in terms of the knowns, since there is some dependence on Q on the right, but we can use a weaker form:
$$\sum_{i=0}^{q}|b_i|\le2^p\sqrt{\sum_{i=0}^{p}a_i^2}.$$
When it comes to greatest common divisors, we have the following result.

Corollary 9 Every coefficient of the g.c.d. of $A=\sum_{i=0}^{\alpha}a_ix^i$ and $B=\sum_{i=0}^{\beta}b_ix^i$ (with $a_i$ and $b_i$ integers) is bounded by
$$2^{\min(\alpha,\beta)}\gcd(a_\alpha,b_\beta)\min\left(\frac{1}{|a_\alpha|}\sqrt{\sum_{i=0}^{\alpha}a_i^2},\;\frac{1}{|b_\beta|}\sqrt{\sum_{i=0}^{\beta}b_i^2}\right).$$

Proof. The g.c.d. is a factor of A and of B, the degree of which is, at most, the minimum of the degrees of the two polynomials. Moreover, the leading coefficient of the g.c.d. has to divide the two leading coefficients of A and B, and therefore has to divide their g.c.d.
A slight variation of this corollary is provided by the following result.
Corollary 10 Every coefficient of the g.c.d. of $A=\sum_{i=0}^{\alpha}a_ix^i$ and $B=\sum_{i=0}^{\beta}b_ix^i$ (where $a_i$, $b_i$ are integers) is bounded by
$$2^{\min(\alpha,\beta)}\gcd(a_0,b_0)\min\left(\frac{1}{|a_0|}\sqrt{\sum_{i=0}^{\alpha}a_i^2},\;\frac{1}{|b_0|}\sqrt{\sum_{i=0}^{\beta}b_i^2}\right).$$

Proof. If $C=\sum_{i=0}^{\gamma}c_ix^i$ is a divisor of A, then $\hat C=\sum_{i=0}^{\gamma}c_{\gamma-i}x^i$ is a divisor of $\hat A=\sum_{i=0}^{\alpha}a_{\alpha-i}x^i$, and conversely. Therefore, the last corollary can be applied to $\hat A$ and $\hat B$, and this yields the bound stated.
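A minimal sketch of these bounds in Python (ours: floating point is fine here, since the bound need only be an over-estimate; the numbers anticipate (4.9) and (4.10) in section 4.2.3, using Brown's polynomials (4.6) and (4.7)):

    from math import gcd, sqrt

    def lm_bound(a, b, leading=True):
        """Landau-Mignotte bound on the coefficients of gcd(A,B):
        Corollary 9 (leading=True) or Corollary 10 (leading=False).
        a, b are integer coefficient lists, lowest degree first."""
        k = -1 if leading else 0           # leading or trailing coefficient
        norm = lambda c: sqrt(sum(x * x for x in c))
        return 2 ** min(len(a) - 1, len(b) - 1) * gcd(a[k], b[k]) * \
               min(norm(a) / abs(a[k]), norm(b) / abs(b[k]))

    A = [-5, 2, 8, -3, -3, 0, 1, 0, 1]     # (4.6)
    B = [21, -9, -4, 0, 5, 0, 3]           # (4.7)
    print(lm_bound(A, B, True))            # about 510.2, cf. (4.9)
    print(lm_bound(A, B, False))           # about 72.9,  cf. (4.10)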
It may seem strange that the coefficients of a g.c.d. of two polynomials can be greater than the coefficients of the polynomials themselves. One example which shows this is the following (due to Davenport and Trager):
$$A=x^3+x^2-x-1=(x+1)^2(x-1);$$
$$B=x^4+x^3+x+1=(x+1)^2(x^2-x+1);$$
$$\gcd(A,B)=x^2+2x+1=(x+1)^2.$$
This example can be generalised, as say
$$A=x^5+3x^4+2x^3-2x^2-3x-1=(x+1)^4(x-1);$$
$$B=x^6+3x^5+3x^4+2x^3+3x^2+3x+1=(x+1)^4(x^2-x+1);$$
$$\gcd(A,B)=x^4+4x^3+6x^2+4x+1=(x+1)^4.$$
In fact, Mignotte [Mig81] has shown that the number 2 in corollaries 9 and 10
is asymptotically the best possible, i.e. it cannot be replaced by any smaller c.
4.2.2 The modular – integer relationship
In this sub-section, we answer the question raised above: what do we do if
the modular g.c.d. is not the modular image of the g.c.d. calculated over the
integers?
Lemma 7 If p does not divide the leading coefficient of gcd(A,B), the degree of $\gcd(A_p,B_p)$ is greater than or equal to that of gcd(A,B).

Proof. Since gcd(A,B) divides A, then $(\gcd(A,B))_p$ divides $A_p$. Similarly, it divides $B_p$, and therefore it divides $\gcd(A_p,B_p)$. This implies that the degree of $\gcd(A_p,B_p)$ is greater than or equal to that of $\gcd(A,B)_p$. But the degree of $\gcd(A,B)_p$ is equal to that of gcd(A,B), for the leading coefficient of gcd(A,B) does not cancel when it is reduced modulo p.

This lemma is not very easy to use on its own, for it supposes that we know the g.c.d. (or at least its leading coefficient) before we are able to check whether the modular reduction has the same degree. But this leading coefficient has to divide the two leading coefficients of A and B, and this gives a formulation which is easier to use.
Corollary 11 If p does not divide the leading coefficients of A and of B (it may divide one, but not both), then the degree of $\gcd(A_p,B_p)$ is greater than or equal to that of gcd(A,B).
As the g.c.d. is the only polynomial (to within an integer multiple) of its degree
which divides A and B, we can test the correctness of our calculations of the
g.c.d.: if the result has the degree of gcd(Ap, Bp) (where p satisfies the hypothesis
of this corollary) and if it divides A and B, then it is the g.c.d. (to within an
integer multiple).
It is quite possible that we could find a $\gcd(A_p,B_p)$ of too high a degree. For example, in the case cited above, $\gcd(A_2,B_2)=x+1$ (it is obvious that x+1 divides the two polynomials modulo 2, because the sum of the coefficients of each polynomial is even). The following lemma shows that this possibility can only arise for a finite number of p.
Lemma 8 Let C = gcd(A,B). If p satisfies the condition of the corollary above,
and if p does not divide Resx(A/C,B/C), then gcd(Ap, Bp) = Cp.
Proof. A/C and B/C are relatively prime, for otherwise C would not be the
g.c.d. of A and B. By the corollary, Cp does not vanish. Therefore
gcd(Ap, Bp) = Cp gcd(Ap/Cp, Bp/Cp).
For the lemma to be false, the last g.c.d. has to be non-trivial. This implies that
the resultant Resx(Ap/Cp, Bp/Cp) vanishes, by proposition 76 of the Appendix.
This resultant is the determinant of a Sylvester matrix, and $|M_p|=(|M|)_p$, for the determinant is only a sum of products of the coefficients. In the present case, this amounts to saying that $\mathrm{Res}_x(A/C,B/C)_p$ vanishes, that is that p divides $\mathrm{Res}_x(A/C,B/C)$. But the hypotheses of the lemma exclude this possibility.
Definition 84 If gcd(Ap, Bp) = gcd(A,B)p, we say that the reduction of this
problem modulo p is good, or that p is of good reduction. If not, we say that p
is of bad reduction.
This lemma implies, in particular, that there are only a finite number of values
of p such that gcd(Ap, Bp) does not have the same degree as that of gcd(A,B),
that is the p which divide the g.c.d. of the leading coe�cients and the p which
divide the resultant of the lemma (the resultant is non-zero, and therefore has
only a finite number of divisors). In particular, if A and B are relatively prime,
we can always find a p such that Ap and Bp are relatively prime.
Observation 7 It would be tempting to conclude “the probability that p is bad is the probability that p divides this resultant, i.e. 1/p”. However, a given resultant either is, or is not, divisible by p: there is no probability involved. If we consider the space of all (A,B) pairs⁷, then we have to allow for p dividing the leading coefficients. This argument would also assume that the distribution of resultants is the same as the distribution of integers, and this seems not to be the case: very few resultants are actually prime, for example. Nevertheless, it is an empirical observation that the probability of p being bad does seem to be proportional to 1/p.

⁷Technically, the limit as $H\to\infty$ of all pairs with coefficients at most H.

Figure 4.2: Diagrammatic illustration of Algorithm 17

    Z[x]  - - - - - - - - - - ->  Z[x]
    reduce |                      ^  interpret & check
           v                      |
    Z_p[x]  ------gcd------>  Z_p[x]
An alternative proof that there are only finitely many primes of bad reduction can be deduced from Lemma 11. We can summarize this section in the following.

Theorem 37 (Good Reduction Theorem (Z)) If p does not divide $\gcd(a_\alpha,b_\beta)$ (which can be checked for in advance) or $\mathrm{Res}_x(A/C,B/C)$, then p is of good reduction. Furthermore, if p divides $\mathrm{Res}_x(A/C,B/C)$ but not $\gcd(a_\alpha,b_\beta)$, then the gcd computed modulo p has a larger degree than the true result.
4.2.3 Computing the g.c.d.: one large prime
In this section we answer the question posed earlier: how do we calculate a non-trivial g.c.d.? One obvious method is to use the Landau–Mignotte inequality, which can determine an M such that all the coefficients of the g.c.d. are bounded by M, and to calculate modulo a prime number greater than 2M. This method translates into Algorithm 17/Figure 4.2 (where Landau_Mignotte_bound(A,B) applies corollary 9 and/or corollary 10, and find_large_prime(2M) produces a different prime > 2M each time it is called within a given invocation of the algorithm). We restrict ourselves to monic polynomials, and assume modular_gcd gives a monic result, to avoid the problem that a modular g.c.d. is only defined up to a constant multiple.
Algorithm 17 (Modular GCD (Large prime version))
Input: A, B monic polynomials in Z[x].
Output: gcd(A,B)

    M := Landau_Mignotte_bound(A,B);
    do  p := find_large_prime(2M);
        if p does not divide gcd(lc(A), lc(B))
           then C := modular_gcd(A,B,p);
                if C divides A and C divides B
                   then return C
    forever     # Lemma 8 guarantees termination
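A minimal sketch of Algorithm 17 in Python (ours: restricted, like the algorithm, to monic inputs, so the lc test is vacuous and trial division stays in Z; find_large_prime here just takes the first prime past the bound, whereas the real algorithm must return a different prime on each call; isqrt(...)+1 over-estimates the norm in the Landau–Mignotte bound):

    from math import isqrt

    def rem_p(f, g, p):                    # remainder in Z_p[x], lists low-first
        f = [c % p for c in f]
        while f and f[-1] == 0:
            f.pop()
        inv = pow(g[-1], -1, p)
        while len(f) >= len(g):
            c, s = f[-1] * inv % p, len(f) - len(g)
            for i, gi in enumerate(g):
                f[i + s] = (f[i + s] - c * gi) % p
            while f and f[-1] == 0:
                f.pop()
        return f

    def modular_gcd(f, g, p):              # monic gcd in Z_p[x]
        while g:
            f, g = g, rem_p(f, g, p)
        inv = pow(f[-1], -1, p)
        return [c * inv % p for c in f]

    def divides(c, f):                     # exact division in Z[x], c monic
        f = f[:]
        while len(f) >= len(c):
            q, s = f[-1], len(f) - len(c)
            for i, ci in enumerate(c):
                f[i + s] -= q * ci
            f.pop()                        # top coefficient is now zero
        return not any(f)

    def find_large_prime(bound):
        n = bound + 1
        while any(n % d == 0 for d in range(2, isqrt(n) + 1)):
            n += 1
        return n

    A = [-2, -1, 2, 1]                     # (x^2 - 1)(x + 2), monic
    B = [-3, -1, 3, 1]                     # (x^2 - 1)(x + 3), monic
    M = 2 ** 3 * (isqrt(min(sum(c*c for c in A), sum(c*c for c in B))) + 1)
    p = find_large_prime(2 * M)            # 67: the first prime above 2M = 64
    C = modular_gcd(A, B, p)
    C = [c if c <= p // 2 else c - p for c in C]   # symmetric range
    if divides(C, A) and divides(C, B):
        print(C)                           # [-1, 0, 1], i.e. x^2 - 1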
We can think of the use of modular_gcd as a Monte Carlo Algorithm (Section
1.4.2) and Algorithm 17 as an instance of Figure 1.1 converting a Monte Carlo
algorithm into a Las Vegas (“always correct/probably fast”) one.
If the inputs are not monic, the answer might not be monic. For example, x+4 divides both $2x^2+x$ and $2x^2-x-1$ modulo 7, but the true common divisor over the integers is 2x+1, which is 2(x+4) (mod 7). We do know that the leading coefficient of the g.c.d. divides each leading coefficient lc(A) and lc(B), and therefore their g.c.d. g = gcd(lc(A), lc(B)). We therefore compute
$$C:=\mathrm{pp}\bigl(\,\underbrace{g\times\mathrm{modular\_gcd}(A,B)}_{\substack{\text{computed modulo }p,\\\text{then interpreted in }\mathbb{Z}}}\,\bigr) \tag{4.8}$$
instead, where the pp is needed in case the leading coefficient of the g.c.d. is a proper factor of g.
It is tempting to argue that this algorithm will only handle numbers of the
size of twice the Landau–Mignotte bound, but this belief has two flaws.
• While we have proved that there are only finitely many bad primes, we have said nothing about how many there are. The arguments can in fact be made effective, but the results tend to be unduly pessimistic, since it is extremely unlikely that all the bad primes would be clustered just above 2M.
• In theory, and indeed very often in practice [ABD88], the division tests could yield very large numbers if done as tests of the remainder being zero: for example the remainder on dividing $x^{100}$ by $x-10$ is $10^{100}$. This can be solved by a technique known as “early abort” trial division.

Proposition 55 If h, of degree m, is a factor of f of degree n, the coefficient of $x^{n-m-i}$ in the quotient is bounded by $\binom{n-m}{i}\frac{1}{|\mathrm{lc}(h)|}\,\|f\|$.

This is basically Corollary 25. Hence, as we are doing the trial division, we can give up as soon as we find a coefficient in the quotient that exceeds this bound, which is closely related to M (the difference relates to the leading coefficient terms).
For example, if we take p = 7 in the example at the start of this chapter, we find that the g.c.d. of $A_7$ and $B_7$ is x+10 (it could equally well be x+3, but x+10 makes the point better). Does x+10 divide B? We note that $\|B\|\approx23.92$. Successive terms in the quotient are $3x^5$ (and 3 is a permissible coefficient), $-30x^4$ (and $30<\binom51\times23.92$) and $305x^3$, at which point we observe that $305>\binom52\times23.92=239.2$, so this cannot be a divisor of B. Hence 7 was definitely unlucky.
With this refinement, it is possible to state that the numbers dealt with
in this algorithm are “not much larger” than 2M , though considerable
ingenuity is needed to make this statement more precise.
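A minimal sketch of early-abort trial division in Python (ours: divisor assumed monic, as x+10 is; it reproduces the rejection just described, giving up at 305 without ever computing large numbers):

    from math import comb, sqrt

    def early_abort_divides(f, h):
        """Trial-divide f by monic h over Z, aborting as soon as a quotient
        coefficient exceeds the Proposition 55 bound C(n-m, i)*||f||."""
        n, m = len(f) - 1, len(h) - 1
        norm_f = sqrt(sum(c * c for c in f))
        f = f[:]
        for i in range(n - m + 1):
            q = f[-1]                      # next quotient coefficient
            if abs(q) > comb(n - m, i) * norm_f:
                return False               # early abort: cannot be a factor
            s = len(f) - len(h)
            for k, hk in enumerate(h):
                f[k + s] -= q * hk
            f.pop()                        # top coefficient is now zero
        return not any(f)                  # divides iff remainder is zero

    B = [21, -9, -4, 0, 5, 0, 3]           # (4.7)
    print(early_abort_divides(B, [10, 1])) # False: 305 > C(5,2)*||B|| = 239.2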
If we apply this algorithm to the polynomials at the start of this section, we deduce that $\sqrt{\sum_{i=0}^8a_i^2}=\sqrt{113}$ and $\sqrt{\sum_{i=0}^6b_i^2}=2\sqrt{143}$, and hence corollary 9 gives a bound of
$$2^6\min\left(\sqrt{113},\;\frac23\sqrt{143}\right)\approx510.2, \tag{4.9}$$
so our first prime would be 1021, which is indeed of good reduction. In this case, corollary 10 gives a bound of
$$2^6\min\left(\frac15\sqrt{113},\;\frac2{21}\sqrt{143}\right)\approx72.8, \tag{4.10}$$
so our first prime would be 149. In general, we cannot tell which gives us the best bound, and it is normal to take the minimum.
Open Problem 17 [Improving Landau–Mignotte for g.c.d.] A significant factor in the Landau–Mignotte bound here, whether (4.9) or (4.10), was the $2^{\min(8,6)}$ contribution from the degree of the putative g.c.d. But in fact the exponent is at most 4, not 6, since the g.c.d. cannot have leading coefficient divisible by 3 (since A does not). Hence the g.c.d. must have at most the degree of the g.c.d. modulo 3, and modulo 3 B has degree 4, so the g.c.d. must have degree at most 4.
Can this be generalised? In particular, can we update our estimate “on the fly” as upper bounds on the degree of the g.c.d. change, and is it worth it? In view of the ‘early success’ strategies discussed later, the answer to the last part is probably negative.
4.2.4 Computing the g.c.d.: several small primes
While algorithm 17 does give us some control on the size of the numbers being considered, we are still often using numbers larger than those which hindsight would show to be necessary. For example, in (4.6), (4.7) we could deduce coprimeness using the prime 5, rather than 1021 from (4.9) or 149 from (4.10). If instead we consider (x−1)A and (x−1)B, the norms change, giving 812.35 in (4.9) (a prime of 1627) and 116.05 in (4.10) (a prime of 239). Yet primes such as 5, 11, 13 etc. will easily show that the result is x−1. Before we leap ahead and use such primes, though, we should reflect that, had we taken (x−10)A and (x−10)B, 5 would have suggested x as the gcd, 11 would have suggested x+1, 13 would have suggested x+3 and so on.
The answer to this comes in observing that the smallest polynomial (in terms of coefficient size) which is congruent to x modulo 5 and to x+1 modulo 11 is x−10 (it could be computed by algorithm 48). More generally, we can apply the Chinese Remainder Theorem (Theorem 50) to enough primes of good reduction, as follows. We assume that find_prime(g) returns a prime not dividing g, a different one each time. The algorithm is given in Figure 4.3, with a diagram in Figure 4.4.
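The coefficient-wise Chinese Remainder step is easy to sketch in Python (ours, standing in for Algorithm 48, which is not presented in this chapter; it reproduces the x−10 reconstruction just described):

    def crt_poly(C, D, p, N):
        """Combine C mod p and D mod N (coefficient lists, lowest degree first)
        into the polynomial mod p*N with coefficients in the symmetric range."""
        out = []
        for c, d in zip(C, D):
            x = d + N * ((c - d) * pow(N, -1, p) % p)   # x = c (p), x = d (N)
            out.append(x if x <= (p * N) // 2 else x - p * N)
        return out

    # images of gcd((x-10)A, (x-10)B):  x modulo 5,  x+1 modulo 11
    print(crt_poly([0, 1], [1, 1], 5, 11))              # [-10, 1]: x - 10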
Figure 4.3: Algorithm 18

Algorithm 18 (Modular GCD (Small prime version))
Input: A, B polynomials in Z[x].
Output: gcd(A,B)

    M := Landau_Mignotte_bound(A,B);
    g := gcd(lc(A), lc(B));
    p := find_prime(g);
    D := g·modular_gcd(A,B,p);
    if deg(D) = 0 then return 1
    N := p;                     # N is the modulus we will be constructing
    while N < 2M repeat         (*)
        p := find_prime(g);
        C := g·modular_gcd(A,B,p);
        if deg(C) = deg(D)
           then D := Algorithm 48(C,D,p,N);
                N := pN;
        else if deg(C) < deg(D)
           # C proves that D is based on primes of bad reduction
           then if deg(C) = 0 then return 1
                D := C;
                N := p;
        else   # D proves that p is of bad reduction, so we ignore it
    D := pp(D);                 # In case multiplying by g was overkill
    Check that D divides A and B, and return it
    If not, all primes must have been bad, and we start again

The “early abort” of Proposition 55 is needed for these divisibility checks if we are to maintain a “numbers not much larger than 2M” guarantee.
Figure 4.4: Diagrammatic illustration of Algorithm 18

    Z[x]  - - - - - - - - - - - - - - - ->  Z[x]
    k×reduce |                              ^  interpret & check
             v                              |
    Z_{p_1}[x]  ----gcd---->  Z_{p_1}[x]  \
      ...                       ...        }  --C.R.T.-->  Z'_{p_1···p_k}[x]
    Z_{p_k}[x]  ----gcd---->  Z_{p_k}[x]  /

Z'_{p_1···p_k}[x] indicates that some of the $p_i$ may have been rejected by the compatibility checks, so the product is over a subset of $p_1\cdots p_k$.
Observation 8 The reader may think that Algorithm 18 is faulty: line (*) in Figure 4.3 iterates until $N\ge2M$, which would be fine if we were actually computing the g.c.d. But we have forced the leading coefficient to be g, which may be overkill. Hence aren’t we in danger of trying to recover g times the true g.c.d., whose coefficients may be greater than 2M?
In fact there is not a problem. The proof of Corollary 9 relies on estimating the leading coefficient of gcd(A,B) by g, and so the bound is in fact a bound for the coefficients after this leading coefficient has been imposed.
Having said that, we can’t “mix and match”. If we decide that Corollary 10 provides a better (i.e. smaller) bound than Corollary 9, then we must go for “imposed trailing coefficients” rather than “imposed leading coefficients”, or, and this is the way the author has tended to implement it, compute the g.c.d. of $\hat A$ and $\hat B$, and reverse that.

We should note the heavy reliance on Corollary 11 to detect bad reduction. We impose g as the leading coefficient throughout, and make the result primitive at the end as in the large prime variant.
4.2.5 Computing the g.c.d.: early success
While Algorithm 18 will detect a g.c.d. of 1 early, it will otherwise compute as
far as the Landau–Mignotte bound if the g.c.d. is not 1. While this may be
necessary, it would be desirable to terminate earlier if we have already found
the g.c.d. This is easily done by replacing the line
then D := Algorithm 48(C,D, p,N);
by the code in Figure 4.5. We should note that we return an E which divides
the inputs, and is derived from modular images, and therefore has to be the
greatest common divisor by Corollary 11.
Figure 4.5: “Early termination” g.c.d. code

    then D' := D
         D := Algorithm 48(C,D,p,N);
         if D = D'               # We may have found the answer
            then E := pp(D);
                 if E divides A and B
                    then return E;
         # Otherwise this was a false alert, and we continue as normal.
4.2.6 An alternative correctness check
So far we have suggested computing the putative g.c.d. G, then checking that it really divides both, and relying on Corollary 11 to say that G is therefore a greatest common divisor. An alternative approach is to compute the co-factors, i.e. A' such that A = A'G and B' such that B = B'G, at the same time, and use these as the check. So let us assume that modular_gcd_cofactors returns a triple [G, A', B'] modulo p. The Algorithm (19) is given in Figure 4.6, and the diagram in Figure 4.4 is still relevant. TO BE COMPLETED: early termination, here and elsewhere.
Observation 9 It is tempting to conjecture that we do not need to make both the multiplication checks at the end, but this is false: consider A = H, B = H + 3p_1···p_k, when the algorithm will find H as the putative g.c.d., since the Landau–Mignotte bound will ignore the large extra term in B, and only the multiplication check for B will detect this.
Observation 10 Early termination can perfectly well be applied to this variant: at any time when gA = DA' and gB = DB' over the integers, we can finish.
4.2.7 Conclusion
Observation 11 We have presented this material as if there were a choice between one large prime (Algorithm 17) and several small ones (Algorithms 18, 19). In practice, of course, a computer regards all numbers less than 32 bits (and increasingly 64 bits) as ‘small’, so an implementation would generally use the largest ‘small’ primes it could in Algorithms 18, 19, and thus often one prime will suffice, and we have the same effect as Algorithm 17.
Let us see how we have answered the questions on page 162.
1. Are there “good” reductions from Z to Z_p? Yes — all primes except those that divide both leading coefficients (Lemma 7) or divide a certain resultant (Lemma 8). The technique of Lemma 11 (page 200) will show that there are only finitely many bad primes, but does not give a bound.
Figure 4.6: Algorithm 19

Algorithm 19 (Modular GCD (Alternative small prime version))
Input: A, B polynomials in Z[x].
Output: G := gcd(A,B), A', B' with A = A'G, B = B'G.

    M := Landau_Mignotte_bound(A,B);
    g := gcd(lc(A), lc(B));
    p := find_prime(g);
    [D, A', B'] := modular_gcd_cofactors(A,B,p);
    if deg(D) = 0 then return [1, A, B]
    D := gD                     # g is our guess at the leading coefficient of the g.c.d.
    N := p;                     # N is the modulus we will be constructing
    while N < 2M repeat
        p := find_prime(g);
        [C, A_1, B_1] := modular_gcd_cofactors(A,B,p);
        if deg(C) = deg(D)
           then D := Algorithm 48(C,D,p,N);
                A' := Algorithm 48(A_1,A',p,N);
                B' := Algorithm 48(B_1,B',p,N);
                N := pN;
        else if deg(C) < deg(D)
           # C proves that D is based on primes of bad reduction
           then if deg(C) = 0 then return [1, A, B]
                D := C; A' := A_1; B' := B_1;
                N := p;
        else   # D proves that p is of bad reduction, so we ignore it
    if gA = DA' and gB = DB'
       then G := pp(D) gcd(cont(A), cont(B));   # Theorem 6
            A' := A'/lc(G); B' := B'/lc(G);     # Fix leading coefficients
            return [G, A', B']
    else all primes must have been bad, and we start again
2. How can we tell if p is good? We can’t, but given two different primes p and q which give different results, we can tell which is definitely bad: Corollary 11.
3. How many reductions should we take? We want the apparently good
primes to have a product greater than twice the Landau–Mignotte bound
(Corollaries 9 and 10). Alternatively we can use early termination as in
Figure 4.5.
4. How do we combine? Chinese Remainder Theorem (Algorithm 48).
5. How do we check the result? If it divides both the inputs, then it is a
common divisor, and hence (Corollary 11) has to be the greatest.
All these algorithms use modular computations as Monte Carlo (“always fast/
probably correct”) algorithms (Section 1.4.2), converted into a Las Vegas (“al-
ways correct/probably fast”) one in the style of Figure 1.1 because we have
correctness checks. In fact, because we can bound the number of bad primes (as
opposed to just the probability of a prime being bad) we are actually guaran-
teed polynomial running time, so these Las Vegas algorithms are in fact perfectly
normal algorithms, though the upper limits on the running times so obtained
are almost always gross over-estimates.
4.3 Polynomials in two variables
The same techniques can be used to compute the greatest common divisor of polynomials in several variables. This is even more important than in the case of univariate polynomials, since the coefficient growth observed on page 66 becomes degree growth in the other variables.
4.3.1 Degree Growth in Coefficients
As a trivial example of this, we can consider
$$A=(y^2-y-1)x^2-(y^2-2)x+(2y^2+y+1);$$
$$B=(y^2-y+1)x^2-(y^2+2)x+(y^2+y+2).$$
The first elimination gives
$$C=(2y^2-4y)x+(y^4-y^3+2y^2+3y+3),$$
and the second gives
$$D=-y^{10}+3y^9-10y^8+11y^7-23y^6+22y^5-37y^4+29y^3-32y^2+15y-9.$$
Since this is a polynomial in y only, the greatest common divisor does not depend on x, i.e. $\mathrm{pp}_x(\gcd(A,B))=1$. Since $\mathrm{cont}_x(A)=1$, $\mathrm{cont}_x(\gcd(A,B))=1$, and hence gcd(A,B) = 1, but we had to go via a polynomial of degree 10 to show this. Space does not allow us to show bigger examples in full, but it can be shown that, even using the subresultant algorithm (Algorithm 4), computing the g.c.d. of polynomials of degree d in x and y can lead to intermediate⁸ polynomials of degree $O(d^2)$.
Suppose A and B both have degree $d_x$ in x and $d_y$ in y, with $A=\sum a_ix^i$ and $B=\sum b_ix^i$. After the first division (which is in fact a scaled subtraction $C:=\mathrm{lc}(A)B-\mathrm{lc}(B)A$), the coefficients have, in general, degree $2d_y$ in y. If we assume⁹ that each division reduces the degree by 1, then the next result would be $\lambda B-(\mu x+\nu)C$ where $\mu=\mathrm{lc}(B)$ has degree $d_y$ in y and $\lambda$ and $\nu$ have degree $3d_y$. This result has degree $5d_y$ in y, but the subresultant algorithm will divide through by $\mathrm{lc}(B)$, to leave a result D of degree $d_x-2$ in x and $4d_y$ in y. The next result would be $\lambda'C-(\mu'x+\nu')D$ where $\mu'=\mathrm{lc}_x(C)$ has degree $2d_y$ in y and $\lambda'$ and $\nu'$ have degree $6d_y$. This result has degree $10d_y$ in y, but the subresultant algorithm will divide by a factor of degree $4d_y$, leaving an E of degree $6d_y$ in y. The next time round, we will have degree (after subresultant removal) $8d_y$ in y, and ultimately degree $2d_xd_y$ in y when it has degree 0 in x.
If this is not frightening enough, consider what happens if, rather than being in x and y, our polynomials were in n+1 variables x and $y_1,\ldots,y_n$. Then the coefficients (with respect to x) of the initial polynomials would have degree $d_y$, and hence (at most) $(1+d_y)^n$ terms, whereas the result would have $(1+2d_xd_y)^n$ terms, roughly $(2d_x)^n$, or exponentially many, times as many terms as the inputs.
Although the author knows of no way of formalising this statement, experience suggests that, even if f and g are sparse, i.e. have many fewer than $(1+d_y)^n$ terms, the intermediate results are dense, and so the blowup ratio is even worse.
If we wish to consider taking greatest common divisors of polynomials in
several variables, we clearly have to do better. Fortunately there are indeed
better algorithms. The historically first such algorithm is based on Algorithm
18, except that evaluating a variable at a value replaces working modulo a prime.
Open Problem 18 (Alternative Route for Bivariate Polynomial g.c.d.)
Is there any reasonable hope of basing an algorithm on Algorithm 17? Intuition
says not, and that, just as Algorithm 18 is preferred to Algorithm 17 for the uni-
variate problem, so should it be here, but to the best of the author’s knowledge
the question has never been explored.
The reader will observe that the treatment here is very similar to that of the univariate case, and may well ask “can the treatments be unified?” Indeed they can, and this was done in [Lau82], but the unification requires rather more algebraic machinery than we have at our disposal.

⁸We stress this word. Unlike the integer case, where the coefficients of a g.c.d. can be larger than those of the original polynomials, the degree in y of the final g.c.d. cannot be greater than the (minimum of the) degrees (in y) of the inputs.
⁹This is the “normal p.r.s.” assumption: see footnote 37 (page 71).
Notation 28 From now until section 4.4, we will consider the bivariate case, gcd(A,B) with $A,B\in R[x,y]\equiv R[y][x]$, and we will be considering evaluation maps replacing y by $v\in R$, writing $A_{y-v}$ for the result of this evaluation.

This notation is analogous to the previous section, where $A_p$ was the remainder on dividing A by p.

Observation 12 Clearly $R[x,y]\equiv R[x][y]$ also, and the definition of g.c.d. is independent of this choice. Algorithmically, though, it seems as if we must make such a choice. Some systems may already have imposed a default choice, but if we have a free hand it is usual to choose as the main variable (x in Notation 28) the one which minimises min(deg(A), deg(B)).
4.3.2 The evaluation–interpolation relationship
In this sub-section, we answer a question analogous to that in section 4.2.2: what do we do if the g.c.d. of the evaluations is not the image under evaluation of the g.c.d. calculated before evaluation?

Lemma 9 If $y-v$ does not divide the leading coefficient of gcd(A,B), the degree of $\gcd(A_{y-v},B_{y-v})$ is greater than or equal to that of gcd(A,B).

Proof. Since gcd(A,B) divides A, then $(\gcd(A,B))_{y-v}$ divides $A_{y-v}$. Similarly, it divides $B_{y-v}$, and therefore it divides $\gcd(A_{y-v},B_{y-v})$. This implies that the degree of $\gcd(A_{y-v},B_{y-v})$ is greater than or equal to that of $\gcd(A,B)_{y-v}$. But the degree of $\gcd(A,B)_{y-v}$ is equal to that of gcd(A,B), for the leading coefficient of gcd(A,B) does not cancel when it is evaluated at v.

This lemma is not very easy to use on its own, for it supposes that we know the g.c.d. (or at least its leading coefficient) before we are able to check whether the modular reduction has the same degree. But this leading coefficient has to divide the two leading coefficients of A and B, and this gives a formulation which is easier to use.

Corollary 12 If $y-v$ does not divide the leading coefficients of A and of B (it may divide one, but not both), then the degree of $\gcd(A_{y-v},B_{y-v})$ is greater than or equal to that of gcd(A,B).

As the g.c.d. is the only polynomial (to within a multiple from R[y]) of its degree (in x) which divides A and B, we can test the correctness of our calculations of the g.c.d.: if the result has the degree of $\gcd(A_{y-v},B_{y-v})$ (where v satisfies the hypothesis of this corollary) and if it divides A and B, then it is the g.c.d. (to within a multiple from R[y]).
As in section 4.2.2, it is quite possible that we could find a $\gcd(A_{y-v},B_{y-v})$ of too high a degree: consider $A=x-1$, $B=x-y$ and the evaluation $y\mapsto1$. The following lemma shows that this possibility can only arise for a finite number of v.
Lemma 10 Let C = gcd(A,B). If v satisfies the condition of the corollary above, and if $y-v$ does not divide $\mathrm{Res}_x(A/C,B/C)$, then $\gcd(A_{y-v},B_{y-v})=C_{y-v}$.

Proof. A/C and B/C are relatively prime, for otherwise C would not be the g.c.d. of A and B. By the corollary, $C_{y-v}$ does not vanish. Therefore
$$\gcd(A_{y-v},B_{y-v})=C_{y-v}\gcd(A_{y-v}/C_{y-v},B_{y-v}/C_{y-v}).$$
For the lemma to be false, the last g.c.d. has to be non-trivial. This implies that the resultant $\mathrm{Res}_x(A_{y-v}/C_{y-v},B_{y-v}/C_{y-v})$ vanishes, by proposition 76 of the Appendix. This resultant is the determinant of a Sylvester matrix¹⁰, and $|M_{y-v}|=(|M|)_{y-v}$, for the determinant is only a sum of products of the coefficients. In the present case, this amounts to saying that $\mathrm{Res}_x(A/C,B/C)_{y-v}$ vanishes, that is that $y-v$ divides $\mathrm{Res}_x(A/C,B/C)$. But the hypotheses of the lemma exclude this possibility.
Definition 85 If $\gcd(A_{y-v},B_{y-v})=\gcd(A,B)_{y-v}$, we say that the evaluation of this problem at v is good, or that $y-v$ is of good reduction. If not, we say that $y-v$ is of bad reduction.

This lemma implies, in particular, that there are only a finite number of values v such that $\gcd(A_{y-v},B_{y-v})$ does not have the same degree as that of gcd(A,B), that is the $y-v$ which divide the g.c.d. of the leading coefficients and the $y-v$ which divide the resultant of the lemma (the resultant is non-zero, and therefore has only a finite number of divisors). In particular, if A and B are relatively prime, we can always find a v such that $A_{y-v}$ and $B_{y-v}$ are relatively prime.
We can summarize this section as follows.

Theorem 38 (Good Reduction Theorem (polynomial)) If $y-v$ does not divide $\gcd(a_\alpha,b_\beta)$ (which can be checked for in advance) or $\mathrm{Res}_x(A/C,B/C)$, then v is of good reduction. Furthermore, if $y-v$ divides $\mathrm{Res}_x(A/C,B/C)$ but not $\gcd(a_\alpha,b_\beta)$, then the gcd computed modulo $y-v$ has a larger degree than the true result.
4.3.3 G.c.d. in $\mathbb{Z}_p[x,y]$
By Gauss’ Lemma (Theorem 6),
$$\gcd(A,B)=\gcd(\mathrm{cont}_x(A),\mathrm{cont}_x(B))\,\gcd(\mathrm{pp}_x(A),\mathrm{pp}_x(B)),$$
and the real problem is the second factor.

¹⁰There’s a subtle point here. The resultant of polynomials of degrees m and n is the determinant of an $(m+n)\times(m+n)$ matrix. Hence if $y-v$ divides neither leading coefficient, the Sylvester matrix of $A_{y-v}$ and $B_{y-v}$ is indeed the reduction of the Sylvester matrix of A and B. If $y-v$ divides one leading coefficient, but not the other, the Sylvester matrix of $A_{y-v}$ and $B_{y-v}$ is smaller, and the reader should check that this only makes a difference of a product of that leading coefficient, which doesn’t vanish when v is substituted for y.
Figure 4.7: Diagrammatic illustration of Algorithm 21

    Z_p[y][x]  - - - - - - - - - - - - - - - - - - ->  Z_p[y][x]
    k×reduce |                                         ^  interpret & check
             v                                         |
    Z_p[y]_{y-v_1}[x]  ----gcd---->  Z_p[y]_{y-v_1}[x]  \
      ...                              ...               }  --C.R.T.-->  Z_p[y][x] mod Π'(y-v_1)···(y-v_k)
    Z_p[y]_{y-v_k}[x]  ----gcd---->  Z_p[y]_{y-v_k}[x]  /

Π' indicates that some of the $v_i$ may have been rejected by the compatibility checks, so the product is over a subset of $(y-v_1)\cdots(y-v_k)$.
Observation 13 While the content of $A=\sum_{i=0}^m a_ix^i$ can be computed as
$$\gcd(a_m,\gcd(a_{m-1},\gcd(a_{m-2},\ldots,a_0)\ldots)),$$
the following process is more efficient in practice (asymptotically, it’s only worth a constant factor on what is asymptotically the cheaper operation, but in practice it is worthwhile), and is valid over any g.c.d. domain R with a concept of ‘size’ equivalent to degree.

Algorithm 20 (Content)
Input: $A=\sum_{i=0}^m a_ix^i\in R[x]$.
Output: $\mathrm{cont}_x(A)$

    S := {a_i}                          # Really a set, as duplicates don't matter
    g := minimal degree element of S; S := S \ {g}
    if g is a unit
       then return 1
    g := gcd(g, Σ_{h_i∈S} λ_i h_i)      # λ_i random
    if g is a unit
       then return 1
    for h ∈ S
        if g does not divide h
           then g := gcd(g, h)
                if g is a unit
                   then return 1
    return g
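A minimal sketch of Algorithm 20 in Python, over R = Z with absolute value standing in for ‘degree’ (ours: the random $\lambda_i$ are small positive integers, and the input coefficients are assumed nonzero):

    from math import gcd
    from random import randint

    def content(coeffs):
        S = sorted(set(coeffs), key=abs)        # a set: duplicates don't matter
        g, S = S[0], S[1:]
        if abs(g) == 1:
            return 1
        g = gcd(g, sum(randint(1, 100) * h for h in S))   # lambda_i random
        if g == 1:
            return 1
        for h in S:
            if h % g:                            # g does not divide h
                g = gcd(g, h)
                if g == 1:
                    return 1
        return g

    print(content([12, 18, 30]))                 # 6
    print(content([4, 6, 9]))                    # 1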
We present an Algorithm (21), analogous to Algorithm 18, for the g.c.d. of
primitive polynomials. It would equally be possible to start from Algorithm 19.
There is also an ‘early termination’ version, analogous to the code in Figure 4.5.
We should note the caveat on find_value in Figure 4.8: although we know
there are only a finite number of bad values v, since Zp is finite it is possible
for too many, indeed all, values in Zp to be bad, and we may need to move to
an algebraic extension of Zp to have enough good values.
Figure 4.8: Algorithm 21

Algorithm 21 (Bivariate Modular GCD)
Input: A, B primitive polynomials in $\mathbb{Z}_p[y][x]$.
Output: gcd(A,B)

    g := gcd(lc_x(A), lc_x(B));
    v := find_value(g);
    D := g·modular_gcd(A_{y-v}, B_{y-v}, p);
    if deg(D) = 0 then return 1
    N := y − v;                 # N is the modulus we will be constructing
    while deg_y(N) ≤ min(deg_y(A), deg_y(B)) repeat
        v := find_value(g);
        C := g·modular_gcd(A_{y-v}, B_{y-v}, p);
        if deg(C) = deg(D)
           then D := Algorithm 50(C, D, y−v, N);
                N := (y−v)N;
        else if deg(C) < deg(D)
           # C proves that D is based on values of bad reduction
           then if deg(C) = 0 then return 1
                D := C;
                N := y − v;
        else   # D proves that v is of bad reduction, so we ignore it
    D := pp(D);                 # In case multiplying by g was overkill
    Check that D divides A and B, and return it
    If not, all values must have been bad, and we start again

find_value(g) finds a new value v each time, such that $g_{y-v}$ does not vanish. It is conceivable that we will exhaust $\mathbb{Z}_p$, in which case we have to move to choosing v from an algebraic extension of $\mathbb{Z}_p$. Indeed it is possible that there are no values v in $\mathbb{Z}_p$ such that $g_{y-v}$ does not vanish, e.g. $g=y(y-1)(y-2)$ when p = 3.

Figure 4.9: Diagrammatic illustration of g.c.d.s in Z[x, y] (1)

    Z[y][x]  - - - - - - - - - - - - - - - - - - ->  Z[y][x]
    k×reduce |                                       ^  interpret & check
             v                                       |
    Z[y]_{y-v_1}[x]  --gcd (Algorithm 18/19)-->  Z[y]_{y-v_1}[x]  \
      ...                                           ...            }  --C.R.T.-->  Z[y][x] mod Π'(y-v_1)···(y-v_k)
    Z[y]_{y-v_k}[x]  --gcd (Algorithm 18/19)-->  Z[y]_{y-v_k}[x]  /

Π' indicates that some of the $v_i$ may have been rejected by the compatibility checks, so the product is over a subset of $(y-v_1)\cdots(y-v_k)$.
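A minimal sketch of the evaluation–interpolation loop of Algorithm 21 in Python (ours: the inputs are chosen monic in x, so the leading-coefficient imposition and the bad-value/early-termination logic of the full algorithm are omitted; Lagrange interpolation plays the role of Algorithm 50):

    p = 101                                # all arithmetic is in Z_p

    def yeval(A, v):                       # A in Z_p[y][x] -> A_{y-v} in Z_p[x]
        return [sum(c * v**i for i, c in enumerate(cy)) % p for cy in A]

    def rem(f, g):                         # remainder in Z_p[x], lists low-first
        f = f[:]
        inv = pow(g[-1], -1, p)
        while len(f) >= len(g):
            c, s = f[-1] * inv % p, len(f) - len(g)
            for i, gi in enumerate(g):
                f[i + s] = (f[i + s] - c * gi) % p
            while f and f[-1] == 0:
                f.pop()
        return f

    def gcd_x(f, g):                       # monic gcd in Z_p[x], Euclid
        while g:
            f, g = g, rem(f, g)
        inv = pow(f[-1], -1, p)
        return [c * inv % p for c in f]

    def interp(points):                    # Lagrange interpolation in Z_p[y]
        out = [0] * len(points)
        for j, (vj, dj) in enumerate(points):
            num, den = [1], 1
            for i, (vi, _) in enumerate(points):
                if i != j:                 # multiply num by (y - vi)
                    num = [(b - vi * a) % p
                           for a, b in zip(num + [0], [0] + num)]
                    den = den * (vj - vi) % p
            w = dj * pow(den, -1, p) % p
            out = [(o + w * a) % p for o, a in zip(out, num)]
        return out

    # A = (x+y)(x+1), B = (x+y)(x+2): lists over x-degree of Z_p[y] coefficients
    A = [[0, 1], [1, 1], [1]]              # x^2 + (y+1)x + y
    B = [[0, 2], [2, 1], [1]]              # x^2 + (y+2)x + 2y
    dy = 1                                 # deg_y of the g.c.d. is at most this
    images = [(v, gcd_x(yeval(A, v), yeval(B, v))) for v in range(dy + 1)]
    G = [interp([(v, c[i]) for v, c in images]) for i in range(len(images[0][1]))]
    print(G)                               # [[0, 1], [1, 0]], i.e. x + y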
4.3.4 G.c.d. in Z[x, y]
We know how to compute g.c.d.s in $\mathbb{Z}_p[x]$, and wish to compute g.c.d.s in Z[x,y]. We have seen the building blocks in the previous sections, diagrammatically in figures 4.3 and 4.7. We have a choice of ways of combining these constructions, though, depending on what we choose as the intermediate domain.

Z[x] Here we use an analogue of Figure 4.7 to reduce us to the univariate case. The corresponding diagram then looks like Figure 4.9. There is one complication we should note, though. We are now applying the Chinese Remainder Theorem to polynomials in Z[y], rather than $\mathbb{Z}_p[y]$, and Z is not a field. Indeed, the Theorem does not always apply over Z, since, for example, the polynomial satisfying f(0) = 0 and f(2) = 1 is $f=\frac12y$, which is in Q[y] but not Z[y].
    Should this happen, we actually know that all reductions so far are wrong, because they are compatible in degree, but cannot be right. Hence we start again.
Zp[y][x] Here we use an analogue of Figure 4.2 to reduce us to the case of Figure
4.7. The overall structure is shown in Figure 4.10.
Open Problem 19 [Which is the Better Route for Bivariate g.c.d.?] As far as the author can tell, the question of which route to follow has not been systematically explored. The initial literature [Bro71b, Algorithm M] and the more recent survey [Lau82] assume the route in Figure 4.10. [Bro71b] explicitly assumes that p is large enough that we never run out of values, i.e. that the algebraic extension at the end of Figure 4.8 is never needed.
Implementations the author has seen tend to follow the route in Figure 4.9. This is probably for pragmatic reasons: as one is writing a system one first wishes for univariate g.c.d.s, so implements Algorithm 18 (or 19). Once one has this, Figure 4.9 is less work.
There is a natural tendency to believe that Figure 4.10 is better, as all numbers involved are bounded by p until the very last Chinese Remainder calculations.
Figure 4.10: Diagrammatic illustration of g.c.d.s in Z[x, y] (2)

    Z[y][x]  - - - - - - - - - - - - - - - - - - ->  Z[y][x]
    k×reduce |                                       ^  interpret & check
             v                                       |
    Z_{p_1}[y][x]  --gcd (Algorithm 21)-->  Z_{p_1}[y][x]  \
      ...                                      ...          }  --C.R.T.-->  Z[y][x] mod Π' p_1···p_k
    Z_{p_k}[y][x]  --gcd (Algorithm 21)-->  Z_{p_k}[y][x]  /

Π' indicates that some of the $p_i$ may have been rejected by the compatibility checks, so the product is over a subset of $p_1\cdots p_k$.
On the other hand, if the g.c.d. is actually 1, but the first prime chosen is unlucky, one will reconstruct all the way back to a bivariate before realising the problem.
Let us look at the questions on page 162.

1. Are there “good” reductions from R? Yes, since there are only finitely many bad reductions, but it is possible that we will need to go to an algebraic extension if we are working over a finite field: see the caveat on find_value in Figure 4.8.

2. How can we tell if an evaluation v is good? We can’t, but given two different evaluations v and w which give different results, we can tell which is definitely bad: Corollary 12.

3. How many reductions should we take? If d bounds the degree in y of the g.c.d., then we only need d+1 good reductions. We don’t have the same problem as in the univariate case, where our bounds were almost certainly pessimistic, but it is nevertheless common to use early termination as in Figure 4.5.

4. How do we combine the results — Algorithm 50.

5. How do we check the result? If it divides both the inputs, then it is a common divisor, and hence (Corollary 12) has to be the greatest.
As on page 178, these algorithms use modular/evaluation computations as Monte Carlo (“always fast/probably correct”) algorithms (Section 1.4.2), converted into a Las Vegas (“always correct/probably fast”) one in the style of Figure 1.1 because we have correctness checks. In fact, because we can bound the number of bad evaluations and bad primes (as opposed to just the probability of an evaluation/prime being bad) we are actually guaranteed polynomial running time. It is even truer here that the upper limits on the running times so obtained are almost always gross over-estimates.
4.4 Polynomials in several variables
There is no conceptual difficulty in generalising the work of the previous section to n variables. In principle, there are n possible diagrams corresponding to Figures 4.9–4.10, but in practice the choice is between ‘modular last’ (Figure 4.9) and ‘modular first’ (Figure 4.10), and the author has never seen a hybrid algorithm which reduces, say,
$$\mathbb{Z}[y][z][x]\to\mathbb{Z}[y][z]_{z-v}[x]\to\mathbb{Z}_p[y][z]_{z-v}[x]\to\mathbb{Z}_p[y]_{y-v'}[z]_{z-v}[x].$$
Theorem 39 ([Bro71b, (95)]) Suppose $f,g\in\mathbb{Z}[x_1,\ldots,x_n]$. Let l bound the lengths of the integer coefficients of f and g, and d bound their degree in each of the $x_i$. Then under the following assumptions:

1. classical $O(N^2)$ multiplication and division of N-word numbers or N-term polynomials;

2. no bad values are found, either for primes or for evaluation points;

3. that the answer, and corresponding quotients, have coefficient lengths at most l;

4. that we can use single-word primes;

the running time of the ‘modular first’ (Figure 4.10) algorithm is
$$O\bigl(l^2(d+1)^n+(d+1)^{n+1}\bigr).$$
This contrasts with the subresultant algorithm’s bound [op. cit. (80)] (assuming the p.r.s. are normal) of
$$O\Bigl(l^2(d+1)^{4n}2^{2n^2}3^n\Bigr),$$
so the dependence on $(d+1)^n$ has essentially gone from quartic to linear. $(d+1)^n$ is the maximal number of terms in the input, so this “modular first” algorithm is “almost optimal” in the sense of Definition 20 for dense inputs.
The real challenge comes with the potential sparsity of the polynomials.
The factor (d + 1)^n in the running time comes simply from the fact that a
dense polynomial of degree d in n variables has that many terms, and we will
need (d + 1)^{n−1} univariate g.c.d.s (therefore (d + 1)^n coefficients) to deduce
these potential terms. Furthermore it is possible for sparse polynomials to have
dense g.c.d.s: the following elegant univariate example is due to [Sch03a] (see
also (2.21)):
gcd( x^{pq} − 1 , x^{p+q} − x^p − x^q + 1 ) = (x^p − 1)(x^q − 1)/(x − 1)
= x^{p+q−1} − x^{p+q−2} ± · · · − 1,
(4.11)
where we write f(x) and g(x) for the two arguments of the gcd and h(x) for the
right-hand side.
Example 20 (Dense g.c.d.s) Therefore
gcd(f(x_1)f(x_2) · · · f(x_k), g(x_1)g(x_2) · · · g(x_k)) = h(x_1)h(x_2) · · · h(x_k),   (4.12)
and the right-hand side has (2 min(p, q))^k terms, whereas the arguments to gcd
have 2^k and 4^k terms.
In fact, we can be slightly more subtle and take
gcd(f(x_1)g(x_2)f(x_3)g(x_4) · · · g(x_k), g(x_1)f(x_2)g(x_3)f(x_4) · · · f(x_k)),   (4.13)
with the same gcd but where the arguments have 8^{k/2} ≈ 2.8^k terms.
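A quick sympy check of (4.11)–(4.12) for the smallest interesting parameters (p = 2, q = 3, k = 2; an illustration we have added, not part of [Sch03a]):

    from sympy import symbols, gcd, expand

    x1, x2 = symbols('x1 x2')
    p, q = 2, 3   # any coprime pair

    f = lambda v: v**(p*q) - 1                    # 2 terms
    g = lambda v: v**(p+q) - v**p - v**q + 1      # 4 terms: (v**p - 1)*(v**q - 1)

    h = gcd(f(x1), g(x1))
    print(h)             # x1**4 + x1**3 - x1 - 1, i.e. 2*min(p, q) = 4 terms

    # The multivariate version (4.12): the g.c.d. of the sparse products
    # is the dense product of the g.c.d.s
    lhs = gcd(expand(f(x1)*f(x2)), expand(g(x1)*g(x2)))
    print(expand(lhs - h*h.subs(x1, x2)))         # 0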
Problem 6 (Sparse g.c.d.) Produce an algorithm for sparse multivariate g.c.d.
whose running time is polynomial, ideally linear, in the number of terms in the
inputs and output. If we can’t find a deterministic algorithm, maybe we can find
a Las Vegas (section 1.4.2) one.
The ‘modular first’ (Figure 4.10) algorithm does not solve this problem, as it
will attempt to interpolate ∏(d_i + 1) terms if the g.c.d. has degree d_i in x_i.
The problem was first addressed, in Las Vegas style, in [Zip79a, Zip79b].
4.4.1 A worked example
Let f be
x^5y^7z^4 + x^4y^8z^4 + x^4y^7z^4 + x^3y^7z^4 + x^2y^8z^4 + x^5y^3z^5 + x^4y^4z^5 +
x^3y^4z^6 + x^2y^7z^4 + x^2y^5z^6 − x^5y^7 + x^5yz^6 − x^4y^8 + x^4y^3z^5 +
x^4y^2z^6 − x^3y^4z^5 − x^2y^5z^5 + x^2y^4z^6 − x^4y^7 + x^4yz^6 − x^2y^4z^5 − x^3y^7 −
x^2y^8 − x^2y^7 − x^5y^3 − x^4y^4 − x^4y^3 − x^5y − x^4y^2 − x^4y + x^3z +
x^2yz − x^3 − x^2y + x^2z − x^2 + xz + yz − x − y + z − 1
(4.14)
and g be
x^6y^7z^4 − x^4y^7z^4 + x^6y^3z^5 + x^4y^7z^3 + x^4y^4z^6 − x^6y^7 + x^6yz^6 + x^4y^7z^2
− x^4y^4z^5 − 2x^2y^7z^4 + x^4y^7z − 2x^4y^3z^5 + x^2y^7z^3 − 2x^2y^4z^6 + 2x^4y^7 +
x^4y^3z^4 − 2x^4yz^6 + x^2y^7z^2 + 3x^2y^4z^5 + x^4y^3z^3 + x^4yz^5 + x^2y^7z −
x^6y^3 + x^4y^3z^2 + x^4yz^4 + 3x^2y^7 + x^4y^3z + x^4yz^3 − x^6y + 3x^4y^3 +
x^4yz^2 + x^4yz + 3x^4y + x^4z − x^4 − x^2z + 2x^2 − 2z + 3
(4.15)
polynomials with 42 and 39 terms respectively (as against the 378 and 392 they
would have if they were dense of the same degree).
Let us regard x as the variable to be preserved throughout. If we compute
gcd(f|_{z=2}, g|_{z=2}) (which would require at least¹¹ 8 y values) we get (assuming
that z = 2 is a good reduction)
gcd(f, g)|_{z=2} = (15y^7 + 31y^3 + 63y)x^4 + (15y^7 + 32y^4 + 1)x^2 + 1.   (4.16)
¹¹In fact, with y = 0 both f and g drop in x-degree, whereas with y = −2 we get a g.c.d.
with x-degree 5, as y = −2 is a root of the resultant in Lemma 10.
We note that each coefficient (with respect to x) has at most three terms. A
dense algorithm would compute five more equivalents of (4.16), at different
z values, and then interpolate z polynomials. Each such computation would
require 8 y values. Instead, if we believe¹² that (4.16) describes accurately the
dependence of the g.c.d. on y, we should be able to deduce these equivalents
with only three y values. Consider
gcd(f|_{z=3,y=1}, g|_{z=3,y=1}) = 525x^4 + 284x^2 + 1
gcd(f|_{z=3,y=−1}, g|_{z=3,y=−1}) = −525x^4 + 204x^2 + 1
gcd(f|_{z=3,y=2}, g|_{z=3,y=2}) = 6816x^4 + 9009x^2 + 1
(4.17)
We should interpolate the coefficients of x^4 to fit the template ay^7 + by^3 + cy,
and those of x^2 to fit the template a′y^7 + b′y^4 + c′, while those of x^0 should fit
the template a″y^0, i.e. be constant. Considering the coefficients of x^4, we have
to solve the equations
a + b + c = 525
−a − b − c = −525
2^7 a + 2^3 b + 2c = 6816
(4.18)
Unfortunately, these equations are under-determined (the second is just minus
the first), so we have to add another equation to (4.17), e.g.
gcd(f|_{z=3,y=3}, g|_{z=3,y=3}) = 91839x^4 + 107164x^2 + 1,
which adds 3^7 a + 3^3 b + 3c = 91839 to (4.18), and the augmented system is now
soluble, with a = 40, b = 121, c = 364.
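One can verify this interpolation step directly; the following sympy fragment (our illustration) solves the system formed by the y = 1, 2, 3 evaluations, the y = −1 row being dependent as noted above:

    from sympy import Matrix

    # Rows are the template monomials y**7, y**3, y evaluated at y = 1, 2, 3
    A = Matrix([[y**7, y**3, y] for y in (1, 2, 3)])
    r = Matrix([525, 6816, 91839])
    print(A.solve(r).T)        # Matrix([[40, 121, 364]])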
This process gives us an assumed (two assumptions are now in play here:
that z = 3 is a good reduction, and that (4.16) describes the sparsity structure
of gcd(f, g) accurately) value of
gcd(f, g)|_{z=3} = (40y^7 + 121y^3 + 364y)x^4 + (40y^7 + 243y^4 + 1)x^2 + 1.   (4.19)
Similarly we can deduce that
gcd(f, g)|_{z=4} = (85y^7 + 341y^3 + 1365y)x^4 + (85y^7 + 1024y^4 + 1)x^2 + 1
gcd(f, g)|_{z=5} = (156y^7 + 781y^3 + 3906y)x^4 + (156y^7 + 3125y^4 + 1)x^2 + 1
gcd(f, g)|_{z=6} = (259y^7 + 1555y^3 + 9331y)x^4 + (259y^7 + 7776y^4 + 1)x^2 + 1
gcd(f, g)|_{z=7} = (400y^7 + 2801y^3 + 19608y)x^4 + (400y^7 + 16807y^4 + 1)x^2 + 1
gcd(f, g)|_{z=8} = (585y^7 + 4681y^3 + 37449y)x^4 + (585y^7 + 32768y^4 + 1)x^2 + 1
(4.20)
(each requiring, at least, three y evaluations, but typically no more).
If we now examine the coefficients of x^4y^7, and assume that in gcd(f, g) there
is a corresponding term x^4y^7 p(z), we see that p(2) = 15 (from (4.16)), p(3) = 40
(from (4.19)), and similarly p(4), . . . , p(8) from (4.20). Using Algorithm 49, we
¹²We refer to this as the Zippel assumption, after its introduction in [Zip79a, Zip79b], and
formalize it in Definition 87.
deduce that p(z) = z^3 + z^2 + z + 1. We can do the same for all the other
(presumed) x^iy^j terms in the g.c.d., to get the following final result:
x^4( y^7(z^3 + z^2 + z + 1) + y^3(z^4 + z^3 + z^2 + z + 1) + y(z^5 + z^4 + z^3 + z^2 + z + 1) )
+ x^2( y^7(z^3 + z^2 + z + 1) + y^4z^5 + 1 ) + 1.
(4.21)
We still need to check that it does divide f and g, but, once we have done so,
we are assured by Corollary 12 that it is the greatest common divisor.
4.4.2 Converting this to an algorithm
There are, however, several obstacles to converting this to an algorithm: most
centring around the linear systems such as (4.18). First, however, we must
formalize the assumption made on page 188.
Definition 86 We define the skeleton of a polynomial f ∈ R[x_1, . . . , x_n], de-
noted Sk(f), to be the set of monomials in a (notional) distributed representation
of f.
Definition 87 We say that an evaluation σ : R[x_1, . . . , x_n] → R[x_i : i ∉ S],
which associates a value in R to every x_i : i ∈ S, satisfies the Zippel assumption
for f if it preserves the skeleton in the variables not in S, i.e. if Sk(σ(f)) is what
we get from Sk(f) by deleting the powers of x_i : i ∈ S from the monomials.
Another way of saying this is to write f (notionally) in R[x_i : i ∈ S][x_i : i ∉ S],
where the outer [. . .] is represented distributedly, and say that σ satisfies the
Zippel assumption for f if it maps no non-zero coefficient of f to zero.
For example, if f = (y^2 − 1)x^2 − x, Sk(f) = {x^2y^2, x^2, x}. If σ : y ↦ 1,
then σ(f) = −x, so Sk(σ(f)) = {x}, but deleting the powers of y from Sk(f)
gives {x^2, x}, so this σ does not satisfy the Zippel assumption (it has mapped
y^2 − 1 to 0). However, if σ : y ↦ 2, σ(f) = 3x^2 − x and Sk(σ(f)) = {x^2, x}, so
this σ satisfies the Zippel assumption.
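The definitions translate directly into code; here is a small Python sketch (function names ours) that computes skeletons with sympy and tests the Zippel assumption for this very example:

    from sympy import symbols, Poly

    x, y = symbols('x y')

    def skeleton(f, gens):
        """Definition 86: the set of monomials (as exponent vectors)."""
        return set(Poly(f, *gens).monoms())

    def drop_y(sk):
        """Delete the powers of the evaluated variable y from the monomials."""
        return {(i,) for (i, j) in sk}

    def satisfies_zippel(f, v):
        """Definition 87, for the single evaluation y -> v."""
        return skeleton(f.subs(y, v), [x]) == drop_y(skeleton(f, [x, y]))

    f = (y**2 - 1)*x**2 - x
    print(satisfies_zippel(f, 1))   # False: y**2 - 1 is mapped to 0
    print(satisfies_zippel(f, 2))   # True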
Hence the assumption on page 188 was that z ↦ 2 satisfies the Zippel
assumption. Let us now look at the problems surrounding (4.18). In fact, there
are three of them.
1. The equations might be under-determined, as we saw there.
2. We need to solve a t × t system, where t is the number of terms being
interpolated. This will normally take O(t³) coefficient operations, and, as
we know from section 3.2.3, the coefficients may also grow during these
operations.
3. The third problem really goes back to (4.17). We stated that
gcd(f|_{z=3,y=−1}, g|_{z=3,y=−1}) = −525x^4 + 204x^2 + 1,
Figure 4.11: Diagrammatic illustration of sparse g.c.d.

R[y][x]  - - - - - - - - - - - - - - - - - - - - - - ->  R[y][x]
   | k×reduce                                 interpret ^ & check
   v                                                    |
R[y]_{y_n−v_1}[x]  --gcd-->   R[y]_{y_n−v_1}[x]  ⎫
        ↘ Sk                                     ⎪
R[y]_{y_n−v_2}[x]  --gcd′-->  R[y]_{y_n−v_2}[x]  ⎬  --C.R.T.-->  R[y][x] (mod Π′ (y_n − v_1) · · · (y_n − v_k))
      ...                          ...           ⎪
R[y]_{y_n−v_k}[x]  --gcd′-->  R[y]_{y_n−v_k}[x]  ⎭

Π′ indicates that some of the v_i may have been rejected by the compatibility
checks, so the product is over a subset of (y_n − v_1) · · · (y_n − v_k).
gcd is a recursive call to this algorithm, while gcd′ indicates a g.c.d. computation
to a prescribed skeleton, indicated by Sk: Algorithm 24.
but we could equally well have said that it was 525x^4 − 204x^2 − 1, at
which point we would have deduced that the equivalent of (4.18) was
inconsistent. In fact, Definition 31 merely defines a greatest common
divisor, for the reasons outlined there. We therefore, as in Algorithm 18,
impose gcd(lc(f), lc(g)) as the leading coefficient of the greatest common
divisor.
To solve the first two problems, we will use the theory of section A.5, and
in particular Corollary 28, which guarantees that a Vandermonde matrix is
invertible. In fact, we will need to solve a modified Vandermonde system of
form (A.12), and rely on Corollary 29. If, instead of random values for y, we
had used powers of a given value, we would get such a system in place of (4.18).
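As an illustration (ours, under the assumption that the x^4 coefficient of the g.c.d. really has the shape seen in (4.16)), using y = 2, 2^2, 2^3 turns the interpolation of ay^7 + by^3 + cy into a transposed Vandermonde system in the monomial values 2^7, 2^3, 2:

    from sympy import Matrix

    w = 2
    exps = [7, 3, 1]                   # template a*y**7 + b*y**3 + c*y
    m = [w**e for e in exps]           # distinct "monomial values" 128, 8, 2
    A = Matrix([[mi**j for mi in m] for j in (1, 2, 3)])
    coeff_x4 = lambda y: 15*y**7 + 31*y**3 + 63*y   # x^4 coefficient of (4.16)
    r = Matrix([coeff_x4(w**j) for j in (1, 2, 3)])
    print(A.solve(r).T)                # Matrix([[15, 31, 63]])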
We now give the interlocking set of algorithms, in Figures 4.12–4.14. We
present them in the context of the diagram in Figure 4.11.
4.4.3 Worked example continued
We now consider a larger example in four variables. Let f and g be the polyno-
mials in Figures 4.15 and 4.16, with 104 and 83 terms respectively (as against
the 3024 and 3136 they would have if they were dense of the same degree).
Let us regard x as the variable to be preserved throughout. f |w=1 and g|w=1
are, in fact, the polynomials of (4.14) and (4.15), whose g.c.d. we have already
computed to be (4.21).
We note that the coefficients (with respect to x) have fifteen, six and one
term respectively, and f and g both have degree seven in w. We need to compute
seven more equivalents of (4.21), at different w values, and then interpolate w
polynomials. Let us consider computing gcd(f |w=2, g|w=2) under the assump-
tions that 1 and 2 are good values of w, and that (4.21) correctly describes the
Figure 4.12: Algorithm 22: Sparse g.c.d.

Algorithm 22 (Sparse g.c.d.)
Input: A, B polynomials in R[y][x].
Output: gcd(A, B)

Ac := cont_x(A); A := A/Ac;  # Using Algorithm 20, which
Bc := cont_x(B); B := B/Bc;  # calls this algorithm recursively
g := Algorithm 22(lc_x(A), lc_x(B));  # one fewer variable
G := failed
while G = failed  # Expect only one iteration
    G := Algorithm 23(g, gA, gB);
return pp_x(G) × Algorithm 22(Ac, Bc);  # one fewer variable
Figure 4.13: Algorithm 23: Inner sparse g.c.d.

Algorithm 23 (Inner sparse g.c.d.)
Input: g leading coefficient; A, B polynomials in R[y][x].
Output: gcd(A, B), but with leading coefficient g,
    or failed if we don’t have the Zippel assumption

d_n := min(deg_{y_n}(A), deg_{y_n}(B))  # (over)estimate for deg_{y_n}(gcd(A, B))
v_0 := random(lc(A), lc(B));  # assumed good and satisfying Zippel for gcd(A, B)
P_0 := Algorithm 23(g|_{y_n=v_0}, A|_{y_n=v_0}, B|_{y_n=v_0})
if P_0 = failed
    return failed  # Don’t try again
d_x := deg_x(P_0)
i := 0
while i < d_n
    v := random(lc(A), lc(B));  # assumed good
    P_{i+1} := Algorithm 24(Sk(P_0), g|_{y_n=v}, A|_{y_n=v}, B|_{y_n=v})
    if deg_x(P_{i+1}) > d_x  # either some v^j in Algorithm 24, or y_n = v,
        continue round the loop  # was not a good evaluation
    if deg_x(P_{i+1}) < d_x
        return failed  # y_n = v_0 was not a good evaluation
    i := i + 1; v_i := v  # store v and the corresponding P
C := Algorithm 50({P_i}, {v_i})  # Reconstruct
if C divides both A and B
    return C
else return failed

random(lc(A), lc(B)) chooses a value not annihilating both leading coefficients.
Figure 4.14: Algorithm 24: Sparse g.c.d. from skeleton

Algorithm 24 (Sparse g.c.d. from skeleton)
Input: Sk skeleton; g leading coefficient; A, B polynomials in R[y][x].
Output: gcd(A, B) assumed to fit Sk, but with leading coefficient g,
    or the univariate gcd if this doesn’t fit Sk
Note that y is (y_1, . . . , y_{n−1}), since y_n has been evaluated.

d_x := deg_x(Sk)
t := max_{0≤i≤d_x} term_count(Sk, x^i)
v = (v_1, . . . , v_{n−1}) := values in R
for j := 1 . . . t  # v^j means the componentwise powers (v_1^j, . . . , v_{n−1}^j)
    C := gcd(A|_{y=v^j}, B|_{y=v^j})
    if deg_x(C) ≠ d_x
        return C  # We return the univariate
    P_j := (g|_{y=v^j} / lc(C)) C  # impose leading coefficient
G := 0
for i := 0 . . . d_x
    S := [m : x^i m ∈ Sk]  # the y-monomials multiplying x^i
    R := [m|_{y=v} : m ∈ S]
    V := [coeff(x^i, P_j) : j = 1 . . . length(S)]
    W := Algorithm 52(R, V)
    for j = 1 . . . length(S)
        G := G + W_j S_j x^i
return G
Figure 4.15: f from section 4.4.3

4w^6x^4y^7z + 2w^5x^5y^7z + 2w^5x^4y^8z − 4w^6x^4y^7 − 2w^5x^5y^7 − 2w^5x^4y^8 −
2w^5x^4y^7z + 2w^5x^4y^7 + 2wx^4y^7z^4 + x^5y^7z^4 + x^4y^8z^4 + 6w^7x^4y^3z +
3w^6x^5y^3z + 3w^6x^4y^4z + 4w^6x^4yz^4 + 2w^5x^5yz^4 + 2w^5x^4y^2z^4 − x^4y^7z^4 −
6w^7x^4y^3 − 3w^6x^5y^3 − 3w^6x^4y^4 − 3w^6x^4y^3z − 4w^6x^4yz^3 − 2w^5x^5yz^3 −
2w^5x^4y^2z^3 − 2w^5x^4yz^4 + 2wx^2y^7z^4 + x^3y^7z^4 + x^2y^8z^4 + 3w^6x^4y^3 +
2w^5x^4yz^3 − 4wx^4y^7z + 2wx^4y^3z^5 + 2wx^2y^4z^6 − 2x^5y^7z + x^5y^3z^5 −
2x^4y^8z + x^4y^4z^5 + x^3y^4z^6 − x^2y^7z^4 + x^2y^5z^6 + 2wx^4y^7 + 2wx^4yz^6 −
2wx^2y^4z^5 + x^5y^7 + x^5yz^6 + x^4y^8 + 2x^4y^7z − x^4y^3z^5 + x^4y^2z^6 − x^3y^4z^5 −
x^2y^5z^5 − x^2y^4z^6 − x^4y^7 − x^4yz^6 + x^2y^4z^5 − 4wx^4yz^4 − 2wx^2y^7 − 2x^5yz^4 −
2x^4y^2z^4 − x^3y^7 − x^2y^8 − 6wx^4y^3z + 4wx^4yz^3 − 3x^5y^3z + 2x^5yz^3 −
3x^4y^4z + 2x^4y^2z^3 + 2x^4yz^4 + x^2y^7 + 4w^7z + 2w^6xz + 2w^6yz + 4wx^4y^3
+ 2x^5y^3 + 2x^4y^4 + 3x^4y^3z − 2x^4yz^3 − 4w^7 − 2w^6x − 2w^6y − 2w^6z
− 2x^4y^3 + 2w^6 − 2wx^4y − x^5y − x^4y^2 + x^4y + 2wx^2z + x^3z + x^2yz −
2wx^2 − x^3 − x^2y − x^2z − 2wz + x^2 − xz − yz + 2w + x + y + z − 1
Figure 4.16: g from section 4.4.3

2w^5x^6y^7z + 2w^6x^4y^7z − 2w^5x^6y^7 + 2w^6x^4y^7 − 2w^5x^4y^7z + x^6y^7z^4 +
3w^6x^6y^3z + 2w^5x^6yz^4 + wx^4y^7z^4 + 3w^7x^4y^3z − 3w^6x^6y^3 + 2w^6x^4yz^4 −
2w^5x^6yz^3 + 2wx^4y^7z^3 + 3w^7x^4y^3 − 3w^6x^4y^3z + 2w^6x^4yz^3 − 2w^5x^4yz^4 +
2wx^4y^7z^2 + wx^2y^7z^4 − 2x^6y^7z + x^6y^3z^5 − x^4y^7z^3 + x^4y^4z^6 + wx^4y^3z^5 +
2wx^2y^7z^3 + wx^2y^4z^6 + x^6y^7 + x^6yz^6 − x^4y^7z^2 − x^4y^4z^5 − x^2y^7z^4 − wx^4y^7 +
2wx^4y^3z^4 + wx^4yz^6 + 2wx^2y^7z^2 + wx^2y^4z^5 + x^4y^7z − x^4y^3z^5 − x^2y^7z^3 −
x^2y^4z^6 + 2wx^4y^3z^3 + 2wx^4yz^5 + 2wx^2y^7z − 2x^6yz^4 − x^4y^7 − x^4y^3z^4 −
x^4yz^6 − x^2y^7z^2 + 2wx^4y^3z^2 + wx^2y^7 − 3x^6y^3z + 2x^6yz^3 − x^4y^3z^3 − x^4yz^5 −
x^2y^7z + 2w^6x^2z − wx^4y^3z + 2x^6y^3 − x^4y^3z^2 + x^4yz^4 + 2w^7z − 2w^6x^2 −
2wx^4y^3 + 2wx^4yz^2 + 2x^4y^3z − x^4yz^3 + 2w^7 − 2w^6z + 2wx^4yz − x^6y −
x^4yz^2 + wx^4y − x^4yz + x^4z + wx^2z − x^4 + wx^2 − 2x^2z − wz + x^2 − w + z
Table 4.1: g.c.d.s of univariate images of f and g

 y      z      gcd(f|_{w=2,y,z}, g|_{w=2,y,z})
 2      3      19612x^4 + 9009x^2 + 127
 4      9      15381680x^4 + 28551425x^2 + 127
 8      27     43407439216x^4 + 101638909953x^2 + 127
 16     81     144693004136768x^4 + 372950726410241x^2 + 127
 32     243    495206093484836032x^4 + 1383508483289120769x^2 + 127
 64     729    1706321451043380811520x^4 + 5160513838422975053825x^2 + 127
 ...    ...    ...
 2^15   3^15   19841169176522147209214289933300473966170786477572096x^4 +
               821125029267868660742578644189289158324777001819308033x^2 + 127
sparsity structure of gcd(f, g).
We take evaluations with y = 2^i and z = 3^i. For simplicity, let us consider
the coefficient of x^2, which, we know from (4.21), is assumed to have the shape
y^7( a_1 z^3 + a_2 z^2 + a_3 z + a_4 ) + a_5 y^4 z^5 + a_6.   (4.22)
Hence, taking the coefficients of x^2 from Table 4.1,
(2^7 3^3) a_1 + (2^7 3^2) a_2 + (2^7 3) a_3 + 2^7 a_4 + (2^4 3^5) a_5 + a_6 = 9009
(2^7 3^3)^2 a_1 + (2^7 3^2)^2 a_2 + (2^7 3)^2 a_3 + (2^7)^2 a_4 + (2^4 3^5)^2 a_5 + a_6 = 28551425
(2^7 3^3)^3 a_1 + (2^7 3^2)^3 a_2 + (2^7 3)^3 a_3 + (2^7)^3 a_4 + (2^4 3^5)^3 a_5 + a_6 = 10 . . .
(2^7 3^3)^4 a_1 + (2^7 3^2)^4 a_2 + (2^7 3)^4 a_3 + (2^7)^4 a_4 + (2^4 3^5)^4 a_5 + a_6 = 37 . . .
(2^7 3^3)^5 a_1 + (2^7 3^2)^5 a_2 + (2^7 3)^5 a_3 + (2^7)^5 a_4 + (2^4 3^5)^5 a_5 + a_6 = 13 . . .
(2^7 3^3)^6 a_1 + (2^7 3^2)^6 a_2 + (2^7 3)^6 a_3 + (2^7)^6 a_4 + (2^4 3^5)^6 a_5 + a_6 = 51 . . .
(4.23)
This is indeed a system of linear equations of the form (A.12).
4.4.4 Conclusions
Let us look at the questions on page 162.
1. Are there evaluations which are not merely good in the sense of the previous
section, but also satisfy the Zippel assumption? The answer is yes.
For the evaluation y_n = v to be good, v must not annihilate both leading
coefficients (checked for in random) and must not be a root of the relevant
resultant (Corollary 12). For the Zippel assumption to be violated, a certain
coefficient (in the distributed representation of Definition 87) must vanish.
There are only finitely many such coefficients, and each has only finitely many
roots, hence in all there are only finitely many bad values for each x_i.
2. How can we tell if an evaluation should be used? The tests in Algorithm
23 apply.
3. How many reductions should we take? The number of good reductions at
each stage x_i is 1 + min(deg_{x_i}(f), deg_{x_i}(g)), by the usual argument on the
degree of a g.c.d.
Open Problem 20 (Bad reductions in Zippel’s algorithm) An interesting
question is how many bad reductions there can be. If the degree
in each variable is at most d, then the number of values that annihilate
both leading coefficients is at most d, and the number that are roots of the
relevant resultant is at most 2d², the degree of the resultant. The challenge is
to bound the number that violate the Zippel assumption, which we showed
was finite in point 1.
It seems to the author, though he has not seen this in the literature, that,
as a polynomial with t terms can have at most 2t − 1 roots (Proposition 24),
the number of Zippel-bad values is bounded by twice the number of terms
in the g.c.d. being computed. Though we have no good a priori bound on
this (Example 20), it means that the number of Zippel-bad reductions is
linear in the size of the output.
Conjecture 1 Let d_f, d_g be the degrees of f and g in the main variable x, d
be max_i min(deg_{y_i}(f), deg_{y_i}(g)), d′ be max_i(deg_{y_i}(f) + deg_{y_i}(g)) and t the
number of terms in gcd(f, g). Then the running time of Zippel’s algorithm
is bounded by
( d + (d_f + d_g)d′ + 2t ) · min(d_f, d_g) d t,   (4.24)
where we multiply the number of evaluations (allowing for the maximal
number of failed ones: the term (d_f + d_g)d′ covers evaluations killed by the
resultant, and 2t the Zippel-bad ones) by the cost min(d_f, d_g) d t of a successful
calculation. There is also an O(t²) cost of the linear algebra, but this is
dominated by the cost of the evaluations.
4. How do we combine? This is the main subtlety of this method, using the
theory of section A.5.
5. How do we check the result? We have used the “If C divides both A and B”
test, as in Algorithm 17 et seq. We could consider using a “reconstruction
of cofactors” argument, as in Algorithm 19. To the best of the author’s
knowledge, this route has not been explored experimentally. However,
there is a strong argument against it: it would require the evaluations to
satisfy the Zippel assumption for the cofactors as well as for the g.c.d.,
and intuitively this seems like asking for trouble.
A variant on this algorithm is presented in [HM13], which uses an interpolation
technique from [GLL09], and this seems to be the current state of the art, though
there is no formal complexity analysis.
In practice these methods seem to solve Open Problem 2: a g.c.d. algorithm
whose running time is polynomial in the number of terms in the input and
output, even though (4.24) is O(t²), rather than the O(t) we would need for
the algorithm to be optimal.
4.5 Further Applications
4.5.1 Resultants and Discriminants
The resultant of two polynomials f = Σ_{i=0}^{n} a_i x^i and g = Σ_{i=0}^{m} b_i x^i is defined
(Definition 110) as the determinant of a certain (m+n) × (m+n) matrix Syl(f, g).
Hence if Syl(f, g)|_p = Syl(f|_p, g|_p) (or the equivalent for |_{y=v}), the theory of
Section 4.1 holds. But the only way this can fail to hold is if Syl(f|_p, g|_p) is no
longer an (m + n) × (m + n) matrix, i.e. if a_n or b_m evaluates to zero. Hence
again we have a simple answer to the questions on page 162.
1. Are there “good” reductions from R: yes — all evaluations which do not
reduce a_n b_m to zero, and this can be tested a priori.
2. How can we tell if R_i is good?: in advance!
3–5 As in Section 4.1.
4.5.2 Linear Systems
We consider the matrix equation
M.x = a,   (3.13)
and the case of M and a having integer entries, and reduction modulo a prime:
the polynomial version is very similar. Conceptually, this has the well-known
solution
x = M^{−1}.a.   (3.14)
This formulation shows the fundamental problem: M^{−1}, and x itself, might not
have integer entries. In fact, if p divides det(M), (M|_p)^{−1} does not exist. There
are two solutions to this problem.
4.5.2.1 Clear Fractions
If we compute det(M) first, by the method of section 4.1, we can clear denominators
in (3.14), and get
x̂ := det(M)x = det(M)M^{−1}.a,   (4.25)
i.e.
M.x̂ = det(M)a.   (4.26)
If we avoid primes dividing det(M), we can solve (4.26) for enough primes
(suitable bounds are given in Corollary 24), reconstruct integer entries in x̂,
and then divide through by det(M).
4.5.2.2 Solve with Fractions
If (3.13) is soluble modulo p, its solution x_p is indeed congruent to x when
evaluated at p, i.e. x|_p = x_p. If we use many primes p_i (discarding those for
which (3.13) is not soluble), and apply Algorithm 47 to the vectors x_{p_i}, we
get a vector x_N such that x_N ≡ x (mod N), where N = ∏ p_i. However, the
entries of x are rationals, with numerator and denominator bounded, say, by B
(see Corollary 24), rather than integers. How do we find the entries of x from
x_N? This problem has a long history in number theory, generally under the
name Farey fractions, but was first considered in computer algebra in [Wan81].
Since we will have occasion to use this solution elsewhere, we consider it in more
generality in the next section.
If we assume this problem solved, we then have the following answer to the
questions on page 162.
1. Are there good primes?: yes — all that do not divide det(M).
2. How can we tell if a prime p is bad? Equation (3.13) is not soluble modulo p.
3. How many reductions should we take? Enough such that the product of
the good primes is greater than 2B².
4. How do we combine? Algorithm 25.
5. How do we check the result? If we use the bound from Corollary 24,
we do not need to check. However, there are “early success” variations,
analogous to section 4.2.5, where we do need to check, which can be done
by checking that M.x = a: an O(n²) arithmetic operations process rather
than the O(n³) of solving.
4.5.2.3 Farey reconstruction
In this section, we consider the problem of reconstructing an unknown fraction
x = n/d, with |n|, |d| < B, given that we know x ≡ y (mod N), i.e. n ≡ yd
(mod N), where N > 2B². We first observe that this representation is unique,
for if n′/d′ (similarly bounded) is also congruent to y, then nd′ ≡ ydd′ ≡ n′d,
so nd′ − n′d ≡ 0 (mod N), and the only solution satisfying the bounds is
nd′ − n′d = 0, i.e. n/d = n′/d′.
Actually finding n and d is done with the Extended Euclidean Algorithm
(see Algorithm 5).
Algorithm 25 (Farey Reconstruction)
Input: y, N ∈ N
Output: n, d ∈ Z such that |n|, |d| < √(N/2) and n/d ≡ y (mod N), or
failed if none such exist.

i := 1;
a_0 := N; a_1 := y; a := 1; d := 1; b := c := 0
# Loop invariant: a_i = ay + bN; a_{i−1} = cy + dN
while a_i > √(N/2) do
    a_{i+1} := rem(a_{i−1}, a_i);
    q_i := the corresponding quotient  # a_{i+1} = a_{i−1} − q_i a_i
    e := c − q_i a; e′ := d − q_i b  # a_{i+1} = ey + e′N
    i := i + 1;
    (c, d) := (a, b);
    (a, b) := (e, e′)
if |a| < √(N/2) and gcd(b, N) = 1
    then return (a_i, a)
else return failed
Correctness of this algorithm, i.e. the fact that the first a_i < √(N/2) corresponds
to the solution if it exists, is proved in [WGD82], using [HW79, Theorem
184]. The condition gcd(b, N) = 1 was stressed by [CE95], without which we
may return meaningless results, such as (−1, 2), when trying to reconstruct 5
(mod 12).
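A Python transcription of the reconstruction (a sketch of ours: it carries only the y-coefficients of the Extended Euclidean Algorithm, compares squares to avoid square roots, and applies the coprimality check to the reconstructed denominator):

    from math import gcd

    def farey_reconstruct(y, N):
        """Find (n, d) with n ≡ d*y (mod N) and |n|, |d| < sqrt(N/2),
        or None if no such fraction exists."""
        r0, r1 = N, y % N
        t0, t1 = 0, 1                 # invariant: r_i ≡ t_i * y (mod N)
        while 2 * r1 * r1 > N:        # i.e. while r1 > sqrt(N/2)
            q = r0 // r1
            r0, r1 = r1, r0 - q * r1
            t0, t1 = t1, t0 - q * t1
        if 2 * t1 * t1 < N and gcd(t1, N) == 1:
            return r1, t1             # the fraction r1/t1
        return None

    print(farey_reconstruct(87, 101))  # (3, 7): 3/7 ≡ 87 (mod 101)
    print(farey_reconstruct(5, 12))    # None: rejected by the gcd check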
4.6 Gröbner Bases
If coefficient growth is a major problem in g.c.d. computations, it can be even
more apparent in Gröbner basis computations. [Arn03] gives the example of
{ 8x^2y^2 + 5xy^3 + 3x^3z + x^2yz, x^5 + 2y^3z^2 + 13y^2z^3 + 5yz^4,
8x^3 + 12y^3 + xz^2 + 3, 7x^2y^4 + 18xy^3z^2 + y^3z^3 }
(4.27)
whose Gröbner base, computed with practically any order, is
{ x, y^3 + 1/4, z^2 }.   (4.28)
Both (4.27) and (4.28) involve very manageable numbers, but computing with
total degree orders gives intermediate terms such as
(80791641378310759316435307942453674533 . . . 364126736536799662381920549 /
31979005164817514134306974902709502756 . . . 247397413295611141592116516) yx^3
(where the ellipses stand for 60 deleted digits)¹³, and a lexicographic order
gives coefficients with tens of thousands of digits. Can the same ideas allow us
to bypass such huge intermediate expressions?
Observation 14 [Tra88] proposed a very simple efficiency improvement, known
as the Gröbner trace idea. We should only compute and reduce an S-polynomial
over Q if the equivalent computation modulo p yields a non-zero polynomial.
This isn’t guaranteed correct, so a final check, either using Theorem 16(1) or on
the lines of Theorem 42, is necessary, but it can be a great time-saver in practice.
We should note that there can be various differences between the Gröbner base
computed over the rationals and that computed modulo p.
Example 21 These examples are taken from [Win88].
1. (From [Ebe83]). Let F = {xy^2 − 2y, x^2y + 3x}. Then (under any order)
S(F_1, F_2) = 5xy. Hence if this is non-zero, it reduces the other elements
of F to {x, y}, which is a Gröbner base. But if this is zero (i.e. if we are
working modulo 5), then F is already a Gröbner base (Theorem 16 clause
1).
2. Let F = {7xy + y + 4x, y + 2}. Then (under any order) S(F_1, F_2) = −10x + y.
(This case is checked directly in the sketch after this example.)
p = 2 This is therefore y, which is F_2. Hence a reduced Gröbner base is
{y}: infinitely many solutions (x unconstrained).
p = 5 This is therefore y, which reduces F_2 to 2, hence the Gröbner base
is {1}: no solutions.
Otherwise We get a Gröbner base of {y + 2, 5x + 1}, with one solution.
3. Let F = {4xy^2 + 16x^2 − 4z + 1, 2y^2z + 4x + 1, −2x^2z + 2y^2 + x} with a
purely lexicographic ordering z > y > x. Then the Gröbner base over the
integers is
32x^7 + 232x^6 − 34x^4 − 44x^3 + x^2 + 30x + 8,
2745y^2 + 112x^6 − 756x^5 − 11376x^4 − 65x^3 + 756x^2 + 1772x + 2,
10980z − 6272x^6 − 45504x^5 + 216x^4 + 3640x^3 − 36846x^2 − 412x − 2857.
(4.29)
p = 2 Here F is immediately {1, x^2 + 1, 1} and there are no solutions,
which is a classic “p divides the leading coefficient” issue.
¹³This computation was performed with Axiom. [Arn03], computing with Macaulay, quotes
intermediate numbers of 80,000 digits.
4.6. GRÖBNER BASES 199
p = 7 Though there might not seem to be a problem (none of the leading
coefficients above are divisible by 7), in fact S(S(F_1, F_3), S(F_1, F_2))
reduces S(F_1, F_2) to 14x^2y^2 + y^2 − 8x^4 − 2x^3 + 8x + 2,
and the leading monomial here vanishes modulo 7. In fact, though
the computation follows a different route, we get out a Gröbner basis
of the same shape as (4.29).
If we reverse the order, though, events take a somewhat different course.
The Gröbner base over the integers is
16z^7 − 8z^6 + z^5 + 52z^4 + 75z^3 − 342z^2 + 266z − 60,
1988y^2 − 76752z^6 + 1272z^5 − 4197z^4 − 251555z^3 − 481837z^2
+ 1407741z − 595666,
3976x + 37104z^6 − 600z^5 + 2111z^4 + 122062z^3 + 232833z^2
− 680336z + 288814.
(4.30)
Here 1988 is divisible by 7, and we might expect a different result modulo
7, which indeed we get:
z^6 + 4z^5 + z^4 + 6z^3 + 5z^2 + 2z + 2,
y^2z + 3z^5 + 6z^4 + 5z^3 + 6y^2 + 3z + 4,
y^4 + 5y^2 + 6z^5 + 2z^4 + 4z^3 + 5y^2 + 4z^2 + 6z + 5,
x + 2z^5 + 4z^4 + z^3 + 4y^2 + 2z.
(4.31)
This example shows that the badness of the prime might depend, not just
on the input polynomials, but also on the ordering chosen. Both (4.30)
and (4.31) have 14 solutions (as does (4.29), since this does not depend
on the order), but the bases have different leading monomial ideals.
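Item 2 of Example 21 can be checked directly; the following sketch assumes sympy's groebner, which accepts a modulus option for computation over GF(p):

    from sympy import groebner, symbols

    x, y = symbols('x y')
    F = [7*x*y + y + 4*x, y + 2]

    print(groebner(F, x, y, order='lex'))             # over Q: {x + 1/5, y + 2}
    print(groebner(F, x, y, order='lex', modulus=2))  # {y}: x unconstrained
    print(groebner(F, x, y, order='lex', modulus=5))  # {1}: no solutions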
The rest of this section is substantially based on [Arn03]¹⁴; a p-adic approach
to Gröbner bases is described in section 5.9.3. As in that paper, we
assume that our ideal is homogeneous, i.e. that it can¹⁵ be generated by a set
of generators each of which is homogeneous (Definition 53). If we wish to handle
non-homogeneous ideals with respect to a total degree ordering¹⁶, we can
homogenize, compute the Gröbner basis and dehomogenize, as in [MM84]. This
may not be optimal: see [GMN+91, p. 49] and Open Problem 21 later. [IPS11]
have shown that the assumption of homogeneity is not strictly necessary.
4.6.1 General Considerations
Given the blow-up in coefficients that can arise when computing Gröbner bases,
we might wish to compute them by a modular method.
¹⁴A somewhat different approach is taken in [IPS11]: they distinguish good/bad primes by
majority voting — their algorithm deleteUnluckyPrimesSB. This idea is used in the Singular
polynomial algebra system. They also use the Gröbner trace idea: subsequent primes do not
try an S-polynomial which the first prime reduced to 0.
¹⁵Note that there may well be sets of generators which are not homogeneous.
¹⁶And we generally wish to do so, before, if necessary, using the FGLM Algorithm (12) to
compute a lexicographical base.
Definition 88 Given a specific computation C (by which we mean specifying not
just the ordering, but all the choices in Algorithm 9, or any other algorithm for
computing a Gröbner base) of a Gröbner base G from an input S over Q, denoted
G := C(S), we say that the prime p is of good reduction if lt(G) = lt(C(Sp)),
i.e. we get the same leading term ideal when we apply C modulo p as when we
apply it over Q. If not, we say that p is of bad reduction.
Lemma 11 For a given S and C, there are only finitely many primes of bad
reduction.
Proof. If we compare C(S) and C(S_p), they can only start differing when a
leading coefficient in the computation of C(S) vanishes when considered modulo
p, because the computation of S-polynomials, and reduction, is entirely driven
by leading terms. But there are only finitely many such coefficients, and hence
only finitely many bad primes: the set of divisors of these coefficients.
We should note that this proof is far less effective than the proof of Lemma 8,
but it is sufficient for our purposes. Another proof is given in [Win88, Theorem
1], which in fact shows that the set of bad primes depends only on the input
and the ordering, not on the computation.
Note that we cannot follow the plan of section 4.2.3 as we do not know a
bound on all the integers appearing (and anyway, the idea is to compute with
smaller integers). If we try to follow the plan of section 4.2.4, we hit a snag.
Suppose that two computations modulo different primes (say p and q) give us
different leading term ideals. Then one is certainly bad, but which? We do not
have a simple degree comparison as in the g.c.d. case.
4.6.2 The Hilbert Function and reduction
We recall the Hilbert function from section 3.3.12, which will turn out to be
a key tool in comparing Gröbner bases, as earlier we have used degrees for
comparing polynomials.
Theorem 40 ([Arn03, Theorem 5.3]) Let (f_1, . . . , f_k) generate the ideal I
over Q, and the ideal I_p modulo p. Then
∀n ∈ N : H_I(n) ≤ H_{I_p}(n).   (4.32)
Definition 89 p is said to be Hilbert-good if, and only if, we have equality in
(4.32).
Observation 15 Note that we do not have a test for Hilbert-goodness as such,
but we do have one for Hilbert-badness: if H_{I_p}(n) < H_{I_q}(n) then q is definitely
Hilbert-bad.
Example 22 (Bigatti [Big15]) Let S be the set {x^2, x(x − 3y), xy^2, xyz, xz(y − 5z)}.
Over various characteristics its Gröbner bases (always using tdeg(x,y,z)) and
(first few values of) Hilbert functions are:

 0   {xy, x^2, xz^2};          (1,3,4,4,5,6,7,8,9,10)
 3   {x^2, xz^2, xyz, xy^2};   (1,3,5,4,5,6,7,8,9,10)
 5   {xy, x^2};                (1,3,4,5,6,7,8,9,10,11)
 7   {xy, x^2, xz^2};          (1,3,4,4,5,6,7,8,9,10)

H_{I_3}(2) = 5 > H_{I_5}(2) = 4, so p = 3 is proved to be bad by p = 5. But
H_{I_3}(3) = 4 < H_{I_5}(3) = 5, so p = 5 is proved to be bad by p = 3. To the
best of the author’s knowledge, modular Gröbner bases are the only modular
calculation in which two primes can prove each other bad.
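Hilbert-badness is easy to test mechanically once the leading monomials are known. The following Python sketch (ours) computes H_I(l) as the number of degree-l monomials outside the leading-monomial ideal, reproducing the values above:

    from itertools import product

    def hilbert(lead_exps, nvars, l):
        """Number of monomials of total degree l outside the monomial
        ideal generated by the exponent vectors lead_exps."""
        count = 0
        for e in product(range(l + 1), repeat=nvars):
            if sum(e) == l and not any(
                    all(ei >= gi for ei, gi in zip(e, g)) for g in lead_exps):
                count += 1
        return count

    # Example 22, variables (x, y, z):
    lt3 = [(2,0,0), (1,0,2), (1,1,1), (1,2,0)]   # char 3: x^2, xz^2, xyz, xy^2
    lt5 = [(1,1,0), (2,0,0)]                     # char 5: xy, x^2
    print([hilbert(lt3, 3, l) for l in range(4)])   # [1, 3, 5, 4]
    print([hilbert(lt5, 3, l) for l in range(4)])   # [1, 3, 4, 5]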
Proposition 56 If p is of good reduction for C(S), then it is Hilbert-good for
the ideal generated by S.
Theorem 41 ([Arn03, Theorem 5.6]) Let (g_1, . . . , g_t) be a Gröbner base under
< for the ideal generated by (f_1, . . . , f_s) over Q, and (g′_1, . . . , g′_{t′}) be a
Gröbner base under < for the ideal generated by (f_1, . . . , f_s) modulo p. In both
cases these bases are to be ordered by increasing order under <. Suppose that p
is Hilbert-good for this ideal. Then:
1. lt(g′_1) ≤ lt(g_1);
2. if lt(g′_j) = lt(g_j) for 1 ≤ j ≤ i, then lt(g′_{i+1}) ≤ lt(g_{i+1}).
This result means that, for Hilbert-good primes, we can compare the sequence
of lt(g_i) for relative luckiness as we compared degrees in the case of g.c.d.s: at
the first point i where they differ, the prime with the lesser lt(g_i) is definitely
of bad reduction.
Example 23 (importance of caveat above) If we consider Example 22, and
hadn’t checked Hilbert functions, we would deduce that p = 3 proved that p = 7
was bad, since xy < x^2. In fact p = 7 is good and generates precisely the right
ideal.
Arnold’s original example was more complicated.
Example 24 ([Arn03, Example 5.7]) Let I = ⟨3y^2x − 5yx^2 + 2x^3, −7y^3x +
5y^2x^2, 7y^6 − 2y^3x^3 + yx^5⟩. We use the ordering ‘total degree then lexicographic,
with y > x’, and consider the primes 5 and 2.
I_5 has Gröbner base
{ 3y^2x + 2x^3, 29yx^3, x^5, y^6 },   (4.33)
whereas I_2 has Gröbner base
{ y^2x + yx^2, y^6 + yx^5 }.   (4.34)
Table 4.2: Hilbert functions for example 24

 l   H_{I_5}(l)   H_{I_2}(l)   M_l
 0       1            1        {1}
 1       3            3        {1, x, y}
 2       6            6        {1, x, y, x^2, xy, y^2}
 3       9            9        {1, x, y, x^2, xy, y^2, x^3, x^2y, y^3}
 4      11           12        for I_5: {1, x, y, x^2, xy, y^2, x^3, x^2y, y^3, x^4, y^4};
                               for I_2: {1, x, y, x^2, xy, y^2, x^3, x^2y, y^3, x^4, x^3y, y^4}

We can tabulate the Hilbert functions as in Table 4.2, though in fact we know
they are different, since I_5 has dimension zero and I_2 does not, and hence the
Hilbert polynomials have different degree. Either way, Hilbert functions tell us
that 2 is definitely not good.
If we just looked at the leading terms, we would note that xy^2, the least
leading term, is the same for both I_2 and I_5, but that x^3y < y^6 (the next leading
terms), leading us, incorrectly, to infer that 5 was of bad reduction. In fact, 5 is
of good reduction, and the true Gröbner base for I itself (as determined by
Chinese remainder with 7 and 11) is
{ 3y^2x − 5yx^2 + 2x^3, 29yx^3 − 20x^4, x^5, y^6 }.   (4.35)

4.6.3 The Modular Algorithm

We are now most of the way towards a Chinese Remainder Theorem solution to
computing Gröbner Bases for an ideal I = ⟨f_1, . . . , f_k⟩. We still have to solve
three issues.

leading coefficients In the case of g.c.d. computations, the modular answer is
only defined up to multiplication by a constant, and the same is true here:
each element of the Gröbner base modulo p can be multiplied, independently,
by any non-zero constant. For g.c.d.s, we knew a multiple of the
leading coefficient of the answer (i.e. the g.c.d. of the leading coefficients
of the inputs), which we imposed in (4.8) and thereafter. Here we have no
such prior knowledge. We therefore reconstruct a monic Gröbner base,
with rational number coefficients, using Algorithm 25 above.

When do we stop? In the g.c.d. case we had the Landau–Mignotte bound.
For Gröbner bases, while some bounds are known (e.g. [Dah09]), they are
not directly relevant. In practice we take a leaf from Figure 4.5, and say
that, if the Gröbner base doesn’t change when we take a new prime, we
are likely to have terminated.

How do we know correctness? The first issue is that our reconstructed object
G, while a Gröbner base modulo each prime we used, may not be a
Gröbner base over the rationals. This can be tested by Theorem 16 clause
(1), using, as appropriate, the optimizations from Propositions 38 and 39.
But, even if G is a Gröbner base, is it one for I? Once we know G is
a Gröbner base, we can check that I ⊆ ⟨G⟩ simply by checking that
f_i →*_G 0 for i = 1, . . . , k. Inclusion the other way round comes from the
following theorem.

Theorem 42 ([Arn03, Theorem 7.1]) If G is a Gröbner base, I ⊆ ⟨G⟩ and
lm(G) = lm(G_p) for some G_p a Gröbner base obtained from the generators of
I modulo p, then I = ⟨G⟩.

It is worth noting that we have produced no complexity bound for this
algorithm, just as we have stated none for the original Algorithm 9. In practice
it seems to be especially efficient in cases where the final Gröbner basis has
small coefficients, despite the intermediate expression swell.

Open Problem 21 (Modular Gröbner Bases for Inhomogeneous Ideals)
This algorithm as described is limited, as is [Arn03], to homogeneous ideals
with respect to a total degree ordering. Can it be generalized? [IPS11, Remark
2.5] claims that it can be, at the cost of either
• being content with a probabilistic algorithm, which may return a G such
that ⟨f_1, . . . , f_k⟩ ⊂ ⟨G⟩; or
• homogenizing first — however they say “experiments showed that this is
usually not efficient since the standard basis of the homogenized input
often has many more elements than the standard basis of the ideal that
we started with”.

Open Problem 22 (Reconstructed Bases might not be Gröbner) We
stated that it was possible that the reconstruction of modular Gröbner bases
might not be a Gröbner base. This certainly seems to be theoretically possible,
though we have given no examples of this.
1. Give examples where this happens. This is probably trivial, but what
might be harder is to give ones where the primes were nevertheless good,
and the problem is that we did not take enough of them.
2. Or show that this cannot in fact happen when the primes are of good
reduction.

4.6.4 Conclusion

Let us see how we have answered the questions on page 162.

1. Are there “good” reductions p? Yes, by Lemma 11 there are only finitely
many primes of bad reduction, though we have no bounds on the number.

Figure 4.17: Algorithm 26

Algorithm 26 (Modular Gröbner base)
Input: S = {f_1, . . . , f_k} homogeneous polynomials in Z[x_1, . . . , x_n].
Output: G a Gröbner base for ⟨S⟩ over Q.

p := find_prime(S); G := modular_GB(S, p); G_Q := Algorithm 25(G, p)
N := p;  # N is the modulus we will be constructing
while true repeat
    p := find_prime(S); G′ := modular_GB(S, p);
    if H_G, H_{G′} are incomparable
        then start again  # All primes are bad (see Example 22)
    else if H_G < H_{G′}
        then # Do nothing: p is of bad reduction
    else if H_G > H_{G′}
        G := G′; N := p;  # previous primes were bad
        G_Q := Algorithm 25(G, p)
    # Hilbert functions agree: on to Theorem 41
    else if lm(G) < lm(G′)  # comparison in the sense of Theorem 41
        then # Do nothing: p is of bad reduction
    else if lm(G) > lm(G′)
        G := G′; N := p;  # previous primes were bad
        G_Q := Algorithm 25(G, p)
    else if G′ ≡ G_Q (mod p)  # stabilization as in Figure 4.5
            and G_Q is a Gröbner base
            and ∀i f_i →*_{G_Q} 0
        return G_Q  # correct by Theorem 42
    else G := Algorithm 48(G′, G, p, N);
        N := pN;
        G_Q := Algorithm 25(G, N)

find_prime finds a prime that does not divide any lc(f_i) (otherwise we’re
bound to be following a different computation, and almost certain to have bad
reduction).
modular_GB computes a monic modular Gröbner base, either via Algorithm 9
or any other such (Observation 4).
2. How can we tell if p is good? Following [Arn03], we have a two-stage
process for comparing p and q when G_p and G_q differ: we first compare
Hilbert functions (Theorem 40), then leading monomial ideals (Theorem
41).
3. How many reductions should we take? We have no bound, but rely on
stabilization, as in Figure 4.5.
4. How do we combine? Algorithm 47 and Algorithm 25.
5. How do we check the result? Theorem 42. [IPS10] propose doing a further
modular check, but this works for them as it is essentially the equivalent
of waiting for stabilization in Algorithm 26.
4.7 Conclusions
Let us look generally at our key questions for modular calculations.
1. Are there “good” reductions from R? All the examples we have seen answer
this question positively, and indeed, if we are using modular arithmetic
to mimic calculations that we could have done without it, then the argument
of Lemma 11 shows that there are always only finitely many bad
reductions.
2. How can we tell if R_i is good? This tends to have a problem-specific
answer, and is always one of the major challenges. Sometimes it may be
immediately evident (as when a set of linear equations becomes insoluble
(mod p)), sometimes we may have different results in different-looking
domains R_1 and R_2 and be able to say that one is definitely wrong (as in
the case of gcd), and sometimes we may need to resort to majority voting
(see note 14).
3. How many reductions should we take? Sometimes we can compute bounds,
but sometimes we have to rely on stabilization as in Figure 4.5 and Algorithm
26. Even if bounds exist, stabilization is often more efficient in
practice.
4. How do we combine? Either via a version of Chinese Remainder, or by
Farey Reconstruction (Section 4.5.2.3).
5. How do we check the result? This also tends to have a problem-specific
answer, and is generally the other major challenge.
How good are these algorithms, and what are the challenges?
Section 4.1 For matrix determinants, these methods are definitely better than
the classical methods (see Proposition 52), though there is a further improvement
in Section 5.9.4. There is an Open Problem (16) on the difference
between theory and practice, though.
gcd The theoretical state of computations in general is hampered by the ab-
sence of a solution to Open Problems 2 and 3 (page 74), i.e. the absence
of an algorithm whose complexity depends only on the number of terms,
not the degrees. The remaining analyses ignore this challenge.
Section 4.2 Univariate polynomial g.c.d. seems to be about as good as it can
get algorithmically, though there are useful engineering improvements to
the Landau–Mignotte bound in practice (Open Problem 17).
Section 4.3 Bivariate polynomial g.c.d. is also about as good as it can get
algorithmically, though we have posed Open Problem 19.
Section 4.4 Multivariate polynomial g.c.d., based on the bivariate work, seems
as good as it can get algorithmically for dense polynomials. For sparse
polynomials, the work of Zippel explained in this section is good, and
empirically the complexity is essentially a function of the number of terms
in the output, but there is definitely scope for a proper theoretical analysis,
and probably substantial room for practical improvement.
Section 4.5 This depends on the precise application, but the remarks above
about Section 4.1 apply.
Section 4.6 The computation of Gröbner bases by modular techniques is prob-
ably the area where there is the greatest room for improvement, but it is
challenging.
Chapter 5
p-adic Methods
In this chapter, we wish to consider a different problem, that of factoring polynomials.
We will see that this cannot reasonably be solved by the methods
of the previous chapter, and we need a new technique for solving problems in
(large) domains via small ones. The technique is based on the mathematical
concept of p-adic numbers, and its fundamental result, Hensel’s Lemma. The
basic idea behind these algorithms is shown in Figure 5.1: instead of doing a
calculation in some (large) domain R, we do it in several smaller domains R_i,
pick one of these (say R_l) as the best one, grow the solution to some larger
domain R̂_l, regard this as being in R and check that it is indeed the right result.
5.1 Introduction to the factorization problem
For simplicity, we will begin with the case of factoring a univariate polynomial
over Z. More precisely, we consider the following.
Problem 7 Given f ∈ Z[x], compute polynomials f_i ∈ Z[x] (1 ≤ i ≤ k) such
that:
Figure 5.1: Diagrammatic illustration of Hensel Algorithms

R  - - - - - - - calculation - - - - - - - ->  R
   | k×reduce                      interpret ^ & check
   v                                         |
R_1  --calculation-->  R_1  ⎫
 ...          ...           ⎬  --choose-->  R_l  --grow-->  R̂_l
R_k  --calculation-->  R_k  ⎭
1. f = ∏_{i=1}^{k} f_i;
2. each f_i is irreducible in Z[x], i.e. any polynomial g that divides f_i is either
an integer or has the same degree as f_i.
We might wonder whether we wouldn’t be better off considering f_i ∈ Q[x], but
in fact the answers are the same.
Proposition 57 Any factorization over Q[x] of a polynomial f ∈ Z[x] is (up
to rational multiples) a factorization over Z[x].
Proof. Let f = ∏_{i=1}^{k} f_i with f_i ∈ Q[x]. By clearing denominators and removing
contents, we can write f_i = c_i g_i with g_i ∈ Z[x] and primitive, and c_i ∈ Q. Hence
f = (∏_{i=1}^{k} c_i)(∏_{i=1}^{k} g_i), and, since the product of primitive polynomials is
primitive (Lemma 2), ∏_{i=1}^{k} c_i is an integer, and can be absorbed into, say, g_1.
Even knowing that we have only to consider integer coefficients does not
seem to help much — we still seem to have an infinite number of possibilities
to consider. In fact this is not quite the case.
Notation 29 Let the polynomial f = Σ_{i=0}^{n} a_i x^i to be factored have degree n,
and coefficients bounded by H. Let us suppose we are looking for factors of
degree at most d.
Corollary 13 (to Theorem 36) It is sufficient to look for factors of degree
d ≤ n/2, whose coefficients are bounded by 2^d H.
One might have hoped that it was sufficient to look for factors whose coefficients
are bounded by H, but this is not the case. [Abb09] gives the example of
f = x^80 − 2x^78 + x^76 + 2x^74 + 2x^70 + x^68 + 2x^66 + x^64 + x^62 + 2x^60 + 2x^58
− 2x^54 + 2x^52 + 2x^50 + 2x^48 − x^44 − x^36 + 2x^32 + 2x^30 + 2x^28 − 2x^26
+ 2x^22 + 2x^20 + x^18 + x^16 + 2x^14 + x^12 + 2x^10 + 2x^6 + x^4 − 2x^2 + 1
whose factors have coefficients as large as 36, i.e. 18 times as large as the
coefficients of f. Non-squarefree polynomials are even worse: [Abb09, p. 18]
gives the example of
−1 − x + x^2 − x^3 + x^4 + x^5 + x^6 + x^8 + x^10 − x^11
− x^12 − x^13 + x^14 − x^15 − x^17 − x^18 + x^20 + x^21 =
( 1 + 4x + 8x^2 + 14x^3 + 21x^4 + 28x^5 + 34x^6 + 39x^7 + 42x^8 + 43x^9 + 41x^10
+ 37x^11 + 32x^12 + 27x^13 + 21x^14 + 15x^15 + 9x^16 + 4x^17 + x^18 ) (x − 1)^3.
Although some caution is in order, his table appears to show coefficient growth
behaving like 0.7 × 1.22^d, where d is the degree of a polynomial with coefficients
at most ±1.
Corollary 14 To detect all irreducible factors of f (except possibly for the last
one, which is f divided by all the factors of degree ≤ n/2), it suffices to consider
(2^d H)^{d+1} polynomials.
We can in fact do better, since the leading coefficient of the factor must divide
a_n, and similarly the trailing coefficient must divide a_0, so we get 2^{d(d−1)}H^{d+1},
and in practice such “brute force” methods¹ can easily factor low-degree polynomials,
but the asymptotics are still very poor.
5.2 Modular methods
Hence we might want to use modular methods. Assuming always that p does
not divide a_n, we know that, if f is irreducible modulo p, it is irreducible over
the integers. Since, modulo p, we can assume our polynomials are monic, there
are only p^d polynomials of degree d; this gives us a bound that is exponential in
d rather than d². In fact, we can do better by comparing results of factorizations
modulo different primes.
Example 25 Suppose f is quartic, and factors modulo p into a linear and a
cubic, and modulo q into two quadratics (all such factors being irreducible), we
can deduce that it is irreducible over the integers, since no factorization over the
integers is compatible with both pieces of information.
Definition 90 For a square-free polynomial f, define its factorization shape to
be the multiset² of the degrees of all the irreducible factors in a given factorization
of f into irreducibles. Roughly speaking, this is the “degrees of all the
irreducible factors”, but we have to be slightly careful, since if (x − 1) is an
irreducible factor, so is (−x + 1), and we don’t want to count both. Note that
the factorization shape is independent of which factorization we take.
If f is not known to be square-free, then we have to take a multiset of (degree,
multiplicity) pairs. This case is not useful in practice, but because a square-free
decomposition can be larger than the starting point (Observation 2, page 74),
it can be useful in theory: see section 5.8.1.
In this language, we are saying that the factorization shape modulo p has to
be a splitting of the factorization shape over the integers.
5.2.1 The Musser test
Definition 91 The sumset of a factorisation shape is the set of all sums of
subsets of the factorisation shape.
The sumset of a factorisation shape modulo p is therefore the set of all possible
degrees of factors over the integers. Example 25 has two factorisation shapes
{1, 3} and {2, 2}, whose sumsets are {0, 1, 3, 4} and {0, 2, 4}. The intersection
of these sumsets is {0, 4}, so no proper factorisation over Z is possible.
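The test is a few lines of Python (our illustration):

    def sumset(shape):
        """All sums of subsets of a factorisation shape (a multiset of degrees)."""
        s = {0}
        for d in shape:
            s |= {t + d for t in s}
        return s

    print(sumset([1, 3]))                      # {0, 1, 3, 4}
    print(sumset([2, 2]))                      # {0, 2, 4}
    print(sumset([1, 3]) & sumset([2, 2]))     # {0, 4}: f must be irreducible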
¹Sometimes known as Newton’s algorithm.
²That is, a set but allowing repetitions, so {2, 2, 1} and {2, 1, 1, 1} would be different
multisets, but of course are both the set {2, 1}.
It is an old experimental observation [Mus78] that, if a polynomial can be
proved irreducible by intersecting sumsets of factorisation shapes, five shapes,
i.e. five primes, are nearly always sufficient to prove it.
This test was formally analysed in [?], and very substantial experimental
evidence is presented in their Figure 1. The bottom line is that we should
probably take seven primes. The rest of this subsection is somewhat more
technical, and can be skipped by the reader not familiar with Galois Theory
and permutation groups.
Definition 92 The Galois group of a polynomial f = a_n ∏_{i=1}^{n} (x − α_i) ∈ Z[x] is
the set of all automorphisms of Q(α_1, . . . , α_n) which leave Q fixed. The Galois
group is determined by its action on the α_i, and is therefore a subgroup of the
symmetric group S_n.
Proposition 58 f is irreducible if, and only if, its Galois group is transitive
(for all roots α and β, there is a permutation of the Galois group taking α to
β).
Example 26 Various Galois groups are as follows.
x^2 − 2 Here the roots are α = √2 ≈ 1.4142 and β = −√2 ≈ −1.4142. Apart
from the identity permutation, the only option is to exchange α and β.
Hence the Galois Group is C_2 = ⟨(α, β)⟩. The cycle types are (1, 1) and
(2).
(x^2 − 2)(x^2 − 3) Here the roots are α = √2 ≈ 1.4142, β = −√2 ≈ −1.4142,
γ = √3 ≈ 1.732 and δ = −√3 ≈ −1.732. Apart from the identity
permutation, the only options are to exchange α and β, exchange γ and δ,
or possibly both. Sending α to γ, say, is not possible as α^2 = 2 but γ^2 = 3.
Hence the Galois Group is C_2 × C_2 = ⟨(α, β), (γ, δ)⟩. The cycle types are
(1, 1, 1, 1), (1, 1, 2) and (2, 2). Note that this group is not transitive.
x^4 + 1 Here the roots are α = (1 + i)/√2, β = (1 − i)/√2 = 1/α, γ = −α and δ = −β. One
option is to send α to β, which replaces i by −i, and is therefore the
permutation (α, β)(γ, δ), and another is to replace √2 by −√2, which is
the permutation (α, γ)(β, δ). Hence the Galois group contains, and is in
fact generated by, these two permutations: it is often known as the Klein
4-group, or V. The cycle types are (1, 1, 1, 1) and (2, 2). This group is
transitive, but every cycle shape in this group is a valid cycle shape in
C_2 × C_2 from the previous example, which is not transitive.
x^8 − 2 Here the roots are α_k = 2^{1/8} e^{2πik/8}, k = 0..7. One option is to replace
2^{1/8} by 2^{1/8} e^{2πi/8}, which cyclically permutes the roots, i.e. (α_0, α_1, . . . , α_7).
Another option is to replace i by −i, which corresponds to the permutation
(α_1, α_7)(α_2, α_6)(α_3, α_5). In fact the Galois group is generated by these
two, and is generally called the dihedral group, with 16 elements.
x^7 − 2 Here the roots are α_k = 2^{1/7} e^{2πik/7}, k = 0..6. One option is to replace
2^{1/7} by 2^{1/7} e^{2πi/7}, which cyclically permutes the roots, i.e. π_1 :=
(α_0, α_1, . . . , α_6). Another option is to replace i by −i, which is the permutation
π_2 := (α_1, α_6)(α_2, α_5)(α_3, α_4). However, π_3 := (α_1, α_3, α_2, α_6, α_4, α_5)
is in fact also a legal permutation, and the group is generated by π_1 and
π_3 (since π_2 = π_3^3) and has 42 elements.
Proposition 59 ([vdW34]) For any n, almost all polynomials of degree n
have Galois group S_n, or, more precisely, the probability of a polynomial having
Galois group S_n is 1, i.e.
lim_{H→∞} |{polynomials of degree n with coefficients ≤ H and group S_n}| /
|{polynomials of degree n with coefficients ≤ H}| = 1.
Corollary 15 For any n, almost all polynomials of degree n are irreducible,
since S_n is transitive.
Hence, to make a statement about “almost all polynomials”, it suffices to consider
those with Galois group S_n.
The next question is how the factorisation shapes mod p relate to the Galois
group, which is settled by the following result.
Proposition 60 (Frobenius Density Theorem) The density of prime numbers
q for which f(x) (mod q) has factorisation shape {d_1, . . . , d_k} is equal to
the density in the Galois group G ≤ S_n of f(x) of elements of S_n with cycle
type (d_1, . . . , d_k).
Hence, if we assume the Galois group of f is S_n, the factorisation shape of f(x)
(mod q) is distributed the same way as random permutations from S_n, and the
concept of sumset generalises.
Definition 93 (See Definition 91) The sumset I(σ) of a permutation σ is
the set of all sums of subsets of the cycle shape of the permutation.
Proposition 61 ([PPR14, Theorem 1.5]; [EFG15, Theorem 1.1]) There
is a constant b_4, independent of n, such that
Pr_{S_n}( I(σ_1) ∩ I(σ_2) ∩ I(σ_3) ∩ I(σ_4) = {0, n} ) > b_4,
where the probability is taken uniformly over quadruples of permutations. This
means that, if the Galois group is S_n, four³ primes have a nonzero probability of
detecting transitivity (i.e. irreducibility) and in fact of detecting that the group
is S_n [DS00], [PPR14, Theorem 1.3 and discussion].
Empirically [PPR14, Figure 1], b_4 ≈ 0.5. Similarly b_5 seems to be about 0.7 (for
small n, the probability is larger, which is presumably why [Mus78] recommends
five primes), and b_7 > 0.9.
Observation 16 Since determining that a polynomial is irreducible, when one
thinks it factors, is much more expensive than factorisation modulo p, we should
probably take seven primes, rather than the five of [Mus78].
³Four is minimal: [EFG15, Theorem 1.2].
5.3 Factoring modulo a prime
Throughout this section, we let p be an odd⁴ prime, and f a polynomial of
degree n which is square-free modulo p. There are three common methods for
factoring f: two due to Berlekamp and one to Cantor & Zassenhaus, and various
subsequent improvements, largely in terms of asymptotic complexity. A survey
of this field is [vzGP01]. Most implementers of systems use (variants of) the
Cantor–Zassenhaus algorithm, for reasons explained at the start of Section 5.7.
Although the author knows of no major implementations, the asymptotically
fastest factoring algorithm at the moment of writing seems to be [KU08], which
for polynomials of degree n over a field with q elements takes time Õ(n^{3/2} log q +
n log² q).
5.3.1 Berlekamp’s small p method
This is due to [Ber67]. We first need to state some facts about arithmetic modulo
a prime p.
Proposition 62 If a and b are two integers modulo p, then (a + b)^p ≡ a^p + b^p:
if we expand (a + b)^p by the Binomial Theorem, all intermediate coefficients are
divisible by p.
Proposition 63 a^p ≡ a modulo p.
Corollary 16 Every integer modulo p is a root of x^p − x, and therefore
x^p − x = (x − 0)(x − 1) . . . (x − (p − 1)).
These facts extend to polynomials.
Proposition 64 Let a(x) be a polynomial; then a(x)^p ≡ a(x^p) modulo p.
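This is easily checked experimentally, e.g. with sympy (our illustration):

    from sympy import symbols, Poly

    x = symbols('x')
    p = 5
    a = Poly(x**2 + 3*x + 1, x, modulus=p)
    print(a**p == Poly(x**(2*p) + 3*x**p + 1, x, modulus=p))   # True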
Let us now suppose that f factorises into r irreducible polynomials:
f(x) = f1(x)f2(x) . . . fr(x)
(r is unknown for the present). Since f has no multiple factors, the fi are
relatively prime. Let s_1, . . . , s_r be integers modulo p. By the Chinese Remainder
Theorem (Algorithm 48) there is a polynomial v such that
v ≡ s_i (mod p, f_i(x)),   (5.1)
and moreover, the degree of v is less than that of the product of the f_i, that is
f. Such a polynomial v is useful, for if s_i ≠ s_j, then gcd(f, v − s_i) is divisible
⁴It is possible to generalise these methods to the case p = 2, but the cost, combined with the
unlikelihood of 2 being a good prime, means that computer algebraists (unlike cryptographers)
rarely do so.
by f_i, but not by f_j, and therefore leads to a decomposition of f. We have the
following relation:
v(x)^p ≡ s_j^p ≡ s_j ≡ v(x) (mod f_j(x), p),
and, by the Chinese remainder theorem,
v(x)^p ≡ v(x) (mod f(x), p).   (5.2)
But, on replacing x by v(x) in Corollary 16,
v(x)^p − v(x) ≡ (v(x) − 0)(v(x) − 1) . . . (v(x) − (p − 1)) (mod p).   (5.3)
Thus, if v(x) satisfies (5.2), f(x) divides the left hand side of (5.3), and each of
its irreducible factors, the f_i, divides one of the polynomials on the right hand
side of (5.3). But this implies that v is equivalent to an integer modulo f_i, that
is, that v satisfies (5.1). We have proved (by Knuth’s method [Knu81, p. 422])
the following result.
We have already said that the solutions of (5.1) provide information about the
factorisation of f , but we still have the problem of finding them. Berlekamp’s
basic idea is to note that (5.2) is a linear equation for the coefficients of v. This
remark may seem strange, but it is a consequence of Proposition 64. In fact, if
n is the degree of f, let us consider the matrix

Q = ( q_{0,0}     q_{0,1}     . . .   q_{0,n−1}   )
    ( q_{1,0}     q_{1,1}     . . .   q_{1,n−1}   )
    (   ...         ...               ...         )
    ( q_{n−1,0}   q_{n−1,1}   . . .   q_{n−1,n−1} )

where
x^{pk} ≡ q_{k,n−1}x^{n−1} + · · · + q_{k,1}x + q_{k,0} (mod f(x), p).
If we consider a polynomial as a vector of its coefficients, multiplication by
Q corresponds to the calculation of the p-th power of the polynomial. The
solutions of (5.2) are thus the eigenvectors of the matrix Q (mod p) for the
eigenvalue 1. Hence the algorithm in Figure 5.2.
Qmight seem expensive to compute, but if we precompute (xi)p (mod (f(x), p)),
it takes O(n3 + log(p)n2) operations (with classical linear algebra, and assum-
ing that p is single-word size). Determining r, the number of irreducible fac-
tors, similarly takes time O(n3) The running time of the full algorithm5 is
O(n3 + p(r � 1)n2), and the average value of r is lnn. This is very fast if p is
small, but may be expensive if p is not small. We note that the algorithm is
completely deterministic.
5Texts often say O(n3 + prn2), but the rth factor needn’t be calculated this way, as it
follows from knowing all the others.
214 CHAPTER 5. P -ADIC METHODS
Figure 5.2: Algorithm 27; Berlekamp for small p
Algorithm 27
Input: Prime p and f a squarefree (mod p) univariate polynomial
Output: The irreducible factors of f
Calculate Q
Calculate (a basis for) the eigenvectors of Q for the eigenvalue 1,
i.e. (a basis for) the solutions v of (QI)v = 0.
# (1, 0, . . . , 0)T is the trivial eigenvector, as integers solve (5.2) r, the size of the basis, is the number of irreducible factors
if r = 1
then return {f}.
S = ;
for v a non-trivial eigenvector
Let g be the polynomial whose coe�cients are v
for s 2 {0, . . . , p� 1}
if (h := gcd(g � s, f)) 6= 1
then S := S [ {h}
if |S| = r
then return S
5.3.2 The Cantor–Zassenhaus method
This method, a combination of Algorithms 28 and 29, is generally attributed
to [CZ81]6. It is based on the following generalization of Fermat’s (Little)
Theorem/ Corollary 16.
Proposition 65 ([LN97, Theorem 3.20]) All irreducible polynomials of de-
gree d with coe�cients modulo p divide xp
d � x, and in fact
xp
d � x =
Y
e|d
⇤
Y
f :deg(f)=e
f, (5.4)
where
Q⇤
means that we only take irreducible polynomials.
Corollary 17 All irreducible polynomials of degree d with coe�cients modulo
p divide xp
d�1 � 1, except for x itself in the case d = 1. Furthermore, no
irreducible polynomials of degree more than d divide xp
d�1 � 1.
Corollary 18 Half of the irreducible polynomials of degree d (except for x itself
in the case d = 1) with coe�cients modulo p divide (x� a)(pd�1)/2 � 1, for any
a (but a di↵erent 50%,depending on a).
Hence we deduce Algorithm 28, which splits a square-free polynomial f as
Q
fi,
where each fi is the product of irreducibles of degree i.
6Though [Zip93] credits algorithm 28 to [Arw18], and [vzGP01] credits it to Gauß.
5.3. FACTORING MODULO A PRIME 215
Figure 5.3: Algorithm28: Distinct Degree Factorization
Algorithm 28 (Distinct Degree Factorization)
Input: f(x) a square-free polynomial modulo p, not divisible by x; a prime p
Output: A decomposition f =
Q
fi, where each fi is the product of irreducibles
of degree i.
i:=1
while 2i deg(f)
g := xp
i�1 (mod f) (*)
fi := gcd(g � 1, f)
f := f/fi
i := i+ 1
if f 6= 1
then fdeg(f) := f
Note that the computation in line (*) should be done by the repeated squaring
method, reducing modulo f at each stage. We can save time in practice by re-
using the previous g. Various improvements (at least in asymptotic complexity)
are given in [vzGS92].
If fi has degree i, then it is clearly irreducible: otherwise we have to split
it. This is the purpose of Algorithm 29, which relies on a generalization of
Corollary 18.
Proposition 66 ([CZ81, p. 589]) Let f be a product of r > 1 irreducible
polynomials of degree d modulo p, and g a random (non-constant) polynomial
of degree < d. Then the probability that gcd(g(p
d�1)/2 � 1, f) is either 1 or f is
at most 21�r.
Proposition 67 In classical arithmetic, the running time of the Cantor–Zass-
enhaus Algorithm (i.e. Algorithm 28 followed by Algorithm 29) is O(d3 log p),
where d is the degree of the polynomial being factored.
We do O(d log p) operations on polynomials of degree d, where the factor log p
comes from the xp
i
(mod f) computations. We note that Algorithm 28 is
deterministic, but Algorithm 29 is probabilistic.
5.3.3 Berlekamp’s large p method
This method is due to [Ber70]. It has an alternative, matrix-based, form of
Algorithm 28, again deterministic, and a probabilistic equivalent of Algorithm
29. We do not go further into it here.
216 CHAPTER 5. P -ADIC METHODS
Figure 5.4: Algorithm29: Split a Distinct Degree Factorization
Algorithm 29 (Split a Distinct Degree Factorization)
Input: A prime p, a degree d and a polynomial f(x) (mod p) known to be the
product of irreducibles of degree d (and not divisible by x)
Output: The factorization of f (modulo p) into irreducibles of degree d.
if d = deg(f)
then return f # degree d so is irreducible
W := {f}
ans := ;
while W 6= ; # factors of degree d found
h :=RandomPoly(p,d)
h := h(p
d�1)/2 (mod g) (*)
V := ; # list of polynomials to be processed
for g 2W do
h1 := gcd(h� 1, g)
if deg(h1) = 0 _ deg(h1) = deg(g)
then V := V [ {g} # we made no progress
else process(h1)
process(g/h1)
W := V
return ans
Here RandomPoly(p,d) returns a random non-constant polynomial modulo p of
degree less than d, and the sub-function process(g1) takes a polynomial g1 which
is a result of splitting g by h1, and adds it to ans if it has degree d, otherwise
adds it to V . Again (*) should be done by repeated squaring, reduing modulo
g every time.
5.4 From Zp to Z?
Now that we know that factoring over the integers modulo p is possible, the
obvious strategy for factoring polynomials over the integers would seem to be
to follow one of algorithms 17 or 18. This would depend on having ‘good’
reduction, which one would naturally define as follows.
Definition 94 (Optimistic) We say that p is of good reduction for the factor-
ization of the polynomial f if the degrees of the irreducible factors of f (modulo
p) are the same as those of the factors of f .
And indeed, if we can find p of good reduction, then we could factorize f .
Unfortunately these are rare, and possibly non-existent.
Corollary 19 (of Proposition 60) If f is a generic (formally speaking, with
Galois group Sn) polynomial of degree n, the probability of its remaining irre-
ducible modulo p is 1/n.
5.4. FROM ZP TO Z? 217
Nevertheless, we could hope that we can piece together results from several
primes to get information. For example, x5 + x+ 3 factors into irreducibles as
�
x3 + 7x2 + 3x+ 7
� �
x2 + 4x+ 2
�
mod 11
(and therefore does not have linear factors over Z) and as
�
x4 + 5x3 + 12x2 + 8x+ 2
�
(x+ 8) mod 13
(therefore any factorization has a linear factor) and hence must be irreducible
over the integers. This test is described in Section 5.2.1. However, there are
irreducible polynomials which can never be proved so this way.
Example 27 The polynomial x4 + 1, which is irreducible over the integers,
factors into two quadratics (and possibly further) modulo every prime7.
p = 2 Then x4 + 1 = (x+ 1)4.
p = 4k + 1 In this case, �1 is always a square, say �1 = q2. This gives us the
factorisation x4 + 1 = (x2 � q)(x2 + q).
p = 8k ± 1 In this case, 2 is always a square, say 2 = q2. This gives us the
factorisation x4 + 1 = (x2 � (2/q)x + 1)(x2 + (2/q)x + 1). In the case
p = 8k + 1, we have this factorisation and the factorisation given in the
previous case. As these two factorisations are not equal, we can calculate
the g.c.d.s of the factors, in order to find a factorisation as the product of
four linear factors.
p = 8k + 3 In this case, �2 is always a square , say �2 = q2. This is a result
of the fact that �1 and 2 are not squares, and so their product must
be a square. This property of �2 gives us the factorisation x4 + 1 =
(x2 � (2/q)x� 1)(x2 + (2/q)x� 1)
This polynomial is not an isolated oddity: [SD70] and [KMS83] proved that there
are whole families of polynomials with this property of being irreducible, but of
factorising compatibly modulo every prime, and indeed compatibly into many
quadratics. Several people have said that these polynomials are, nevertheless,
“quite rare”, which is true if one takes polynomials at random, but [ABD85]
showed that they can often occur in the manipulation of algebraic numbers.
Even if we have factorizations modulo several primes, a further problem
arises, which we will illustrate with the example of x4 + 3. This factors as
x4 + 3 =
�
x2 + 2
�
(x+ 4) (x+ 3) mod 7
and
x4 + 3 =
�
x2 + x+ 6
� �
x2 + 10x+ 6
�
mod 11. (5.5)
7The Galois group of this polynomial is discussed on page 210.
218 CHAPTER 5. P -ADIC METHODS
In view of the second factorization, the first has too much decomposition, and
we need only consider the split
x4 + 3 =
�
x2 + 2
� �
x2 + 5
�
mod 7, (5.6)
obtained by combining the two linear factors.
When we come to combine results modulo these two primes by Chinese
Remainder Theorem (Theorem 50) to deduce a congruence modulo 77, we have
a dilemma: do we pair
�
x2 + x+ 6
�
with
�
x2 + 2
�
or
�
x2 + 5
�
? Both seem
feasible.
In fact, both are feasible. The first pairing gives
x4 + 3 =
�
x2 + 56x+ 72
� �
x2 � 56x� 16
�
mod 77, (5.7)
and the second gives
x4 + 3 =
�
x2 + 56x+ 61
� �
x2 � 56x� 5
�
mod 77 : (5.8)
both of which are correct. The di�culty in this case, as in general, is that,
while polynomials over Z7 have unique factorization, as do those over Z11 (and
indeed modulo any prime), polynomials over Z77 (or any product of primes) do
not, as (5.7) and (5.8) demonstrate.
5.5 Hensel Lifting
Our attempts to use the Chinese Remainder Theorem seem doomed8: we need
a di↵erent solution, which is provided by what mathematicians call p-adic meth-
ods, and computer algebraists call Hensel Lifting. This is the topic of the next
four subsections.
5.5.1 Linear Hensel Lifting
This is the simplest implementation of the phase described as ‘grow’ in Figure
5.1: we grow incrementally Zp ! Zp2 ! Zp3 ! · · ·! Zpm .
For simplicity, we consider first the case of a monic polynomial f , which
factorizes modulo p as f = gh, where g and h are relatively prime (which implies
that f modulo p is square-free, that is, that p does not divide the resultant of f
and f 0). We use parenthesized superscripts, as in g(1), to indicate the power of
p modulo which an object has been calculated. Thus our factorization can be
written f (1) = g(1)h(1) (mod p1) and our aim is to calculate a corresponding
factorization f (k) = g(k)h(k) (mod pk) such that pk is su�ciently large.
8In fact, they are doomed for two distinct reasons. The first is that we are not emulating
modulo p a calculation over the integers, so the argument of Lemma 11 does not apply. The
second is that we cannot answer question 4 — how do we combine the results, as our results
are a set with, possibly, no distinguishing features.
5.5. HENSEL LIFTING 219
Obviously, g(2) ⌘ g(1) (mod p), and therefore we can write g(2) = g(1)+pĝ(2)
where ĝ(2) is a measure of the di↵erence between g(1) and g(2). The same holds
for f and h, so that f (2) = g(2)h(2) (mod p2) becomes
f (1) + pf̂ (2) = (g(1) + pĝ(2))(h(1) + pĥ(2)) (mod p2).
Since f (1) = g(1)h(1) (mod p1), this equation can be rewritten in the form9
f (1) � g(1)h(1)
p
�
| {z }
computed mod p2
+f̂ (2) = ĝ(2)h(1) + ĥ(2)g(1) (mod p). (5.9)
The left hand side of this equation is known, whereas the right hand side de-
pends linearly on the unknowns ĝ(2) and ĥ(2). Applying the extended Euclidean
Algorithm (5) ) to g(1) and h(1), which are relatively prime, we can find polyno-
mials ĝ(2) and ĥ(2) of degree less than g(1) and h(1) respectively, which satisfy
this equation modulo p. The restrictions on the degrees of ĝ(2) and ĥ(2) are
valid in the present case, for the leading coe�cients of g(k) and h(k) have to be
1. Thus we can determine g(2) and h(2).
Similarly, g(3) ⌘ g(2) (mod p2), and we can therefore write g(3) = g(2) +
p2ĝ(3) where ĝ(3) is a measure of the di↵erence between g(2) and g(3). The same
is true for f and h, so that f (3) = g(3)h(3) (mod p3) becomes
f (2) + p2f̂ (3) = (g(2) + p2ĝ(3))(h(2) + p2ĥ(3)) (mod p3).
Since f (2) = g(2)h(2) (mod p2), this equation can be rewritten in the form
f (2) � g(2)h(2)
p2
+ f̂ (3) = ĝ(3)h(2) + ĥ(3)g(2) (mod p). (5.10)
Moreover, g(2) ⌘ g(1) (mod p), so this equation simplifies to
f (2) � g(2)h(2)
p2
+ f̂ (3) = ĝ(3)h(1) + ĥ(3)g(1) (mod p).
The left hand side of this equation is known, whilst the right hand side depends
linearly on the unknowns ĝ(3) and ĥ(3). Applying the extended Euclidean algo-
rithm to g(1) and h(1), which are relatively prime, we can find the polynomials
ĝ(3) and ĥ(3) of degrees less than those of g(1) and h(1) respectively, which satisfy
this equation modulo p. Thus we determine g(3) and h(3) starting from g(2) and
h(2), and we can continue these deductions in the same way for every power pk
of p until pk is su�ciently large.
We should note that Euclid’s algorithm is always applied to the same poly-
nomials, and therefore it su�ces to perform it once. In fact, we can state the
algorithm in Figure 5.5.
9This comes from dividing the previous congruence by p, and hence the term in [. . .] must
be computed modulo p2. The same warning applies throughout this chapter. Note also that
we have applied the rule that A ⌘ B (mod p2) means that A/p ⌘ B/p (mod p) not
(mod p2).
220 CHAPTER 5. P -ADIC METHODS
Figure 5.5: Algorithm 30
Algorithm 30 (Univariate Hensel Lifting (Linear Two Factor version))
Input: f, g(1), h(1), p, k with f monic and ⌘ g(1)h(1) (mod p)
Output: g(k), h(k) with f ⌘ g(k)h(k) (mod pk)
g := g(1)
h := h(1)
g(r), h(r) := Algorithm 5(g(1), h(1)) in Zp[x]
for i := 2 . . . k
� :=
f � gh (mod pi)
pi�1
g(c) := � ⇤ h(r) (mod (p, g(1)))
h(c) := � ⇤ g(r) (mod (p, h(1)))
g := g + pi�1g(c)
h := h+ pi�1h(c)
return (g, h)
To solve the more general problem of lifting a factorization of a non-monic
polynomial, we adopt the same solution as in the g.c.d. case: we impose the
leading coe�cient in each factor to be the leading coe�cient of the orgiinal
polynomial (which we may as well assume to be primitive). We also address the
problem of lifting n (rather than just 2) factors in Algorithm 31. We have also
incorporated an “early termination” test, which is very useful in practice.
Proposition 68 Assuming classical arithmetic (but ignoring the fact that the
length of numbers is quantised into multiples of the wordlength), the cost of the
lift from pk to pk+1 is
�
k+1
k
�2
the cost of the lift from pk�1 to pk. Hence the
cost of a lift to pk is O(k3).
5.5.2 Quadratic Hensel Lifting
This is an alternative implementation of the phase described as ‘grow’ in Figure
5.1: we grow incrementally Zp ! Zp2 ! Zp4 ! · · · ! Zp2m , squaring the
modulus each time.
As in the previous section, we first consider the case of a monic polynomial
with two factors. The lifting from p to p2 proceeds exactly as in the previous
section, and equation (5.9) is still valid. The lifting from p2 to p4 is similar to
the lifting from p2 to p3 in the previous section, and the analogue of equation
(5.10) is the following
f (2) � g(2)h(2)
p2
+ f̂ (4) = ĝ(4)h(2) + ĥ(4)g(2) (mod p2). (5.11)
5.5. HENSEL LIFTING 221
Figure 5.6: Algorithm 31
Algorithm 31 (Univariate Hensel Lifting (Linear version))
Input: f, g
(1)
1 , . . . , g
(1)
n , p, k with f 2 Z[x] primitive and ⌘
Q
g
(1)
i (mod p)
Output: g
(k)
1 , . . . , g
(k)
n with f ⌘
Q
g
(k)
i (mod p
k)
for j := 1 . . . n
g
(1)
j :=
lc(f)
lc(g
(1)
j
)
g
(1)
j
F := lc(f)n�1f #leading coe�cients imposed
for j := 1 . . . n
g
(r)
j , h
(r)
j := Algorithm 5(g
(1)
j ,
Q
i 6=j g
(1)
i ) in Zp[x]
for i := 2 . . . k
� :=
F �Qj gj (mod pi)
pi�1
if � = 0
then break #True factorization discovered
for j := 1 . . . n
g
(c)
j := � ⇤ h
(r)
j (mod (p, g
(1)
j ))
gj := gj + p
i�1g
(c)
j
for j := 1 . . . n
gj := pp(gj) #undo the imposed leading coe�cients
return (g1, . . . , gn)
222 CHAPTER 5. P -ADIC METHODS
Figure 5.7: Algorithm 32
Algorithm 32 (Univariate Hensel Lifting (Quadratic Two Factor version))
Input: f, g(1), h(1), p, k with f monic and ⌘ g(1)h(1) (mod p)
Output: g, h with f ⌘ gh (mod p2k)
g := g(1)
h := h(1)
for i := 1 . . . k #f ⌘ gh (mod p2i�1)
g(r), h(r) := Algorithm 5(g, h) in Z
p2
i�1 [x]
� :=
f � gh (mod p2i)
p2
i�1
g(c) := � ⇤ h(r) (mod (p2i�1 , g))
h(c) := � ⇤ g(r) (mod (p2i�1 , h))
g := g + p2
i�1
g(c)
h := h+ p2
i�1
h(c) #f ⌘ gh (mod p2i)
return (g, h)
The di↵erence is that this equation is modulo p2 rather than modulo p, and
hence the inverses have to be recomputed, rather than being the same, as (5.10)
can re-use the inverses from (5.9). We give the corresponding Algorithm 32 in
Figure 5.7, an analogue of Algorithm 30, except that the call to Algorithm 5 is
inside the loop, rather than preceding it.
Equally, we can give the general Algorithm 33 in Figure 5.8 as an equivalent
of Algorithm 31. Again, the Extended Euclidean Algorithm is in the innermost
loop, and pratcical experience confirms that this is indeed the most expensive
step. It is also the case that we are repeating work, inasmuch as the calculations
are the same as before, except that the inputs are correct to a higher power of
p than before, and the resuts are required to a higher power of p than before.
Proposition 69 Assuming classical arithmetic (but ignoring the fact that the
length of numbers is quantised into multiples of the word length), the cost of the
lift from p2
k
to p2
k+1
is 4 times the cost of the lift from p2
k�1
to p2
k
. Hence
the cost of a lift to p2
k
is dominated by the cost of the last step, which involves
arithmetic on numbers of length O(2k), and so is O(22k).
5.5.3 Quadratic Hensel Lifting Improved
The main cost in algorithms 32 and 33 is the repeated calls to Algorithm 5. In
fact, for each iteration on k, Algorithm 5 is doing the same calculations, but
to a higher power of p. Do we need to do these calculations from scratch each
time? The answer is that in fact we can re-use the previous calculations.
5.5. HENSEL LIFTING 223
Figure 5.8: Algorithm 33
Algorithm 33 (Univariate Hensel Lifting (Quadratic version))
Input: f, g
(1)
1 , . . . , g
(1)
n , p, k with f primitive and ⌘
Q
g
(1)
i (mod p)
Output: g1, . . . , gn with f ⌘
Q
gi (mod p
2k)
for j := 1 . . . n
g
(1)
j :=
lc(f)
lc(g
(1)
j
)
g
(1)
j
F := lc(f)n�1f #leading coe�cients imposed
for i := 1 . . . k #F =
Q
gj (mod p
2i�1)
� :=
F �Qj gj (mod p2
i
)
p2
i�1
if � = 0
then break #True factorization discovered
for j := 1 . . . n
g
(r)
j , h
(r)
j := Algorithm 5(gj ,
Q
l 6=j gl) in Zp2i�1 [x]
g
(c)
j := � ⇤ h
(r)
j (mod (p
2i�1 , gj))
gj := gj + p
2i�1g
(c)
j
#F =
Q
gj (mod p
2i)
for j := 1 . . . n
gj := pp(gj) #undo the imposed leading coe�cients
return (g1, . . . , gn)
224 CHAPTER 5. P -ADIC METHODS
Figure 5.9: Algorithm 34
Algorithm 34 (Univariate Hensel Lifting (Improved Quadratic Two Factor version))
Input: f, g(1), h(1), p, k with f monic and ⌘ g(1)h(1) (mod p)
Output: g, h with f ⌘ gh (mod p2k)
g := g(1)
h := h(1)
g(r), h(r) := Algorithm 5(g, h) in Zp[x]
for i := 1 . . . k #f ⌘ gh (mod p2i�1)
#gg(r) ⌘ 1 (mod h, p2i�1)
#hh(r) ⌘ 1 (mod g, p2i�1)
� :=
f � gh (mod p2i)
p2
i�1
g(c) := � ⇤ h(r) (mod (p2i�1 , g))
h(c) := � ⇤ g(r) (mod (p2i�1 , h))
if i < k
�(r) := � gg(r)+hh(r)�1
p2
i�1 � g(c)g(r) � h(c)h(r)
g(r,c) := �(r) ⇤ g(r) (mod (p2i�1 , h))
g(r) := g(r) + p2
i�1
g(r,c)
h(r,c) := �(r) ⇤ h(r) (mod (p2i�1 , g))
h(r) := h(r) + p2
i�1
h(r,c)
g := g + p2
i�1
g(c)
h := h+ p2
i�1
h(c) #f ⌘ gh (mod p2i)
return (g, h)
Consider equation (5.11), where we need the inverses of g(2) and h(2), i.e.
g(r,2) and h(r,2) such that g(2)g(r,2) + h(2)h(r,2) ⌘ 1 (mod p2). From the previ-
ous step we know g(r,1) and h(r,1) such that g(1)g(r,1)+h(1)h(r,1) ⌘ 1 (mod p1).
Write g(r,2) = g(r,1)+pĝ(r,2) etc., then g
(1)g(r,1)+h(1)h(r,1)�1
p
+ĝ(2)g(r,1)+g(1)ĝ(r,2)+
h(2)h(r,1) + h(1)ĥ(r,2) ⌘ 0 (mod p). This is again a linear equation for the un-
knowns ĝ(r,2) and ĥ(r,2). If we write it as
g(1)ĝ(r,2) + h(1)ĥ(r,2) = �(r) (mod p), (5.12)
the similarity with (5.10) is obvious, and ĝ(r,2) = �(r)g(r,1) (mod h(1), p),
ĥ(r,2) = �(r)h(r,1) (mod g(1), p).
The analogy of Algorithm 32 is then Algorithm 34, given in Figure 5.9. We
do not give an analogy of Algorithm 33, as the notation becomes quite intricate.
TO BE COMPLETEDcitation from MooreNorman
5.5. HENSEL LIFTING 225
5.5.4 Hybrid Hensel Lifting
In theory, quadratic lifting requires many fewer lifting stages than linear lifting,
and, though the individual steps are more expensive, quadratic lifting should
be faster. This is borne out by simple asymptotic analysis: Proposition 68 says
that the cost of linear lifting to pk is O(k3), while Proposition 69 says that the
cost of quadratic lifting to p2
k0
is O((2k
0
)2). Since we expect k ⇡ 2k0 , quadratic
lifting seems to win.
Reality, though, is more complicated than this asymptotic analysis. Firstly
we are lifting to the least power of p, say pm, greater than 2M , where M is
the appopriate Landau–Mignotte bound. Secondly, number lengths are indeed
quantised into units of the length of the machine word, whereas p itself is nor-
mally small, often10 19. Thirdly, the cost of quadratic lifting is dominated by
that of the last step.
In practice, therefore, we tend to use a hybrid: lifting quadratically to begin
with, p, p2, p4, . . . p`, and then linearly, p2`, p4` . . . . There are two main variants
of this.
• Let ` be such that p` is the greatest power of p to fit in one machine word.
This means Algorithm 5, generally the most expensive step, is only ever
applied to polynomials with single-word coe�cients. Since ` may not be
a power of two, the last quadratic lift may be somewhat truncated.
• Let ` be the largest power of 2 such that pm can be reached economically
from p`. The following procedure is suggested. Let m0 be the greatest
power of 2 such that 4m0 m, and lift quadratically to pm0 . The next
steps depend on where m fits with respect to multiples of m0. The details
are given in [Abb88].
4m0 = m Two more quadratic lifts p
2m0 and p4m0 = pm.
4m0 < m 5m0 Linear lifting p2m0 , p3m0 , p4m0 and p5m0 � pm.
5m0 < m 6m0 Quadratic lifting to p2m0 , then linear lifting p4m0 and
p6m0 � pm.
6m0 < m 7m0 Linear lifting p2m0 , p3m0 , p4m0 , p5m0 , p6m0 and p7m0 �
pm.
7m0 < m 8m0 Quadratic lifting to p2m0 , then linear lifting p4m0 , p6m0
and p8m0 � pm.
The choice between these, and the fine-tuning of the choices in the second
option, are very dependent on details of the implementation in practice.
Further details, and a “balanced factor tree” approach, are given in [vzGG99,
§15.5].
TO BE COMPLETED
1019 is the seventh odd prime: see Observation 16.
226 CHAPTER 5. P -ADIC METHODS
Figure 5.10: Algorithm 35: Combine Modular Factors
Algorithm 35 (Combine Modular Factors)
Input: A prime power pk, a monic polynomial f(x) over the integers, a fac-
torisation f =
Qr
i=1 fi modulo p
k
Output: The factorization of f into monic irreducible polynomials over the
integers.
ans := ;
M :=LandauMignotteBound(f)
S := {f1, . . . , fr}
for subsets T of S
h :=
Q
g2T g (mod p
k)
if |h| < M and h divides f
then ans := ans [ {h}
f := f/h
S := S \ T
M := min(M,LandauMignotteBound(f))
5.6 The recombination problem
However, as pointed out in section 5.4, the fact that we have a factorization of
f modulo pk, where pk > 2M , M being the Landau–Mignotte bound (Theorem
36), or some alternative (see page 297) on the size of any coe�cients in any
factor of f , does not solve our factorization problem. It is perfectly possibly
that f is irreducible, or that f does factor, but less than it does modulo p.
Assuming always that p does not divide lc(f), all that can happen when we
reduce f modulo p is that a factor of f that was irreducible over the integers
now factors modulo p (and hence modulo pk). This gives us the algorithm in
Figure 5.6, first suggested in [Zas69]. To guarantee irreducibility of the factors
found, the loop must try all subsets of T before trying T itself, but this still
leaves a lot of choices for the loop: see 1 below.
The running time of this algorithm is, in the worst case, exponential in r since
2r�1 subsets of S have to be considered (2r�1 since considering T e↵ectively also
considers S \T ). Let n be the degree of f , and H a bound on the coe�cients of
f , so the Landau–Mignotte bound is at most 2n(n+1)H, and k logp(2n+1(n+
1)H).
Many improvements to, or alternatives to, this basic algorithm have been
suggested since. In essentially chronological order, the most significant ones are
as follows.
1. [Col79] pointed out that there were two obvious ways to code the “for
subsets T of S” loop: increasing cardinality of T and increasing degree of
Q
g2T g. He showed that, subject to two very plausible conjectures, the
5.7. UNIVARIATE FACTORING SOLVED 227
average number of products actually formed with the cardinality ordering
was O(n2), thus the average running time would be polynomial in n.
2. [LLJL82] had a completely di↵erent approach to algorithm 35. They
asked, for each d < n, “given f1 2 S, what is the polynomial g of de-
gree d which divides f over the integers and is divisible by f1 modulo
pk?”. Unfortunately, answering this question needed a k far larger than
that implied by the Landau–Mignotte bound, and the complexity, while
polynomial in n, was O(n12), at least while using classical arithmetic. This
paper introduced the ‘LLL’ lattice reduction algorithm, which has many
applications in computer algebra and far beyond.
3. [ABD85] showed that, by a combination of simple divisibility tests and
“early abort” trial division (Proposition 55) it was possible to make dra-
matic reductions, at the time up to four orders of magnitude, in the con-
stant implied in the statement “exponential in r”.
4. [ASZ00] much improved this, and the authors were able to eliminate whole
swathes of possible T at one go.
5. [vH02] reduces the problem of finding T to a ‘knapsack’ problem, which, as
in method 2, is solved by LLL, but the lattices involved are much smaller
— of dimension r rather than n. At the time of writing, this seems to
be the best known method. His paper quoted a polynomial of degree
n = 180, with r = 36 factors of degree 5 modulo p = 19, but factoring as
two polynomials of degree 90 over the integers. This took 152 seconds to
factor.
Open Problem 23 (Evaluate [vH02] against [ASZ00]) How does the fac-
torization algorithm of [vH02] perform on the large factorizations successfully
solved by [ASZ00]? Clearly the algorithm of [vH02] is asymptotically faster, but
where is the cut-o↵ point? Note also that [vH02]’s example of x128 � x112 +
x80� x64 + x48� x16 +1 is in fact a disguised cyclotomic polynomial (as shown
by the methods of [BD89]), being
�
x240 � 1
� �
x16 � 1
�
(x80 � 1) (x48 � 1) =
Y
1 k < 15
gcd(k, 15) = 1
⇣
x16 � e2⇡ik/15
⌘
.
5.7 Univariate Factoring Solved
We can put together the components we have seen to deduce an algorithm
(Figure 5.11) for factoring square-free polynomials over Z[x]. We have chosen to
use the Cantor–Zassenhaus method (Section 5.3.2) for the modular factorisation,
as most implementers today do, simply because it gives the information needed
for the Musser test.
228 CHAPTER 5. P -ADIC METHODS
Figure 5.11: Overview of Factoring Algorithm
Algorithm 36 (Factor over Z)
Input: A primitive square-free f(x) 2 Z[x]
Output: The factorization of f into irreducible polynomials over the integers.
p1 := find_prime(f);
F1 :=Corollary 17(f, p1)
if F1 is a singleton
return f
S :=AllowableDegrees(F1)
for i := 2, . . . , 7 #7 from Observation 16
pi := find_prime(f);
Fi :=Corollary 17(f, pi)
S := S\AllowableDegrees(Fi)
if S = ;
return f
(p, F ) :=best({(pi, Fi)}); #‘best’ in terms of fewest factors
Complete factorization modulo p by Algorithm 29
F :=Algorithm 31(f, F, p, logp(2LM(f)))
return Algorithm 35(plogp(2LM(f)), f, F )
find_prime(f) returns an odd prime p, generally as small as possible, such
that f remains square-free modulo p. AllowableDegrees(F ) returns the set of
allowable proper factor degrees from a distinct degree (not needing a complete)
modular factorization.
Algorithm 35 can be replaced by any of improvements 1–5 on pages 226–227.
5.8. MULTIVARIATE FACTORING 229
Open Problem 24 (Better Choice of ‘Best’ Prime) In algorithm 36, we
picked the ‘best’ prime and the corresponding factorization. In principle, it is
possible to do better, as in the following example, but the author has seen no
systematic treatment of this.
Example 28 Suppose fp factors as polynomials of degree 2, 5 and 5, and fq
factors as polynomials of degree 3, 4 and 5: in terms of number of factors p and
q are equivalent. The factorization modulo p implies that the allowable degrees
of (proper) factors are 2, 5, 7 and 10. The factorization modulo q implies that
the allowable degrees of (proper) factors are 3, 4, 5, 7, 8 and 9. Hence the only
possible degrees of proper factors over the integers are 5 and 7. The factorization
modulo p can yield this in two ways, but the factorization modulo q can only
yield this by combining the factors of degrees 3 and 4. Hence we should do this,
and lift this factorization of fq, rather than the complete factorization.
Open Problem 25 (Low-degree Factorization) Although no major system
to the author’s knowledge implements it, it would be possible to use the tools we
have described to implement an e�cient procedure to find all factors of limited
total degree d. This would be e�cient for three reasons:
• the Cantor–Zassenhaus algorithm (section 5.3.2) need only run up to max-
imum degree d;
• it should be possible to use smaller bounds on the lifting
• The potentially-exponential recombination process need only run up to total
degree d. In particular, if d = 1, no recombination is needed.
At the moment, the user who only wants low-degree factors has to either program
such a search, or rely on the system’s factor, which may waste large amounts
of time looking for high-degree factors, which the user does not want.
In fact, [Gre15] described a library implemented in Mathemagix, which finds
the factors of degree d of a polynomial f in time which is polynomial in d
and the sparse bit size (Definition 27) of f . This is a much more powerful claim
than that made above, and relies on key results [Len99a, Len99b] that state,
essentially, that if a non-cyclotomic (Definition 115) polynomial g of degree
d divides f1 + f2, and there is a su�ciently large gap between the degrees
appearing in f1 and f2, then g must divide both f1 and f2 separately.
5.8 Multivariate Factoring
Just as in section 4.3, we can use “evaluating y at the value v”, i.e. working
modulo (y � v), as an analogue of working modulo p. Just as in section 4.3,
having worked modulo (y � v), we can lift this to (y � v)2 and so on. There is
a fairly obvious generalisation of Algorithm 31, given in Algorithm 37, where
we are given f(x, y, x1, . . . , xm) 2 Z[x1, . . . , xm, y, x] = R[y, x], a value v 2 Z
and the factorization of f(x, v, x1, . . . , xm) 2 R[x], and we wish to lift this to a
230 CHAPTER 5. P -ADIC METHODS
Figure 5.12: Algorithm 37
Algorithm 37 (Multivariate Hensel Lifting (Linear version))
Input: f, g
(1)
1 , . . . , g
(1)
n , v, k with f 2 R[y][x] primitive and ⌘
Q
g
(1)
i (mod (y�
v))
Output: g
(k)
1 , . . . , g
(k)
n with f ⌘
Q
g
(k)
i (mod (y � v)k)
for j := 1 . . . n
g
(1)
j :=
lc(f)
lc(g
(1)
j
)
g
(1)
j
F := lc(f)n�1f #leading coe�cients imposed
for j := 1 . . . n
g
(r)
j , h
(r)
j := Algorithm 5(g
(1)
j ,
Q
i 6=j g
(1)
i ) in R[x]
for i := 2 . . . k
� :=
F �Qj gj (mod (y � v)i)
(y � v)i�1
if � = 0
then break #True factorization discovered
for j := 1 . . . n
g
(c)
j := � ⇤ h
(r)
j (mod ((y � v), g
(1)
j ))
gj := gj + (y � v)i�1g(c)j
for j := 1 . . . n
gj := pp(gj) #undo the imposed leading coe�cients
if f 6=Qj gj
then return “bad evaluation”
return (g1, . . . , gn)
factorization modulo (y � v)k. In practice, k = 1 + degy(f), for then this lifted
factorization should be the true factorization in R[y, x].
This is linear lifting: there are also analogies of quadratic lifting (Algorithm
33) and hybrid lifting (section 5.5.4),but in practice these are rarely imple-
mented, as in the multivariate setting most of the cost is in the last lift, as the
polynomials have more and more terms as we lift.
5.8.1 A “Good Reduction” Complexity Result
Of course, it is possible that v is a “bad reduction” for f , in that f(x, v, x1, . . . , xn)
factors more than f(x, y, x1, . . . , xn) does. Unlike the modulo p case, where we
saw in Example 27 that this might always happen, in practice11 these cases
11This can be proved for reductions to two or more variables, in the sense that the “bad
reduction” values satisfy polynomials of degree as most d2 � 1 (characteristic 0, [Rup86]) or
12d6 (characteristic p, [Kal95]), where d is the degree of the polynomial in the variables not
being reduced.
5.8. MULTIVARIATE FACTORING 231
are rare. Indeed, if m � 1, i.e. we are factoring trivariates or beyond, we can
e�ciently pick a good reduction, by the following result.
Theorem 44 ([vzG85, Theorem 4.5]) Let f be a representation of a multi-
variate polynomial. Then there is a reduction from the problem of finding the
factorisation shape of f (Definition 90) to that of finding the factorisation shape
of bivariates, which is polynomial in the size and degree of f , for any of the sizes
in (2.8).
5.8.2 A Sparsity Result
We have already seen (section 2.3.7) that the complexity questions associated
to sparse polynomials are challenging, and indeed
xp � 1 = (x� 1)(xp�1 + xp�2 + · · ·+ x+ 1) (5.13)
shows that a sparse univariate polynomial can have completely dense factors.
There are two possible generalisations of (5.13) to multivariates:
(x1 . . . xn)
p � 1 =
(x1 . . . xn � 1)
�
(x1 . . . xn)
p�1 + (x1 . . . xn)
p�2 + · · ·+ x1 . . . xn + 1
�
;
(5.14)
n
Y
i=1
(x
p
i � 1) =
n
Y
i=1
(xi � 1)
!
n
Y
i=1
(x
p�1
i + x
p�2
i + · · ·+ xi + 1)
!
, (5.15)
If we write these equations as f = gh, and if we write tf for the number of
monomials in the expanded12 representation of f , then (5.14) has tf = 2 and
th = p, as in the univariate case, and (5.15) has tf = 2
n and th = p
n — in
particular if n = p, th is not polynomial in tf , being t
log2 log2 tf
f . Might there be
still worse examples, say where tf was polynomial in n and the degree, but th
was exponential. It has recently been shown that not.
Theorem 45 ([DdO14, Theorem 1]) If f , a polynomial in n variables, of
degree at most d in each variable, factors as f = gh then
tg = O
⇣
max
⇣
t
O(log tf log log tf )
f , d
O(log d)
⌘⌘
. (5.16)
5.8.3 The Leading Coe�cient Problem
It seems too good to be true, that the multivariate generalization of Hensel
lifting needed no new concepts. While this is true as it stands, there is a major
snag in practice with Algorithm 37, and that is the cost of imposing leading
coe�cients. In the univariate case, all this meant was that we had to lift very
slightly further, but in the multivariate case we are lifting factors of F , which
may well have many more terms than f . Solving this problem was (one of) the
major contributions of the EEZ algorithm proposed in [Wan78], which can best
232 CHAPTER 5. P -ADIC METHODS
Algorithm 38 (Wang’s EEZ Hensel Lifting)
Input: f 2 Z[x1, . . . , xn, x] squarefree and primitive (no integer common fac-
tor)
Output: Factorization of f over Z
Write f =
Pd
i=0 ci(x1, . . . , xn)x
i
Recursively, factor cd(x1, . . . , xn) as ⌦F
e1
1 . . . F
ek
k
Choose integers a1, . . . , an such that (see [Wan78, p. 1218] for feasibility)
1. cd(a1, . . . , an) 6= 0
2. f (0)(x) := f(a1, . . . , an, x) is square-free
3. Each Fi(a1, . . . , an) is divisible by a prime pi, not dividing any
Fj(a1, . . . , an) : j < i, ⌦ or cont(f
(0)).
Factor f (0)(x) = c
Qm
i=1 f
(0)
i (x)
[In practice, repeat the previous steps a few times, to ensure m minimal, and
hence reduce the possibility of bad reduction]
#Then f =
Qm
i=1 fi where fi(a1, . . . , an, x) = f
(0)
i (x).
Use the pi to distribute the Fi among the leading coe�cients of the fi,
to get hi but with the correct leading coe�cient
For k := 1 . . . n
#f(x1, . . . , xk�1, ak, . . . , an, x) =
Q
hi(x1, . . . , xk�1, ak, . . . , an, x)
#We know what the leading coe�cients are
Use a variant of Algorithm 37 to lift this to
f(x1, . . . , xk, ak+1, . . . , an, x) =
Q
hi(x1, . . . , xk, ak+1, . . . , an, x)
be described as “discovered leading coe�cients” rather than “imposed leading
coe�cients”.
Example 29 ([Wan78, p. 1218]) Consider13
f(x, y, z) = (4z2y4 + 4z3y3 � 4z4y2 � 4z5y)x6 + · · · .
The leading coe�cient factors as 4yz2(y + z)2(y � z). A suitable evaluation is
(y = �14, z = 3) when
F1 = y ! �14
|{z}
p1=7
;F2 = z ! 3
|{z}
p2=3
;F3 = y + z ! �11
|{z}
p3=11
;F4 = y � z ! �17
|{z}
p4=17
.
f(x,�14, 3) = f (0)1 f
(0)
2 f
(0)
3 = (187x
2� 23)(44x2 +42x+1)(126x2� 9x+28), so
f
(0)
1 has p3p4, f
(0)
2 has 4p3 and f
(0)
3 has p1p
2
2. Hence the leading coe�cients, are
12The theory and implementations of this chapter have all been in terms of recursive repre-
sentations, but the theory appears to be better expressed in terms of expanded representations.
13There is a typographical error: see http://staff.bath.ac.uk/masjhd/JHD-CA/
Wangp1220.html.
http://staff.bath.ac.uk/masjhd/JHD-CA/Wangp1220.html
http://staff.bath.ac.uk/masjhd/JHD-CA/Wangp1220.html
5.9. OTHER APPLICATIONS 233
in fact (y+z)(y�z), �4(y+z) and �yz2. The lift, with these leading coe�cients,
rather than the whole of 4z2y4 + 4z3y3 � 4z4y2 � 4z5y, imposed, works fine.
The polynomial f has 88 terms when expanded, whereas if we imposed leading
coe�cients in the style of algorithm 37, we would be lifting against an F with
530 terms.
5.9 Other Applications
5.9.1 Factoring Straight-LIne Programs
This is discussed in [Kal89b], with Monte Carlo algorithms. For Q[x1, . . . , xn],
his algorithm is polynomial in the degree, the size of the straight-line program
determining the input, the size of the numerators and common denominator
of the coe�cients and the logarithm of the error probability. Th eoutputs are
straight-line programs for the factors. These can then be converted into the
standard sparse representation by means of algorithms such as Proposition 9.
Forthermore, since these algorithms can be set to fail if asked to convert poly-
nomials with more than T terms, we have a polynomial-time algorithm that
returns the usual sparse format factors, or straight0lkine prorams if these would
have more than T terms in sparse format.
5.9.2 p-adic Greatest Common Divisors
The entire p-adic/Hensel construction is applicable to the computation of great-
est common divisors as well.
5.9.2.1 Univariate Greatest Common Divisors
Since it is practically as cheap to use the largest possible single word prime as it
is to use a small one, and the probability that p is bad seems to be proportional
to 1/p (Observation 7), we take one such large p (possibly a second just to check
that we don’t have bad reduction), and compute hp = gcd(fp, gp). This gives
us a factorization fp = hpfp. Since we are not sure of the leading coe�cient,
we will impose gcd(lc(f), lc(g)) as the leading coe�cient of hp and lc(f) as
that of fp. We then lift this (Algorithm 31 or 33, or hybrid: section 5.5.4) to
a factorization of gcd(lc(f), lc(g))f = h
(k)
p f
(k)
p for a suitable k (as in section
4.2.1), interpret h
(k)
p as a polynomial over the integers, make it primitive, and
check that it divides f and g.
5.9.2.2 Multivariate Greatest Common Divisors
Assume we have two polynomials f, g 2 R[y][x] (where typicallyR = Z[1, . . . , xn])
and we wish to compute their greatest common divisor. We may as well (The-
orem 6) assume that they are primitive with respect to x. We take a value v
(normally in Z or the base ring of R) for y (possibly a second just to check
234 CHAPTER 5. P -ADIC METHODS
that we don’t have bad reduction), and compute hy=v = gcd(fy=v, gy=v). This
gives us a factorization fy=v = hy=vfy=v. Since we are not sure of the leading
coe�cient, we will impose gcd(lcx(f), lcx(g)) as the leading coe�cient of hy=v
and lcx(f) as that of fy=v. We then lift this (generally Algorithm 37) to a
factorization of gcd(lcx(f), lc(g))f = h
(k)
y=vf
(k)
y=v for a suitable k, interpret h
(k)
y=v
as a polynomial in R[y][x], make it primitive, and check that it divides f and g.
5.9.3 p-adic Gröbner Bases
This section is largely taken from [Win88]. The importance of, and di�culties
with, moving from computations over Q to finite domains are described in
section 4.6. In particular we note the definition of bad reduction and the fact
(Lemma 11) that there are only finitely many primes of bad reduction.
Suppose we are given F as input, and wish to compute its Gröbner base
G. We can choose a prime p (it would certainly be a good idea to choose a p
not dividing any leading coe�cient of F ), and compute a Gröbner base G(1)
modulo p = p1. In the matrix formulation of Gröbner base theory (section
3.3.5), it follows that there exist matrices X, Y and R such that
X(1).F = G(1)
Y (1).G(1) = F
R(1).G(1) = 0
(mod p) (3.360)
where F and G(1) are the matrix versions of F and G(1). We might then be
tempted to lift these equations, by analogy with (5.9), but there is a snag.
Since G(1) is a Gröbner base, it reduces F to zero, and hence the monomials
needed to write Y (1).G(1) = F are at most those of F (or possibly lower ones that
don’t actually occur in a sparse representation of F ). Similarly R(1).G(1) = 0 is
the statement that all S-polynomials reduce to 0, and hence the highest degree
that any xi can need is its highest degree in G
(1). However, there is no such
tidy bound for X(1). Fortunately [Win88, Theorem 2] shows that we only need
to lift the other two, whose degrees are bounded.
So we consider
Y (2).G(2) = F
R(2).G(2) = 0
(mod p2) (5.17)
write G(2) = G(1) + pĜ(2) etc. as in (5.9), divide through by p and get
Y (1).Ĝ(2) + Ŷ (2).G(1) = F�Y
(1).G(1)
p
R̂(2).G(1) +R(1).Ĝ(2) = 0
(mod p) (5.18)
Just as (3.36) did not guarantee uniqueness, these equations don’t necessarily
have a unique solution. Indeed G is not unique, since we can multiply by any
rational (without a p part), so we need to impose that G and every G(k) is
monic. Then the Ĝ(2) part is unique [Win88, Theorem 4] as long as p is lucky.
This lifting process can be continued until we reach a su�ciently high power
of p, at which point we can convert the coe�cients from being modulo pk to Q
5.9. OTHER APPLICATIONS 235
by the techniques of Algorithm 25. But what is a su�ciently high power? This
is somewhat of a conundrum, unsolved in [Win88]. We can try taking a leaf
out of section 4.2.5, and say that, if the solution stabilises, we will verify if it is
correct.
Stabilisation This is harder to check than in section 4.2.5, since we are looking
for rational not integer coe�cients. Hence it is not correct to check that
Ĝ(k) = 0: generally it never will be. Instead, we have to check that the
reconstitutions by Algorithm 25 are equal. Doing this e�ciently, rather
than reconstructing completely and discovering that the two are not equal,
is an interesting programming task.
Verification We need to check two things: F
⇤!G0 and that G is a Gröbner
base, i.e. that every S(gi, gj)
⇤!G0.
Unluckiness of the prime p can manifest itself in one of three ways.
1. Failure to lift. [Win88, Theorem 4] only guarantees a lift (unique for every
Ĝ(k)) if p is lucky. In some sense, this is the best case, since we at least
know that p is unlucky.
2. Failure to stabilise — we just compute forever.
3. Failure to verify — if we terminate in a genuine Gröbner base G over
the rationals, but F does not reduce to 0, then indeed p must have been
unlucky. However, if the reconstructed G is not a Gröbner base over the
rationals, we do not know whether p was unlucky, or the stabilisation was
an accident and computing further will give the right result.
Open Problem 26 (p-adic Gröbner bases) Convert the above, and [Win88],
into a genuine algorithm for computing Gröbner bases p-adically. In particular,
is it possible to use the “majority voting” idea of [IPS11] (footnote 14 on page
199) to improve the chances of having good reduction.
5.9.4 p-adic determinants
[ABM99] describes a method for computing determinants which is “largely p-
adic”, in the sense that “most of” det(M) is computed via a p-adic solution
to some linear systems, the l.c.m. of whose denominators is typically det(M),
and generally a large factor D of det(M), and then computing det(M)/D by a
Chinese Remainder approach. The expected complexity is
O(n4 + n3(log n+ s)2),
where s is the length of the entries.
236 CHAPTER 5. P -ADIC METHODS
5.10 Conclusions
In this and the previous chapter, we have seen two di↵erent methods for com-
puting in “large” domains via “small” ones: the modular method (Figure 4.1)
and the p-adic/Hensel method (Figure 5.1). Both su↵er from “bad reduction”
in that the calculation in the small domain may not be a faithful representa-
tion of the calculation in the large domain. Although the p-adic/Hensel method
only needs one small domain, we may compute more than one to minimise the
probability of bad reduction. How do the two fare on various examples?
Linear Algebra Section 4.1 presented the modular approach for computing
the determinant and other tasks of linear algebra. [Dix82] presents a p-
adic approach for solving Ax = b which he claims to be somewhat faster
— O(n3 log2 n) rather than O(n4). However he admits that the modular
approach is probably better for determinants themselves: a belief contra-
dicted by [ABM99]. TO BE COMPLETED
gcd computation TO BE COMPLETED
Factorization Here p-adic methods are basically the only choice.
Gröbner bases Here the p-adic methods of Section 5.9.3 have so many un-
solved issues that the modular methods of Section 4.6 seem the only real-
istic possibility.
In sum, both have their unique strengths, and can also compete in several areas.
Chapter 6
Algebraic Numbers and
Functions
Definition 95 An algebraic number is a root of a polynomial with integer co-
e�cients, i.e. ↵ such that f(↵) = 0 : f 2 Z[t]. The set of all algebraic numbers
is denoted by A. An ↵ 2 A \ Q will be referred to as a non-trivial algebraic
number. If f is monic, we say that ↵ is an algebraic integer.
Allowing f 2 Q[t] gives us no advantage, since we can clear denominators
without a↵ecting whether f(↵) = 0. Of course, “monic” only makes sense over
Z[t].
Notation 30 Let A stand for the set of all algebraic numbers, so that Q ⇢
A ⇢ C.
Definition 96 An algebraic function is a root of a polynomial with polyno-
mial coe�cients, i.e. ↵ such that f(↵) = 0 : f 2 Z[x1, . . . , xn][t]. If ↵ /2
Q(x1, . . . , xn), we will say that ↵ is a non-trivial algebraic function.
As above, allowing f 2 Q(x1, . . . , xn)[t] gives us no advantage, since we can
clear denominators without a↵ecting whether f(↵) = 0.
In either case, if f is irreducible, there is no algebraic way of distinguishing
one root of f from another. For example,
p
2 ⇡ 1.4142 and �
p
2 ⇡ �1.4142 are
both roots of t2 � 2 = 0, but distinguishing them involves operations outside
the language of fields, e.g. by saying that
p
2 2 [1, 2] but �
p
2 2 [�2,�1]: see
Section 6.5.
Notation 31 Let f(t) be the polynomial defining ↵, and write f =
Pn
i=0 ait
i
with an 6= 0. Until Section 6.4, we will assume that f is irreducible.
Definition 97 If f in Definition 95 or 96 is irreducible and primitive, we say
that f is the minimal polynomial of ↵. Strictly speaking, we have only defined
f up to associates, but this is usually ignored in theory, though it can be tedious
237
238 CHAPTER 6. ALGEBRAIC NUMBERS AND FUNCTIONS
Figure 6.1: Non-candidness of algebraics
> (sqrt(2)+1)*(sqrt(2)-1);
1/2 1/2
(2 – 1) (2 + 1)
> expand(%);
1
in practice: see the description of the canonicalUnitNormal property in [DT90]
for a pragmatic solution.
It is perfectly possible to work with ↵ as if it were another variable, but replacing
↵n by
⇣
Pn�1
i=0 ai↵
i
⌘
/an. A simple example of this is shown in the Maple session
in Figure 6.1, which also demonstrates the lack of candidness (i.e. the expression
appears to contain
p
2, but doesn’t really) if we don’t expand.
Observation 17 Introduction of denominators of an can be very tedious in
practice, so several systems will work internally with monic f . For example,
Maple itself does not, but its high-performance algorithms [vHM02, vHM04] so.
However, mere use of a polynomial-style expand command, which gave us canon-
ical forms in the case of polynomials, is insu�cient to give us even normal forms
when we allow algebraic numbers, as seen in Figure 6.2. Here, simplify has
Figure 6.2: Algebraic numbers in the denominator
> (sqrt(2)+1)-1/(sqrt(2)-1);
1/2
2 + 1 – 1
——–
1/2
2 – 1
> expand(%);
1/2
2 + 1 – 1
——–
1/2
2 – 1
> simplify(%);
0
to “clear denominators”. This process, which may have seemed quite obscure
when it was introduced at school, is, from our present point of view, quite easy
6.1. REPRESENTATIONS OF FINITE FIELDS 239
to understand. Consider an expression
E :=
p(↵)
q(↵)
(6.1)
where p and q are polynomials in ↵, of degree less than n. If we apply the
extended Euclidean algorithm (Algorithm 5) to q(t) and f(t), we get h(t) =
gcd(f, q) and polynomials c, d such that cf + dq = h. Assuming h = 1 (as is
bound to be the case if f is irreducible), we see that
E =
p(↵)
q(↵)
=
d(↵)p(↵)
d(↵)q(↵)
=
d(↵)p(↵)
c(↵)f(↵) + d(↵)q(↵)
(since f(↵) = 0)
= d(↵)p(↵),
and this is purely a polynomial in ↵.
In the case of Figure 6.2, f(t) = t2� 1 and p(t) = 1, q(t) = t� 1. The gcd is
indeed 1, with f = �1, d = t+ 1. Hence 1p
2�1
=
p
2+1
(
p
2+1)(
p
2�1)
=
p
2 + 1. Such
is the simplicity of this approach that it is usual to insist that f is irreducible,
however in section 6.4 we will explain an alternative approach.
OOnce we have agreed to clear denominators, we are representing algebraic
numbers or functions as elements of Q(x1, . . . , xn)[↵ � 1, . . . ,↵m] where the
degrees in ↵i are less than the degree of the corresponding minimal polynomial.
In practice it may be more convenient to use a common denominator approach,
i.e. store an algebraic number as n/d where n 2 Z[x1, . . . , xn][↵ � 1, . . . ,↵m]
and d 2 Z[x1, . . . , xn] with no common factor between d and all the coe�cients
in n.
6.1 Representations of Finite Fields
If we are working in a finite field F , and the user explicitly calls for an algebraic
extension, e.g. ↵ := RootOf(x2 + x+ 1), then we are almost certainly going to
represent ↵ as above, noting that there is actually no need for a denominator
as we are working over a field.
But often, as in Figure 6.3, the user will just call for a finite field K := Fpn
without specifying the generator, since all finite fields of the same size pn are
isomorphic: in other words there is only one abstract field.
6.1.1 Additive Representation
The obvious thing to do is to proceed as Maple has done, and pick an irreducible
polynomial f of the right degree (assuring ourselves that it is irreducible modulo
240 CHAPTER 6. ALGEBRAIC NUMBERS AND FUNCTIONS
Figure 6.3: An unspecified field in Maple
> K:=GF(2,4);
4
K := Z[2] [T] /
> g:=K:-random();
3
g := 1 + T + T
> K:-‘*‘(g,g);
3
1 + T
p, and not just over the integers, of course).
Maple picks the polynomial at random, but it may be more e�cient to choose
a polynomial that is sparse, and also one where the terms other than xn are
of as low a degree as possible, to reduce the impact of carries. The last time
the author implemented finite fields, he essentially1 started xn + 1, xn + 2, . . . ,
xn+(p�1), xn+x+1, xn+x+2, . . . until he found an irreducible polynomial.
In this representation, addition takes n coe�cient operations, and multipli-
cation, at least if done näıvely, O(n2). Karatsuba-style multiplication (Excursus
B.3) may well be worthwhile if n is at all large2
If n is not prime, we might build the field in stages, e.g. if n = n1n2 (with
the ni relatively prime) we can writeK := Fp(↵,�) where ↵ and � have minimal
polynomials of degree n1, n2 respectively. This is generally only done if the user
asks for it, though there is no reason why a system couldn’t do it automatically.
6.1.2 Multiplicative representation
An alternative is to note that the multiplicative groups of a finite field (i.e. all
elements except 0) is always cyclic, and to pick, as well as f , a generator g of
that group, i.e. such that, while gp
n�1 = 1, no previous power of g is 1. See
Figure 6.4, where we have verified that g3 and g5 are not 1 (which is in fact
su�cient)
0 is then a special case, but all other elements � are just stored as the
integer m such that � = gm. Multiplication is then just one integer addition
(gm) · (gm0) = gm+m0 , with reduction if m+m0 � pn � 1.
The downside of this representation is that addition is more complicated,
and in general requires conversion to additive form (expensive) and back (very
expensive, known as the “discrete logarithm problem”). For large fields, the cost
1There were some optimisations to skip patently reducible polynomials
2A general consensus for integer multiplication is n � 16, so one would expect the same
to hold here, but the author knows of no study here. Since, once the field is built, one is
probably going to do many operations with the same n, it might be worth the field-building
operation doing some work to determine the best multiplication, since if n is not an exact
power of 2, the Karatsuba method needs padding or other adaptations.
6.2. REPRESENTATIONS OF ALGEBRAIC NUMBERS 241
Figure 6.4: Primitive elements in Maple
> g:=K:-PrimitiveElement();
3
g := 1 + T + T
> K:-‘^‘(g,3);
2 3
T + T
> K:-‘^‘(g,5);
2
T + T
of addition, if it is required, therefore rules out the multiplicative representation.
For smaller fields, though, there is a useful trick.
We first note that gm + gm
0
= gm
⇣
1 + gm
0�m
⌘
, so the general problem of
addition reduces to the problem of adding 1.
Definition 98 Define Zg, generally known as the Zech logarithm3 as the func-
tion such that gZg(m) = 1 + gm, with Zg(m) being a special symbol �1 if
gm = �1.
Then gm+gm
0
= gm+Zg(m
0�m): a subtraction, a table look up, and an addition
(with range reduction as in the case of multiplication).
6.2 Representations of Algebraic Numbers
The calculations in Figure 6.2 would have been valid whether
p
2 = 1.4142 . . . orp
2 = �1.4142 . . ., but often we care about which specific root of a polynomial
we mean. For quadratics with real roots there is no problem, we (generally
implicitly) use
p
n to mean ↵ such that ↵2 � n and ↵ � 0. Once we move
beyond quadratics, life is more complicated: as pointed out at (3.10), the three
real roots of x3 � x can only be computed via complex numbers. A slight
perturbation makes thinsg worse: the roots of x3 � x� 1
1000
, numerically
1.000499625,�0.001000001000,�.9994996245 (6.2)
3Named after Julius Zech, author of [Zec49].
242 CHAPTER 6. ALGEBRAIC NUMBERS AND FUNCTIONS
Figure 6.5: An evaluation of Maple’s RootOf construct
Digits := 100;
100
evalf(%);
1.00049962549918118457337015375160406664441926071961647122676591\
7631800401921686516946533704564198509
are given algebraically by
n
1
60
3
p
108 + 12 i
p
11999919 + 20 1
3
p
108+12 i
p
11999919
,
� 1
120
3
p
108 + 12 i
p
11999919� 10 1
3
p
108+12 i
p
11999919
�
i
p
3
20
✓
1
6
3
p
108 + 12 i
p
11999919� 200 1
3
p
108+12 i
p
11999919
◆
,
� 1
120
3
p
108 + 12 i
p
11999919� 10 1
3
p
108+12 i
p
11999919
+
i
p
3
20
✓
1
6
3
p
108 + 12 i
p
11999919� 200 1
3
p
108+12 i
p
11999919
◆
o
,
(6.3)
and these cannot be expressed in terms of real radicals, i.e. the i is intrinsic.
We can avoid this by means of a representation in terms of RootOf constructs:
RootOf(f(z), index=1), . . . ,RootOf(f(z), index=3), (6.4)
or by saying “near” which value these roots are, as well as the polynomial:
RootOf(f(z), 1.0), . . . ,RootOf(f(z),�1.0). (6.5)
We should be careful of the di↵erence between (6.2) and (6.5): the first specifies
three floating-point numbers, but the second specifies three precise algebraic
numbers, which can in theory (and in practice) be evaluated to arbitrary preci-
sion, as shown in Figure 6.5.
6.3 Factorisation with Algebraic Numbers
If we insist, as we do in this section, that the f in Definition 95 or 96 be irre-
ducible, the obvious question is “how do we ensure this?” If f 2 Z[x1, . . . , xn, t],
then the theory of Chapter 5 will factor f . However, suppose we have defined ↵
by f , and then wish to define � by g 2 Z[x1, . . . , xn,↵, t]. How do we ensure g is
irreducible? Indeed, even if g does not involve ↵ as such, how do we ensure that
g is irreducible in Z[x1, . . . , xn,↵, t], even if it is irreducible in Z[x1, . . . , xn, t]?
This is not just an abstract conundrum. Consider f = t2 � 2, g = t2 � 8.
Both are irreducible in Z[t]. But if ↵ is defined to be a root of f , then g factors
as (t� 2↵)(t+ 2↵), corresponding to the fact that
p
8 = 2
p
2.
6.4. THE D5 APPROACH TO ALGEBRAIC NUMBERS 243
For the rest of this section we will consider the case of factoring over algebaric
number fields, i.e. polynomials in Q[↵1, . . . ,↵m, x1, . . . , xn], where each ↵i is
defined by a minimal polynomial in Z[↵1, . . . ,↵i�1][t]. The more general case
of algebraic function fields is discussed in [DT81].
The first remark is that the methods of Section 5.3 extend to factoring over
algebraic extensions of the integers modulo p.
[Wan76, Len87]
TO BE COMPLETED
Open Problem 27 (Algebraic Numbers Reviewed) Reconsider the standard approach to factoring polynomials with algebraic number/function coefficients, as described above, in the light of recent progress in factoring polynomials with integer coefficients, notably [vH02].
6.4 The D5 approach to algebraic numbers
The techniques of the previous section, while they do guarantee irreducibility of the defining polynomials, are extremely expensive, and often overkill.

In fact, we have already seen$^4$ a technique that does not require factoring, and that is the Gianni–Kalkbrener algorithm (Algorithms 10, 11). If we consider again the example at equation (3.47), we see that $x$ satisfies $6 - 3x^2 - 2x^3 + x^5 = 0$. Over $\mathbf{Z}$, this already factors as $(x^2-2)(x^3-3)$, and $x^3-3$ is irreducible$^5$ over $\mathbf{Z}[\sqrt2]$, so the five possible values of $x$ are $\alpha$ with $\alpha^2-2 = 0$, $-\alpha$, $\beta$ with $\beta^3-3 = 0$, $\gamma$ with $\gamma^2 + \beta\gamma + \beta^2 = 0$ and $-\beta-\gamma$. For each of these there is a corresponding value of $y$, but it is hard to believe that this is simpler than
\[ \left\langle x^2-2,\, y^2-x \right\rangle \cup \left\langle x^3-3,\, y-x \right\rangle. \tag{3.48} \]
[DDDD85]
TO BE COMPLETED
TO BE COMPLETED [vHM16]
6.5 Distinguishing roots
In Section 3.5.5 we saw that there are essentially two ways of distinguishing the real roots of a polynomial $f$.

interval As well as asserting that $f(\alpha) = 0$, quote an interval $(a,b)$ such that it is known that $f$ has precisely one real root in $(a,b)$. Thus $\sqrt2 \approx 1.4142$ would be described as $(x^2-2, (1,2))$ for example. Of course, it could equally well be $(x^2-2, (0,2))$, or many other choices. But $(x^2-2, (-2,2))$ would be invalid (more than one root) and $(x^2-2, (2,3))$ would also be invalid (no roots).
$^4$This is not the historical development of the subject, for which see [DDDD85], but is more pedagogic from our current point of view.
$^5$Given that $x^3-3$ is irreducible over $\mathbf{Z}$, its irreducibility over $\mathbf{Z}[\sqrt2]$ follows from the fact that the exponents, 2 and 3, are relatively prime. But this is a very ad hoc argument.
Thom As well as asserting that $f(\alpha) = 0$, quote the signs (positive or negative) of $f'(\alpha)$, $f''(\alpha)$ etc. Here $\sqrt2 \approx 1.4142$ would be described as $(x^2-2, f' > 0, f'' > 0)$. Note that $(x^2-2, f' > 0, f'' < 0)$ does not describe a root.

If we want to describe the complex roots, the only known ways are variants of the interval method.

box As well as asserting that $f(\alpha) = 0$, quote intervals $(a,b)$ and $(c,d)$ such that $f$ has only one root in the box whose corners are $a+ci$, $b+ci$, $b+di$ and $a+di$.

circle As well as asserting that $f(\alpha) = 0$, quote a point $a+ci$ and a radius $r$ such that $\alpha$ is the only root of $f$ with $|\alpha - (a+ci)| < r$.

The two representations are to some extent exchangeable, as every box is contained in a circle and vice versa.
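Root isolation of this kind is routine in practice; a small illustration in Python/sympy (whose Poly.intervals, an illustrative stand-in for the systems discussed here, computes isolating intervals, and boxes when asked for all roots):

from sympy import symbols, Poly, Rational

x = symbols('x')
p = Poly(x**3 - x - Rational(1, 1000), x)
print(p.intervals())                          # one isolating interval per real root
print(Poly(x**2 + 1, x).intervals(all=True))  # adds boxes for the complex roots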
Chapter 7

Calculus

Throughout this chapter we shall assume that we are in characteristic zero, and therefore all rings contain $\mathbf{Z}$, and all fields contain $\mathbf{Q}$. The emphasis in this chapter will be on algorithms for integration. Historically, the earliest attempts at integration in computer algebra [Sla61] were based on rules, and attempts to "do it like a human would". These were rapidly replaced by algorithmic methods, based on the systematisation, in [Ris69], of work going back to Liouville. Recently, attempts to get 'neat' answers have revived interest in rule-based approaches, which can be surprisingly powerful on known integrals — see [JR10] for one recent approach. In general, though, only the algorithmic approach is capable of proving that an expression is unintegrable.

7.1 Introduction

We defined (Definition 37) the formal derivative of a polynomial purely algebraically, and observed (Proposition 2.3.6) that it satisfied the sum and product laws. We can actually make the whole theory of differentiation algebraic as follows.

Definition 99 A differential ring is a ring (Definition 8) equipped with an additional unary operation, referred to as differentiation and normally written with a postfix $'$, which satisfies two additional laws:
1. $(f+g)' = f' + g'$;
2. $(fg)' = fg' + f'g$.
A differential ring which is also a field (Definition 15) is referred to as a differential field.

Definition 37 and Proposition 2.3.6 can then be restated as the following result.

Proposition 70 If $R$ is any ring, we can make $R[x]$ into a differential ring by defining $r' = 0\ \forall r \in R$ and $x' = 1$.

Definition 100 In any differential ring, an element $\alpha$ with $\alpha' = 0$ is referred to as a constant. For any ring $R$, we write $R_{\mathrm{const}}$ for the constants of $R$, i.e. $R_{\mathrm{const}} = \{r \in R \mid r' = 0\}$. We will use $C$ to stand for any appropriate field of constants.

We will revisit this definition in section 8.4.

Proposition 71 In any differential field
\[ \left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}. \tag{7.1} \]
The proof is by differentiating $f = g\left(\frac{f}{g}\right)$ to get
\[ f' = g'\left(\frac{f}{g}\right) + g\left(\frac{f}{g}\right)' \]
and solving for $\left(\frac{f}{g}\right)'$. (7.1) is generally referred to as the quotient rule and can be given a fairly tedious analytic proof in terms of $\epsilon/\delta$, but from our present point of view it is an algebraic corollary of the product rule. It therefore follows that $K(x)$ can be made into a differential field.

Notation 32 (Fundamental Theorem of Calculus) (Indefinite) integration is the inverse of differentiation, i.e.
\[ F = \int f \iff F' = f. \tag{7.2} \]
The reader may be surprised to see a "Fundamental Theorem" reduced to the status of a piece of notation, but from the present point of view, that is what it is. We shall return to this point in section 8.3. The reader may also wonder "where did $dx$ go?", but $x$ is, from this point of view, merely that object such that $x' = 1$, i.e. $x = \int 1$. We should also note that we have no choice over the derivative of algebraic expressions$^1$.
Proposition 72 Let $K$ be a differential field, and $\theta$ be algebraic over $K$ with $p(\theta) = 0$ for some polynomial $p = \sum_{i=0}^n a_i z^i \in K[z]$. Then $K(\theta)$ can be made into a differential field in only one way: by defining
\[ \theta' = -\frac{\sum_{i=0}^n a_i' \theta^i}{\sum_{i=0}^n i a_i \theta^{i-1}}. \tag{7.3} \]
In particular, if the coefficients of $p$ are all constants, so is $\theta$.

The proof is by formal differentiation of $p(\theta) = 0$.

$^1$It would be more normal to write "algebraic functions" instead of "algebraic expressions", but for reasons described in section 8.1 we reserve 'function' for specific mappings (e.g. $\mathbf{C}\to\mathbf{C}$), and the proposition is a property of differentiation applied to formulae.

7.2 Integration of Rational Expressions

The integration of polynomials is trivial:
\[ \int \sum_{i=0}^n a_i x^i = \sum_{i=0}^n \frac{1}{i+1} a_i x^{i+1}. \tag{7.4} \]
Since any rational expression $f(x) \in K(x)$ can be written as
\[ f = p + \frac{q}{r} \quad\text{with}\quad \begin{cases} p, q, r \in K[x]\\ \deg(q) < \deg(r) \end{cases}, \tag{7.5} \]
and $p$ is always integrable by (7.4), we have proved the following (trivial) result: we will see later that its generalisations, Lemmas 12 and 13, are not quite so trivial.

Proposition 73 (Decomposition Lemma (rational expressions)) In the notation of (7.5), $f$ is integrable if, and only if, $q/r$ is.

$q/r$ with $\deg(q) < \deg(r)$ is generally termed a proper rational function, but, since we are concerned with the algebraic form of expressions in this chapter, we will say "proper rational expression".

7.2.1 Integration of Proper Rational Expressions

In fact, the integration of proper rational expressions is conceptually trivial (we may as well assume $r$ is monic, absorbing any constant factor in $q$):

1. perform a square-free decomposition (Definition 38) of $r = \prod_{i=1}^n r_i^i$;
2. factorize each $r_i$ completely, as $r_i(x) = \prod_{j=1}^{n_i} (x - \alpha_{i,j})$;
3. perform a partial fraction decomposition (Section 2.3.4) of $q/r$ as
\[ \frac{q}{r} = \frac{q}{\prod_{i=1}^n r_i^i} = \sum_{i=1}^n \frac{q_i}{r_i^i} = \sum_{i=1}^n \sum_{j=1}^{n_i} \sum_{k=1}^{i} \frac{\beta_{i,j,k}}{(x - \alpha_{i,j})^k}; \tag{7.6} \]
4. integrate this term-by-term, obtaining
\[ \int \frac{q}{r} = \sum_{i=1}^n \sum_{j=1}^{n_i} \sum_{k=2}^{i} \frac{-\beta_{i,j,k}}{(k-1)(x - \alpha_{i,j})^{k-1}} + \sum_{i=1}^n \sum_{j=1}^{n_i} \beta_{i,j,1} \log(x - \alpha_{i,j}). \tag{7.7} \]

From a practical point of view, this approach has several snags:

1. we have to factor $r$, and even the best algorithms from the previous chapter can be expensive;
2. we have to factor each $r_i$ into linear factors, which might necessitate the introduction of algebraic numbers to represent the roots of polynomials;
3. these steps might result in a complicated expression of what is otherwise a simple answer.

To illustrate these points, consider the following examples.
\[ \int \frac{5x^4 + 60x^3 + 255x^2 + 450x + 274}{x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 120}\,dx = \log(x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 120) \]
\[ = \log(x+1) + \log(x+2) + \log(x+3) + \log(x+4) + \log(x+5) \tag{7.8} \]
is pretty straightforward, but adding 1 to the numerator gives
\[ \int \frac{5x^4 + 60x^3 + 255x^2 + 450x + 275}{x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 120}\,dx = \frac{5}{24}\log(x^{24} + 72x^{23} + \cdots + 102643200000x + 9331200000) \]
\[ = \frac{25}{24}\log(x+1) + \frac{5}{6}\log(x+2) + \frac{5}{4}\log(x+3) + \frac{5}{6}\log(x+4) + \frac{25}{24}\log(x+5). \tag{7.9} \]
Adding 1 to the denominator is pretty straightforward,
\[ \int \frac{5x^4 + 60x^3 + 255x^2 + 450x + 274}{x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 121}\,dx = \log(x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 121), \tag{7.10} \]
but adding 1 to both gives
\[ \int \frac{5x^4 + 60x^3 + 255x^2 + 450x + 275}{x^5 + 15x^4 + 85x^3 + 225x^2 + 274x + 121}\,dx = 5\sum_{\alpha} \alpha\ln\left(x + \tfrac{2632025698}{289}\alpha^4 - \tfrac{2086891452}{289}\alpha^3 + \tfrac{608708804}{289}\alpha^2 - \tfrac{4556915}{17}\alpha + \tfrac{3632420}{289}\right), \tag{7.11} \]
where
\[ \alpha = \mathrm{RootOf}\left(38569z^5 - 38569z^4 + 15251z^3 - 2981z^2 + 288z - 11\right). \tag{7.12} \]
Hence the challenge is to produce an algorithm that achieves (7.8) and (7.10) simply, preferably gives us the second form of the answer in (7.9), but is still capable of solving (7.11). We might also wonder where (7.11) came from: the answer is at (7.26).

7.2.2 Hermite's Algorithm

The key to the method (usually attributed to Hermite [Her72], though informally known earlier) is to rewrite equation (7.7) as
\[ \int \frac{q}{r} = \frac{s_1}{t_1} + \int \frac{s_2}{t_2}, \tag{7.13} \]
where the integral on the right-hand side resolves itself as purely a sum of logarithms, i.e. is the $\sum_{i=1}^n \sum_{j=1}^{n_i} \beta_{i,j,1}\log(x - \alpha_{i,j})$ term. Then a standard argument from Galois theory shows that $s_1$, $t_1$, $s_2$ and $t_2$ do not involve any of the $\alpha_i$, i.e. that the decomposition (7.13) can be written without introducing any algebraic numbers. If we could actually obtain this decomposition without introducing these algebraic numbers, we would have gone a long way to solving objection 2 above.

We can perform a square-free decomposition (Definition 38) of $r$ as $\prod r_i^i$, and then a partial fraction decomposition to write
\[ \frac{q}{r} = \sum \frac{q_i}{r_i^i} \tag{7.14} \]
and, since each term is a rational expression and therefore integrable, it suffices to integrate (7.14) term-by-term. Now $r_i$ and $r_i'$ are relatively prime, so, by Bezout's Identity (2.13), there are polynomials $a$ and $b$ satisfying $ar_i + br_i' = 1$. Therefore
\begin{align}
\int \frac{q_i}{r_i^i} &= \int \frac{q_i(ar_i + br_i')}{r_i^i} \tag{7.15}\\
&= \int \frac{q_i a}{r_i^{i-1}} + \int \frac{q_i b r_i'}{r_i^i} \tag{7.16}\\
&= \int \frac{q_i a}{r_i^{i-1}} + \int \left(\frac{(q_i b/(i-1))'}{r_i^{i-1}} - \left(\frac{q_i b/(i-1)}{r_i^{i-1}}\right)'\right) \tag{7.17}\\
&= -\frac{q_i b/(i-1)}{r_i^{i-1}} + \int \frac{q_i a + (q_i b/(i-1))'}{r_i^{i-1}}, \tag{7.18}
\end{align}
and we have reduced the exponent of $r_i$ by one. When programming this method one may need to take care of the fact that, while $\frac{q_i}{r_i^i}$ is a proper rational expression, $\frac{q_i b}{r_i^{i-1}}$ may not be, but the excess is precisely compensated for by the other term in (7.18).

Hence, at the cost of a square-free decomposition and a partial fraction decomposition, but not a factorization, we have found the rational part of the integral, i.e. performed the decomposition of (7.13). In fact, we have done somewhat better, since the $\int \frac{s_2}{t_2}$ term will have been split into summands corresponding to the different $r_i$.

7.2.3 The Ostrogradski–Horowitz Algorithm

Although quite simple, Hermite's method still needs square-free decomposition and partial fractions. Horowitz [Hor69, Hor71] therefore proposed to computer algebraists the following method, which was in fact already known [Ost45], but largely forgotten$^2$ in the west. It follows from (7.18) that, in the notation of (7.14), $t_1 = \prod r_i^{i-1}$. Furthermore every factor of $t_2$ arises from the $r_i$, and is not repeated. Hence we can choose
\[ t_1 = \gcd(r, r') \quad\text{and}\quad t_2 = r/t_1. \tag{7.19} \]
Having done this, we can solve for the coefficients in $s_1$ and $s_2$, and the resulting equations are linear in the unknown coefficients. More precisely, the equations become
\[ q = s_1'\,\frac{r}{t_1} - s_1\,\frac{t_1' t_2}{t_1} + s_2 t_1, \tag{7.20} \]
where the polynomial divisions are exact, and the linearity is now obvious. The programmer should note that $s_1/t_1$ is guaranteed to be in lowest terms, but $s_2/t_2$ is not (and indeed will be 0 if there is no logarithmic term).

$^2$The author had found no references to it, but apparently it had been taught under that name.
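Before turning to the logarithmic part, here is a minimal sketch of (7.19)–(7.20) in Python/sympy: the unknown coefficients of $s_1$ and $s_2$ are found by linear algebra, with no factorization (the helper name horowitz and the example are ours, not the text's):

from sympy import symbols, Poly, gcd, div, solve

x = symbols('x')

def horowitz(q, r):
    # integral(q/r) = s1/t1 + integral(s2/t2), with t1, t2 as in (7.19)
    q, r = Poly(q, x), Poly(r, x)
    t1 = gcd(r, r.diff(x))
    t2 = div(r, t1)[0]                      # exact division
    n1, n2 = t1.degree(), t2.degree()
    a = symbols('a0:%d' % (n1 + 1))         # coefficients of s1 (deg < deg t1)
    b = symbols('b0:%d' % (n2 + 1))         # coefficients of s2 (deg < deg t2)
    s1 = Poly(list(a[:n1]) or [0], x)
    s2 = Poly(list(b[:n2]) or [0], x)
    u = div(t1.diff(x) * t2, t1)[0]         # exact: the division in (7.20)
    eqns = (s1.diff(x) * t2 - s1 * u + s2 * t1 - q).all_coeffs()
    sol = solve(eqns, list(a[:n1]) + list(b[:n2]))
    return (s1.as_expr().subs(sol), t1.as_expr(),
            s2.as_expr().subs(sol), t2.as_expr())

print(horowitz(x, (x**2 + 1)**2))   # (-1/2, x**2 + 1, 0, x**2 + 1)

so that $\int x/(x^2+1)^2 = -\frac{1}{2(x^2+1)}$, with no logarithmic part.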
7.2.4 The Trager–Rothstein Algorithm

Whether we use the method of section 7.2.2 or 7.2.3, we have to integrate the logarithmic part. (7.8)–(7.11) show that this may, but need not, require algebraic numbers. How do we tell?

The answer is provided by the following observation$^3$ [Rot76, Tra76]: if we write the integral of the logarithmic part as $\sum c_i \log v_i$, we can determine the equation satisfied by the $c_i$, i.e. the analogue of (7.12), by purely rational computations. So write
\[ \int \frac{s_2}{t_2} = \sum c_i \log v_i, \tag{7.21} \]
where we can assume$^4$:

1. $\frac{s_2}{t_2}$ is in lowest terms;
2. the $v_i$ are polynomials (using $\log\frac{f}{g} = \log f - \log g$);
3. the $v_i$ are square-free (using $\log\prod f_i^i = \sum i\log f_i$);
4. the $v_i$ are relatively prime (using $c\log pq + d\log pr = (c+d)\log p + c\log q + d\log r$);
5. the $c_i$ are all different (using $c\log p + c\log q = c\log pq$);
6. the $c_i$ generate the smallest possible extension of the original field of coefficients.

(7.21) can be rewritten as
\[ \frac{s_2}{t_2} = \sum c_i\frac{v_i'}{v_i}. \tag{7.22} \]

$^3$As happens surprisingly often in computer algebra, this was a case of simultaneous discovery.
$^4$We are using "standard" properties of the log operator here without explicit justification: they are justified in Section 7.3, and revisited in Section 8.6.

Hence $t_2 = \prod v_i$ and, writing $u_j = \prod_{i\ne j} v_i$, we can write (7.22) as
\[ s_2 = \sum c_iv_i'u_i. \tag{7.23} \]
Furthermore, since $t_2 = \prod v_i$, $t_2' = \sum v_i'u_i$. Hence
\begin{align*}
v_k &= \gcd(0, v_k)\\
&= \gcd\left(s_2 - \sum c_iv_i'u_i, v_k\right)\\
&= \gcd(s_2 - c_kv_k'u_k, v_k) \quad\text{since all the other } u_i \text{ are divisible by } v_k\\
&= \gcd\left(s_2 - c_k\sum v_i'u_i, v_k\right) \quad\text{for the same reason}\\
&= \gcd(s_2 - c_kt_2', v_k).
\end{align*}
But if $l \ne k$,
\begin{align*}
\gcd(s_2 - c_kt_2', v_l) &= \gcd\left(\sum c_iv_i'u_i - c_k\sum v_i'u_i, v_l\right)\\
&= \gcd(c_lv_l'u_l - c_kv_l'u_l, v_l) \quad\text{since all the other } u_i \text{ are divisible by } v_l\\
&= 1.
\end{align*}
Since $t_2 = \prod v_l$, we can put these together to deduce that
\[ v_k = \gcd(s_2 - c_kt_2', t_2). \tag{7.24} \]
Given $c_k$, this will tell us $v_k$. But we can deduce more from this: the $c_k$ are precisely those numbers $\lambda$ such that $\gcd(s_2 - \lambda t_2', t_2)$ is non-trivial. Hence we can appeal to Proposition 76 (page 293), and say that $\lambda$ must be such that
\[ P(\lambda) := \mathrm{Res}_x(s_2 - \lambda t_2', t_2) = 0. \tag{7.25} \]
If $t_2$ has degree $n$, $P(\lambda)$ is the determinant of a $2n-1$ square matrix, $n$ of whose rows depend linearly on $\lambda$, and thus is a polynomial of degree $n$ in $\lambda$ (strictly speaking, this argument only shows that it has degree at most $n$, but the coefficient of $\lambda^n$ is $\mathrm{Res}(t_2', t_2) \ne 0$).

Algorithm 39 (Trager–Rothstein)
Input: $s_2, t_2 \in K[x]$ relatively prime with $\deg_x(s_2) < \deg_x(t_2)$, $t_2$ square-free
Output: A candid$^5$ expression for $\int s_2/t_2\,dx$

$P(\lambda) := \mathrm{Res}_x(s_2 - \lambda t_2', t_2)$
Write $P(\lambda) = \prod_i Q_i(\lambda)^i$ (square-free decomposition)
$R := 0$
for $i$ in $1\ldots$ such that $Q_i \ne 1$
    $v_i := \gcd(s_2 - \lambda t_2', t_2)$ where $Q_i(\lambda) = 0$    # $\deg(v_i) = i$
    $R := R + \sum_{\lambda\text{ a root of }Q_i} \lambda \log v_i(x)$
return $R$

$^5$In the sense of Definition 6: every algebraic extension occurring in the expression is necessary.

An alternative formulation is given in [LR90], with an important clarification in [Mul97]. In practice roots of a $Q_i$ which are rational, or possibly even quadratic over the rationals, are normally made explicit rather than represented as RootOf constructs. This accounts for answers such as the following:
\[ \int \frac{3x^3 - 2x^2}{(x^2-2)(x^3-x-3)}\,dx = -\ln\left(x^2-2\right) + \sum_{r = \mathrm{RootOf}(z^3-z-3)} \frac{r(3+2r)}{3r^2-1}\,\ln(x-r). \]
The integrands in (7.8–7.11) are all special cases of
\[ \int \frac{5x^4 - 60x^3 + 255x^2 - 450x + a}{x^5 - 15x^4 + 85x^3 - 225x^2 + 274x - b}\,dx \tag{7.26} \]
whose integral is, by Algorithm 39,
\[ \sum_\alpha \frac{5\alpha^4 - 60\alpha^3 + 255\alpha^2 - 450\alpha + a}{5\alpha^4 - 60\alpha^3 + 255\alpha^2 - 450\alpha + 274}\,\ln(x - \alpha) \tag{7.27} \]
where $\alpha = \mathrm{RootOf}\left(z^5 - 15z^4 + 85z^3 - 225z^2 + 274z - b\right)$. (7.11) is of this form, but (7.8–7.10) show how Algorithm 39 can produce more candid expressions where available, and indeed guarantee not to involve any unnecessary RootOf constructs.
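The core of Algorithm 39 fits in a few lines of Python/sympy (a sketch: sympy's resultant, solve and gcd stand in for Res, root-finding and gcd, and the parametrised RootOf form of the answer is not attempted):

from sympy import symbols, resultant, solve, gcd, diff, log

x, lam = symbols('x lambda')

def log_part(s2, t2):
    # t2 square-free, gcd(s2, t2) = 1, deg s2 < deg t2
    P = resultant(s2 - lam*diff(t2, x), t2, x)               # (7.25)
    total = 0
    for c in solve(P, lam):                                  # the coefficients c_k
        v = gcd(s2 - c*diff(t2, x), t2, x, extension=True)   # (7.24)
        total += c*log(v)
    return total

t2 = x**5 + 15*x**4 + 85*x**3 + 225*x**2 + 274*x + 120
print(log_part(diff(t2, x), t2))   # log(x**5 + ...): the first form of (7.8)
print(log_part(1, x**2 - 2))       # coefficients +/- sqrt(2)/4: needs sqrt(2)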
The same process also leads to
\[ \int \frac{1}{x^2+1}\,dx = \frac{i}{2}\left(\ln(1-ix) - \ln(1+ix)\right), \tag{7.28} \]
at which point the reader might complain "I asked to integrate a real function, but the answer is coming back in terms of complex numbers". The answer is, of course, formally correct: differentiating the right-hand side of (7.28) yields
\[ \frac{i}{2}\left(\frac{-i}{1-ix} - \frac{i}{1+ix}\right) = \frac{i}{2}\left(\frac{-i(1+ix)}{1+x^2} - \frac{i(1-ix)}{1+x^2}\right) = \frac{1}{1+x^2}: \]
the issue is that the reader, interpreting the symbols log etc. as the usual functions of calculus, is surprised. This question of interpretation as functions $\mathbf{R}\to\mathbf{R}$ or $\mathbf{C}\to\mathbf{C}$ will be taken up in section 8.5. In the next section we will give a totally algebraic definition of log etc.

7.3 Theory: Liouville's Theorem

Definition 101 Let $K$ be a field of expressions. The expression $\theta$ is an elementary generator over $K$ if one of the following is satisfied:

(a) $\theta$ is algebraic over $K$, i.e. $\theta$ satisfies a polynomial equation with coefficients in $K$;

(b) $\theta$ (assumed to be nonzero)$^6$ is an exponential over $K$, i.e. there is an $\eta$ in $K$ such that $\theta' = \eta'\theta$, which is only an algebraic way of saying that $\theta = \exp\eta$;

(c) $\theta$ is a logarithm over $K$, i.e. there is an $\eta$ in $K$ such that $\theta' = \eta'/\eta$, which is only an algebraic way of saying that $\theta = \log\eta$.

In the light of this definition, (7.28) would be interpreted as saying
\[ \int \frac{1}{x^2+1}\,dx = \frac{i}{2}(\theta_1 - \theta_2), \tag{7.28 restated} \]
where $\theta_1' = \frac{-i}{1-ix}$ and $\theta_2' = \frac{i}{1+ix}$.

We should note that, if $\theta$ is a logarithm of $\eta$, then so is $\theta + c$ for any constant (Definition 100) $c$. Similarly, if $\theta$ is an exponential of $\eta$, so is $c\theta$ for any constant $c$, including the case $c = 0$, which explains the stipulation of nonzeroness in Definition 101(b). A consequence of these definitions is that log and exp satisfy the usual laws "up to constants".

log Suppose $\theta_i$ is a logarithm of $\eta_i$. Then
\[ (\theta_1 + \theta_2)' = \theta_1' + \theta_2' = \frac{\eta_1'}{\eta_1} + \frac{\eta_2'}{\eta_2} = \frac{\eta_1'\eta_2 + \eta_1\eta_2'}{\eta_1\eta_2} = \frac{(\eta_1\eta_2)'}{\eta_1\eta_2}, \]
and hence $\theta_1 + \theta_2$ is a logarithm of $\eta_1\eta_2$, a rule normally expressed as (but see the discussion on page 277 for what happens when we try to interpret log as a function $\mathbf{C}\to\mathbf{C}$)
\[ \log\eta_1 + \log\eta_2 = \log(\eta_1\eta_2). \tag{7.29} \]
Similarly $\theta_1 - \theta_2$ is a logarithm of $\eta_1/\eta_2$ and $n\theta_1$ is a logarithm of $\eta_1^n$ (for $n \in \mathbf{Z}$: we have attached no algebraic meaning to arbitrary powers).

exp Suppose now that $\theta_i$ is an exponential of $\eta_i$. Then
\[ (\theta_1\theta_2)' = \theta_1'\theta_2 + \theta_1\theta_2' = \eta_1'\theta_1\theta_2 + \theta_1\eta_2'\theta_2 = (\eta_1 + \eta_2)'(\theta_1\theta_2) \]
and hence $\theta_1\theta_2$ is an exponential of $\eta_1 + \eta_2$, a rule normally expressed as
\[ \exp\eta_1\exp\eta_2 = \exp(\eta_1 + \eta_2). \tag{7.30} \]
Similarly $\theta_1/\theta_2$ is an exponential of $\eta_1 - \eta_2$ and $\theta_1^n$ is an exponential of $n\eta_1$ (for $n \in \mathbf{Z}$: we have attached no algebraic meaning to arbitrary powers).

(1) Suppose $\theta$ is a logarithm of $\eta$, and $\phi$ is an exponential of $\theta$. Then $\phi' = \theta'\phi = \frac{\eta'}{\eta}\phi$, so $\frac{\phi'}{\phi} = \frac{\eta'}{\eta} = \theta'$, and $\theta$ is a logarithm of $\phi$, as well as $\phi$ being an exponential of $\theta$.

(2) Suppose now that $\theta$ is an exponential of $\eta$, and $\phi$ is a logarithm of $\theta$. Then $\phi' = \frac{\theta'}{\theta} = \frac{\eta'\theta}{\theta} = \eta'$, so $\eta$ and $\phi$ differ by a constant. But $\phi$, being a logarithm, is only defined up to a constant.

(1)+(2) These can be summarised by saying that, up to constants, log and exp are inverses of each other.

$^6$This clause is not normally stated, but is important in practice: see the statement about "up to a constant" later.
Definition 102 Let $K$ be a field of expressions. An overfield $K(\theta_1, \ldots, \theta_n)$ of $K$ is called a field of elementary expressions over $K$ if every $\theta_i$ is an elementary generator over $K(\theta_1, \ldots, \theta_{i-1})$. An expression is elementary over $K$ if it belongs to a field of elementary expressions over $K$. If $K$ is omitted, we understand $\mathbf{C}(x)$: the field of rational expressions.

For example, the expression $\exp(\exp x)$ can be written as elementary over $K = \mathbf{Q}(x)$ by writing it as $\theta_2 \in K(\theta_1, \theta_2)$ where $\theta_1' = \theta_1$, so $\theta_1$ is elementary over $K$, and $\theta_2' = \theta_1'\theta_2$, and so $\theta_2$ is elementary over $K(\theta_1)$.

Observation 18 Other functions can be written this way as well. For example, if $\theta' = i\theta$ (where $i^2 = -1$), then $\phi = \frac{1}{2i}(\theta - 1/\theta)$ is a suitable model for $\sin(x)$, as in the traditional $\sin x = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$. Note that $\phi'' = -\phi$, as we would hope.
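A quick check of Observation 18 in Python/sympy (purely illustrative, giving $\theta$ its conventional meaning):

from sympy import symbols, exp, sin, I, diff, simplify

x = symbols('x')
theta = exp(I*x)                   # theta' = i*theta
phi = (theta - 1/theta)/(2*I)      # the model of sin(x)
assert simplify(diff(phi, x, 2) + phi) == 0       # phi'' = -phi
assert simplify(phi.rewrite(sin) - sin(x)) == 0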
From this point of view, the problem of integration, at least of elementary expressions, can be seen as an exercise in the following paradigm.

Algorithm 40 (Integration Paradigm)
Input: an elementary expression $f$ in $x$
Output: An elementary $g$ with $g' = f$, or failure

Find fields $C$ of constants, $L$ of elementary expressions over $C(x)$ with $f \in L$
if this fails then error "integral not elementary"
Find an elementary overfield $M$ of $L$, and $g \in M$ with $g' = f$
if this fails then error "integral not elementary"
else return $g$

This looks very open-ended, and much of the rest of this chapter will be devoted to turning this paradigm into an algorithm: one that when it returns "integral not elementary" has actually proved this fact. We will see later on that the poly-logarithm function and the erf function (defined as $\mathrm{erf}\,x = \int e^{-x^2}$ up to a constant multiple) are not elementary, and can be proved so this way. Other examples of non-elementary functions are:

W the Lambert W function, the solution of $W(z)\exp(W(z)) = z$ [CGH+96], proved non-elementary in [BCDJ08];

ω the Wright ω function, the solution of $\omega(z) + \ln\omega(z) = z$ [CJ02], proved non-elementary in [BCDJ08].

7.3.1 Liouville's Principle

The first question that might come to mind is that the search for $M$ looks pretty open-ended. Here we are helped by the following result.

Theorem 46 (Liouville's Principle) Let $f$ be an expression from some expression field $L$. If $f$ has an elementary integral over $L$, it has an integral of the following form:
\[ \int f = v_0 + \sum_{i=1}^n c_i\log v_i, \tag{7.31} \]
where $v_0$ belongs to $L$, the $v_i$ belong to $\hat{L}$, an extension of $L$ by a finite number of constants algebraic over $L_{\mathrm{const}}$, and the $c_i$ belong to $\hat{L}$ and are constant.

The proof of this theorem (see, for example, [Rit48]), while quite subtle in places, is basically a statement that the only new expression in $g$ which can disappear on differentiation is a logarithm with a constant coefficient. In terms of the integration paradigm (Algorithm 40), this says that we can restrict our search for $M$ to $M$ of the form $L(c_1, \ldots, c_k, v_1, \ldots, v_k)$ where the $c_i$ are algebraic over $L_{\mathrm{const}}$ and the $v_i$ are logarithmic over $L(c_1, \ldots, c_k)$.

Another way of putting this is to say that, if $f$ has an elementary integral over $L$, then $f$ has the following form:
\[ f = v_0' + \sum_{i=1}^n \frac{c_iv_i'}{v_i}. \tag{7.32} \]

7.3.2 Finding L

A question that might seem trivial is the opening line of Algorithm 40: "Find a field $C$ of constants, and a field $L$ of elementary expressions over $C(x)$ with $f \in L$". From an algorithmic point of view, this is not so trivial, and indeed is where many of the theoretical difficulties lie. We can distinguish three major difficulties.

1. Hidden constants. It is possible to start from a field $C$, make various elementary extensions (Definition 101) of $C(x)$ to create $L$, but have $L_{\mathrm{const}}$ be strictly larger than $C$. Consider, for example, $L = \mathbf{Q}(x, \theta_1, \theta_2, \theta_3)$ where $\theta_1' = \theta_1$, $\theta_2' = 2x\theta_2$ and $\theta_3' = (2x+1)\theta_3$. Then
\[ \left(\frac{\theta_1\theta_2}{\theta_3}\right)' = \frac{\theta_3(\theta_1\theta_2)' - \theta_1\theta_2\theta_3'}{\theta_3^2} = \frac{\theta_3\theta_1'\theta_2 + \theta_3\theta_1\theta_2' - \theta_1\theta_2\theta_3'}{\theta_3^2} = \frac{\theta_1\theta_2\theta_3 + 2x\theta_1\theta_2\theta_3 - (2x+1)\theta_1\theta_2\theta_3}{\theta_3^2} = 0, \]
so $\frac{\theta_1\theta_2}{\theta_3}$ is, unexpectedly, a constant $c$, and we should be considering $\mathbf{Q}(c)(x, \theta_1, \theta_2)$, with $\theta_3 = c\theta_1\theta_2$. This is perhaps not so surprising if we give the $\theta_i$ their conventional meanings as $\exp(x)$ etc., and write $L = \mathbf{Q}(x, \exp(x), \exp(x^2), \exp(x^2+x))$, where we can clearly write $\exp(x^2+x) = \exp(x^2)\exp(x)$. Of course, $\theta_1$ might equally well be $100\exp(x)$, etc., so all we can deduce (in the language of differential algebra) is that the ratio is a constant $c$, not necessarily that $c = 1$.

Equally, we could consider $L = \mathbf{Q}(x, \theta_1, \theta_2, \theta_3)$ where $\theta_1' = \frac{1}{x-1}$, $\theta_2' = \frac{1}{x+1}$ and $\theta_3' = \frac{2x}{x^2-1}$. Then
\[ (\theta_1 + \theta_2 - \theta_3)' = \theta_1' + \theta_2' - \theta_3' = \frac{1}{x-1} + \frac{1}{x+1} - \frac{2x}{x^2-1} = 0, \]
and again we have a hidden constant $c = \theta_1 + \theta_2 - \theta_3$, and we should be considering $\mathbf{Q}(c)(x, \theta_1, \theta_2)$, with $\theta_3 = \theta_1 + \theta_2 - c$. Again, this is not so surprising if we give the $\theta_i$ their conventional meanings as $\log(x-1)$ etc., where we can clearly write $\log(x^2-1) = \log(x-1) + \log(x+1)$. Of course, $\theta_1$ might equally well be $100 + \log(x-1)$, etc., so all we can deduce (in the language of differential algebra) is that $\theta_1 + \theta_2 - \theta_3$ is a constant $c$, not necessarily that $c = 0$.

2. Hidden algebraics. It is possible to start from a field $C$, make $k$ exponential and logarithmic extensions (Definition 101) of $C(x)$ to create $L$, but have the transcendence degree of $L$ over $C(x)$ be less than $k$, i.e. for there to be unexpected algebraic elements of $L$, where we had thought they were transcendental. The obvious example is that $\sqrt{x} = \exp(\frac12\log x)$, but there are more subtle ones, such as the following variant of the exponential example from the previous item. Consider $L = \mathbf{Q}(x, \theta_1, \theta_2, \theta_3)$ where $\theta_1' = \theta_1$, $\theta_2' = 2x\theta_2$ and $\theta_3' = (2x + \frac12)\theta_3$. Then
\[ \left(\frac{\theta_1\theta_2^2}{\theta_3^2}\right)' = \frac{\theta_3(\theta_1\theta_2^2)' - 2\theta_1\theta_2^2\theta_3'}{\theta_3^3} = \frac{\theta_3\theta_1'\theta_2^2 + 2\theta_3\theta_1\theta_2'\theta_2 - 2\theta_1\theta_2^2\theta_3'}{\theta_3^3} = \frac{\theta_1\theta_2^2\theta_3 + 4x\theta_1\theta_2^2\theta_3 - 2(2x+\frac12)\theta_1\theta_2^2\theta_3}{\theta_3^3} = 0, \]
so again we have an unexpected constant $c$. The correct rewriting now is $\mathbf{Q}(c)(x, \theta_1, \theta_2, \theta_3)$, with $\theta_3 = \sqrt{c\theta_1\theta_2^2}$.

3. Ineffective constants. The previous two difficulties have led to the introduction of "new" constants: what are these? Their values arise from the translation of the language of functions $\mathbf{R}\to\mathbf{R}$ (or $\mathbf{C}\to\mathbf{C}$) to the language of differential algebra. We deduced a constant that might be 1, or might be 100, but equally it might be $e$, as when the user has both $e^x$ and $e^{x+1}$ in an expression, when we have to transform $e^{x+1} \to e\cdot e^x$ (or $e^x \to \frac1e\cdot e^{x+1}$). This doesn't seem to be a problem: $e$ is well-known to be transcendental, so we can effectively regard it as a new indeterminate. [Ric68] [Ax71] TO BE COMPLETED

The last difficulty is inherent in mathematics, while the first two can be seen as failures of candidness (Definition 6), and are addressed in the next section.
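The first of these difficulties is easy to see concretely; a sketch in Python/sympy, giving the $\theta_i$ their conventional meanings:

from sympy import symbols, exp, log, diff, simplify

x = symbols('x')
# theta1*theta2/theta3 from the exponential example: a hidden constant
assert simplify(diff(exp(x)*exp(x**2)/exp(x**2 + x), x)) == 0
# theta1 + theta2 - theta3 from the logarithmic example: likewise
assert simplify(diff(log(x - 1) + log(x + 1) - log(x**2 - 1), x)) == 0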
7.3.3 Risch Structure Theorem

[Ris79] TO BE COMPLETED

7.3.4 Overview of Integration

Notation 33 Throughout sections 7.4–7.7, we assume that we are given an integrand $f \in L = C(x, \theta_1, \ldots, \theta_n)$, where $L_{\mathrm{const}} = C$ is an effective field of constants, each $\theta_i$ is an elementary generator (Definition 101) over $C(x, \theta_1, \ldots, \theta_{i-1})$ and, if $\theta_i$ is given as an exponential or logarithmic generator, then $\theta_i$ is actually transcendental over $C(x, \theta_1, \ldots, \theta_{i-1})$.

Ideally, the solution of the integration problem would then proceed by induction on $n$: we assume we can integrate in $K = C(x, \theta_1, \ldots, \theta_{n-1})$, and reduce integration in $L$ to problems of integration in $K$. It was the genius of Risch [Ris69] to realise that a more sophisticated approach is necessary.

Definition 103 For a differential field $L$, the elementary integration problem for $L$ is to find an algorithm which, given an element $f$ of $L$, finds an elementary overfield $M$ of $L$, and $g \in M$ with $g' = f$, or proves that such $M$ and $g$ do not exist.

Definition 104 For a differential field $L$, the elementary Risch differential equation problem for $L$ is to find an algorithm which, given elements $f$ and $g$ of $L$ (with $f$ such that $\exp(\int f)$ is a genuinely new expression, i.e., for any non-zero $F$ with $F' = fF$, $M = L(F)$ is transcendental over $L$ and has $M_{\mathrm{const}} = L_{\mathrm{const}}$), finds $y \in L$ with $y' + fy = g$, or proves that such a $y$ does not exist. We write $\mathrm{RDE}(f, g)$ for the solution to this problem.

The reader might object "but I can solve this by integrating factors!". Indeed, if we write $y = z\exp(\int -f)$, we get
\[ \left(z\exp\left(\int -f\right)\right)' + fz\exp\left(\int -f\right) = g \]
\[ z'\exp\left(\int -f\right) - fz\exp\left(\int -f\right) + fz\exp\left(\int -f\right) = g, \]
which simplifies to $z'\exp(\int -f) = g$ and hence $z' = g\exp(\int f)$, $z = \int g\exp(\int f)$ and
\[ y = \exp\left(\int -f\right)\int^* g\exp\left(\int f\right). \tag{7.33} \]
However, if we come to apply the theory of section 7.5 to the principal integral (marked $\int^*$) of (7.33), we find the integration problem reduces to solving the original $y' + fy = g$. Hence this problem must be attacked in its own right, which we do in section 7.7 for the base case $f, g \in C(x)$.

Theorem 47 (Risch Integration Theorem) Let $L = C(x, \theta_1, \ldots, \theta_n)$, where $L_{\mathrm{const}} = C$ is an effective field of constants, each $\theta_i$ is an elementary generator (Definition 101) over $C(x, \theta_1, \ldots, \theta_{i-1})$ and, if $\theta_i$ is given as an exponential or logarithmic generator, then $\theta_i$ is actually transcendental over $C(x, \theta_1, \ldots, \theta_{i-1})$. Then:

(a) we can solve the elementary integration problem for $L$;

(b) we can solve the elementary Risch differential equation problem for $L$.

Here the proof genuinely is by induction on $n$, and, when $n \ne 0$, the case satisfied by $\theta_n$ in Definition 101.

(a) $n = 0$ This was treated in section 7.2.
(b) $n = 0$ This will be treated in section 7.7.

The Risch induction hypothesis then is that both parts hold for $C(x, \theta_1, \ldots, \theta_{n-1})$, and we prove them for $C(x, \theta_1, \ldots, \theta_n)$.

(a) $\theta_n$ logarithmic This will be treated in section 7.4.
(b) $\theta_n$ logarithmic See [Dav85a].
(a) $\theta_n$ exponential This will be treated in section 7.5.
(b) $\theta_n$ exponential See [Dav85a].
(a) $\theta_n$ algebraic This will be treated in section 7.6.
(b) $\theta_n$ algebraic See [Bro90, Bro91]. The case $n = 1$, i.e. algebraic expressions in $C(x, y)$ with $y$ algebraic over $C(x)$, was solved in [Dav84].

7.4 Integration of Logarithmic Expressions

Let $\theta = \theta_n$ be a (transcendental) logarithm over $K = C(x, \theta_1, \ldots, \theta_{n-1})$.

Lemma 12 (Decomposition Lemma (logarithmic)) $f \in K(\theta)$ can be written uniquely as $p + q/r$, where $p$, $q$ and $r$ are polynomials of $K[\theta]$, $q$ and $r$ are relatively prime, and the degree of $q$ is less than that of $r$. If $f$ has an elementary integral over $K$, then $p$ and $q/r$ each possess an elementary integral over $K$.

Proof.
By Liouville's Principle (Theorem 46), if $f$ is integrable, it is of the form
\[ f = v_0' + \sum_{i=1}^n \frac{c_iv_i'}{v_i}, \tag{7.32 bis} \]
where $v_0 \in K(\theta)$, $c_i \in C$, and $v_i \in K(c_1, \ldots, c_n)[\theta]$. Write $v_0 = p_0 + \frac{q_0}{r_0}$, where $p_0, q_0, r_0 \in K[\theta]$ with $\deg(q_0) < \deg(r_0)$, and re-arrange the $v_i$ such that $v_1, \ldots, v_k \in K(c_1, \ldots, c_n)$, but $v_{k+1}, \ldots, v_n$ genuinely involve $\theta$, and are monic. Then we can re-arrange (7.32 bis) as
\[ p + \frac{q}{r} = \underbrace{p_0' + \sum_{i=1}^k \frac{c_iv_i'}{v_i}}_{\text{in } K(c_1,\ldots,c_n)[\theta]} + \underbrace{\left(\frac{q_0}{r_0}\right)' + \sum_{i=k+1}^n \frac{c_iv_i'}{v_i}}_{\text{proper rational expression}}, \tag{7.34} \]
and the decomposition of the right-hand side proves the result.

This means that it is sufficient to integrate the polynomial part (which we will do in Algorithm 41) and the rational part (which we will do in Algorithm 42) separately, and that failure in either part indicates that the whole expression does not have an elementary integral. In other words, there is no cross-cancellation between these parts.

7.4.1 The Polynomial Part

Let us turn first to the polynomial part. Assume that $p = \sum_{i=0}^n a_i\theta^i$ and $p_0 = \sum_{i=0}^m b_i\theta^i$. The polynomial parts of equation (7.34) then say that
\[ \sum_{i=0}^n a_i\theta^i = \sum_{i=0}^m b_i'\theta^i + \sum_{i=0}^m ib_i\theta'\theta^{i-1} + \underbrace{\sum_{i=1}^k \frac{c_iv_i'}{v_i}}_{\text{independent of }\theta}. \tag{7.35} \]
Hence $n = m$, except for the special case $n = m-1$ and $b_m$ constant. If we consider coefficients of $\theta^n$ (assuming $n > 0$) we have
\[ a_n = b_n' + (n+1)b_{n+1}\theta'. \]
We can integrate this formally (recalling that $b_{n+1}$ is constant) to get
\[ \int a_n = b_n + (n+1)b_{n+1}\theta. \tag{7.36} \]
But $a_n \in K$, and, by the Risch induction hypothesis (page 259), we have an integration algorithm for $K$. In fact, not any integral will do: we want an answer in $K$ itself, apart possibly from a new logarithmic term of $\theta$, which will determine $b_{n+1}$. If $\int a_n$ contained any other logarithms, then multiplying them by $\theta^n$ would give us a new logarithm with a non-constant coefficient, which is not allowed by Liouville's Principle (Theorem 46).

Hence the contribution to $\int p$ is $b_n\theta^n + b_{n+1}\theta^{n+1}$. However,
\[ \left(b_n\theta^n + b_{n+1}\theta^{n+1}\right)' = a_n\theta^n + nb_n\theta'\theta^{n-1}, \tag{7.37} \]
so we should subtract $nb_n\theta'$ from $a_{n-1}$. Of course, $b_n$ is only determined "up to a constant of integration": when we come to integrate $a_{n-1}$, we may get a term $nb_n\theta$, which determines this constant. The process proceeds until we come to integrate $a_0$, when any new logarithms are allowed, and the constant of integration here is that of the whole integration. The corresponding algorithm is given in Figure 7.1.
7.4.2 The Rational Expression Part
Now for the rational expression part, where we have to integrate a proper rational expression, and the integral will be a proper rational expression plus a sum
Figure 7.1: Algorithm 41: IntLog–Polynomial

Algorithm 41 (IntLog–Polynomial)
Input: $p = \sum_{i=0}^n a_i\theta^i \in K[\theta]$.
Output: An expression for $\int p\,dx$, or failed if not elementary

Ans := 0
for $i := n, n-1, \ldots, 1$ do
    $c_i := \int a_i\,dx$    # integration in $K$: (7.36)
    if $c_i$ = failed or $c_i \notin K[\theta]$
        return failed
    Write $c_i = b_i + (i+1)b_{i+1}\theta$: $b_i \in K$ and $b_{i+1}$ constant
    Ans := Ans $+\, b_i\theta^i + b_{i+1}\theta^{i+1}$
    $a_{i-1} := a_{i-1} - i\theta'b_i$    # correction from (7.37)
$c_0 := \int a_0\,dx$    # integration in $K$
if $c_0$ = failed
    return failed
Ans := Ans $+\, c_0$
return Ans
of logarithms with constant coefficients — see (*) in Algorithm 42. The proper rational expression part is determined by an analogue of Hermite's algorithm, and (7.18) is still valid.

The Trager–Rothstein algorithm is still valid, with the additional clause that the roots of $P(\lambda)$, which don't depend on $\theta$ since $P(\lambda)$ is a resultant, must actually be constants.
Example 30 To see why this is necessary, consider $\int \frac{1}{\log x}\,dx$. Here $q_1 = 1$, $r_1 = \theta$, and $P(\lambda) = \mathrm{Res}_\theta(1 - \lambda/x, \theta) = 1 - \lambda/x$ (made monic, i.e. $\lambda - x$). This would suggest a contribution to the integral of $x\log\log x$, which indeed gives $\log x$ as one term on differentiation, but also a term of $\log\log x$, which is not allowed, since it is not present in the integrand.

Note that $\int \frac{1}{x\log x}\,dx$ gives $P = \lambda - 1$ and an integral of $\log\log x$, which is correct.
7.4.3 Conclusion of Logarithmic Integration
There are essentially three ways in which an expression whose "main variable" is a logarithmic $\theta$, i.e. which is being written in $K(\theta)$, can fail to have an elementary integral.

1. In Algorithm 42, some $P(\lambda)$, whose roots should be the coefficients of the logarithms, turns out to have non-constant roots. Again, this would violate Liouville's Principle. Example 30 is a classic case of this.
Figure 7.2: Algorithm 42: IntLog–Rational Expression

Algorithm 42 (IntLog–Rational Expression)
Input: $s_2, t_2 \in K[\theta]$ relatively prime, $\deg s_2 < \deg t_2$.
Output: An expression for $\int s_2/t_2\,dx$, or failed if not elementary

Ans := 0
Square-free decompose $t_2$ as $\prod_{i=1}^k r_i^i$
Write $\frac{s_2}{t_2} = \sum_{i=1}^k \frac{q_i}{r_i^i}$
for $i := 1, \ldots, k$ do
    Find $a_i, b_i$ such that $a_ir_i + b_ir_i' = 1$
    for $j := i, i-1, \ldots, 2$ do
        Ans := Ans $-\, \dfrac{q_ib_i}{(j-1)r_i^{j-1}}$    # (7.18)
        $q_i := q_ia_i + (q_ib_i/(j-1))'$
    $P(\lambda) := \mathrm{Res}_\theta(q_i - \lambda r_i', r_i)$ (made monic)
    if $P$ has non-constant coefficients (*)
        return failed
    Write $P(\lambda) = \prod_j Q_j(\lambda)^j$ (square-free decomposition)
    for $j$ in $1\ldots$ such that $Q_j \ne 1$
        $v_j := \gcd(q_i - \lambda r_i', r_i)$ where $Q_j(\lambda) = 0$    # $\deg(v_j) = j$
        Ans := Ans $+\, \sum_{\lambda\text{ a root of }Q_j} \lambda \log v_j(x)$
return Ans
Notation 34 It is usual to define $\int \frac{1}{\log x}$ to be the logarithmic integral function $\mathrm{li}(x)$, but this is not an elementary function in the sense of Definition 102.
2. In Algorithm 41, some $a_i \in K$ ($i > 0$) may have an elementary integral, but one that involves 'new' logarithms other than $\theta$, which would violate Liouville's Principle.

Example 31 Consider $\int \frac{1}{x}\log(x+1)$ with $\theta$ being $\log(x+1)$. Then $a_1 = \frac{1}{x}$, and $\int a_1 = \log(x)$, which is not in the original field. If we allow this, and suggest that the integral has a term of $\log(x)\log(x+1)$, its derivative also has a term of $\frac{1}{x+1}\log(x)$, and we seem to be trapped in a vicious cycle.
3. In Algorithm 41, some $a_i \in K$ may itself fail to have an elementary integral: this failure cannot be compensated for elsewhere, by Lemma 12.

Example 32 Consider $\int \frac{1}{\log x}\log(x+1)$ with $\theta$ being $\log(x+1)$. Then $a_1 = \frac{1}{\log x}$, and, as we have seen in Example 30, this does not have an elementary integral.
Even if we use the logarithmic integral $\mathrm{li}(x)$ here, and claim that $\int \frac{1}{\log x} = \mathrm{li}(x)$, we get a term in the integral of $\mathrm{li}(x)\log(x+1)$, whose derivative also has a term of $\frac{\mathrm{li}(x)}{x+1}$, and we now need a theory of integrating li-involving functions, which is beyond the scope of this book and is, as far as the author knows, unsolved (note that [Che86] only considers elementary integrands).
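As it happens, sympy returns exactly the non-elementary answer of Notation 34 for Example 30's integrand (a purely illustrative check in Python):

from sympy import symbols, integrate, log

x = symbols('x')
print(integrate(1/log(x), x))    # li(x): the logarithmic integral of Notation 34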
7.5 Integration of Exponential Expressions
Throughout this section, we let $\theta = \theta_n$ be a (transcendental) exponential over $K = C(x, \theta_1, \ldots, \theta_{n-1})$, so that $\theta' = \eta'\theta$. We should note that this choice is somewhat arbitrary, since $\bar\theta = \theta^{-1}$ satisfies $\bar\theta' = -\eta'\bar\theta$ and $K(\theta) \equiv K(\bar\theta)$. Hence negative powers of $\theta$ are just as legitimate as positive powers, and this translates into a difference in the next result: rather than writing expressions as "polynomial + proper rational expression", we will make use of the following concept.

Definition 105 A generalised (or Laurent) polynomial in $\theta$ over $K$ is a sum $\sum_{i=-m}^n a_i\theta^i$ with $a_i \in K$.

Lemma 13 (Decomposition Lemma (exponential)) $f \in K(\theta)$ can be written uniquely as $p + q/r$, where $p$ is a Laurent polynomial, $q$ and $r$ are polynomials of $K[\theta]$ such that $\theta$ does not divide $r$, $q$ and $r$ are relatively prime, and the degree of $q$ is less than that of $r$. If $f$ has an elementary integral over $K$, then each of the terms of $p$, and also $q/r$, has an elementary integral over $K$.
Proof. By Liouville's Principle (Theorem 46), if $f$ is integrable, it is of the form
\[ f = v_0' + \sum_{i=1}^n \frac{c_iv_i'}{v_i}, \tag{7.32 ter} \]
where $v_0 \in K(\theta)$, $c_i \in C$, and $v_i \in K(c_1, \ldots, c_n)[\theta]$. Write $v_0 = p_0 + \frac{q_0}{r_0}$, where $p_0 \in K[\theta, 1/\theta]$ is a Laurent polynomial, $q_0, r_0 \in K[\theta]$ with $\deg(q_0) < \deg(r_0)$ and $\theta \nmid r_0$, and re-arrange the $v_i$ such that $v_1, \ldots, v_k \in K(c_1, \ldots, c_n)$, but $v_{k+1}, \ldots, v_n$ genuinely involve $\theta$, and are monic. Furthermore, we can suppose that $\theta$ does not divide any of these $v_i$, since $\log\theta = \eta$ (up to constants). Unlike Lemma 12, though, it is no longer the case that $(\log v_i)'$ ($i > k$) is a proper rational expression. Let $n_i$ be the degree (in $\theta$) of $v_i$; then, recalling that we have supposed that the $v_i$ are monic, $v_i' = n_i\eta'\theta^{n_i} + \text{lower terms}$, and $(v_i' - n_i\eta'v_i)/v_i$ is a proper rational expression.

Then we can re-arrange (7.32 ter) as
\[ p + \frac{q}{r} = \underbrace{p_0' + \sum_{i=1}^k \frac{c_iv_i'}{v_i} + \sum_{i=k+1}^n c_in_i\eta'}_{\text{in } K(c_1,\ldots,c_n)[\theta,1/\theta]} + \underbrace{\left(\frac{q_0}{r_0}\right)' + \sum_{i=k+1}^n \frac{c_i(v_i' - n_i\eta'v_i)}{v_i}}_{\substack{\text{proper rational expression,}\\ \theta\text{ not dividing the denominator}}}, \tag{7.38} \]
and the decomposition of the right-hand side proves the result.
This means that it is sufficient to integrate the polynomial part (which we will do in Algorithm 43) and the rational part (which we will do in Algorithm 44) separately, and that failure in either part indicates that the whole expression does not have an elementary integral. In other words, there is no cross-cancellation between these parts.
7.5.1 The Polynomial Part

In fact, this is simpler than the logarithmic case, since Lemma 13 says that each summand $a_i\theta^i$ has to be integrable separately. The case $i = 0$ is just integration in $K$, and all the others are cases of the Risch differential equation problem (Definition 104): if $\int a_i\theta^i = b_i\theta^i$, then differentiating gives $b_i' + i\eta'b_i = a_i$, i.e. $b_i = \mathrm{RDE}(i\eta', a_i)$.
This translates straightforwardly into Algorithm 43.
7.5.2 The Rational Expression Part
The last equation (7.38) of Lemma 13 says that
\[ \int \frac{q}{r} = \underbrace{\left(\sum_{i=k+1}^n c_in_i\right)}_{\text{“correction”}}\eta + \left(\frac{q_0}{r_0}\right) + \underbrace{\sum_{i=k+1}^n c_i(\log v_i - n_i\eta)}_{\text{proper rational expression}}, \tag{7.39} \]
where the $v_i$ are monic polynomials (not divisible by $\theta$) of degree $n_i$. The proper rational expression part is determined by an analogue of Hermite's algorithm,
Figure 7.3: Algorithm 43: IntExp–Polynomial

Algorithm 43 (IntExp–Polynomial)
Input: $p = \sum_{i=-m}^n a_i\theta^i \in K[\theta, \theta^{-1}]$, where $\theta = \exp\eta$.
Output: An expression for $\int p\,dx$, or failed if not elementary

Ans := 0
for $i := -m, \ldots, -1, 1, \ldots, n$ do
    $b_i := \mathrm{RDE}(i\eta', a_i)$
    if $b_i$ = failed
        return failed
    else Ans := Ans $+\, b_i\theta^i$
$c_0 := \int a_0\,dx$    # integration in $K$
if $c_0$ = failed
    return failed
Ans := Ans $+\, c_0$
return Ans
and (7.18) is still valid, though we should point out that the justification involved stating that $\gcd(r_i, r_i') = 1$, where the $r_i$ were the square-free factors of $r$. Since $\theta' = \eta'\theta$, this is no longer true if $\theta|r_i$, but this is excluded since such factors were moved into the Laurent polynomial (Definition 105) part.
Hence the Hermite part of Algorithm 44 is identical to the rational and logarithmic cases. The Trager–Rothstein part is slightly more complicated, since $v_i'/v_i$ is no longer a proper rational expression, which is the cause of the term marked "correction" in (7.39). Suppose we have a term $s_2/t_2$ (in lowest terms: the programmer must not forget this check) left after Hermite's algorithm, and write
\[ \int \frac{s_2}{t_2} = \sum c_i(\log v_i - n_i\eta). \tag{7.40} \]
(7.40) can be differentiated to
\[ \frac{s_2}{t_2} = \sum c_i\frac{v_i' - n_i\eta'v_i}{v_i}. \tag{7.41} \]
Hence $t_2 = \prod v_i$, which means $N := \deg_\theta t_2 = \sum n_i$, and, writing $u_j = \prod_{i\ne j} v_i$, we can write (7.41) as
\[ s_2 = \sum c_i(v_i' - n_i\eta'v_i)u_i. \tag{7.42} \]
Furthermore, since $t_2 = \prod v_i$, $t_2' = \sum v_i'u_i$. Hence
\begin{align*}
v_k &= \gcd(0, v_k)\\
&= \gcd\left(s_2 - \sum c_i(v_i' - n_i\eta'v_i)u_i, v_k\right)\\
&= \gcd\left(s_2 - c_k(v_k' - n_k\eta'v_k)u_k, v_k\right) \quad\text{since all the other } u_i \text{ are divisible by } v_k
\end{align*}
Figure 7.4: Algorithm 44: IntExp–Rational Expression

Algorithm 44 (IntExp–Rational Expression)
Input: $q, r \in K[\theta]$ relatively prime, $\deg q < \deg r$, $\theta \nmid r$.
Output: An expression for $\int q/r\,dx$, or failed if not elementary

Ans := 0
Square-free decompose $r$ as $\prod_{i=1}^k r_i^i$
Write $\frac{q}{r} = \sum_{i=1}^k \frac{q_i}{r_i^i}$
for $i := 1, \ldots, k$ do
    Find $a_i, b_i$ such that $a_ir_i + b_ir_i' = 1$
    for $j := i, i-1, \ldots, 2$ do
        Ans := Ans $-\, \dfrac{q_ib_i}{(j-1)r_i^{j-1}}$    # (7.18)
        $q_i := q_ia_i + (q_ib_i/(j-1))'$
    Cancel any common factor between $q_i$ and $r_i$
    $P(\lambda) := \mathrm{Res}_\theta(q_i - \lambda(r_i' - (\deg_\theta r_i)\eta'r_i), r_i)$ (made monic)
    if $P$ has non-constant coefficients (*)
        return failed
    Write $P(\lambda) = \prod_j Q_j(\lambda)^j$ (square-free decomposition)
    for $j$ in $1\ldots$ such that $Q_j \ne 1$
        $v_j := \gcd(q_i - \lambda(r_i' - (\deg_\theta r_i)\eta'r_i), r_i)$ where $Q_j(\lambda) = 0$    # $\deg(v_j) = j$
        Ans := Ans $+\, \sum_{\lambda\text{ a root of }Q_j} \lambda\left(\log v_j(x) - (\deg_\theta v_j)\eta\right)$    # cf. (7.40)
return Ans
\begin{align*}
&= \gcd\left(s_2 - c_k\sum(v_i' - n_i\eta'v_i)u_i, v_k\right) \quad\text{for the same reason}\\
&= \gcd\left(s_2 - c_k(t_2' - t_2N\eta'), v_k\right) \quad\text{since } t_2 = v_iu_i \text{ and } N = \sum n_i.
\end{align*}
But if $l \ne k$,
\begin{align*}
\gcd\left(s_2 - c_k(t_2' - t_2N\eta'), v_l\right) &= \gcd\left(\sum c_iv_i'u_i - c_k\sum v_i'u_i, v_l\right) \quad t_2N\eta' \text{ disappears as } v_l|t_2\\
&= \gcd(c_lv_l'u_l - c_kv_l'u_l, v_l) \quad\text{since all the other } u_i \text{ are divisible by } v_l\\
&= 1.
\end{align*}
Since $t_2 = \prod v_l$, we can put these together to deduce that
\[ v_k = \gcd(s_2 - c_k(t_2' - N\eta't_2), t_2). \tag{7.43} \]
Given $c_k$, this will tell us $v_k$. But we can deduce more from this: the $c_k$ are precisely those numbers $\lambda$ such that $\gcd(s_2 - \lambda(t_2' - N\eta't_2), t_2)$ is non-trivial. Hence we can appeal to Proposition 76 (page 293), and say that $\lambda$ must be such that
\[ P(\lambda) := \mathrm{Res}_\theta(s_2 - \lambda(t_2' - N\eta't_2), t_2) = 0. \tag{7.44} \]
As $t_2$ has degree $N$ and $t_2' - N\eta't_2$ has degree $N-1$, $P(\lambda)$ is the determinant of a $2N-1$ square matrix, $N$ of whose rows depend linearly on $\lambda$, and thus is a polynomial of degree $N$ in $\lambda$ (strictly speaking, this argument only shows that it has degree at most $N$, but the coefficient of $\lambda^N$ is $\mathrm{Res}(t_2' - N\eta't_2, t_2) \ne 0$).
Hence, while we needed to re-prove the result, the application of Trager–Rothstein is little different from the logarithmic, and indeed rational expression, case. As in the logarithmic case, we have the caveat (*) that the roots of $P$ must be constant. An analogous example to that of section 7.4.2 is that of $\int \frac{x}{1+e^x}$. Here $s_2 = x$, $t_2 = 1 + \theta$ and
\[ \mathrm{Res}_\theta(s_2 - \lambda(t_2' - N\eta't_2), t_2) = \mathrm{Res}_\theta(x - \lambda(\theta - (1+\theta)), 1+\theta) = x + \lambda. \]
This would suggest a contribution of $-x\log(1+e^x)$, but this also leaves $-\log(1+e^x)$ on differentiating, which contradicts Liouville's Principle. Computer algebra systems will give answers such as Maple's
\[ \int \frac{x}{1+e^x}\,dx = \frac{1}{2}x^2 - x\ln(1+e^x) - \mathrm{polylog}(2, -e^x), \tag{7.45} \]
but we have, essentially, just proved that $\mathrm{polylog}(2, -e^x)$ is not elementary.
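sympy's implementation of the transcendental Risch algorithm exposes exactly this behaviour (an illustrative sketch: the non-elementary part is returned as an explicit unevaluated integral):

from sympy import symbols, exp
from sympy.integrals.risch import risch_integrate

x = symbols('x')
print(risch_integrate(x/(1 + exp(x)), x))
# the elementary part, plus a NonElementaryIntegral for the rest, cf. (7.45)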
7.6 Integration of Algebraic Expressions
The integration of algebraic expressions is normally taught as a bunch of apparently ad hoc tricks. A typical calculation would be
\[ \int \frac{1}{\sqrt{1-x^2}}\,dx = \int \frac{1}{\sqrt{1-\sin^2 t}}\,\frac{d\sin t}{dt}\,dt \tag{7.46} \]
substituting $x = \sin t$
\[ = \int \frac{1}{\cos t}\,\cos t\,dt \tag{7.47} \]
\[ = t = \arcsin(x), \tag{7.48} \]
which we can write as $-i\log\left(\sqrt{1-x^2}+ix\right)$ to show that it really is elementary, and furthermore does not violate Liouville's Principle, since the only new expression is now seen to be a logarithm with a constant coefficient. The $\frac{1}{2}\pi$ in the equivalent form $\frac{1}{2}\pi - \arccos(x)$ is, of course, just a particular choice of constant of integration. This leaves numerous questions.

1. Why $x = \sin t$? $x = \cos t$ seems to work, but gives a different result.
2. How do I know which substitutions will work?
3. How can I tell when no substitutions work?
4. What about cases when it only partially works, such as
\[ \int \frac{1}{\sqrt{1-x^2}\sqrt{1-2x^2}}\,dx = \int \frac{1}{\sqrt{1-2\sin^2 t}}\,dt = \text{?} \]
5. These inverse functions have branch cuts — what does that mean for the result?

The last question will be taken up in Chapter 8. The others are material for this section, and indeed seem difficult, to the point where the great 20th century number theorist Hardy [Har16] even conjectured that there was no general method. Integration theory in the 1960s, typified by the SAINT program [Sla61], concentrated on "doing it like a human", i.e. by guessing and substitution. Risch [Ris70] observed that this wasn't necessary, and [Dav81, Tra84]$^7$ converted his observations into algorithms, and implementations.

TO BE COMPLETED
7.7 The Risch Differential Equation Problem
In this section we solve case (b), n = 0 of Theorem 47: viz.
$^7$Another case of simultaneous discovery.
to find an algorithm which, given elements $f$ and $g$ of $C(x)$ (with $f$ such that $\exp(\int f)$ is a genuinely new expression, i.e., for any non-zero $F$ with $F' = fF$, $M = C(x, F)$ is transcendental over $C(x)$ and has $M_{\mathrm{const}} = C$), finds $y \in C(x)$ with $y' + fy = g$, or proves that such a $y$ does not exist. This is written as $\mathrm{RDE}(f, g)$.

These conditions on $f$ mean that it is not a constant, and its integral is not purely a sum of logarithms with rational number coefficients.
We will first consider the denominator of $y \in C(x)$. We could assume that $C$ is algebraically closed, so that all polynomials factor into linears, but in fact we need not. We will assume that we can factor polynomials, though we will see afterwards that this is algorithmically unnecessary.

Let $p$ be an irreducible polynomial. Let $\alpha$ be the largest integer such that $p^\alpha$ divides the denominator of $y$, which we can write as $p^\alpha \parallel \mathrm{den}(y)$. Let $\beta$ and $\gamma$ be such that $p^\beta \parallel \mathrm{den}(f)$ and $p^\gamma \parallel \mathrm{den}(g)$. So we can calculate the powers of $p$ which divide the terms of the equation to be solved:
\[ \underbrace{y'}_{\alpha+1} + \underbrace{fy}_{\alpha+\beta} = \underbrace{g}_{\gamma}. \]
There are then three possibilities.

1. $\beta > 1$. In this case the terms in $p^{\alpha+\beta}$ and $p^\gamma$ have to cancel, that is we must have $\alpha = \gamma - \beta$.

2. $\beta < 1$ (in other words, $\beta = 0$). In this case the terms in $p^{\alpha+1}$ and $p^\gamma$ must cancel, that is, we must have $\alpha = \gamma - 1$.

3. $\beta = 1$. In this case, it is possible that the terms on the left-hand side cancel and that the power of $p$ which divides the denominator of $y' + fy$ is less than $\alpha+1$. If there is no cancellation, the result is indeed $\alpha = \gamma - 1 = \gamma - \beta$. So let us suppose that there is a cancellation. We express $f$ and $y$ in partial fractions with respect to $p$: $f = F/p^\beta + \hat{f}$ and $y = Y/p^\alpha + \hat{y}$, where the powers of $p$ which divide the denominators of $\hat{f}$ and $\hat{y}$ are at most $\beta - 1 = 0$ and $\alpha - 1$, and $F$ and $Y$ have degree less than that of $p$.
\[ y' + fy = \frac{-\alpha p'Y}{p^{\alpha+1}} + \frac{Y'}{p^\alpha} + \hat{y}' + \frac{FY}{p^{\alpha+1}} + \frac{\hat{f}Y}{p^\alpha} + \frac{F\hat{y}}{p} + \hat{f}\hat{y}. \tag{7.49} \]
For there to be a cancellation in this equation, $p$ must divide $-\alpha p'Y + FY$. But $p$ is irreducible and $Y$ is of degree less than that of $p$, therefore $p$ and $Y$ are relatively prime. This implies that $p$ divides $\alpha p' - F$. But $p'$ and $F$ are of degree less than that of $p$, and the only polynomial of degree less than that of $p$ and divisible by $p$ is zero. Therefore $\alpha = F/p'$.

Putting these together proves the following result.

Lemma 14 ([Ris69]) $\alpha \le \max(\min(\gamma - 1, \gamma - \beta), F/p')$, where the last term only applies when $\beta = 1$, and when it gives rise to a positive integer.
In fact, it is not necessary to factorise the denominators into irreducible polynomials. It is enough to find square-free polynomials $p_i$, relatively prime in pairs, and non-negative integers $\beta_i$ and $\gamma_i$ such that $\mathrm{den}(f) = \prod p_i^{\beta_i}$ and $\mathrm{den}(g) = \prod p_i^{\gamma_i}$. When $\beta = 1$, we have, in theory, to factorise $p$ completely, but it is enough to find the integral roots of $\mathrm{Res}_x(F - zp', p)$, by an argument similar to Trager's algorithm for calculating the logarithmic part of the integral of a rational expression.

We have, therefore, been able to bound the denominator of $y$ by $D = \prod p_i^{\alpha_i}$, so that $y = Y/D$ with $Y$ polynomial. So it is possible to suppress the denominators in our equation, and to find an equation
\[ RY' + SY = T. \tag{7.50} \]
Let $\alpha$, $\beta$, $\gamma$ and $\delta$ be the degrees of $Y$, $R$, $S$ and $T$. Then (7.50) becomes
\[ \underbrace{RY'}_{\alpha+\beta-1} + \underbrace{SY}_{\alpha+\gamma} = \underbrace{T}_{\delta}. \tag{7.51} \]
There are again three possibilities$^8$.

1. $\beta - 1 > \gamma$. In this case, the terms of degree $\alpha+\beta-1$ must cancel out the terms of degree $\delta$, therefore $\alpha = \delta + 1 - \beta$.

2. $\beta - 1 < \gamma$. In this case, the terms of degree $\alpha+\gamma$ must cancel out the terms of degree $\delta$, therefore $\alpha = \delta - \gamma$.

3. $\beta - 1 = \gamma$. In this case, the terms of degree $\alpha+\beta-1$ on the left may cancel. If not, the previous analysis still holds, and $\alpha = \delta + 1 - \beta$. To analyse the cancellation, we write $Y = \sum_{i=0}^\alpha y_ix^i$, $R = \sum_{i=0}^\beta r_ix^i$ and $S = \sum_{i=0}^\gamma s_ix^i$. The coefficients of the terms of degree $\alpha+\beta-1$ are $\alpha r_\beta y_\alpha$ and $s_\gamma y_\alpha$. The cancellation is equivalent to $\alpha = -s_\gamma/r_\beta$.

Lemma 15 ([Ris69]) $\alpha \le \max(\min(\delta - \gamma, \delta + 1 - \beta), -s_\gamma/r_\beta)$, where the last term is included only when $\beta = \gamma + 1$, and only when it gives rise to a positive integer.

Determining the coefficients $y_i$ of $Y$ is a problem of linear algebra. In fact, the system of equations is triangular, and is easily solved.
Example 33 Let us consider $y' - 2xy = 1$, i.e. $f = -2x$, $g = 1$. Since neither $f$ nor $g$ has any denominator, $y$ does not either, i.e. it is purely polynomial, of degree $\alpha$, and $R = 1$, $S = -2x$ and $T = 1$ in (7.50). Comparing degrees gives us
\[ \underbrace{y'}_{\alpha-1} + \underbrace{-2xy}_{\alpha+1} = \underbrace{1}_{0}, \tag{7.52} \]
i.e. $\alpha = -1$, as predicted by Lemma 15. But this is impossible.

This is the Risch differential equation that arises from integrating $\exp(-x^2)$, whose integral has to be $y\exp(-x^2)$. Hence we have proved the folk-lore result:

$^8$This is not an accidental similarity, but we refer to [Dav84] for a unifying exposition.
$e^{-x^2}$ has no [elementary] integral.

We can introduce a new, non-elementary, operator erf, with the property$^9$ that $(\mathrm{erf}(\eta))' = \eta'\exp(-\eta^2)$, and then $\int \exp(-x^2) = \mathrm{erf}(x)$.

The case when $K$ is an algebraic extension of $C(x)$ is treated in [Dav84], and the more general cases in [Ris69, Dav85b]. The principles are the same, but the treatment of the cancellation case gets more tedious, both in theory and in practice.
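The same facts are visible in Python/sympy (illustrative; note that sympy's erf carries the conventional $\frac{2}{\sqrt\pi}$ normalisation of footnote 9):

from sympy import symbols, integrate, diff, exp, sqrt, pi, erf

x = symbols('x')
assert integrate(exp(-x**2), x) == sqrt(pi)*erf(x)/2   # non-elementary answer
assert diff(sqrt(pi)*erf(x)/2, x) == exp(-x**2)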
7.8 The Parallel Approach
The methods described so far in this chapter have essentially taken a recursive approach to the problem: if the integrand lives in $K(x, \theta_1, \ldots, \theta_n)$, we have regarded this as $K(x)(\theta_1)(\ldots)(\theta_n)$, and the results of the previous sections all justify an "induction on $n$" algorithm. In this section we will adopt a different approach, regarding $K(x, \theta_1, \ldots, \theta_n)$ as the field of fractions of $K[x, \theta_1, \ldots, \theta_n]$, and taking a distributed (page 50), also called parallel as we are treating all the $\theta_i$ in parallel, view of $K[x = \theta_0, \theta_1, \ldots, \theta_n]$.
We note that Liouville's Principle (Theorem 46) can be applied to this setting.

Theorem 48 (Liouville's Principle, Parallel version) Let $f$ be an expression from some expression field $L = K(x, \theta_1, \ldots, \theta_n)$. If $f$ has an elementary integral over $L$, it has an integral of the following form:
\[ \int f = v_0 + \sum_{i=1}^m c_i\log v_i, \tag{7.53} \]
where $v_0$ belongs to $L$, the $v_i$ belong to $\hat{K}[x, \theta_1, \ldots, \theta_n]$, an extension of $K[x, \theta_1, \ldots, \theta_n]$ by a finite number of constants algebraic over $K$, and the $c_i$ belong to $\hat{K}$.

Other than being more explicit about the domains objects belong to, we are asserting that the $v_i$ are polynomial, which can be assumed since $\log(p/q) = \log(p) - \log(q)$, or in the language of differential algebra,
\[ \frac{(p/q)'}{p/q} = \frac{p'}{p} - \frac{q'}{q}. \]
Notation 35 In (7.53), we write $v_0 = \frac{p}{q}$, where $p, q \in K[x, \theta_1, \ldots, \theta_n]$, but are not necessarily relatively prime.
$^9$The reader may object that $\int e^{-x^2} = \frac{\sqrt\pi}{2}\,\mathrm{erf}(x)$. This statement is only relevant after one has attached precise numeric meanings $e^{-x^2}$ to the differential-algebraic concept $\exp(-x^2)$ — see Chapter 8.
From now on, until section 7.8.1, we will assume that the $\theta_i$ are transcendental. We can then infer the following algorithm for integrating expressions in this setting.

Pseudo-Algorithm 45 (Parallel Risch [NM77, Bro07])
Input: $L = K(x, \theta_1, \ldots, \theta_n)$ a purely transcendental differential field with constants $K$, $f \in L$
Output: $g$ an elementary integral of $f$, or a proof that there is no such one satisfying the assumptions of steps (1)–(3).

(1) Decide candidate $v_1, \ldots, v_m$ (we may have too many)
(2) Decide a candidate $q$ (which may be a multiple of the true value)
(3) Decide degree bounds for $p$ (which may be too large), i.e. $n_0, \ldots, n_n$ such that
\[ p = \sum_{i_0=0}^{n_0}\sum_{i_1=0}^{n_1}\cdots\sum_{i_n=0}^{n_n} c_{i_0,i_1,\ldots,i_n}\,x^{i_0}\prod_{j=1}^n \theta_j^{i_j} \]
(4) Clear denominators in the derivative of (7.53)
(5) Solve the resulting linear equations for the $c_i$ and $c_{i_0,i_1,\ldots,i_n}$ ($c_{0,\ldots,0}$ is the constant of integration, and is never determined)
(6) if there's a solution
        then reduce $p/q$ to lowest terms and return the result
        else "integral not found"

As explained in [Bro07], it is the decisions taken in steps (1)–(3) which mean that this is not a guaranteed "integrate or prove unintegrable" algorithm. The work of the previous sections allows partial solutions:

(1) Those $v_1, \ldots, v_m$ which depend on $\theta_n$
(2) A candidate $q$ (which may be a multiple of the true value). But the multiple is in $K(x, \theta_1, \ldots, \theta_{n-1})[\theta_n]$, not in $K[x, \theta_1, \ldots, \theta_n]$
(3) A degree bound $n_n$ for $p$ as a polynomial in $\theta_n$
7.8.1 The Parallel Approach: Algebraic Expressions
7.9 Definite Integration
We have shown (Example 33) that $\int \exp(-x^2)$ has no elementary expression, i.e. that it is a new expression, which we called erf. However, $\int_{-\infty}^{\infty} e^{-x^2}$ (where we have attached a precise numerical meaning $e^x$ to $\exp(x)$) is well-known to be $\sqrt\pi$, which is essentially another way of saying that $\mathrm{erf}(\pm\infty) = \pm1$, and is therefore a matter of giving numerical values to functions — see Chapter 8.

TO BE COMPLETED: Meijer etc.
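The numerical statement itself is easy to confirm; for example, in Python/sympy (illustrative):

from sympy import symbols, integrate, exp, oo, sqrt, pi

x = symbols('x')
assert integrate(exp(-x**2), (x, -oo, oo)) == sqrt(pi)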
7.10 Other Calculus Problems
7.10.1 Indefinite summation
The initial paper in this area was [Gos78]. Although not expressed that way in this paper, the key idea [Kar81] is the following, akin to Definition 99.

Definition 106 A difference ring is a ring (Definition 8) equipped with an additional unary operation, referred to as the difference operator and written$^{10}$ with a prefix $\delta$, which satisfies three additional laws:
1. $\delta(f+g) = \delta f + \delta g$;
2. $\delta(fg) = f(\delta g) + (\delta f)g + (\delta f)(\delta g)$;
3. $\delta(1) = 0$.
A difference ring which is also a field (Definition 15) is referred to as a difference field.

One thinks of $\delta(f)$ as being $f(n+1) - f(n)$. It is worth remarking the differences with Definition 99: clause 3 is necessary, whereas previously it could be inferred from the others, and clause 2 is different, essentially because $(x^2)' = 2x$ but $\delta(n^2) = (n+1)^2 - n^2 = 2n+1$.
The process of formal (indefinite) summation $\Sigma$ is then that of inverting $\delta$, as $\int$ is the process of inverting $'$. Just as the rôle of $dx$ in integration theory is explained by the fact that $x' = 1$, i.e. $\int 1 = x$, so the traditional rôle of $n$ in summation theory is explained by the fact that $\delta n = 1$, equivalently that $\Sigma 1 = n$. Note that the two procedures of integration and summation are more similar than normal notation would suggest: we are happy with $\int x = \frac12 x^2$, but less so with $\Sigma n = \frac12 n(n+1)$, yet the two are essentially equivalent.

To further the analogy, we know that $\int \frac1x$ cannot be expressed as a rational function of $x$, but is a new expression, known conventionally as $\log x$. Similarly, $\Sigma\frac1n$ cannot be expressed as a rational function of $n$ [Kar81, Example 16], but is a new expression, known conventionally as $H_n$, the $n$-th harmonic number. Again, $\int \log x = x\log x - x$, and $\Sigma H_n = nH_n - n$.
7.10.2 Definite Symbolic Summation
Definite summation might be thought to be related to indefinite summation in the way that definite integration (Section 7.9) is to indefinite. There is certainly the same problem of evaluating expressions from difference algebra at numerical points. However, in practice we are more concerned with a subtly different class of problems, where the limits of summation also figure in the summand.
Example 34 ([GKP94, Exercise 6.69], [Sch00b])
\sum_{k=1}^{n} k^2 H_{n+k} = \frac{1}{3}n\left(n+\frac12\right)(n+1)\left(2H_{2n}-H_n\right) - \frac{1}{36}n(10n^2+9n-1). (7.54)
Footnote 10: In this text; notations differ.
What are the key references here? [Sch04]?
7.10.3
7.10.4
Chapter 8
Algebra versus Analysis
We have seen in the previous chapter how we can construct an algebraic theory of mathematical objects such as 'exp' and 'log', and possibly others. From an algebraic point of view, they seem to behave like the mathematical objects we are familiar with from analysis. Are they the same? If not, what are the differences? This is perhaps one of the less-discussed topics¹ in computer algebra, and indeed possibly in mathematics more generally.
Notation 36 Throughout this chapter, we use the notation ?= to denote an equation that might or might not be true, or partially true, depending on the interpretations, from algebra or from analysis, that one places on the symbols either side of ?=.
8.1 Functions and Formulae
This question turns out to be related to the difference between functions and formulae (which we have also called expressions). Consider the two-dimensional formula (x²−1)/(x−1), or (x^2-1)/(x-1) if we prefer one-dimensional expressions. It has potentially many rôles.
formula There are several options here: strings of characters, or a parse tree. Whichever we choose, (x^2-1)/(x-1) is a different formula from x+1.
∈ Q(x) (see section 2.2.) This is therefore mathematically equivalent to x + 1, and an algebra system may or may not transform one into the other: a system that aims for candidness in this context (section 2.2.2) should transform (x^2-1)/(x-1) into x+1.
∈ K(x) Of course, it is only convention that chooses Q for the ground field. It could be any extension of Q, or, more challengingly, a finite field of positive characteristic.
Footnote 1: But see [Dav10].
rule This is what a computer scientist would think of as λx.(x²−1)/(x−1): that rule which, given an x, computes the corresponding value of the formula.
None of these are, as such, a function in the sense of Notation 3, though the last is the nearest. However, trying to express (x²−1)/(x−1) in this notation exposes our problems.
\left(\left\{\left(x, \frac{x^2-1}{x-1}\right) \mid x \in \mathbf{Q}\right\}, \mathbf{Q}, \mathbf{Q}\right)_B (8.1)
is illegal, because of the case x = 1. Ruling this out gives us
\left(\left\{\left(x, \frac{x^2-1}{x-1}\right) \mid x \in \mathbf{Q}\setminus\{1\}\right\}, \mathbf{Q}\setminus\{1\}, \mathbf{Q}\right)_B. (8.2)
If we regard 0/0 as ⊥, then we can have
\left(\left\{\left(x, \frac{x^2-1}{x-1}\right) \mid x \in \mathbf{Q}\right\}, \mathbf{Q}, \mathbf{Q}\cup\{\perp\}\right)_B. (8.3)
Making the ⊥ explicit gives us
\left(\left\{\left(x, \frac{x^2-1}{x-1}\right) \mid x \in \mathbf{Q}\setminus\{1\}\right\} \cup \{(1,\perp)\}, \mathbf{Q}, \mathbf{Q}\cup\{\perp\}\right)_B. (8.4)
If we're going to single out a special case, we might as well (since 2 = lim_{x→1} (x²−1)/(x−1), i.e. this is a removable singularity, in the sense that we can give our function a value at the apparently singular point which makes it continuous) write
\left(\left\{\left(x, \frac{x^2-1}{x-1}\right) \mid x \in \mathbf{Q}\setminus\{1\}\right\} \cup \{(1,2)\}, \mathbf{Q}, \mathbf{Q}\right)_B, (8.5)
dropping the ⊥, and this is equal to
\left(\{(x, x+1) \mid x \in \mathbf{Q}\}, \mathbf{Q}, \mathbf{Q}\right)_B. (8.6)
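A short SymPy sketch (assuming the library is available; the names are illustrative) of the gap between the Q(x) view and the function view:

    from sympy import symbols, cancel, limit

    x = symbols('x')
    f = (x**2 - 1)/(x - 1)
    print(cancel(f))        # x + 1: equality in Q(x), as in (8.6)
    print(f.subs(x, 1))     # nan: the 0/0 of (8.3)
    print(limit(f, x, 1))   # 2: the removable singularity's value, as in (8.5)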
The case of polynomials is simpler (we restrict consideration to one variable, but this isn't necessary). Over a ring R of characteristic zero², equality of abstract polynomials is the same as equality of the corresponding functions:
p = q \hbox{ in } R[x] \Leftrightarrow \left(\{(x, p(x)) \mid x \in R\}, R, R\right)_B = \left(\{(x, q(x)) \mid x \in R\}, R, R\right)_B (8.7)
This is a consequence of the fact that a non-zero polynomial p − q has only a finite number of zeros.
If we attach a meaning to elements of R(x) by analogy with (8.2), omitting dubious points in the domain, then equality in the sense of Definition 16 is related to the equality of Bourbakist functions in the following way
f = g \hbox{ in } R(x) \Leftrightarrow \left(\{(x, f(x)) \mid x \in S\}, S, R\right)_B = \left(\{(x, g(x)) \mid x \in S\}, S, R\right)_B, (8.8)
where S is the intersection of the domains of f and g.
Footnote 2: See example 5 on page 25 to explain this limitation.
8.2 Branch Cuts
In the previous section, we observed that there was a difference between "expressions", whether concrete formulae or abstract expressions in K(x), and functions R → R or C → C. If we go beyond K(x), the problems get more serious.
Example 35 Consider the formula sqrt(z), or √z, which we can formalise as θ ∈ K(z, θ | θ² = z). What happens if we try to interpret θ as a function θ(z) : C → C? Presumably we choose θ(1) = 1, and, by continuity, θ(1+ε) = 1 + ε/2 − ε²/8 + ··· ≈ 1 + ε/2. Similarly, θ(1+εi) ≈ 1 + εi/2, and so on. If we continue to track θ(z) around the unit circle {z = x+iy : x²+y² = 1}, we see that θ(i) = (1+i)/√2, and θ(−1) = i. Continuing, θ(−i) = (−1+i)/√2 and θ(1) = −1.
It would be tempting to dismiss this as "the usual square root ambiguity", but in fact the problem is not so easily dismissed.
Example 36 Similarly, consider log(z), which we can formalise as θ ∈ K(z, θ | exp(θ) = z). What happens if we try to interpret θ as a function θ(z) : C → C? We choose θ(1) = 0, and, since exp(εi) ≈ 1 + εi, θ(1+εi) ≈ εi. As we track θ(z) around the unit circle {z = x + iy : x²+y² = 1}, we see that θ(i) = iπ/2, θ(−1) = iπ, and ultimately θ(1) = 2πi.
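The monodromy in Example 35 can be reproduced numerically; the following minimal Python sketch (standard library only; the loop is our illustration, not a system algorithm) tracks the continuous square root around the unit circle:

    import cmath, math

    val = 1.0                                   # theta(1) = 1
    for k in range(1, 1001):                    # walk once round the unit circle
        z = cmath.exp(2j * math.pi * k / 1000)
        w = cmath.sqrt(z)
        if abs(-w - val) < abs(w - val):        # pick the branch continuous with val
            w = -w
        val = w
    print(val)   # approximately -1: we return to z = 1, but theta(1) has become -1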
8.2.1 Some Unpleasant Facts
The blunt facts³ are the following (and many analogues).
Proposition 74 (No square root function) There is no continuous function f : C → C (or C \ {0} → C \ {0}) with the property that ∀z : f(z)² = z.
Proposition 75 (No logarithm function) There is no continuous function f : C \ {0} → C \ {0} with the property that ∀z : exp(f(z)) = z (note that f(exp(z)) = z is clearly impossible as exp is many:one).
Observation 19 This statement, about actual functions C → C, might seem to be in contradiction with the statement labelled "(1)+(2)" on page 254. In fact, it is not so much a contradiction as a statement that the world of functions C → C is not as neat as the algebraic world of the previous chapter.
In particular, the common statements
\log z_1 + \log z_2 \mathrel{?=} \log z_1 z_2 (8.9)
\log 1/z \mathrel{?=} -\log z (8.10)
are false⁴ for (any interpretation of) the function log : C \ {0} → C \ {0}, even though they are true for the usual log : R⁺ → R. On page 253 we proved a
Footnote 3: "No continuous argument", from which the others follow, is proved in [Pri03, §9.2].
Footnote 4: Consider z₁ = z₂ = −1 and z = −1 for counter-examples.
result which we said was "normally expressed as" (8.9). What we in fact proved was that γ := log η₁ + log η₂ − log(η₁η₂) was a constant, in the sense that it differentiated to 0: in fact it is
\gamma = \begin{cases} 2\pi i & \arg\eta_1 + \arg\eta_2 > \pi \\ 0 & -\pi < \arg\eta_1 + \arg\eta_2 \le \pi \\ -2\pi i & \arg\eta_1 + \arg\eta_2 \le -\pi \end{cases}. (8.11)
Similarly the discrepancy in (8.10), i.e. log(1/z) + log z, is a differential constant whose value is
\begin{cases} 2\pi i & z = x + iy : x < 0 \wedge y = 0 \\ 0 & \hbox{otherwise} \end{cases}. (8.12)
8.2.2 The Problem with Square Roots
These difficulties arise even with the square root function, and without needing to consider differential algebra.
Example 37 ([BCD+02]) Consider two apparently similar statements:
\sqrt{1-z}\,\sqrt{1+z} \mathrel{?=} \sqrt{1-z^2} (8.13)
\sqrt{z-1}\,\sqrt{z+1} \mathrel{?=} \sqrt{z^2-1}. (8.14)
Naïvely, we might consider both to be true: square both sides and they reduce to (1−z)(1+z) = 1−z² and (z−1)(z+1) = z²−1, both of which are true. But in fact, while (8.13) is true [BCD+02, Lemma 2], (8.14) is false, as evaluation at z = −2 gives √−3 √−1 ?= √3, and in fact the left-hand side is −√3, rather than √3.
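Both evaluations are easy to reproduce with floating-point complex arithmetic; Python's cmath uses the conventional principal branch (cut along the negative real axis):

    import cmath

    z = -2
    # (8.13): true for all z, including this one
    print(cmath.sqrt(1 - z) * cmath.sqrt(1 + z), cmath.sqrt(1 - z*z))   # both ~1.732j
    # (8.14): fails here; the left-hand side is -sqrt(3)
    print(cmath.sqrt(z - 1) * cmath.sqrt(z + 1), cmath.sqrt(z*z - 1))   # -1.732 vs +1.732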
8.2.3 Possible Solutions
These facts have, of course, been known since the beginnings of complex analysis,
and there are various approaches to the problem, discussed in [Dav10].
Multivalued Functions (taking values not on C, but in the set of subsets of C, i.e. P(C)). Here we think of the logarithm of 1 as being, not 0, but any of {…, −4πi, −2πi, 0, 2πi, 4πi, …}. Our exemplar problem (8.9) is then treated as follows.
  The equation merely states that the sum of one of the (infinitely many) logarithms of z₁ and one of the (infinitely many) logarithms of z₂ can be found among the (infinitely many) logarithms of z₁z₂, and conversely every logarithm of z₁z₂ can be represented as a sum of this kind (with a suitable choice of log z₁ and log z₂).
  [Car58, pp. 259–260] (our notation)
Since log is to be the inverse of exp, we essentially transpose the graph and have (graph(exp)^T, C, P(C))_B as the Bourbaki formulation of logarithm.
Notation 37 We will use function names beginning with capitals, e.g. Log or Sqrt (so Sqrt(4) = {−2, 2}) for such multivalued functions.
(8.13) and (8.14) then both become true, when rewritten as
\mathrm{Sqrt}(1-z)\,\mathrm{Sqrt}(1+z) \mathrel{?=} \mathrm{Sqrt}(1-z^2) (8.13′)
\mathrm{Sqrt}(z-1)\,\mathrm{Sqrt}(z+1) \mathrel{?=} \mathrm{Sqrt}(z^2-1). (8.14′)
Note, however, that what appears to be a special case of (8.9), viz.
2\log z \mathrel{?=} \log z^2, (8.15)
is not true in this setting, since when z = 1, Log z² = Log 1 = {2kπi}, while 2 Log z = 2 Log 1 = {4kπi}. The problem is, in fact, that Log z + Log z ≠ 2 Log z.
Riemann Surfaces Instead of saying that log 1 has multiple values, we say that there are multiple versions of '1', each of which has a well-defined logarithm. In terms of Example 35, the '1' that we reach after going round the circle once is different from the '1' we started from. In terms of Notation 3, we have (graph(exp)^T, R_{log z}, C)_B, where R_{log z} signifies the Riemann surface corresponding to the function log z, shown in Figure 8.1. The Riemann surface view is discussed in [BCD+02, Section 2.4], which concludes
  Riemann surfaces are a beautiful conceptual scheme, but at the moment they are not computational schemes.
Note that the standard counterexample to (8.10), viz. z = −1, so that log(1/−1) = πi ≠ −log(−1), is now solved by the fact that 1/−1 is a −1 on a different branch, and in (8.9) z₁z₂ is on the appropriate branch to make (8.9) valid.
Both the previous solutions have preserved continuity, and the identities, at the cost of not having a function C → C.
Branches Of course, we could also consider subsets of C which do not contain
the whole of the troublesome circle of Example 35. If we do that, then we
can have a continuous version of, say, log z as in
log z has a branch in any simply connected open set which does
not contain 0.
[Car73, p. 61]
Figure 8.1: A Riemann surface example: log
So any given branch would be (G, D, I)_B, where D is a simply connected open set which does not contain 0, G is a graph obtained from one element of the graph (i.e. a pair (z, log(z)) for some z ∈ D) by analytic continuation, and I is the relevant image set. While this approach preserves continuity, it has three drawbacks.
1. We now have as many definitions of log, say log_D, as we have choices of D, and these do not necessarily agree, even if we insist that 1 ∈ D and log_D 1 = 0. For example, if D₁ = C \ {iy : y ∈ [0, ∞)} and D₂ = C \ {−iy : y ∈ [0, ∞)}, then log_{D₁}(−1) = −iπ (since log(1−εi) ≈ −εi and we get from 1 to −1 within D₁ by going clockwise round the unit circle from 1 via −i, whose logarithm is −iπ/2) but log_{D₂}(−1) = iπ, since now we go anticlockwise round the unit circle, from 1 to −1 via i.
2. log_D is not defined outside D.
3. Identities such as (8.9) are not valid, even if z₁, z₂ and z₁z₂ are all in D: consider D₁ as above, with z₁ = z₂ = −1. This is inevitable: no assignment of a nonzero value to log(−1) can satisfy (8.10), for example.
This solution preserves continuity within D, at the cost of losing identities, and the uncertainty caused by the multiple options for D.
Branch Cuts Here we associate with each potentially multi-valued operator f the following.
• An "initial value" (z₀, f(z₀)).
• The "branch cut", which is a curve, or set of curves, B in C joining the singularities of f, such that C \ B is simply-connected. f(z) for any z ∉ B is then obtained from (z₀, f(z₀)) by analytic continuation along any path not meeting B.
• A rule for computing f(z) on B, which is generally arranged to make f(z) continuous with one side or the other of B.
A typical example would be log, where it is usual these days to have:
• Initial value (1, 0);
• Branch cut {x + 0i : x ∈ (−∞, 0]};
• Rule for log x : x < 0: log(x + 0i) = lim_{ε→0⁺} log(x + εi) (= iπ + log |x|).
We should note that there are many possible choices: the author was in fact initially taught
• Initial value (1, 0);
• Branch cut {x + 0i : x ∈ [0, ∞)};
• Rule for log x : x > 0: log(x + 0i) = lim_{ε→0⁺} log(x + εi).
Despite the apparent arbitrariness, the world has tended to converge on the branch cuts as defined⁵ in [AS64, Nat10].
8.2.4 Removable Branch Cuts
This term is introduced in [DF94, §4.2], presumably by analogy with "removable singularity". The example they give is illustrative.
Example 38 ([DF94, §4.2], our phrasing) Consider f(z) = g(z) − h(z), where g(z) = log(z+1) and h(z) = log(z−1). Then g has a branch cut along (−∞, −1] and h has one along (−∞, 1], and hence we would expect a branch cut for f along (−∞, 1]. In fact, the contributions from g and h exactly cancel along (−∞, −1), and the actual branch cut is only [−1, 1]. We should note that this is only true of g − h, and (1+ε)g − h genuinely has a branch cut along (−∞, 1] for all ε ≠ 0.
This is, in fact, analogous to the "removable singularity" behaviour we saw at (8.1): (x²−1−ε)/(x−1) has a genuine singularity at x = 1 for all ε ≠ 0, but when ε = 0 the singularity is in fact removable.
Footnote 5: The careful reader should note that one should use printing 9 or later of [AS64]: the branch cut for arccot moved!
8.3 Fundamental Theorem of Calculus Revisited
In the previous chapter we reduced the Fundamental Theorem of Calculus to the status of Notation 32, saying that integration is the inverse of differentiation. From the algebraic point of view, that is correct. From the analytic point of view, where the following definitions hold, there is indeed something to prove.
Definition 107 (Analysis) Given a function f : R → R, we define, where the right-hand sides exist,
f'(x) = \lim_{h\to 0} \frac{f(x+h)-f(x)}{h}
F(x) = \int_a^x f(t)\,dt = \lim_{|\Delta|\to 0} \overline{S}_\Delta(f) = \lim_{|\Delta|\to 0} \underline{S}_\Delta(f) \hbox{ provided both limits exist and are equal,} (8.16)
where Δ = [x₀ = a < x₁ < ··· < xₙ = x] is a dissection of [a, x], |Δ| = max_i(x_i − x_{i−1}), and \overline{S}_\Delta(f) = \sum_{i=1}^{n}(x_i - x_{i-1})\max_{y\in[x_{i-1},x_i]} f(y) and \underline{S}_\Delta(f) = \sum_{i=1}^{n}(x_i - x_{i-1})\min_{y\in[x_{i-1},x_i]} f(y) are the upper and lower approximations of "the area under the curve" of f.
Theorem 49 (Fundamental Theorem of Calculus [Apo67, §5.3]) Let f and F be functions defined on a closed interval [a, b] such that F′ = f. Then, if f is Riemann-integrable on [a, b],
\int_a^b f(x)\,dx = F(b) - F(a).
Though this is the classical statement, we must emphasise that F′ = f must hold throughout [a, b], and therefore F is differentiable, hence continuous, throughout this interval.
8.4 Constants Revisited
In the previous chapter we defined (Definition 100) a constant as being an element whose derivative was zero. How well does this compare with our usual intuition, which is of a constant function, i.e. one whose value is independent of the argument? It is a truism of calculus that a constant function has to have derivative zero, and a theorem of analysis that a function whose derivative is defined and zero everywhere is indeed constant, but that italicised phrase is important. The classic example is the Heaviside function:
H(x) = \begin{cases} 0 & x \le 0 \\ 1 & x > 0 \end{cases}
which is clearly not constant, but whose derivative is zero except at x = 0, where it is undefined.
8.4.1 Constants can be useful
We can make constructive use of the concept of a differential constant, however. Mathematica, for example, writes [Lic11] √(x²) in the form x sign(x), and then treats sign(x) as a differential constant internally, replacing it by (1/x)√(x²) at the end.
8.4.2 Constants are often troubling
One sometimes sees⁶ (but stated as an equality, rather than with our ?=)
\arctan(x) + \arctan(y) \mathrel{?=} \arctan\left(\frac{x+y}{1-xy}\right). (8.17)
If we let
C = \arctan(x) + \arctan(y) - \arctan\left(\frac{x+y}{1-xy}\right), (8.18)
then
\frac{\partial C}{\partial x} = \frac{1}{1+x^2} - \frac{\left((1-xy)+y(x+y)\right)/(1-xy)^2}{1+\left(\frac{x+y}{1-xy}\right)^2} = \frac{1}{1+x^2} - \frac{(1-xy)+y(x+y)}{(1-xy)^2+(x+y)^2} = 0, (8.19)
and similarly ∂C/∂y = 0, so C would seem to be a constant. Differentially, it is, but in fact
C(x, y) = \begin{cases} -\pi & xy > 1; x < 0 \\ 0 & xy < 1 \\ \pi & xy > 1; x > 0 \end{cases}: (8.20)
see Figure 8.2.
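A quick numerical check of (8.20) (plain Python; math.atan is the ordinary real arctan, and the function name C is ours):

    import math

    def C(x, y):
        return math.atan(x) + math.atan(y) - math.atan((x + y)/(1 - x*y))

    print(C(0.5, 0.5))      # ~0:    xy < 1
    print(C(2.0, 3.0))      # ~pi:   xy > 1, x > 0
    print(C(-2.0, -3.0))    # ~-pi:  xy > 1, x < 0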
8.5 Integrating ‘real’ Functions
In the previous chapter we saw
\int \frac{1}{x^2+1}\, dx = \frac{i}{2}\left(\ln(1-ix) - \ln(1+ix)\right), (7.28bis)
and said that the reader might complain "I asked to integrate a real function, but the answer is coming back in terms of complex numbers".
A calculus text would normally write
\int \frac{1}{x^2+1}\, dx = \arctan(x), (7.28ter)
Footnote 6: For an example, see http://www.mathamazement.com/Lessons/Pre-Calculus/05_Analytic-Trigonometry/sum-and-difference-formulas.html.
Figure 8.2: plot3d(C, x =-4..4, y=-4..4): C from (8.20)
where arctan is the inverse⁷ of the tan function, defined by
\tan(x) = \frac{\sin(x)}{\cos(x)} = \frac{1}{i}\,\frac{e^{ix}-e^{-ix}}{e^{ix}+e^{-ix}} = \frac{1}{i}\,\frac{\theta-\theta^{-1}}{\theta+\theta^{-1}} = \frac{1}{i}\,\frac{\theta^2-1}{\theta^2+1}, (8.21)
where θ′ = iθ is an exponential in the sense of Definition 101.
If we write φ = (1/i)(θ²−1)/(θ²+1), we can deduce that φ′ = φ² + 1, and treat this as a definition of "tangents" in the same way that Definition 101 is a definition of "exponentials", by adding a clause
(d) θ (assumed to be nonzero) is a tangent over K, i.e. there is an η in K such that θ′ = η′(θ² + 1): θ is a tan of η.
Similarly, we could add
(e) θ is an inverse tangent over K, i.e. there is an η in K such that θ′ = η′/(η² + 1): θ is an arctan of η.
We could then deduce an equivalent of statement "(1)+(2)" on page 254, that tan and arctan are (differential) inverses of each other.
Lemma 16 ((8.17) restated) If θ is an arctan of α, φ is an arctan of β and ψ is an arctan of (α+β)/(1−αβ), then θ + φ − ψ is a constant, in the sense of Definition 100.
Footnote 7: It is common to write tan⁻¹(x) rather than arctan(x), but the author finds this open to confusion with the other abuse of notation that writes tan²(x) for (tan(x))². What would tan⁻²(x) mean?
The proof is equivalent to (8.19):
(\theta + \phi - \psi)' = \frac{\alpha'}{1+\alpha^2} + \frac{\beta'}{1+\beta^2} - \frac{\left(\frac{\alpha+\beta}{1-\alpha\beta}\right)'}{1+\left(\frac{\alpha+\beta}{1-\alpha\beta}\right)^2} = 0.
However, the fact that this "constant" is given by (8.20) can trip us up: consider
\int \frac{1}{x^2+1} + \frac{2}{x^2+4}\, dx \mathrel{?=} \arctan\left(\frac{3x}{2-x^2}\right). (8.22)
This is differentially valid, but apparently violates numerical evaluation
\int_1^2 \frac{1}{x^2+1} + \frac{2}{x^2+4}\, dx \mathrel{?=} \left[\arctan\left(\frac{3x}{2-x^2}\right)\right]_{x=1}^{2} = -2\arctan(3) \approx -2.498, (8.23)
and we have an integral of a positive function apparently being negative. In fact the correct answer is π − 2 arctan(3) ≈ 0.644.
One way of explaining this problem to someone not familiar with the area is to point out that an integral should always be continuous, whereas arctan(3x/(2−x²)) is not continuous in the interval [1, 2], since there's a discontinuity at x = √2 (see Figure 8.3):
\lim_{x\to\sqrt2^-} \arctan\left(\frac{3x}{2-x^2}\right) = \arctan(+\infty) = \frac{\pi}{2}
but
\lim_{x\to\sqrt2^+} \arctan\left(\frac{3x}{2-x^2}\right) = \arctan(-\infty) = -\frac{\pi}{2},
and it is this discontinuity of π that needs to be accounted for.
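The discrepancy is easy to exhibit numerically; a minimal Python sketch (the midpoint rule below is just a crude independent check, not anything a computer algebra system would use):

    import math

    def f(x):
        return 1/(x*x + 1) + 2/(x*x + 4)

    def F(x):
        return math.atan(3*x/(2 - x*x))    # the differentially valid "antiderivative"

    print(F(2) - F(1))                     # -2.498...: the naive evaluation of (8.23)
    n, h = 100000, 1.0/100000
    print(sum(f(1 + (i + 0.5)*h) for i in range(n)) * h)   # ~0.6435, the true value
    print(math.pi - 2*math.atan(3))                        # ~0.6435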
8.6 Logarithms revisited
8.7 Other decision questions
We saw the theory of polynomial quantifier elimination in section 3.5.3. There, given a formula
Q_1 x_1, \ldots, Q_k x_k\; \phi(x_1, \ldots, x_n)
(where each Q_i is either ∃ or ∀, and φ is a Boolean combination of equalities and inequalities between polynomials), this produces an equivalent formula ψ(x_{k+1}, …, x_n). In particular, if k = n, we get either 'true' or 'false'. Expressions like √(x²−1) can be handled by writing them as y and adding ∃y : y² = x²−1 ∧ in the appropriate place, for example. How can we add transcendental functions? This, of course, assumes that we know about the
Figure 8.3: Graph of apparent integral in (8.22): plot(arctan(3*x/(2-x^2)), x = 1 .. 2)
numerical values of the functions as well as their differential properties. It is usual to assume Schanuel's conjectures [Ax71], which roughly speaking state that there are no unexpected identities in exponentials and logarithms of complex numbers.
Conjecture 2 (Schanuel [Ax71]) Given any n complex numbers z₁, …, zₙ which are linearly independent over the rational numbers Q, the extension field Q(z₁, …, zₙ, exp(z₁), …, exp(zₙ)) has transcendence degree at least n over Q.
[AW00] take a different approach.
Definition 108 A real or complex valued function f defined in some open domain of R or C, respectively, is called strongly transcendental (with exceptional point ξ) if for all numbers x in the domain of f excluding ξ, not both x and f(x) are algebraic.
It follows from Lindemann's Theorem that exp is strongly transcendental with exceptional point 0, and log is strongly transcendental with exceptional point 1. Similarly sin, cos and tan and their inverses, with exceptional points 0 except for arccos, which has 1.
[MW12] first consider problems of the form
Q_1 x_1\; \phi(x_1, \mathrm{trans}(x_1)),
where trans is a strongly transcendental function with
\mathrm{trans}'(x) = \frac{a(x) + b(x)\,\mathrm{trans}(x)}{d(x)} : \quad a, b, d \in \mathbf{Z}[x].
log, exp and arctan all satisfy these requirements. Here they produce an unconditional algorithm for reducing this to 'true' or 'false', where 'unconditional' means not relying on Conjecture 2 or equivalent hard problems in number theory, and it is this unconditionality that is the most surprising part of the work. This will, for example, decide
\forall x_1 \left(\exp(x_1)(1 - x_1) \le 1 \vee x_1 \ge 1\right),
and in fact does so in less than 1 second [Ach06].
They then take problems of the form
Q_1 x_1 Q_2 x_2 \ldots Q_n x_n\; \phi(x_1, \mathrm{trans}(x_1), x_2, \ldots, x_n),
apply the Collins-style quantifier elimination to Q₂x₂ …, Qₙxₙ φ(x₁, y, x₂, …, xₙ) to get ψ(x₁, y), and apply their previous method to Q₁x₁ ψ(x₁, trans(x₁)). Again, the process is unconditional. They also show how various other problems, e.g.
\forall x_1 \left(x_1 > 7 \Rightarrow \cosh(x_1) > x_1^3 - 4x_1\right),
can be transformed into this form, and decided (17 seconds).
While this is a significant step forward, we should note that we are limited to one occurrence of trans, and this has to correspond to the first quantifier. Also, this is purely a decision procedure, and will not do more general quantifier elimination.
8.8 Limits
While we can, and have, algebraicised differentiation and integration, the process of computing limits is more fundamentally analytic. The first serious exploration of limits was in Macsyma [Wan71b, Wan71a]. This process essentially added four "constants" to Macsyma's language:
infinity which is the infinity of the (one-point compactification of the) complex plane;
inf which is the positive one of the two-point compactification of the reals ("plus infinity");
minf which is the negative one of the two-point compactification of the reals ("minus infinity");
ind which is "indeterminate".
Computer algebra systems are in general schizophrenic about whether they are computing over the reals or the complexes, but we should note the confusing fact that the reals are a subset of the complexes, but the usual (two-point, +∞ and −∞) compactification of the reals is not a subset of the compactification of the complexes.
8.8.1 A Definite Integral
With the conventional choice of branch cuts, we can write the fractional part of x as
\mathrm{fra}(x) = x - \lfloor x\rfloor = \frac12 + \frac{i}{2\pi}\log(-\exp(-2\pi i x)). (8.24)
If we ask Maple to simplify the above, nothing happens, but with the symbolic option (i.e. ignoring the branch cuts of log)⁸, we just get x. We note that fra′(x) = 1 since fra(x) and x "differ by a constant", that (differential) constant being ⌊x⌋.
If we ask for
\int_{1/2}^{1} \mathrm{fra}(1/x)\,dx = \int_{1/2}^{1} \left(\frac1x - 1\right) dx = \ln 2 - \frac12 \approx 0.193, (8.25)
many systems⁹ will get this right. The fun comes when we try, say, ∫_{1/3}^{1} fra(1/x) dx. Maple returns ln 3 − 1/3 ≈ 0.765, which is absurd for integrating a function whose value is between 0 and 1 on an interval of length 2/3. It gets
\int_{1/3}^{1/2} \mathrm{fra}(1/x)\,dx = \int_{1/3}^{1/2}\left(\frac1x - 2\right) dx = \ln 3 - \ln 2 - \frac{2}{6} \approx 0.072 (8.26)
Footnote 8: Sage's full simplify apparently does the same.
Footnote 9: Sage apparently returns log 2 — https://groups.google.com/forum/#!topic/sage-support/oUkERaidET0.
correct, but has failed to spot the singularity in the middle of the integrand in
\int_{1/3}^{1} \mathrm{fra}(1/x)\,dx = \int_{1/3}^{1}\left(\frac1x - \begin{cases} 2 & x < 1/2 \\ 1 & x \ge 1/2\end{cases}\right) dx = \ln 3 - \frac12 - \frac{2}{6} \approx 0.265. (8.27)
If we split the integral by hand, as in ∫_{1/3}^{1/2} fra(1/x) dx + ∫_{1/2}^{1} fra(1/x) dx, we get the right answer, as we do when we use Maple's preferred notation, frac(1/x) rather than (8.24).
Appendix A
Algebraic Background
We are quite often concerned with estimating the sizes of things, either to compute the running time or to be sure that we have exhausted all possibilities of failure. The following notations are used.
Notation 38 Let
f(x) = \sum_{i=0}^{n} a_i x^i = a_n \prod_{i=1}^{n} (x - \alpha_i)
be a polynomial of degree n (i.e. aₙ ≠ 0). Define the following measures of the size of the polynomial f:
H(f) (often written ||f||_∞), the height or ∞-norm, is max_{i=0}^{n} |a_i|;
||f|| (often written ||f||₂), the 2-norm, is √(Σ_{i=0}^{n} |a_i|²);
L(f) (often written ||f||₁), the length or 1-norm, is Σ_{i=0}^{n} |a_i|;
M(f), the Mahler measure of f, is |aₙ| Π_{|α_i|>1} |α_i|.
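For concreteness, a small numerical sketch of these measures (Python with NumPy, assumed available; M(f) is computed from numerically-approximated roots, so it is only an approximation, and the function name is ours):

    import numpy as np

    def measures(a):
        """Size measures of Notation 38; a = [a_0, ..., a_n], a_n nonzero."""
        a = np.array(a, dtype=float)
        roots = np.roots(a[::-1])                   # numpy wants highest degree first
        return {'H(f)': np.max(np.abs(a)),
                '||f||': np.linalg.norm(a),
                'L(f)': np.sum(np.abs(a)),
                'M(f)': abs(a[-1]) * np.prod([abs(r) for r in roots if abs(r) > 1])}

    print(measures([-1, 0, 1]))   # x^2 - 1: H = 1, ||f|| = sqrt(2), L = 2, M = 1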
A.1 The resultant and friends
A.1.1 Resultant
It quite often happens that we have to consider whether two polynomials, which are usually relatively prime, can have a common factor in certain special cases. The basic algebraic tool for solving this problem is called the resultant. In this section we shall define this object and we shall give some properties of it. We take the case of two polynomials f and g in one variable x and with coefficients in a ring R. We write f = Σ_{i=0}^{n} a_i x^i and g = Σ_{i=0}^{m} b_i x^i.
Definition 109 The Sylvester matrix of f and g is the matrix
\mathrm{Syl}(f,g) = \begin{pmatrix}
a_n & a_{n-1} & \ldots & a_1 & a_0 & 0 & 0 & \ldots & 0 \\
0 & a_n & a_{n-1} & \ldots & a_1 & a_0 & 0 & \ldots & 0 \\
\vdots & \ddots & \ddots & \ddots & & \ddots & \ddots & \ddots & \vdots \\
0 & \ldots & 0 & a_n & a_{n-1} & \ldots & a_1 & a_0 & 0 \\
0 & \ldots & 0 & 0 & a_n & a_{n-1} & \ldots & a_1 & a_0 \\
b_m & b_{m-1} & \ldots & b_1 & b_0 & 0 & 0 & \ldots & 0 \\
0 & b_m & b_{m-1} & \ldots & b_1 & b_0 & 0 & \ldots & 0 \\
\vdots & \ddots & \ddots & \ddots & & \ddots & \ddots & \ddots & \vdots \\
0 & \ldots & 0 & b_m & b_{m-1} & \ldots & b_1 & b_0 & 0 \\
0 & \ldots & 0 & 0 & b_m & b_{m-1} & \ldots & b_1 & b_0
\end{pmatrix}
where there are m rows constructed with the a_i, n rows constructed with the b_i.
Definition 110 The resultant of f and g, written Res(f, g), or Res_x(f, g) if there is doubt about the variable, is the determinant of this matrix. The j-th principal subresultant coefficient, psc_j(f, g), is the determinant of the matrix obtained by deleting the last j rows of f coefficients, the last j rows of g coefficients and the last 2j columns, i.e. the resultant of the quotients of dividing f and g by x^j.
Well-known properties of determinants imply that the resultant belongs to R, and that Res(f, g) and Res(g, f) are equal, to within a sign. We must note that, although the resultant is defined by a determinant, this is not the best way of calculating it. Because of the special structure of the Sylvester matrix, we can consider Euclid's algorithm as Gaussian elimination in this matrix (hence the connection between the resultant and the g.c.d.). One can also consider the sub-resultant method as an application of the Sylvester identity (theorem 15) to this elimination. It is not very difficult to adapt advanced methods (such as the method of sub-resultants described in section 2.3.2, or the modular¹ and p-adic methods described in chapters 4 and 5) to the calculation of the resultant. Collins [1971] and Loos [1982] discuss this problem. We now give a version of Euclid's algorithm for calculating the resultant. We denote by lc(p) the leading coefficient of the polynomial p(x), by degree(p) its degree, and by remainder(p, q) the remainder from the division of p(x) by q(x). We give the algorithm in a recursive form.
Algorithm 46 (resultant)
Input: f, g;
Output: r = Res(f, g).
n := degree(f);
m := degree(g);
if n > m then r := (−1)^{nm} resultant(g, f)
else a_n := lc(f);
     if n = 0 then r := a_n^m
     else h := remainder(g, f);
          if h = 0 then r := 0
          else p := degree(h);
               r := a_n^{m−p} resultant(f, h);
return r;
Footnote 1: The modular method is described in section 4.5.1.
It is worth noting that this returns a partially factored form of the resultant, if the underlying arithmetic supports this (e.g. the factored representation of section 2.1.3).
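A direct transcription of Algorithm 46 into Python (a sketch, assuming exact rational arithmetic via fractions.Fraction and dense coefficient lists, lowest degree first; remainder is ordinary polynomial division over Q):

    from fractions import Fraction

    def degree(p):                 # p is a coefficient list, lowest degree first
        return len(p) - 1

    def lc(p):
        return p[-1]

    def remainder(p, q):
        """Remainder on dividing p by q over Q (q not the zero polynomial)."""
        p = [Fraction(c) for c in p]
        while len(p) >= len(q):
            if p[-1] == 0:
                p.pop()
                continue
            c = p[-1] / Fraction(q[-1])
            for i, qc in enumerate(q):
                p[len(p) - len(q) + i] -= c * Fraction(qc)
            p.pop()                # the leading term has been cancelled
        while len(p) > 1 and p[-1] == 0:
            p.pop()
        return p

    def resultant(f, g):
        n, m = degree(f), degree(g)
        if n > m:
            return (-1)**(n*m) * resultant(g, f)
        an = Fraction(lc(f))
        if n == 0:
            return an**m
        h = remainder(g, f)
        if all(c == 0 for c in h):
            return 0
        return an**(m - degree(h)) * resultant(f, h)

    print(resultant([-1, 0, 1], [-2, 1]))   # Res(x^2 - 1, x - 2) = 3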
We write h = Σ_{i=0}^{p} c_i x^i and c_i = 0 for i > p. This algorithm does indeed give the resultant of f and g for, when n ≤ m and n ≠ 0, the polynomials x^i g − x^i h (for 0 ≤ i < n) are linear combinations of the x^j f (for 0 ≤ j < m), and therefore we are not changing the determinant of the Sylvester matrix of f and g by replacing b_i by c_i for 0 ≤ i < m. Now this new matrix has the form \begin{pmatrix} A & * \\ 0 & B \end{pmatrix} where A is a triangular matrix with determinant a_n^{m−p} and B is the Sylvester matrix of f and h. From this algorithm we immediately get
Proposition 76 Res(f, g) = 0 if and only if f and g have a factor in common.
It is now easy to prove the following propositions:
Proposition 77 If the α_i are the roots of f, then
\mathrm{Res}(f, g) = a_n^m \prod_{i=1}^{n} g(\alpha_i).
Proposition 78 If the β_i are the roots of g, then
\mathrm{Res}(f, g) = (-1)^{mn} b_m^n \prod_{i=1}^{m} f(\beta_i).
Proposition 79 \mathrm{Res}(f, g) = a_n^m b_m^n \prod_{i=1}^{n}\prod_{j=1}^{m} (\alpha_i - \beta_j).
Proof [Duv87]. We write the right hand sides of the three propositions as
R_2(f,g) = a_n^m \prod_{i=1}^{n} g(\alpha_i), \quad R_3(f,g) = (-1)^{mn} b_m^n \prod_{i=1}^{m} f(\beta_i), \quad R_4(f,g) = a_n^m b_m^n \prod_{i=1}^{n}\prod_{j=1}^{m} (\alpha_i - \beta_j).
It is clear that R₂(f, g) and R₃(f, g) are equal to R₄(f, g). The three propositions are proved simultaneously, by induction on the integer min(n, m). If f and g are swapped, their resultant is multiplied by (−1)^{nm}, and this gives R₄(f, g) = (−1)^{nm} R₄(g, f), while R₂(f, g) = (−1)^{nm} R₃(g, f). We can therefore suppose that n ≤ m. Moreover R₂(f, g) is equal to a_n^m when n = 0, as is the resultant of f and g, and R₄(f, g) is zero when n > 0 and h = 0, as is the resultant. It only remains to consider the case when m ≥ n > 0 and h ≠ 0. But then R₂(f, g) = a_n^{m−p} R₂(f, h), for g(α_i) = h(α_i) for each root α_i of f, and the algorithm shows that Res(f, g) = a_n^{m−p} Res(f, h), from which we get the desired result.
Corollary 20 Res(fg, h) = Res(f, h)Res(g, h).
Lemma 17 If f and g are polynomials in y and other variables, of total degrees d_f and d_g, then Res_y(f, g) is a polynomial of total degree at most d_f d_g.
A.1.2 Discriminants
Definition 111 The discriminant of f, Disc(f) or Disc_x(f), is
a_n^{2n-2} \prod_{i=1}^{n} \prod_{\substack{j=1 \\ j\ne i}}^{n} (\alpha_i - \alpha_j).
Proposition 80 Disc(f) = 0 if, and only if, f has a repeated root, i.e. is not square-free.
Proposition 81 Res(f, f′) = a_n Disc(f). Moreover Disc(f) ∈ R.
Corollary 21 Disc(fg) = Disc(f) Disc(g) Res(f, g)².
Whichever way they are calculated, the resultants are often quite large. For example, if the a_i and b_i are integers, bounded by A and B respectively, the resultant is less than (n+1)^{m/2}(m+1)^{n/2} A^m B^n, but it is very often of this order of magnitude (see section A.2). Similarly, if the a_i and b_i are polynomials of degree α and β respectively, the degree of the resultant is bounded by mα + nβ. A case in which this swell often matters is the use of resultants to calculate primitive elements, which uses the following result.
Proposition 82 If α is a root of p(x) = 0, and β is a root of q(x, α) = 0, then β is a root of Res_y(p(y), q(x, y)).
A.1.3 Iterated Operations
Suppose we wish to eliminate y and z from the three polynomials f_i(x, y, z), each with total degree d. We could compute
R_{123} := \mathrm{Res}_y(\mathrm{Res}_z(f_1, f_2), \mathrm{Res}_z(f_1, f_3)). (A.1)
By Lemma 17, each inner resultant has degree at most d², so R₁₂₃ has total degree at most d⁴.
TO BE COMPLETED [BM09]
A.2 Useful Estimates
Estimates of the sizes of various things crop up throughout computer algebra.
They can be essentially of three kinds.
• Upper bounds: X ≤ B.
• Lower bounds: X ≥ B.
• Average sizes: X "is typically" B. This may be a definitive average result (but even here we have to be careful what we are averaging over: the average number of factors of a polynomial is one, for example, but this does not mean we can ignore factorisation), or some heuristic "typically we find".
Estimates are used throughout complexity theory, of course. "Worst case" complexity is driven by upper bound results, while "average case" complexity is driven by average results. They are also used in algorithms: most algorithms of the 'Chinese Remainder' variety (section 4) rely on upper bounds to tell when they have used enough primes/evaluation points. In this case, a better upper bound generally translates into a faster algorithm in practice.
A.2.1 Matrices
How big is the determinant |M| of an n × n matrix M?
Notation 39 If v is a vector, then |v| denotes the Euclidean norm of v, √(Σ|v_i|²). If f is a polynomial, |f| denotes the Euclidean norm of its vector of coefficients.
Proposition 83 If M is an n × n matrix with entries ≤ B, then |M| ≤ n! Bⁿ.
This bound is frequently used in algorithm analysis, but we can do better.
Proposition 84 (Hadamard bound H_r) If M is an n × n matrix whose rows are the vectors v_i, then |M| ≤ H_r = Π ||v_i||₂, which in turn is ≤ n^{n/2} Bⁿ.
Corollary 22 If f and g are polynomials of degrees n and m respectively, then Res(f, g) ≤ (m+n)^{(m+n)/2} ||f||₂^m ||g||₂^n.
Corollary 23 If f is a polynomial of degree n, then Disc(f) ≤ n^{n−1} ||f||₂^{2n−1}.
In practice, especially if f is not monic, it is worth taking rather more care over this estimation.
Proposition 85 (Hadamard bound (columns)) If M is an n × n matrix whose columns are the vectors v_i, then |M| ≤ H_c = Π ||v_i||₂ ≤ n^{n/2} Bⁿ.
In practice, for general matrices one computes min(H_r, H_c). While there are clearly bad cases (e.g. matrices of determinant 0), the Hadamard bounds are "not too bad". As pointed out in [AM01], log(min(H_r, H_c)/|M|) is a measure of the "wasted effort" in a modular algorithm for computing the determinant, and "on average" this is O(n), with a variance of O(log n). It is worth noting that this is independent of the size of the entries.
Proposition 85 has a useful consequence.
Corollary 24 If x is the solution to M.x = a, where M and a have integer entries bounded by B and A respectively, then the denominators of x are bounded by min(H_r, H_c) and the numerators of x are bounded by n^{n/2} A B^{n−1}.
Proof. This follows from the linear algebra result that x = (1/|M|) adj(M).a, where adj(M), the adjoint of M, is the matrix whose (i, j)th entry is the determinant of the matrix obtained by striking row i and column j from M. The ith entry of adj(M).a is then the determinant of the matrix obtained by replacing the ith column of M by a, and we can apply Proposition 85 to this matrix.
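A numerical illustration of Propositions 84 and 85 (Python with NumPy, assumed available; the matrix is arbitrary):

    import numpy as np

    M = np.array([[3., 1., 4.], [1., 5., 9.], [2., 6., 5.]])
    det = abs(np.linalg.det(M))
    Hr = np.prod(np.linalg.norm(M, axis=1))   # product of row norms
    Hc = np.prod(np.linalg.norm(M, axis=0))   # product of column norms
    print(det, Hr, Hc)                        # 90.0 <= min(425.2..., 325.4...)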
A.2.2 Coefficients of a polynomial
Here we are working implicitly with polynomials with complex coefficients, though the bounds will be most useful in the case of integer coefficients.
Proposition 86 In terms of Notation 38,
H(f) \le ||f|| \le L(f) \le (n+1)H(f),
where the first inequality is strict unless f is a monomial, the second is strict unless all the a_i are equal or zero, and the third is strict unless all the a_i are equal.
Observation 20 The last inequality could be replaced by ≤ cH(f), where c is the number of nonzero monomials in f, but this seems not to be much exploited.
Proposition 87 (Landau's Inequality [Lan05], [Mig89, §4.3])
M(f) \le ||f||,
and the inequality is strict unless f is a monomial.
Corollary 25 |a_{n-i}| \le \binom{n}{i} M(f) \le \binom{n}{i} ||f||.
The first part of this follows from the fact that a_{n−i} is ±a_n times a sum of \binom{n}{i} products of roots, and products of roots are bounded by Proposition 87. For some applications, e.g. Theorem 36, we often bound \binom{n}{i} by 2ⁿ, but for others, such as Proposition 55, the more precise value is needed. 2ⁿ might seem like
overkill, but in fact, both in general and in the application to factoring [Mig81], 2 cannot be replaced by any smaller number.
It should be noted that the Landau–Mignotte bound is not the only way to bound the coefficients of a polynomial: [Abb09] gives four methods that depend on knowing the degree of the factor being searched for, and two others that will bound the least height of a factor. Depressingly, he gives a family of examples that shows that no bound is superior to any other, and indeed [Abb09, 3.3.5] it may be necessary to "mix-and-match", using different bounds for different coefficients.
These results can be generalised to polynomials in several variables [Mig89, Chapter 4.4]. The definition of M(f) is more complicated.
Definition 112 The Mahler measure of f ∈ C[x₁, …, xₙ] is defined inductively on n, using Notation 38 for n = 1 and more generally
\log M(f) = \int_0^1 \log\left(M(f(e^{2\pi i t}, x_2, \ldots, x_n))\right) dt.
Proposition 88 ([Mig89, Proposition 4.8]) Let
f(x_1, \ldots, x_n) = \sum_{i_1=0}^{d_1}\sum_{i_2=0}^{d_2}\cdots\sum_{i_n=0}^{d_n} a_{i_1,\ldots,i_n}\, x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}.
Then
|a_{i_1,\ldots,i_n}| \le \binom{d_1}{i_1}\binom{d_2}{i_2}\cdots\binom{d_n}{i_n} M(f).
Proposition 87 is still valid.
Corollary 26 Hence
|a_{i_1,\ldots,i_n}| \le \binom{d_1}{i_1}\binom{d_2}{i_2}\cdots\binom{d_n}{i_n} ||f||.
A.2.3 Roots of a polynomial
Several questions can be asked about the roots of a univariate polynomial. The most obvious ones are how large/small can they be, but one is often interested in how close they can be. These questions are often asked of the real roots (section 3.5.5), but we actually need to study all roots, real and complex. The distinction between real and complex roots is pretty fine, as shown in the following example.
Example 39 (Wilkinson Polynomial [Wil59]) Let W₂₀ have roots at −1, −2, …, −20, so that W₂₀ = (x+1)(x+2)···(x+20) = x²⁰ + 210x¹⁹ + ··· + 20!. Consider now the polynomial W₂₀(x) + 2⁻²³x¹⁹. One might expect this to have twenty real roots close to the original ones, but in fact it has ten real roots, at approximately −1, −2, …, −7, −8.007, −8.917 and −20.847, and five pairs of complex conjugate roots, −10.095 ± 0.6435i, −11.794 ± 1.652i, −13.992 ± 2.519i, −16.731 ± 2.813i and −19.502 ± 1.940i.
The discriminant of W₂₀ is 2.74×10²⁷⁵, which would seem well away from zero. However, the largest coefficient of W₂₀ is 1.38×10¹⁹, and of W′₂₀ is 3.86×10¹⁹. The norms of W₂₀ and W′₂₀ are 2.27×10¹⁹ and 6.11×10¹⁹, so corollary 23 gives a bound of 4.43×10⁷⁷⁹ for Disc(W₂₀), and a direct application of corollary 22 gives 3.31×10⁷⁶³. Hence the discriminant of W₂₀ is "much smaller than it ought to be", and W₂₀ is "nearly not square-free". Put another way, the Sylvester matrix for the discriminant is very ill-conditioned (this was in fact Wilkinson's original motivation for constructing W₂₀): the discrepancy between the actual determinant and corollary 22 is 489 decimal digits, whereas [AM01] would lead us to expect about 17.
Notation 40 Let f ∈ C[x], f = Σ_{i=0}^{n} a_i x^i, and let the roots of f be α₁, …, αₙ, and define
rb(f) = max_{1≤i≤n} |α_i|,
sep(f) = min_{1≤i<j≤n} |α_i − α_j|.
Proposition 93
\mathrm{sep}(f) > \sqrt{3\,|\mathrm{Disc}(f)|}\; n^{-(n+2)/2}\, |f|^{1-n}.
We note that the bound is zero if, and only if, the discriminant is zero, as it should be, and this bound is unchanged if we multiply the polynomial by a constant. The bound for W₂₀ is 7.27×10⁻²⁴⁵, but for the centred variant it becomes 1.38×10⁻¹¹³. Should we ever need a root separation bound in practice, centring the polynomial first is almost always a good idea. Similarly re-balancing changes the separation bound to 5.42×10⁻¹¹², which means 6.52×10⁻¹¹³ for the original polynomial.
These bounds seem very far away from reality (i.e. 1 for Wₙ), but in fact are almost optimal.
Proposition 94 ([ESY06, Theorem 3.6]) Let a ≥ 3 be an L-bit integer, and n ≥ 4 be an even integer. Then
Q(x) = \left(x^n - 2(ax-1)^2\right)\left(x^n - (ax-1)^2\right)
has degree 2n and coefficients of size O(L), but any sub-division method requires Ω(nL) subdivisions to isolate the roots, of which there are three in (a⁻¹ − h, a⁻¹ + h) with h = a^{−n/2−1}.
A.2.5 Developments
The monic polynomials of degree 7 with maximum root separation and all roots in the unit circle are:
six complex x⁷ − 1, with discriminant −823543, norm √2 and root bounds [2.0, 1.383087554, 2.0] and root separation bound (proposition 93) √3/56 ≈ 3.09×10⁻² (true value 0.868);
four complex x⁷ − x, with discriminant 6⁶ = 46656, norm √2 and root bounds [2.0, 1.383087554, 2.0] and root separation bound (proposition 93) 27√21/16807 ≈ 7.36×10⁻³ (true value 1);
two complex x⁷ − ¼x⁵ − x³ + ¼x, with discriminant −50625/4096 ≈ −12.36, norm ¼√34 ≈ 1.457737974 and root bounds [2, 1.626576562, 2.0] and root separation bound (proposition 93) 1800√21/82572791 ≈ 9.99×10⁻⁵ (true value 1/2);
all real x⁷ − (14/9)x⁵ + (49/81)x³ − (4/81)x, with discriminant³
10485760000/1853020188851841 ≈ 5.66×10⁻⁶, (A.2)
norm (17/81)√86 ≈ 1.946314993 and root bounds [23/9, 3.299831645, 2.494438258] and root separation bound (proposition 93) (83980800/32254409474403781)√3√7 ≈ 1.19×10⁻⁸ (true value 1/3).
Footnote 2: For the history of the attribution to Graeffe, see [Hou59].
Footnote 3: We note how the constraint that all the roots be real forces the discriminant to be small.
If there are n + 1 equally-spaced real roots (call the spacing 1 for the time being), then the first root contributes n! to Π_i Π_{j≠i}(α_i − α_j), the second root 1!(n−1)! and so on, so we have a total product of
\prod_{i=0}^{n} i!\,(n-i)! = \prod_{i=0}^{n} i!^2 = \left(\prod_{i=0}^{n} i!\right)^2. (A.3)
This is sequence A055209 [Slo07], which is the square of A000178.
If we now assume that the roots are equally-spaced in [−1, 1], then the spacing is 2/n, and we need to correct equation (A.3) by dividing by (n/2)^{n(n+1)}: call the result Cₙ. C is initially greater than one, with C₃ = 65536/59049 ≈ 1.11, but C₄ = 81/1024, C₅ = 51298814505517056/37252902984619140625 ≈ 0.001377042066, and C₆ as in (A.2).
While assuming equal spacing might seem natural, it does not, in fact, lead to the largest values of the discriminant. Consider polynomials with all real roots ∈ [−1, 1], so that we may assume the extreme roots are at ±1.
degree 4 Equally spaced roots, at ±1/3, give a discriminant of 65536/59049 ≈ 1.11, whereas ±1/√5 gives 4096/3125 ≈ 1.31, the optimum. The norms are respectively √182/9 ≈ 1.4999 and √62/5 ≈ 1.575.
degree 5 Equally spaced roots, at ±1/2 and 0, give a discriminant of 81/1024 ≈ 0.079, whereas ±√(3/7) and 0 gives 12288/16807 ≈ 0.73, the optimum. The norms are respectively √42/4 ≈ 1.62 and √158/7 ≈ 1.796.
degree 6 Equally spaced roots, at ±3/5 and ±1/5, give a discriminant of 51298814505517056/37252902984619140625 ≈ 0.00138. Unconstrained solving for the maximum of the discriminant, using Maple's Groebner[Solve], starts becoming expensive, but if we assume symmetry, we are led to choose roots at ±√(147 ± 42√7)/21, with a discriminant of 67108864/16209796869 ≈ 0.0041. The norms are respectively 2√305853/625 ≈ 1.77 and 2√473/21 ≈ 2.07.
degree 7 Equally spaced roots, at ±2/3, ±1/3 and 0, give a discriminant of 10485760000/1853020188851841 ≈ 5.66·10⁻⁶, as in (A.2). Again assuming symmetry, we are led to choose roots at ±√(495 ± 66√15)/33 and 0, which gives 209715200000/5615789612636313 ≈ 3.73·10⁻⁵. The norms are respectively 17√86/81 ≈ 1.95 and 2√1577/33 ≈ 2.41.
degree 8 Equally spaced roots, at ±5/7, ±3/7 and ±1/7, give a discriminant of ≈ 5.37·10⁻⁹. Assuming symmetry, we get roots at approximately ±0.87, ±0.59 and ±0.21, with a discriminant of ≈ 9.65·10⁻⁸. The norms are respectively ≈ 2.15 and √2·√727171/429 ≈ 2.81.
degree 9 Equally spaced roots, at ±3/4, ±1/2, ±1/4 and 0, give a discriminant of ≈ 1.15·10⁻¹². Assuming symmetry, we get roots at ±0.8998, ±0.677, ±0.363 and zero, with a discriminant of ≈ 7.03·10⁻¹¹. The norms are respectively √5969546/1024 ≈ 2.39 and ≈ 3.296.
If we now consider the case with two complex roots, which may as well be at x = ±i, we have the following behaviour.
degree 4 The maximal polynomial is x⁴ − 1, with discriminant −256 and norm √2. The bound is √6/16 ≈ 0.153.
degree 5 The maximal polynomial is x⁵ − x, with discriminant −256 and norm √2. The bound is 4√15/625 ≈ 0.0248.
degree 6 Equally spaced roots, at ±1/3, give a discriminant of ≈ −108.26 and a norm of 2√41/9. The bound is 400√123/1860867 ≈ 2.38·10⁻³. The maximal discriminant is attained with roots at ±1/√3, with discriminant −4194304/19683 ≈ −213.09 and norm 2√5/3. The bound is 4√5/3375 ≈ 2.65·10⁻³.
degree 7 Equally spaced roots, at ±1/2 and 0, give a discriminant of ≈ −12.36 and norm of √34/4 ≈ 1.46. The bound is 1800√21/82572791 ≈ 9.99·10⁻⁵. The maximal discriminant is attained with roots at ±(3/11)^{1/4}, with discriminant ≈ 40.8 and norm of 2√77/11 ≈ 1.596. The bound is ≈ 1.06·10⁻⁴.
A.3 Chinese Remainder Theorem
In this section we review the result of the title, which is key to the methods in
section 4.2.4, and hence to much of computer algebra.
Theorem 50 (Chinese Remainder Theorem (coprime form)) Two congruences
X ≡ a (mod M) (A.4)
and
X ≡ b (mod N), (A.5)
where M and N are relatively prime, are precisely equivalent to one congruence
X ≡ c (mod MN). (A.6)
By this we mean that, given any a, b, M and N, we can find such a c that satisfaction of (A.4) and (A.5) is precisely equivalent to satisfying (A.6). The converse direction, finding (A.4) and (A.5) given (A.6), is trivial: one takes a to be c (mod M) and b to be c (mod N).
Algorithm 47 (Chinese Remainder)
Input: a, b, M and N (with gcd(M, N) = 1).
Output: c satisfying Theorem 50.
Compute λ, µ such that λM + µN = 1;
  #The process is analogous to Lemma 1 (page 63)
c := a + λM(b − a);
Clearly c ≡ a (mod M), so satisfying (A.6) means that (A.4) is satisfied. What about (A.5)?
c = a + λM(b − a) = a + (1 − µN)(b − a) ≡ a + (b − a) (mod N)
so, despite the apparent asymmetry of the construction, c ≡ b (mod N) as well.
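A minimal Python transcription of Algorithm 47 (the helper name crt is ours; pow(M, -1, N), available from Python 3.8, supplies the λ of the algorithm):

    from math import gcd

    def crt(a, M, b, N):
        """c with c = a (mod M) and c = b (mod N); requires gcd(M, N) = 1."""
        assert gcd(M, N) == 1
        lam = pow(M, -1, N)          # lambda with lambda*M = 1 (mod N)
        return (a + lam * M * (b - a)) % (M * N)

    print(crt(2, 3, 3, 5))           # 8: indeed 8 = 2 (mod 3) and 8 = 3 (mod 5)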
In fact, we need not restrict ourselves to X being an integer: X, a and b may as well be polynomials (but M and N are still integers).
Algorithm 48 (Chinese Remainder (Polynomial form))
Input: Polynomials a = Σ_{i=0}^{n} a_i x^i, b = Σ_{i=0}^{n} b_i x^i, and integers M and N (with gcd(M, N) = 1).
Output: A polynomial c = Σ_{i=0}^{n} c_i x^i satisfying Theorem 50.
Compute λ, µ such that λM + µN = 1;
  #The process is analogous to Lemma 1 (page 63)
for i := 0 to n do
  c_i := a_i + λM(b_i − a_i);
In some applications, e.g. Section 4.1, we will want to apply Algorithm 47 repeatedly to many primes p_i known in advance, to deduce a value c modulo N := Π p_i from many values a_i (mod p_i). While the obvious way to do this is "one prime at a time", applying Algorithm 47 to p₁ and p₂, then to p₁p₂ and p₃, and so on, the correct way to do this is "balanced combination", first combining the p_i in pairs, then the pairs to form quadruples and so on.
Proposition 95 In this case, the dominant cost is that of the last step, and with classical arithmetic this is O((log N)²).
A.4 Chinese Remainder Theorem for Polynomials
The theory of the previous section has an obvious generalisation if we replace Z by K[y] for a field K, and an equivalent application to sections 4.3–4.4.
Theorem 51 (Chinese Remainder Theorem for polynomials) Two congruences
X ≡ a (mod M) (A.7)
and
X ≡ b (mod N), (A.8)
where M and N are relatively prime polynomials in K[y], are precisely equivalent to one congruence
X ≡ c (mod MN). (A.9)
By this we mean that, given any a, b, M and N, we can find such a c that satisfaction of (A.7) and (A.8) is precisely equivalent to satisfying (A.9). The converse direction, finding (A.7) and (A.8) given (A.9), is trivial: one takes a to be c (mod M) and b to be c (mod N).
Algorithm 49 (Chinese Remainder for Polynomials)
Input: a, b, M and N ∈ K[y] (with gcd(M, N) = 1).
Output: c satisfying Theorem 51.
Compute λ, µ such that λM + µN = 1;
  #As in Lemma 1 (page 63). Note µ isn't needed in practice.
c := a + λM(b − a);
Clearly c ≡ a (mod M), so satisfying (A.9) means that (A.7) is satisfied. What about (A.8)?
c = a + λM(b − a) = a + (1 − µN)(b − a) ≡ a + (b − a) (mod N)
so, despite the apparent asymmetry of the construction, c ≡ b (mod N) as well.
As in proposition 95, the correct way to combine values modulo many (say d) evaluations is "balanced combination", first combining the y − v_i in pairs, then the pairs to form quadruples and so on.
Proposition 96 In this case, the dominant cost is that of the last step, and with classical arithmetic this is O(d²).
As in Algorithm 48, X, a and b may as well be polynomials in x whose coefficients are polynomials in y (but M and N are still in y only).
Algorithm 50 (Chinese Remainder (Multivariate))
Input: Polynomials a = Σ_{i=0}^{n} a_i x^i, b = Σ_{i=0}^{n} b_i x^i ∈ K[y][x], and M and N ∈ K[y] (with gcd(M, N) = 1).
Output: A polynomial c = Σ_{i=0}^{n} c_i x^i satisfying Theorem 51.
Compute λ, µ such that λM + µN = 1;
  #As in Lemma 1 (page 63). Note µ isn't needed in practice.
for i := 0 to n do
  c_i := a_i + λM(b_i − a_i);
It is even possible for x (and i) to represent numerous indeterminates, as we are basically just doing coefficient-by-coefficient reconstruction.
Observation 21 We have explicitly considered K[y] (e.g. Q[y]), but in practice we will often wish to consider Z[y]. Even if all the initial values are in Z[x], λ and µ may not be, as in M = (y − 1) and N = (y + 1), when λ = −1/2 and µ = 1/2. This may not be an obstacle to such reconstruction from values (often called interpolation, by analogy with interpolation over R). Interpolation over Z[y] may still be possible: consider reconstructing X with X ≡ 1 (mod M) and X ≡ −1 (mod N), which is 1 + (−1/2)M(−1 − 1) = y. But interpolation over Z[y] is not always possible: consider reconstructing X with X ≡ 1 (mod M) and X ≡ 0 (mod N), which gives (1/2)(y + 1).
A.5 Vandermonde Systems
Definition 113 The Vandermonde matrix⁴ generated by k₁, …, kₙ is
V(k_1, \ldots, k_n) = \begin{pmatrix}
1 & k_1 & k_1^2 & \ldots & k_1^{n-1} \\
1 & k_2 & k_2^2 & \ldots & k_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & k_n & k_n^2 & \ldots & k_n^{n-1}
\end{pmatrix}.
Footnote 4: This section is based on [Zip93, §13.1]. Our algorithm 51 is his SolveVanderMonde, and our algorithm 52 is the one at the top of his p. 214.
Notation 41 In the context of V(k₁, …, kₙ), let
P(z) = \prod_{i=1}^{n} (z - k_i) \quad\hbox{and}\quad P_j(z) = \prod_{\substack{i=1 \\ i\ne j}}^{n} (z - k_i).
Proposition 97 The inverse of a Vandermonde matrix has a particularly simple form: V(k₁, …, kₙ)⁻¹ = (m_{i,j}) with
m_{i,j} = \frac{\mathrm{coeff}(P_j(z), z^i)}{\prod_{k\ne j}(k_j - k_k)} = \frac{\mathrm{coeff}(P_j(z), z^i)}{P_j(k_j)}. (A.10)
For example
V(k_1, \ldots, k_3)^{-1} = \begin{pmatrix}
\frac{k_2 k_3}{(k_1-k_2)(k_1-k_3)} & -\frac{k_1 k_3}{(k_1-k_2)(k_2-k_3)} & \frac{k_1 k_2}{(k_2-k_3)(k_1-k_3)} \\
-\frac{k_2+k_3}{(k_1-k_2)(k_1-k_3)} & \frac{k_1+k_3}{(k_1-k_2)(k_2-k_3)} & -\frac{k_1+k_2}{(k_2-k_3)(k_1-k_3)} \\
\frac{1}{(k_1-k_2)(k_1-k_3)} & -\frac{1}{(k_1-k_2)(k_2-k_3)} & \frac{1}{(k_2-k_3)(k_1-k_3)}
\end{pmatrix}
Corollary 28 If all the k_i are distinct, then V(k₁, …, kₙ) is invertible.
Equation (A.10) and the fact that the P_j can be rapidly computed from P suggest a rapid way of computing the elements of the inverse of an n × n Vandermonde matrix in O(n²) time. In fact, we can solve a system of Vandermonde linear equations in O(n²) time and O(n) space.
Algorithm 51 (Vandermonde solver)
Input: Vandermonde matrix V(k₁, …, kₙ), right-hand side w.
Output: Solution x to V(k₁, …, kₙ)x = w
x := 0
P := Π_{i=1}^{n} (z − k_i)  #O(n²)
for i := 1 to n
  Q := P/(z − k_i)  #n × O(n)
  D := Q(k_i)  #n × O(n) by Horner's rule
  for j := 1 to n
    x_j := x_j + w_i · coeff(Q, z^{j−1})/D
In section 4.4.2 we will want to solve a slightly different system. Algorithm 51 solves a system of the form
\begin{array}{ccccccccc}
x_1 &+& k_1 x_2 &+& k_1^2 x_3 &+ \cdots +& k_1^{n-1} x_n &=& w_1 \\
x_1 &+& k_2 x_2 &+& k_2^2 x_3 &+ \cdots +& k_2^{n-1} x_n &=& w_2 \\
&&&&\vdots&&&& \\
x_1 &+& k_n x_2 &+& k_n^2 x_3 &+ \cdots +& k_n^{n-1} x_n &=& w_n
\end{array} (A.11)
whereas we need to solve a system of the form
\begin{array}{ccccccc}
k_1 x_1 &+& k_2 x_2 &+ \cdots +& k_n x_n &=& w_1 \\
k_1^2 x_1 &+& k_2^2 x_2 &+ \cdots +& k_n^2 x_n &=& w_2 \\
&&&\vdots&&& \\
k_1^n x_1 &+& k_2^n x_2 &+ \cdots +& k_n^n x_n &=& w_n
\end{array} (A.12)
By comparison with (A.11), we have transposed the matrix (which is not a problem, since the inverse of the transpose is the transpose of the inverse), and multiplied column i by an extra factor of k_i. From Corollary 28, we can deduce the criteria for this system to be soluble.
Corollary 29 If all the k_i are distinct and non-zero, then the system (A.12) is soluble.
The following variant of Algorithm 51 will solve the system.
Algorithm 52 (Vandermonde variant solver)
Input: Vandermonde style data (k₁, …, kₙ), right-hand side w.
Output: Solution x to (A.12)
x := 0
P := Π_{i=1}^{n} (z − k_i)  #O(n²)
for i := 1 to n
  Q := P/(z − k_i)  #n × O(n)
  D := k_i Q(k_i)  #n × O(n) by Horner's rule
  for j := 1 to n
    x_i := x_i + w_j · coeff(Q, z^{j−1})/D
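A sketch of Algorithms 51 and 52 in Python (exact arithmetic via fractions.Fraction; the flag variant switches to system (A.12); the function name is ours):

    from fractions import Fraction

    def solve_vandermonde(k, w, variant=False):
        """Algorithm 51 (variant=False) or Algorithm 52 for (A.12) (variant=True)."""
        n = len(k)
        k = [Fraction(v) for v in k]
        w = [Fraction(v) for v in w]
        P = [Fraction(1)]                     # P(z) = prod_i (z - k_i): O(n^2)
        for ki in k:
            newP = [Fraction(0)] * (len(P) + 1)
            for j, c in enumerate(P):
                newP[j + 1] += c              # the z * (c z^j) term
                newP[j] -= ki * c             # the -k_i * (c z^j) term
            P = newP
        x = [Fraction(0)] * n
        for i, ki in enumerate(k):
            Q = [Fraction(0)] * n             # Q = P/(z - k_i) by synthetic division
            Q[n - 1] = P[n]
            for j in range(n - 2, -1, -1):
                Q[j] = P[j + 1] + ki * Q[j + 1]
            D = sum(c * ki**e for e, c in enumerate(Q))     # Q(k_i)
            if variant:
                x[i] = sum(w[j] * Q[j] for j in range(n)) / (ki * D)
            else:
                for j in range(n):
                    x[j] += w[i] * Q[j] / D
        return x

    print(solve_vandermonde([2, 3], [5, 7]))                 # [1, 2]
    print(solve_vandermonde([2, 3], [5, 13], variant=True))  # [1, 1]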
A.6 More matrix theory
See, for the time being, the entry Minor_(linear_algebra) in Wikipedia until JHD has written this. TO BE COMPLETED
A.7 Algebraic Structures
In this section we gather together many of the definitions we have seen throughout the book in one table. The definitive references, as far as computer algebra (rather than abstract algebra) is concerned, are [DT90, DGT91], which explain why we say "g.c.d. domain" rather than "unique factorisation domain": the argument is summarised in note 30 (page 61).
Table A.1: Algebraic Structures (each #...# line shows what is added at that step)
Name                          Reference       Example(s)
Ring (+, −, 0, ∗)             Definition 8    2×2 matrices
  #commutative ∗#
Commutative ring                              {0, 2, 4, 6, 8, 10} (mod 12)
  #multiplicative identity#
(Comm.) ring with 1                           {0, 1, 2, …, 11} (mod 12) = Z₁₂
  #No zero divisors#
Integral Domain               Definition 11   {a + b√−5} (Example 3)
  #gcd operation#
g.c.d. domain                 Definition 32   Z[x, y]
  #All ideals principal#
principal ideal domain        Definition 14   Z; Q[x]
  #division#
field                         Definition 15   Q; Z_p
  #Roots of polynomials# (see Proposition 51)
Algebraically closed field    Definition 18   C
Note that any integral domain can be extended into its field of fractions (Definition 16), and any field can be extended into its algebraic closure (Definition 19).
Appendix B
Excursus
This appendix includes various topics on computer algebra that do not seem to
be well-treated in the literature.
B.1 The Budan–Fourier Theorem
Definition 114 Given a sequence A of non-zero numbers a₀, …, aₙ, the number of sign variations of A, written V(A), is the number of times two consecutive elements have different signs, i.e. a_i a_{i+1} < 0. If A does contain zeros, we erase them before doing this computation, or equivalently we count the number of (i, j) with a_i a_j < 0 and all intermediate a_{i+1}, …, a_{j−1} = 0. If f is the polynomial a_n x^n + ··· + a₀, we write V(f) rather than V(A), where A is the sequence of coefficients of f.
Proposition 98 V(f) = V(f(0), f′(0), …, f^{(n)}(0)) for a polynomial f of degree n.
The reader should note that the definition of variation is numerically unstable. V(1) = 0, and therefore (by the erasure rule) V(1, 0) = 0. For positive ε, V(1, ε) = 0, but V(1, −ε) = 1. This is related to the fact that x + ε has no positive real roots, but x − ε has one, as seen in the following result.
Theorem 52 (Descartes' rule of signs [CA76]) The number of roots of f in (0, ∞) is less than or equal to, by an even number, V(f).
Corollary 30 The number of roots of f in (a, ∞) is less than or equal to, by an even number, V(f(x + a)).
Corollary 31 The number of roots of f in (a, ∞) is less than or equal to, by an even number, V((f(a), f′(a), …, f^{(n)}(a))).
For dense f, there is not a great deal to choose between these two formulations, but, since the derivatives of a sparse polynomial are sparse but its translate is not, corollary 31 is greatly to be preferred to corollary 30 in the sparse case.
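Definition 114 and Theorem 52 are easy to exercise directly (a minimal Python sketch; coefficient lists are lowest degree first, and the function name is ours):

    def variations(coeffs):
        """V(A) of Definition 114: sign variations, zeros erased."""
        signs = [c > 0 for c in coeffs if c != 0]
        return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

    print(variations([0, -1, 0, 1]))   # 1: x^3 - x has at most one root in (0, oo)
    print(variations([0, 1, 0, 1]))    # 0: x^3 + x has no positive roots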
We can also deduce some results about the number of roots of sparse polynomials. If f has n non-zero terms, V(f) ≤ n − 1. We note that V(ax^k) = 0, and this polynomial indeed has no roots in (0, ∞).
Corollary 32 A polynomial in x with n terms, not divisible by x, has at most 2(n − 1) roots in R. If it is divisible by x, then the answer is at most 2n − 1.
The example of x³ − x, which has three real roots (±1, 0), shows that the special case is necessary.
For the sake of simplicity, we will consider only square-free polynomials in
the rest of this section: the results generalise fairly easily to non square-free
ones.
Let us fix f , and consider V (y) := V (f(x�y)) and N(y) := |{x > y : f(x) =
0}| as functions of y. For large enough y, both are zero. As y decreases, N(y)
increases monotonically, by 1 at each root of f . In fact, the same monotonic
behaviour is true of V (y), increasing by 1 at roots of f and by 2 at certain other
points. This allows us to compute the number of roots in an interval, a result
known as the Budan–Fourier Theorem1.
Corollary 33 (Budin–Fourier Theorem) The number of roots of f in (a, b)
is less than or equal to, by an even number, V (f(x� a))� V (f(x� b)).
Corollary 34 (Budin–Fourier Theorem [Hur12]) The number of roots of
f in (a, b) is less than or equal to, by an even number, V ((f(a), f 0(a), . . . , f (n)(a))�
V ((f(b), f 0(b), . . . , f (n)(b)).
For the same reasons as above, corollary 34 is to be preferred in the case of
sparse polynomials.
An almost complete generalisation of Corollary 32, in the sense that the
bounds depend on the number of monomials rather than the degrees, to sparse
polynomials in n variables is given in [BHNS15, §2].
B.2 Equality of factored polynomials
This section treats the following problem.
Problem 8 Given two polynomials in factored form (section 2.1.3), are they equal? More precisely, if

f = \prod_{i=1}^{n} f_i^{a_i},    g = \prod_{j=1}^{m} g_j^{b_j},

with the f_i and g_j square-free and relatively prime, i.e. gcd(f_i, f_{i'}) = 1 and gcd(g_j, g_{j'}) = 1, is f ?= g.
¹See [BdB22, Fou31]. The question of precedence was hotly disputed at the time: see [Akr82] and http://www-history.mcs.st-andrews.ac.uk/Biographies/Budan_de_Boislaurent.html.
The obvious solution is to expand f and g, and check that the expanded forms
(which are canonical) are equal. Can we do better?
One important preliminary remark is that the square-free representation of
a polynomial is unique. This leads to the following result.
Proposition 99 If f = g, then every a_i has to be a b_j, and vice versa. For each such occurring value k, we must verify

f_k := \prod_{\substack{i=1 \\ a_i=k}}^{n} f_i  ?=  g_k := \prod_{\substack{j=1 \\ b_j=k}}^{m} g_j.   (B.1)

In particular, f_k and g_k must have the same degree, i.e.

\sum_{\substack{i=1 \\ a_i=k}}^{n} \deg(f_i) = \sum_{\substack{j=1 \\ b_j=k}}^{m} \deg(g_j).   (B.2)
Again, we could check f_k ?= g_k by expansion, but there is a better way.

Example 40 Let f = x^{2^l} − 1 and g = (x − 1)(x + 1)(x^2 + 1) · · · (x^{2^{l−1}} + 1), where both are square-free, so Proposition 99 does not help. f is already expanded, but expansion of g can give rise to x^{2^l−1} + x^{2^l−2} + · · · + x + 1, which has length O(2^l), whereas g has length O(l).
From now on we will assume that we are working over a domain that includes
the integers.
Lemma 18 If f and g have degree at most N, and agree at N + 1 different values, then f = g.

Proof. f − g is a polynomial of degree at most N, but has N + 1 zeros, which contradicts proposition 5 if it is non-zero.

Hence it suffices to evaluate f and g at N + 1 points x_i and check that the values agree. It is not necessary to construct any polynomial, and the integers involved are bounded (if we choose |x_i| < N) by B·N^N, where B is a function of the coefficients of the f_i, g_j. Furthermore, we can evaluate at all these points in parallel. However, it does seem that we need to evaluate at N + 1 points, or very nearly so, even if f and g are very small.
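By way of illustration, here is a Python sketch of this evaluation strategy (the representation and names are ours: each factored polynomial is a list of (coefficient-list, exponent) pairs, and exact integer arithmetic is assumed):

from math import prod

def eval_factored(factors, x):
    # evaluate prod f_i(x)^{a_i} without any expansion
    return prod(sum(c * x**k for k, c in enumerate(f))**a
                for f, a in factors)

def equal_by_evaluation(f, g, N):
    # Lemma 18: polynomials of degree <= N agreeing at N+1 points are equal
    return all(eval_factored(f, x) == eval_factored(g, x)
               for x in range(N + 1))

# f = (x-1)(x+1)(x^2+1) and g = x^4 - 1: degree bound 4, so five points
f = [([-1, 1], 1), ([1, 1], 1), ([1, 0, 1], 1)]
g = [([-1, 0, 0, 0, 1], 1)]
print(equal_by_evaluation(f, g, 4))   # True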
Open Problem 28 (Crossings of factored polynomials) Produce some non-trivial bounds on the maximum number of zeros of f − g, where f and g have small factored representations. See [RR90].
The best we can say is as follows. Suppose, in the notation of (B.1), each f_i has k_i non-zero terms, and g_j has l_j non-zero terms, and no f_i or g_j is x (if either was, then trivially f ≠ g, since a common factor of x would have been detected). Then, by Corollary 32, f − g, if it is not identically zero, has at most

2\Bigl(\sum_{\substack{i=1 \\ a_i=k}}^{n} k_i + \sum_{\substack{j=1 \\ b_j=k}}^{m} l_j - 1\Bigr)

roots in R, and hence, if it is zero when evaluated at more than this number of integers, is identically zero. The factor of 2 can be dropped if we use only positive evaluation points, and rely on Theorem 52.
B.3 Karatsuba’s method
This method was originally introduced in [KO63] for multiplying large integers: however, it is easier to explain in the (dense) polynomial context, where issues of carrying do not arise. Consider the product of two linear polynomials

(aX + b)(cX + d) = acX^2 + (ad + bc)X + bd.   (B.3)

This method so patently requires four multiplications of the coefficients that the question of its optimality was never posed. However, [KO63] rewrote it as follows:

(aX + b)(cX + d) = acX^2 + [(a + b)(c + d) − ac − bd]X + bd,   (B.4)

which only requires three distinct multiplications, ac and bd each being used twice. However, it requires four coefficient additions rather than one, so one might question its practical utility. For future reference, we will also express² equation (B.4) as

(aX + b)(cX + d) = ac(X^2 − X) + (a + b)(c + d)X + bd(1 − X),   (B.5)

which makes the three coefficient multiplications explicit.
However, it can be cascaded. Consider a product of two polynomials of degree three (four terms):

(a_3Y^3 + a_2Y^2 + a_1Y + a_0)(b_3Y^3 + b_2Y^2 + b_1Y + b_0).   (B.6)

If we write X = Y^2, a = a_3Y + a_2 etc., this product looks like the left-hand side of equation (B.4), and so can be computed with three multiplications of linear polynomials, each of which can be done in three multiplications of coefficients, thus making nine such multiplications in all, rather than the classic method's 16.
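The recursion is short enough to state as code. The following Python sketch (a direct transcription of (B.4), with none of the base-case tuning a production implementation would have; all names are ours) multiplies dense polynomials given as coefficient lists, lowest degree first:

def add(p, q):
    return [a + b for a, b in zip(p, q)] + p[len(q):] + q[len(p):]

def sub(p, q):
    return add(p, [-c for c in q])

def karatsuba(f, g):
    n = max(len(f), len(g))
    if n <= 1:
        return [f[0] * g[0]] if f and g else []
    m = (n + 1) // 2                  # split point: X = Y^m as in (B.6)
    f0, f1 = f[:m], f[m:]             # f = f1*X + f0
    g0, g1 = g[:m], g[m:]
    low  = karatsuba(f0, g0)          # bd
    high = karatsuba(f1, g1)          # ac
    mid  = sub(sub(karatsuba(add(f0, f1), add(g0, g1)), low), high)
    out = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(low):       # constant block
        if c: out[i] += c
    for i, c in enumerate(mid):       # shifted by m (guard: trailing zeros)
        if c: out[i + m] += c
    for i, c in enumerate(high):      # shifted by 2m
        if c: out[i + 2*m] += c
    return out

print(karatsuba([1, 1], [-1, 1]))     # (1+x)(-1+x) -> [-1, 0, 1]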
If the multiplicands have 2^k terms, then this method requires 3^k = (2^k)^{log_2 3} multiplications rather than the classic (2^k)^2. For arbitrary numbers n of terms, not necessarily a power of two, the cost of "rounding up" to a power of two is subsumed in the O notation, and we see a cost of O(n^{log_2 3}) rather than the classic O(n^2) coefficient multiplications. We note that log_2 3 ≈ 1.585, and the number of extra coefficient additions required is also O(n^{log_2 3}), being three extra additions at each step. While the "rounding up" is not important in O-theory, it matters in practice, and [Mon05] shows various other formulae, e.g.

(aX^2 + bX + c)(dX^2 + eX + f) = cf(1 − X) + be(−X + 2X^2 − X^3) + ad(−X^3 + X^4) + (b + c)(e + f)(X − X^2) + (a + b)(d + e)(X^3 − X^2) + (a + b + c)(d + e + f)X^2,
²This formulation is due to [Mon05].
requiring six coefficient multiplications rather than the obvious nine, or eight if we write it as

(aX^2 + (bX + c))(dX^2 + (eX + f)) = adX^4 + aX^2(eX + f) + dX^2(bX + c) + (bX + c)(eX + f)

and use (B.4) on the last summand (asymptotics would predict 3^{log_2 3} ≈ 5.7, so we are much nearer the asymptotic behaviour). Cascading this formula rather than (B.4) gives O(n^{log_3 6}), which as log_3 6 ≈ 1.63 is not as favourable asymptotically. His most impressive formula describes the product of two six-term polynomials in 17 coefficient multiplications, and log_6 17 ≈ 1.581, a slight improvement. We refer the reader to the table in [Mon05], which shows that his armoury of formulae can get us very close to the asymptotic costs.
Many of these formulae can be explained as instances of the general formula [Sco15, (1)]

\left(\sum_{i=0}^{n-1} x_i b^i\right) \left(\sum_{i=0}^{n-1} y_i b^i\right) = \left(\sum_{i=1}^{n-1} \sum_{j=0}^{i-1} (x_i - x_j)(y_j - y_i)\, b^{i+j}\right) + \left(\sum_{i=0}^{n-1} b^i\right) \left(\sum_{j=0}^{n-1} x_j y_j b^j\right).

Theorem 53 We can multiply two (dense) polynomials with m and n terms respectively in O(max(m, n) min(m, n)^{log_2 3 − 1}) coefficient operations.
Let the polynomials be f = a_{m−1}Y^{m−1} + · · · and g = b_{n−1}Y^{n−1} + · · ·. Without loss of generality, we may assume m ≥ n (so we have to prove O(m n^{log_2 3 − 1})), and write k = ⌈m/n⌉. We can then divide f into k blocks of n terms (possibly fewer for the most significant one) each: f = \sum_{i=0}^{k-1} f_i Y^{in}. Then

fg = \sum_{i=0}^{k-1} (f_i g) Y^{in}.

Each f_i g can be computed in time O(n^{log_2 3}), and the addition merely takes time O(m − n) (since there is one addition to be performed for each power of Y where overlap occurs). Hence the total time is O(k n^{log_2 3}), and the constant factor implied by the O-notation allows us to replace k = ⌈m/n⌉ by m/n, which gives the desired result.
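In code, the blocking argument of the proof is just the following (a sketch, reusing the karatsuba helper from the previous sketch; the name unbalanced_mul is ours):

def unbalanced_mul(f, g):
    # Theorem 53: split the longer operand into ceil(m/n) blocks of n
    # coefficients and multiply each block by the shorter operand
    if len(f) < len(g):
        f, g = g, f
    n = len(g)
    out = [0] * (len(f) + n - 1)
    for start in range(0, len(f), n):        # f = sum_i f_i Y^{in}
        for j, c in enumerate(karatsuba(f[start:start + n], g)):
            if c:
                out[start + j] += c
    return out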
B.3.1 Karatsuba’s method in practice
When it comes to multiplying numbers of n digits (in whatever base we are actually using, generally a power of 2), the received wisdom is to use O(n^2) methods for small n (say n < 16), Karatsuba-style methods for intermediate n (say 16 ≤ n < 4096) and Fast Fourier Transform methods (section B.3.4 or [SS71]) for larger n. However, it is possible for the Fast Fourier Transform to require too much memory, and [Tak10] was forced to use these methods on numbers one quarter the required size, and then nested Karatsuba to combine the results.
B.3.2 Karatsuba’s method and sparse polynomials
The use of the Karatsuba formula and its variants for sparse polynomials is less obvious. One preliminary remark is in order: in the dense case we split the multiplicands in equation (B.6) or its equivalent in two (the same point for each), and we were multiplying components of half the size. This is no longer the case for sparse polynomials, e.g. every splitting point of

(a_7x^7 + a_6x^6 + a_5x^5 + a_0)(b_7x^7 + b_2x^2 + b_1x + b_0)   (B.7)

gives a 3–1 split of one or the other: indeed possibly both, as when we use x^4 as the splitting point.

Worse, in equation (B.5), the component multiplications were on 'smaller' polynomials, whereas, measured in number of terms, this is no longer the case. If we split equation (B.7) at x^4, the sub-problem corresponding to (a + b)(c + d) in equation (B.5) is

(a_7x^3 + a_6x^2 + a_5x + a_0)(b_7x^3 + b_2x^2 + b_1x + b_0)
which has as many terms as the original (B.7). This difficulty led [Fat03] to conclude that Karatsuba-based methods did not apply to sparse polynomials. It is clear that they cannot always apply, since the product of a polynomial with m terms by one with n terms may have mn terms, but it is probably the difficulty of deciding when they apply that has led system designers to shun them.
B.3.3 Karatsuba’s method and multivariate polynomials
TO BE COMPLETED
B.3.4 Faster still
It is possible to do still better than 3^k = (2^k)^{log_2 3} for multiplying dense polynomials f and g with n = 2^k coefficients. Let α_0, ..., α_{2n−1} be distinct values. If h = fg is the desired product, then h(α_i) = f(α_i)g(α_i). Hence if we write f̃ for the vector (f(α_0), f(α_1), ..., f(α_{2n−1})), then h̃ = f̃ ∗ g̃, where ∗ denotes element-by-element multiplication of vectors, an O(n) operation on vectors of length 2n. This gives rise to three questions.

1. What should the α_i be?

2. How do we efficiently compute f̃ from f (and g̃ from g)?

3. How do we efficiently compute h from h̃?
The answer to the first question is that the α_j should be the 2n-th roots of unity in order, i.e. over C we would have α_j = e^{2πij/2n}. In practice, though, we work over a finite field in which these roots of unity exist. If we do this, then the answers to the next two questions are given by the Fast Fourier Transform (FFT) [SS71], and we can compute f̃ from f (or h from h̃) in time O(k 2^k) = O(n log n) (note that we are still assuming n is a power of 2). Putting these together gives

\underbrace{O(n\log n)}_{\tilde f \text{ from } f} + \underbrace{O(n\log n)}_{\tilde g \text{ from } g} + \underbrace{O(n)}_{\tilde h \text{ from } \tilde f,\,\tilde g} + \underbrace{O(n\log n)}_{h \text{ from } \tilde h} = O(n \log n).   (B.8)
While this method works fine for multiplying polynomials over a finite field K
with an appropriate root of unity, or over C, it requires substantial attention to
detail to make it work over the integers. Nevertheless it is practicable, and the
point at which this FFT-based method is faster than ones based on Theorem
53 depends greatly on the coe�cient domain.
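As an illustration of the pointwise scheme only (not of the integer-coefficient subtleties just mentioned), here is a Python sketch over C using NumPy's FFT; floating-point rounding restricts it to modest coefficients, which is precisely why serious implementations use FFTs over finite fields. The name fft_mul is ours:

import numpy as np

def fft_mul(f, g):
    size = 1
    while size < len(f) + len(g) - 1:   # room for the product; power of 2
        size *= 2
    fv = np.fft.fft(f, size)            # f~ : values at the roots of unity
    gv = np.fft.fft(g, size)            # g~
    hv = fv * gv                        # h~ = f~ * g~, pointwise: O(n)
    h = np.fft.ifft(hv).real            # h from h~, again O(n log n)
    return [int(round(c)) for c in h[:len(f) + len(g) - 1]]

print(fft_mul([1, 1], [1, 1]))          # (1+x)^2 -> [1, 2, 1]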
Notation 42 We will denote by M_Z(n) the asymptotic time (in the O sense) taken to multiply two integers of length n, and by M_{K[x]}(n) the number of operations in K or its associated structures needed to multiply two polynomials with n terms (or of degree n, since in O-speak the two are the same). Most authors omit the subscripts, and speak of M(n), when the context is clear.

For integers, the point at which this FFT-based method is faster than ones based on Theorem 53 is typically when the integers are of length a few thousand words.
B.3.5 Faster division
TO BE COMPLETED
B.3.6 Faster g.c.d. computation
In order to use this method to compute g.c.d.s faster, we need to solve a slightly more general problem: the extended Euclidean one³. If we look at Algorithm 5, the key updating operation is (2.16), which updates the matrix

\begin{pmatrix} a & b \\ c & d \end{pmatrix}

based only on q_i, which depends only on the leading entries of a_i and a_{i−1}. Call this matrix, immediately after q_i has been computed and used, R_{i,0}. Then (2.14) defines R_{0,0} and (2.16) becomes

R_{i,0} = \begin{pmatrix} 0 & 1 \\ 1 & -q_i \end{pmatrix} R_{i-1,0}.   (B.9)
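The classical (quadratic) version of this bookkeeping is easy to write down; the fast method then replaces the step-by-step updates below by balanced products of whole matrices. A Python sketch for integers (names ours):

def euclid_with_matrices(a0, a1):
    # run Euclid on (a0, a1), accumulating the R_{i,0} of (B.9);
    # the invariant is (a_i, a_{i+1})^T = R . (a0, a1)^T
    R = ((1, 0), (0, 1))                       # R_{0,0}
    a, b = a0, a1
    while b:
        q = a // b                             # q_i
        R = (R[1], (R[0][0] - q * R[1][0],     # ((0,1),(1,-q)) . R
                    R[0][1] - q * R[1][1]))
        a, b = b, a - q * b
    return a, R

g, R = euclid_with_matrices(240, 46)
assert g == R[0][0] * 240 + R[0][1] * 46       # Bezout coefficients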
TO BE COMPLETED
³This section is based on [Moe73], generalising [Sch71].
B.4 Strassen’s method
Just as Karatsuba's method (Excursus B.3) lets us multiply dense polynomials of degree n − 1 (n terms) in O(n^{log_2 3}) ≈ O(n^{1.585}) operations rather than O(n^2), so Strassen's method, introduced in [Str69], allows us to multiply dense n × n matrices in O(n^{log_2 7}) ≈ O(n^{2.807}) operations rather than O(n^3) operations. Again like Karatsuba's method, it is based on "divide and conquer" and an ingenious base case for n = 2, analogous to (B.4).
\begin{pmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{pmatrix} = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} \begin{pmatrix} b_{1,1} & b_{1,2} \\ b_{2,1} & b_{2,2} \end{pmatrix}   (B.10)

is usually computed as

\begin{pmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{pmatrix} = \begin{pmatrix} a_{1,1}b_{1,1} + a_{1,2}b_{2,1} & a_{1,1}b_{1,2} + a_{1,2}b_{2,2} \\ a_{2,1}b_{1,1} + a_{2,2}b_{2,1} & a_{2,1}b_{1,2} + a_{2,2}b_{2,2} \end{pmatrix}.   (B.11)

This method so patently requires eight multiplications of the coefficients that the question of its optimality was never posed. However, [Str69] rewrote it as follows:

M_1 := (a_{1,1} + a_{2,2})(b_{1,1} + b_{2,2})
M_2 := (a_{2,1} + a_{2,2})b_{1,1}
M_3 := a_{1,1}(b_{1,2} − b_{2,2})
M_4 := a_{2,2}(b_{2,1} − b_{1,1})
M_5 := (a_{1,1} + a_{1,2})b_{2,2}
M_6 := (a_{2,1} − a_{1,1})(b_{1,1} + b_{1,2})
M_7 := (a_{1,2} − a_{2,2})(b_{2,1} + b_{2,2})
c_{1,1} = M_1 + M_4 − M_5 + M_7
c_{1,2} = M_3 + M_5
c_{2,1} = M_2 + M_4
c_{2,2} = M_1 − M_2 + M_3 + M_6   (B.12)
requiring seven multiplications (rather than eight), though eighteen additions rather than four, so one might question its practical utility. [Win71] has a variant requiring fifteen rather than eighteen additions.
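One level of (B.12), applied to a matrix split into four blocks, looks as follows in a Python/NumPy sketch (the function name is ours, and we assume the dimension is even); recursing on the seven block products, instead of calling @, gives the full method:

import numpy as np

def strassen_step(A, B):
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    M1 = (A11 + A22) @ (B11 + B22)     # the seven products of (B.12)
    M2 = (A21 + A22) @ B11
    M3 = A11 @ (B12 - B22)
    M4 = A22 @ (B21 - B11)
    M5 = (A11 + A12) @ B22
    M6 = (A21 - A11) @ (B11 + B12)
    M7 = (A12 - A22) @ (B21 + B22)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4, M1 - M2 + M3 + M6]])

A = np.arange(16).reshape(4, 4)
B = np.arange(16, 32).reshape(4, 4)
assert (strassen_step(A, B) == A @ B).all()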
However, it can be cascaded. We first note that we do not require the multiplication operation to be commutative⁴. Hence if we have two 4 × 4 matrices to multiply, say

\begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\ a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4} \end{pmatrix} \begin{pmatrix} b_{1,1} & b_{1,2} & b_{1,3} & b_{1,4} \\ b_{2,1} & b_{2,2} & b_{2,3} & b_{2,4} \\ b_{3,1} & b_{3,2} & b_{3,3} & b_{3,4} \\ b_{4,1} & b_{4,2} & b_{4,3} & b_{4,4} \end{pmatrix}   (B.13)
⁴A point often glossed over. For example, we can multiply 3 × 3 matrices in 22 multiplications [Mak86], but this assumes commutativity and hence cannot be cascaded. The best non-commutative results are 23 multiplications [Lad76, CBH11]. Asymptotically, both are in any case worse than Strassen, since log_2 7 ≈ 2.807, but log_3 23 ≈ 2.854 and log_3 22 ≈ 2.813.
we can write this as

\begin{pmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{pmatrix} \begin{pmatrix} B_{1,1} & B_{1,2} \\ B_{2,1} & B_{2,2} \end{pmatrix},   (B.14)

where A_{1,1} = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix} etc., and apply (B.12) both to the multiplications in (B.14), needing seven multiplications (and 18 additions) of 2 × 2 matrices, and to those multiplications themselves, thus needing 49 = 7 × 7 multiplications of entries, at the price of 198 = 7 × 18 (recursion) + 18 × 4 (2 × 2 additions) additions, rather than 64 multiplications and 48 additions.
Similarly, multiplying 8 × 8 matrices can be regarded as (B.14), where now

A_{1,1} = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\ a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\ a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4} \end{pmatrix}

etc., needing seven multiplications (and 18 additions) of 4 × 4 matrices, hence 343 = 7 × 49 multiplications of entries, at the price of 1674 = 7 × 198 (recursion) + 18 × 16 (4 × 4 additions) additions, rather than 512 multiplications and 448 additions.
If the matrices have 2^k rows/columns, then this method requires 7^k = (2^k)^{log_2 7} multiplications rather than the classic (2^k)^3. For arbitrary sizes n, not necessarily a power of two, the cost of "rounding up" to a power of two is subsumed in the O notation⁵, and we see a cost of O(n^{log_2 7}) rather than the classic O(n^3) coefficient multiplications.

We note that log_2 7 ≈ 2.8074, and the number of extra coefficient additions required is also O(n^{log_2 7}), being 18(n/2)^2 additions at each step.

Theorem 54 (Strassen) We can multiply two (dense) n × n matrices in O(n^{log_2 7}) ≈ O(n^{2.8074}) multiplications, and the same order of additions, of matrix entries.
B.4.1 Strassen’s method in practice
Strassen’s method, with floating-point numbers, has break-even points between
400 and 2000 rows (160,000 to 4 million elements) [DN07]. This is achieved
with one or two (occasionally three) levels of (B.12), followed by classical O(n3)
multiplication from then on. The pragmatist also notes that modern chips have
instructions for manipulating four-vectors of floating-point numbers in a single
instruction, so there is no advantage in applying (B.12) all the way.
⁵In principle: substantial ingenuity is required to get good performance in practice.
B.4.2 Further developments
Seven is in fact minimal for 2 × 2 matrices [Lan06]. 3 × 3 matrices can be multiplied in 23 multiplications rather than the obvious 27 (but log_3 23 ≈ 2.854 > log_2 7), and there is an approximation algorithm with 21 multiplications [Sch71], giving an O(n^{log_3 21}) ≈ O(n^{2.77}) general method, but for 3 × 3 matrices the best known lower bound is 15 [Lic84, LO11]. In general the best known lower bound is 3n^2 + O(n^{3/2}) [Lan12, MR12].
In theory one can do better than Theorem 54: O(n^{2.376}) [CW90]⁶. But these methods require unfeasibly large n to be cost-effective. The complexity of matrix multiplication has to be at least O(n^2), since there are that many entries.

Notation 43 It is customary to assume that the cost of n × n matrix multiplication is O(n^ω), where 2 ≤ ω ≤ 3.
B.4.3 Matrix Inversion
Lemma 19 (Frobenius) Let A be a square matrix divided into blocks \begin{pmatrix} P & Q \\ R & S \end{pmatrix}, with P square and nonsingular. Assume that U = S − R(P^{−1}Q) is also nonsingular. Then

A^{-1} = \begin{pmatrix} P^{-1} + (P^{-1}Q)(U^{-1}RP^{-1}) & -(P^{-1}Q)U^{-1} \\ -(U^{-1}RP^{-1}) & U^{-1} \end{pmatrix}.   (B.15)

The proof is by direct verification. Note that the computation of the right-hand side of (B.15) requires two matrix inverses (P and U) and six matrix multiplications (viz. P^{−1}Q, R(P^{−1}Q), RP^{−1}, U^{−1}(RP^{−1}), (P^{−1}Q)(U^{−1}RP^{−1}) and (P^{−1}Q)U^{−1}). If the conditions of Lemma 19 were always satisfied, and we took P to be precisely one quarter the size of A (half as many rows and half as many columns), this would show that the complexity of matrix inversion is also O(n^ω) if ω > 2 (if ω = 2 we get an extra factor of log n).

Lemma 20 If B is nonsingular then A := B^T B is symmetric and positive definite (for any non-zero x, xAx^T > 0).

Lemma 21 If A is symmetric and positive definite then, in the notation of Lemma 19, so are P and U.

Theorem 55 If B is an invertible square matrix, B^{−1} can be computed in time O(n^ω) if ω > 2 (if ω = 2 we get an extra factor of log n).

Proof. We compute A := B^T B in time O(n^ω), its inverse by the process after Lemma 19, and then B^{−1} := A^{−1}B^T (another O(n^ω)).
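The resulting algorithm is compact; here is a Python/NumPy sketch (names ours) following (B.15) literally and recursing on P and U. It relies on Lemmas 20 and 21, which is why the driver symmetrises first:

import numpy as np

def frobenius_inverse(A):
    # block inversion via (B.15); assumes the leading blocks are
    # nonsingular, which Lemma 21 guarantees for s.p.d. A
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    m = n // 2
    P, Q, R, S = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    Pi = frobenius_inverse(P)                 # P^{-1}
    PiQ = Pi @ Q                              # P^{-1}Q
    Ui = frobenius_inverse(S - R @ PiQ)       # U^{-1}
    UiRPi = Ui @ (R @ Pi)                     # U^{-1}RP^{-1}
    return np.block([[Pi + PiQ @ UiRPi, -PiQ @ Ui],
                     [-UiRPi, Ui]])

def inverse(B):
    # Theorem 55: B^{-1} = (B^T B)^{-1} B^T
    return frobenius_inverse(B.T @ B) @ B.T

B = np.array([[4.0, 1, 2], [1, 3, 0], [2, 0, 5]])
assert np.allclose(inverse(B) @ B, np.eye(3))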
⁶Recently improved to O(n^{2.373}) [Wil12]. This reference actually claimed 2.3727, but this has been corrected to 2.3729 in [Wil14]. The most recent seems to be 2.3728639 [LG14, §6.3].
B.5 Cyclotomic Polynomials
Definition 115 A polynomial is said to be cyclotomic if it divides x^n − 1 for some n.

These innocent-looking polynomials are in fact a great nuisance in computer algebra, as they defy most useful beliefs. We first saw them in equation (2.1), where

\frac{x^n - 1}{x - 1} = x^{n-1} + \cdots + x + 1

showed that the quotient of two two-term polynomials could have an arbitrary number of terms.
In the complex plane, the roots of x^n − 1 are the nth roots of unity, i.e. e^{2πik/n} for 0 ≤ k < n. k = 0 gives us the factor x − 1. If k and n have a common factor l, then e^{2πik/n} = e^{2πi(k/l)/(n/l)} and is a root of x^{n/l} − 1.

Definition 116 The nth cyclotomic polynomial, Φ_n, has as roots all the roots of x^n − 1 that are roots of no earlier x^{n'} − 1, i.e.

\Phi_n(x) = \prod_{\substack{k=1 \\ \gcd(k,n)=1}}^{n-1} \left(x - e^{2\pi i k/n}\right).

It is a consequence of Galois theory that this polynomial is in Q[x], and in fact it is in Z[x]. In particular, if p is prime, Φ_p = x^{p−1} + · · · + x + 1.
Proposition 100

x^n - 1 = \prod_{d|n} \Phi_d(x),   (B.16)

and therefore, by a trick well-known to number-theorists,

\Phi_n(x) = \prod_{d|n} (x^d - 1)^{\mu(n/d)},   (B.17)

where μ is the Möbius function:

\mu(n) = \begin{cases} +1 & n \text{ squarefree with an even number of prime factors} \\ 0 & n \text{ not squarefree} \\ -1 & n \text{ squarefree with an odd number of prime factors.} \end{cases}
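Equation (B.16) translates directly into a deliberately naive Python sketch: to get Φ_n, divide x^n − 1 by Φ_d for every proper divisor d. ((B.17) would avoid the recursion at the price of computing μ; neither is how a serious implementation would proceed, and the names below are ours.)

def div_exact(num, den):
    # exact division of coefficient lists (lowest degree first)
    num, quot = num[:], [0] * (len(num) - len(den) + 1)
    for i in reversed(range(len(quot))):
        quot[i] = num[i + len(den) - 1] // den[-1]
        for j, d in enumerate(den):
            num[i + j] -= quot[i] * d
    assert not any(num)          # the division really was exact
    return quot

def cyclotomic(n):
    phi = [-1] + [0] * (n - 1) + [1]         # x^n - 1, then apply (B.16)
    for d in range(1, n):
        if n % d == 0:
            phi = div_exact(phi, cyclotomic(d))
    return phi

print(cyclotomic(6))    # [1, -1, 1], i.e. Phi_6 = x^2 - x + 1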
The first few factorisations of x^n − 1 are pretty innocuous:

x^2 − 1 = (x − 1)(x + 1)
x^3 − 1 = (x − 1)(x^2 + x + 1)
x^4 − 1 = (x − 1)(x + 1)(x^2 + 1), the last factor being Φ_4
x^5 − 1 = (x − 1)(x^4 + x^3 + x^2 + x + 1)
x^6 − 1 = (x − 1)(x + 1)(x^2 + x + 1)(x^2 − x + 1), the last factor being Φ_6
x^7 − 1 = (x − 1)(x^6 + x^5 + x^4 + x^3 + x^2 + x + 1)
x^8 − 1 = (x − 1)(x + 1)(x^2 + 1)(x^4 + 1), the last factor being Φ_8
x^9 − 1 = (x − 1)(x^2 + x + 1)(x^6 + x^3 + 1), the last factor being Φ_9(x) = Φ_3(x^3)
It follows from (B.16) that

x^{105} − 1 = (x − 1)Φ_3Φ_5Φ_7Φ_15Φ_21Φ_35Φ_105,

but what is not so obvious is that

Φ_105 = x^{48} + x^{47} + x^{46} − x^{43} − x^{42} − 2x^{41} − x^{40} − x^{39} + x^{36} + x^{35} + x^{34} + x^{33} + x^{32} + x^{31} − x^{28} − x^{26} − x^{24} − x^{22} − x^{20} + x^{17} + x^{16} + x^{15} + x^{14} + x^{13} + x^{12} − x^9 − x^8 − 2x^7 − x^6 − x^5 + x^2 + x + 1,

where the coefficients of x^{41} and x^7 are −2. The pattern continues: Φ_385 has coefficients of ±3, Φ_1385 has coefficients of ±4, and the growth continues, albeit apparently modestly. This is deceptive: Φ_1181895 has coefficients⁷ of ±14102773, Φ_43730115 has coefficients⁸ of ±862550638890874931, and the current record [Arn11, p. 89] is n = 99660932085 with Φ_n having coefficients almost as large as n^8. If we let A*(n) be the largest (in absolute value) coefficient of Φ_m for m ≤ n, then we now know [Bat49, Erd49] that there are constants c_1, c_2 such that

e^{n^{c_2/\log\log n}} < A^*(n) < e^{n^{c_1/\log\log n}}.   (B.18)

This slightly obscure growth formula, which can be written (but see Notation 10) as A^*(n) = e^{n^{\Theta(1/\log\log n)}}, means that A*(n) grows faster than any power of n, but not quite as fast as e^n.

Other places that cyclotomic polynomials occur as special cases include Open Problem 25.

⁷1181895 is the least n with Φ_n having coefficients larger than n [Arn11, p. 89].
⁸43730115 is the least n with Φ_n having coefficients larger than n^2 [Arn11, p. 89].
Appendix C
Systems
This appendix discusses various computer algebra systems, especially from the
point of view of their internal data structures and algorithms, and how this
relates to the ideas expressed in the body of this book. We do not discuss the
user interfaces as such, nor is this intended to replace any manuals, or specialist
books.
Moses [Mos71] described systems as falling into various categories, depending
on the extent of automatic transformation. While most systems today are what
Moses would call ‘new left’, the extent of automatic transformation still varies
dramatically.
C.1 Axiom
C.1.1 Overview
See Figure C.1.
We note that the polynomials have been expanded and the greatest common
divisor cancelled (in the terminology of Definition 4 we have canonical forms for
this data type). We are told the data type of the result, Fraction Polynomial
Integer, i.e. the field of fractions of Z[variables]. In fact, this answer happens
to have denominator 1, so lies in Polynomial Integer, but the system does
not automatically check for such retractions.
C.1.2 History
This system [JS92], with which the author has been heavily involved, can be
seen as the first of the ‘Bourbakist’1 systems — since followed, in particular,
by Magma [BCM94] and SAGE. It was developed at IBM Yorktown Heights,
1Nicolas Bourbaki was the pseudonym of a group of mathematicians, largely French,
who wrote very influential abstract mathematical texts from the 1930s on. See http:
//www-history.mcs.st-andrews.ac.uk/HistTopics/Bourbaki_1.html.
Figure C.1: Axiom output
(1) -> (x^2-1)^10/(x-1)^10
(1)
10 9 8 7 6 5 4 3 2
x + 10x + 45x + 120x + 210x + 252x + 210x + 120x + 45x + 10x + 1
Type: Fraction Polynomial Integer
(2) -> %^2
(2)
20 19 18 17 16 15 14 13
x + 20x + 190x + 1140x + 4845x + 15504x + 38760x + 77520x
+
12 11 10 9 8 7 6
125970x + 167960x + 184756x + 167960x + 125970x + 77520x + 38760x
+
5 4 3 2
15504x + 4845x + 1140x + 190x + 20x + 1
Type: Fraction Polynomial Integer
originally under the name of Scratchpad II, with its implementation language known as Modlisp [Jen79, DJ80].

It was commercialised under the name Axiom by NAG Ltd in the 1990s, but never achieved the necessary commercial success, and is now open-source: www.axiom-developer.org.
C.1.3 Structure
All Axiom objects are typed, as seen in Figures C.1 and C.2. In the second
one, we see that a and b are objects of appropriate, but di↵erent, types. The
system ‘knows’ (for details, see [Doy99]) that an appropriate common type is
Polynomial Integer, which is what c becomes.
C.2 Macsyma
C.2.1 Overview
See Figure C.3. We note that the g.c.d. is only cancelled on demand (radcan
stands for ‘radical canonicalizer’). The equation at line 3 is treated as such, but
if we force the system (line 4) to regard it as a Boolean, the result is false,
despite the fact that the comparands are mathematically equal.
Figure C.2: Axiom type system
(1) -> a:=1
(1) 1
Type: PositiveInteger
(2) -> b:=x
(2) x
Type: Variable x
(3) -> c:=a+b
(3) x + 1
Type: Polynomial Integer
Figure C.3: Macsyma output
(%i1) (x^2-1)^10/(x+1)^10;
                                  2     10
                                (x  - 1)
(%o1)                           ----------
                                       10
                                (x + 1)
(%i2) radcan(%);
       10      9      8       7       6       5       4       3      2
(%o2) x   - 10 x + 45 x - 120 x + 210 x - 252 x + 210 x - 120 x + 45 x
 - 10 x + 1
(%i3) %= (x^2-1)^10/(x+1)^10;
       10      9      8       7       6       5       4       3      2
(%o4) x   - 10 x + 45 x - 120 x + 210 x - 252 x + 210 x - 120 x + 45 x
                             2     10
                           (x  - 1)
 - 10 x + 1 =              ----------
                                  10
                           (x + 1)
(%i5) if % then 1 else 2;
(%o5)                                  2
C.2.2 History
C.3 Maple
C.3.1 Overview
See Figure C.4. Note that cancelling the g.c.d., and expansion, did not take place
until asked for. Also, while the exponents in the expanded form of (x + 1)10
were in order, the same was not true of its square.
C.3.2 History
This system started in the early 1980s, a period when computer power, and
particularly memory, were much less than they are today. It was designed to
support multiple users, particularly classes of students, on what were, by the
standards of the time, relatively modest university resources. Two important
early references are [CGGG83, CFG+84]. These circumstances led to three
design principles.
1. The system had to be small: early host systems had limitations of, for example, 110K words of memory. In particular, users must not pay, in terms of memory occupancy, for features they were not using, which led to a 'kernel plus loadable modules' design, where the kernel knew basic algebraic features, and the rest of the system was in loadable modules written in the Maple language itself, and therefore interpreted rather than compiled. The precise definition of 'basic' has changed over time; see section C.3.4.

2. The system had to be portable: early versions ran on both 32-bit VAX computers and 36-bit Honeywell computers. Its kernel, originally some 5500 lines of code, was macro-processed into 'languages of the BCPL family', of which the most successful, and the one used today, is C.

3. Memory was scarce, and hence re-use had to be a priority.
C.3.3 Data structures
These considerations led to an “expression DAG” design (see page 52). The
internal form of an expression is a node of a certain type followed by an arbi-
trary number (hence we talk about an n-ary tree) of operands. If A is a Maple
expression, the construct op(0,A) gives the type of the node, and op(i,A) gives
the i-th operand. This is demonstrated by the session in table C.1, which builds
the DAG shown in figure C.5. It might be assumed from table C.1 that Maple
had some ‘preferred’ order which A matched but B did not. However, if we look
at table Maple:code2 (run in a fresh session of Maple), we see that B is now the
preferred instance. The point is that, since Maple’s internalising process2 con-
Figure C.4: Maple output
> (x^2-1)^10/(x+1)^10;
                                  2     10
                                (x  - 1)
                                ----------
                                       10
                                (x + 1)
> simplify(%);
                                       10
                                (x - 1)
> expand(%);
   10       9       8        7        6        5        4        3
  x   - 10 x  + 45 x  - 120 x  + 210 x  - 252 x  + 210 x  - 120 x
         2
  + 45 x  - 10 x + 1
> %^2;
    10       9       8        7        6        5        4        3
  (x   - 10 x  + 45 x  - 120 x  + 210 x  - 252 x  + 210 x  - 120 x
         2              2
  + 45 x  - 10 x + 1)
> expand(%);
                 2         3          5          6          7
  1 + 190 x - 20 x  - 1140 x  - 15504 x  + 38760 x  - 77520 x
            8           9           10    20       19
  + 125970 x  - 167960 x  + 184756 x   + x   - 20 x
        18        17        16         15         14
  + 190 x  - 1140 x  + 4845 x  - 15504 x  + 38760 x
           13          12          11         4
  - 77520 x  + 125970 x  - 167960 x  + 4845 x
Table C.1: A small Maple session
> A:=x+y+a*b;
A := x + y + a b
> op(0,A);
+
> op(3,A);
a b
> op(0,op(3,A));
*
> B:=y+b*a+x;
B := x + y + a b
Figure C.5: Tree for A, B corresponding to table C.1
        +
      / | \
     x  y  *
          / \
         a   b
structs these trees, in which + nodes cannot be children of + nodes, nor * nodes of * nodes (hence implicitly enforcing associativity of these operators), and the children are unordered (hence implicitly enforcing commutativity³), once the subtree corresponding to a*b has been built, as in the first line of table C.1, an equivalent tree, such as that corresponding to b*a, is stored as the same tree. If it is fundamentally unordered, which way does it print? The answer is given in [CFG+84, p. 7]:

    if the expression is found, the one in the table is used, and the [new] one is discarded.

Hence in table C.1, the presence of a*b means that b*a automatically becomes a*b. The converse behaviour occurs in table C.2.
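This store-the-first-form-seen table lookup is what functional programmers call hash-consing. A minimal Python sketch (ours, in no way Maple's actual code; flattening of nested sums, which Maple also performs, is omitted) reproduces the behaviour of tables C.1 and C.2: the table key ignores the order of children, and lookup returns whichever equivalent node was built first.

from collections import Counter

TABLE = {}      # one stored node per equivalence class

def node(op, *children):
    # + and * children are unordered, so b*a hashes like a*b;
    # setdefault keeps (and hence prints) the first-seen form
    key = (op, frozenset(Counter(children).items()))
    return TABLE.setdefault(key, (op, children))

A = node('+', 'x', 'y', node('*', 'a', 'b'))
B = node('+', 'y', node('*', 'b', 'a'), 'x')
print(A is B)   # True: the same DAG, as in table C.1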
In terms of the 10 algebraic rules on pp. 41–42, this structure automatically follows all except (8), which is implemented only in the weaker form (8').
²Referred to as the 'simplifier' in [CGGG83, CFG+84], but we do not use that word to avoid confusion with Maple's simplify command.
³We have found an example where this is not the case, but this is explicitly described as a bug by Maple's Senior Architect: "Two simpl'd PROD DAGs containing the same entries but in a different order is a kernel bug by definition." [Vor10]
Table C.2: Another small Maple session
> B:=y+b*a+x;
B := y + b a + x
> op(0,B);
+
> op(2,B);
b a
> op(0,op(2,B));
*
> A:=x+y+a*b;
A := y + b a + x
Figure C.6: Tree for A, B corresponding to table C.2
        +
      / | \
     y  *  x
       / \
      b   a
The Maple command expand implements (8) fully, therefore producing, for polynomials, what we referred to (page 50) as a distributed representation⁴. This is therefore canonical (definition 4), but in a limited sense: it is canonical within a given Maple session, but may vary between sessions. This means that operations such as op(i,A) (i ≠ 0) are not necessarily consistent between sessions.
C.3.4 Heuristic GCD
[CGG84, CGG89].
C.3.5 Conclusion
There are many books written on Maple, particularly in the educational context.
A comprehensive list would be out-of-date before it was printed, but we should
mention [Cor02].
⁴But we should note that there are no guaranteed mathematical properties of the ordering. Good properties are provided by the MonomialOrders of the Groebner package.
Figure C.7: MuPAD output
>> (x^2+1)^10/(x-1)^10
ans =
(x^2 + 1)^10/(x - 1)^10
>> (x^2-1)^10/(x+1)^10
ans =
(x^2 - 1)^10/(x + 1)^10
>> simplify(ans)
ans =
(x - 1)^10
>> expand(ans)
ans =
x^10 - 10*x^9 + 45*x^8 - 120*x^7 + 210*x^6 - 252*x^5 + 210*x^4 - 120*x^3 + 45*x^2 - 10*x + 1
>> ans == (x^2 + 1)^10/(x - 1)^10
ans =
0
C.4 MuPAD
C.4.1 Overview
See Figure C.7. We note that the g.c.d. is only cancelled on demand, and that
the result of the equality test is 0, i.e. false.
C.4.2 History
This system [FGK+94] could easily have been written off as "yet another computer algebra system", until MathWorks Inc. bought the company, and made MuPAD into the "symbolic toolbox" of MatLab (replacing Maple). The session
Figure C.8: Reduce output
1: (x^2-1)^10/(x+1)^10;

x^{10} − 10x^9 + 45x^8 − 120x^7 + 210x^6 − 252x^5 + 210x^4 − 120x^3 + 45x^2 − 10x + 1

2: off exp;
3: (x^2-1)^10/(x+1)^10;

(x − 1)^{10}

4: off gcd;
5: (x^2-1)^10/(x+1)^10;

(x − 1)^{10}

6: (x^2-1)^10/(x^2+2*x+1)^10;

(x^2 − 1)^{10} / (x^2 + 2x + 1)^{10}

7: on gcd;
8: ws;

(x − 1)^{10} / (x + 1)^{10}
above is taken from MatLab.
C.5 Reduce
C.5.1 Overview
Note that Reduce produces output that cuts/pastes as TEX, as in Figure C.8.
By default, the output is canonical (Definition 4), but this can be changed
via the exp (short for expand) and gcd (compute them) switches. Note that
exact division tests are done even when gcd is o↵, as in line 5, and we need a
more challenging example, as in line 6.
C.5.2 History
Appendix D
Index of Notation
Notation Meaning Reference Page
(S) The ideal generated by S Notation 5 29
(F,A,B)B A Bourbakist function definition Notation 3 23
(F,A,B)B |C (F,A,B)B restricted to C Notation 3 23
?= An equation whose validity depends on interpretations Notation 36 275
=CA Equality of a computer algebra system Notation 4 28
=O Equality of mathematical objects Notation 4 28
=R Equality of representations Notation 4 28
||f||_k The k-norm of f(x) Notation 38 291
p ^ q The logical and of p and q Definition 76 137
p _ q The logical (inclusive) or of p and q Definition 76 137
¬p The logical negation of p Definition 76 137
A The algebraic numbers Notation 30 237
C The complex numbers
C Any field of constants Definition 100 246
con(F ) The cone of F Theorem 35 153
Rconst The constants of R Definition 100 246
den(f) The denominator of f Proposition 12 59
ε The relaxation of ε Notation 21 88
H(f) The height of f(x), ||f||_∞ Notation 38 291
L(f) The length of f(x), ||f||_1 Notation 38 291
li The logarithmic integral Notation 34 261
Ln The multivalued logarithm Notation 37 279
(m, d) McCallum's size measure Definition 83 147
M(f) The Mahler measure of f(x) Notation 38 291
M(f) The Mahler measure of f(x1, . . . , xn) Definition 112 297
M(n) The time needed to multiply
two objects of length n Notation 42 315
mon(H) The (multiplicative) monoid of H Theorem 35 153
µ(n) The Möbius function Proposition 100 319
N The nonnegative integers (natural numbers)
Newton(p) The Newton series Notation 17 55
num(f) The numerator of f Proposition 12 59
ProjC(A) The Collins projection Notation 25 146
ProjM (A) The McCallum projection Notation 26 147
Q The rational numbers
R The real numbers
S(f, g) The S-polynomial of f and g Definition 48 100
Sf The "sugar" of the polynomial f Definition 55 110
S(f,g) The "sugar" of the polynomial pair (f, g) Definition 55 110
Sk(f) The skeleton of the polynomial f Definition 86 189
Sqrt The multivalued square root Notation 37 279
W (z) The Lambert W function 255
Z The integers
ZN The integers modulo N
ω The exponent of matrix multiplication Notation 43 318
ω(z) The Wright ω function 255
Bibliography
[Abb88] J.A. Abbott. Factorisation of Polynomials over Algebraic Number
Fields. PhD thesis, University of Bath, 1988.
[Abb02] J.A. Abbott. Sparse Squares of Polynomials. Math. Comp., 71:407–
413, 2002.
[Abb04] J.A. Abbott. CoCoA: a laboratory for computations in commuta-
tive algebra. ACM SIGSAM Bulletin 1, 38:18–19, 2004.
[Abb09] J.A. Abbott. Bounds on Factors in Z[x]. http://arxiv.org/abs/
0904.3057, 2009.
[Abb15] J.A. Abbott. Geobuckets in CoCoA. Personal Communication,
2015.
[ABD85] J.A. Abbott, R.J. Bradford, and J.H. Davenport. A Remark on
Factorisation. SIGSAM Bulletin 2, 19:31–33, 1985.
[ABD88] J.A. Abbott, R.J. Bradford, and J.H. Davenport. Factorisation of
Polynomials: Old Ideas and Recent Results. In R. Janssen, editor,
Proceedings “Trends in Computer Algebra”, pages 81–91, 1988.
[ABM99] J.A. Abbott, M. Bronstein, and T. Mulders. Fast Deterministic
Computations of Determinants of Dense Matrices. In S. Dooley,
editor, Proceedings ISSAC ’99, pages 197–204, 1999.
[Ach06] M. Achatz. Deciding polynomial-exponential problems. Master's thesis (Diplomarbeit), Universität Passau, 2006.
[Ada43] Ada Augusta Countess of Lovelace. Sketch of the Analytical Engine
invented by Charles Babbage, by L.F. Menabrea of Turin, with
notes on the memoir by the translator. Taylor’s Scientific Memoirs
29, 3:666–731, 1843.
[AGK97] B. Amrhein, O. Gloor, and W. Küchlin. On the Walk. Theor.
Comp. Sci., 187:179–202, 1997.
[AGR14a] A. Arnold, M. Giesbrecht, and D.S Roche. Faster Sparse Multi-
variate Polynomial Interpolation of Straight-Line Programs. http:
//arxiv.org/abs/1412.4088, 2014.
[AGR14b] A. Arnold, M. Giesbrecht, and D.S Roche. Sparse interpolation
over finite fields via low-order roots of unity. In K. Nabeshima,
editor, Proceedings ISSAC 2014, pages 27–34, 2014.
[AHU74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Anal-
ysis of Computer Algorithms. Addison-Wesley, 1974.
[AHU83] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. Data Structures and
Algorithms. Addison-Wesley, 1983.
[AIR14] A. Abdesselam, C. Ikenmeyer, and G. Royle. 16,051 formulas for
Ottaviani’s invariant of cubic threefolds. http://arxiv.org/abs/
1402.2669, 2014.
[Akr82] A.G. Akritas. Reflections on a Pair of Theorems by Budan and
Fourier. Mathematics Magazine, 55:292–298, 1982.
[AKS04] M. Agrawal, N. Kayal, and N. Saxena. Primes is in P. Ann. Math.
(2), 160:781–793, 2004.
[AL94] W.W. Adams and P. Loustaunau. An introduction to Gröbner
bases. Amer. Math. Soc., 1994.
[ALM99] P. Aubry, D. Lazard, and M. Moreno Maza. On the Theories of
Triangular Sets. J. Symbolic Comp., 28:105–124, 1999.
[ALSU07] A.V. Aho, M.S. Lam, R. Sethi, and J.D. Ullman. Compilers: Prin-
ciples, Techniques and Tools. Pearson Addison-Wesley, 2007.
[AM99] P. Aubry and M. Moreno Maza. Triangular Sets for Solving Polyno-
mial Systems: A Comparison of Four Methods. J. Symbolic Comp.,
28:125–154, 1999.
[AM01] J.A. Abbott and T. Mulders. How Tight is Hadamard’s Bound?
Experimental Math., 10:331–336, 2001.
[Ano07] Anonymous. MACSYMA . . . . http://web.archive.org/
web/20140103015032/http://www.symbolicnet.org/systems/
macsyma.html, 2007.
[AP10] B. Akbarpour and L.C. Paulson. MetiTarski: An Automatic Theo-
rem Prover for Real-Valued Special Functions. J. Automated Rea-
soning, 44:175–205, 2010.
[Apo67] T.M. Apostol. Calculus, Volume I, 2nd edition. Blaisdell, 1967.
[AR15] A. Arnold and D.S Roche. Output-Sensitive Algorithms for Sum-
set and Sparse Polynomial Multiplication. In D. Robertz, editor,
Proceedings ISSAC 2015, pages 29–36, 2015.
[Arn03] E.A. Arnold. Modular algorithms for computing Gröbner bases. J.
Symbolic Comp., 35:403–419, 2003.
[Arn11] A.D. Arnold. Algorithms for computing cyclotomic polynomials.
Ms.C. Thesis Department of Mathematics Simon Fraser University,
2011.
[ARS+13] C. Andradas, T. Recio, J.R. Sendra, L. Tabera, and C. Villarino.
Reparametrizing Rational Revolution Surfaces over the Reals. Sub-
mitted, 2013.
[Arw18] A. Arwin. Über die Kongruenzen von dem fünften und höheren Graden nach einem Primzahlmodulus. Arkiv för matematik, 14:1–48, 1918.
[AS64] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing. US Government Printing Office, 1964.
[ASZ00] J.A. Abbott, V. Shoup, and P. Zimmermann. Factorization in Z[x]:
The Searching Phase. In C. Traverso, editor, Proceedings ISSAC
2000, pages 1–7, 2000.
[AW00] H. Anai and V. Weispfenning. Deciding Linear-Trigonometric Prob-
lems. In C. Traverso, editor, Proceedings ISSAC 2000, pages 14–22,
2000.
[Ax71] J. Ax. On Schanuel’s Conjectures. Ann. Math., 93:252–268, 1971.
[Bé79] E. Bézout. Théorie générale des équations algébriques. Ph.-D. Pierres, 1779.
[Bac94] P. Bachmann. Die analytische Zahlentheorie. Teubner, 1894.
[Bak75] A. Baker. Transcendental Number Theory. Cambridge University
Press, 1975.
[Bar68] E.H. Bareiss. Sylvester’s Identity and Multistep Integer-preserving
Gaussian Elimination. Math. Comp., 22:565–578, 1968.
[Bas99] S. Basu. New results on quantifier elimination over real closed
fields and applications to constraint databases. J. ACM, 46:537–
555, 1999.
[Bat49] P.T. Bateman. Note on the coefficients of the cyclotomic polynomial. Bull. AMS, 55:1180–1181, 1949.
[BBDP07] J.C. Beaumont, R.J. Bradford, J.H. Davenport, and N. Phisan-
but. Testing Elementary Function Identities Using CAD. AAECC,
18:513–543, 2007.
[BC90] G. Butler and J. Cannon. The Design of Cayley — A Language for
Modern Algebra. In Proceedings DISCO ’90, 1990.
[BCD+02] R.J. Bradford, R.M. Corless, J.H. Davenport, D.J. Jeffrey, and S.M. Watt. Reasoning about the Elementary Functions of Complex Analysis. Annals of Mathematics and Artificial Intelligence, 36:303–318, 2002.
[BCDJ08] M. Bronstein, R. Corless, J.H. Davenport, and D.J. Jeffrey. Algebraic properties of the Lambert W function from a result of Rosenstein and Liouville. J. Integral Transforms and Special Functions, 18:709–712, 2008.
[BCM94] W. Bosma, J. Cannon, and G. Matthews. Programming with al-
gebraic structures: design of the Magma language. In Proceedings
ISSAC 1994, pages 52–57, 1994.
[BCR98] J. Bochnak, M. Coste, and M.-F. Roy. Real algebraic geometry.
Ergebnisse der Mathematik 38 (translated from the French and re-
vised by the authors), 1998.
[BCR11] A. Bigatti, M. Caboara, and L. Robbiano. Computing inhomogeneous Gröbner bases. J. Symbolic Comp., 46:498–510, 2011.
[BD89] R.J. Bradford and J.H. Davenport. Effective Tests for Cyclotomic Polynomials. In P. Gianni, editor, Proceedings ISSAC 1988, pages 244–251, 1989.
[BD02] R.J. Bradford and J.H. Davenport. Towards Better Simplification
of Elementary Functions. In T. Mora, editor, Proceedings ISSAC
2002, pages 15–22, 2002.
[BD07] C.W. Brown and J.H. Davenport. The Complexity of Quantifier
Elimination and Cylindrical Algebraic Decomposition. In C.W.
Brown, editor, Proceedings ISSAC 2007, pages 54–60, 2007.
[BdB22] F.F.D. Budan de BoisLaurent. Nouvelle méthode pour la résolution
des équations numériques d’un degré quelconque. Dondey-Dupré,
1822.
[BDE+13] R.J. Bradford, J.H. Davenport, M. England, S. McCallum, and
D.J. Wilson. Cylindrical Algebraic Decompositions for Boolean
Combinations. In Proceedings ISSAC 2013, pages 125–132, 2013.
[BDE+14] R.J. Bradford, J.H. Davenport, M. England, S. McCallum, and
D.J. Wilson. Truth Table Invariant Cylindrical Algebraic Decom-
position. To appear in J. Symbolic Computation, 2014.
[BDS09] R.J. Bradford, J.H. Davenport, and C.J. Sangwin. A Comparison
of Equality in Computer Algebra and Correctness in Mathemati-
cal Pedagogy. In J. Carette et al., editor, Proceedings Intelligent
Computer Mathematics, pages 75–89, 2009.
[BDS10] R.J. Bradford, J.H. Davenport, and C.J. Sangwin. A Comparison
of Equality in Computer Algebra and Correctness in Mathematical
Pedagogy (II). International Journal of Technology in Mathemati-
cal Education 2, 17:93–98, 2010.
[Ber67] E.R. Berlekamp. Factoring Polynomials over Finite Fields. Bell
System Tech. J., 46:1853–1859, 1967.
[Ber70] E.R. Berlekamp. Factoring Polynomials over Large Finite Fields.
Math. Comp., 24:713–735, 1970.
[BF91] J. Backelin and R. Fröberg. How we proved that there are exactly
924 cyclic 7-roots. In S.M. Watt, editor, Proceedings ISSAC 1991,
pages 103–111, 1991.
[BF93] M. Bartolozzi and R. Franci. La regola dei segni dall’ enunciato
di R. Descartes (1637) alla dimostrazione di C.F. Gauss (1828).
Archive for History of Exact Sciences 335-374, 45, 1993.
[BFSS06] A. Bostan, P. Flajolet, B. Salvy, and É. Schost. Fast computation
of special resultants. J. Symbolic Comp., 41:1–29, 2006.
[BHNS15] D.J. Bates, J.D. Hauenstein, M.E. Niemerg, and F. Sottile. Soft-
ware for the Gale transform of fewnomial systems and a Descartes
rule for fewnomials. http://arxiv.org/abs/1505.05241, 2015.
[BHPR11] O. Bastani, C.J. Hillar, P. Popov, and J.M. Rojas. Randomization,
Sums of Squares, and Faster Real Root Counting for Tetranomials
and Beyond. http://arxiv.org/abs/1101.2642, 2011.
[Big15] A. Bigatti. An example of Hilbert-badness. Personal Communica-
tion, 2015.
[BL98] T. Breuer and S.A. Linton. The GAP4 Type System Organising
Algebraic Algorithms. In O.Gloor, editor, Proceedings ISSAC ’98,
pages 38–45, 1998.
[BM09] L. Busé and B. Mourrain. Explicit factors of some iterated resul-
tants and discriminants. Math. Comp., 78:345–386, 2009.
[BMMT94] E. Becker, M.G. Marinari, T. Mora, and C. Traverso. The shape
of the shape lemma. In Proceedings ISSAC 1994, pages 129–133,
1994.
[Bou61] N. Bourbaki. Algèbre Commutative, chapter 2. Hermann, 1961.
[Bou70] N. Bourbaki. Théorie des Ensembles. Diffusion C.C.L.S., 1970.
[BPR06] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic
Geometry, 2nd ed. Springer, 2006.
[Bro69] W.S. Brown. Rational Exponential Expressions, and a conjecture
concerning ⇡ and e. Amer. Math. Monthly, 76:28–34, 1969.
[Bro71a] W.S. Brown. On Euclid’s Algorithm and the Computation of Poly-
nomial Greatest Common Divisors. In Proceedings SYMSAC 1971,
pages 195–211, 1971.
[Bro71b] W.S. Brown. On Euclid’s Algorithm and the Computation of Poly-
nomial Greatest Common Divisors. J. ACM, 18:478–504, 1971.
[Bro90] M. Bronstein. Integration of elementary functions. J. Symbolic Comp., 9:117–173, 1990.
[Bro91] M. Bronstein. The Algebraic Risch Differential Equation. In Proceedings ISSAC 91, pages 241–246, 1991.
[Bro03] C.W. Brown. QEPCAD B: a program for computing with semi-
algebraic sets using CADs. ACM SIGSAM Bulletin 4, 37:97–108,
2003.
[Bro07] M. Bronstein. Structure theorems for parallel integration (Paper
completed by Manuel Kauers). J. Symbolic Comp., 42:757–769,
2007.
[BS86] D. Bayer and M. Stillman. The Design of Macaulay: A System for
Computing in Algebraic Geometry and Commutative Algebra. In
Proceedings SYMSAC 86, pages 157–162, 1986.
[BS93] E. Bach and J. Sorenson. Sieve Algorithms for Perfect Power Test-
ing. Algorithmica, 9:313–328, 1993.
[Buc70] B. Buchberger. Ein algorithmisches Kriterium für die Lösbarkeit eines algebraischen Gleichungssystems (English translation in [Buc98]). Aequationes Mathematicae, 4:374–383, 1970.
[Buc79] B. Buchberger. A Criterion for Detecting Unnecessary Reductions
in the Construction of Groebner Bases. In Proceedings EUROSAM
79, pages 3–21, 1979.
[Buc84] B. Buchberger. A critical pair/completion algorithm for finitely
generated ideals in rings. Logic and Machines: Decision Problems
and Complexity, pages 137–155, 1984.
[Buc98] B. Buchberger. An Algorithmic Criterion for the Solvability of a
System of Algebraic Equations. In Gröbner Bases and Applications,
pages 535–545, 1998.
[Bur13] M.A. Burr. Applications of Continuous Amortization to Bisection-
based Root Isolation. http://arxiv.org/abs/1309.5991, 2013.
[BW93] T. Becker and V. (with H. Kredel) Weispfenning. Groebner Bases.
A Computational Approach to Commutative Algebra. Springer
Verlag, 1993.
[CA76] G.E. Collins and A.V. Akritas. Polynomial Real Root Isolation
Using Descartes’ Rule of Signs. In R.D. Jenks, editor, Proceedings
SYMSAC 76, pages 272–275, 1976.
[Car58] C. Carathéodory. Theory of functions of a complex variable. Chelsea
Publ., 1958.
[Car73] H. Cartan. Elementary Theory of Analytic Functions of One or
Several Complex Variables. Addison-Wesley, 1973.
[Car04] J. Carette. Understanding Expression Simplification. In J. Gutier-
rez, editor, Proceedings ISSAC 2004, pages 72–79, 2004.
[Cau29] A.-L. Cauchy. Exercices de Mathématiques Quatrième Année. De
Bure Frères, Paris, 1829.
[CBH11] N.T. Courtois, G.V. Bard, and D. Hulme. A New General-Purpose
Method to Multiply 3⇥ 3 Matrices Using Only 23 Multiplications.
http://arxiv.org/abs/1108.2830, 2011.
[CD85] D. Coppersmith and J.H. Davenport. An Application of Factoring.
J. Symbolic Comp., 1:241–243, 1985.
[CD91] D. Coppersmith and J.H. Davenport. Polynomials whose Powers
are Sparse. Acta Arithmetica, 58:79–87, 1991.
[CDJW00] R.M. Corless, J.H. Davenport, D.J. Je↵rey, and S.M. Watt. Ac-
cording to Abramowitz and Stegun, or arccoth needn’t be uncouth.
SIGSAM Bulletin 2, 34:58–65, 2000.
[CE95] G.E. Collins and M.J. Encarnación. Efficient rational number reconstruction. J. Symbolic Comp., 20:287–297, 1995.
[CFG+84] B.W. Char, G.J. Fee, K.O. Geddes, G.H. Gonnet, M.B. Monagan,
and S.M. Watt. On the Design and Performance of the Maple
System. Technical Report CS-84-13 University of Waterloo, 1984.
[CGG84] B.W. Char, K.O. Geddes, and G.H. Gonnet. GCDHEU: Heuristic
Polynomial GCD Algorithm Based on Integer GCD Computation.
In J.P. Fitch, editor, Proceedings EUROSAM 84, pages 285–296,
1984.
[CGG89] B.W. Char, K.O. Geddes, and G.H. Gonnet. GCDHEU: Heuristic
Polynomial GCD Algorithm Based on Integer GCD Computations.
J. Symbolic Comp., 7:31–48, 1989.
[CGGG83] B.W. Char, K.O. Geddes, M.W. Gentleman, and G.H. Gonnet. The
Design of MAPLE: A Compact, Portable and Powerful Computer
Algebra System. In Proceedings EUROCAL 83, pages 101–115,
1983.
[CGH88] L. Caniglia, A. Galligo, and J. Heintz. Some New Effectivity Bounds in Computational Geometry. In Proceedings AAECC-6, pages 131–152, 1988.
[CGH+96] R.M. Corless, G.H. Gonnet, D.E.G. Hare, D.J. Jeffrey, and D.E. Knuth. On the Lambert W Function. Advances in Computational Mathematics, 5:329–359, 1996.
[CGH+03] D. Castro, M. Giusti, J. Heintz, G. Matera, and L.M. Pardo. The
Hardness of Polynomial Equation Solving. Foundations of Compu-
tational Mathematics, 3:347–420, 2003.
[CH91] G.E. Collins and H. Hong. Partial Cylindrical Algebraic Decompo-
sition for Quantifier Elimination. J. Symbolic Comp., 12:299–328,
1991.
[Che86] G.W. Cherry. Integration in Finite Terms with Special Functions:
the Logarithmic Integral. SIAM J. Computing, 15:1–21, 1986.
[Chi53] F. Chiò. Mémoire sur les Fonctions Connues sous le nom des
Résultants ou des Déterminants. A Pons & Cie, 1853.
[CJ02] R.M. Corless and D.J. Jeffrey. The Wright ω Function. Artificial Intelligence, pages 76–89, 2002.
[CKM97] S. Collart, M. Kalkbrener, and D. Mall. Converting Bases with the
Gröbner Walk. J. Symbolic Comp., 24:465–469, 1997.
[CLO06] D.A. Cox, J.B. Little, and D.B. O’Shea. Ideals, Varieties and Al-
gorithms. Springer–Verlag, 2006.
[CMXY09] C. Chen, M. Moreno Maza, B. Xia, and L. Yang. Computing Cylin-
drical Algebraic Decomposition via Triangular Decomposition. In
J. May, editor, Proceedings ISSAC 2009, pages 95–102, 2009.
[Col71] G.E. Collins. The SAC-1 System: An Introduction and Survey. In
Proceedings SYMSAC 1971, 1971.
[Col75] G.E. Collins. Quantifier Elimination for Real Closed Fields by
Cylindrical Algebraic Decomposition. In Proceedings 2nd. GI Con-
ference Automata Theory & Formal Languages, pages 134–183,
1975.
[Col79] G.E. Collins. Factoring univariate integral polynomials in polyno-
mial average time. In Proceedings EUROSAM 79, pages 317–329,
1979.
[Col85] G.E. Collins. The SAC-2 Computer Algebra System. In Proceedings
EUROCAL 85, pages 34–35, 1985.
[Col98] G.E. Collins. Quantifier elimination by cylindrical algebraic de-
composition — twenty years of progess. In B.F. Caviness and J.R.
Johnson, editors, Quantifier Elimination and Cylindrical Algebraic
Decomposition, pages 8–23. Springer Verlag, Wien, 1998.
[Cor02] R.M. Corless. Essential Maple 7 : an introduction for scientific
programmers. Springer-Verlag, 2002.
[CR88] M. Coste and M.-F. Roy. Thom’s Lemma, the Coding of Real
Algebraic Numbers and the Computation of the Topology of Semi-
Algebraic Sets. J. Symbolic Comp., 5:121–129, 1988.
[CW90] D. Coppersmith and S. Winograd. Matrix Multiplication via Arith-
metic Progressions. J. Symbolic Comp., 9:251–280, 1990.
[CZ81] D.G. Cantor and H. Zassenhaus. A New Algorithm for Factoring
Polynomials over Finite Fields. Math. Comp., 36:587–592, 1981.
[Dah09] X. Dahan. Size of coefficients of lexicographical Gröbner bases: the zero-dimensional, radical and bivariate case. In J. May, editor, Proceedings ISSAC 2009, pages 119–126, 2009.
[Dav81] J.H. Davenport. On the Integration of Algebraic Functions, volume
102 of Springer Lecture Notes in Computer Science. Springer Berlin
Heidelberg New York (Russian ed. MIR Moscow 1985), 1981.
[Dav84] J.H. Davenport. Intégration Algorithmique des fonctions
élémentairement transcendantes sur une courbe algébrique. An-
nales de l’Institut Fourier, 34:271–276, 1984.
[Dav85a] J.H. Davenport. Computer Algebra for Cylindrical Algebraic De-
composition. Technical Report TRITA-NA-8511 NADA KTH
Stockholm (Reissued as Bath Computer Science Technical report
88-10), 1985.
[Dav85b] J.H. Davenport. The LISP/VM Foundations of Scratchpad II.
Scratchpad II Newsletter 1, 1:4–5, 1985.
[Dav87] J.H. Davenport. Looking at a set of equations (Technical Report 87-
06, University of Bath Computer Science). http://staff.bath.
ac.uk/masjhd/TR87-06.pdf, 1987.
[Dav02] J.H. Davenport. Equality in computer algebra and beyond. J.
Symbolic Comp., 34:259–270, 2002.
[Dav10] J.H. Davenport. The Challenges of Multivalued “Functions”. In
S. Autexier et al., editor, Proceedings AISC/Calculemus/MKM
2010, pages 1–12, 2010.
[DC10] J.H. Davenport and J. Carette. The Sparsity Challenges. In S. Watt
et al., editor, Proceedings SYNASC 2009, pages 3–7, 2010.
[DDDD85] J. Della Dora, C. DiCrescenzo, and D. Duval. About a new Method
for Computing in Algebraic Number Fields. In Proceedings EURO-
CAL 85, pages 289–290, 1985.
[DdO14] Z. Dvir and R.M. de Oliveira. Factors of Sparse Polynomials are
Sparse. http://arxiv.org/abs/1404.4834, 2014.
[DF94] A. Dingle and R.J. Fateman. Branch cuts in computer algebra. In
Proceedings ISSAC 1994, pages 250–257, 1994.
[DGT91] J.H. Davenport, P. Gianni, and B.M. Trager. Scratchpad’s View
of Algebra II: A Categorical View of Factorization. In S.M. Watt,
editor, Proceedings ISSAC 1991, pages 32–38, 1991.
[DH88] J.H. Davenport and J. Heintz. Real Quantifier Elimination is Dou-
bly Exponential. J. Symbolic Comp., 5:29–35, 1988.
[Dic13] L.E. Dickson. Finiteness of the odd perfect and primitive abundant
numbers with n prime factors. Amer. J. Math., 35:413–422, 1913.
[Dix82] J.D. Dixon. Exact Solutions of Linear Equations Using p-adic
Methods. Numer. Math., 40:137–141, 1982.
[DJ80] J.H. Davenport and R.D. Jenks. MODLISP — an Introduction. In
Proceedings LISP80, 1980.
[DL08] J.H. Davenport and P. Libbrecht. The Freedom to Extend Open-
Math and its Utility. Mathematics in Computer Science 2(2008/9),
pages 379–398, 2008.
[DM90] J.H. Davenport and M. Mignotte. On Finding the Largest Root of
a Polynomial. Modélisation Mathématique et Analyse Numérique,
24:693–696, 1990.
[DN07] P. D’Alberto and A. Nicolau. Adaptive Strassen’s matrix multipli-
cation. In Proceedings Supercomputing 2007, pages 284–292, 2007.
[Dod66] C.L. Dodgson. Condensation of determinants, being a new and
brief method for computing their algebraic value. Proc. Roy. Soc.
Ser. A, 15:150–155, 1866.
[Doy99] N.J. Doye. Automated Coercion for Axiom. In S. Dooley, editor,
Proceedings ISSAC ’99, pages 229–235, 1999.
[DS97] A. Dolzmann and Th. Sturm. Redlog: Computer Algebra Meets
Computer Logic. ACM SIGSAM Bull. 2, 31:2–9, 1997.
[DS00] J.H. Davenport and G.C. Smith. Fast recognition of alternating
and symmetric groups. J. Pure Appl. Algebra, 153:17–25, 2000.
[DT81] J.H. Davenport and B.M. Trager. Factorization over finitely gener-
ated fields. In Proceedings SYMSAC 81, pages 200–205, 1981.
[DT90] J.H. Davenport and B.M. Trager. Scratchpad’s View of Algebra I:
Basic Commutative Algebra. In Proceedings DISCO ’90, 1990.
[Dub90] T.W. Dubé. The structure of polynomial ideals and Gröbner Bases.
SIAM J. Comp., 19:750–753, 1990.
[Duv87] D. Duval. Diverses Questions relatives au Calcul Formel avec les
Nombres Algébriques. Thèse d’Etat, 1987.
[EBD15] M. England, R. Bradford, and J.H. Davenport. Improving the Use
of Equational Constraints in Cylindrical Algebraic Decomposition.
In D. Robertz, editor, Proceedings ISSAC 2015, pages 165–172,
2015.
[Ebe83] G.L. Ebert. Some comments on the modular approach to Grobner-
bases. ACM SIGSAM Bull., 17:28–32, 1983.
[EFG15] S. Eberhard, K. Ford, and B. Green. Invariable Generation of
the Symmetric Group. http://arxiv.org/pdf/1508.01870.pdf,
2015.
[Erd49] P. Erdős. On the coefficients of the cyclotomic polynomial. Portugaliae Mathematica, 8:63–71, 1949.
[ESY06] A. Eigenwillig, V. Sharma, and C.K. Yap. Almost tight recursion
tree bounds for the Descartes method. In Proceedings ISSAC 2006,
pages 71–78, 2006.
[Fat03] R.J. Fateman. Comparing the speed of programs for sparse poly-
nomial multiplication. SIGSAM Bulletin 1, 37:4–15, 2003.
[Fau02] J.-C. Faugère. A New Efficient Algorithm for Computing Gröbner Bases Without Reduction to Zero (F5). In T. Mora, editor, Proceedings ISSAC 2002, pages 75–83, 2002.
[FGHR13] J.-C. Faugère, P. Gaudry, L. Huot, and G. Renault. Polynomial
Systems Solving by Fast Linear Algebra. http://arxiv.org/abs/1304.6039, 2013.
[FGK+94] B. Fuchssteiner, K. Gottheil, A. Kemper, O. Kluge, K. Morisse,
H. Naundorf, G. Oevel, T. Schulze, and W. Wiwianka. Mu-
PAD Multi Processing Algebra Data Tool Tutorial (version 1.2).
Birkhäuser Verlag, 1994.
[FGLM93] J.C. Faugère, P. Gianni, D. Lazard, and T. Mora. Efficient Compu-
tation of Zero-Dimensional Gröbner Bases by Change of Ordering.
J. Symbolic Comp., 16:329–344, 1993.
[FGT01] E. Fortuna, P. Gianni, and B. Trager. Degree reduction under
specialization. J. Pure Appl. Algebra, 164:153–163, 2001.
[fHN76] J.P. Fitch, P. Herbert, and A.C. Norman. Design Features of
COBALG. In R.D. Jenks, editor, Proceedings SYMSAC 76, pages
185–188, 1976.
[FIS15] R. Fukasaku, H. Iwane, and Y. Sato. Real Quantifier Elimination by
Computation of Comprehensive Gröbner Systems. In D. Robertz,
editor, Proceedings ISSAC 2015, pages 173–180, 2015.
[FJLT07] K. Fukuda, A.N. Jensen, N. Lauritzen, and R. Thomas. The generic
Gröbner walk. Journal of Symbolic Computation, 42(3):298–312,
2007.
[FM89] D.J. Ford and J. McKay. Computation of Galois Groups from Poly-
nomials over the Rationals. Computer Algebra (Lecture Notes in Pure and Applied Mathematics 113), 1989.
[FM13] J.-C. Faugère and C. Mou. Sparse FGLM algorithms. http://arxiv.org/abs/1304.1238, 2013.
[Fou31] J. Fourier. Analyse des équations déterminées. Didot, 1831.
[FS56] A. Fröhlich and J.C. Shepherdson. Effective Procedures in Field
Theory. Phil. Trans. Roy. Soc. Ser. A 248(1955-6), pages 407–432,
1956.
[FSEDT14] J.-C. Faugère, M. Safey El Din, and T. Verron. On the complexity
of computing Gröbner bases for weighted homogeneous systems.
http://arxiv.org/abs/1412.7547, 2014.
[Ful69] W. Fulton. Algebraic Curves, An Introduction to Algebraic Geom-
etry. W.A. Benjamin Inc, 1969.
[Gal79] É. Galois. Œuvres mathématiques. Gauthier-Villars (sous les auspices de la SMF), 1879.
[Gia89] P. Gianni. Properties of Gröbner bases under specializations. In
Proceedings EUROCAL 87, pages 293–297, 1989.
[GJ76] W.M. Gentleman and S.C. Johnson. Analysis of Algorithms, A
Case Study: Determinants of Matrices. ACM TOMS, 2:232–241,
1976.
[GJY75] J.H. Griesmer, R.D. Jenks, and D.Y.Y. Yun. SCRATCHPAD
User’s Manual. IBM Research Publication RA70, 1975.
[GKP94] R.L. Graham, D.E. Knuth, and O. Patashnik. Concrete Mathe-
matics (2nd edition). Addison-Wesley, 1994.
[GKS15] A. Grabowski, A. Kornilowicz, and C. Schwarzweller. Equality in
Computer Proof-Assistants. In Proceedings 2015 Federated Confer-
ence on Computer Science and Information Systems, pages 45–54,
2015.
[GLL09] M. Giesbrecht, G. Labahn, and W. Lee. Symbolic-numeric sparse
interpolation of multivariate polynomials. J. Symbolic Comp.,
44:943–959, 2009.
[GMN+91] A. Giovini, T. Mora, G. Niesi, L. Robbiano, and C. Traverso. One
sugar cube, please, or selection strategies in the Buchberger algo-
rithm. In S.M. Watt, editor, Proceedings ISSAC 1991, pages 49–54,
1991.
[GN90] A. Giovini and G. Niesi. CoCoA: A User-Friendly System for Com-
mutative Algebra. In Proceedings DISCO ’90, 1990.
[Gon84] G.H. Gonnet. Determining Equivalence of Expressions in Random
Polynomial Time. In Proceedings 16th ACM Symp. Theory of Com-
puting, pages 334–341, 1984.
[Gos78] R.W. Gosper Jr. A Decision Procedure for Indefinite Hypergeo-
metric Summation. Proc. Nat. Acad. Sci., 75:40–42, 1978.
[Gra37] C.H. Graeffe. Die Auflösung der höheren numerischen Gleichungen.
F. Schulthess, 1837.
[Gre15] B. Grenet. Lacunaryx: Computing bounded-degree factors of lacu-
nary polynomials. http://arxiv.org/abs/1506.03726, 2015.
[HA10] A. Hashemi and G. Ars. Extended F5 criteria. J. Symbolic Comp.,
45:1330–1340, 2010.
[Ham07] S. Hammarling. Life as a developer of numerical software. Talk at
NAG Ltd AGM, 2007.
[Har16] G.H. Hardy. The Integration of Functions of a Single Variable (2nd.
ed.). Cambridge Tract 2, C.U.P., 1916. Jbuch. 46, 1916.
[Har11] M.C. Harrison. Explicit Solution By Radicals of Algebraic Curves
of Genus 5 or 6. http://arxiv.org/abs/1103.4946, 2011.
[Has53] C.B. Haselgrove. Implementations of the Todd-Coxeter Algorithm
on EDSAC-1. Unpublished, 1953.
[HB78] D.R. Heath-Brown. Almost-primes in arithmetic progressions and
short intervals. Math. Proc. Camb. Phil. Soc., 83:357–375, 1978.
[HD03] A.J. Holt and J.H. Davenport. Resolving Large Prime(s) Variants
for Discrete Logarithm Computation. In P.G. Farrell, editor, Pro-
ceedings 9th IMA Conf. Coding and Cryptography, pages 207–222,
2003.
[Hea05] A.C. Hearn. REDUCE: The First Forty Years. In A. Dolzmann, A. Seidl, and T. Sturm, editors, Proceedings A3L, pages 19–24. Books on Demand GmbH, 2005.
[Her72] C. Hermite. Sur l'intégration des fractions rationnelles. Nouvelles Annales de Mathématiques, 11:145–148, 1872.
[Hie92] J. Hietarinta. Solving the constant quantum Yang-Baxter equa-
tion in 2 dimensions with massive use of factorizing Gröbner basis
computations. In Proceedings ISSAC 1992, pages 350–357, 1992.
[Hie93] J. Hietarinta. Solving the two-dimensional constant quantum Yang-Baxter equation. J. Math. Phys., 34:1725–1756, 1993. doi:10.1063/1.530185.
[Hig02] N.J. Higham. Accuracy and Stability of Numerical Algorithms, 2nd
ed. SIAM, 2002.
[HM13] J. Hu and M. Monagan. A parallel algorithm to compute the
greatest common divisor of sparse multivariate polynomials. ACM
Comm. Computer Algebra, 47:108–109, 2013.
[Hon90] H. Hong. Improvements in CAD-Based Quantifier Elimination.
PhD thesis, OSU-CISRC-10/90-TR29 Ohio State University, 1990.
[Hor69] E. Horowitz. Algorithm for Symbolic Integration of Rational Func-
tions. PhD thesis, Univ. of Wisconsin, 1969.
[Hor71] E. Horowitz. Algorithms for Partial Fraction Decomposition and
Rational Function Integration. In Proceedings Second Symposium
on Symbolic and Algebraic Manipulation, pages 441–457, 1971.
[Hou59] A.S. Householder. Dandelin, Lobačevskiĭ, or Graeffe? Amer. Math.
Monthly, 66:464–466, 1959.
[HP07] H. Hong and J. Perry. Are Buchberger’s criteria necessary for the
chain condition? J. Symbolic Comp., 42:717–732, 2007.
[Hur12] A. Hurwitz. Über den Satz von Budan-Fourier. Math. Annalen,
71:584–591, 1912.
[HW79] G.H. Hardy and E.M. Wright. An Introduction to the Theory of
Numbers (5th. ed.). Clarendon Press, 1979.
[IEE85] IEEE. IEEE Standard 754 for Binary Floating-Point Arithmetic.
IEEE, 1985.
[IL80] O.H. Ibarra and B.S. Leininger. The Complexity of the Equivalence
Problem for Straight-line Programs. In Proceedings ACM STOC
1980, pages 273–280, 1980.
[IPS10] I. Idrees, G. Pfister, and S. Steidel. Parallelization of Modular
Algorithms. http://arxiv.org/abs/1005.5663v1, 2010.
[IPS11] I. Idrees, G. Pfister, and S. Steidel. Parallelization of Modular
Algorithms. J. Symbolic Comp., 46:672–684, 2011.
[Isa85] I.M. Isaacs. Solution of polynomials by real radicals. Amer. Math.
Monthly, 92:571–575, 1985.
[Jef10] D.J. Jeffrey. LU Factoring of Non-Invertible Matrices. Communi-
cations in Computer Algebra 1, 45:1–8, 2010.
[Jen79] R.D. Jenks. MODLISP. In Proceedings EUROSAM 79, pages 466–
480, 1979.
[JHM11] R. Jones, A. Hosking, and E. Moss. The garbage collection hand-
book: The Art of Automatic Memory Management (1st ed.). Chap-
man & Hall/CRC, 2011.
[Joh71] S.C. Johnson. On the Problem of Recognizing Zero. J. ACM,
18:559–565, 1971.
[Joh74] S.C. Johnson. Sparse Polynomial Arithmetic. In Proceedings EU-
ROSAM 74, pages 63–71, 1974.
[JR10] D.J. Jeffrey and A.D. Rich. Reducing Expression Size Using Rule-
Based Integration. In S. Autexier et al., editor, Proceedings CICM
2010, pages 234–246, 2010.
[JS92] R.D. Jenks and R.S. Sutor. AXIOM: The Scientific Computation
System. Springer-Verlag, 1992.
[Kac43] M. Kac. On the Average Number of Real Roots of a Random
Algebraic Equation. Bull. A.M.S., 49:314–320, 1943.
[Kah53] H.G. Kahrimanian. Analytic differentiation by a digital computer.
Master’s thesis, Temple U Philadelphia, 1953.
[Kal88] E. Kaltofen. Greatest Common Divisors of Polynomials given by
Straight-line Programs. J. ACM, 35:231–264, 1988.
[Kal89a] M. Kalkbrener. Solving systems of algebraic equations by using
Gröbner bases. In Proceedings EUROCAL 87, pages 282–292, 1989.
[Kal89b] E. Kaltofen. Factorization of Polynomials given by Straight-line
Programs. Randomness and Computation, pages 375–412, 1989.
[Kal95] E. Kaltofen. Effective Noether irreducibility forms and applications.
J. Computer System Sci., 50:274–295, 1995.
[Kal10] E. Kaltofen. Fifteen years after DSC and WLSS2 what parallel com-
putations I do today: invited lecture at PASCO 2010. In Proceed-
ings 4th International Workshop on Parallel and Symbolic Compu-
tation, pages 10–17, 2010.
[Kar81] M. Karr. Summation in Finite Terms. J. ACM, 28:305–350, 1981.
[Kar84] N.K. Karmarkar. A New Polynomial-Time Algorithm for Linear
Programming. Combinatorica, 4:373–395, 1984.
[Kha79] L.G. Khachian. A polynomial algorithm in linear programming.
Doklady Akad. Nauk SSSR, 244:1093–1096, 1979.
[KMS83] E. Kaltofen, D.R. Musser, and B.D. Saunders. A Generalized Class
of Polynomials That are Hard to Factor. SIAM J. Comp., 12:473–
483, 1983.
[KMS13] H. Khalil, B. Mourrain, and M. Schatzman. Superfast solution of
Toeplitz systems based on syzygy reduction. http://arxiv.org/abs/1301.5798, 2013.
[Knu74] D.E. Knuth. Big Omicron and big Omega and big Theta. ACM
SIGACT News 2, 8:18–24, 1974.
[Knu81] D.E. Knuth. The Art of Computer Programming, Vol. II, Seminumerical Algorithms (Second Edition). Addison-Wesley, 1981.
[Knu98] D.E. Knuth. The Art of Computer Programming, Vol. II, Semi-
numerical Algorithms (Third Edition). Addison-Wesley, 1998.
[KO63] A. Karatsuba and J. Ofman. Multiplication of multidigit numbers
on automata. Sov. Phys. Dokl., 7:595–596, 1963.
[Kol88] J. Kollár. Sharp effective Nullstellensatz. J.A.M.S., 1:963–975, 1988.
[KPT12] P. Koiran, N. Portier, and S. Tavenas. A Wronskian approach to
the real τ-conjecture. http://arxiv.org/abs/1205.1015, 2012.
[KRW90] A. Kandri-Rody and V. Weispfenning. Non-commutative Gröbner
bases in algebras of solvable type. J. Symbolic Comp., 9:1–26, 1990.
[KS11] M. Kerber and M. Sagraloff. Root Refinement for Real Polynomials.
http://arxiv.org/abs/1104.1362, 2011.
[KSD15] M. Košta, T. Sturm, and A. Dolzmann. Better Answers to Real
Questions. http://arxiv.org/abs/1501.05098, 2015.
[KU08] K. Kedlaya and C. Umans. Fast modular composition in any char-
acteristic. In Proceedings of the 49th Annual IEEE Symposium on
Foundations of Computer Science, pages 146–155, 2008.
[Lad76] J.D. Laderman. A Non-Commutative Algorithm for Multiplying
3 × 3 Matrices Using 23 Multiplications. Bull. Amer. Math. Soc.,
82:126–128, 1976.
[Lan05] E. Landau. Sur Quelques Théorèmes de M. Petrovic Relatif aux
Zéros des Fonctions Analytiques. Bull. Soc. Math. France, 33:251–
261, 1905.
[Lan06] J.M. Landsberg. The border rank of the multiplication of 2 × 2
matrices is seven. J. Amer. Math. Soc., 19:447–459, 2006.
[Lan12] J.M. Landsberg. New lower bounds for the rank of matrix multi-
plication. http://arxiv.org/abs/1206.1530, 2012.
[Lau82] M. Lauer. Computing by Homomorphic Images. Symbolic and Alge-
braic Computation (Computing Supplementum 4) Springer-Verlag,
pages 139–168, 1982.
[Laz83] D. Lazard. Gröbner Bases, Gaussian Elimination and Resolution
of Systems of Algebraic Equations. In Proceedings EUROCAL 83,
pages 146–157, 1983.
[Laz88] D. Lazard. Quantifier Elimination: Optimal Solution for Two Clas-
sical Problems. J. Symbolic Comp., 5:261–266, 1988.
[Laz91] D. Lazard. A New Method for Solving Algebraic Systems of Positive
Dimension. Discrete Appl. Math., 33:147–160, 1991.
[Laz92] D. Lazard. Solving Zero-dimensional Algebraic Systems. J. Sym-
bolic Comp., 13:117–131, 1992.
[Laz09] D. Lazard. Thirty years of Polynomial System Solving, and now?
J. Symbolic Comp., 44:222–231, 2009.
[Lec08] G. Lecerf. Fast separable factorization and applications. AAECC,
19:135–160, 2008.
[Len87] A.K. Lenstra. Factoring Multivariate Polynomials over Algebraic
Number Fields. SIAM J. Comp., 16:591–598, 1987.
[Len99a] H.W. Lenstra Jr. Finding small degree factors of lacunary polyno-
mials. Number theory in progress, pages 267–276, 1999.
[Len99b] H.W. Lenstra Jr. On the factorization of lacunary polynomials.
Number theory in progress, pages 277–291, 1999.
[LG14] F. Le Gall. Algebraic Complexity Theory and Matrix Multipli-
cation. In K. Nabeshima, editor, Proceedings ISSAC 2014, pages
23–23, 2014.
[Lic84] T. Lickteig. A note on border rank. Inform. Process. Lett., 18:173–
178, 1984.
[Lic11] D. Lichtblau. Mathematica's $\sqrt{x^2}$. Private Communication, 2011.
[LL12] A. Lerario and E. Lundberg. Statistics on Hilbert’s Sixteenth Prob-
lem. http://arxiv.org/abs/1212.3823, 2012.
[LLJL82] A.K. Lenstra, H.W. Lenstra, Jun., and L. Lovász. Factoring Poly-
nomials with Rational Coefficients. Math. Ann., 261:515–534, 1982.
[LN97] R. Lidl and H. Niederreiter. Finite Fields: volume 20 of Encyclo-
pedia of Mathematics and its Applications. Cambridge University
Press, 1997.
[LO11] J. Landsberg and G. Ottaviani. New lower bounds for the border
rank of matrix multiplication. http://arxiv.org/abs/1112.6007,
2011.
[Loo82] R. Loos. Generalized Polynomial Remainder Sequences. Sym-
bolic and Algebraic Computation (Computing Supplementum 4)
Springer-Verlag, pages 115–137, 1982.
[LR90] D. Lazard and R. Rioboo. Integration of Rational Functions —
Rational Computation of the Logarithmic Part. J. Symbolic Comp.,
9:113–115, 1990.
[LR01] T. Lickteig and M.-F. Roy. Sylvester-Habicht Sequences and Fast
Cauchy Index Computation. J. Symbolic Comp., 31:315–341, 2001.
[LV40] U. Le Verrier. Sur les variations séculaires des éléments elliptiques
des sept planètes principales: Mercure, Vénus, La Terre, Mars,
Jupiter, Saturne et Uranus. J. Math. Pure Appl., 4:220–254, 1840.
[Mah64] K. Mahler. An Inequality for the Discriminant of a Polynomial.
Michigan Math. J., 11:257–262, 1964.
[Mak86] O.M. Makarov. An algorithm for multiplication of 3 × 3 matri-
ces. USSR Computational Mathematics and Mathematical Physics,
26:179–180, 1986.
[McC84] S. McCallum. An Improved Projection Operation for Cylindrical
Algebraic Decomposition. PhD thesis, University of Wisconsin-
Madison Computer Science, 1984.
[McC88] S. McCallum. An Improved Projection Operation for Cylindrical
Algebraic Decomposition of Three-dimensional Space. J. Symbolic
Comp., 5:141–161, 1988.
[McC99] S. McCallum. On Projection in CAD-Based Quantifier Elimina-
tion with Equational Constraints. In S. Dooley, editor, Proceedings
ISSAC ’99, pages 145–149, 1999.
[McC01] S. McCallum. On Propagation of Equational Constraints in CAD-
Based Quantifier Elimination. In B. Mourrain, editor, Proceedings
ISSAC 2001, pages 223–230, 2001.
[MF71] W.A. Martin and R.J. Fateman. The MACSYMA System. In
Proceedings Second Symposium on Symbolic and Algebraic Manip-
ulation, pages 59–75, 1971.
[MGH+03] M.B. Monagan, K.O. Geddes, K.M. Heal, G. Labahn, S.M. Vorkoet-
ter, J. McCarron, and P. DeMarco. Maple 9 : introductory pro-
gramming guide. Maplesoft, 2003.
[Mig74] M. Mignotte. An Inequality about Factors of Polynomials. Math.
Comp., 28:1153–1157, 1974.
[Mig81] M. Mignotte. Some Inequalities About Univariate Polynomials. In
Proceedings SYMSAC 81, pages 195–199, 1981.
[Mig82] M. Mignotte. Some Useful Bounds. Symbolic and Algebraic Compu-
tation (Computing Supplementum 4) Springer-Verlag, pages 259–
263, 1982.
[Mig89] M. Mignotte. Mathématiques pour le Calcul Formel. PUF, 1989.
[Mig00] M. Mignotte. Bounds for the roots of lacunary polynomials. J.
Symbolic Comp., 30:325–327, 2000.
[MM82] E. Mayr and A. Meyer. The Complexity of the Word Problem for
Commutative Semi-groups and Polynomial Ideals. Adv. in Math.,
46:305–329, 1982.
[MM84] H.M. Möller and F. Mora. Upper and Lower Bounds for the Degree
of Groebner Bases. In J.P. Fitch, editor, Proceedings EUROSAM
84, pages 172–183, 1984.
[MMN89] H. Melenk, H.M. Möller, and W. Neun. Symbolic Solution of Large
Stationary Chemical Kinetics Problems. Impact of Computing in
Sci. and Eng., 1:138–167, 1989.
[Moe73] R. Moenck. Fast Computation of GCDs. In Proceedings Fifth Annual ACM Symposium on Theory of Computing, pages 142–151, 1973.
[Mon05] P.L. Montgomery. Five, Six, and Seven-Term Karatsuba-Like For-
mulae. IEEE Trans. Computers, 54:362–369, 2005.
[Mon09] D. Monniaux. Fatal Degeneracy in the Semidefinite Program-
ming Approach to the Decision of Polynomial Inequalities. http://arxiv.org/abs/0901.4907, 2009.
[Mor86] F. Mora. Groebner Bases for Non-commutative Polynomial Rings.
In Proceedings AAECC-3, pages 353–362, 1986.
[Mos71] J. Moses. Algebraic Simplification — A Guide for the Perplexed.
Comm. ACM, 14:527–537, 1971.
[MP08] M. Monagan and R. Pearce. Parallel sparse polynomial multiplica-
tion using heaps. In D.J. Jeffrey, editor, Proceedings ISSAC 2008,
pages 263–270, 2008.
[MP11] M. Monagan and R. Pearce. Sparse polynomial division using a
heap. J. Symbolic Comp., 46:807–822, 2011.
[MP12] M. Monagan and R. Pearce. POLY : A new polynomial data struc-
ture for Maple 17. Comm. Computer Algebra, 46:164–167, 2012.
[MP13] M. Monagan and R. Pearce. POLY : A new polynomial data struc-
ture for Maple 17. To appear in Proc. ASCM 2012, 2013.
[MP14] M. Monagan and R. Pearce. The design of Maple’s sum-of-products
and POLY data structures for representing mathematical objects.
ACM Comm. Computer Algebra, 48:166–186, 2014.
[MR88] F. Mora and L. Robbiano. The Gröbner fan of an ideal. J. Symbolic
Comp., 6:183–208, 1988.
[MR10] E.W. Mayr and S. Ritscher. Degree Bounds for Gröbner Bases of
Low-Dimensional Polynomial Ideals. In S.M. Watt, editor, Proceed-
ings ISSAC 2010, pages 21–28, 2010.
[MR11] E.W. Mayr and S. Ritscher. Space efficient Gröbner basis computation without degree bounds. In Proceedings ISSAC 2011, pages
257–264, 2011.
[MR12] A. Massarenti and E. Raviolo. The Rank of $n \times n$ Matrix Multiplication is at least $3n^2 - 2\sqrt{2}\,n^{3/2} - 3n$. http://arxiv.org/abs/1211.6320, 2012.
[Mul97] T. Mulders. A note on subresultants and the
Lazard/Rioboo/Trager formula in rational function integration. J.
Symbolic Comp., 24:45–50, 1997.
[Mus78] D.R. Musser. On the efficiency of a polynomial irreducibility test.
J. ACM, 25:271–282, 1978.
[MW51] J.C.P. Miller and D.J. Wheeler. Missing Title. Nature, 168:838, 1951.
[MW12] S. McCallum and V. Weispfenning. Deciding polynomial-
transcendental problems. J. Symbolic Comp., 47:16–31, 2012.
[Nat10] National Institute for Standards and Technology. The NIST Digital
Library of Mathematical Functions. http://dlmf.nist.gov, 2010.
[Neu95] J. Neubüser. Re: Origins of GAP. Message m0t5WVW-
00075GC.951018.121802@astoria.math.rwth-aachen.de to GAP-
Forum on 18.10.95, 1995.
[NM77] A.C. Norman and P.M.A. Moore. Implementing the New Risch
Integration Algorithm. In Proceedings 4th. Int. Colloquium on Ad-
vanced Computing Methods in Theoretical Physics, pages 99–110,
1977.
[NM92] W. Neun and H. Melenk. Very large Gröbner basis calculations.
Computer Algebra and Parallelism, pages 89–99, 1992.
[Nol53] J. Nolan. Analytic differentiation on a digital computer. Master's
thesis, Math. Dept. M.I.T., 1953.
[Ost45] M.W. Ostrogradski. De l'intégration des fractions rationnelles. Bull.
Acad. Imp. Sci. St. Petersburg (Class Phys.-Math.), 4:145–167,
1845.
[Pan02] V.Y. Pan. Univariate Polynomials: Nearly Optimal Algorithms
for Numerical Factorization and Root-finding. J. Symbolic Comp.,
33:701–733, 2002.
[Pau07] F. Pauer. Gröbner bases with coefficients in rings. J. Symbolic
Computation, 42:1003–1011, 2007.
[Per09] J. Perry. An extension of Buchberger’s criteria for Groebner basis
decision. http://arxiv.org/abs/0906.4358, 2009.
[Pla77] D.A. Plaisted. Sparse Complex Polynomials and Irreducibility. J.
Comp. Syst. Sci., 14:210–221, 1977.
[PPR14] R. Pemantle, Y. Peres, and I. Rivin. Four random permutations
conjugated by an adversary generate Sn with high probability.
http://arxiv.org/abs/1412.3781, 2014.
[PQR09] A. Platzer, J.-D. Quesel, and P. Rümmer. Real World Verification.
In R.A. Schmidt, editor, Proceedings CADE 2009, pages 485–501,
2009.
[Pri03] H.A. Priestley. Introduction to Complex Analysis (second edition).
Oxford University Press, 2003.
[PW85] R. Pavelle and P.S. Wang. MACSYMA from F to G. J. Symbolic
Comp., 1:69–100, 1985.
[Rab80] M.O. Rabin. Probabilistic Algorithm for Testing Primality. J.
Number Theory, 12:128–138, 1980.
[RESW14] S. Ruggieri, P. Eirinakis, K. Subramani, and P. Wojciechowski. On
the complexity of quantified linear systems. Theoretical Computer
Science, 518:128–135, 2014.
[Ric68] D. Richardson. Some Unsolvable Problems Involving Elementary
Functions of a Real Variable. Journal of Symbolic Logic, 33:514–
520, 1968.
[Ric97] D.S. Richardson. How to Recognize Zero. J. Symbolic Comp.,
24:627–645, 1997.
[Ris69] R.H. Risch. The Problem of Integration in Finite Terms. Trans.
A.M.S., 139:167–189, 1969.
[Ris70] R.H. Risch. The Solution of the Problem of Integration in Finite
Terms. Bulletin A.M.S., 76:605–608, 1970.
[Ris79] R.H. Risch. Algebraic Properties of the Elementary Functions of
Analysis. Amer. J. Math., 101:743–759, 1979.
[Ris85] J.-J. Risler. Additive Complexity of Real Polynomials. SIAM J.
Comp., 14:178–183, 1985.
[Ris88] J.-J. Risler. Some Aspects of Complexity in Real Algebraic Geom-
etry. J. Symbolic Comp., 5:109–119, 1988.
[Rit32] J.F. Ritt. Differential Equations from an Algebraic Standpoint. Vol-
ume 14. American Mathematical Society, 1932.
[Rit48] J.F. Ritt. Integration in Finite Terms, Liouville’s Theory of Ele-
mentary Methods. Columbia University Press, 1948.
[Rob85] L. Robbiano. Term orderings on the Polynomial Ring. In Proceed-
ings EUROCAL 85, pages 513–525, 1985.
[Roc14] D. Roche. Polynomial interpolation. Private Communication, 2014.
[Ron11] X. Rong. Generating Sets. Private Communication, 2011.
[Rot76] M. Rothstein. Aspects of Symbolic Integration and Simplification
of Exponential and Primitive Functions. PhD thesis, Univ. of Wis-
consin, 1976.
[RR90] J.-J. Risler and F. Ronga. Testing Polynomials. J. Symbolic Comp.,
10:1–5, 1990.
[RS79] A.D. Rich and D.R. Stoutemyer. Capabilities of the MUMATH-79
Computer Algebra System for the INTEL-8080 Microprocessor. In
Proceedings EUROSAM 79, pages 241–248, 1979.
[RS92] A.D. Rich and D.R. Stoutemyer. DERIVE Reference Manual. Soft-
Warehouse, 1992.
[RS13] A.D. Rich and D.R. Stoutemyer. Representation, simplification
and display of fractional powers of rational numbers in computer
algebra. http://arxiv.org/abs/1302.2169, 2013.
[Rup86] W.M. Ruppert. Reduzibilität ebener Kurven. J. Reine Angew.
Math., 369:167–191, 1986.
[Sag14] M. Sagraloff. A Near-Optimal Algorithm for Computing Real Roots
of Sparse Polynomials. In K. Nabeshima, editor, Proceedings ISSAC
2014, pages 359–366, 2014.
[Sch71] A. Schönhage. Partial and total matrix multiplication. SIAM J.
Comp., 10:434–455, 1971.
[Sch82] A. Schönhage. The Fundamental theorem of Algebra in Terms of
Computational Complexity. Tech. Rep. U. Tübingen, 1982.
[Sch00a] A. Schinzel. Polynomials with Special Regard to Irreducibility.
C.U.P., 2000.
[Sch00b] C. Schneider. An implementation of Karr’s summation algorithm
in Mathematica. Sém. Lothar. Combin., S43b:1–20, 2000.
[Sch03a] A. Schinzel. On the greatest common divisor of two univariate
polynomials, I. In A Panorama of number theory or the view from
Baker’s garden, pages 337–352. C.U.P., 2003.
[Sch03b] H. Schönemann. Singular in a Framework for Polynomial Compu-
tations. Algebra Geometry and Software Systems, pages 163–176,
2003.
[Sch04] C. Schneider. Symbolic Summation with Single-Sum Extensions.
In J. Gutierrez, editor, Proceedings ISSAC 2004, pages 282–289,
2004.
[Sch15] H. Schönemann. Geobuckets in SINGULAR. Personal Communi-
cation, 2015.
[Sco15] M. Scott. Missing a trick: Karatsuba variations. http://eprint.iacr.org/2015/1247.pdf, 2015.
[SD70] H.P.F. Swinnerton-Dyer. Letter to E.H. Berlekamp. Mentioned in
[Ber70], 1970.
[Sei54] A. Seidenberg. A new decision method for elementary algebra. Ann.
Math., 60:365–374, 1954.
[Sen08] J.R. Sendra. Algebraic Curves Soluble by Radicals. http://arxiv.org/abs/0805.3214, 2008.
[SGV94] A. Schönhage, A.F.W. Grotefeld, and E. Vetter. Fast Algo-
rithms: A Multitape Turing Machine Implementation. BI Wis-
senschaftsverlag, 1994.
[Sla61] J. Slagle. A Heuristic Program that Solves Symbolic Integration
Problems in Freshman Calculus. PhD thesis, Harvard U., 1961.
[Slo07] N.J.A. Sloane. The Online Encyclopedia of Integer Sequences.
http://www.research.att.com/~njas/sequences, 2007.
[Smi76] J. Smit. The Efficient Calculation of Symbolic Determinants. In
R.D. Jenks, editor, Proceedings SYMSAC 76, pages 105–113, 1976.
[Smi79] J. Smit. New Recursive Minor Expansion Algorithms, a Presen-
tation in a Comparative Context. In Proceedings EUROSAM 79,
pages 74–87, 1979.
[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation großer
Zahlen. Computing, 7:282–292, 1971.
[SS06] A.P. Sexton and V. Sorge. Abstract matrices in symbolic compu-
tation. In Proceedings ISSAC 2006, pages 318–325, 2006.
[SS11] J. Schicho and D. Sevilla. Effective radical parametrization of trig-
onal curves. http://arxiv.org/abs/1104.2470, 2011.
[ST92] J.H. Silverman and J. Tate. Rational Points on Elliptic Curves.
Springer-Verlag, 1992.
[Ste74] G. Stengle. A Nullstellensatz and a Positivstellensatz in Semialge-
braic Geometry. Mathematische Annalen, 207:87–97, 1974.
[Sto77] D.R. Stoutemyer. sin(x)**2 + cos(x)**2 = 1. In Proceedings 1977
MACSYMA Users’ Conference, pages 425–433, 1977.
[Sto11a] D. Stoutemyer. Ten commandments for good default expression
simplification. J. Symbolic Comp., 46:859–887, 2011.
[Sto11b] D. Stoutemyer. Ways to implement computer algebra compactly.
Comm. Computer Algebra, 178:199–224, 2011.
[Sto13] D.R. Stoutemyer. Can the Eureqa Symbolic Regression Program,
Computer Algebra, and Numerical Analysis Help Each Other? No-
tices A.M.S., 60:713–724, 2013.
[Str69] V. Strassen. Gaussian Elimination is not Optimal. Numer. Math.,
13:354–356, 1969.
[Stu96] T. Sturm. Real quadratic quantifier elimination in RISA/ASIR.
Technical Report Memorandum ISIS-RM-5E ISIS Fujitsu Labora-
tories Limited, 1996.
[Tak10] D. Takahashi. Parallel implementation of multiple-precision arith-
metic and 2,576,980,370,000 decimal digits of π calculation. Parallel
Computing, 36:439–448, 2010.
[Tar51] A. Tarski. A Decision Method for Elementary Algebra and Geometry. 2nd ed., Univ. Cal. Press, 1951. Reprinted in Quantifier Elimination and Cylindrical Algebraic Decomposition (ed. B.F. Caviness & J.R. Johnson), Springer-Verlag, Wien–New York, 1998, pp. 24–84.
[TE07] E.P. Tsigaridas and I.Z. Emiris. Univariate polynomial real root
isolation: Continued Fractions revisited. Proc. 14th European
Symp. Algorithms, Springer Lecture Notes in Computer Science,
4168:817–828, 2007.
[Tra76] B.M. Trager. Algebraic Factoring and Rational Function Integra-
tion. In R.D. Jenks, editor, Proceedings SYMSAC 76, pages 219–
226, 1976.
[Tra84] B.M. Trager. Integration of Algebraic Functions. PhD thesis, M.I.T.
Dept. of Electrical Engineering and Computer Science, 1984.
[Tra88] C. Traverso. Gröbner trace algorithms. In P. Gianni, editor, Pro-
ceedings ISSAC 1988, pages 125–138, 1988.
[Tra00] Q.-N. Tran. A Fast Algorithm for Gröbner Basis Conversion and
its Applications. J. Symbolic Comp., 30:451–467, 2000.
[Tri78] W. Trinks. Über B. Buchbergers Verfahren, Systeme algebraischer
Gleichungen zu lösen. J. Number Theory, 10:475–488, 1978.
[vdW34] B.L. van der Waerden. Die Seltenheit der Gleichungen mit Affekt.
Mathematische Annalen, 109:13–16, 1934.
[vH02] M. van Hoeij. Factoring polynomials and the knapsack problem. J.
Number Theory, 95:167–189, 2002.
[vH15] M. van Hoeij. Groebner basis in Boolean rings is not polynomial-
space. http://arxiv.org/abs/1502.07220, 2015.
[vHM02] M. van Hoeij and M. Monagan. A Modular GCD Algorithm over
Number Fields Presented with Multiple Extensions. In T. Mora,
editor, Proceedings ISSAC 2002, pages 109–116, 2002.
[vHM04] M. van Hoeij and M. Monagan. Algorithms for Polynomial GCD
Computation over Algebraic Function Fields. In J Gutierrez, editor,
Proceedings ISSAC 2004, pages 297–304, 2004.
[vHM16] M. van Hoeij and M. Monagan. A Modular Algorithm for Comput-
ing Polynomial GCDs over Number Fields presented with Multiple
Extensions. http://arxiv.org/abs/1601.01038, 2016.
[Vor10] S. Vorkoetter. Maple kernel. E-mail
4BF15DC8.7030207@maplesoft.com, 2010.
[vT83] E.W. von Tschirnhaus. Methodus auferendi omnes terminos inter-
medios ex data aeqvatione. Acta Eruditorum, ?:204–207, 1683.
[vzG85] J. von zur Gathen. Irreducibility of multivariate polynomials. J.
Computer Syst. Sci., 31:225–264, 1985.
[vzGG99] J. von zur Gathen and J. Gerhard. Modern Computer Algebra.
C.U.P., 1999.
[vzGP01] J. von zur Gathen and D. Panario. Factoring Polynomials Over
Finite Fields: A Survey. J. Symbolic Comp., 31:3–17, 2001.
[vzGS92] J. von zur Gathen and V. Shoup. Computing Frobenius maps and
Factoring Polynomials. Computational Complexity, 2:187–224, 1992.
[Wan71a] P.S. Wang. Automatic Computation of Limits. In Proceedings
Second Symposium on Symbolic and Algebraic Manipulation, pages
458–464, 1971.
[Wan71b] P.S. Wang. Evaluation of Definite Integrals by Symbolic Manipula-
tion. PhD thesis, M.I.T & Project MAC TR-92, 1971.
[Wan76] P.S. Wang. Factoring Multivariate Polynomials over Algebraic
Number Fields. Math. Comp., 30:324–336, 1976.
[Wan78] P.S. Wang. An Improved Multivariable Polynomial Factorising Al-
gorithm. Math. Comp., 32:1215–1231, 1978.
[Wan81] P.S. Wang. A p-adic Algorithm for Univariate Partial Fractions. In
Proceedings SYMSAC 81, pages 212–217, 1981.
[Wei88] V. Weispfenning. The Complexity of Linear Problems in Fields. J.
Symbolic Comp., 5:1–27, 1988.
[Wei92] V. Weispfenning. Comprehensive Gröbner Bases. J. Symbolic
Comp., 14:1–29, 1992.
[Wei94] V. Weispfenning. Quantifier elimination for real algebra — the
cubic case. In Proceedings ISSAC 1994, pages 258–263, 1994.
[Wei98] V. Weispfenning. A New Approach to Quantifier Elimination for
Real Algebra. Quantifier Elimination and Cylindrical Algebraic
Decomposition, pages 376–392, 1998.
[Wei03] V. Weispfenning. Canonical Comprehensive Gröbner Bases. J.
Symbolic Comp., 36:669–683, 2003.
[WG93] T. Weibel and G.H. Gonnet. An Assume Facility for CAS with
a Sample Implementation for Maple. In Proceedings DISCO ’92,
pages 95–103, 1993.
[WGD82] P.S. Wang, M.J.T. Guy, and J.H. Davenport. p-adic Reconstruction
of Rational Numbers. SIGSAM Bulletin 2, 16:2–3, 1982.
[Wil59] J.H. Wilkinson. The Evaluation of the Zeros of Ill-conditioned Poly-
nomials. Num. Math., 1:150–166, 1959.
[Wil12] V.V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In Proceedings of the 44th Symposium on Theory of Computing, STOC '12, pages 887–898, New York, NY, USA, 2012. ACM.
[Wil14] V.V. Williams. Multiplying matrices faster than Coppersmith-
Winograd. http://theory.stanford.edu/~virgi/matrixmult-f.pdf, 2014.
[Win71] S. Winograd. On multiplication of 2 × 2 matrices. Linear algebra
and its applications, 4:381–388, 1971.
[Win88] F. Winkler. A p-adic Approach to the Computation of Gröbner
Bases. J. Symbolic Comp., 6:207–334, 1988.
[Wu86] W.-T. Wu. On zeros of algebraic equations — an application of
Ritt principle. Kexue Tongbao, 31:1–5, 1986.
[Yan98] T. Yan. The geobucket data structure for polynomials. J. Symbolic
Comp., 25:285–294, 1998.
[Yap91] C.K. Yap. A new lower bound construction for commutative Thue
systems with applications. J. Symbolic Comp., 12:1–27, 1991.
[Yun76] D.Y.Y. Yun. On Square-free Decomposition Algorithms. In R.D.
Jenks, editor, Proceedings SYMSAC 76, pages 26–35, 1976.
[Zar26] O. Zariski. Sull’impossibilità di risolvere parametricamente per rad-
icali un’equazione algebrica f(x, y) = 0 di genere p > 6 a moduli
generali. Atti Accad. Naz. Lincei Rend. Cl. Sc. Fis. Mat. Natur.
serie VI, 3:660–666, 1926.
[Zas69] H. Zassenhaus. On Hensel Factorization I. J. Number Theory,
1:291–311, 1969.
[Zec49] J. Zech. Tafeln des Additions- und Subtractions- Logarithmen.
Weidmann, 1849.
[Zim07] P. Zimmermann. We recommend students never to use simplify in-
side programs. Personal communication, 2007.
[Zip79a] R.E. Zippel. Probabilistic Algorithms for Sparse Polynomials. In
Proceedings EUROSAM 79, pages 216–226, 1979.
[Zip79b] R.E. Zippel. Probabilistic Algorithms for Sparse Polynomials. PhD
thesis, M.I.T. Dept. of EE&CS, 1979.
[Zip93] R.E. Zippel. Effective Polynomial Computation. Kluwer Academic
Publishers, 1993.
Index
Abel’s theorem, 80
Active functions (Maple), 36
Additive complexity, 54
Admissible orderings, 50
Algebra
Free, 42
Algebraic
closure, 31
curve, 83
decomposition, 140
block-cylindrical, 145
cylindrical, 143
sampled, 141
function, 237
integer, 237
number, 237
proposition, 137
variable, 131
Algorithm
Atlantic City, 35
Bareiss, 95
Buchberger, 102
Extended, 121
Cantor–Zassenhaus, 215
Chinese Remainder, 302
for Polynomials, 303
Multivariate, 304
Polynomial form, 302
Euclid’s, 62
Extended Euclid, 68
Extended Subresultant, 69
Faugère–Gianni–Lazard–Mora, 117
Hermite’s, 248
IntExp–Polynomial, 265
IntExp–Rational Expression, 266
IntLog–Polynomial, 261
IntLog–Rational Expression, 262
Las Vegas, 35
Monte Carlo, 34
Ostrogradski–Horowitz, 249
Parallel Risch, 272
Primitive p.r.s., 66
Sturm sequence evaluation, 86
Subresultant, 67
Trager–Rothstein, 251
Vandermonde solver, 305
variant, 306
Alternations
of quantifiers, 139
Arithmetic
Classical, 33
Ascending Chain Condition, 30
Associates, 30
Associativity, 29
Assumption
Zippel, 189
Axiom, 321
Bézout’s Identity, 69
Bad reduction, 170, 181, 200
Banded Matrix, 90
Bareiss algorithm, 95
Basis
completely reduced, 102
Gröbner, 101, 129
completely reduced, 102
shape, 125
Birational equivalence, 84
Block-cylindrical
decomposition, 145
Bound, 295
Cauchy, 298
Hadamard, 295
Knuth, 298
Mahler, 299
Branch Cut
Removable, 281
Buchberger
Algorithm, 102
Extended, 121
Criterion
First, 109
gcd, 109
lcm, 109
Third, 109
Theorem, 102
Budan–Fourier theorem, 87, 310
Calculus
Fundamental Theorem of, 246, 282
Candid representation, 22
Canonical representation, 21, 28
locally, 21
Cantor–Zassenhaus Algorithm, 215
Cauchy bound, 298
Cell, 140
Chain
Regular, 131
Chain Condition
Ascending, 30
Descending, 50
Characteristic, 31
set, 130
Chinese Remainder Theorem, 302
(Polynomial), 303
Circulant Matrix, 90
Classical
arithmetic, 33
Closure
Zariski, 133
Coefficient
leading, 43
of a polynomial, 41
Commutativity, 29
Complexity
additive, 54
poly-dense, 50
poly-semisparse, 50
poly-sparse, 44, 50
Comprehensive Gröbner Basis, 126
Comprehensive Gröbner System, 126,
154
Conjecture
Schanuel’s, 287
Constant
definition of, 246
implicit, 32
Constraint
equational, 154
Content (of polynomials), 63
Criterion
S, 101
First Buchberger, 109
gcd, 109
lcm, 109
Third Buchberger, 109
Cyclotomic
polynomial, 319
Cylinder, 143
Cylindrical algebraic decomposition, 143
Partial, 153
Decomposition
algebraic, 140
block-cylindrical, 145
cylindrical, 143
partial, 153
equiprojectable, 131
Lemma (exponential), 263
Lemma (logarithmic), 259
Lemma (rational expressions), 247
order-invariant, 142
Partial Fraction, 70
sign-invariant, 142
square-free, 78
Defining formula, 140
Degree
of polynomial, 43
Total, 49
Denominator, 59
common, 59
Dense
matrix, 90
polynomial, 43
Density Theorem
Frobenius, 211, 216
Descartes rule of signs, 87, 309
Descending Chain Condition, 50
Difference
field, 273
ring, 273
Differential
field, 245
ring, 245
Dimension
ideal, 104
linear space, 98
mixed, 105
triangular set, 131
Directed Acyclic Graph, 52
Discriminant, 294
Distributed representation, 50
Distributivity, 29
Division, 42
Dodgson–Bareiss theorem, 93
Domain
g.c.d., 61
integral, 30
Elementary
expression, 254
generator, 253
Elimination
Gauss, 91
fraction-free, 95
ideal, 117
ordering, 106
Equality
algebraic, 59
Equational constraint, 154
Equiprojectable
decomposition, 131
variety, 131
Equivalent, 99
Euclid’s
Algorithm, 62
Algorithm (Extended), 68, 197
Theorem, 62
Excel, 17
Existential theory of the reals, 152
Exponentiation
polynomial, 42
Expression
DAG representation, 52
tree representation, 52
Factorization
shape, 209
Farey
fractions, 196
reconstruction, 197
Farey Reconstruction, 197
Faugère–Gianni–Lazard–Mora
algorithm, 117
Fermat
Little Theorem of, 212
Field, 31
difference, 273
differential, 245
of fractions, 31
real closed, 136
Fill-in (loss of sparsity), 91
Formula
defining, 140
Free Algebra, 42
Frobenius
Lemma, 318
Frobenius Density Theorem, 211, 216
Function
active (Maple), 36
Algebraic, 237
Hilbert, 126
inert (Maple), 36
Möbius, 319
Fundamental
Theorem of Calculus, 246, 282
Galois
group, 210
Gauss
elimination, 91
Lemma, 63
Generalised
Polynomial, 263
Geobuckets, 47
Gianni–Kalkbrener
algorithm, 116
theorem, 115
Good reduction, 170, 181, 200
Graeffe method, 298
Greatest common divisor, 30, 61
domain, 61
Gröbner base, 101, 129
completely reduced, 102
Gröbner Basis
Comprehensive, 126
Gröbner fan, 121
Gröbner System
Comprehensive, 126, 154
Gröbner trace idea, 198
Gröbner walk, 121
Hadamard bound, 295
Hensel
Algorithm
General, 207
Lemma, 207
Lifting
Hybrid, 225
Linear, 218–220, 230
Multivariate, 232
Quadratic, 220–224
Hermite
Algorithm, 248
Hilbert
function, 126
polynomial, 126
History of Computer Algebra, 17
Ideal, 29
elimination, 117
polynomial, 99
Principal, 30
saturated, 134
Identity
Bézout’s, 69
Implicit
constants, 32
Inequality
Landau, 296
Landau–Mignotte, 168
Inert functions (Maple), 36
Initial, 130
Integer
Algebraic, 237
Integral domain, 30
Integration
indefinite, 246
Intermediate Expression Swell, 18, 163
Example, 198
IntExp–Polynomial
Algorithm, 265
IntExp–Rational Expression
Algorithm, 266
IntLog–Polynomial
Algorithm, 261
IntLog–Rational Expression
Algorithm, 262
Inverse
matrix, 91
Karatsuba Multiplication, 312
Knuth
bound, 298
Landau Inequality, 296
Landau notation, 32
Landau–Mignotte Inequality, 168
Laurent
Polynomial, 263
Least common multiple, 62
Lemma
Frobenius, 318
Hensel’s, 207
Thom’s, 88
Liouville’s Principle, 255
Parallel Version, 271
Locally canonical representation, 21
Logarithm
Zech, 241
Macsyma, 322
Mahler
bound, 299
measure, 291, 297
Main variable, 130
Maple, 324
Matrix
Banded, 90
Circulant, 90
inverse, 91
Sylvester, 292
Toeplitz, 90
measure
Mahler, 291, 297
Minimal Polynomial, 237
Möbius function, 319
Monic polynomial, 43
Monomial, 50
leading, 99
Multiplication
Karatsuba, 312
Multiplicity of a solution, 78
MuPAD, 328
Newton
series, 55
Noetherian ring, 30
Normal representation, 21
Normal Selection Strategy, 110
Number
Algebraic, 237
Numerator, 59
Ordering
admissible, 50
elimination, 106
matrix, 106
purely lexicographic, 105
total degree, then lexicographic, 105
total degree, then reverse lexicographic, 105
weighted, 106
Ostrogradski–Horowitz
Algorithm, 249
p-adic numbers, 207
p.r.s., 65
primitive, 66
subresultant, 67
Parallel Risch Algorithm, 272
Partial
Cylindrical algebraic decomposition, 153
Fraction Decomposition, 70
Polynomial
definition, 41
factored representation, 48
generalised, 263
height of, 291
Hilbert, 126
Laurent, 263
length of, 291
minimal, 237
remainder sequence, 65
signed, 66
time, 34
Wilkinson, 297
Positive
Formula, 155
Positivstellensatz, 153
Prenex normal form, 139
Primitive
(of polynomials), 63
p.r.s., 66
part, 63
Principal
ideal, 30
domain, 30
Principle
Liouville’s, 255
(Parallel Version), 271
Tarski–Seidenberg, 139
Problem
Elementary integration, 258
Elementary Risch differential equation, 258
Projection, 146
Proposition
algebraic, 137
semi-algebraic, 137
Pseudo-euclidean algorithm, 66
Pseudo-remainder, 66, 132
Quantifier
alternation, 139
elimination, 139
Quantifier-free, 138
Quotient (polynomial), 62
Quotient rule, 246
Radical
(i.e. n-th root), 80
(of an ideal), 103
Real, 138
Real closed fields, 136
existential theory of, 152
Recursive representation, 49
Reduce, 329
Reduction
bad, 170, 181, 200
good, 170, 181, 200
Reductum, 43
Regular
Chain, 131
Remainder (polynomial), 62
Removable
Branch Cut, 281
Singularity, 276
Representation
distributed (of polynomials), 50
expression DAG, 52
expression tree, 52
factored (of polynomials), 48
of objects, 20
candid, 22
canonical, 21, 28
locally canonical, 21
normal, 21
recursive (of polynomials), 49
Straight-Line Program, 52
Resultant, 291–294
modular calculation, 195
Ring, 29
difference, 273
differential, 245
Risch
differential equation problem, 258
induction hypothesis, 259
Integration Theorem, 258
Parallel Algorithm, 272
RootOf, 36, 81
Rule
quotient, 246
Saturated ideal, 134
Schanuel’s Conjecture, 287
Series
Newton, 55
Set
triangular, 130
Shape
factorization, 209
Shape basis, 125
sign
variations, 309
Signed polynomial remainder sequence, 66
Singularity
Removable, 276
Sparse
bit size, 44
matrix, 90
polynomial, 43
Sparsity
of a polynomial, 46
S-polynomial, 100, 128
Square-free, 73
decomposition, 73
Stability
numerical, 26
Straight-Line Program Representation, 52
Strassen’s Theorem, 317
Strategy
Normal Selection, 110
Sugar, 111
Sturm
–Habicht sequence, 85
sequence, 85
Subresultant algorithm, 67, 69
Sugar, 110
Sugar Selection Strategy, 111
Support
of a polynomial, 46
Sylvester matrix, 292
System
quasi-algebraic, 132
Systems
Axiom, 321
Macsyma, 322
Maple, 324
MuPAD, 328
Reduce, 329
Tarski
language, 137
Tarski–Seidenberg principle, 139
Term, 50
leading, 99
Substitution, Virtual, 155
Theorem
Buchberger, 102
Chinese Remainder, 302
Chinese Remainder (Polynomial),
303
Risch Integration, 258
Thom’s Lemma, 88
Toeplitz Matrix, 90
Total degree, 49
Trager–Rothstein
Algorithm, 251
Transcendental, 18
Strongly, 287
Triangular set, 130
Tschirnhaus transformation, 78
Unique factorisation domain, 61
Unit, 30
Vandermonde systems, 304
Variable
algebraic, 131
main, 130
Well-oriented, 148
Wilkinson Polynomial, 297
Zariski closure, 133
Zech logarithm, 241
Zero-divisors, 30
Zippel
Assumption, 189
The total-degree Gröbner base from page 119.
\[\begin{aligned}
\{\,&42c^4+555+293a^3+1153a^2b-3054ab^2+323b^3+2035a^2c+1642abc-1211b^2c\\
&\quad+2253ac^2-1252bc^2-31c^3+1347a^2-3495ab+1544b^2-1100ac+2574bc-368c^2\\
&\quad+2849a+281b+4799c,\\
&21bc^3-57-10a^3-41a^2b+183ab^2-25b^3-104a^2c-65abc+91b^2c-129ac^2+59bc^2\\
&\quad-16c^3-33a^2+225ab-142b^2+67ac-174bc-5c^2-154a-46b-310c,\\
&21ac^3-75-29a^3-121a^2b+369ab^2-41b^3-226a^2c-178abc+161b^2c-267ac^2\\
&\quad+148bc^2+4c^3-123a^2+411ab-206b^2+146ac-324bc+38c^2-329a-62b-584c,\\
&14b^2c^2+5+a^3+9a^2b+2ab^2-b^3+23a^2c+10abc-21b^2c+15ac^2-8bc^2+3c^3\\
&\quad-3a^2-19ab-4b^2-6ac-26bc-10c^2+7a-b+3c,\\
&21abc^2+30-a^3-2a^2b-51ab^2+8b^3+40a^2c+25abc-56b^2c+69ac^2-13bc^2\\
&\quad+11c^3-18a^2-72ab+53b^2-8ac+54bc+10c^2+56a+29b+116c,\\
&14a^2c^2+11+5a^3+31a^2b-74ab^2+9b^3+45a^2c+22abc-35b^2c+61ac^2-40bc^2\\
&\quad+c^3+13a^2-95ab+50b^2-44ac+66bc+6c^2+63a+23b+127c,\\
&21b^3c-6-4a^3+13a^2b+48ab^2-10b^3-8a^2c-5abc-14b^2c+3ac^2-10bc^2+2c^3\\
&\quad-30a^2+27ab-40b^2+10ac-57bc-2c^2-7a-10b-40c,\\
&6ab^2c-3-a^3-5a^2b+12ab^2-b^3-11a^2c-2abc+7b^2c-9ac^2+2bc^2-c^3\\
&\quad-3a^2+15ab-4b^2+4ac-6bc-2c^2-13a-b-19c,\\
&14a^2bc-13+3a^3+13a^2b+6ab^2-3b^3-a^2c+2abc+21b^2c-25ac^2+4bc^2-5c^3\\
&\quad+19a^2+27ab-26b^2+10ac-8bc-2c^2-7a-17b-33c,\\
&7a^3c+3-5a^3-24a^2b+25ab^2-2b^3-17a^2c-8abc+7b^2c-19ac^2+12bc^2-c^3\\
&\quad-6a^2+32ab-8b^2+23ac-17bc-6c^2-21a-2b-36c,\\
&42b^4-3-121a^3-557a^2b+570ab^2+275b^3-515a^2c-608abc-77b^2c-555ac^2\\
&\quad+2bc^2-55c^3-645a^2+633ab-160b^2+82ac-690bc-302c^2-679a+65b-1147c,\\
&42ab^3+15-11a^3-85a^2b+6ab^2+25b^3-43a^2c-40abc-49b^2c-39ac^2+4bc^2\\
&\quad-5c^3-51a^2-15ab+58b^2-4ac+6bc-16c^2-35a+25b-5c,\\
&21a^2b^2+3+2a^3+25a^2b-45ab^2+5b^3+25a^2c-29abc-14b^2c+9ac^2-16bc^2\\
&\quad-c^3-6a^2-45ab+20b^2-47ac+18bc+c^2+14a+5b+41c,\\
&21a^3b+18-16a^3-74a^2b+24ab^2+2b^3-53a^2c+abc-35b^2c+12ac^2+23bc^2\\
&\quad+8c^3-36a^2+24ab+29b^2+40ac+3bc-8c^2-28a+23b-13c,\\
&42a^4-57+431a^3+757a^2b-804ab^2+59b^3+799a^2c-2abc-119b^2c+417ac^2\\
&\quad-340bc^2+5c^3+303a^2-1203ab+194b^2-752ac+246bc+184c^2+581a-67b+1013c\,\}.
\end{aligned}\]
The equivalent lexicographic base.
\[\begin{aligned}
\{\,&1-6c-41c^2+71c^3+106c^4+92c^5+197c^6+145c^7+257c^8+278c^9+201c^{10}\\
&\quad+278c^{11}+257c^{12}+145c^{13}+197c^{14}+92c^{15}+106c^{16}+71c^{17}-41c^{18}-6c^{19}+c^{20},\\
&9741532+39671190c-96977172c^2-140671876c^3-131742078c^4-253979379c^5\\
&\quad-204237390c^6-337505020c^7-356354619c^8-271667666c^9-358660781c^{10}\\
&\quad-323810244c^{11}-193381378c^{12}-244307826c^{13}-120781728c^{14}-131861787c^{15}\\
&\quad-79900378c^{16}+49142070c^{17}+7007106c^{18}-1184459c^{19}+1645371b,\\
&-377534+705159a+4406102c+17206178c^2+21640797c^3+26686318c^4\\
&\quad+34893715c^5+37340389c^6+47961810c^7+46227230c^8+42839310c^9\\
&\quad+46349985c^{10}+37678046c^{11}+28185846c^{12}+26536060c^{13}+16292173c^{14}\\
&\quad+13243117c^{15}+3830961c^{16}-4114333c^{17}-487915c^{18}+91729c^{19}\,\}.
\end{aligned}\]
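For readers who wish to experiment with such computations, here is a minimal Maple sketch (assuming the Groebner package's Basis command with its tdeg and plex orderings; the three-polynomial system F below is a small hypothetical stand-in, not the system from page 119):

    with(Groebner):
    F := [a^2 + b*c - 1, a*b - c^2, b^2 - a*c]:   # hypothetical test system
    Basis(F, tdeg(a, b, c));   # total-degree Groebner base
    Basis(F, plex(a, b, c));   # lexicographic Groebner base

As the two bases above illustrate, a lexicographic base of the same ideal typically has far larger coefficients and degrees than the total-degree one; this is why the Faugère–Gianni–Lazard–Mora (FGLM) change-of-ordering algorithm computes it from the total-degree base rather than directly.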