
Review of Linear Algebra
April 1, 2021

1 Basic Concepts and Notation
2 Matrix Multiplication
3 Operations and Properties
4 Matrix Calculus

Basic Concepts and Notation

Basic Notation
– By $x \in \mathbb{R}^n$, we denote a vector with $n$ entries,
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$
– By $A \in \mathbb{R}^{m \times n}$ we denote a matrix with $m$ rows and $n$ columns, where the entries of $A$ are real numbers,
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \\ \text{---}\; a_2^T\; \text{---} \\ \vdots \\ \text{---}\; a_m^T\; \text{---} \end{bmatrix}.$$

The Identity Matrix
The identity matrix, denoted $I \in \mathbb{R}^{n \times n}$, is a square matrix with ones on the diagonal and zeros everywhere else. That is,
$$I_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}$$
It has the property that for all $A \in \mathbb{R}^{n \times n}$,
$$AI = A = IA.$$

Diagonal matrices
A diagonal matrix is a matrix where all non-diagonal elements are 0. This is typically denoted $D = \mathrm{diag}(d_1, d_2, \ldots, d_n)$, with
$$D_{ij} = \begin{cases} d_i & i = j \\ 0 & i \neq j \end{cases}$$
Clearly, $I = \mathrm{diag}(1, 1, \ldots, 1)$.
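These definitions are easy to check numerically. A minimal sketch (we use NumPy for such checks throughout; the library choice is ours for illustration, not something the slides specify):

```python
import numpy as np

I = np.eye(3)                    # 3x3 identity matrix
D = np.diag([1.0, 2.0, 3.0])     # diagonal matrix diag(1, 2, 3)
A = np.random.randn(3, 3)        # an arbitrary square matrix

assert np.allclose(A @ I, A)     # AI = A
assert np.allclose(I @ A, A)     # IA = A
assert np.allclose(np.diag([1.0, 1.0, 1.0]), I)  # I = diag(1, 1, 1)
```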

Matrix Multiplication

Vector-Vector Product
– inner product or dot product:
$$x^T y \in \mathbb{R} = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \sum_{i=1}^{n} x_i y_i.$$
– outer product (for $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$):
$$xy^T \in \mathbb{R}^{m \times n} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}.$$
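A quick numerical check of both products (an illustrative NumPy sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y                 # x^T y: 1*4 + 2*5 + 3*6 = 32, a scalar
outer = np.outer(x, y)        # x y^T: 3x3 matrix with (i, j) entry x_i * y_j

assert np.isclose(inner, np.sum(x * y))
assert outer.shape == (3, 3)
assert np.isclose(outer[1, 2], x[1] * y[2])
```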

Matrix-Vector Product
– If we write $A$ by rows, then we can express $Ax$ as
$$y = Ax = \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \\ \text{---}\; a_2^T\; \text{---} \\ \vdots \\ \text{---}\; a_m^T\; \text{---} \end{bmatrix} x = \begin{bmatrix} a_1^T x \\ a_2^T x \\ \vdots \\ a_m^T x \end{bmatrix}.$$
– If we write $A$ by columns, then we have
$$y = Ax = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n. \tag{1}$$
$y$ is a linear combination of the columns of $A$.

Matrix-Vector Product
It is also possible to multiply on the left by a row vector.
– If we write $A$ by columns, then we can express $x^T A$ as
$$y^T = x^T A = x^T \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} x^T a_1 & x^T a_2 & \cdots & x^T a_n \end{bmatrix}.$$
– Expressing $A$ in terms of rows we have
$$y^T = x^T A = \begin{bmatrix} x_1 & x_2 & \cdots & x_m \end{bmatrix} \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \\ \text{---}\; a_2^T\; \text{---} \\ \vdots \\ \text{---}\; a_m^T\; \text{---} \end{bmatrix} = x_1 \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \end{bmatrix} + x_2 \begin{bmatrix} \text{---}\; a_2^T\; \text{---} \end{bmatrix} + \cdots + x_m \begin{bmatrix} \text{---}\; a_m^T\; \text{---} \end{bmatrix}$$
$y^T$ is a linear combination of the rows of $A$.
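Both views of the matrix-vector product can be checked directly (illustrative NumPy sketch):

```python
import numpy as np

A = np.random.randn(4, 3)
x = np.random.randn(3)

# Row view: entry i of y is the inner product a_i^T x.
y_rows = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column view: y is a linear combination of the columns of A.
y_cols = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(A @ x, y_rows)
assert np.allclose(A @ x, y_cols)
```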

Matrix-Matrix Multiplication (different views)
1. As a set of vector-vector products:
$$C = AB = \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \\ \text{---}\; a_2^T\; \text{---} \\ \vdots \\ \text{---}\; a_m^T\; \text{---} \end{bmatrix} \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} a_1^T b_1 & a_1^T b_2 & \cdots & a_1^T b_p \\ a_2^T b_1 & a_2^T b_2 & \cdots & a_2^T b_p \\ \vdots & \vdots & \ddots & \vdots \\ a_m^T b_1 & a_m^T b_2 & \cdots & a_m^T b_p \end{bmatrix}.$$

Matrix-Matrix Multiplication (different views)
2. As a sum of outer products:
$$C = AB = \begin{bmatrix} | & | & & | \\ a_1 & a_2 & \cdots & a_n \\ | & | & & | \end{bmatrix} \begin{bmatrix} \text{---}\; b_1^T\; \text{---} \\ \text{---}\; b_2^T\; \text{---} \\ \vdots \\ \text{---}\; b_n^T\; \text{---} \end{bmatrix} = \sum_{i=1}^{n} a_i b_i^T.$$

Matrix-Matrix Multiplication (different views)
3. As a set of matrix-vector products:
$$C = AB = A \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_p \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ Ab_1 & Ab_2 & \cdots & Ab_p \\ | & | & & | \end{bmatrix}. \tag{2}$$
Here the $i$th column of $C$ is given by the matrix-vector product with the vector on the right, $c_i = Ab_i$. These matrix-vector products can in turn be interpreted using both viewpoints given in the previous subsection.

Matrix-Matrix Multiplication (different views)
4. As a set of vector-matrix products:
$$C = AB = \begin{bmatrix} \text{---}\; a_1^T\; \text{---} \\ \text{---}\; a_2^T\; \text{---} \\ \vdots \\ \text{---}\; a_m^T\; \text{---} \end{bmatrix} B = \begin{bmatrix} \text{---}\; a_1^T B\; \text{---} \\ \text{---}\; a_2^T B\; \text{---} \\ \vdots \\ \text{---}\; a_m^T B\; \text{---} \end{bmatrix}.$$
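All four views compute the same product; a short NumPy sketch (our illustrative choice) confirming this:

```python
import numpy as np

m, n, p = 4, 3, 5
A = np.random.randn(m, n)
B = np.random.randn(n, p)
C = A @ B

# View 1: C_ij is the inner product of row i of A with column j of B.
C1 = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])
# View 2: sum of outer products of columns of A with rows of B.
C2 = sum(np.outer(A[:, i], B[i, :]) for i in range(n))
# View 3: the columns of C are A times the columns of B.
C3 = np.column_stack([A @ B[:, j] for j in range(p)])
# View 4: the rows of C are the rows of A times B.
C4 = np.vstack([A[i, :] @ B for i in range(m)])

for Ck in (C1, C2, C3, C4):
    assert np.allclose(C, Ck)
```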

Matrix-Matrix Multiplication (properties)
– Associative: $(AB)C = A(BC)$.
– Distributive: $A(B + C) = AB + AC$.
– In general, not commutative; that is, it can be the case that $AB \neq BA$. (For example, if $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times q}$, the matrix product $BA$ does not even exist if $m$ and $q$ are not equal!)


Operations and Properties

The Transpose
The transpose of a matrix results from "flipping" the rows and columns. Given a matrix $A \in \mathbb{R}^{m \times n}$, its transpose, written $A^T \in \mathbb{R}^{n \times m}$, is the $n \times m$ matrix whose entries are given by
$$(A^T)_{ij} = A_{ji}.$$
The following properties of transposes are easily verified:
– $(A^T)^T = A$
– $(AB)^T = B^T A^T$
– $(A + B)^T = A^T + B^T$

Trace
The trace of a square matrix $A \in \mathbb{R}^{n \times n}$, denoted $\mathrm{tr}A$, is the sum of the diagonal elements of the matrix:
$$\mathrm{tr}A = \sum_{i=1}^{n} A_{ii}.$$
The trace has the following properties:
– For $A \in \mathbb{R}^{n \times n}$, $\mathrm{tr}A = \mathrm{tr}A^T$.
– For $A, B \in \mathbb{R}^{n \times n}$, $\mathrm{tr}(A + B) = \mathrm{tr}A + \mathrm{tr}B$.
– For $A \in \mathbb{R}^{n \times n}$, $t \in \mathbb{R}$, $\mathrm{tr}(tA) = t\,\mathrm{tr}A$.
– For $A$, $B$ such that $AB$ is square, $\mathrm{tr}AB = \mathrm{tr}BA$.
– For $A$, $B$, $C$ such that $ABC$ is square, $\mathrm{tr}ABC = \mathrm{tr}BCA = \mathrm{tr}CAB$, and so on for the product of more matrices.
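A quick check of the trace identities, including the cyclic property (illustrative NumPy sketch):

```python
import numpy as np

A, B, C = (np.random.randn(4, 4) for _ in range(3))

assert np.isclose(np.trace(A), np.trace(A.T))                  # trA = trA^T
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B))
assert np.isclose(np.trace(3.0 * A), 3.0 * np.trace(A))
assert np.isclose(np.trace(A @ B), np.trace(B @ A))            # trAB = trBA

t = np.trace(A @ B @ C)                                        # cyclic shifts
assert np.isclose(t, np.trace(B @ C @ A))
assert np.isclose(t, np.trace(C @ A @ B))
```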

Norms
A norm of a vector, $\|x\|$, is informally a measure of the "length" of the vector.
More formally, a norm is any function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies four properties:
1. For all $x \in \mathbb{R}^n$, $f(x) \ge 0$ (non-negativity).
2. $f(x) = 0$ if and only if $x = 0$ (definiteness).
3. For all $x \in \mathbb{R}^n$, $t \in \mathbb{R}$, $f(tx) = |t| f(x)$ (homogeneity).
4. For all $x, y \in \mathbb{R}^n$, $f(x + y) \le f(x) + f(y)$ (triangle inequality).

Examples of Norms
The commonly-used Euclidean or $\ell_2$ norm,
$$\|x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2}.$$
The $\ell_1$ norm,
$$\|x\|_1 = \sum_{i=1}^{n} |x_i|.$$
The $\ell_\infty$ norm,
$$\|x\|_\infty = \max_i |x_i|.$$
In fact, all three norms presented so far are examples of the family of $\ell_p$ norms, which are parameterized by a real number $p \ge 1$, and defined as
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$

Matrix Norms
Norms can also be defined for matrices, such as the Frobenius norm,
$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2} = \sqrt{\mathrm{tr}(A^T A)}.$$
Many other norms exist, but they are beyond the scope of this review.
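These norms map directly onto np.linalg.norm; a small sketch for illustration:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])

assert np.isclose(np.linalg.norm(x, 2), np.sqrt(np.sum(x**2)))    # l2 norm: 5
assert np.isclose(np.linalg.norm(x, 1), np.sum(np.abs(x)))        # l1 norm: 7
assert np.isclose(np.linalg.norm(x, np.inf), np.max(np.abs(x)))   # linf norm: 4

A = np.random.randn(3, 4)
assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.sqrt(np.trace(A.T @ A)))                     # Frobenius
```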

Linear Independence
A set of vectors $\{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^m$ is said to be (linearly) dependent if one vector belonging to the set can be represented as a linear combination of the remaining vectors; that is, if
$$x_n = \sum_{i=1}^{n-1} \alpha_i x_i$$
for some scalar values $\alpha_1, \ldots, \alpha_{n-1} \in \mathbb{R}$; otherwise, the vectors are (linearly) independent.
Example:
$$x_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \quad x_2 = \begin{bmatrix} 4 \\ 1 \\ 5 \end{bmatrix} \quad x_3 = \begin{bmatrix} 2 \\ -3 \\ -1 \end{bmatrix}$$
are linearly dependent because $x_3 = -2x_1 + x_2$.
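We can confirm the dependence numerically: stacking the three vectors as columns gives a matrix of rank 2 rather than 3 (illustrative NumPy sketch):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([4.0, 1.0, 5.0])
x3 = np.array([2.0, -3.0, -1.0])

assert np.allclose(x3, -2 * x1 + x2)      # the dependence relation above

X = np.column_stack([x1, x2, x3])
assert np.linalg.matrix_rank(X) == 2      # only 2 linearly independent columns
```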

Rank of a Matrix
– The column rank of a matrix $A \in \mathbb{R}^{m \times n}$ is the size of the largest subset of columns of $A$ that constitute a linearly independent set.
– The row rank is the largest number of rows of $A$ that constitute a linearly independent set.
– For any matrix $A \in \mathbb{R}^{m \times n}$, it turns out that the column rank of $A$ is equal to the row rank of $A$ (prove it yourself!), and so both quantities are referred to collectively as the rank of $A$, denoted as $\mathrm{rank}(A)$.

Properties of the Rank
– For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) \le \min(m, n)$. If $\mathrm{rank}(A) = \min(m, n)$, then $A$ is said to be full rank.
– For $A \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A) = \mathrm{rank}(A^T)$.
– For $A \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times p}$, $\mathrm{rank}(AB) \le \min(\mathrm{rank}(A), \mathrm{rank}(B))$.
– For $A, B \in \mathbb{R}^{m \times n}$, $\mathrm{rank}(A + B) \le \mathrm{rank}(A) + \mathrm{rank}(B)$.

The Inverse of a Square Matrix
– The inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is denoted $A^{-1}$, and is the unique matrix such that
$$A^{-1}A = I = AA^{-1}.$$
– We say that $A$ is invertible or non-singular if $A^{-1}$ exists, and non-invertible or singular otherwise.
– In order for a square matrix $A$ to have an inverse $A^{-1}$, $A$ must be full rank.
– Properties (assuming $A, B \in \mathbb{R}^{n \times n}$ are non-singular):
  – $(A^{-1})^{-1} = A$
  – $(AB)^{-1} = B^{-1}A^{-1}$
  – $(A^{-1})^T = (A^T)^{-1}$. For this reason this matrix is often denoted $A^{-T}$.
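A small numerical check of these properties (illustrative NumPy sketch; a random square matrix is non-singular with probability 1):

```python
import numpy as np

A = np.random.randn(3, 3)
B = np.random.randn(3, 3)
Ainv = np.linalg.inv(A)

assert np.allclose(Ainv @ A, np.eye(3))                       # A^{-1} A = I
assert np.allclose(A @ Ainv, np.eye(3))                       # A A^{-1} = I
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))       # (AB)^{-1}
assert np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))    # A^{-T}
```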

Orthogonal Matrices
– Two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^T y = 0$.
– A vector $x \in \mathbb{R}^n$ is normalized if $\|x\|_2 = 1$.
– A square matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal if all its columns are orthogonal to each other and are normalized (the columns are then referred to as being orthonormal).
– Properties:
  – The inverse of an orthogonal matrix is its transpose:
$$U^T U = I = UU^T.$$
  – Operating on a vector with an orthogonal matrix will not change its Euclidean norm, i.e., $\|Ux\|_2 = \|x\|_2$ for any $x \in \mathbb{R}^n$ and orthogonal $U \in \mathbb{R}^{n \times n}$.
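One convenient way to obtain an orthogonal matrix for experiments is the Q factor of a QR factorization (our choice here, not something the slides prescribe); a sketch of both properties:

```python
import numpy as np

Q, _ = np.linalg.qr(np.random.randn(4, 4))   # Q is orthogonal

assert np.allclose(Q.T @ Q, np.eye(4))       # U^T U = I
assert np.allclose(Q @ Q.T, np.eye(4))       # U U^T = I

x = np.random.randn(4)
assert np.isclose(np.linalg.norm(Q @ x),
                  np.linalg.norm(x))         # Euclidean norm preserved
```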

Span and Projection
– The span of a set of vectors $\{x_1, x_2, \ldots, x_n\}$ is the set of all vectors that can be expressed as a linear combination of $\{x_1, \ldots, x_n\}$. That is,
$$\mathrm{span}(\{x_1, \ldots, x_n\}) = \left\{ v : v = \sum_{i=1}^{n} \alpha_i x_i, \ \alpha_i \in \mathbb{R} \right\}.$$
– The projection of a vector $y \in \mathbb{R}^m$ onto the span of $\{x_1, \ldots, x_n\}$ is the vector $v \in \mathrm{span}(\{x_1, \ldots, x_n\})$ such that $v$ is as close as possible to $y$, as measured by the Euclidean norm $\|v - y\|_2$:
$$\mathrm{Proj}(y; \{x_1, \ldots, x_n\}) = \mathrm{argmin}_{v \in \mathrm{span}(\{x_1, \ldots, x_n\})} \|y - v\|_2.$$

Range
– The range or the columnspace of a matrix $A \in \mathbb{R}^{m \times n}$, denoted $\mathcal{R}(A)$, is the span of the columns of $A$. In other words,
$$\mathcal{R}(A) = \{ v \in \mathbb{R}^m : v = Ax, \ x \in \mathbb{R}^n \}.$$
– Assuming $A$ is full rank and $n < m$, the projection of a vector $y \in \mathbb{R}^m$ onto the range of $A$ is given by
$$\mathrm{Proj}(y; A) = \mathrm{argmin}_{v \in \mathcal{R}(A)} \|v - y\|_2.$$

Null space
The nullspace of a matrix $A \in \mathbb{R}^{m \times n}$, denoted $\mathcal{N}(A)$, is the set of all vectors that equal 0 when multiplied by $A$, i.e.,
$$\mathcal{N}(A) = \{ x \in \mathbb{R}^n : Ax = 0 \}.$$

The Determinant
The determinant of a square matrix $A \in \mathbb{R}^{n \times n}$ is a function $\det : \mathbb{R}^{n \times n} \to \mathbb{R}$, and is denoted $|A|$ or $\det A$. Given a matrix with rows $a_1^T, \ldots, a_n^T$, consider the set of points $S \subset \mathbb{R}^n$ formed as follows:
$$S = \left\{ v \in \mathbb{R}^n : v = \sum_{i=1}^{n} \alpha_i a_i \ \text{where} \ 0 \le \alpha_i \le 1, \ i = 1, \ldots, n \right\}.$$
The absolute value of the determinant of $A$, it turns out, is a measure of the "volume" of the set $S$.

The Determinant: intuition
For example, consider the $2 \times 2$ matrix
$$A = \begin{bmatrix} 1 & 3 \\ 3 & 2 \end{bmatrix}. \tag{3}$$
Here, the rows of the matrix are
$$a_1 = \begin{bmatrix} 1 & 3 \end{bmatrix} \quad a_2 = \begin{bmatrix} 3 & 2 \end{bmatrix}.$$

The determinant
Algebraically, the determinant satisfies the following three properties:
1. The determinant of the identity is 1, $|I| = 1$. (Geometrically, the volume of a unit hypercube is 1.)
2. Given a matrix $A \in \mathbb{R}^{n \times n}$, if we multiply a single row in $A$ by a scalar $t \in \mathbb{R}$, then the determinant of the new matrix is $t|A|$. (Geometrically, multiplying one of the sides of the set $S$ by a factor $t$ causes the volume to increase by a factor $t$.)
3. If we exchange any two rows $a_i^T$ and $a_j^T$ of $A$, then the determinant of the new matrix is $-|A|$.
In case you are wondering, it is not immediately obvious that a function satisfying the above three properties exists. In fact, though, such a function does exist, and is unique (which we will not prove here).

The Determinant: Properties
– For $A \in \mathbb{R}^{n \times n}$, $|A| = |A^T|$.
– For $A, B \in \mathbb{R}^{n \times n}$, $|AB| = |A||B|$.
– For $A \in \mathbb{R}^{n \times n}$, $|A| = 0$ if and only if $A$ is singular (i.e., non-invertible). (If $A$ is singular then it does not have full rank, and hence its columns are linearly dependent. In this case, the set $S$ corresponds to a "flat sheet" within the $n$-dimensional space and hence has zero volume.)
– For $A \in \mathbb{R}^{n \times n}$ and $A$ non-singular, $|A^{-1}| = 1/|A|$.

The determinant: formula
Let $A \in \mathbb{R}^{n \times n}$, and let $A_{\setminus i, \setminus j} \in \mathbb{R}^{(n-1) \times (n-1)}$ be the matrix that results from deleting the $i$th row and $j$th column from $A$. The general (recursive) formula for the determinant is
$$|A| = \sum_{i=1}^{n} (-1)^{i+j} a_{ij} |A_{\setminus i, \setminus j}| \quad \text{(for any } j \in 1, \ldots, n\text{)}$$
$$\phantom{|A|} = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |A_{\setminus i, \setminus j}| \quad \text{(for any } i \in 1, \ldots, n\text{)}$$
with the initial case that $|A| = a_{11}$ for $A \in \mathbb{R}^{1 \times 1}$. If we were to expand this formula completely for $A \in \mathbb{R}^{n \times n}$, there would be a total of $n!$ ($n$ factorial) different terms. For this reason, we hardly ever explicitly write the complete equation of the determinant for matrices bigger than $3 \times 3$.

The determinant: examples
However, the equations for determinants of matrices up to size $3 \times 3$ are fairly common, and it is good to know them:
$$|[a_{11}]| = a_{11}$$
$$\left| \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \right| = a_{11}a_{22} - a_{12}a_{21}$$
$$\left| \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \right| = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}$$

Quadratic Forms
Given a square matrix $A \in \mathbb{R}^{n \times n}$ and a vector $x \in \mathbb{R}^n$, the scalar value $x^T Ax$ is called a quadratic form. Written explicitly, we see that
$$x^T Ax = \sum_{i=1}^{n} x_i (Ax)_i = \sum_{i=1}^{n} x_i \left( \sum_{j=1}^{n} A_{ij} x_j \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.$$
We often implicitly assume that the matrices appearing in a quadratic form are symmetric, since
$$x^T Ax = (x^T Ax)^T = x^T A^T x = x^T \left( \tfrac{1}{2}A + \tfrac{1}{2}A^T \right) x,$$
so only the symmetric part of $A$ contributes.

Positive Semidefinite Matrices
A symmetric matrix $A \in \mathbb{S}^n$ is:
– positive definite (PD), denoted $A \succ 0$, if for all non-zero vectors $x \in \mathbb{R}^n$, $x^T Ax > 0$.
– positive semidefinite (PSD), denoted $A \succeq 0$, if for all vectors $x \in \mathbb{R}^n$, $x^T Ax \ge 0$.
– negative definite (ND), denoted $A \prec 0$, if for all non-zero $x \in \mathbb{R}^n$, $x^T Ax < 0$.
– negative semidefinite (NSD), denoted $A \preceq 0$, if for all $x \in \mathbb{R}^n$, $x^T Ax \le 0$.
– indefinite, if it is neither positive semidefinite nor negative semidefinite, i.e., if there exist $x_1, x_2 \in \mathbb{R}^n$ such that $x_1^T A x_1 > 0$ and $x_2^T A x_2 < 0$.

Positive Semidefinite Matrices
– One important property of positive definite and negative definite matrices is that they are always full rank, and hence, invertible.
– Given any matrix $A \in \mathbb{R}^{m \times n}$ (not necessarily symmetric or even square), the matrix $G = A^T A$ (sometimes called a Gram matrix) is always positive semidefinite. Further, if $m \ge n$ and $A$ is full rank, then $G = A^T A$ is positive definite.

Eigenvalues and Eigenvectors
Given a square matrix $A \in \mathbb{R}^{n \times n}$, we say that $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ and $x \in \mathbb{C}^n$ is the corresponding eigenvector if
$$Ax = \lambda x, \quad x \neq 0.$$
Intuitively, this definition means that multiplying $A$ by the vector $x$ results in a new vector that points in the same direction as $x$, but scaled by a factor $\lambda$.

Eigenvalues and Eigenvectors
We can rewrite the equation above to state that $(\lambda, x)$ is an eigenvalue-eigenvector pair of $A$ if
$$(\lambda I - A)x = 0, \quad x \neq 0.$$
But $(\lambda I - A)x = 0$ has a non-zero solution $x$ if and only if $(\lambda I - A)$ has a non-empty nullspace, which is only the case if $(\lambda I - A)$ is singular, i.e.,
$$|(\lambda I - A)| = 0.$$
We can now use the previous definition of the determinant to expand this expression $|(\lambda I - A)|$ into a (very large) polynomial in $\lambda$, which will have degree $n$. It is often called the characteristic polynomial of the matrix $A$.

Properties of eigenvalues and eigenvectors
– The trace of $A$ is equal to the sum of its eigenvalues,
$$\mathrm{tr}A = \sum_{i=1}^{n} \lambda_i.$$
– The determinant of $A$ is equal to the product of its eigenvalues,
$$|A| = \prod_{i=1}^{n} \lambda_i.$$
– The rank of $A$ is equal to the number of non-zero eigenvalues of $A$.
– Suppose $A$ is non-singular with eigenvalue $\lambda$ and an associated eigenvector $x$. Then $1/\lambda$ is an eigenvalue of $A^{-1}$ with the associated eigenvector $x$, i.e., $A^{-1}x = (1/\lambda)x$.
– The eigenvalues of a diagonal matrix $D = \mathrm{diag}(d_1, \ldots, d_n)$ are just the diagonal entries $d_1, \ldots, d_n$.

Eigenvalues and Eigenvectors of Symmetric Matrices
Throughout this section, let's assume that $A$ is a symmetric real matrix (i.e., $A^T = A$). We have the following properties:
1. All eigenvalues of $A$ are real numbers. We denote them by $\lambda_1, \ldots, \lambda_n$.
2. There exists a set of eigenvectors $u_1, \ldots, u_n$ such that (i) for all $i$, $u_i$ is an eigenvector with eigenvalue $\lambda_i$, and (ii) $u_1, \ldots, u_n$ are unit vectors and orthogonal to each other.
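A numerical check of the trace and determinant identities from the properties slide above (illustrative NumPy sketch; eigenvalues of a general real matrix may be complex, which np.isclose handles):

```python
import numpy as np

A = np.random.randn(4, 4)
lam = np.linalg.eigvals(A)           # eigenvalues, possibly complex

assert np.isclose(np.sum(lam), np.trace(A))         # trA = sum of eigenvalues
assert np.isclose(np.prod(lam), np.linalg.det(A))   # |A| = product of eigenvalues

D = np.diag([1.0, 2.0, 3.0, 4.0])
assert np.allclose(np.sort(np.linalg.eigvals(D)),
                   [1.0, 2.0, 3.0, 4.0])            # diagonal: eigenvalues = d_i
```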

New Representation for Symmetric Matrices
– Let $U$ be the orthonormal matrix that contains the $u_i$'s as columns:
$$U = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \cdots & u_n \\ | & | & & | \end{bmatrix}$$
– Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ be the diagonal matrix that contains $\lambda_1, \ldots, \lambda_n$.
– We can verify that
$$AU = \begin{bmatrix} | & | & & | \\ Au_1 & Au_2 & \cdots & Au_n \\ | & | & & | \end{bmatrix} = \begin{bmatrix} | & | & & | \\ \lambda_1 u_1 & \lambda_2 u_2 & \cdots & \lambda_n u_n \\ | & | & & | \end{bmatrix} = U \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = U\Lambda$$
– Recalling that an orthonormal matrix $U$ satisfies $UU^T = I$, we can diagonalize the matrix $A$:
$$A = AUU^T = U\Lambda U^T \tag{4}$$
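For symmetric matrices, NumPy's eigh routine returns exactly this decomposition; a minimal sketch verifying $AU = U\Lambda$ and equation (4):

```python
import numpy as np

B = np.random.randn(4, 4)
A = (B + B.T) / 2                    # symmetrize to get a symmetric A

lam, U = np.linalg.eigh(A)           # eigenvalues and orthonormal eigenvectors
Lam = np.diag(lam)

assert np.allclose(U.T @ U, np.eye(4))     # columns of U are orthonormal
assert np.allclose(A @ U, U @ Lam)         # AU = U Lambda
assert np.allclose(A, U @ Lam @ U.T)       # A = U Lambda U^T (equation (4))
```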

Background: representing vector w.r.t. another basis.
– Any orthonormal matrix $U = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}$ defines a new basis of $\mathbb{R}^n$.
– Any vector $x \in \mathbb{R}^n$ can be represented as a linear combination of $u_1, \ldots, u_n$ with coefficients $\hat{x}_1, \ldots, \hat{x}_n$:
$$x = \hat{x}_1 u_1 + \cdots + \hat{x}_n u_n = U\hat{x}$$
– Indeed, such an $\hat{x}$ uniquely exists:
$$x = U\hat{x} \iff U^T x = \hat{x}$$
In other words, the vector $\hat{x} = U^T x$ can serve as another representation of the vector $x$ w.r.t. the basis defined by $U$.

“Diagonalizing” matrix-vector multiplication.
– Left-multiplying by the matrix $A$ can be viewed as left-multiplying by a diagonal matrix w.r.t. the basis of the eigenvectors.
– Suppose $x$ is a vector and $\hat{x}$ is its representation w.r.t. the basis of $U$.
– Let $z = Ax$ be the matrix-vector product. Then the representation of $z$ w.r.t. the basis of $U$ is:
$$\hat{z} = U^T z = U^T Ax = U^T U\Lambda U^T x = \Lambda\hat{x} = \begin{bmatrix} \lambda_1 \hat{x}_1 \\ \lambda_2 \hat{x}_2 \\ \vdots \\ \lambda_n \hat{x}_n \end{bmatrix}$$
– We see that left-multiplying by the matrix $A$ in the original space is equivalent to left-multiplying by the diagonal matrix $\Lambda$ w.r.t. the new basis, which is merely scaling each coordinate by the corresponding eigenvalue.

“Diagonalizing” matrix-vector multiplication.
Under the new basis, multiplying by a matrix multiple times becomes much simpler as well. For example, suppose $q = AAAx$. Then
$$\hat{q} = U^T q = U^T AAAx = U^T U\Lambda U^T U\Lambda U^T U\Lambda U^T x = \Lambda^3 \hat{x} = \begin{bmatrix} \lambda_1^3 \hat{x}_1 \\ \lambda_2^3 \hat{x}_2 \\ \vdots \\ \lambda_n^3 \hat{x}_n \end{bmatrix}$$

“Diagonalizing” quadratic form.
As a direct corollary, the quadratic form $x^T Ax$ can also be simplified under the new basis:
$$x^T Ax = x^T U\Lambda U^T x = \hat{x}^T \Lambda \hat{x} = \sum_{i=1}^{n} \lambda_i \hat{x}_i^2$$
(Recall that with the old representation, $x^T Ax = \sum_{i=1, j=1}^{n} x_i x_j A_{ij}$ involves a sum of $n^2$ terms instead of the $n$ terms in the equation above.)
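A sketch checking both the repeated-multiplication identity and the diagonalized quadratic form (NumPy, for illustration):

```python
import numpy as np

B = np.random.randn(4, 4)
A = (B + B.T) / 2
lam, U = np.linalg.eigh(A)           # A = U diag(lam) U^T

x = np.random.randn(4)
xhat = U.T @ x                       # coordinates of x in the eigenbasis

# q = AAAx: in the eigenbasis this is coordinate-wise scaling by lam^3.
qhat = U.T @ (A @ A @ A @ x)
assert np.allclose(qhat, lam**3 * xhat)

# Quadratic form: x^T A x = sum_i lam_i * xhat_i^2.
assert np.isclose(x @ A @ x, np.sum(lam * xhat**2))
```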

The definiteness of the matrix $A$ depends entirely on the sign of its eigenvalues:
1. If all $\lambda_i > 0$, then the matrix $A$ is positive definite, because $x^T Ax = \sum_{i=1}^{n} \lambda_i \hat{x}_i^2 > 0$ for any $\hat{x} \neq 0$. (Note that $\hat{x} \neq 0$ if and only if $x \neq 0$.)
2. If all $\lambda_i \ge 0$, it is positive semidefinite, because $x^T Ax = \sum_{i=1}^{n} \lambda_i \hat{x}_i^2 \ge 0$ for all $\hat{x}$.
3. Likewise, if all $\lambda_i < 0$ or all $\lambda_i \le 0$, then $A$ is negative definite or negative semidefinite, respectively.
4. Finally, if $A$ has both positive and negative eigenvalues, say $\lambda_i > 0$ and $\lambda_j < 0$, then it is indefinite. This is because if we let $\hat{x}$ satisfy $\hat{x}_i = 1$ and $\hat{x}_k = 0$ for all $k \neq i$, then $x^T Ax = \sum_{k=1}^{n} \lambda_k \hat{x}_k^2 = \lambda_i > 0$. Similarly, if we let $\hat{x}$ satisfy $\hat{x}_j = 1$ and $\hat{x}_k = 0$ for all $k \neq j$, then $x^T Ax = \lambda_j < 0$.

Matrix Calculus

The Gradient
Suppose that $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ is a function that takes as input a matrix $A$ of size $m \times n$ and returns a real value. Then the gradient of $f$ (with respect to $A \in \mathbb{R}^{m \times n}$) is the matrix of partial derivatives, defined as
$$\nabla_A f(A) \in \mathbb{R}^{m \times n} = \begin{bmatrix} \frac{\partial f(A)}{\partial A_{11}} & \frac{\partial f(A)}{\partial A_{12}} & \cdots & \frac{\partial f(A)}{\partial A_{1n}} \\ \frac{\partial f(A)}{\partial A_{21}} & \frac{\partial f(A)}{\partial A_{22}} & \cdots & \frac{\partial f(A)}{\partial A_{2n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{m1}} & \frac{\partial f(A)}{\partial A_{m2}} & \cdots & \frac{\partial f(A)}{\partial A_{mn}} \end{bmatrix}$$
i.e., an $m \times n$ matrix with
$$(\nabla_A f(A))_{ij} = \frac{\partial f(A)}{\partial A_{ij}}.$$

The Gradient
Note that the size of $\nabla_A f(A)$ is always the same as the size of $A$. So if, in particular, $A$ is just a vector $x \in \mathbb{R}^n$,
$$\nabla_x f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{bmatrix}.$$
It follows directly from the equivalent properties of partial derivatives that:
– $\nabla_x (f(x) + g(x)) = \nabla_x f(x) + \nabla_x g(x)$.
– For $t \in \mathbb{R}$, $\nabla_x (t f(x)) = t\, \nabla_x f(x)$.

The Hessian
Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a function that takes a vector in $\mathbb{R}^n$ and returns a real number. Then the Hessian matrix with respect to $x$, written $\nabla_x^2 f(x)$ or simply as $H$, is the $n \times n$ matrix of partial derivatives,
$$\nabla_x^2 f(x) \in \mathbb{R}^{n \times n} = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{bmatrix}.$$
In other words, $\nabla_x^2 f(x) \in \mathbb{R}^{n \times n}$, with
$$(\nabla_x^2 f(x))_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}.$$
Note that the Hessian is always symmetric, since
$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x)}{\partial x_j \partial x_i}.$$

Gradients of Linear Functions
For $x \in \mathbb{R}^n$, let $f(x) = b^T x$ for some known vector $b \in \mathbb{R}^n$. Then
$$f(x) = \sum_{i=1}^{n} b_i x_i,$$
so
$$\frac{\partial f(x)}{\partial x_k} = \frac{\partial}{\partial x_k} \sum_{i=1}^{n} b_i x_i = b_k.$$
From this we can easily see that $\nabla_x b^T x = b$. This should be compared to the analogous situation in single variable calculus, where $\partial/\partial x\,(ax) = a$.

Gradients of Quadratic Functions
Now consider the quadratic function $f(x) = x^T Ax$ for $A \in \mathbb{S}^n$. Remember that
$$f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} x_i x_j.$$
To take the partial derivative, we'll consider the terms including $x_k$ and $x_k^2$ factors separately:
$$\frac{\partial f(x)}{\partial x_k} = \frac{\partial}{\partial x_k} \left[ \sum_{i \neq k} \sum_{j \neq k} A_{ij} x_i x_j + \sum_{i \neq k} A_{ik} x_i x_k + \sum_{j \neq k} A_{kj} x_k x_j + A_{kk} x_k^2 \right]$$
$$= \sum_{i \neq k} A_{ik} x_i + \sum_{j \neq k} A_{kj} x_j + 2A_{kk} x_k = \sum_{i=1}^{n} A_{ik} x_i + \sum_{j=1}^{n} A_{kj} x_j = 2 \sum_{i=1}^{n} A_{ki} x_i,$$
where the last equality follows since $A$ is symmetric. Hence $\nabla_x x^T Ax = 2Ax$.

Hessian of Quadratic Functions
Finally, let's look at the Hessian of the quadratic function $f(x) = x^T Ax$. In this case,
$$\frac{\partial^2 f(x)}{\partial x_k \partial x_\ell} = \frac{\partial}{\partial x_k} \left[ \frac{\partial f(x)}{\partial x_\ell} \right] = \frac{\partial}{\partial x_k} \left[ 2 \sum_{i=1}^{n} A_{\ell i} x_i \right] = 2A_{\ell k} = 2A_{k\ell}.$$
Therefore, it should be clear that $\nabla_x^2\, x^T Ax = 2A$, which should be entirely expected (and again analogous to the single-variable fact that $\partial^2/\partial x^2\,(ax^2) = 2a$).

Recap
– $\nabla_x b^T x = b$
– $\nabla_x^2 b^T x = 0$
– $\nabla_x x^T Ax = 2Ax$ (if $A$ symmetric)
– $\nabla_x^2 x^T Ax = 2A$ (if $A$ symmetric)

Matrix Calculus Example: Least Squares
– Given a full rank matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$ such that $b \notin \mathcal{R}(A)$, we want to find a vector $x$ such that $Ax$ is as close as possible to $b$, as measured by the square of the Euclidean norm $\|Ax - b\|_2^2$.
– Using the fact that $\|x\|_2^2 = x^T x$, we have
$$\|Ax - b\|_2^2 = (Ax - b)^T (Ax - b) = x^T A^T Ax - 2b^T Ax + b^T b.$$
– Taking the gradient with respect to $x$ we have:
$$\nabla_x (x^T A^T Ax - 2b^T Ax + b^T b) = \nabla_x x^T A^T Ax - \nabla_x 2b^T Ax + \nabla_x b^T b = 2A^T Ax - 2A^T b.$$
– Setting this last expression equal to zero and solving for $x$ gives the normal equations
$$x = (A^T A)^{-1} A^T b.$$
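To close, a minimal NumPy sketch (our illustrative choice) checking the normal-equations solution against NumPy's built-in least-squares solver and confirming that the gradient vanishes at the optimum:

```python
import numpy as np

m, n = 10, 3
A = np.random.randn(m, n)    # full rank with probability 1
b = np.random.randn(m)       # almost surely not in R(A), since m > n

# Normal equations: x = (A^T A)^{-1} A^T b.
x_normal = np.linalg.inv(A.T @ A) @ A.T @ b

# Compare with the library solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)

# The gradient 2 A^T A x - 2 A^T b vanishes at the solution.
grad = 2 * A.T @ A @ x_normal - 2 * A.T @ b
assert np.allclose(grad, 0)
```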