
Linear Algebra Module IV: Inner Product Spaces

Summer Term 2021

Table of Contents

1 Inner Products on Rn
The Dot Product on Rn
Inner Products
Orthogonal Vectors and Subspaces
Projections and Least Squares
Least-Squares Approximation
Applications of Least-Squares

The Dot Product on Rn

Consider Rn equipped with its standard basis (so that vectors in Rn are
canonically identified with their n × 1 coordinate representations). Given
v = [v1 v2 . . . vn]T ,w = [w1 w2 . . . wn]T ∈ Rn, their dot product (also
referred to as scalar product) is given by

v · w := vT ∗ w = v1 w1 + v2 w2 + · · · + vn wn

This operation on pairs of vectors satisfies three basic properties
(IP1) It is symmetric:

v ·w = w · v
(IP2) It is bilinear:

(α1v1 + α2v2) ·w = α1v1 ·w + α2v2 ·w
v · (β1w1 + β2w2) = β1v ·w1 + β2v ·w2

(IP3) It is positive non-degenerate:
v · v ≥ 0; v · v = 0 iff v = 0
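
As a quick numerical illustration (an Octave/MATLAB sketch; the two vectors below are arbitrary choices, not taken from the notes), the dot product is just vT ∗ w, and (IP1)–(IP3) can be spot-checked:

    v = [3; -1; 4];                            % an arbitrary column vector in R^3
    w = [-2; 1; 5];
    v' * w                                     % the dot product v.w, computed as v^T * w (= 13)
    dot(v, w)                                  % built-in dot(), same value
    v' * w - w' * v                            % (IP1): symmetry, difference is 0
    (2*v + 3*w)' * w - (2*(v'*w) + 3*(w'*w))   % (IP2): linearity in the first slot, difference is 0
    v' * v                                     % (IP3): always >= 0, and 0 only when v = 0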

The Dot Product on Rn – The Norm of a Vector
Note that the standard Euclidean norm of v (also referred to as the
L2-norm) is closely related to the dot product; precisely

‖v‖ = ‖v‖2 = √(v1² + v2² + · · · + vn²) = (v · v)^(1/2)    (1)

Its essential features are
(N1) It is positive definite:

‖v‖ ≥ 0; ‖v‖ = 0 iff v = 0
(N2) It satisfies the triangle inequality (for norms):

‖v + w‖ ≤ ‖v‖+ ‖w‖

This is the norm used in Rn to define the standard Euclidean metric, which
is the conventional way to measure the distance between two vectors:

d(v,w) := ‖v−w‖2 (2)
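
In Octave/MATLAB the norm and the induced distance are available directly; a minimal sketch with arbitrary example vectors:

    v = [3; -1; 4];  w = [-2; 1; 5];   % arbitrary example vectors
    norm(v)                            % Euclidean (L2) norm of v
    sqrt(v' * v)                       % same value via the dot product, as in (1)
    norm(v - w)                        % the distance d(v, w), as in (2)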

The Dot Product on Rn – Distance Between Vectors

Again, this distance function – or metric – satisfies three basic properties,
which are direct consequences of the ones above.

(M1) It is symmetric:
d(v,w) = d(w, v)

(M2) It is positive non-degenerate:

d(v,w) ≥ 0 ∀v,w ∈ Rn; moreover d(v,w) = 0 iff v = w

(M3) It satisfies the triangle inequality (for metrics):

d(u,w) ≤ d(u, v) + d(v,w) ∀u, v,w ∈ Rn

So the i) dot product, ii) Euclidean norm, and iii) Euclidean distance are
all closely related.
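
The three metric properties above can be spot-checked numerically; a minimal Octave/MATLAB sketch with arbitrarily chosen vectors:

    u = [1; 2; 2];  v = [3; -1; 4];  w = [-2; 1; 5];   % arbitrary example vectors
    d = @(x, y) norm(x - y);                           % the Euclidean metric
    d(v, w) - d(w, v)                                  % (M1): difference is 0
    d(v, w) >= 0                                       % (M2): true (1)
    d(u, w) <= d(u, v) + d(v, w)                       % (M3): true (1)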

The Dot Product on Rn – Equivalence of Constructions

In fact, any one of them determines the other two. Obviously, the dot
product determines the norm, and the norm determines the distance. But
also one has

the equality

v · w = (1/2)(‖v + w‖² − ‖v‖² − ‖w‖²)

so that one can also recover the dot product from the norm;
‖v‖ = d(v, 0), so d(−,−) and ‖−‖ determine each other.
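
The displayed identity (a form of the polarization identity) is easy to check numerically; a minimal Octave/MATLAB sketch with arbitrary vectors:

    v = [3; -1; 4];  w = [-2; 1; 5];                       % arbitrary example vectors
    lhs = v' * w;
    rhs = (norm(v + w)^2 - norm(v)^2 - norm(w)^2) / 2;
    lhs - rhs                                              % zero up to rounding error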

The Dot Product on Rn – The Angle between Two Vectors

If v, w are two non-zero vectors in Rn, then their span is at most
two-dimensional, and so we may choose a plane in Rn containing them.

Within that plane the vectors determine two angles. We define θv,w to be
the smaller of the two angles; this angle will always lie between 0 and π
radians.

[Note: Except when the two vectors are collinear and point in opposite
directions, the smaller angle is unique and less than π radians. In that
one remaining case the choice of angle will not be unique, but both angles
equal π, so the angle itself is well-defined.]

The Dot Product on Rn – The Angle Between Two Vectors

The angle formula is given by the following lemma.

Lemma
For v,w ∈ Rn with v,w ≠ 0

v ·w = ‖v‖‖w‖ cos(θv,w)

As cos is 1-1 on the interval [0, π], this equality implies

Corollary
For v,w as above

θv,w = arccos (v ·w/(‖v‖‖w‖))

The Dot Product on Rn – The Angle Between Two Vectors

Examples

Suppose v = [3 −1 4]T, w = [−2 1 5]T, and we want to compute the angle θv,w
determined by v and w.

Using Octave or MATLAB, we see that (in "short" format)
v ·w = 13;
‖v‖ = 5.0990;
‖w‖ = 5.4772;
Therefore

θv,w = arccos (v ·w/(‖v‖‖w‖))

= arccos(13/(5.0990 ∗ 5.4772)) = arccos(0.4655) = 1.0866 radians
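
The corresponding Octave/MATLAB commands, reproducing the numbers above, are:

    format short
    v = [3; -1; 4];  w = [-2; 1; 5];
    dp = v' * w                       % 13
    nv = norm(v)                      % 5.0990
    nw = norm(w)                      % 5.4772
    theta = acos(dp / (nv * nw))      % 1.0866 (radians)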

Inner Products

The dot product defined in the previous section is a specific example of a
bilinear, symmetric, positive definite pairing on the vector space Rn.

We consider these properties in sequence, for an arbitrary vector space V
over R.
A bilinear pairing on V is a map P : V × V → R which simply satisfies
property
(IP2)

P(α1v1 + α2v2,w) = α1P(v1,w) + α2P(v2,w)
P(v, β1w1 + β2w2) = β1P(v,w1) + β2P(v,w2)

A symmetric bilinear pairing is a bilinear pairing that also satisfies
(IP1)

P(v,w) = P(w, v)

Inner Products on Rn – Matrix Representation of Bilinear
Pairings
When V = Rn, these pairings admit a straightforward matrix
representation, not unlike the matrix representation of linear
transformations discussed previously. Again, we assume we are looking at
coordinate vectors with respect to the standard basis for Rn.

Theorem

For any bilinear pairing P on Rn, there is a unique n × n matrix AP such
that

P(v,w) = vT ∗ AP ∗w

Moreover, if P is symmetric then so is AP . Conversely, any n × n matrix
A determines a unique bilinear pairing PA on Rn by

PA(v,w) = vT ∗ A ∗w

which is symmetric precisely when A is.
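
A hedged Octave/MATLAB sketch of this correspondence, with an arbitrarily chosen symmetric matrix: A defines the pairing PA(v,w) = vT ∗ A ∗ w, and conversely the entries of AP can be recovered by pairing standard basis vectors, since P(ei, ej) is the (i, j) entry of AP:

    A = [2 1 0; 1 3 1; 0 1 4];      % an arbitrary symmetric 3 x 3 matrix
    P = @(v, w) v' * A * w;         % the bilinear pairing P_A it defines
    v = [1; 2; 0];  w = [0; -1; 3];
    P(v, w)                         % evaluate the pairing on a pair of vectors
    n = 3;  E = eye(n);  Arec = zeros(n);
    for i = 1:n
      for j = 1:n
        Arec(i, j) = P(E(:, i), E(:, j));   % (A_P)(i, j) = P(e_i, e_j)
      end
    end
    Arec - A                        % zero matrix: A is recovered from P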

Inner Products – The Positive Definite Property

Definition
An inner product on V is a symmetric bilinear pairing P that is also
positive definite:

(IP3)
P(v, v) ≥ 0; P(v, v) = 0 iff v = 0

However, unlike properties (IP1) and (IP2), (IP3) is harder to translate
into properties of the representing matrix. In fact, this last property is
closely related to the property of diagonalizability for symmetric matrices.

Theorem

Let P be a symmetric bilinear pairing on Rn, represented (as above) by the
real symmetric matrix AP . Then P is positive definite precisely when all of
the eigenvalues of AP are positive.
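
Concretely, positive definiteness of a real symmetric matrix can be tested in Octave/MATLAB by inspecting its eigenvalues; the matrices below are arbitrary illustrative choices, not taken from the notes:

    A = [2 1; 1 3];            % symmetric; eigenvalues (5 +/- sqrt(5))/2 are both positive
    B = [1 2; 2 1];            % symmetric; eigenvalues 3 and -1
    eig(A)                     % all positive, so (v, w) -> v'*A*w is an inner product
    eig(B)                     % one negative eigenvalue, so (IP3) fails for P_B
    v = [1; -1];  v' * B * v   % = -2 < 0, an explicit witness of the failure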

Inner Products – Positive Definite Matrices

For this reason we call a real symmetric matrix positive definite iff all of
its eigenvalues are positive.

It is conventional in mathematics to denote inner products, and more
generally bilinear pairings, by the symbol ⟨−,−⟩. In other words, a
bilinear pairing on a vector space V is denoted by

V × V ∋ (v,w) ↦ ⟨v,w⟩ ∈ R

We have seen above how a given bilinear pairing on Rn is represented by
an n × n matrix, and conversely, how any n × n matrix can be used to
define a bilinear pairing on Rn. The same is true more generally for
bilinear pairings on an n-dimensional vector space V .

Inner Products – Matrix Representation for Arbitrary
Vector Spaces

Theorem

Let V be a vector space over R of dimension n, and let P = ⟨−,−⟩ be a
bilinear pairing on V . Then for any basis S of V (not necessarily
orthogonal) there is a matrix A = AP,S depending on both P and S such
that for every v,w ∈ V there is an equality (of real numbers)

⟨v,w⟩ = (Sv)T ∗ A ∗ Sw    (3)

and, conversely, any n × n matrix A determines a bilinear pairing
⟨−,−⟩ = ⟨−,−⟩A on V by the above equation. For a fixed basis S this
establishes a 1-1 correspondence between the set of bilinear pairings on V
and the set of n × n real matrices. Moreover, the bilinear pairing is
symmetric iff its matrix representation is a symmetric matrix, and the
symmetric bilinear pairing is positive definite iff its symmetric matrix
representation is positive definite (as defined above).
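
A hedged Octave/MATLAB sketch of equation (3) for V = R2, with an arbitrarily chosen basis S (stored as the columns of a matrix) and an arbitrary symmetric, positive definite matrix A; coordinate vectors relative to S are obtained by solving a linear system:

    S = [1 1; 0 1];            % columns are the basis vectors of S (an arbitrary basis of R^2)
    A = [2 1; 1 3];            % arbitrary symmetric positive definite matrix representing the pairing
    v = [3; 2];  w = [1; -1];  % vectors written in standard coordinates
    vS = S \ v;  wS = S \ w;   % coordinate vectors of v and w relative to S
    vS' * A * wS               % the value of the pairing, as in equation (3)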

Orthogonal Vectors and Subspaces

The concept of orthogonality is dependent on the choice of inner product.
So assume first that we are working with the standard dot product in Rn.

Definition of orthogonality for a set of vectors

Two vectors v, w are orthogonal if they are non-zero and v ·w = 0; we
indicate this by writing v ⊥ w. Orthogonality with respect to this standard
inner product corresponds to our usual notion of perpendicular. More
generally, a collection of non-zero vectors {vi} is said to be orthogonal if
they are pairwise orthogonal; in other words, vi · vj = 0 for all i ≠ j.

Orthogonal Vectors and Subspaces
The notion of orthogonality extends to subspaces and pairs of sets.

Definition of orthogonality for subspaces

If V ,W ⊂ Rn are two non-zero subspaces, we say V and W are
orthogonal (written as V ⊥ W ) if v · w = 0 ∀ v ∈ V , w ∈ W . As with a
collection of vectors, a collection of subspaces {Vi} is orthogonal iff it is
pairwise orthogonal: Vi ⊥ Vj ∀ i ≠ j. Finally, we say that two sets of
vectors S, T are orthogonal if the dot product of any vector in S with any
vector in T is zero. In this case we again write this as S ⊥ T .

Examples
Let V = Span{[2 3]T},W = Span{[3 (−2)]T} ⊂ R2. Then it is easy to
check that V and W are orthogonal (geometrically, they are represented
by two lines in R2 passing through the origin and forming a 90◦ angle
between them).
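
Verifying the example in Octave/MATLAB takes one dot product:

    v = [2; 3];  w = [3; -2];
    v' * w        % = 6 - 6 = 0, so the spanning vectors, and hence V and W, are orthogonal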

Orthogonal Vectors and Subspaces

If V , W are two non-zero subspaces, it would be impossible to verify that
they are orthogonal by computing every possible dot product of a vector in
V with a vector in W , since there are infinitely many such pairs. We need
an efficient way of checking this.

Theorem

If V = Span{S},W = Span{T} are subspaces of Rn, then V ⊥W iff
S ⊥ T .

This applies in particular when S and T are minimal spanning sets (i.e.,
bases).

Definition of Orthogonal Complement
If W ⊂ Rn is a subspace, its orthogonal complement is given by
W⊥ := {v ∈ Rn | v ·w = 0∀w ∈W }.

W⊥ is the largest subspace of Rn
orthogonal to W as a subspace.
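
Both the spanning-set criterion and the orthogonal complement are easy to compute with. A hedged Octave/MATLAB sketch (B and C below are arbitrary example spanning sets; it also uses the standard fact that the orthogonal complement of the column space of B is the null space of BT):

    B = [1 0; 2 1; 0 1];   % columns span a subspace W of R^3 (arbitrary example)
    C = [-2; 1; -1];       % column spans a candidate subspace V
    C' * B                 % all entries are zero, so V ⊥ W by the theorem above
    Wperp = null(B')       % orthonormal basis for the orthogonal complement of W
    B' * Wperp             % zero matrix: every basis vector of W-perp is orthogonal to W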

Orthogonal Vectors and Subspaces

Orthogonality is connected to the property of linear independence.

Lemma
If {v1, . . . , vm} is an orthogonal set of non-zero vectors in Rn, then it is
linearly independent.

Orthogonal Basis – Definition
An orthogonal basis is a basis consisting of orthogonal vectors.

Given this definition, it is natural to ask whether orthogonal bases always
exist. That question will be answered below.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 19 / 51

Orthogonal Matrices

An orthogonal set of vectors {u1,u2, . . . ,un} is said to be orthonormal if
‖ui‖ = 1, 1 ≤ i ≤ n. Clearly, given an orthogonal set of non-zero vectors
{v1, v2, . . . , vn}, one can orthonormalize it by setting ui = vi/‖vi‖ for each i.

We call an n × n matrix A orthogonal if the columns of A form an
orthonormal set of vectors.

Lemma
An n × n matrix A is orthogonal iff AT ∗ A = I.

Lemma
An n × n matrix A is orthogonal iff

v ·w = (A ∗ v) · (A ∗w)

for all v,w ∈ Rn.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 20 / 51
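
For illustration (not from the original slides), a MATLAB-style sketch of the points above:
normalizing an orthogonal set of non-zero columns, and testing the two lemmas numerically.
The matrix V and the test vectors v, w below are assumed example data.

% Orthonormalize the (orthogonal, non-zero) columns of V
V = [1 2; 2 -1];                          % two orthogonal columns of R^2
U = V * diag(1 ./ sqrt(sum(V.^2, 1)));    % u_i = v_i / ||v_i||

% Check the first Lemma: U' * U should be the identity matrix
orth_err = norm(U' * U - eye(size(U, 2)))       % ~0 up to rounding

% Check the second Lemma: dot products are preserved
v = [3; -1];  w = [2; 5];
preserve_err = abs(v' * w - (U * v)' * (U * w)) % ~0 up to rounding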

Projection onto a 1-Dimensional Subspace

Suppose V = Span{v} is a 1-dimensional subspace of Rn (so that v ≠ 0).
Then given w ∈ Rn, we define the projection of w onto V to be

prV (w) := ( (v · w) / (v · v) ) v

Note that this quantity makes sense, as v ≠ 0 implies v · v > 0.

Theorem
The vector prV (w) depends only on the vector w and the subspace V . In
other words, it does not depend on the choice of basis for V .

The vector prV (w) represents the V -component of w (in some texts this
projection is also called the component of w in the direction of v; we prefer
the subspace interpretation, as it makes clear the independence of the choice
of basis element).

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 21 / 51
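
A quick MATLAB-style illustration of the formula prV (w) = ((v · w)/(v · v)) v, added to
these notes; the vectors v and w are assumed example data.

v = [1; 2; 2];                         % spans the 1-dimensional subspace V
w = [3; 0; 1];
prV_w = ((v' * w) / (v' * v)) * v;     % prV(w) = ((v . w)/(v . v)) v
check  = v' * (w - prV_w)              % ~0: the residual w - prV(w) is orthogonal to v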

Projections onto m-Dim. Subspaces and Gram-Schmidt

We know that every non-zero vector space admits a basis. It is natural
then to ask: does every non-zero inner product space admit an orthogonal
basis?

The answer is: yes, it does. In fact, given a basis for an inner product
space, there is a systematic way to convert it into an orthogonal basis.
And this method simultaneously provides a method for projecting a vector
onto a subspace.

Again, we discuss the procedure first for Rn equipped with its standard
scalar product, then show how this naturally extends to more general inner
product spaces.

Let {v1, v2, . . . , vn} be a basis for Rn. We will give an inductive procedure
for constructing an orthogonal basis for Rn from this original set.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 22 / 51

Projections onto m-Dim. Subspaces and Gram-Schmidt

First, some notation. Let Wi := Span{v1, . . . , vi} be the span of the first i
vectors in the set. Since any subset of a linearly independent set of vectors is
again linearly independent, we see that Dim(Wi ) = i , 1 ≤ i ≤ n, with Wn = Rn.

Now {v1} is an orthogonal basis for W1, since it has only one element. We
set u1 = v1, and consider the vector v2.

This need not be orthogonal to v1, but it cannot be simply a scalar
multiple of v1 either, since that would imply that the set {v1, v2} was
linearly dependent, contradicting what we know.

So we define
u2 := v2 − prW1(v2)

As we have just observed, u2 ≠ 0.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 23 / 51

Projections onto m-Dim. Subspaces and Gram-Schmidt

Lemma
The dot product u1 · u2 is zero. Also Span({u1,u2}) = Span({v1, v2}).
Hence {u1,u2} is an orthogonal basis for W2.

We now suppose that we have constructed an orthogonal basis
{u1, . . . ,um} for Wm. We need to show how this can be extended to
Wm+1 if m < n. First, for v ∈ Rn, we define the projection of v onto Wm to be

prWm (v) := ∑_{i=1}^{m} ( (ui · v) / (ui · ui) ) ui ∈ Wm

Again, if v ∉ Wm, then v ≠ prWm (v) and so their difference will not be zero.
As above, we then set

um+1 = vm+1 − prWm (vm+1)

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 24 / 51

Projections onto m-Dim. Subspaces and Gram-Schmidt

Lemma
If m < n, then {u1, . . . ,um,um+1} is an orthogonal basis for Wm+1.

Continuing in this fashion, we eventually reach the case m = n, at which
point the algorithm is complete.

Note that this procedure depends not only on the basis but also on the
order in which we list the basis elements; changing the order will (most of
the time) result in a different orthogonal basis for Rn.

Note also that this procedure works just as well if we start with a subspace
of Rn, together with a basis for that subspace.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 25 / 51

Projections onto m-Dim. Subspaces and Gram-Schmidt

Theorem
For any subspace W of Rn and basis S = {v1, . . . , vm} for that subspace,
the Gram-Schmidt algorithm produces an orthogonal basis {u1, . . . ,um} for
W , which depends only on the ordering of the initial basis elements in S.

Given this orthogonal basis for W and an arbitrary vector v ∈ Rn, the
projection of v onto W , or the W -component of v, is given by

prW (v) := ∑_{i=1}^{m} ( (ui · v) / (ui · ui) ) ui

The reason for calling this projection the W -component of v is more or less
clear, since the equation v = prW (v) + (v − prW (v)) gives v as a sum of
i) its component in W , and ii) its component in W⊥. Rn = W ⊕ W⊥, so this
sum decomposition of v is unique. In other words:

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 26 / 51
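
As a concrete illustration of the algorithm and of prW , here is a MATLAB-style sketch
added to these notes. The function names gram_schmidt and proj_W are ours, not from the
slides, and the input is assumed to be a matrix whose columns are linearly independent.

function U = gram_schmidt(V)
% GRAM_SCHMIDT  Orthogonal (not normalized) basis from the columns of V.
% Column j of U is  u_j = v_j - pr_{W_{j-1}}(v_j),  where W_{j-1} = Span{u_1,...,u_{j-1}}.
    [n, m] = size(V);
    U = zeros(n, m);
    for j = 1:m
        u = V(:, j);
        for i = 1:j-1
            u = u - ((U(:, i)' * V(:, j)) / (U(:, i)' * U(:, i))) * U(:, i);
        end
        U(:, j) = u;
    end
end

function p = proj_W(U, v)
% PROJ_W  prW(v) for W spanned by the orthogonal columns of U.
    p = zeros(size(v));
    for i = 1:size(U, 2)
        p = p + ((U(:, i)' * v) / (U(:, i)' * U(:, i))) * U(:, i);
    end
end

For example, U = gram_schmidt([1 1; 1 0; 0 1]) returns an orthogonal basis for the plane W
spanned by (1,1,0) and (1,0,1), and proj_W(U, [1; 2; 3]) is the W -component of (1,2,3);
the residual [1; 2; 3] - proj_W(U, [1; 2; 3]) lies in W⊥.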

Projections onto m-Dim. Subspaces and Gram-Schmidt

Theorem
For any non-zero subspace W ⊂ Rn, prW (v) is the unique vector in W for
which the difference v − prW (v) lies in W⊥.

As we are about to see, this is equivalent to saying that it is the vector in
W closest to v, where distance is in terms of the standard Euclidean
distance for Rn based on the scalar product.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 27 / 51

Least-Squares Approximation

As before, we are given a subspace W ⊂ Rn and a vector v ∈ Rn. The
questions are then:

(LS1) Is there a vector wv ∈ W satisfying the property that
      ‖wv − v‖ ≤ ‖w − v‖ for all w ∈ W ?
(LS2) If such a vector exists, is it unique?

The answer is “yes.” More precisely, the vector in W we are looking for is
exactly the projection of v onto W :

Theorem
wv = prW (v).

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 28 / 51
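
A quick numerical sanity check of this "closest vector" characterization (an added
illustration; the basis U, the vector v, and the number of random trials are all assumptions):

U = [1 0; 0 1; 0 0];            % orthonormal basis for W = the xy-plane in R^3
v = [1; 2; 3];
prW_v = U * (U' * v);           % prW(v); this short form is valid because the columns of U are orthonormal
d_min = norm(v - prW_v);
never_beaten = true;
for k = 1:1000                  % compare against 1000 random vectors w in W
    w = U * randn(2, 1);
    never_beaten = never_beaten && (d_min <= norm(v - w) + 1e-12);
end
never_beaten                    % logical 1: no sampled w was closer to v than prW(v)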

Least-Squares Approximation

The vector wv = prW (v) is referred to as the least-squares approximation
of v by a vector in W , because prW (v) satisfies the property that
‖v − prW (v)‖^2, which is computed as a sum of squares of differences in
coordinates, is minimized.

This turns out to have an important application to finding the best
approximation to a system of equations in the event no actual solution
exists. We discuss this next.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 29 / 51

Least-Squares and the Fundamental Subspaces Theorem

Suppose we are given a matrix equation A ∗ x = b with x a vector variable
taking values in Rn, and b a fixed vector in Rm (implying that A is an
m × n matrix). The consistency theorem for systems of equations tells us
that the equation is consistent precisely when b is in the span of the
columns of A, or alternatively, when b ∈ C(A).

But what if it is not, i.e., if the system is inconsistent? Up until now we
have simply left it at that; inconsistency was the end of the story. But it is
not. Because whether or not the original system is consistent, one can
always find a solution to the related equation

A ∗ x = prC(A)(b)     (4)

because the projection prC(A)(b) of b onto the column space of A will
always be in the column space of A, regardless of whether or not the
original vector b is.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 30 / 51

Least-Squares and the Fundamental Subspaces Theorem

The question then becomes: given that we know (4) has at least one
solution, how do we go about finding it (or them)?

The starting point for answering that question is the following theorem,
often referred to as the Fundamental Subspaces Theorem (originally
proven by Gauss).

Theorem
For any m × n matrix A, there are equalities

C(A)⊥ = N(AT )      C(A) = N(AT )⊥
C(AT )⊥ = N(A)      C(AT ) = N(A)⊥

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 31 / 51

Least-Squares and the Fundamental Subspaces Theorem

Write b′ for prC(A)(b). Then

A ∗ x = b′
⇔ A ∗ x − b ∈ C(A)⊥   (as b′ is the unique vector in C(A) with b′ − b ∈ C(A)⊥)
⇔ A ∗ x − b ∈ N(AT )   (by the Fundamental Subspaces Theorem)
⇔ AT ∗ A ∗ x − AT ∗ b = AT ∗ (A ∗ x − b) = 0
⇔ AT ∗ A ∗ x = AT ∗ b

This last equation AT ∗ A ∗ x = AT ∗ b has the same set of solutions as the
equation that started the sequence, namely A ∗ x = b′, and is therefore
always consistent. It is derived from our original equation A ∗ x = b by
simply multiplying both sides on the left by AT , and is often referred to as
the associated normal equation of the original matrix equation from which
it was derived.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 32 / 51

Least-Squares and the Fundamental Subspaces Theorem

This yields a straightforward procedure for finding the least-squares
solution to our original equation A ∗ x = b; i.e., a solution to the associated
normal equation AT ∗ A ∗ x = AT ∗ b, which by the above is equivalent to a
solution to the related equation A ∗ x = prC(A)(b).

Note that the original equation is consistent precisely when b ∈ C(A), or
equivalently when b = prC(A)(b); in other words, when the least-squares
solution is an exact solution.

The advantages to seeking a least-squares solution are i) it always exists
(regardless of whether or not the original equation is consistent), and ii) it
yields an actual solution whenever actual solutions exist.

Because this procedure finds the least-squares solution first, it can also be
applied to finding the least-squares approximation to b as
prC(A)(b) = A ∗ x, where x is a least-squares solution to the original
equation.

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 33 / 51

Least-Squares and the Fundamental Subspaces Theorem

The steps are:
1. Form the associated normal equation AT ∗ A ∗ x = AT ∗ b;
2. find the solution(s) to the normal equation by computing
   rref ([AT ∗ A | AT ∗ b]). These will be the least-squares solution(s) to
   the original equation;
3. for any least-squares solution x from Step 2, compute A ∗ x. This will
   yield the least-squares approximation prC(A)(b) to b by a vector in the
   column space of A.

Again, there will only be one least-squares approximation to b by a vector
in C(A), because we have already seen such a vector is unique. However,
the set of least-squares solutions to the original equation may not be
unique. Thus another consequence of this theory is

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 34 / 51
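
A MATLAB-style sketch of these steps on a small assumed example (added to these notes).
Here rref mirrors Step 2; in practice one would often just write A \ b, which also returns a
least-squares solution when the system is inconsistent.

A = [1 0; 1 1; 1 2];            % 3 x 2 matrix with full column rank
b = [6; 0; 0];                  % b is not in C(A), so A * x = b is inconsistent

% Step 1: form the normal equation  (A' * A) * x = A' * b
N = A' * A;
c = A' * b;

% Step 2: solve the (always consistent) normal equation
R = rref([N c]);
x = R(:, end);                  % the least-squares solution; unique here since A has full column rank

% Step 3: the least-squares approximation of b by a vector in C(A)
p = A * x;                      % p = pr_{C(A)}(b)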

Least-Squares and the Fundamental Subspaces Theorem

Corollary
The value of A ∗ x remains constant as x ranges over the set of
least-squares solutions to the matrix equation A ∗ x = b.

A final question then remains: when will there be a unique least-squares
solution?

We say that the matrix A has full column rank (or just full rank when
there is no confusion) if the columns of A are linearly independent; namely,
that rank(A) = n. If A is m × n, this imposes the constraint that m ≥ n
(otherwise the rank would have to be less than n, the number of columns).

A useful fact about the ranks of matrices (which we do not prove here) is

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 35 / 51

Least-Squares and the Fundamental Subspaces Theorem

Lemma
For any matrix A, rank(A) = rank(AT ∗ A). In particular, A has full
column rank iff AT ∗ A is non-singular.

As the normal equation is always consistent, we see

Theorem
AT ∗ A ∗ x = AT ∗ b will have a unique solution precisely when
N(AT ∗ A) = {0}, which happens iff AT ∗ A is non-singular. In this case,
the unique least-squares solution is given by

x = (AT ∗ A)−1 ∗ AT ∗ b

and the least-squares approximation to b by a vector in C(A) is

prC(A)(b) = A ∗ x = A ∗ (AT ∗ A)−1 ∗ AT ∗ b

Linear Algebra Module IV: Inner Product Spaces Summer Term 2021 36 / 51
Applications of Least-Squares - Projection onto a subspace

The theorem stated at the end of the previous section indicates an alternative and more computationally efficient method of computing the projection of a vector onto a subspace W of R^n. Previously we had to first establish an orthogonal basis for W. But given any basis {v_1, . . . , v_m} for W, we can avoid first orthogonalizing the basis: concatenate the basis vectors to form the matrix A with A(:, i) = v_i, 1 ≤ i ≤ m, and then, for any vector v ∈ R^n, compute the projection of v onto W = C(A) as

pr_W(v) = A ∗ (A^T ∗ A)^{-1} ∗ A^T ∗ v
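Below is a minimal MATLAB/Octave sketch of this formula, using a made-up (non-orthogonal) basis of a subspace W ⊂ R^4; the residual v − pr_W(v) comes out orthogonal to W, and applying the projection a second time changes nothing.

    % Hypothetical basis vectors for W (not orthogonal), stacked as columns of A:
    v1 = [1; 1; 0; 0];  v2 = [0; 1; 1; 1];
    A  = [v1, v2];                         % A(:, i) = v_i
    v  = [1; 2; 3; 4];
    prW = A * ((A' * A) \ (A' * v));       % pr_W(v), no Gram-Schmidt required
    disp(A' * (v - prW))                   % ~0: the residual is orthogonal to every v_i
    disp(norm(A * ((A' * A) \ (A' * prW)) - prW))   % ~0: projecting twice gives the same vector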
Theorem
If an m × n matrix A has full rank, then B = A ∗ (A^T ∗ A)^{-1} ∗ A^T satisfies the identity B ∗ B = B (a matrix satisfying such an identity is called a projection matrix, since the linear transformation it defines on R^m corresponds exactly to projection onto its column space C(B)). Left-multiplication by B then corresponds to the projection map pr_C(A)(−): for all v ∈ R^m,

pr_C(A)(v) = B ∗ v

Applications of Least-Squares - Polynomial Data Fitting

We suppose given n points {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} in the plane R^2, with distinct x-coordinates (in practice, such sets of points can arise as data based on the measurement of some quantity - recorded as the y-coordinate - as a function of some parameter recorded as the x-coordinate). We would like to find the equation of the line that best fits these points (by exactly what measure the line represents a best possible fit is explained below). If we write the equation of the line as y = l(x) = c_0 + c_1 x for indeterminates c_0, c_1, then what we are looking for is a least-squares solution to the n × 2 system of equations

c_0 + c_1 x_1 = y_1
c_0 + c_1 x_2 = y_2
...
c_0 + c_1 x_n = y_n
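To make the matrix form of this system concrete, here is a minimal MATLAB/Octave sketch with hypothetical data vectors x and y (the specific numbers of the worked example are handled next).

    % Hypothetical data; each equation c0 + c1*x_i = y_i becomes a row of A:
    x = [1; 2; 3; 5];   y = [2.1; 3.9; 6.2; 9.8];
    A = [ones(size(x)), x];    % row i of A is [1  x_i], so A*[c0; c1] stacks c0 + c1*x_i
    c = (A' * A) \ (A' * y);   % least-squares [c0; c1] for the (generally inconsistent) system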
Note that, in this system, the x_i and y_i are constants, and we are trying to solve for c_0 and c_1. For n ≤ 2 there will be a solution, but in the overdetermined case there almost always fails to be one. Hence the need to work in the least-squares setting.

Examples
We wish to find the least-squares fit by a linear equation to the set of points (2, 3), (4, 6), (7, 10), (9, 14). This problem can be represented by the matrix equation A_1 ∗ c = y, where

A_1 = \begin{bmatrix} 1 & 2 \\ 1 & 4 \\ 1 & 7 \\ 1 & 9 \end{bmatrix}, \quad c = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}, \quad y = \begin{bmatrix} 3 \\ 6 \\ 10 \\ 14 \end{bmatrix}.

We note that this matrix is full rank. Therefore the least-squares solution is unique and given by
c = (A_1^T ∗ A_1)^{-1} ∗ A_1^T ∗ y = \begin{bmatrix} -0.18966 \\ 1.53448 \end{bmatrix}

Thus the desired equation is given by

l(x) = -0.18966 + 1.53448x

We can also measure the degree to which this comes close to being an actual solution (which would only exist if the points were collinear). Given c, we compute the vector

y_1 := A_1 ∗ c = \begin{bmatrix} 2.8793 \\ 5.9483 \\ 10.5517 \\ 13.6207 \end{bmatrix}
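The arithmetic above is easy to reproduce; the following MATLAB/Octave snippet is a minimal sketch that recomputes c, y_1, and the size of the residual reported just below.

    % Data from the example:
    A1 = [1 2; 1 4; 1 7; 1 9];
    y  = [3; 6; 10; 14];
    c  = (A1' * A1) \ (A1' * y);   % expect approximately [-0.18966; 1.53448]
    y1 = A1 * c;                   % expect approximately [2.8793; 5.9483; 10.5517; 13.6207]
    e1 = norm(y - y1)              % expect approximately 0.68229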
This vector is (by the above) the least-squares approximation to y by a vector in the column space of A_1 (accurate to 4 decimal places). The accuracy can then be estimated by the distance of this approximation from the original vector y:

e_1 := ‖y − y_1‖ = 0.68229

The last computation in this example indicates what is being minimized when one fits data points in this way.

Remark
Using least-squares linear approximation techniques to find the best linear fit to a set of n data points {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} results in the equation of a line l(x) = c_0 + c_1 x which minimizes the sum of the squares of the vertical distances from the given points to the line:

\sum_{i=1}^{n} (y_i - l(x_i))^2

Unless the line is horizontal, the vertical distance will be slightly larger than the actual distance, which is measured in the direction orthogonal to the line, and minimizing the sum of squares of those distances would correspond geometrically to what one might normally think of as constituting a least-squares fit. However, this linear algebraic approach provides a simple and efficient method for finding a good approximation by a line, and it is exact whenever the points are collinear.

The setup above provides a method for finding not just linear approximations, but higher order ones as well. The linear algebra is essentially the same. To illustrate,

Examples
Suppose instead we were asked to find the least-squares fit by a quadratic equation to the same set of points (2, 3), (4, 6), (7, 10), (9, 14).
As before, this problem can be represented by the matrix equation A_2 ∗ c = y, where

A_2 = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 4 & 16 \\ 1 & 7 & 49 \\ 1 & 9 & 81 \end{bmatrix}, \quad c = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}, \quad y = \begin{bmatrix} 3 \\ 6 \\ 10 \\ 14 \end{bmatrix}.

We note that the matrix A_2 is again full rank (it has rank 3). Therefore the least-squares solution is unique and given by

c = (A_2^T ∗ A_2)^{-1} ∗ A_2^T ∗ y = \begin{bmatrix} 0.960345 \\ 0.984483 \\ 0.050000 \end{bmatrix}

Thus the desired equation is given by

q(x) = 0.960345 + 0.984483x + 0.050000x^2

Measuring the degree to which this comes close to being an actual solution (which would only exist if the points all lay on the same quadratic graph), we compute

y_2 := A_2 ∗ c = \begin{bmatrix} 3.1293 \\ 5.6983 \\ 10.3017 \\ 13.8707 \end{bmatrix}

which is (by the above) the least-squares approximation to y by a vector in the column space of A_2 (accurate to 4 decimal places). The accuracy can then be estimated by the distance of this approximation from the original vector y:

e_2 := ‖y − y_2‖ = 0.46424

As with the linear fit, the quantity being minimized is the sum of squares of the vertical distances of the original points to the graph of this quadratic function. Notice the modest improvement: from 0.68229 to 0.46424. Because the column space of A_2 contains the column space of A_1, the least-squares approximation y_2 has to be at least as good as the linear one y_1, and almost always will be closer to the original vector y. We will illustrate our final point by looking at what happens if we go one degree higher.

Examples
We will find the least-squares fit by a cubic equation to the same set of points (2, 3), (4, 6), (7, 10), (9, 14). As before, this problem can be represented by the matrix equation A_3 ∗ c = y, where

A_3 = \begin{bmatrix} 1 & 2 & 4 & 8 \\ 1 & 4 & 16 & 64 \\ 1 & 7 & 49 & 343 \\ 1 & 9 & 81 & 729 \end{bmatrix}, \quad c = \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{bmatrix}, \quad y = \begin{bmatrix} 3 \\ 6 \\ 10 \\ 14 \end{bmatrix}.

The matrix A_3 is still full rank (it has rank 4). Therefore the least-squares solution is unique and given by

c = (A_3^T ∗ A_3)^{-1} ∗ A_3^T ∗ y = \begin{bmatrix} -1.6 \\ 2.890476 \\ -0.342857 \\ 0.023810 \end{bmatrix}

Thus the desired equation is given by

f(x) = -1.6 + 2.890476x - 0.342857x^2 + 0.023810x^3

However, now when we compute the least-squares approximation we get

y_3 := A_3 ∗ c = \begin{bmatrix} 3 \\ 6 \\ 10 \\ 14 \end{bmatrix}

which is not just an approximation but rather the vector y on the nose; e_3 = 0. In other words, given these four points, there is a unique cubic equation which fits the points exactly. Inspecting the computation more carefully, we see why: the matrix A_3 is both full rank and square; in other words, non-singular. In this case the system of equations is no longer overdetermined but rather balanced, and with a non-singular coefficient matrix we get a unique solution.
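A minimal MATLAB/Octave sketch recomputing the quadratic and cubic fits and the shrinking residuals (the exact cubic fit is recovered up to round-off):

    % Data from the example:
    x  = [2; 4; 7; 9];   y = [3; 6; 10; 14];
    A2 = [ones(4,1), x, x.^2];
    A3 = [ones(4,1), x, x.^2, x.^3];
    c2 = (A2' * A2) \ (A2' * y);   % expect approximately [0.960345; 0.984483; 0.050000]
    c3 = A3 \ y;                   % A3 is square and non-singular, so this solves exactly
    e2 = norm(y - A2 * c2)         % expect approximately 0.46424
    e3 = norm(y - A3 * c3)         % ~0: the cubic interpolates all four points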
Symbolically, this can be seen by noting that the non-singularity of A_3 results in a simplified expression for c, confirming it is indeed an exact solution:

c = (A_3^T ∗ A_3)^{-1} ∗ A_3^T ∗ y = A_3^{-1} ∗ (A_3^T)^{-1} ∗ A_3^T ∗ y = A_3^{-1} ∗ y

This set of examples, in which we compute successively higher order approximations to a set of n data points until we finally arrive at an exact fit, is part of a more general phenomenon, which we record without proof in the following theorem.

Theorem
Given n points in R^2 with distinct x-coordinates {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)}, the least-squares fit by a polynomial of degree k is computed by finding the least-squares solution to the matrix equation

A_k ∗ c = y

where y = [y_1 y_2 . . . y_n]^T and A_k is the n × (k + 1) matrix with A_k(i, j) = x_i^{j-1}.

Theorem (cont'd)
The matrix A_k will have full column rank for all k ≤ (n − 1), and so the least-squares solution c is unique and given by

[c_0 c_1 . . . c_k]^T = c = (A_k^T ∗ A_k)^{-1} ∗ A_k^T ∗ y

with degree k polynomial least-squares fit given by

p_k(x) = \sum_{i=0}^{k} c_i x^i

Because A_{n-1} is non-singular, there will be a polynomial of degree at most (n − 1) which fits the points exactly. Moreover, the polynomial of degree at most (n − 1) which accomplishes this is unique.
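As a closing sketch, here is a small MATLAB/Octave function (a hypothetical helper, not part of the notes) that builds A_k as in the Theorem and returns the degree-k least-squares coefficients; MATLAB's built-in polyfit computes the same fit, but returns its coefficients in descending order of powers.

    % Save as lsq_polyfit.m
    function c = lsq_polyfit(x, y, k)
        % Degree-k least-squares polynomial fit: returns c = [c0; c1; ...; ck].
        x = x(:);  y = y(:);
        n = numel(x);
        Ak = ones(n, k + 1);
        for j = 2:k + 1
            Ak(:, j) = x .^ (j - 1);   % A_k(i, j) = x_i^(j-1)
        end
        c = (Ak' * Ak) \ (Ak' * y);    % least-squares solution of A_k * c = y
    end

For example, lsq_polyfit([2 4 7 9], [3 6 10 14], 1) should return approximately [-0.18966; 1.53448], matching the linear fit computed earlier.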