Linear Algebra Module IV: Inner Product Spaces
Summer Term 2021
Table of Contents
1 Inner Products on Rn
The Dot Product on Rn
Inner Products
Orthogonal Vectors and Subspaces
Projections and Least Squares
Least-Squares Approximation
Applications of Least-Squares
The Dot Product on Rn
Consider Rn equipped with its standard basis (so that vectors in Rn are
canonically identified with their n × 1 coordinate representations). Given
v = [v₁ v₂ … vₙ]ᵀ, w = [w₁ w₂ … wₙ]ᵀ ∈ Rn, their dot product (also
referred to as scalar product) is given by
v · w := vᵀ ∗ w = ∑ᵢ₌₁ⁿ vᵢwᵢ
This operation on pairs of vectors satisfies three basic properties:
(IP1) It is symmetric:
v ·w = w · v
(IP2) It is bilinear:
(α1v1 + α2v2) ·w = α1v1 ·w + α2v2 ·w
v · (β1w1 + β2w2) = β1v ·w1 + β2v ·w2
(IP3) It is positive non-degenerate:
v · v ≥ 0; v · v = 0 iff v = 0
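These properties are easy to sanity-check numerically. A minimal Octave/MATLAB sketch (the vectors and scalars below are arbitrary choices for illustration):

    % Dot product as a matrix product: v . w = v' * w
    v = [1; -2; 3];  w = [4; 0; -1];    % arbitrary example vectors
    u = [0; 5; 1];   alpha = 2;  beta = -3;

    d1 = v' * w;                        % dot product via transpose-multiply
    d2 = dot(v, w);                     % built-in dot product, same value

    ip1 = (v' * w) == (w' * v);                                      % (IP1) symmetry
    ip2 = ((alpha*v + beta*u)' * w) == alpha*(v'*w) + beta*(u'*w);   % (IP2) linearity in the first slot
    ip3 = (v' * v) >= 0;                                             % (IP3) positivity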
The Dot Product on Rn – The Norm of a Vector
Note that the standard Euclidean norm of v (also referred to as the
L2-norm) is closely related to the dot product; precisely
‖v‖ = ‖v‖₂ = √(v₁² + v₂² + … + vₙ²) = (v · v)^(1/2)    (1)
Its essential features are
(N1) It is positive definite:
‖v‖ ≥ 0; ‖v‖ = 0 iff v = 0
(N2) It satisfies the triangle inequality (for norms):
‖v + w‖ ≤ ‖v‖+ ‖w‖
This is the norm used in Rn to define the standard Euclidean metric, which
is the conventional way to measure the distance between two vectors:
d(v, w) := ‖v − w‖₂    (2)
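Both quantities are available directly in Octave/MATLAB. A minimal sketch, using the two vectors that appear in the worked example later in this section:

    % Euclidean norm and distance
    v = [3; -1; 4];  w = [-2; 1; 5];

    n1 = sqrt(v' * v);     % norm from the dot product, as in equation (1)
    n2 = norm(v);          % built-in 2-norm; same value (5.0990 in short format)
    d  = norm(v - w);      % Euclidean distance, as in equation (2)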
The Dot Product on Rn – Distance Between Vectors
Again, this distance function – or metric – satisfies three basic properties,
which are direct consequences of the ones above.
(M1) It is symmetric:
d(v,w) = d(w, v)
(M2) It is positive non-degenerate:
d(v,w) ≥ 0 ∀v,w ∈ Rn; moreover d(v,w) = 0 iff v = w
(M3) It satisfies the triangle inequality (for metrics):
d(u,w) ≤ d(u, v) + d(v,w) ∀u, v,w ∈ Rn
So the i) dot product, ii) Euclidean norm, and iii) Euclidean distance are
all closely related.
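As with the earlier properties, these are simple to verify numerically for particular vectors. A minimal Octave/MATLAB sketch (the three vectors are arbitrary choices):

    % Sanity check of (M1)-(M3) for the Euclidean metric
    u = [1; 0; 2];  v = [3; -1; 4];  w = [-2; 1; 5];    % arbitrary vectors
    d = @(x, y) norm(x - y);                            % d(x, y) = ||x - y||_2

    m1 = abs(d(v, w) - d(w, v)) < 1e-12;        % (M1) symmetry
    m2 = d(v, w) >= 0;                          % (M2) non-negativity
    m3 = d(u, w) <= d(u, v) + d(v, w) + 1e-12;  % (M3) triangle inequality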
The Dot Product on Rn – Equivalence of Constructions
In fact, any one of them determines the other two. Obviously, the dot
product determines the norm, and the norm determines the distance. But
also one has
the equality
v · w = ½ (‖v + w‖² − ‖v‖² − ‖w‖²)
so that one can also recover the dot product from the norm;
‖v‖ = d(v, 0), so d(−,−) and ‖−‖ determine each other.
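The polarization identity above is also easy to confirm numerically. A minimal Octave/MATLAB sketch with arbitrary vectors:

    % Recovering the dot product from the norm (polarization identity)
    v = [3; -1; 4];  w = [-2; 1; 5];                       % arbitrary vectors
    lhs = v' * w;                                          % dot product directly
    rhs = 0.5 * (norm(v + w)^2 - norm(v)^2 - norm(w)^2);   % from norms only
    assert(abs(lhs - rhs) < 1e-10);                        % equal up to round-off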
The Dot Product on Rn – The Angle between Two Vectors
If v, w are two non-zero vectors in Rn, then their span is at most
two-dimensional, and so we may choose a plane in Rn containing them.
Within that plane the vectors determine two angles. We define θv,w to be
the smaller of the two angles; this angle always lies between 0 and π
radians.
[Note: Except when the two vectors are collinear and point in opposite
directions, the smaller angle is unique and less than π radians. In that
one remaining case the choice of angle representative is not unique, but
both angles equal π, so the angle itself is well-defined.]
The Dot Product on Rn – The Angle Between Two Vectors
The angle formula is given by the following lemma.
Lemma
For v, w ∈ Rn with v, w ≠ 0,
v ·w = ‖v‖‖w‖ cos(θv,w)
As cos is 1-1 on the interval [0, π], this equality implies
Corollary
For v,w as above
θv,w = arccos (v ·w/(‖v‖‖w‖))
The Dot Product on Rn – The Angle Between Two Vectors
Examples
Suppose v = [3 −1 4]ᵀ, w = [−2 1 5]ᵀ, and we want to compute the angle
θv,w determined by v and w.
Using Octave or MATLAB, we see that (in "short" format)
v ·w = 13;
‖v‖ = 5.0990;
‖w‖ = 5.4772;
Therefore
θv,w = arccos (v ·w/(‖v‖‖w‖))
= arccos(13/(5.0990 ∗ 5.4772)) = arccos(0.4655) = 1.0866 radians
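The slide's values can be reproduced with a few lines of Octave or MATLAB:

    % Angle between v and w via the arccos formula
    v = [3; -1; 4];
    w = [-2; 1; 5];

    dp    = v' * w;                 % dot product: 13
    nv    = norm(v);                % ||v||: 5.0990
    nw    = norm(w);                % ||w||: 5.4772
    theta = acos(dp / (nv * nw));   % angle in radians: 1.0866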
Inner Products
The dot product defined in the previous section is a specific example of a
bilinear, symmetric, positive definite pairing on the vector space Rn.
We consider these properties in sequence, for an arbitrary vector space V
over R.
A bilinear pairing on V is a map P : V × V → R which simply satisfies
property
(IP2)
P(α1v1 + α2v2,w) = α1P(v1,w) + α2P(v2,w)
P(v, β1w1 + β2w2) = β1P(v,w1) + β2P(v,w2)
A symmetric bilinear pairing is a bilinear pairing that also satisfies
(IP1)
P(v,w) = P(w, v)
Inner Products on Rn – Matrix Representation of Bilinear
Pairings
When V = Rn, these pairings admit a straightforward matrix
representation, not unlike the matrix representation of linear
transformations discussed previously. Again, we assume we are looking at
coordinate vectors with respect to the standard basis for Rn.
Theorem
For any bilinear pairing P on Rn, there is a unique n × n matrix A_P such
that
P(v, w) = vᵀ ∗ A_P ∗ w
Moreover, if P is symmetric then so is A_P. Conversely, any n × n matrix
A determines a unique bilinear pairing P_A on Rn by
P_A(v, w) = vᵀ ∗ A ∗ w
which is symmetric precisely when A is.
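As a concrete illustration, here is a minimal Octave/MATLAB sketch; the symmetric matrix A below is an arbitrary choice, not one fixed by the slides:

    % A bilinear pairing on R^2 represented by a matrix: P_A(v, w) = v' * A * w
    A = [2 1;
         1 3];                      % arbitrary symmetric 2 x 2 matrix
    P = @(v, w) v' * A * w;         % the pairing determined by A

    v = [1; -1];  w = [2; 5];       % arbitrary test vectors
    P(v, w) - P(w, v)               % 0: the pairing is symmetric because A is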
Inner Products – The Positive Definite Property
Definition
An inner product on V is a symmetric bilinear pairing P that is also
positive definite:
(IP3)
P(v, v) ≥ 0; P(v, v) = 0 iff v = 0
However, unlike properties (IP1) and (IP2), (IP3) is harder to translate
into properties of the representing matrix. In fact, this last property is
closely related to the property of diagonalizability for symmetric matrices.
Theorem
Let P be a symmetric bilinear pairing on Rn, represented (as above) by the
real symmetric matrix A_P. Then P is positive definite precisely when all of
the eigenvalues of A_P are positive.
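The eigenvalue criterion is easy to apply in Octave/MATLAB; a minimal sketch using the same arbitrary symmetric matrix as before:

    % Positive definiteness test via eigenvalues
    A = [2 1;
         1 3];                  % arbitrary symmetric matrix
    lambda = eig(A);            % eigenvalues: (5 - sqrt(5))/2 and (5 + sqrt(5))/2
    all(lambda > 0)             % 1 (true): A, and hence P_A, is positive definite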
Inner Products – Positive Definite Matrices
For this reason we call a real symmetric matrix positive definite iff all of
its eigenvalues are positive.
It is conventional in mathematics to denote inner products, and more
generally bilinear pairings, by the symbol ⟨−,−⟩. In other words, a
bilinear pairing on a vector space V is denoted by
V × V ∋ (v, w) ↦ ⟨v, w⟩ ∈ R
We have seen above how a given bilinear pairing on Rn is represented by
an n × n matrix, and conversely, how any n × n matrix can be used to
define a bilinear pairing on Rn. The same is true more generally for
bilinear pairings on an n-dimensional vector space V .
Inner Products – Matrix Representation for Arbitrary
Vector Spaces
Theorem
Let V be a vector space over R of dimension n, and let P = ⟨−,−⟩ be a
bilinear pairing on V. Then for any basis S of V (not necessarily
orthogonal) there is a matrix A = A_{P,S}, depending on both P and S, such
that for every v, w ∈ V there is an equality (of real numbers)
⟨v, w⟩ = [v]_Sᵀ ∗ A ∗ [w]_S   (3)
where [v]_S denotes the coordinate vector of v relative to S. Conversely,
any n × n matrix A determines a bilinear pairing ⟨−,−⟩ = ⟨−,−⟩_A on V by
the above equation. For a fixed basis S this establishes a 1-1
correspondence between the set of bilinear pairings on V and the set of
n × n real matrices. Moreover, the bilinear pairing is symmetric iff its
matrix representation is a symmetric matrix, and the symmetric bilinear
pairing is positive definite iff its symmetric matrix representation is
positive definite (as defined above).
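For computations it helps to see how such a matrix arises in a small case. The sketch below is a minimal Octave/MATLAB illustration; it assumes the standard entry formula A(i, j) = ⟨sᵢ, sⱼ⟩ for the basis vectors sᵢ ∈ S, which is not spelled out on the slide but is the usual construction:

    % Matrix of a bilinear pairing relative to a non-standard basis S of R^2
    s1 = [1; 1];  s2 = [1; -1];     % basis S (columns of B); an arbitrary choice
    B  = [s1 s2];
    P  = @(x, y) x' * y;            % the pairing: here, the ordinary dot product

    A = [P(s1, s1) P(s1, s2);       % assumed entry formula A(i,j) = P(s_i, s_j)
         P(s2, s1) P(s2, s2)];

    % Check equation (3) on an arbitrary pair of vectors
    v = [4; 2];  w = [-1; 3];
    cv = B \ v;  cw = B \ w;        % coordinate vectors [v]_S and [w]_S
    assert(abs(P(v, w) - cv' * A * cw) < 1e-10);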
Orthogonal Vectors and Subspaces
The concept of orthogonality is dependent on the choice of inner product.
So assume first that we are working with the standard dot product in Rn.
Definition of orthogonality for a set of vectors
Two vectors v, w are orthogonal if they are non-zero and v ·w = 0; we
indicate this by writing v ⊥ w. Orthogonality with respect to this standard
inner product corresponds to our usual notion of perpendicular. More
generally, a collection of non-zero vectors {vi} is said to be orthogonal if
they are pairwise orthogonal; in other words, vi · vj = 0 for all i 6= j .
Orthogonal Vectors and Subspaces
The notion of orthogonality extends to subspaces and pairs of sets.
Definition of orthogonality for subspaces
If V, W ⊂ Rn are two non-zero subspaces, we say V and W are
orthogonal (written V ⊥ W) if v · w = 0 ∀ v ∈ V, w ∈ W. As with a
collection of vectors, a collection of subspaces {Vi} is orthogonal iff it is
pairwise orthogonal: Vi ⊥ Vj ∀ i ≠ j. Finally, we say that two sets of
vectors S, T are orthogonal if the dot product of any vector in S with any
vector in T is zero. In this case we again write S ⊥ T.
Examples
Let V = Span{[2 3]ᵀ}, W = Span{[3 (−2)]ᵀ} ⊂ R2. Then it is easy to
check that V and W are orthogonal (geometrically, they are represented
by two lines in R2 passing through the origin and forming a 90° angle
between them).
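The check reduces to a single dot product of the spanning vectors (see the theorem below); in Octave/MATLAB:

    % V = Span{[2 3]'} and W = Span{[3 -2]'} are orthogonal subspaces of R^2:
    % by the spanning-set criterion below, it suffices to check the basis vectors.
    v = [2; 3];
    w = [3; -2];
    v' * w        % 2*3 + 3*(-2) = 0, so V and W are orthogonal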
Orthogonal Vectors and Subspaces
If V , W are two non-zero subspaces, it would be impossible to verify if
they are orthogonal by computing every possible dot product of a vector in
V with a vector in W . We need an efficient way of checking this.
Theorem
If V = Span{S},W = Span{T} are subspaces of Rn, then V ⊥W iff
S ⊥ T .
This applies in particular when S and T are minimal spanning sets (i.e.,
bases).
Definition of Orthogonal Complement
If W ⊂ Rn is a subspace, its orthogonal complement is given by
W⊥ := {v ∈ Rn | v · w = 0 ∀ w ∈ W}. W⊥ is the largest subspace of Rn
that is orthogonal to W.
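One way to compute an orthogonal complement numerically is sketched below, assuming W is given as the column space of a matrix B; it uses the standard fact that the orthogonal complement of the column space of B is the null space of Bᵀ:

    % Orthogonal complement of W = Span{[2 3]'} in R^2
    B     = [2; 3];       % the columns of B span W
    Wperp = null(B');     % orthonormal basis for W-perp = null space of B'
    B' * Wperp            % ~0: every basis vector of W-perp is orthogonal to W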
Orthogonal Vectors and Subspaces
Orthogonality is connected to the property of linear independence.
Lemma
If {v1, . . . , vm} is an orthogonal set of vectors in Rn, then it is linearly
independent.
Orthogonal Basis – Definition
An orthogonal basis is a basis consisting of orthogonal vectors.
Given this definition, it is natural to ask whether orthogonal bases always
exist. That question will be answered below.
Orthogonal Matrices
An orthogonal set of vectors {u1,u2, . . . ,un} is said to be orthonormal if
‖ui‖ = 1, 1 ≤ i ≤ n.
Clearly, given an orthogonal set of non-zero vectors {v1, v2, . . . , vn}, one
can orthonormalize it by setting ui = vi/‖vi‖ for each i.
We call an n × n matrix A orthogonal if the columns of A form an
orthonormal set of vectors.
Lemma
An n × n matrix A is orthogonal iff AT ∗ A = I.
Lemma
An n × n matrix A is orthogonal iff
v ·w = (A ∗ v) · (A ∗w)
for all v,w ∈ Rn.
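As a quick numerical illustration of these two lemmas, here is a sketch in NumPy (the rotation matrix below is just illustrative data): it checks both the condition AT ∗ A = I and the preservation of dot products.

import numpy as np

# A rotation of R^2 by 30 degrees; its columns form an orthonormal set.
theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# First lemma: A is orthogonal iff A^T * A = I.
print(np.allclose(A.T @ A, np.eye(2)))        # True

# Second lemma: orthogonal matrices preserve the dot product.
rng = np.random.default_rng(0)
v, w = rng.standard_normal(2), rng.standard_normal(2)
print(np.isclose(v @ w, (A @ v) @ (A @ w)))   # True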
Projection onto a 1-Dimensional Subspace
Suppose V = Span{v} is a 1-dimensional subspace of Rn (so that v ≠ 0).
Then given w ∈ Rn, we define the projection of w onto V to be

prV (w) := ((v · w)/(v · v)) v

Note that this quantity makes sense, as v ≠ 0 implies v · v > 0.
Theorem
The vector prV (w) depends only on the vector w and the subspace V . In
other words, it does not depend on the choice of basis for V .
The vector prV (w) represents the V -component of w (in some texts this
projection is also referred to as the component of w in the direction of v;
we prefer the subspace interpretation, as it makes clear that the projection
does not depend on the choice of basis vector).
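The following is a minimal sketch of this projection in NumPy (the function name and the sample vectors are illustrative choices, not from the text); as the theorem asserts, replacing v by a non-zero multiple of v leaves the result unchanged.

import numpy as np

def proj_onto_line(v, w):
    # Projection of w onto V = Span{v}; assumes v is non-zero.
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    return (np.dot(v, w) / np.dot(v, v)) * v

w = np.array([3.0, 4.0])
v = np.array([1.0, 1.0])
print(proj_onto_line(v, w))        # [3.5 3.5]
print(proj_onto_line(5 * v, w))    # same vector: the result depends only on Span{v}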
Projections onto m-Dim. Subspaces and Gram-Schmidt
We know that every non-zero vector space admits a basis. It is natural
then to ask: does every non-zero inner product space admit an orthogonal
basis?
The answer is: yes, it does. In fact, given a basis for an inner product
space, there is a systematic way to convert it into an orthogonal basis.
And this method simultaneously provides a method for projecting a vector
onto a subspace.
Again, we discuss the procedure first for Rn equipped with its standard
scalar product, then show how this naturally extends to more general inner
product spaces.
Let {v1, v2, . . . , vn} be a basis for Rn. We will give an inductive procedure
for constructing an orthogonal basis for Rn from this original set.
First, some notation. Let Wi := Span{v1, . . . , vi} be the span of the first i
vectors in the set.
Since any subset of a linearly independent set is itself linearly
independent, we see that Dim(Wi ) = i for 1 ≤ i ≤ n, with Wn = Rn.
Now {v1} is an orthogonal basis for W1, since it has only one element. We
set u1 = v1, and consider the vector v2.
This need not be orthogonal to v1, but it cannot be simply a scalar
multiple of v1 either, since that would imply that the set {v1, v2} was
linearly dependent, contradicting what we know.
So we define
u2 := v2 − prW1(v2)
As we have just observed, u2 6= 0.
Lemma
The dot product u1 · u2 is zero, and Span({u1,u2}) = Span({v1, v2}).
Hence {u1,u2} is an orthogonal basis for W2.
We now suppose that we have constructed an orthogonal basis
{u1, . . . ,um} for Wm. We need to show how this can be extended to
Wm+1 when m < n. First, for v ∈ Rn, we define the projection of v onto
Wm to be

prWm (v) := Σ_{i=1}^{m} ((ui · v)/(ui · ui)) ui ∈ Wm

Again, if v ∉ Wm, then v ≠ prWm (v), and so their difference will not be
zero. As above, we then set

um+1 = vm+1 − prWm (vm+1)
Lemma
If m < n, then {u1, . . . ,um,um+1} is an orthogonal basis for Wm+1.
Continuing in this fashion, we eventually reach the case m = n at which
point the algorithm is complete.
Note that this procedure depends not only on the basis but also on the
order in which we list the basis elements: changing the order will, in
general, result in a different orthogonal basis for Rn. Note also that this
procedure works just as well if we start
with a subspace of Rn, together with a basis for that subspace.
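The inductive procedure described above can be written out directly. The following is a sketch in NumPy (the function name gram_schmidt and the sample basis are illustrative, not part of the text); it takes the ordered basis v1, . . . , vm and returns the orthogonal basis u1, . . . , um produced by the algorithm.

import numpy as np

def gram_schmidt(basis):
    # Rows of `basis` are v1, ..., vm; returns the rows u1, ..., um.
    us = []
    for v in np.asarray(basis, dtype=float):
        # Subtract the projection of v onto the span of the u_i built so far.
        u = v - sum((np.dot(w, v) / np.dot(w, w)) * w for w in us)
        us.append(u)
    return np.array(us)

# Illustrative (ordered) basis of R^3.
V = [[1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
U = gram_schmidt(V)
G = U @ U.T                                    # Gram matrix of the output
print(np.allclose(G, np.diag(np.diag(G))))     # True: the u_i are pairwise orthogonal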
Theorem
For any subspace W of Rn and basis S = {v1, . . . , vm} for that subspace,
the Gram-Schmidt algorithm produces an orthogonal basis {u1, . . . ,um}
for W , which depends only on the ordering of the initial basis elements in
S. Given this orthogonal basis for W and an arbitrary vector v ∈ Rn, the
projection of v onto W , or the W -component of v is given by
prW (v) := Σ_{i=1}^{m} ((ui · v)/(ui · ui)) ui

The reason for calling this projection the W -component of v is more or
less clear, since the equation

v = prW (v) + (v − prW (v))

gives v as a sum of i) its component in W , and ii) its component in W⊥.
Moreover, Rn = W ⊕W⊥, so this decomposition of v is unique. In other
words:
Theorem
For any non-zero subspace W ⊂ Rn, prW (v) is the unique vector in W for
which the difference v− prW (v) lies in W⊥.
As we are about to see, this is equivalent to saying that it is the vector in
W closest to v, where distance is in terms of the standard Euclidean
distance for Rn based on the scalar product.
Least-Squares Approximation
As before, we are given a subspace W ⊂ Rn and a vector v ∈ Rn. The
questions are then:
(LS1) Is there a vector wv ∈W satisfying the property that
‖wv − v‖ ≤ ‖w− v‖ for all w ∈W ?
(LS2) If such a vector exists, is it unique?
The answer to both questions is “yes.” More precisely, the vector in W we
are looking for is
exactly the projection of v onto W :
Theorem
wv = prW (v).
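A small numerical sketch of this statement (in NumPy, with a subspace W ⊂ R3 for which an orthogonal basis is written down directly; the data is purely illustrative): the residual v − prW (v) is orthogonal to W , and no vector of W that we sample is closer to v.

import numpy as np

# Orthogonal basis {u1, u2} for a plane W in R^3, and a vector v to project.
u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([1.0, -1.0, 0.0])
v = np.array([1.0, 2.0, 3.0])

# pr_W(v) = sum_i ((u_i . v)/(u_i . u_i)) u_i
proj = sum((np.dot(u, v) / np.dot(u, u)) * u for u in (u1, u2))
residual = v - proj

# The residual lies in W-perp: it is orthogonal to both basis vectors of W.
print(np.isclose(residual @ u1, 0.0), np.isclose(residual @ u2, 0.0))   # True True

# pr_W(v) is at least as close to v as any other vector of W we sample.
rng = np.random.default_rng(1)
others = [a * u1 + b * u2 for a, b in rng.standard_normal((100, 2))]
print(all(np.linalg.norm(v - proj) <= np.linalg.norm(v - w) for w in others))  # True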
The vector wv = prW (v) is referred to as the least-squares
approximation of v by a vector in W , because prW (v) satisfies the
property that ‖v− prW (v)‖2, which is computed as a sum of squares of
differences in coordinates, is minimized.
This turns out to have an important application to finding the best
approximation to a system of equations in the event no actual solution
exists. We discuss this next.
Least-Squares and the Fundamental Subspaces Theorem
Suppose we are given a matrix equation A ∗ x = b with x a vector variable
taking values in Rn, and b a fixed vector in Rm (implying that A is an
m × n matrix).
The consistency theorem for systems of equations tells us
that the equation is consistent precisely when b is in the span of the
columns of A, or alternatively, when b ∈ C(A).
But what if it is not; in other words, what if the system is inconsistent?
Up until now we have simply left it at that: inconsistency was the end of
the story. But it need not be, because whether or not the original system
is consistent, one
can always find a solution to the related equation
A ∗ x = prC(A)(b) (4)
because the projection prC(A)(b) of b onto the column space of A will
always be in the column space of A regardless of whether or not the
original vector b is.
The question then becomes:
given that we know (4) has at least one
solution, how do we go about finding it (or them)?
The starting point for answering that question is the following theorem,
often referred to as the Fundamental Subspaces theorem (originally proven
by Gauss)
Theorem
For any m × n matrix A, there are equalities
C(A)⊥ = N(AT )
C(A) = N(AT )⊥
C(AT )⊥ = N(A)
C(AT ) = N(A)⊥
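These equalities are easy to probe numerically. The sketch below (NumPy, with an illustrative 3 × 2 matrix) extracts a basis of N(AT ) from the singular value decomposition and checks that it is orthogonal to the columns of A, i.e. to C(A); comparing dimensions then gives C(A)⊥ = N(AT ) for this example.

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 3.0]])           # 3 x 2 with full column rank

# Rows of Vt attached to zero singular values of A^T span N(A^T).
U, s, Vt = np.linalg.svd(A.T)
s_full = np.concatenate([s, np.zeros(Vt.shape[0] - s.size)])
N = Vt[s_full < 1e-10]               # basis (as rows) for N(A^T)

# Every vector of N(A^T) is orthogonal to every column of A, so N(A^T) ⊆ C(A)⊥ ...
print(np.allclose(N @ A, 0.0))                                  # True
# ... and the dimensions match: dim N(A^T) = m - rank(A) = dim C(A)⊥.
print(N.shape[0] == A.shape[0] - np.linalg.matrix_rank(A))      # True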
Write b′ for prC(A)(b). Then
A ∗ x = b′
⇔ A ∗ x − b ∈ C(A)⊥ (as b′ is the unique vector in C(A) with b′ − b ∈ C(A)⊥)
⇔ A ∗ x − b ∈ N(AT ) (by Theorem 31)
⇔ AT ∗ A ∗ x − AT ∗ b = AT ∗ (A ∗ x − b) = 0
⇔ AT ∗ A ∗ x = AT ∗ b
This last equation AT ∗ A ∗ x = AT ∗ b has the same set of solutions as
the equation that started the sequence, namely A ∗ x = b′, and is therefore
always consistent. It is derived from our original equation A ∗ x = b by
simply multiplying both sides on the left by AT , and is often referred to as
the associated normal equation of the original matrix equation from which
it was derived.
This yields a straightforward procedure for finding the least-squares
solution to our original equation A ∗ x = b; i.e., a solution to the
associated normal equation AT ∗ A ∗ x = AT ∗ b, which by the above is
equivalent to a solution to the related equation A ∗ x = prC(A)(b).
Note that the original equation is consistent precisely when b ∈ C(A), or
equivalently when b = prC(A)(b); in other words, when the least-squares
solution is an exact solution. The advantages to seeking a least-squares
solution are i) it always exists (regardless of whether or not the original
equation is consistent), and ii) it yields an actual solution whenever one
exists.
Because this procedure finds the least-squares solution first, it can also be
applied to finding the least-squares approximation to b as
prC(A)(b) = A ∗ x, where x is a least-squares solution to the original
equation.
The steps are:
1. Form the associated normal equation AT ∗ A ∗ x = AT ∗ b;
2. find the solution(s) to the normal equation by computing
rref ([AT ∗ A | AT ∗ b]). These will be the least-squares solution(s) to
the original equation;
3. for any least-squares solution x from Step 2, compute A ∗ x. This will
yield the least-squares approximation prC(A)(b) to b by a vector in the
column space of A.
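Here is a sketch of these three steps using SymPy's exact row reduction (the matrix A and vector b are illustrative, and b is chosen so that the original system A ∗ x = b is inconsistent).

import sympy as sp

A = sp.Matrix([[1, 0],
               [1, 1],
               [1, 2]])
b = sp.Matrix([6, 0, 0])      # b is not in C(A), so A*x = b is inconsistent

# Step 1: form the normal equation A^T * A * x = A^T * b.
AtA, Atb = A.T * A, A.T * b

# Step 2: row-reduce the augmented matrix [A^T*A | A^T*b].
R, pivots = AtA.row_join(Atb).rref()
x = R[:, -1]                  # unique here, since A has full column rank

# Step 3: A * x is the least-squares approximation pr_C(A)(b) to b.
print(x.T)                    # Matrix([[5, -3]])
print((A * x).T)              # Matrix([[5, 2, -1]])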
Again, there will only be one least-squares approximation to b by a vector
in C(A), because we have already seen such a vector is unique. However,
there may be more than one least-squares solution to the original
equation. Thus another consequence of this theory is
Corollary
The value of A ∗ x remains constant as x ranges over the set of
least-squares solutions to the matrix equation A ∗ x = b.
A final question then remains: when will there be a unique least-squares
solution? We say that the matrix A has full column rank (or just full rank
when there is no confusion) if the columns of A are linearly independent;
that is, rank(A) = n. If A is m × n, this imposes the constraint that
m ≥ n (otherwise the rank would have to be less than n, the number of
columns). A useful fact about the ranks of matrices (which we do not
prove here) is
Lemma
For any matrix A, rank(A) = rank(AT ∗ A). In particular, A has full
column rank iff AT ∗ A is non-singular.
As the normal equation is always consistent, we see
Theorem
AT ∗ A ∗ x = AT ∗ b will have a unique solution precisely when
N(AT ∗ A) = {0}, which happens iff AT ∗ A is non-singular. In this case,
the unique least-squares solution is given by
x = (AT ∗ A)−1 ∗ AT ∗ b
and the least-squares approximation to b by a vector in C(A) is
prC(A)(b) = A ∗ x = A ∗ (AT ∗ A)−1 ∗ AT ∗ b
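Continuing the illustrative example from the earlier sketch, now in NumPy: when A has full column rank, AT ∗ A is invertible and the formulas of the theorem can be evaluated directly. (In practice one would usually call np.linalg.lstsq or use a QR factorization rather than forming the explicit inverse; the direct computation here is only meant to mirror the theorem.)

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# The lemma: rank(A) = rank(A^T * A) (both equal 2, so A^T*A is non-singular).
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))   # 2 2

# Unique least-squares solution x = (A^T A)^{-1} A^T b ...
x = np.linalg.inv(A.T @ A) @ A.T @ b
# ... and the least-squares approximation pr_C(A)(b) = A x.
print(x, A @ x)                                    # [ 5. -3.]  [ 5.  2. -1.]

# Agrees with NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_lstsq))                     # True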
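To make the theorem concrete, here is a minimal numpy sketch; the matrix A and vector b are made-up values, chosen only so that A has full column rank. It forms and solves the normal equation and cross-checks the answer against numpy's built-in least-squares routine.

import numpy as np

# A made-up 4x2 matrix of full column rank and a right-hand side b in R^4.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [2.0, 1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# Normal equation: (A^T * A) x = A^T * b.  Full column rank makes A^T * A
# non-singular, so the least-squares solution is unique.  Solving the system
# is preferable to forming the inverse explicitly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against numpy's least-squares solver.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat, x_ref)        # the two answers should agree

# The least-squares approximation to b by a vector in C(A).
b_hat = A @ x_hat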
Applications of Least-Squares - Projection onto a subspace
The theorem stated at the end of the previous section provides an alternative, and often
more computationally efficient, method of computing the projection of a vector onto a
subspace W of R^n.
Previously we first had to construct an orthogonal basis for W. But given any basis
{v_1, . . . , v_m} for W, we can avoid orthogonalizing the basis by concatenating the basis
vectors to form the matrix A with A(:, i) = v_i, 1 ≤ i ≤ m, and then, for any vector
v ∈ R^n, computing the projection of v onto W = C(A) as
pr_W(v) = A ∗ (A^T ∗ A)^{-1} ∗ A^T ∗ v
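A minimal numpy sketch of this recipe; the basis vectors v1, v2 and the vector v below are made-up values used only for illustration.

import numpy as np

# A made-up basis for a 2-dimensional subspace W of R^4, stored as the
# columns of A, so that A(:, i) = v_i.
v1 = np.array([1.0, 0.0, 1.0, 0.0])
v2 = np.array([0.0, 1.0, 1.0, 1.0])
A = np.column_stack([v1, v2])

v = np.array([2.0, 3.0, 5.0, 7.0])     # the vector to be projected

# pr_W(v) = A (A^T A)^{-1} A^T v, computed by solving rather than inverting.
x = np.linalg.solve(A.T @ A, A.T @ v)
proj = A @ x

# The residual v - proj should be orthogonal to both basis vectors of W.
print(A.T @ (v - proj))                # approximately the zero vector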
Theorem
If an m × n matrix A has full rank, then B = A ∗ (A^T ∗ A)^{-1} ∗ A^T satisfies the identity
B ∗ B = B. (A matrix satisfying such an identity is called a projection matrix, since the
linear transformation it defines on R^m corresponds exactly to projection onto its column
space C(B).)
Left-multiplication by B then corresponds to the projection map pr_{C(A)}(−): for all v ∈ R^m,
pr_{C(A)}(v) = B ∗ v
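The idempotency asserted by the theorem is easy to check numerically; in the sketch below, A is the same kind of made-up full-column-rank matrix used earlier.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [2.0, 1.0]])                 # made-up, full column rank

B = A @ np.linalg.inv(A.T @ A) @ A.T       # projection matrix onto C(A), size 4x4

print(np.allclose(B @ B, B))               # True: B * B = B
v = np.array([1.0, 2.0, 3.0, 4.0])
print(np.allclose(B @ (B @ v), B @ v))     # projecting twice changes nothing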
Applications of Least-Squares - Polynomial Data Fitting
Suppose we are given n points {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} in the plane R^2,
with distinct x-coordinates. (In practice, such sets of points often arise as data based on
the measurement of some quantity, recorded as the y-coordinate, as a function of some
parameter recorded as the x-coordinate.)
We would like to find the equation of the line that best fits these points (exactly what
measure makes this line the best possible fit is explained below). If we write the equation
of the line as y = l(x) = c_0 + c_1 x for indeterminates c_0, c_1, then what we are looking
for is a least-squares solution to the n × 2 system of equations
c_0 + c_1 x_1 = y_1
c_0 + c_1 x_2 = y_2
...
c_0 + c_1 x_n = y_n
Note that, in this system, the x_i and y_i are constants, and we are solving for c_0 and c_1.
For n ≤ 2 there will always be a solution, but in the overdetermined case (n > 2) there
almost always fails to be one; hence the need to work in the least-squares setting.
Examples
We wish to find the least-squares fit by a linear equation to the set of
points (2, 3), (4, 6), (7, 10), (9, 14). This problem can be represented by the matrix equation
A_1 ∗ c = y
where A_1 = [1 2; 1 4; 1 7; 1 9] (rows separated by semicolons), c = [c_0 c_1]^T, and
y = [3 6 10 14]^T. We note that A_1 has full column rank, so the least-squares solution is
unique and is given by
c = (A_1^T ∗ A_1)^{-1} ∗ A_1^T ∗ y = [−0.18966 1.53448]^T
Thus the desired equation is given by
l(x) = −0.18966 + 1.53448x
We can also measure the degree to which this comes close to being an actual solution
(which would exist only if the points were collinear). Given c, the vector
y_1 := A_1 ∗ c = [2.8793 5.9483 10.5517 13.6207]^T
is (by the above) the least-squares approximation to y by a vector in the column space of
A_1 (accurate to 4 decimal places). The accuracy can then be estimated by the distance
from this approximation to the original vector y:
e_1 := ‖y − y_1‖ = 0.68229
The last computation in this example indicates what is being minimized when one fits
data points in this way.
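The worked example above is easy to reproduce numerically. A minimal numpy sketch, using the same four data points:

import numpy as np

x = np.array([2.0, 4.0, 7.0, 9.0])
y = np.array([3.0, 6.0, 10.0, 14.0])

A1 = np.column_stack([np.ones_like(x), x])   # rows of the form [1, x_i]
c = np.linalg.solve(A1.T @ A1, A1.T @ y)     # unique least-squares solution
print(c)                                     # approximately [-0.18966, 1.53448]

y1 = A1 @ c                                  # least-squares approximation to y
e1 = np.linalg.norm(y - y1)
print(e1)                                    # approximately 0.68229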
Remark
Using least-squares linear approximation to find the best linear fit to a set of n data points
{(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} produces the equation of a line l(x) = c_0 + c_1 x
which minimizes the sum of the squares of the vertical distances from the given points to
the line:
∑_{i=1}^{n} (y_i − l(x_i))^2
Unless the line is horizontal, the vertical distance is larger than the actual distance, which
is measured in the direction orthogonal to the line; minimizing the sum of squares of those
orthogonal distances would correspond geometrically to what one might normally think of
as a least-squares fit. However, this linear-algebraic approach provides a simple and
efficient method for finding a good approximating line, and it is exact whenever the points
are collinear.
The setup above provides a method for finding not just linear approximations but
higher-order ones as well; the linear algebra is essentially the same. To illustrate:
Examples
Suppose instead we were asked to find the least-squares fit by a quadratic equation to the
same set of points (2, 3), (4, 6), (7, 10), (9, 14). As before, this problem can be represented
by the matrix equation
A_2 ∗ c = y
where A_2 = [1 2 4; 1 4 16; 1 7 49; 1 9 81], c = [c_0 c_1 c_2]^T, and y = [3 6 10 14]^T.
We note that the matrix A_2 again has full column rank (it has rank 3). Therefore the
least-squares solution is unique and is given by
c = (A_2^T ∗ A_2)^{-1} ∗ A_2^T ∗ y = [0.960345 0.984483 0.050000]^T
Thus the desired equation is given by
q(x) = 0.960345 + 0.984483x + 0.050000x^2
Measuring the degree to which this comes close to being an actual solution (which would
exist only if the points all lay on the graph of a single quadratic), we compute
y_2 := A_2 ∗ c = [3.1293 5.6983 10.3017 13.8707]^T
which is (by the above) the least-squares approximation to y by a vector in the column
space of A_2 (accurate to 4 decimal places). The accuracy can then be estimated by the
distance from this approximation to the original vector y:
e_2 := ‖y − y_2‖ = 0.46424
As with the linear fit, the quantity being minimized is the sum of squares of the vertical
distances from the original points to the graph of this quadratic function. Notice the
modest improvement: from 0.68229 to 0.46424. Because the column space of A_2 contains
the column space of A_1, the least-squares approximation y_2 has to be at least as good as
the linear one y_1, and it will almost always be closer to the original vector y.
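The quadratic fit can be checked in the same way; a short sketch continuing with the same four points:

import numpy as np

x = np.array([2.0, 4.0, 7.0, 9.0])
y = np.array([3.0, 6.0, 10.0, 14.0])

A2 = np.column_stack([np.ones_like(x), x, x**2])   # rows [1, x_i, x_i^2]
c = np.linalg.solve(A2.T @ A2, A2.T @ y)
print(c)                              # approximately [0.960345, 0.984483, 0.050000]
print(np.linalg.norm(y - A2 @ c))     # approximately 0.46424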
We will illustrate our final point by looking at what happens if we go one
degree higher.
Examples
We will find the least-squares fit by a cubic equation to the same set of points
(2, 3), (4, 6), (7, 10), (9, 14). As before, this problem can be represented by the matrix
equation
A_3 ∗ c = y
where A_3 = [1 2 4 8; 1 4 16 64; 1 7 49 343; 1 9 81 729], c = [c_0 c_1 c_2 c_3]^T, and
y = [3 6 10 14]^T. The matrix A_3 still has full column rank (it has rank 4), so the
least-squares solution is unique and is given by
c = (A_3^T ∗ A_3)^{-1} ∗ A_3^T ∗ y = [−1.6 2.890476 −0.342857 0.023810]^T
Thus the desired equation is given by
f(x) = −1.6 + 2.890476x − 0.342857x^2 + 0.023810x^3
However, now when we compute the least-squares approximation we get
y_3 := A_3 ∗ c = [3 6 10 14]^T
which is not just an approximation but the vector y itself: e_3 = 0. In other words, given
these four points, there is a unique cubic equation which fits the points exactly. Inspecting
the computation more carefully, we see why: the matrix A_3 is both full rank and square;
in other words, it is non-singular.
In this case the system of equations is no longer overdetermined but balanced, and with a
non-singular coefficient matrix we get a unique exact solution. Symbolically, this can be
seen by noting that the non-singularity of A_3 yields a simplified expression for c,
confirming that it is indeed an exact solution:
c = (A_3^T ∗ A_3)^{-1} ∗ A_3^T ∗ y = A_3^{-1} ∗ (A_3^T)^{-1} ∗ A_3^T ∗ y = A_3^{-1} ∗ y
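Since A_3 is square and non-singular, the least-squares solution is simply the exact solution of A_3 ∗ c = y; a brief sketch confirming this:

import numpy as np

x = np.array([2.0, 4.0, 7.0, 9.0])
y = np.array([3.0, 6.0, 10.0, 14.0])

A3 = np.column_stack([x**0, x, x**2, x**3])   # 4x4, non-singular since the x_i are distinct
c = np.linalg.solve(A3, y)                    # exact solution; no least-squares step needed
print(c)                        # approximately [-1.6, 2.890476, -0.342857, 0.023810]
print(np.allclose(A3 @ c, y))   # True: the cubic passes through all four points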
This set of examples, in which we compute successively higher-order approximations to a
set of n data points until we finally arrive at an exact fit, is part of a more general
phenomenon, which we record, without proof, in the following theorem.
Theorem
Given n points {(x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)} in R^2 with distinct x-coordinates,
the least-squares fit by a polynomial of degree k is computed by finding the least-squares
solution to the matrix equation
A_k ∗ c = y
where y = [y_1 y_2 . . . y_n]^T and A_k is the n × (k + 1) matrix with A_k(i, j) = x_i^{j−1}.
Theorem (cont'd)
The matrix A_k will have full column rank for all k ≤ (n − 1), and so the least-squares
solution c is unique and given by
[c_0 c_1 . . . c_k]^T = c = (A_k^T ∗ A_k)^{-1} ∗ A_k^T ∗ y
with degree-k polynomial least-squares fit given by
p_k(x) = ∑_{i=0}^{k} c_i x^i
Because A_{n−1} is non-singular, there will be a polynomial of degree at most (n − 1)
which fits the points exactly. Moreover, the polynomial of degree at most (n − 1) which
accomplishes this will be unique.
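The theorem translates directly into a short, general routine. A possible numpy sketch (the helper name polyfit_ls and its interface are chosen here for illustration):

import numpy as np

def polyfit_ls(x, y, k):
    # Least-squares fit of a degree-k polynomial to the points (x_i, y_i).
    # Returns c = [c_0, c_1, ..., c_k] with p_k(t) = c_0 + c_1 t + ... + c_k t^k,
    # assuming the x_i are distinct and k <= n - 1, so A_k has full column rank.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.vander(x, N=k + 1, increasing=True)   # A_k(i, j) = x_i^(j-1)
    return np.linalg.solve(A.T @ A, A.T @ y)     # solve the normal equation

# The running example: degrees 1, 2, 3 reproduce e_1, e_2 and the exact cubic fit.
x = np.array([2.0, 4.0, 7.0, 9.0])
y = np.array([3.0, 6.0, 10.0, 14.0])
for k in (1, 2, 3):
    c = polyfit_ls(x, y, k)
    A = np.vander(x, N=k + 1, increasing=True)
    print(k, np.linalg.norm(y - A @ c))          # about 0.68229, 0.46424, 0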