Economics Teaching Materials
Lecture Notes in Linear Algebra
Chris D. Orme
September 2009
Economics
School of Social Sciences The University of Manchester Manchester M13 9PL
Lecture Notes in Linear Algebra
Chris D. Orme Economics
School of Social Sciences University of Manchester chris.orme@manchester.ac.uk
September 2009
© The University of Manchester
These notes are published with the permission of the University of Manchester and can be downloaded by any individual for their own private study. Permission must be obtained to copy and distribute for teaching purposes, outside of The University of Manchester.
Contents

Preface

1 Vectors
  1.1 Introductory remarks, notation, definition and concepts
  1.2 Manipulating vectors
    1.2.1 Scalar multiplication of vectors
    1.2.2 Addition of vectors
    1.2.3 Linear combinations of vectors
    1.2.4 Scalar (dot or inner) product
  1.3 Angle & orthogonality
    1.3.1 Orthogonality
  1.4 Exercises

2 Vector Spaces
  2.1 Introductory remarks
  2.2 Spaces and subspaces
  2.3 Linear dependence & independence
  2.4 Some results
  2.5 Exercises

3 Matrices & Rank
  3.1 Introductory remarks
    3.1.1 Some special matrices
  3.2 Basic matrix operations & definitions
  3.3 Multivariate random variables
  3.4 Matrix multiplication
    3.4.1 Some useful results
  3.5 Linear independence and rank
    3.5.1 Some results concerning rank
  3.6 Exercises

4 Determinants and the Inverse Matrix
  4.1 Introductory remarks
  4.2 Determinants
    4.2.1 Properties of determinants
    4.2.2 Expansion by Cofactors
  4.3 The inverse matrix
  4.4 Some further results concerning rank, determinants and inverses
  4.5 Exercises

5 Linear Transformations and Simultaneous Equations
  5.1 Introductory remarks
  5.2 Non-singular transformations
    5.2.1 Special cases: elementary transformations
  5.3 Simultaneous Linear Equations
    5.3.1 Existence and Uniqueness
    5.3.2 Homogeneous equations
    5.3.3 Some Examples
  5.4 Exercises

6 The Characteristic Value Problem
  6.1 Introductory remarks
  6.2 The characteristic polynomial and equation
  6.3 Symmetric Matrices
  6.4 The Diagonalisation of a Symmetric Matrix
  6.5 Exercises

7 Quadratic Forms
  7.1 Introductory remarks
  7.2 Definite and indefinite quadratic forms
  7.3 Properties of quadratic forms
  7.4 Projection matrices
    7.4.1 Orthogonal projections and least squares
  7.5 Applications in statistics
    7.5.1 Linear combinations of normal random variables
    7.5.2 Independent normal random vectors
    7.5.3 The distribution of some quadratic forms in normal random variables
  7.6 Exercises

8 Matrix Differential Calculus and Optimisation
  8.1 Introductory remarks and notation
  8.2 The Gradient vector and Jacobian matrix
    8.2.1 Real-valued function of a vector
    8.2.2 Vector function of a vector
  8.3 Using differentials to identify the derivative
    8.3.1 Identifying the derivative
  8.4 Unconstrained optimisation
    8.4.1 Global and local optima
    8.4.2 Stationary Points
    8.4.3 Hessian matrix
    8.4.4 Characterising stationary points
  8.5 Concavity and Convexity
    8.5.1 Concavity conditions
    8.5.2 Convexity conditions
  8.6 First and second order mean value expansions
    8.6.1 Real-valued functions I
    8.6.2 Real-valued functions II
    8.6.3 Vector functions
  8.7 Exercises

A Some useful results from statistical theory

B A swift revision of matrix algebra

C The classical linear regression model
  C.1 Ordinary Least Squares (OLS) Estimation
Preface
These lecture notes present a number of topics in linear algebra that, I believe, are necessary for the understanding, and subsequent analysis, of various problems in statistics/econometrics. Chapters 1-7 were originally prepared as a one term sequence of 18 lectures and 9 problem classes for specialist second year undergraduates, studying at the University of York over the period 1990-1995. These students had a good statistical background, from their first year of study, including some exposure to simple bivariate regression, probability and hypothesis testing. Concurrently, in their second year, they took a course in statistical theory (probability, distribution theory and inference) and a course in univariate and bivariate calculus. They were about to embark (in their third year) on an advanced econometric theory course. In recent years these notes (supplemented by a little material on matrix differential calculus) have been recommended to postgraduate (taught Masters) students at the University of Manchester as reference material underpinning a one semester course in econometrics. In the current edition, I have substantially extended the material on multivariate optimisation and the result is Chapter 8.
It is, therefore, intended that these lecture notes should provide an appropriate grounding in linear algebra for those wishing to study econometric theory. On reflection, the undergraduates at York were fortunate since, nowadays, most econometrics courses do without such preliminary training in linear algebra. Whilst most undergraduate (or graduate) texts in econometrics provide "appendices", or preliminary chapters, itemising important results, they do not necessarily offer much in the way of a deeper investigation of the technology which delivers these results. In my view this is unsatisfactory, from both the students' and the lecturer's perspectives. Of course, putting on another course and requiring students to invest in yet another text can be viewed as being costly (in the short run); even at Manchester! Hence, and in the absence of a self-contained lecture course in linear algebra that would, undoubtedly, make the students better prepared, these notes are made freely available to students studying econometrics at the University of Manchester. Although presented in book form, the content does not yet amount to a text book. The reader should bear in mind that these are, as the title states, lecture notes and the writing style reflects this.
In preparing, and updating, these notes, my aim was to give an intuitive feel for the subject at the conceptual level, whilst also adopting a reasonable amount of rigour, given the assumed background of the students. Since they are also written with a follow-on course in econometrics in mind, the notation employed is consistent with that of modern econometrics. However, the language of linear algebra, as developed here, will (I hope) be of use in many other areas of advanced economics courses in that it will enable students to express quite sophisticated concepts succinctly.
Each chapter concludes with a set of problems listed as exercises. Many of these were tried out on students during my time at York and I thank them for their efforts (and, sometimes, ingenuity) in solving the problems set. I must also thank various colleagues who have helped develop this material, although they may not know it! Especially Don Poskitt, Les Godfrey, Peter Lambert, Peter Smith and Mel Weeks. I inherited Don Poskitt's lecture notes when teaching the course for the first time, at York, and I am happy to acknowledge his influence, which must surely feature heavily in what follows. However, the notes have been redrafted a number of times and Don cannot be held responsible for what follows.

Finally, I am eternally grateful to Rashmi Sarmah, a PhD student at Manchester, for carefully transcribing the original from ChiWriter to Scientific Word.
Chris Orme
Manchester, September 2009
Chapter 1 Vectors
1.1 Introductory remarks, notation, definition and concepts

We use vectors as a compact notation to represent concepts/phenomena in applied mathematics/social sciences that, in some sense, cannot be represented by a single number. For example, the n prices, (p1, ..., pn), and quantity demands, (q1, ..., qn), can be represented in the following way:

p = [ p1 ]        q = [ q1 ]
    [ p2 ]            [ q2 ]
    [ .. ]            [ .. ]
    [ pn ],           [ qn ],

where pi denotes the price of good i, and qi the demand for good i; pi and qi are referred to as the elements of the vectors p and q, respectively, and we often denote this by writing p = {pi}, for example, where pi is often referred to as a typical element (or co-ordinate) of p.

In the above both p and q would be referred to as (n × 1) column vectors, or simply (n × 1) vectors, by which we imply that a vector is a column of numbers. In most texts it is usual to denote a vector in lower case bold face type (and I shall try to keep this convention in these notes wherever possible). Thus, when we see a, b, x or u we imagine a column of numbers.

In some cases it is convenient to think of a row of numbers, rather than a column, in which case we consider the transpose of a vector. In econometrics it is usual to define the transpose of the vector x as x' (although many authors employ x^T, instead of x', in order to avoid confusion with differentiation). Thus for the vector of prices, p, we have p' = [p1, ..., pn]. Note also that I employ square brackets, [...], or "round" brackets, (...), to enclose the elements of a vector.

Having introduced the idea of a transpose we now define how we can succinctly express total expenditure, m, on n goods:
m = Σ_{i=1}^n pi qi = p'q,

with the operation p'q known as the scalar (or inner, or dot) product. This is because the result of p'q is a scalar, m. We shall say more about the scalar product operation shortly.
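(A small numerical illustration, not part of the original notes: the scalar product can be computed directly, here in Python with numpy; the prices and quantities are made-up numbers chosen only for the example.)

```python
import numpy as np

# Illustrative prices and quantities for n = 3 goods (made-up numbers).
p = np.array([2.0, 5.0, 1.5])    # prices p1, p2, p3
q = np.array([4.0, 1.0, 10.0])   # quantities q1, q2, q3

# Total expenditure m = p'q, the scalar (inner, dot) product.
m = p @ q                        # equivalently np.dot(p, q)
print(m)                         # 2*4 + 5*1 + 1.5*10 = 28.0
```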
Historically, vectors arose naturally in mechanics as being two numbers (magnitude and direction) defining a force, and this is depicted by a directed line segment (an arrow) indicating magnitude and direction. It is, however, convenient to have all vectors 'starting' at the same point. In this sense, a vector is often thought of as a directed line segment emanating from the origin of a rectangular co-ordinate system. The simple case of such a (2 × 1) vector, x, is illustrated in Figure 1.1. The vector emanates from the origin, O, at an angle of θ to the horizontal co-ordinate axis and travels to the tip of the arrow head.

[Figure 1.1: Directed Line Segment: Vector]
In three dimensions we have, for example, x' = (x1, x2, x3), where the three numbers x1, x2 and x3 are the co-ordinates of a point (called the elements of the vector x), which together with the origin enables us to draw and define the vector; although from now on I shall "terminate" the vector with a dot (and not an arrow head) since all vectors emanate from the origin, so that there is no ambiguity about direction; see Figure 1.2.

Notice that the same vector is equivalently defined by the three quantities θ, φ and OX, where θ and φ are angles giving direction (the first with respect to the x1 axis and the second giving elevation with respect to the x1-x2 plane) and OX is the distance (from the origin) giving magnitude. Such diagrams get more difficult to draw as the number of elements in a vector increases.
[Figure 1.2: A vector in 3 dimensions]
1.2 Manipulating vectors
We have already introduced the transpose of an arbitrary (n × 1) vector x as x' = [x1, ..., xn]. This can be regarded simply as a notational device, as x and x' represent essentially the same information. We also write x = {xi}, meaning "let the vector x have typical element xi". The following concepts are important:

1. NULL VECTOR: 0 = {0}; that is, every element is zero.

2. EQUALITY: x = y iff xi = yi ∀ i; both x and y must be (n × 1).

3. INEQUALITY: x ≥ y iff xi ≥ yi ∀ i; both x and y must be (n × 1).

4. MAGNITUDE or EUCLIDEAN NORM: In the 2-dimensional case we can define the magnitude of a vector quite readily by using 'Pythagoras', and this extends quite naturally to 3 dimensions. For example, if x' = [x1, x2], then the length or magnitude of x is called the Euclidean Norm and is defined as:
||x|| = √(x1² + x2²),

where, by default, square roots are positive. We now simply generalise this for (n × 1) vectors, x' = {xi} = [x1, x2, ..., xn]:

||x|| = √(Σ_{i=1}^n xi²).
Note that this also defines the magnitude of a scalar, i.e., a (1 × 1) vector.
We can change the magnitude of a vector by scalar multiplication.
1.2.1 Scalar multiplication of vectors
If x = {xi}, then y = λx = (λx1, ..., λxn)' = {λxi}. In which case,

||y|| = ||λx|| = √(Σ_{i=1}^n λ² xi²) = √(λ² Σ_{i=1}^n xi²) = |λ| ||x||.

Note that if λ > 0 then |λ| = λ and the direction of x remains unchanged; if λ < 0, then x is rotated through 180° (π radians); see Figure 1.3, where we have the vector x, of length OX, and z = 2x of length OZ (twice that of x).

[Figure 1.3: Scalar multiplication]
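(A quick numerical check of ||λx|| = |λ| ||x||, added to the notes; the vector and scalar below are arbitrary choices for the sketch.)

```python
import numpy as np

x = np.array([3.0, 4.0])      # an arbitrary (2 x 1) vector, with ||x|| = 5
lam = -2.0                    # a negative scalar, so the direction of x is reversed

z = lam * x                   # scalar multiplication, element by element
print(np.linalg.norm(x))      # 5.0
print(np.linalg.norm(z))      # 10.0 = |lam| * ||x||
```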
1.2.2 Addition of vectors
The addition of vectors can be illustrated using The Parallelogram Law; see Figure 1.4.
[Figure 1.4: The Parallelogram Law for the addition of vectors]
Let x = {xi} and y = {yi}, both (n × 1). Then z = x + y is defined and is a (n × 1) vector such that z = {zi} = {xi + yi}, for i = 1, ..., n. The vector z is the directed line segment OZ. (Vectors can only be added together if they have the same number of elements.) Note:

x + y = y + x;
x + y + z = (x + y) + z = x + (y + z).

Subtraction is defined as a combination of scalar multiplication (by −1) and addition. Provided x and y are both (n × 1), then x − y = x + (−1)y and has typical element {xi − yi}.

The Euclidean Norm, ||x − y||, measures the distance between x and y; see Figure 1.5, where x − y is depicted, for two given vectors x and y, and OX (the length of x − y) measures the distance between x and y.

[Figure 1.5: The Distance between 2 vectors: XY]
The concepts of addition and scalar multiplication introduce the useful idea of the resolution of a vector into its components. Let x = {xi} be a (n × 1) vector, and ei, i = 1, ..., n, a sequence of n, (n × 1), vectors where ei has 1 as its ith element and 0 elsewhere. For example, if n = 3, then e1 = (1, 0, 0)', e2 = (0, 1, 0)' and e3 = (0, 0, 1)'. The ei are called unit vectors (they each have unit length) and one can think of them as 'defining' the axes in the n-dimensional co-ordinate system or n-dimensional Euclidean Space (see Chapter 2 for a further discussion of vector spaces). From the definition of the ei we have
x = x1 e1 + x2 e2 + ... + xn en = Σ_{i=1}^n xi ei.
x is said to be a linear combination of the n unit vectors. This is simply illustrated for the n = 2 case, with two vectors x and y, in Figure 1.6. For example, if x = [3, 4]', then x = 3e1 + 4e2, by the Parallelogram Law.

[Figure 1.6: Resolution of a vector]
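(An added sketch, not in the original notes, confirming the resolution of a vector into its unit-vector components for the n = 2 example above.)

```python
import numpy as np

e1 = np.array([1.0, 0.0])      # unit vectors defining the axes
e2 = np.array([0.0, 1.0])
x = np.array([3.0, 4.0])       # the vector x = (3, 4)'

# Resolve x into its components along the unit vectors: x = 3*e1 + 4*e2.
reconstruction = x[0] * e1 + x[1] * e2
print(np.array_equal(reconstruction, x))   # True
```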
We shall explore the very important concept of linear combinations of vectors in Chapter 2. For the present we simply note the following (general) definition of linear combinations.
1.2.3 Linear combinations of vectors
Let {x1, ..., xm} be a set of m, (n × 1), vectors and {λ1, ..., λm} a set of m scalars. Then the following (n × 1) vector, y, is called a linear combination of these m vectors {x1, ..., xm}:

y = Σ_{j=1}^m λj xj = λ1 x1 + ... + λm xm.

1.2.4 Scalar (dot or inner) product

We have already introduced this concept. In general, let x = {xi} and y = {yi} be two (n × 1) vectors. The scalar product between x and y is written x'y and defined as

x'y = Σ_{i=1}^n xi yi = Σ_{i=1}^n yi xi = y'x.
Note: x'y is NOT the same as xy', which is something entirely different. The latter is called an OUTER PRODUCT and will not be discussed until we talk about matrices in Chapter 3.

Using the scalar product, we find that x'x = ||x||², the squared "length" (or squared Euclidean Norm) of x.
1.3 Angle & orthogonality
The scalar product between two vectors is sometimes defined in terms of the angle between those two vectors. We consider two (2 × 1) vectors, x and y, as depicted in Figure 1.7.

[Figure 1.7: The Angle between Vectors]

In the above diagram, θ is the angle between the two vectors, x and y. The lengths of the vectors are ||x|| = OX and ||y|| = OY. The dashed line, XA, is perpendicular (that is, orthogonal) to OY. Elementary trigonometry tells us that

cos θ = OA/OX = OA/||x||,

so that ||x|| cos θ = OA = OY − AY = ||y|| − AY. Thus, squaring, we get

||x||² cos² θ = (||y|| − AY)² = ||y||² − 2 ||y|| AY + (AY)².    (1.1)
But, we can also write
(AY)² = (XY)² − (AX)²
      = ||y − x||² − ||x||² sin² θ,

because ||y − x|| = XY and AX = ||x|| sin θ. Substituting this into (1.1), and also remembering that AY = ||y|| − ||x|| cos θ, we get

||x||² cos² θ = ||y||² − 2 ||y|| (||y|| − ||x|| cos θ) + ||y − x||² − ||x||² sin² θ.

Since sin² θ + cos² θ = 1, we get (by taking ||x||² sin² θ to the left)

||x||² = −||y||² + 2 ||y|| ||x|| cos θ + ||y − x||².    (1.2)

Now, y − x = (y1 − x1, y2 − x2)', so that

||y − x||² = (y1 − x1)² + (y2 − x2)²
           = y1² + y2² + x1² + x2² − 2y1x1 − 2y2x2
           = ||y||² + ||x||² − 2y'x.

Then substituting this into (1.2) and re-arranging the terms yields

cos θ = y'x / (||y|| ||x||) = x'y / (||x|| ||y||).

Thus, cos θ = x'y / (||x|| ||y||) and this defines θ, the angle between x and y. In higher than three dimensions the angle between two vectors is difficult to imagine, but this expression for cos θ still obtains.
Let x and y be (n × 1) vectors; then

cos θ = x'y / (||x|| ||y||) = Σ_{i=1}^n xi yi / √((Σ_{i=1}^n xi²)(Σ_{i=1}^n yi²)),

and note that the Cauchy-Schwarz Inequality (see the Exercises at the end of this Chapter) ensures that, with this definition of angle, −1 ≤ cos θ ≤ 1, as would be required.
1.3.1 Orthogonality
If n = 2, the two vectors are said to be orthogonal (to one another) if they are at right-angles (or perpendicular, to one another). Clearly, in such cases cos(θ) = 0. In general, we define orthogonality in exactly the same way for (n × 1) vectors, x and y, and the definition of cos(θ) above implies that x and y will be orthogonal iff x'y = 0 (since both ||x|| > 0 and ||y|| > 0).

Note that the unit vectors are mutually orthogonal, so that ei'ej = δij, where δij is the Kronecker Delta, defined as δij = 1 if i = j and zero otherwise.
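(An added numerical sketch, with arbitrary vectors, computing cos θ = x'y/(||x|| ||y||) and checking that the unit vectors are mutually orthogonal; it is not part of the original notes.)

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([2.0, 0.0, 1.0])

# cos(theta) = x'y / (||x|| ||y||)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)               # 4 / (3 * sqrt(5)), approximately 0.596

# The unit vectors are mutually orthogonal: ei'ej = 0 for i != j.
e = np.eye(3)                  # columns are e1, e2, e3
print(e[:, 0] @ e[:, 1], e[:, 0] @ e[:, 2], e[:, 1] @ e[:, 2])   # 0.0 0.0 0.0
```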
1.4 Exercises
1. Given the vectors a' = (2, 1), b' = (1, 3), plot the vectors 2a, a + b and illustrate the use of the parallelogram rule to find a + b and 2a + b.

2. If a' = (3, 2, 1) and b' = (1, 5, 6), solve the following vector equation for x:
3a + 2x = 5b
Illustrate geometrically.
3. Let x and y be any two (n × 1) vectors.

   (a) Prove that (x'y)² ≤ ||x||² ||y||².

   (b) Thus deduce that |x'y| ≤ ||x|| ||y||, the Cauchy-Schwarz Inequality.

   (c) Using this result prove the triangle inequality:

       ||a − b|| + ||b − c|| ≥ ||a − c||.

       [Hint: write ||a − c||² as ||(a − b) + (b − c)||² and show that this is equal to ||a − b||² + ||b − c||² + 2(a − b)'(b − c). Then write x = a − b and y = b − c.]
       Verify, by direct computation, that this inequality holds for the special case a' = (3, 7, 1), b' = (9, 1, 4) and c' = (3, 0, 2). Illustrate geometrically.
4. Let (Xi, Yi) denote data, i = 1, ..., n, and define the (n × 1) vectors

   x' = (X1 − X̄, ..., Xn − X̄),   y' = (Y1 − Ȳ, ..., Yn − Ȳ),

   where X̄ and Ȳ are the respective sample means.
(a) Give a statistical interpretation to
i. the scalar product between x and y
ii. the scalar product between x with itself; and,
iii. the cosine between x and y.
(b) Suppose the Yi are independently and identically distributed normal random variables, each having mean μ and variance σ²; i.e., the Yi are iid N(μ, σ²). What is the exact sampling distribution of y'y/σ²?

(c) If the Xi are also iid normal, with mean and variance possibly different from those of the Yi, and independent of the Yi, then what is the exact sampling distribution of y'y/x'x, under the assumption that the two variances are equal?
5. Consider the following simple bivariate regression model

   Yi = α + β Xi + ui,   i = 1, ..., n.

   (a) Express the formula for the estimated ordinary least squares slope coefficient, β̂, as the ratio of two inner (or scalar) products.

   (b) By showing that

       β̂ = β + x'u/x'x = β + Σ_{i=1}^n ci ui,

       where u = (u1, ..., un)', x is defined as in Question 4, and the ci are (Xi − X̄)/x'x, i = 1, ..., n, prove the following result: "If the disturbance terms in the regression model are iid N(0, σ²), then (assuming that the Xi are prespecified constants) β̂ has a sampling distribution which is normal with mean β and variance σ²/x'x."
Chapter 2 Vector Spaces
2.1 Introductory remarks
The totality of all (n × 1) vectors is called the n-dimensional VECTOR SPACE, also sometimes called the n-dimensional EUCLIDEAN SPACE and denoted thus:

Rⁿ = {x : x' = (x1, ..., xn), xi ∈ R, i = 1, ..., n}.

In order to be able to define explicitly any one of these vectors, in this space, we only need to think of the n unit vectors ei, i = 1, ..., n, as introduced in the previous chapter. This is because any x ∈ Rⁿ can be written as a simple linear combination of these unit vectors (recall the discussion pertaining to the resolution of a vector into its component parts):

x = x1 e1 + ... + xn en = Σ_{i=1}^n xi ei.
Observe that:
(i) any vector x ∈ Rⁿ can be constructed as a linear combination of e1, ..., en;

(ii) the e1, ..., en represent a subset of the totality of all (n × 1) vectors, Rⁿ, but none of the ei can be expressed as a linear combination of the remaining ej, j ≠ i.

"(i)" says that {e1, ..., en} SPANS the whole space; and "(ii)" says the set of vectors {e1, ..., en} is LINEARLY INDEPENDENT. "(i)" and "(ii)" together say that {e1, ..., en} forms a BASIS for the n-dimensional vector space.

It is because the basis set contains a maximum of n vectors, from which any other (n × 1) vector can be constructed, that Rⁿ is called n-dimensional.
It is not so-called because all vectors in the space have n elements (or co-ordinates). Also note that a basis set of vectors is NOT unique: any set which is linearly independent and spans can serve as a basis.

To illustrate this last point consider R³, which has a basis of e1 = (1, 0, 0)', e2 = (0, 1, 0)' and e3 = (0, 0, 1)'. Any vector x ∈ R³ can be written as x = x1e1 + x2e2 + x3e3. Now suppose that, for a particular vector a ∈ R³, a3 ≠ 0. This means we can write e3 = (a − a1e1 − a2e2)/a3, which is a linear combination of {e1, e2, a}. Thus any vector which can be expressed as a linear combination of {e1, e2, e3} can equivalently be expressed as a linear combination of {e1, e2, a}:
x = x1 e1 + x2 e2 + x3 e3
  = x1 e1 + x2 e2 + x3 (a − a1 e1 − a2 e2)/a3
  = (x1 − (x3/a3) a1) e1 + (x2 − (x3/a3) a2) e2 + (x3/a3) a.
Therefore, the set {e1, e2, a} SPANS R³. The set is also LINEARLY INDEPENDENT since, as a3 ≠ 0 and both e1 and e2 have their third elements equal to zero, it is impossible to express any one of the vectors in {e1, e2, a} as a linear combination of the remaining two vectors. Thus {e1, e2, a} is also a BASIS for R³.
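(A numerical check of this example, added to the notes: stacking e1, e2 and a as the columns of a matrix and solving for the coordinates of an arbitrary x reproduces the algebra above. The particular values of a and x are hypothetical choices for the sketch.)

```python
import numpy as np

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
a = np.array([2.0, -1.0, 5.0])            # any a with a3 != 0
x = np.array([7.0, 3.0, -4.0])            # an arbitrary vector in R^3

B = np.column_stack([e1, e2, a])          # basis vectors as columns
coeffs = np.linalg.solve(B, x)            # coordinates of x in the basis {e1, e2, a}
print(np.allclose(B @ coeffs, x))         # True: x is a linear combination of {e1, e2, a}
```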
2.2 Spaces and subspaces
Definition 1 (Vector Space) A vector space, V, is a non-empty set (or collection) of vectors which is closed under the operations of addition and scalar multiplication. "Closed" means that if a and b ∈ V then αa ∈ V, βb ∈ V and a + b ∈ V, for any scalars α ∈ R and β ∈ R.

As noted, Rⁿ is sometimes referred to as the n-dimensional vector space. We immediately have the following result:

Proposition 1 A vector space must contain the null vector 0 = (0, ..., 0)'.

Problem 1 Left as an exercise.

Definition 2 (Subspace) A subspace, of Rⁿ, is a non-empty subset of Rⁿ which is also a vector space.
Example 1 Consider R³ as illustrated in Figure 1.2 with a single vector. All vectors lying in the x1-x2 plane form a subspace of R³; this subspace of (3 × 1) vectors is clearly two-dimensional, since all the vectors in this vector (sub)space reside in the 2-dimensional x1-x2 plane. Algebraically we only require two linearly independent vectors, e1 = (1, 0, 0)' and e2 = (0, 1, 0)', to generate all the (3 × 1) vectors in this subspace. A ray through the origin is a one-dimensional subspace; if the ray is at 45° to all axes then a basis is simply (1, 1, 1)'. Another example is depicted in this FLASH demo (http://media.humanities.manchester.ac.uk/humanities/screencasts/soss/economics/linalg/3D Vector Plane.exe). Here we have two vectors in R³: one red vector and one blue vector. On these two vectors we have rested a plane (which must pass through the origin). This plane is (of course) two-dimensional, but all points lying on the plane correspond to a vector in R³. The plane is a two-dimensional subspace of R³: it is a subspace because it contains the origin of R³, but not all vectors in R³ lie on it. It is two-dimensional because only two vectors in the subspace (e.g., the red vector and the blue vector) are required to generate all vectors in the subspace; i.e., any vector in the subspace can (by the parallelogram rule) be expressed as a linear combination of the red and blue vector. The red and blue vector form a basis for this two-dimensional subspace.
Definition 3 (Dimension) The dimension of a vector (sub)space is the number of vectors in its basis.

Definition 4 (Basis) A basis, of a vector space, is a set of vectors which is linearly independent and which spans the whole space.

Although a basis is not unique, the number of vectors contained in a basis for a given subspace is constant and is called the dimension of that subspace; see Proposition 8 below.
At this point we need to discuss in more detail the concepts of linear independence and spanning sets.
2.3 Linear dependence & independence
Linear independence/dependence is a concept that pertains to a SET of vectors (not individual vectors). A SET of m, (n × 1), vectors {x1, ..., xm} is said to be linearly independent if none of the vectors in the set can be expressed as a linear combination of the rest. If at least one of the xj (j = 1, ..., m) can be expressed as a linear combination of the rest then the SET is said to be linearly dependent.
Formally,
Definition 5 (Linear independence/dependence) If there exist scalars λj, j = 1, ..., m, such that

Σ_{j=1}^m λj xj = λ1 x1 + ... + λm xm = 0, with λj ≠ 0 for at least one j,

then the set is said to be linearly dependent. If the ONLY set of λj for which Σ_{j=1}^m λj xj = 0 is λj = 0 ∀ j, then the set is said to be linearly independent.
Proposition 2 A set of vectors which contains the null vector must be linearly dependent.
Proof. Trivial and left as an exercise.
Definition 6 (Spanning Set) A set of m, (n × 1), vectors {x1, ..., xm} is said to span a vector (sub)space if every vector in the space can be written as a linear combination of {x1, ..., xm}.

Note that a spanning set need not be linearly independent. For example, (1, 0)', (0, 1)' and (0, 2)' span R² but do not constitute a linearly independent set of vectors. This points to the essential difference between a spanning set and a basis: a basis is a set which contains the maximum number of linearly independent vectors in the space (strictly speaking, the maximum number of vectors that can exist within any linearly independent set of vectors in the space). (A spanning set must contain a basis set.)
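(A numerical illustration of this remark, added to the notes: the three vectors span R² yet are linearly dependent. The rank computation anticipates Chapter 3, where rank is defined formally.)

```python
import numpy as np

# The vectors (1,0)', (0,1)' and (0,2)' as the columns of a (2 x 3) matrix.
V = np.column_stack([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])

print(np.linalg.matrix_rank(V))   # 2: the columns span the whole of R^2 ...
print(V.shape[1])                 # ... but there are 3 of them, so the set is linearly dependent
```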
The ideas of linear independence/dependence and spanning sets are crucially bound up with the problem of finding solutions to a system of m linear equations in n unknowns, as we shall see later in these notes.
2.4 Some results

Proposition 3
1. If a set of vectors is linearly independent then any subset of these vectors is also linearly independent.
2. If a set of vectors is linearly dependent then any larger set must also be linearly dependent.
Proof.
1. (The proof is by contradiction.) Let {x1, ..., xm} be a linearly independent set. Suppose, however, that {x1, ..., xk} is linearly dependent, k < m (perhaps after some re-ordering of the vectors); then there must be some non-zero αi, i = 1, ..., k, such that Σ_{i=1}^k αi xi = 0. Then, letting λj = αj, j = 1, ..., k, and λj = 0 for j = k + 1, ..., m, we can write Σ_{j=1}^m λj xj = 0, where not all of the λj are zero. This is a contradiction, since {x1, ..., xm} is, by assumption, linearly independent. Therefore, the set {x1, ..., xk} cannot be linearly dependent for any k < m.

2. Proof is left as an exercise.
Proposition 4 Suppose k < m is the maximum number of linearly independent vectors in a set of m, (n × 1), vectors. Then given any linearly independent subset of k vectors from this set, every other vector in the set can be expressed as a linear combination of these k vectors. (k is the maximum number of linearly independent vectors. This means that any set of k + 1 vectors must be linearly dependent, and there exists at least one set of k vectors which is linearly independent.)
Proof. Proof is left as an exercise
Proposition 5 Consider a vector (sub)space V. The representation of any vector a ∈ V in terms of a basis is unique.

Proof. Let {x1, ..., xm} be a basis for V with a = Σ_{i=1}^m αi xi, for some a ∈ V. Suppose, however, that we can also write a = Σ_{i=1}^m βi xi. Upon subtraction we find that a − a = 0 = Σ_{i=1}^m (αi − βi) xi = Σ_{i=1}^m δi xi, say. But due to linear independence of the set {x1, ..., xm} (it is a basis) it must be that δi = 0 ∀ i, so that αi = βi ∀ i.

Proposition 6 The representation of any vector in terms of a linearly dependent set of vectors is not unique.
Proof. Proof is left as an exercise
Proposition 7 A basis is not unique.
Proof. Let A = {x1, ..., xm} be a basis for a vector (sub)space V. Consider any other vector y ∈ V, y ∉ A. From Proposition 5, the unique representation of y is y = Σ_{i=1}^m αi xi, in which it can be assumed (after some re-ordering if necessary) that α1 ≠ 0. To prove the result it is merely sufficient to show that B = {y, x2, ..., xm} is also a basis for V. We must therefore show that B spans V and is linearly independent.

1. B spans V.

From the expression for y we have x1 = (y − Σ_{i=2}^m αi xi)/α1, α1 ≠ 0. Thus any vector in V which can be expressed as a linear combination of x1 and x2, ..., xm is also expressible as a linear combination of {y, x2, ..., xm}. So B spans V.
2. B is linearly independent.

Consider the following equation:

γ1 y + Σ_{i=2}^m γi xi = 0.

If the only solution is γi = 0 ∀ i = 1, ..., m, then B must be linearly independent. Substituting for y we obtain

γ1 Σ_{i=1}^m αi xi + Σ_{i=2}^m γi xi = 0,  i.e.,  δ1 x1 + Σ_{i=2}^m δi xi = 0,  where δ1 = γ1 α1 and δi = γ1 αi + γi, i = 2, ..., m.

But A = {x1, ..., xm} is linearly independent, thus it must be that δ1 = δ2 = ... = δm = 0. However, by assumption α1 ≠ 0, so that γ1 = 0. Since γ1 must be zero, δi = γi = 0 for i = 2, ..., m. Therefore, γi = 0 ∀ i and so B is linearly independent.
Although a basis is not unique, the following Proposition shows that all bases (for a particular vector (sub)space) must contain the same number of vectors:
Proposition 8 Consider two alternative bases for a vector (sub)space V: A = {y1, ..., yr}, yi ≠ 0, i = 1, ..., r, and B = {x1, ..., xm}, xj ≠ 0, j = 1, ..., m. Then r = m and this number is called the dimension of the (sub)space and is the maximum number of vectors that can exist in any linearly independent set of vectors in V.

Proof. Suppose, without loss of generality, that A and B are mutually exclusive. Consider A, which is linearly independent and spans V. Each xj ∈ B can be expressed in terms of A; assume that x1 = Σ_{i=1}^r αi yi, with α1 ≠ 0, so that y1 = (x1 − Σ_{i=2}^r αi yi)/α1. From the Proof of Proposition 7, we know that A1 = {x1, y2, ..., yr} is a basis for V. Assume the representation of x2 in terms of A1 is x2 = β1 x1 + Σ_{i=2}^r βi yi, in which some of the βi (i = 2, ..., r) must be non-zero. (If this were not so then we would have x2 = β1 x1, which is a contradiction since x1 and x2 come from B, a linearly independent set of vectors.) Therefore we can assume that β2 ≠ 0 and consequently replace y2 with x2 in A1 to form a new basis A2 = {x1, x2, y3, ..., yr}. With this basis we can write x3 as x3 = γ1 x1 + γ2 x2 + Σ_{i=3}^r γi yi, and similar arguments to those above show that γi ≠ 0 for some i ≥ 3. We can thus assume that γ3 ≠ 0 and replace y3 by x3 to form a new basis A3 = {x1, x2, x3, y4, ..., yr}. This process can continue until all the remaining xj in B have been substituted into the basis. For each substitution, one of the yi is replaced. In order to complete the process there must be at least as many vectors in A as there are in B; i.e., r ≥ m. If r < m then there will come a point (before xm can be substituted in) when a contradiction concerning the linear independence of B arises. So we have r ≥ m.

Now reverse the process: starting with B as a basis, one-by-one substitute the vectors from A into the basis. The process is perfectly symmetric to the one described above, from which we therefore find m ≥ r.

Thus we have r ≥ m and m ≥ r. The reconciliation of these findings is r = m.
Corollary 1 Any set of m linearly independent vectors forms a basis for an m-dimensional vector (sub)space.

Proposition 9 Let V be an m-dimensional vector (sub)space. Any set of m mutually orthogonal (non-zero) vectors, {x1, ..., xm}, must form a basis for V. (The x1, ..., xm are mutually orthogonal if and only if xi'xj = 0, for all i ≠ j, and ||xi||² = xi'xi > 0.)

Proof. In the light of Corollary 1, all we have to do is prove that {x1, ..., xm} is linearly independent. To do so we consider the possibilities for λi which solve Σ_{i=1}^m λi xi = 0. Pre-multiplying by xj' yields Σ_{i=1}^m λi (xj'xi) = λj xj'xj = 0, since xj'xi = 0 for i ≠ j. But xj'xj = ||xj||² > 0. Therefore λj = 0, ∀ j = 1, ..., m, is the only possible solution to λj ||xj||² = 0, which implies that the set {x1, ..., xm} is linearly independent and thus a basis for the m-dimensional vector space V.
Definition 7 (Orthonormal Basis) Any set of m mutually orthogonal vectors, {x1, ..., xm}, satisfying ||xi|| = 1, for all i = 1, ..., m, is called an ORTHONORMAL basis for an m-dimensional (sub)space.

Proposition 10 Any set of m linearly independent vectors from an m-dimensional vector (sub)space can be converted into an orthonormal basis using the Gram-Schmidt Orthogonalisation Process.
Proof. Read up!
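(For readers wanting something concrete to 'read up' from, here is a minimal sketch of the Gram-Schmidt process applied to the columns of a matrix; it is an illustrative implementation added to the notes, under the assumption that the columns supplied are linearly independent.)

```python
import numpy as np

def gram_schmidt(X):
    """Orthonormalise the (assumed linearly independent) columns of X."""
    Q = []
    for j in range(X.shape[1]):
        v = X[:, j].astype(float)
        for q in Q:                         # subtract the components along earlier vectors
            v = v - (q @ v) * q
        Q.append(v / np.linalg.norm(v))     # rescale to unit length
    return np.column_stack(Q)

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])             # three linearly independent columns
Q = gram_schmidt(X)
print(np.allclose(Q.T @ Q, np.eye(3)))      # True: the columns of Q form an orthonormal basis
```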
Proposition 11 If V is a vector (sub)space with dimension equal to m and {x1, ..., xm} is a basis with each xj ∈ Rⁿ, then n ≥ m.

Proof. Since {x1, ..., xm} is a basis it is also linearly independent. Suppose n < m; then by Proposition 3, {x1, ..., xn} must also be linearly independent. However, by Corollary 1 it must be that {x1, ..., xn} is a basis for Rⁿ. This implies that for j = n + 1, ..., m, xj can be expressed as a linear combination of {x1, ..., xn}. This is a contradiction, as {x1, ..., xm} is a basis and thus a linearly independent set of vectors. Therefore, n ≥ m.

2.5 Exercises
1. Are the following sets of vectors linearly dependent? If so, find the appropriate linear combination which equates to the null vector.

   (a) (1, 0, 0)', (0, 1, 0)', (0, 0, 1)';
   (b) (1, 0, 0)', (0, 1, 0)', (0, 0, 1)', (123, −17, 0)';
   (c) (1, 2, 3)', (2, 3, 4)';
   (d) (1, 2, 3)', (2, 3, 4)', (3, 5, 7)';
   (e) (4, 1, −5)', (−2, 3, −1)', (4, −7, 3)';
   (f) (2, 3, 1, −1)', (2, 3, 1, −2)', (4, 6, 2, −3)'.
2. A non-empty set of vectors in Rⁿ is called a subspace of Rⁿ if it is closed under addition and scalar multiplication. That is, if both x and y are in the set then so are x + y and λx, for any scalar λ. Therefore, if the vectors x1, x2, ..., xm are in the set, all linear combinations Σ_{i=1}^m λi xi must also be in the set for it to be a subspace.

   (a) Show that a subspace of Rⁿ must always contain the null vector, 0 = (0, 0, ..., 0)'.
   (b) If x = (x1, x2, x3, x4)' is an arbitrary vector from R⁴, determine which of the following sets are subspaces of R⁴: all vectors with:

       i. x1 = x2 = x3 = x4;
       ii. x1 = x2, x3 = x4;
       iii. x4 = 0;
       iv. x1 = 1;
       v. x1, x2, x3, x4 all integers.
3. If x and y are vectors in Rⁿ then the projection of y onto x is defined as the vector (x'y / ||x||²) x.
(a) Provide a geometrical interpretation of this for n = 2:
(b) Show that if this vector is subtracted from y then the resulting
vector is orthogonal to x.
4. Show that if a set of vectors is linearly dependent, then any larger set of vectors is also linearly dependent.
5. Consider a set of m vectors, W = {w1, ..., wm}, where k < m is the maximum number of vectors that can exist in any linearly independent subset drawn from W. Let A = {a1, ..., ak} ⊂ W denote any linearly independent set of k vectors. Show that any vector x ∈ W, but x ∉ A, can be expressed as a linear combination of the vectors in A.

6. Find the dimension of each of the three subspaces of Rⁿ generated by the following sets of vectors:

   (a) {(1, −1, 2, 3)', (3, 2, 1, 0)', (4, 1, 3, 3)', (5, 0, 5, 6)'}
   (b) {(1, 0, 3, 1)', (1, 0, 2, 2)', (2, 0, 3, 3)', (3, 0, 2, 2)'}
   (c) {(1, 2, 1, 2)', (1, 3, 2, 1)', (2, 3, 1, 0)', (1, 2, 3, 0)'}

   Also give a basis for each subspace.

7. Show that if the (n × 1) vector u is orthogonal to the (n × 1) vector v, then every scalar multiple of u is orthogonal to v. Find a vector of unit length which is orthogonal to both v1 = (1, 1, 2)' and v2 = (0, 1, 3)' in R³.
Chapter 3 Matrices & Rank
3.1 Introductory remarks
In Chapter 2 we saw that a linear combination of a set of vectors {x1, ..., xm} is an important consideration. In order to pursue related theoretical concepts, as well as introducing an extremely important and useful definition, it is convenient to arrange a set of (column) vectors to form what is known as a matrix. For example, if we have n observations on m variables, where both n and m are positive integers, then we can introduce the (n × 1) vector xj to denote all n observations on variable j. In such a situation we might write xj = {xij}, i = 1, ..., n, where xij is the ith observation on variable j. Collecting all variables together we could write X = (x1, ..., xm), which would constitute a data matrix and is a rectangular array having n rows and m columns; we say that X is (n × m), (rows × columns). Of course, a matrix can be used to represent other things, apart from data.

In general, then, a (n × m) matrix X is a rectangular array of numbers:

X = [ x11  x12  ...  x1m ]
    [ x21  x22  ...  x2m ]
    [  .    .   ...   .  ]
    [ xn1  xn2  ...  xnm ]

and we write X = {xij}, meaning "X has typical element xij", where xij is the element in the ith row and jth column of X, for i = 1, ..., n, and j = 1, ..., m; that is, xij is the (i, j)th element of X.

Note that where x, a, z, etc., are used to denote vectors (which could be regarded as (n × 1) matrices), in general we use X, A, Z, etc., to denote matrices.
Example 2 Consider the (3 × 4) matrix

A = [ 1  2  3  4 ]
    [ 2  3  4  5 ]
    [ 3  4  5  6 ];

then a23 = 4 = a14 = a32, whilst a21 = a12 = 2 and a34 = 6.
A matrix has NO NUMERICAL VALUE itself. However, matrices are very useful and apart from anything else they afford the statistician and econometrician a convenient and compact notation.
3.1.1 Some special matrices
1. A square matrix, A, is any (n × n) matrix, and the elements a11, a22, ..., ann are termed the leading diagonal.

2. The identity matrix: In = (e1, e2, ..., en), the (n × n) matrix with ones on the leading diagonal and zeros elsewhere.

3. A diagonal matrix: D = diag(λ1, ..., λn), the (n × n) matrix with λ1, ..., λn on the leading diagonal and zeros elsewhere, with λi ≠ 0 for at least one i.
4. The null matrix: any matrix containing only zeros.
3.2 Basic matrix operations & deÖnitions
We can add, subtract and multiply matrices, under appropriate conditions (if we are careful), but direct DIVISION is NOT DEFINED: you should NEVER write ìA=Bîmeaning ìA divided by Bî- it is simply not done!
In the following let A = faijg and B = fbijg; for i = 1;:::;n; j = 1;:::;m; so that both A and B are (n m) :
1. EQUALITY:A=Bi§aij =bij foralliandj.
2. SCALAR MULTIPLICATION: B = A = faijg = faijg = A; for any scalar 2 R:
Consider a diagonal matrix D = diag () ; (n n); then we have D = In
3. ADDITION: C = A+B = faij +bijg = fbij +aijg = B+A: Triviallywehave: A+B+C=A+(B+C)=(A+B)+C;etc.
4. SUBTRACTION:C=A B=A+( 1)B=faij bijg:
5. TRANSPOSE: the transpose of A, denoted A0;is the (m n) ma- trix whose m rows are the m columns of A; i.e., if A = faijg; i = 1;:::;n;j = 1;:::;m; then A0 = fbjig; j = 1;:::;m;i = 1;:::n; where bji = aij:
Example 3 If

A = [ 1  2  3 ]         then A' = [ 1  4 ]
    [ 4  5  6 ],                  [ 2  5 ]
                                  [ 3  6 ].

6. TRACE (square matrices only): If A is (n × n), the trace of A, denoted tr(A), is the sum of the elements in the leading diagonal:

   tr(A) = Σ_{i=1}^n aii.

From the definition of transpose, we define the following:

Definition 8 (Symmetry) A is symmetric iff A' = A, and is skew-symmetric iff A = −A'. (In both cases A must be (n × n), i.e., square.)
3.3 Multivariate random variables
In statistics it is necessary to draw the distinction between a random variable X, say, which has some distribution; a possible realisation from that distribution, denoted x; and moments/parameters of that distribution, e.g., E[X] = μ. When considering multivariate random variables it is convenient to arrange all random variables under consideration into a vector. However, adopting the convention of upper case letters to denote random variables would lead us to write something like "X = (X1, ..., Xn)'" for a vector of n random variables (a random vector). This would be at odds with the notation developed so far. To get over this we shall adopt the rule of NOT distinguishing between a random variable and a realisation of that variable (a practice adopted by most econometricians). We thus write a multivariate random variable as x = (x1, ..., xn)', a (n × 1) random vector where the {xi}, i = 1, ..., n, are univariate random variables. If E[xi] = μi, we say that x has mean vector E[x] = μ = {μi} and note that E[x − μ] = 0. In general, when taking expectations of random vectors (or random matrices) we simply apply the expectation operator to each element in the vector (or matrix) and thus obtain the corresponding vector (matrix) of expectations.

From the definition of trace, introduced above, it is easy to see that, when dealing with a square matrix of random variables, expectation and trace are interchangeable. Let X be a square matrix of random variables, xij, such that E[X] = M = {E[xij]}. Then E[tr(X)] = tr(M). (The proof is trivial!)

In the case of a (n × 1) random vector, x, we define the following (n × n) variance-covariance matrix (sometimes called simply the covariance or variance matrix): var(x) = V = {cov(xi, xj)}, i, j = 1, ..., n. This matrix is square and provides a way of arranging variances of, and
covariances between, the elements in x. Observe that, writing V = {vij}, we have vii = var(xi) and vij = vji. The leading diagonal, being the elements v11, v22, ..., vnn (top left to bottom right), gives the variances, and the off-diagonal elements, vij, i ≠ j, are the covariance terms. Thus V is symmetric. (A variance matrix has other important properties, as we shall see in due course.)
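(A small simulation, added to the notes, showing the structure just described; the mean vector and variance-covariance matrix below are assumed values chosen for the illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 0.0, -2.0])                  # assumed mean vector
V = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])                  # assumed variance-covariance matrix

x = rng.multivariate_normal(mu, V, size=50_000)  # 50,000 draws of the (3 x 1) random vector

V_hat = np.cov(x, rowvar=False)                  # sample variance-covariance matrix
print(np.allclose(V_hat, V_hat.T))               # True: V_hat is symmetric
print(np.round(V_hat, 2))                        # close to V; the leading diagonal holds the variances
```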
3.4 Matrix multiplication
We now consider the possibility of multiplying, or taking the product of, two (or more) matrices. Such a product, when defined, is written C = AB and is fundamental to many of the ideas in statistics and econometrics. In the product AB, A is called the pre-multiplier and B the post-multiplier (A is post-multiplied by B). There are other matrix products which are commonly used, such as the Kronecker product A ⊗ B and the Hadamard product A ⊙ B, but these will not be discussed here.

However, matrix multiplication is NOT always applicable; in order for C = AB to be defined, the number of columns in A must equal the number of rows in B. Thus A = {aik} must be (n × r) and B = {bkj} must be (r × m); notice that n and m are arbitrary and not necessarily equal (although they may be). In such a situation the matrix product AB can be taken and the result is the (n × m) matrix C = {cij}, where

cij = Σ_{k=1}^r aik bkj,   i = 1, ..., n;  j = 1, ..., m.

Observe that, since the matrix product is not always defined, matrix multiplication is NOT commutative; i.e., AB ≠ BA, in general (BA may not be defined even though AB is).
To see how matrix multiplication works, write the matrix product in the following way:

[ c11 ... c1j ... c1m ]   [ a11 ... a1r ]
[  .       .       .  ]   [  .       .  ]  [ b11 ... b1j ... b1m ]
[ ci1 ... cij ... cim ] = [ ai1 ... air ]  [  .       .       .  ]
[  .       .       .  ]   [  .       .  ]  [ br1 ... brj ... brm ]
[ cn1 ... cnj ... cnm ]   [ an1 ... anr ]

from which we see that the typical element cij is, in fact, the scalar product between the ith row of A and the jth column of B. Thus, writing A as a stacked set of n, (1 × r), row vectors,

A = [ a1' ]
    [  .  ]     with ai' = (ai1, ..., air), (1 × r), i = 1, ..., n,
    [ an' ]

and B = [b1, ..., bm], bj = (b1j, ..., brj)', (r × 1), j = 1, ..., m, the (i, j)th element in the product AB can be written cij = ai'bj.
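(A quick check, with arbitrary matrices, that the (i, j)th element of AB is the scalar product of the ith row of A with the jth column of B; this sketch is an addition to the notes.)

```python
import numpy as np

A = np.array([[1.0,  2.0, 0.0],
              [3.0, -1.0, 4.0]])          # (2 x 3)
B = np.array([[2.0, 1.0],
              [0.0, 5.0],
              [1.0, 1.0]])                # (3 x 2)

C = A @ B                                 # the (2 x 2) product AB
i, j = 1, 0                               # check c_21 (zero-based indices)
print(C[i, j], A[i, :] @ B[:, j])         # both 10.0: row i of A dotted with column j of B
```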
Provided that the appropriate operations are defined, the following results hold:
1. ASSOCIATIVE LAW
(AB)C = A(BC) = ABC
2. DISTRIBUTIVE LAW
A (B + C) = AB + AC; (B + C) D = BD + CD
At this point, let us introduce the outer product between two vectors u and v. This is a particular example of matrix multiplication as applied to two vectors:
Definition 9 (Outer product) The outer product between u, (n × 1), and v, (m × 1), is the (n × m) matrix uv' = {ui vj}, i = 1, ..., n, j = 1, ..., m.

It is the operation of outer product which is used to construct a variance-covariance matrix:

Definition 10 (Variance-covariance matrix) Let x be a (n × 1) random vector with mean vector E[x] = μ. The variance-covariance matrix of x, when it exists, is constructed as:

var(x) = E[(x − μ)(x − μ)'] = {E[(xi − μi)(xj − μj)]} = {cov(xi, xj)}.

3.4.1 Some useful results

1. If A is any (n × m) matrix then:

   (a) In A = A Im = A;
   (b) 0p,n A = 0p,m and A 0m,q = 0n,q, where 0p,n is a (p × n) matrix of zeros (the null matrix).

2. (AB)' = B'A', provided the product AB is defined. (The proof is left as an exercise.)

3. tr(AB) = tr(BA), provided AB is defined and square. (The proof is left as an exercise.)

Note that, where the appropriate operations are defined, we have that tr(ABC) = tr(BCA) = tr(CAB).

4. For two (n × 1) vectors, u and v, tr(uv') = v'u = u'v. (The proof is trivial using 2, and noting that tr(·) is a scalar.)
We use matrix multiplication and trace to define a natural generalisation of the Euclidean Norm for a matrix:

Definition 11 (Matrix Euclidean Norm) ||A|| = √(Σ_{i=1}^n Σ_{j=1}^m aij²) is the Euclidean norm of an arbitrary (n × m) matrix, A, and is equal to

||A|| = √(tr(A'A)) = √(tr(AA')).

(You will be asked to show in the Exercises at the end of this Chapter that ||A||² = tr(A'A) = tr(AA').)
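(A numerical check, added to the notes, of tr(AB) = tr(BA) and of ||A||² = tr(A'A) = tr(AA'), using randomly generated matrices.)

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 4))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))      # True: tr(AB) = tr(BA)
print(np.isclose(np.sum(A**2), np.trace(A.T @ A)))       # True: ||A||^2 = tr(A'A)
print(np.isclose(np.trace(A.T @ A), np.trace(A @ A.T)))  # True: tr(A'A) = tr(AA')
```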
3.5 Linear independence and rank
Observe that a linear combination of vectors can be regarded as a linear combination of the columns of a matrix. If A = {aij}, (n × m), and we write Σ_{j=1}^m aij xj = bi, for some xj not all zero, then this is equivalent to

x1 a1 + x2 a2 + ... + xm am = Σ_{j=1}^m xj aj = b,   (n × 1),

where aj = (a1j, ..., anj)' and b = {bi}. The vector b is, therefore, a linear combination of the columns of A. From the definition of the matrix product, such an equation can, alternatively, be written Ax = b, where the (m × 1) vector x = {xj} and the (n × m) matrix A = [a1, a2, ..., am] = {aij} is the collection of the m, (n × 1), vectors aj. (Such a matrix-vector formulation arises when considering the nature of a solution to n linear equations in m unknowns, and shall be discussed later.) Of particular interest is the question of whether or not there exists an x ≠ 0 such that Ax = 0. If "yes", then there is a (non-trivial) linear combination of the columns of A which equals the null vector. From the discussions in Chapter 2, it thus follows that the columns of A form a linearly dependent set of vectors and we say that A does NOT have full column rank. If, on the other hand, the answer is "no", then the columns of A form a linearly independent set and A is said to have full column rank (the only possible vector, x, which satisfies Ax = 0 is x = 0).

In general, the rank of a matrix, denoted r(A), is the maximum number of linearly independent vectors that can exist in any subset of vectors taken from the columns of A, (n × m); or the maximum number of linearly independent columns in A. If r(A) = k < m, then there is at least one subset of k column vectors, in A, which is linearly independent, and all subsets of k + 1 vectors are linearly dependent.
You should be able to verify for yourself that the rank of a diagonal matrix is simply the number of non-zero diagonal elements.
3.5.1 Some results concerning rank
Let A be (n × m).

1. Clearly r(A) ≤ m.

2. Suppose r(A) = k < m. Then any linearly independent set of k vectors, taken from the columns of A, must form a basis for some k-dimensional subspace of Rⁿ. It thus follows, from Proposition 11 in Chapter 2, that r(A) = k ≤ n.

3. Results 1 and 2 together imply that r(A) ≤ min(n, m). For example, a (4 × 3) matrix must have rank at most 3.
4. The above discussion describes rank in terms of the columns of A. But clearly a similar concept pertains to the maximum number of linearly independent rows of A. It is, however, unnecessary to distinguish between row rank and column rank: it turns out that row rank equals column rank; i.e., the maximum number of vectors in any linearly independent subset of rows drawn from A is exactly the same as the maximum number of vectors in any linearly independent subset of columns drawn from A. Thus we can refer to a single number as being the rank of any matrix.
5. From 4, r(A') = r(A).

As noted above, if A is (n × m) and r(A) = k < m, then it is possible to find x ≠ 0 such that Ax = 0, since the columns of A form a linearly dependent set. But from Proposition 6, in Chapter 2, we know that such a vector x will not be unique in this case: there will be many vectors, x, which satisfy Ax = 0. Let us consider this set, W, of solution vectors. If x ∈ W then so is λx for any scalar λ: Ax = 0 implies A(λx) = 0. If both x ∈ W and y ∈ W then (x + y) ∈ W, because Ax = 0 and Ay = 0 imply A(x + y) = 0. Thus W is, in fact, a subspace of Rᵐ (since all vectors in W are (m × 1)). This subspace is called the null-space or kernel of A and its dimension is called the nullity of A. It can be shown that m = r(A) + nullity, where m is the number of columns in A.

If, on the other hand, A is (n × m) with r(A) = k ≤ m, then for all x ∈ Rᵐ, y = Ax ∈ C, where C is a k-dimensional subspace of Rⁿ, sometimes referred to as the column space of A.
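(A sketch, added to the notes, illustrating rank, the null space and the relation m = r(A) + nullity; the matrix is an arbitrary example whose third column equals the sum of the first two, and the null_space routine is taken from scipy.)

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [2.0, 5.0, 7.0]])           # third column = first column + second column

print(np.linalg.matrix_rank(A))           # 2: A does not have full column rank
N = null_space(A)                         # basis for the null space (kernel) of A
print(N.shape[1])                         # 1 = nullity, and 2 + 1 = 3 = number of columns
print(np.allclose(A @ N, 0))              # True: every vector in the null space solves Ax = 0
```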
3.6 Exercises
1. Consider the simple linear regression model with no intercept
yi = β xi + ui,   i = 1, ..., n,
where the ui are a sequence of iid random variables with mean zero
and constant variance and xi is a scalar regressor.
(a) Show from first principles that the ordinary least squares estimator of β can be written

    β̂ = Σ_{i=1}^n xi yi / Σ_{i=1}^n xi².

(b) Show that β̂ can be written β̂ = x'y/(x'x), for suitably defined (n × 1) vectors x and y.

Define the following vectors ŷ = (ŷ1, ..., ŷn)' and e = (y1 − ŷ1, ..., yn − ŷn)', where ŷi = β̂ xi, i = 1, ..., n. Show that

(c) x'e = 0;

(d) y'y = ŷ'ŷ + e'e;

(e) ŷ'e = 0.
2. If

   X = [ 1  0  2 ]              Y = [ 1  1 ]
       [ 2  4  5 ]    and           [ 3  2 ]
       [ 1  2  0 ]                  [ 3  2 ],

   compute the product matrix Z = XY and determine:

   (a) all the elements in the third row and all the elements in the first column;

   (b) the single element occupying the position in the second row and second column.
3. Let A and B be two (n × n) matrices and let k be a scalar constant. Show that

   (a) tr(A + B) = tr(A) + tr(B);
   (b) tr(kA) = k tr(A).

4. If the matrix products XY and YX are both defined and square, show that tr(XY) = tr(YX).
5. If the matrix X is (n × n) with n² random variables xij as elements, we define the expected value of X, E(X), to be the (n × n) matrix with elements E(xij). Show that E{tr(X)} = tr{E(X)}.

6. Let A be (n × m) and B be (m × p). Using the fact that the (i, j)th element of the product AB is the scalar product of the ith row of A with the jth column of B, answer the following:
   (a) Show that AA' is defined and square, having typical element Σ_{s=1}^m aqs ars in its (q, r)th position, where q, r = 1, ..., n, and A = {aqs}, s = 1, ..., m.

   (b) Show that A'A has typical element Σ_{q=1}^n aqs aqt in its (s, t)th position.

   (c) From (a) and (b), deduce that ||A||² = tr(A'A) = tr(AA').

   (d) Show that ||AB||² ≤ ||A||² ||B||². (This result generalises the familiar Cauchy-Schwarz Inequality.)

   (e) Explain why (AB)' = B'A'. Hence, show that for any matrix X, say, X'X is symmetric.
7. Let u be a (n × 1) random vector and consider the outer product matrix uu', which has typical element {ui uj}, i = 1, ..., n, j = 1, ..., n.

   (a) Show that E{tr(uu')} = E{u'u}.

   (b) Under what conditions will tr{uu'} have a chi-squared distribution with n degrees of freedom?
8. Explain carefully how the following system of n equations,

   yi = β1 + Σ_{j=2}^k xij βj + ui,   i = 1, ..., n,

   can be expressed in vector-matrix notation as y = Xβ + u, where y and u are (n × 1) vectors, β is a (k × 1) vector and X is a (n × k) matrix.

   (a) What is distinctive about the first column of the X matrix?

   (b) If the ui are a sequence of independently and identically distributed (iid) random variables, each with zero mean and constant variance σ², show that E(uu') = σ²In, where u is a (n × 1) random vector with typical element ui, i = 1, ..., n, and In is the (n × n) identity matrix.

   (c) In addition to the assumptions about u, outlined above, suppose that the elements of X are known constants (that is, they are not random variables). Explain why E(y) = Xβ and var(y) = σ²In.
Chapter 4
Determinants and the Inverse Matrix
4.1 Introductory remarks
As made clear in Chapter 3, strictly speaking matrix division is not defined, but we are able to talk about inverse matrices in terms of matrix multiplication. In this Chapter we shall investigate the following question in our pursuit of the, so-called, "inverse matrix":

For any given matrix, X, does there exist another matrix, Y say, such that XY = YX = I?

where I is an identity matrix. If such a matrix, Y, does exist then it is called the inverse of X (and conversely X will be the inverse of Y). If the inverse of X exists then it is written X⁻¹ (it is NOT written 'I/X'). Notice immediately that the existence of an inverse for all matrices is not guaranteed. For example, if X is (n × m) then it is impossible for XY = YX when n ≠ m. Thus, we can immediately conclude that only square matrices can have inverses. Note that a square (n × n) matrix is also referred to as an nth order matrix.

It is not true, however, that all square matrices possess an inverse. It is the purpose of this Chapter to investigate the circumstances under which a square matrix is 'invertible' and these deliberations will lead directly to the construction of an inverse matrix, when one exists.
The question of existence requires a discussion of determinants, to which we now turn.
4.2 Determinants
The determinant of a matrix is a real number, and is ONLY defined for square matrices, as follows:

Definition 12 (Determinant) The determinant of an nth order, (n × n), matrix X is defined to be that real number computed from the following sum involving the n² elements of X:

det(X) = Σ_{i,j,k,...,r} (±) x1i x2j x3k ... xnr,

where the sum is taken over all permutations of the second subscript. A plus or minus sign (±) is attached to each term in this sum according to whether the permutation of {i, j, k, ..., r} is even (+) or odd (−). (det(X) is sometimes, also, denoted |X|.)

The first thing to note is that the number of terms in this sum is equal to the number of different permutations of the n integers {i, j, k, ..., r} and this is, of course, equal to n! = n(n − 1)(n − 2) ... 2·1.
The calculation is probably best illustrated by means of an example.
Example 4 Consider the (3 × 3) matrix

A = [ a11  a12  a13 ]
    [ a21  a22  a23 ]
    [ a31  a32  a33 ].

A typical term in the sum defining det(A) involves the product of the three elements a1i a2j a3k, i, j, k = 1, 2, 3 (notice that the first subscripts always appear in the order 1, 2, 3). The number of terms like this in the sum is equal to the number of different permutations of {1, 2, 3} and this is equal to 6 = 3!. To write down the expression for det(A) we need all of these 6 permutations (and their signs). Permutations are signed according to the following simple conventions: {1, 2, 3} is an even permutation (i.e., {1, 2, 3, ..., n − 1, n} is even), and simply transposing ('swapping') two integers changes an even permutation into an odd permutation, and an odd into an even. We thus have the following table:

start   {1, 2, 3}             even   +
2nd     {2, 1, 3}   (1 ↔ 2)   odd    −
3rd     {2, 3, 1}   (1 ↔ 3)   even   +
4th     {3, 2, 1}   (2 ↔ 3)   odd    −
5th     {3, 1, 2}   (2 ↔ 1)   even   +
6th     {1, 3, 2}   (3 ↔ 1)   odd    −
from which we can write out det (A) as
det(A) = a11a22a33 − a12a21a33 + a12a23a31 − a13a22a31 + a13a21a32 − a11a23a32.

It is not difficult to see that such an expression can get quite complicated very quickly, even for moderate values of n (e.g., for n = 4 there will be 24 terms to consider!). However, this IS the definition of a determinant and it does reveal an important feature. In any one term, in the sum defining det(X), no first subscripts are identical and no second subscripts are identical. The first subscripts are simply {1, 2, ..., n} and the second set of subscripts is simply a permutation of these n distinct integers. This means that whenever an element, xit, of the matrix appears in a term like x1i x2j x3k ... xnr, then no other element from the same row, i, OR column, t, can appear as well.
4.2.1 Properties of determinants
Let X be (nn):
1. The interchange of any two columns of X will change the sign of the determinant.
This is so because two of the second subscripts (corresponding to the columns that are interchanged) will be swapped in all terms of the sum defining $\det(X)$. Thus all terms change sign.
2. $\det(X) = \det(X')$. (Think about it!)
3. If X has two rows or columns which are identical then $\det(X) = 0$. Let Y denote the matrix obtained from X by interchanging two columns. Then $\det(X) = -\det(Y)$. But if the interchanged columns are identical then Y = X, so that we have $\det(X) = -\det(X)$ and the only resolution of this is $\det(X) = 0$.
4. $\det(\lambda X) = \lambda^{n}\det(X)$, for any $\lambda \in \mathbb{R}$.
Note that $\lambda X = \{\lambda x_{ij}\}$ and the result follows immediately from the definition of the determinant.
5. It can be shown that det(AB) = det(A) det(B); provided A and B are square and of the same order.
6. If X is triangular then $\det(X) = x_{11}x_{22}x_{33}\cdots x_{nn}$. (X is said to be triangular if $x_{ij} = 0$ for $j > i$, or if $x_{ij} = 0$ for $i > j$.)
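These properties are easily checked numerically. The sketch below is a minimal Python illustration, assuming numpy; it verifies properties 2, 4, 5 and 6 for randomly generated matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))
lam = 2.5

# Property 2: det(X) = det(X')
print(np.isclose(np.linalg.det(X), np.linalg.det(X.T)))

# Property 4: det(lam X) = lam**n det(X)
print(np.isclose(np.linalg.det(lam * X), lam**n * np.linalg.det(X)))

# Property 5: det(XY) = det(X) det(Y)
print(np.isclose(np.linalg.det(X @ Y), np.linalg.det(X) * np.linalg.det(Y)))

# Property 6: for a triangular matrix the determinant is the product of its diagonal
T = np.triu(X)
print(np.isclose(np.linalg.det(T), np.prod(np.diag(T))))
```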
4.2.2 Expansion by Cofactors
We now introduce an alternative approach to evaluating det(X). The method, known as expansion by cofactors, leads directly to the construction of the inverse matrix.
Consider again the arbitrary $(3 \times 3)$ matrix A. We note that

$$\det(A) = a_{11}(a_{22}a_{33} - a_{23}a_{32}) - a_{12}(a_{21}a_{33} - a_{23}a_{31}) + a_{13}(a_{21}a_{32} - a_{22}a_{31}) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}, \text{ say}.$$
Now, define $A_{(ij)}$ to be that $(2 \times 2)$ square matrix obtained from A by eliminating the ith row and jth column so that, for example,

$$A_{(12)} = \begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix};$$

then we can see that $C_{11} = a_{22}a_{33} - a_{23}a_{32} = \det(A_{(11)})$, $C_{12} = -(a_{21}a_{33} - a_{23}a_{31}) = -\det(A_{(12)})$ and $C_{13} = a_{21}a_{32} - a_{22}a_{31} = \det(A_{(13)})$.
In general, for an arbitrary $(n \times n)$ matrix X, the $\det(X_{(ij)})$ are called MINORS and the $C_{ij} = (-1)^{i+j}\det(X_{(ij)})$ are called SIGNED MINORS, $i, j = 1, \dots, n$, or COFACTORS.

Definition 13 (Cofactor) The cofactor of a square $(n \times n)$ matrix, X, is $C_{ij} = (-1)^{i+j}\det(X_{(ij)})$, where $X_{(ij)}$ is the $((n-1) \times (n-1))$ matrix obtained from X by eliminating row i and column j.

Definition 14 (Leading Principal Minors) The kth order leading principal minor of a square $(n \times n)$ matrix X, $k \leq n$, is the determinant of the $(k \times k)$ matrix consisting of the first k rows and columns of X.

In the example given above, the expression for $\det(A)$ is called the expansion by cofactors along the first row. However, we can evaluate $\det(A)$ using an expansion by cofactors along any row or column. For example, for A $(3 \times 3)$, alternative expressions for $\det(A)$ include:
1. $\det(A) = a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13}$ : expansion along row 1.
2. $\det(A) = a_{21}C_{21} + a_{22}C_{22} + a_{23}C_{23}$ : expansion along row 2.
3. $\det(A) = a_{31}C_{31} + a_{32}C_{32} + a_{33}C_{33}$ : expansion along row 3.
which you can easily verify for yourselves.
In general, if X is $(n \times n)$ the following are valid expressions for $\det(X)$:

Expansion along row i : $\det(X) = \sum_{j=1}^{n} x_{ij}C_{ij}$

Expansion down column j : $\det(X) = \sum_{i=1}^{n} x_{ij}C_{ij}$

The fact that we can express a determinant in terms of an expansion by cofactors leads directly to the construction of an inverse matrix and also the condition for the existence of the inverse.
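The row expansion translates directly into a recursive computation: delete row i and column j to obtain the minor, and attach the sign $(-1)^{i+j}$ to form the cofactor. The following Python sketch (a minimal illustration, assuming numpy; the function name is chosen only for this sketch) expands along the first row.

```python
import numpy as np

def det_by_cofactors(X):
    """Determinant via cofactor expansion along the first row."""
    n = X.shape[0]
    if n == 1:
        return X[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(X, 0, axis=0), j, axis=1)   # delete row 0, column j
        total += X[0, j] * (-1) ** j * det_by_cofactors(minor)   # sign (-1)**(0 + j)
    return total

A = np.array([[1.0, 3.0, 1.0],
              [2.0, 6.0, 2.0],
              [4.0, 1.0, 2.0]])
print(det_by_cofactors(A))   # 0, since row 2 is twice row 1
print(np.linalg.det(A))
```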
4.3 The inverse matrix
Let us consider an arbitrary $(n \times n)$ matrix X. Since $X_{(ij)}$ is obtained by eliminating row i and column j from X it follows that the element $x_{ij}$ does not appear in $X_{(ij)}$ and hence does not appear in $C_{ij} = (-1)^{i+j}\det(X_{(ij)})$. With this in mind consider the following expansion

$$\sum_{j=1}^{n} x_{ij}C_{kj}, \quad \text{for } k \neq i,$$
which is termed an expansion by alien cofactors as it is an expansion along row i of X but using cofactors from row $k \neq i$. Observe that no elements from row k, of X, appear in $C_{kj}$ and (obviously) no elements from row k appear in the ith row of X; i.e., both $C_{kj}$ and $x_{ij}$, $j = 1, \dots, n$, are invariant with respect to the elements in the kth row of X. Thus the elements $\{x_{k1}, \dots, x_{kn}\}$ could be anything and the value of the expansion given above will remain unaltered. In particular, it doesn't matter if we make the elements in the kth row identical to the elements in the ith row: the value of this expansion is unaffected. Let us do this and see what we find: set $x_{kj} = x_{ij}$. Substituting this into the expansion above we get $\sum_{j=1}^{n} x_{kj}C_{kj}$; i.e., we get the determinant of a matrix which has row k and row i identical, by construction. However, previous results show that this determinant is zero. Thus zero is the value of $\sum_{j=1}^{n} x_{ij}C_{kj}$, $k \neq i$, for one particular choice of the $x_{kj}$. But, as argued above, the value is unaltered by considering other choices for the $x_{kj}$, whatever the kth row of X is. Thus we have that

$$\sum_{j=1}^{n} x_{ij}C_{kj} = 0, \quad \text{for } k \neq i,$$

whilst

$$\sum_{j=1}^{n} x_{ij}C_{ij} = \det(X).$$
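These two identities are easy to illustrate numerically. The sketch below (a minimal Python illustration, assuming numpy) builds the full matrix of cofactors and checks that an expansion of row i with its own cofactors reproduces $\det(X)$, whereas an expansion with the cofactors of a different row gives zero.

```python
import numpy as np

def cofactor_matrix(X):
    """Matrix of cofactors, C_ij = (-1)**(i+j) det(X with row i and column j deleted)."""
    n = X.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
C = cofactor_matrix(X)

# Expansion of row 0 using its own cofactors reproduces det(X)
print(np.isclose(X[0] @ C[0], np.linalg.det(X)))

# Expansion of row 0 using the (alien) cofactors of row 1 gives zero
print(np.isclose(X[0] @ C[1], 0.0))
```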
We now define, and form, the ADJOINT of X:

Definition 15 (Adjoint) The adjoint of a square $(n \times n)$ matrix X, denoted $X^{+}$, is the transposed matrix of cofactors. Thus $X^{+} = \{C_{ji}\}$, or

$$X^{+} = \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}.$$
We can now see, from simple matrix multiplication, that a typical element in $XX^{+}$ is (in row i and column j)

$$XX^{+} = \left\{ \sum_{k=1}^{n} x_{ik}C_{jk} \right\}, \quad (i,j)\text{th position},$$
which equals $\det(X)$ when $i = j$, but is zero otherwise. That is, $XX^{+}$ has all diagonal elements equal to $\det(X)$ and zeros everywhere else; or, more succinctly,

$$XX^{+} = \det(X)\, I_n.$$

And, indeed, similar considerations show that $X^{+}X = \det(X)\, I_n$, also. Thus, provided $\det(X) \neq 0$, we can define $Y = [\det(X)]^{-1}X^{+}$ and the above analysis shows that $XY = YX = I_n$. Therefore, by construction, Y is the inverse matrix that we have been looking for. It exists only for square matrices with non-zero determinants and, if it exists, it MUST be unique
(see later).
Definition 16 (Inverse matrix) Provided X is $(n \times n)$, then X has an inverse matrix if and only if $\det(X) \neq 0$. This inverse matrix is defined by

$$X^{-1} = \frac{1}{\det(X)}\, X^{+},$$

where $X^{+}$ is the adjoint matrix of X.
Remark 1 If $X^{-1}$ exists, we say that X is non-singular. If $\det(X) = 0$, then X is said to be singular and its inverse does not exist.
Example 5 Consider the $(2 \times 2)$ matrix $X = \{x_{ij}\}$, $i, j = 1, 2$. In this case $\det(X) = x_{11}x_{22} - x_{21}x_{12}$ and, provided this is non-zero,

$$X^{-1} = \frac{1}{x_{11}x_{22} - x_{21}x_{12}} \begin{pmatrix} x_{22} & -x_{12} \\ -x_{21} & x_{11} \end{pmatrix},$$

as is probably well known.
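Definition 16 can be implemented directly: form the adjoint as the transposed matrix of cofactors and divide by the determinant. The Python sketch below (a minimal illustration, assuming numpy) also checks that $XX^{+} = \det(X) I_n$ and compares the result with a library inverse.

```python
import numpy as np

def adjoint(X):
    """Adjoint X+ : the transposed matrix of cofactors."""
    n = X.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

X = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [0.0, 1.0, 1.0]])
d = np.linalg.det(X)          # 3, so X is non-singular
X_plus = adjoint(X)

print(np.allclose(X @ X_plus, d * np.eye(3)))      # X X+ = det(X) I_n
print(np.allclose(X_plus / d, np.linalg.inv(X)))   # X^{-1} = X+ / det(X)
```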
4.4 Some further results concerning rank, determinants and inverses
1. For an arbitrary $(n \times m)$ matrix, A, r(A) is the order of the largest non-vanishing determinant contained in A. (Obtained by eliminating rows and columns from A.)
Proof: omitted – quite tiresome.
Example 6
$$X = \begin{pmatrix} 1 & 3 & 1 & 0 \\ 2 & 4 & 0 & 1 \\ 3 & 5 & 0 & 1 \end{pmatrix} \Rightarrow r(X) = 3 \text{ since } \det\begin{pmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \\ 5 & 0 & 1 \end{pmatrix} = 0 - 1(4 - 5) + 0 = 1.$$

$$X = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 0 \\ 3 & 6 & 0 \end{pmatrix} \Rightarrow r(X) = 2 \text{ since } \det\begin{pmatrix} 2 & 1 \\ 4 & 0 \end{pmatrix} = -4, \text{ even though } \det\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} = 0.$$
In the following let X be $(n \times n)$:
2. If X has an inverse then that inverse is unique. (Proof is left as an exercise)
3. If X and Y are non-singular, having inverses $X^{-1}$ and $Y^{-1}$ respectively, then XY is non-singular and has inverse $Y^{-1}X^{-1}$.
(Proof is left as an exercise.)
4. Provided X is non-singular, $\det(X^{-1}) = [\det(X)]^{-1}$.
Proof: Using previous results we have that $\det(I_n) = 1$ and $\det(X^{-1}X) = \det(X)\det(X^{-1})$. Since $X^{-1}X = I_n$, the result follows.
5. If the columns (or rows) of X form a linearly dependent set of vectors, then $\det(X) = 0$.
Proof follows from 1 above; an alternative derivation is left as an exercise.
6. For an $(n \times n)$ matrix X: $[r(X) = n] \Leftrightarrow [\det(X) \neq 0] \Leftrightarrow [X \text{ is non-singular}]$; i.e., X is non-singular iff it has full rank.
Proof follows from 1: $r(X) = n$ $\Rightarrow$ the largest non-vanishing determinant is $\det(X)$ $\Rightarrow$ $\det(X) \neq 0$; and $\det(X) \neq 0$ $\Rightarrow$ the largest non-vanishing determinant is $\det(X)$ $\Rightarrow$ $r(X) = n$.
7. If $X = \mathrm{diag}(x_i)$ then, provided $x_i \neq 0$ for all i, $X^{-1} = \mathrm{diag}(x_i^{-1})$.
(Proof is left as an exercise).
8. Provided $\det(X) \neq 0$, $(X^{-1})' = (X')^{-1}$. (Proof is left as an exercise.)
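Several of these results (3, 4, 7 and 8) can be illustrated numerically; the sketch below is a minimal Python example, assuming numpy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))

# Result 3: (XY)^{-1} = Y^{-1} X^{-1}
print(np.allclose(np.linalg.inv(X @ Y), np.linalg.inv(Y) @ np.linalg.inv(X)))

# Result 4: det(X^{-1}) = 1 / det(X)
print(np.isclose(np.linalg.det(np.linalg.inv(X)), 1.0 / np.linalg.det(X)))

# Result 7: the inverse of diag(x_i) is diag(1 / x_i)
d = np.array([2.0, -1.0, 0.5, 4.0])
print(np.allclose(np.linalg.inv(np.diag(d)), np.diag(1.0 / d)))

# Result 8: (X^{-1})' = (X')^{-1}
print(np.allclose(np.linalg.inv(X).T, np.linalg.inv(X.T)))
```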
4.5 Exercises
1.
(a) If $A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 4 & 2 & 2 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 4 & 2 & 2 \end{pmatrix}$, show by direct evaluation that $\det(A) = \det(B)$.

(b) If $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 1 & 2 \\ 1 & 1 & 1 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 1 & 1 \\ 4 & 1 & 2 \end{pmatrix}$, show by direct evaluation that $\det(A) = -\det(B)$.

(c) Show that $\det\begin{pmatrix} 1 & 3 & 1 \\ 2 & 6 & 2 \\ 4 & 1 & 2 \end{pmatrix}$ is zero.
2. If the $(n \times n)$ matrix $A = \{a_{ij}\}$ is lower triangular, so that $a_{ij} = 0$ for all $j > i$, show that $\det(A) = a_{11}a_{22}\cdots a_{nn}$.

3. Explain why the determinant of a square matrix:
(a) will be zero if it has two columns (or rows) which are identical;
(b) will change sign if two columns are interchanged;
(c) will be multiplied by $\lambda$ if one of its columns has all its entries multiplied by $\lambda$.
4. Let A be a square matrix whose rows form a linearly dependent set of (row) vectors. Thus the rth row of A can be expressed as some linear combination of the remaining rows in A.
By writing down det(A) in terms of an expansion by cofactors along the rth row of A, show that det(A) = 0.
(Hint: if you write the algebra correctly, you will have to recognise that an expansion by alien cofactors is identically zero).
5. The concept of a block-partitioned or simply partitioned matrix is often very useful. Indeed, we have introduced the concept previously by regarding a matrix as a collection of column vectors (side by side) or as a collection of row vectors (stacked on top of one another). In general, it may be convenient to regard an $(n \times m)$ matrix A in block partitioned form as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$

where $A_{11}$ is $(n_1 \times m_1)$, $A_{12}$ is $(n_1 \times m_2)$, $A_{21}$ is $(n_2 \times m_1)$, $A_{22}$ is $(n_2 \times m_2)$, and $n_1 + n_2 = n$ and $m_1 + m_2 = m$.
If the matrix product AB is defined, for some matrix B, then obtain
the appropriate partition of B such that
$$AB = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}.$$
6. Explain the partitions used if Ax = b is re-expressed as either:
(a) $A_1x_1 + A_2x_2 = b$

or

(b) $A_1x = b_1$ and $A_2x = b_2$,

where A is $(n \times m)$, x is $(m \times 1)$, b is $(n \times 1)$, and a subscript on a matrix/vector denotes a partition of that matrix/vector.

[Note that the partition of A in (a) is different to the partition of A in (b), so that $A_1$ in (a) is NOT the same as $A_1$ in (b).]
7. If possible calculate the inverse of the following matrices
$$A = \begin{pmatrix} 2 & 1 \\ 6 & 2 \end{pmatrix}; \quad B = \begin{pmatrix} 1 & 3 \\ 3 & 9 \end{pmatrix}; \quad C = \begin{pmatrix} 1 & 2 & 6 \\ 0 & 1 & 5 \end{pmatrix};$$

$$D = \begin{pmatrix} 2 & 1 \\ 6 & 9 \\ 7 & 4 \end{pmatrix}; \quad E = \begin{pmatrix} 1 & 1 & 7 \\ 0 & 1 & 2 \\ 2 & 4 & 17 \end{pmatrix}; \quad F = \begin{pmatrix} 1 & 2 & 4 \\ 2 & 4 & 7 \\ 0 & 1 & 3 \end{pmatrix}.$$
If the inverse does not exist, explain why and also calculate the rank of these singular matrices.
8.
(a) Show that if an nth order matrix is non-singular, then its inverse is unique.
(b) Consider two square, non-zero matrices A and B which satisfy AB = 0. Show that BOTH A and B must be singular matrices.
(c) Prove that $(XY)^{-1} = Y^{-1}X^{-1}$, when the inverses are defined.
(d) Provided $\det(X) \neq 0$, show that $(X')^{-1} = (X^{-1})'$.

9. In general, an nth order matrix, X say, whose inverse is its own transpose, X', is said to be orthogonal. An orthogonal matrix thus satisfies: $X'X = XX' = I_n$.
(a) Verify that the n columns of an orthogonal matrix form an orthonormal basis for the n-dimensional Euclidean space (i.e., show that these column vectors are mutually orthogonal and have unit length).
(b) Let X be an orthogonal matrix and y and z two $(n \times 1)$ vectors, such that $y = Xz$. Show that $\|y\| = \|z\|$.
10. Let $x = \{x_i\}$ denote an $(n \times 1)$ vector of observations on some random variable of interest. Define the sum vector, $\iota_n = (1, 1, \dots, 1)'$, to be an $(n \times 1)$ vector of ones, and the $(n \times n)$ matrix $M = I_n - \frac{1}{n}\iota_n\iota_n'$.
(a) Verify that $\frac{1}{n}\iota_n'x$ computes the sample mean of the observations $\{x_i\}$.
(b) Verify that M has elements $m_{ii} = 1 - \frac{1}{n}$ and $m_{ij} = -\frac{1}{n}$, $i \neq j$.
(c) Show that $M' = M$ and that $MM = M$.
(d) Finally, show that $Mx = \{x_i - \bar{x}\}$, where $\bar{x}$ is the sample mean of the observations.
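A numerical check of the matrix M in Exercise 10 may be helpful; the Python sketch below (a minimal illustration, assuming numpy, and writing the sum vector as `iota`) verifies the symmetry and idempotency of M and that Mx produces deviations from the sample mean.

```python
import numpy as np

n = 5
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
iota = np.ones(n)                          # the sum vector of ones
M = np.eye(n) - np.outer(iota, iota) / n   # M = I_n - (1/n) iota iota'

print(iota @ x / n, x.mean())              # sample mean, computed two ways
print(np.allclose(M, M.T))                 # M' = M
print(np.allclose(M @ M, M))               # MM = M
print(np.allclose(M @ x, x - x.mean()))    # Mx = deviations from the sample mean
```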
Chapter 5
Linear Transformations and Simultaneous Equations
5.1 Introductory remarks
For our purposes we think of a linear transformation as being a simple linear combination of a set of m, $(n \times 1)$, vectors, where the set of vectors concerned forms the columns of some $(n \times m)$ matrix, $A = [a_1, \dots, a_m]$, say. A single linear combination of the m vectors $\{a_1, \dots, a_m\}$, each $(n \times 1)$, forms a single, new, $(n \times 1)$ vector. A sequence of p linear combinations of these vectors forms a corresponding sequence of p, $(n \times 1)$, vectors. Collecting all these p vectors together forms a new $(n \times p)$ matrix C, say, which can be regarded as a linear transformation of A. This linear transformation can be represented by matrix multiplication, as follows.

Let A be the $(n \times m)$ matrix described above, and let $b_r$, $r = 1, \dots, p$, denote a sequence of p, $(m \times 1)$, vectors. Then $Ab_r = c_r$ defines a vector $c_r$ which is a simple linear combination of the columns of A. The scalar coefficients in this linear combination are the elements in $b_r$:
$$Ab_r = \sum_{j=1}^{m} a_j b_{jr}, \quad r = 1, \dots, p,$$

where $b_r = (b_{1r}, \dots, b_{mr})'$. The sequence of p vectors, $b_r$, defines the $(m \times p)$ matrix $B = [b_1, b_2, \dots, b_p]$ and similarly we can write $C = [c_1, c_2, \dots, c_p]$, $(n \times p)$, where $Ab_r = c_r$, $r = 1, \dots, p$. Now, applying the rules for matrix multiplication it is fairly easy to show that:

$$AB = A[b_1, b_2, \dots, b_p] = [Ab_1, Ab_2, \dots, Ab_p] = [c_1, c_2, \dots, c_p] = C.$$

That is, by taking p (possibly different) linear combinations of the columns of A, $(n \times m)$, we form a new matrix C, $(n \times p)$, and the operation that achieves this linear transformation is the post-multiplication of the matrix A
by an appropriate $(m \times p)$ matrix B. Thus we think of AB as representing a linear transformation of the columns of A.
Alternatively, we could think of the matrix product AB as representing a linear transformation of the rows of B, where the matrix A achieves the transformation. Either way we regard the operation of matrix product, AB, as providing a linear transformation:
post-multiplication, by B, takes linear combinations of the columns of the pre-multiplier;
pre-multiplication, by A, takes linear combinations of the rows of the post-multiplier.
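These two statements are easy to see in a small numerical example. The Python sketch below (a minimal illustration, assuming numpy) checks that each column of AB is the corresponding linear combination of the columns of A, and that each row of AB is a linear combination of the rows of B.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))   # (n x m)
B = rng.standard_normal((3, 2))   # (m x p)
C = A @ B                         # (n x p)

# Column r of C is A b_r : a linear combination of the columns of A,
# with weights given by the r-th column of B.
for r in range(B.shape[1]):
    print(np.allclose(C[:, r], A @ B[:, r]))

# Row s of C is a linear combination of the rows of B,
# with weights given by the s-th row of A.
for s in range(A.shape[0]):
    print(np.allclose(C[s, :], A[s, :] @ B))
```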
Example 7
1. Suppose $D = \mathrm{diag}(d_i)$, $i = 1, \dots, n$, A is $(m \times n)$ and B is $(n \times p)$. Then

$$AD = [d_1a_1, d_2a_2, \dots, d_na_n]$$

provides a new matrix whose columns are scalar multiples of the columns of A. Further

$$DB = \begin{pmatrix} d_1b_1' \\ \vdots \\ d_nb_n' \end{pmatrix}$$
provides a new matrix whose rows are scalar multiples of the rows of
B.
2. Suppose the sequence of random variables, $u_i$, are independently and identically distributed (iid), each having mean zero and variance equal to $\sigma^2$, $i = 1, \dots, n$. Writing $u = \{u_i\}$ we have that $E[u] = 0$ and $\mathrm{var}[u] = E[uu'] = \{E[u_iu_j]\} = \sigma^2 I_n$. Consider now the new vector random variable, $z = Au$, whose elements, $z_s$, are simple linear combinations of the elements in u (i.e., the random vector z is a linear transformation of the random vector u):

$$z_s = \sum_{i=1}^{n} a_{si}u_i, \quad s = 1, \dots, r,$$

and $A = \{a_{si}\}$ is a matrix of constants. We wish to find $E[z]$ and $\mathrm{var}[z]$.
Firstly, and noting that $E[\cdot]$ is a linear operator, we have

$$E[z] = \{E[z_s]\} = \left\{E\left[\sum_{i=1}^{n} a_{si}u_i\right]\right\} = \left\{\sum_{i=1}^{n} a_{si}E[u_i]\right\} = \{0\}$$
since $E[u_i] = 0$. Thus we may write: $E[z] = AE[u] = 0$.
Finding $\mathrm{var}[z]$ is slightly more complicated but proceeds along similar lines. Since $z_s$ has zero mean for all s we have,

$$\mathrm{var}[z] = \{E(z_sz_t)\} = \left\{E\left[\sum_{i=1}^{n} a_{si}u_i \sum_{i=1}^{n} a_{ti}u_i\right]\right\}$$