Nonlinear econometrics for finance: notes
. Bandi and Johns Hopkins University
© 2021 . Bandi, All Rights Reserved
Chapter 0 Nonlinear econometrics for finance
Quick review of important facts about matrices
A matrix is a rectangular table (or array) of numbers. The number of rows n and columns m of the matrix is the dimension of the matrix. For example, the matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \qquad (1)
\]
has 2 rows and 3 columns, so we say A is a 2 × 3 matrix. In general, a matrix is denoted as an n × m array, with n rows and m columns. A matrix with the same number of rows and columns is called a square matrix. For example, the following matrix is a (2 × 2) square matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \qquad (2)
\]
A column vector is a single-column matrix, that is, a matrix n × 1. A row vector is a single-row matrix, that is, a matrix 1 × m. Finally, a matrix with 1 row and 1 column (1 × 1) is called a scalar. As an example, x is a 2 × 1 column vector, y is a 1 × 3 row vector and c is a scalar:
\[
x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}; \qquad y = \begin{pmatrix} 3 & 4 & 5 \end{pmatrix}; \qquad c = 6. \qquad (3)
\]
The sum of two matrices A and B is a matrix with entries equal to the sum of the entries of each matrix; that is, if the matrices A and B are
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}; \qquad B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{pmatrix}, \qquad (4)
\]
then the sum A + B is the following matrix
\[
A + B = \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \end{pmatrix}. \qquad (5)
\]
Notice that for this operation to work out, we need the matrices A and B to have the same dimensions n × m. If this is the case, then the matrices are said to be conformable for addition. As an example, consider the addition of
\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} + \begin{pmatrix} 11 & 12 & 13 \\ 14 & 15 & 16 \end{pmatrix} = \begin{pmatrix} 12 & 14 & 16 \\ 18 & 20 & 22 \end{pmatrix}. \qquad (6)
\]
Analogously, the subtraction of two matrices yields the matrix with entries equal to the difference in the entries of the two matrices. That is, if we compute C = A − B, then each element of C is computed as cij = aij − bij for all i = 1, …, n and j = 1, …, m.
• The multiplication of two matrices A and B is possible only when the number of columns of A is the same as the number of rows of B. That is, if A is an n × m matrix, B must be an m × p matrix. If that is the case, then A and B are conformable for multiplication.
The product of an n × m matrix A and an m × p matrix B is the n × p matrix C satisfying
\[
C = AB \quad \text{so that} \quad c_{ij} = \sum_{k=1}^{m} a_{ik} b_{kj}. \qquad (7)
\]
In other words, each element of C is defined as the sum of products cij = ai1 b1j + ai2 b2j + ··· + aim bmj for i = 1, …, n and j = 1, …, p. For example, the matrices
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}; \qquad B = \begin{pmatrix} 11 & 12 & 13 \\ 14 & 15 & 16 \end{pmatrix} \qquad (8)
\]
are not conformable for multiplication since A is 2 × 3 and B is 2 × 3. On the other hand, we can define the product of the two matrices below,
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}; \qquad B = \begin{pmatrix} 11 & 12 \\ 13 & 14 \\ 15 & 16 \end{pmatrix}, \qquad (9)
\]
which are conformable. The product is the 2 × 2 matrix C = AB:
\[
C = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 11 & 12 \\ 13 & 14 \\ 15 & 16 \end{pmatrix} = \begin{pmatrix} 82 & 88 \\ 199 & 214 \end{pmatrix} = AB. \qquad (10)
\]
The entries of C are computed as follows:
\[
c_{11} = \sum_{k=1}^{3} a_{1k} b_{k1} = 1 \cdot 11 + 2 \cdot 13 + 3 \cdot 15 = 11 + 26 + 45 = 82, \qquad (11)
\]
\[
c_{12} = \sum_{k=1}^{3} a_{1k} b_{k2} = 1 \cdot 12 + 2 \cdot 14 + 3 \cdot 16 = 12 + 28 + 48 = 88, \qquad (12)
\]
\[
c_{21} = \sum_{k=1}^{3} a_{2k} b_{k1} = 4 \cdot 11 + 5 \cdot 13 + 6 \cdot 15 = 44 + 65 + 90 = 199, \qquad (13)
\]
\[
c_{22} = \sum_{k=1}^{3} a_{2k} b_{k2} = 4 \cdot 12 + 5 \cdot 14 + 6 \cdot 16 = 48 + 70 + 96 = 214. \qquad (14)
\]
Finally, notice that, in general, AB ≠ BA: matrix multiplication is not commutative.
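As a quick illustration (not part of the original notes), the product in Eqs. (9)-(10) can be reproduced with NumPy, where the `@` operator performs matrix multiplication; the example matrices are the ones used above.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])            # 2 x 3, as in Eq. (9)
B = np.array([[11, 12],
              [13, 14],
              [15, 16]])             # 3 x 2, as in Eq. (9)

C = A @ B                            # the 2 x 2 product of Eq. (10)
print(C)                             # [[ 82  88]
                                     #  [199 214]]

D = B @ A                            # BA also exists here, but it is 3 x 3
print(C.shape, D.shape)              # (2, 2) (3, 3): AB and BA need not even have the same dimension
```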
• Identity matrix. A square matrix with ones on the main diagonal and zeros
everywhere else is called the identity matrix. A k-dimensional identity matrix has
k columns and k rows,
\[
I_k = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}. \qquad (15)
\]
For example, I3 is the following 3 × 3 matrix
\[
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (16)
\]
• Important property. If A is an n × m matrix, then
\[
I_n A = A \quad \text{and} \quad A I_m = A, \qquad (17)
\]
so any matrix pre-multiplied or post-multiplied by the (conformable) identity matrix is left unchanged. In other words, the identity matrix plays the same role as the number 1 in the usual algebra (for scalars).
• Transposition. Let aij denote the row i, column j element of a matrix A : A = [aij]. The transpose of A, denoted by A⊤, is given by A⊤ = [aji].
For example, consider the matrix
\[
A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \qquad (18)
\]
Its transpose is
\[
A^{\top} = \begin{pmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{pmatrix}. \qquad (19)
\]
Consider, now, the matrix B and its transpose:
\[
B = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad B^{\top} = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}. \qquad (20)
\]
• Important properties: The transpose of the product of two matrices and the transpose of the sum of two matrices are computed as follows:
(AB)⊤ = B⊤A⊤, (A+B)⊤ = A⊤+B⊤.
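A short numerical check of these two rules (the example matrices below are assumed, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(2, 3))

print(np.allclose((A @ B).T, B.T @ A.T))   # True: the transpose of a product reverses the order
print(np.allclose((A + C).T, A.T + C.T))   # True: the transpose of a sum is the sum of the transposes
```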
• Symmetric matrices. A square matrix such that A = A⊤ is said to be symmetric. For example, the identity matrix is symmetric. Another common example in econo- metrics is the Variance-Covariance matrix of a vector random variable. Suppose that X = [X1, X2, X3, X4]. Then, we can compute the variance-covariance matrix
\[
V(X) = \begin{pmatrix}
V(X_1) & COV(X_1, X_2) & COV(X_1, X_3) & COV(X_1, X_4) \\
COV(X_1, X_2) & V(X_2) & COV(X_2, X_3) & COV(X_2, X_4) \\
COV(X_1, X_3) & COV(X_2, X_3) & V(X_3) & COV(X_3, X_4) \\
COV(X_1, X_4) & COV(X_2, X_4) & COV(X_3, X_4) & V(X_4)
\end{pmatrix}, \qquad (21)
\]
which is symmetric because the covariances are symmetric: indeed COV(Xi, Xj) = COV(Xj, Xi) ∀ i, j. Check that the transpose of this matrix is the same matrix.
Another important symmetric matrix in this course will be the weight matrix W in
the Generalized Method of Moments (GMM). Assume a 3 × 3 weight matrix. This is a matrix that looks like this:
\[
W = \begin{pmatrix} w_1 & w_{12} & w_{13} \\ w_{12} & w_2 & w_{23} \\ w_{13} & w_{23} & w_3 \end{pmatrix}, \qquad (22)
\]
where, on the diagonal, we have the weights of the squared moments (w1, w2, w3), while off the diagonal we have the weights of the cross-products of each moment pair. We will see this matrix in detail during the lectures.
• Trace. The trace of a square matrix is the sum of the elements along the diagonal.
• Important property:
tr(AB) = tr(BA).
• Determinant. The determinant of a 2 × 2 square matrix A, denoted by |A|, is the scalar
\[
|A| = a_{11} a_{22} - a_{12} a_{21}.
\]
• One could of course define determinants for more general matrices than a simple
2 × 2 square matrix.
• Important properties:
1. The determinant of the identity matrix is 1.
2. For an n × n matrix A and a scalar σ,
\[
|\sigma A| = \sigma^{n} |A|.
\]
• Inverse. If the determinant of a square n × n matrix A is different from zero, then its inverse A−1 exists and is such that AA−1 = In (the identity matrix). In the bivariate (2 × 2) case, the inverse is computed as:
\[
A^{-1} = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.
\]
In general, you will not compute inverse matrices by hand but will rely on software.
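As an illustrative sketch (the specific matrix below is assumed), this is how the 2 × 2 formula above compares with a library routine such as NumPy's np.linalg.inv:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]                 # a11 a22 - a12 a21 = 10
A_inv_by_formula = (1.0 / det) * np.array([[ A[1, 1], -A[0, 1]],
                                           [-A[1, 0],  A[0, 0]]])
A_inv = np.linalg.inv(A)                                    # what you would use in practice

print(np.allclose(A_inv, A_inv_by_formula))                 # True
print(np.allclose(A @ A_inv, np.eye(2)))                    # True: A A^{-1} = I_2
```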
• Eigenvalues and eigenvectors. Suppose that an n × n matrix A, a nonzero n × 1 vector x, and a scalar λ are related by
\[
Ax = \lambda x.
\]
Then, x is called an eigenvector of A and λ is called the corresponding eigenvalue of A.
• Jordan decomposition. Every symmetric matrix A can be written as BΛB⊤, where Λ is a matrix which contains the eigenvalues of A on the diagonal (and zeros everywhere else) and B is an orthogonal matrix consisting of the eigenvectors of A.
• A square orthogonal matrix is such that BB⊤ = I and B−1 = B⊤.
• Idempotent matrices. A square matrix A such that AA = A is called idempotent.
• Important property: The eigenvalues of an idempotent matrix are either 1 or zero.
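The following sketch (example matrices assumed, not from the notes) illustrates both points: the decomposition A = BΛB⊤ of a symmetric matrix, obtained here with np.linalg.eigh, and the zero/one eigenvalues of an idempotent projection matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                        # a symmetric matrix
lam, B = np.linalg.eigh(A)                        # eigenvalues and orthonormal eigenvectors
Lam = np.diag(lam)

print(np.allclose(A, B @ Lam @ B.T))              # True: A = B Lambda B'
print(np.allclose(B @ B.T, np.eye(2)))            # True: B is orthogonal, so B^{-1} = B'

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
P = X @ np.linalg.inv(X.T @ X) @ X.T              # a projection matrix: PP = P (idempotent)
print(np.allclose(P @ P, P))                      # True
print(np.round(np.linalg.eigvalsh(P), 8))         # its eigenvalues are zeros and ones
```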
• Quadratic forms. If you have an n × n real symmetric matrix A and a real n × 1 vector x, then the scalar
\[
x^{\top} A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \qquad (23)
\]
is called a quadratic form.
– Positive definite. An n × n real symmetric matrix A is said to be positive definite if all quadratic forms are greater than zero for all x ̸= 0; that is, for any real n × 1 vector x ̸= 0, we have x⊤Ax > 0.
– Positive semidefinite. An n × n real symmetric matrix A is said to be positive semidefinite if all quadratic forms are non-negative; that is, for any real n × 1 vector x, we have x⊤Ax ≥ 0.
– Negative definite. An n × n real symmetric matrix A is said to be negative definite if all quadratic forms are less than zero for all x ̸= 0; that is, for any real n × 1 vector x ̸= 0, we have x⊤Ax < 0.
– Negative semidefinite. An n×n real symmetric matrix A is said to be negative semidefinite if all quadratic forms are non-positive; that is, for any real n × 1 vector x, we have x⊤Ax ≤ 0.
• Important property: The eigenvalues of a positive semidefinite matrix are either zero or positive.
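A small simulation sketch (the matrix below is assumed purely for illustration) connecting the two characterizations, eigenvalues and quadratic forms:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 3))
A = Z.T @ Z / 100                        # a matrix of the form E(XX'): positive semidefinite by construction

print(np.linalg.eigvalsh(A) >= -1e-12)   # all eigenvalues are (numerically) non-negative

for _ in range(5):
    x = rng.normal(size=3)
    print(x @ A @ x >= 0)                # every quadratic form x'Ax is non-negative
```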
• Derivatives of a vector function: Let f(x) be a function with multiple inputs and multiple outputs. That is, x is a d-dimensional vector (a d × 1 vector)
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix} \qquad (24)
\]
and the function f(x) is N-dimensional (an N × 1 vector), that is
\[
f(x) = \begin{pmatrix} f_1(x_1, x_2, \cdots, x_d) \\ f_2(x_1, x_2, \cdots, x_d) \\ \vdots \\ f_N(x_1, x_2, \cdots, x_d) \end{pmatrix}. \qquad (25)
\]
Then, the partial derivative of the function f(x) with respect to one of the xk's, k = 1, …, d, is an N × 1 vector
\[
\frac{\partial f(x)}{\partial x_k} = \begin{pmatrix} \dfrac{\partial f_1(x_1, x_2, \cdots, x_d)}{\partial x_k} \\[2ex] \dfrac{\partial f_2(x_1, x_2, \cdots, x_d)}{\partial x_k} \\[1ex] \vdots \\[1ex] \dfrac{\partial f_N(x_1, x_2, \cdots, x_d)}{\partial x_k} \end{pmatrix}. \qquad (26)
\]
The matrix that contains all of the partial derivatives of f(x) with respect to all of its arguments x1, ···, xd is an N × d matrix
\[
\frac{\partial f(x)}{\partial x^{\top}} = \begin{pmatrix}
\dfrac{\partial f_1(x_1, x_2, \cdots, x_d)}{\partial x_1} & \dfrac{\partial f_1(x_1, x_2, \cdots, x_d)}{\partial x_2} & \cdots & \dfrac{\partial f_1(x_1, x_2, \cdots, x_d)}{\partial x_d} \\[2ex]
\dfrac{\partial f_2(x_1, x_2, \cdots, x_d)}{\partial x_1} & \dfrac{\partial f_2(x_1, x_2, \cdots, x_d)}{\partial x_2} & \cdots & \dfrac{\partial f_2(x_1, x_2, \cdots, x_d)}{\partial x_d} \\[1ex]
\vdots & \vdots & \ddots & \vdots \\[1ex]
\dfrac{\partial f_N(x_1, x_2, \cdots, x_d)}{\partial x_1} & \dfrac{\partial f_N(x_1, x_2, \cdots, x_d)}{\partial x_2} & \cdots & \dfrac{\partial f_N(x_1, x_2, \cdots, x_d)}{\partial x_d}
\end{pmatrix}. \qquad (27)
\]
Notice the use of the notation ∂x⊤ in the denominator to indicate that we are creating the partial derivatives of each function fj(x) as a 1 × d (row) vector.
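As an illustration (the function f below is an assumed toy example, not from the notes), the N × d matrix ∂f(x)/∂x⊤ can be checked against a finite-difference approximation:

```python
import numpy as np

def f(x):                                     # f maps R^3 into R^2 (d = 3, N = 2)
    return np.array([x[0] * x[1],
                     np.exp(x[2]) + x[0] ** 2])

def jacobian_analytic(x):                     # each row is the 1 x d row vector of derivatives of one f_j
    return np.array([[x[1],       x[0], 0.0],
                     [2.0 * x[0], 0.0,  np.exp(x[2])]])

def jacobian_numeric(func, x, h=1e-6):        # central finite differences, column by column
    N, d = func(x).size, x.size
    J = np.zeros((N, d))
    for k in range(d):
        e = np.zeros(d)
        e[k] = h
        J[:, k] = (func(x + e) - func(x - e)) / (2.0 * h)
    return J

x0 = np.array([1.0, 2.0, 0.5])
print(np.allclose(jacobian_analytic(x0), jacobian_numeric(f, x0), atol=1e-6))   # True
```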
• Expected value and Variance-Covariance matrix of a vector. Consider the N × 1 random vector X,
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix}. \qquad (28)
\]
The expected value of the random vector is an N × 1 vector of expected values of each Xi, i = 1, …, N:
\[
E(X) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_N) \end{pmatrix}. \qquad (29)
\]
What about variances and covariances? The Variance-Covariance matrix contains all the variances and covariances for the vector. It is an N × N matrix computed as follows
V (X) = E(XX⊤) − E(X)E(X)⊤. (30)
If we work through the matrix multiplications, we obtain
\[
V(X) = E\!\left[\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{pmatrix} \begin{pmatrix} X_1 & X_2 & \cdots & X_N \end{pmatrix}\right] - \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_N) \end{pmatrix} \begin{pmatrix} E(X_1) & E(X_2) & \cdots & E(X_N) \end{pmatrix}
\]
\[
= \begin{pmatrix}
E[(X_1)^2] & E[X_1 X_2] & \cdots & E[X_1 X_N] \\
E[X_2 X_1] & E[(X_2)^2] & \cdots & E[X_2 X_N] \\
\vdots & \vdots & \ddots & \vdots \\
E[X_N X_1] & E[X_N X_2] & \cdots & E[(X_N)^2]
\end{pmatrix} - \begin{pmatrix}
E(X_1)^2 & E(X_1)E(X_2) & \cdots & E(X_1)E(X_N) \\
E(X_2)E(X_1) & E(X_2)^2 & \cdots & E(X_2)E(X_N) \\
\vdots & \vdots & \ddots & \vdots \\
E(X_N)E(X_1) & E(X_N)E(X_2) & \cdots & E(X_N)^2
\end{pmatrix}
\]
\[
= \begin{pmatrix}
E[(X_1)^2] - E(X_1)^2 & E[X_1 X_2] - E(X_1)E(X_2) & \cdots & E[X_1 X_N] - E(X_1)E(X_N) \\
E[X_2 X_1] - E(X_2)E(X_1) & E[(X_2)^2] - E(X_2)^2 & \cdots & E[X_2 X_N] - E(X_2)E(X_N) \\
\vdots & \vdots & \ddots & \vdots \\
E[X_N X_1] - E(X_N)E(X_1) & E[X_N X_2] - E(X_N)E(X_2) & \cdots & E[(X_N)^2] - E(X_N)^2
\end{pmatrix}
\]
\[
= \begin{pmatrix}
V(X_1) & COV(X_1, X_2) & \cdots & COV(X_1, X_N) \\
COV(X_1, X_2) & V(X_2) & \cdots & COV(X_2, X_N) \\
\vdots & \vdots & \ddots & \vdots \\
COV(X_1, X_N) & COV(X_2, X_N) & \cdots & V(X_N)
\end{pmatrix}. \qquad (31)
\]
Other useful concepts and properties
If you have two random variables X and Y, their covariance is
\[
COV(X, Y) = E(XY) - E(X)E(Y). \qquad (32)
\]
If we flip this equation around, we can find that the expected value of the product of the two random variables can be written as
\[
E(XY) = COV(X, Y) + E(X)E(Y). \qquad (33)
\]
We will use the following properties of expected values and variances. Let X and Y be random variables and let a and b be two constants (scalars). Then we have:
1. E(aX + bY) = aE(X) + bE(Y)
2. V(aX + bY) = a²V(X) + b²V(Y) + 2abCOV(X, Y)
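A simulation sketch (the mean vector and covariance matrix below are assumed) showing that the sample analogue of Eq. (30) reproduces the variance-covariance matrix and that the result is symmetric:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000
mean = [0.0, 1.0, -1.0]
cov = [[1.0, 0.3, 0.0],
       [0.3, 2.0, 0.5],
       [0.0, 0.5, 1.5]]
X = rng.multivariate_normal(mean, cov, size=T)      # T draws of an N x 1 random vector (N = 3)

EX = X.mean(axis=0)                                 # sample analogue of E(X)
EXX = X.T @ X / T                                   # sample analogue of E(XX')
V = EXX - np.outer(EX, EX)                          # Eq. (30): V(X) = E(XX') - E(X)E(X)'

print(np.allclose(V, np.cov(X, rowvar=False, bias=True)))   # True: matches the built-in estimator
print(np.allclose(V, V.T))                                   # True: variance-covariance matrices are symmetric
```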
• Expected value vs conditional expected value. The mean of a random variable is called the expected value. Let Y be the random variable. Its mean is called the expected value and denoted by E(Y). Note that the mean E(Y) is a number, a constant.
If the random variable Y is discrete, and has possible values y1, …, yT, each occurring with probabilities p(y1), …, p(yT), its expected value is
\[
E(Y) = \sum_{t=1}^{T} y_t \, p(y_t). \qquad (34)
\]
For a continuous random variable Y with probability density f(y) we have
\[
E(Y) = \int y \, f(y) \, dy. \qquad (35)
\]
The conditional expected value is a function of another random variable. Let X be another random variable with possible values x1, …, xT and let p(y|x) denote the conditional probability of Y given X. Then, the conditional expected value of Y given X = xk is
\[
E(Y \,|\, X = x_k) = \sum_{t=1}^{T} y_t \, p(y_t | x_k). \qquad (36)
\]
Notice that for any value xk of the variable X we will have, in general, a different result. Therefore, the conditional expectation is a function of X.
For a continuous random variable Y, the idea is similar. Let X be another continuous random variable and let the probability density f(y|x) denote the conditional probability of Y given X. Then, the conditional expectation of Y given X = x is
\[
E(Y \,|\, X = x) = \int y \, f(y|x) \, dy. \qquad (37)
\]
Again, notice that if we change the value x of the variable X we obtain different numbers; therefore the expected value E(Y |X = x) is a function of x.
You have computed conditional expected values in Linear Econometrics for Finance, when you studied linear regression models. Indeed, consider the simple linear model with one regressor:
yt = βxt + εt (38)
where yt is the dependent variable, xt is the regressor, εt is the error term and β is the scalar parameter to estimate. The usual assumption for this model is that the error term is on average zero and uncorrelated with the regressor, that is E(εt|xt) = 0. Then, the conditional expected value of yt given xt is
E(yt|xt) = E(βxt + εt|xt) = βxt + E(εt|xt) = βxt. (39)
In particular, notice that the conditional expected value of yt is a function of the regressor xt. In other words, different values of xt give different (average) predictions about yt.
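A small simulation sketch (β and the data-generating process below are assumed) makes the point concrete: the sample average of yt computed within each value of xt is close to βxt.

```python
import numpy as np

rng = np.random.default_rng(3)
beta, T = 2.0, 1_000_000
x = rng.choice([1.0, 2.0, 3.0], size=T)        # a discrete regressor makes the conditioning transparent
eps = rng.normal(0.0, 1.0, size=T)             # error term with E(eps_t | x_t) = 0
y = beta * x + eps

for val in (1.0, 2.0, 3.0):
    print(val, y[x == val].mean().round(3), beta * val)   # sample conditional mean of y given x = val vs. beta*val
```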
• Law of iterated expectations: The law of iterated expectations allows us to take expected values by conditioning on additional random variables. Let Y and X be two random variables. Then,
E(Y ) = E [E (Y |X)] . (40)
Look at the right hand side of the equation. Intuitively, first we take a conditional expectation with respect to X (this expectation will be a function of X, as discussed above) and then we take an expectation with respect to the random variable X. Because of the equality, this is like taking expectations with respect to Y directly (the left hand side of the equation). In this course we use the Law of Iterated Expectations in several computations. In particular, when dealing with the GMM estimator, we define the function g(Xt+1, θ) as a pricing error. We will see conditional restrictions of the following form:
Et (g (Xt+1, θ)) = 0, (41)
where the Xt+1's are our data, θ is a parameter to estimate, and the notation Et (g (Xt+1, θ)) denotes the expected value conditional on the information It available at time t:
Et (g (Xt+1, θ)) = E (g (Xt+1, θ) | It) . (42)
This information contains all previous values of the data, that is It = {Xt, Xt−1, Xt−2, ...}. The law of iterated expectations applied to these conditional expectations will allow
us to take expectations on both sides of Eq. (41) and write:
E (g (Xt+1, θ)) = 0, (43) which is now an unconditional restriction.
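A simulation sketch (the distributions below are assumed) verifying the law of iterated expectations, E(Y) = E[E(Y|X)], for a simple discrete X:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1_000_000
X = rng.choice([0.0, 1.0], size=T, p=[0.4, 0.6])
Y = 3.0 * X + rng.normal(size=T)                             # here E(Y | X) = 3X, so E[E(Y | X)] = 3 * 0.6 = 1.8

values = (0.0, 1.0)
cond_means = np.array([Y[X == v].mean() for v in values])    # estimates of E(Y | X = v)
probs = np.array([(X == v).mean() for v in values])          # estimates of Pr(X = v)

print(Y.mean().round(3))                        # direct estimate of E(Y)
print((cond_means * probs).sum().round(3))      # E[E(Y | X)]: the two numbers coincide (both close to 1.8)
```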
Quick review of asymptotic theory
3.1 Proving “consistency”: The Weak Law of Large Numbers, WLLN
Consider an IID random sample {xt} from a distribution with mean μ and variance σ2 < ∞. Note, we are not making assumptions on the probability distribution of the individual observations xt. We are simply saying that they have the same expected value
(μ) and the same variance (σ²).
Write
\[
\bar{X} = \frac{\sum_{t=1}^{T} x_t}{T}.
\]
Let us now compute the expected value and the variance of $\bar{X}$. We have
\[
E(\bar{X}) = E\!\left(\frac{\sum_{t=1}^{T} x_t}{T}\right) = \frac{\sum_{t=1}^{T} E(x_t)}{T} = \frac{T\mu}{T} = \mu.
\]
The sample mean is unbiased for the true expected value μ.
\[
Var(\bar{X}) = Var\!\left(\frac{\sum_{t=1}^{T} x_t}{T}\right) = \frac{\sum_{t=1}^{T} Var(x_t)}{T^2} = \frac{T\sigma^2}{T^2} = \frac{\sigma^2}{T}.
\]
The sample mean has a variance which goes to zero as the number of observations increases. In other words, $Var(\bar{X}) \to 0$ as T → ∞.
Note: because (1) the sample mean is, in expectation, the same as the expected value and (2) the variance of the sample mean (which is a measure of variability around the expected value) goes to zero, the sample mean converges to the expected value. What does it mean to converge to the expected value? Next, we talk briefly about modes of convergence.
Convergence in probability: Let g1, g2, g3, …, gT be a sequence of random variables (for example,
\[
g_1 = \frac{\sum_{t=1}^{1} x_t}{1}, \quad g_2 = \frac{\sum_{t=1}^{2} x_t}{2}, \quad g_3 = \frac{\sum_{t=1}^{3} x_t}{3}, \quad \ldots, \quad g_T = \frac{\sum_{t=1}^{T} x_t}{T}\;).
\]
We say that gT converges in probability to a constant c as T → ∞ and write
\[
\text{plim}\, g_T = c,
\]
if, ∀ε > 0,
\[
\lim_{T \to \infty} \Pr(|g_T - c| > \varepsilon) = 0.
\]
Convergence in mean-squared: Let g1, g2, g3, …, gT be a sequence of random variables. We say that gT converges in mean-squared to a constant c and write
\[
g_T \overset{m.s.}{\to} c
\]
if
\[
E(g_T - c)^2 \underset{T \to \infty}{\to} 0.
\]
Note: convergence in mean-squared implies convergence in probability. Now, let us return to the sample mean. Notice that
\[
Var(\bar{X}) = E\big(\bar{X} - E(\bar{X})\big)^2 = E(\bar{X} - \mu)^2 = \frac{\sigma^2}{T} \to 0,
\]
where the first equality is the definition of the variance, the second equality depends on the sample mean being an unbiased estimator of μ and the third equality was derived earlier. Given the third equality,
\[
\bar{X} \overset{m.s.}{\to} \mu = E(x)
\]
and, therefore,
\[
\bar{X} \overset{p}{\to} \mu = E(x).
\]
We say …
(1) … that the sample mean converges in probability to μ = E(x) as the number of observations increases. This is also called a “weak law of large numbers”.
(2) … that the sample mean is a “consistent” estimator for μ as the number of observations increases. It converges to μ, in probability, as the sample grows.
Important for our purposes: In our discussion above, we assumed an IID sample with σ2 < ∞. The result is, however, general. Sample means converge to expectations under fairly mild assumptions like stationarity (an IID sample is clearly stationary) and bounded second moments. We will therefore apply the result more generally.
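A minimal simulation sketch (normal data assumed purely for convenience; the WLLN does not require it) showing the sample mean settling down around μ as T grows:

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.5, 2.0

for T in (10, 100, 10_000, 1_000_000):
    xbar = rng.normal(mu, sigma, size=T).mean()
    print(T, round(xbar, 4), sigma**2 / T)     # the sample mean approaches mu; its variance sigma^2/T shrinks
```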
3.2 Proving “asymptotic normality”: The Central Limit Theorem, CLT
Consider an IID random sample {xt} from a distribution with mean μ and variance σ2 < ∞. Again, we are not making assumptions on the probability distribution of the individual observations xt. We are simply saying that they have the same expected value (μ) and the same variance (σ2). We know that
1. $E(\bar{X}) = \mu$
2. $Var(\bar{X}) = \sigma^2 / T$
By the Central Limit Theorem, we also know that “averages” are “approximately” normally distributed as T → ∞. Note that the key word is “approximate”. The larger the sample size, the better the approximation. Thus, we can write
\[
\bar{X} \overset{d}{\approx} N\!\left(\mu, \frac{\sigma^2}{T}\right),
\]
where $\overset{d}{\approx}$ means “approximately distributed as.” We note that this result is coherent with the convergence in probability of the sample mean to μ. As the number of observations grows, the variance decreases to zero and the sample mean becomes a more and more accurate estimator of the expected value μ.
We can write the same result in a number of different ways:
\[
\frac{\bar{X} - \mu}{\sqrt{\sigma^2 / T}} \overset{d}{\approx} N(0, 1)
\]
By simply standardizing …
\[
\sqrt{T}\,(\bar{X} - \mu) \overset{d}{\approx} N(0, \sigma^2)
\]
The sample mean converges to μ at a speed of convergence of √T. This is the speed of convergence to zero of the standard deviation of the sample mean.
\[
\frac{\sum_{t=1}^{T} (x_t - \mu)}{\sqrt{T}} \overset{d}{\approx} N(0, \sigma^2)
\]
Differences from the mean (suitably standardized) converge to a normal random variable. Notice, in fact, that
\[
E\!\left(\frac{\sum_{t=1}^{T} (x_t - \mu)}{\sqrt{T}}\right) = \frac{\sum_{t=1}^{T} E(x_t - \mu)}{\sqrt{T}} = 0.
\]
So, it makes sense for the limiting normal random variable to be centered at zero. In fact, we are averaging de-meaned random variables. Also,
\[
Var\!\left(\frac{\sum_{t=1}^{T} (x_t - \mu)}{\sqrt{T}}\right) = \frac{\sum_{t=1}^{T} Var(x_t - \mu)}{T} = \frac{T\sigma^2}{T} = \sigma^2.
\]
Hence, we can also write:
\[
\frac{\sum_{t=1}^{T} (x_t - \mu)}{\sqrt{T}} \overset{d}{\approx} N\!\left(0, Var\!\left(\frac{\sum_{t=1}^{T} (x_t - \mu)}{\sqrt{T}}\right)\right). \qquad (44)
\]
By standardizing by √T, the variance of the limiting distribution is a well-defined number (which does not diverge to infinity or converge to zero). Because the variance is “balanced”, the limiting distribution is well-posed. The representation in Eq. (44
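A final simulation sketch (a chi-squared population is assumed here precisely because it is non-normal): the distribution of $\sqrt{T}(\bar{X} - \mu)$ has mean close to zero and variance close to σ², as the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(6)
df = 3                                 # chi-squared(3) population: mu = 3, sigma^2 = 2*df = 6 (clearly non-normal)
T, reps = 500, 10_000

xbar = rng.chisquare(df, size=(reps, T)).mean(axis=1)   # "reps" independent sample means, each based on T draws
z = np.sqrt(T) * (xbar - df)                            # sqrt(T) * (sample mean - mu)

print(round(z.mean(), 3), round(z.var(), 3))            # close to 0 and close to 6 = sigma^2, as in Eq. (44)
```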