
Probability Density Functions

Australian National University


6.1

Density and Likelihood Functions

First, suppose X is an (absolutely continuous) random variable

Everything we will need to know about X is summarised by its probability density function (pdf) f(x)

That is, given f(x) we can compute E[X], Var(X), E[sin(X)], E[X^20], etc.


I

Examples

Univariate Normal, N(µ, σ²):

f(x; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(x - \mu)^2\right)
                    = (2\pi)^{-1/2} |\sigma^2|^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)(\sigma^2)^{-1}(x - \mu)\right)

Multivariate Normal, N(µ, Σ):

(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)


Normalisation: \int f(x)\, dx = 1
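As a quick numerical check of the univariate formula and the normalisation, here is a minimal sketch (normpdf assumes the Statistics and Machine Learning Toolbox):

% Evaluate the N(mu, sigma2) pdf from the formula and check it against
% normpdf (which takes the standard deviation, not the variance),
% then confirm the pdf integrates to 1.
mu = 1; sigma2 = 4;
f = @(x) (1./(2*pi*sigma2)).^(1/2) .* exp(-(x - mu).^2/(2*sigma2));

x0 = 2.5;
abs(f(x0) - normpdf(x0, mu, sqrt(sigma2)))   % ~0 up to rounding
integral(f, -Inf, Inf)                       % ~1: normalisation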

Joint Distribution of Normal Random Variables

Suppose we have n iid random variables X_1, ..., X_n with X_i ~ N(µ, σ²), and we wish to derive their joint pdf f(x)

As X_1, ..., X_n are iid:

f(\mathbf{x}) = \prod_{i=1}^n \phi(x_i; \mu, \sigma^2)
             = \prod_{i=1}^n \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)
             = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)


(Independence is what lets the joint pdf factorise into the product of the marginals, and e^a e^b = e^{a+b} is what collapses the product of exponentials into a single exponential of a sum.)
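To see the factorisation at work numerically, here is a minimal sketch (normpdf and mvnpdf assume the Statistics and Machine Learning Toolbox) comparing the product of n univariate normal pdfs with the multivariate normal pdf whose covariance is σ²Iₙ:

% Product of univariate N(mu, sigma2) pdfs vs the multivariate normal
% pdf with mean (mu,...,mu)' and covariance sigma2*I.
n = 5; mu = 1; sigma2 = 4;
x = randn(n, 1);

p_prod = prod(normpdf(x, mu, sqrt(sigma2)));
p_mvn  = mvnpdf(x', mu*ones(1, n), sigma2*eye(n));
abs(p_prod - p_mvn)   % ~0 up to rounding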

Joint Distributions of Normals are Multivariate Normals

Quick Exercise: The joint distribution of iid normals from the previous

slide is a multivariate normal.

Joint normal:

\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2\right)

Multivariate Normal:

(2\pi)^{-n/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)


Solution sketch: we need to find Σ and µ. Take \boldsymbol{\mu} = (\mu, \ldots, \mu)' and \Sigma = \sigma^2 I_n. Then

(2\pi)^{-n/2} |\Sigma|^{-1/2} = (2\pi)^{-n/2} (\sigma^2)^{-n/2} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}

and, since \Sigma^{-1} = \sigma^{-2} I_n,

(\mathbf{x} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu)^2

so the two expressions agree exactly.

Yet More Distributions

Distribution | Notation | pdf | Support
Bernoulli | Ber(p) | p^x (1-p)^{1-x} | \{0, 1\}
Uniform | U(a, b) | \frac{1}{b-a} | (a, b)
Gamma | G(a, b) | \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx} | \mathbb{R}^+
Univariate t | t(\nu, \mu, \sigma^2) | \frac{\Gamma((\nu+1)/2)}{\sigma\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + \frac{1}{\nu}\left(\frac{x-\mu}{\sigma}\right)^2\right)^{-(\nu+1)/2} | \mathbb{R}

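As a sanity check on one row of the table: the Gamma pdf above uses a rate parameter b, whereas MATLAB's gampdf (Statistics Toolbox) is parameterised by shape and scale, so the scale argument is 1/b:

% Gamma(a, b) pdf from the table (rate b) vs gampdf (scale 1/b).
a = 3; b = 2; x = 1.7;
f_table = b^a/gamma(a) * x^(a-1) * exp(-b*x);
abs(f_table - gampdf(x, a, 1/b))   % ~0 up to rounding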

Properties of Determinants

A Brief Mathematical Interlude

Australian National University


6.2

Digression: Properties of Determinants

What is the determinant?

Multiplying a row/column is fine; also multiplying the whole matrix

Swapping rows/columns is ok

Loves triangular matrices

Plays well with inverses and transposes

Plays well with matrix multiplication


For a 2×2 matrix,

\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc

and we write det(A) = |A| interchangeably; determinants are defined for square matrices of any size.


Multiplying a single row (or column) by k multiplies the determinant by k:

\det\begin{pmatrix} ka & kb \\ c & d \end{pmatrix} = ka\,d - kb\,c = k \det\begin{pmatrix} a & b \\ c & d \end{pmatrix}

Multiplying the whole matrix by k scales the determinant once per row, so det(kA) = k^n det(A), where n is the size of the matrix.


Swapping the two rows (or the two columns) multiplies the determinant by -1:

\det\begin{pmatrix} c & d \\ a & b \end{pmatrix} = cb - da = -(ad - bc)


The determinant of a triangular matrix is just the product of its diagonal entries, e.g.

\det\begin{pmatrix} a & b \\ 0 & d \end{pmatrix} = ad

This matters computationally: by cofactor expansion, a 3×3 determinant is more than 3 times as hard as a 2×2, and in general an n×n determinant is more than n times as hard as an (n-1)×(n-1). Triangular matrices avoid that blow-up entirely.

Finally, the determinant plays well with transposes, inverses and products:

\det(A') = \det(A), \qquad \det(A^{-1}) = \det(A)^{-1}, \qquad \det(AB) = \det(A)\det(B)

but for sums, det(A + B) is nothing in particular.
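All of these properties are easy to verify numerically on a random matrix; a quick sketch:

% Numerical check of the determinant properties.
n = 4; k = 3;
A = randn(n); B = randn(n);

det(k*A) - k^n*det(A)        % ~0: scaling the whole matrix
det(A') - det(A)             % ~0: transpose
det(inv(A)) - 1/det(A)       % ~0: inverse
det(A*B) - det(A)*det(B)     % ~0: product
U = triu(A);
det(U) - prod(diag(U))       % ~0: triangular = product of diagonal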

Likelihood Functions

General Theory

Australian National University


6.3

Likelihood Function

A very important concept in statistics

It describes, in a precise manner, our information about the

parameters of the model, given the observed data

Let f(x | θ) denote the (multivariate) pdf for the sample X = (X_1, ..., X_n) with unknown parameter vector θ.

That is, f(x | θ) is the marginal probability of finding x if the parameter was indeed θ.


Likelihood Function: Definition

Given that x is observed, the likelihood function is defined as

L(\theta \mid \mathbf{x}) = f(\mathbf{x} \mid \theta)

So the likelihood is just the pdf of the data, but viewed as a function of the parameter vector θ

In a sense it is the same function, just viewed from a new perspective


Likelihood Function: Definition

By definition, the likelihood function is a pdf in x. Therefore

\int L(\theta \mid \mathbf{x})\, d\mathbf{x} = \int f(\mathbf{x} \mid \theta)\, d\mathbf{x} = 1

But L(θ | x) is not a pdf with respect to θ. That is, in general

\int L(\theta \mid \mathbf{x})\, d\theta \neq 1
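A concrete one-parameter illustration (a sketch in base MATLAB): for a single Bernoulli observation x = 1 the likelihood is L(p | x) = p^x(1-p)^{1-x} = p, which integrates over the parameter to 1/2, not 1:

% The Bernoulli likelihood for x = 1, integrated over p in (0,1).
L = @(p) p;
integral(L, 0, 1)   % = 0.5, so L is not a pdf in the parameter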


Log-likelihood

For basically all purposes it is easier to work with the logarithm of the

likelihood function

Naturally, this is called the log-likelihood function

\ell(\theta \mid \mathbf{x}) = \log L(\theta \mid \mathbf{x})

Why is this ok to do?

Why might we want to do this?


(Answers: taking the log is fine because log is strictly increasing, so the likelihood and the log-likelihood are maximised at the same θ; and since we will mostly be maximising the likelihood, i.e. finding the argmax over θ, it is much easier to work with sums than with products.)

Likelihood Functions

Log-likelihood of the AR(1) Process

Australian National University


6.3(a)

Log-likelihood for the AR(1) Process

Recall the AR(1) process:

y_t = \rho y_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma^2)

Given y_{t-1} and the parameters ρ, σ², we know y_t ~ N(ρy_{t-1}, σ²), i.e. mean ρy_{t-1} and variance σ²

So the pdf of y_t is

f(y_t \mid y_{t-1}, \rho, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{1/2} \exp\left(-\frac{1}{2\sigma^2}(y_t - \rho y_{t-1})^2\right)


Log-likelihood for the AR(1) Process

The joint density is then

f(y_1, \ldots, y_T \mid y_0, \rho, \sigma^2) = \prod_{t=1}^T f(y_t \mid y_{t-1}, \rho, \sigma^2)

So the log-likelihood is

\ell(\rho, \sigma^2 \mid \mathbf{y}) = \log \prod_{t=1}^T f(y_t \mid y_{t-1}, \rho, \sigma^2)
 = \sum_{t=1}^T \log f(y_t \mid y_{t-1}, \rho, \sigma^2)
 = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{t=1}^T (y_t - \rho y_{t-1})^2


(Why the joint density factorises this way: f(y_1, ..., y_T | y_0) = f(y_1 | y_0) f(y_2 | y_1, y_0) \cdots f(y_T | y_{T-1}, ..., y_0), and in the AR(1) model, once y_{t-1} is known the earlier history carries no additional information about y_t, so each factor reduces to f(y_t | y_{t-1}).)
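A minimal sketch of this log-likelihood in code (simulated data, base MATLAB, conditioning on y_0 = 0):

% Simulate an AR(1) series and evaluate the conditional log-likelihood.
T = 200; rho = 0.7; sigma2 = 1;
y = zeros(T, 1); ylag = 0;
for t = 1:T
    y(t) = rho*ylag + sqrt(sigma2)*randn;
    ylag = y(t);
end

% log-likelihood at candidate values (rho0, s20), with y0 = 0
rho0 = 0.7; s20 = 1;
resid = y - rho0*[0; y(1:end-1)];
ell = -T/2*log(2*pi*s20) - sum(resid.^2)/(2*s20)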



An alternative derivation in matrix form: stack the observations and write

\mathbf{y} = \rho L \mathbf{y} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(0, \sigma^2 I_T),

where L is the T×T lag matrix (ones on the first sub-diagonal, zeros elsewhere). With H = I_T - \rho L this is H\mathbf{y} = \boldsymbol{\varepsilon}, so \mathbf{y} = H^{-1}\boldsymbol{\varepsilon} and

\mathbf{y} \sim N\left(0, \; \sigma^2 (H'H)^{-1}\right)

Hence

f(\mathbf{y} \mid \rho, \sigma^2) = (2\pi)^{-T/2} \left|\sigma^2 (H'H)^{-1}\right|^{-1/2} \exp\left(-\frac{1}{2\sigma^2}\mathbf{y}'H'H\mathbf{y}\right)

For the determinant, |\sigma^2 (H'H)^{-1}| = (\sigma^2)^T |H|^{-2} = (\sigma^2)^T, since H = I - \rho L is lower triangular with ones on the diagonal, so det(H) = 1.

Finally, (H\mathbf{y})_t = y_t - \rho y_{t-1} (with y_0 = 0), so \mathbf{y}'H'H\mathbf{y} = \sum_{t=1}^T (y_t - \rho y_{t-1})^2 and

\ell(\rho, \sigma^2 \mid \mathbf{y}) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=1}^T (y_t - \rho y_{t-1})^2,

exactly as before.
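And a quick check (reusing y, rho0 and s20 from the sketch above) that the matrix form gives the same number; the log-determinant term vanishes because det(H) = 1:

% Matrix-form AR(1) log-likelihood: H = I - rho*L, L the lag matrix.
Lmat = spdiags(ones(T,1), -1, T, T);   % ones on the first sub-diagonal
H = speye(T) - rho0*Lmat;
ell_mat = -T/2*log(2*pi*s20) - (y'*(H'*H)*y)/(2*s20)   % equals ell above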

Likelihood Functions

Log-likelihood of the MA(1) Process

Australian National University


6.3(b)

Log-likelihood for the MA(1) Process

Recall the MA(1) process:

y_t = \varepsilon_t + \theta\varepsilon_{t-1}, \qquad \varepsilon_t \sim N(0, \sigma^2)

We can write this in matrix form as

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
\theta & 1 & 0 & \cdots & 0 \\
0 & \theta & 1 & \cdots & 0 \\
 & & \ddots & \ddots & \\
0 & 0 & \cdots & \theta & 1
\end{pmatrix}
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \vdots \\ \varepsilon_T \end{pmatrix}

that is, \mathbf{y} = \Gamma_\theta \boldsymbol{\varepsilon}.

(Reading off the rows: y_1 = ε_1, since we set ε_0 = 0; y_2 = ε_2 + θε_1; and so on down to y_T = ε_T + θε_{T-1}, with the ε_t iid.)

Log-likelihood for the MA(1) Process

Therefore

(\mathbf{y} \mid \theta) \sim N(0, \sigma^2 \Gamma_\theta \Gamma_\theta')

So the log-likelihood is

\ell(\theta, \sigma^2 \mid \mathbf{y}) = \log\left((2\pi)^{-T/2} |\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y} - \boldsymbol{\mu})\right)\right)
 = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2}\log|\Gamma_\theta \Gamma_\theta'| - \frac{1}{2\sigma^2} \mathbf{y}' (\Gamma_\theta \Gamma_\theta')^{-1} \mathbf{y}

with \boldsymbol{\mu} = 0 and \Sigma = \sigma^2 \Gamma_\theta \Gamma_\theta'.

(Filling in the steps: ε ~ N(0, σ²I_T) and y = Γ_θε give y ~ N(0, Γ_θ(σ²I_T)Γ_θ') = N(0, σ²Γ_θΓ_θ'). Then |σ²Γ_θΓ_θ'| = (σ²)^T |Γ_θΓ_θ'| and (σ²Γ_θΓ_θ')^{-1} = σ^{-2}(Γ_θΓ_θ')^{-1}, which is how the (σ²)^T factor joins the 2π term and the 1/(2σ²) factors out of the quadratic form.)

Log-likelihood for the MA(1) Process

As an example, generate T = 100 data points according to the MA(1) model with θ = 0.8 and σ² = 1

Assume that σ² = 1 is known

So the log-likelihood will be a function only of θ

Evaluate the log-likelihood over a grid of θ's


MATLAB code

T = 100;
ngrid = 300;
theta_grid = linspace(.2, 1, ngrid);

% simulate T observations from the MA(1) with theta = 0.8, sigma2 = 1
e = randn(T+1, 1);
y = e(2:end) + 0.8*e(1:end-1);

%% construct A and B so that Gam = A + theta*B
A = speye(T);
B = spdiags(ones(T,1), -1, T, T);
ell = zeros(ngrid, 1);

for i = 1:ngrid
    theta = theta_grid(i);
    Gam = A + theta*B;
    Gam2 = Gam*Gam';
    ell(i) = -T/2*log(2*pi) - .5*log(det(Gam2)) - .5*y'*(Gam2\y);
end
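Having evaluated ℓ on a grid, one can also read off a grid approximation to the maximiser, previewing the next section:

[~, imax] = max(ell);
theta_hat = theta_grid(imax)   % grid approximation to the MLE of theta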


Figure: The log-likelihood ℓ(θ | y) for the MA(1) model with σ² = 1


Maximum Likelihood Estimation

General Theory

Australian National University


6.4

Maximum Likelihood Estimator

Problem: given the likelihood L(θ | y) with observed sample y, we want the "best" guess for θ

Solution: use the value of θ for which the observed sample y is most likely

Call this the maximum likelihood estimator


Maximum Likelihood Estimator: Definition

The method of maximum likelihood estimation formalises this idea

into a parametric maximisation problem

The maximum likelihood estimator for θ is defined as

\hat{\theta}_{MLE} = \underset{\theta \in \Theta}{\operatorname{argmax}}\; L(\theta \mid \mathbf{y}) = \underset{\theta \in \Theta}{\operatorname{argmax}}\; \ell(\theta \mid \mathbf{y})

The θ̂_MLE is found by maximising the likelihood (or indeed the log-likelihood)

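As a minimal numeric illustration (not from the slides), a general-purpose optimiser can do the maximisation: here fminsearch (base MATLAB) minimises the negative log-likelihood of an iid N(µ, σ²) sample, and the result matches the closed-form MLEs:

% Numerical MLE for an iid normal sample; optimise over log(sigma2)
% so the variance stays positive during the search.
n = 500; x = 2 + 1.5*randn(n, 1);
negll = @(p) n/2*log(2*pi*exp(p(2))) + sum((x - p(1)).^2)/(2*exp(p(2)));
p_hat = fminsearch(negll, [0; 0]);

[p_hat(1), mean(x)]                      % MLE of mu = sample mean
[exp(p_hat(2)), mean((x - mean(x)).^2)]  % MLE of sigma2 (divisor n)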

Properties of MLE

Theorem – Properties of MLE

Let y be an observed sample with log-likelihood ℓ(θ | y). Then, under appropriate regularity conditions, the MLE θ̂_MLE is consistent and is asymptotically normal:

\hat{\theta}_{MLE} \sim N\left(\theta, \; \mathcal{I}^{-1}(\theta)\right)

where \mathcal{I}(\theta) = -\mathbb{E}\, H(\theta; \mathbf{Y}) is the Fisher information matrix, and

H(\theta; \mathbf{y}) = \frac{\partial^2}{\partial \theta\, \partial \theta'}\, \ell(\theta \mid \mathbf{y})

is the Hessian matrix.


Maximum Likelihood Estimation

Maximising Log-likelihood of the Linear Regression Model

Australian National University


6.4(a)

Log-likelihood for the Linear Regression Model

Recall the linear regression model:

\mathbf{y} = X\beta + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(0, \sigma^2 I_T)

The joint density of y is N(Xβ, σ²I_T)

The log-likelihood is given by

\ell(\beta, \sigma^2 \mid \mathbf{y}) = -\frac{1}{2}\log|2\pi\sigma^2 I_T| - \frac{1}{2}(\mathbf{y} - X\beta)'(\sigma^2 I_T)^{-1}(\mathbf{y} - X\beta)
 = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - X\beta)'(\mathbf{y} - X\beta)

MLE for Linear Regression Model

Log-likelihood for the linear regression is

\ell(\beta, \sigma^2 \mid \mathbf{y}) = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - X\beta)'(\mathbf{y} - X\beta)
 = -\frac{T}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(\beta'X'X\beta - 2\mathbf{y}'X\beta + \mathbf{y}'\mathbf{y})

The first-order conditions:

\frac{\partial}{\partial\beta}\,\ell(\beta, \sigma^2 \mid \mathbf{y}) = -\frac{1}{2\sigma^2}(2\beta'X'X - 2\mathbf{y}'X) = 0

\frac{\partial}{\partial\sigma^2}\,\ell(\beta, \sigma^2 \mid \mathbf{y}) = -\frac{T}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(\mathbf{y} - X\beta)'(\mathbf{y} - X\beta) = 0


(Working: the first condition gives 2\beta'X'X - 2\mathbf{y}'X = 0, i.e. X'X\beta = X'\mathbf{y}, so \hat{\beta} = (X'X)^{-1}X'\mathbf{y}. Multiplying the second condition through by 2(\sigma^2)^2 gives -T\sigma^2 + (\mathbf{y} - X\beta)'(\mathbf{y} - X\beta) = 0, so \hat{\sigma}^2 = \frac{1}{T}(\mathbf{y} - X\hat{\beta})'(\mathbf{y} - X\hat{\beta}).)

MLE for Linear Regression Model

Solving this system of equations for β and σ²:

\hat{\beta} = (X'X)^{-1}X'\mathbf{y}

\hat{\sigma}^2 = \frac{1}{T}(\mathbf{y} - X\hat{\beta})'(\mathbf{y} - X\hat{\beta})

So θ̂ = (β̂, σ̂²) is a critical point

To show it is indeed a local max we would need to show the Hessian at θ̂ is negative definite, but let's not.


(Substituting \hat{\beta} into \hat{\sigma}^2: \hat{\sigma}^2 = \frac{1}{T}\left(\mathbf{y} - X(X'X)^{-1}X'\mathbf{y}\right)'\left(\mathbf{y} - X(X'X)^{-1}X'\mathbf{y}\right).)
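To close the loop, a short sketch (simulated data, base MATLAB) computing β̂ and σ̂² from these formulas; β̂ coincides with the least-squares solution, as expected:

% MLE of the linear regression model on simulated data.
T = 200; X = [ones(T,1) randn(T,2)];
beta_true = [1; 2; -0.5]; sigma2_true = 0.5;
y = X*beta_true + sqrt(sigma2_true)*randn(T,1);

beta_hat   = (X'*X)\(X'*y);                        % (X'X)^{-1} X'y
sigma2_hat = (y - X*beta_hat)'*(y - X*beta_hat)/T;

norm(beta_hat - X\y)   % ~0: matches least squares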