
ANLY-601
Advanced Pattern Recognition

Spring 2018

L9 – Model Estimation

Parameter and Model Estimation

• Densities aren't available, only data.

• When building parametric classifiers or density models, we estimate quantities based on sample data.

• These estimates are, themselves, random variables and thus suffer random fluctuations. We need to characterize these fluctuations.

2

Parameter and Model Estimation

We want to estimate some model function f of statistical parameters $y_i$ by some estimator

$$\hat f = \hat f(\hat y_1, \hat y_2, \ldots, \hat y_q)$$

where the $\hat y_i$ are estimates of statistical quantities from the data – e.g. the sample mean, sample covariance, ...

Example: Linear regression of a dependent variable t on an independent variable x is given by

$$\hat t(x) = \hat\mu_t + \frac{\hat\sigma_{xt}}{\hat\sigma_x^2}\,(x - \hat\mu_x)$$

where $\hat\mu_x$, $\hat\mu_t$, $\hat\sigma_{xt}$, and $\hat\sigma_x^2$ are determined by fitting the data.

Clearly these estimators vary across training sets, so the regression predictor $\hat t(x)$ will vary across training sets. We'd like to express the variability, and the bias, in $\hat t(x)$.
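A minimal numerical sketch of this variability (assuming NumPy; the generating line t = 2x + 1, the noise level, and all names are illustrative): refitting the same plug-in predictor on fresh training sets gives different predictions at the same query point.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_regression(x, t):
    """Plug-in predictor t_hat(x) = mu_t + (sigma_xt / sigma_x^2) * (x - mu_x)."""
    mu_x, mu_t = x.mean(), t.mean()
    sigma_xt = np.mean((x - mu_x) * (t - mu_t))  # sample covariance
    sigma_x2 = np.mean((x - mu_x) ** 2)          # sample variance
    return lambda xq: mu_t + (sigma_xt / sigma_x2) * (xq - mu_x)

# Fit on several training sets drawn from the same population and
# compare the predictions at a fixed query point x = 1.0.
predictions = []
for _ in range(5):
    x = rng.normal(0.0, 1.0, size=30)
    t = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=30)  # noisy linear data
    t_hat = fit_linear_regression(x, t)
    predictions.append(t_hat(1.0))
print(predictions)  # fluctuates from training set to training set
```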

3

Bias / Variance Decomposition of Model Error

Suppose we estimate some model (or parameter) from data:

$$\hat f = \hat f(\hat y_1, \hat y_2, \ldots, \hat y_q)$$

What's the expected squared error between this estimate and the true model f? (The statistical expectation is over data sets used to estimate the model or parameter. Can you write that as an integral?)

$$E\left[(\hat f - f)^2\right] = E\left[\left(\hat f - E[\hat f] + E[\hat f] - f\right)^2\right] = E\left[\left(\hat f - E[\hat f]\right)^2\right] + \left(E[\hat f] - f\right)^2$$

= variance + squared bias (prove this!)
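A quick Monte Carlo check of the decomposition (a sketch, assuming NumPy; the shrunken sample mean is just an arbitrary biased estimator chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
f_true = 2.0                         # the "true model" is a single number here
n_datasets, n_samples = 100_000, 10

# A deliberately biased estimator: a shrunken sample mean.
data  = rng.normal(f_true, 1.0, size=(n_datasets, n_samples))
f_hat = 0.8 * data.mean(axis=1)

mse      = np.mean((f_hat - f_true) ** 2)
variance = np.var(f_hat)
bias_sq  = (f_hat.mean() - f_true) ** 2
print(mse, variance + bias_sq)       # agree up to Monte Carlo error
```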

4

Simplest Example: ML Variance

The naïve estimate of the population variance (we'll see this is also the maximum likelihood estimate) is

$$\hat f(\hat y) = \hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat\mu\right)^2 \quad\text{with}\quad \hat\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The expected value of the estimator is

$$E[\hat\sigma^2] = E\left[\frac{1}{N}\sum_i x_i^2 \;-\; \frac{1}{N^2}\sum_{i,j} x_i\,x_j\right] = \frac{N-1}{N}\,\sigma^2$$

Hence the unbiased estimator is

$$\hat\sigma_u^2 = \frac{N}{N-1}\,\hat\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \hat\mu\right)^2$$
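A small simulation (a sketch, assuming NumPy) showing the downward bias of the ML estimator and the effect of the N/(N−1) correction:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_true, N = 4.0, 5
x = rng.normal(0.0, np.sqrt(sigma2_true), size=(200_000, N))

mu_hat = x.mean(axis=1, keepdims=True)
var_ml       = np.mean((x - mu_hat) ** 2, axis=1)   # divide by N
var_unbiased = var_ml * N / (N - 1)                 # divide by N - 1

print(var_ml.mean())        # ~ (N-1)/N * 4.0 = 3.2
print(var_unbiased.mean())  # ~ 4.0
```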

5

Variance of Variance

The variance of the variance estimator

$$\operatorname{var}(\hat\sigma^2) = E\left[\left(\hat\sigma^2 - E[\hat\sigma^2]\right)^2\right]$$

Gives

$$\operatorname{var}(\hat\sigma^2) = E[\hat\sigma^4] - \left(E[\hat\sigma^2]\right)^2$$

One needs to make assumptions about the density for x in order to evaluate $E[\hat\sigma^4]$.
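A Monte Carlo sketch of the variance of the variance (assuming NumPy, and assuming Gaussian x so that $E[\hat\sigma^4]$ is available in closed form; for the ML estimator this gives $\operatorname{var}(\hat\sigma^2) = 2(N-1)\sigma^4/N^2$):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, N = 1.0, 20
x = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, N))

# ML variance estimate for each simulated data set.
mu_hat = x.mean(axis=1, keepdims=True)
var_ml = np.mean((x - mu_hat) ** 2, axis=1)

print(np.var(var_ml))                    # Monte Carlo var(sigma_hat^2)
print(2 * (N - 1) * sigma2**2 / N**2)    # closed form for Gaussian x: 0.095
```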

6

Parameter and Model Estimation

In some cases, we can write exact algebraic expressions for model bias and variance. More generally, we cannot. The following series expansion can help derive approximate expressions.

We'll start by assuming that the parameter estimates are close to the true values:

$$\hat y_i = y_i + \Delta y_i, \quad\text{with } \Delta y_i \text{ "small"}$$

Next, Taylor expand $\hat f$ about $f(y)$:

$$\hat f = f(\hat y_1, \hat y_2, \ldots, \hat y_q) = f(y) + \sum_{i=1}^{q}\frac{\partial f}{\partial y_i}\,\Delta y_i + \frac{1}{2}\sum_{i,j=1}^{q}\frac{\partial^2 f}{\partial y_i\,\partial y_j}\,\Delta y_i\,\Delta y_j + \ldots$$

$$= f(y) + \nabla f^T\,\Delta y + \frac{1}{2}\,\Delta y^T D^2 f\,\Delta y + \ldots$$
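A numerical sanity check of this expansion (a sketch; the toy model $f(y_1, y_2) = y_1 e^{y_2}$ and the expansion point are arbitrary choices for illustration):

```python
import numpy as np

# Toy model f(y1, y2) = y1 * exp(y2), expanded about y = (1.0, 0.5).
def f(y):
    return y[0] * np.exp(y[1])

y = np.array([1.0, 0.5])
grad = np.array([np.exp(y[1]), y[0] * np.exp(y[1])])      # nabla f at y
hess = np.array([[0.0,          np.exp(y[1])],
                 [np.exp(y[1]), y[0] * np.exp(y[1])]])    # D^2 f at y

dy = np.array([0.01, -0.02])                              # "small" Delta y
taylor = f(y) + grad @ dy + 0.5 * dy @ hess @ dy
print(f(y + dy), taylor)   # agree to O(||Delta y||^3)
```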

7

Parameter and Model
Estimation

Having defined

$$\nabla f = \left(\frac{\partial f}{\partial y_1},\ \frac{\partial f}{\partial y_2},\ \ldots,\ \frac{\partial f}{\partial y_q}\right)^T \quad\text{(the gradient vector)}$$

and the matrix of second derivatives

$$D^2 f = \begin{pmatrix}
\dfrac{\partial^2 f}{\partial y_1^2} & \dfrac{\partial^2 f}{\partial y_1\,\partial y_2} & \cdots & \dfrac{\partial^2 f}{\partial y_1\,\partial y_q}\\[1ex]
\dfrac{\partial^2 f}{\partial y_2\,\partial y_1} & \dfrac{\partial^2 f}{\partial y_2^2} & \cdots & \dfrac{\partial^2 f}{\partial y_2\,\partial y_q}\\[1ex]
\vdots & & \ddots & \vdots\\[1ex]
\dfrac{\partial^2 f}{\partial y_q\,\partial y_1} & \dfrac{\partial^2 f}{\partial y_q\,\partial y_2} & \cdots & \dfrac{\partial^2 f}{\partial y_q^2}
\end{pmatrix}, \quad\text{or}\quad \left(D^2 f\right)_{ij} = \frac{\partial^2 f}{\partial y_i\,\partial y_j}$$



8

Model Bias

Take the expectation of $\hat f$ on p7:

$$E[\hat f] = f(y) + \nabla f^T E[\Delta y] + \tfrac{1}{2}\,E\left[\Delta y^T D^2 f\,\Delta y\right] + \ldots = f(y) + \nabla f^T E[\Delta y] + \tfrac{1}{2}\operatorname{Trace}\left(D^2 f\;E\left[\Delta y\,\Delta y^T\right]\right) + \ldots$$

If the $\hat y_i$ are unbiased, then $E[\Delta y] = 0$ and this reduces to

$$E[\hat f] = f(y) + \tfrac{1}{2}\operatorname{Trace}\left(D^2 f\;E\left[\Delta y\,\Delta y^T\right]\right) + \ldots$$

From these we can estimate (to lowest order) the bias in $\hat f$. Even if the $\hat y_i$ are unbiased, $\hat f$ will in general be biased.
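A sketch (assuming NumPy) checking this lowest-order bias formula on the toy case $f(y) = y^2$ with $\hat y$ the sample mean: here $D^2 f = 2$ and $E[\Delta y^2] = \sigma^2/N$, so the predicted bias is $\sigma^2/N$ even though $\hat y$ itself is unbiased:

```python
import numpy as np

rng = np.random.default_rng(4)
y_true, sigma, N = 3.0, 2.0, 10

# Plug the (unbiased) sample mean into f(y) = y^2, many times over.
y_hat = rng.normal(y_true, sigma, size=(200_000, N)).mean(axis=1)
f_hat = y_hat ** 2

bias_mc        = f_hat.mean() - y_true ** 2
bias_predicted = 0.5 * 2.0 * sigma**2 / N   # (1/2) Trace(D^2 f E[dy dy^T])
print(bias_mc, bias_predicted)              # both ~ 0.4
```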

9

Model Variance

Similarly, to lowest order, the variance is given by (assuming unbiased $\hat y_i$)

$$\operatorname{var}(\hat f) = E[\hat f^2] - E[\hat f]^2$$

$$E[\hat f^2] = E\left[\left(f(y) + \nabla f^T\Delta y + \tfrac{1}{2}\,\Delta y^T D^2 f\,\Delta y + \ldots\right)^2\right] = f(y)^2 + E\left[\left(\nabla f^T\Delta y\right)^2\right] + f(y)\operatorname{Trace}\left(D^2 f\,E[\Delta y\,\Delta y^T]\right) + \ldots$$

$$E[\hat f]^2 = f(y)^2 + f(y)\operatorname{Trace}\left(D^2 f\,E[\Delta y\,\Delta y^T]\right) + \ldots$$

$$\operatorname{var}(\hat f) = E\left[\left(\nabla f^T\Delta y\right)\left(\Delta y^T\nabla f\right)\right] + O(\Delta y^3) = \nabla f^T\,E\left[\Delta y\,\Delta y^T\right]\nabla f + O(\Delta y^3)$$

This is how we got the expression for $\operatorname{var}(\hat\sigma^2)$ on p6.
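The same toy case (a sketch, assuming NumPy) checks the lowest-order variance formula: for $f(y) = y^2$ with $\hat y$ the sample mean, $\nabla f = 2y$ and $E[\Delta y\,\Delta y^T] = \sigma^2/N$, so $\operatorname{var}(\hat f) \approx 4y^2\sigma^2/N$:

```python
import numpy as np

rng = np.random.default_rng(5)
y_true, sigma, N = 3.0, 2.0, 50

y_hat = rng.normal(y_true, sigma, size=(200_000, N)).mean(axis=1)
f_hat = y_hat ** 2

var_mc        = np.var(f_hat)
var_predicted = (2 * y_true) ** 2 * sigma**2 / N   # grad^T E[dy dy^T] grad
print(var_mc, var_predicted)                       # both ~ 2.88
```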

10

Empirical Error Estimates

We estimate the performance of a classifier by counting errors on a finite test set:

$$\hat E_i = \frac{\#\text{ of errors on class } i \text{ objects}}{N_i} = \frac{N^i_{errors}}{N_i}$$

Suppose the true error rate for class 1 objects is $E_1$. Then the number of errors made on a sample of $N_1$ objects follows a binomial distribution

$$P\left(N^1_{errors}\right) = \binom{N_1}{N^1_{errors}}\;E_1^{\,N^1_{errors}}\,\left(1 - E_1\right)^{\,N_1 - N^1_{errors}}$$

The average number of errors is $E[N_{errors}] = N_1 E_1$, so

$$E[\hat E_1] = E_1$$

The variance is

$$\operatorname{var}(\hat E_1) = \frac{1}{N_1^2}\operatorname{var}(N_{errors}) = \frac{N_1 E_1\left(1 - E_1\right)}{N_1^2} = \frac{E_1\left(1 - E_1\right)}{N_1}$$
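A short simulation (a sketch, assuming NumPy; the error rate E1 = 0.1 and test-set size N1 = 200 are illustrative) comparing the empirical spread of $\hat E_1$ to the binomial formula:

```python
import numpy as np

rng = np.random.default_rng(6)
E1, N1 = 0.10, 200                  # true class-1 error rate, test-set size

# Each trial counts the errors made on N1 class-1 test objects.
n_errors = rng.binomial(N1, E1, size=200_000)
E1_hat = n_errors / N1

print(E1_hat.mean())                # ~ E1: the estimate is unbiased
print(E1_hat.var())                 # ~ E1 * (1 - E1) / N1
print(E1 * (1 - E1) / N1)           # = 0.00045
```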

11