ANLY-601
Advanced Pattern Recognition
Spring 2018
L9 – Model Estimation
Parameter and Model Estimation
• Densities aren’t available, only data.
• When building parametric classifiers or density models we estimate quantities based on sample data.
• These estimates are, themselves, random variables and thus suffer random fluctuations. We need to characterize these.
2
Parameter and Model Estimation

We want to estimate some model function f of statistical parameters $y_i$ by some estimator
$$\hat f = f(\hat y_1, \hat y_2, \ldots, \hat y_q)$$
where the $\hat y_i$ are estimates of statistical quantities from the data – e.g. sample mean, sample covariance, …

Example: Linear regression of a dependent variable t on an independent variable x is given by
$$\hat t(x) = \hat\mu_t + \frac{\hat\sigma_{xt}}{\hat\sigma_x^2}\,(x - \hat\mu_x)$$
where $\hat\mu_t$, $\hat\mu_x$, $\hat\sigma_{xt}$, and $\hat\sigma_x^2$ are determined from fitting data.

Clearly these estimators vary across training sets, so the regression predictor $\hat t(x)$ will vary across training sets. We’d like to express the variability, and the bias, in $\hat t(x)$.
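To make the training-set variability concrete, here is a minimal NumPy sketch (not from the original slides; the population, noise level, sample size, and seed are assumptions for the demo). It refits the moment-based predictor $\hat t(x)$ on several independently drawn training sets and shows the fitted quantities fluctuating:

import numpy as np

rng = np.random.default_rng(0)

def fit_moments(x, t):
    # Regression from sample moments:
    # t_hat(x) = mu_t + (sigma_xt / sigma_x^2) * (x - mu_x)
    mu_x, mu_t = x.mean(), t.mean()
    sigma_xt = np.mean((x - mu_x) * (t - mu_t))   # sample covariance
    sigma_x2 = np.mean((x - mu_x) ** 2)           # sample variance
    return mu_t, mu_x, sigma_xt / sigma_x2

# Several training sets drawn from the same population; the fitted
# predictor t_hat(x) changes from one training set to the next.
for trial in range(3):
    x = rng.normal(0.0, 1.0, size=50)
    t = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=50)   # true slope 2, intercept 1
    mu_t, mu_x, slope = fit_moments(x, t)
    print(f"trial {trial}: t_hat(x) = {mu_t:.3f} + {slope:.3f} (x - {mu_x:.3f})")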
3
Bias / Variance Decomposition of Model Error

Suppose we estimate some model (or parameter) $\hat f = f(\hat y_1, \hat y_2, \ldots, \hat y_q)$ from data. What’s the expected squared error between this estimate and the true model f ? (The statistical expectation is over data sets used to estimate the model or parameter. Can you write that as an integral?)
$$E\big[(\hat f - f)^2\big] = E\big[(\hat f - E[\hat f] + E[\hat f] - f)^2\big] = E\big[(\hat f - E[\hat f])^2\big] + \big(E[\hat f] - f\big)^2 \qquad\text{(prove this!)}$$
= variance + squared bias
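A quick Monte Carlo check of the decomposition (a sketch under an assumed Gaussian population, using the ML variance estimator from p5 as $\hat f$; all numbers are invented for the demo). The mean squared error matches variance plus squared bias up to sampling noise:

import numpy as np

rng = np.random.default_rng(1)
sigma2_true = 4.0          # true model value f
N, trials = 10, 200_000    # sample size per data set, number of data sets

# f_hat: ML variance estimate, recomputed on many independent data sets
x = rng.normal(0.0, np.sqrt(sigma2_true), size=(trials, N))
f_hat = x.var(axis=1)      # ddof=0: the naive / ML estimator

mse      = np.mean((f_hat - sigma2_true) ** 2)   # E[(f_hat - f)^2]
variance = f_hat.var()                           # E[(f_hat - E f_hat)^2]
bias_sq  = (f_hat.mean() - sigma2_true) ** 2     # (E[f_hat] - f)^2

print(mse, variance + bias_sq)   # the two agree up to Monte Carlo error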
4
Simplest Example: ML Variance

The naïve estimate of the population variance (we’ll see this is also the maximum likelihood estimate) is
$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \hat\mu)^2 \quad\text{with}\quad \hat\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The expected value of the estimator is
$$E[\hat\sigma^2] = \frac{1}{N}\sum_{i=1}^{N} E[x_i^2] - \frac{1}{N^2}\sum_{i,j=1}^{N} E[x_i x_j] = \frac{N-1}{N}\,\sigma^2$$
Hence the unbiased estimator is
$$\hat\sigma_u^2 = \frac{N}{N-1}\,\hat\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \hat\mu)^2$$
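The (N-1)/N factor is easy to verify numerically. A minimal sketch (assumed Gaussian data and invented parameters; ddof selects the divisor in NumPy's variance):

import numpy as np

rng = np.random.default_rng(2)
sigma2, N, trials = 4.0, 5, 500_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))
ml  = x.var(axis=1, ddof=0)   # naive / maximum-likelihood estimator
unb = x.var(axis=1, ddof=1)   # unbiased estimator: N/(N-1) rescaling

print(ml.mean(),  (N - 1) / N * sigma2)  # matches (N-1)/N * sigma^2
print(unb.mean(), sigma2)                # matches sigma^2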
5
Variance of Variance

The variance of this estimator is
$$\mathrm{var}(\hat\sigma^2) = E[\hat\sigma^4] - E[\hat\sigma^2]^2$$
To lowest order, the expansion developed on the following pages gives
$$\mathrm{var}(\hat\sigma^2) \approx 4\sigma^2\, E\big[(\hat\sigma - E[\hat\sigma])^2\big] = 4\sigma^2\big(E[\hat\sigma^2] - E[\hat\sigma]^2\big)$$
One needs to make assumptions about the density for x in order to evaluate $E[\hat\sigma^4]$.
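The slide’s point in code: $\mathrm{var}(\hat\sigma^2)$ is built from $E[\hat\sigma^4]$ and $E[\hat\sigma^2]$, but a number requires a density assumption. A minimal sketch (Gaussian x assumed, parameters invented for the demo; under the Gaussian assumption the ML estimator satisfies $\mathrm{var}(\hat\sigma^2) = 2(N-1)\sigma^4/N^2$):

import numpy as np

rng = np.random.default_rng(3)
sigma2, N, trials = 4.0, 50, 400_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))   # Gaussian assumption
s2 = x.var(axis=1)                        # ML variance estimates
var_s2 = np.mean(s2**2) - np.mean(s2)**2  # E[s^4] - E[s^2]^2

# Under the Gaussian assumption, N*s2/sigma^2 is chi-squared with N-1
# degrees of freedom, giving var(s2) = 2 (N-1) sigma^4 / N^2.
print(var_s2, 2 * (N - 1) * sigma2**2 / N**2)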
6
Parameter and Model Estimation
In some cases, we can write exact algebraic expressions for model bias and variance. More generally, we cannot. The following series expansion can help derive approximate expressions.

We’ll start by assuming that the parameter estimates are close to the true values
$$\hat y_i = y_i + \Delta y_i\,, \quad\text{with } \Delta y_i \text{ “small”}$$
Next, Taylor expand $\hat f$ about $f(y)$:
$$\hat f = f(\hat y_1, \hat y_2, \ldots, \hat y_q) = f(y) + \sum_{i=1}^{q}\frac{\partial f}{\partial y_i}\,\Delta y_i + \frac{1}{2}\sum_{i,j=1}^{q}\frac{\partial^2 f}{\partial y_i\,\partial y_j}\,\Delta y_i\,\Delta y_j + \ldots = f(y) + \Delta y^T \nabla f + \frac{1}{2}\,\Delta y^T D^2 f\,\Delta y + \ldots$$
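A numeric illustration of the expansion (a sketch; the test function, its derivatives, and the perturbation are all invented for the demo). The second-order term closes most of the gap left by the first-order approximation:

import numpy as np

# Illustrative f with hand-coded gradient and Hessian (my choice of f,
# not from the slides): f(y1, y2) = y1 * y2**2
def f(y):    return y[0] * y[1] ** 2
def grad(y): return np.array([y[1] ** 2, 2.0 * y[0] * y[1]])
def hess(y): return np.array([[0.0,        2.0 * y[1]],
                              [2.0 * y[1], 2.0 * y[0]]])

y  = np.array([1.0, 2.0])      # "true" parameters
dy = np.array([0.01, -0.02])   # small estimation error Delta y

first  = f(y) + grad(y) @ dy                 # f(y) + dy^T grad f
second = first + 0.5 * dy @ hess(y) @ dy     # + (1/2) dy^T D^2f dy

print(f(y + dy), first, second)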
7
Parameter and Model Estimation

Having defined
$$\nabla f = \begin{pmatrix} \dfrac{\partial f}{\partial y_1} \\ \vdots \\ \dfrac{\partial f}{\partial y_q} \end{pmatrix} \quad\text{(the gradient vector)}$$
and
$$D^2 f = \begin{pmatrix} \dfrac{\partial^2 f}{\partial y_1^2} & \cdots & \dfrac{\partial^2 f}{\partial y_1\,\partial y_q} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial y_q\,\partial y_1} & \cdots & \dfrac{\partial^2 f}{\partial y_q^2} \end{pmatrix}, \quad\text{or}\quad \big(D^2 f\big)_{ij} = \frac{\partial^2 f}{\partial y_i\,\partial y_j} \quad\text{(the Hessian matrix)}$$
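When $\nabla f$ and $D^2 f$ are awkward to derive by hand, central finite differences give a serviceable numerical stand-in. A minimal sketch (the test function f(y1, y2) = y1*y2**2 and the step sizes are assumptions for the demo):

import numpy as np

def gradient_fd(f, y, h=1e-5):
    # Central-difference approximation to the gradient vector
    g = np.zeros_like(y)
    for i in range(y.size):
        e = np.zeros_like(y); e[i] = h
        g[i] = (f(y + e) - f(y - e)) / (2 * h)
    return g

def hessian_fd(f, y, h=1e-4):
    # Central-difference approximation to (D^2 f)_ij = d^2f / dy_i dy_j
    q = y.size
    H = np.zeros((q, q))
    for i in range(q):
        for j in range(q):
            ei = np.zeros(q); ei[i] = h
            ej = np.zeros(q); ej[j] = h
            H[i, j] = (f(y + ei + ej) - f(y + ei - ej)
                       - f(y - ei + ej) + f(y - ei - ej)) / (4 * h * h)
    return H

f = lambda y: y[0] * y[1] ** 2
y = np.array([1.0, 2.0])
print(gradient_fd(f, y))   # ~ [4, 4]
print(hessian_fd(f, y))    # ~ [[0, 4], [4, 2]]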
8
Model Bias
Take the expectation of $\hat f$ on p7:
$$E[\hat f] = f(y) + E[\Delta y]^T\,\nabla f + \frac{1}{2}\,E\big[\Delta y^T D^2 f\,\Delta y\big] + \ldots$$
If the $\hat y_i$ are unbiased then $E[\Delta y] = 0$ and this reduces to
$$E[\hat f] = f(y) + \frac{1}{2}\,\mathrm{Trace}\big(D^2 f\; E[\Delta y\,\Delta y^T]\big) + \ldots$$
From these we can estimate (to lowest order) the bias in $\hat f$. Even if the $\hat y_i$ are unbiased, $\hat f$ will in general be biased.
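A numeric check of the lowest-order bias formula (a sketch with invented values): take the unbiased estimate $\hat y$ = sample mean and the model $f(y) = y^2$. Then $D^2 f = 2$ and $E[\Delta y\,\Delta y^T] = \sigma^2/N$, so the predicted bias is $\sigma^2/N$ even though $\hat y$ itself is unbiased:

import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, N, trials = 1.0, 4.0, 20, 400_000

# Unbiased parameter estimate: the sample mean. Model f(y) = y^2.
y_hat = rng.normal(mu, np.sqrt(sigma2), size=(trials, N)).mean(axis=1)
f_hat = y_hat ** 2

bias_mc   = f_hat.mean() - mu ** 2   # E[f_hat] - f(y)
# Lowest-order prediction: (1/2) Trace(D^2f E[dy dy^T]);
# here D^2f = 2 and E[dy^2] = var(y_hat) = sigma^2 / N.
bias_pred = 0.5 * 2.0 * sigma2 / N

print(bias_mc, bias_pred)   # agree: unbiased y_hat, but biased f_hat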
9
Model Variance
Similarly, to lowest order, the variance is given by (assuming unbiased $\hat y_i$)
$$\mathrm{var}(\hat f) = E[\hat f^2] - E[\hat f]^2 = E\big[(\Delta y^T \nabla f)(\nabla f^T \Delta y)\big] + O(\Delta y^3) = \nabla f^T\, E[\Delta y\,\Delta y^T]\,\nabla f + O(\Delta y^3)$$
(the $f(y)\,\mathrm{Trace}\big(D^2 f\,E[\Delta y\,\Delta y^T]\big)$ terms appearing in $E[\hat f^2]$ and in $E[\hat f]^2$ cancel).

This is how we got the expression for $\mathrm{var}(\hat\sigma^2)$ on p6.
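A simulation check of this lowest-order variance (a sketch with invented values), reusing $\hat y$ = sample mean and $f(y) = y^2$, where $\nabla f = 2\mu$ and $E[\Delta y\,\Delta y^T] = \sigma^2/N$:

import numpy as np

rng = np.random.default_rng(5)
mu, sigma2, N, trials = 1.0, 4.0, 200, 400_000

y_hat = rng.normal(mu, np.sqrt(sigma2), size=(trials, N)).mean(axis=1)
f_hat = y_hat ** 2                        # model f(y) = y^2 again

var_mc   = f_hat.var()
var_pred = (2.0 * mu) ** 2 * sigma2 / N   # (grad f)^T E[dy dy^T] (grad f)

print(var_mc, var_pred)

With N large the fluctuations Δy are small, the neglected O(Δy³) terms are negligible, and the two numbers agree closely.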
10
Empirical Error Estimates

We estimate the performance of a classifier by counting errors on a finite test set:
$$\hat E_i = \frac{\#\text{ of errors on class } i \text{ objects}}{N_i} = \frac{N_{\mathrm{errors}}}{N_i}$$
Suppose the true error rate for class 1 objects is $E_1$. Then the number of errors made on a sample of $N_1$ objects follows a binomial distribution
$$P(N_{\mathrm{errors}}) = \binom{N_1}{N_{\mathrm{errors}}}\, E_1^{\,N_{\mathrm{errors}}}\,(1 - E_1)^{\,N_1 - N_{\mathrm{errors}}}$$
The average number of errors is $E[N_{\mathrm{errors}}] = N_1 E_1$, so
$$E[\hat E_1] = E_1$$
and the estimator is unbiased. The variance is
$$\mathrm{var}(\hat E_1) = \frac{1}{N_1^2}\,\mathrm{var}(N_{\mathrm{errors}}) = \frac{N_1 E_1 (1 - E_1)}{N_1^2} = \frac{E_1(1 - E_1)}{N_1}$$
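A quick simulation of this result (a sketch; the true error rate and test-set size are invented): draw the error count as Binomial($N_1$, $E_1$) and confirm the mean and variance of $\hat E_1$:

import numpy as np

rng = np.random.default_rng(6)
E1, N1, trials = 0.1, 100, 200_000   # true class-1 error rate, test-set size

# Each test set yields Binomial(N1, E1) errors; E_hat = errors / N1.
n_err = rng.binomial(N1, E1, size=trials)
E_hat = n_err / N1

print(E_hat.mean(), E1)                  # unbiased
print(E_hat.var(),  E1 * (1 - E1) / N1)  # var = E1 (1 - E1) / N1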
11