Outline
- Classical theory: bias-variance decomposition
- Modern elements: regularization; parameter selection (K-fold CV, successive halving: data-efficient and compute-efficient)
- Modern theory (bonus)
Bias-Variance Decomposition

True function: a quadratic, $h^*(x) = \theta_2 x^2 + \theta_1 x + \theta_0$. We don't observe $h^*$, only samples from it.

Informally:
- What if we fit a line to the samples? We underfit the data: bias.
- What if we fit a high-degree polynomial (degree 5)? We overfit the data: variance.
- Hope: if we picked quadratics, we would have low bias; but because of the inherent noise we cannot expect zero error.
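As a concrete sketch of under- vs. overfitting (the quadratic $h^*$, its coefficients, the noise level, and the sample size below are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true quadratic h*(x); we only see noisy samples from it
def h_star(x):
    return 2 * x**2 + x + 1

x = rng.uniform(-1, 1, size=8)
y = h_star(x) + rng.normal(0, 0.3, size=x.shape)

# Dense grid for measuring error against the true function
grid = np.linspace(-1, 1, 200)

errors = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)  # least-squares polynomial fit
    errors[degree] = float(np.mean((np.polyval(coeffs, grid) - h_star(grid)) ** 2))
    print(f"degree {degree}: mean squared error vs h* = {errors[degree]:.3f}")
```

The line (degree 1) cannot represent the quadratic at all, so its error stays high no matter the sample: that gap is bias. The degree-5 fit tracks the noise in the 8 samples: that sensitivity is variance.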
[Famous chart: error vs. model complexity (polynomial degree). Training loss decreases toward zero as complexity grows; test loss first falls, then rises, with its minimum at the optimal model complexity.]

More Formally: The Bias-Variance Tradeoff
True hypothesis: $h^* : \mathbb{R}^d \to \mathbb{R}$.
Observed output: $y = h^*(x) + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, so $\mathbb{E}[\varepsilon] = 0$ and $\mathbb{E}[\varepsilon^2] = \sigma^2$.

Procedure:
1. Draw $n$ labeled points $(x^{(i)}, y^{(i)})$ with $y^{(i)} = h^*(x^{(i)}) + \varepsilon^{(i)}$; call this dataset $S$.
2. Train a model on $S$; call it $h_S : \mathbb{R}^d \to \mathbb{R}$.
3. Pick a test point $x \in \mathbb{R}^d$ with $y = h^*(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2)$.
4. Measure $(h_S(x) - y)^2$: the risk.
Goal: decompose the expected risk.

$\mathbb{E}\big[(h_S(x) - y)^2\big] = \mathbb{E}\big[(h_S(x) - h^*(x) - \varepsilon)^2\big]$
$= \mathbb{E}[\varepsilon^2] + \mathbb{E}\big[(h_S(x) - h^*(x))^2\big] - 2\,\mathbb{E}[\varepsilon]\,\mathbb{E}\big[h_S(x) - h^*(x)\big]$ (since $\varepsilon$ is independent of $S$, and for independent $X, Y$ we have $\mathbb{E}[XY] = \mathbb{E}[X]\,\mathbb{E}[Y]$)
$= \sigma^2 + \mathbb{E}_S\big[(h_S(x) - h^*(x))^2\big]$

The $\sigma^2$ term is unavoidable error.
Define $h_{\mathrm{avg}}(x) = \mathbb{E}_S[h_S(x)]$: randomly select $S$, train to fit $h_S$, evaluate at $x$, and average the predictions over draws of $S$.

$\mathbb{E}_S\big[(h_S(x) - h^*(x))^2\big] = \mathbb{E}_S\big[(h_S(x) - h_{\mathrm{avg}}(x) + h_{\mathrm{avg}}(x) - h^*(x))^2\big]$
$= \mathbb{E}_S\big[(h_S(x) - h_{\mathrm{avg}}(x))^2\big] + 2\,\mathbb{E}_S\big[h_S(x) - h_{\mathrm{avg}}(x)\big]\big(h_{\mathrm{avg}}(x) - h^*(x)\big) + \big(h_{\mathrm{avg}}(x) - h^*(x)\big)^2$

Since $\mathbb{E}_S[h_S(x)] = h_{\mathrm{avg}}(x)$, the cross term is $0$, leaving

$= \underbrace{\mathbb{E}_S\big[(h_S(x) - h_{\mathrm{avg}}(x))^2\big]}_{\text{variance of the training procedure, } \mathrm{Var}_S(h_S(x))} + \underbrace{\big(h_{\mathrm{avg}}(x) - h^*(x)\big)^2}_{\text{bias}^2}$

The bias does not depend on $S$; it is error introduced by the model family.
Summary:

$\mathbb{E}\big[(h_S(x) - y)^2\big] = \underbrace{\sigma^2}_{\text{unavoidable error}} + \underbrace{\big(h_{\mathrm{avg}}(x) - h^*(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}_S\big[(h_S(x) - h_{\mathrm{avg}}(x))^2\big]}_{\text{variance}}$
[Chart: total error vs. model complexity is the sum of bias$^2$ (decreasing) and variance (increasing); the optimal model balances the two.]
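The three terms in the summary can be estimated by Monte Carlo: repeatedly draw $S$, fit a line to a quadratic $h^*$, and look at the spread of the predictions at a fixed test point. A sketch (all constants here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.3                     # noise level (illustrative)

def h_star(x):
    return 2 * x**2 + x + 1     # illustrative true hypothesis

x_test = 0.7                    # fixed test point
n, trials = 20, 2000            # size of S, number of draws of S

preds = np.empty(trials)
for t in range(trials):
    xs = rng.uniform(-1, 1, size=n)             # draw S
    ys = h_star(xs) + rng.normal(0, sigma, n)
    coeffs = np.polyfit(xs, ys, 1)              # fit a line: h_S
    preds[t] = np.polyval(coeffs, x_test)       # h_S(x_test)

h_avg = preds.mean()                            # estimate of E_S[h_S(x)]
bias_sq = (h_avg - h_star(x_test)) ** 2         # (h_avg(x) - h*(x))^2
variance = preds.var()                          # E_S[(h_S(x) - h_avg(x))^2]
print(f"sigma^2={sigma**2:.3f}  bias^2={bias_sq:.3f}  variance={variance:.3f}")
```

Because the linear family cannot express the quadratic, the bias$^2$ term stays bounded away from zero no matter how many samples each $S$ contains, while the variance term shrinks as $n$ grows.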
Regularization

Goal: reduce variance to obtain a model more robust to training-set variation.
- Explicit: penalty terms (regularized least squares).
- Implicit: change the model through the algorithm.
Classical setting: $y^{(i)} = \theta^\top x^{(i)} + \varepsilon^{(i)}$ (ordinary least squares), $\theta \in \mathbb{R}^d$, with $X \in \mathbb{R}^{n \times d}$, $y \in \mathbb{R}^n$.

Regularized least squares:
$\hat\theta = \operatorname*{argmin}_\theta \tfrac{1}{2}\|X\theta - y\|_2^2 + \tfrac{\lambda}{2}\|\theta\|_2^2$

The hyperparameter $\lambda \in [0, \infty)$ controls the tradeoff: the penalty pushes us to pick a less complex model (smaller $\|\theta\|$); fix a well-tuned $\lambda$ and the minimizer is probably a good $\theta$.

Normal equation: setting the gradient to zero,
$X^\top(X\theta - y) + \lambda\theta = 0 \;\Rightarrow\; (X^\top X + \lambda I)\,\theta = X^\top y \;\Rightarrow\; \hat\theta = (X^\top X + \lambda I)^{-1} X^\top y$
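A minimal numerical sketch of the normal-equation solution (the data and parameter values are illustrative; $\lambda = 0$ recovers ordinary least squares, larger $\lambda$ shrinks $\theta$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])        # illustrative parameters
y = X @ theta_true + rng.normal(0, 0.1, n)

def ridge(X, y, lam):
    """Solve (X^T X + lam I) theta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

theta_ols = ridge(X, y, 0.0)     # lam = 0: ordinary least squares
theta_reg = ridge(X, y, 10.0)    # larger lam shrinks theta toward 0
print("||theta_reg|| =", np.linalg.norm(theta_reg))
print("||theta_ols|| =", np.linalg.norm(theta_ols))
```

The norm of the ridge solution is monotonically nonincreasing in $\lambda$, which is exactly the "less complex model" the penalty is buying.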
Underdetermined (modern) regime: when $n < d$, $\mathrm{rank}(X^\top X) \le n < d$, so there is no unique minimizer. If $\theta_0$ is a solution ($X\theta_0 = y$) and $v \in \mathrm{null}(X)$, i.e. $Xv = 0$, then $\theta_0 + v$ is a solution too. $X^\top X$ is PSD: $v^\top X^\top X v = \|Xv\|^2 \ge 0$. Hence for $\lambda > 0$, $v^\top(X^\top X + \lambda I)\,v \ge \lambda\|v\|^2 > 0$ for $v \ne 0$, so the matrix is positive definite and invertible: the regularized solution is unique.
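A quick numerical check of this (the shapes, with $n = 5 < d = 10$, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 5, 10                         # n < d: underdetermined
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

G = X.T @ X                          # d x d Gram matrix, rank at most n < d
print("rank(X^T X) =", np.linalg.matrix_rank(G))   # < d, so G is singular

# With lam > 0, G + lam I is positive definite, hence invertible:
lam = 0.1
theta = np.linalg.solve(G + lam * np.eye(d), X.T @ y)
print("regularized theta shape:", theta.shape)
```

Without the $\lambda I$ term, `np.linalg.solve(G, X.T @ y)` would fail (or be numerically meaningless) precisely because $G$ is rank-deficient.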
Ridge regression: $\hat\theta_\lambda = (X^\top X + \lambda I)^{-1} X^\top y$. Regularization reduces the variance $\mathrm{Var}_S(h_S(x))$: as we increase $\lambda$, the spectrum of $X^\top X + \lambda I$ gets flatter (relatively), and the variance of $\hat\theta_\lambda$ decreases as well. Decomposing $\hat\theta = P_{\mathrm{null}(X^\top X)}\hat\theta + P_{\mathrm{span}(X^\top)}\hat\theta$, the ridge solution has no component in the null space: it lies entirely in $\mathrm{span}(X^\top)$.
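A sketch of the flattening: in the eigenbasis of $X^\top X$, ridge damps each mode by the factor $s_i/(s_i + \lambda)$ (the data shapes below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
s = np.linalg.eigvalsh(X.T @ X)          # eigenvalues of X^T X (all > 0 here)

# Ridge damps each eigen-direction by s_i / (s_i + lam);
# larger lam -> smaller factors -> flatter effective spectrum, lower variance.
shrink = {lam: s / (s + lam) for lam in (0.0, 1.0, 10.0, 100.0)}
for lam, f in shrink.items():
    print(f"lam={lam:>5}: shrinkage factors {np.round(f, 3)}")
```

At $\lambda = 0$ every factor is $1$ (no shrinkage); as $\lambda$ grows, the small-eigenvalue directions, which are the high-variance ones, are damped the most.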
Bonus: implicit regularization. Thought experiment: run gradient descent on the (unregularized) least-squares objective,
$\theta^{t+1} = \theta^t - \alpha\, X^\top (X\theta^t - y)$

Claim: $P_{\mathrm{null}(X)}\,\theta^{t+1} = P_{\mathrm{null}(X)}\,\theta^t$, since each update adds a vector in $\mathrm{span}(X^\top)$. Starting from $\theta^0 = 0$, the iterates always stay in $\mathrm{span}(X^\top)$, so gradient descent converges to the minimum-norm solution just by using the algorithm, with no explicit penalty.

Extra: Belkin, Hsu, Ma, Mandal (2018), "double descent": test loss vs. model complexity descends a second time beyond the interpolation threshold.
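The claim can be checked numerically: gradient descent from $\theta^0 = 0$ on an overparameterized least-squares problem lands on the pseudoinverse (minimum-norm) solution. A sketch with illustrative sizes and step size:

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 5, 20                          # overparameterized: many interpolating solutions
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

theta = np.zeros(d)                   # start at zero, i.e. in span(X^T)
alpha = 0.01                          # small enough step size for this problem
for _ in range(20000):
    # Each update adds X^T(...) , a vector in span of the rows of X,
    # so the null-space component of theta never changes (stays 0).
    theta -= alpha * X.T @ (X @ theta - y)

theta_minnorm = np.linalg.pinv(X) @ y  # minimum-norm interpolating solution
print("||theta - theta_minnorm|| =", np.linalg.norm(theta - theta_minnorm))
```

Among the infinitely many $\theta$ with $X\theta = y$, gradient descent picks out the one with smallest norm, which is the implicit-regularization effect the notes describe.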