CS542: Machine Learning
LAB2: Linear Regression
Problem Statement
We have some data, and we want a "rule" that tells us how to construct or choose a function that agrees with the data we observe now and expect to observe in the future.
[Figure: scatter plot of the observed data, Y against X]
Two lenses / frameworks to analyse the problem
1. Statistical Learning Theory
2. Model Parameter Estimation
Statistical Learning Theory
in the “infinite data limit”
Excess risk: a restaurant analogy
Imagine you are in a restaurant with a menu H. You have some idea of what you like and what the different dishes taste like, but your experience is finite.
How do the size of the menu (H) and your experience with the cuisine (N) affect your expected "level of dissatisfaction" with the dish you finally choose,
compared to a hypothetical universe in which you knew exactly how everything tastes and could pick the tastiest dish in this restaurant (according to your own taste)?
How would our “infinite data limit dissatisfaction level” change depending on the size of the menu?
Statistical Learning Theory
H – a family / set of decision rules – the hypothesis space
L(ŷ, y) – a loss / cost function
R(h) = E_{(X,Y)~P}[ L(h(X), Y) ] – expected risk of the hypothesis
R_emp(h) = (1/N) Σ_i L(h(x_i), y_i) – empirical risk of the hypothesis
h* = argmin_h R(h) – the optimal decision rule if we knew P(X, Y)
ĥ = argmin_{h ∈ H} R_emp(h) – Empirical Risk Minimization (ERM)
R(ĥ) − R(h*) – excess risk
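As a concrete toy illustration of ERM (a sketch, not the course's code: the data-generating setup, the grid hypothesis space of linear rules h_w(x) = w·x, and all names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y depends linearly on x, plus noise.
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + rng.normal(0.0, 0.3, size=50)

# A tiny hypothesis space H: linear rules h_w(x) = w * x for a grid of slopes.
H = np.linspace(-5, 5, 101)

def empirical_risk(w, x, y):
    """Average squared loss of h_w on the sample (the empirical risk)."""
    return np.mean((w * x - y) ** 2)

# ERM: pick the hypothesis in H with the smallest empirical risk.
risks = np.array([empirical_risk(w, x, y) for w in H])
w_hat = H[np.argmin(risks)]
print(f"ERM slope: {w_hat:.2f}")  # should land near the true slope 2.0
```

Note that ERM only ever sees the sample; how far R(ĥ) is from R(h*) is exactly the excess risk the slide defines.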
Statistical Learning Theory
"Realizable learning" is the case where H contains the optimal rule for P(X, Y); the analysis extends beyond that.
Interesting questions: how does the excess risk depend on the size of the training set and on the complexity of the hypothesis space?
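The dependence on the training-set size can be probed numerically. The sketch below (an invented toy setup, assuming squared loss and a finite "menu" of constant predictors) estimates the average excess risk of ERM as N grows:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.5, 1.0           # true distribution: Y ~ N(mu, sigma^2)
H = np.linspace(-3, 3, 61)     # finite "menu": constant predictions

def erm_constant(y):
    """ERM under squared loss: the constant in H with the lowest empirical risk."""
    risks = [((c - y) ** 2).mean() for c in H]
    return H[int(np.argmin(risks))]

def avg_excess_risk(N, trials=200):
    """Average excess risk of ERM over fresh samples of size N.
    For squared loss, R(c) = (c - mu)^2 + sigma^2, so R(c_hat) - R(mu) = (c_hat - mu)^2."""
    vals = []
    for _ in range(trials):
        y = rng.normal(mu, sigma, size=N)
        vals.append((erm_constant(y) - mu) ** 2)
    return float(np.mean(vals))

results = {N: avg_excess_risk(N) for N in (10, 100, 1000)}
for N, e in results.items():
    print(N, e)  # the excess risk shrinks as N grows
```

Enlarging the grid H would raise the excess risk at small N, mirroring the "bigger menu, same experience" intuition from the restaurant analogy.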
Two lenses / frameworks to analyse the problem
1. Statistical Learning Theory
2. Model Parameter Estimation
Model Parameter Estimation
Notation: graphical models
[Figure: graphical-model notation: example nodes A, B, C, D; a plate labelled i = 0 .. N around D_i means that node is repeated for each i, like a "for loop"]
Model Parameter Estimation
We assume a "story" behind the dataset we observe.
[Figure: graphical model of the story: parameters θ and 𝝈 outside a plate i = 0 .. N; inside the plate, input x_i, "true" value t_i, noise 𝛆_i, and observation y_i]
Interesting questions: are model parameters identifiable, i.e. can we determine θ from data?
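The assumed story can be simulated directly, and the identifiability question answered empirically for the linear-Gaussian case: θ is then recoverable from (x, y) by least squares. A minimal sketch, assuming scalar x_i and invented parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma = 3.0, 0.5       # model parameters of the assumed "story"
N = 200

# The generative story: for each i,
#   draw x_i, set the "true" value t_i = theta * x_i,
#   draw noise eps_i ~ N(0, sigma^2), observe y_i = t_i + eps_i.
x = rng.uniform(-1, 1, size=N)
t = theta * x
eps = rng.normal(0.0, sigma, size=N)
y = t + eps

# Identifiability check: can we determine theta from (x, y)?
# Least squares recovers it up to noise: theta_hat = <x, y> / <x, x>.
theta_hat = (x @ y) / (x @ x)
print(f"theta_hat = {theta_hat:.2f}")  # should be close to the true theta = 3.0
```

Note that t_i and 𝛆_i are never observed separately; only their sum y_i is, yet θ is still identifiable under this story.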
Linear regression – derivation
[Figure: the same graphical model as above: θ, 𝝈, and the plate i = 0 .. N containing x_i, t_i, 𝛆_i, y_i]
[.. lecture 03 slide 20 ..]
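A sketch of where the derivation goes under the linear-Gaussian story above (the referenced slide presumably carries the course's own steps; here X stacks the inputs x_i as rows and y the observations):

```latex
% Likelihood of one observation under the story y_i = \theta^{\top} x_i + \varepsilon_i,
% with \varepsilon_i \sim \mathcal{N}(0, \sigma^2):
p(y_i \mid x_i, \theta, \sigma) = \mathcal{N}\!\left(y_i \mid \theta^{\top} x_i, \sigma^2\right)

% Log-likelihood of the whole dataset (i.i.d. over i = 0 .. N):
\log p(\mathbf{y} \mid \mathbf{X}, \theta, \sigma)
  = -\frac{1}{2\sigma^2} \sum_{i=0}^{N} \left(y_i - \theta^{\top} x_i\right)^2
    - \frac{N+1}{2} \log\!\left(2\pi\sigma^2\right)

% Maximizing over \theta drops the \sigma-only term, leaving least squares:
\hat{\theta}_{\mathrm{ML}}
  = \arg\min_{\theta} \sum_{i=0}^{N} \left(y_i - \theta^{\top} x_i\right)^2
  = \left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1} \mathbf{X}^{\top}\mathbf{y}
```

So under Gaussian noise, maximum likelihood for this model and least squares coincide, which is why the two lenses meet at the same estimator.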