MAST90083 Computational Statistics & Data Mining Linear Regression

Tutorial & Practical 4: Model Selection

Question 1

In this question we are interested in deriving an algorithm for solving Lasso.

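Consider the model

$y = X\beta + \varepsilon$

where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n \times p}$ and $\varepsilon \in \mathbb{R}^n$ with $\varepsilon \sim N(0, \sigma^2 I_n)$. Let $\hat{\beta}$ be the estimate of $\beta$ obtained by least squares estimation.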

1. Let $y \in \mathbb{R}$. Find the solution $u \in \mathbb{R}$ that minimizes

$(y - u)^2 + \lambda |u|$

2. Plot the solution as a function of y

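3. Use this solution to derive an algorithm for solving

$\min_{\beta} \; \|y - X\beta\|^2 + \lambda |\beta|$

where $|\beta| = \sum_{j=1}^{p} |\beta_j|$ denotes the $\ell_1$ norm of $\beta$.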
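A derived answer to the three parts above can be checked numerically. The sketch below is one possible route, not necessarily the intended one: it assumes the scalar minimizer takes the soft-thresholding form $\mathrm{sign}(y)\max(|y| - \lambda/2, 0)$ and applies that update one coordinate at a time (cyclic coordinate descent). The function names `soft_threshold`, `scalar_lasso`, `lasso_objective` and `lasso_coordinate_descent`, and the fixed number of sweeps, are illustrative choices that do not come from the tutorial.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def scalar_lasso(y, lam):
    """Assumed minimizer of (y - u)**2 + lam * |u| over real u."""
    return soft_threshold(y, lam / 2.0)


def lasso_objective(X, y, beta, lam):
    """Value of ||y - X beta||^2 + lam * sum_j |beta_j|."""
    resid = [yi - sum(xij * bj for xij, bj in zip(row, beta))
             for row, yi in zip(X, y)]
    return sum(r * r for r in resid) + lam * sum(abs(b) for b in beta)


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent; X is a list of rows, y a list of floats."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # equals y - X beta because beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Least squares coefficient of the partial residual on column j.
            z = sum(X[i][j] * resid[i] for i in range(n)) / col_sq[j] + beta[j]
            # Dividing the one-dimensional problem by ||x_j||^2 reduces it
            # to the scalar problem with penalty lam / ||x_j||^2.
            new = soft_threshold(z, lam / (2.0 * col_sq[j]))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta
```

For part 2, evaluating `scalar_lasso(y, lam)` on a grid of `y` values gives the points to plot. When the columns of `X` are orthonormal, a single sweep already returns the soft-thresholded least squares coefficients.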

Question 2

Let $Y = (y_1, \ldots, y_n)^\top$ be an $n \times q$ matrix of observations for which we postulate the parametric model

$Y = XB + \varepsilon$, with $\mathrm{vec}(\varepsilon) \sim N(0, \Sigma \otimes I_n)$.

Here $X$ is a known $n \times k$ design matrix of rank $k$, $B$ is a $k \times q$ matrix of unknown parameters, $\varepsilon$ is the $n \times q$ matrix of errors and $\Sigma$ is the $q \times q$ error covariance matrix.

1. Give the expression of the log likelihood

2. Find the number of parameters involved in the model

3. Derive the expressions of AIC and BIC
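As a reference point for checking a derivation, the block below writes out one standard form of the quantities involved. It assumes maximum likelihood estimates $\hat{B}$ and $\hat{\Sigma}$ (symbols not used in the tutorial), counts the free entries of the symmetric matrix $\Sigma$ only once, and keeps the additive constants that some texts drop.

```latex
\ell(B, \Sigma) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma|
  - \frac{1}{2}\,\mathrm{tr}\!\left[\Sigma^{-1}(Y - XB)^\top (Y - XB)\right]

\hat{B} = (X^\top X)^{-1} X^\top Y, \qquad
\hat{\Sigma} = \frac{1}{n}\,(Y - X\hat{B})^\top (Y - X\hat{B})

\ell(\hat{B}, \hat{\Sigma}) = -\frac{nq}{2}\log(2\pi) - \frac{n}{2}\log|\hat{\Sigma}| - \frac{nq}{2}

d = kq + \frac{q(q+1)}{2}

\mathrm{AIC} = n\log|\hat{\Sigma}| + nq\,(\log(2\pi) + 1) + 2d, \qquad
\mathrm{BIC} = n\log|\hat{\Sigma}| + nq\,(\log(2\pi) + 1) + d\log n
```

The term $nq(\log(2\pi) + 1)$ does not depend on $k$, so it can be omitted when the criteria are only used to compare models fitted to the same $Y$.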

Question 3

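Let $y \in \mathbb{R}^n$ be a vector of observations for which we postulate the linear model

$y = X\beta + \varepsilon$

where $X \in \mathbb{R}^{n \times k}$, $\beta \in \mathbb{R}^k$ and $\varepsilon \in \mathbb{R}^n$ with $\varepsilon \sim N(0, \sigma^2 I_n)$. The dimension $k$ of $\beta$ is estimated using AIC.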

1. Give the form of the AIC criterion

2. Derive the expression of the probability of overfitting.
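A closed-form answer to part 2 can be sanity-checked by simulation. The sketch below rests on assumptions the tutorial does not state: AIC is taken in the form $n \log \hat{\sigma}^2_k + 2k$ with $\hat{\sigma}^2_k = \mathrm{RSS}_k / n$, the true order is `k0`, the competing model is nested with `extra` additional regressors, and the reduction in the residual sum of squares is distributed as $\sigma^2 \chi^2$ with `extra` degrees of freedom, independently of the larger model's residual sum of squares. The function name and its parameters are hypothetical.

```python
import math
import random


def simulate_overfit_probability(n, k0, extra, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(AIC(k0 + extra) < AIC(k0)).

    Assumes AIC(k) = n * log(RSS_k / n) + 2 * k and nested Gaussian
    linear models whose true order is k0.
    """
    rng = random.Random(seed)
    df_resid = n - k0 - extra  # residual degrees of freedom of the larger model
    count = 0
    for _ in range(n_draws):
        # A chi-square variable with d degrees of freedom is Gamma(d / 2, scale 2).
        a = rng.gammavariate(extra / 2.0, 2.0)     # RSS_k0 - RSS_(k0 + extra)
        b = rng.gammavariate(df_resid / 2.0, 2.0)  # RSS_(k0 + extra)
        # sigma^2 cancels in the ratio of the two residual sums of squares.
        diff = n * math.log(b / (a + b)) + 2 * extra
        if diff < 0:
            count += 1
    return count / n_draws
```

For example, `simulate_overfit_probability(200, 3, 1)` estimates the chance that AIC prefers one superfluous regressor when $n = 200$; the estimate can be compared with the value given by a derived formula.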
