MAST90083 Computational Statistics & Data Mining NPR
Tutorial & Practical 7: Solutions

Question 1:
1. The linear spline model for $f$ is given by
\[
f(x_i) = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+,
\]
where $\beta^T = [\beta_0\ \beta_1]$ and $u^T = [u_1, \ldots, u_K]$ define the coefficients of the polynomial functions and the truncated line functions, respectively.

2. Define
\[
X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},
\qquad
Z = \begin{bmatrix} (x_1 - \kappa_1)_+ & \cdots & (x_1 - \kappa_K)_+ \\ \vdots & & \vdots \\ (x_n - \kappa_1)_+ & \cdots & (x_n - \kappa_K)_+ \end{bmatrix}
\]
The penalized spline fitting criterion, when divided by $\sigma_\varepsilon^2$, can be written as
\[
\frac{1}{\sigma_\varepsilon^2}\,\|y - X\beta - Zu\|^2 + \frac{\lambda}{\sigma_\varepsilon^2}\,\|u\|^2.
\]
3. Assuming $\operatorname{cov}(u) = G = \sigma_u^2 I$ with $\sigma_u^2 = \sigma_\varepsilon^2/\lambda$, and $\operatorname{cov}(\varepsilon) = R = \sigma_\varepsilon^2 I$, the criterion becomes
\[
(y - X\beta - Zu)^T R^{-1} (y - X\beta - Zu) + u^T G^{-1} u,
\]
which corresponds to the negative log-likelihood of the probability density of $(y, u)$ with
\[
y \mid u \sim N(X\beta + Zu,\, R), \qquad u \sim N(0,\, G).
\]
4. With $C = [X\ Z]$, $D = \operatorname{diag}(0, 0, 1, 1, \ldots, 1)$ and $\alpha = \sigma_\varepsilon^2/\sigma_u^2$,
\[
\tilde\theta = [\tilde\beta^T\ \tilde u^T]^T = (C^T C + \alpha D)^{-1} C^T y.
\]
5. The fitted values can be written as
\[
\tilde f = C (C^T C + \alpha D)^{-1} C^T y.
\]
6. $y = X\beta + \varepsilon^*$, where $\varepsilon^* = Zu + \varepsilon$ and
\[
\operatorname{cov}(\varepsilon^*) = V = Z G Z^T + R.
\]
7. $\tilde\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} y$, and $\operatorname{cov}(\tilde\beta) = (X^T V^{-1} X)^{-1}$.
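As a numerical check on parts 4-7, a minimal Python sketch (the simulated data, knot locations, penalty value, and noise variance below are arbitrary illustrations, not taken from the practical):

```python
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Design matrices: X for the polynomial part, Z for the truncated lines at knots kappa_k
kappa = np.linspace(0.1, 0.9, 9)                      # K = 9 knots (arbitrary)
X = np.column_stack([np.ones(n), x])
Z = np.maximum(x[:, None] - kappa[None, :], 0.0)

# Part 4: theta_tilde = (C'C + alpha D)^{-1} C'y with alpha = sigma_eps^2 / sigma_u^2
C = np.hstack([X, Z])
D = np.diag([0.0, 0.0] + [1.0] * len(kappa))          # no penalty on beta_0, beta_1
alpha = 1.0
theta_tilde = np.linalg.solve(C.T @ C + alpha * D, C.T @ y)

# Part 5: fitted values f_tilde = C (C'C + alpha D)^{-1} C'y
f_tilde = C @ theta_tilde

# Parts 6-7: GLS estimate of beta under cov(eps*) = V = Z G Z' + R
sigma_eps2 = 0.09                                     # assumed noise variance (illustrative)
sigma_u2 = sigma_eps2 / alpha                         # since alpha = sigma_eps^2 / sigma_u^2
V = sigma_u2 * (Z @ Z.T) + sigma_eps2 * np.eye(n)
Vinv = np.linalg.inv(V)
beta_tilde = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
cov_beta = np.linalg.inv(X.T @ Vinv @ X)
```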
Question 2:
1. The Nadaraya-Watson kernel estimate at $x_0$ is given by
\[
\hat f(x_0) = \frac{\sum_{i=1}^{N} K_h(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_h(x_0, x_i)}\;
I\!\left\{\sum_{i=1}^{N} K_h(x_0, x_i) \neq 0\right\}
\]

2. For a Gaussian kernel we have
\[
K_h(x_0, x) = \frac{1}{\sqrt{2\pi}\,h}\exp\!\left(-\frac{(x - x_0)^2}{2h^2}\right)
\]
Since $K_h(x_0, x_i) \neq 0$ for all $x_0$ and $x_i$, we do not have a singularity in the denominator of $\hat f(x_0)$. The Gaussian kernel is also everywhere differentiable in $x_0$, and therefore the Nadaraya-Watson estimator is differentiable as a function of $x_0$.
3. For the Epanechnikov kernel we have
\[
K_h(x_0, x) = K\!\left(\frac{|x - x_0|}{h}\right), \qquad
K(z) = \begin{cases} \frac{3}{4}(1 - z^2) & \text{if } |z| \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
We can observe from this expression that for $|x - x_0| \to h^-$ we have $|x - x_0|/h \to 1^-$ and, writing $z = |x - x_0|/h$ (taking $x > x_0$; the case $x < x_0$ is symmetric),
\[
\frac{\partial K_h(x_0, x)}{\partial x_0}
= \frac{\partial K(z)}{\partial z}\cdot\frac{\partial z}{\partial x_0}
= -\frac{3}{2}z \cdot \left(-\frac{1}{h}\right)
= \frac{3z}{2h}
= \frac{3|x - x_0|}{2h^2} \neq 0.
\]
However, when $|x - x_0| \to h^+$ we have $|x - x_0|/h \to 1^+$ and
\[
\frac{\partial K_h(x_0, x)}{\partial x_0} = 0,
\]
because the kernel is zero on this domain. Since the two one-sided limits are different, this kernel
is not differentiable everywhere.
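As a numerical companion to this question, a minimal Python sketch (the function names, simulated data, and test values are illustrative assumptions): the Nadaraya-Watson estimate with the Gaussian kernel of parts 1-2, followed by a one-sided difference check of the kink of the Epanechnikov kernel at $|x - x_0| = h$ from part 3.

```python
import numpy as np

def gaussian_kh(x0, x, h):
    """Gaussian kernel K_h(x0, x); strictly positive, so the NW denominator never vanishes."""
    return np.exp(-(x - x0) ** 2 / (2 * h ** 2)) / (np.sqrt(2 * np.pi) * h)

def epanechnikov_kh(x0, x, h):
    """Epanechnikov kernel K_h(x0, x) = K(|x - x0|/h), K(z) = (3/4)(1 - z^2) for |z| <= 1."""
    z = np.abs(x - x0) / h
    return np.where(z <= 1, 0.75 * (1 - z ** 2), 0.0)

def nadaraya_watson(x0, x, y, h, kernel=gaussian_kh):
    """Nadaraya-Watson estimate at the points x0."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    K = kernel(x0[:, None], x[None, :], h)            # K_h(x0, x_i), shape (len(x0), n)
    num, den = K @ y, K.sum(axis=1)
    # The indicator I{sum_i K_h(x0, x_i) != 0} handles a vanishing denominator
    return np.where(den != 0, num / np.where(den != 0, den, 1.0), 0.0)

# Example usage on simulated data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.cos(2 * np.pi * x) + rng.normal(scale=0.2, size=100)
print(nadaraya_watson([0.25, 0.5, 0.75], x, y, h=0.05))

# Part 3: one-sided difference quotients of K_h in x0 at the boundary |x - x0| = h
xx, h, eps = 1.0, 0.5, 1e-6
x0 = xx - h                                           # boundary point, z = 1
from_inside = (epanechnikov_kh(x0 + eps, xx, h) - epanechnikov_kh(x0, xx, h)) / eps
from_outside = (epanechnikov_kh(x0, xx, h) - epanechnikov_kh(x0 - eps, xx, h)) / eps
print(from_inside, from_outside)                      # ~3/(2h) = 3.0 versus 0.0: not differentiable there
```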
Question 3:
1. N plays the role of the smoothing parameter, similar to h in kernel smoothing.
2. We have
\[
\hat f_{nN}(x) = \sum_{j=1}^{N} \hat\theta_j \rho_j(x)
= \frac{1}{n}\sum_{j=1}^{N}\sum_{i=1}^{n} y_i \rho_j(x_i)\rho_j(x)
= \sum_{i=1}^{n} y_i W_{ni}(x),
\]
where $W_{ni}(x) = \frac{1}{n}\sum_{j=1}^{N} \rho_j(x_i)\rho_j(x)$, and $\hat f_{nN}$ is therefore a linear estimator.
3. Substituting $y_i = f(x_i) + \varepsilon_i$,
\[
\hat\theta_j = \frac{1}{n}\sum_{i=1}^{n} y_i \rho_j(x_i)
= \frac{1}{n}\sum_{i=1}^{n} f(x_i)\rho_j(x_i) + \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i \rho_j(x_i).
\]

4. Taking expectations,
\[
E(\hat\theta_j) = \frac{1}{n}\sum_{i=1}^{n} f(x_i)\rho_j(x_i) = \theta_j + r_j,
\]
where
\[
r_j = \frac{1}{n}\sum_{i=1}^{n} f(x_i)\rho_j(x_i) - \int f(x)\rho_j(x)\,dx,
\]
since $\theta_j = \int f(x)\rho_j(x)\,dx$.

5. Decomposing the mean squared error into squared bias and variance,
\[
E(\hat\theta_j - \theta_j)^2
= \left[E(\hat\theta_j) - \theta_j\right]^2 + E\left[\hat\theta_j - E(\hat\theta_j)\right]^2
= r_j^2 + E\left[\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i \rho_j(x_i)\right]^2
= r_j^2 + \frac{\sigma_\varepsilon^2}{n^2}\sum_{i=1}^{n} \rho_j^2(x_i)
= r_j^2 + \frac{\sigma_\varepsilon^2}{n},
\]
using $\frac{1}{n}\sum_{i=1}^{n} \rho_j^2(x_i) \approx \int \rho_j^2(x)\,dx = 1$.

6. Writing
\[
f(x) = \sum_{j=1}^{\infty} \theta_j \rho_j(x), \qquad
\hat f_{nN}(x) = \sum_{j=1}^{N} \hat\theta_j \rho_j(x),
\]
we have
\[
\hat f_{nN} - f = \sum_{j=1}^{N} (\hat\theta_j - \theta_j)\rho_j(x) - \sum_{j=N+1}^{\infty} \theta_j \rho_j(x),
\]
and since $\{\rho_j\}$ is an orthonormal basis,
\[
\|\hat f_{nN} - f\|^2 = \sum_{j=1}^{N} (\hat\theta_j - \theta_j)^2 + \sum_{j=N+1}^{\infty} \theta_j^2.
\]
Hence
\[
E\|\hat f_{nN} - f\|^2
= \sum_{j=1}^{N} E(\hat\theta_j - \theta_j)^2 + \sum_{j=N+1}^{\infty} \theta_j^2
= \frac{N\sigma_\varepsilon^2}{n} + \sum_{j=1}^{N} r_j^2 + \sum_{j=N+1}^{\infty} \theta_j^2 = C.
\]
As shown in the figure, the variance term $N\sigma_\varepsilon^2/n$ grows with $N$ while the tail (bias) term $\sum_{j=N+1}^{\infty} \theta_j^2$ shrinks, giving the usual bias-variance trade-off in the choice of $N$.
Figure 1: Solution to 3.6
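A minimal Monte Carlo sketch tying parts 2-6 together, assuming for illustration the cosine basis on $[0, 1]$, an equispaced design, and an arbitrary test function (none of these are specified in the question); it computes the projection estimator $\hat f_{nN}$ of part 2 and an empirical version of $E\|\hat f_{nN} - f\|^2$ over $N$, illustrating the kind of trade-off the figure refers to.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma_eps, n_rep = 200, 0.5, 200
x = (np.arange(n) + 0.5) / n                          # equispaced design on [0, 1]
f = np.sin(2 * np.pi * x ** 2)                        # illustrative true regression function

def cosine_basis(x, N):
    """First N cosine basis functions, orthonormal on [0, 1]: rho_1 = 1, rho_j = sqrt(2) cos(pi (j-1) x)."""
    j = np.arange(1, N + 1)
    return np.where(j == 1, 1.0, np.sqrt(2.0) * np.cos(np.pi * (j - 1) * x[:, None]))

risks = []
for N in range(1, 31):
    B = cosine_basis(x, N)                            # n x N matrix of rho_j(x_i)
    err = 0.0
    for _ in range(n_rep):
        y = f + rng.normal(scale=sigma_eps, size=n)
        theta_hat = B.T @ y / n                       # theta_hat_j = (1/n) sum_i y_i rho_j(x_i)  (part 2)
        f_hat = B @ theta_hat                         # f_hat_nN at the design points
        err += np.mean((f_hat - f) ** 2)              # empirical ||f_hat_nN - f||^2
    risks.append(err / n_rep)

# Small N: the tail (bias) term dominates; large N: the variance term N sigma_eps^2 / n dominates
print(int(np.argmin(risks)) + 1)                      # N minimising the estimated risk
```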