MAST90083 Computational Statistics & Data Mining NPR
Tutorial & Practical 7: Solutions
Question 1:
1. The linear spline model for f is given by
$$f(x_i) = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+$$
where $\beta^T = [\beta_0\ \beta_1]$ and $u^T = [u_1, \ldots, u_K]$ define the coefficients of the polynomial functions and truncated functions respectively, and $\kappa_1, \ldots, \kappa_K$ denote the knots.
2. Define
$$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad Z = \begin{bmatrix} (x_1 - \kappa_1)_+ & \cdots & (x_1 - \kappa_K)_+ \\ \vdots & & \vdots \\ (x_n - \kappa_1)_+ & \cdots & (x_n - \kappa_K)_+ \end{bmatrix}$$
The penalized spline fitting criterion, when divided by $\sigma_\varepsilon^2$, can be written as
$$\frac{1}{\sigma_\varepsilon^2}\,\|y - X\beta - Zu\|^2 + \frac{\lambda}{\sigma_\varepsilon^2}\,\|u\|^2$$
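For illustration, a minimal NumPy sketch of these design matrices (the data x and the knot locations are arbitrary placeholders, not part of the question):

```python
import numpy as np

def spline_design_matrices(x, knots):
    """Build X (intercept and slope) and Z (truncated-line basis) for a linear spline."""
    X = np.column_stack([np.ones_like(x), x])          # rows: [1, x_i]
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)   # entries: (x_i - kappa_k)_+
    return X, Z

# Hypothetical data and knot placement
x = np.linspace(0.0, 1.0, 100)
knots = np.linspace(0.1, 0.9, 9)
X, Z = spline_design_matrices(x, knots)
```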
3. Assuming $\operatorname{cov}(u) = G = \sigma_u^2 I$ with $\sigma_u^2 = \sigma_\varepsilon^2/\lambda$, and $\operatorname{cov}(\varepsilon) = R = \sigma_\varepsilon^2 I$, the criterion becomes
$$(y - X\beta - Zu)^T R^{-1} (y - X\beta - Zu) + u^T G^{-1} u$$
(substituting $R^{-1} = \sigma_\varepsilon^{-2} I$ and $G^{-1} = (\lambda/\sigma_\varepsilon^2) I$ recovers the criterion in part 2). Up to an additive constant, this corresponds to the negative log-likelihood of the probability density of $(y, u)$ with
$$y \mid u \sim N(X\beta + Zu,\, R) \quad \text{and} \quad u \sim N(0,\, G)$$
4. With $C = [X\ Z]$, $D = \operatorname{diag}(0, 0, 1, 1, \ldots, 1)$ and $\alpha = \sigma_\varepsilon^2/\sigma_u^2$, the minimizer is
$$\tilde{\theta} = [\tilde{\beta}^T\ \tilde{u}^T]^T = (C^T C + \alpha D)^{-1} C^T y$$
5. The fitted values can be written as
$$\tilde{f} = C (C^T C + \alpha D)^{-1} C^T y$$
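Continuing the design-matrix sketch above, the ridge-type solution of parts 4 and 5 might be computed as follows (the toy response y and the value of alpha are illustrative assumptions):

```python
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(len(x))  # toy response

C = np.column_stack([X, Z])                    # C = [X Z]
D = np.diag([0.0, 0.0] + [1.0] * Z.shape[1])   # D = diag(0, 0, 1, ..., 1)
alpha = 1.0                                    # illustrative alpha = sigma_eps^2 / sigma_u^2
theta = np.linalg.solve(C.T @ C + alpha * D, C.T @ y)
fitted = C @ theta                             # f~ = C (C^T C + alpha D)^{-1} C^T y
```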
6. $y = X\beta + \varepsilon^*$, where $\varepsilon^* = Zu + \varepsilon$ and
$$\operatorname{cov}(\varepsilon^*) = V = Z G Z^T + R$$
7. $\tilde{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$, and $\operatorname{cov}(\tilde{\beta}) = (X^T V^{-1} X)^{-1}$
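And the GLS quantities of parts 6 and 7, continuing the same sketch (the variance components sigma_u2 and sigma_e2 are assumed values chosen for illustration):

```python
sigma_u2, sigma_e2 = 0.5, 0.09                        # assumed variance components
V = sigma_u2 * (Z @ Z.T) + sigma_e2 * np.eye(len(x))  # V = Z G Z^T + R
Vinv_X = np.linalg.solve(V, X)                        # V^{-1} X
cov_beta = np.linalg.inv(X.T @ Vinv_X)                # cov(beta~) = (X^T V^{-1} X)^{-1}
beta_tilde = cov_beta @ (X.T @ np.linalg.solve(V, y))
```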
Question 2:
1. The Nadaraya-Watson kernel estimator at $x_0$ is given by
$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_h(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_h(x_0, x_i)}\; I\!\left[\sum_{i=1}^{N} K_h(x_0, x_i) \neq 0\right]$$
2. For a Gaussian kernel we have
$$K_h(x_0, x) = \frac{1}{\sqrt{2\pi}\, h} \exp\left\{ -\frac{(x - x_0)^2}{2h^2} \right\}$$
Since $K_h(x_0, x) \neq 0$ for all $x_0$ and $x_i$, the denominator of $\hat{f}(x_0)$ never vanishes. The Gaussian kernel is also everywhere differentiable in $x_0$, and therefore the Nadaraya-Watson estimator is differentiable as a function of $x_0$.
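A compact sketch of the Nadaraya-Watson estimator with this Gaussian kernel (the data and bandwidth below are hypothetical):

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate at points x0 using a Gaussian kernel."""
    # K_h(x0, x) = exp(-(x - x0)^2 / (2 h^2)) / (sqrt(2 pi) h)
    K = np.exp(-((x0[:, None] - x[None, :]) ** 2) / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return (K @ y) / K.sum(axis=1)   # denominator never vanishes for a Gaussian kernel

# Hypothetical usage
x = np.random.rand(200)
y = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(200)
fhat = nadaraya_watson(np.linspace(0, 1, 50), x, y, h=0.05)
```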
3. For the Epanechnikov kernel we have
$$K_h(x_0, x) = K\!\left(\frac{|x - x_0|}{h}\right), \qquad K(z) = \begin{cases} \frac{3}{4}(1 - z^2) & \text{if } |z| \le 1 \\ 0 & \text{otherwise} \end{cases}$$
We can observe from this expression that for $|x - x_0| \to h^-$ we have $\frac{|x - x_0|}{h} \to 1^-$, and (taking $x_0 < x$, so that $z = (x - x_0)/h$)
$$\frac{\partial K_h(x_0, x)}{\partial x_0} = \frac{\partial K(z)}{\partial z} \cdot \frac{\partial z}{\partial x_0} = \left(-\frac{3}{2} z\right)\left(-\frac{1}{h}\right) = \frac{3z}{2h} = \frac{3|x - x_0|}{2h^2} \neq 0$$
However, when $|x - x_0| \to h^+$, $\frac{|x - x_0|}{h} \to 1^+$ and
$$\frac{\partial K_h(x_0, x)}{\partial x_0} = 0$$
because the kernel is identically zero on this domain. Since the two one-sided limits differ, this kernel is not differentiable everywhere.
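A quick numerical check of these two one-sided limits using finite differences (the values of h and x are arbitrary choices for illustration):

```python
import numpy as np

def epanechnikov(x0, x, h):
    z = np.abs(x - x0) / h
    return np.where(z <= 1, 0.75 * (1 - z**2), 0.0)

h, x, eps = 0.5, 1.0, 1e-6
# Just inside the support (|x - x0| -> h^-): derivative approx 3/(2h)
x0_in = x - h + 1e-3
d_in = (epanechnikov(x0_in + eps, x, h) - epanechnikov(x0_in - eps, x, h)) / (2 * eps)
# Just outside the support (|x - x0| -> h^+): derivative exactly 0
x0_out = x - h - 1e-3
d_out = (epanechnikov(x0_out + eps, x, h) - epanechnikov(x0_out - eps, x, h)) / (2 * eps)
print(d_in, 3 / (2 * h), d_out)   # d_in ~ 3/(2h), d_out = 0
```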
Question 3:
1. N plays the role of the smoothing parameter, analogous to the bandwidth h in kernel smoothing.
2.
$$\hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{N} y_i \rho_j(x_i) \rho_j(x) = \sum_{i=1}^{n} y_i W_{ni}(x)$$
where $W_{ni}(x) = \frac{1}{n} \sum_{j=1}^{N} \rho_j(x_i) \rho_j(x)$; $\hat{f}_{nN}$ is therefore a linear estimator.
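A sketch of this linear estimator, assuming for illustration the cosine orthonormal basis on $[0, 1]$ ($\rho_1(x) = 1$, $\rho_j(x) = \sqrt{2}\cos(\pi(j-1)x)$ for $j \ge 2$), which is not specified in the question:

```python
import numpy as np

def cosine_basis(x, N):
    """First N functions of the cosine orthonormal basis on [0, 1]."""
    j = np.arange(N)
    R = np.sqrt(2.0) * np.cos(np.pi * j[None, :] * x[:, None])
    R[:, 0] = 1.0                                   # rho_1(x) = 1
    return R

def projection_estimator(x0, x, y, N):
    # theta_hat_j = (1/n) sum_i y_i rho_j(x_i);  f_hat(x0) = sum_j theta_hat_j rho_j(x0)
    theta_hat = cosine_basis(x, N).T @ y / len(x)
    return cosine_basis(x0, N) @ theta_hat

# Hypothetical usage
x = np.sort(np.random.rand(200))
y = np.sin(2 * np.pi * x) + 0.2 * np.random.randn(200)
fhat = projection_estimator(np.linspace(0, 1, 50), x, y, N=8)
```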
3.
$$\hat{\theta}_j = \frac{1}{n} \sum_{i=1}^{n} y_i \rho_j(x_i) = \frac{1}{n} \left( \sum_{i=1}^{n} f(x_i) \rho_j(x_i) + \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i) \right)$$
$$E(\hat{\theta}_j) = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) = \theta_j + r_j$$
where
$$r_j = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \int_0^1 f(x) \rho_j(x)\, dx = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \theta_j$$
4.
$$E\left[\left(\hat{\theta}_j - \theta_j\right)^2\right] = \left(E(\hat{\theta}_j) - \theta_j\right)^2 + E\left[\left(\hat{\theta}_j - E(\hat{\theta}_j)\right)^2\right] = r_j^2 + E\left[\left(\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i)\right)^2\right]$$
$$= r_j^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sigma_\varepsilon^2 \rho_j^2(x_i) = r_j^2 + \frac{\sigma_\varepsilon^2}{n}$$
using the independence of the $\varepsilon_i$ and the empirical orthonormality $\frac{1}{n} \sum_{i=1}^{n} \rho_j^2(x_i) = 1$.
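A numerical illustration of parts 3 and 4, assuming (as in the sketch after part 2) the cosine basis on $[0, 1]$ and a hypothetical smooth f: the bias $r_j$ is the gap between the design average and the integral, and the MSE formula can be checked by Monte Carlo.

```python
import numpy as np

f = lambda t: np.sin(2 * np.pi * t)                 # hypothetical smooth f
rho = lambda t: np.sqrt(2) * np.cos(2 * np.pi * t)  # rho_j for j = 3 (cosine basis)
n, sigma_e = 50, 0.3
x = np.linspace(0.0, 1.0, n)                        # design points

grid = np.linspace(0.0, 1.0, 100_000, endpoint=False)
theta_j = np.mean(f(grid) * rho(grid))              # Riemann approx of int_0^1 f rho_j
r_j = np.mean(f(x) * rho(x)) - theta_j              # bias of theta_hat_j

# Monte Carlo estimate of E[(theta_hat_j - theta_j)^2]
eps = sigma_e * np.random.randn(20_000, n)
theta_hat = np.mean(f(x) * rho(x)) + eps @ rho(x) / n
mse_mc = np.mean((theta_hat - theta_j) ** 2)
mse_formula = r_j**2 + sigma_e**2 * np.mean(rho(x) ** 2) / n  # ~ r_j^2 + sigma^2/n
print(mse_mc, mse_formula)
```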
5.
$$f(x) = \sum_{j=1}^{\infty} \theta_j \rho_j(x), \qquad \hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x)$$
Since
$$\hat{f}_{nN} - f = \sum_{j=1}^{N} (\hat{\theta}_j - \theta_j) \rho_j(x) - \sum_{j=N+1}^{\infty} \theta_j \rho_j(x)$$
and $\{\rho_j\}$ is an orthonormal basis,
$$\left\|\hat{f}_{nN} - f\right\|_2^2 = \sum_{j=1}^{N} (\hat{\theta}_j - \theta_j)^2 + \sum_{j=N+1}^{\infty} \theta_j^2$$
Taking expectations,
$$E\|\hat{f}_{nN} - f\|_2^2 = \sum_{j=1}^{N} E\left[(\hat{\theta}_j - \theta_j)^2\right] + \sum_{j=N+1}^{\infty} \theta_j^2 = \frac{N \sigma_\varepsilon^2}{n} + \sum_{j=1}^{N} r_j^2 + \sum_{j=N+1}^{\infty} \theta_j^2 = C$$
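The trade-off in this risk (the variance term $N\sigma_\varepsilon^2/n$ grows with $N$ while the tail sum $\sum_{j>N} \theta_j^2$ shrinks) can be traced numerically; a minimal sketch under the same assumed cosine-basis setup, truncating the infinite tail at a finite Jmax:

```python
import numpy as np

f = lambda t: np.sin(2 * np.pi * t)                 # hypothetical smooth f
n, sigma_e, Jmax = 50, 0.3, 60
x = np.linspace(0.0, 1.0, n)
grid = np.linspace(0.0, 1.0, 100_000, endpoint=False)

def rho(j, t):                                      # cosine basis on [0, 1]
    return np.ones_like(t) if j == 1 else np.sqrt(2.0) * np.cos(np.pi * (j - 1) * t)

theta = np.array([np.mean(f(grid) * rho(j, grid)) for j in range(1, Jmax + 1)])
r = np.array([np.mean(f(x) * rho(j, x)) for j in range(1, Jmax + 1)]) - theta

# Risk(N) = N sigma^2 / n + sum_{j<=N} r_j^2 + sum_{j>N} theta_j^2 (tail truncated at Jmax)
risk = [N * sigma_e**2 / n + np.sum(r[:N] ** 2) + np.sum(theta[N:] ** 2)
        for N in range(1, Jmax + 1)]
N_best = 1 + int(np.argmin(risk))                   # risk-minimizing N
```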
6. As shown in Figure 1.
Figure 1: Solution to 3.6