
MAST90083 Computational Statistics & Data Mining NPR

Tutorial & Practical 7: Solutions

Question 1:

1. The linear spline model for f is given by

\[
f(x_i) = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+
\]

where $\beta^T = [\beta_0\ \beta_1]$ and $u^T = [u_1, \ldots, u_K]$ define the coefficients of the polynomial functions and the truncated functions respectively, and $\kappa_1, \ldots, \kappa_K$ denote the knots.

2. Define

\[
X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \qquad
Z = \begin{bmatrix} (x_1 - \kappa_1)_+ & \cdots & (x_1 - \kappa_K)_+ \\ \vdots & \ddots & \vdots \\ (x_n - \kappa_1)_+ & \cdots & (x_n - \kappa_K)_+ \end{bmatrix}
\]

The penalized spline fitting criterion, divided by $\sigma_\varepsilon^2$, can be written as

\[
\frac{1}{\sigma_\varepsilon^2} \| y - X\beta - Zu \|^2 + \frac{\lambda}{\sigma_\varepsilon^2} \| u \|^2
\]

3. Assuming $\mathrm{cov}(u) = G = \sigma_u^2 I$ with $\sigma_u^2 = \sigma_\varepsilon^2 / \lambda$, and $\mathrm{cov}(\varepsilon) = R = \sigma_\varepsilon^2 I$, the criterion becomes

\[
(y - X\beta - Zu)^T R^{-1} (y - X\beta - Zu) + u^T G^{-1} u
\]

which corresponds to the negative log-likelihood of the probability density of $(y, u)$ with $y \mid u \sim \mathcal{N}(X\beta + Zu, R)$ and $u \sim \mathcal{N}(0, G)$.

4. With $C = [X\ Z]$, $D = \mathrm{diag}(0, 0, 1, \ldots, 1)$ and $\alpha = \sigma_\varepsilon^2 / \sigma_u^2$, minimizing the criterion gives

\[
\theta = [\beta^T\ u^T]^T = (C^T C + \alpha D)^{-1} C^T y
\]

5. The fitted values can be written as

\[
\tilde{f} = C (C^T C + \alpha D)^{-1} C^T y
\]
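To make steps 1 to 5 concrete, here is a minimal NumPy sketch that builds $X$ and $Z$, solves the ridge-type system, and returns the fitted values. The simulated data, knot locations, and penalty $\lambda$ are illustrative assumptions, not part of the question; note that after dividing the criterion by $\sigma_\varepsilon^2$, the ridge weight on $u$ is $\alpha = \lambda$.

```python
import numpy as np

def penalized_spline_fit(x, y, knots, lam):
    """Linear penalized spline: theta = (C'C + alpha D)^{-1} C'y with alpha = lam."""
    n, K = len(x), len(knots)
    X = np.column_stack([np.ones(n), x])              # polynomial part [1, x_i]
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # truncated lines (x_i - kappa_k)_+
    C = np.hstack([X, Z])
    D = np.diag(np.r_[0.0, 0.0, np.ones(K)])          # penalize only the u coefficients
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return theta, C @ theta                           # coefficients and fitted values

# Illustrative data and knots (assumed for demonstration)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=100)
knots = np.linspace(0.1, 0.9, 9)
theta, fitted = penalized_spline_fit(x, y, knots, lam=0.1)
```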

6. Equivalently, $y = X\beta + \varepsilon^*$, where $\varepsilon^* = Zu + \varepsilon$ and

\[
\mathrm{cov}(\varepsilon^*) = V = Z G Z^T + R
\]

7. The generalized least squares estimator is then $\tilde{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$, with $\mathrm{cov}(\tilde{\beta}) = (X^T V^{-1} X)^{-1}$.
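The mixed-model form of steps 6 and 7 can be sketched the same way; the variance components below are illustrative values, not quantities given in the question.

```python
import numpy as np

# Same illustrative data and knots as the previous sketch
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=100)
knots = np.linspace(0.1, 0.9, 9)

# y = X beta + eps*, with cov(eps*) = V = Z G Z' + R
sigma2_eps, sigma2_u = 0.09, 0.9                     # assumed variance components
X = np.column_stack([np.ones(len(x)), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
G = sigma2_u * np.eye(len(knots))
R = sigma2_eps * np.eye(len(x))
V = Z @ G @ Z.T + R

Vinv_X = np.linalg.solve(V, X)                       # V^{-1} X
beta_tilde = np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)  # (X'V^{-1}X)^{-1} X'V^{-1} y
cov_beta = np.linalg.inv(X.T @ Vinv_X)               # cov(beta_tilde)
```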


Question 2:

1. The Nadaraya-Watson kernel estimator at $x_0$ is given by

\[
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_h(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_h(x_0, x_i)} \; I\left[ \sum_{i=1}^{N} K_h(x_0, x_i) \neq 0 \right]
\]
2. For a Gaussian kernel we have

\[
K_h(x_0, x) = \frac{1}{\sqrt{2\pi}\, h} \exp\left\{ -\frac{(x - x_0)^2}{2h^2} \right\}
\]

Since $K_h(x_0, x) > 0$ for all $x_0$ and $x_i$, the indicator equals one and we do not have a singularity in the denominator of $\hat{f}(x_0)$. The Gaussian kernel is also everywhere differentiable in $x_0$, and therefore the Nadaraya-Watson estimator is differentiable as a function of $x_0$.
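A minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel; the data and bandwidth are illustrative assumptions. The kernel's normalizing constant cancels between numerator and denominator, so it is omitted.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimates at the points x0 with a Gaussian kernel."""
    # Weights K_h(x0, x_i): one row per evaluation point, one column per observation
    w = np.exp(-(x0[:, None] - x[None, :]) ** 2 / (2.0 * h ** 2))
    return (w @ y) / w.sum(axis=1)   # denominator strictly positive for a Gaussian kernel

# Illustrative usage (assumed data and bandwidth)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
x0 = np.linspace(0, 1, 50)
fhat = nadaraya_watson(x0, x, y, h=0.05)
```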

3. For the Epanechnikov kernel we have

\[
K_h(x_0, x) = K\left( \frac{|x - x_0|}{h} \right), \qquad
K(z) =
\begin{cases}
\frac{3}{4}(1 - z^2) & \text{if } |z| \leq 1 \\
0 & \text{otherwise}
\end{cases}
\]

We can observe from this expression that for $|x - x_0| \to h^-$ we have $\frac{|x - x_0|}{h} \to 1^-$, and (taking $x > x_0$, so that $z = (x - x_0)/h$ and $\partial z / \partial x_0 = -1/h$)

\[
\frac{\partial K_h(x_0, x)}{\partial x_0}
= \frac{\partial K(z)}{\partial z} \cdot \frac{\partial z}{\partial x_0}
= \left( -\frac{3}{2} z \right) \left( -\frac{1}{h} \right)
= \frac{3z}{2h} = \frac{3|x - x_0|}{2h^2} \neq 0
\]

However, when $|x - x_0| \to h^+$ we have $\frac{|x - x_0|}{h} \to 1^+$ and

\[
\frac{\partial K_h(x_0, x)}{\partial x_0} = 0
\]

because the kernel is identically zero on this domain. Since the two one-sided limits differ, this kernel is not differentiable everywhere.
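A quick numerical confirmation of the two one-sided limits, using finite differences in $x_0$ just inside and just outside the support boundary $|x - x_0| = h$; the values of $x$, $h$, and the step size are arbitrary choices.

```python
import numpy as np

def epanechnikov_kh(x0, x, h):
    """Epanechnikov kernel K_h(x0, x) = K(|x - x0| / h)."""
    z = np.abs(x - x0) / h
    return np.where(z <= 1.0, 0.75 * (1.0 - z ** 2), 0.0)

# Fix x and h so the support boundary |x - x0| = h sits at x0 = 1.0
x, h, eps = 2.0, 1.0, 1e-6

# Finite-difference slopes in x0 on either side of the boundary
slope_inside = (epanechnikov_kh(1.0 + 2 * eps, x, h)
                - epanechnikov_kh(1.0 + eps, x, h)) / eps       # -> 3 / (2h) = 1.5
slope_outside = (epanechnikov_kh(1.0 - eps, x, h)
                 - epanechnikov_kh(1.0 - 2 * eps, x, h)) / eps  # -> 0.0
print(slope_inside, slope_outside)
```

The inside slope approaches $3/(2h) = 1.5$ while the outside slope is exactly zero, matching the derivation above.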

Question 3:

1. $N$ plays the role of the smoothing parameter, analogous to the bandwidth $h$ in kernel smoothing.

2.

\[
\hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x)
= \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{N} y_i \rho_j(x_i) \rho_j(x)
= \sum_{i=1}^{n} y_i W_{ni}(x)
\]

where $W_{ni}(x) = \frac{1}{n} \sum_{j=1}^{N} \rho_j(x_i) \rho_j(x)$, and $\hat{f}_{nN}$ is therefore a linear estimator.
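A minimal sketch of this projection estimator, assuming an orthonormal cosine basis on $[0, 1]$, namely $\rho_1(x) = 1$ and $\rho_j(x) = \sqrt{2}\cos((j-1)\pi x)$ for $j \geq 2$, together with simulated data; the question only requires $\{\rho_j\}$ to be orthonormal, so both choices are assumptions.

```python
import numpy as np

def cosine_basis(x, N):
    """First N functions of an orthonormal cosine basis on [0, 1]."""
    cols = [np.ones_like(x)]
    cols += [np.sqrt(2.0) * np.cos(j * np.pi * x) for j in range(1, N)]
    return np.column_stack(cols)                  # shape (len(x), N)

def projection_estimator(x, y, N):
    """Returns x' -> sum_j theta_hat_j rho_j(x'), theta_hat_j = (1/n) sum_i y_i rho_j(x_i)."""
    theta_hat = cosine_basis(x, N).T @ y / len(x)
    return lambda xs: cosine_basis(xs, N) @ theta_hat

# Illustrative usage (assumed data and N)
rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
fhat = projection_estimator(x, y, N=8)
values = fhat(np.linspace(0, 1, 50))
```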


3.

\[
\hat{\theta}_j = \frac{1}{n} \sum_{i=1}^{n} y_i \rho_j(x_i)
= \frac{1}{n} \left( \sum_{i=1}^{n} f(x_i) \rho_j(x_i) + \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i) \right)
\]

\[
E(\hat{\theta}_j) = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) = \theta_j + r_j
\]

where

\[
r_j = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \int_0^1 f(x) \rho_j(x)\, dx
= \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \theta_j
\]

4.

\[
E\left[ \left( \hat{\theta}_j - \theta_j \right)^2 \right]
= \left( E(\hat{\theta}_j) - \theta_j \right)^2 + E\left[ \left( \hat{\theta}_j - E(\hat{\theta}_j) \right)^2 \right]
= r_j^2 + E\left[ \left( \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i) \right)^2 \right]
= r_j^2 + \frac{1}{n^2} \sum_{i=1}^{n} \sigma_\varepsilon^2 \rho_j^2(x_i)
= r_j^2 + \frac{\sigma_\varepsilon^2}{n}
\]

using the independence of the $\varepsilon_i$ for the third equality, and the normalization $\frac{1}{n} \sum_{i=1}^{n} \rho_j^2(x_i) = 1$ (which holds, at least approximately, for an orthonormal basis evaluated on a regular design) for the last.

5.

\[
f(x) = \sum_{j=1}^{\infty} \theta_j \rho_j(x), \qquad
\hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x)
\]

Since

\[
\hat{f}_{nN} - f = \sum_{j=1}^{N} (\hat{\theta}_j - \theta_j) \rho_j(x) - \sum_{j=N+1}^{\infty} \theta_j \rho_j(x)
\]

and $\{\rho_j\}$ is an orthonormal basis,

\[
\left\| \hat{f}_{nN} - f \right\|_2^2 = \sum_{j=1}^{N} (\hat{\theta}_j - \theta_j)^2 + \sum_{j=N+1}^{\infty} \theta_j^2
\]

Taking expectations,

\[
E \| \hat{f}_{nN} - f \|_2^2
= \sum_{j=1}^{N} E\left[ (\hat{\theta}_j - \theta_j)^2 \right] + \sum_{j=N+1}^{\infty} \theta_j^2
= \frac{N \sigma_\varepsilon^2}{n} + \sum_{j=1}^{N} r_j^2 + \sum_{j=N+1}^{\infty} \theta_j^2 = C
\]

The variance term $N\sigma_\varepsilon^2/n$ grows with $N$ while the truncation bias $\sum_{j>N} \theta_j^2$ shrinks, so $C$ exhibits a bias-variance trade-off in $N$.

6. As shown in Figure 1.

Figure 1: Solution to 3.6
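For reference, a small Monte Carlo sketch of the trade-off this figure depicts: it estimates $E\|\hat{f}_{nN} - f\|_2^2$ over a grid of $N$, reusing the cosine basis from the earlier snippet. The true function, design, noise level, and replication count are all illustrative assumptions.

```python
import numpy as np

def cosine_basis(x, N):
    cols = [np.ones_like(x)]
    cols += [np.sqrt(2.0) * np.cos(j * np.pi * x) for j in range(1, N)]
    return np.column_stack(cols)

# Illustrative setup: regular design on (0, 1), smooth true f, Gaussian noise
rng = np.random.default_rng(3)
n, sigma, reps = 200, 0.3, 200
x = (np.arange(n) + 0.5) / n
f_true = np.sin(2 * np.pi * x)
xg = np.linspace(0, 1, 500)                   # grid for approximating the L2 norm
fg = np.sin(2 * np.pi * xg)

risk = []
for N in range(1, 31):
    mse = 0.0
    for _ in range(reps):                     # Monte Carlo over noise draws
        y = f_true + rng.normal(scale=sigma, size=n)
        theta_hat = cosine_basis(x, N).T @ y / n
        mse += np.mean((cosine_basis(xg, N) @ theta_hat - fg) ** 2)
    risk.append(mse / reps)
# risk typically falls first (truncation bias shrinks) and then rises
# (variance ~ N * sigma^2 / n grows), the U shape sketched in Figure 1
```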
