
MAST90083 Computational Statistics & Data Mining NPR
Tutorial & Practical 7: Solutions

Question 1:
1. The linear spline model for f is given by

$$f(x_i) = \beta_0 + \beta_1 x_i + \sum_{k=1}^{K} u_k (x_i - \kappa_k)_+$$

where $\beta^T = [\beta_0\ \beta_1]$ and $u^T = [u_1, \ldots, u_K]$ define the coefficients of the polynomial functions and the truncated line functions, respectively.

2. Define
1 x1 . .
X=. . 1 xn
(x1 −k1)+ … (x1 −kK)+  . .. . 
Z=… (xn −k1)+ … (xn −kK)+
The penalized spline fitting criterion when divided by σε2 can be written as 1 ||y − Xβ − Zu||2+ λ ||u||2
σε2 σε2
3. Assuming $\mathrm{cov}(u) = G = \sigma_u^2 I$ with $\sigma_u^2 = \sigma_\varepsilon^2/\lambda$, and $\mathrm{cov}(\varepsilon) = R = \sigma_\varepsilon^2 I$, the criterion becomes

$$(y - X\beta - Zu)^T R^{-1} (y - X\beta - Zu) + u^T G^{-1} u$$

which corresponds to the negative log-likelihood of the joint density of $(y, u)$ with

$$y \mid u \sim N(X\beta + Zu,\ R) \quad \text{and} \quad u \sim N(0,\ G)$$
4. With $C = [X\ Z]$, $D = \mathrm{diag}(0, 0, 1, 1, \ldots, 1)$, and $\alpha = \sigma_\varepsilon^2/\sigma_u^2$,

$$\hat{\theta} = [\hat{\beta}^T\ \hat{u}^T]^T = (C^T C + \alpha D)^{-1} C^T y$$
5. The fitted values can be written as

$$\tilde{f} = C (C^T C + \alpha D)^{-1} C^T y$$
6. $y = X\beta + \varepsilon^*$, where $\varepsilon^* = Zu + \varepsilon$ and

$$\mathrm{cov}(\varepsilon^*) = V = Z G Z^T + R$$
7. $\tilde{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$, and $\mathrm{cov}(\tilde{\beta}) = (X^T V^{-1} X)^{-1}$
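Parts 4 and 5 amount to a single ridge-type linear solve, which is easy to verify numerically. The following is a minimal Python sketch (not part of the original solutions); the simulated data, knot grid, and value of $\lambda$ are illustrative assumptions.

```python
import numpy as np

def fit_penalized_spline(x, y, knots, lam):
    """Penalized linear-spline fit via the ridge-type solve
    theta_hat = (C^T C + alpha * D)^{-1} C^T y.
    Here alpha = lam, since alpha = sigma_eps^2 / sigma_u^2 = lam
    under the assumption sigma_u^2 = sigma_eps^2 / lam."""
    n, K = len(x), len(knots)
    X = np.column_stack([np.ones(n), x])               # polynomial part [1, x_i]
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)   # truncated lines (x_i - kappa_k)_+
    C = np.hstack([X, Z])
    D = np.diag([0.0, 0.0] + [1.0] * K)                # penalize only the u-coefficients
    theta = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return theta, C @ theta                            # coefficients and fitted values

# Illustrative usage on simulated data (assumed example, not from the sheet)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
knots = np.linspace(0.05, 0.95, 15)
theta, fitted = fit_penalized_spline(x, y, knots, lam=1.0)
```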
Question 2:
1. The Nadaraya-Watson kernel estimate at $x_0$ is given by

$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_h(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_h(x_0, x_i)}\ I\!\left( \sum_{i=1}^{N} K_h(x_0, x_i) \neq 0 \right)$$

2. For a Gaussian kernel we have

$$K_h(x_0, x) = \frac{1}{\sqrt{2\pi}\, h} \exp\!\left( -\frac{(x - x_0)^2}{2h^2} \right)$$
Since $K_h(x_0, x_i) > 0$ for all $x_0$ and $x_i$, the denominator is never zero and $\hat{f}(x_0)$ has no singularity. The Gaussian kernel is also everywhere differentiable in $x_0$, and therefore the Nadaraya-Watson estimator is differentiable as a function of $x_0$.
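As a concrete illustration of parts 1 and 2, here is a minimal Python sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the example data are an assumption, not from the sheet). Because the Gaussian kernel is strictly positive, the indicator guard on the denominator is unnecessary here.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate with a Gaussian kernel:
    f_hat(x0) = sum_i K_h(x0, x_i) y_i / sum_i K_h(x0, x_i).
    The Gaussian kernel is strictly positive, so the denominator
    is never zero and no indicator guard is needed."""
    x0 = np.atleast_1d(x0)
    K = np.exp(-(x0[:, None] - x[None, :]) ** 2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return K @ y / K.sum(axis=1)

# Illustrative usage (assumed example data)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)
f_hat = nadaraya_watson(np.linspace(0, 1, 101), x, y, h=0.05)
```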
3. For the Epanechnikov kernel we have

$$K_h(x_0, x) = K\!\left( \frac{|x - x_0|}{h} \right), \qquad
K(z) = \begin{cases} \frac{3}{4}(1 - z^2) & \text{if } |z| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

We can observe from these expressions that for $|x - x_0| \to h^-$ we have $z = |x - x_0|/h \to 1^-$ and (taking $x_0 < x$; the other side differs only in sign)

$$\frac{\partial K_h(x_0, x)}{\partial x_0} = \frac{\partial K(z)}{\partial z} \cdot \frac{\partial z}{\partial x_0} = \left( -\frac{3}{2} z \right)\!\left( -\frac{1}{h} \right) = \frac{3z}{2h} = \frac{3|x - x_0|}{2h^2} \longrightarrow \frac{3}{2h} \neq 0$$

However, when $|x - x_0| \to h^+$, $z \to 1^+$ and

$$\frac{\partial K_h(x_0, x)}{\partial x_0} = 0$$

because the kernel is zero in this domain. Since the two one-sided limits are different, this kernel is not differentiable everywhere.
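The non-differentiability at $|x - x_0| = h$ can be confirmed numerically by comparing one-sided difference quotients. A quick Python sketch, with arbitrary illustrative values $h = 0.5$ and $x = 1.0$ (not from the original solutions):

```python
import numpy as np

def epanechnikov(x0, x, h):
    """K_h(x0, x) = K(|x - x0| / h) with K(z) = 0.75 * (1 - z^2) on |z| <= 1."""
    z = np.abs(x - x0) / h
    return np.where(z <= 1.0, 0.75 * (1.0 - z**2), 0.0)

# One-sided difference quotients of K_h in x0 at the boundary |x - x0| = h.
h, x = 0.5, 1.0
eps = 1e-6
x0 = x - h                                     # boundary point of the support
inside = (epanechnikov(x0 + eps, x, h) - epanechnikov(x0, x, h)) / eps
outside = (epanechnikov(x0 - eps, x, h) - epanechnikov(x0, x, h)) / (-eps)
print(inside, 3 / (2 * h))                     # approx 3/(2h) = 3.0 from inside the support
print(outside)                                 # exactly 0 from outside the support
```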
Question 3:
1. $N$ plays the role of the smoothing parameter, similar to $h$ in kernel smoothing.
2. We have

$$\hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x) = \frac{1}{n} \sum_{j=1}^{N} \sum_{i=1}^{n} y_i \rho_j(x_i) \rho_j(x) = \sum_{i=1}^{n} y_i W_{ni}(x)$$

where $W_{ni}(x) = \frac{1}{n} \sum_{j=1}^{N} \rho_j(x_i) \rho_j(x)$, and $\hat{f}_{nN}$ is therefore a linear estimator.
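Since $\{\rho_j\}$ is left abstract in the solutions, the Python sketch below assumes the cosine orthonormal basis on $[0, 1]$ purely for illustration; it implements $\hat{\theta}_j = \frac{1}{n} \sum_i y_i \rho_j(x_i)$ and the resulting linear estimator.

```python
import numpy as np

def cosine_basis(x, N):
    """First N functions of the cosine orthonormal basis on [0, 1]:
    rho_1(x) = 1, rho_j(x) = sqrt(2) * cos((j - 1) * pi * x) for j >= 2.
    (An assumed basis; the solutions only require {rho_j} orthonormal.)"""
    cols = [np.ones_like(x)]
    for j in range(2, N + 1):
        cols.append(np.sqrt(2.0) * np.cos((j - 1) * np.pi * x))
    return np.column_stack(cols)                # shape (n, N)

def series_estimate(x, y, x_grid, N):
    """Projection estimator: theta_hat_j = (1/n) sum_i y_i rho_j(x_i),
    f_hat(x) = sum_j theta_hat_j rho_j(x) = sum_i y_i W_ni(x)."""
    theta_hat = cosine_basis(x, N).T @ y / len(x)
    return cosine_basis(x_grid, N) @ theta_hat

# Illustrative usage (assumed example data)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=300)
f_hat = series_estimate(x, y, np.linspace(0, 1, 101), N=8)
```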
3. We have

$$\hat{\theta}_j = \frac{1}{n} \sum_{i=1}^{n} y_i \rho_j(x_i) = \frac{1}{n} \left( \sum_{i=1}^{n} f(x_i) \rho_j(x_i) + \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i) \right)$$

so that, using $E(\varepsilon_i) = 0$,

$$E(\hat{\theta}_j) = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) = \theta_j + r_j$$

where

$$r_j = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \int f(x) \rho_j(x)\, dx = \frac{1}{n} \sum_{i=1}^{n} f(x_i) \rho_j(x_i) - \theta_j$$

4. Decomposing the mean squared error into squared bias and variance,

$$E\left[ \left( \hat{\theta}_j - \theta_j \right)^2 \right] = \left( E(\hat{\theta}_j) - \theta_j \right)^2 + E\left[ \left( \hat{\theta}_j - E(\hat{\theta}_j) \right)^2 \right] = r_j^2 + E\left[ \left( \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i \rho_j(x_i) \right)^2 \right] = r_j^2 + \frac{\sigma_\varepsilon^2}{n^2} \sum_{i=1}^{n} \rho_j^2(x_i) = r_j^2 + \frac{\sigma_\varepsilon^2}{n}$$

where the last equality uses the empirical orthonormality $\frac{1}{n} \sum_{i=1}^{n} \rho_j^2(x_i) = 1$.

5. Writing

$$f(x) = \sum_{j=1}^{\infty} \theta_j \rho_j(x), \qquad \hat{f}_{nN}(x) = \sum_{j=1}^{N} \hat{\theta}_j \rho_j(x)$$

we have

$$\hat{f}_{nN} - f = \sum_{j=1}^{N} \left( \hat{\theta}_j - \theta_j \right) \rho_j(x) - \sum_{j=N+1}^{\infty} \theta_j \rho_j(x)$$

and, since $\{\rho_j\}$ is an orthonormal basis,

$$\left\| \hat{f}_{nN} - f \right\|^2 = \sum_{j=1}^{N} \left( \hat{\theta}_j - \theta_j \right)^2 + \sum_{j=N+1}^{\infty} \theta_j^2$$

Taking expectations and using part 4,

$$E\left\| \hat{f}_{nN} - f \right\|^2 = \sum_{j=1}^{N} E\left[ \left( \hat{\theta}_j - \theta_j \right)^2 \right] + \sum_{j=N+1}^{\infty} \theta_j^2 = \frac{N \sigma_\varepsilon^2}{n} + \sum_{j=1}^{N} r_j^2 + \sum_{j=N+1}^{\infty} \theta_j^2 = C$$

6. As shown in the figure: the variance term $N\sigma_\varepsilon^2/n$ grows with $N$ while the tail bias $\sum_{j=N+1}^{\infty} \theta_j^2$ shrinks, so the risk $C$ is minimized at an intermediate value of $N$.
[Figure 1: Solution to Question 3.6]
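The tradeoff summarized by $C$ can be reproduced numerically. The Python sketch below assumes a cosine basis, the test function $\sin(2\pi x)$, and truncation of the infinite tail sum at $J = 60$; none of these choices come from the original solutions.

```python
import numpy as np

def rho(j, x):
    """Cosine orthonormal basis on [0, 1] (assumed for illustration;
    the derivation only requires {rho_j} orthonormal)."""
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos((j - 1) * np.pi * x)

# Assumed setup: f(x) = sin(2*pi*x), fixed design, noise variance sigma2.
n, sigma2, J = 200, 0.3 ** 2, 60
x = (np.arange(1, n + 1) - 0.5) / n                 # fixed design points on [0, 1]
grid = (np.arange(20000) + 0.5) / 20000             # midpoint grid for integrals on [0, 1]
f = lambda t: np.sin(2 * np.pi * t)

# theta_j = integral of f * rho_j; r_j = (1/n) sum_i f(x_i) rho_j(x_i) - theta_j
theta = np.array([np.mean(f(grid) * rho(j, grid)) for j in range(1, J + 1)])
r = np.array([np.mean(f(x) * rho(j, x)) - theta[j - 1] for j in range(1, J + 1)])

# Risk C = N*sigma2/n + sum_{j<=N} r_j^2 + sum_{j>N} theta_j^2 (tail truncated at J)
for N in (2, 4, 8, 16, 32):
    C = N * sigma2 / n + np.sum(r[:N] ** 2) + np.sum(theta[N:] ** 2)
    print(N, C)  # variance term grows with N while the bias terms shrink
```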