Problem sheet 5
The next question is optional, but it counts as a bonus toward your final score.
2. If the density function is compactly supported, standard kernel density estimators are invalid/inconsistent at or near boundary points. Below we introduce a new estimator that automatically adapts to the (possibly unknown) boundaries of the support of the density without requiring specific data modification or additional tuning parameters.
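To see the boundary problem concretely: for a symmetric kernel, only half of the kernel's mass lies inside the support at a boundary point. The standard estimator $\frac{1}{nh}\sum_i K\big((x - X_i)/h\big)$ therefore tends to $f(0)/2$ at $x = 0$ instead of $f(0)$. The Python sketch below evaluates it at an interior point and at the boundary. The names `triangular` and `kde` are illustrative, and so is the evenly spaced stand-in for a Uniform[0, 1] sample.

```python
def triangular(u):
    """Triangular kernel K(u) = max{0, 1 - |u|}."""
    return max(0.0, 1.0 - abs(u))


def kde(x, data, h):
    """Standard kernel density estimator (1/(n h)) * sum_i K((x - X_i)/h)."""
    return sum(triangular((x - xi) / h) for xi in data) / (len(data) * h)


# Deterministic stand-in for a Uniform[0, 1] sample (true density f = 1 on [0, 1]).
n = 1000
sample = [(i + 0.5) / n for i in range(n)]
print(kde(0.5, sample, 0.1))  # interior point: ≈ 1.0, the true f(0.5)
print(kde(0.0, sample, 0.1))  # boundary point: ≈ 0.5 = f(0)/2, not f(0) = 1
```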
Let $X_1, \dots, X_n$ be a random sample, where $X_i$ is a continuous random variable with a smooth CDF over its possibly unknown support $[a, b] \subseteq \mathbb{R}$ for $a$ … $[-1, 1]$. The proposed boundary adaptive density estimator is then defined as

$$\hat f_{ba}(x) = \hat\beta_1(x).$$
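The construction of $\hat\beta_1(x)$ did not survive extraction. The paragraph introducing the estimator closely follows the abstract of Cattaneo, Jansson and Ma (2020), "Simple local polynomial density estimators" (Journal of the American Statistical Association). Assuming the sheet uses their estimator, $\hat\beta(x)$ would be a kernel-weighted least-squares fit of the empirical CDF on a local polynomial of order $p$. The value of $p$ (for example $p = 2$) is our assumption:

$$
\hat F(u) = \frac{1}{n}\sum_{j=1}^{n} \mathbb{1}(X_j \le u), \qquad
\hat\beta(x) = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p+1}} \sum_{i=1}^{n} \Big( \hat F(X_i) - \sum_{k=0}^{p} \beta_k (X_i - x)^k \Big)^2 K\Big( \frac{X_i - x}{h} \Big),
$$

with $\hat\beta(x) = (\hat\beta_0(x), \hat\beta_1(x), \dots, \hat\beta_p(x))^\top$ and $K$ a kernel supported on $[-1, 1]$. Since $\hat F$ estimates $F$ and $F' = f$, the slope $\hat\beta_1(x)$ estimates $f(x)$. Roughly speaking, at a boundary point the polynomial fit simply uses the one-sided window of observations, so no reflection or other data modification is needed.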
(a) Write an R function to implement the above estimator, with $h > 0$ as an input parameter. Use the triangular kernel $K(u) = \max\{0, 1 - |u|\}$, $u \in \mathbb{R}$.
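Below is a minimal Python sketch of part (a). The sheet asks for R, and Python is used here only as an illustration. Because the definition of $\hat\beta_1(x)$ is garbled above, the sketch assumes $\hat\beta_1(x)$ is the slope of a triangular-kernel-weighted least-squares fit of the empirical CDF $\hat F(X_i)$ on $1, (X_i - x), \dots, (X_i - x)^p$, with $p = 2$ by default. The names `f_ba`, `triangular` and `solve_linear` are hypothetical. The regressor is rescaled to $u = (X_i - x)/h$ for numerical stability, so the fitted slope is divided by $h$ at the end.

```python
import bisect
import math
import random


def triangular(u):
    """Triangular kernel K(u) = max{0, 1 - |u|}, supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(u))


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(A[r]) + [b[r]] for r in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular system: too few observations within h of x")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * m
    for r in range(m - 1, -1, -1):
        z[r] = (M[r][m] - sum(M[r][c] * z[c] for c in range(r + 1, m))) / M[r][r]
    return z


def f_ba(x, data, h, p=2):
    """Hypothetical sketch of the boundary adaptive estimator (assumed definition):
    slope of a kernel-weighted degree-p polynomial fit of the empirical CDF around x.
    Requires p >= 1."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    s = sorted(data)
    n = len(s)
    # Only observations with |X_i - x| < h get positive triangular weight.
    lo = bisect.bisect_left(s, x - h)
    hi = bisect.bisect_right(s, x + h)
    dim = p + 1
    A = [[0.0] * dim for _ in range(dim)]  # weighted Gram matrix: sum_i w_i r_i r_i'
    rhs = [0.0] * dim                      # weighted cross-products: sum_i w_i r_i F_hat(X_i)
    for xi in s[lo:hi]:
        u = (xi - x) / h                   # rescaled regressor (better conditioning)
        w = triangular(u)
        if w == 0.0:
            continue
        F_hat = bisect.bisect_right(s, xi) / n   # empirical CDF evaluated at X_i
        r = [u ** k for k in range(dim)]         # (1, u, ..., u^p)
        for a in range(dim):
            rhs[a] += w * r[a] * F_hat
            for c in range(dim):
                A[a][c] += w * r[a] * r[c]
    coef = solve_linear(A, rhs)
    # coef[1] is the slope with respect to u = (X_i - x)/h; divide by h to get d/dx.
    return coef[1] / h


rng = random.Random(2024)
sample = [rng.expovariate(1.0) for _ in range(1000)]
for x in (0.0, 0.5, 1.0):
    print(x, f_ba(x, sample, h=0.5), math.exp(-x))
```

With evenly spaced data such as $(i + 0.5)/n$, the empirical CDF is exactly linear in $X_i$. The sketch therefore returns 1 at the boundary $x = 0$ as well as in the interior, whereas a standard kernel estimator returns about 0.5 at $x = 0$. In R, the same weighted polynomial fit could be written with `lm(..., weights = w)`.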
(b) Conduct a simulation study based on the exponential distribution with density function $f(x) = e^{-x}$ if $x \ge 0$ and $f(x) = 0$ if $x < 0$. This density function is therefore supported on $[0, \infty)$. We want to estimate the density at $x \in \{0, 0.5, 1\}$, where $0$ is the boundary, $0.5$ is a near-boundary point and $1$ is an interior point. Take the sample size $n = 1000$. For various choices of the bandwidth $h$, conduct 5000 Monte Carlo repetitions, and report the average estimation error $|\hat f_{ba}(x) - f(x)|$ and the corresponding standard deviation at $x \in \{0, 0.5, 1\}$.
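A sketch of the Monte Carlo loop for part (b) follows. It assumes an estimator with the hypothetical signature `estimator(x, data, h)`, for example a Python version of the function from part (a). Here "the corresponding standard deviation" is read as the standard deviation of the absolute errors across repetitions. It could also mean the standard deviation of $\hat f_{ba}(x)$ itself. The names `true_density` and `monte_carlo` are illustrative.

```python
import math
import random
import statistics


def true_density(x):
    """Exponential(1) density: exp(-x) for x >= 0 and 0 for x < 0."""
    return math.exp(-x) if x >= 0 else 0.0


def monte_carlo(estimator, points, h, n=1000, reps=5000, seed=1):
    """Return {x: (mean |f_hat(x) - f(x)|, sd of |f_hat(x) - f(x)|)} over `reps` samples.

    `estimator(x, data, h)` is any density estimator with that (assumed) signature.
    """
    rng = random.Random(seed)
    abs_errors = {x: [] for x in points}
    for _ in range(reps):
        data = [rng.expovariate(1.0) for _ in range(n)]  # fresh Exp(1) sample of size n
        for x in points:
            abs_errors[x].append(abs(estimator(x, data, h) - true_density(x)))
    return {x: (statistics.fmean(e), statistics.stdev(e)) for x, e in abs_errors.items()}


# Sanity check with an "oracle" that returns the true density: every entry is (0.0, 0.0).
oracle = lambda x, data, h: true_density(x)
print(monte_carlo(oracle, [0.0, 0.5, 1.0], h=0.1, n=50, reps=20))
```

In pure Python, 5000 repetitions with $n = 1000$ and a local polynomial estimator can take a while, so a smaller `reps` is convenient while debugging. The loop is then rerun once per candidate bandwidth.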
(c) Use 5-fold cross-validation to select the bandwidth for the above estimator. Again, for the exponential distribution, take $n = 1000$ and conduct 5000 Monte Carlo repetitions to compare the performance of this estimator with that of the standard kernel density estimator (with its default bandwidth) at $x \in \{0, 0.5, 1\}$.
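The sheet does not fix the cross-validation criterion. The sketch below assumes 5-fold least-squares cross-validation, which targets the integrated squared error $\int (\hat f - f)^2$ up to a term that does not depend on $h$. For each fold it adds $\int \hat f_{\text{train}}^2$ minus twice the average of $\hat f_{\text{train}}$ over the held-out points. The names `cv_bandwidth`, `kde` and `trapezoid` are hypothetical, and the integration grid on $[0, 6]$ is likewise an assumption. For the estimator in part (a), the grid should avoid regions with too few observations inside the window. A standard triangular kernel estimator stands in for the estimator here. For the comparison itself, R's `density()` uses `bw = "nrd0"` (Silverman's rule of thumb) by default.

```python
import random


def kde(x, data, h):
    """Standard triangular-kernel density estimator, used here only as a stand-in."""
    return sum(max(0.0, 1.0 - abs((x - xi) / h)) for xi in data) / (len(data) * h)


def trapezoid(ys, xs):
    """Trapezoidal rule for the integral of the values ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def cv_bandwidth(data, bandwidths, estimator, grid, k=5, seed=0):
    """Pick h by k-fold least-squares cross-validation (assumed criterion):
    CV(h) = mean over folds of [ int f_train^2 - 2 * mean over test points of f_train ]."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)
    folds = [order[j::k] for j in range(k)]  # k disjoint folds of (almost) equal size
    scores = {}
    for h in bandwidths:
        total = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in order if i not in held_out]
            test = [data[i] for i in fold]
            fit_term = trapezoid([estimator(t, train, h) ** 2 for t in grid], grid)
            test_term = 2.0 * sum(estimator(xi, train, h) for xi in test) / len(test)
            total += fit_term - test_term
        scores[h] = total / k
    best = min(scores, key=scores.get)  # bandwidth with the smallest CV score
    return best, scores


rng = random.Random(7)
sample = [rng.expovariate(1.0) for _ in range(200)]
grid = [0.05 * i for i in range(121)]  # integration grid on [0, 6]
best_h, cv_scores = cv_bandwidth(sample, [0.1, 0.3, 1.0], kde, grid)
print(best_h, cv_scores)
```

Inside each Monte Carlo repetition of part (c), `cv_bandwidth` would be called on that repetition's sample before evaluating the estimator at $x \in \{0, 0.5, 1\}$.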