Homework 11 MATH 504
1.
• Reading from Elements of Statistical Learning (available in Course Documents)
– Sec 5.1, 5.2 discuss spline regression and the general idea of using basis functions to calculate regressions. I followed roughly the notation and development of these sections in the lecture.
• Reading from Sauer
– Sections 5.1.1, 5.1.2. Differentiation using finite difference approximations.
– Sections 5.2.1, 5.2.2, 5.2.3. Integration using the trapezoid rule and Simpson’s rule. Newton-Cotes formulas generalize Riemann and trapezoid integration using higher order polynomials.
• Here is a nice webpage on spline regression in R, see section 4.5 therein. http://data.princeton.edu/R/linearModels.html
The main point is that the R function lm is used in conjunction with an R function, e.g. bs, that selects a spline basis and then computes the associated model/design matrix. In other words, spline regression is viewed as a linear regression, as we discussed in class, and this view is made explicit by using lm.
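As a minimal sketch of this lm + bs pattern (using simulated stand-in data, since the column names of the actual file may differ):

```r
library(splines)

# Stand-in data for illustration only; replace with the real dataset.
set.seed(1)
d <- data.frame(age = runif(200, 9, 26))
d$spnbmd <- 0.1 * sin(d$age / 3) + rnorm(200, sd = 0.02)

# bs() builds the spline model (design) matrix for the given knots;
# lm() then performs an ordinary linear regression on those basis columns.
fit <- lm(spnbmd ~ bs(age, knots = c(15, 20), degree = 3), data = d)
length(coef(fit))  # intercept + 5 basis columns = 6 coefficients
```

With two knots and cubic degree, bs produces length(knots) + degree = 5 columns, so together with the intercept the fit has 6 free coefficients, consistent with the dimension count in problem 2.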
2. The file BoneMassData.txt contains bone mass data for men and women at a variety of ages. This data comes from the book "The Elements of Statistical Learning". In this problem, we will apply a spline regression to the dataset using predetermined knots. Specifically, our regression model will be y ∼ S(x), where x is the age, y is the bone mass, and S(x) is a cubic spline. For brevity consider only the samples taken from women, and for simplicity consider two knots with ζ1 = 15 and ζ2 = 20 (the case of more knots is no different). Then our splines are determined by the parameters ai, bi, ci, di for i = 0, 1, 2, where Si(x) = ai + bi x + ci x² + di x³ and

    S(x) = S0(x)  if x < ζ1,
           S1(x)  if x ∈ [ζ1, ζ2),        (1)
           S2(x)  if x ≥ ζ2,

with the requirement that S(x), S′(x), S′′(x) be continuous at the two knots.

Our goal is to find S(x) that minimizes the sum of squared residuals

     N
     Σ (yi − S(xi))²,        (2)
    i=1

where (xi, yi) are the datapoints.

(a) Suppose that we can decompose any cubic spline S(x) with knots at ζ = 15, 20 as a linear combination of the functions h1(x), h2(x), . . . , hD(x) for some D. That is, any cubic spline with knots at ζ1, ζ2 can be written as

            D
    S(x) =  Σ αj hj(x)        (3)
           j=1
Show that the spline S(x) that minimizes the sum of squared residuals is given by the coefficient vector α = (BᵀB)⁻¹Bᵀy, where y is the vector of yi values and B is the N × D matrix with entries Bkl = hl(xk). (We did this in class; I want you to go through the details.)
(b) Now show that D from (a) has value D = 6. (Again, we did this in class.)
(c) Show that the six functions h1(x) = 1, h2(x) = x, h3(x) = x², h4(x) = x³, h5(x) = [x − ζ1]³₊, h6(x) = [x − ζ2]³₊ form a basis for all splines with knots at ζ1 = 15, ζ2 = 20. To do this you must show (i) each hi(x) is a spline for the two knots, (ii) the hi(x) cannot be linearly combined to give the zero function, and (iii) each spline must be a linear combination of these functions. (Hint: show (i) and (ii) and then use the dimensionality of the spline space to show (iii).)
(d) Now compute the regression spline S(x) and plot it along with the data points to show the fit. (Typically it is better to do the actual fitting using a call to lm or using a spline regression package, but it’s good to go through the process yourself at least once.)
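The by-hand process can be sketched as follows: build the model matrix B from the truncated power basis of part (c), then solve the normal equations of part (a). The data frame here is a simulated stand-in (the real exercise would read BoneMassData.txt and subset to women):

```r
# Stand-in data for illustration only.
set.seed(1)
females <- data.frame(age = runif(100, 10, 25))
females$spnbmd <- 1 + 0.01 * females$age + rnorm(100, sd = 0.05)

pos3 <- function(u) pmax(u, 0)^3                       # [u]^3_+
h <- function(x) cbind(1, x, x^2, x^3, pos3(x - 15), pos3(x - 20))

B <- h(females$age)                                    # N x 6 matrix, B[k, l] = h_l(x_k)
y <- females$spnbmd
alpha <- solve(crossprod(B), crossprod(B, y))          # (B'B)^{-1} B'y

# Plot the data points with the fitted spline overlaid.
xs <- seq(min(females$age), max(females$age), length.out = 300)
plot(females$age, y, pch = 16, col = "grey")
lines(xs, h(xs) %*% alpha, lwd = 2)
```

Solving the normal equations directly is fine for this small example; in practice lm uses a QR decomposition of B, which is better conditioned than forming BᵀB explicitly.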
3. Consider f(x) = e^x. Note that f′(0) = 1. Consider the following two finite differences:

    (f(x + h) − f(x)) / h,        (4)

    (f(x + h) − f(x − h)) / (2h).        (5)

For h = 10^i with i = −20, −19, −18, . . . , −1, 0, calculate both finite differences. For each h, determine how many digits in the finite difference estimate are correct (you know the true value of the derivative is 1). Note that .99991 is correct up to 4 digits, since .999999 · · · = 1. Explain your results given finite differences and floating point error. DON’T FORGET TO SET options(digits=16).
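The computation itself is short; a sketch of the loop (counting correct digits is left as part of the exercise):

```r
options(digits = 16)
f <- function(x) exp(x)
x <- 0
for (i in -20:0) {
  h <- 10^i
  fwd <- (f(x + h) - f(x)) / h            # forward difference, eq. (4)
  ctr <- (f(x + h) - f(x - h)) / (2 * h)  # central difference, eq. (5)
  cat(sprintf("i = %3d  forward = %.15f  central = %.15f\n", i, fwd, ctr))
}
```

For very small h the numerator is computed as a difference of nearly equal floating point numbers, so cancellation error eventually dominates the truncation error; that trade-off is what the digit counts should reveal.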
4. Let

            x
    F(x) =  ∫ (1/√(2π)) e^(−z²/2) dz.        (6)
            0

F(x) is an important function - essentially the cdf of a normal random variable - that has no analytic formula and must be evaluated numerically. Write a function, Fapprox(n, method), that approximates F(∞) by numerical integration. If method is set to the character string "riemann" or "trapezoid", use a Riemann sum or trapezoid rule method, respectively, with n grid points. If method is set to "useR", use the R integrate function with subdivisions set to n. (R’s integrate function uses an adaptive grid method. In these methods, the grid is made dense in regions where the function varies; see Sauer for further details on such methods.) Since you cannot integrate to ∞, you must pick some reasonable cutoff. We know that the true value is F(∞) = 1/2. Consider approximating the integral using each of the three methods; try n = 10, 100, 1000, 10000 and compare accuracy.
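One possible shape for Fapprox (a sketch, not the required solution): the cutoff of 10 is an arbitrary but reasonable stand-in for ∞, and n is interpreted here as the number of subintervals.

```r
Fapprox <- function(n, method, cutoff = 10) {
  g <- function(z) exp(-z^2 / 2) / sqrt(2 * pi)   # standard normal density
  x <- seq(0, cutoff, length.out = n + 1)         # n subintervals
  w <- cutoff / n                                 # subinterval width
  if (method == "riemann") {
    sum(g(x[1:n])) * w                            # left-endpoint Riemann sum
  } else if (method == "trapezoid") {
    w * (sum(g(x)) - (g(x[1]) + g(x[n + 1])) / 2) # trapezoid rule
  } else if (method == "useR") {
    integrate(g, 0, cutoff, subdivisions = n)$value
  }
}

Fapprox(1000, "trapezoid")   # should be close to 0.5
```

Comparing the three methods at n = 10, 100, 1000, 10000 should show the trapezoid rule converging much faster than the Riemann sum, with integrate accurate even for small n thanks to its adaptive grid.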