§3.1 Introduction §3.2 Motivation §3.3 Spline §3.4 Penalized Spline Regression
Spline Regression
MAST90083 Computational Statistics and Data Mining
School of Mathematics & Statistics The University of Melbourne
Spline Regression 1/41
Outline
§3.1 Introduction
§3.2 Motivation
§3.3 Spline
§3.4 Penalized Spline Regression
Introduction
Some data sets are hard or impossible to model using traditional parametric techniques
Many data sets also involve nonlinear effects that are difficult to model parametrically
There is a need for flexible techniques to handle complicated nonlinear relationships
Here we look at some ways of freeing oneself of the restrictions of parametric regression models
Introduction
The goal is to discover the underlying trend in the observed data, which are treated as a collection of points in the plane
Introduction
Alternatively, we could think of the value on the vertical axis as a realization of a random variable $y$ conditional on the variable $x$
The underlying trend would then be the function $f(x) = E(y \mid x)$
This can also be written as
\[ y_i = f(x_i) + \varepsilon_i, \qquad E(\varepsilon_i) = 0 \]
and the problem is referred to as nonparametric regression
Introduction
Aim: estimate the unspecified smooth function $f$ from the pairs $(x_i, y_i)$, $i = 1, \dots, n$
Here $x$ is taken to be univariate
Several methods are available; here we focus first on penalized splines
Motivation
Let's start with the straight line regression model
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Motivation
The corresponding basis for this model consists of the functions $1$ and $x$
The model is a linear combination of these functions, which is the reason for the use of the word basis
Motivation
The basis functions correspond to the columns of the matrix $X$ used to fit the regression
\[ X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \]
The vector of fitted values is
\[ \hat{y} = X (X^\top X)^{-1} X^\top y \]
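The fitted-value formula can be sketched numerically. The snippet below is a minimal illustration, assuming NumPy and a small made-up data set that lies exactly on a line; the helper name `fit_line` is hypothetical and not part of the slides.

```python
import numpy as np

def fit_line(x, y):
    """Fit the straight-line model via the normal equations.

    Returns the coefficient estimates and the vector of fitted values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns of X are the basis functions 1 and x evaluated at the data.
    X = np.column_stack([np.ones_like(x), x])
    # Solve (X^T X) beta = X^T y instead of forming the inverse explicitly.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat
    return beta_hat, y_hat

# Hypothetical noise-free data on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
beta_hat, y_hat = fit_line(x, y)
```

Solving the linear system rather than inverting $X^\top X$ is a common numerical choice; it gives the same fitted values as the formula whenever $X$ has full column rank.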
Motivation
The quadratic model is a simple extension of the linear model
\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i \]
Motivation
There is an extra basis function $x^2$, corresponding to the addition of the $\beta_2 x_i^2$ term to the model
The quadratic model is an example of how the simple linear model might be extended to handle nonlinear structure
Motivation
For the quadratic model, the basis functions correspond to the columns of the matrix $X$ used to fit the regression
\[ X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix} \]
The vector of fitted values is
\[ \hat{y} = X (X^\top X)^{-1} X^\top y \]
Spline basis function
We now look at how the model can be extended to accommodate a different type of nonlinear structure
Broken line model: it consists of two differently sloped lines that join together
Spline basis function
Broken line: a linear combination of three basis functions, $1$, $x$ and $(x - 0.6)_+$, where
\[ u_+ = \begin{cases} u, & u > 0 \\ 0, & u \le 0 \end{cases} \]
Spline basis function
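The broken line model is
\[ y_i = \beta_0 + \beta_1 x_i + \beta_{11} (x_i - 0.6)_+ + \varepsilon_i \]
which can be fit using the least squares estimator with
\[ X = \begin{bmatrix} 1 & x_1 & (x_1 - 0.6)_+ \\ \vdots & \vdots & \vdots \\ 1 & x_n & (x_n - 0.6)_+ \end{bmatrix} \]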
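As a rough sketch of how the broken line model could be fit in practice, the code below builds the three-column design matrix and calls a least squares solver. It assumes NumPy; the helper names `pos` and `broken_line_design`, the grid of $x$ values, and the coefficients used to generate the data are all made up for illustration.

```python
import numpy as np

def pos(u):
    """Positive part u_+: equal to u where u > 0 and 0 otherwise."""
    return np.maximum(u, 0.0)

def broken_line_design(x, knot=0.6):
    """Design matrix with columns 1, x and (x - knot)_+."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, pos(x - knot)])

# Hypothetical noise-free data generated from a broken line with a knot at 0.6.
x = np.linspace(0.0, 1.0, 21)
y = 1.0 + 0.5 * x - 2.0 * pos(x - 0.6)

X = broken_line_design(x)
# Least squares estimate of (beta_0, beta_1, beta_11).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Here the slope is 0.5 to the left of the knot and 0.5 - 2.0 = -1.5 to the right, so the coefficient on $(x - 0.6)_+$ is the change in slope at the knot.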
Spline basis function
Assume a more complicated structure
Spline basis function
If we have good reason to believe that the underlying structure is of this basic form, we could change the basis by adding the functions $(x - 0.5)_+, (x - 0.55)_+, \dots, (x - 0.95)_+$
Spline basis function
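This basis can do a reasonable job for a structure with a linear portion between $x = 0$ and $x = 0.5$
We can use least squares to fit such a model with
\[ X = \begin{bmatrix} 1 & x_1 & (x_1 - 0.5)_+ & (x_1 - 0.55)_+ & \cdots & (x_1 - 0.95)_+ \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & (x_n - 0.5)_+ & (x_n - 0.55)_+ & \cdots & (x_n - 0.95)_+ \end{bmatrix} \]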
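The design matrix for several knots can be built in one step with broadcasting. This is a minimal sketch assuming NumPy; the function name `linear_spline_design` and the evaluation grid are hypothetical choices, while the knots 0.5, 0.55, ..., 0.95 are the ones listed above.

```python
import numpy as np

def linear_spline_design(x, knots):
    """Design matrix with columns 1, x, (x - k_1)_+, ..., (x - k_K)_+."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    # Broadcasting: entry (i, j) is (x_i - k_j)_+, giving an (n, K) block.
    truncated = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, truncated])

knots = np.linspace(0.5, 0.95, 10)   # 0.5, 0.55, ..., 0.95
x = np.linspace(0.0, 1.0, 101)       # hypothetical grid of x values
X = linear_spline_design(x, knots)
```

Every truncated column is zero for $x < 0.5$, so on that part of the range the fit reduces to the straight line $\beta_0 + \beta_1 x$.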
Spline basis function
It is possible to handle any complex type of structure by simply adding functions of the form $(x - k)_+$ to the basis
This is equivalent to adding a column of values to the $X$ matrix
The value $k$ is usually referred to as a knot
The function $(x - k)_+$ is made up of two lines that are tied together at $x = k$
Spline basis function
The function $(x - 0.6)_+$ is called a linear spline basis function
A set of such functions is called a linear spline basis
Any linear combination of the linear spline basis functions $1, x, (x - k_1)_+, \dots, (x - k_K)_+$ is a piecewise linear function with knots $k_1, k_2, \dots, k_K$ and is called a spline
Spline basis function
Rather than referring to the spline basis function $(x - k)_+$, it is common to simply refer to its knot $k$
We say the model has a knot at $0.35$ if the function $(x - 0.35)_+$ is in the basis
The spline model for a function $f$ is
\[ f(x) = \beta_0 + \beta_1 x + \sum_{i=1}^{K} \beta_{1i} (x - k_i)_+ \]
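To make the spline model concrete, the sketch below evaluates it for given coefficients and checks the slope on either side of a knot. It assumes NumPy; the function names and the coefficient values are hypothetical, and the knot at 0.35 is the one used as an example above.

```python
import numpy as np

def spline_eval(x, beta0, beta1, knot_coefs, knots):
    """Evaluate beta0 + beta1 * x + sum_i knot_coefs[i] * (x - knots[i])_+."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    knot_coefs = np.asarray(knot_coefs, dtype=float)
    # The trailing axis indexes the knots; works for scalar or array x.
    truncated = np.maximum(x[..., None] - knots, 0.0)
    return beta0 + beta1 * x + truncated @ knot_coefs

def f(x):
    # Hypothetical coefficients: slope 1 before the knot, 1 + (-3) = -2 after it.
    return spline_eval(x, beta0=0.0, beta1=1.0, knot_coefs=[-3.0], knots=[0.35])

left_slope = (f(0.2) - f(0.1)) / 0.1    # both points lie left of the knot
right_slope = (f(0.9) - f(0.8)) / 0.1   # both points lie right of the knot
```

Between two consecutive knots the function is a straight line whose slope is the linear coefficient plus the coefficients of all knots to the left, which is why each knot coefficient can be read as a change in slope.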
Illustration
The selection of a good basis is usually challenging
Start by trying to choose the knots by trial and error
Illustration
The fit is lacking in quality for low values of range
An obvious remedy is to use more knots
Illustration
With a larger set of knots, the fitting procedure has much more flexibility
The resulting fit is heavily overfitted
Illustration
Pruning the knots overcomes the overfitting issue
This fits the data well without overfitting
This fit was arrived at after a lot of time-consuming trial and error
Knot selection
A natural attempt at automatic selection of the knots is to use a model selection criterion
If there are $K$ candidate knots, then there are $2^K$ possible models, assuming the overall intercept and linear term are always present
Searching over all of them is highly computationally intensive
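The count of candidate models can be checked by brute force for a small $K$. The sketch below uses only the Python standard library; the generator name `candidate_knot_subsets` and the knot values are hypothetical.

```python
from itertools import combinations

def candidate_knot_subsets(knots):
    """Yield every subset of the candidate knots.

    Each subset defines one model; the intercept and linear term
    are assumed to be present in all of them.
    """
    for size in range(len(knots) + 1):
        yield from combinations(knots, size)

# With K = 3 candidate knots the models can be enumerated directly.
small = [0.25, 0.5, 0.75]
n_models_small = sum(1 for _ in candidate_knot_subsets(small))

# With K = 40 the count is computed rather than enumerated.
n_models_large = 2 ** 40
```

With 40 candidate knots there are already more than a trillion models, so evaluating a selection criterion on each one is not feasible.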
Penalized spline regression
Too many knots in the model induce roughness in the fit
An alternative approach: retain all the knots but constrain their influence
Hope: this will result in a less variable fit
Consider a general spline model with $K$ knots, $K$ large
Penalized spline regression
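The ordinary least squares fit is written as
\[ \hat{y} = X\hat{\beta}, \quad \text{where } \hat{\beta} \text{ minimizes } \|y - X\beta\|^2 \]
and $\beta = [\beta_0, \beta_1, \beta_{11}, \dots, \beta_{1K}]$, with $\beta_{1k}$ the coefficient of the $k$th knot
Unconstrained estimation of $\beta$ leads to a wiggly fit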
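The effect of unconstrained estimation with many knots can be explored with a small simulation. The sketch below assumes NumPy and uses simulated data (a sine curve plus noise), not the data shown in the slides; the helper names, the knot grids and the noise level are hypothetical.

```python
import numpy as np

def linear_spline_design(x, knots):
    """Design matrix with columns 1, x, (x - k_1)_+, ..., (x - k_K)_+."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    truncated = np.maximum(x[:, None] - knots[None, :], 0.0)
    return np.column_stack([np.ones_like(x), x, truncated])

def ols_fit(X, y):
    """Return the least squares estimate and the residual sum of squares."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    return beta_hat, float(resid @ resid)

# Simulated data, an assumption made purely for illustration.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, size=100))
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

many_knots = np.linspace(0.05, 0.95, 19)   # K = 19 knots
few_knots = many_knots[3::6]               # nested subset of 3 knots

beta_few, rss_few = ols_fit(linear_spline_design(x, few_knots), y)
beta_many, rss_many = ols_fit(linear_spline_design(x, many_knots), y)
```

Because the larger basis contains the smaller one, its residual sum of squares cannot be higher. Part of that extra flexibility goes into following the noise, which is the wiggliness that constraining the knot coefficients is meant to reduce.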
Penalized spline regression
Constraints on the β1k that might help avoid this situation are
max|β1k|