
Spline Regression
MAST90083 Computational Statistics and Data Mining
Karim Seghouane
School of Mathematics & Statistics The University of Melbourne

Outline
§3.1 Introduction
§3.2 Motivation
§3.3 Spline
§3.4 Penalized Spline Regression
§3.5 Linear Smoothers
§3.6 Other Basis

Introduction
- Some data sets are hard or impossible to model using traditional parametric techniques
- Many data sets also involve nonlinear effects that are difficult to model parametrically
- There is a need for flexible techniques to handle complicated nonlinear relationships
- Here we look at some ways of freeing oneself of the restrictions of parametric regression models

Introduction
The interest is in discovering the underlying trend in the observed data, which are treated as a collection of points in the plane.

Introduction
- Alternatively, we could think of the vertical axis as a realization of a random variable $y$ conditional on the variable $x$
- The underlying trend would then be a function $f(x) = E(y \mid x)$
- This can also be written as
  $$y_i = f(x_i) + \varepsilon_i, \quad E(\varepsilon_i) = 0$$
- and the problem is referred to as nonparametric regression
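For concreteness, a minimal Python sketch simulating data from this model; the particular trend $f$ and noise level are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0.0, 1.0, size=n))   # design points

def f(t):
    # a hypothetical smooth trend (an assumption for this example)
    return np.sin(2 * np.pi * t)

y = f(x) + rng.normal(scale=0.3, size=n)     # y_i = f(x_i) + eps_i, E(eps_i) = 0
```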

Introduction
- Aim: estimate the unspecified smooth function $f$ from the pairs $(x_i, y_i)$, $i = 1, \ldots, n$
- $x$ here will be considered univariate
- There are several available methods; here we focus first on penalized splines

Motivation
- Let’s start with the straight line regression model
  $$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

Motivation
- The corresponding basis for this model consists of the functions $1$ and $x$
- The model is a linear combination of these functions, which is the reason for the use of the word basis

Motivation
- The basis functions correspond to the columns of $X$ for fitting the regression
  $$X = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
- The vector of fitted values is
  $$\hat{y} = X \left( X^\top X \right)^{-1} X^\top y = H y$$
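A minimal numpy sketch of this fit, continuing the simulated data above (lstsq is used rather than the explicit inverse for numerical stability):

```python
import numpy as np

def ols_fit(X, y):
    """Fitted values X (X^T X)^{-1} X^T y, computed via lstsq for stability."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta_hat

X = np.column_stack([np.ones_like(x), x])   # columns are the basis functions 1 and x
y_hat = ols_fit(X, y)
```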

Motivation
- The quadratic model is a simple extension of the linear model
  $$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i$$

Motivation
- There is an extra basis function $x^2$ corresponding to the addition of the $\beta_2 x_i^2$ term to the model
- The quadratic model is an example of how the simple linear model might be extended to handle nonlinear structure

Motivation
- The basis functions correspond to the columns of $X$ for fitting the regression; in the case of a quadratic model,
  $$X = \begin{bmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{bmatrix}$$
- The vector of fitted values is
  $$\hat{y} = X \left( X^\top X \right)^{-1} X^\top y = H y$$
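The earlier sketch extends directly; only the design matrix changes:

```python
import numpy as np

# Quadratic basis: columns 1, x, x^2; the estimator itself is unchanged
X_quad = np.column_stack([np.ones_like(x), x, x**2])
y_hat_quad = ols_fit(X_quad, y)   # ols_fit as defined in the earlier sketch
```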

Spline basis function
- We now look at how the model can be extended to accommodate a different type of nonlinear structure
- Broken line model: it consists of two differently sloped lines that join together

Spline basis function
- Broken line: a linear combination of three basis functions, $1$, $x$ and $(x - 0.6)_+$
- where $(x - 0.6)_+$ uses the positive-part function
  $$u_+ = \begin{cases} u & u > 0 \\ 0 & u \le 0 \end{cases}$$
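In code, the positive-part function is a one-liner (the name pos is mine):

```python
import numpy as np

def pos(u):
    """Positive part u_+: u where u > 0, and 0 otherwise."""
    return np.maximum(u, 0.0)
```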

Spline basis function
- The broken line model is
  $$y_i = \beta_0 + \beta_1 x_i + \beta_{11} (x_i - 0.6)_+ + \varepsilon_i$$
- which can be fit using the least squares estimator with
  $$X = \begin{bmatrix} 1 & x_1 & (x_1 - 0.6)_+ \\ \vdots & \vdots & \vdots \\ 1 & x_n & (x_n - 0.6)_+ \end{bmatrix}$$
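A sketch of the broken-line fit, reusing pos and the simulated x, y from above:

```python
import numpy as np

# Basis 1, x, (x - 0.6)_+  ->  two lines joined at the knot 0.6
X_broken = np.column_stack([np.ones_like(x), x, pos(x - 0.6)])
beta_hat, *_ = np.linalg.lstsq(X_broken, y, rcond=None)
y_hat = X_broken @ beta_hat
```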

Spline basis function
- Assume a more complicated structure

Spline basis function
- If we have good reason to believe that our underlying structure is of this basic form, we could change the basis
- where the basis includes the functions $(x - 0.5)_+, (x - 0.55)_+, \ldots, (x - 0.95)_+$

Spline basis function
- This basis can do a reasonable job, with a linear portion between $x = 0$ and $x = 0.5$
- We can use least squares to fit such a model with
  $$X = \begin{bmatrix} 1 & x_1 & (x_1 - 0.5)_+ & (x_1 - 0.55)_+ & \cdots & (x_1 - 0.95)_+ \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & (x_n - 0.5)_+ & (x_n - 0.55)_+ & \cdots & (x_n - 0.95)_+ \end{bmatrix}$$
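A sketch building this design matrix, with one truncated-line column per knot:

```python
import numpy as np

knots = np.linspace(0.5, 0.95, 10)   # knots 0.5, 0.55, ..., 0.95
# Columns for 1 and x, plus one column (x - k)_+ per knot
X_knots = np.column_stack([np.ones_like(x), x] + [pos(x - k) for k in knots])
beta_hat, *_ = np.linalg.lstsq(X_knots, y, rcond=None)
y_hat = X_knots @ beta_hat
```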

Spline basis function
- It is possible to handle any complex type of structure by simply adding functions of the form $(x - k)_+$ to the basis
- This is equivalent to adding a column of values to the $X$ matrix
- The values $k$ are usually referred to as knots
- The function is made up of two lines that are tied together at $x = k$

Spline basis function
- The function $(x - 0.6)_+$ is called a linear spline basis function
- A set of such functions is called a linear spline basis
- Any linear combination of the linear spline basis functions $1, x, (x - k_1)_+, \ldots, (x - k_K)_+$ is a piecewise linear function with knots $k_1, k_2, \ldots, k_K$ and is called a spline

Spline basis function
- Rather than referring to the spline basis function $(x - k)_+$, it is common to simply refer to its knot $k$
- We say the model has a knot at 0.35 if the function $(x - 0.35)_+$ is in the basis
- The spline model for a function $f$ is
  $$f(x) = \beta_0 + \beta_1 x + \sum_{j=1}^{K} \beta_{1j} (x - k_j)_+$$
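A small helper (hypothetical, but following this formula directly) that evaluates such a spline:

```python
import numpy as np

def spline_eval(x, beta0, beta1, knot_coefs, knots):
    """f(x) = beta0 + beta1*x + sum_j knot_coefs[j] * (x - knots[j])_+."""
    fx = beta0 + beta1 * x
    for b, k in zip(knot_coefs, knots):
        fx = fx + b * np.maximum(x - k, 0.0)
    return fx
```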

Illustration
- The selection of a good basis is usually challenging
- Start by trying to choose the knots by trial and error

Illustration
- The fit is lacking in quality for low values of the range
- An obvious remedy is to use more knots

Illustration
- With a larger set of knots, the fitting procedure has much more flexibility
- The plot is heavily overfitted

Illustration
- Pruning the knots helps overcome the overfitting issue
- This fits the data well without overfitting
- This was arrived at after a lot of time-consuming trial and error

Knot selection
- A natural attempt at automatic selection of the knots is to use a model selection criterion (see the sketch below)
- If there are $K$ candidate knots then there are $2^K$ possible models, assuming the overall intercept and linear term are always present
- This is highly computationally intensive
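A sketch of this exhaustive search, feasible only for small K. The selection criterion used here (RSS plus a Mallows-Cp-style penalty) is an assumption, since the slides do not fix one:

```python
import numpy as np
from itertools import chain, combinations

def best_knot_subset(x, y, candidate_knots, sigma2):
    """Exhaustive search over all 2^K knot subsets (intercept and linear term always kept)."""
    best_crit, best_knots = np.inf, ()
    all_subsets = chain.from_iterable(
        combinations(candidate_knots, r) for r in range(len(candidate_knots) + 1))
    for knots in all_subsets:
        X = np.column_stack([np.ones_like(x), x]
                            + [np.maximum(x - k, 0.0) for k in knots])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = np.sum((y - X @ beta) ** 2)
        crit = rss + 2.0 * sigma2 * X.shape[1]   # Cp-style penalty: an assumed criterion
        if crit < best_crit:
            best_crit, best_knots = crit, knots
    return best_knots
```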

Penalized spline regression
- Too many knots in the model induces roughness of the fit
- An alternative approach: retain all the knots but constrain their influence
- Hope: this will result in a less variable fit
- Consider a general spline model with $K$ knots, $K$ large

Penalized spline regression
- The ordinary least squares fit is written as
  $$\hat{y} = X \hat{\beta} \quad \text{where } \hat{\beta} \text{ minimizes } \|y - X\beta\|^2$$
- and $\beta = [\beta_0, \beta_1, \beta_{11}, \ldots, \beta_{1K}]^\top$ with $\beta_{1k}$ the coefficient of the $k$th knot
- Unconstrained estimation of the $\beta_{1k}$ leads to a wiggly fit

Penalized spline regression
Constraints on the $\beta_{1k}$ that might help avoid this situation include bounding $\max_k |\beta_{1k}|$
- A convenient ridge-type formulation is the penalized least squares criterion: minimize
  $$\|y - X\beta\|^2 + \lambda\, \beta^\top D \beta \quad \text{for some } \lambda > 0$$
- In the case of the spline basis, $D = \mathrm{diag}(0_{p+1}, 1_K)$
- In smoothing splines, $D$ defines the penalty
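A minimal sketch of the resulting estimator $\hat{\beta} = (X^\top X + \lambda D)^{-1} X^\top y$ with the $p$th-degree truncated power basis; the function name and defaults are mine:

```python
import numpy as np

def penalized_spline_fit(x, y, knots, lam, degree=1):
    """beta_hat = (X^T X + lam*D)^{-1} X^T y with D = diag(0_{p+1}, 1_K):
    only the truncated-power (knot) coefficients are penalized."""
    X = np.column_stack([x**j for j in range(degree + 1)]
                        + [np.maximum(x - k, 0.0) ** degree for k in knots])
    D = np.diag([0.0] * (degree + 1) + [1.0] * len(knots))
    beta_hat = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return X @ beta_hat
```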

General form of penalized spline
When applying splines, there are two basic choices to make
- The spline model: the degree and knot locations
- The penalty: the form of the penalty
Once these choices have been made, there follow two secondary choices
- The basis functions: truncated power functions or B-splines
- The basis functions used in the computations

Linear smoothers
The penalized spline fit is a linear function of the data $y$:
$$\hat{y} = S_\lambda y \quad \text{with} \quad S_\lambda = X \left( X^\top X + \lambda D \right)^{-1} X^\top$$
- where $X$ corresponds, for example, to the $p$th degree truncated spline basis
- $S_\lambda$ is usually called the smoother matrix
In general,
$$\hat{y} = L y$$
where $L$ is an $n \times n$ matrix that does not depend on $y$ directly (but does through $\lambda$). This is also called a linear smoother.
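A small numpy sketch of the smoother matrix (names are mine); the trace of $S_\lambda$ reappears below as the degrees of freedom of the fit:

```python
import numpy as np

def smoother_matrix(X, D, lam):
    """S_lambda = X (X^T X + lam*D)^{-1} X^T, computed without an explicit inverse."""
    return X @ np.linalg.solve(X.T @ X + lam * D, X.T)

# y_hat  = smoother_matrix(X, D, lam) @ y
# df_fit = np.trace(smoother_matrix(X, D, lam))   # degrees of freedom (see below)
```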

Error of the smoothers
Let $\hat{f}$ be an estimator of $f$ obtained from $y_i = f(x_i) + \varepsilon_i$
An important quantity of interest is the error incurred by an estimator with respect to a given target. The most common measure of error is the mean squared error (MSE)
$$\mathrm{MSE}\{\hat{f}(x)\} = E\left[\left(\hat{f}(x) - f(x)\right)^2\right]$$
which has the advantage of admitting the decomposition
$$\mathrm{MSE}\{\hat{f}(x)\} = \left( E\{\hat{f}(x)\} - f(x) \right)^2 + \mathrm{var}\{\hat{f}(x)\}$$
representing the squared bias and the variance of the estimator.
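The decomposition follows by adding and subtracting $E\{\hat{f}(x)\}$ inside the square; the cross term has expectation zero:
$$E\left[\left(\hat{f}(x) - f(x)\right)^2\right] = \left( E\{\hat{f}(x)\} - f(x) \right)^2 + E\left[\left(\hat{f}(x) - E\{\hat{f}(x)\}\right)^2\right], \qquad E\left[\hat{f}(x) - E\{\hat{f}(x)\}\right] = 0$$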

Error of the smoothers
- The entire curve is of interest, so it is common to measure the error globally across several values of $x$
- The mean integrated squared error (MISE) is one possibility:
  $$\mathrm{MISE}\{\hat{f}(\cdot)\} = \int \mathrm{MSE}\{\hat{f}(x)\}\, dx$$
- When only the error at the observations is considered,
  $$\mathrm{MSSE}\{\hat{f}(\cdot)\} = E \sum_{i=1}^{n} \left( \hat{f}(x_i) - f(x_i) \right)^2$$

Error of the smoothers
- Let $\hat{f} = [\hat{f}(x_1), \ldots, \hat{f}(x_n)]^\top$ denote the vector of fitted values, and
- let $f = [f(x_1), \ldots, f(x_n)]^\top$ denote the vector of unknown values, so that
  $$\mathrm{MSSE}(\hat{f}) = E \|\hat{f} - f\|^2$$
- Note that
  $$\mathrm{MSSE}(\hat{f}) = \sum_{i=1}^{n} \left[ \left( E\{\hat{f}(x_i)\} - f(x_i) \right)^2 + \mathrm{var}\{\hat{f}(x_i)\} \right]$$
- For a linear smoother $\hat{f} = L y$,
  $$\mathrm{MSSE}(\hat{f}) = \|(L - I)f\|^2 + \sigma_\varepsilon^2\, \mathrm{tr}\left( L L^\top \right)$$

Error of the smoothers
- The bias is given by
  $$\mathrm{Bias}(\hat{f}) = f - E(\hat{f}) = f - Lf = (I - L)f$$
- The covariance is
  $$\mathrm{cov}(\hat{f}) = L\, \mathrm{cov}(y)\, L^\top = \sigma_\varepsilon^2\, L L^\top$$
- The diagonal contains the pointwise variances at the $x_i$; summing them gives $\sum_i \mathrm{var}\{\hat{f}(x_i)\} = \sigma_\varepsilon^2\, \mathrm{tr}(L L^\top)$, the variance term in the MSSE above

Degrees of freedom of a smoother
- For a penalized spline,
  $$\mathrm{df}_{\mathrm{fit}} = \mathrm{tr}\left\{ \left( X^\top X + \lambda D \right)^{-1} X^\top X \right\} = \mathrm{tr}(S_\lambda)$$
- For $K$ knots and degree $p$,
  $$\mathrm{tr}(S_0) = p + 1 + K$$
- At the other extreme,
  $$\mathrm{tr}(S_\lambda) \to p + 1 \quad \text{as } \lambda \to \infty$$
- So for $\lambda > 0$,
  $$p + 1 < \mathrm{df}_{\mathrm{fit}} < p + 1 + K$$

Degrees of freedom of a smoother

Different values of $\lambda$ can lead to fits of similar appearance; such fits have roughly the same degrees of freedom.

Cross validation

- The most common measure of the goodness of fit of a regression curve is the residual sum of squares
  $$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \|y - \hat{y}\|^2$$
- It is minimized for $\lambda = 0$, for which $\hat{y}_i = y_i$, $1 \le i \le n$