3.0 – Regression modelling
Regression analysis
Linear regression is a central tool of statistical analysis. It is used
extensively and is often a key component of a more complex
analytical procedure.
Regression methods, including generalized linear modelling (GLM)
and nonlinear regression, apply to a wide range of problems,
including:
trend estimation and prediction
diagnostic tests for model selection
benchmarking and (methodological) exploratory analysis
In this section we review elementary notions related to the
implementation of regression analysis.
3.1 – Linear regression
Linear regression
In linear regression models, observations depend linearly on the
parameters of interest:
Y = θ0 + θ1X + ε
where ε traditionally represents a collection of realizations of a
Gaussian r.v. with mean 0
Ordinary least squares (OLS) is most often used to estimate
θ = (θ0, θ1)
When the error term ε is not assumed Gaussian, maximum
likelihood estimation is the more generic framework
MLE is equivalent to OLS when ε ∼ N(0, σ²)
There are many other estimation criteria
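As a minimal illustration (not from the slides), a simple linear model can be fitted by OLS in R with lm; the built-in cars dataset is used here purely as an example:
# Minimal sketch (assumes the built-in 'cars' data): OLS fit of a simple linear model
fit <- lm(dist ~ speed, data = cars)   # estimates theta0 (intercept) and theta1 (slope) by OLS
summary(fit)                           # coefficient estimates, standard errors, residual summary
coef(fit)                              # theta0-hat and theta1-hat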
M-estimation
M-estimation: an objective function ρ is applied to the residuals
e(θ̃0, θ̃1) = Y − Ỹ = Y − θ̃0 − θ̃1X
to form a criterion that gets optimised with respect to (θ0, θ1):
(θ̂0, θ̂1) = argmin_{(θ0,θ1)} ∑_{i=1}^{N} ρ(ei(θ0, θ1))
For example, with ρ(u) = u² we optimise the sum of squares
∑_{i=1}^{N} (ei(θ̃0, θ̃1))²
Any choice of a function ρ may be implemented and optimised.
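For instance, a sketch (not from the slides) of an M-estimator with absolute loss ρ(u) = |u|, i.e. a least-absolute-deviation fit, obtained with optim; the objects x and y are assumed to hold the covariate and response:
# Illustrative sketch: LAD regression as an M-estimator with rho(u) = |u|
rho <- function(u) abs(u)                          # absolute-loss objective function
crit.lad <- function(theta, x, y) sum( rho( y - theta[1] - theta[2]*x ) )
theta.init <- coef(lm(y ~ x))                      # initialise at the OLS estimates
fit.lad <- optim(par = theta.init, fn = crit.lad, x = x, y = y)
fit.lad$par                                        # M-estimates of (theta0, theta1)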
Linear regression diagnostics
Note that if (θ̃0, θ̃1) ≡ (θ0, θ1) then the residuals are identical to
the error term ε.
Recall the main assumptions on the (traditional) linear model:
the error term has zero mean and constant variance
the errors are uncorrelated
the errors are Normally distributed or sample size is large
enough for large sample theory to be applied
This means that if the model fit (estimation) is good, the residuals
should look like they satisfy these properties – although residuals
are not independent r.v.’s.
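In R these properties are typically checked visually; a minimal sketch (reusing the cars fit from the earlier example) of the standard diagnostic plots:
# Sketch of standard residual diagnostics for a fitted linear model
fit <- lm(dist ~ speed, data = cars)
par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage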
3.2 – Nonlinear regression
Polynomial regression
There is a lot of flexibility (choice) in how one may tackle
nonlinear relationships. Nonlinearity may be expressed with respect
to the parameter of interest, the model covariate, or both.
Polynomial regression uses a model that is linear in the
parameter of interest but nonlinear in the regression covariate:
Y = θ0 + θ1X + θ2X² + · · · + θpX^p + ε
Goodness of fit improves with the order p of the polynomial (a perfect
fit is obtained at p = N − 1), but high-order polynomial fits are hard to interpret.
NB: Y = θ²X + ε is actually Y = βX + ε with β = θ² ∈ R+…
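A minimal sketch of a polynomial fit in R (the degree is chosen arbitrarily here for illustration); the model remains linear in the parameters and is fitted with lm, with x and y assumed to hold the covariate and response (e.g. temperature and pressure, as in the exercise below):
# Polynomial regression sketch: linear in the parameters, fitted with lm
pfit <- lm(y ~ poly(x, degree = 3, raw = TRUE))   # degree 3 chosen purely for illustration
summary(pfit)
plot(x, y, pch = 20)
lines(sort(x), fitted(pfit)[order(x)], lwd = 2)   # fitted polynomial curve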
Polynomial regression
Exercise: attach(pressure); x=temperature; y=pressure
Implement the linear regression to obtain the following plot:
[Figure: "Example: polynomial regression" – Vapor pressure (ml of mercury) vs Temperature (Celsius), with a second panel of Residuals vs Fitted values; the residual plot indicates an ill-suited model]
Nonlinear regression (using least squares)
Nonlinear regression allows the model structure (i.e. conditional
expectation E(Y | X) of the observations Y ) to depend
nonlinearly on the parameters of interest.
Examples:
Y = θ0 + θ1X1 + θ2²X2 + ε
Y = e^(θX) + ε
Nonlinear regression via Least Squares is performed in R using nls.
NB: nonlinear methods are usually (very) sensitive to initialisation.
Example: 2-parameter exponential model (LearnR page)
par(font.lab=2, font.axis=2)
plot(x, y, pch=20, cex=2,
     xlab="Temperature (Celsius)",
     ylab="Vapor pressure (ml of mercury)",
     main="Example: polynomial regression")
nlreg <- nls(y~exp(a+b*x), start=list(a=0,b=.5)) # error... need for better initialisation
nlreg <- nls(y~exp(a+b*x), start=list(a=0,b=.02))
summary(nlreg)
cc = coef(nlreg)
curve(exp(cc[1]+cc[2]*x), col='blue', add=T, lwd=3)
points(x, exp(0+0.018*x), col='red', t='b', lwd=3, cex=.5)
nlreg2 <- nls(y~exp(a*x), start=list(a=.02))
cc = coef(nlreg2)
curve(exp(cc[1]*x), col='cyan', add=T, lwd=3)
legend("topleft", col=c('black','blue','red','cyan'), lwd=3,
       legend=c("Data","Fitted (p=2)","Guessed","Fitted (p=1)"))
https://eric-wolsztynski.shinyapps.io/learnr3_regression/

[Figure: "Example: nonlinear regression" – Vapor pressure (ml of mercury) vs Temperature (Celsius), showing the Data, Fitted (p=2), Guessed and Fitted (p=1) curves]

Example: US population (LearnR page)
library(car)
data(USPop)
attach(USPop)
par(font.lab=2, font.axis=2)
plot(year, population, cex=1.5, pch=20,
     main="Example: nonlinear regression (USpop data)")
time <- 0:21
pop.mod <- nls(population ~ beta1/(1 + exp(beta2 + beta3*time)),
               start=list(beta1 = 350, beta2 = 4.5, beta3 = -0.3), trace=TRUE)
summary(pop.mod)
lines(year, fitted.values(pop.mod), lwd=3, col='red')
https://eric-wolsztynski.shinyapps.io/learnr3_regression/

[Figure: "Example: nonlinear regression (USpop data)" – population vs year, with the fitted logistic curve]

More details on NLS may be found e.g. at:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-nonlinear-regression.pdf

optim and optimx
If one wants to implement another criterion, optim and optimx become
useful for (general-purpose) optimization. (The optimx package extends
the older optim.)
optim implements Nelder-Mead, quasi-Newton and conjugate-gradient
methods, and includes the possibility of box constraints. It takes
definitions of the cost function (estimation criterion) and (optionally)
its gradient as arguments, along with starting values.
This allows users to define their own estimating functions.

Example: optim (LearnR page)
par(mfrow=c(1,2), font.lab=2, font.axis=2)
model <- function(x, theta){ return(theta*x) }
crit <- function(theta,x,y){
  # must have theta as 1st parameter and return a single value...
  return( sum( y-model(x,theta) ) )
}
thbar = 1.8
x = rep(c(1,3,7,8), len=100)
y = model(x,thbar) + rnorm(x)
plot(x, y, pch=20, cex=1.5, main="optim example 1: data")
points(x, model(x,thbar), col='green', pch=20, cex=2)
(optim.out <- optim(par=c(1), fn=crit, x=x, y=y,
                    method="L-BFGS-B", lower=c(0.01), upper=c(3.05)))
Exercise: Why does it not "work"?????
https://eric-wolsztynski.shinyapps.io/learnr3_regression/

Example: optim (continued)
crit <- function(theta,x,y){
  # must have theta as 1st parameter and return a single value...
  return( sum( (y-model(x,theta))^2 ) )
}
thbar = 1.8
x = rep(c(1,3,7,8), len=100)
y = model(x,thbar) + rnorm(x)
(optim.out <- optim(par=c(1), fn=crit, x=x, y=y,
                    method="L-BFGS-B", lower=c(0.01), upper=c(3.05)))
# Visualize objective shape and estimated optimum:
plot(x, y, pch=20, cex=1.5, main="optim example 2: data")
points(x, model(x,thbar), col='green', pch=20, cex=2)
ths = seq(0,5,len=50)
obj = 0*ths
for(i in 1:length(ths)){ obj[i] = crit(ths[i],x,y) }
plot(ths, obj, main="Criterion...", xlab="theta", ylab="Objective value")
points(optim.out$par, crit(optim.out$par,x,y), col='red', pch=20, cex=3)

Example: optim (continued again)
model.f <- function(x, theta){ return(theta*x) }
model.g <- function(x, theta){
  # gradient of model.f...
  return(x)
}
crit.f <- function(theta,x,y){ return( sum( (y-model.f(x,theta))^2 ) ) }
crit.g <- function(theta,x,y){
  # gradient of crit.f...
  return( sum( -2*model.g(x,theta)*(y-model.f(x,theta)) ) )
}
thbar = 1.8
x = rep(c(1,3,7,8), len=100)
y = model.f(x,thbar) + rnorm(x)
(optim.out <- optim(par=c(1), fn=crit.f, gr=crit.g, x=x, y=y,
                    method="L-BFGS-B", lower=c(0.01), upper=c(3.05)))
...

Exercise: implement a simulation of the nonlinear model Y = exp(−θX) + ε
and minimize the sum of least squares for this model using optim.
Exercise: implement the same simulation and minimize the sum of least
squares for this model using optimx, this time.

3.3 – Regularization

Regularization (shrinkage)
Linear regression models easily overfit 'training' data

[Figure: dist vs speed – data points, linear fit, new points, what the new fit would be, and the fit of the whole data]

Adding irrelevant covariates X artificially increases model fit
Shrinking the less important covariates reduces overfitting
Shrinking can also help reduce their variance

Ridge regression (shrinkage)
Add an L2 penalty to the regression cost function
Shrinks the effect of some variables
Output coefficient estimates depend on the choice of tuning
parameter λ and minimise the criterion
∑_{i=1}^{N} ( Yi − β0 − ∑_{j=1}^{p} βjXij )² + λ ∑_{j=1}^{p} βj² = RSS + λ ∑_{j=1}^{p} βj²
λ is usually selected by cross-validation (tuning)
Shrinkage is not applied to the intercept parameter
The final model includes all p covariates

The LASSO
Add an L1 penalty to the regression cost function
Allows one to effectively "get rid of" some variables
Output coefficient estimates depend on the choice of tuning
parameter λ and minimise the criterion
∑_{i=1}^{N} ( Yi − β0 − ∑_{j=1}^{p} βjXij )² + λ ∑_{j=1}^{p} |βj| = RSS + λ ∑_{j=1}^{p} |βj|
λ is usually selected by cross-validation (tuning)
Shrinkage is not applied to the intercept parameter
The final model may have fewer than p covariates
"Least Absolute Shrinkage and Selection Operator"

Tuning the regularization parameter
Output coefficient estimates depend on the choice of tuning parameter λ
used in regularization (for some cost function ρ(β))
∑_{i=1}^{N} ( Yi − β0 − ∑_{j=1}^{p} βjXij )² + λ ∑_{j=1}^{p} ρ(βj) = RSS + λ ∑_{j=1}^{p} ρ(βj)
Cross-validation:
Try several λ values on CV training sets
Select λ that minimises CV MSE
library(glmnet)
library(ISLR)
dat = na.omit(Hitters)
x = model.matrix(Salary~., data=dat)[,-1]
y = dat$Salary
plot( cv.glmnet(x, y) )

[Figure: cross-validation curve from cv.glmnet – Mean-Squared Error vs Log(λ), with the number of nonzero coefficients along the top axis]
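To complement the tuning example above, a brief sketch (assuming the same x and y built from the Hitters data) of extracting ridge and lasso fits at the cross-validated λ; glmnet's alpha argument selects the penalty (alpha = 0 for ridge, alpha = 1 for the lasso):
# Sketch: ridge (alpha = 0) and lasso (alpha = 1) fits using the same x and y as above
cv.ridge <- cv.glmnet(x, y, alpha = 0)
cv.lasso <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 (lasso) is the default
coef(cv.ridge, s = "lambda.min")         # all p coefficients kept, shrunk towards 0
coef(cv.lasso, s = "lambda.min")         # some coefficients set exactly to 0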