Elastic Net
As opposed to subset selection, ridge regression does not actually result in simpler models.
▶ Some estimated coefficients are simply reduced in their absolute value, but we do not get rid of predictors entirely, as we did with subset selection.
▶ Ideally, we want to allow for coefficients to be exactly equal to zero.
▶ This can be achieved if we make the optimization problem slightly more complicated.
Lasso optimization problem
\[
\operatorname*{argmin}_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\]
▶ Again, this can be rewritten as:
\[
\min_{\beta}\; \underbrace{\sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2}_{\text{Residual Sum of Squares}} \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s \tag{1}
\]
▶ The only thing that changes is the shape of the "circle".
▶ The constraint in absolute values is also called the L1 norm of the vector \(\beta\), i.e. \(\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|\).
Some graphical intuition for p = 2
[Figure: constraint region and RSS contours for p = 2]
Lasso selects variables…
▶ The Lasso shrinks some coefficients to be exactly equal to zero, rather than just close to zero.
▶ This variable selection property makes the Lasso particularly attractive.
▶ In particular, with a large number of features p, the Lasso can help to significantly reduce the dimensionality of the matrix X.
▶ Let's now apply the Lasso to our example on housing prices; a code sketch follows below.
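Before turning to the plots, here is a minimal sketch of how such coefficient paths can be produced with glmnet. The data frame housing and its column price are hypothetical stand-ins for the hedonic pricing data used in the lecture:

# Hypothetical sketch: fit the full lasso path on the housing data and
# draw the coefficient-path plots shown on the following slides.
library(glmnet)
X <- model.matrix(price ~ ., data = housing)[, -1]   # drop the intercept column
fit <- glmnet(X, housing$price, alpha = 1, standardize = TRUE)
plot(fit, xvar = "lambda", label = TRUE)   # paths against log(lambda)
plot(fit, xvar = "norm", label = TRUE)     # paths against the L1 norm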
Shrinking hedonic pricing coefficients?
Let's plot some of the coefficients as they evolve with increasing λ.
[Figure: coefficient paths for baths, bedrooms, builtyryr, distgas2009, distgas2010, lat]
The vertical axis shows the estimated coefficients \(\beta_j\), while the horizontal axis plots log(λ). A large value means a high value of λ, which implies significant shrinking of the coefficients. Because λ takes very large values in this example, I used a logarithmic transformation to condense the picture.
Shrinking hedonic pricing coefficients?
Let's plot some of the coefficients as they evolve with increasing λ.
[Figure: coefficient paths for baths, bedrooms, distgas2009, distgas2010, sqft]
The axes are as in the previous plot: estimated coefficients \(\beta_j\) against log(λ).
Shrinking hedonic pricing coefficients?
Let's plot some of the coefficients as they evolve with increasing λ.
[Figure: coefficient paths for baths, bedrooms, builtyryr, distgas2009, distgas2010, lat; horizontal axis "l2betaratio" with ticks at 0.50, 0.75, 1.00]
The vertical axis shows the estimated coefficients \(\beta_j\), while the horizontal axis plots \(\|\hat{\beta}\|_1 / \|\hat{\beta}^{\text{OLS}}\|_1\).
Shrinking hedonic pricing coefficients?
Let's plot some of the coefficients as they evolve with increasing λ.
[Figure: coefficient paths for baths, bedrooms, distgas2009, distgas2010, sqft; horizontal axis "l2betaratio" with ticks at 0.50, 0.75, 1.00]
The vertical axis shows the estimated coefficients \(\beta_j\), while the horizontal axis plots \(\|\hat{\beta}\|_1 / \|\hat{\beta}^{\text{OLS}}\|_1\). A sketch of how this ratio can be computed follows below.
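The ratio on the horizontal axis can be computed from the fitted path itself. Here is a sketch reusing the hypothetical fit object from the earlier snippet; since the lasso solution at the smallest penalty is close to OLS, the largest L1 norm on the path stands in for the OLS norm:

# L1 norm of the lasso coefficients at each lambda on the path;
# fit$beta is a sparse p-by-nlambda matrix (one column per lambda).
l1 <- colSums(abs(fit$beta))
ratio <- l1 / max(l1)    # approx. ||beta_hat||_1 / ||beta_hat_OLS||_1
# Plot every coefficient path against the shrinkage ratio.
matplot(ratio, t(as.matrix(fit$beta)), type = "l",
        xlab = "l2betaratio", ylab = "coefficients")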
Lasso or Ridge regression?
▶ Neither ridge regression nor the Lasso will universally dominate the other.
▶ The Lasso generally performs better in a situation where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or equal to zero.
▶ Ridge regression will perform better when the response is a function of many predictors, all with coefficients of roughly similar size.
▶ Both the Lasso and ridge regression can yield a reduction in variance at the expense of a small increase in bias.
▶ There are very efficient algorithms for fitting both ridge and Lasso models; in both cases the entire coefficient path can be computed with about the same amount of work as a single least-squares fit. A small simulation below illustrates both points.
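To make the last two points concrete, here is a small self-contained simulation with made-up data in which only a few predictors matter, so the Lasso should have the edge; note that a single glmnet call returns the entire coefficient path:

# Simulated sparse setting: 5 strong coefficients, 45 zeros.
library(glmnet)
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 200; p <- 50
X <- matrix(rnorm(n * p), n, p)
beta <- c(rep(2, 5), rep(0, p - 5))
y <- drop(X %*% beta + rnorm(n))
ridge <- glmnet(X, y, alpha = 0)   # whole ridge path in one call
lasso <- glmnet(X, y, alpha = 1)   # whole lasso path in one call
# Cross-validated errors: the lasso should be lower in this sparse setting.
min(cv.glmnet(X, y, alpha = 0)$cvm)
min(cv.glmnet(X, y, alpha = 1)$cvm)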
Lasso for variable selection illustrated
I will present two use cases that may be illustrative.
▶ In a previous lecture, I presented the UK EU Referendum in the context of building a robust predictive exercise. There we used best subset selection (BSS); the LASSO can be seen as an approximation to the BSS algorithm.
▶ As indicated, with p > 40, BSS becomes infeasible very fast.
▶ In the paper we used a blocked BSS and combined regressors carrying similar variation to navigate the computational constraint. How does the LASSO perform?
▶ In total there are 76 continuous regressors…
Get the data: https://www.dropbox.com/s/3fxh0a2n2lca26b/Becker_Fetzer_Novy_2017_EP.zip?dl=1
Lasso for Brexit
> # to load data from other formats, such as Stata data files
> library(haven)
> library(data.table) # needed for data.table() below
> library(glmnet)
> DAT<-data.table(read_dta("R/brexitevote-ep2017.dta"))
> #remove irrelevant variable
> DAT<-DAT[, setdiff(names(DAT),"BREXITINDEXNARROW"),with=F]
> #we need to normalize the variables to have unit standard deviation
> #inspect the data frame and see the variables are in columns
> DAT<-DAT[, c(13,15:91), with=F]
> dim(DAT)
[1] 382 78
> #missing values for two areas
> DAT<-DAT[!is.na(Pct_Leave)]
> #remove columns with any missing values
> DAT<-DAT[, which(colSums(is.na(DAT))==0), with=F]
> dim(DAT)
[1] 380 65
> class(DAT)
[1] "data.table" "data.frame"
> #standardize, scale and center
> DAT<-data.table(scale(DAT,center=TRUE))
> #create a model matrix for regression
> x=model.matrix(Pct_Leave ~ ., data=DAT)[,-1]
> x[1:5, 1:4]
  EU75Leaveshare           ...
  pensionergrowth20012011  -0.55928  -1.13089  -0.38467   0.00712  -0.28806
  ResidentAge30to44share   -0.077721 -0.386752 -0.548088  0.407054  0.000242
  ResidentAge45to59share   -0.42167  -1.00202   0.32559  -0.15890   0.00298
(first five observations of the four leading columns, shown transposed)
> class(x)
[1] "matrix" "array"
Lasso for Brexit
> library(glmnet)
> #run lasso
> lasso.mod=glmnet(x,DAT$Pct_Leave,alpha=1,
+ standardize=TRUE, family=”gaussian”)
> summary(lasso.mod)
Length Class Mode
a0 91 -none- numeric
beta 5824 dgCMatrix S4
df 91 -none- numeric
dim 2 -none- numeric
lambda 91 -none- numeric
dev.ratio 91 -none- numeric
nulldev 1 -none- numeric
npasses 1 -none- numeric
jerr 1 -none- numeric
offset 1 -none- logical
call 6 -none- call
nobs 1 -none- numeric
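The summary shows that the fitted object stores the whole path (91 values of λ here). To see which predictors survive at a given penalty, extract the coefficients at a chosen λ; the value 0.5 below is purely illustrative:

> b <- as.matrix(coef(lasso.mod, s = 0.5))  # coefficients at lambda = 0.5
> rownames(b)[b != 0]                       # variables the Lasso keeps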
Lasso for Brexit
[Figure: Lasso coefficient paths for the Brexit regressors against the L1 norm (0.0 to 3.0); coefficients range from −0.2 to 0.6, and the top axis gives the number of nonzero coefficients along the path: 0, 2, 5, 21, 36, 49, 55]
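This path plot can be reproduced directly from the fitted object; the counts along the top axis are the numbers of nonzero coefficients at the corresponding penalty:

> plot(lasso.mod, xvar = "norm", label = TRUE)    # against the L1 norm
> plot(lasso.mod, xvar = "lambda", label = TRUE)  # alternative: against log(lambda)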
Elastic Net
▶ An elastic net combines L1 and L2 penalization.
▶ The objective function becomes
\[
\operatorname*{argmin}_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda\Big[\alpha \sum_{j=1}^{p}|\beta_j| + (1-\alpha)\sum_{j=1}^{p}\beta_j^2\Big]
\]
▶ This now has two tuning parameters to select: the weight on the penalization form, α, and the λ to navigate the bias-variance trade-off. A cross-validation sketch follows below.
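A minimal sketch of tuning both parameters by cross-validation with glmnet, reusing x and DAT$Pct_Leave from the Brexit example; the α grid and the seed are arbitrary choices. (glmnet parameterizes the penalty slightly differently, halving the ridge term, but α plays the same mixing role.)

# cv.glmnet cross-validates over lambda for a fixed alpha, so we search a
# small grid of alpha values ourselves and keep the best pair.
library(glmnet)
set.seed(42)
alphas <- c(0, 0.25, 0.5, 0.75, 1)
cvs <- lapply(alphas, function(a)
  cv.glmnet(x, DAT$Pct_Leave, alpha = a, family = "gaussian"))
cv.err <- sapply(cvs, function(cv) min(cv$cvm))  # best CV error per alpha
best <- which.min(cv.err)
alphas[best]              # chosen mixing weight alpha
cvs[[best]]$lambda.min    # chosen penalty lambda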