
ISOM3360 Data Mining for Business Analytics, Session 8
Linear Regression
Instructor: Department of ISOM, Spring 2022


Key Concepts in Model Evaluation (Recap)
Holdout validation vs. k-fold cross validation
Evaluating classification performance
• Benchmark?
• Confusion matrix, ROC curve
• Accuracy/error rate, precision, recall, average misclassification cost, AUC
Evaluating regression performance
• Benchmark?
• MSE, RMSE, MAE
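As a quick refresher in code (not part of the original slides), a minimal sketch of holdout vs. 5-fold cross validation for a regression model, assuming scikit-learn; the synthetic data, split size, and fold count are placeholder choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                                        # placeholder features
y = X @ np.array([2.0, -1.0, 0.5]) + rng.randn(200) * 0.1   # placeholder target

# Holdout validation: fit on the training split, evaluate on the held-out split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LinearRegression().fit(X_train, y_train)
print("holdout RMSE:", mean_squared_error(y_test, model.predict(X_test)) ** 0.5)

# k-fold cross validation: average the error over k train/validate splits.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error")
print("5-fold RMSE:", np.mean(np.sqrt(-scores)))
```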

Commonly Used Induction Algorithms

What is Regression?
A regression model in supervised learning predicts a numerical target variable based on a set of predictor variables.
Applications
• Predict the impact of discounts on sales in retail outlets.
• Predict customer credit card activities from their demographics and historical activities.
• …
Helps businesses make data-driven decisions.

Regression Trees (Revisit)
Regression trees
• Predicted output = the average target value of the training examples in the subset (leaf)
• Potential splits are measured by how much they reduce the mean squared error (MSE)

An Example of Regression Tree

Regression Tree (Reduce Overfitting)
Stop splitting once the maximum tree depth is reached.
(Figure: regression trees fitted with tree depth = 2 vs. tree depth = 5)
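A minimal sketch of the depth comparison above, assuming scikit-learn's DecisionTreeRegressor; the synthetic data and the specific depths (2 and 5, mirroring the figure) are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(80, 1), axis=0)       # single numeric predictor
y = np.sin(X).ravel() + rng.randn(80) * 0.1    # noisy non-linear target

shallow = DecisionTreeRegressor(max_depth=2).fit(X, y)   # fewer splits, smoother fit
deep = DecisionTreeRegressor(max_depth=5).fit(X, y)      # more splits, may chase noise

# Each leaf predicts the average target value of the training examples that fall in it.
print(shallow.predict([[2.5]]), deep.predict([[2.5]]))
```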

Linear Regression
Linear regression is the simplest regression model. It models a linear relationship between the target variable and the predictor variables.
For example, predict house price based on size in square feet.
(Figure: housing prices in Portland, OR — price in 1000s of dollars vs. size in square feet)

Features, attributes, variables: x
Target variable: y
Parameters: θ — these capture the patterns we are looking for
Predictions: hθ(x)

Training data columns:
x1: Size (feet²)
x2: Number of bedrooms
x3: Number of floors
x4: Age of home (years)
y: Price ($1000)

Simple Linear Regression
Simple (univariate) linear regression: only a single predictor variable
The coefficients (θ0, θ1) are what we want to learn from the data.
Each coefficient (except θ0, the intercept) represents the change in the target variable for one unit of change in the corresponding predictor variable, holding the other predictors in the model constant.
E.g., Price (in $1000) = 37.15 + 0.21 × Size (in sq ft)
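Using the fitted coefficients above, a worked prediction for a hypothetical 1,500 sq ft house (the house size is made up purely for illustration):

```latex
% Simple linear regression model, plus a worked prediction for a 1,500 sq ft house
h_\theta(x) = \theta_0 + \theta_1 x
\qquad\Rightarrow\qquad
\widehat{\text{Price}} = 37.15 + 0.21 \times 1500 = 352.15 \;\; (\text{about } \$352{,}150)
```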

Choose the Right Parameters
How do we choose the θ's?
Idea: choose θ0 and θ1 so that the prediction hθ(x) is close to the observation y for our training examples.

Ordinary Least Squares (OLS)
Ordinary least squares (OLS)
• Minimizes the sum of squared errors between the observations (y) and the predictions (hθ(x))
• The squared errors penalize predictions that are far from the "true" values.
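Spelled out, the OLS objective in its standard form (n denotes the number of training examples; this notation is not copied from the slide itself):

```latex
% OLS: choose the parameters that minimize the sum of squared errors over the training set
\min_{\theta_0,\,\theta_1} \; \sum_{i=1}^{n} \Bigl( y^{(i)} - h_\theta\bigl(x^{(i)}\bigr) \Bigr)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```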

Multiple Linear Regression
Multiple linear regression: multiple predictor variables
Size (feet2)
Number of bedrooms
Number of floors
Age of home (years)
Price ($1000)
Price = θ0 + θ1·Size + θ2·Bedrooms + θ3·Floors + θ4·Age

OLS for Multiple Linear Regression
Still, ordinary least squares (OLS)
• Minimizes the sum of squared errors

The House Price Example
These coefficient values achieve the lowest possible sum of squared errors on the training examples (model fitting in linear regression is essentially finding the values of the coefficients)!
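A minimal sketch of fitting such a multiple linear regression with scikit-learn (an assumed tool choice); the tiny housing dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: size (sq ft), number of bedrooms, number of floors, age (years)
X = np.array([[2104, 3, 2, 10],
              [1600, 3, 1, 25],
              [2400, 4, 2,  8],
              [1416, 2, 1, 40],
              [3000, 4, 2,  5]])
y = np.array([399.9, 329.9, 369.0, 232.0, 539.9])   # price in $1000s (made-up values)

model = LinearRegression().fit(X, y)   # OLS: finds the coefficients minimizing squared error
print("theta_0 (intercept):", model.intercept_)
print("theta_1..theta_4:   ", model.coef_)          # one coefficient per predictor
print("predicted price:    ", model.predict([[1500, 3, 1, 20]]))
```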

Interpreting Coefficients
One unit of change in xi is associated with a change of θi in the value of y.
A positive coefficient means that the predictor variable has a positive impact on the value of the target variable, while a negative coefficient means the opposite.
A large regression coefficient means a strong impact on the target variable (note: feature normalization required).
Without feature normalization, does the claim above still hold?

Feature Normalization in Linear Regression
There are a few benefits of applying feature normalization before doing linear regression:
• Ability to rank the importance of features by the relative magnitude of their coefficients.
• A must-do step for regularization (overfitting control).
Please note:
• Apply the same transformation to your training data and test data (see the sketch below)!
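A minimal sketch of that last point, assuming scikit-learn's StandardScaler; the arrays are placeholders:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[2104., 3.], [1600., 3.], [2400., 4.], [1416., 2.]])  # placeholder
X_test = np.array([[3000., 4.]])                                          # placeholder

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same mean/std on the test data
```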

The Boston House Price Example
Which variable has the highest impact on house price?

Evaluation Measures for Regression (Recap)
Mean squared error (MSE)
• The average of the squared differences between predicted values and actual values
Root mean squared error (RMSE)
• The square root of the average of the squared differences between predicted values and actual values
Mean absolute error (MAE)
• The average of the absolute differences between predicted values and actual values
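In symbols (standard textbook definitions, with n evaluated examples, actual value y_i and predicted value ŷ_i):

```latex
% Standard definitions of the three error measures
\text{MSE}  = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \hat{y}_i\bigr)^{2}, \qquad
\text{RMSE} = \sqrt{\text{MSE}}, \qquad
\text{MAE}  = \frac{1}{n}\sum_{i=1}^{n}\bigl|\,y_i - \hat{y}_i\,\bigr|
```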

Linear Regression
• Simple to understand and interpret.
• Oversimplifies many real-world problems by assuming linearity.
• Sensitive to outliers.

Practical Issue: More Complex Attributes
The inputs for linear regression can be:
• Original quantitative inputs
• Transformations of quantitative inputs
  ◦ e.g., log, square root, square
• Polynomial transformations
  ◦ e.g., 1, x, x², …
• Interactions between variables
  ◦ e.g., x3 = x1 ∙ x2
"Linear regression" = linear in the parameters (θ)

Nonlinearity
Using more complex features allows the use of linear regression techniques to fit non-linear datasets.
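A minimal sketch of this idea, assuming scikit-learn's PolynomialFeatures; the quadratic synthetic data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = rng.uniform(0, 3, size=(100, 1))
y = 1 + 2 * x.ravel() + 0.5 * x.ravel() ** 2 + rng.randn(100) * 0.1  # non-linear in x

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)  # columns [x, x^2]
model = LinearRegression().fit(X_poly, y)   # still linear in the parameters theta

print(model.intercept_, model.coef_)        # roughly 1 and [2, 0.5]
```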

Practical Issue: Controlling Model Complexity
Regularization: a method for automatically controlling model complexity.
• L1-regularization (LASSO regression)
• L2-regularization (Ridge regression)
Idea: penalize large magnitudes of coefficients
• The penalty can be incorporated into the minimization function
• Works well when we have a lot of features, each contributing a bit to the prediction
Regularization can address the overfitting problem.

Idea Behind Regularization (I)
All other things being equal, simple models are preferable to complex ones (recall Session 5).
• A simple model that fits the data is unlikely to be a coincidence.
• A complex hypothesis that fits the data might be a coincidence.
In linear regression
• Small values for the parameters = a simpler model, which is less prone to overfitting.

Idea Behind Regularization (II)
Ideally, we want to reduce the magnitudes of coefficients while retaining the model accuracy on the training set.
How can we achieve the two objectives above at the same time?
Answer: extend the minimization function to include the goal of model simplicity (i.e., penalizing large magnitudes of coefficients)

LASSO Regression (L1-Regularization)
Minimization objective = model fit to the data + L1-regularization penalty (weighted by a hyperparameter)
α is the regularization parameter (α ≥ 0)
No regularization on θ0!
A larger α value means stronger penalization.
The estimated coefficients of LASSO regression solve this new minimization function!
What if an attribute does not help reduce the error?
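Written out in a standard form (the exact notation is an assumption, not copied from the slide; the intercept θ0 is excluded from the penalty):

```latex
% LASSO: model fit to the data (sum of squared errors) plus an L1 penalty on the coefficients
\min_{\theta}\; \sum_{i=1}^{n}\Bigl(y^{(i)} - h_\theta\bigl(x^{(i)}\bigr)\Bigr)^{2}
\;+\; \alpha \sum_{j=1}^{m}\bigl|\theta_j\bigr|
\qquad (\theta_0 \text{ is not penalized})
```

If an attribute does not help reduce the error, the penalty pushes its coefficient toward zero (or exactly to zero), as the next slides discuss.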

LASSO Regression (L1-Regularization)
(Figure: income vs. height of a person, with a fitted regression line that has a very large coefficient)
The height variable has a steep slope (large coefficient), but its impact on income should be very small. The estimated coefficient under LASSO would be small (or even zero).
LASSO regression penalizes coefficients with large values.
LASSO regression helps us pick out the informative attributes (feature selection) by forcing the coefficients of unnecessary attributes to zero.

LASSO Regression -> Feature Selection
In practice, the dataset could contain hundreds of predictor variables.
LASSO regression can generate a sparse regression model, in which only a small number of attributes have non-zero coefficients. It addresses the overfitting problem by eliminating unnecessary attributes.
A very rough rule of thumb is to have n > 10m, where n is the number of records and m is the number of attributes.
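A minimal sketch of this sparsity effect, assuming scikit-learn's Lasso; the synthetic data and the alpha value are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.randn(100) * 0.1  # only the first two attributes matter

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)  # most entries are exactly 0 -> a sparse model (feature selection)
```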

LASSO Regression (L1-Regularization)
Generally, the coefficients of LASSO regression are smaller than the coefficients of linear regression trained on the same set of training examples.
Because of model simplicity, LASSO regression is less prone to overfitting.
LASSO regression tends to perform better than linear regression when there are a large number of attributes but not many training examples (e.g., n <= 10m, where n is #records, m is #attributes).
Which one is likely to achieve better performance on training examples? Linear regression or LASSO regression?

[Optional] Ridge Regression (L2-Regularization)
Minimization objective = model fit to the data + L2-regularization penalty
α is the regularization parameter (α ≥ 0)
No regularization on θ0!
Unlike LASSO, ridge regression shrinks coefficients toward zero but does not force them exactly to zero, so it is not suited for feature reduction; in such cases LASSO tends to be better in terms of predictive performance.

Quality of Model
(Figure: underfitting vs. "just right" vs. overfitting)
The learned model may fit the training set very well (near-zero sum of squared errors), but fail to generalize to new examples. Remedy: regularization.

Demo: Linear and LASSO Regression
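A possible shape for the demo (not the actual demo code), assuming scikit-learn; the synthetic dataset with n = 60 records and m = 20 attributes, and the alpha value, are placeholder choices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(60, 20)                                  # few records, many attributes (n <= 10m)
y = 4 * X[:, 0] + 2 * X[:, 1] + rng.randn(60) * 0.5    # only two informative attributes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
scaler = StandardScaler().fit(X_train)                 # normalize: fit on training data only
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for name, model in [("Linear", LinearRegression()), ("LASSO", Lasso(alpha=0.1))]:
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(name, "| test RMSE:", round(rmse, 3),
          "| nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```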