Basics Simple regression Regression assumptions Multiple regression
CORPFIN 2503 – Business Data Analytics:
Applications of multiple regressions
£ius
Week 4: August 16th, 2021
£ius CORPFIN 2503, Week 4 1/87
Outline
Basics
Simple regression
Regression assumptions
Multiple regression
Introduction
The purpose of quantitative analysis is to find or test certain
relations.
Correlation coefficients shed some light on the direction of the
linear relation:
• positive
• negative or
• no linear relation.
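For illustration, these cases can be checked numerically. A quick sketch in Python (made-up numbers, not the SAS Cars data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2 * x + 1        # increasing in x: positive correlation
y_neg = -0.5 * x + 3     # decreasing in x: negative correlation

r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
print(r_pos, r_neg)  # 1.0 and -1.0 (up to floating-point error)
```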
Example
Let’s investigate the relation between car price and car length using
a data set provided with SAS.
/* Creating data file: */
DATA work.car_data;
SET SAShelp.Cars;
RUN;
/* Correlation coefficient: */
proc corr data=work.car_data;
var invoice length;
run;
Example II
Example III
The correlation coefficient between car price and length is 0.16659
(p-val. = 0.0005).
=⇒ The relation is positive and statistically significant.
However, the correlation coefficient does not let us estimate or predict
the car price for a given car length:
• e.g., what, approximately, is the price of a 180-inch-long car?
Introduction II
One needs to use regression analysis in order to answer this
question!
Regressions:
Simple regression: y = β0 + β1x
Multiple linear regression: y = β0 + β1x1 + β2x2 + · · · + βnxn
where
y is the dependent variable
x, x1, x2, . . . , xn are independent variables
β0 is the intercept
β1, β2, . . . , βn are slopes.
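As a sketch of what the regression machinery does, the OLS estimates of β0 and β1 can be computed directly with least squares. The following Python example uses made-up data (not the Cars data set), so OLS recovers the true coefficients exactly:

```python
import numpy as np

# made-up data generated as y = 3 + 2x with no noise
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x

# design matrix with a column of ones for the intercept beta0
X = np.column_stack([np.ones_like(x), x])
beta0, beta1 = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta0, beta1)  # ~3.0 and ~2.0
```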
Simple regression
Let’s regress car price on car length using SAS:
• INVOICE = f(LENGTH) = intercept + slope × LENGTH.
/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL invoice=length;
RUN;
Simple regression II
Simple regression III
Simple regression IV
No obvious trends or patterns in the residuals.
Simple regression V
95% confidence interval vs. 95% prediction interval
Confidence intervals tell you how well you have determined the
mean. Assume that the data are randomly sampled
from a normal distribution and you are interested in
determining the mean. If you sample many times, and
calculate a confidence interval of the mean from each
sample, you’d expect 95% of those intervals to
include the true value of the population mean.
Prediction intervals tell you where you can expect to see the next
data point sampled. Assume that the data are
randomly sampled from a normal distribution. Collect
a sample of data and calculate a prediction interval.
Then sample one more value from the population. If
you repeat this process many times, you’d expect the
prediction interval to capture the individual value
95% of the time.
Source: https://www.graphpad.com/support/faqid/1506/.
Simple regression: Interpretation of the results
We got that:
• intercept = −8131.76
• slope = 204.69
• INVOICE = −8131.76 + 204.69 × LENGTH.
Car price increases by $204.69 for each additional inch of car length.
Simple regression: Predictions
Suppose we would like to estimate the price of a 180-inch-long car:
• INVOICE = −8131.76 + 204.69 × LENGTH
• INVOICE = −8131.76 + 204.69 × 180 = $28,712.4
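The arithmetic is easy to verify by plugging the fitted coefficients into the equation; a quick check in Python:

```python
# coefficients from the fitted regression above
intercept = -8131.76
slope = 204.69

predicted = intercept + slope * 180
print(round(predicted, 1))  # 28712.4
```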
Simple regression: Predictions II
Let’s check the actual prices of 180-inch-long cars:
PROC PRINT DATA=work.car_data (obs=20);
var make model length invoice;
where length = 180;
RUN;
Simple regression: Predictions III
=⇒ most of the cars are more expensive than our predicted value.
R-Squared
R-squared is a goodness-of-fit (accuracy) measure.
The higher the R-squared, the better the model fits the data.
R-squared is the ratio of the variation explained to the total
variation (of the dependent variable).
0 ≤ R-squared ≤ 1.
R-squared and correlation
We got:
• R-squared of the regression model = 0.0278
• Correlation coefficient = 0.16659.
Correlation² = R-squared
0.16659² = 0.0278
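A quick numerical check of this identity:

```python
r = 0.16659             # correlation between invoice and length
print(round(r ** 2, 4))  # 0.0278
```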
R-squared and correlation II
R-squared of the regression model = 0.0278
=⇒ Car length can explain only 2.78% of variation in car prices:
• explanatory power of the model is weak
• there should be other factors that better explain car prices.
Predicted values and residuals
Predicted values of car prices (INVOICEpred) can be computed
from:
• INVOICEpred = −8131.76 + 204.69 × LENGTH,
where LENGTH is the actual car length.
Residuals (INVOICEres) are then computed as:
• INVOICEres = INVOICE − INVOICEpred,
where INVOICE is the actual car price.
Predicted values and residuals II
Consider the following 180-inch-long cars:
• LX: $18,630
• Chevrolet Corvette 2dr: $39,068.
The predicted price is the same for both: $28,712.4.
The residuals are:
• LX: −$10,082.4
• Chevrolet Corvette 2dr: $10,355.6.
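Both residuals follow directly from the fitted equation; a quick check in Python:

```python
intercept, slope = -8131.76, 204.69
pred = intercept + slope * 180       # same prediction for both cars

res_lx = 18630.0 - pred              # LX
res_corvette = 39068.0 - pred        # Chevrolet Corvette 2dr
print(round(res_lx, 1), round(res_corvette, 1))  # -10082.4 10355.6
```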
Predicted values and residuals III
We can compute predicted values and residuals manually or we can
ask SAS to do this:
PROC REG DATA=work.car_data;
MODEL invoice=length;
OUTPUT OUT=work.reg_results r=Res p=Pred;
RUN;
`Pred’ holds the predicted values
`Res’ holds the residuals.
Assumptions of linear regression
1. Linear relationship
2. Normality
3. Independence
4. Homoscedasticity
1. Linear relationship
The relation between the independent variable (x) and the
dependent variable (y) is linear.
Detecting a linear relationship is fairly simple:
in most cases, linearity is clear from the scatterplot.
Relevant SAS code:
/* Scatterplot: */
proc gplot data=work.car_data;
title 'Scatter plot of Invoice and length';
plot Invoice * Length=1;
run;
quit;
1. Linear relationship II
=⇒ no obvious non-linear relation found.
2. Normality
The dependent variable y is distributed normally for each value of
the independent variable x.
Outliers are the main reason for the violation of this assumption.
To check for normality, one can use:
• scatterplots (y vs. x)
• a histogram of the standardized residuals
• a normal probability plot of the residuals (P–P plot).
=⇒ In our example, there are a few outliers.
3. Independence
The values of y should depend on the independent variables but not
on their own previous values.
The violation of this assumption is observed mostly in time-series
data (e.g., gross domestic product (GDP)).
Autocorrelation coefficients for different lags can help detect
dependencies:
• correlation between GDP and GDP lagged by 1 period
• correlation between GDP and GDP lagged by 2 periods
• correlation between GDP and GDP lagged by 3 periods
• correlation between GDP and GDP lagged by 4 periods.
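These lagged correlations can be computed by correlating the series with shifted copies of itself; a Python sketch on a short, made-up trending series (not actual GDP data):

```python
import numpy as np

# made-up, steadily growing series standing in for GDP
gdp = np.array([100.0, 103.0, 105.0, 108.0, 112.0, 115.0, 119.0, 124.0])

def autocorr(series, lag):
    # correlation between the series and itself lagged by `lag` periods
    return np.corrcoef(series[lag:], series[:-lag])[0, 1]

print([round(autocorr(gdp, k), 3) for k in (1, 2, 3, 4)])
```

A strongly trending series like this one shows lagged correlations close to 1, a typical sign of dependence in time-series data.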
4. Homoscedasticity
The variance of y is the same at each value of x:
there is no special segment or interval of x where the dispersion
of y is distinct.
Scatterplots (y vs. x) can be used to detect heteroscedasticity
(the opposite of homoscedasticity).
In our example, the variance is higher when a car is 175–205 inches long,
but statistical tests imply that the model does not suffer from
heteroscedasticity.
Plots of the residuals versus predicted values can also be used to
detect heteroscedasticity.
4. Homoscedasticity II
A simple way to deal with heteroscedasticity is to segment
the data and build different regression lines for different intervals.
In general, if the first three assumptions are satisfied, then
heteroscedasticity might not even be present.
As a rule of thumb, the first three assumptions need to be fixed before
attempting to fix heteroscedasticity.
4. Homoscedasticity III
To detect heteroscedasticity, one can use White’s and
Breusch–Pagan tests.
The MODEL procedure (rather than REG) includes them.
Relevant SAS code (both procedures give the same results):
PROC REG DATA=work.car_data;
MODEL invoice=EngineSize;
RUN;
PROC MODEL DATA=work.car_data;
PARMS a1 b1;
invoice = a1 + b1 * EngineSize;
FIT invoice / WHITE PAGAN=(1 EngineSize);
RUN;
QUIT;
4. Homoscedasticity IV
Tests’ results:
=⇒ H0 of no heteroscedasticity is rejected.
4. Homoscedasticity V
Solutions:
• adjust standard errors (a.k.a. heteroskedasticity-robust
standard errors, White–Huber standard errors, etc.)
• transform non-normally distributed variables (e.g., using
natural logarithm).
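To see what the adjustment does, here is a sketch in Python (not SAS) of the White/HC0 sandwich estimator on simulated heteroskedastic data; all numbers are made up, and in SAS the equivalent output comes from the ACOV option shown on the next slide:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1, 10, n)
# error standard deviation grows with x: classic heteroskedasticity
y = 1.0 + 2.0 * x + rng.normal(0, x)

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)
# usual OLS covariance: sigma^2 (X'X)^-1
var_ols = (resid @ resid / (n - 2)) * XtX_inv
# White/HC0 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
var_hc0 = XtX_inv @ meat @ XtX_inv

se_ols = np.sqrt(np.diag(var_ols))
se_hc0 = np.sqrt(np.diag(var_hc0))
print(se_ols, se_hc0)
```

The coefficient estimates are unchanged; only the standard errors (and hence t-statistics) differ between the two covariance formulas.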
Adjusted standard errors
Option ACOV adjusts standard errors:
PROC REG DATA=work.car_data;
MODEL invoice=EngineSize / ACOV;
RUN;
This option can be used in SAS procedure REG only.
Adjusted standard errors II
Results:
The robust standard errors can be used even under
homoskedasticity.
Then the robust standard errors will become just "regular" standard
errors.
Log-transformed variables
Variables with positive values can be log-transformed.
DATA work.car_data;
SET work.car_data;
log_MSRP=log(MSRP);
log_length=log(length);
run;
Log-transformed variables II
Log-transformed variables III
Log-transformed variables IV
Let’s estimate 3 regression models and predict the MSRP of a
180-inch-long car.
PROC REG DATA=work.car_data;
MODEL log_MSRP= length;
RUN;
PROC REG DATA=work.car_data;
MODEL MSRP= log_length;
RUN;
PROC REG DATA=work.car_data;
MODEL log_MSRP= log_length;
RUN;
QUIT;
Presentation of results
SAS does not present regression results in a publication-ready format.
We should construct tables manually.
We should also write table descriptions that include:
• variable definitions
• a note that t-statistics based on standard errors are reported in
brackets
• a note that ***, **, and * indicate significance at the 1%, 5%, and
10% levels, respectively.
Log-transformed variables V
We get the following results:
Dependent variable:        ln(MSRP)    MSRP         ln(MSRP)
Independent variables      Model 1     Model 2      Model 3
LENGTH                     0.0095***
                           [6.05]
ln(LENGTH)                             44,467***    1.8039***
                                       [3.70]       [6.16]
Intercept                  8.4922***   −199,551***  0.8446
                           [28.83]     [3.18]       [0.55]
R2                         0.079       0.031        0.082
Adjusted R2                0.077       0.029        0.080
Number of observations     428         428          428
Description for the previous table
Table 1: Determinants of car prices
This table presents the results of OLS regressions where the dependent
variable is the manufacturer’s suggested retail price (MSRP) or its natural
logarithm. LENGTH is the car length in inches. The absolute values of
t-statistics based on standard errors are reported in brackets. ***, **,
and * indicate significance at the 1%, 5%, and 10% levels, respectively.
This description should be placed above the table.
Log-transformed variables VI
Let’s compute the predicted MSRP of a 180-inch-long car.
ln(180) ≈ 5.1930
Model 1: ln(MSRP) = 8.4922 + 0.0095 × 180 ≈ 10.2022.
MSRP = exp(10.2022) ≈ 26,962.44
Model 2: MSRP = −199,551 + 44,467 × 5.1930 ≈ 31,364.21
Model 3: ln(MSRP) = 0.8446 + 1.8039 × 5.1930 ≈ 10.2122.
MSRP = exp(10.2122) ≈ 27,232.73
Models 1 and 3 are preferred (one of the reasons is their higher R2).
The difference between the predictions of Models 1 and 3 is 1%.
However, the difference between the predictions of Models 2 and 3
is 15%.
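The three predictions can be reproduced from the reported coefficients (small differences arise from the rounding of the coefficients):

```python
import math

ln_len = math.log(180)                    # ~5.1930

m1 = math.exp(8.4922 + 0.0095 * 180)      # Model 1: ln(MSRP) on LENGTH
m2 = -199551 + 44467 * ln_len             # Model 2: MSRP on ln(LENGTH)
m3 = math.exp(0.8446 + 1.8039 * ln_len)   # Model 3: ln(MSRP) on ln(LENGTH)

print(m1, m2, m3)
```

Note that Models 1 and 3 predict ln(MSRP), so the prediction must be exponentiated to get a dollar amount.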
When linear regression can’t be used
If any of the four assumptions is violated, linear regression
should not be used.
Linear regression can’t be used when:
• the relation between y and x is nonlinear
• the errors are not normally distributed
• there is a dependency within the values of the dependent
variable
• the variance pattern of y is not the same for the entire range
of x.
Multiple regression
Let’s investigate the relation between the discount and car length.
It is likely that the discount is impacted by many other factors
besides car length:
• car manufacturer (`make’)
• car model (`model’)
• car type (`type’)
• drivetrain (`drivetrain’)
• production place (`origin’)
• engine (`enginesize’, `cylinders’, `horsepower’)
• fuel efficiency (`MPG_City', `MPG_Highway')
• and maybe car weight (`weight’) and wheel base (`wheelbase’).
Multiple regression II
If our regression model does not include one of the important
independent variables, then the model suffers from omitted
variable bias.
The OLS estimator is likely to be biased and inconsistent due to
omitted variable bias.
=⇒ coefficient estimates might become unreliable.
One should include as many relevant variables as possible in the
regression model if the data are available.
I found that some variables in the previous slide cannot be included
in the model together with car length (due to multicollinearity).
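A small simulation illustrates the bias (all numbers are made up): the true model has two correlated regressors, and omitting one shifts the slope on the other by (slope of the omitted variable) × (slope from regressing the omitted variable on the included one):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    # least-squares coefficients
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(np.column_stack([np.ones(n), x1, x2]), y)
short = ols(np.column_stack([np.ones(n), x1]), y)   # x2 omitted

print(full[1], short[1])  # ~2.0 vs. ~2.0 + 3.0 * 0.8 = ~4.4
```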
Multicollinearity
Multicollinearity is a phenomenon due to high interdependency
between the independent variables.
If we include highly correlated independent variables in the same
regression model, then this could cause multicollinearity.
Implications of multicollinearity:
• it increases the variance of the coefficient estimates and makes
the estimates very sensitive to minor changes in the model
• coefficient estimates might become unstable and difficult to
interpret
• t-test results are not trustworthy, etc.
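A common diagnostic for this (not used in the slides, but standard practice) is the variance inflation factor, VIF = 1/(1 − R²ⱼ), where R²ⱼ comes from regressing independent variable j on the other independent variables; values above roughly 10 are a typical warning sign. A Python sketch on made-up, nearly collinear data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 is nearly a copy of x1

# R^2 from regressing x2 on x1 (plus a constant)
X = np.column_stack([np.ones(n), x1])
coef = np.linalg.lstsq(X, x2, rcond=None)[0]
resid = x2 - X @ coef
r2 = 1 - resid @ resid / ((x2 - x2.mean()) ** 2).sum()

vif = 1 / (1 - r2)
print(round(r2, 3), round(vif, 1))  # r2 near 1, VIF far above 10
```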
Multiple regression III
First, let’s check the summary statistics of the discount.
/* Creating discount variable: */
DATA work.car_data;
SET work.car_data;
discount=msrp/invoice-1;
RUN;
proc univariate data=work.car_data plots;
var discount;
run;
Properties of DISCOUNT
Properties of DISCOUNT II
Properties of DISCOUNT III
Properties of DISCOUNT IV
Correlation matrix
Let’s look at the correlation matrix of discount and other numerical
variables:
proc corr data=work.car_data;
var discount invoice enginesize cylinders horsepower
MPG_City MPG_Highway Length weight wheelbase;
run;
Correlation matrix II
Determinants of DISCOUNT
Let’s consider engine size as a potential determinant of the discount.
SAS code for scatterplot:
proc gplot data=work.car_data;
title 'Scatter plot of discount and engine size';
plot Discount * Enginesize=1;
run;
quit;
Determinants of DISCOUNT II
Determinants of DISCOUNT III
Let’s start with simple regression:
/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL discount= enginesize;
RUN;
Simple regression
Simple regression II
We got that:
• intercept = 0.05782
• slope = 0.00952
• DISCOUNT = 0.05782 + 0.00952 × ENGINE SIZE.
The discount increases by 0.00952 for each additional liter of engine
size. Plus, one gets a 0.05782 discount (the intercept) just for showing
up at the car dealership.
Simple regression III
We got:
• R-squared of the regression model = 0.1938
• Correlation coefficient = 0.44021 (0.44021² = 0.1938).
=⇒ Engine size can explain 19.38% of variation in discount.
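As before, the squared correlation reproduces the R-squared; a quick check:

```python
r = 0.44021             # correlation between discount and engine size
print(round(r ** 2, 4))  # 0.1938
```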
Simple regression IV
Simple regression V
The residuals seem to be independent of engine size and are
roughly normally distributed.
Simple regression VI
Multiple regression IV
Let’s augment our model with an additional independent variable:
car length.
Scatterplot code in SAS:
proc gplot data=work.car_data;
title 'Scatter plot of discount and length';
plot Discount * Length=1;
run;
quit;
Multiple regression V
Multiple regression VI
Regression code in SAS:
/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL discount= enginesize length;
RUN;
Multiple regression VII
Multiple regression VIII
Car length is insignificant.
There is no improvement in R-squared.
The coefficient estimates for the intercept and engine size barely
changed.
F-value = 51.77 (p-value < 0.0001).
=⇒ We reject the null hypothesis that all slopes are zero (a joint test).
Thus, our model has some explanatory power.
Multiple regression IX
Multiple regression X
Residuals seem to be well distributed.
Multicollinearity II
According to the correlation matrix:
• corr. between discount and MPG_City = −0.34
• corr. between discount and MPG_Highway = −0.33
• corr. between MPG_City and MPG_Highway = 0.94.
Let's see what happens when both MPG_City and MPG_Highway are
included in the regression model.
Multicollinearity III
Dependent variable: DISCOUNT
Independent variables      Model 1     Model 2     Model 3
MPG_City                   −0.0016***              −0.0015**
                           [7.53]                  [2.36]
MPG_Highway                            −0.0014***  −0.0001
                                       [7.11]      [0.20]
Intercept                  0.1197***   0.1247***   0.1204***
                           [27.72]     [23.76]     [21.74]
R2                         0.117       0.106       0.118
Adjusted R2                0.115       0.104       0.113
Number of observations     428         428         428
In Model 3, the coefficient estimate for MPG_Highway is insignificant
and its value is very small (in absolute terms).
One should include either MPG_Highway or MPG_City, but not both
variables, in the regression.
Description for the previous table
Table 2: Determinants of the discount on the car
This table presents the results of OLS regressions where the dependent
variable is the discount on the car in decimals (DISCOUNT). MPG_City
is the number of miles a car can travel on a single gallon of gas while
driving on city streets. MPG_Highway is the number of miles a car can
travel on a single gallon of gas while driving on the highway.
The absolute values of t-statistics based on standard errors are reported
in brackets. ***, **, and * indicate significance at the 1%, 5%, and 10%
levels, respectively.
This description should be placed above the table.
Multiple regression: with drivetrain
DRIVETRAIN is a categorical variable: ALL, FRONT, REAR.
Let's check whether drivetrain is an important factor (using SAS
procedure GLM):
proc glm data=work.car_data;
class drivetrain;
MODEL discount = enginesize drivetrain / solution noint;
run;
Multiple regression: with drivetrain II
Multiple regression: with drivetrain III
Multiple regression: with drivetrain IV
The model includes 3 slopes, one for each type of drivetrain (ALL,
FRONT, REAR).
Let's check whether the discount is statistically different for different
types of drivetrain:
proc glm data=work.car_data;
class drivetrain;
MODEL discount = enginesize drivetrain / solution;
run;
Multiple regression: with drivetrain V
The REAR type of drivetrain is the benchmark case.
The discounts for ALL and FRONT drivetrains are statistically the same
as for the REAR drivetrain.
The coefficient estimate for engine size is almost the same as before.
Multiple regression: Results with and without intercept
The REAR type of drivetrain is the benchmark case on the LHS.
Its impact on the discount is equal to the intercept on the LHS.
To get the coef. estimate for FRONT on the RHS, we should add the
intercept and FRONT on the LHS.
The same applies to ALL.
Multiple regression XI
In the same way, we can check whether car type (TYPE) impacts the
discount. The same goes for car manufacturer (MAKE) and ORIGIN.
By including a number of dummy variables (one for each type of car),
we estimate a fixed effects regression.
Fixed effects regressions are superior to OLS regressions in most cases.
Fixed effects regressions
If we are not interested in the coef. estimates of a particular dummy,
we do not need to report the full table.
The table can be condensed as follows.
Dependent variable: DISCOUNT
Independent variables      Model 1    Model 2    Model 3    Model 4
EngineSize                 0.0095***  0.0087***  0.0088***  0.0088***
                           [10.12]    [8.49]     [8.10]     [11.79]
Intercept                  0.0578***
                           [18.17]
Drivetrain fixed effects   No         Yes        Yes        Yes
Car type fixed effects     No         No         Yes        Yes
Manufacturer fixed effects No         No         No         Yes
R2                         0.194      0.204      0.222      0.767
Number of observations     428        428        428        428
Interaction terms
Suppose we would like to check whether the discount is greater for
American cars with larger engines.
To do so, we need to create an interaction term between engine size
and a dummy for American cars.
Then, in the regression model, we need to include the interaction term,
engine size, and the dummy for American cars besides the other
independent variables.
Interaction terms II
SAS code:
/* Generating a dummy for US-made cars and an interaction term
between this dummy and engine size: */
DATA work.car_data;
SET work.car_data;
US_car=0;
IF Origin="USA" then US_car=1;
US_car_engine=US_car*enginesize;
RUN;
proc glm data=work.car_data;
class drivetrain;
MODEL discount = enginesize US_car_engine US_car drivetrain / solution;
run;
Interaction terms III
American cars are subject to lower discounts (by 0.016).
However, American cars with larger engines are discounted more.
Interaction terms IV
The coefficient estimate for engine size is 0.0075:
• the discount increases by 0.0075 for each additional liter of engine.
The interaction term is 0.0039 (significant at the 5% level):
• the discount increases by an additional 0.0039 for each additional
liter of engine for American cars
• thus, in total, the discount increases by 0.0114 for each additional
liter of engine for American cars.
Interaction terms V
Let's predict the discount for the following car:
• EngineSize = 3
• Origin = USA
• DriveTrain = All.
DISCOUNT = 0.0660 + 0.0075 × EngineSize + 0.0039 × US_car_engine
− 0.0161 × US_car + 0.0025
DISCOUNT = 0.0865
Required reading
Konasani, V. R. and Kadre, S. (2015). "Practical Business Analytics
Using SAS: A Hands-on Guide": chapters 9, 10.