Basics Simple regression Regression assumptions Multiple regression

CORPFIN 2503 – Business Data Analytics:

Applications of multiple regressions

£ius

Week 4: August 16th, 2021

£ius CORPFIN 2503, Week 4 1/87

Basics Simple regression Regression assumptions Multiple regression

Outline

Basics

Simple regression

Regression assumptions

Multiple regression


Introduction

The purpose of quantitative analysis is to find or test certain relations.

Correlation coefficients shed some light on the direction of the linear relation:

• positive
• negative or
• no linear relation.


Example

Let’s investigate the relation between car price and car length using a SAS-provided data set.

/* Creating data file: */
DATA work.car_data;
SET SAShelp.Cars;
RUN;

/* Correlation coefficient: */
proc corr data=work.car_data;
var invoice length;
run;


Example II


Example III

The correlation coefficient between car price and length is 0.16659 (p-val. = 0.0005).

=⇒ The relation is positive and statistically significant.

However, the correlation coefficient does not let us estimate or predict the car price for a given car length:

• e.g., what, approximately, is the price of a 180-inch-long car?


Introduction II

One needs to use regression analysis in order to answer this question!

Regressions:

Simple regression: y = β0 + β1x

Multiple linear regression: y = β0 + β1x1 + β2x2 + · · · + βnxn

where

y is the dependent variable
x, x1, x2, . . . , xn are independent variables
β0 is the intercept
β1, β2, . . . , βn are slopes.
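As a concrete illustration of the simple-regression equation above (a minimal Python sketch with made-up data, not the SAS procedure used in this course), ordinary least squares recovers β0 and β1 exactly when the data are exactly linear:

```python
import numpy as np

# Illustrative only: fit y = b0 + b1*x by ordinary least squares.
# The data are fabricated so that OLS must recover b0 = 5, b1 = 2.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 5.0 + 2.0 * x                            # exact linear relation
X = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(b0, 6), round(b1, 6))            # -> 5.0 2.0
```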


Simple regression

Let’s use SAS to regress car price on car length:

• INVOICE = f(LENGTH) = intercept + slope × LENGTH.

/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL invoice=length;
RUN;


Simple regression II


Simple regression III


Simple regression IV

No obvious trends or patterns in the residuals.

Simple regression V


95% confidence interval vs. 95% prediction interval

Confidence intervals tell you how well you have determined the mean. Assume that the data are randomly sampled from a normal distribution and you are interested in determining the mean. If you sample many times, and calculate a confidence interval of the mean from each sample, you’d expect 95% of those intervals to include the true value of the population mean.

Prediction intervals tell you where you can expect to see the next data point sampled. Assume that the data are randomly sampled from a normal distribution. Collect a sample of data and calculate a prediction interval. Then sample one more value from the population. If you repeat this process many times, you’d expect the prediction interval to capture the individual value 95% of the time.

Source: https://www.graphpad.com/support/faqid/1506/.

Simple regression: Interpretation of the results

We got that:

• intercept = −8131.76
• slope = 204.69
• INVOICE = −8131.76 + 204.69 × LENGTH.

Car price increases by $204.69 for each additional inch of car length.


Simple regression: Predictions

Suppose we would like to estimate the price of a 180-inch-long car:

• INVOICE = −8131.76 + 204.69 × LENGTH
• INVOICE = −8131.76 + 204.69 × 180 = $28,712.4

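The arithmetic above can be verified directly (Python used purely as a calculator here, with the fitted coefficients from the slide):

```python
# Predicted invoice price for a 180-inch car, using the fitted OLS equation.
intercept, slope = -8131.76, 204.69
invoice_pred = intercept + slope * 180
print(round(invoice_pred, 1))  # -> 28712.4
```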

Simple regression: Predictions II

Let’s check the actual prices of 180-inch-long cars:

PROC PRINT DATA=work.car_data (obs=20);
var make model length invoice;
where length = 180;
RUN;


Simple regression: Predictions III

=⇒ most of the cars are more expensive than our predicted value.

R-Squared

R-squared is a goodness-of-fit or accuracy measure.

The higher the R-squared, the better the model.

R-squared is the ratio of the variation explained to the total

variation (of the dependent variable).

0 ≤ R-squared ≤ 1.


R-squared and correlation

We got:

• R-squared of the regression model = 0.0278
• Correlation coefficient = 0.16659.

Correlation² = R-squared

0.16659² ≈ 0.0278

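This identity between the correlation coefficient and R-squared in a simple regression is easy to check numerically:

```python
# Squaring the correlation coefficient reproduces the model's R-squared.
corr = 0.16659
print(round(corr ** 2, 4))  # -> 0.0278
```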

R-squared and correlation II

R-squared of the regression model = 0.0278

=⇒ Car length can explain only 2.78% of the variation in car prices:
• the explanatory power of the model is weak
• there should be other factors that better explain car prices.


Predicted values and residuals

Predicted values of car prices (INVOICEpred) can be computed from:

• INVOICEpred = −8131.76 + 204.69 × LENGTH,

where LENGTH is the actual car length.

Residuals (INVOICEres) are then computed as:

• INVOICEres = INVOICE − INVOICEpred,

where INVOICE is the actual car price.


Predicted values and residuals II

Consider the following 180-inch-long cars:

• LX: $18,630
• Chevrolet Corvette 2dr: $39,068.

The predicted car price is the same for both: $28,712.4.

The residuals are:

• LX: −$10,082.4
• Chevrolet Corvette 2dr: $10,355.6.

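The residual arithmetic above can be checked the same way (plain Python, using the rounded predicted value from the slide):

```python
# Residual = actual price - predicted price, for the two example cars.
pred = 28712.4
for actual in (18630, 39068):
    print(round(actual - pred, 1))  # -> -10082.4, then 10355.6
```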

Predicted values and residuals III

We can compute predicted values and residuals manually, or we can ask SAS to do this:

PROC REG DATA=work.car_data;
MODEL invoice=length;
OUTPUT OUT=work.reg_results r=Res p=Pred;
RUN;

`Pred’ are predicted values
`Res’ are residuals.


Assumptions of linear regression

1. Linear relationship

2. Normality

3. Independence

4. Homoscedasticity


1. Linear relationship

The relation between the independent variable (x) and the dependent variable (y) is linear.

Detecting a linear relationship is fairly simple.

In most cases, linearity is clear from the scatterplot.

Relevant SAS code:

/* Scatterplot: */
proc gplot data=work.car_data;
title 'Scatter plot of Invoice and length';
plot Invoice * Length=1;
run;
quit;


1. Linear relationship II

=⇒ no obvious non-linear relation found.

2. Normality

The dependent variable y is distributed normally for each value of the independent variable x.

Outliers are the main reason for the violation of this assumption.

To check for normality, one can use:

• scatterplots (y vs. x)
• a histogram of standardized residuals
• a normal probability plot of the residuals (P–P plot).

=⇒ there are a few outliers.


3. Independence

The values of y should depend on the independent variables but not on their own previous values.

The violation of this assumption is observed mostly in time-series data (e.g., gross domestic product (GDP)).

Autocorrelation coefficients for different lags can help detect dependencies:

• correlation between GDP and GDP lagged by 1 period
• correlation between GDP and GDP lagged by 2 periods
• correlation between GDP and GDP lagged by 3 periods
• correlation between GDP and GDP lagged by 4 periods.
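The lagged correlations above can be sketched in a few lines (Python rather than SAS, with a fabricated toy series; `autocorr` is a helper name I introduce here, not a library function):

```python
import numpy as np

# Lag-k autocorrelation = correlation between a series and its k-period lag.
def autocorr(series, k):
    s = np.asarray(series, dtype=float)
    return float(np.corrcoef(s[k:], s[:-k])[0, 1])

gdp = [1.0, 2.0, 3.0, 4.0, 5.0]        # toy, perfectly trending "GDP" series
print(round(autocorr(gdp, 1), 6))      # a perfect trend gives exactly 1.0
```

A value near ±1 at some lag signals the kind of dependency that violates the independence assumption.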


4. Homoscedasticity

The variance in y is the same at each value of x.

There is no special segment or interval in x where the dispersion in y is distinct.

Scatterplots (y vs. x) can be used to detect heteroscedasticity (which is the opposite of homoscedasticity).

In our example, the variance is higher when the car is 175–205 inches long, but statistical tests imply that the model does not suffer from heteroscedasticity.

Plots of the residuals versus predicted values can also be used to detect heteroscedasticity.


4. Homoscedasticity II

The simple way to deal with heteroscedasticity is to segment the data and build different regression lines for different intervals.

In general, if the first three assumptions are satisfied, then heteroscedasticity might not even exist.

As a rule of thumb, the first three assumptions need to be fixed before attempting to fix heteroscedasticity.


4. Homoscedasticity III
To detect heteroscedasticity, one can use White’s and Breusch-Pagan tests.

Procedure MODEL (rather than REG) includes them.

Relevant SAS code (both procedures give the same results):

PROC REG DATA=work.car_data;
MODEL invoice=EngineSize;
RUN;

PROC MODEL DATA=work.car_data;
PARMS a1 b1;
invoice = a1 + b1 * EngineSize;
FIT invoice / WHITE PAGAN=(1 EngineSize);
RUN;
QUIT;

4. Homoscedasticity IV

Test results:

=⇒ H0 of no heteroscedasticity is rejected.


4. Homoscedasticity V

Solutions:

• adjust standard errors (a.k.a. (heteroskedasticity-)robust standard errors, White-Huber standard errors, etc.)
• transform non-normally distributed variables (e.g., using the natural logarithm).


Adjusted standard errors

Option ACOV adjusts standard errors:

PROC REG DATA=work.car_data;
MODEL invoice=EngineSize / ACOV;
RUN;

This option can be used in SAS procedure REG only.


Adjusted standard errors II

Results:

The robust standard errors can be used even under homoskedasticity.

Then the robust standard errors simply become the “regular” standard errors.


Log-transformed variables

Variables with positive values can be log-transformed.

DATA work.car_data;
SET work.car_data;
log_MSRP=log(MSRP);
log_length=log(length);
run;


Log-transformed variables II


Log-transformed variables III


Log-transformed variables IV

Let’s estimate 3 regression models and predict the MSRP of a 180-inch-long car.

PROC REG DATA=work.car_data;
MODEL log_MSRP= length;
RUN;

PROC REG DATA=work.car_data;
MODEL MSRP= log_length;
RUN;

PROC REG DATA=work.car_data;
MODEL log_MSRP= log_length;
RUN;
QUIT;


Presentation of results

SAS does not present regression results in a publication-ready format.

We should make the tables manually.

We should also write table descriptions that include:

• variable definitions
• a note that t-statistics based on standard errors are reported in brackets
• a note that ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively.


Log-transformed variables V

We get the following results:

                                  Dependent variable:
                        ln(MSRP)      MSRP          ln(MSRP)
Independent variables   Model 1       Model 2       Model 3
LENGTH                  0.0095***
                        [6.05]
ln(LENGTH)                            44,467***     1.8039***
                                      [3.70]        [6.16]
Intercept               8.4922***     −199,551***   0.8446
                        [28.83]       [3.18]        [0.55]
R2                      0.079         0.031         0.082
Adjusted R2             0.077         0.029         0.080
Number of observations  428           428           428


Description for the previous table

Table 1: Determinants of car prices

This table presents the results of OLS regressions where the dependent
variable is the manufacturer suggested retail price (MSRP) or its natural
logarithm. LENGTH is the car length in inches. The absolute values of
t-statistics based on standard errors are reported in brackets. ***, **,
and * indicate significance at the 1%, 5%, and 10% levels, respectively.

This description should be placed above the table.


Log-transformed variables VI

Let’s compute the predicted values of MSRP for a 180-inch-long car.

ln(180) ≈ 5.1930

Model 1: ln(MSRP) = 8.4922 + 0.0095 × 180 ≈ 10.2022.
MSRP = exp(10.2022) ≈ 26,962.44

Model 2: MSRP = −199,551 + 44,467 × 5.1930 ≈ 31,364.21

Model 3: ln(MSRP) = 0.8446 + 1.8039 × 5.1930 ≈ 10.2122.
MSRP = exp(10.2122) ≈ 27,232.73

Models 1 and 3 are preferred (one of the reasons is their higher R²).

The difference between the predictions of Models 1 and 3 is 1%.

However, the difference between the predictions of Models 2 and 3 is 15%.

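The three predictions can be reproduced with a few lines of Python (used as a calculator with the coefficients reported in the table; small differences in the last digits come from rounding ln(180)):

```python
import math

# Reproduce the three MSRP predictions for LENGTH = 180.
ln_len = math.log(180)                   # ~5.1930
m1 = math.exp(8.4922 + 0.0095 * 180)     # Model 1: ln(MSRP) on LENGTH
m2 = -199551 + 44467 * ln_len            # Model 2: MSRP on ln(LENGTH)
m3 = math.exp(0.8446 + 1.8039 * ln_len)  # Model 3: ln(MSRP) on ln(LENGTH)
print(round(m1, 2), round(m2, 2), round(m3, 2))
```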

When linear regression can’t be used

If any of the 4 assumptions is not satisfied, linear regression should not be used.

Linear regression can’t be used when:

• the relation between y and x is nonlinear
• the errors are not normally distributed
• there is a dependency within the values of the dependent variable
• the variance pattern of y is not the same for the entire range of x.


Multiple regression

Let’s investigate the relation between the discount and car length.

It is likely that the discount is impacted by many other factors

besides car length:

• car manufacturer (`make’)
• car model (`model’)
• car type (`type’)
• drivetrain (`drivetrain’)
• production place (`origin’)
• engine (`enginesize’, `cylinders’, `horsepower’)
• fuel efficiency (`MPG_City’, `MPG_Highway’)
• and maybe car weight (`weight’) and wheel base (`wheelbase’).


Multiple regression II

If our regression model does not include any of the important independent variables, then the model suffers from omitted variable bias.

The OLS estimator is likely to be biased and inconsistent due to omitted variable bias.

=⇒ coefficient estimates might become unreliable.

One should include as many relevant variables as possible in the regression model if the data are available.

I found that some variables on the previous slide cannot be included in the model together with car length (due to multicollinearity).


Multicollinearity

Multicollinearity is a phenomenon due to a high interdependency between the independent variables.

If we include highly correlated independent variables in the same regression model, then this could cause multicollinearity.

Implications of multicollinearity:

• it increases the variance of the coefficient estimates and makes the estimates very sensitive to minor changes in the model
• coefficient estimates might become unstable and difficult to interpret
• t-test results are not trustworthy, etc.


Multiple regression III

First, let’s check the summary statistics of the discount.

/* Creating discount variable: */
DATA work.car_data;
SET work.car_data;
discount=msrp/invoice-1;
RUN;

proc univariate data=work.car_data plots;
var discount;
run;


Properties of DISCOUNT


Properties of DISCOUNT II


Properties of DISCOUNT III


Properties of DISCOUNT IV


Correlation matrix

Let’s look at the correlation matrix of the discount and other numerical variables:

proc corr data=work.car_data;
var discount invoice enginesize cylinders horsepower
MPG_City MPG_Highway Length weight wheelbase;
run;


Correlation matrix II


Determinants of DISCOUNT

Let’s consider engine size as a potential determinant of the discount.

SAS code for the scatterplot:

proc gplot data=work.car_data;
title 'Scatter plot of discount and engine size';
plot Discount * Enginesize=1;
run;
quit;


Determinants of DISCOUNT II


Determinants of DISCOUNT III

Let’s start with a simple regression:

/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL discount= enginesize;
RUN;


Simple regression


Simple regression II

We got that:

• intercept = 0.05782
• slope = 0.00952
• DISCOUNT = 0.05782 + 0.00952 × ENGINE SIZE.

The discount increases by 0.00952 for each additional liter of engine.

Plus, one gets a 0.05782 discount just for showing up at the car dealership.
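As a hypothetical worked example (the 4.0-litre engine size is my choice, not from the slides), the fitted line predicts:

```python
# Predicted discount for a hypothetical 4.0-litre engine, from the fitted line.
intercept, slope = 0.05782, 0.00952
discount_pred = intercept + slope * 4.0
print(round(discount_pred, 5))  # -> 0.0959
```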


Simple regression III

We got:

• R-squared of the regression model = 0.1938
• Correlation coefficient = 0.44021 (0.44021² ≈ 0.1938).

=⇒ Engine size can explain 19.38% of the variation in the discount.


Simple regression IV


Simple regression V

Residuals seem to be independent of engine size. Residuals are somewhat normally distributed.

Simple regression VI


Multiple regression IV

Let’s augment our model with an additional independent variable: car length.

Scatterplot code in SAS:

proc gplot data=work.car_data;
title 'Scatter plot of discount and length';
plot Discount * Length=1;
run;
quit;


Multiple regression V


Multiple regression VI

Regression code in SAS:

/* OLS regression: */
PROC REG DATA=work.car_data;
MODEL discount= enginesize length;
RUN;


Multiple regression VII


Multiple regression VIII

Car length is insignificant.

There is no improvement in R-squared.

The coefficient estimates for the intercept and engine size barely changed.
F-value = 51.77 (p-value < 0.0001).

=⇒ we reject the null hypothesis that all slopes are 0s (joint test).

Thus, our model has some explanatory power.

Multiple regression IX

Multiple regression X

Residuals seem to be well distributed.

Multicollinearity II

According to the correlation matrix:

• corr. between discount and MPG_City = −0.34
• corr. between discount and MPG_Highway = −0.33
• corr. between MPG_City and MPG_Highway = 0.94

Let's see what happens when both MPG_City and MPG_Highway are included in the regression model.

Multicollinearity III

                        Dependent variable: DISCOUNT
Independent variables   Model 1       Model 2       Model 3
MPG_City                −0.0016***                  −0.0015**
                        [7.53]                      [2.36]
MPG_Highway                           −0.0014***    −0.0001
                                      [7.11]        [0.20]
Intercept               0.1197***     0.1247***     0.1204***
                        [27.72]       [23.76]       [21.74]
R2                      0.117         0.106         0.118
Adjusted R2             0.115         0.104         0.113
Number of observations  428           428           428

In Model 3, the coefficient estimate for MPG_Highway is insignificant and its value is very small (in absolute terms).

One should include either MPG_Highway or MPG_City, but not both variables, in the regression.

Description for the previous table

Table 2: Determinants of the discount on the car

This table presents the results of OLS regressions where the dependent variable is the discount on the car in decimals (DISCOUNT). MPG_City is the number of miles a car can travel on a single gallon of gas while driving on city streets. MPG_Highway is the number of miles a car can travel on a single gallon of gas while driving on the highway.
The absolute values of t-statistics based on standard errors are reported in brackets. ***, **, and * indicate significance at the 1%, 5%, and 10% levels, respectively. This description should be placed above the table.

Multiple regression: with drivetrain

DRIVETRAIN is a categorical variable: ALL, FRONT, REAR.

Let's check whether drivetrain is an important factor (using SAS procedure GLM):

proc glm data=work.car_data;
class drivetrain;
MODEL discount = enginesize drivetrain / solution noint;
run;

Multiple regression: with drivetrain II

Multiple regression: with drivetrain III

Multiple regression: with drivetrain IV

The model includes 3 slopes, one for each type of drivetrain (ALL, FRONT, REAR).

Let's check whether the discount is statistically different for different types of drivetrain:

proc glm data=work.car_data;
class drivetrain;
MODEL discount= enginesize drivetrain/ solution;
run;

Multiple regression: with drivetrain V

The REAR type of drivetrain is the benchmark case.

The discounts for the ALL and FRONT drivetrains are statistically the same as for the REAR drivetrain.

The coefficient estimate for engine size is almost the same as before.

Multiple regression: Results with and without intercept

The REAR type of drivetrain is the benchmark case on the LHS. Its impact on the discount is equal to the intercept on the LHS.

To get the coef. estimate for FRONT on the RHS, we should add the intercept and FRONT on the LHS.
The same holds for ALL.

Multiple regression XI

In the same way, we can check whether car type (TYPE) impacts the discount. Also car manufacturer (MAKE) or ORIGIN.

By including a number of dummy variables (one for each type of the cars), we estimate a fixed effects regression.

Fixed effects regressions are superior to OLS regressions in most cases.

Fixed effects regressions

If we are not interested in the coef. estimate of a particular dummy, we do not need to report the full table. The table can be condensed as follows.

                            Dependent variable: DISCOUNT
Independent variables       Model 1    Model 2    Model 3    Model 4
EngineSize                  0.0095***  0.0087***  0.0088***  0.0088***
                            [10.12]    [8.49]     [8.10]     [11.79]
Intercept                   0.0578***
                            [18.17]
Drivetrain fixed effects    No         Yes        Yes        Yes
Car type fixed effects      No         No         Yes        Yes
Manufacturer fixed effects  No         No         No         Yes
R2                          0.194      0.204      0.222      0.767
Number of observations      428        428        428        428

Interaction terms

Suppose we would like to check whether the discount is greater for American cars with larger engines.

To do so, we need to create an interaction term between engine size and a dummy for American cars.

Then, in the regression model, we need to include the interaction term, engine size, and the dummy for American cars besides the other independent variables.
Interaction terms II

SAS code:

/* Generating a dummy for US made car and an interaction term
between this dummy and engine size: */
DATA work.car_data;
SET work.car_data;
US_car=0;
IF Origin="USA" then US_car=1;
US_car_engine=US_car*enginesize;
RUN;

proc glm data=work.car_data;
class drivetrain;
MODEL discount = enginesize US_car_engine US_car drivetrain / solution;
run;

Interaction terms III

American cars are subject to lower discounts (by 0.016).

However, American cars with larger engines are discounted more.

Interaction terms IV

The coefficient estimate for engine size is 0.0075:

• the discount increases by 0.0075 for each additional liter of engine.

The interaction term is 0.0039 (significant at the 5% level):

• the discount increases by an additional 0.0039 for each additional liter of engine for American cars
• thus, in total, the discount increases by 0.0114 for each additional liter of engine for American cars.

Interaction terms V

Let's predict the discount for the following car:

• EngineSize = 3
• Origin = USA
• DriveTrain = All.

DISCOUNT = 0.0660 + 0.0075 × EngineSize + 0.0039 × US_car_engine − 0.0161 × US_car + 0.0025

DISCOUNT = 0.0865

Required reading

Konasani, V. R. and Kadre, S. (2015). "Practical Business Analytics Using SAS: A Hands-on Guide": chapters 9, 10.
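The interaction-term prediction above can be checked with plain Python (here 0.0025 is taken to be the coefficient on the ALL-drivetrain dummy, as the slide's equation implies; with the rounded coefficients the sum comes out at about 0.0866, which matches the slide's 0.0865 up to rounding):

```python
# Check the predicted discount for: EngineSize = 3, US car, ALL drivetrain.
# 0.0025 is assumed to be the ALL-drivetrain dummy coefficient from the slide.
engine, us_car = 3, 1
discount = (0.0660 + 0.0075 * engine + 0.0039 * engine * us_car
            - 0.0161 * us_car + 0.0025)
print(round(discount, 4))
```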