Basics LPM Logit models Probit models Logit models using SAS Visual Analytics
BUSANA 7001 – Predictive and Visual Analytics for Business
Week 5: Predictive analytics using logit regressions
£ius BUSANA 7001, Week 5 1/57
• Basics
• LPM
• Logit models
• Probit models
• Logit models using SAS Visual Analytics
Introduction
We use simple and multiple linear regressions if the dependent variable is continuous.
For example, the dependent variable is the car price.
What if the dependent variable is a dummy variable, such as SEDAN=1 if a car type is sedan and 0 otherwise?
Introduction II
Other interesting binary outcomes:
• dividend payers vs non-payers
• firms going bankrupt vs not
• firms issuing equity vs debt securities
• credit rating downgrades vs upgrades
• whether a firm becomes an M&A target or not
• etc.
Introduction III
If we use OLS regressions in these cases, we would estimate linear probability models (LPMs).
It is better to estimate logit or probit models.
LPM example
Let's estimate an LPM where the dependent variable equals 1 if a firm pays dividends (and 0 otherwise), using the ASX data from Workshop 3:
data work.asx;
   set work.asx;
   label dividend_payer="Dividend payer";
   ln_assets=log(assets);          /* natural log of total assets */
   re_equity_d=0;                  /* dummy: 1 if RE/equity > 0, else 0 */
   if re_equity>0 & re_equity ne . then re_equity_d=1;
run;

PROC REG DATA=work.asx;
   MODEL dividend_payer=re_equity_d ln_assets;
RUN;
QUIT;
LPM example II
LPM example II
The results suggest that:
• firms with a positive RE/equity ratio have a 25.6 percentage point higher probability of paying dividends
• an increase in ln(assets) by 1 raises the probability of paying dividends by 9.4 percentage points.
If RE/equity > 0 and ln(assets) = 15, then a firm has an 18% probability of paying dividends (−1.48192 + 0.25610 + 15 × 0.09350 = 0.17668).
Let’s look at the residuals.
LPM example III
Slide #62 of Lecture 3
LPM example IV
Residuals are not normally distributed and are subject to heteroskedasticity.
Introduction II
LPMs are subject to:
1. heteroskedasticity (OLS standard errors will be wrong, so hypothesis tests will be incorrect)
2. non-normal residuals (for given values of the independent variables, a residual can take only two possible values)
3. predicted values of the dependent variable that may be greater than 1 or lower than 0 (because the LPM assumes a linear impact of the independent variables):
• if the dependent variable equals 1 if a firm pays dividends and 0 otherwise, values outside the interval [0, 1] are not logical
• the probability of paying dividends cannot be negative.
The solution is to use logit (logistic) or probit models.
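A minimal SAS sketch of problem 3, plugging the LPM estimates from the example above (intercept −1.48192, 0.25610 on re_equity_d, 0.09350 on ln_assets) into the fitted line for a hypothetical grid of firm sizes. The grid and the dataset name work.lpm_grid are illustrative assumptions, not the ASX data.

```sas
/* Fitted LPM: p_hat = -1.48192 + 0.25610*re_equity_d + 0.09350*ln_assets  */
/* Coefficients from the LPM example; the ln_assets grid is hypothetical.  */
data work.lpm_grid;
   do re_equity_d = 0 to 1;
      do ln_assets = 10 to 30 by 5;
         p_hat = -1.48192 + 0.25610*re_equity_d + 0.09350*ln_assets;
         /* 1 if the fitted value cannot be a probability, else 0 */
         out_of_range = (p_hat < 0 or p_hat > 1);
         output;
      end;
   end;
run;

proc print data=work.lpm_grid noobs;
   format p_hat 8.4;
run;
```

The printout should show negative fitted values at ln_assets = 10 and values above 1 at ln_assets = 30 (and at 25 with a positive RE/equity ratio), while the slide's example firm (dummy = 1, ln_assets = 15) gets 0.1767.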
Suppose p is the probability that Y = 1: $p = P(Y = 1)$.
The functional form of the LPM with 2 independent variables is:
$$p = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
The LHS of the LPM can range only from 0 to 1, but the RHS can vary from −∞ to ∞.
⇒ We need to transform the dependent variable to eliminate the 0 to 1 constraint.
We can eliminate the upper bound (p = 1) by using the ratio $\frac{p}{1-p}$.
$\frac{p}{1-p}$ is the odds of an event occurring.
Let's assume that the probability of success of some event is 0.75. Then the probability of failure is 1 − 0.75 = 0.25.
The odds of success are defined as the ratio of the probability of success to the probability of failure.
⇒ The odds of success are 0.75/0.25 = 3, i.e., 3 to 1.
Similarly, if the probability of success is 0.5, i.e., a 50-50 chance, then the odds of success are 1 to 1.
Theory III
$\frac{p}{1-p}$ is the odds of an event occurring.

p        Odds
0.0001   0.0001
0.01     0.0101
0.1      0.1111
0.25     0.3333
0.5      1
0.75     3
0.9      9
0.99     99
0.9999   9999

• When the probability is either very small or very large, changes in odds hardly affect the probability.
• When the probability is between 0.1 and 0.9, changes in odds substantially affect the probability.
• The range of odds is between 0 and ∞.
We can eliminate the lower bound of 0 by taking the natural logarithm of the odds.
[Figure: y = ln(Odds) plotted against Odds]
The log odds of the event occurring is $\ln(\text{Odds}) = \ln\left(\frac{p}{1-p}\right)$.

p        Odds     ln(Odds)
0.0001   0.0001   −9.210
0.01     0.0101   −4.595
0.1      0.1111   −2.197
0.25     0.3333   −1.099
0.5      1        0
0.75     3        1.099
0.9      9        2.197
0.99     99       4.595
0.9999   9999     9.210

• When the probability is either very small or very large, changes in log odds hardly affect the probability.
• When the probability is between 0.1 and 0.9, changes in log odds substantially affect the probability.
• If p < 0.5, then odds < 1 and log odds < 0.
• If p = 0.5, then odds = 1 and log odds = 0.
• If p > 0.5, then odds > 1 and log odds > 0.
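A small SAS DATA step that recomputes the odds and log odds in the table above from the listed probabilities; work.odds_table is a hypothetical dataset name. SAS's LOG function is the natural logarithm, so it matches ln(Odds).

```sas
/* Recompute odds and log odds for the probabilities in the table */
data work.odds_table;
   input p @@;                 /* @@ reads several values from one line */
   odds     = p / (1 - p);     /* odds of the event */
   log_odds = log(odds);       /* natural logarithm of the odds */
   datalines;
0.0001 0.01 0.1 0.25 0.5 0.75 0.9 0.99 0.9999
;
run;

proc print data=work.odds_table noobs;
   format odds 10.4 log_odds 8.3;
run;
```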
The functional form of the logit (logistic) model with 2 independent variables is:
$$\ln(\text{Odds}) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$
Both sides of this equation can now vary between −∞ and ∞.
Theory VII
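Let's derive the predicted value of p:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$
$$\frac{p}{1-p} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2},$$
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}},$$
where e ≈ 2.71828.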
Theory VIII
The impact of Q on the predicted probability depends on Q:
• if Q is low, the impact is small
• if Q is neither low nor high, the impact is large
• if Q is high, the impact is small.
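One way to see why, assuming for illustration that Q is one of the independent variables in the logit model with coefficient β, and writing z = β0 + β1x1 + β2x2 for the linear index: differentiating the logistic function gives a slope of β p(1 − p), which peaks at p = 0.5 and shrinks toward 0 as p approaches 0 or 1.

```latex
\frac{\partial p}{\partial Q}
  = \beta \, \frac{e^{z}}{\left(1 + e^{z}\right)^{2}}
  = \beta \, p \, (1 - p),
\qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2

p = 0.5:\ \beta \times 0.25, \qquad
p = 0.1 \text{ or } 0.9:\ \beta \times 0.09, \qquad
p = 0.01 \text{ or } 0.99:\ \beta \times 0.0099
```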
In the case of logit models, the predicted probability is never < 0 or > 1, and the fitted line is not straight.
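Functional form of the logit (logistic) model:
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}.$$
Suppose that $e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} = 1{,}000{,}000$; then $p = \frac{1{,}000{,}000}{1 + 1{,}000{,}000} < 1$.
Suppose that $e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2} = 0.0000001$; then $p = \frac{0.0000001}{1 + 0.0000001} > 0$.
⇒ p is always between 0 and 1.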
Logit models
In SAS, there are several procedures to estimate logit models:
• LOGISTIC (our main choice)
• PHREG and
• CATMOD.
Logit models II
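Now let's estimate a logit model using the same dataset:
PROC LOGISTIC DATA=work.asx;
   MODEL dividend_payer (EVENT='1') = re_equity_d ln_assets;
RUN;
The option (EVENT='1') makes SAS estimate the probability of paying dividends; without it, PROC LOGISTIC by default models the probability of the lowest ordered response value (here 0).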
Logit models III
Logit models IV
Logit models V
Logit models VI
Coefficient estimates for firm size and for the positive RE/equity ratio dummy are significantly positive.
The results suggest that larger firms, as well as firms with a positive RE/equity ratio, are more likely to pay dividends:
• a change in the RE/equity ratio dummy from 0 to 1 increases the log odds of paying dividends by 1.6120 (if 2 firms have identical ln(assets), the log odds for the one with a positive RE/equity ratio are 1.6120 greater than the log odds for the firm with a non-positive RE/equity ratio)
• a 1 unit increase in ln(assets) increases the log odds of paying dividends by 1.1059.
Logit models VII
To compute the probability of paying dividends for a particular firm, we simply need to plug its ln(assets) and RE/equity dummy value into the equation below:
$$p = \frac{e^{-22.4746 + 1.6120 \times \text{RE/equity dummy} + 1.1059 \times \ln(\text{assets})}}{1 + e^{-22.4746 + 1.6120 \times \text{RE/equity dummy} + 1.1059 \times \ln(\text{assets})}}.$$
Suppose ln(assets)=20 and RE/equity dummy=1, then p=0.778.
Suppose ln(assets)=17 and RE/equity dummy=0, then p=0.025.
Suppose ln(assets)=20 and RE/equity dummy=0, then p=0.412.
Suppose ln(assets)=17 and RE/equity dummy=1, then p=0.113.
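A minimal SAS sketch that reproduces the four predicted probabilities above by plugging the reported logit coefficients into the logistic formula. The dataset name work.logit_pred is hypothetical, and the four example firms are typed in with DATALINES rather than read from work.asx.

```sas
/* Logit coefficients as reported above; rows are the slide's example firms */
data work.logit_pred;
   input ln_assets re_equity_d;
   z = -22.4746 + 1.6120*re_equity_d + 1.1059*ln_assets;  /* log odds */
   p = exp(z) / (1 + exp(z));                              /* predicted probability */
   datalines;
20 1
17 0
20 0
17 1
;
run;

proc print data=work.logit_pred noobs;
   format z 8.4 p 6.3;
run;
```

The printed p column should read about 0.778, 0.025, 0.412 and 0.113, matching the four cases on the slide.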
Descriptive statistics
Let’s look at the properties of ln_assets and re_equity_d:
proc univariate data=work.asx plots;
   var ln_assets;
run;

proc freq data=work.asx;
   tables dividend_payer * re_equity_d /
          norow nocol nopercent;
run;
Descriptive statistics II: ln_assets
Descriptive statistics III: ln_assets
ln(assets) = 20: assets = $e^{20}$ = 485,165,195. Very large firms!
ln(assets) = 17: assets = $e^{17}$ = 24,154,952. Average firms.
The former firms are about 20 times larger than the latter ($e^{20-17} = e^{3} \approx 20.09$).
Descriptive statistics IV: re_equity_d
Two-way table:
Odds ratios
Odds ratio estimates show the impact of each individual independent variable on the odds of the modelled outcome (here, paying dividends), expressed as the factor by which the odds change.
E.g., the odds ratio estimate for the RE/equity dummy indicates the impact of the RE/equity dummy on the odds of paying dividends:
• What is the change in the odds when there is a unit change in the independent variable?
Odds ratios II
Odds ratio for RE/equity dummy is 5.013.
Suppose ln(assets) = 20 and RE/equity dummy = 1:
• p(div. payer) = 0.778 & p(div. non-payer) = 1 − 0.778 = 0.222
• odds of paying over not paying dividends = 0.778 / 0.222 = 3.51.
Suppose ln(assets) = 20 and RE/equity dummy = 0:
• p(div. payer) = 0.412 & p(div. non-payer) = 1 − 0.412 = 0.588
• odds of paying over not paying dividends = 0.412 / 0.588 = 0.70.
The change in odds with a unit change in the RE/equity dummy is 3.51 / 0.70 ≈ 5.01, which matches 5.013 up to rounding.
Odds ratios III
For 2 otherwise identical firms, the odds of paying dividends:
• for a firm with a positive RE/equity ratio would be exp(1.6120) = 5.013 times greater
• for a firm with ln(assets) greater by 1 unit would be exp(1.1059) = 3.022 times greater.
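Why the odds ratio equals the exponentiated coefficient follows directly from the logit equation, since the odds are $e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}$. Comparing two firms that differ only by one unit in $x_1$, everything except $\beta_1$ cancels:

```latex
\frac{\text{Odds}(x_1 + 1,\, x_2)}{\text{Odds}(x_1,\, x_2)}
  = \frac{e^{\beta_0 + \beta_1 (x_1 + 1) + \beta_2 x_2}}{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}
  = e^{\beta_1}

e^{1.6120} \approx 5.013 \quad (\text{RE/equity dummy}), \qquad
e^{1.1059} \approx 3.022 \quad (\ln(\text{assets}))
```

The ratio does not depend on $x_2$ or on the starting value of $x_1$, which is why a single odds ratio can summarise each variable, whereas the effect on the probability itself varies from firm to firm.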
Marginal effects
Interpreting the impacts on log odds and odds might be tricky.
Why not look at the impact of a variable on the probability of paying dividends, holding all other variables in the model constant?
Yes, we can. This is known as a marginal effect.
However, in a logit model the marginal effect depends on the values of the variables.
Thus, we compute the marginal effect at each observation and then calculate the sample average of the individual marginal effects to obtain the overall (average) marginal effect.
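A minimal SAS sketch of why the marginal effect depends on the variable values. For a logit model the slope of p with respect to a variable is its coefficient times p(1 − p); the DATA step evaluates this at the four example firms using the logit coefficients reported above. Treating the dummy's effect as a derivative is a simplifying assumption, so the 0-to-1 change in p is computed alongside it; the dataset and variable names are hypothetical.

```sas
/* Individual logit marginal effects at four hypothetical firms */
data work.logit_me;
   input ln_assets re_equity_d;
   z = -22.4746 + 1.6120*re_equity_d + 1.1059*ln_assets;  /* log odds */
   p = 1 / (1 + exp(-z));             /* same as exp(z)/(1+exp(z)) */
   slope = p*(1 - p);                 /* dp/dz of the logistic function */
   me_ln_assets   = 1.1059*slope;     /* derivative of p w.r.t. ln_assets */
   me_re_equity_d = 1.6120*slope;     /* dummy treated as continuous */
   /* discrete alternative: change in p as the dummy moves from 0 to 1 */
   p_d0 = 1 / (1 + exp(-(-22.4746 + 1.1059*ln_assets)));
   p_d1 = 1 / (1 + exp(-(-22.4746 + 1.6120 + 1.1059*ln_assets)));
   dp_re_equity_d = p_d1 - p_d0;
   datalines;
20 1
17 0
20 0
17 1
;
run;

proc print data=work.logit_me noobs;
   var ln_assets re_equity_d p me_ln_assets me_re_equity_d dp_re_equity_d;
   format p me_ln_assets me_re_equity_d dp_re_equity_d 8.4;
run;
```

At ln(assets) = 20 the derivative for ln_assets is about 0.19 with a positive RE/equity ratio and about 0.27 without, and it is much smaller at ln(assets) = 17; averaging such individual values over all firms in work.asx gives the overall marginal effect described above.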
Marginal effects II
SAS procedure LOGISTIC does not compute marginal effects, but the procedure QLIM does:
PROC QLIM DATA=work.asx;
   MODEL dividend_payer = re_equity_d ln_assets
         / discrete(d=logistic);
   OUTPUT OUT=work.marginal_effects MARGINAL;   /* individual marginal effects */
RUN;

PROC MEANS DATA=work.marginal_effects mean min max maxdec=3;
   VAR Meff_P2_re_equity_d Meff_P2_ln_assets;
   title 'Average of the Individual Marginal Effects (Logit Model)';
RUN;
Marginal effects III
The results:
On average:
• having a positive RE/equity ratio increases the probability of paying dividends by 0.106
• ln(assets) greater by 1 unit increases the probability of paying dividends by 0.073.
These values are smaller than those from the LPM (0.25610 & 0.09350).
Model fit statistics
If we estimate several logit models, how do we know which one is the best?
AIC (Akaike Information Criterion) and SC (Schwarz Criterion) are used to compare two or more models and pick the best one.
A model with the minimum AIC and SC values is preferred:
• both criteria reward a better fit to the data and
• both penalize additional independent variables.
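For reference, the standard definitions of the two criteria (to our understanding, the ones PROC LOGISTIC reports next to −2 Log L) make the trade-off explicit. Here L is the maximised likelihood, k the number of estimated parameters including the intercept, and n the number of observations:

```latex
\text{AIC} = -2 \ln L + 2k, \qquad \text{SC} = -2 \ln L + k \ln(n)
```

A better fit lowers −2 ln L, while each extra parameter adds 2 to AIC and ln(n) to SC; since ln(n) > 2 once n ≥ 8, SC penalises additional variables more heavily in typical samples.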
Model fit statistics II
Let's estimate 4 logit models and compare their fit statistics:
PROC LOGISTIC DATA=work.asx;   /* Model 1 */
   MODEL dividend_payer (EVENT='1') = re_equity_d ln_assets;
RUN;
PROC LOGISTIC DATA=work.asx;   /* Model 2 */
   MODEL dividend_payer (EVENT='1') = re_equity_d assets;
RUN;
PROC LOGISTIC DATA=work.asx;   /* Model 3 */
   MODEL dividend_payer (EVENT='1') = re_equity_d;
RUN;
PROC LOGISTIC DATA=work.asx;   /* Model 4 */
   MODEL dividend_payer (EVENT='1') = ln_assets;
RUN;
Model fit statistics III
Summary of results:
• Model 1 is the best (its independent variables: RE/equity dummy and ln(assets)).
• Model 3 is the worst (its independent variable: RE/equity dummy).
• Models with covariates are better than the model without any covariates.
In general, AIC and SC lead to the same model being selected.
Probit models
The key difference between probit and logit models is their functional form.
For logit (logistic) models, it is the cumulative standard logistic distribution function:
$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2}}.$$
For probit models, it is the cumulative standard normal distribution function:
$$p = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2).$$
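A small SAS sketch comparing the two functional forms at a few values of the linear index z (the grid and the dataset name work.cdf_compare are illustrative assumptions). PROBNORM is SAS's standard normal CDF, i.e., Φ.

```sas
/* Standard logistic CDF vs standard normal CDF at integer values of z */
data work.cdf_compare;
   do z = -3 to 3 by 1;
      p_logit  = exp(z) / (1 + exp(z));   /* logit functional form */
      p_probit = probnorm(z);             /* probit functional form, Phi(z) */
      output;
   end;
run;

proc print data=work.cdf_compare noobs;
   format p_logit p_probit 6.4;
run;
```

Both curves are S-shaped, equal 0.5 at z = 0 and stay strictly between 0 and 1; the normal CDF is steeper around 0 (e.g., 0.8413 vs 0.7311 at z = 1), which is consistent with probit coefficients being smaller in magnitude than logit coefficients estimated on the same data.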
Probit models II
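Now let's estimate a probit model using the same dataset (either procedure below can be used):
PROC PROBIT DATA=work.asx;
   CLASS dividend_payer;          /* binary response listed as a CLASS variable */
   MODEL dividend_payer (EVENT='1') = re_equity_d ln_assets;
RUN;
PROC LOGISTIC DATA=work.asx;      /* probit link; also reports AIC and SC */
   MODEL dividend_payer (EVENT='1') = re_equity_d ln_assets
         / LINK=PROBIT;
RUN;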
Probit models III
Probit models IV
Probit models V
Probit models VI
Coefficient estimates for firm size and for the positive RE/equity ratio dummy are significantly positive.
The results suggest that larger firms, as well as firms with a positive RE/equity ratio, are more likely to pay dividends.
Interpretation of the coefficients in a probit regression is not as straightforward as the interpretation of coefficients in LPM or logit models.
If the LOGISTIC procedure is used, one also gets AIC and SC values that can be used to identify the best model (the one with the lowest AIC and SC values).
Probit models VII
To compute the predicted probability of paying dividends for a particular firm, we simply need to plug its ln(assets) and RE/equity dummy value into the equation below:
$$p = \Phi(-11.7732 + 0.8823 \times \text{RE/equity dummy} + 0.5767 \times \ln(\text{assets})),$$
where Φ is the cumulative standard normal distribution function (Excel function: NORM.S.DIST(z, TRUE)).
Suppose ln(assets)=20 and RE/equity dummy=1, then p=0.740.
Suppose ln(assets)=17 and RE/equity dummy=0, then p=0.024.
Suppose ln(assets)=20 and RE/equity dummy=0, then p=0.405.
Suppose ln(assets)=17 and RE/equity dummy=1, then p=0.139.
The results are very similar to those of the logit model.
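A minimal SAS sketch that reproduces the four probit probabilities above, using the reported coefficients and PROBNORM as Φ (CDF('NORMAL', z) would give the same values). The dataset name work.probit_pred is hypothetical and the example firms are typed in with DATALINES.

```sas
/* Probit coefficients as reported above; rows are the slide's example firms */
data work.probit_pred;
   input ln_assets re_equity_d;
   z = -11.7732 + 0.8823*re_equity_d + 0.5767*ln_assets;  /* linear index */
   p = probnorm(z);                                        /* Phi(z) */
   datalines;
20 1
17 0
20 0
17 1
;
run;

proc print data=work.probit_pred noobs;
   format z 8.4 p 6.3;
run;
```

The printed p column should read about 0.740, 0.024, 0.405 and 0.139, matching the slide.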
Marginal effects
SAS procedure PROBIT does not compute marginal effects either; thus, we can use the procedure QLIM:
PROC QLIM DATA=work.asx;
   MODEL dividend_payer = re_equity_d ln_assets
         / discrete(d=probit);
   OUTPUT OUT=work.marginal_effects2 MARGINAL;   /* individual marginal effects */
RUN;

PROC MEANS DATA=work.marginal_effects2 mean min max maxdec=3;
   VAR Meff_P2_re_equity_d Meff_P2_ln_assets;
   title 'Average of the Individual Marginal Effects (Probit Model)';
RUN;
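For reference, the derivative underlying a probit marginal effect replaces the logistic slope p(1 − p) with the standard normal density φ. The worked numbers use the probit coefficients reported earlier for a single firm with ln(assets) = 20 and a positive RE/equity ratio; this is an illustration for one firm, not the sample average that QLIM and PROC MEANS report.

```latex
\frac{\partial p}{\partial x_k} = \beta_k \, \phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2),
\qquad \phi(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^{2}/2}

z = -11.7732 + 0.8823 + 0.5767 \times 20 = 0.6431,
\qquad \phi(0.6431) \approx 0.3244

\frac{\partial p}{\partial \ln(\text{assets})} \approx 0.5767 \times 0.3244 \approx 0.187
```

The logit counterpart for the same firm is 1.1059 × 0.778 × 0.222 ≈ 0.191, so the two models imply very similar effects on the probability.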
Marginal effects II
The results:
On average:
• having a positive RE/equity ratio increases the probability of paying dividends by 0.109
• ln(assets) greater by 1 unit increases the probability of paying dividends by 0.071.
These values are almost identical to those from the logit model.
Probit models VIII
The results are very similar to those of the logit model.
Logit and probit models lead to essentially the same results in most cases.
Logit models tend to converge a little faster. ⇒ Thus, either of them can be used.