CORPFIN 2503 – Business Data Analytics
2021 S2, Workshop 5: Applications of logit and probit models
£ius
1 Financial data of NYSE firms
Let’s download financial data for all stocks traded on NYSE using Eikon:
1. Search for Screener
2. Filter for stocks traded in the USA (country of exchange)
3. Filter for firms incorporated in the USA
4. Filter for firms with headquarters in the USA
5. Filter for stocks traded on NYSE
6. Consider only firms from the following sectors:
• Materials
• Consumer Staples
• Industrials
• Energy
• Health Care
• Consumer Discretionary
7. Select the following variables:
• Dividend per Share – Actual (FY0, USD)
• Total Assets, Reported (FY0, USD)
• Cash Flow (FY0, USD)
• Cash Flow (FY-1, USD)
• IPO date
• Beta
• Net Income After Taxes (FY0, USD)
• Retained Earnings (Accumulated Deficit) (FY0, USD)
• Total Equity (FY0, USD)
• Total Debt (FY0, USD).
8. Click the Excel icon to save the data as an Excel file.
2 Linear probability models
Suppose you are a stock analyst. You would like to identify firms that currently
do not pay dividends but might start paying them soon, using multiple linear
regression (a linear probability model) and the data downloaded in Task 1. Assume that
dividend payers are firms that:
1. are larger (in terms of total assets)
2. are less risky (in terms of beta)
3. have a greater retained earnings to total equity ratio.
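Written out, the three assumptions above correspond to a linear probability model of roughly the following form. The notation is chosen here for illustration and is not taken from the handout: D_i is the dividend-payer dummy, TA_i is total assets, Beta_i is the stock's beta, and RE_i/TE_i is the retained earnings to total equity ratio.

```latex
\[
D_i = \beta_0 + \beta_1 \ln(\mathit{TA}_i) + \beta_2\,\mathit{Beta}_i
      + \beta_3\,\frac{\mathit{RE}_i}{\mathit{TE}_i} + \varepsilon_i,
\qquad
\text{expected signs: } \beta_1 > 0,\ \beta_2 < 0,\ \beta_3 > 0 .
\]
```

The fitted value of D_i is read as the estimated probability that firm i pays dividends, which is why it is compared with 0.5 in the later steps.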
In Excel, create a back-up copy of the file generated in Task 1 and then:
1. delete Company Name, Country of Headquarters, Country of Incorporation,
Country of Exchange, and Exchange Name columns
2. change the format of all numerical variables (except for IPO date) to ‘General’
3. rename the variables (optional)
4. save the data as a CSV file (e.g., NYSE.csv).
Then in SAS:
1. import the data into SAS
2. generate additional variables: dividend-payer dummy, natural logarithm of
firm assets, retained earnings to total equity ratio
3. compute the descriptive statistics of the independent variables to see whether
they have outliers
4. handle the outliers (winsorize the variables at the top and bottom 1% or 5% of
their distributions to limit the influence of extreme values)
5. generate a correlation matrix
6. estimate a linear regression model where the dependent variable is the
dividend-payer dummy and the independent variables are the natural logarithm of
firm assets, beta, and the retained earnings to total equity ratio
7. compute the predicted values of the dividend-payer dummy
8. from the newly created file, remove firms that already pay dividends and firms
whose predicted value of the dividend-payer dummy is less than 0.5
9. sort the remaining firms in descending order of the predicted values of the
dividend-payer dummy.
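The SAS steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not an official solution: the file path, the data set names, and the variable names (dps, total_assets, beta, retained_earnings, total_equity) are hypothetical and depend on how the columns were renamed in Excel. It also assumes that these columns are imported as numeric, winsorizes at the 1st and 99th percentiles, and treats firms with a missing dividend per share as non-payers.

```sas
/* 1. Import the CSV file (path and names are placeholders) */
proc import datafile="NYSE.csv" out=nyse_raw dbms=csv replace;
    getnames=yes;
    guessingrows=max;
run;

/* 2. Additional variables */
data nyse_vars;
    set nyse_raw;
    div_payer = (dps > 0);          /* a missing dps gives 0 (assumption) */
    if total_assets > 0 then ln_assets = log(total_assets);
    if not missing(total_equity) and total_equity ne 0
        then re_te = retained_earnings / total_equity;
run;

/* 3. Descriptive statistics to spot outliers */
proc means data=nyse_vars n mean std min p1 p5 median p95 p99 max;
    var ln_assets beta re_te;
run;

/* 4. Winsorize at the 1st and 99th percentiles */
proc univariate data=nyse_vars noprint;
    var ln_assets beta re_te;
    output out=cutoffs pctlpts=1 99 pctlpre=ln_assets_ beta_ re_te_;
run;

data nyse_task2;
    if _n_ = 1 then set cutoffs;    /* cutoffs are retained for every row */
    set nyse_vars;
    array x{3}    ln_assets beta re_te;
    array p_lo{3} ln_assets_1  beta_1  re_te_1;
    array p_hi{3} ln_assets_99 beta_99 re_te_99;
    do i = 1 to 3;
        if not missing(x{i}) then x{i} = min(max(x{i}, p_lo{i}), p_hi{i});
    end;
    drop i ln_assets_1 ln_assets_99 beta_1 beta_99 re_te_1 re_te_99;
run;

/* 5. Correlation matrix */
proc corr data=nyse_task2;
    var ln_assets beta re_te;
run;

/* 6-7. Linear probability model and predicted values */
proc reg data=nyse_task2;
    model div_payer = ln_assets beta re_te;
    output out=pred p=p_hat;
run;
quit;

/* 8. Keep non-payers with a predicted value of at least 0.5 */
data candidates;
    set pred;
    if div_payer = 0 and p_hat >= 0.5;
run;

/* 9. Sort in descending order of the predicted value */
proc sort data=candidates;
    by descending p_hat;
run;
```

To winsorize at 5% instead, change pctlpts=1 99 to pctlpts=5 95 and adjust the cutoff variable names (for example, beta_5 and beta_95) accordingly.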
3 Linear probability models II (at home)
Extend Task 2 by adding the following independent variables: firm age
(or the natural logarithm of firm age if it is more appropriate), positive net income
dummy, cash flow to assets ratio, cash flow growth, and debt to assets ratio. That
is, assume that dividend payers are firms that:
1. are larger (in terms of total assets)
2. are older (2021 minus IPO year):
• SAS code: age=2021-year(IPO_date);
3. are less risky (in terms of beta and debt-to-assets ratio)
4. generate more cash flows (cash flow to assets ratio)
5. feature low growth (in terms of annual cash flow growth)
6. are profitable
7. have a greater retained earnings to total equity ratio.
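The additional variables listed above could be built along the following lines. This is a sketch only: nyse_task2 is a hypothetical data set assumed to hold the Task 2 variables (div_payer, ln_assets, beta, re_te) together with the raw columns, and the names net_income, cash_flow, cash_flow_lag (the FY-1 cash flow), total_debt, and total_assets are placeholders for whatever names were chosen in Excel. Leaving cash flow growth missing when the previous year's cash flow is not positive is a choice made for this sketch, not something the handout prescribes.

```sas
data nyse_task3;
    set nyse_task2;
    age = 2021 - year(IPO_date);      /* IPO_date must be a SAS date value */
    if age > 0 then ln_age = log(age);
    if not missing(net_income) then pos_ni = (net_income > 0);
    if total_assets > 0 then do;
        cf_ta   = cash_flow  / total_assets;
        debt_ta = total_debt / total_assets;
    end;
    /* growth is left missing when last year's cash flow is not positive */
    if cash_flow_lag > 0 then cf_growth = cash_flow / cash_flow_lag - 1;
run;

/* Extended linear probability model */
proc reg data=nyse_task3;
    model div_payer = ln_assets ln_age beta debt_ta cf_ta cf_growth pos_ni re_te;
    output out=pred3 p=p_hat;
run;
quit;
```

As in Task 2, the new continuous variables (ln_age, cf_ta, debt_ta, cf_growth) would normally be checked for outliers and winsorized before the regression is estimated.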
4 Logit/probit models (at home)
Repeat the analysis in Task 3 using logit and probit models. Compare the top 20
firms in all three cases (linear probability, logit, and probit models).
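One way to fit both models is PROC LOGISTIC with the LINK= option, as in the sketch below. The data set nyse_task3 and all variable names are hypothetical and stand for the Task 3 data; the macro name fit is also just an illustration. The same 0.5 cutoff and sorting as in Task 2 are applied before printing the first 20 firms.

```sas
%macro fit(link);
    /* Model the probability that div_payer equals 1 */
    proc logistic data=nyse_task3;
        model div_payer(event='1') = ln_assets ln_age beta debt_ta
                                     cf_ta cf_growth pos_ni re_te / link=&link;
        output out=pred_&link p=p_hat;
    run;

    /* Keep non-payers with a predicted probability of at least 0.5 */
    data top_&link;
        set pred_&link;
        if div_payer = 0 and p_hat >= 0.5;
    run;

    proc sort data=top_&link;
        by descending p_hat;
    run;

    /* First 20 firms by predicted probability */
    proc print data=top_&link(obs=20);
    run;
%mend fit;

%fit(logit)
%fit(probit)
```

The two printed lists can then be compared with the corresponding list from the linear probability model.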