Practice Question: Data Analysis (20 marks)
Note: Where you are required to generate R Code and output, please copy and paste the relevant code and output(s) into your word file. Pls upload the word file to the submission link in iLearn.
[Start time: 5pm April 1; End time: 5pm April 26]
In the attached dataset named as “firm financial dataset”, you are provided with the financial information of US firms in a long sample period. It is a panel dataset.
Here is a list of variables:
gvkey: firm identifier; each firm has a unique identifier;
fyear: fiscal year of the data
conm: firm name
sic: SIC industry code (https://en.wikipedia.org/wiki/Standard_Industrial_Classification)
at: total asset (millions)
bkvlps: book value per share (dollars)
ceq: common equity (millions)
che: cash and short-term investments (millions)
csho: common shares outstanding (millions)
dlc: debt in current liabilities (millions)
dltt: long term debt (millions)
ebitda: earnings before interest, depreciation and amortisation (millions)
invt: inventory (millions)
ppent: property, plant and equilibrium (millions)
sale: net sales (millions)
prcc_f: end of fiscal year share price (dollars)
Requirements
• [5 marks] [just copy your code for this part]
First, remove observations with SIC codes between 6000 and 6799 and keep observations with fiscal year between 1970 and 2015.
Second, construct variables based on the sample remaining from the first step using the following definitions:
• Book leverage: book leverage=total liability/total asset=(DLTT+DLC)/AT;
• Cash holding: cash holding=cash and short-term investments/total asset=CHE/AT;
• Tangibility: tangibility=(Inventory+ Net Property, Plant and Equipment)/Total Asset=(INVT+PPENT)/AT;
• Market to book ratio: M/B= stock price of fiscal year/book value per share=PRCC_F/BKVLPS;
• Size: size=log(SALE)
• Profitability: profitability=EBITDA/Total Assets=EBITDA/TA;
b) Produce summary statistics table (number of observations, mean, median, standard deviation, Min, and Max) of the 6 variables above. Use sample from years of 1970-2015. (3 marks)
c) In order to study the determinants of leverage across firms, we have excluded firms from the financial industry [e.g., banks] (SIC: 6000 to 6799). Why do we need to exclude these firms? (1 mark)
d) Use sample from years of 1970-2015. Run an Ols regression. The Y variable is Book leverage. The X variables are Tangibility, Market to book ratio, Size and Profitability.
You can use the method of Rajan and Zingales (1995) by averaging the Y and X variables across fiscal years for each firm and then run the OLS regression.
[Note: by doing so, we finally reach a cross sectional data set, and each firm has one Y variable and four X variables. An easy way to achieve that is to use the function “aggregate” https://www.statmethods.net/management/aggregate.html; for instance, aggregate(.~gvkey,data=data,mean) ] ;
Present your estimation results in an organized table (similar to Table IX of Ranjan and Zingales, 1995). [5 marks]
e) Run TSLS on the sample in d, where Y variable is Book leverage, and the X (endogeneous) variable is Tangibility and Z (instrument) variable is size, use Market to book ratio and profitability as exogeneous variables. Present the regression results. [4 marks]
[Hint: you can install the package of “AER”, and then use the function of “ivreg”].
d) Media commentators have been saying that U.S. corporations hold too much cash for
a while. Plot the median and mean of cash holding by year. Is there any trend in cash holding in your sample? (2 marks)
[Note: each year from 1970 to 2015, you need to calculate the median and mean of the cash holdings for all the firms. An easy way to achieve that is to use the function “aggregate”. Then you can plot it.].