CS计算机代考程序代写 finance Excel Assignment 1

Assignment 1
Empirical Finance: Methods and Applications February 13, 2021
• Datasets for problems 4, 5, and 6 are available on insendi.
• You should submit a single pdf solution containing answers to all sub-parts of all problems (including
4-7). Typewritten solutions are preferred but handwritten and scanned solutions are acceptable.
• Marks for each problem are listed below.
• In addition, please submit code for problems 4-7 in the form of an R project. This should be a zipped folder that contains an R Project, a single R file with answers to all relevant parts of all problems, and all csv files (including those for 4-6 and any you download for problem 7). I should be able to download and run your R file directly. Please comment your code to make it as easy to interpret as possible.
• Your marks depend on clarity of exposition in solutions and code. This includes figures and regression results.
• You may discuss all problems with classmates but each student must independently write and submit their own solution. Solutions and code that have been clearly copied will cause the full assignment to receive 0 marks and may invite further disciplinary action.
Problem 1 (5 marks)
Suppose we see 5 observations of yi, Di, shown in the table below:
Consider the following linear model:
yi Di 10 81 41 00 31
yi = δ0 + δ1Di + vi.
Suppose we estimate this model on the data above via OLS. Please explicitly find δˆOLS and δˆOLS. 01
Solution: δ0 = 0.5, δ1 = 4.5.
1

Problem 2 (10 Marks)
Relative to the United Kingdom, the United States has borrower friendly laws surrounding residential mort- gage default. Many US states are Non-Recourse—that is, if borrowers stop making the mortgage payments, lenders cannot hold them responsible beyond seizing the home itself. On the other hand, the United King- dom has Full-Recourse: lenders may seize cars, investments, garnish wages, et cetera. Many believe that the relative leniency of laws in the United States is responsible for higher rates of mortgage default.
For the sake of simplicity, assume laws may take only two forms: Non-Recourse (in the United States) or Full-Recourse (in the United Kingdom). Imagine we are interested in the causal (treatment) effect of Non-Recourse laws on mortgage default.
(a) Denote mortgage default for a borrower i by Di. In potential outcomes notation, write the average treatment effect of Non-Recourse laws on default. (3 marks)
Solution: Define Di1 to be the potential outcome for borrower i in the presence of Non-Recourse laws. Define Di0 to be the potential outcome for borrower i under Full-Recourse laws. The average treatment effect is defined to be:
ATE = E[Di1 − Di0]
(b) Suppose we compare the average default rates in the United States to the average default rates in the
United Kingdom. Write this comparison in potential outcomes notation. (3 marks) Solution:
E[Di1|Borrower i in US] − [Di0|Borrower i in UK]
(c) Why does the expression in part (a) differ from that in part (b)? Please provide an explanation that is not simply mathematical, but that provides some intuition. Would you expect the answer in (b) to be higher or lower than that in (a)? Why? (4 marks)
Solution: There are many ways to describe why the expression in (a) and (b) might be different. One way is to break the above into two components:
E[Di1|Borrower i in US] − [Di0|Borrower i in UK] = E[Di1|Borrower i in US] − E[Di0|Borrower i in US] 􏰐 􏰏􏰎 􏰑
Effect of Non-Recourse in US
+ E[Di0|Borrower i in US] − [Di0|Borrower i in UK]
􏰐 􏰏􏰎 􏰑
Difference in Default in US vs. UK under Full-Recourse ̸= E[Di1 − Di0]
The first term is often referred to as the treatment effect on the treated, and captures the fact that recourse laws might impact borrowers in the US differently than the UK (perhaps because of other regulatory differences or personality types). The second term is often referred to as the selection effect, and captures the fact that borrowers in different countries might have differences in default behavior, even in the absence of any difference in bankruptcy laws.
In general, cogent arguments that (a) is higher or lower than (b) can be made. The important part is to directly tie it to the framing. One example is that borrowers in the US may be less concerned with the social stigma surrounding default than those in the UK. This might be evidence for for the existence of a selection effect:
E[Di0|Borrower i in US] − [Di0|Borrower i in UK] > 0 which would cause (b) to be higher than (a).
2

Problem 3 (10 marks)
Suppose the relationship between yi and xi is as follows:
yi = β0 + β1xi + vi,
where xi is observable, E[vi|xi] = 0 and E[xi] = 0. However, suppose we do not see yi, but instead observe
yi∗ = yi + ηi. Consider the regression:
You may assume that ηi has mean 0 and variance ση2.
y i∗ = β 0 + β 1 x i + u i ,
(a) Suppose that Cov(x , η ) = 0. Will the OLS estimator βols using y∗ instead of y be biased for β ? Show
ii1ii1 why or why not. (5 marks)
Solution:
y∗ = β0 + β1xi + vi + ηi 􏰐 􏰏􏰎 􏰑
ui
βols = cov(β0 +β1xi +vi +ηi,xi)
var(xi)
=β var(xi)+cov(xi,vi+ηi)
1 var(xi) var(xi) = β1
(b) Suppose instead that ηi = γxi +εi, where γ ̸= 0 and Xi and εi are independent. Will the OLS estimator βols using y∗ instead of y be biased for β ? Show why or why not. (5 marks)
1ii1
y∗ =β0 +β1xi +vi +γxi +εi 􏰐 􏰏􏰎 􏰑
ui
βols = cov(β0 +β1xi +γxi +εi +vi,xi)
var(xi)
=(β1 +γ)var(xi) + cov(xi,vi +εi)
var(xi) var(xi) = β1 + γ ̸= β1
3

Problem 4 (20 marks)
The dataset rollingsales manhattan.xls contains details on 2020 real estate transactions in Manhattan.1
(a) Load the data into R and perform the following basic data cleaning exercises: 2
• Relabel the column names to remove any spaces
– One trick is names(dataset) < − gsub(” ”, ” ”, names(dataset). • Remove any observations with the sale price equal to 0. Using this cleaned data, what neighborhood has the highest average sale price? (4 marks) Solution: The Fashion District has the highest average sale price. (b) Create a new variable equal to log(sale price). Create another variable representing the age of the property in 2020 (i.e. years since the year it was built). Run an OLS regression of log(sale price) on age and a set of dummy variables for each neighborhood (omitting one). Report the coefficient on age. What does this indicate about the relationship between age and sale price in the sample as a whole? (4 marks) Solution: The coefficient on age is -0.0028. This indicates a slight negative relationship between age and sale price. (c) Run an OLS regression of log(sale price) on age, but use only data from the Upper East Side below 79th street.3 Report the coefficient on age. What does indicate about the relationship between age and sale price in this particular neighborhood. (4 marks) Solution: The coefficient on age is 0.008. This indicates a slightly larger positive relationship between age and sale price. (d) Plot the mean and median sale price and the total quantity of sales across months in 2020. This can be on multiple figures or a single figure, and you may choose the plotting style that you feel best presents the data. Please comment on and discuss any major patterns you see in these plots. (4 marks) Solution: Figures 1 and 2 below show the relevant plots. Students could comment on many aspects of the data: the extreme difference between mean and median, the drop off in sales in March, the spike in mean price in late spring, etc (although digging into the source of this last point is important). (e) Create a chart showing a new (and hopefully interesting) pattern of your choice using this data. This may be a plot of any type, and may relate to the sale price or not. Please briefly describe the plot you have created. (4 marks) 1This data, and data for other New York City boroughs, can be found at https://www1.nyc.gov/site/finance/taxes/ property- rolling- sales- data.page. 2One option is to covert this to a .csv file and use the read csv command we have worked with in class. However, I recommend using library(readxl) and the command read xls(). Note that the subcommand skip (e.g. read xls(dataset, skip=...)) can be useful for getting rid of the useless rows at the top of an excel spreadsheet. 3NEIGHBORHOOD==”UPPER EAST SIDE (59-79)”. 4 Figure 1: Mean and Median Sale Price by Month 100 75 50 25 0 3 6 9 12 Month Mean Median Figure 2: Quantity of Sales by Month 1250 1000 750 500 250 0 3 6 9 12 Month 5 Quantity of Sales Price (100k USD) Problem 5 (15 marks) On February 1st, 2015 the New England Patriots won Superbowl 49 to claim the NFL championship for the 2014-2015 season. The NFL commissioner believed that this had a major beneficial effect on the US economy and requested a study of the impact on stock prices. His researchers collected weekly data on stock prices for set of US firms from February 2014 to January 2016. Meanwhile—because they were worried about the potential for aggregate time trends—the researchers collected data from a set of Chinese firms included in the SSE Index over the same period to use as a control group. On insendi you will find a dataset labeled patriots.csv. In it you will find four variables: • exchange: A string variable indicating whether the stock is listed on the Shanghai Stock Exchange (sse) or the New York Stock Exchange (nyse) • date: Date • stock id: A variable assigning a unique id number for each stock • price: The closing price of the stock in question on the listed date (a) In R, perform a difference-in-difference analysis that compares US and Chinese stock prices, before and after February 1st, 2015. In particular, let Di be a dummy variable that is equal to one for US stocks and 0 for Chinese stocks. Let Tt be a dummy variable that is equal to 1 for any day after February 1st 2015 and 0 otherwise. Let yi represent the share price. Estimate the following regression model: yit =β0 +β1Di ×Tt +β2Di +β3Tt +vit Report the values of βˆOLS, βˆOLS, βˆOLS, and βˆOLS. (5 marks) Solution: βˆ0 = $77.67 βˆ1 = −$42.17 βˆ2 = $21.23 βˆ3 = $45.15 (b) If the necessary assumptions were true, which of these coefficients would represent the causal effect of a Patriots win on US stock prices. What would your estimated parameter suggest about this effect? (5 marks) Solution: If the difference-in-difference assumptions hold, βˆ1 = −$42.17 may be interpreted as the causal effect of the patriots win on stock prices. Under those assumptions, this would suggest that Superbowl win had a negative impact on stock prices. (c) Create a plot of the average stock prices in (i) the US and (ii) China for every week between February 2014 and January 2016. Looking at the plot, do you believe the answer you suggested in part (b) is an accurate representation of the impact of the Patriots Superbowl win on US stock prices? Explain why or why not with explicit reference to the assumptions necessary for a valid difference-in-difference approach. (5 marks) Solution: The plot suggests that the parallel trends assumptions may not hold. Given the sharp rise in the average price of Chinese stocks beginning in late 2014, it seems unlikeley that the average change in prices in the US—had the Patriots not won the superbowl—would have been the same as the observed pattern in China. 0123 6 Notes: Not real data. Rough approximation of 2014/2015 patterns. Figure 3: Average Stock Price By Exchange 200 150 100 50 0 2014−01 2014−07 2015−01 2015−07 Date 2016−01 exchange nyse sse 7 Price Problem 6 (15 marks) Disclaimer: this question uses techniques introduced in week 4, so you may want to wait before beginning. On insendi you will find two data files: regularization train.csv and regularization test.csv. Each contains a single y variable and 50 x variables labeled x1, x2, · · · , x50 (a) Set the seed using set.seed(1234). Use cross validation (with 10 folds) to choose λ for a LASSO regression of y on all x1, x2, · · · , x50. Report the following: (5 marks) 1. The value of λ that minimizes the cross-validated error. Solution: λmin = 0.0389 2. The largest value of λ that provides cross-validated error within one standard error of the minimum Solution: λ1se = 0.119 3. The number of non-zero coefficients in the model estimated with λ from part 2. Solution: 4 coefficients not including the intercept. 4. The mean squared error from an out of sample test of the model using the testing data regulariza- tion test.csv (again using λ from part 2). ˆ Solution: MSE = 1.041 (b) Repeat the exercise in part (a) using 20 fold cross validation. (5 marks) ˆ Solution: λmin = 0.043, λ1se = 0.119, 4 coefficients, MSE = 1.041. (c) Repeat the exercise in part (a) using elastic-net with α = 0.1. (5 marks) ˆ Solution: λmin = 0.088, λ1se = 0.244, 26 coefficients, MSE = 1.041. 8 Problem 7 (25 marks) Create a single compelling plot of your choice using financial data and ggplot. • You must download or otherwise acquire some financial or financially relevant data. This may be from Bloomberg, an online provider (e.g. Yahoo Finance or fred.stlouisfed.org), or any other source. • Your plot should demonstrate an interesting stylized fact. This could be, for example, the short run price response of an asset or group of assets to some salient event, long run changes in investor flows, or anything that you find exciting. Please be creative and try to identify something that I or your classmates might find interesting. • Each student must perform this task individually. You are responsible for acquiring data yourself. You may not use data that has been downloaded by a classmate and you may not produce the same plot as a classmate. • Your answer to this question should cover a single page. The top half of the page will contain the plot, the bottom half will contain an explanation of the stylized fact. Feel free to supplement with regression analysis or other support from the data. • Your graph will be judged on clarity. Pay attention to labeling and scaling. • I will select a few interesting plots to present and discuss in class (with permission of the authors). 9