DAT 537 FINAL EXAM
\sf This is the final exam. You may not consult any material except materials on canvas or summaries of those materials. You may not consult with anyone or use any other materials on the web. The honor code is in place. You have 15 minutes to go over the exam and decide if you want to take it. If you decide not to take the exam, please send an email to me with the subject: sec (your section number): opting out of the final exam. If you decide to take the exam, you have 3 hours to complete it.
The data for all questions are on canvas under the final exam folder. First download these data files onto your computer, and then use the readRDS() function to bring the data sets into your R session.
Please answer the questions in this rmd file. You have to select the correct answer on canvas. The code for all the questions, including the multiple choice questions, must be put into this rmd file. Upload this rmd file when you are done.
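
As a minimal sketch of the loading step, the snippet below reads an .rds file with readRDS(). The path and the tiny placeholder data frame are assumptions made only so the example runs on its own; in practice, point `path` at the file you downloaded from canvas.

```r
# Hypothetical path: replace with the location of the file downloaded from canvas.
path <- file.path(tempdir(), "data1.rds")

# Illustration only: write a small placeholder object so this sketch is
# self-contained. Skip this line when working with the real data file.
saveRDS(data.frame(y = c(1.2, 0.7), x1 = c(0.5, -0.3)), path)

# readRDS() restores the single R object stored in the file.
data1 <- readRDS(path)
str(data1)
```

Because an .rds file holds a single object, the result has to be assigned to a name of your choice.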
You are requested not to make a copy of this rmd file. Your compliance is appreciated.
Program Track:
Quantitative Finance
\pagebreak
# Problem 1 [30 points]
The data set data1.rds on canvas contains data on y and x1, x2, x3 and x4. You do not know which of these x’s are on the right-hand side of the regression. Suppose you are told that the error term in the regression is student-t with either nu = 4.5 or nu = 5.5. On the basis of these data and this information, answer the following questions.
1.1 What is the best value of nu and which variables enter the RHS of the regression? Do not use the last two rows of the data (these will be used later for prediction). Use seed = 101. Write your code here and choose one of the answers from canvas.
1.2 What are the .005 and .995 posterior quantiles of sigma (the square root of sigmasq) in the best model? Write your code here and choose one of the answers from canvas.
1.3 Use your best model and output on the parameters in question 1.1 to predict y[499] and y[500]. Use the x’s in the last two rows of data1 to find your mean forecasts of y[499] and y[500], and your .025 and .975 quantile forecasts. Use seed = 601. Write your code here and choose one of the answers from canvas.
# Problem 2 [30 points]
Consider the data in data2.RDS on the three outcome variables y = (y1,y2,y3). Suppose that these data have come from either a 3-dimensional Gaussian population with mean vector mu and covariance matrix Sigma, or from a MVT population with mean vector mu and dispersion matrix Delta and nu = 4 degrees of freedom.
2.1 Use the appropriate functions to estimate the Gaussian and MVT models. What is the logmarg of the Gaussian model and the logmarg of the MVT model? Use seed = 301 in estimating each model. Write your code here and choose one of the answers from canvas.
2.2 Now suppose you have to generate a forecast of yf, which is the next y beyond the last sample point. Given your posterior draws of mu and Sigma from the best model in question 2.1, generate forecasts of yf by the method of composition and use these draws on yf to calculate the mean forecast of yf. Use seed = 501. Write your code here and choose one of the answers from canvas.
2.3 Use the forecasts of yf in question 2.2 to calculate the .25 and .75 quantiles of the forecast distribution of yf1 and yf2. Write your code here and choose one of the answers from canvas.
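
The method of composition in questions 2.2 and 2.3 can be sketched in base R as follows. This is an illustration under stated assumptions, not the course's estimation code. The posterior draws `mum` and `Sigmam` are simulated placeholders rather than output from data2.RDS, their names are hypothetical, and the Gaussian model is assumed to be the one selected. The numbers it prints therefore will not match any answer on canvas.

```r
set.seed(501)
G <- 1000   # number of posterior draws (assumed)
d <- 3      # dimension of y

# Placeholder posterior draws, NOT estimated from data2.RDS.
mum <- matrix(rnorm(G * d, mean = 0, sd = 0.1), G, d)   # G x d draws of mu
Sigmam <- array(0, dim = c(d, d, G))                    # d x d x G draws of Sigma
for (g in 1:G) {
  A <- matrix(rnorm(d * d, sd = 0.1), d, d)
  Sigmam[, , g] <- diag(d) + crossprod(A)   # symmetric positive definite
}

# Method of composition: for each draw (mu_g, Sigma_g), draw yf ~ N(mu_g, Sigma_g).
yfm <- matrix(NA_real_, G, d)
for (g in 1:G) {
  R <- chol(Sigmam[, , g])                        # upper triangular, t(R) %*% R = Sigma_g
  yfm[g, ] <- mum[g, ] + drop(t(R) %*% rnorm(d))  # one forecast draw
}

colMeans(yfm)                                           # mean forecast of yf
apply(yfm[, 1:2], 2, quantile, probs = c(.25, .75))     # quantiles for yf1 and yf2
```

Each row of `yfm` is one draw from the predictive distribution, so forecast summaries are column-wise summaries of this matrix.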
# Problem 3 [30 points]
The data set data3.rds on canvas contains data on sales, price and disp of tuna brands bb, bbc and ge. On the basis of these data, you have to answer the following questions.
3.1 Suppose that you are interested in learning about the price elasticity of demand for bbc (the coefficient multiplying log-price in a log-log regression). You would like to estimate a student-t regression model. You would like to find the best model by searching over nu on the grid nug = seq(from = 2.5,to = 3.5,by = .1). For each nu, you estimate the regression t model for log salesbbc against log pricebbc and dispbbc with a seed of 251. What value of nu gives the best model? Write your code here and choose one of the answers from canvas.
3.2 What is the posterior median of the price elasticity (the price elasticity is the coefficient of log price) in the best model you found in 3.1? Write your code here and choose one of the answers from canvas.
3.3 Now suppose you estimate the log(sales) against log(price) and display model for all the three brands bb, bbc and ge jointly (meaning that you now assume that the errors in the log(sales) models are correlated). Assume a MVT distribution for the errors. Use seed = 251 and the best value of nu from question 3.1. What is the posterior median of the price elasticity of bbc in this case? Give your code here and choose one of the answers on canvas.
3.4 What is the log marginal likelihood of the model you just estimated in question 3.3? Give your code here and choose one of the answers on canvas.
3.5 What is the sum of the log-marginal likelihoods of the three log-sales models, assuming that the errors are independent student-t? Assume that nu is the best nu you found in question 3.1. Use seed = 251 in estimating each log-sales model. Give your code here and choose one of the answers on canvas.
# Problem 4 [40 points]
This question concerns the 12 factors that are in the file “data4.RDS” on canvas. These data are monthly excess returns (multiplied by 100) on these factors from January 1989 to December 2020.
You are interested in finding out the best set of risk-factors. You decide to use the Chib, Zhao and Zeng (2020) approach for this purpose.
4.1 Suppose that you search for the risk-factors only using data from Jan 1996 to Dec 2020. Use the Chib, Zhao and Zeng (2020) method with a trainpct of .2 to find the best risk-factors on this sample. What are the best risk-factors you find? Give your code here and choose one of the answers on canvas.
4.2 One way to check whether these factors are good is to see how many of the remaining factors in data4 are priced by these risk-factors. You decide to do this check using a student-t regression model for each of the remaining factors (which are now test assets), first estimating a model without an intercept and then with an intercept. To make the results more precise, you decide to find the best nu for the model without an intercept, and the best nu for the model with an intercept, searching in each case on the grid nug = seq(from = 3,to = 4,by = .2). You take the difference in marginal likelihoods for the models with the best nu (for example, the model without a constant for one factor may have a best nu of 3.2 and the model with a constant for that same factor may have a best nu of 4). You declare that a factor is priced if the difference in marginal likelihoods is greater than .69. Use a seed of 351 for estimating each model. Use m = 5000 MCMC draws to estimate each model. How many of the remaining factors are priced? Give your code here and choose one of the answers on canvas.
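
The decision rule in question 4.2 can be sketched for a single test asset as below. The log marginal likelihood values are made up for illustration, and the variable names are hypothetical; in practice the values would come from estimating each model at each nu on the grid. Two assumptions are built in: the comparison is made on the log scale (.69 is roughly log 2), and the difference is taken as the model without an intercept minus the model with an intercept.

```r
nug <- seq(from = 3, to = 4, by = .2)   # grid from the question

# Made-up log marginal likelihoods for ONE test asset, one value per nu in nug.
logmarg_noint <- c(-512.4, -511.9, -511.6, -511.8, -512.0, -512.3)  # no intercept
logmarg_int   <- c(-514.0, -513.5, -513.1, -513.0, -513.2, -513.6)  # with intercept

# The best nu is chosen separately for each specification.
best_nu_noint <- nug[which.max(logmarg_noint)]
best_nu_int   <- nug[which.max(logmarg_int)]

# Assumed direction: no-intercept model minus intercept model, on the log scale.
diff_lm <- max(logmarg_noint) - max(logmarg_int)
priced  <- diff_lm > .69

c(best_nu_noint = best_nu_noint, best_nu_int = best_nu_int, diff = diff_lm)
priced
```

Repeating this for every remaining factor and summing the `priced` indicators gives the count the question asks for.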
# Problem 5 [20 points]
Consider the same data as in Problem 4, but now suppose that instead of doing a model scan you estimate just one model: the model in which all 12 factors are risk-factors. Again suppose you work with January 1996 to December 2020 and you use the package czzg.
5.1 Estimate the factor model with all 12 factors as risk-factors. Use seed = 121 and assume multivariate normal errors. Let the output from this estimation be denoted by thetamf (f for full model). What is the log-marginal likelihood of this model? Give your code here and choose one of the answers on canvas.
5.2 Having estimated the model in question 5.1, you can determine which factors are risk-factors by looking at the posterior distribution of the SDF coefficients b = Ominv*lambda. The draws on b are sent out as an attr of thetamf called bm. Extract these draws and call the matrix bm. This matrix will be of dimension 10000 by 12. Use these draws to calculate the .005 and .995 quantiles of each column of bm. If the interval between the .005 and .995 quantiles lies entirely to the left or entirely to the right of zero, you declare that the corresponding factor is a risk-factor. Which factors are risk-factors from this method? Give your code here and choose one of the answers on canvas.
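
The quantile check in question 5.2 can be sketched as below. The matrix `bm` here is a simulated placeholder with three columns and hypothetical column names; with the real output it would be extracted as attr(thetamf, "bm") and would have 12 columns.

```r
set.seed(1)

# Placeholder draws standing in for attr(thetamf, "bm"); NOT real model output.
bm <- cbind(f1 = rnorm(10000, mean =  1, sd = 0.2),
            f2 = rnorm(10000, mean =  0, sd = 0.2),
            f3 = rnorm(10000, mean = -1, sd = 0.2))

# .005 and .995 posterior quantiles of each column (2 x ncol(bm) matrix).
q <- apply(bm, 2, quantile, probs = c(.005, .995))

# A factor is flagged when the interval lies entirely on one side of zero.
is_risk <- q[1, ] > 0 | q[2, ] < 0

q
names(which(is_risk))
```

In this placeholder example the first and third columns are flagged, because their intervals exclude zero, while the second is not.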