Sample exam
Writing Time: 120 mins Total Duration 120 mins
Instructions to Candidate:
1. Answer ALL SIX (6) questions.
2. This is a Closed Book examination.
3. You should answer all questions in the answer book and should begin each answer on a
new page in the answer book.
4. Please allocate your time according to the percentage contribution of the questions.
5. Examination materials must NOT be removed from the examination room.
6. Please provide all relevant calculations.
7. The exam is worth a total of 50 points.
Good luck!
************************
Question 1. What are the advantages of using a sample from “big data” rather than the full “big data” (the answer should not exceed 10 sentences)? [8 points]
Agenda for Questions 2-4.
Suppose you are a real estate analyst. You obtained the Melbourne real estate data for the 2016-2018 period. The data set includes the following variables:
Price: Price in Australian dollars
Method: S – property sold in auction; SP – property sold prior to auction
Year: Year sold
Distance: Distance from CBD in kilometres
Suburb: Suburb
Propertycount: Number of properties that exist in the suburb
Rooms: Number of rooms
Bathroom: Number of bathrooms
Car: Number of car spots
Landsize: Land size in sq. metres
BuildingArea: Building size in sq. metres
Council: Council
You analysed the data using SAS. The questions below contain SAS output related to this data set.
Question 2. OLS regression.
The table below shows the regression output where the dependent variable is ‘Price’ and the independent variable is ‘Distance’:
a) What is the impact of ‘Distance’ on ‘Price’? Is the impact statistically significant?
b) How does ‘Price’ change if ‘Distance’ decreases by 2 km? Provide all calculations.
c) Discuss the residual plot. Based on the residual plot, would you recommend making any changes to the regression model? If yes, what changes and why?
d) What is the meaning (or interpretation) of the intercept in the regression model above?
e) Compute the predicted value of ‘Price’ for a property located 10 km from the CBD.
[9 points]
Question 3.
Most real estate transactions were recorded for the following councils: Boroondara City Council, Darebin City Council, and Council.
You decided to model the probabilities that a property is sold in each of these councils, using a multinomial logit regression. The following variables are used as the independent variables: the natural logarithm of ‘Price’, ‘Rooms’, ‘Bathroom’, and ‘Car’. The results are provided below.
The LOGISTIC Procedure
Model Information
Data Set                     WORK.HOUSES
Response Variable            CouncilArea
Number of Response Levels    3
Model                        generalized logit
Optimization Technique       Newton-Raphson
Number of Observations Read  987
Number of Observations Used  987
Response Profile
Ordered Value  CouncilArea              Total Frequency
1              Boroondara City Council  309
2              Darebin City Council     386
3              Council                  292
Logits modeled use CouncilArea=’ Council’ as the reference category.
Model Fit Statistics
Criterion  Intercept Only  Intercept and Covariates
AIC        2157.743        762.725
SC         2167.532        811.672
-2 Log L   2153.743        742.725
Testing Global Null Hypothesis: BETA=0
Test              Chi-Square  DF  Pr > ChiSq
Likelihood Ratio  1411.0181   8   <.0001
Score              859.4111   8   <.0001
Wald               328.4687   8   <.0001
Analysis of Maximum Likelihood Estimates
Parameter  CouncilArea              DF  Estimate   Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept  Boroondara City Council  1   -250.8     14.5711         296.1921         <.0001
Intercept  Darebin City Council     1   -142.9     11.8369         145.8268         <.0001
ln_price   Boroondara City Council  1     18.7642   1.0949         293.7257         <.0001
ln_price   Darebin City Council     1     11.0258   0.9031         149.0455         <.0001
Rooms      Boroondara City Council  1     -1.8106   0.3448          27.5780         <.0001
Rooms      Darebin City Council     1     -1.0068   0.2744          13.4648         0.0002
Bathroom   Boroondara City Council  1     -0.8175   0.3759           4.7292         0.0297
Bathroom   Darebin City Council     1     -1.0419   0.2996          12.0922         0.0005
Car        Boroondara City Council  1     -0.4602   0.2152           4.5746         0.0324
Car        Darebin City Council     1     -0.4806   0.1671           8.2734         0.0040
a) Which properties are more likely to be sold in Darebin City Council compared to Council? In the discussion, consider those independent variables that are statistically significant at the 10% level.
b) Which properties are more likely to be sold in Boroondara City Council compared to Council? In the discussion, consider those independent variables that are statistically significant at the 10% level.
c) Compute the predicted probabilities that the following property is sold in each council:
Price = 2100000
Bathroom = 2
d) Based on the results in part c), in which council is this property most likely to be sold?
[9 points]
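For reference, the way a generalized logit model turns coefficients into predicted probabilities can be sketched in Python as follows. This is only an illustrative sketch, not part of the SAS output: the function name `predicted_probabilities` and the label `Council (reference)` are made up here, and the values passed for Rooms and Car in the usage line are hypothetical, because the question text above lists only Price and Bathroom. The reference category is assumed to have all coefficients equal to zero, as is standard for this model.

```python
import math

# Coefficients copied from the "Analysis of Maximum Likelihood Estimates" table.
COEFS = {
    "Boroondara City Council": {
        "Intercept": -250.8, "ln_price": 18.7642, "Rooms": -1.8106,
        "Bathroom": -0.8175, "Car": -0.4602,
    },
    "Darebin City Council": {
        "Intercept": -142.9, "ln_price": 11.0258, "Rooms": -1.0068,
        "Bathroom": -1.0419, "Car": -0.4806,
    },
}
REFERENCE = "Council (reference)"  # illustrative label for the reference category


def predicted_probabilities(price, rooms, bathroom, car):
    """Return a dict of predicted probabilities, one per council."""
    x = {
        "Intercept": 1.0,
        "ln_price": math.log(price),  # the model uses the natural log of Price
        "Rooms": rooms,
        "Bathroom": bathroom,
        "Car": car,
    }
    # Linear predictor (log-odds versus the reference) for each council.
    scores = {c: sum(b[k] * x[k] for k in b) for c, b in COEFS.items()}
    scores[REFERENCE] = 0.0  # reference category: all coefficients are zero
    # Subtract the largest score before exponentiating to avoid overflow.
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical inputs: rooms=3 and car=1 are NOT given in the question.
probs = predicted_probabilities(price=2100000, rooms=3, bathroom=2, car=1)
for council, p in probs.items():
    print(f"{council}: {p:.4f}")
```

The council with the largest probability is the most likely one under the fitted model; the three probabilities always sum to one.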
Question 4.
You would like to forecast a future bond yield using time series models. The variable of interest is the yield on corporate bonds (‘WSPCA’). You may also consider its first difference (‘d_WSPCA’). You used the SAS procedure ARIMA to check the stationarity of the variables:
proc arima data=work.yield;
identify var= WSPCA stationarity=(DICKEY);
run; quit;
SAS output for ‘WSPCA’ is as follows:
Augmented Dickey-Fuller Unit Root Tests
Type         Lags  Rho       Pr < Rho  Tau    Pr < Tau  F     Pr > F
Zero Mean    0      -0.1721  0.6433    -0.77  0.3831
Zero Mean    1      -0.1816  0.6412    -0.77  0.3853
Zero Mean    2      -0.1955  0.6380    -0.77  0.3841
Single Mean  0      -9.5817  0.1465    -2.10  0.2450    2.43  0.4490
Single Mean  1     -11.5152  0.0913    -2.31  0.1697    2.88  0.3327
Single Mean  2     -14.3716  0.0447    -2.59  0.0973    3.56  0.1600
Trend        0     -10.0040  0.4306    -2.16  0.5097    2.36  0.7045
Trend        1     -11.9295  0.3127    -2.36  0.3999    2.80  0.6162
Trend        2     -14.7717  0.1867    -2.62  0.2699    3.45  0.4857
data work.yield;
set work.yield;
d_WSPCA=WSPCA-lag(WSPCA);
proc arima data=work.yield;
identify var= d_WSPCA stationarity=(DICKEY);
run; quit;
SAS output for ‘d_WSPCA’ is as follows:
Augmented Dickey-Fuller Unit Root Tests
Type         Lags  Rho       Pr < Rho  Tau     Pr < Tau  F       Pr > F
Zero Mean    0     -297.174  0.0001    -16.73
Zero Mean    1     -257.490  0.0001    -11.32
Zero Mean    2     -305.578  0.0001    -10.10
Single Mean  0     -297.616  0.0001    -16.72  <.0001    139.84  0.0010
Single Mean  1     -258.603  0.0001    -11.33  <.0001     64.21  0.0010
Single Mean  2     -308.248  0.0001    -10.11  <.0001     51.13  0.0010
Trend        0     -297.656  0.0001    -16.70  <.0001    139.42  0.0010
Trend        1     -258.653  0.0001    -11.31  <.0001     64.01  0.0010
Trend        2     -308.374  0.0001    -10.10  <.0001     50.97  0.0010
a) Given the results above, which variable should be used in the time series analysis: ‘WSPCA’ or ‘d_WSPCA’?
b) Which variable is a stationary time series? Why?
[6 points]
Question 5.
You would like to forecast future bond yields using time series models. The variable of interest is ‘d_WSPCA’. You used the SAS procedure ARIMA to find the optimal ARIMA model:
proc arima data=work.yield;
identify var= d_WSPCA stationarity=(DICKEY) SCAN ESACF;
run; quit;
SAS output is as follows:
ARMA(p+d,q) Tentative Order Selection Tests
SCAN ESACF p+d q p+d q 22 21 43
a) What ARMA model or models does SAS recommend?
You estimated the ARMA model (using SAS procedure ARIMA) where the dependent variable is ‘d_WSPCA’. SAS provided the following output:
Conditional Least Squares Estimation
Parameter  Estimate    Standard Error  t Value  Approx Pr > |t|  Lag
MU         -0.0046173  0.0077681       -0.59    0.5527           0
MA1,1      -0.52671    0.45371         -1.16    0.2466           1
MA1,2      -0.11366    0.05624         -2.02    0.0441           2
AR1,1      -0.46535    0.45597         -1.02    0.3082           1
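As a general reminder of how one-step forecasts and residuals follow from estimated ARMA coefficients, a minimal Python sketch is given below. It rests on stated assumptions rather than on anything in the output above: the model is written in the PROC ARIMA sign convention, (1 - phi_1 B - ...)(y_t - mu) = (1 - theta_1 B - ...) a_t, and every pre-sample deviation and pre-sample residual is treated as zero. The function name `one_step_forecasts` and the coefficient values in the usage line are hypothetical and deliberately unrelated to the table.

```python
def one_step_forecasts(y, mu, ar, ma):
    """One-step-ahead forecasts and residuals for an ARMA model.

    Assumed form: (1 - ar[0]*B - ...)(y_t - mu) = (1 - ma[0]*B - ...) a_t.
    Terms that would need data before the first observation are dropped,
    i.e. pre-sample deviations and residuals are taken to be zero.
    """
    forecasts, residuals = [], []
    for t in range(len(y)):
        f = mu
        # AR part: past deviations of the series from its mean.
        for i, phi in enumerate(ar, start=1):
            if t - i >= 0:
                f += phi * (y[t - i] - mu)
        # MA part: past residuals enter with a minus sign in this convention.
        for j, theta in enumerate(ma, start=1):
            if t - j >= 0:
                f -= theta * residuals[t - j]
        forecasts.append(f)
        residuals.append(y[t] - f)  # residual = actual - forecast
    return forecasts, residuals


# Hypothetical series and coefficients, chosen only to show the call.
fc, res = one_step_forecasts([0.2, -0.1, 0.3], mu=0.0, ar=[0.4], ma=[0.3])
print(fc)
print(res)
```

Under these assumptions the first forecast always equals the mean term, since no earlier observation or residual is available.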
b) What are the ARMA orders in the model above?
c) Which coefficient estimates are statistically significant (at 5% level) in the table above?
d) Find the predicted values (FORECAST) and residuals (RESIDUAL) for the first 3 observations:
Observation  d_WSPCA  FORECAST  RESIDUAL
1            0.056
2            0.08
3            0.01
[10 points]
Question 6. What is the purpose of Monte Carlo simulation? Describe in your own words how to implement a Monte Carlo simulation (the answer should not exceed 20 sentences). [8 points]