
The value of our asset is extremely sensitive to the “Original Agreed Value” (i.e. the value of the home at origination of the deal). However, without an actual market transaction, we rely on AVMs (Automated Valuation Models) to estimate a fair value. In this exercise, we want to evaluate how accurate our AVMs are and determine how we can combine them into an even better AVM (i.e. boosting).

Open the Excel workbook AVMBoosting.xlsx. Note the following columns:

  1. Value: the true value that we want to estimate with AVMs.
  2. AVMs 1-5: the value estimates generated by five different models.
  3. AVMs Standard Error 1-5: the standard error (standard deviation of the estimate) reported for each AVM.

*Note: a substantial share of the data is missing. Please use good judgment in deciding how to handle it.
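One possible first pass at the missing data is sketched below. The column names (Value, AVM1..AVM5, AVM1_SE..AVM5_SE) are assumptions, since the actual headers in AVMBoosting.xlsx are not listed here, and the toy rows are made up purely to exercise the function. The sketch keeps a row only if it has a true Value and at least one AVM estimate, and leaves the remaining gaps as NaN rather than imputing them.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real headers in AVMBoosting.xlsx may differ.
AVM_COLS = [f"AVM{i}" for i in range(1, 6)]
SE_COLS = [f"AVM{i}_SE" for i in range(1, 6)]
ALL_COLS = ["Value"] + AVM_COLS + SE_COLS


def clean_avm_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows that can be scored; leave other gaps as NaN."""
    out = df.copy()
    # A non-positive home value or standard error is treated as missing.
    out[ALL_COLS] = out[ALL_COLS].where(out[ALL_COLS] > 0)
    # A usable row needs the true Value and at least one AVM estimate.
    out = out.dropna(subset=["Value"])
    out = out.dropna(subset=AVM_COLS, how="all")
    return out.reset_index(drop=True)


# With the real workbook one would start from: df = pd.read_excel("AVMBoosting.xlsx")
# Made-up rows for illustration only; unspecified columns are filled with NaN.
toy = pd.DataFrame(
    {
        "Value": [300_000.0, np.nan, 450_000.0, 200_000.0],
        "AVM1": [310_000.0, 250_000.0, np.nan, np.nan],
        "AVM2": [np.nan, 240_000.0, 440_000.0, np.nan],
        "AVM1_SE": [-1.0, 20_000.0, np.nan, np.nan],
    }
).reindex(columns=ALL_COLS)

# Share of missing estimates per AVM, worth reporting before any cleaning.
missing_share = toy[AVM_COLS].isna().mean()
cleaned = clean_avm_frame(toy)
print(missing_share)
print(cleaned[["Value", "AVM1", "AVM2"]])
```

Whether to drop, impute, or model around the gaps is a judgment call; dropping only unscorable rows is the least invasive choice and keeps the later row-wise averages honest about which AVMs were actually available.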

Please complete the following exercises in a Jupyter notebook (or a comparable Python notebook).

 

2a. Using descriptive statistics and plots, describe the performance of AVM 1. Specifically, plot the percentage (or log) error between the AVM 1 estimate and the true “Value”. Comment on the shape of the distribution (mean, median, variance and tail behavior). Repeat for AVMs 2-5.
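The error measures in 2a could be computed along the following lines. This is a minimal sketch on simulated numbers, not on the workbook: the helper names log_error, pct_error and describe_errors are hypothetical, and the simulated AVM (a 2% upward bias with 10% noise) is made up so that the summary has something to describe.

```python
import numpy as np
import pandas as pd


def log_error(estimate: pd.Series, value: pd.Series) -> pd.Series:
    """log(estimate / value); rows with a missing side are dropped."""
    return np.log(estimate / value).dropna()


def pct_error(estimate: pd.Series, value: pd.Series) -> pd.Series:
    """(estimate - value) / value; rows with a missing side are dropped."""
    return (estimate / value - 1.0).dropna()


def describe_errors(err: pd.Series) -> pd.Series:
    """Location, spread and tail summaries of an error series."""
    return pd.Series(
        {
            "n": err.size,
            "mean": err.mean(),
            "median": err.median(),
            "std": err.std(),
            "skew": err.skew(),
            "excess_kurtosis": err.kurt(),  # 0 for a normal distribution
            "p05": err.quantile(0.05),
            "p95": err.quantile(0.95),
        }
    )


# Simulated data for illustration only.
rng = np.random.default_rng(0)
value = pd.Series(np.exp(rng.normal(12.5, 0.4, size=2000)))
avm_1 = value * np.exp(rng.normal(0.02, 0.10, size=2000))
avm_1.iloc[:200] = np.nan  # mimic missing estimates

errors = log_error(avm_1, value)
summary = describe_errors(errors)
print(summary)
# In a notebook, errors.plot.hist(bins=50) draws the histogram (requires matplotlib).
```

Log errors treat over- and under-estimates symmetrically (a doubling and a halving are equally far from zero), which percentage errors do not; that is one reason to prefer them when commenting on skewness and tails.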

 

2b. Create a simple “model 1” for estimating the “Value” by taking an average of AVMs 1-5 for each property. Plot the percentage (or log) error between the model 1 estimates and the true “Value” in the training set. Comment on the shape.
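A row-wise average in pandas might look like the sketch below. The column names and the three toy rows are assumptions made for illustration. Note that DataFrame.mean skips NaN by default, so each property is averaged over whichever AVMs are present, and a property with no AVM at all gets NaN.

```python
import numpy as np
import pandas as pd

AVM_COLS = [f"AVM{i}" for i in range(1, 6)]  # hypothetical header names

# Made-up rows: complete, partially missing, and fully missing AVMs.
toy = pd.DataFrame(
    [
        [100.0, 100.0, 110.0, 90.0, 105.0, 95.0],
        [200.0, 200.0, np.nan, 220.0, np.nan, np.nan],
        [300.0, np.nan, np.nan, np.nan, np.nan, np.nan],
    ],
    columns=["Value"] + AVM_COLS,
)

# Model 1: simple average of the available AVMs for each property.
model_1 = toy[AVM_COLS].mean(axis=1)
log_err_1 = np.log(model_1 / toy["Value"])
print(pd.DataFrame({"model_1": model_1, "log_error": log_err_1}))
```

Averaging over a varying number of AVMs means the error variance differs from row to row, which is worth keeping in mind when commenting on the shape of the error distribution.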

 

2c. Repeat 2b with a “model 2” which takes the median of AVMs 1-5 for each property. Compare the results with 2b. When is it better, and when is it worse?
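A small simulation can illustrate the trade-off between the mean and the median. Everything here is made up for illustration (noise levels, the 10% outlier rate, the helper name log_error_spread): in the first scenario all five AVMs have well-behaved Gaussian log errors, and in the second a fraction of estimates is occasionally far off.

```python
import numpy as np


def log_error_spread(log_errors: np.ndarray) -> dict:
    """Std of the log error of the row-wise mean and median of the estimates."""
    estimates = np.exp(log_errors)  # true Value normalised to 1
    model_1 = estimates.mean(axis=1)
    model_2 = np.median(estimates, axis=1)
    return {
        "mean": float(np.log(model_1).std()),
        "median": float(np.log(model_2).std()),
    }


rng = np.random.default_rng(42)
n, k = 5000, 5

# Scenario A: every AVM has Gaussian log errors with 10% noise.
clean = rng.normal(0.0, 0.10, size=(n, k))

# Scenario B: 10% of the estimates receive an additional, much larger error.
is_outlier = rng.random((n, k)) < 0.10
contaminated = clean + is_outlier * rng.normal(0.0, 1.0, size=(n, k))

spread_clean = log_error_spread(clean)
spread_dirty = log_error_spread(contaminated)
print("clean:       ", spread_clean)
print("contaminated:", spread_dirty)
```

Under these assumptions the mean is the tighter estimator when errors are light-tailed, while the median is the tighter one once a few AVMs can be badly wrong; the real data decide which regime applies.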

 

2d. Repeat 2b with a weighted average.  How can you use AVM standard errors to determine correct weights? (hint: heteroskedasticity)
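One common reading of the hint is inverse-variance weighting: each AVM gets a weight proportional to 1/SE², so noisier estimates count for less. The sketch below assumes the standard errors are in the same units as the estimates and that the AVM errors are unbiased and independent, none of which is confirmed by the exercise text. The function name and the two-column toy frames are hypothetical; the same code works for five columns.

```python
import numpy as np
import pandas as pd


def inverse_variance_average(avm: pd.DataFrame, se: pd.DataFrame) -> pd.Series:
    """Row-wise weighted average with weights proportional to 1 / SE**2.

    An AVM is ignored for a row if its estimate or its SE is missing or unusable.
    """
    est = avm.to_numpy(dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        w = 1.0 / np.square(se.to_numpy(dtype=float))
    usable = np.isfinite(est) & np.isfinite(w)
    w = np.where(usable, w, 0.0)
    est = np.where(usable, est, 0.0)
    total = w.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        weighted = (w * est).sum(axis=1) / total
    # Rows with no usable AVM get NaN instead of a division artefact.
    return pd.Series(np.where(total > 0, weighted, np.nan), index=avm.index)


# Made-up rows: both AVMs present, one SE-less AVM, and nothing usable.
avm = pd.DataFrame({"AVM1": [100.0, 100.0, np.nan], "AVM2": [200.0, 200.0, np.nan]})
se = pd.DataFrame({"AVM1_SE": [10.0, 10.0, 10.0], "AVM2_SE": [20.0, np.nan, 20.0]})

model_3 = inverse_variance_average(avm, se)
print(model_3)
```

In the first row the weights are 0.01 and 0.0025, so the estimate with half the standard error carries four times the weight. If the reported standard errors turn out to be poorly calibrated, weights estimated from the observed errors in 2a may be a better choice.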

 

2e. Create your own model for estimating the “Value” using the AVMs provided. Describe your model in detail and explain why and when you expect it to outperform the previous methods. Split your data into training and test sets to determine how it performs in-sample vs. out-of-sample.
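As one illustration of the train/test workflow (not the expected answer), the sketch below fits a linear combination of the log AVMs to the log Value by least squares, sometimes called linear stacking, and compares its out-of-sample error with the simple average. The data are simulated with made-up biases and noise levels and contain no missing values; on the real workbook the missing estimates would have to be handled first.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000

# Simulated data for illustration only: five AVMs with different bias and noise.
sigma = np.array([0.05, 0.08, 0.10, 0.15, 0.20])
bias = np.array([0.02, -0.01, 0.00, 0.03, -0.02])
log_value = rng.normal(12.5, 0.4, size=n)
log_avm = log_value[:, None] + bias + rng.normal(0.0, sigma, size=(n, 5))

# Random 80/20 split into training and test rows.
idx = rng.permutation(n)
train, test = idx[:1600], idx[1600:]


def design(x: np.ndarray) -> np.ndarray:
    """Add an intercept column so a constant bias can be absorbed."""
    return np.column_stack([np.ones(len(x)), x])


def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


# Fit the combination weights on the training rows only.
coef, *_ = np.linalg.lstsq(design(log_avm[train]), log_value[train], rcond=None)

stack_in = rmse(design(log_avm[train]) @ coef, log_value[train])
stack_out = rmse(design(log_avm[test]) @ coef, log_value[test])

# Benchmark: log error of the simple average of the five estimates (model 1).
mean_out = rmse(np.log(np.exp(log_avm[test]).mean(axis=1)), log_value[test])

print(f"stacking  in-sample RMSE: {stack_in:.4f}")
print(f"stacking out-of-sample  : {stack_out:.4f}")
print(f"simple mean out-of-sample: {mean_out:.4f}")
```

A fitted combination can correct systematic bias and down-weight noisy AVMs without relying on the reported standard errors, but it has parameters to estimate, so a large gap between in-sample and out-of-sample error would be a sign of overfitting.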