Handed out: 03/05/2019 Due by 5:59 PM, 03/12/2019
Use this Word document to describe all your steps after each problem. Show all commands and screen captures, and describe your steps. Points are deducted for missing steps.
Describe all steps for each problem and capture a screenshot of each step as a JPG image within this document. Do not forget to put your name at the top of this file for your submitted homework. Also, please replace _ALL_ text in red with your information! Please include in your MS Word document only the relevant portions of the console output or output files. Sometimes the console output or a result file is too long, and including all of it makes the document too hard to read. PLEASE DO NOT EMBED files into your MS Word document. You are not obliged to use Java or Eclipse. You are welcome to use any language and any IDE of your choice.
Problem 1:
Remove the header of the attached Samll_Car_Data.csv file and then import it into Spark. Randomly select 10% of your data for testing and use the remaining data for training. Look initially at horsepower and displacement. Treat displacement as the feature and horsepower as the target variable. Use MLlib linear regression to identify the model for the relationship. Use the test data to illustrate how accurately your model predicts the relationship. Create a diagram using D3 which presents the model (straight line), the original test data, and the predictions of your analysis. Please label your axes and use different colors for the original data and the predicted data.
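Hint: the workflow above (random 90/10 split, fit a straight line, score it on the held-out rows) can be sketched in plain Python before moving to Spark. This is only an illustration under stated assumptions, not the required MLlib solution: the data points are invented, and the helper names `split_rows`, `fit_line`, and `rmse` are hypothetical. In Spark, the split step would typically be done with `randomSplit` and the fit with MLlib's linear regression.

```python
import math
import random

# Hypothetical (displacement, horsepower) pairs, invented for illustration only.
# They are NOT rows from the assignment's CSV file.
rows = [
    (float(d), 0.3 * d + 40.0 + (5.0 if i % 2 else -5.0))
    for i, d in enumerate(range(100, 400, 10))
]

def split_rows(rows, test_fraction=0.1, seed=42):
    """Shuffle a copy of the rows and hold out test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # (intercept, slope)

def rmse(pairs, intercept, slope):
    """Root mean squared error of the line's predictions on the given pairs."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in pairs]
    return math.sqrt(sum(squared) / len(squared))

train, test = split_rows(rows)
intercept, slope = fit_line(train)   # fit on training rows only
error = rmse(test, intercept, slope) # evaluate on held-out rows only

print(f"train={len(train)} test={len(test)}")
print(f"horsepower ~ {intercept:.2f} + {slope:.3f} * displacement")
print(f"test RMSE = {error:.2f}")
```

The held-out pairs in `test`, together with their predicted values, are the points you would plot against the fitted line in the D3 diagram.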
Total points: 35
Put all your steps and screen captures here.
Problem 2:
Treat cylinders, displacement, manufacturer, model year, origin, and weight as features and use linear regression to predict two target variables: horsepower and acceleration. Please note that some of these are categorical variables. Use the test data to assess the quality of prediction for both target variables. Which of the two target variables is easier to predict, in the sense that the predicted values differ less from the original values?
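Hint: linear regression needs numeric inputs, so categorical features such as manufacturer and origin are usually turned into indicator (one-hot) columns first. The sketch below shows the idea in plain Python. The records, the field names, and the `one_hot_encoder` helper are assumptions made up for this example; the actual file may code these columns differently. Spark ML provides feature transformers for the same purpose (for example `StringIndexer`).

```python
def one_hot_encoder(values):
    """Build an encoder mapping each distinct category to an indicator vector."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    def encode(value):
        vector = [0.0] * len(categories)
        vector[index[value]] = 1.0
        return vector

    return categories, encode

# Hypothetical records, invented for illustration; not taken from the CSV file.
cars = [
    {"cylinders": 4, "weight": 2100.0, "origin": "Japan"},
    {"cylinders": 8, "weight": 4300.0, "origin": "US"},
    {"cylinders": 4, "weight": 2300.0, "origin": "Europe"},
    {"cylinders": 6, "weight": 3200.0, "origin": "US"},
]

categories, encode_origin = one_hot_encoder(car["origin"] for car in cars)

# Numeric columns are kept as-is; the categorical column becomes indicators.
features = [
    [float(car["cylinders"]), car["weight"]] + encode_origin(car["origin"])
    for car in cars
]

print(categories)
for row in features:
    print(row)
```

When the model also fits an intercept, the indicator columns of one variable sum to 1 in every row, so one of them is commonly dropped to avoid perfectly collinear features.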
Total points: 35
Put all your steps and screen captures here.
Problem 3:
Repeat the above analysis with the decision tree method. Compare the predictive quality of this technique with that of linear regression.
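Hint: a regression tree predicts with piecewise-constant values rather than a straight line. The sketch below shows only the core step, choosing the single threshold on one feature that minimizes the total squared error of the two resulting groups; a full decision tree repeats this recursively on each side and over all features. The data points and the `sse`, `best_split`, and `predict` helpers are invented for illustration, and this is not MLlib's implementation.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(pairs):
    """Return (threshold, left_mean, right_mean) minimizing total squared error."""
    pairs = sorted(pairs)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

# Hypothetical (displacement, horsepower) pairs, invented for illustration only.
pairs = [
    (100.0, 60.0), (120.0, 62.0), (140.0, 64.0),
    (300.0, 150.0), (320.0, 152.0), (340.0, 154.0),
]

threshold, left_mean, right_mean = best_split(pairs)

def predict(x):
    """Depth-1 tree: predict the mean target of the side x falls on."""
    return left_mean if x < threshold else right_mean

print(threshold, left_mean, right_mean)
print(predict(110.0), predict(310.0))
```

Because the tree's predictions are step-like, comparing it with linear regression on the same held-out test rows and with the same error measure keeps the comparison fair.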
Total points: 30
Put all your steps and screen captures here.
————————————————————————————————————
Upload your MS Word document named: EAI6010_YourLastNameYourFirstNameHWXX.docx to the course web site in your folder for Assignment 1. Note: Do NOT include your JPEG files as separate files.
If you have issues with the upload, please notify your instructor! If you are raising issues that might be of interest to all of your colleagues in the class, please use the Discussion Board, Assignment XX thread on the course site. The Discussion Board is your best friend.