Econometrics Name_________________
Problem Set #1
1.) Import the “Homework #1 Airfare.csv” file into R. A description of the variables is included in the “Homework #1 Description” text file.
a. Let’s first focus on the variable year. In class we noticed that some of the observations had an incorrect value as the expected data range is between 1997 – 2000. Rather than deleting these incorrect observations (as we did in class), I would like you to convert them all to the year 2000. Hint: in class we wrote out a command that replaced these observations with the value NA. Now you must replace these observations with the value 2000. How many observations are there in each year after this conversion is made?
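Hint (a minimal sketch on made-up data, not the homework file): the replacement can be done with a logical index that flags years outside 1997 to 2000. The data frame `toy` and its values are hypothetical; only the column name `year` comes from the assignment.

```r
# Toy data standing in for the airfare file; the values are invented.
toy <- data.frame(year = c(1997, 1998, 1999, 2000, 2050, 19))

# TRUE for rows whose year falls outside the expected 1997-2000 range.
bad <- toy$year < 1997 | toy$year > 2000

# Overwrite only the flagged rows with 2000 (instead of NA).
toy$year[bad] <- 2000

# Count the observations in each year after the conversion.
table(toy$year)
```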
b. Create an indicator variable (0/1) for each year. Label these new variables “yrXX”, where XX is the last two digits of the year.
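Hint (a sketch on made-up data): a logical comparison converted with as.integer() gives a 0/1 indicator. The data frame `toy` is hypothetical; the names yr97 through yr00 follow the “yrXX” labeling requested above.

```r
# Toy data; the values are invented.
toy <- data.frame(year = c(1997, 1998, 1999, 2000, 2000))

# Each comparison returns TRUE/FALSE; as.integer() turns that into 1/0.
toy$yr97 <- as.integer(toy$year == 1997)
toy$yr98 <- as.integer(toy$year == 1998)
toy$yr99 <- as.integer(toy$year == 1999)
toy$yr00 <- as.integer(toy$year == 2000)

# Sanity check: the column sums should equal the per-year counts.
colSums(toy[, c("yr97", "yr98", "yr99", "yr00")])
```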
c. What was the most expensive flight in 2000? What was the least expensive flight in 2000? (Hint: one way of solving this problem is by using the summary() command).
d. If you work for an airline, you might be interested in finding flights that are expensive and average a lot of passengers. Create an indicator variable (0/1) that equals 1 when a flight averages more than 500 passengers per day AND the fare was greater than 175, and 0 otherwise. (Hint: when creating an indicator variable with two conditions, use the “&” symbol to separate each condition). How many observations have a value of 1 for this binary variable?
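Hint (a sketch on made-up data): the two conditions can be joined with “&” inside a single comparison. The data frame `toy` and the indicator name `busy_pricey` are hypothetical; `passen` and `fare` are the column names used elsewhere in this problem set.

```r
# Toy data; the values are invented.
toy <- data.frame(
  passen = c(600, 450, 800, 501),
  fare   = c(200, 300, 150, 176)
)

# 1 only when BOTH conditions hold for the same row, 0 otherwise.
toy$busy_pricey <- as.integer(toy$passen > 500 & toy$fare > 175)

# Number of observations where the indicator equals 1.
sum(toy$busy_pricey)
```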
e. You want to know what affects airfare prices. You believe that the inflation rate may affect rates over time, but this variable is not included in your initial dataset. I’ve included a second csv file titled “Homework #1 Inflation” in your homework folder. First import this data, and then merge it with your existing dataset using the common variable year. What is the value and sign of the correlation coefficients between airfare price and yr97, yr98, yr99 and yr00 (the dummies created in part 1b)? Do these match your expectations? Why or why not?
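Hint (a sketch on made-up data): merge() joins two data frames on a shared key, and cor() returns the correlation between two numeric columns. The data frames `flights` and `infl`, the column name `inflation`, and all values below are assumptions for illustration; check the actual column names in the inflation file after importing it.

```r
# Toy flight-level data; the values are invented.
flights <- data.frame(
  year = c(1997, 1997, 1998, 1999, 2000, 2000),
  fare = c(150, 160, 170, 165, 190, 210)
)

# Toy year-level data with one row per year; `inflation` is a hypothetical name.
infl <- data.frame(year = 1997:2000, inflation = c(2.3, 1.6, 2.2, 3.4))

# Many-to-one merge: every flight picks up the inflation rate of its year.
merged <- merge(flights, infl, by = "year")

# Correlation between fare and one of the year dummies.
merged$yr00 <- as.integer(merged$year == 2000)
cor(merged$fare, merged$yr00)
```

After a merge like this, it is worth confirming that the row count is unchanged, since a key that is duplicated in the year-level file would multiply rows.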
f. Now, run a regression which includes airfare as the dependent variable and dist, passen, yr98, yr99 and yr00 (from part b) as independent variables. Do the coefficients on yr98, yr99 and yr00 match what is observed in part 1e? How are these coefficients different from the correlation coefficient in part 1e?
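Hint (a sketch on simulated data): lm() takes a formula with the dependent variable on the left and the regressors on the right. Everything about `toy` below, including the coefficients used to simulate it, is invented for illustration. Note that yr97 is left out of the formula, so 1997 acts as the base year for the other three dummies.

```r
set.seed(1)
n <- 200

# Simulated regressors; the ranges are arbitrary.
toy <- data.frame(
  dist   = runif(n, 100, 2500),
  passen = runif(n, 50, 1500),
  year   = sample(1997:2000, n, replace = TRUE)
)
toy$yr98 <- as.integer(toy$year == 1998)
toy$yr99 <- as.integer(toy$year == 1999)
toy$yr00 <- as.integer(toy$year == 2000)

# Simulated fares with known (made-up) coefficients plus noise.
toy$fare <- 100 + 0.08 * toy$dist - 0.01 * toy$passen +
  5 * toy$yr98 + 10 * toy$yr99 + 15 * toy$yr00 + rnorm(n, sd = 20)

# Regress fare on distance, passengers, and the year dummies.
fit <- lm(fare ~ dist + passen + yr98 + yr99 + yr00, data = toy)
summary(fit)
```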
g. Next let’s merge in information on the origin city. Currently we only know where the flights departed from based on the origin_id variable, which is only a numerical variable. Merge in the name of the origin city from the “Homework #1 Origin Name” csv file using the common variable origin_id. How many flights departed from AKRON, OH between 1997 – 2000? How many flights departed from AUSTIN, TX in the year 1998? Hint: you may want to create dummy variables for this question and then use the function table().
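Hint (a sketch on made-up data): after merging on origin_id, a city dummy and table() give the counts, and a two-way table() splits them by year. The data frames, the column name `origin_name`, and the ID values are assumptions; the real file may label the city column differently.

```r
# Toy flight data; IDs and years are invented.
flights <- data.frame(
  origin_id = c(1, 1, 2, 2, 2),
  year      = c(1997, 1998, 1998, 1998, 2000)
)

# Toy lookup table; `origin_name` is a hypothetical column name.
origins <- data.frame(
  origin_id   = c(1, 2),
  origin_name = c("AKRON, OH", "AUSTIN, TX")
)

# Attach the city name to every flight.
merged <- merge(flights, origins, by = "origin_id")

# City dummies.
merged$akron  <- as.integer(merged$origin_name == "AKRON, OH")
merged$austin <- as.integer(merged$origin_name == "AUSTIN, TX")

# One-way count for Akron; two-way count of Austin by year.
table(merged$akron)
table(merged$austin, merged$year)
```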
h. Re-estimate your regression from part f by including dist, passen, yr98, yr99, yr00 and a dummy for whether the flight originated from Akron, Ohio as independent variables. Are flights more costly or cheaper when they depart from Akron, Ohio? How much more expensive/cheap are they?
i. Save the predicted airfare prices (fare_hat) and the residuals (e) from the regression from part h. What is the highest predicted airfare value? What is the lowest predicted airfare?
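Hint (a sketch on simulated data): fitted() and resid() extract the predicted values and residuals from an lm object. The one-regressor model below is a simplified stand-in, not the regression from part h, and all values are simulated.

```r
set.seed(2)

# Simulated data with a made-up relationship between distance and fare.
toy <- data.frame(dist = runif(50, 100, 2500))
toy$fare <- 100 + 0.08 * toy$dist + rnorm(50, sd = 20)

fit <- lm(fare ~ dist, data = toy)

# Store predictions and residuals next to the original columns.
toy$fare_hat <- fitted(fit)
toy$e        <- resid(fit)

# Lowest and highest predicted fare.
range(toy$fare_hat)
```

If the real data contain missing values, lm() drops those rows by default and the assignments above would fail on a length mismatch; passing na.action = na.exclude to lm() is one way to keep the rows aligned.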
j. Create a scatter plot of the actual airfare values (fare) on the Y-axis and predicted airfare values (fare_hat) on the X-axis. Copy and paste this graph into your homework.
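Hint (a sketch on simulated data): plot(x, y) puts its first argument on the X-axis and its second on the Y-axis. The data and the one-regressor model are simulated stand-ins; the dashed 45-degree line is an optional addition that marks where actual and predicted fares would be equal.

```r
set.seed(3)

# Simulated data; the relationship is made up.
toy <- data.frame(dist = runif(50, 100, 2500))
toy$fare <- 100 + 0.08 * toy$dist + rnorm(50, sd = 20)
fit <- lm(fare ~ dist, data = toy)
toy$fare_hat <- fitted(fit)

# Predicted values on the X-axis, actual values on the Y-axis.
plot(toy$fare_hat, toy$fare,
     xlab = "Predicted fare (fare_hat)",
     ylab = "Actual fare (fare)",
     main = "Actual vs. predicted airfare (toy data)")

# Optional reference line where actual equals predicted.
abline(a = 0, b = 1, lty = 2)
```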
k. Plot the value of your residuals, which were calculated in part i, on a histogram and paste this diagram into your homework. Set xlim=c(-250,250) when constructing this diagram.
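Hint (a sketch on simulated data): hist() accepts an xlim argument that fixes the horizontal range of the plot. The residuals below come from a simulated one-regressor model, not from the homework regression.

```r
set.seed(4)

# Simulated data; the relationship is made up.
toy <- data.frame(dist = runif(100, 100, 2500))
toy$fare <- 100 + 0.08 * toy$dist + rnorm(100, sd = 20)
fit <- lm(fare ~ dist, data = toy)
toy$e <- resid(fit)

# Histogram of residuals with the horizontal axis fixed at -250 to 250.
h <- hist(toy$e,
          xlim = c(-250, 250),
          xlab = "Residual (e)",
          main = "Histogram of residuals (toy data)")
```

Note that xlim only changes the visible range; residuals outside it are still counted in the bins but are not shown.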
l. If an observation has a positive residual value, is your model overpredicting or underpredicting airfare prices? If an observation has a negative residual value, is your model overpredicting or underpredicting airfare prices?
m. If you were concerned with the accuracy of your regression model, would you want larger residual values or smaller residual values?