The file FlightDelaysSM.csv contains data on flights from airports in the USA for one month in 2015. The file contains the following variables:
• schedtime the scheduled time of departure (using the 24 hour clock);
• deptime the actual departure time;
• distance the length of the flight (in miles);
• flightnumber the flight number;
• weather 0 = normal; 1 = severe;
• dayweek the day of the week (1=Monday, ..., 7=Sunday);
• daymonth the day of the month.
Note that this is a comma-separated file, and you need to tell R this when using the read.table command. For example, if you have saved the data file to a directory named MyR, the following command should work:
Delays <- read.table("MyR\\FlightDelaysSM.csv", header=TRUE, sep=",")
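To see why the separator matters, the following minimal sketch writes a tiny comma-separated file to a temporary location and reads it back with and without sep=",". The three toy rows and the object names tmp, wrong and right are made up for illustration and are not taken from the data file.

```r
# Write a tiny comma-separated file to a temporary location (toy rows, not real data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("schedtime,deptime,weather",
             "1455,1455,0",
             "1640,1705,1"), tmp)

# Without sep = ",", read.table splits on whitespace, so each row becomes one column.
wrong <- read.table(tmp, header = TRUE)

# With sep = ",", the three variables are separated as intended.
right <- read.table(tmp, header = TRUE, sep = ",")

ncol(wrong)  # number of columns when the separator is not given
ncol(right)  # number of columns when the separator is given
str(right)   # check the variable names and types after reading
```

Running str() on the imported data frame is a quick way to confirm that the variables have been read as expected.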
The aim of this project is to be able to predict the length of flight delays, using a statistical model. Note that the data set does not contain the length of flight delays, but it can be calculated from the actual and scheduled departure times. Think carefully about how to do this.
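One possible way to approach the calculation is sketched below. It assumes that the clock times are stored as whole numbers of the form HHMM (for example, 1455 for 14:55); check this against the data before relying on it. Under that assumption, subtracting the raw values gives the wrong answer whenever the hour changes, so the times are first converted to minutes since midnight. The helper name hhmm_to_minutes, the toy values and the midnight rule are all illustrative assumptions, not part of the data set.

```r
# Convert an HHMM clock time (e.g. 1455 for 14:55) to minutes since midnight.
# Assumption: times are stored as integers of the form HHMM.
hhmm_to_minutes <- function(hhmm) {
  (hhmm %/% 100) * 60 + (hhmm %% 100)
}

# Toy values for illustration only (not taken from FlightDelaysSM.csv).
schedtime <- c(1455, 1640, 2355)
deptime   <- c(1455, 1705,   10)

# Delay in minutes; negative values would mean an early departure.
delay <- hhmm_to_minutes(deptime) - hhmm_to_minutes(schedtime)

# Assumption: a difference below -12 hours means the flight left after midnight,
# so a full day (1440 minutes) is added back.
delay <- ifelse(delay < -720, delay + 1440, delay)

delay  # 0 25 15
```

Note that the naive difference 1705 - 1640 would give 65, whereas the actual gap between 16:40 and 17:05 is 25 minutes.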
Carry out a thorough analysis of these data, using whatever methods you believe to be suitable. Write a brief summary of the results.
Your report should be no more than 300 words in total and include no more than one plot. There is no fixed format for structuring the report, but it should include:
• a brief description of the methods you have used and why;
• a brief description of the results of your analysis and what they mean in practical terms;
• an explanation of how to predict the length of a flight delay, given information on all variables other than the actual departure time, illustrated by at least one example.
Here, "illustrate" means to show the meaning or truth of something more clearly.
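As a rough sketch of what such an illustration could look like in R, the code below fits a linear model to simulated data and predicts the delay for one hypothetical flight. The choice of a linear model, the simulated values, and the names toy, fit and new_flight are assumptions made purely for illustration; they are not the required method, and a suitable model for the real data may differ. For brevity the sketch uses only some of the variables.

```r
set.seed(1)  # simulated data only; none of these numbers come from FlightDelaysSM.csv
n <- 200
toy <- data.frame(
  schedtime = sample(c(600, 930, 1245, 1500, 1830, 2100), n, replace = TRUE),
  distance  = sample(c(184, 199, 213, 228), n, replace = TRUE),
  weather   = as.integer(seq_len(n) %% 10 == 0),   # every tenth flight: severe weather
  dayweek   = factor(sample(1:7, n, replace = TRUE), levels = 1:7)
)

# Simulated delay in minutes: longer in severe weather, plus random noise.
toy$delay <- 5 + 60 * toy$weather + rnorm(n, sd = 10)

# Fit a linear model (one possible choice of method, assumed here).
fit <- lm(delay ~ schedtime + distance + weather + dayweek, data = toy)

# A hypothetical flight: scheduled 15:00, 213 miles, severe weather, Wednesday.
new_flight <- data.frame(
  schedtime = 1500,
  distance  = 213,
  weather   = 1,
  dayweek   = factor(3, levels = 1:7)
)

# Predicted delay with a 95% prediction interval.
pred <- predict(fit, newdata = new_flight, interval = "prediction")
pred
```

The prediction interval (columns lwr and upr) conveys how uncertain the predicted delay for a single flight is, which is usually more informative in practical terms than the point prediction alone.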