Complete the steps below. When finished, submit SAS code, log, and PDF output.
Part 0
A SAS program shell is provided as part of this assignment. Provide requested values at lines 1-3 of the code shell.
– Macro variable UID should be set to your 9-digit, numeric, university UID (i.e., 999887777)
– Specify appropriate filepaths in the macro variables at the top of the code shell
Read 2015-05 data into SAS dataset using code shell lines 18-169.
– Same code that has been used throughout course – be careful to use the entire dataset (do *not* subset to only Carrier=AA).
– Note that you must fill in the missing filepath in line 155
– Note that the dataset is named HW3.BTS201505
Lines 177 – 192 generate an individualized random subset of the BTS201505 dataset
– Run code to generate a customized subset of the full dataset
– Note that the resulting dataset is named HW3.BTS201505_&UID.
Part 1
1. Use appropriate SAS data step syntax to read the O’Hare 2015 weather file (Weather_ord_2015_miss.txt) into a SAS dataset. (Variables are Flight Date, Weather Station Name, Daily Precipitation (PRCP), Snow Depth on Ground (SNWD), Daily Snowfall (SNOW), Daily Low Temp (TEMPMIN), and Daily High Temp (TEMPMAX)).
1.a. Note that the weather file includes missing values.
1.b. Create a new dataset named BTS201505WTHR that MERGEs the partial weather file to the May, 2015 BTS file in a DATA step-Include only observations with ORIGIN=”ORD” in the new dataset.
1.c. Create a new dataset named BTS201505WTHRIMP with 12 iterations of multiple imputation values for missing values of prcp, tempmin, and tempmax.
1.d. Estimate an OLS regression model for DepDelay, using the following input variables:
• CRSDepTime
• seqnum
• ArrDelayLagInd
• ArrDelayLag
• ArrDelayLagCum
• ArrDelayLag2
• DistanceLag
• CancelledLag
• Tempmin (from the Weather file)
• Tempmax (from the Weather file)
• PRCP (from the Weather file)
1.e. Use only observations where CANCELLED=0 and ORIGIN=”ORD”.
1.f. Use an appropriate SAS proc to evaluate the Multiple Imputation parameter estimates for tempmin, tempmax, and prcp.
Part 2
In order:
– Use an appropriate SAS PROC to impute missing values for MNAR variables DepDelay, ArrDelay, and Distance in HW3.BTS201505_&UID. (Do NOT rename HW3.BTS201505_&UID. Simply replace missing values without changing the name of HW3.BTS201505_&UID.)
– Create Lag1 variables for ArrDelay, Distance, and Cancelled
– Create variables ArrDelayLagInd and ArrDelayLagCum
– Create Lag2 variables from ArrDelay (ArrDelayLag2 and ArrDelayLag2Ind)
Generate standard OLS parameter estimates from the HW3.BTS201505_&UID. dataset for the following MODEL specification:
• DepDelay=Carrier CRSDepTime seqnum ArrDelayLagInd ArrDelayLag ArrDelayLagCum ArrDelayLag2 ArrDelayLag2Ind CancelledLag DistanceLag
Generate standard OLS parameter estimates from the HW3.BTS201505_&UID. dataset for the following MODEL specification BY CARRIER:
• Use an appropriate PROC option to output the parameter estimates for each BY GROUP to a SAS dataset
• The dataset with parameter estimates will be created with each BY model in a separate row.
• TRANSPOSE that table so that each model appears in a column.
• In the TRANSPOSEd table create new variables that compare the percent difference in parameter estimates for each Carrier against the parameter estimates for the “AA” BY group
Part 3
Using BY group DATA step logic (hint: first. and last.) create a new data set named HW3.BTS201505_&UID._DEL15S that contains one summary observation for every CARRIER/TAILNUM/FLIGHTDATE combination in HW3.BTS201505_&UID.
Each observation should include a new variable DELAYCOUNT that counts the number of times each airplane has a Departure Delay greater than 15 minutes on any given day that airplane flies.
Each record should also contain the following variables:
• SeqNum: max SeqNum for a CARRIER/TAILNUM/FLIGHTDATE combination (hint: use the DATA step BY group processing to find this value)
• LnSeqNUM: natural log of the max SeqNum
• MaxCRSDepTime: max CRSDepTime:
• DistanceCum: cumulative Distance flown by the aircraft across all flights during a given day
• CancelledCount: Count of Cancelled=1 flights for each CARRIER/TAILNUM/FLIGHTDATE combination
Fit a Poisson regression model for the following model specification:
• DelayCount= Carrier MaxCRSDepTime DistanceCum CancelledCount
• Specify the natural log of SeqNum as the offset
Estimate the same model BY CARRIER for the following model specification:
• DelayCount= MaxCRSDepTime DistanceCum CancelledCount
• Specify LnSeqNum as the OFFSET
Part 5
Select one of the Easter Egg problems (located in a separate Blackboard content folder) and complete per the instructions specific to the question you choose.