MS3251 Analytics Using SAS
Assignment 2
· You must complete the assignment by yourself. Exchanging ideas with classmates are encouraged, but you must not cross the line between discussion and collaboration. Showing your work to your classmates will be considered as a kind of collaboration.
· Complete all tasks. Put all of your SAS code into one PLAIN TEXT file or in PDF format. Name your code file as nnnnnnnn.txt or nnnnnnnn.pdf where nnnnnnnn is your EID (login name) at CityU. Do not submit your file in Word format or in .SAS extension.
· You should set all irrelevant statements in your code as SAS comment statements. Be aware that if you create your SAS code under a non-English operating system, it may contain extraneous characters when viewed in an English operating system. It is your responsibility to ensure that your submitted code is free of these characters. All extraneous characters in your submitted code will be considered errors.
· Your submitted SAS code should be free of error when running in SAS Studio.
· You must submit your code file via the link Assignment 2 under the Assignments section of the course on Canvas. If you submit your file more than once, only the latest submitted file will be marked. The link will be closed on the 30 November, at 11:59 pm. Late submission will not be accepted. Submission by other methods will not be accepted.
Tasks
Download the zipped file Assignment2.zip from the folder Data Sets under the Files section. Expand the zipped file for the following SAS data sets:
· Demographics.sas7bdat: contain demographic information of all members.
· Flying_Activities.sas7bdat: contain members’ flying activities between 1 Jan 2018 and 31 Dec 2021.
You can find the variable dictionary Assignment2_Variabledictionary.xlsx. Do not change the contents of these data sets unless you are told to do so explicitly.
Write a SAS program for the following tasks. Mark the code of each task by using a proper comment statement, such as ‘/*Task 2a*/’.
Task 1:
Write a LIBNAME statement to define a SAS library in SAS Studio so that you can access the above-described data sets (and any others) via the library.
Task 2:
2a: Write a DATA step to create a SAS data set named as Demographics_New. This data set must contain all observations and variables in Demographics and a new variable called Tenure. Tenure is defined as the number of full months from an observation’s Join_Date till 1 Jan 2021. For example, if an observation’s Join_Date is 2 Jan 2020, then the observation’s Tenure value is 11 months. Place Demographics_New in the library created in Task 1. {Hint: SAS function INTCK(‘month’, ‘2jan2020’d, ‘1jan2021’d,’continuous’) returns a value of 11 months. The ‘month’ and ‘continuous’ are system keywords for the INTCK function.}
A subset of the observations in Demographics_New is shown here for illustration purposes:
2b: Write a DATA step to modify the Gender values in Demographics_New so that if an observation’s Gender value is either ‘f’ or ‘m’, replace it by ‘F’ or ‘M’, respectively. Name the modified data set as Demographics_New again and keep it in the library created in Task 1. (Be aware that this new Demographics_New will overwrite the original Demographics_New.)
2c: Write appropriate SAS procedure statements for the purpose of comparing the average Tenure between male and female members in Demographics_New.
Task 3:
3a: Write appropriate SAS procedure statements for sorting the data set Flying_Activities by the variable Member_Id. Name the sorted data set as Sorted_Activities and place it in the Work library of SAS Studio.
3b: Write a DATA step to reshape Sorted_Activities by collapsing the observations with the same Member_Id value into a single observation. Name this new SAS data set as Collapsed_Activities and place it in the Work library of SAS. Create the following variables and keep only these variables, but not necessary in the listed order, for each observation in Collapsed_Activities:
Variable Name
Description
Member_Id
Member’s identity.
Air_CityU_2018
Number of times flew on CityU Airline (i.e. Airline value = ‘CityU’) in 2018.
Air_CityU_2019
Number of times flew on CityU Airline in 2019.
Air_CityU_2020
Number of times flew on CityU Airline in 2020.
Air_NonCityU_2018
Number of times flew on non-CityU Airlines (i.e. Airline value ^= ‘CityU’) in 2018.
Air_NonCityU_2019
Number of times flew on non-CityU Airlines in 2019.
Air_NonCityU_2020
Number of times flew on non-CityU Airlines in 2020.
FlyBonus_Earned_2018
Total bonus points earned from flying in 2018.
FlyBonus_Earned_2019
Total bonus points earned from flying in 2019.
FlyBonus_Earned_2020
Total bonus points earned from flying in 2020.
A subset of the observations in Collapsed_Activities is shown here for illustration purposes:
Task 4:
4a: Write appropriate SAS procedure statements for sorting the data set Demographics_New as created in Task 2b by the variable Member_Id in ascending order. Replace the original Demographics_New with the sorted data set.
4b: The data set Demographics_New is supposed to contain all members’ information. However, However, not all members in Collapsed_Activities are also contained in Demographics_New or vice versa. Write one DATA step for creating three SAS data sets. The three data sets must have these characteristics:
· One data set contains only the observations of Collapsed_Activities with Member_Id that do not appear in Demographics_New. Name this data set Redundant_Activities and place it in the Work library of SAS Studio. Keep only the variables of Collapsed_Activities in Redundant_Activities.
· A second data contains the observations of Collapsed_Activities with Member_Id that also appear in Demographics_New. Name this data set Collapsed_Activities2 and place it in the Work library of SAS Studio. Keep only the variables of Collapsed_Activities in Collapsed_Activities2.
· A third data set contains the observations of Demographic_New with Members_ID that do not appear in Collapsed_Activities. Name this data set Inactive_Members and place it in the Work library of SAS Studio. Keep only the variables of Demographic_New in Inactive_Members.
You must accomplish Task 4b with one DATA step only.
A subset of the observations in Redundant_Activities is shown here for illustration purposes:
A subset of the observations in Collapsed_Activities2 is shown here for illustration purposes:
A subset of the observations in Inactive_Members is shown here for illustration purposes: