SAS: Homework Assignment 3
In this assignment you will create a SAS program, save it as a .sas file, and upload that
file to Moodle via the assignment link.
Notes:
• The file submitted must meet the SAS File Submission Guidelines available in the
Resources and Information section of the course.
• If your file doesn’t meet these guidelines, we may deduct up to 100% of your score.
• You may work with a partner (from your section) for these assignments. However, the
work that you submit must be yours in its entirety. You must also reference the
individual you worked with in your assignment’s header.
• No late work will be accepted. If you have a documented emergency that prevents you
from completing a homework assignment, please contact your instructor.
• Submission of the same (or extremely similar) code by two people is considered an act
of academic dishonesty. Even if you work with a partner, you must write your own final
comments and code. We understand that the code itself may be very similar, but the
comments, variable names chosen, etc. should be different.
Dataset:
The dataset for this homework is available in the assignment link and has information
about student performance and possible related factors. The dataset is available via the
URL:
https://www4.stat.ncsu.edu/~online/datasets/StudentData.txt
The StudentData.txt data comes from the UCI Machine Learning Repository. Information
about the variables in the dataset is available at:
https://archive.ics.uci.edu/ml/datasets/Student+Performance
You do not need to upload this file to SAS On Demand as you are required to read it in
via a URL.
Task 1: Conceptual questions (3 pts)
In comments after your header, answer the following questions:
1. Given a numeric variable, what two aspects of the distribution do we usually want to
summarize? (2 pts)
2. What does the term contingency table mean? (1 pt)
Task 2: Programming questions (22 pts)
In the same file, write code corresponding to each question below. For example, don’t
simply overwrite or modify your question 2 code to answer question 3. You can copy and
paste the previous code if needed, but we need to see the code used to answer each question.
Don’t forget to add comments prior to each SAS step describing what you are doing!
We do not need the output. We can recreate everything using the code you turn in.
1. Create a permanent library using a LIBNAME statement. (1 pt)
2. Create code to import the StudentData.txt file from the URL above into your
permanent library created in question 1. Note: This is a ‘;’ delimited file. (3 pts)
3. Run the following code (replacing yourlib and yourdataname with your own library and
dataset names) to create numeric versions, numG1 and numG2, of the G1 and G2
variables: (0 pts)
DATA yourlib.yourdataname;
    SET yourlib.yourdataname;
    /* INPUT() reads each grade as a number using the 8. informat */
    numG1 = input(G1, 8.);
    numG2 = input(G2, 8.);
RUN;
4. Use a PROC step to sort the data by the variables below and output the resulting
dataset to the temporary (WORK) library: (2 pts)
• sex
• address
5. Use a PROC step to produce only the following summary statistics for the age and
numG1 variables at every combination of the variables in Q4: (4 pts)
• Sample Q1 (first quartile)
• Sample Standard Deviation
• Sample Minimum
6. Use a PROC step to find the correlation between the age, numG1, and numG2
variables for every setting of the first variable you sorted by in Q4. (3 pts)
7. Create a two-way contingency table (that includes expected counts) between the
studytime and failures variables. (3 pts)
8. Create a horizontal bar plot of the mjob variable with categories in ascending order
(this is an option on the plotting statement). (3 pts)
9. Create side-by-side vertical bar plots of the reason and fjob variables. (3 pts)
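As a syntax reminder only, here is a minimal sketch of the sort-then-summarize-by-group pattern. It is not a solution to any question above: it uses the built-in SASHELP.CLASS dataset rather than the assignment data, and the output dataset name class_sorted and the statistics requested are illustrative choices.

```sas
/* Illustration on SASHELP.CLASS, not the assignment data. */
/* Sort a copy into the temporary WORK library; the source dataset is unchanged. */
PROC SORT DATA=sashelp.class OUT=work.class_sorted;
    BY sex;
RUN;

/* Listing statistic keywords on the PROC MEANS statement limits the output */
/* to those statistics. BY-group processing requires data sorted by the BY variable. */
PROC MEANS DATA=work.class_sorted MEAN MEDIAN MAX;
    BY sex;
    VAR height weight;
RUN;
```

Your own code will need different datasets, variables, and options, and should carry your own comments.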
Save this program and upload it to Moodle using the assignment link! Great work!