SAS: Homework Assignment 3

In this assignment you will create a SAS program, save it as a .sas file, and upload that file to Moodle via the assignment link.

Notes:

• The file submitted must meet the SAS File Submission Guidelines available in the Resources and Information section of the course.

• If your file doesn’t meet these guidelines, we may deduct up to 100% of your score.

• You may work with a partner (from your section) on these assignments. However, the work that you submit must be yours in its entirety. You must also name the individual you worked with in your assignment’s header.

• No late work will be accepted. If you have a documented emergency that prevents you from completing a homework assignment, please contact your instructor.

• Submission of the same (or extremely similar) code by two people is considered an act of academic dishonesty. Even if you work with a partner, you must write your own final comments and code. We understand that the code itself may be very similar, but the comments, variable names chosen, etc. should be different.

Dataset:

The dataset for this homework contains information about student performance and possibly related factors. It is available from the assignment link and at the URL:

https://www4.stat.ncsu.edu/~online/datasets/StudentData.txt

The StudentData.txt data comes from the UCI Machine Learning Repository. Information about the variables in the dataset is available at https://archive.ics.uci.edu/ml/datasets/Student+Performance

You do not need to upload this file to SAS OnDemand, as you are required to read it in via a URL.

Task 1: Conceptual questions (3 pts)

In comments after your header, answer the following questions:

1. Given a numeric variable, what two aspects of the distribution do we usually want to summarize? (2 pts)

2. What does the term contingency table mean? (1 pt)

Task 2: Programming questions (22 pts)

In the same file, write code corresponding to each question below. That is, don’t simply overwrite or modify the code from question 2 when answering question 3. You can copy and paste the previous code if needed, but we need to see the code used to answer each question.

Don’t forget to add comments prior to each SAS step describing what you are doing!

We do not need the output. We can recreate everything using the code you turn in.

https://archive.ics.uci.edu/ml/datasets/Student+Performance

1. Create a permanent library using a LIBNAME statement. (1 pt)

2. Create code to import the StudentData.txt file from the URL above into the permanent library you created in question 1. Note: This is a ‘;’ delimited file. (3 pts)
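
As a rough sketch of the syntax that questions 1 and 2 call for, one possible pattern pairs a LIBNAME statement with a FILENAME URL reference and PROC IMPORT. The library name hw3lib, the folder path, the fileref studurl, and the dataset name students are placeholders chosen for illustration; GETNAMES=YES assumes the first row of the file holds the variable names.

```sas
/* Permanent library; the path is a placeholder, point it at a folder you own */
LIBNAME hw3lib '/home/your-user-id/hw3';

/* Fileref that reads the raw file directly from the web */
FILENAME studurl URL 'https://www4.stat.ncsu.edu/~online/datasets/StudentData.txt';

/* DBMS=DLM with DELIMITER=';' handles a semicolon-delimited layout */
PROC IMPORT DATAFILE=studurl
            OUT=hw3lib.students   /* hypothetical dataset name */
            DBMS=DLM
            REPLACE;
  DELIMITER=';';
  GETNAMES=YES;                   /* assumes a header row */
RUN;
```

Because the output dataset uses a two-level name (library.dataset), it is written to the permanent library rather than to WORK.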

3. Run the following code (altering the names corresponding to the library and dataset) to create numeric versions, numG1 and numG2, of the G1 and G2 variables: (0 pts)

    /* Create numeric copies of G1 and G2 in case they were read in as character */
    DATA yourlib.yourdataname;
        SET yourlib.yourdataname;
        numG1 = input(G1, 8.);
        numG2 = input(G2, 8.);
    RUN;

4. Use a PROC step to sort the data by the variables below and output the resulting dataset to a temporary library: (2 pts)

• sex

• address
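
To illustrate the general shape of such a step, here is a minimal sketch that uses the built-in sashelp.cars sample dataset rather than the homework data. The output name cars_sorted is a placeholder; writing to the WORK library is what makes the result temporary.

```sas
/* Sort by two variables; OUT= leaves the input dataset untouched */
PROC SORT DATA=sashelp.cars OUT=work.cars_sorted;
  BY origin drivetrain;   /* first key, then second key within it */
RUN;
```

Without OUT=, PROC SORT would try to replace the input dataset in place, which is usually not what you want for a permanent or read-only library.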

5. Use a PROC step to produce the following summary statistics (and no other summary statistics) for the age and numG1 variables at every combination of the variables in Q4: (4 pts)

• Sample Q1 (first quartile)

• Sample Standard Deviation

• Sample Minimum
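
One way to request a specific set of statistics is to list their keywords on the PROC MEANS statement, which replaces the default output. The sketch below assumes the built-in sashelp.cars sample dataset; the variables and the dataset name cars_sorted are illustrative only.

```sas
/* BY-group processing requires data sorted by the BY variables */
PROC SORT DATA=sashelp.cars OUT=work.cars_sorted;
  BY origin drivetrain;
RUN;

/* Only the listed keywords (Q1, STD, MIN) are reported */
PROC MEANS DATA=work.cars_sorted Q1 STD MIN;
  BY origin drivetrain;       /* one table per combination */
  VAR mpg_city horsepower;    /* numeric variables to summarize */
RUN;
```

A CLASS statement in place of BY would give the same groupings in a single table and would not need the data to be sorted first.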

6. Use a PROC step to find the correlation between the age, numG1, and numG2 variables for every setting of the first variable you sorted by in Q4. (3 pts)
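
Correlations within groups can be sketched with PROC CORR and a BY statement. This example again assumes the built-in sashelp.cars sample dataset, with cars_by_origin as a placeholder name.

```sas
/* Sort by the grouping variable so the BY statement is valid */
PROC SORT DATA=sashelp.cars OUT=work.cars_by_origin;
  BY origin;
RUN;

/* One correlation matrix per level of origin */
PROC CORR DATA=work.cars_by_origin;
  BY origin;
  VAR mpg_city horsepower weight;
RUN;
```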

7. Create a two-way contingency table (that includes expected counts) between the studytime and failures variables. (3 pts)
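
As a minimal sketch of the syntax, PROC FREQ builds a two-way table when two variables are joined with an asterisk on the TABLES statement, and the EXPECTED option adds expected cell counts. The built-in sashelp.cars sample dataset is assumed here for illustration.

```sas
/* Rows are origin, columns are drivetrain; EXPECTED adds expected counts */
PROC FREQ DATA=sashelp.cars;
  TABLES origin*drivetrain / EXPECTED;
RUN;
```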

8. Create a horizontal bar plot of the mjob variable with categories in ascending order (this is an option on the plotting statement). (3 pts)
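
A horizontal bar chart with ordered bars can be sketched in PROC SGPLOT as follows. The example assumes the built-in sashelp.cars sample dataset; CATEGORYORDER=RESPASC is one option that arranges the bars by ascending response value, which is the frequency count when no RESPONSE= variable is given.

```sas
/* One bar per vehicle type, ordered from smallest to largest count */
PROC SGPLOT DATA=sashelp.cars;
  HBAR type / CATEGORYORDER=RESPASC;
RUN;
```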

9. Create side-by-side vertical bar plots of the reason and fjob variables. (3 pts)
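
One common reading of "side-by-side" bars is a clustered bar chart, where a second variable splits each category into adjacent bars. The sketch below takes that reading as an assumption and uses the built-in sashelp.cars sample dataset.

```sas
/* Bars for each origin, split into adjacent bars by drivetrain */
PROC SGPLOT DATA=sashelp.cars;
  VBAR origin / GROUP=drivetrain GROUPDISPLAY=CLUSTER;
RUN;
```

Without GROUPDISPLAY=CLUSTER, the default for a grouped VBAR is to stack the groups within each bar.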

Save this program and upload it to Moodle using the assignment link! Great work!