Homework Assignment #3 Due Thur, February 14, at 1pm – No late homework accepted.
Please put all the SAS code in one SAS Program, and upload the program on Blackboard.
Please name your file, yourlastnameHW3.sas
• Write a macro: It should be able to use proc means to print means of a quantitative variable BY a categorical variable (must use a BY line of code), and use proc sgplot to print boxplots of the same quantitative variable by the categorical. Your macro should sort the data first. The user should be able to call the macro, indicating the data set name, and the two variables used in the analysis, as parameters. (You’ll use the macro on some data below…)
• Use the final version of the ipeds data that you used for HW#2. Include in this homework: the import code, and include the recoding of missing values, labels and formats, new variables.
• Macro: Call the macro you just wrote, for percent female by region. Do the results make sense? Is one region different from the rest? Explain in a comment.
• Tabulate: Create the Proc Tabulate output given below.
• Crime Data: This data is from the CLERY Data site (National Campus Safety and Security Data site: http://ope.ed.gov/security/). The attached file contains some of the data available for crimes and disciplinary actions, for schools in the US.
Note that many of the schools have multiple campuses for the same UnitID. (Campuses are indicated by CampusID). Our goal is to sum the incidences of crime for each UnitID, calculate the crime rate per 1000 students enrolled for each crime, for each UnitID, then merge the data with our IPEDS data set. With the final data set, we will look at correlations of crimes with institution cost and enrollment. Here are the steps:
• Import the crime data and include the import statements in your file. Sort it by unitid.
• Using proc means and the output statement (see 4.11 in the text), create a new data set with the sum of each crime over all the campuses at that unitid. That is, you will end up with a file with only one row per unitid. Note that you can just say SUM= ; in your output statement – you do not have to name them all. Drop _type_ and _freq_ from the data set. In your proc means, use NOPRINT, so that you don’t get too much output in your results viewer!
• Merge the crime data with your IPEDS data, by unitid. Run proc means on all quantitative variables (not Unitid) to confirm that you have everything.
• Using the full data set, run correlations relating the vars: all crime/discipline rate variables with: total instate cost and total enrollment. Write a comment in your program, describing the correlations that you observe.
That is, which crime/discipline variables are significantly correlated (p-value < .05) with total cost? Which are correlated with total enrollment? Do note the direction – for example “Schools with higher cost have a higher rate of drug and alcohol violations, higher incidence of rape, etc.” Don’t say anything about the correlations that are not statistically significant.
• Using the data set you just created, create a new data set that turns all the crime/disciplinary variables into rates per 1000 students, using the total enrollment variable. Use an array for this purpose, since there are many crime variables.
• Using the new data set, rerun correlations relating the vars: all crime/discipline rate variables with: total instate cost and total enrollment. Write a comment on what you observe this time. Are the results different? Explain why.
• When everything is working correctly, add lines of code to use ODS to create a file called HW3Results.pdf in one of your SAS On Demand folders.