/* WE WILL COVER THE FOLLOWING TOPICS IN THIS SET OF NOTES
– INTRODUCTION TO PROC FORMAT AND FORMATTING DATA SETS
– USE OF SAS FUNCTIONS
– CREATING AND REDEFINING VARIABLES
– MERGING, SETTING, AND SUBSETTING DATA SETS
– IF/THEN/ELSE STATEMENTS
– ARRAYS
– DO STATEMENTS
*/
*Set the pathname for the files for today’s lecture-my path is;
%let pathname=E:\BIS679A\Lecture3and4\Data_9_14_2020\;
/* Previously, YOU WERE INTRODUCED TO SAS FORMATS, AND HOW TO FORMAT A VARIABLE USING SAS FORMATS
(SOME COMMON SAS FORMATS CAN BE FOUND ON p 112-113 OF THE LITTLE SAS BOOK)
TOPIC: CREATING YOUR OWN FORMAT IN SAS */
* Why might we want to/need to do this?
*We can use proc format to create formats;
*Do you think that these formats are permanent or temporary?;
*How do we know? Where can I find the formats?;
*You can either type them directly into the program or access them from a separate sas file;
*Here is what it looks like when typed directly into the program;
proc format;
value $gender “M”=”Male”
“F”=”Female”;
value employ 1=”Employed”
0=”Unemployed”;
value insurance 1=”Uninsured”
2=”Public”
0=”Private”;
value ed 1=”< high school" 0=">= high school”;
value marriage 0=”Married/Relationship”
1=”Never Married”
2=”Divorced/Widowed/separated”;
value alcohol 0=”<= 2 drinks/week" 1="> 2 drinks/week”;
value smoking 0=”Never”
1=”Former”
2=”Current”;
run;
* Once the formats are created, we apply them like we would any SAS format;
proc import datafile=”&pathname.Demographics.xlsx” out=demo replace;
sheet=”Demo$”;
label V1=”Gender”;
label V2=”Employment Status”;
label V3=”Insurance Status”;
label V4=”Education Level”;
label V5=”Marital Status”;
label V6=”Smoking Status”;
label V7=”Alcohol Consumption”;
format v1 $gender. v2 employ. v3 insurance. v4 ed. v5 marriage. v6 smoking. v7 alcohol.;
run;
/* IMPORTANT TO LOOK AT LOG! */
* What is wrong? Why don’t you think that it works?;
title ‘Contents of demographics data set’;
proc contents data=demo;
run;
*Applying formats to the demographics data set;
*Log suggests proc datasets – can also use a data step;
data demo; set demo; *Rename it the same thing;
format v1 $gender. v2 employ. v3 insurance. v4 ed. v5 marriage. v6 smoking. v7 alcohol.;
run;
*Now look at the contents;
*Even though the same title as above, still good idea to title all output in case you run code
out of order;
title ‘Contents of demographics data set’;
proc contents data=demo;
run;
*Print only 10 observations;
title ‘Check of the Demographic Data Set’;
proc print data=demo(obs=10);
run;
title ‘Check the distribution of the first few variables’;
proc freq data=demo;
table v1 v2 v3;
run;
*Even though we have created variable labels for the variables, it is much better to have meaningful variables names.
These can be shorter than the labels, but should help to identify the variables;
* TOPIC: REMAMING VARIABLES;
*Rename the variables in the demographics data set set to more meaningful variable names;
data demo; set demo(rename=(v1=gender v2=employment v3=insurance v4=education v5=marriage
v6=smoking v7=alcohol));
run;
title ‘Check of the Demographic Data Set’;
proc print data=demo(obs=10);
run;
title ‘Check the distribution of the first few variables’;
proc freq data=demo;
table gender employment insurance;
run;
*TOPIC: USING FORMAT TO REGROUP VARIABLES WITHOUT RECODING;
*Create a new format;
proc format;
value insure 1=”Uninsured”
2=”Insured”
0=”Insured”;
run;
title “Original format”;
proc freq data=demo;
table insurance;
run;
title “New combined format”;
proc freq data=demo;
table insurance;
format insurance insure.;
run;
*What format is contained in the data set demo for insurance? How do you find out?;
title “Demo contents”;
proc contents data=demo;
run;
/*ACCESSING A FORMAT FILE */
*TOPIC: DEALING WITH DATA SETS THAT ALREADY CONTAIN FORMATS;
*Set the pathname as done above;
%let pathname=C:\BIS679A\Lecture3and4\Data_9_14_2020\;
proc import datafile=”&pathname.Demographics.xlsx” out=demo2 replace;
sheet=”Demo$”;
label V1=”Gender”;
label V2=”Employment Status”;
label V3=”Insurance Status”;
label V4=”Education Level”;
label V5=”Marital Status”;
label V6=”Smoking Status”;
label V7=”Alcohol Consumption”;
run;
title ‘Demographic data set without Formats’;
proc print data=demo2;
run;
*Now try by calling the external format file;
*Include the format file for the demo data set;
%include “&pathname.formats.sas”;
*Then as sas to locate the formats (called demo) created by the Formats file;
options fmtsearch=(work.demo);
*Would use a.demo rather than work.demo if permanent file;
proc import datafile=”&pathname.Demographics.xlsx” out=demo2 replace;
sheet=”Demo$”;
label V1=”Gender”;
label V2=”Employment Status”;
label V3=”Insurance Status”;
label V4=”Education Level”;
label V5=”Marital Status”;
label V6=”Smoking Status”;
label V7=”Alcohol Consumption”;;
run;
data demo2; set demo2;
format v1 $gender. v2 employ. v3 insurance. v4 ed. v5 marriage. v6 smoking. v7 alcohol.; run;
title ‘Employment data set with Formats’;
proc print data=demo2;
run;
*How do you create a SAS data set with formats?;
*How do you locate permanent SAS data sets?;
*Do you need to read the data into SAS if it is already a SAS data set?;
*What happens when you are given a SAS data set that has formats attached to it?;
*Note that formats created in one version of SAS may not work in another version of SAS;
libname a “&pathname”;
*Create a sas data named demo located in the “a” library;
data a.demo; set demo2; run;
title “Contents of comorbid”;
proc contents data=a.demo;
run;
*(1) Can use the following option – IGNORE THE FORMATS;
*When NOFMTERR is in effect, SAS substitutes a format for the missing format and
continues processing. FMTERR specifies that when SAS cannot find a specified variable format,
it generates an error message and stops processing;