BIS 505a SAS Programming Basics
BIS679A
SAS Programming Basics
September 1 and 8, 2021
Lesson Outline
Two types of commands
  1) DATA steps
  2) PROC steps
Common types of data files
Strategies for reading data into SAS
  Using the DATA step
    Manual entry
    Infile statement
    Special input circumstances
  Using PROC IMPORT
  Using the SAS Import Wizard
IF/THEN statements
Strategies for improving the display of our data
  Labeling
  Formatting
Common types of data files
Excel files (.xls or .xlsx)
Comma separated value (CSV) files (.csv)
Other text files (e.g., .dat or .txt)
The file extensions indicate the type of data file we are working with.
Sample data: Multi-Ethnic Study of Atherosclerosis (MESA)
We will use three versions of the MESA data.
MESA_short
4 variables, 9 observations
Preview of MESA_short dataset
sex BMI diabetes SBP
F 22.5 0 109
M 24.3 0 110
F 19.1 0 110
F 18.5 0 111
M 28.9 0 113
F 24.3 0 114
F 21.5 0 112
M 26.4 0 115
F 22.3 0 115
Preview of MESA_medium dataset
sex BMI diabetes SBP
F 22.5 0 109
M 24.3 0 110
F 19.1 0 110
F 18.5 0 111
M 28.9 0 113
F 24.3 0 114
F 21.5 0 112
M 26.4 0 115
F 22.3 0 115
M 27.6 0 116
F 25.4 0 116
F 27.3 0 117
F 20.2 0 117
M 28.5 0 118
M 29.3 0 120
M 27.2 0 121
F 26.4 0 121
M 28.3 0 122
F 23.5 0 124
M 30.4 1 127
M 29.2 1 129
M 31.5 1 140
F 29.6 1 142
F 32.4 1 150
M 35.1 1 154
M 30.1 1 145
M 28.8 1 139
F 22.7 0 115
F 23 0 120
F 24.5 0 117
MESA_medium
4 variables, 30 observations
Preview of MESA_full dataset
MESA_full
6 variables, 30 observations
sex BMI diabetes SBP DOB Site
F 22.5 0 109 08/09/1951 Columbia University
M 24.3 0 110 09/05/1948 Johns Hopkins University
F 19.1 0 110 03/15/1951 Northwestern University
F 18.5 0 111 02/10/1945 University of Minnesota
M 28.9 0 113 01/14/1954 Johns Hopkins University
F 24.3 0 114 02/28/1945 Northwestern University
F 21.5 0 112 11/11/1945 Northwestern University
M 26.4 0 115 12/23/1943 Columbia University
F 22.3 0 115 04/14/1950 Johns Hopkins University
M 27.6 0 116 06/17/1948 UCLA
F 25.4 0 116 07/25/1946 Johns Hopkins University
F 27.3 0 117 07/01/1950 UCLA
F 20.2 0 117 02/05/1947 UCLA
M 28.5 0 118 04/01/1942 Wake Forest University
M 29.3 0 120 05/05/1941 Wake Forest University
M 27.2 0 121 09/16/1943 Johns Hopkins University
F 26.4 0 121 08/18/1949 Columbia University
M 28.3 0 122 01/21/1951 Columbia University
F 23.5 0 124 10/21/1943 Johns Hopkins University
M 30.4 1 127 08/26/1939 University of Minnesota
M 29.2 1 129 11/01/1945 University of Minnesota
M 31.5 1 140 05/08/1943 Columbia University
F 29.6 1 142 12/12/1945 UCLA
F 32.4 1 150 01/30/1940 Wake Forest University
M 35.1 1 154 02/11/1939 UCLA
M 30.1 1 145 01/02/1938 Northwestern University
M 28.8 1 139 04/27/1945 Northwestern University
F 22.7 0 115 06/15/1940 University of Minnesota
F 23 0 120 10/10/1939 UCLA
F 24.5 0 117 12/09/1941 Wake Forest University
Note: ‘MESA_short’ and ‘MESA_medium’ are subsets of ‘MESA_full’.
Note: All versions of MESA data used in this presentation will be available on Canvas.
Variable: sex at birth
Variable name: sex
Type: character
Coding: F=female; M=male
Variable: body mass index (BMI; kg/m²)
Variable name: BMI
Type: numeric
Variable: type 2 diabetes
Variable name: diabetes
Type: numeric
Coding: 1=Yes; 0=No
Variable: systolic blood pressure (mmHg)
Variable name: SBP
Type: numeric
Variable: date of birth
Variable name: DOB
Type: date (entered as MM/DD/YYYY)
Note: date variables require ‘special handling’ in SAS
Variable: recruitment site
Variable name: site
Type: character (long string with spaces)
Note: long strings require ‘special handling’ in SAS
Strategies for reading data into SAS
Use the DATA step
Use PROC IMPORT
Use the SAS Import Wizard
Reading in data using the DATA step: manual entry
Usage: for reading in small datasets (few observations/variables)
*************** Example: manual entry within DATA step ***************;
data mesa_short;
input sex $ BMI diabetes SBP;
datalines;
F 22.5 0 109
M 24.3 0 110
F 19.1 0 110
F 18.5 0 111
M 28.9 0 113
F 24.3 0 114
F 21.5 0 112
M 26.4 0 115
F 22.3 0 115
;
run;
Creates a dataset named ‘mesa_short’
Note: We can assign our dataset any name we want as long as it:
begins with a letter or underscore;
contains only letters, digits, and underscores (no spaces); and
is at most 32 characters in length.
The INPUT statement lists the variable names in order; use $ to indicate that sex is a ‘character’ variable
The keyword cards can be used interchangeably with datalines.
Note that the semicolon that ends the data block has to be on a separate line (below the last line of data).
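As a minimal sketch of the two notes above, the same manual-entry step can be written with cards in place of datalines. The dataset name mesa_cards is a hypothetical name chosen for this illustration, and only the first three rows of MESA_short are used.

```sas
* Sketch: CARDS used in place of DATALINES (dataset name mesa_cards is illustrative);
data mesa_cards;
  input sex $ BMI diabetes SBP;
  cards;
F 22.5 0 109
M 24.3 0 110
F 19.1 0 110
;
run;

* The data block above ends at the lone semicolon on its own line;
proc print data=mesa_cards;
run;
```

If the semicolon were placed at the end of the last data line instead, SAS would likely treat it as part of the data, so keeping it on its own line is the safer habit.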
Check that your dataset was successfully created.
Step 1: Check the SAS log.
Things to look for:
errors in your program (the SAS log will show which line in the program contains the error)
the correct number of observations and variables
Looks good!
Step 2: View the dataset within SAS.
******** Print the data to the screen ********;
proc print data=mesa_short;
run;
PROC PRINT output: exactly what we expected.
Step 3: Check that variable ‘types’ (character, numeric, etc.) are correct.
******** View contents of the dataset ********;
proc contents data=mesa_short;
run;
PROC CONTENTS output: looks good!
Note: PROC CONTENTS provides a wealth of information on our dataset in addition to the selected output shown on this slide (e.g., # of variables/observations, variable labels, etc.).
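One optional variation on Step 3, offered as a sketch rather than part of the original lecture code: the VARNUM option of PROC CONTENTS lists variables in the order they were created instead of alphabetically, which can make it easier to compare the result against the INPUT statement. The dataset name mesa_check is hypothetical and holds only two rows so that the example runs on its own.

```sas
* Sketch: small stand-in dataset so the example is self-contained (name is illustrative);
data mesa_check;
  input sex $ BMI diabetes SBP;
  datalines;
F 22.5 0 109
M 24.3 0 110
;
run;

* VARNUM lists variables in creation order rather than alphabetical order;
proc contents data=mesa_check varnum;
run;
```

In the output, sex should be reported as a character variable and BMI, diabetes, and SBP as numeric variables.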
Reading in data using the DATA step: Infile statement
Usage: for reading in larger datafiles in combination with an INPUT statement (often used to input text data files, e.g., .csv)
**************** Example: Infile statement within DATA step *****************;
data mesa_medium;
infile "C:\BIS679A\Lecture1\MESA_medium.csv" dlm="," firstobs=2;
input sex $ BMI diabetes SBP;
run;
***** Print the data to the screen *****;
proc print data=mesa_medium (obs=5);
run;
The INFILE statement specifies the datafile name ‘MESA_medium.csv’ and its location on the PC
dlm="," specifies the ‘delimiter’ (here, a comma)
Note: a ‘delimiter’ is the character that marks the boundary between values in a text datafile.
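To illustrate that the delimiter need not be a comma, here is a small sketch that reads pipe-delimited values typed directly into the program. The dataset name delim_demo and the choice of the pipe character are assumptions made for this illustration only.

```sas
* Sketch: pipe-delimited values read with dlm= (dataset name delim_demo is illustrative);
data delim_demo;
  infile datalines dlm="|";
  input sex $ BMI diabetes SBP;
  datalines;
F|22.5|0|109
M|24.3|0|110
F|19.1|0|110
;
run;

proc print data=delim_demo;
run;
```

Only the dlm= value changes; the INPUT statement is the same as for the comma-delimited file.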
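firstobs=2 tells SAS to begin reading data at the second line of the file (skipping the first line, which typically holds the variable names)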
The INPUT statement lists the names of the variables included in the file (in the order in which they appear in the file)
(obs=5) selects only the first 5 observations for printing (useful for large datasets)
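The obs= option can also be combined with firstobs= as dataset options to print a window of rows. The sketch below assumes the SASHELP.CLASS sample dataset that ships with standard SAS installations; it is used here only so the example does not depend on the course files.

```sas
* Sketch: print only observations 3 through 5 of a sample dataset shipped with SAS;
proc print data=sashelp.class (firstobs=3 obs=5);
run;
```

Here obs= names the last observation to process, not a count, so firstobs=3 with obs=5 selects observations 3, 4, and 5.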
Now, let’s modify our DATA step to input the MESA_full.csv file.
Recall that the MESA_full dataset contains two additional variables:
date of birth
Variable name: DOB
Type: date (entered as MM/DD/YYYY)
recruitment site
Variable name: site
Type: character (long string with spaces)
As previously noted, dates and long character variables require ‘special handling’ in SAS.
Special data input circumstances
Date variables
By default, dates are stored as integers in SAS.
The reference date is January 1, 1960 (i.e., this date is stored as a ‘0’).
All other dates are stored as the number of days before (negative) or after (positive) the reference date.
Examples:
December 31, 1959 = -1
January 2, 1960 = 1
September 18, 2017 = 21080
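The stored values can be checked directly. This is a minimal sketch that writes the integer behind three dates to the SAS log; the variable names are made up for the illustration.

```sas
* Sketch: write the stored integer values of three dates to the SAS log;
data _null_;
  ref_date   = '01JAN1960'd;       * date literal for the reference date;
  day_before = '31DEC1959'd;
  sep18_2017 = mdy(9, 18, 2017);   * mdy(month, day, year) builds a SAS date;
  put ref_date= day_before= sep18_2017=;
run;
```

The log should show ref_date=0, day_before=-1, and sep18_2017=21080, matching the examples above.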
Dates may be entered into a given datafile in various formats.
Examples:
09/18/2017
09182017
18SEP2017
SAS needs to know which format to expect for the dates in a given datafile.
When reading data into SAS using a DATA step, specify the date format (an ‘informat’) in the INPUT statement.
Please note that when reading data into SAS using PROC IMPORT, SAS attempts to detect the date format automatically.
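As a sketch of what specifying the expected format looks like, the step below reads the same date written three ways, each with an informat chosen to match its layout. The dataset and variable names are hypothetical, and the pairing of mmddyy8. with the compact 09182017 form is my reading of the informat's behavior rather than something stated in the lecture.

```sas
* Sketch: one date written three ways, each read with a matching informat;
data date_demo;
  input d_slash : mmddyy10. d_plain : mmddyy8. d_text : date9.;
  format d_slash d_plain d_text mmddyy10.;
  datalines;
09/18/2017 09182017 18SEP2017
;
run;

proc print data=date_demo;
run;
```

If each informat matches its layout, all three columns should display the same date.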
SAS also needs to know in which format to display the dates for us.
Recall that SAS stores dates as integers.
Unless it’s told to do otherwise, it will display the values of date variables as integers.
Therefore, it is useful to apply a date format to a date variable.
SAS date formats (from Cody and Smith, p. 124)
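To see how a display format changes what is shown without changing what is stored, here is a small sketch that writes one date to the log four ways. The date is the first DOB in the MESA_full preview; the choice of formats (mmddyy10., date9., worddate.) is mine and is not taken from the Cody and Smith table.

```sas
* Sketch: the same stored date written to the log with different formats;
data _null_;
  dob = '09AUG1951'd;
  put 'No format:  ' dob;
  put 'mmddyy10.:  ' dob mmddyy10.;
  put 'date9.:     ' dob date9.;
  put 'worddate.:  ' dob worddate.;
run;
```

The first line shows the underlying integer; the others show the same value rendered by each format.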
Character (‘string’) variables
When working with character variables, keep in mind that:
SAS’s default length for character variables is 8 characters.
Values longer than this limit will appear truncated.
SAS expects that data values will NOT contain spaces.
When values are separated by blanks (the default), SAS stops reading the value of a character variable at the first space.
Therefore, we need to warn SAS when our data do not meet these expectations.
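Both expectations can be seen side by side in a small sketch. The dataset names site_default and site_fixed are hypothetical; the site values are taken from the MESA_full preview.

```sas
* Sketch: default list input versus the ampersand modifier with a longer length;
data site_default;
  input sex $ site $;        * site stops at the first space and at 8 characters;
  datalines;
F Columbia University
M Johns Hopkins University
;
run;

data site_fixed;
  input sex $ site & $25.;   * ampersand allows single spaces and $25. allows 25 characters;
  datalines;
F Columbia University
M Johns Hopkins University
;
run;

proc print data=site_default; run;
proc print data=site_fixed; run;
```

In the first dataset site holds only the first word of each name; in the second it holds the full site name.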
Let’s modify our DATA step to tell SAS in which format to expect the date variable.
*************** Example: Infile statement within DATA step ***************;
data mesa_full;
infile "C:\BIS679A\Lecture1\MESA_full.csv" dlm="," firstobs=2;
input sex $ BMI diabetes SBP DOB : mmddyy10. site & $25.;
run;
***** Print the data to the screen *****;
proc print data=mesa_full (obs=5);
run;
mmddyy10. tells SAS the dates were entered into the datafile in mm/dd/yyyy format
The colon (:) prevents read errors due to missing leading zeros (SAS reads up to the next delimiter rather than a fixed 10 characters)
PROC PRINT output: SAS read in the dates correctly, BUT it is displaying the integer values of the dates.
Let’s modify our DATA step to tell SAS in which format to display the dates.
*************** Example: Infile statement within DATA step ***************;
data mesa_full;
infile "C:\BIS679A\Lecture1\MESA_full.csv" dlm="," firstobs=2;
input sex $ BMI diabetes SBP DOB : mmddyy10. site & $25.;
format DOB mmddyy10.;
run;
***** Print the data to the screen *****;
proc print data=mesa_full (obs=5);
run;
The FORMAT statement applies the ‘mmddyy10.’ format to the DOB variable for more readable output.
PROC PRINT output: much better!
Let’s modify our DATA step to account for spaces and length of ‘site’ variable.
input sex $ BMI diabetes SBP DOB : mmddyy10. site & $25.;
The ampersand (&) tells SAS to expect spaces.
The $25. informat changes the character limit to 25 (value chosen based on our data).
PROC PRINT Output
Perfect!
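The same INPUT statement can be tried without an external file by reading inline data. This is only a sketch: the dataset name mesa_demo and the two rows are invented, laid out in the same column order as the lecture’s CSV file.

```sas
/* Hypothetical rows in the same layout as MESA_full.csv (values are made up) */
data mesa_demo;
    infile datalines dlm=",";
    input sex $ BMI diabetes SBP DOB : mmddyy10. site & $25.;
    format DOB mmddyy10.;
    datalines;
F,22.5,0,109,1/5/1950,New Haven
M,24.3,0,110,11/23/1948,Los Angeles County
;
run;

proc print data=mesa_demo;
run;
```

The first row uses a date without leading zeros and a site name containing a space, the two situations the colon and ampersand modifiers are meant to handle.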
Reading data into SAS: PROC IMPORT
Usage: for reading in datafiles directly from external file
********************* Example: PROC IMPORT with CSV Data **********************;
proc import
    datafile = "C:\BIS679A\Lecture1\MESA_full.csv"
    dbms = csv
    out = MESA_full REPLACE;
run;
datafile: Specifies datafile name ‘MESA_full.csv’ and location in PC.
dbms: Specifies the file type. Note that dbms is short for database management system.
out: Creates a dataset called MESA_full. This will contain all variables & observations in the CSV file.
‘REPLACE’ option asks SAS to overwrite any existing dataset stored under the same name.
Note: As noted previously, the formatting/special features of dates and other variables will be automatically detected by PROC IMPORT. We can still use the ‘format’ statement to control how SAS displays the date values, but it must go in a later DATA step or PROC step (e.g., format DOB mmddyy10.;), because PROC IMPORT does not accept a FORMAT statement.
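One way to control how a date is displayed after a dataset already exists is PROC DATASETS, sketched below. The dataset import_demo is a made-up stand-in for an imported file so that the example runs on its own.

```sas
/* Hypothetical stand-in for an imported dataset (values are made up) */
data import_demo;
    DOB = '05JAN1950'd;
    SBP = 109;
run;

/* Attach a display format to DOB without rewriting the data */
proc datasets library=work nolist;
    modify import_demo;
    format DOB mmddyy10.;
quit;

proc print data=import_demo;
run;
```

A FORMAT statement inside a later DATA step or inside PROC PRINT would work as well; PROC DATASETS is shown here because it changes only the variable’s attributes.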
******************** Example: PROC IMPORT with Excel Data *********************;
proc import
    datafile = "C:\BIS679A\Lecture1\MESA_full.xlsx"
    dbms = xlsx
    out = MESA_full REPLACE;
run;
A FORMAT statement for DOB, if wanted, belongs in a later DATA step or PROC step, because PROC IMPORT does not accept one.
Reading data into SAS: SAS IMPORT WIZARD
Usage: for reading in external files with direct guidance from SAS
(It can also help us generate the SAS code necessary to read in the file via PROC IMPORT).
Note: The Import Wizard can be accessed within the SAS drop-down menu (File > Import Data).
PROC IMPORT OUT= MESA_FULL
DATAFILE="C:\Users\yds4\Documents\MESA_full.xlsx"
DBMS=XLSX REPLACE;
RANGE="MESA$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
Note that the SAS IMPORT WIZARD will create code that includes all statements/options available within PROC IMPORT.
You can learn about these options in the online SAS documentation and gauge which of these you would need to include in a given situation.
“If/then” statements
Usage: for creating new variables from existing variables within a DATA step.
data mesa_full2;
    set mesa_full;
    /*example 1*/
    if sex = "F" then newsex = 1;
    else if sex = "M" then newsex = 0;
    else if sex = " " then newsex = .;
    /*example 2: check for missing BMI first, because SAS treats a missing value as smaller than any number*/
    if BMI = . then weight_status = .;
    else if BMI < 18.5 then weight_status = 1;
    else if 18.5 <= BMI < 25 then weight_status = 2;
    else if 25 <= BMI < 30 then weight_status = 3;
    else if BMI >= 30 then weight_status = 4;
run;
Copies the contents of ‘mesa_full’ to a new dataset called ‘mesa_full2’.
The dataset mesa_full2 will contain the old variables + our new variables.
Makes a numeric variable called ‘newsex’ from the character variable called ‘sex’. Useful for analytic purposes, as you will see in Lab 2 this week.
Notice the different notation for missing values of character vs. numeric variables.
Makes a numeric categorical variable called ‘weight_status’ from the numeric continuous variable called ‘BMI’ using standard cutpoints defined by the World Health Organization.
Guidelines for if/then statements:
Must be mutually exclusive.
Must include all possible real values.
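In SAS, a missing numeric value compares as smaller than any number, so the order of the checks matters. The sketch below uses a made-up three-row dataset (missing_demo) to compare a missing check placed last with one placed first; the variable names status_wrong and status_right are illustrative only.

```sas
/* Hypothetical demo: a missing BMI compared with a number */
data missing_demo;
    input BMI;
    /* Missing check placed last: never reached for a missing BMI */
    if BMI < 18.5 then status_wrong = 1;
    else if BMI >= 18.5 then status_wrong = 2;
    else if BMI = . then status_wrong = .;
    /* Missing check placed first */
    if BMI = . then status_right = .;
    else if BMI < 18.5 then status_right = 1;
    else status_right = 2;
    datalines;
17.0
22.5
.
;
run;

proc print data=missing_demo;
run;
```

For the row with missing BMI, status_wrong is 1 while status_right is missing.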
Always check that the new variables were created correctly.
/*check values of new variable against values of old variable*/
proc freq data=mesa_full2;
    tables sex*newsex / list missing;
run;
proc freq data=mesa_full2;
    tables bmi*weight_status / list missing;
run;
Requests a table of the old variable vs. new variable.
Note that if/then statements that do not cover all possible values will leave the new variable missing for the uncovered observations.
The list missing option helps us catch those coding errors.
Note: It is good practice to do this for every new variable.
PROC FREQ Output
All “F” values are coded 1.
All “M” values are coded 0.
Weight_status variable values correspond to the cut-points of BMI we specified.
Strategies for improving the display of our data
Labeling
Formatting
Improving display of our data: the Label statement
Usage: adding a description (‘label’) to variables using the DATA step
***** Example adding descriptions to our variables using LABEL statement ****;
data mesa_full2;
    set mesa_full2;
    label BMI = "body mass index (in kg/m2)"
          SBP = "systolic blood pressure (in mmHg)"
          DOB = "date of birth"
          weight_status = "BMI categories, based on WHO cutpoints"
    ;
run;
************** check that labels have been created successfully **************;
proc contents data=mesa_full2;
run;
This time we are not creating a new dataset. We are adding the labels directly to ‘mesa_full2’ and over-writing the old version of this dataset. (Analogous to hitting ‘save’ vs. ‘save as’.)
We can add labels to any variable (categorical or continuous, numeric or character).
Note: Labels are most helpful for variables with names that are not descriptive/meaningful.
Check labels using PROC CONTENTS
PROC CONTENTS output: MESA_full before and after adding labels
See our labels ‘in action’: PROC MEANS example
***** Descriptive stats for SBP *****;
proc means data=mesa_full2;
    var SBP;
run;
PROC MEANS output
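Not every procedure shows labels automatically. PROC PRINT, for example, prints variable names unless the label option is requested. A minimal sketch, using a made-up one-row dataset called label_demo:

```sas
/* Hypothetical demo dataset with one labeled variable */
data label_demo;
    SBP = 109;
    label SBP = "systolic blood pressure (in mmHg)";
run;

/* Without the label option, the column heading is the variable name */
proc print data=label_demo;
run;

/* With the label option, the column heading is the label */
proc print data=label_demo label;
run;
```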
Improving display of our data: variable formats
Usage: to give ‘word’ names to values assigned to categorical variables
Let’s inspect our MESA_full dataset:
‘Sex’, ‘diabetes’, ‘newsex’ and ‘weight_status’ are the only categorical variables in our dataset.
We can format these categorical variables, so that their data values have meaningful ‘names’.
Example: Formatting goal for weight_status values is to have the BMI category denoted by each numeric value:
1 = “underweight”
2 = “normal weight”
3 = “overweight”
4 = “obese”
Improving display of our data: variable formats
Formatting is a two-step process
*Step 1: Create format w/ PROC FORMAT*;
proc format;
    value num_sexf
        1 = "female"
        0 = "male";
    value $char_sexf
        "F" = "female"
        "M" = "male";
    value yesnof
        1 = "yes"
        0 = "no";
    value weightf
        1 = "underweight"
        2 = "normal weight"
        3 = "overweight"
        4 = "obese";
run;
Note: We need one value statement for each format we want to create.
Note: Each value statement creates a format with a name of your choosing (e.g., num_sexf) and lists the “names” to display in place of the actual values of a variable.
num_sexf, yesnof, and weightf are formats created for numeric variables.
$char_sexf is a format created for a character variable; the $ before the name marks it as a character format.
*Step 2: Apply formats in DATA step*;
data mesa_full2;
    set mesa_full2;
    format
        sex $char_sexf.
        newsex num_sexf.
        diabetes yesnof.
        weight_status weightf.
    ;
run;
Applies the appropriate format to each variable by listing the variablename followed by the formatname. The character format keeps its $ prefix ($char_sexf.).
Note that a period (.) follows each formatname.
This is standard SAS syntax: the period is how SAS tells a format name apart from a variable name.
Check that formatting worked by inspecting our dataset.
*print data to the screen*;
proc print data=mesa_full2;
run;
PROC PRINT Output
Much better!
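A format does not have to be attached permanently in a DATA step. As a sketch of an alternative, a FORMAT statement inside a PROC step applies the format for that procedure only. The format name demo_ynf and the dataset format_demo are invented for this example.

```sas
/* Hypothetical format and data, for illustration only */
proc format;
    value demo_ynf
        1 = "yes"
        0 = "no";
run;

data format_demo;
    input diabetes @@;   /* @@ reads several values from one data line */
    datalines;
0 1 0 0 1
;
run;

/* The format applies to this PROC FREQ run only; the dataset is unchanged */
proc freq data=format_demo;
    tables diabetes;
    format diabetes demo_ynf.;
run;
```

This approach is convenient when the same variable should be displayed in different ways in different tables.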