MET MA 603: SAS Programming and Applications
Arrays
The Need for Arrays
Sometimes we need to perform the same operation on many different variables. For example, consider the City_temps_by_month_fahr dataset below. If we need to convert each temperature from Fahrenheit to Celsius, we would need to write a statement for each month.
Month_1 = (Month_1 - 32) / 9 * 5;
Month_2 = (Month_2 - 32) / 9 * 5;
Month_3 = (Month_3 - 32) / 9 * 5;
etc.
The Need for Arrays (cont.)
There are two problems with this. First, it is tedious to enter and review so much repetitive code. Second, it increases the chance of typos.
A better way to get the result needed in the previous slide would be to instruct SAS to apply the same conversion calculation to each of the Month_i variables in turn.
Month(i) = (Month(i) - 32) / 9 * 5;
This can be achieved with Arrays.
An Array is a convenient way of temporarily identifying a group of variables for processing within a Data step.
Defining Arrays
Arrays are defined within the Data step. We must supply SAS with the name of the array, the number of elements, the data type ($ for character variables; omitted for numeric variables), and the name of each variable, separated by spaces.
ARRAY name (n) $ c1 c2 c3;
ARRAY name (n) num1 num2 num3;
The variables within an array must all be of one data type.
Arrays are temporary structures created and used within a Data step. At the end of the Data step the array is gone.
Array names follow the usual SAS naming convention.
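As a minimal sketch of the two ARRAY forms above, the step below defines one character array and one numeric array and then addresses variables through them. The dataset name array_demo and the variables name1-name3 and score1-score3 are hypothetical, chosen only for illustration.

```sas
/* Hypothetical one-row dataset; all names here are illustrative only. */
data work.array_demo;
    input name1 $ name2 $ name3 $ score1 score2 score3;

    /* Character array: the $ marks its elements as character variables. */
    array names (3) $ name1 name2 name3;
    /* Numeric array: no $ is needed. */
    array scores (3) score1 score2 score3;

    /* An array reference is another way to address the underlying variable. */
    names(1)  = upcase(names(1));   /* same as name1 = upcase(name1) */
    scores(3) = scores(3) + 10;     /* same as score3 = score3 + 10  */
    datalines;
ann bob cy 70 80 90
;
run;

proc print data=work.array_demo;
run;
```

The output dataset contains only the six variables. The arrays names and scores exist only while the Data step runs.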
Shorthand for Listing Variables
SAS can recognize two shorthand ways of listing variables.
The first method relies on indexed variable names. Listing the first and last variables separated by a dash is equivalent to listing all variable names in the range.
ARRAY months (12) Month_1 - Month_12;
The second method relies on the order in which the variables are organized in the input dataset. Listing the first and last variables with a double dash in between is equivalent to listing all of the variables in the dataset that fall in between first and last.
ARRAY months (12) Month_1 -- Month_12;
The shorthand methods of listing variable names can also be used inside some functions, such as SUM, MIN, and MAX, provided the list is preceded by the keyword OF.
Total2 = sum(of Month_1 - Month_12);
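The sketch below compares the two shorthand forms inside the SUM function and shows what happens when OF is left out. It uses a small hypothetical dataset (temps_demo, with only three months and an assumed City variable) rather than the course dataset.

```sas
/* Hypothetical data: two cities, three months (not the course dataset). */
data work.temps_demo;
    input City $ Month_1 Month_2 Month_3;
    datalines;
Boston 29 32 39
Miami 68 70 73
;
run;

data work.temps_summary;
    set work.temps_demo;
    /* Single dash: the numbered range Month_1, Month_2, Month_3. */
    Total_numbered = sum(of Month_1-Month_3);
    /* Double dash: every variable positioned from Month_1 through Month_3. */
    Total_positional = sum(of Month_1--Month_3);
    /* Without OF, SAS reads the dash as subtraction: Month_1 minus Month_3. */
    Difference = sum(Month_1-Month_3);
run;

proc print data=work.temps_summary;
run;
```

Here the two totals agree because the three month variables are adjacent in the dataset. If another variable sat between them, the double-dash list would pick it up as well.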
Arrays and Do Loops
Arrays are almost always used in combination with a Do Loop, which makes it possible to iterate through each variable in the array. This is what gives arrays their efficiency.
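data city_temps_by_month_cels;
    set city_temps_by_month_fahr;
    /* Group the 12 monthly temperature variables under one array name. */
    array temps (12) Month_1 - Month_12;
    /* Convert each element from Fahrenheit to Celsius. */
    do i = 1 to 12;
        temps(i) = (temps(i) - 32) / 9 * 5;
    end;
    drop i;  /* keep the loop counter out of the output dataset */
run;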
Think of the array as a short-hand way of referring to each variable, without needing to actually type in each different name.
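A variation on the step above, as a sketch: the DIM function returns the number of elements in an array, so the loop bound does not have to be typed by hand. The three-month dataset temps_fahr_demo and the City variable are hypothetical stand-ins for the course data.

```sas
/* Hypothetical data: one city, three months in Fahrenheit. */
data work.temps_fahr_demo;
    input City $ Month_1 Month_2 Month_3;
    datalines;
Boston 32 50 212
;
run;

data work.temps_cels_demo;
    set work.temps_fahr_demo;
    /* (*) asks SAS to count the elements from the variable list. */
    array temps (*) Month_1-Month_3;
    /* DIM(temps) is the number of elements, so the loop adapts to the list. */
    do i = 1 to dim(temps);
        temps(i) = (temps(i) - 32) / 9 * 5;
    end;
    drop i;  /* the loop counter is not wanted in the output */
run;

proc print data=work.temps_cels_demo;
run;
```

With this form, extending the list to Month_1-Month_12 requires changing only the ARRAY statement.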
Calculating Average Temps
Using the city_temps_by_month_fahr dataset, calculate the average, maximum, and minimum of the 12 monthly temperatures for each city. Use the Fahrenheit scale.
This problem can be solved using arrays or using functions with shorthand variable listing. Try to solve it both ways.
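One possible shape for the two approaches is sketched below on a small hypothetical dataset (temps_demo, three months, an assumed City variable). The output variable names are illustrative, and the array version assumes there are no missing temperatures; the MEAN, MAX, and MIN functions skip missing values, while the loop as written does not.

```sas
/* Hypothetical data: two cities, three months (not the course dataset). */
data work.temps_demo;
    input City $ Month_1 Month_2 Month_3;
    datalines;
Boston 29 32 39
Miami 68 70 73
;
run;

data work.temps_stats;
    set work.temps_demo;
    array temps (*) Month_1-Month_3;

    /* Approach 1: functions with a shorthand variable list. */
    Avg_fn = mean(of Month_1-Month_3);
    Max_fn = max(of Month_1-Month_3);
    Min_fn = min(of Month_1-Month_3);

    /* Approach 2: an array and a Do Loop (assumes no missing values). */
    Total   = 0;
    Max_arr = temps(1);
    Min_arr = temps(1);
    do i = 1 to dim(temps);
        Total = Total + temps(i);
        if temps(i) > Max_arr then Max_arr = temps(i);
        if temps(i) < Min_arr then Min_arr = temps(i);
    end;
    Avg_arr = Total / dim(temps);
    drop i Total;
run;

proc print data=work.temps_stats;
run;
```

For this task the function approach is shorter; the array approach generalizes more easily to calculations that have no built-in function.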
Converting Temperatures
Using the city_temps_by_month_fahr dataset, create a new dataset that has the monthly temperatures for the city of Boston, with the month and temperature as variables, as shown below. To achieve this result you will need to combine several other tools in addition to arrays. Include both temperature scales.
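One way to approach this, sketched on hypothetical data: subset the rows with a WHERE statement, loop over the array, and write one observation per month with an explicit OUTPUT statement. The City variable, the three-month dataset temps_demo, and the output names Month, Temp_F, and Temp_C are assumptions made for illustration.

```sas
/* Hypothetical data: two cities, three months in Fahrenheit. */
data work.temps_demo;
    input City $ Month_1 Month_2 Month_3;
    datalines;
Boston 32 50 212
Miami 68 70 73
;
run;

data work.boston_long;
    set work.temps_demo;
    where City = 'Boston';            /* keep only the Boston row */
    array temps (*) Month_1-Month_3;
    do Month = 1 to dim(temps);
        Temp_F = temps(Month);
        Temp_C = (Temp_F - 32) / 9 * 5;
        output;                       /* one output row per month */
    end;
    keep Month Temp_F Temp_C;
run;

proc print data=work.boston_long;
run;
```

Because OUTPUT sits inside the loop, each input row produces as many output rows as the array has elements.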
Readings
Textbook section (6th edition) 3.16
Textbook section (5th edition) 3.11