MET MA 603: SAS Programming and Applications
MET MA 603:
SAS Programming and Applications
Combining Datasets with Merge
1
1
The Merge Statement can be used to combine two or more SAS datasets. Usually, the datasets will have at least one variable in common. The BY statement is used to indicate the common variables used to merge the datasets. The datasets must be sorted by the BY variables.
The number of observations and the number of variables in the output dataset are the union of the observations and variables in the input datasets, respectively.
If a variable is present in more than one dataset, only the values from the first dataset with the variable are written for those observations.
There are two ways that datasets can be combined with the Merge statement: One-to-One and One-to-Many.
Combining Datasets with Merge
2
2
One-to-One Merge
In a One-to-One Merge, the input datasets all have the same number of observations. A One-to-One merge is used when the variables we want to work with are divided among more than one dataset.
Data cities_with_pop ;
Merge cities
city_populations ;
By city ;
run ;
In a One-to-One merge, the number of observations in output and input datasets are the same.
3
3
One-to-Many Merge
In a One-to-Many Merge, the input datasets have different numbers of observations. In a one-to-many merge several variables in one input dataset correspond to a single variable in another input dataset.
Data cities_with_pop_and_region ;
Merge cities_with_pop
regions ;
By state ;
run ;
In a One-to-Many merge, the number of observations in output and input datasets are different.
The distinction between One-to-Many and One-to-One merges is made only to help in understanding the concept of merging – the coding looks exactly the same.
4
4
The in= Dataset Option
The in= dataset option can be used with the Merge statement to create “Inner Joins”, where the output dataset only includes the observations with matching values of the By variables in both datasets.
Data cities_with_pop_and_region ;
Merge cities_with_pop (in=a)
regions_nomidwest (in=b);
By state ;
If a = 1 and b = 1 then output;
run ;
5
5
Merging by State
Merge the State_Info.sas7bdat and Policies_by_State.sas7bdat datasets according to the common variables.
6
6
Rating by Roof Type
The Policy_Info.sas7bdat dataset has a list of policies and the type of roof for the home. The Roof_Rating.sas7bdat dataset has the rating factor for each type of roof. Merge the datasets so that the correct rating factor for each policy is shown. Your result should match what is below (not all of the observations are shown).
7
7
Readings
Textbook sections (5th edition) 6.4, 6.5, 6.6
Textbook sections (6th edition) 6.3, 6.4, 6.6
8
8
/docProps/thumbnail.jpeg