MET MA 603: SAS Programming and Applications
Proc Import
Proc Import
Often, the data we want to work with isn’t stored in a SAS file (e.g., it is in a database, an Excel spreadsheet, or a text file). Proc Import is a procedure that creates a SAS dataset from a non-SAS file.
This is often the fastest and easiest method for data that isn’t stored in SAS. However, because much of the procedure is automated, it lacks flexibility. If the source file has irregularities, this method may fail.
Use this method when the data isn’t in a SAS file, and the data is organized in a reasonable structure (we will see examples of files where Proc Import will work and examples where it won’t).
Using Proc Import
Proc Import
datafile = "C:\Users\govonlu\Desktop\Data\scores1.txt"
out = work.scores dbms = tab REPLACE
;
run;
The PROC IMPORT statement itself is required. It tells SAS where to find the file (datafile=) and what to name the SAS dataset (out=).
When REPLACE is included in the PROC IMPORT statement, an existing SAS dataset with the same name, in the same library, as the dataset in the “out=” argument will be overwritten. If REPLACE isn’t used and such a dataset already exists, the import is cancelled.
DBMS tells SAS the type of file being read. If DBMS is omitted, SAS infers the file type, and therefore how the values are separated, from the file extension. DBMS is needed only when that inference isn’t appropriate.
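To illustrate relying on the file extension, here is a minimal sketch that imports a comma-separated file without a DBMS argument and then checks what SAS created. The path and the dataset name are hypothetical placeholders.

```sas
/* Hypothetical file: the path and the dataset name are placeholders. */
/* No DBMS= is given, so SAS chooses the file type from the .csv      */
/* extension and reads the file as comma-delimited.                   */
proc import
    datafile = "C:\Data\scores.csv"
    out = work.scores_csv
    replace;
run;

/* List the variable names, types, and lengths that SAS assigned. */
proc contents data = work.scores_csv;
run;
```

Running Proc Contents right after an import is a quick way to confirm that SAS’s assumptions about types and lengths match what you expected.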
Delimiter Statement
Proc Import
datafile = "C:\Users\govonlu\Desktop\Data\scores2.txt"
out = work.scores dbms = dlm REPLACE ;
delimiter = "-";
run;
When reading a file, SAS needs to know where one value ends and the next begins. Delimiters are characters that indicate how the values are separated.
Based on the file extension and the DBMS argument, SAS will make an assumption about what the delimiter is. If the delimiter in the file is something different, then the delimiter statement is needed.
Common choices of delimiters are spaces, commas, or tabs, but almost anything could be used.
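As a further sketch, the code below reads a pipe-delimited file. The file name and its contents are hypothetical; the pattern shown is DBMS = DLM (a generic delimited file) combined with a delimiter statement naming the separator.

```sas
/* Hypothetical pipe-delimited file whose lines look like: Ann|88|92 */
proc import
    datafile = "C:\Data\scores_pipe.txt"
    out = work.scores_pipe
    dbms = dlm          /* generic delimited file             */
    replace;
    delimiter = "|";    /* character that separates values    */
    getnames = yes;     /* first line holds the variable names */
run;
```

A non-printing delimiter can also be written as a hexadecimal constant; for example, '09'x is the tab character.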
DBMS and File Extensions
The table below shows the DBMS associated with some common file types that might be imported into SAS.
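As one illustration of a DBMS value for a non-text file type, here is a minimal sketch of importing an Excel workbook. The path, the sheet name, and the dataset name are hypothetical, and DBMS = XLSX may require SAS/ACCESS Interface to PC Files to be available on your installation.

```sas
/* Hypothetical workbook and sheet name.                      */
/* DBMS = XLSX may require SAS/ACCESS Interface to PC Files.  */
proc import
    datafile = "C:\Data\scores.xlsx"
    out = work.scores_xlsx
    dbms = xlsx
    replace;
    sheet = "Sheet1";   /* which worksheet to read           */
    getnames = yes;     /* first row holds the variable names */
run;
```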
Other Optional Statements
Proc Import
datafile = "C:\Users\govonlu\Desktop\Data\occupancy.txt"
out = work.occupancy REPLACE ;
getnames = NO ; datarow = 6 ; guessingrows = 3000 ;
run;
By default, Proc Import assumes that the first line contains the variable names (getnames = YES). Use the getnames statement to specify whether to use the first line as the variable names.
By default, Proc Import starts reading data from line 2 when getnames = YES, and from line 1 when getnames = NO. Use the datarow statement to specify which line to start at.
By default, Proc Import determines the variable types and lengths based on the first 20 rows. Use the guessingrows statement to specify the number of rows to scan (note: increasing the number of rows slows processing).
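When getnames = NO, SAS assigns generic names (VAR1, VAR2, and so on), so a short data step is a common follow-up. The sketch below combines the three optional statements and then renames the columns; the file path and the new names room_id and occupied are hypothetical.

```sas
/* Hypothetical tab-delimited file with five lines of notes at the top. */
proc import
    datafile = "C:\Data\readings.txt"
    out = work.readings
    dbms = tab
    replace;
    getnames = no;       /* no header line: columns become VAR1, VAR2, ... */
    datarow = 6;         /* skip the five lines of notes                   */
    guessingrows = 500;  /* scan more rows before choosing types           */
run;

/* Give the generic columns meaningful (hypothetical) names. */
data work.readings;
    set work.readings;
    rename var1 = room_id var2 = occupied;
run;
```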
Import Wizard
Proc Import can also be initiated via point-and-click by choosing Import Data from the File menu.
The Wizard will prompt for the type and location of the file, and the desired library and name for the SAS dataset.
Using the Import Wizard is really just another way of using Proc Import. Each piece of information the Wizard asks for corresponds to part of a Proc Import step.
Based on what the user enters, SAS generates Proc Import code. In the last step of the Wizard, choosing to save the Proc Import statements instructs SAS to save this code in a file. The code can then be viewed, run, or modified.
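For orientation, code saved by the Wizard for a CSV file typically has roughly the shape below. The dataset name and path are placeholders, and the exact layout and set of statements vary by SAS version and by the choices made in the Wizard.

```sas
/* Approximate shape of Wizard-generated code; details vary by version. */
PROC IMPORT OUT= WORK.OCCUPANCY
            DATAFILE= "C:\Data\occupancy.csv"
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2;
RUN;
```

Each line maps back to a Wizard prompt: OUT= to the library and dataset name, DATAFILE= to the file location, and DBMS= to the file type.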
Where Proc Import Fails
Based on a scan of your data, SAS makes assumptions about the data and the dataset it should create. This can be convenient, but sometimes SAS doesn’t make the assumptions we want.
Sometimes the data file is structured in a way that Proc Import can’t deal with. This causes the method to fail.
When data is arranged in fixed-width columns rather than separated by a delimiter, SAS may not be able to determine how it is delimited.
Sometimes data is formatted, such as with percentages, currencies, or scientific notation. This may cause Proc Import to assign the wrong data type.
Sometimes our data has numbers that we want to be stored as text, such as zip codes or ID numbers. Proc Import will generally create numeric variables for these, which drops any leading zeros (e.g., the zip code 02134 becomes 2134).
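If a zip code has already been imported as a number, one possible repair is a data step that rebuilds the text value with the Z format, which pads with leading zeros. This is a minimal sketch assuming a hypothetical dataset work.customers with a numeric variable zip.

```sas
/* Hypothetical dataset: ZIP was imported as numeric, so 02134 is stored as 2134. */
data work.customers2;
    set work.customers;
    length zip_char $ 5;
    zip_char = put(zip, z5.);  /* Z5. pads to 5 digits: 2134 -> "02134" */
    drop zip;
run;
```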
Practice
Use Proc Import to import “occupancy.csv”.
Use Proc Import to import “days_of_week1.txt”.
Use Proc Import to import “days_of_week2.txt”. Then, use the Import Wizard to import the same dataset, and save the code that SAS generates. How does the code you wrote compare to the code SAS generated? What is the smallest number that you could set guessingrows to, such that the data will still be imported correctly?
Readings
Textbook sections 2.3, 2.16
https://v8doc.sas.com/sashtml/proc/z0308090.htm