Knowledge Discovery in Databases
Stevens Institute of Technology
Khasha Dehnad
kdehnad@stevens.edu
Khasha.dehnad@aimsinfo.com
Teaching Assistant: TBD
Spring 2013
What is SAS?
• Developed in the early 1970s at North Carolina State University
• Originally intended for management and analysis of agricultural field experiments
• Now one of the most widely used statistical analysis packages
• The name originally stood for “Statistical Analysis System”; today SAS is simply the product name and is no longer treated as an acronym
SAS Is a Procedural Language
“Procedural programming is a programming paradigm, derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, or functions (not to be confused with mathematical functions, but similar to those used in functional programming), simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program’s execution…”
(Source: Wikipedia)
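In SAS terms, a program is a sequence of steps that run in the order they are written. The following is a minimal sketch, not course material: the data set name work.heart and the three records are made up for illustration.

```sas
/* Step 1: a DATA step builds a small data set from in-line records. */
/* The values below are hypothetical and exist only for this example. */
data work.heart;
    input Patient_ID RestHR MaxHR;
    datalines;
1 70 165
2 64 172
3 81 158
;
run;

/* Step 2: a procedure is then called on the output of step 1. */
proc means data=work.heart mean min max;
    var RestHR MaxHR;
run;
```

Each step ends at its RUN statement, and the PROC MEANS step can only work because the DATA step before it has already created work.heart.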
SAS Advantages
• Reliable/standard technical support
• Relatively simple but very powerful programming language for data manipulation
• Easy access to multiple data sources and software packages
• Well-documented SAS packages
• Works from disk rather than memory, so large data sets can be processed
• Strong graphics and reporting
• New SAS functionality is driven by customer demand (voluntary)
• Runs on almost any standard computing platform and operating system.
Disadvantages
• Very expensive
• Does not run on modern tablets, phones, PDAs, or game consoles (large footprint)
• Has infrequent releases
Course Objectives
After this course you will be:
• Functional in SAS programming
• Able to perform simple analysis in SAS
• Able to produce simple output
• Capable of extending your SAS knowledge and abilities
But you will not become a SAS programmer.
Typical Analysis Cycle
State the question → Load data → Prepare data → Explore → Model → Present → Deploy
SAS Basic Concepts
• Access/Load the data
• Clean and Manipulate the data
• Analyze/Explore the data
• Model the data
• Produce, present, and deploy the results
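The steps above can be sketched end to end on a data set that ships with SAS. This is an illustrative sketch only: it assumes the SASHELP.CLASS sample table (Name, Sex, Age, Height in inches, Weight in pounds) is available, the data set names under WORK and the bmi variable are made up, and PROC REG requires a SAS/STAT license. Deployment is not shown.

```sas
/* Access/load: copy a SAS-supplied table into the WORK library. */
data work.class;
    set sashelp.class;
run;

/* Clean and manipulate: derive a new variable.                   */
/* 703 * pounds / inches squared is the imperial-unit BMI formula. */
data work.class_bmi;
    set work.class;
    bmi = 703 * weight / height**2;
run;

/* Analyze/explore: summary statistics by group. */
proc means data=work.class_bmi n mean std min max;
    class sex;
    var height weight bmi;
run;

/* Model: simple linear regression (needs SAS/STAT). */
proc reg data=work.class_bmi;
    model weight = height;
run;
quit;

/* Produce and present: a formatted listing. */
title 'Class listing with derived BMI';
proc print data=work.class_bmi noobs;
    var name sex age bmi;
    format bmi 5.1;
run;
title;
```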
Typical SAS Program Components
• Data Steps: data manipulation and data cleansing, basic analysis and reporting
• Procedures (Proc): procedures for analysis and/or presentations
• Global Statements: managing the programming environment
• Macro Statements: automation of repeated data steps and/or procs
• Command: line commands
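A short sketch of how the first four components can appear together in one program. The macro name show, its parameters ds and nobs, and the data set work.teens are hypothetical names chosen for this example; SASHELP.CLASS is assumed to be installed. Line commands are typed into the command box rather than written into a program, so they are not shown.

```sas
/* Global statements: take effect outside any step. */
options nodate nonumber;
title 'First rows of a data set';

/* Macro statements: wrap a repeated PROC call so it can be reused. */
%macro show(ds, nobs=5);
    proc print data=&ds (obs=&nobs) noobs;
    run;
%mend show;

/* DATA step: subset an existing table. */
data work.teens;
    set sashelp.class;
    where age >= 13;
run;

/* Procedures, invoked here through the macro. */
%show(work.teens)
%show(sashelp.class, nobs=3)

/* Reset the global title. */
title;
```

The two %show calls generate the same PROC PRINT step with different arguments, which is the kind of repetition macros are meant to automate.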
Overview of Some SAS Products
• Base SAS: data manipulation and basic procedures, e.g. reporting
• SAS/ACCESS: access to databases and engines
• SAS/STAT: statistical analysis
• SAS/GRAPH: presentation-quality graphics
Getting Help
• http://support.sas.com/
• Documentation
• Submit a problem
Note: run “proc setinit noalias; run;” to get the site number. The log then shows:
Original site validation data
Site name: ‘STEVENS INSTITUTE OF TECHNOLOGY…..’.
Site number: xxxxxxx.
Expiration: xxxxxxx
….
….
Agenda
• Navigate the SAS windowing environment
• Read various types of data into SAS data sets
• Create SAS variables and subset data
• Combine or create multiple SAS data sets
• Perform simple analysis
• Create and enhance listing and summary reports
Source of the Datasets
• http://www.crcpress.com/product/isbn/9781439816806
• SAS Institute Training Datasets
• SAS Supplied Datasets
SAS Program Data Vector (PDV)
Variable      Attributes
_infile_      Drop
_n_           Drop
Patient_ID    Type, Format, Label
Name          Type, Format, Label
RestHR        Type, Format, Label
MaxHR         Type, Format, Label
RecHR         Type, Format, Label
TimeMin       Type, Format, Label, Drop
TimeSec       Type, Format, Label, Drop
Tolerance     Type, Format=$trscxx x, Label
Cum           Type, Format, Label, retain
loaddt        Type, Format, Label
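A DATA step that would give a PDV with the variable names listed above might look like the sketch below. It is an illustration under assumptions, not the course's own program: the two data records, the label text, and the way Cum is computed are made up. Dropping TimeMin and TimeSec follows what the Drop markers on the slide appear to suggest, and the format shown for Tolerance on the slide is not reproduced because its definition is not given.

```sas
/* TimeMin and TimeSec are in the PDV but are dropped from the output. */
data work.stress (drop=TimeMin TimeSec);
    infile datalines;
    input Patient_ID Name $ RestHR MaxHR RecHR TimeMin TimeSec Tolerance $;

    /* RETAIN keeps Cum from one iteration to the next instead of     */
    /* resetting it to missing; here it is a running total in seconds. */
    retain Cum 0;
    Cum = Cum + TimeMin*60 + TimeSec;

    /* Load date with a format and a label stored as attributes. */
    loaddt = today();
    format loaddt date9.;
    label RestHR = 'Resting heart rate'
          loaddt = 'Load date';

    /* _N_ (iteration counter) and _INFILE_ (current input record) */
    /* live in the PDV but are never written to the output.        */
    put _n_= _infile_;

    /* Hypothetical records, for illustration only. */
    datalines;
101 Smith 72 185 128 12 38 D
102 Jones 68 171 133 10 5 I
;
run;

/* List the variables and attributes that reached the output. */
proc contents data=work.stress;
run;
```

The PUT statement writes one line per record to the log, and the PROC CONTENTS listing shows that TimeMin, TimeSec, _N_, and _INFILE_ are not part of work.stress.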