J. Hirschberg
Computational Economics and Business ECON30025/ ECOM90020
Tutorial Topic (Tutorial Exercises)
1. Intro to SAS program
2. Creating and combining data series (Data manipulation in SAS)
3. Graphics methods overview (Representing data)
4. Linear algebra basics (Practice examples)
5. Multivariate Analysis in IML (Numerical Methods in IML)
6. Analysis of the Australian IO tables (Input Output in IML)
7. Assignment #1 Review
8. Linear programming in Excel (Example solutions for LP problems)
9. Doing DEA (Example DEA setups)
10. Quantile Regression (Using IML and ...)
11. The Simulation of an inventory (A Queuing problem)
12. Assignment #2 Review
Tutorials
1. Introduction to SAS
2. Creating and combining data series
3. Graphics Methods Overview
4. Linear algebra basics
5. Programming Multivariate Statistics in IML
6. Analysis of the Australian IO tables
8. Linear Programming
9. Data Envelopment Analysis
10. LAV and Quantile Regression
11. Inventory simulation
1. Introduction to SAS
From this tutorial you should have the following takeaways:
Learn features of the SAS computer system.
o To run SAS code from your computer using myuniapps.
Finding files on your computer.
Pasting code into the editor.
Cutting and pasting results into a Word file.
o To be able to perform the following steps in the interactive mode.
Change editor styles.
To read a log file.
To examine data sets.
To use SAS output files.
o To navigate SAS help files.
Finding the details of the syntax of different Proc commands.
Locating different function commands that can be used in Data steps.
When in doubt use the internet to search for answers.
SAS is a proprietary program whose full version can only be accessed via a site licence from SAS. Usually this would not pose a problem at the University of Melbourne, since we have a licence for all computers at Melbourne. However, because of the limitations on the return to campus, we are unable to use the labs and tutorial rooms that have the program installed. Instead we need to use the myuniapps website (see footnote 1), which provides a CITRIX connection to a server on campus. This is the easiest method of access, but it requires a continuous connection to the university server. It can be used from computers with different operating systems, provided the appropriate CITRIX receiver software is installed. Be advised that this software has recently been updated; if you have an older version you need to update it and restart your computer. At some point when using this method you will need to allow the system to access your computer's files. Later in the subject you will also find that you can get access to other software, such as Eviews, Stata, and Scientific Workplace, that we will use for some limited examples (see footnote 2).
I have put all the data that needs to be read on a public website as csv files that can be read from anywhere. SAS will read these so that it is not necessary to store them on your computer. The cloud version of SAS will read them as well as the myuniapps version.
This tute is primarily a demonstration tute to show how these tasks can be done.
1 Myuniapps website can be found at: https://myuniapps.unimelb.edu.au/vpn/index.html Instructions for downloading the CITRIX software can be found there. This only needs to be done once.
2 Another option for off-campus access to SAS is "SAS University Edition". You can find this at: https://www.sas.com/en_au/software/university-edition/download-software.html where there are extensive descriptive videos on how to download and install the program. It does require a bit of patience. It is more complex than the myuniapps approach in that it requires you to install a virtual computer on your computer, but once you have done this the program resides on your computer. In the past I would have said this was the best way. However, SAS has decided to discontinue this approach for students and to move to a cloud-based approach only in August of this year; the last chance to download this version is April 30, 2021. The cloud-based option is the "SAS OnDemand" software, which can be found at: https://www.sas.com/en_au/software/on-demand-for-academics/references/getting-started-with-sas-ondemand-for-academics-studio.html . This option runs SAS Studio, which has a slightly different interface from the SAS that I use and that is available from the myuniapps site. The code is the same and the results are the same as well; only the editor is slightly different. Because it relies on the same underlying program, all the routines we use here will run in this environment. Even if you have difficulty reading your computer's files, you can cut and paste the programs (they are all text files) into the online source code editor. There are numerous videos showing how this operates, and there is online support. To use this approach, you will need to create a profile.
2. Creating and combining data series
From this tutorial you should have the following takeaways:
Understand the concept of the two forms of SAS programs: the DATA step and the PROC step.
o To be able to perform the following steps in the data steps.
To use only a subset of the observations (rows).
To keep or drop a subset of the variables (columns) in the data set.
To create multiple datasets with one data step.
To combine multiple datasets by stacking them on top of each other.
To combine multiple datasets by concatenating or merging them.
To reconfigure a dataset from the wide view with multiple columns to a long view with
more observations into a single column (as done with the weather data).
o To be able to use some basic PROCs and follow the syntax of the commands.
To estimate a regression with automatic dummy variables using Proc glm.
To generate multiple plots categorised by a variable.
To use a PROC to create a summary data set.
The purpose of this tutorial is to provide some experience with SAS to create data sets of varying types for analysis.
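The data step tasks listed above can be tried on a very small example before running the tutorial program. The following is only a minimal sketch: the data set names (games, high, low, stacked, extra, merged), the variable names and all the values are made up for illustration and do not come from the tutorial files.

```sas
/* Toy data: hypothetical names and values, not from the tutorial files */
data games ;
  input team $ year score ;
  datalines ;
A 2001 90
A 2002 75
B 2001 60
B 2002 110
;
run ;

/* Use a subset of the rows: keep only the 2002 observations */
data only2002 ;
  set games(where=(year = 2002)) ;
run ;

/* Create two data sets in one data step, keeping or dropping columns */
data high(keep=team year) low(drop=score) ;
  set games ;
  if score >= 80 then output high ;
  else output low ;
run ;

/* Stack data sets on top of each other */
data stacked ;
  set high low ;
run ;

/* A second toy data set to merge with the first */
data extra ;
  input team $ year crowd ;
  datalines ;
A 2001 30000
B 2002 45000
;
run ;

/* A merge with a BY statement needs both inputs sorted by the BY variables */
proc sort data=games ; by team year ; run ;
proc sort data=extra ; by team year ; run ;

data merged ;
  merge games extra(in=inx) ;
  by team year ;
  if inx ;   /* keep only the rows that have a match in extra */
run ;

proc print data=merged ; run ;
```

The in= option creates a temporary flag that records whether an observation came from that data set; the footy_new program uses the same device when it merges the rainfall and attendance data.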
You may download the programs for this tutorial from the LMS programs web-site. There is a zip file with all the programs we have (note I will be updating these as we have more topics). Alternatively, you can download individual ones as well.
First, open SAS, then load the footy_new program from where you have placed the unzipped programs. Once you have loaded the program, it is displayed in the editor window.
To run the program you can either run the entire routine at once or, more likely, run only a part of it. To run a part, highlight the code you want to run (just as you would select text to copy in Word), making sure the selection includes the "run;" statement and its ";". Once you have highlighted the part you want to run, you can click the running man icon with the left mouse button, go to the Run menu at the top of the frame and choose Submit, or press F8 on your keyboard. If you make a mistake you can usually submit the code again, or submit just a ";" or just a "run;".
The parts of the footy_new routine
1. We read data from a csv file that contains data from the AFL website that lists the outcome of all the regular season football games from 1970 to 2006. Note this file is located on a website that we can access from off-campus as well as on-campus. This means you do not need to copy the data to a separate location.
2. From this data we create a new set with some transformed data and we label the variables for later use.
3. We then demonstrate how we can summarize the data by creating summary data series.
4. Then we read another csv file of weather data as provided by the Australian Bureau of Meteorology.
5. This data is available by month (daily data is more difficult to obtain). These observations can then
be used to define a monthly time series from 1900 to 2008 for rainfall in Melbourne as measured in
millimetres.
6. It is then possible to take these two data sets and determine if the attendance at football games at the
MCG is influenced by the amount of rainfall during the month.
Extra Questions:
1. Try creating some other combinations of the game data such as home goal accuracy or away team accuracy.
2. Add the month the game was played as another variable to explain changes in the attendance. To do this you need to add month to the regression.
/* footy_new
   Read the footy data from the CSV file.
   This data is from the AFL website. */
Title "Footy Analysis" ;
filename csvFile url
"https://www.online.fbe.unimelb.edu.au/t_drive/ECOM/ECOM90020/data/footy.csv"
termstr=crlf;
proc import datafile=csvFile out=footy replace dbms=csv; run;
/* Add labels to the names */
Data footy1 ; set footy ;
/* Create a SAS date variable and determine
   the day of the week the game was played.
   A measure of kicking accuracy on the day is the ratio of the number
   of goals to the total kicks on goal.
   Also add a format for the date. */
date = mdy(mon,day,year) ; format date date7. ;
dw = weekday(date) ;
time = hour + min/60 ;
accuracy = t_g/(t_g+t_b) ;
/* Create a new variable that records which team won
   and a variable for whether it was a home win or not.
   Note the draws have the winner as 'Draw'. */
if a_t > h_t then do; winner = Away ; Home_win = 0 ; end ;
else do; winner = Home ; Home_win = 1 ; end ;
if a_t = h_t then do; winner = 'Draw' ; Home_win = .5 ; end ;
/* Create labels for the variables to identify them later */
label
Home_win = "Flag for a home win"
winner = "Team that won"
dw = "Day of week (1=sun)"
time = "Hour and fraction"
date = "Date the game was played"
round = "Round in which played"
w_marg = "Winning Margin"
Home = "Home team"
H_g = "Home team goals"
H_b = "Home team behinds"
H_t = "Home team total"
Away = "Away team"
A_g = "Away team goals"
A_b = "Away team behinds"
A_t = "Away team total"
Venue = "Where played"
Attend = "Number of people attending"
day = "Day played"
mon = "Month played"
year = "Year Played"
hour = "Hour started"
min = "Minute started"
t_b = "Total number of behinds"
t_g = "Total number of goals"
accuracy = "Goals to shots on goal" ;
run;
/* Sort the data by match winner then year */
proc sort data=footy1 ; by winner year; run;
/* Create a new data set that provides the number of wins for each team,
   the average winning margin, and the maximum of the winning margin.
   Note we exclude the draws from this new data set. */
proc summary data=footy1(where=(winner ~= 'Draw')) ; by winner year; var w_marg ;
output out=tot_club n=N_wins mean=avg_WM max=max_wm; run;
proc sgpanel data=tot_club;
panelby winner / columns=3 ;
series y=n_wins x=year ; run;
/* Read the monthly rainfall in mm data for Melbourne.
   This data is from the Australian BOM web site. */
filename csvFile url
"https://www.online.fbe.unimelb.edu.au/t_drive/ECOM/ECOM90020/data/rain.csv"
termstr=crlf;
proc import datafile=csvFile out=rain replace dbms=csv; run;
/* Transform the data read by rows to a regular time series by year and month. */
data rain1 ; set rain ;
array xx m1-m12 ; * Define an array over the monthly variables ;
do mon = 1 to 12 ; * Create a new variable for the month ;
  rain = xx[mon] ; * The variable rain is created for the monthly value;
  output ; * Write one observation for each month ;
end ;
drop m1--Annual ;
label rain = "Monthly mm of rain in Melbourne" ;
run;
/* Create an average attendance data series for the MCG by year and month */
proc sort data=footy1; by year mon ; run;
proc summary data=footy1(where=(venue='MCG')) ; by year mon ;
var attend w_marg accuracy;
output out=MCG_attn mean=avg_attn avg_wm avg_accur min(attend)=min_attn ; run;
/* Merge the weather data with the attendance data */
data total ; merge rain1 mcg_attn(in=i1) ; by year mon ;
if i1 ; * Only keep the observations when it matches attendance data ;
run;
/* Run a simple regression to see if rain matters to average
   monthly attendance, minimum monthly attendance or kicking accuracy,
   where the years are included as dummy variables.
   Use lsmeans to compute the implied average value of the dependent variable
   for each year. */
proc glm data=total ; class year ;
model avg_attn min_attn avg_accur = rain year /solution ;
lsmeans year ; run; quit;
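The way the output statement in Proc summary names its results is easy to misread, so here is a small sketch on made-up data. The data set names (games, stats), the variable names and the values are assumptions for illustration only. Names listed after a statistic keyword are matched to the variables in the var statement in order, while the form min(attend)=... requests the statistic for one named variable.

```sas
/* Toy data: hypothetical names and values */
data games ;
  input venue $ year attend margin ;
  datalines ;
MCG 2000 50000 12
MCG 2000 60000 30
MCG 2001 40000 5
SCG 2000 20000 18
;
run ;

/* BY-group processing needs the data sorted by the BY variables */
proc sort data=games ; by venue year ; run ;

proc summary data=games ;
  by venue year ;
  var attend margin ;
  /* mean= names are matched in order to attend and margin;       */
  /* n= with a single name applies to the first variable, attend; */
  /* min(attend)= asks for the minimum of attend only             */
  output out=stats n=n_games mean=avg_attend avg_margin min(attend)=min_attend ;
run ;

proc print data=stats ; run ;
```

The output data set has one observation per venue and year, together with the automatic variables _TYPE_ and _FREQ_.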
3. Graphics Methods Overview
From this tutorial you should have the following takeaways:
Understand the concepts of the appropriate graphic displays of cross-section data, time-series data and 3-dimensional representations.
o To be able to perform the following steps in the plots of cross section data.
To label the variables in a large data set to help define results.
To generate distribution plots with histograms, parametric density plots and kernel
density plots.
To generate matrix scatterplots.
To superimpose density plots.
To plot side-by-side boxplots.
To generate new versions of the daily sales to be daily sales per customer.
To reconfigure a dataset from the wide view with multiple columns to a long view with
more observations into a single column while adding a name.
Use Proc Format to generate a variable code.
Use Proc GLM to estimate the relationship between income and per customer sales.
o To be able to perform the following steps in the plots of time series data.
To label the daily data for a particular store for a number of years.
To redefine the daily sales into daily per customer sales.
To use one of the SAS functions for handling dates to identify the day of the week.
To plot all the daily values of the per customer beer sales.
To summarise the data by week to create a new data set with the average weekly per customer sales and plot the weekly average beer per customer sales.
Use the Proc glm routine to investigate the seasonal and calendar effects in the demand for beer.
o To be able to understand the elements in generating three dimensional plots.
Use a data step to generate a data set based on a particular 3D function.
To use Proc g3d to plot this function.
To change the perspective of the plot.
To generate a data set with incomplete coverage of a 3-D surface.
To generate a 3-D pillar plot of the incomplete data.
To use Proc g3grid to interpolate values in a 3-D space based on a smoothing
algorithm.
To use Proc g3d to plot the smoothed values.
To generate a contour plot of the 3-D shape in two dimensions using Proc gcontour.
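The three-dimensional steps listed above can be sketched as follows. This is only an illustration: the function, the grid, the data set name surf and the viewing angles are assumptions and are not taken from the tutorial program. Proc g3d and Proc gcontour are part of SAS/GRAPH, which is assumed to be available.

```sas
/* Generate a data set from a 3-D function on a regular grid */
/* (the function and the grid are hypothetical choices)      */
data surf ;
  do x = -3 to 3 by 0.25 ;
    do y = -3 to 3 by 0.25 ;
      z = sin(sqrt(x*x + y*y)) ;
      output ;
    end ;
  end ;
run ;

/* Surface plot; rotate= and tilt= change the perspective */
proc g3d data=surf ;
  plot y*x=z / rotate=45 tilt=60 ;
run ; quit ;

/* Contour plot of the same shape in two dimensions */
proc gcontour data=surf ;
  plot y*x=z ;
run ; quit ;
```

Changing the rotate= and tilt= values and resubmitting only the Proc g3d step is a quick way to see how the perspective affects the plot.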
This tutorial is designed to demonstrate how you can construct some of the basic plots shown in the notes that pertain to scatter diagrams and other plots. For this we will use SAS as well as the added feature for interactive data analysis.
For this tutorial we review graphic representations of two types of data: Cross Section and Time Series.
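As a small preview of the time series steps, the sketch below computes per customer sales, the day of the week and a weekly average on made-up daily data. The data set names (daily, weekly), the variable names (beer, custcount, beer_pc) and the values are assumptions for illustration and are not those of the tutorial data.

```sas
/* Toy daily data: hypothetical names and values */
data daily ;
  input date :date9. beer custcount ;
  beer_pc = beer / custcount ;        /* daily sales per customer            */
  dw = weekday(date) ;                /* 1 = Sunday, ..., 7 = Saturday       */
  week = intnx('week', date, 0) ;     /* date of the Sunday starting the week */
  format date week date9. ;
  datalines ;
01JAN2001 500 100
02JAN2001 450 90
08JAN2001 700 120
09JAN2001 640 110
;
run ;

/* Average per customer sales by week (data are already in date order) */
proc summary data=daily ;
  by week ;
  var beer_pc ;
  output out=weekly mean=avg_beer_pc ;
run ;

proc sgplot data=weekly ;
  series x=week y=avg_beer_pc ;
run ;

/* Day-of-week effects as automatic dummy variables */
proc glm data=daily ;
  class dw ;
  model beer_pc = dw / solution ;
run ; quit ;
```

With real data the class statement could also include a month variable to pick up seasonal effects in the same regression.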
Cross Section Data Graphics
Use the program entitled read_graphics_data1.
1. We first read the data for 84 Food stores that provides the average daily sales in each department and the census data for the population that lives in the same postal area as the store. Since this data comes from a simple file we need to add labels to the variables. We call this data set graphics.
2. Then we generate a scatter plot matrix of the sales in five departments with the implied normal density plot (based on the mean and standard deviation of the data) and a histogram of the data on the diagonal.
3. Then we compare the normal densities by use of overlapping densities.
4. In this step we perform the same analysis as in #3 except in this case we estimate the densities using
kernel density estimates.
5. In this step we create a new data set called next that redefines the values of wine, meat, dairy,
cheese, and bakery to be the log of the sales per customer. Note that we need to avoid very small
and zero values, so we make them missing values.
6. We repeat the scatterplot matrix plot from before with the histogram and normal plot on the diagonal
where we now use the log of the per customer sales.
7. Create a new data set from the next data to form a longer series with the log of per customer sales
stacked on top of each other in a new variable called sales. We call this dataset graph_long. We then only keep the store number, the sales, the name of the department, and the number that is associated with the department. This data set has 420 observations but only 5 variables (note we add the neighbourhood log income).
8. Create a code or format for the new variable in the dataset graph_long. This procedure allows us to refer to a code to interpret the numbers in the variable type. The name of the format is deps.
9. Generate side-by-side box-plots of the variables. We do this in two ways using two different SAS procedures. In the second case we require that the data be sorted by type.
10. Now we can estimate a regression for each type of department. This regression demonstrates how demand varies with income, a relationship that is often referred to as an Engel curve.
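Steps 5 and 7 to 9 above can be sketched on a tiny made-up data set. The format name deps and the data set names next and graph_long follow the description above; everything else (the data set stores, the variable custcount, the cut-off of 0.001, the department codes and all values) is an assumption for illustration and is not taken from read_graphics_data1.

```sas
/* Toy stand-in for the store data: all values are made up */
data stores ;
  input store wine meat custcount ;
  datalines ;
1 820 1500 400
2 0 1320 350
3 640 1710 500
;
run ;

/* Log of sales per customer; zero or very small values become missing */
data next ;
  set stores ;
  array dep wine meat ;
  do i = 1 to dim(dep) ;
    if dep[i] / custcount > 0.001 then dep[i] = log(dep[i] / custcount) ;
    else dep[i] = . ;
  end ;
  drop i ;
run ;

/* A format that translates the department number into a name */
proc format ;
  value deps 1 = "Wine" 2 = "Meat" ;
run ;

/* Stack the departments into one variable called sales */
data graph_long ;
  set next ;
  array dep wine meat ;
  do type = 1 to dim(dep) ;
    sales = dep[type] ;
    output ;
  end ;
  format type deps. ;
  keep store type sales ;
run ;

/* Side-by-side box plots, first with Proc sgplot */
proc sgplot data=graph_long ;
  vbox sales / category=type ;
run ;

/* Proc boxplot needs the data sorted by the group variable */
proc sort data=graph_long ; by type ; run ;
proc boxplot data=graph_long ;
  plot sales*type ;
run ;
```

Because the format is attached to type, both plots show the department names rather than the numeric codes.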
/* read_graphics_data1
   This routine reads the store data used for the graphics description
   into SAS where the data have been saved as a CSV file.
   This data is for 84 stores located in Chicago and records the monthly
   sales in the departments of the store.
   In addition, a set of values have been added from the
   US Census that describes the neighbourhood around the store.
   More details of this data can be found at:
   https://www.chicagobooth.edu/research/kilts/datasets/dominicks */
options ls=80 ;
Title "CS Graphics Examples" ;
filename csvFile url
"https://www.online.fbe.unimelb.edu.au/t_drive/ECOM/ECOM90020/data/graphics_ddf_stores1.csv"
termstr=crlf;
proc import datafile=csvFile out=graphics replace dbms=csv; run;
/* Define the labels for the variables */
data graphics; set graphics;
label
AGE60 = "% aged 60 or over"
AGE9 = "% aged 9 or under"
BAKERY = "Bakery Sales"
BEER = "Beer Sales"
BULK = "Bulk Food Sale