程序代做CS代考 data mining Excel CORPFIN 2503 – Business Data Analytics

CORPFIN 2503 – Business Data Analytics

2021 S2, Workshop 2: Visual analytics and data mining

£ius

1 Time series stock market data

Let’s download time series stock market data for all stocks traded on ASX using

Eikon:

1. Search for Screener

2. Filter for stocks traded in Australia (country of exchange)

3. Click button `Launch Data Item Library’ (on the right of `Add Column’ bar)

4. Search for `price close’

5. Choose the following parameters: select `Series’, calendar month (CM), last

26 observations, ordered from oldest to newest

6. Click on Excel icon to save the data as Excel �le.

2 Lags and leads

Unfortunately, creating lagged and lead values of variables is not an easy task. Let’s

create lagged and lead values using the example below.

/* Creating data file: */

DATA work.citi;

SET SAShelp.Citimon;

RUN;

1

/* Creating another file with smaller number of observations: */

DATA work.citi_sh;

SET work.citi (keep = date FSPCOM LHUR EEC EXVUS);

IF FSPCOM ne .;

IF LHUR ne .;

IF EEC ne .;

IF EXVUS ne .;

RUN;

/**** Let’s create a lagged value for FSPCOM: 2-step procedure: ****/

/* 1) Sorting data by date: */

proc sort data=work.citi_sh;

by date;

run;

/* 2) Creating a lagged value using a command ‘LAG(…)’: */

data work.citi_sh;

set work.citi_sh;

lag_FSPCOM=LAG(FSPCOM);

run;

/**** Let’s create a lead value for FSPCOM: 2-step procedure: ****/

/* 1) Sorting data by date but in descending order: */

proc sort data=work.citi_sh;

by descending date;

run;

/* 2) Creating a lead value using a command ‘LAG(…)’: */

data work.citi_sh;

set work.citi_sh;

lead_FSPCOM=LAG(FSPCOM);

run;

/* Let’s sort the data in the original order (i.e., by date): */

proc sort data=work.citi_sh;

by date;

run;

2

3 Procedure CORR

SAS procedure CORR generates correlation coe�cients. The procedure allows to

display correlation matrix in the HTML window as well as to save them as a separate

�le. Let’s consider example below.

/* Let’s find the correlation matrix: */

PROC CORR DATA=work.citi_sh;

RUN;

/* Let’s create a new file with the correlation coeficients of */

/* lead_FSPCOM with the rest of variables (the command ‘OUTP=’ */

/* saves Pearson correlation coefficients as a new file): */

PROC CORR DATA=work.citi_sh

OUTP=work.corr_citi_sh;

WITH lead_FSPCOM;

RUN;

/* Removing redundant variables */

data work.corr_citi_sh;

set work.corr_citi_sh;

if _NAME_=”lead_FSPCOM”;

drop _NAME_;

run;

/* Transposing the data in order we could sort the correlation */

/* coefficients if needed */

proc transpose data=work.corr_citi_sh

out=work.corr_citi_sh2;

run;

/* Renaming the variable */

data work.corr_citi_sh2;

set work.corr_citi_sh2;

rename col1=corr;

run;

3

4 Various plots (in the lab and at home)

For this task, you should familiarize yourself with key plots available on SAS. Below,

I provide the necessary code.

/* Generating a new variable (return on index) */

data work.citi_sh;

set work.citi_sh;

return=FSPCOM/lag_FSPCOM-1;

run;

/* Option PLOTS generates stem-and-leaf, box and qq plots of the variable(s) */

proc univariate data=work.citi_sh PLOTS;

var return;

RUN;

/* Option HISTOGRAM generates histogram */

proc univariate data=work.citi_sh;

var return;

HISTOGRAM / NORMAL (COLOR=RED);

RUN;

/* Creating data file: */

DATA work.car_data;

SET SAShelp.Cars;

RUN;

/* Vertical bar chart */

proc gchart data=work.car_data;

VBAR type / type=percent;

run;

quit;

/* Horizontal bar chart */

proc gchart data=work.car_data;

HBAR type / type=percent;

4

run;

quit;

/* Pie bar chart */

proc gchart data=work.car_data;

PIE type / type=percent;

run;

quit;

/* One can create a stacked bar chart using the SUBGROUP= option: */

proc gchart data=work.car_data;

VBAR type / type=percent

subgroup=DriveTrain;

run;

quit;

/* The same thing for horizontal bar chart: */

proc gchart data=work.car_data;

HBAR type / type=percent

subgroup=DriveTrain;

run;

quit;

/* or */

proc gchart data=work.car_data;

HBAR type / type=percent

group=DriveTrain;

run;

quit;

/* Scatterplot: */

proc gplot data=work.car_data;

title ‘Scatter plot of MSRP and Invoice’;

plot Horsepower* Invoice=1;

run;

quit;

5

/* Scatterplot by group: */

SYMBOL1 V=circle C=black I=none height=2;

SYMBOL2 V=star C=red I=none;

SYMBOL2 V=square C=blue I=none height=2;

proc gplot data=work.car_data;

title ‘Scatter plot of Horsepower and Invoice by Drivetrain’;

plot Horsepower* Invoice=drivetrain;

run;

quit;

/* Line: */

symbol1 color=red interpol=join;

title ‘Return over time’;

proc gplot data=work.citi_sh;

plot return*date / hminor=0;

run;

quit;

/* 2 lines: */

/* First, we need to make labels shorter (otherwise, the graph will be */

/* very small) */

DATA work.citi_sh;

SET work.citi_sh;

label EXVUS=”USD index”;

label FSPCOM=”Stock index”;

RUN;

symbol1 color=black interpol=join;

symbol2 color=green interpol=join;

title ‘USD index and stock market index over time’;

proc gplot data=work.citi_sh;

plot EXVUS*DATE / caxis=black hminor=0

6

noframe;

plot2 FSPCOM*DATE / caxis=green;

run;

quit;

You may also refer to the e-books by on how to make nice graphs:

http://robslink.com/SAS/Home.htm.

5 Data mining (at home)

Suppose today is 30 June 2021. We want to predict whether Telstra’s stock price

will increase or decrease next month (ending on 31/07/2021) using the Excel �le

downloaded in Task 1.

Suggested steps:

1. remove stocks with missing stock price information

2. keep stocks with average stock price across 26 months above or equal to $2

3. you should retain around 400 stocks, including Telstra

4. delete the column `country of exchange’, rename the remaining columns of the

data set to: ticker, name, June-2019 . . . July-2021

5. save the �le as ASX.csv �le and import it to SAS

6. compute the correlation coe�cients between the Telstra’s stock price for 31/07/2019

– 30/06/2021 and other stocks for 30/06/2019 – 31/05/2021:

� transpose the data using the code below (`id ticker;’ option let us retain

the stock tickers; however, ‘.’ are replaced by ‘_’.):

proc transpose data=work.asx

out=work.asx2;

id ticker;

run;

� generate a variable which is equal to observation number; this variable

can be used instead of date in order to create lagged and lead variables:

7

http://robslink.com/SAS/Home.htm

data work.asx2;

set work.asx2;

period=_n_;

run;

� create a variable which is equal to lead values of Telstra’s stock price

� compute correlation coe�cients between the lead values of Telstra’s stock

price and other stocks using the �rst 24 observations

� transpose the data set with correlation coe�cients

7. create the variable for absolute values of correlation coe�cients using command

`ABS()’ (e.g., abs_corr=abs(corr);)

8. identify the stock with the highest absolute correlation coe�cient (you may

sort the data in order to achieve this)

9. plot the lead values of Telstra’s stock price and the price of stock identi�ed in

the previous step using the �rst 25 observations to identify the prediction and

whether the prediction was accurate.

8

Time series stock market data
Lags and leads
Procedure CORR
Various plots (in the lab and at home)
Data mining (at home)