Correlation Data mining Visual presentation SAS software
BUSN 7001 – Predictive and Visual Analytics for Business
Week 2: Correlations and visual presentation
£ius BUSN 7001, Week 2 1/65
Correlation
Data mining
Visual presentation
SAS software
Covariance and correlation
Provides a measure of the strength of the linear relation between two variables.
If the measure is positive (negative), the two variables are positively (negatively) related to each other.
Correlation and covariance do not imply causality!
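Covariance formula:
$$\operatorname{cov}(A,B) = \frac{1}{n-1}\left[(r_{A1}-\bar r_A)(r_{B1}-\bar r_B) + (r_{A2}-\bar r_A)(r_{B2}-\bar r_B) + \dots\right] = \frac{1}{n-1}\sum_{i=1}^{n}(r_{Ai}-\bar r_A)(r_{Bi}-\bar r_B).$$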
Covariance and correlation II
Correlation (ρ) formula:
$$\rho(A,B) = \frac{\operatorname{cov}(A,B)}{\sigma_A \sigma_B}.$$
Correlation is standardized covariance.
ρ ∈ [−1, 1].
The closer ρ is to either −1 or 1, the stronger the correlation (linear relationship) between the variables.
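As a small worked illustration of the covariance and correlation formulas, take three observations with r_A = (1, 2, 3) and r_B = (2, 4, 9), in percent. These return series are hypothetical numbers chosen for easy arithmetic, not course data.

```latex
\begin{aligned}
\bar r_A &= 2, \qquad \bar r_B = 5,\\
\operatorname{cov}(A,B) &= \tfrac{1}{3-1}\left[(-1)(-3) + (0)(-1) + (1)(4)\right] = \tfrac{7}{2} = 3.5,\\
\sigma_A &= \sqrt{\tfrac{1}{2}(1+0+1)} = 1, \qquad
\sigma_B = \sqrt{\tfrac{1}{2}(9+1+16)} = \sqrt{13} \approx 3.606,\\
\rho(A,B) &= \frac{3.5}{1 \times 3.606} \approx 0.971.
\end{aligned}
```

The covariance of 3.5 depends on the units of the returns, whereas the correlation of about 0.971 does not, which is why correlation is easier to compare across pairs of variables.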
Correlation: Examples I–III
[Figures omitted.]
Data mining
Data mining is the process of discovering interesting and new patterns, correlations, and anomalies using various data sets and methods.
The simplest methods:
• various graphs and plots
• correlation matrix
• regression analysis.
Data mining II: Example #1
Suppose you are a stock analyst and you would like to identify a stock’s expected future performance.
One way to achieve this is to generate a correlation matrix between stock returns (or firm performance measures such as ROA, ROE, etc.) and various lagged indicators.
High absolute values of the correlation coefficients would help identify potential factors.
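A minimal SAS sketch of such a lagged correlation matrix follows. The data set work.firms, its variables (firm, year, ret, roa) and every value in it are hypothetical, invented only to make the example runnable. The choice of ROA as the lagged indicator is likewise an assumption.

```sas
/* Hypothetical firm-year panel; all values are invented for illustration. */
data work.firms;
    input firm $ year ret roa;
    datalines;
A 2014 0.04 0.06
A 2015 0.07 0.08
A 2016 0.09 0.05
A 2017 0.02 0.07
B 2014 0.01 0.03
B 2015 0.03 0.04
B 2016 0.05 0.02
B 2017 0.00 0.05
;
run;

/* Sort so that each firm's observations are in time order. */
proc sort data=work.firms;
    by firm year;
run;

/* Create the one-year lag of ROA within each firm. */
data work.lagged;
    set work.firms;
    by firm;
    lag_roa = lag(roa);               /* ROA from the previous row */
    if first.firm then lag_roa = .;   /* do not carry a lag across firms */
run;

/* Correlation matrix between returns and the lagged indicator. */
proc corr data=work.lagged;
    var ret lag_roa;
run;
```

LAG is called on every row and the result is reset afterwards, because calling LAG inside an IF condition would skip rows and return the wrong previous value.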
Example #2
Suppose today is 30 June 2018. We want to predict whether General Motors’ stock price will increase or decrease next month (ending on 31/07/2018) using the available stock market information.
Example #2 II
Suggested steps:
1. download monthly stock price information for all NYSE stocks (3,906 stocks of US firms with market cap. greater than $25m)
2. compute the correlation coefficients between General Motors’ stock price for 31/07/2016 – 30/06/2018 and the other stocks’ prices for 30/06/2016 – 31/05/2018
3. identify the stock with the highest absolute value of the correlation coefficient
4. if the correlation coefficient is positive and that stock’s price is higher on 30/06/2018 than on 31/05/2018, we predict that General Motors’ stock price will increase by 31/07/2018
5. if the correlation coefficient is positive and that stock’s price is lower on 30/06/2018 than on 31/05/2018, we predict that General Motors’ stock price will decrease by 31/07/2018.
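Steps 4 and 5 can be sketched as a small SAS data step. The correlation and the two prices below are assumed values for illustration only, and the variable names (rho, price_prev, price_last, prediction) are hypothetical.

```sas
/* Hypothetical inputs; none of these numbers come from real market data. */
data work.signal;
    length prediction $ 8;
    rho        = 0.90;  /* correlation from step 2 (assumed value) */
    price_prev = 3.60;  /* other stock's price on 31/05/2018 (assumed) */
    price_last = 3.40;  /* other stock's price on 30/06/2018 (assumed) */

    if rho > 0 and price_last > price_prev then prediction = 'increase';      /* step 4 */
    else if rho > 0 and price_last < price_prev then prediction = 'decrease'; /* step 5 */
    else prediction = 'no rule';  /* cases not covered by steps 4 and 5 */
run;

proc print data=work.signal noobs;
run;
```

With these assumed inputs the correlation is positive and the price fell, so the step 5 branch applies and the prediction is a decrease.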
Example #2 III
What is the decision rule if the correlation coefficient is negative?
Example #2 IV
[Timeline figure: correlation coefficients are computed between other stocks’ prices over 30/06/2016 – 31/05/2018 and General Motors Co’s price over 31/07/2016 – 30/06/2018. Stock market information is available up to 30/06/2018 and unavailable from then until 31/07/2018.]
Example #2 V
[Timeline figure: other stocks over 30/06/2016 – 31/05/2018 and General Motors Co over 31/07/2016 – 30/06/2018; X marks the other stock’s price on 30/06/2018. Stock market information after 30/06/2018 (up to 31/07/2018) is unavailable.]
If the correlation coefficient is > 0 and X is greater than in the previous month, then General Motors’ stock price should increase by 31/07/2018 compared to 30/06/2018.
Example #2 VI
The highest absolute values of correlation coefficients are as follows:

Ticker    Company name                      Correlation   Abs(corr.)
IPI.N     Intrepid                           0.914         0.914
PRIM.OQ   Primoris Services Corp             0.904         0.904
FTK.N     Flotek Industries Inc             -0.902         0.902
CLAR.OQ   Clarus Corp                        0.900         0.900
VVI.N     Viad Corp                          0.893         0.893
ORRF.OQ   Orrstown Financial Services Inc    0.892         0.892

Intrepid Potash, Inc., based in Denver, Colorado, is a fertilizer manufacturer. The company is the largest producer of potassium chloride, also known as muriate of potash, in the United States (source: Wikipedia).
Example #2 VII
[Scatterplot: General Motors Co stock price (vertical axis, 30 to 44) against Intrepid stock price (horizontal axis, 1 to 5), with fitted line y = 2.7164x + 29.507, R² = 0.835.]
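The reported R² is consistent with the correlation coefficient of 0.914 quoted for Intrepid. In a simple linear regression with an intercept, R² equals the squared sample correlation between the two variables:

```latex
R^2 = \rho^2 = 0.914^2 \approx 0.835
```

So the fitted line and the correlation matrix describe the same strength of linear relationship in two different ways.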
Example #2 VIII
[Line chart: General Motors Co (left axis, 0 to 50) and Intrepid (right axis, 0 to 5) stock prices over time, with the 30/06/2018 observations of both series marked.]
Prediction: General Motors’ stock price will decrease in 31/07/2018.
Example #2 IX
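[Scatterplot: General Motors Co stock price (vertical axis, 30 to 44) against Intrepid stock price (horizontal axis, 1 to 5), with the 31/07/2018 observation for General Motors marked and fitted line y = 2.621x + 29.662.]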
Data mining III: Example in academic research
Berkman, H., P.D. Koch, and P.J. Westerholm (2014). ‘Informed trading through the accounts of children.’ Journal of Finance 69, 363–404:
• look at the Finnish stock market and investor performance
• analyse the transactions of more than half a million individuals over the period January 1, 1995, through May 31, 2010.
• Underaged accountholders (defined as accountholders aged 0 to 10 years) exhibit superior stock-picking skills on both the buy side and the sell side.
• Underaged accountholders significantly outperform older investors.
Visual presentation
A research project has the following stages:
1. Identification of the problem =⇒ research question:
• should be important
2. Hypothesis development (not always applicable)
3. Data collection
4. Data analysis
5. Interpretation of results
6. Policy recommendation
Data analysis includes presentation of results.
Visual presentation II
Visual presentation is also important at the early stages of a project, e.g., during hypothesis development or when we do data mining.
Visual presentation III
It is too easy to produce figures using spreadsheets.
=⇒ There are so many graphs and tables that are misleading.
The purpose of figures
To help the reader understand the content of the report (or book) better.
But not to confuse the reader.
And not to fill additional space in the report.
How to make appropriate figures?
A good source on how to make appropriate figures:
• Knaflic, C.N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley.
Key principles according to Knaflic (2015):
1. Understand the context
2. Choose a suitable visual display
3. Eliminate clutter
4. Focus attention where you want it
5. Think like a designer
6. Tell a story.
Understand the context
What message do you want to send to the audience?
Will the figure be included in a presentation, report, e-mail, etc.?
Who is your audience?
Choose a suitable visual display
• Scatterplot
• Horizontal bar
• Vertical bar
• Stacked horizontal bar
• 3-D figure
Eliminate clutter
Every single element added to the page or slide takes brain power for the audience to process.
Do not include elements that add cognitive load but do not increase understanding (e.g., most 3-D figures).
Focus attention where you want it
Properly made figures ‘enable our audience to see what we want them to see before they even know they’re seeing it’.
Options to consider:
• size
• font and its attributes (bold, italic)
• position on page.
Think like a designer
• Highlight the important stuff
• Eliminate distractions
• Create a visual hierarchy of information
• You might add some text inside the figure.
Tell a story
• The figure or table should help you tell the story.
• If it does not, either improve it or leave it out of the report or presentation.
Common issues: Bar width
[Figure omitted: three bar charts (vertical axis 0 to 7) labelled ‘Too thick’, ‘Too thin’ and ‘Appropriate’.]
Common issues: Minimum axis value
[Figure omitted: two charts, one with vertical-axis ticks from 50 to 56 and one with vertical-axis ticks from 10 to 60.]
Common issues: Use text and color
[Figure omitted: chart for 2013–2017 (vertical axis 10 to 20) with the in-figure text ‘Profit after adoption of the new strategy’.]
Common issues: Decimal places
Original        Rounded
-0.756244237    -0.756
 0.428606224     0.429
 1.241138714     1.241
 0.836164502     0.836
 0.691036891     0.691
Common issues: Large numbers
Net income ($)    Net income ($)
-7562437.004      -7,562,437
 4286062.2         4,286,062
 12411387.14      12,411,387
 836164.5023         836,165
 6910368.906       6,910,369
Common issues: Large numbers II
Net income ($)    Net income ($ millions)
-7562437.004      -7.6
 4286062.2         4.3
 12411387.14      12.4
 836164.5023       0.8
 6910368.906       6.9
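In SAS, this kind of presentation can be handled with formats rather than by editing the numbers. The sketch below uses the five net income values from the table; the data set name work.income and the variable names net_income and net_income_m are hypothetical.

```sas
/* Net income values taken from the table above; names are hypothetical. */
data work.income;
    input net_income;
    net_income_m = net_income / 1e6;   /* rescale to $ millions */
    format net_income   comma14.       /* thousands separators, no decimals */
           net_income_m 8.1;           /* one decimal place */
    datalines;
-7562437.004
4286062.2
12411387.14
836164.5023
6910368.906
;
run;

proc print data=work.income noobs;
run;
```

A format changes only how a value is displayed, so the stored values keep their full precision for later calculations.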
Common issues: Use different colors (bad)
[Figure omitted: chart by year with a vertical axis from 0 to 10.]
Common issues: Use different colors (good)
[Figure omitted: chart for 2012–2017 with a vertical axis from 0 to 10.]
Common issues: Logarithmic scale
[Figure omitted: two panels, ‘Normal scale’ (vertical-axis ticks 1000 to 2500) and ‘Logarithmic scale’ (vertical-axis ticks 1 to 10000).]
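The difference between the two scales can be reproduced with PROC SGPLOT. This is a sketch on an invented series: the data set work.growth and its variables year and value are hypothetical.

```sas
/* Hypothetical series that grows tenfold every two years. */
data work.growth;
    do year = 2010 to 2017;
        value = 10 ** ((year - 2010) / 2);
        output;
    end;
run;

/* Normal (linear) vertical axis. */
proc sgplot data=work.growth;
    series x=year y=value;
run;

/* Logarithmic vertical axis, base 10. */
proc sgplot data=work.growth;
    series x=year y=value;
    yaxis type=log logbase=10;
run;
```

On the linear axis the early years are squeezed against the horizontal axis, while on the logarithmic axis constant percentage growth appears as a straight line. A logarithmic axis requires strictly positive values.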
SAS software
We will use the following SAS software:
• SAS 9.4 (in the Financial Markets Lab)
• SAS OnDemand for Academics (see MyUni for how to access it)
• SAS Viya for Learners (in order to access SAS Visual Analytics).
The first two options are (almost) the same.
SAS Visual Analytics does not require any coding.
SAS Visual Analytics
Steps to access:
1. Create an account with SAS Academic Hub
2. Launch SAS Viya for Learners 3.5
3. Click on ‘Reports’ tab, then on ‘Explore and Visualize’.
SAS is an extremely powerful data analysis tool:
• capable of dealing with very large files (file size is limited by the hard drive)
• can do lots of things
• popular among large financial corporations and other entities dealing with large datasets.
Disadvantages of SAS:
• rather steep learning curve
• expensive.
Basic advice for using SAS
1. Always save your SAS code (*.sas)
2. Use comments:
• /* Comment */ or
• * Comment;
3. Don’t give up on debugging your code. Google.com is your good friend.
First SAS code: Duplicates
Due to data recording and/or reporting issues, sometimes databases might contain duplicate (i.e., identical) observations.
They should be removed before the analysis.
Importing the data
SAS code to import the data:
data work.sample;
    /* Company and Recommendation are character ($); Analyst is numeric */
    input Company $ Analyst Recommendation $;
    datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;
Example of duplicates
Company   Analyst   Recommendation
IBM       112232    BUY
Apple     736352    BUY
Ford      929191    SELL
HP        929277    HOLD
IBM       112232    BUY
Amazon    48483     HOLD
IBM       32125     HOLD
Important considerations: Duplicates II
SAS code to sort the data:
PROC SORT data=work.sample;
    BY Company;
RUN;
Example of duplicates II
Company   Analyst   Recommendation
Amazon    48483     HOLD
Apple     736352    BUY
Ford      929191    SELL
HP        929277    HOLD
IBM       112232    BUY
IBM       112232    BUY
IBM       32125     HOLD
Important considerations: Duplicates II
SAS code to remove duplicates:
PROC SORT data=work.sample nodupkey;
    BY Company;
RUN;
Option NODUPKEY deletes those observations with duplicate BY variable (in our case: Company) values.
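Because PROC SORT without OUT= overwrites the input data set, a more cautious variant writes the result elsewhere and keeps the deleted rows for inspection. The sketch below rebuilds the sample data so that it runs on its own; the output data set names work.sample_nodupkey and work.removed are hypothetical.

```sas
/* Same observations as in the import step. */
data work.sample;
    input Company $ Analyst Recommendation $;
    datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;

/* OUT= leaves work.sample untouched; DUPOUT= collects the deleted rows. */
proc sort data=work.sample
          out=work.sample_nodupkey
          dupout=work.removed
          nodupkey;
    by Company;
run;

/* Inspect what was deleted before relying on the result. */
proc print data=work.removed noobs;
run;
```

With this sample, five observations are kept (one per company) and two IBM observations go to work.removed. One of those is the IBM 32125 HOLD row, which is not an exact duplicate, so NODUPKEY on Company alone may delete more than intended.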
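Example of duplicates III
Company   Analyst   Recommendation
Amazon    48483     HOLD
Apple     736352    BUY
Ford      929191    SELL
HP        929277    HOLD
IBM       112232    BUY
IBM       112232    BUY
IBM       32125     HOLD
NODUPKEY keeps only the first IBM observation; the second and third IBM rows are deleted.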
Important considerations: Duplicates II
SAS code to remove duplicates:
PROC SORT data=work.sample nodup;
    BY Company;
RUN;
Option NODUP deletes duplicated observations.
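One caveat: NODUP is an alias for NODUPRECS, which compares each observation only with the previous one written to the output. Identical rows that are not adjacent after sorting can therefore survive. A common workaround, sketched below, is to sort by all variables. The data set work.sample2 is a hypothetical reordering of the IBM rows, and work.dedup is a hypothetical output name.

```sas
/* Hypothetical reordering: the two identical IBM rows are not adjacent. */
data work.sample2;
    input Company $ Analyst Recommendation $;
    datalines;
IBM 112232 BUY
IBM 32125 HOLD
IBM 112232 BUY
HP 929277 HOLD
;
run;

/* Sorting BY _ALL_ places identical rows next to each other, so       */
/* NODUPKEY removes exact duplicates regardless of their input order. */
proc sort data=work.sample2 out=work.dedup nodupkey;
    by _all_;
run;

proc print data=work.dedup noobs;
run;
```

Here work.dedup keeps three observations: HP 929277 HOLD, IBM 32125 HOLD and one copy of IBM 112232 BUY.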
Example of duplicates III
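Company   Analyst   Recommendation
Amazon    48483     HOLD
Apple     736352    BUY
Ford      929191    SELL
HP        929277    HOLD
IBM       112232    BUY
IBM       112232    BUY
IBM       32125     HOLD
NODUP deletes only the second IBM 112232 BUY row, which is identical to the row before it.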
Second SAS code: Correlations
Importing the data and creating a new variable:
data work.cars;
    set sashelp.cars;
    discount=MSRP/Invoice-1;   /* relative gap between MSRP and invoice price */
run;
Generating correlation matrix:
proc corr data=work.cars;
    var discount invoice horsepower weight;
run;
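For data mining it is often convenient to have the correlation matrix as a data set rather than as printed output, so that large coefficients can be filtered or sorted. A minimal sketch follows; the output data set name work.corr_out is hypothetical.

```sas
data work.cars;
    set sashelp.cars;
    discount = MSRP / Invoice - 1;
run;

/* OUTP= writes the Pearson statistics to a data set;  */
/* NOPRINT suppresses the printed report.              */
proc corr data=work.cars outp=work.corr_out noprint;
    var discount invoice horsepower weight;
run;

/* Keep only the rows that hold correlation coefficients. */
proc print data=work.corr_out noobs;
    where _TYPE_ = 'CORR';
run;
```

The OUTP= data set also contains rows for the means, standard deviations and numbers of observations, which is why the WHERE statement selects _TYPE_ = 'CORR'.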
Second SAS code: Correlations II–III
[Output screenshots omitted.]
Second SAS code: Correlations IV
What about plots?
proc corr data=work.cars
    plots(MAXPOINTS=10000)=matrix(histogram);
    var discount invoice horsepower weight;
run;
Second SAS code: Correlations V
[Output screenshot omitted.]
Second SAS code: Correlations VI
We can even make them colorful:
proc sgscatter data=work.cars;
    matrix discount invoice horsepower weight
        /group=drivetrain diagonal=(histogram kernel);
run;
Second SAS code: Correlations VII
[Output screenshot omitted.]
Correlation matrix using SAS Visual Analytics
Required reading
Konasani, V. R. and Kadre, S. (2015). Practical Business Analytics Using SAS: A Hands-on Guide: chapter 5.
Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data. Journal of Economic Perspectives 28(1): 209-234.
Leo, S. (2019). Mistakes, we’ve drawn a few: Learning from our errors in data visualisation. The Economist: https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368.
Recommended reading
Camm, J.D., Cochran, J.J., Fry, M.J., Ohlmann, J.W. (2021). Business Analytics, 4th edition: chapters 2, 3.