Correlation Data mining Visual analytics
CORPFIN 2503 – Business Data Analytics: Visual analytics and data mining
Week 2: August 2nd, 2021
£ius CORPFIN 2503, Week 2 1/41
Correlation
Data mining
Visual analytics
Covariance and correlation
Both provide a measure of the strength of the linear relation between two variables.
If positive (negative), the two variables are positively (negatively) related to each other.
Correlation and covariance do not imply causality!
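Covariance formula:
$$\mathrm{cov}(A,B) = \frac{1}{n-1}\left[(r_{A1}-\bar{r}_A)\times(r_{B1}-\bar{r}_B) + (r_{A2}-\bar{r}_A)\times(r_{B2}-\bar{r}_B) + \dots\right] = \frac{1}{n-1}\sum_{i=1}^{n}(r_{Ai}-\bar{r}_A)\times(r_{Bi}-\bar{r}_B).$$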
Covariance and correlation II
Correlation (ρ) formula:
$$\rho(A,B) = \frac{\mathrm{cov}(A,B)}{\sigma_A \sigma_B}.$$
Correlation is standardized covariance.
ρ ∈ [−1, 1].
The closer ρ is to either −1 or 1, the stronger the correlation (linear relationship) between the variables.
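The covariance and correlation formulas above can be checked numerically. The following is a minimal sketch in plain Python, assuming two equally long return series; the function names and the sample returns are hypothetical and only serve as an illustration.

```python
from math import sqrt


def covariance(a, b):
    """Sample covariance of two equally long series (n - 1 denominator)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cross / (len(a) - 1)


def correlation(a, b):
    """Correlation as standardized covariance: cov(A, B) / (sigma_A * sigma_B)."""
    sigma_a = sqrt(covariance(a, a))  # the variance is the covariance of a series with itself
    sigma_b = sqrt(covariance(b, b))
    return covariance(a, b) / (sigma_a * sigma_b)


# Hypothetical monthly returns of two stocks A and B (illustrative numbers only).
r_a = [0.01, 0.03, -0.02, 0.04]
r_b = [0.02, 0.05, -0.01, 0.03]

print(covariance(r_a, r_b))
print(correlation(r_a, r_b))
```

Dividing by the two standard deviations is what keeps the result inside [−1, 1], whatever the units of the original series.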
Correlation: Example
Correlation: Example II
Correlation: Example III
Data mining
Data mining is the process of discovering interesting and new patterns, correlations, and anomalies using various data sets and methods.
The simplest methods:
• various graphs and plots
• correlation matrix
• regression analysis.
Data mining II: Example #1
Suppose you are a stock analyst and you would like to estimate a stock’s expected future performance.
One way to achieve this is to generate a correlation matrix between stock returns (or firm performance measures such as ROA, ROE, etc.) and various lagged indicators.
High absolute values of the correlation coefficients would help identify potential factors.
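As an illustration of this screening idea, the sketch below correlates a target series with each indicator lagged by one period and ranks the indicators by absolute correlation. It is only a sketch under stated assumptions: the helper names (`pearson`, `lagged_correlations`) and all numbers are hypothetical, and a constant series would cause a division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long series."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlations(target, indicators, lag=1):
    """Correlate target[t] with indicator[t - lag]; rank by absolute value."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    results = {}
    for name, series in indicators.items():
        past_indicator = series[:-lag]  # indicator values observed `lag` periods earlier
        later_target = target[lag:]     # target values they are matched with
        results[name] = pearson(past_indicator, later_target)
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: "lead" moves one period ahead of the target, "noise" does not.
target = [0, 2, 4, 6, 8, 10]
indicators = {
    "noise": [5, 1, 4, 2, 3, 6],
    "lead": [2, 4, 6, 8, 10, 12],
}

ranking = lagged_correlations(target, indicators, lag=1)
for name, corr in ranking:
    print(f"{name}: {corr:+.3f}")
```

Sorting on the absolute value keeps strongly negative indicators at the top of the list as well, since they are just as informative as strongly positive ones.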
Example #2
Suppose today is 30 June 2018. We want to predict whether General Motors’ stock price will increase or decrease next month (ending on 31/07/2018) using the available stock market information.
Example #2 II
Suggested steps:
1. download monthly stock price information of all NYSE stocks (3,906 stocks of US firms with market cap. greater than $25m)
2. compute the correlation coefficients between General Motors’ stock price for 31/07/2016 to 30/06/2018 and the other stocks’ prices for 30/06/2016 to 31/05/2018 (i.e., the other stocks are lagged by one month)
3. identify the stocks with the highest absolute values of the correlation coefficients
4. if the correlation coefficient is positive and the stock’s price is higher on 30/06/2018 than on 31/05/2018, we predict that General Motors’ stock price will increase in the month ending 31/07/2018
5. if the correlation coefficient is positive and the stock’s price is lower on 30/06/2018 than on 31/05/2018, we predict that General Motors’ stock price will decrease in the month ending 31/07/2018.
Example #2 III
What is the decision rule if the correlation coefficient is negative?
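One way to think about this question is to write the decision rule from steps 4 and 5 as a small function. The sketch below is an assumption-laden illustration: the function name and the prices are hypothetical, and the negative-correlation branch is simply the mirror image of the positive case, which is our reading of the intended answer rather than something stated on the slides.

```python
def predict_direction(corr, predictor_prev, predictor_now):
    """Predict next month's direction of the target stock.

    corr           -- lagged correlation between predictor and target
    predictor_prev -- predictor price one month ago (e.g. 31/05/2018)
    predictor_now  -- latest predictor price (e.g. 30/06/2018)
    """
    if corr == 0 or predictor_now == predictor_prev:
        return "no signal"
    predictor_up = predictor_now > predictor_prev
    # Positive correlation: the target follows the predictor (steps 4 and 5).
    # Negative correlation (assumed mirror case): the target moves the other way.
    target_up = predictor_up if corr > 0 else not predictor_up
    return "increase" if target_up else "decrease"


# Hypothetical prices, for illustration only.
print(predict_direction(0.9, 4.0, 3.5))   # positive corr, predictor fell
print(predict_direction(-0.9, 4.0, 3.5))  # negative corr, predictor fell
```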
Example #2 IV
[Figure: timeline. Available stock market information runs up to 30/06/2018; 31/07/2018 is unavailable. Correlation coefficients are computed between other stocks (30/06/2016 to 31/05/2018) and General Motors Co (31/07/2016 to 30/06/2018).]
Example #2 V
[Figure: the same timeline of available (up to 30/06/2018) and unavailable (31/07/2018) stock market information, with the other stock’s price on 30/06/2018 marked as X.]
If the correlation coefficient is > 0 and X is greater than in the previous month, then General Motors’ stock price on 31/07/2018 should be higher than on 30/06/2018.
Example #2 VI
The highest absolute values of the correlation coefficients are as follows:

Ticker     Company name                       Correlation   Abs(corr.)
IPI.N      Intrepid                            0.914         0.914
PRIM.OQ    Primoris Services Corp              0.904         0.904
FTK.N      Flotek Industries Inc              -0.902         0.902
CLAR.OQ    Clarus Corp                         0.900         0.900
VVI.N      Viad Corp                           0.893         0.893
ORRF.OQ    Orrstown Financial Services Inc     0.892         0.892

Intrepid Potash, Inc., based in Denver, Colorado, is a fertilizer manufacturer. The company is the largest producer of potassium chloride, also known as muriate of potash, in the United States (source: Wikipedia).
Example #2 VII
[Figure: scatterplot of General Motors Co stock price (vertical axis) against Intrepid stock price (horizontal axis), with fitted line y = 2.7164x + 29.507, R² = 0.835.]
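The fitted line and R² in this scatterplot come from a simple least-squares regression of one price series on the other. For a one-variable regression, R² equals the squared correlation, which is consistent with the numbers on the slides (0.914² ≈ 0.835). The sketch below shows the calculation in plain Python; the function name and the data are hypothetical and are not the General Motors and Intrepid prices.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y = slope * x + intercept, plus R squared."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)  # correlation between x and y
    return slope, intercept, r * r


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_line(x, y)
print(f"y = {slope:.3f}x + {intercept:.3f}, R2 = {r_squared:.3f}")
```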
Example #2 VIII
[Figure: time series of General Motors Co (left axis) and Intrepid (right axis) stock prices, with the 30/06/2018 observations marked for General Motors and for Intrepid.]
Prediction: General Motors’ stock price will decrease in the month ending 31/07/2018.
Example #2 IX
[Figure: scatterplot against Intrepid stock price (horizontal axis) with the 31/07/2018 observation for General Motors marked; fitted line y = 2.621x + 29.662.]
Data mining III: Example in academic research
Berkman, H., P.D. Koch, and P.J. Westerholm (2014). ‘Informed trading through the accounts of children.’ Journal of Finance 69, 363-404:
• look at the Finnish stock market and investor performance
• analyse the transactions of more than half a million individuals over the period January 1, 1995, through May 31, 2010.
• Underaged accountholders (defined as accountholders aged 0 to 10 years) exhibit superior stock-picking skills on both the buy side and the sell side.
• Underaged accountholders significantly outperform older investors.
Visual analytics
A research project has the following stages:
1. Identification of the problem ⇒ research question:
• should be important
2. Hypothesis development (not always applicable)
3. Data collection
4. Data analysis
5. Interpretation of results
6. Policy recommendation
Data analysis includes presentation of results.
Visual analytics II
Visual presentation is also important at the early stages of a project, e.g., during hypothesis development or when we do data mining.
Visual analytics III
Good sources on how to make appropriate figures:
• Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data. Journal of Economic Perspectives 28(1): 209-234 (the link to the article is available on MyUni).
• Knaflic, C.N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley.
Visual analytics IV
It is too easy to produce figures using spreadsheets.
⇒ Many of the resulting graphs and tables are misleading.
The purpose of figures
To help the reader understand the content of the report (or book) better.
But not to confuse the reader.
And not to fill additional space in the report.
Key principles according to Knaflic, C.N. (2015)
1. Understand the context
2. Choose a suitable visual display
3. Eliminate clutter
4. Focus attention where you want it
5. Think like a designer
6. Tell a story.
Understand the context
What message do you want to send to the audience?
Will the figure be included in a presentation, report, e-mail, etc.?
Who is your audience?
Choose a suitable visual display
• Scatterplot
• Horizontal bar
• Vertical bar
• Stacked horizontal bar
• 3-D figure
Eliminate clutter
Every single element added to the page or slide takes brain power for the audience to process.
Do not include elements that add cognitive load but do not increase understanding (e.g., most 3-D figures).
Focus attention where you want it
Properly made figures ‘enable our audience to see what we want them to see before they even know they’re seeing it’.
Options to consider:
• size
• font and its attributes (bold, italic)
• position on page.
Think like a designer
• Highlight the important stuff
• Eliminate distractions
• Create visual hierarchy of information
• You might add some text inside the figure.
Tell a story
• The figure or table should help you tell the story.
• If it does not, either improve the figure or leave it out of the report or presentation.
Common issues: Bar width
[Figure: three bar charts (vertical axis 0 to 7) labelled ‘Too thick’, ‘Too thin’, and ‘Appropriate’.]
Common issues: Minimum axis value
[Figure: two charts, one with vertical-axis ticks from 50 to 56 and one with vertical-axis ticks from 10 to 60.]
Common issues: Use text and color
[Figure: chart for 2015 to 2017 (vertical-axis ticks from 10 to 20) with the text annotation ‘Profit after adoption of the new strategy’.]
Common issues: Decimal places
Year    Unrounded       Rounded
2013    -0.756244237    -0.756
2014     0.428606224     0.429
2015     1.241138714     1.241
2016     0.836164502     0.836
2017     0.691036891     0.691
Common issues: Large numbers
Year    Net income ($), unformatted    Net income ($), formatted
2013    -7562437.004                   -7,562,437
2014     4286062.2                      4,286,062
2015     12411387.14                   12,411,387
2016     836164.5023                      836,165
2017     6910368.906                    6,910,369
Common issues: Large numbers II
Year    Net income ($)    versus    Net income ($ millions)
2013    -7562437.004                -7.6
2014     4286062.2                   4.3
2015     12411387.14                12.4
2016     836164.5023                 0.8
2017     6910368.906                 6.9
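The number formats shown for decimal places and large numbers can be produced with Python's built-in format specifiers. The sketch below is a minimal illustration; the helper names are hypothetical, and the inputs are the values from the slides.

```python
def fmt_decimals(value, places=3):
    """Round to a fixed number of decimal places, e.g. 0.428606224 -> '0.429'."""
    return f"{value:.{places}f}"


def fmt_thousands(value):
    """Whole dollars with thousands separators, e.g. 4286062.2 -> '4,286,062'."""
    return f"{value:,.0f}"


def fmt_millions(value):
    """Express a dollar amount in millions with one decimal place."""
    return f"{value / 1e6:.1f}"


net_income = {
    2013: -7562437.004,
    2014: 4286062.2,
    2015: 12411387.14,
    2016: 836164.5023,
    2017: 6910368.906,
}

print("Year  Net income ($)  Net income ($ millions)")
for year, value in net_income.items():
    print(f"{year}  {fmt_thousands(value):>14}  {fmt_millions(value):>23}")
```

Right-aligning the formatted strings, as in the loop above, keeps the digits in columns so that magnitudes can be compared at a glance.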
Common issues: Use different colors (bad)
[Figure: bar chart (vertical axis 0 to 10); recoverable labels: 2010, 2011, and A, B, C, E, F, G.]
Common issues: Use different colors (good)
[Figure: bar chart (vertical axis 0 to 10); recoverable labels: 2012 to 2017.]
Common issues: Logarithmic scale
[Figure: two panels labelled ‘Normal scale’ and ‘Logarithmic scale’.]
Required reading
Konasani, V. R. and Kadre, S. (2015). Practical Business Analytics Using SAS: A Hands-on Guide: chapter 5.
Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data. Journal of Economic Perspectives 28(1): 209-234.
Leo, S. (2019). Mistakes, we’ve drawn a few: Learning from our errors in data visualisation. The Economist: https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368.