Correlation Data mining Visual analytics
CORPFIN 2503 – Business Data Analytics: Visual
analytics and data mining
£ius
Week 2: August 2nd, 2021
£ius CORPFIN 2503, Week 2 1/41
Outline
Correlation
Data mining
Visual analytics
Covariance and correlation
Covariance and correlation measure the strength of the linear relation between
two variables.
If the measure is positive (negative), the two variables are positively (negatively) related
to each other.
Correlation and covariance do not imply causality!
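Covariance formula:
$$\mathrm{cov}(A,B) = \frac{1}{n-1}\left[(r_{A1}-\bar{r}_A)\times(r_{B1}-\bar{r}_B) + (r_{A2}-\bar{r}_A)\times(r_{B2}-\bar{r}_B) + \dots\right],$$
$$\mathrm{cov}(A,B) = \frac{1}{n-1}\sum_{i=1}^{n}(r_{Ai}-\bar{r}_A)\times(r_{Bi}-\bar{r}_B).$$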
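As a minimal sketch of the covariance formula, the Python function below applies the 1/(n − 1) sum directly. The function name and the return figures are hypothetical and only illustrate the arithmetic.

```python
def sample_covariance(returns_a, returns_b):
    """Sample covariance using the 1/(n - 1) factor."""
    if len(returns_a) != len(returns_b):
        raise ValueError("both series need the same number of observations")
    n = len(returns_a)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_a = sum(returns_a) / n
    mean_b = sum(returns_b) / n
    cross_products = sum(
        (a - mean_a) * (b - mean_b) for a, b in zip(returns_a, returns_b)
    )
    return cross_products / (n - 1)


# Hypothetical returns for assets A and B (illustrative numbers only).
r_a = [0.01, 0.03, -0.02, 0.04]
r_b = [0.02, 0.01, -0.01, 0.03]
cov_ab = sample_covariance(r_a, r_b)
print(f"cov(A, B) = {cov_ab:.6f}")
```

A positive result here indicates that the two hypothetical series tend to move in the same direction.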
Covariance and correlation II
Correlation (ρ) formula:
$$\rho(A,B) = \frac{\mathrm{cov}(A,B)}{\sigma_A \sigma_B}.$$
Correlation is standardized covariance.
ρ ∈ [−1, 1].
The closer ρ is to either −1 or 1, the stronger the correlation (linear
relationship) between the variables.
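A short Python sketch of the correlation formula follows. It divides the sample covariance by the two sample standard deviations; the helper name and the toy data are assumptions made for illustration.

```python
import math


def sample_correlation(x, y):
    """Correlation as standardized covariance: cov(x, y) / (sigma_x * sigma_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sigma_x * sigma_y)


# Toy data (hypothetical): one exactly increasing and one exactly decreasing linear relation.
x = [1.0, 2.0, 3.0, 4.0]
rho_up = sample_correlation(x, [2.0, 4.0, 6.0, 8.0])
rho_down = sample_correlation(x, [8.0, 6.0, 4.0, 2.0])
print(rho_up, rho_down)
```

The two toy series are exact linear functions of x, so the results should sit at the two ends of the interval [−1, 1], up to floating-point rounding.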
Correlation: Example
Correlation: Example II
Correlation: Example III
Data mining
Data mining is the process of discovering interesting and new
patterns, correlations, and anomalies using various data sets and
methods.
The simplest methods:
• various graphs and plots
• correlation matrix
• regression analysis.
Data mining II: Example #1
Suppose you are a stock analyst and you would like to predict a
stock’s expected future performance.
One way to achieve this is to generate a correlation
matrix between stock returns (or firm performance measures such
as ROA, ROE, etc.) and various lagged indicators.
High absolute values of the correlation coefficients help identify
potential predictive factors.
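The idea can be sketched in plain Python as follows. The series names, the one-period lag, and all numbers are hypothetical; the point is only to show how returns at time t are paired with indicators at time t − 1 before the pairwise correlations are computed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Pairwise correlations for a dict of aligned series."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical data, oldest observation first.
stock_returns = [0.02, 0.01, 0.03, -0.01, 0.02, 0.04]
indicator_1 = [1.0, 3.0, -1.0, 2.0, 4.0, 0.5]
indicator_2 = [5.0, 4.0, 6.0, 5.0, 3.0, 4.0]

lag = 1  # pair the return at time t with the indicator at time t - 1
aligned = {
    "return_t": stock_returns[lag:],
    "indicator_1_lagged": indicator_1[:-lag],
    "indicator_2_lagged": indicator_2[:-lag],
}
matrix = correlation_matrix(aligned)
for name, row in matrix.items():
    print(name, {other: round(rho, 3) for other, rho in row.items()})
```

The row for `return_t` is the one of interest: indicators with a high absolute coefficient in that row are the candidate factors.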
Example #2
Suppose today is 30 June 2018. We want to predict whether
General Motors’ stock price will increase or decrease next month
(ending on 31/07/2018) using the available stock market
information.
Example #2 II
Suggested steps:
1. download monthly stock price information for all NYSE stocks
(3,906 stocks of US firms with market cap. greater than
$25m)
2. compute the correlation coefficients between General
Motors’ stock price for 31/07/2016 – 30/06/2018 and other
stocks’ prices for 30/06/2016 – 31/05/2018
3. identify the stocks with the highest absolute values of the correlation
coefficient
4. if the correlation coefficient is positive and that stock’s price is
higher on 30/06/2018 than on 31/05/2018, we predict that
General Motors’ stock price will increase by 31/07/2018
5. if the correlation coefficient is positive and that stock’s price is
lower on 30/06/2018 than on 31/05/2018, we predict that
General Motors’ stock price will decrease by 31/07/2018.
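Steps 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the lecture's results: the tickers `STOCK_A` and `STOCK_B`, the prices, and the function names are all hypothetical, and the series are assumed to be month-end prices for the same months, oldest first.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def rank_candidates(target_prices, other_prices):
    """Correlate the target at month t with each other stock at month t - 1,
    then sort the candidates by absolute correlation, largest first."""
    ranking = []
    for name, prices in other_prices.items():
        rho = pearson(target_prices[1:], prices[:-1])
        ranking.append((name, rho))
    ranking.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranking


# Hypothetical month-end prices covering the same five months.
target = [30.0, 32.0, 31.0, 35.0, 34.0]
others = {
    "STOCK_A": [2.0, 1.5, 3.5, 3.0, 2.5],
    "STOCK_B": [10.0, 12.0, 11.0, 10.0, 13.0],
}
ranking = rank_candidates(target, others)
best_name, best_rho = ranking[0]
print(best_name, round(best_rho, 3))
```

Dropping the first target observation and the last candidate observation is what creates the one-month lag: each target price is paired with the candidate's price from the previous month.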
Example #2 III
What is the decision rule if the correlation coefficient is negative?
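One natural reading, mirroring the suggested steps for a positive coefficient, is that a negative coefficient flips the predicted direction. The sketch below encodes the rule under that assumption; the function name, the coefficients, and the prices are hypothetical.

```python
def predicted_direction(rho, latest_price, previous_price):
    """Predict the target's next move from the correlated stock's latest move.

    Assumption: a negative coefficient reverses the rule used for a positive one.
    """
    if rho == 0 or latest_price == previous_price:
        return "undetermined"
    moved_up = latest_price > previous_price
    if rho > 0:
        return "increase" if moved_up else "decrease"
    return "decrease" if moved_up else "increase"


# Hypothetical inputs: the correlated stock fell from 4.0 to 3.5 in the latest month.
positive_case = predicted_direction(0.9, latest_price=3.5, previous_price=4.0)
negative_case = predicted_direction(-0.9, latest_price=3.5, previous_price=4.0)
print(positive_case, negative_case)
```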
Example #2 IV
[Figure: timeline from 30/06/2016 to 31/07/2018. Stock market information is available up to 30/06/2018 and unavailable afterwards. Correlation coefficients are computed between other stocks' prices for 30/06/2016 – 31/05/2018 and General Motors Co prices for 31/07/2016 – 30/06/2018.]
Example #2 V
[Figure: the same timeline, with X marking the other stock's price on 30/06/2018.]
If the correlation coefficient is > 0 and X is greater than in the previous month, then General
Motors’ stock price should increase by 31/07/2018 compared to 30/06/2018.
Example #2 VI
The highest absolute values of the correlation coefficients are as follows:
Ticker     Company name                      Correlation   Abs(corr.)
IPI.N      Intrepid                                0.914        0.914
PRIM.OQ    Primoris Services Corp                  0.904        0.904
FTK.N      Flotek Industries Inc                  -0.902        0.902
CLAR.OQ    Clarus Corp                             0.900        0.900
VVI.N      Viad Corp                               0.893        0.893
ORRF.OQ    Orrstown Financial Services Inc         0.892        0.892
Intrepid Potash, Inc., based in Denver, Colorado, is a fertilizer
manufacturer. The company is the largest producer of potassium
chloride, also known as muriate of potash, in the United States.
(source: Wikipedia).
Example #2 VII
[Figure: scatter plot of General Motors Co (vertical axis, 30 to 44) against Intrepid (horizontal axis, 1 to 5) with fitted line y = 2.7164x + 29.507, R² = 0.835.]
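The fitted line reported in the figure can be read as a simple prediction rule. The sketch below hard-codes the slope and intercept from the figure and assumes that x is the Intrepid price and y the General Motors price; the input price of 4.0 is a hypothetical value. It also checks that, for a regression with a single explanatory variable, R² equals the squared correlation coefficient, using the 0.914 reported for Intrepid.

```python
SLOPE = 2.7164       # from the fitted line y = 2.7164x + 29.507
INTERCEPT = 29.507


def fitted_gm_price(intrepid_price):
    """General Motors price implied by the fitted line for a given Intrepid price."""
    return SLOPE * intrepid_price + INTERCEPT


example_price = fitted_gm_price(4.0)  # 4.0 is a hypothetical Intrepid price
r_squared = 0.914 ** 2                # squared correlation coefficient
print(f"{example_price:.2f}", f"{r_squared:.3f}")
```

The squared coefficient rounds to 0.835, which agrees with the R² shown in the figure.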
Example #2 VIII
[Figure: time series of General Motors Co (left axis, 0 to 50) and Intrepid (right axis, 0 to 5) stock prices, with the 30/06/2018 observations for Intrepid and for General Motors marked.]
Prediction: General Motors’ stock price will decrease by
31/07/2018.
Example #2 IX
[Figure: scatter plot of General Motors Co (vertical axis, 30 to 44) against Intrepid (horizontal axis, 1 to 5), with the 31/07/2018 observation for General Motors marked and fitted line y = 2.621x + 29.662, R² = 0.8142.]
Data mining III: Example in academic research
Berkman, H., P.D. Koch, and P.J. Westerholm (2014). ‘Informed
trading through the accounts of children.’ Journal of Finance 69,
363-404:
• look at the Finnish stock market and investor performance
• analyse the transactions of more than half a million individuals
over the period January 1, 1995, through May 31, 2010.
Results:
• Underaged accountholders (defined as accountholders aged 0
to 10 years) exhibit superior stock-picking skills on both the
buy side and the sell side.
• Underaged accountholders significantly outperform older
investors.
Visual analytics
A research project has the following stages:
1. Identification of the problem =⇒ research question:
• should be important
2. Hypothesis development (not always applicable)
3. Data collection
4. Data analysis
5. Interpretation of results
6. Policy recommendation
Data analysis includes presentation of results.
Visual analytics II
Visual presentation is also important at the early stages of the
project, e.g., during hypothesis development
or when we do data mining.
Visual analytics III
Good sources on how to make appropriate figures:
• Schwabish, J.A. (2014). An Economist’s Guide to Visualizing
Data. Journal of Economic Perspectives 28(1): 209-234 (the
link to the article is available on MyUni).
• Knaflic, C.N. (2015). Storytelling with Data: A Data
Visualization Guide for Business Professionals. Wiley.
Visual analytics IV
It is too easy to produce figures using spreadsheets.
=⇒ Many graphs and tables are misleading.
The purpose of figures
To help the reader understand the content of the report (or book)
better.
But not to confuse the reader.
And not to fill additional space in the report.
Key principles according to Knaflic, C.N. (2015)
1. Understand the context
2. Choose a suitable visual display
3. Eliminate clutter
4. Focus attention where you want it
5. Think like a designer
6. Tell a story.
Understand the context
What message do you want to send to the audience?
Will the figure be included in a presentation, report, e-mail, etc.?
Who is your audience?
Choose a suitable visual display
• Table
• Scatterplot
• Horizontal bar
• Vertical bar
• Heatmap
• Stacked horizontal bar
• 3-D figure
• etc.
Eliminate clutter
Every single element added to the page or slide requires brain
power for the audience to process.
Do not include elements that increase cognitive load but
do not increase understanding (e.g., most 3-D figures).
Focus attention where you want it
Properly made figures ‘enable our audience to see what we want
them to see before they even know they’re seeing it’.
Options to consider:
• size
• color
• font and its attributes (bold, italic)
• position on page.
Think like a designer
• Highlight the important stuff
• Eliminate distractions
• Create visual hierarchy of information
• You might add some text inside the figure.
Tell a story
• The figure or table should help you tell the story.
• If it does not, improve the figure or leave it out of the report or
presentation.
Common issues: Bar width
[Figure: three bar charts of categories A to D (vertical axis 0 to 7), with bars that are too thick, too thin, and of appropriate width.]
Common issues: Minimum axis value
[Figure: two bar charts of categories A to D, one with the vertical axis running from 50 to 56 and the other from 0 to 60.]
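A small numerical illustration of why the minimum axis value matters: the drawn height of a bar is proportional to its value minus the axis minimum, so a truncated axis makes small differences look large. The bar values 52 and 55 below are hypothetical (the values behind the figure are not recoverable); the axis minimums 50 and 0 are those of the two panels.

```python
def drawn_height_ratio(value_a, value_b, axis_min):
    """Ratio of the drawn bar heights of B and A for a given axis minimum."""
    if value_a <= axis_min:
        raise ValueError("value_a must lie above the axis minimum")
    return (value_b - axis_min) / (value_a - axis_min)


# Hypothetical bar values.
value_a, value_b = 52, 55
truncated = drawn_height_ratio(value_a, value_b, axis_min=50)
zero_based = drawn_height_ratio(value_a, value_b, axis_min=0)
print(f"axis from 50: B looks {truncated:.2f}x as tall as A")
print(f"axis from 0:  B looks {zero_based:.2f}x as tall as A")
```

With these inputs the truncated axis shows B as 2.5 times as tall as A, although its value is only about 6% higher.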
Common issues: Use text and color
[Figure: chart for 2013 to 2017 (vertical axis 0 to 20) with the text annotation ‘Profit after adoption of the new strategy’.]
Common issues: Decimal places
2013   -0.756244237              2013   -0.756
2014    0.428606224              2014    0.429
2015    1.241138714    versus    2015    1.241
2016    0.836164502              2016    0.836
2017    0.691036891              2017    0.691
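In Python, one way to get from the left-hand column to the right-hand one is a format specification with a fixed number of decimals. The sketch below uses the values from the table; the variable names are arbitrary.

```python
values = {
    2013: -0.756244237,
    2014: 0.428606224,
    2015: 1.241138714,
    2016: 0.836164502,
    2017: 0.691036891,
}

# ".3f" keeps three digits after the decimal point.
rounded = {year: f"{value:.3f}" for year, value in values.items()}
for year, text in rounded.items():
    print(year, text)
```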
Common issues: Large numbers
Year   Net income ($)              Year   Net income ($)
2013   -7562437.004                2013       -7,562,437
2014   4286062.2                   2014        4,286,062
2015   12411387.14      versus     2015       12,411,387
2016   836164.5023                 2016          836,165
2017   6910368.906                 2017        6,910,369
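The right-hand column can be produced with a thousands separator and no decimals. A minimal Python sketch using the values from the table (variable names are arbitrary):

```python
net_income = {
    2013: -7562437.004,
    2014: 4286062.2,
    2015: 12411387.14,
    2016: 836164.5023,
    2017: 6910368.906,
}

# ",.0f" rounds to whole dollars and inserts comma separators.
with_separators = {year: f"{value:,.0f}" for year, value in net_income.items()}
for year, text in with_separators.items():
    print(year, text)
```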
Common issues: Large numbers II
Year   Net income ($)              Year   Net income ($ millions)
2013   -7562437.004                2013   -7.6
2014   4286062.2                   2014    4.3
2015   12411387.14      versus     2015   12.4
2016   836164.5023                 2016    0.8
2017   6910368.906                 2017    6.9
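Rescaling to millions is a division followed by rounding to one decimal. A minimal Python sketch using the values from the table (variable names are arbitrary):

```python
net_income = {
    2013: -7562437.004,
    2014: 4286062.2,
    2015: 12411387.14,
    2016: 836164.5023,
    2017: 6910368.906,
}

# Divide by one million, then keep one decimal place.
in_millions = {year: f"{value / 1_000_000:.1f}" for year, value in net_income.items()}
for year, text in in_millions.items():
    print(year, text)
```

When reporting in rescaled units, the column header should state the unit, as "$ millions" does in the table.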
Common issues: Use different colors (bad)
[Figure: chart for 2010 to 2017 (vertical axis 0 to 10) with a legend listing series A, B, C, E, F, and G.]
Common issues: Use different colors (good)
[Figure: chart for 2010 to 2017 (vertical axis 0 to 10) with only series A labeled.]
Common issues: Logarithmic scale
[Figure: two charts with horizontal axis 1 to 5, one on a normal scale (vertical axis 0 to 2500) and one on a logarithmic scale (vertical axis 1 to 10000).]
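A brief sketch of what the logarithmic scale does: positions on a base-10 log axis are proportional to log10 of the value, so each tenfold increase takes the same distance. The tick values below are those of the logarithmic panel; the variable names are arbitrary.

```python
import math

ticks = [1, 10, 100, 1000, 10000]

# Position of each tick on a base-10 logarithmic axis.
positions = [math.log10(tick) for tick in ticks]

# Distance between neighbouring ticks.
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
print(positions)
print(gaps)
```

Equal gaps mean that equal vertical distances correspond to equal percentage changes, which is why rapidly growing series are easier to read on this scale.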
Required reading
Konasani, V. R. and Kadre, S. (2015). Practical Business
Analytics Using SAS: A Hands-on Guide: chapter 5.
Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data.
Journal of Economic Perspectives 28(1): 209-234.
Leo, S. (2019). Mistakes, we’ve drawn a few: Learning from our
errors in data visualisation. The Economist:
https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368.