
Correlation Data mining Visual analytics

CORPFIN 2503 – Business Data Analytics: Visual
analytics and data mining

£ius

Week 2: August 2nd, 2021

£ius CORPFIN 2503, Week 2 1/41


Outline

Correlation

Data mining

Visual analytics


Covariance and correlation
Both provide a measure of the strength of the linear relation between two variables.

If positive (negative), the two variables are positively (negatively) related to each other.

Correlation and covariance do not imply causality!

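Covariance formula, where r_{Ai} and r_{Bi} are the i-th of n observations (e.g., returns) of A and B, and \bar{r}_A and \bar{r}_B are their sample means:

$$\mathrm{cov}(A,B) = \frac{1}{n-1}\left[(r_{A1}-\bar{r}_A)\times(r_{B1}-\bar{r}_B) + (r_{A2}-\bar{r}_A)\times(r_{B2}-\bar{r}_B) + \dots\right],$$

$$\mathrm{cov}(A,B) = \frac{1}{n-1}\sum_{i=1}^{n}(r_{Ai}-\bar{r}_A)\times(r_{Bi}-\bar{r}_B).$$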
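As a quick check of the formula, the sketch below computes the sample covariance term by term, dividing by n − 1. The function name `sample_covariance` and the return figures are hypothetical, chosen only for illustration.

```python
def sample_covariance(r_a, r_b):
    """Sample covariance of two equally long series, dividing by n - 1."""
    n = len(r_a)
    if n != len(r_b) or n < 2:
        raise ValueError("need two series of equal length with n >= 2")
    mean_a = sum(r_a) / n  # r-bar_A
    mean_b = sum(r_b) / n  # r-bar_B
    # Sum over i of (r_Ai - r-bar_A) * (r_Bi - r-bar_B)
    cross_products = sum((a - mean_a) * (b - mean_b) for a, b in zip(r_a, r_b))
    return cross_products / (n - 1)


# Hypothetical monthly returns of stocks A and B (illustrative numbers only)
r_a = [0.02, -0.01, 0.03, 0.00]
r_b = [0.01, -0.02, 0.04, 0.01]
print(sample_covariance(r_a, r_b))  # approximately 0.0004
```

Here both means are 0.01, the cross-products sum to 0.0012, and dividing by n − 1 = 3 gives 0.0004.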


Covariance and correlation II

Correlation (ρ) formula, where σ_A and σ_B are the standard deviations of A and B:

$$\rho(A,B) = \frac{\mathrm{cov}(A,B)}{\sigma_A \sigma_B}.$$

Correlation is standardized covariance.

ρ ∈ [−1, 1].

The closer ρ is to either −1 or 1, the stronger the correlation (linear relationship) between the variables.
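A minimal sketch of the correlation formula, building ρ from the covariance and the two standard deviations (all with the same n − 1 divisor). Function names and series are hypothetical.

```python
import math


def sample_covariance(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """rho = cov(x, y) / (sigma_x * sigma_y)."""
    sigma_x = math.sqrt(sample_covariance(x, x))  # standard deviation of x
    sigma_y = math.sqrt(sample_covariance(y, y))  # standard deviation of y
    return sample_covariance(x, y) / (sigma_x * sigma_y)


# Hypothetical series (illustrative only)
x = [1.0, 2.0, 3.0, 4.0]
print(correlation(x, [2.0, 4.0, 6.0, 8.0]))  # perfect positive relation: 1 up to rounding
print(correlation(x, [8.0, 6.0, 4.0, 2.0]))  # perfect negative relation: -1 up to rounding
print(correlation(x, [1.0, 3.0, 2.0, 4.0]))  # weaker positive relation
```

Because the n − 1 divisor appears in the numerator and in both standard deviations, it cancels. Using n throughout would give the same ρ, which is why correlation is described as standardized covariance.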


Correlation: Example


Correlation: Example II


Correlation: Example III


Data mining

Data mining is the process of discovering interesting and new
patterns, correlations, and anomalies using various data sets and
methods.

The simplest methods:

• various graphs and plots
• correlation matrix
• regression analysis.
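The correlation matrix mentioned above is simply the pairwise correlation of every variable with every other. A minimal pure-Python sketch, assuming the data are held in a dict of equally long lists; the variable names and numbers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)  # the n - 1 divisors cancel out


def correlation_matrix(series):
    """series maps a variable name to its observations; returns all pairwise correlations."""
    names = list(series)
    return {a: {b: pearson(series[a], series[b]) for b in names} for a in names}


# Hypothetical variables observed over five periods (illustrative numbers only)
data = {
    "return": [0.02, -0.01, 0.03, 0.00, 0.01],
    "roa": [0.05, 0.03, 0.06, 0.04, 0.05],
    "leverage": [0.40, 0.55, 0.35, 0.50, 0.45],
}
matrix = correlation_matrix(data)
print(" " * 9 + "".join(f"{name:>10}" for name in matrix))
for row in matrix:
    print(f"{row:<9}" + "".join(f"{matrix[row][col]:10.3f}" for col in matrix))
```

The diagonal is always 1 and the matrix is symmetric, so in practice only one triangle needs to be read.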


Data mining II: Example #1

Suppose you are a stock analyst and you would like to predict a stock's expected future performance.

One way to achieve this is to generate a correlation matrix between stock returns (or firm performance measures such as ROA, ROE, etc.) and various lagged indicators.

High absolute values of correlation coefficients would help identify potential factors.
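The key detail is the lag: performance in period t is correlated with each indicator in period t − 1, so only information available in advance is used. A minimal sketch under that assumption; the indicator names, the one-period lag and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def rank_lagged_indicators(performance, indicators, lag=1):
    """Correlate performance in period t with each indicator in period t - lag
    and sort the indicators by absolute correlation, largest first."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    ranking = []
    for name, values in indicators.items():
        r = pearson(performance[lag:], values[:-lag])  # align t with t - lag
        ranking.append((name, r))
    return sorted(ranking, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: performance and two candidate indicators over five periods
performance = [0.0, 2.0, 4.0, 6.0, 8.0]
indicators = {
    "leading": [1.0, 2.0, 3.0, 4.0, 5.0],  # performance = 2 x previous period's value
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
for name, r in rank_lagged_indicators(performance, indicators):
    print(f"{name:<8} corr = {r:+.3f}  abs = {abs(r):.3f}")
```

Sorting by the absolute value keeps strongly negative indicators near the top, since they are just as informative as strongly positive ones.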


Example #2

Suppose today is 30 June 2018. We want to predict whether
General Motors’ stock price will increase or decrease next month
(ending on 31/07/2018) using the available stock market
information.


Example #2 II
Suggested steps:

1. download monthly stock price information for all NYSE stocks (3,906 stocks of US firms with market cap. greater than $25m)

2. compute the correlation coefficients between General Motors' stock price for 31/07/2016 – 30/06/2018 and the other stocks' prices for 30/06/2016 – 31/05/2018 (i.e., the other stocks are lagged by one month)

3. identify the stock with the highest absolute value of the correlation coefficient

4. if the correlation coefficient is positive and that stock's price is higher on 30/06/2018 than on 31/05/2018, we predict that General Motors' stock price will be higher on 31/07/2018 than on 30/06/2018

5. if the correlation coefficient is positive and that stock's price is lower on 30/06/2018 than on 31/05/2018, we predict that General Motors' stock price will be lower on 31/07/2018 than on 30/06/2018.
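Steps 2 to 5 can be sketched as one function. This is a hypothetical illustration, not the procedure used to produce the slides: it assumes month-end price lists already aligned so that each other stock has one more observation than the target. It encodes the decision rule as the sign of the correlation times the sign of the other stock's latest price change, so a negative coefficient simply reverses the predicted direction.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def predict_next_move(target, others):
    """target: target stock's month-end prices for months 1..T.
    others: ticker -> month-end prices for months 0..T (one more observation).
    Returns (ticker, correlation, direction): +1 up, -1 down, 0 no signal."""
    if not others:
        raise ValueError("need at least one other stock")
    best_ticker, best_corr = None, 0.0
    for ticker, prices in others.items():
        if len(prices) != len(target) + 1:
            raise ValueError(f"{ticker}: expected {len(target) + 1} prices")
        # Step 2: target in month t against the other stock in month t - 1
        r = pearson(target, prices[:-1])
        # Step 3: keep the stock with the highest absolute correlation
        if best_ticker is None or abs(r) > abs(best_corr):
            best_ticker, best_corr = ticker, r
    prices = others[best_ticker]
    change = prices[-1] - prices[-2]  # latest month versus the month before
    sign_change = (change > 0) - (change < 0)
    sign_corr = (best_corr > 0) - (best_corr < 0)
    # Steps 4 and 5: a positive correlation carries the move over,
    # a negative correlation reverses it
    return best_ticker, best_corr, sign_corr * sign_change


# Hypothetical prices: target for months 1..4, other stocks for months 0..4
target = [2.0, 4.0, 6.0, 8.0]
others = {
    "POS": [1.0, 2.0, 3.0, 4.0, 3.0],  # fell in the latest month
    "NEG": [5.0, 3.0, 2.0, 1.0, 2.0],
}
print(predict_next_move(target, others))
```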


Example #2 III

What is the decision rule if the correlation coefficient is negative?


Example #2 IV

[Diagram: timeline separating the available stock market information (up to 30/06/2018) from the unavailable information (after 30/06/2018). Correlation coefficients are computed between General Motors Co over 31/07/2016 to 30/06/2018 and other stocks over 30/06/2016 to 31/05/2018.]


Example #2 V

[Diagram: the same timeline; X denotes the other stock's price on 30/06/2018.]

If the correlation coefficient is positive and X is greater than the other stock's price in the previous month (31/05/2018), then General Motors' stock price should be higher on 31/07/2018 than on 30/06/2018.


Example #2 VI

The highest absolute values of correlation coefficients are as follows:

Ticker     Company name                       Correlation   Abs(corr.)
IPI.N      Intrepid                                 0.914        0.914
PRIM.OQ    Primoris Services Corp                   0.904        0.904
FTK.N      Flotek Industries Inc                   -0.902        0.902
CLAR.OQ    Clarus Corp                              0.900        0.900
VVI.N      Viad Corp                                0.893        0.893
ORRF.OQ    Orrstown Financial Services Inc          0.892        0.892

Intrepid Potash, Inc., based in Denver, Colorado, is a fertilizer
manufacturer. The company is the largest producer of potassium
chloride, also known as muriate of potash, in the United States.
(source: Wikipedia).


Example #2 VII

[Figure: scatterplot of General Motors Co (vertical axis) against Intrepid (horizontal axis), with fitted line y = 2.7164x + 29.507, R² = 0.835.]
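The fitted line and R² in such a scatterplot come from ordinary least squares. A minimal sketch of the calculation; `fit_line` is a hypothetical name and the data are illustrative, not the General Motors and Intrepid prices.

```python
def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its R-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data (not the prices behind the figure)
slope, intercept, r2 = fit_line([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])
print(f"y = {slope:.4f}x + {intercept:.4f}, R² = {r2:.4f}")  # y = 0.8000x + 0.5000, R² = 0.6400
```

For a simple regression with an intercept, R² equals the squared correlation coefficient. Here √0.835 ≈ 0.914, which matches Intrepid's correlation in the table.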


Example #2 VIII

[Figure: General Motors Co (left axis, 0 to 50) and Intrepid (right axis, 0 to 5) over time, with the 30/06/2018 observations marked for Intrepid and for General Motors.]

Prediction: General Motors' stock price will be lower on 31/07/2018 than on 30/06/2018.


Example #2 IX

[Figure: scatterplot of General Motors Co (vertical axis) against Intrepid (horizontal axis) including the 31/07/2018 observation for General Motors, with fitted line y = 2.621x + 29.662, R² = 0.8142.]


Data mining III: Example in academic research

Berkman, H., P.D. Koch, and P.J. Westerholm (2014). 'Informed trading through the accounts of children.' Journal of Finance 69, 363-404:

• examine the Finnish stock market and investor performance
• analyse the transactions of more than half a million individuals over the period January 1, 1995, through May 31, 2010.

Results:

• Underage accountholders (defined as accountholders aged 0 to 10 years) exhibit superior stock-picking skills on both the buy side and the sell side.

• Underage accountholders significantly outperform older investors.


Visual analytics

A research project has the following stages:

1. Identification of the problem ⇒ research question:
• should be important

2. Hypothesis development (not always applicable)

3. Data collection

4. Data analysis

5. Interpretation of results

6. Policy recommendation

Data analysis includes presentation of results.


Visual analytics II

Visual presentation is also important at early stages of a project, e.g., during hypothesis development or data mining.


Visual analytics III

Good sources on how to make appropriate figures:

• Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data. Journal of Economic Perspectives 28(1): 209-234 (the link to the article is available on MyUni).

• Knaflic, C.N. (2015). Storytelling with Data: A Data Visualization Guide for Business Professionals. Wiley.


Visual analytics IV

Spreadsheets make it very easy to produce figures.

⇒ As a result, many graphs and tables are misleading.


The purpose of figures

To help the reader understand the content of the report (or book) better.

But not to confuse the reader.

And not to fill additional space in the report.


Key principles according to Knaflic, C.N. (2015)

1. Understand the context

2. Choose a suitable visual display

3. Eliminate clutter

4. Focus attention where you want it

5. Think like a designer

6. Tell a story.


Understand the context

What message do you want to send to the audience?

Will the figure be included in a presentation, report, e-mail, etc.?

Who is your audience?


Choose a suitable visual display

• Table
• Scatterplot
• Horizontal bar
• Vertical bar
• Heatmap
• Stacked horizontal bar
• 3-D figure
• etc.


Eliminate clutter

Every single element added on the page or slide requires brain
power to process for the audience.

Do not include elements that increase excessive cognitive load but
do not increase understanding (e.g., most 3-D �gures).


Focus attention where you want it

Properly made figures ‘enable our audience to see what we want them to see before they even know they’re seeing it’.

Options to consider:

• size
• color
• font and its attributes (bold, italic)
• position on page.


Think like a designer

• Highlight the important stuff
• Eliminate distractions
• Create a visual hierarchy of information
• You might add some text inside the figure.


Tell a story

• The figure or table should help you tell the story.
• If it does not, improve it or leave it out of the report or presentation.


Common issues: Bar width

[Figure: three bar charts of categories A, B, C and D (vertical axis 0 to 7), with panels titled "Too thick", "Too thin" and "Appropriate".]


Common issues: Minimum axis value

[Figure: two bar charts of categories A, B, C and D, one with the vertical axis running from 50 to 56 and the other from 0 to 60.]
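Why the minimum axis value matters can be quantified: a bar's drawn length is its value minus the axis minimum, so starting the axis above zero inflates apparent differences. A minimal sketch with hypothetical bar values (not read off the slide).

```python
def apparent_ratio(a, b, axis_min):
    """How many times longer bar b looks than bar a when the value axis starts at axis_min."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must lie below both values")
    return (b - axis_min) / (a - axis_min)


# Hypothetical bar values
a, b = 52.0, 55.0
print(f"axis from 0:  bar b looks {apparent_ratio(a, b, 0):.2f} times bar a")   # 1.06
print(f"axis from 50: bar b looks {apparent_ratio(a, b, 50):.2f} times bar a")  # 2.50
```

A true difference of about 6% is drawn as a 2.5-fold difference once the axis starts at 50.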


Common issues: Use text and color

[Figure: chart over 2013 to 2017 (vertical axis 0 to 20) with the in-figure annotation "Profit after adoption of the new strategy".]


Common issues: Decimal places

2013 -0.756244237 2013 -0.756
2014 0.428606224 2014 0.429
2015 1.241138714 versus 2015 1.241
2016 0.836164502 2016 0.836
2017 0.691036891 2017 0.691
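Rounding for display can be done with a fixed-precision format specification rather than by editing numbers by hand. A minimal sketch using the values from the table above; `format_column` is a hypothetical helper.

```python
def format_column(values, decimals=3):
    """Format each (year, value) pair with a fixed number of decimal places."""
    return [f"{year} {value:.{decimals}f}" for year, value in values]


values = [(2013, -0.756244237), (2014, 0.428606224), (2015, 1.241138714),
          (2016, 0.836164502), (2017, 0.691036891)]
for line in format_column(values):
    print(line)
```

Only the displayed value is rounded; the stored numbers keep full precision for further calculations.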


Common issues: Large numbers

Year Net income ($) Year Net income ($)

2013 -7562437.004 2013 -7,562,437
2014 4286062.2 2014 4,286,062
2015 12411387.14 versus 2015 12,411,387
2016 836164.5023 2016 836,165
2017 6910368.906 2017 6,910,369
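Thousands separators and whole-number rounding come from a single format specification, `,.0f`. A minimal sketch using the net income figures from the table above.

```python
net_income = [(2013, -7562437.004), (2014, 4286062.2), (2015, 12411387.14),
              (2016, 836164.5023), (2017, 6910368.906)]
# ',' inserts thousands separators; '.0f' rounds to whole dollars
rows = [f"{year} {value:,.0f}" for year, value in net_income]
for row in rows:
    print(row)
```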


Common issues: Large numbers II

Year Net income ($) Year Net income ($ millions)

2013 -7562437.004 2013 -7.6
2014 4286062.2 2014 4.3
2015 12411387.14 versus 2015 12.4
2016 836164.5023 2016 0.8
2017 6910368.906 2017 6.9
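Scaling to millions means dividing before formatting and stating the unit in the column header, as in "Net income ($ millions)". A minimal sketch with a hypothetical helper `in_millions`, applied to the same figures.

```python
def in_millions(value, decimals=1):
    """Express a dollar amount in millions with a fixed number of decimals."""
    return f"{value / 1_000_000:.{decimals}f}"


net_income = [(2013, -7562437.004), (2014, 4286062.2), (2015, 12411387.14),
              (2016, 836164.5023), (2017, 6910368.906)]
print("Year Net income ($ millions)")
for year, value in net_income:
    print(year, in_millions(value))
```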


Common issues: Use different colors (bad)

[Figure: line chart over 2010 to 2017 (vertical axis 0 to 10) with series A, B, C, E, F and G, each drawn in a different color.]


Common issues: Use different colors (good)

[Figure: the same line chart over 2010 to 2017 (vertical axis 0 to 10), with only series A labelled and highlighted.]


Common issues: Logarithmic scale

[Figure: two panels titled "Normal scale" (vertical axis 0 to 2,500) and "Logarithmic scale" (vertical axis 1 to 10,000), each with horizontal axis 1 to 5.]
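On a logarithmic axis, equal vertical distances represent equal ratios, so a series growing at a constant rate plots as a straight line. A minimal sketch with a hypothetical series that grows tenfold each period.

```python
import math

# Hypothetical series that grows tenfold every period: 1, 10, 100, 1000, 10000
values = [10.0 ** k for k in range(5)]
linear_steps = [b - a for a, b in zip(values, values[1:])]
log_values = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]
print("steps on a normal scale:     ", linear_steps)
print("steps on a logarithmic scale:", [round(s, 6) for s in log_steps])
```

On the normal scale the steps explode (9, 90, 900, 9000), squashing early values against the axis. On the log scale every step has the same height.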


Required reading

Konasani, V.R. and Kadre, S. (2015). Practical Business Analytics Using SAS: A Hands-on Guide, chapter 5.

Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data. Journal of Economic Perspectives 28(1): 209-234.

Leo, S. (2019). Mistakes, we’ve drawn a few: Learning from our errors in data visualisation. The Economist: https://medium.economist.com/mistakes-weve-drawn-a-few-8cdd8a42d368.
