Correlation – Introduction
School of Computing and Information Systems
@University of Melbourne 2022
• Discuss correlations between pairs of features in a dataset
• Why useful and important
• Pitfalls
• Methods for computing correlation
• Pearson correlation
• Mutual information (another method to compute correlation)
What is Correlation?
Correlation is used to detect pairs of variables that might have some relationship.
How strong is the relationship?
https://www.mathsisfun.com/data/correlation.html
What is Correlation?
Can be identified visually by inspecting scatter plots
https://www.mathsisfun.com/data/correlation.html
What is Correlation?
Linear relations
https://www.mathsisfun.com/data/correlation.html
What is Correlation?
Correlation strength
https://images.app.goo.gl/zZXtjBLR2BcjRpK79
Example of non-linear correlation
https://www.mathsisfun.com/data/correlation.html
It gets so hot that people aren’t going near the shop, and sales start dropping.
Example of Correlated Variables
• Greater understanding of data
• Can hint at potential causal relationships
• Business decision based on correlation: increase electricity production when temperature increases
Example of Correlated Variables
Correlation does not necessarily imply causality!
Example rank correlation
“If a university has a higher-ranked football team, then is it likely to have a higher-ranked basketball team?”
[Table: football ranking and basketball ranking for each university team]
Microarray data
Each chip contains thousands of tiny probes corresponding to the genes (20k – 30k genes in humans). Each probe measures the activity (expression) level of a gene
[Microarray chip: expression levels for Gene 1, Gene 2, …, Gene 20K]
Microarray dataset
• Each row represents measurements at some time
• Each column represents levels of a gene
Correlation analysis on Microarray data
Can reveal genes that exhibit similar patterns ⇨ similar or related functions ⇨ Discover functions of unknown genes
Genetic network
Connect genes with high correlation – understand behaviour of groups of genes
Salt Causes High Blood Pressure
Intersalt: an international study of electrolyte excretion and blood pressure. Results for 24 hour urinary sodium and potassium excretion.
British Medical Journal; 297: 319-328, 1988.
[Scatter plot: median urinary sodium excretion (mmol/24hr) vs median diastolic blood pressure (mmHg)]
Or Does It!?
Intersalt: an international study of electrolyte excretion and blood pressure. Results for 24 hour urinary sodium and potassium excretion.
British Medical Journal; 297: 319-328, 1988.
If we exclude these four ‘outliers’, which are non-industrialised countries with non-salt diets, we get a quite different result!
[Scatter plot: median urinary sodium excretion (mmol/24hr) vs median diastolic blood pressure (mmHg)]
Spurious Correlation
Correlation ≠ Causality
https://assets.businessinsider.com/real-maps-ridiculous-correlations-2014-11?jwsource=cl
https://images.app.goo.gl/FVr8BhxWmQMxCB5f7
Why is correlation important?
• Discover relationships
• One step towards discovering causality (A causes B)
Example: Smoking causes lung cancer
• Feature ranking: for building better predictive models
A good feature to use is one that has high correlation with the outcome we are trying to predict
Correlation
Problems of Euclidean distance
• Objects can be represented with different measure scales
Example: Temperature, #Ice-creams and #Electricity are measured on very different scales:
d(temp, ice-cr) = 540324
d(temp, elect) = 12309388
• Euclidean distance: does not give a clear intuition about how well variables are correlated
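To make the scale problem concrete, here is a minimal Python sketch (the temperature and ice-cream figures below are invented for illustration; they are not the values behind the distances above):

```python
import numpy as np

# Hypothetical daily temperatures (°C) and ice-creams sold
temp = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
ice_creams = np.array([200.0, 250.0, 300.0, 350.0, 400.0])

# The Euclidean distance is large purely because the scales differ
print(np.linalg.norm(temp - ice_creams))    # ~620: says nothing about the relationship

# ...yet the two series move together perfectly
print(np.corrcoef(temp, ice_creams)[0, 1])  # 1.0
```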
Problems of Euclidean distance
Cannot discover variables with similar behaviours/dynamics but at different scale
Problems of Euclidean distance
Cannot discover variables with similar behaviours/dynamics but in the opposite direction (negative correlation)
Assessing linear correlation – Pearson correlation
• We will define a correlation measure rxy, assessing samples from two features x and y
• Assess how close their scatter plot is to a straight line (a linear relationship)
• Also known as Pearson’s product-moment correlation
• Range of rxy lies within [-1,1]:
• 1 for perfect positive linear correlation
• -1 for perfect negative linear correlation
• 0 means no correlation
• Absolute value |r| indicates strength of linear correlation
Pearson’s correlation coefficient (r)
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)\left(\sum_{i=1}^{n} (y_i - \bar{y})^2\right)}}$$

Equivalently, writing $x_i^* = x_i - \bar{x}$ and $y_i^* = y_i - \bar{y}$ for the centred values:

$$r_{xy} = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sqrt{\left(\sum_{i=1}^{n} (x_i^*)^2\right)\left(\sum_{i=1}^{n} (y_i^*)^2\right)}}$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
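A minimal Python sketch of this formula (a hedged example: the height and weight values are invented, not the table used in the lecture):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_c = x - x.mean()   # centred values x* = x - x̄
    y_c = y - y.mean()   # centred values y* = y - ȳ
    return (x_c * y_c).sum() / np.sqrt((x_c ** 2).sum() * (y_c ** 2).sum())

heights = [1.60, 1.65, 1.70, 1.75, 1.80]
weights = [55, 60, 66, 72, 80]
print(pearson_r(heights, weights))           # close to 1: strong positive linear correlation
print(np.corrcoef(heights, weights)[0, 1])   # NumPy's built-in gives the same value
```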
Pearson coefficient example
[Table: paired Height (x) and Weight (y) measurements]
• How do the values of x and y move (vary) together?
• Big values of x with big values of y?
• Small values of x with small values of y?
Interpreting Pearson correlation values
In general it depends on your domain of application. Cohen has suggested:
• 0.5 is large
• 0.3-0.5 is moderate
• 0.1-0.3 is small
• less than 0.1 is trivial
Properties of Pearson’s correlation
• Range within [-1,1]
• Is sensitive to outliers
• Can only detect linear relationships
y = a·x + b + noise
• Cannot detect non-linear relationships
y = x³ + noise
• Scale invariant: r(x, y) = r(x, Ky) for a constant K > 0
• Multiplying a feature’s values by a positive constant K makes no difference (a negative K flips the sign of r)
• Location invariant: r(x, y) = r(x, K + y)
• Adding a constant K to one feature’s values makes no difference
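A short demonstration of these properties on randomly generated data (a sketch; none of these numbers come from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + 1 + rng.normal(scale=0.5, size=1000)   # y = a·x + b + noise

r = lambda a, b: np.corrcoef(a, b)[0, 1]
print(r(x, y))         # high: linear relationship detected
print(r(x, 3 * y))     # scale invariance: unchanged for K > 0
print(r(x, 10 + y))    # location invariance: unchanged
print(r(x, -y))        # a negative multiplier flips the sign
print(r(x, x ** 2))    # non-linear, non-monotonic relationship: r is near 0
```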
Pearson correlation examples
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
Mutual Information
Recap: Pearson correlation – assess linear correlation between two features
https://www.mathsisfun.com/data/correlation.html
What about non-linear correlation?
Pearson correlation is not suitable for this scenario (value less than 0.1)
https://www.mathsisfun.com/data/correlation.html
Mutual Information
Entropy – intuition
• Entropy is a measure used to quantify the amount of uncertainty in an outcome
• Randomly select an element from
A. {1,1,1,1,1,1,1,2} versus B. {1,1,2,2,3,3,4,4}
• In which case are you more uncertain about the outcome of the selection? Why?
• More uncertain, less predictable => high entropy (B)
• Less uncertain, more predictable => low entropy (A)
Another example
Consider the sample of all people in this subject. Each person is labelled young (<30 years) or old (>=30 years)
• Randomly select a person and inspect whether they are young or old.
• How surprised am I likely to be by the outcome?
• Suppose I repeat the experiment using a random sample of people catching the train to the city in peak hour?
• How surprised am I likely to be by the outcome?
Information theory – bits
• A bit is 0 or 1
• 1 bit of information reduces uncertainty by a factor of 2.
Flipping a coin: head – 50%, tail – 50%
• The machine tells you that the outcome will be tail
• Uncertainty for tail reduced by a factor of 2
• 1 bit of information
• The machine tells you that the outcome will be head
• Uncertainty for head reduced by a factor of 2
• 1 bit of information
• On average, machine transmits 0.5*1 + 0.5*1 = 1 bit (Entropy)
Flipping a special coin: head – 75%, tail – 25%
• The machine tells you that the outcome will be tail
• Uncertainty for tail reduced by a factor of 4 (1/0.25 = 4)
• 2 bits of information (log2 4 = −log2 0.25 = 2)
• The machine tells you that the outcome will be head
• Uncertainty for head reduced by a factor of 4/3 (1/0.75 = 4/3)
• 0.41 bits of information (log2 4/3 = −log2 0.75 = 0.41)
• On average, machine transmits 0.75*0.41 + 0.25 * 2 = 0.81 (Entropy)
A recap on logarithms (to the base 2)
• y = log2 x (y is the solution to the question “To what power do I need to raise 2, in order to get x?”)
• 2 × 2 × 2 × 2 = 16, which means log2 16 = 4 (16 is 2 to the power 4)
• log2 32 = 5
• log2 30 = 4.9
• log2 1.2 = 0.26
• log2 0.5 = −1
In what follows, we’ll write log instead of log2 to represent binary log
Entropy
Given a feature $X$ with $k$ categories (bins), its entropy $H(X)$ is
$$H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where $p_i$ is the proportion of points in the $i$-th category (bin).
Example: $X$ = flipping the special coin (head: 75%, tail: 25%):
$$H(X) = -(0.75 \log_2 0.75 + 0.25 \log_2 0.25) \approx 0.81$$
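A small Python helper implementing this definition (a sketch; the two coins are the examples from the previous slides):

```python
import numpy as np

def entropy(proportions):
    """Entropy in bits of a discrete distribution given as proportions p_i."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0]                       # treat 0·log2(0) as 0
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))     # fair coin: 1.0 bit
print(entropy([0.75, 0.25]))   # special coin: ~0.81 bits
```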
Entropy practice example
We have 3 categories (A, B, C) for a feature $X$ and 9 objects; each object is in one category.
$$H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i$$
What is the entropy of this sample of 9 objects? Answer: $H(X) = 1.53$
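The table of 9 objects did not survive extraction; a 2/3/4 split across the three categories is one assumed assignment that reproduces the stated answer:

```python
import numpy as np

# Assumed counts per category (A, B, C); the original table is not shown.
# A 2/3/4 split of the 9 objects reproduces the stated answer of 1.53 bits.
counts = np.array([2.0, 3.0, 4.0])
p = counts / counts.sum()
print(-(p * np.log2(p)).sum())   # ~1.53
```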
How would you compute the entropy for the “Likes to sleep” feature?
$X$: Likes to sleep
$$H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i$$
[Table: ‘Likes to sleep’ value for each object]
Properties of entropy
• H(X) ≥ 0
• Entropy is maximized for uniform distribution (highly uncertain what value a randomly selected object will have)
• Entropy – when using log base 2 – measures uncertainty of the outcome in bits. This can be viewed as the information associated with learning the outcome
Variable discretisation
• Pre-processing: continuous features are first discretised into bins (categories). E.g. small [0,1.4], medium (1.4,1.8), large [1.8,3.0]
[Table: Height values and their discretised bins]
Variable discretisation: Techniques
• Equal-width bin
Divide the range of the continuous feature into equal length intervals (bins). If speed ranges from 0-100, then the 10 bins are [0,10), [10,20), [20,30), …[90,100]
• Equal-frequency bin
Divide range of continuous feature into equal frequency intervals (bins). Sort the values and divide so that each bin has roughly same number of objects
• Domain knowledge: assign thresholds manually
Car speed:
• 0–40: slow
• 40–60: medium
• >60: high
Discretisation example
Given the values 2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21
• A 3-bin equal-width discretisation
• Bin width: (21 − 2)/3 = 6.333
• [2, 8.333), [8.333, 14.666), [14.666, 21]
Discretisation example
Given the values 2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21
• A 3-bin equal-frequency discretisation
• [2, 13), [13, 19), [19, 21] – by hand
• (1.999, 13], (13, 19], (19, 21] – pandas
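Both discretisations can be reproduced with pandas (a quick sketch using the same values; `pd.cut` gives equal-width bins and `pd.qcut` equal-frequency bins):

```python
import pandas as pd

values = pd.Series([2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21])

# Equal-width: 3 bins of equal length over the range (pandas closes bins on
# the right and nudges the lowest edge slightly below the minimum)
print(pd.cut(values, bins=3).value_counts(sort=False))

# Equal-frequency: 3 bins with roughly the same number of values each;
# this reproduces the (1.999, 13], (13, 19], (19, 21] bins quoted above
print(pd.qcut(values, q=3).value_counts(sort=False))
```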
Mutual Information
Conditional entropy – intuition
Suppose I randomly sample a person. I check if they wear glasses – how surprised am I by their age?
[Table: WearGlasses (X) and age group (Y) for each person]
Conditional entropy H(Y|X)
$$H(Y|X) = \sum_{x \in X} P(x)\, H(Y|X = x)$$
Measures how much information is needed to describe outcome Y, given that outcome X is known. Suppose X is Height and Y is Weight.
[Table: paired Height (X) and Weight (Y) values]
Conditional entropy H(Y|X)
$$H(Y|X) = \sum_{x \in X} P(x)\, H(Y|X = x)$$
With 7 objects (2 Tall, 5 Short):
$$H(Y|X) = \tfrac{2}{7}\, H(Y|X=\text{Tall}) + \tfrac{5}{7}\, H(Y|X=\text{Short})$$
$$H(Y|X=x) = -\sum_{y \in Y} P(y|x) \log_2 P(y|x)$$
$$H(Y|X=\text{Tall}) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$$
$$H(Y|X=\text{Short}) = -\left(\tfrac{4}{5}\log_2\tfrac{4}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) = 0.721$$
$$H(Y|X) = \tfrac{2}{7} \times 1 + \tfrac{5}{7} \times 0.721 = 0.801$$
[Table: paired Height (X) and Weight (Y) values]
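A Python sketch of H(Y|X). The exact table from the slide is not recoverable, so the 7 rows below are a reconstruction with the same group proportions (any data with those proportions gives the same numbers):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Entropy in bits of a column of discrete labels."""
    p = np.array(list(Counter(labels).values()), dtype=float) / len(labels)
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(y, x):
    """H(Y|X) = sum over each value v of X of P(X=v) * H(Y | X=v)."""
    x, y = np.asarray(x), np.asarray(y)
    return sum((x == v).mean() * entropy(y[x == v]) for v in set(x))

# Reconstructed rows: 2 Tall (split 1/2 vs 1/2), 5 Short (split 4/5 vs 1/5)
height = ['Tall', 'Tall'] + ['Short'] * 5
weight = ['Heavy', 'Light'] + ['Light'] * 4 + ['Heavy']
print(conditional_entropy(weight, height))   # ~0.801
```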
Mutual information definition
$$MI(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)$$
$$H(Y|X) = \sum_{x \in X} P(x)\, H(Y|X = x)$$
where X and Y are features (columns) in a dataset
• MI (mutual information) is a measure of correlation
• The amount of information about Y we gain by knowing X, or
• The amount of information about X we gain by knowing Y
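A sketch of MI(X,Y) = H(Y) − H(Y|X) in Python. The Short/Tall and Light/Heavy labels are assumptions matching the proportions in Example 1 on the next slide:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    p = np.array(list(Counter(labels).values()), dtype=float) / len(labels)
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """MI(X,Y) = H(Y) - H(Y|X), for two columns of discrete labels."""
    x, y = np.asarray(x), np.asarray(y)
    h_y_given_x = sum((x == v).mean() * entropy(y[x == v]) for v in set(x))
    return entropy(y) - h_y_given_x

height = ['Short'] * 6 + ['Tall']            # 6 Short, 1 Tall
weight = ['Light'] * 5 + ['Heavy', 'Heavy']  # Short: 5 Light / 1 Heavy; Tall: 1 Heavy
print(mutual_information(height, weight))    # ~0.306
```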
Mutual information example 1
$$MI(X,Y) = H(Y) - H(Y|X)$$
$$H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y)$$
$$H(Y|X) = \sum_{x \in X} P(x)\, H(Y|X = x)$$
$$H(Y) = -\left(\tfrac{5}{7}\log_2\tfrac{5}{7} + \tfrac{2}{7}\log_2\tfrac{2}{7}\right) = 0.8631$$
$$H(Y|X) = \tfrac{6}{7}\, H(Y|X=\text{Short}) + \tfrac{1}{7}\, H(Y|X=\text{Tall})$$
$$H(Y|X) = \tfrac{6}{7}\left[-\left(\tfrac{5}{6}\log_2\tfrac{5}{6} + \tfrac{1}{6}\log_2\tfrac{1}{6}\right)\right] + \tfrac{1}{7}\left(-1 \times \log_2 1\right) = 0.5571$$
$$MI(X,Y) = 0.8631 - 0.5571 = 0.3059$$
[Table: paired Height (X) and Weight (Y) values]
Mutual information example 2
$$MI(X,Y) = H(Y) - H(Y|X)$$
$$H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y)$$
$$H(Y|X) = \sum_{x \in X} P(x)\, H(Y|X = x)$$
$$H(Y) = 1.379$$
$$H(Y|X) = 0.965$$
$$MI(X,Y) = H(Y) - H(Y|X) = 0.414$$
[Table: paired Height (X) and Weight (Y) values]
Properties of Mutual Information
• The amount of information shared between two variables X and Y
• large: X and Y are highly correlated (more dependent)
• small: X and Y have low correlation (more independent)
• 0 ≤ MI(X,Y) ≤ ∞
• Sometimes also referred to as ‘Information Gain’
Mutual information: normalisation
• MI(X,Y) is always at least zero, and may be larger than 1
• In fact, one can show that
0 ≤ MI(X,Y) ≤ min(H(X), H(Y))
• Thus if we want a measure in the interval [0,1], we can define normalised mutual information (NMI). Common variants:
• NMI(X,Y) = MI(X,Y) / min(H(X), H(Y))
• NMI(X,Y) = MI(X,Y) / max(H(X), H(Y))
• NMI(X,Y) = MI(X,Y) / mean(H(X), H(Y))
• NMI(X,Y)
• large: X and Y are highly correlated (more dependent)
• small: X and Y have low correlation (more independent)
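scikit-learn computes NMI directly; a quick check using the reconstructed Example 1 data from earlier (`average_method='min'` matches the MI / min(H(X), H(Y)) variant above):

```python
from sklearn.metrics import normalized_mutual_info_score

height = ['Short'] * 6 + ['Tall']
weight = ['Light'] * 5 + ['Heavy', 'Heavy']

# NMI is a ratio, so sklearn's use of natural logs internally cancels out
print(normalized_mutual_info_score(height, weight, average_method='min'))  # ~0.517
```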
Normalised Mutual Information
Example 1: MI = 0.3059, NMI = 0.517
Example 2: MI = 0.414, NMI = 0.300
[Tables: the Height (X) and Weight (Y) columns from Examples 1 and 2]
Pearson correlation: −0.0864
Normalised mutual information (NMI): 0.43 (3-bin equal-width discretisation)
Pearson correlation: −0.1
Normalised mutual information (NMI): 0.84
Pearson correlation: −0.05
Normalised mutual information (NMI): 0.35
For the scatter plot shown:
• Pearson?
• NMI?
Answers:
• Pearson: 0.08
• NMI: 0.009
Computing MI with class features
Identifying features that are highly correlated with a class feature
Features: HoursSleep, HoursExercise, HairColour, HoursStudy; class feature: Happy
Compute MI(HoursSleep, Happy), MI(HoursExercise, Happy), MI(HoursStudy, Happy) and MI(HairColour, Happy). Retain the most predictive feature(s).
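A hedged sketch of this ranking step. The column names follow the slide, but every value in the table below is invented; `mutual_info_score` returns nats, so the result is divided by ln 2 to get bits:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score

# Hypothetical data: names from the slide, values invented for illustration
df = pd.DataFrame({
    'HoursSleep':    [8, 6, 7, 5, 8, 9, 4, 7],
    'HoursExercise': [1, 0, 2, 0, 1, 2, 0, 1],
    'HairColour':    ['blond', 'brown', 'brown', 'black', 'red', 'brown', 'black', 'blond'],
    'HoursStudy':    [2, 4, 3, 5, 2, 1, 6, 3],
    'Happy':         ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

# MI of each (discrete) feature with the class; continuous features
# would need to be discretised first
scores = {col: mutual_info_score(df[col], df['Happy']) / np.log(2)
          for col in df.columns if col != 'Happy'}
for col, mi in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'MI({col}, Happy) = {mi:.3f} bits')   # keep the top-ranked feature(s)
```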
Advantages and disadvantages of MI
• Advantages
• Can detect both linear and non-linear dependencies (unlike Pearson)
• Applicable and very effective for use with discrete features (unlike Pearson correlation)
• Disadvantages
• If a feature is continuous, it must first be discretised to compute mutual information. This involves making choices about what bins to use.
• This may not be obvious, and different bin choices will lead to different estimates of mutual information.
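A short demonstration of that sensitivity on randomly generated data (not from the slides): the MI estimate between the same two continuous features changes with the number of equal-width bins.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + rng.normal(scale=0.05, size=500)   # a non-linear relationship

for bins in (2, 3, 5, 10, 20):
    # Discretise both features with equal-width bins, then estimate MI in bits
    mi = mutual_info_score(pd.cut(x, bins).codes, pd.cut(y, bins).codes) / np.log(2)
    print(f'{bins:>2} bins: MI ≈ {mi:.3f} bits')
```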
End of Correlation topic
Acknowledgements
• Materials are partially adapted from:
• Previous COMP20008 slides, including material produced by others
• Correlation ≠ Causality: http://tylervigen.com/spurious-correlations