COMP20008 Elements of Data Processing

Correlation – Introduction
School of Computing and Information Systems
@University of Melbourne 2022


• Discuss correlations between pairs of features in a dataset
• Why useful and important
• Pitfalls
• Methods for computing correlation
• Pearson correlation
• Mutual information (another method to compute correlation)
COMP20008 Elements of Data Processing 2

What is Correlation?
Correlation is used to detect pairs of variables that might have some relationship.
How strong is the relationship?
https://www.mathsisfun.com/data/correlation.html
COMP20008 Elements of Data Processing 4

What is Correlation?
Visually, correlation can be identified by inspecting scatter plots
https://www.mathsisfun.com/data/correlation.html
COMP20008 Elements of Data Processing

What is Correlation?
Linear relations
https://www.mathsisfun.com/data/correlation.html
COMP20008 Elements of Data Processing 6

What is Correlation?
Correlation strength
https://images.app.goo.gl/zZXtjBLR2BcjRpK79
COMP20008 Elements of Data Processing 7

Example of non-linear correlation
It gets so hot that people aren’t going near the shop, and sales start dropping.
https://www.mathsisfun.com/data/correlation.html
COMP20008 Elements of Data Processing 8


Example of Correlated Variables
• Greater understanding of data
• Can hint at potential causal relationships
• Business decision based on correlation: increase electricity production when temperature increases
COMP20008 Elements of Data Processing 10

Example of Correlated Variables
Correlation does not necessarily imply causality!
COMP20008 Elements of Data Processing 11

Example: rank correlation
“If a university has a higher-ranked football team, then is it likely to have a higher-ranked basketball team?”
[Two tables: university teams with their football rankings, and the same teams with their basketball rankings]
COMP20008 Elements of Data Processing 12

Microarray data
Each chip contains thousands of tiny probes corresponding to genes (20k–30k genes in humans). Each probe measures the activity (expression) level of a gene.
[Table columns: Gene 1 expression, Gene 2 expression, …, Gene 20K expression]
COMP20008 Elements of Data Processing 13

Microarray dataset
• Each row represents measurements at some time
• Each column represents levels of a gene
COMP20008 Elements of Data Processing 14

Correlation analysis on Microarray data
Can reveal genes that exhibit similar patterns ⇨ similar or related functions ⇨ Discover functions of unknown genes
COMP20008 Elements of Data Processing 15

Genetic network
Connect genes with high correlation to understand the behaviour of groups of genes.
[Figure: gene network]
COMP20008 Elements of Data Processing 16

Salt Causes High Blood Pressure
Intersalt: an international study of electrolyte excretion and blood pressure. Results for 24 hour urinary sodium and potassium excretion.
British Medical Journal; 297: 319-328, 1988.
[Scatter plot: median urinary sodium excretion (mmol/24hr) vs. median diastolic blood pressure (mmHg)]
COMP20008 Elements of Data Processing 17

Or Does It!?
Intersalt: an international study of electrolyte excretion and blood pressure. Results for 24 hour urinary sodium and potassium excretion.
British Medical Journal; 297: 319-328, 1988.
If we exclude these four ‘outliers’, which are non-industrialised countries with non-salt diets, we get a quite different result!
[Scatter plot: median urinary sodium excretion (mmol/24hr) vs. median diastolic blood pressure (mmHg), excluding the four outliers]
COMP20008 Elements of Data Processing 18

Spurious Correlation
Correlation ≠ Causality
https://assets.businessinsider.com/real-maps-ridiculous-correlations-2014-11?jwsource=cl
https://images.app.goo.gl/FVr8BhxWmQMxCB5f7
COMP20008 Elements of Data Processing 19

Why is correlation important?
• Discover relationships
• One step towards discovering causality (A causes B)
Example: Smoking causes lung cancer
• Feature ranking: for building better predictive models
A good feature to use is one that has high correlation with the outcome we are trying to predict
COMP20008 Elements of Data Processing 20


Correlation
School of Computing and Information Systems
@University of Melbourne 2022
COMP20008 Elements of Data Processing

Problems of Euclidean distance
• Objects can be represented with different measurement scales (e.g. Temperature, #Ice-creams, #Electricity)
d(temp, ice-cr) = 540324
d(temp, elect) = 12309388
• Euclidean distance does not give a clear intuition about how well variables are correlated
COMP20008 Elements of Data Processing 3

Problems of Euclidean distance
Cannot discover variables with similar behaviours/dynamics but at different scale
COMP20008 Elements of Data Processing 4

Problems of Euclidean distance
Cannot discover variables with similar behaviours/dynamics but in the opposite direction (negative correlation)
COMP20008 Elements of Data Processing 5

Assessing linear correlation – Pearson correlation
• We will define a correlation measure rxy, assessing samples from two features x and y
• Assess how close their scatter plot is to a straight line (a linear relationship)
• Also known as the Pearson product-moment correlation
• Range of rxy lies within [-1,1]:
• 1 for perfect positive linear correlation
• -1 for perfect negative linear correlation
• 0 means no correlation
• Absolute value |r| indicates strength of linear correlation
COMP20008 Elements of Data Processing 6

Pearson’s correlation coefficient (r)
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right) \times \left(\sum_{i=1}^{n} (y_i - \bar{y})^2\right)}}

Equivalently, writing x_i^* = x_i - \bar{x} and y_i^* = y_i - \bar{y}, where \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i and \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i:

r_{xy} = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sqrt{\left(\sum_{i=1}^{n} (x_i^*)^2\right) \times \left(\sum_{i=1}^{n} (y_i^*)^2\right)}}
COMP20008 Elements of Data Processing
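A minimal sketch of the formula above in Python; the height/weight values are hypothetical (the slides’ own tables are not reproduced here), and NumPy’s built-in np.corrcoef is used as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    # r = sum(x_i* y_i*) / sqrt(sum(x_i*^2) * sum(y_i*^2)),
    # where x_i* = x_i - mean(x) and y_i* = y_i - mean(y)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xs, ys = x - x.mean(), y - y.mean()
    return (xs * ys).sum() / np.sqrt((xs ** 2).sum() * (ys ** 2).sum())

heights = [160, 165, 170, 175, 180]  # hypothetical sample
weights = [55, 60, 66, 70, 78]
print(pearson_r(heights, weights))           # ≈ 0.995: strong positive
print(np.corrcoef(heights, weights)[0, 1])   # NumPy's built-in agrees
```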

Pearson coefficient example
Height (x)
Weight (y)
• How do the values of x and y move (vary) together?
• Big values of x with big values of y?
• Small values of x with small values of y?
COMP20008 Elements of Data Processing 8

Pearson coefficient example
COMP20008 Elements of Data Processing 9

Interpreting Pearson correlation values
In general, it depends on your domain of application. One suggested rule of thumb:
• 0.5 is large
• 0.3-0.5 is moderate
• 0.1-0.3 is small
• less than 0.1 is trivial
COMP20008 Elements of Data Processing 11

Properties of Pearson’s correlation
• Range within [-1,1]
• Is sensitive to outliers
• Can only detect linear relationships
y = a·x + b + noise
• Cannot detect non-linear relationships
y = x³ + noise
• Scale invariant: r(x,y) = r(x, Ky) for a constant K > 0
• Multiplying a feature’s values by a positive constant K makes no difference
• Location invariant: r(x,y) = r(x, K+y)
• Adding a constant K to one feature’s values makes no difference
COMP20008 Elements of Data Processing 12
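A quick numeric check of the invariance properties, using np.corrcoef on made-up data (a sketch; the values are illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

r = lambda a, b: np.corrcoef(a, b)[0, 1]
print(r(x, y))        # baseline
print(r(x, 3 * y))    # scale by K = 3 (K > 0): identical
print(r(x, y + 100))  # shift by K = 100: identical
print(r(x, -2 * y))   # a negative K flips the sign of r
```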

Pearson correlation examples
https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
COMP20008 Elements of Data Processing 13


Mutual Information
School of Computing and Information Systems
@University of Melbourne 2022

Recap: Pearson correlation – assesses linear correlation between two features
COMP20008 Elements of Data Processing
https://www.mathsisfun.com/data/correlation.html

What about non-linear correlation?
Pearson correlation is not suitable for this scenario (value less than 0.1)
https://www.mathsisfun.com/data/correlation.html
COMP20008 Elements of Data Processing 3

Mutual Information
COMP20008 Elements of Data Processing 4

Entropy – intuition
• Entropy is a measure used to quantify the amount of uncertainty in an outcome
• Randomly select an element from
A. {1,1,1,1,1,1,1,2} versus B. {1,1,2,2,3,3,4,4}
• In which case are you more uncertain about the outcome of the selection? Why?
• More uncertain, less predictable => high entropy (B)
• Less uncertain, more predictable => low entropy (A)
COMP20008 Elements of Data Processing 5
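As a sketch, the two selections can be compared numerically using the entropy formula introduced later in this deck, applied to the empirical proportions of each multiset:

```python
from collections import Counter
from math import log2

def entropy(values):
    # H(X) = -sum_i p_i log2 p_i over the empirical distribution
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

print(entropy([1, 1, 1, 1, 1, 1, 1, 2]))  # A: ≈ 0.544 bits (more predictable)
print(entropy([1, 1, 2, 2, 3, 3, 4, 4]))  # B: 2.0 bits (more uncertain)
```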

Another example
Consider the sample of all people in this subject. Each person is labelled young (<30 years) or old (>=30 years)
• Randomly select a person and inspect whether they are young or old.
• How surprised am I likely to be by the outcome?
• Suppose I repeat the experiment using a random sample of people catching the train to the city in peak hour?
• How surprised am I likely to be by the outcome?
COMP20008 Elements of Data Processing 6

Information theory
• A bit is 0 or 1
• 1 bit of information reduces uncertainty by a factor of 2
Flipping a coin: head – 50%, tail – 50%
• The machine tells you that the outcome will be tail
• Uncertainty for tail reduced by a factor of 2
• 1 bit of information
• The machine tells you that the outcome will be head
• Uncertainty for head reduced by a factor of 2
• 1 bit of information
• On average, machine transmits 0.5*1 + 0.5*1 = 1 bit (Entropy)
COMP20008 Elements of Data Processing 7

Flipping a special coin: head – 75%, tail – 25%
• The machine tells you that the outcome will be tail
• Uncertainty for tail reduced by a factor of 4 (1/0.25 = 4)
• 2 bits of information (log2 4 = –log2 0.25 = 2)
• The machine tells you that the outcome will be head
• Uncertainty for head reduced by a factor of 4/3 (1/0.75 = 4/3)
• 0.41 bits of information (log2 4/3 = –log2 0.75 = 0.41)
• On average, machine transmits 0.75*0.41 + 0.25 * 2 = 0.81 (Entropy)
COMP20008 Elements of Data Processing 8

A recap on logarithms (to the base 2)
• y = log2 x (y is the solution to the question “To what power do I need to raise 2, in order to get x?”)
• 2*2*2*2 = 16, which means log2 16 = 4 (16 is 2 to the power 4)
• log2 32 = 5
• log2 30 = 4.9
• log2 1.2 = 0.26
• log2 0.5 = -1
In what follows, we’ll write log instead of log2 to represent binary log
COMP20008 Elements of Data Processing 9
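These values can be checked directly with Python’s math.log2 (a minimal sketch):

```python
from math import log2

print(log2(16))   # 4.0
print(log2(32))   # 5.0
print(log2(30))   # ≈ 4.907
print(log2(1.2))  # ≈ 0.263
print(log2(0.5))  # -1.0
```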

Entropy
Given a feature X with k categories (bins), its entropy H(X) is

H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i

p_i: proportion of points in the i-th category (bin)

X: flipping a special coin, head: 75%, tail: 25%
H(X) = -[p_head log2 p_head + p_tail log2 p_tail] = -[0.75 log2 0.75 + 0.25 log2 0.25] = 0.81
COMP20008 Elements of Data Processing 10
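A minimal sketch of the definition, taking the bin proportions p_i directly. The last line assumes category counts of 2, 3 and 4, which reproduce the practice answer on the next slide (the slide’s own table is lost):

```python
from math import log2

def entropy(proportions):
    # H(X) = -sum_i p_i log2 p_i; zero-probability bins contribute nothing
    return -sum(p * log2(p) for p in proportions if p > 0)

print(entropy([0.75, 0.25]))      # special coin: ≈ 0.811 bits
print(entropy([0.5, 0.5]))        # fair coin: 1.0 bit
print(entropy([2/9, 3/9, 4/9]))   # assumed 3-category split of 9 objects: ≈ 1.53
```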

Entropy practice example
We have 3 categories (A, B, C) for a feature X and 9 objects; each object is in one category.

H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i

What is the entropy of this sample data of 9 objects? Answer: H(X) = 1.53
COMP20008 Elements of Data Processing 11

How would you compute the entropy for the “Likes to sleep” feature?
X: Likes to sleep

H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i

[Table: Likes to sleep]
COMP20008 Elements of Data Processing

Properties of entropy
• H(X) ≥ 0
• Entropy is maximized for uniform distribution (highly uncertain what value a randomly selected object will have)
• Entropy – when using log base 2 – measures uncertainty of the outcome in bits. This can be viewed as the information associated with learning the outcome
COMP20008 Elements of Data Processing 13

Variable discretisation
• Pre-processing: continuous features are first discretised into bins (categories). E.g. small [0,1.4], medium (1.4,1.8), large [1.8,3.0]
[Table: Height → Discretised Height]
COMP20008 Elements of Data Processing 14

Variable discretisation: Techniques
• Equal-width bin
Divide the range of the continuous feature into equal length intervals (bins). If speed ranges from 0-100, then the 10 bins are [0,10), [10,20), [20,30), …[90,100]
• Equal-frequency bin
Divide range of continuous feature into equal frequency intervals (bins). Sort the values and divide so that each bin has roughly same number of objects
• Domain knowledge: assign thresholds manually
Car speed:
• 0-40: slow
• 40-60: medium
• >60: high
COMP20008 Elements of Data Processing 15

Discretisation example
Given the values 2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21
• A 3-bin equal-width discretisation
• Bin width: (21 − 2) / 3 = 6.333
• [2, 8.333), [8.333, 14.666), [14.666, 21]
COMP20008 Elements of Data Processing 16

Discretisation example
Given the values 2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21
• A 3-bin equal-frequency discretisation
• [2, 13), [13, 19), [19, 21] – by hand
• (1.999, 13], (13, 19], (19, 21] – pandas
COMP20008 Elements of Data Processing 17
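A sketch of both discretisations with pandas (pd.cut for equal-width, pd.qcut for equal-frequency). Note that pandas widens the lowest edge slightly and uses right-closed intervals, which is why its bins differ cosmetically from the by-hand ones:

```python
import pandas as pd

values = pd.Series([2, 2, 3, 10, 13, 15, 16, 17, 19, 19, 20, 20, 21])

# Equal-width: split the range [2, 21] into 3 bins of width (21 - 2) / 3
print(pd.cut(values, bins=3).value_counts())

# Equal-frequency: roughly the same number of values per bin
print(pd.qcut(values, q=3).value_counts())  # (1.999, 13], (13, 19], (19, 21]
```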


Mutual Information
School of Computing and Information Systems
@University of Melbourne 2022

Conditional entropy – intuition
Suppose I randomly sample a person. I check if they wear glasses – how surprised am I by their age?
[Table: WearGlasses (X), Age (Y)]
COMP20008 Elements of Data Processing 2

Conditional entropy H(Y|X)
H(Y|X) = \sum_{x \in X} P(x) \, H(Y \mid X = x)
Measures how much information is needed to describe outcome Y, given that outcome X is known. Suppose X is Height and Y is Weight.
Height (X)
Weight (Y)
COMP20008 Elements of Data Processing

Conditional entropy H(Y|X)
H(Y|X) = \sum_{x \in X} P(x) \, H(Y \mid X = x)

H(Y|X) = (2/7) × H(Y | X = Tall) + (5/7) × H(Y | X = Small)

H(Y | X = x) = -\sum_{y \in Y} P(y \mid x) \log_2 P(y \mid x)

H(Y | X = Tall) = -[(1/2) log2 (1/2) + (1/2) log2 (1/2)] = 1
H(Y | X = Small) = -[(4/5) log2 (4/5) + (1/5) log2 (1/5)] = 0.721

H(Y|X) = (2/7) × 1 + (5/7) × 0.721 = 0.801
Height (X)
Weight (Y)
COMP20008 Elements of Data Processing
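A sketch reproducing the arithmetic above. The slide’s Height/Weight table is not preserved, so the per-group weight counts below (Tall: 1 and 1; Small: 4 and 1) are inferred from the fractions in the worked example:

```python
from math import log2

def entropy_from_counts(counts):
    # H = -sum (c/n) log2 (c/n) over the category counts
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

h_tall = entropy_from_counts([1, 1])   # H(Y | X = Tall)  = 1.0
h_small = entropy_from_counts([4, 1])  # H(Y | X = Small) ≈ 0.722
h_y_given_x = (2 / 7) * h_tall + (5 / 7) * h_small
print(round(h_y_given_x, 3))           # ≈ 0.801
```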

Mutual information definition
MI(X,Y) = H(Y) − H(Y|X)
MI(X,Y) = H(X) − H(X|Y)

H(Y|X) = \sum_{x \in X} P(x) \, H(Y \mid X = x)
where X and Y are features (columns) in a dataset
• MI (mutual information) is a measure of correlation
• The amount of information about Y we gain by knowing X, or
• The amount of information about X we gain by knowing Y
COMP20008 Elements of Data Processing 5

Mutual information example 1
MI(X,Y) = H(Y) − H(Y|X)

H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y)
H(Y|X) = \sum_{x \in X} P(x) \, H(Y \mid X = x)

H(Y) = -[(5/7) log2 (5/7) + (2/7) log2 (2/7)] = 0.8631

H(Y|X) = (6/7) × H(Y | X = Small) + (1/7) × H(Y | X = Tall)
H(Y|X) = (6/7) × [-((5/6) log2 (5/6) + (1/6) log2 (1/6))] + (1/7) × [-(1 × log2 1)] = 0.5571

MI(X,Y) = 0.8631 − 0.5571 = 0.3059
Height (X)
Weight (Y)
COMP20008 Elements of Data Processing
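A sketch computing MI(X,Y) = H(Y) − H(Y|X) from raw samples. The original table is lost, so the labels below are hypothetical, chosen to match the slide’s counts (6 Small of which 5 Light and 1 Heavy; 1 Tall which is Heavy):

```python
from collections import Counter
from math import log2

def entropy(values):
    # H = -sum p log2 p over the empirical distribution of `values`
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def conditional_entropy(y, x):
    # H(Y|X) = sum_x P(x) H(Y | X = x)
    n = len(x)
    return sum((cnt / n) * entropy([yi for xi, yi in zip(x, y) if xi == xv])
               for xv, cnt in Counter(x).items())

def mutual_information(x, y):
    # MI(X,Y) = H(Y) - H(Y|X)
    return entropy(y) - conditional_entropy(y, x)

x = ["Small"] * 6 + ["Tall"]            # hypothetical Height column
y = ["Light"] * 5 + ["Heavy", "Heavy"]  # hypothetical Weight column
print(round(mutual_information(x, y), 4))  # ≈ 0.3059, as above
```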

Mutual information example 2
MI(X,Y) = H(Y) − H(Y|X)

H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y)
H(Y|X) = \sum_{x \in X} P(x) \, H(Y \mid X = x)

H(Y) = 1.379
H(Y|X) = 0.965
MI(X,Y) = H(Y) − H(Y|X) = 0.414

Height (X)
Weight (Y)
COMP20008 Elements of Data Processing

Properties of Mutual Information
• The amount of information shared between two variables X and Y
• large: X and Y are highly correlated (more dependent)
• small: X and Y have low correlation (more independent)
• 0 ≤ MI(X,Y) ≤ ∞
• Sometimes also referred to as ‘Information Gain’
COMP20008 Elements of Data Processing 8

Mutual information: normalisation
• MI(X,Y) is always at least zero, and may be larger than 1
• In fact, one can show that 0 ≤ MI(X,Y) ≤ min(H(X), H(Y))
• Thus, if we want a measure in the interval [0,1], we can define normalised mutual information (NMI):
• NMI(X,Y) = MI(X,Y) / min(H(X), H(Y))
• NMI(X,Y) = MI(X,Y) / max(H(X), H(Y))
• NMI(X,Y) = MI(X,Y) / mean(H(X), H(Y))
• NMI(X,Y)
• large: X and Y are highly correlated (more dependent)
• small: X and Y have low correlation (more independent)
COMP20008 Elements of Data Processing 9

Normalised Mutual Information
Example 1: MI = 0.3059, NMI = 0.517
Example 2: MI = 0.414, NMI = 0.300
COMP20008 Elements of Data Processing

Pearson correlation = -0.0864
Normalised mutual information (NMI) = 0.43 (3-bin equal-width bins)
COMP20008 Elements of Data Processing 11

Pearson correlation = -0.1
Normalised mutual information (NMI) = 0.84
COMP20008 Elements of Data Processing 12

Pearson correlation = -0.05
Normalised mutual information (NMI) = 0.35
COMP20008 Elements of Data Processing 13

• Pearson?
• NMI?
COMP20008 Elements of Data Processing 14

• Pearson: 0.08
• NMI: 0.009
COMP20008 Elements of Data Processing 15

Computing MI with class features
Identifying features that are highly correlated with a class feature:
HoursSleep, HoursExercise, HairColour, HoursStudy, Happy (class feature)
Compute MI(HoursSleep, Happy), MI(HoursExercise, Happy), MI(HoursStudy, Happy), and MI(HairColour, Happy). Retain the most predictive feature(s).
COMP20008 Elements of Data Processing 16

Advantages and disadvantages of MI
• Advantages
• Can detect both linear and non-linear dependencies (unlike Pearson)
• Applicable and very effective for use with discrete features (unlike Pearson correlation)
• Disadvantages
• If a feature is continuous, it must first be discretised to compute mutual information. This involves making choices about what bins to use.
• These choices may not be obvious, and different bin choices will lead to different estimates of mutual information.
COMP20008 Elements of Data Processing 17

End of Correlation topic

Acknowledgements
• Materials are partially adopted from previous COMP20008 slides, including material produced by others
• Correlation <> Causality: http://tylervigen.com/spurious-correlations
COMP20008 Elements of Data Processing 22
