
Data Mining and Machine Learning
Statistical Modelling (1) Peter Jančovič

Objectives
• Review basic statistical modelling
• Review the notions of probability distribution and probability density function (PDF)
• Gaussian PDF
• Multivariate Gaussian PDFs
• Parameter estimation for Gaussian PDFs

Discrete random variables
• Suppose that Y is a random variable which can take any value in a discrete set X = {x1, x2, …, xM}
• Suppose that y1, y2, …, yT are samples of the random variable Y
• If cm is the number of times that yn = xm, then an estimate of the probability that yn takes the value xm is given by:

P(xm) = P(yn = xm) ≈ cm / T

Continuous Random Variables
• In most practical applications the data are not restricted to a finite set of values – they can take any value in N-dimensional space
• Counting the number of occurrences of each value is no longer a viable way of estimating probabilities…
• …but generalisations of this approach are applicable to continuous variables – non-parametric methods
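As a sketch of the non-parametric idea (my own example, assuming NumPy is available), a normalised histogram generalises counting to continuous data:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)  # synthetic continuous data

# density=True normalises the bin counts so the bars integrate to 1,
# giving a simple non-parametric estimate of the underlying PDF.
density, edges = np.histogram(data, bins=50, density=True)
widths = np.diff(edges)
print(np.sum(density * widths))  # total area under the histogram, approximately 1
```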

Continuous Random Variables
• An alternative is to use a parametric model
• Probabilities are defined by a small set of parameters
• A familiar example is the normal, or Gaussian, model
• A (scalar/univariate) Gaussian probability density function (PDF) is defined by two parameters – its mean μ and variance σ²
• For a multivariate Gaussian PDF defined on a vector space, μ is the mean vector and Σ is the covariance matrix

Gaussian PDF
• ‘Standard’ 1-dimensional Gaussian PDF:
– mean μ = 0
– variance σ² = 1

[Figure: bell curve of the standard Gaussian PDF over −5 ≤ x ≤ 5, peaking near 0.4 at x = 0; the shaded area between a and b is P(a ≤ x ≤ b)]

Gaussian PDF
• For a 1-dimensional Gaussian PDF p with mean μ and variance σ²:

p(x) = p(x | μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

The factor 1/√(2πσ²) is a constant that ensures the area under the curve is 1; the exponential term defines the ‘bell’ shape.
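The density formula above translates directly into code; a minimal sketch using only the standard library (function name my own):

```python
import math

def gaussian_pdf(x, mu, var):
    # p(x | mu, var) = (1 / sqrt(2*pi*var)) * exp(-(x - mu)^2 / (2*var))
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Peak of the standard Gaussian is 1/sqrt(2*pi), approximately 0.3989
print(gaussian_pdf(0.0, 0.0, 1.0))
```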

Standard Deviation
• Standard deviation is the square root of the variance
• For a Gaussian PDF:
– 68% of the area under the curve lies within one standard deviation (s.d.) of the mean
– 95% of the area under the curve lies within two s.d.s of the mean
– 99% of the area under the curve lies within three s.d.s of the mean

Standard Deviation
Writing s for the standard deviation:

P(μ − s ≤ x ≤ μ + s) ≈ 0.68
P(μ − 2s ≤ x ≤ μ + 2s) ≈ 0.95
P(μ − 3s ≤ x ≤ μ + 3s) ≈ 0.99

Multivariate Gaussian PDFs
• A (univariate) Gaussian PDF assumes the random variable takes scalar values
• In the case where the random variable takes N-dimensional vector values, the corresponding PDF is called a multivariate Gaussian PDF and is given by:

p(x) = (1 / ((2π)^(N/2) |Σ|^(1/2))) exp(−½ (x − m)ᵀ Σ⁻¹ (x − m))

where m is the mean vector and Σ is the N × N covariance matrix
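The multivariate density can be evaluated in a few lines of NumPy (a sketch, not from the slides; `np.linalg.solve` is used to avoid forming Σ⁻¹ explicitly):

```python
import numpy as np

def mvn_pdf(x, m, cov):
    # p(x) = (2*pi)^(-N/2) * |Sigma|^(-1/2) * exp(-0.5 * (x-m)^T Sigma^-1 (x-m))
    x, m, cov = np.asarray(x, float), np.asarray(m, float), np.asarray(cov, float)
    N = m.size
    d = x - m
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

print(mvn_pdf([0, 0], [0, 0], np.eye(2)))  # 1/(2*pi), approximately 0.1592
```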

Visualising multivariate Gaussian PDFs

[Figure: visualisation of a multivariate Gaussian PDF]

Example
• If
Σ = | 9 0 |
    | 0 4 |
the standard deviations in the ‘x’ and ‘y’ directions are 3 and 2, respectively, and the 1 s.d. contour is an ellipse

Example 2:
• Now suppose
Σ = | 7.75 2.17 |    m = | 2 |
    | 2.17 5.25 |        | 4 |
• Calculate the eigenvalue decomposition of Σ:

Σ = U D Uᵀ = | √3/2 −1/2 | | 9 0 | |  √3/2 1/2 |
             | 1/2  √3/2 | | 0 4 | | −1/2 √3/2 |

Example 2 (continued)
• Note that U is a rotation through 30°
• Hence the one standard deviation contour is the same as in the previous example, but rotated through 30° and translated by m = (2, 4)ᵀ

Example 2 (continued)

[Figure: the 1 s.d. ellipse from the previous example, rotated through 30° and centred at m = (2, 4)ᵀ]

Fitting a Gaussian PDF to Data
• Suppose y = y1,…,yt,…,yT is a set of T data values
• For a Gaussian PDF p with mean μ and variance σ², define:

p(y | μ, σ²) = ∏_{t=1}^{T} p(yt | μ, σ²)

• How do we choose μ and σ² to maximise p(y | μ, σ²)?

Fitting a Gaussian PDF to Data
[Figure: two candidate Gaussian PDFs plotted against the same data over −5 ≤ x ≤ 5: one matches the data closely (good fit), the other does not (poor fit)]

Maximum Likelihood Estimation
• The ‘best fitting’ Gaussian maximises p(y | μ, σ²)
• Terminology:
– p(y | μ, σ²), as a function of y, is the probability (density) of y
– p(y | μ, σ²), as a function of μ, σ², is the likelihood of μ, σ²
• Maximising p(y | μ, σ²) with respect to μ, σ² is called Maximum Likelihood (ML) estimation of μ, σ²

ML estimation of μ, σ²
• Intuitively:
– The maximum likelihood estimate of μ should be the average value of y1,…,yT (the sample mean)
– The maximum likelihood estimate of σ² should be the variance of y1,…,yT (the sample variance)
• This is true: p(y | μ, σ²) is maximised by setting:

μ = (1/T) ∑_{t=1}^{T} yt,    σ² = (1/T) ∑_{t=1}^{T} (yt − μ)²
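These estimators are just the sample mean and (biased) sample variance; a quick simulation (my own, assuming NumPy is available) shows they recover the true parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(loc=3.0, scale=2.0, size=50_000)  # true mu = 3, sigma^2 = 4

mu_hat = np.mean(y)                    # ML estimate of mu: the sample mean
var_hat = np.mean((y - mu_hat) ** 2)   # ML estimate of sigma^2 (same as np.var(y))
print(mu_hat, var_hat)                 # close to 3 and 4
```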

Proof
• First note that maximising p(y | μ, σ²) is the same as maximising log p(y | μ, σ²):

log p(y | μ, σ²) = log ∏_{t=1}^{T} p(yt | μ, σ²) = ∑_{t=1}^{T} log p(yt | μ, σ²)

• Also:

log p(yt | μ, σ²) = −(1/2) log(2πσ²) − (yt − μ)² / (2σ²)

• At a maximum:

0 = ∂/∂μ log p(y | μ, σ²) = ∑_{t=1}^{T} ∂/∂μ log p(yt | μ, σ²) = ∑_{t=1}^{T} (yt − μ) / σ²

• So ∑_{t=1}^{T} yt = Tμ, and hence μ = (1/T) ∑_{t=1}^{T} yt

Multi-modal distributions
• In practice the distributions of many naturally occurring phenomena do not follow the simple bell-shaped Gaussian curve
• For example, if the data arise from several different sources, there may be several distinct peaks (e.g. the distribution of heights of adults)
• These peaks are the modes of the distribution, and the distribution is called multi-modal

Summary
• Reviewed basic statistical modelling, probability distributions and probability density functions
• Gaussian PDFs
• Multivariate Gaussian PDFs
• Maximum likelihood (ML) parameter estimation
• In the next session we will introduce Gaussian mixture PDFs (GMMs) and ML parameter estimation for GMMs