
5.0 – Statistical Learning


5.1 – Basic concepts


“Machine learning”?

Essentially, it is statistical learning

Essentially, we’re looking for a pattern (unseen up to now)

If there is no pattern, then ML will be counter-productive as it
is likely to produce one!

Assumes availability of relevant data

Examples: consumer taste/habits, online advertising, election
forecasts, risk prediction

Warning: linear regression is now referred to as “artificial
intelligence” by a lot of people


Statistical learning: regression

Variable of interest: Wage (continuous)

(figure from ISLR)

http://www-bcf.usc.edu/~gareth/ISL/


Statistical learning: classification

Variable of interest: Default (categorical)

(figure from ISLR)

http://www-bcf.usc.edu/~gareth/ISL/


Statistical learning: supervised learning

Variable of interest: Species (categorical)
With prior experience available (supervised classification)

[Figure: scatterplot of the iris dataset, Sepal.Width vs. Sepal.Length, with points coloured by Species (setosa, versicolor, virginica).]
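A minimal base-graphics sketch reproducing this kind of figure (colours and plotting symbols are arbitrary choices):

plot(Sepal.Width ~ Sepal.Length, data = iris, col = iris$Species, pch = 19,
     main = "Iris dataset")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)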


Statistical learning: unsupervised learning

Variable of interest: Species (categorical)
With no prior experience available (clustering)

[Figure: the same Sepal.Width vs. Sepal.Length scatterplot of the iris dataset, shown without species labels.]


5.2 – Learning framework


Data splitting

A dataset is randomly split into a training set and a test set

The training set is used to fit (or train) the model

The test set is used to validate this model

Model is both tuned and fitted during training

We expect MSE_train < MSE_test

Overfitting occurs when too much emphasis is put on fitting the training data:

Training set yields a much smaller MSE than the test set

Model describes the training data too well and is unable to adapt to new data

Yields poor prediction performance

General framework with large data

Example in model selection:

Random split into training, validation and test sets (e.g. 50%, 25%, 25% resp.)

Training set: used to fit the model

Validation set: used to measure prediction error and choose the best model

Test set: used to measure the generalization error of the final model (i.e. its ability to predict from new data)

[The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Springer]

General framework with small data

Cross-validation is usually used when dealing with small samples.

Example: model selection for classification with N = 50, p = 5000

1. Randomly divide the samples into K cross-validation folds
2. For each fold k = 1, 2, . . . , K:
   a. Find a subset of predictors with high (univariate) correlation with the class labels, using all data except fold k
   b. Using just this subset of predictors, build a multivariate classifier, using all data except fold k
   c. Use the classifier to predict the class labels for fold k and compute the corresponding prediction error

[The Elements of Statistical Learning, T. Hastie, R. Tibshirani, J. Friedman, Springer]

General framework: summary

Randomly split into training and test set

Training set:
Model calibration (tuning)
Fit the model on the whole training set

Test set:
Model validation (test)

Key aspects of the data (typical challenges)

Data scales
Beware of scale effects in heterogeneous data
Illustrative example: iris dataset

Dimensionality (too many covariates)
Many variables may be “aligned” / redundant
Data pre-filtering: must be done independently of the class labels
Illustrative example: iris dataset

Dimensionality (too many dimensions in the observed data)
Pre-process the data using dimension reduction techniques
Factor analysis and PCA are predominant and common choices
Illustrative examples: iris and EuStockMarkets datasets

5.3 – Performance assessment

Performance indicators for regression

Mean Square Error (MSE):
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - \hat{f}(X_i) \bigr)^2

LOO CV test MSE:
\mathrm{MSE}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - \hat{f}_{-i}(X_i) \bigr)^2

k-fold CV test MSE:
\mathrm{MSE}_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{MSE}_i^{-i}

where \hat{f}_{-i} is the model fitted without observation i, and \mathrm{MSE}_i^{-i} is the MSE on fold i of the model fitted without fold i.

Performance indicators for classification

Prediction accuracy (error rate):
\mathrm{Err} = \frac{1}{n} \sum_{i=1}^{n} I\bigl( Y_i \neq \hat{Y}_i \bigr)

LOO CV error rate:
\mathrm{Err}_{(n)} = \frac{1}{n} \sum_{i=1}^{n} I\bigl( Y_i \neq \hat{Y}_i^{-i} \bigr)

k-fold CV error rate:
\mathrm{Err}_{(k)} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{Err}_i^{-i}
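A minimal R sketch of the k-fold CV test MSE above, on simulated data (the data and the linear model are illustrative assumptions, not taken from the slides):

set.seed(1)
n <- 100
d <- data.frame(x = runif(n))
d$y <- 2 * d$x + rnorm(n)

k <- 5
folds <- sample(rep(1:k, length.out = n))       # random fold assignment
cv_mse <- numeric(k)
for (i in 1:k) {
  fit <- lm(y ~ x, data = d[folds != i, ])      # fit without fold i
  pred <- predict(fit, newdata = d[folds == i, ])
  cv_mse[i] <- mean((d$y[folds == i] - pred)^2) # MSE_i^{-i}
}
mean(cv_mse)                                    # k-fold CV estimate of the test MSE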
Performance indicators for classification

Recall: one seeks to retain or reject a null hypothesis H0 on the basis of evidence. Let us denote by H1 the alternative hypothesis.

                   H0 is true          H1 is true
H0 is accepted     Correct decision    Type II error
H1 is accepted     Type I error        Correct decision

The null hypothesis can never be proven

Type I error occurs when H0 is true but rejected

P(Type I error) = significance level of the test

P(Type II error) = false negative rate

1 − P(Type II error) = statistical power of the test (sensitivity)

Performance indicators for classification

False Positive rate = FP/N = 1 − specificity (relates to Type I error)

True Positive rate = TP/P = sensitivity = recall (relates to 1 − Type II error)

AUC of the ROC

                         Actual value
                         +       −
Prediction outcome  +    TP      FP
                    −    FN      TN
                Total    P       N

[Figure: ROC curve for some dataset, plotting Sensitivity against Specificity; AUC = 0.7314.]

5.4 – Some techniques of reference

Logistic regression

Use a scrambled subsample x of the iris dataset:
is = sample(1:150); x = iris[is[1:100],]

Recode x$Species into is.virginica, with values in {0, 1} (we’re changing the problem formulation slightly):
x$Species = as.numeric(x$Species == "virginica")   # 0/1 indicator (is it virginica?)

Fit model:
fit <- glm(Species ~ ., data=x, family=binomial(logit))

Use fit to predict Species for the remaining data points:
testset = iris[-is[1:100],]   # the 50 rows not used for training (iris[-is,] would be empty)
y <- testset[,1:4]
pred <- predict(fit, newdata=y, type="response")

Assess prediction performance

Naive Bayes classification

Recall: find the unknown true label l_i for each observation Y_i, given the observed predictor vector x_0

Naive Bayes classifier: for all i = 1, . . . , n

\hat{l}_i = \arg\max_j \Pr(Y_i = j \mid X_i = x_0)

Example: for a two-class problem, \hat{l}_i = 1 if \Pr(Y = 1 \mid X = x_0) > 0.5
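A minimal sketch of a Naive Bayes classifier on the iris data; the e1071 package and the 100/50 split are assumptions made for illustration (the slides do not name an implementation):

library(e1071)
set.seed(1)
is <- sample(1:150)
train <- iris[is[1:100], ]
test  <- iris[is[101:150], ]

nb   <- naiveBayes(Species ~ ., data = train)      # Gaussian class-conditional densities per predictor
post <- predict(nb, newdata = test, type = "raw")  # posterior probabilities Pr(Species = j | x)
lhat <- predict(nb, newdata = test)                # argmax_j of the posterior
mean(lhat != test$Species)                         # test error rate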


Regression and Classification Trees

Ex: ISLR::Hitters (http://www-bcf.usc.edu/~gareth/ISL)



Regression and Classification Trees

Ex: ISLR::Hitters
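A minimal regression-tree sketch on ISLR::Hitters; the rpart package and the choice of Years and Hits as predictors are assumptions made here to mirror the ISLR example (the slides do not say which package produced the figures):

library(ISLR)    # Hitters data
library(rpart)   # one of several tree-fitting packages in R

hit <- na.omit(Hitters)                                    # drop players with missing Salary
tree_fit <- rpart(log(Salary) ~ Years + Hits, data = hit)  # regression tree
plot(tree_fit); text(tree_fit, use.n = TRUE)               # draw the tree with node sample sizes
predict(tree_fit, newdata = hit[1:5, ])                    # predicted log-salaries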


Clustering

Idea: arrange n individuals into groups with respect to a set of measurements

The choice of (dis)similarity measure is key and determines the resulting classification

Variables should be rescaled first (and possibly weighted)

High-dimensional data may require prior dimension reduction
(PCA is not necessarily pertinent here)

http://cran.r-project.org/web/views/Cluster.html



Hierarchical clustering

Hierarchical clustering: split-and-merge to construct a dendrogram

M = matrix(c(0,3,5,8,4,
             3,0,2,6,8,
             5,2,0,3,4,
             8,6,3,0,1,
             4,8,4,1,0), nrow=5)

dM = data.frame(M, row.names=c("A","B","C","D","E"))

dmat = dist(dM)      # Euclidean distances between the rows of dM

plot(hclust(dmat))   # plclust() is defunct in current R; plot the hclust object instead


Hierarchical clustering

[Figure: "Cluster Dendrogram" from hclust(dmat) with complete linkage; D and E merge first, then B and C, which A then joins; the Height axis runs from about 4 to 12.]


Hierarchical clustering

Example: data(eurodist)

Creating a dendrogram:
hc = hclust(eurodist, method="ward.D")   # "ward" has been renamed "ward.D" (see also "ward.D2")

Plotting dendrograms:
plot(hc)   # plclust() is defunct; use plot()

plot(hc, hang=-1)

rect.hclust(hc, k=3)
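To turn the three rectangles into actual group labels, one can cut the tree (a small addition, not shown in the slides):

groups <- cutree(hc, k=3)   # cluster membership for each city
table(groups)               # group sizes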


k-means clustering

k-means clustering is another popular clustering method

It is comparable to an Expectation-Maximisation algorithm

?kmeans…

Initialize with k clusters (hierarchical clustering may help choose k)

Move an individual to another cluster if this improves the criterion

Risk of convergence to a local solution
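A minimal k-means sketch on the iris measurements; k = 3 (to match the three species) and the number of random starts are assumptions made for illustration:

x <- scale(iris[, 1:4])                     # rescale the variables first
set.seed(1)
km <- kmeans(x, centers = 3, nstart = 25)   # several random starts reduce the risk of a local solution
table(km$cluster, iris$Species)             # compare the clusters with the (unused) species labels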


Principal component analysis

Project data according to highest variance components

Linear orthogonal transformation:
each component is uncorrelated with preceding ones

Reveals internal structure of the data, explaining its variance

PC1 = the direction of greatest variance over all projections of the data

Theoretically optimal linear transform for given data, in least-squares terms

PCA is widely used e.g. to reduce problem dimensionality

?prcomp…


Principal component analysis

Example: ?USArrests

Statistics, in arrests per 100,000 residents for assault, murder,
and rape in each of the 50 US states in 1973

Also contains the % of the population living in urban areas

plot(USArrests, main="USArrests data")   # scatterplots

pairs(USArrests, panel = panel.smooth,
      main = "USArrests data")


Principal component analysis

[Figure: pairs plot of the USArrests data (Murder, Assault, UrbanPop, Rape), titled "USArrests data", as produced by the plot/pairs calls above.]


Principal component analysis

data(USArrests)

cor(USArrests)          # correlation matrix

eigen(cor(USArrests))   # eigenvalue decomposition

# compare the result with prcomp:

prcomp(USArrests, scale = TRUE)   # same vectors (up to sign)!

prcomp(~ Murder + Assault + Rape, data = USArrests, scale = TRUE)

plot(prcomp(USArrests, scale = TRUE))

# equiv. cov(prcomp(USArrests, scale = TRUE)$x),
# i.e. (prcomp(USArrests, scale = TRUE)$sdev)^2,
# i.e. the eigenvalues of the cov/correlation matrix

summary(prcomp(USArrests, scale = TRUE))
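A small follow-up, not in the slides, showing the state coordinates (scores) on the first two components and the corresponding biplot:

pc <- prcomp(USArrests, scale = TRUE)
head(pc$x[, 1:2])   # scores of the first states on PC1 and PC2
biplot(pc)          # states and variables in the PC1/PC2 plane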


Principal component analysis

[Figure: barplot of the component variances produced by plot(prcomp(USArrests, scale = TRUE)); the "Variances" axis runs from 0 to about 2.5, decreasing from PC1 to PC4.]