ANLP_1: Introduction
ISML_2: Classification
Warm up discussion
University of Adelaide
Outline
• Classification problem overview
• Instance-based classifier:
– Nearest Neighbour Classifier
– K-Nearest Neighbour classifier
– Hyper-parameter selection, validation and overfitting
• Generative classifier: Bayesian decision rule and Naïve Bayes
classifier
• Discriminative classifier overview
• Classifier comparison
Classification problems
• A fundamental problem in machine learning
• Types of classification problems
– Binary classification problem: binary output
– Multi-class classification problem: output one of many discrete class labels; the class labels are mutually exclusive
– Multi-label classification problem: one sample can be assigned multiple labels
The general process of building a
classification system
• Collect training data, i.e., (x, y) pairs, where x is the input and y is the output class label
• Find features for representing the input data
• (Optional) Train the classifier to find the best mapping function f
• Apply the classifier to unseen data
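As a rough illustration, the four steps above can be sketched in plain Python. Everything here is hypothetical: the string-length feature and the 1-nearest-neighbour mapping are made up for the sketch, not part of the lecture.

```python
# Sketch of the workflow: collect (x, y) pairs, extract features,
# (optionally) train, then apply the classifier to unseen data.

def extract_features(raw):
    # Hypothetical feature extractor: represent a string by its length
    return (len(raw),)

# Step 1: collected training data as (input, class label) pairs
training_data = [("hi", "short"), ("hello there my old friend", "long")]

# Step 2: represent the inputs by features
train = [(extract_features(x), y) for x, y in training_data]

# Step 3 is optional here: a nearest-neighbour mapping needs no training

def classify(query):
    # Step 4: apply the mapping f to an unseen input
    fq = extract_features(query)
    return min(train, key=lambda pair: abs(pair[0][0] - fq[0]))[1]
```

Note that for this instance-based mapping the "training" step really is optional, which is why the slide marks it so.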
Example
Why is classification possible?
• We hope that a classifier that performs well on the training set will generalize to unseen samples
• Perfect generalization is not always possible
– E.g., if ‘a’=1, ‘b’=2, ‘c’=1, what about ‘d’?
• Machine learning essentially relies on some assumption about the relationship between the data and their associated output, e.g., class labels
– For example, we commonly assume that similar x will have similar y
– But there may be many possible ways of defining “similarity”
Types of classifiers
• We can divide the large variety of classification approaches into roughly three main types
– Instance-based: use the training observations directly (no model), e.g., K nearest neighbours
– Generative: build a generative statistical model, e.g., Naïve Bayes
– Discriminative: directly estimate a decision rule/boundary, e.g., decision tree
Nearest Neighbour Classifier
• Classification by referring to the class label of the nearest neighbour
Distance measurement
• How do we measure the similarity or distance between two samples?
• Many possible choices
– A good distance metric is usually the key to a nearest neighbour classifier
– The commonly used one: Euclidean distance d(x, x′) = √(Σᵢ (xᵢ − x′ᵢ)²)
– Mahalanobis distance d_M(x, x′) = √((x − x′)ᵀ M (x − x′)): the matrix M could be a parameter learned with supervised learning (e.g., metric learning)
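A minimal sketch of the two distances in plain Python (function names are illustrative; in practice one would use a numerical library):

```python
import math

def euclidean(x, xp):
    # d(x, x') = sqrt(sum_i (x_i - x'_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))

def mahalanobis(x, xp, M):
    # d_M(x, x') = sqrt((x - x')^T M (x - x')), with M a positive
    # semi-definite matrix given as a list of rows
    d = [a - b for a, b in zip(x, xp)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))
```

With M set to the identity matrix, the Mahalanobis distance reduces to the Euclidean distance, which is one way to see it as a learnable generalisation.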
Voronoi Diagram
• An interesting pattern that appears in many areas
– Voronoi Explained! – YouTube
• Given a set of points, a Voronoi diagram describes the areas that are nearest to each given point
• These areas can be viewed as zones of control
– Each zone is controlled by one of the points
Decision boundary of Nearest Neighbour
• Each training example controls its own neighbourhood
• Create the Voronoi diagram
Decision boundary of Nearest Neighbour
• The decision boundary is formed by retaining only those line segments that separate different classes
• The more training examples we have stored, the more complex the decision boundary can become
K-nearest neighbour classifier
• Classify a query by a majority vote over the class labels of its k nearest neighbours
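The majority-vote rule can be sketched in a few lines of Python (names are illustrative; this is a sketch, not an optimised implementation):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (x, y) pairs. Classify query by majority vote
    among its k nearest neighbours under Euclidean distance."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(y for _, y in neighbours)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the nearest neighbour classifier of the previous slides.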
The effect of K in KNN classifiers
• Different values of k can change the predicted label for the same query
The impact of k
• A small k gives a complex decision boundary that is sensitive to noise; a large k gives a smoother boundary but may ignore local structure
Question: How to choose k?
• k is a hyper-parameter: it is not learned from the training set but chosen by evaluating performance on held-out data
More general: Model selection problem
• Choosing k is an instance of model selection: picking the model or hyper-parameter that generalizes best
Overfitting
• The model fits the training data too closely, including its noise, and performs poorly on unseen data
Under-fitting
• The model is too simple to capture the underlying pattern, performing poorly even on the training data
Model selection: Validation set
• Hold out part of the training data as a validation set; choose the hyper-parameter that performs best on it
The impact of the validation set size
• A small validation set gives a noisy performance estimate; a large one leaves too little data for training
Model selection: K-fold Cross Validation
• Split the data into K folds; train on K−1 folds and validate on the remaining one, rotating through all folds and averaging the results
Special case: Leave-one-out cross validation
• K-fold cross validation with K equal to the number of samples: each sample is held out exactly once
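The K-fold procedure can be sketched as follows, using a simple 1-NN classifier as the model being validated (all names are illustrative; leave-one-out is the special case num_folds == len(data)):

```python
import math

def nn_classify(train, x):
    # 1-NN: predict the label of the nearest stored training sample
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return min(train, key=lambda pair: dist(pair[0], x))[1]

def kfold_cv_error(data, num_folds, classify):
    """Split data into num_folds folds; train on all but one fold,
    validate on the held-out fold, and average the error rate."""
    errors = 0
    for i in range(num_folds):
        val = [p for j, p in enumerate(data) if j % num_folds == i]
        train = [p for j, p in enumerate(data) if j % num_folds != i]
        errors += sum(classify(train, x) != y for x, y in val)
    return errors / len(data)
```

To select a hyper-parameter such as k, one would run this once per candidate value and keep the value with the lowest cross-validation error.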
Practical issues with KNN classifier
• Feature scaling matters: the distance is dominated by features with large value ranges
• Prediction is expensive: it requires computing distances to all stored training samples at test time
• Performance degrades in high dimensions (the curse of dimensionality)
Properties of (K) nearest neighbour
classifiers
• Non-parametric and instance-based: no training phase, but all training data must be stored
• The decision boundary can become arbitrarily complex as more training data is stored
Model the data
• Conditional probability of the data given the class: P(x|y)
• It answers the question “what is the likelihood of a particular sample under a given class?”
• It can be modelled by various models; the parameters are usually estimated by Maximum Likelihood estimation
Generative Model and Bayesian view of
classification
• Bayes rule: P(y|x) = P(x|y) P(y) / P(x)
• Bayesian view of classification: predict the class with the highest posterior probability, ŷ = argmax_y P(x|y) P(y), since the denominator P(x) is the same for every class
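A small numeric sketch of the Bayes rule and the resulting decision rule (dictionary-based; the class names and probabilities are invented for illustration):

```python
def posterior(prior, likelihood):
    # Bayes rule: P(y|x) = P(x|y) P(y) / P(x),
    # where the evidence is P(x) = sum_y P(x|y) P(y)
    evidence = sum(likelihood[y] * prior[y] for y in prior)
    return {y: likelihood[y] * prior[y] / evidence for y in prior}

def decide(prior, likelihood):
    # Bayesian decision rule: pick the class with the largest posterior
    post = posterior(prior, likelihood)
    return max(post, key=post.get)
```

Because the evidence P(x) is shared by all classes, `decide` would give the same answer if it compared the unnormalised products P(x|y) P(y) directly.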
Naïve Bayes [optional]
• Assume the input/features can be modelled as n random variables x₁, …, xₙ
• Assume the features are conditionally independent given the class: P(x₁, …, xₙ | y) = Πᵢ P(xᵢ | y)
– Note that conditional independence (given y) is different from normal independence, P(x₁, …, xₙ) = Πᵢ P(xᵢ)
Bag-of-words representation of text
• Ignore the order of words; record only their occurrence frequency
– A simple but effective way of representing text
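A minimal BoW sketch, applied to the mini document used on the next slide (function name illustrative):

```python
from collections import Counter

def bag_of_words(text):
    # Ignore word order; keep only per-word occurrence counts
    return Counter(text.lower().split())

bow = bag_of_words("the most interesting film in the summer")
# "the" occurs twice; every other word occurs once
```

A real system would also handle punctuation and tokenisation more carefully, but the counts are all the Naïve Bayes classifier below needs.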
Naïve Bayes with the BoW model
• Example: a mini document: “the most interesting film in the summer”
P(doc|c) = P(the|c)P(most|c)P(interesting|c)P(film|c)P(in|c)P(the|c)P(summer|c)
• Formally, P(doc|c) = Π_{w ∈ doc} P(w|c)
• Recall that the predicted class is the one maximizing P(c) P(doc|c)
Naïve Bayes Classifier [optional]
• We usually use the logarithm of the probability in practice
• We can merge the terms with the same word: log P(doc|c) = Σ_k n_k log(P(w_k|c)), where n_k is the number of occurrences of the k-th word in the vocabulary; it can be 0 or non-zero
• Example: log(P(the|c)) + log(P(more|c)) + log(P(the|c)) + log(P(better|c))
= 2 × log(P(the|c)) + 1 × log(P(more|c)) + 1 × log(P(better|c))
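The merged log-probability score can be sketched as follows. The tables of log-probabilities are assumed to be given (in a full classifier they would be estimated from training counts, typically with smoothing); all names are illustrative:

```python
import math
from collections import Counter

def nb_log_score(doc_words, log_prior, log_cond, c):
    # Score log P(c) + sum_k n_k * log P(w_k|c), merging repeated
    # words into counts n_k exactly as on the slide
    counts = Counter(doc_words)
    return log_prior[c] + sum(n * log_cond[c][w] for w, n in counts.items())
```

Classification then picks the class c with the largest score; working in log space avoids the numerical underflow that multiplying many small probabilities would cause.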
Types of classifiers (Recap)
• We can divide the large variety of classification approaches into roughly three main types
– Instance-based: use the training observations directly (no model), e.g., K nearest neighbours
– Generative: build a generative statistical model, e.g., Naïve Bayes
– Discriminative: directly estimate a decision rule/boundary, e.g., decision tree
Discriminative classifier overview
• Directly model the decision boundary, without an explicit statistical model of the data
– Model P(y|x) directly, without modelling P(x|y)
• We will learn many discriminative models in this course!
– Support vector machine
– Boosting
– Random forest
– Neural networks
– …
Discriminative models vs. Generative
Models
• Discriminative classifier vs. generative classifier
– Discriminative classifiers usually work better in terms of prediction accuracy
– Modelling the decision boundary is usually easier than modelling the data distribution
– However, generative classifiers have some benefits, e.g., they can work in an open-world setting
Summary
• Classifier overview
– Workflow
– Basic assumption
– Types of classifiers
• K-NN classifiers
– 1-NN classifier and K-NN classifier
– Impact of k
– How to choose hyper-parameter, overfitting, under-fitting
– Validation set and K-fold cross validation
• Generative classifier
– Bayesian decision rule
– Naïve Bayes
• Discriminative classifier