ANLP_1: Introduction
ISML_2: Classification
Warm up discussion
University of Adelaide
Outline
• Classification problem overview
• Instance-based classifier:
– Nearest Neighbour Classifier
– K-Nearest Neighbour classifier
– Hyper-parameter selection, validation and overfitting
• Generative classifier: Bayesian decision rule and Naïve Bayes
classifier
• Discriminative classifier overview
• Classifier comparison
Classification problems
• A fundamental problem in machine learning
• Types of classification problems
– Binary classification problem: binary output
– Multi-class classification problem: output one of many discrete class labels; the class labels are mutually exclusive
– Multi-label classification problem: one sample can be assigned multiple labels
The general process of building a
classification system
• Collect training data, i.e., (x, y) pairs, where x is the input and y is the output class label
• Find features for representing the input data
• (Optional) Train the classifier to find the best mapping function f
• Apply the classifier to unseen data
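As a rough illustration, the four steps above can be sketched in plain Python. Everything here is hypothetical: the string-length feature and the 1-nearest-neighbour mapping are made up for the sketch, not part of the lecture.

```python
# Sketch of the workflow: collect (x, y) pairs, extract features,
# (optionally) train, then apply the classifier to unseen data.

def extract_features(raw):
    # Hypothetical feature extractor: represent a string by its length
    return (len(raw),)

# Step 1: collected training data as (input, class label) pairs
training_data = [("hi", "short"), ("hello there my old friend", "long")]

# Step 2: represent the inputs by features
train = [(extract_features(x), y) for x, y in training_data]

# Step 3 is optional here: a nearest-neighbour mapping needs no training

def classify(query):
    # Step 4: apply the mapping f to an unseen input
    fq = extract_features(query)
    return min(train, key=lambda pair: abs(pair[0][0] - fq[0]))[1]
```

Note that for this instance-based mapping the "training" step really is optional, which is why the slide marks it so.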
Example
Why is classification possible?
• We hope that a classifier that performs well on the training set will generalize to unseen samples
• Perfect generalization is not always possible
– E.g., if ‘a’=1, ‘b’=2, ‘c’=1, what about ‘d’?
• Machine learning essentially relies on some assumption about the relationship between the data and their associated output, e.g., class labels
– For example, we commonly assume that similar x will have similar y
– But there may be many possible ways of defining “similarity”
Types of classifiers
• We can divide the large variety of classification approaches into roughly three main types
– Instance-based: use the training observations directly (no model), e.g., K nearest neighbours
– Generative: build a generative statistical model, e.g., Naïve Bayes
– Discriminative: directly estimate a decision rule/boundary, e.g., decision tree
Nearest Neighbour Classifier
• Classification by referring to the class label of the nearest neighbour
Distance measurement
• How do we measure the similarity or distance between two samples?
• Many possible choices
– A good distance metric is usually the key to a nearest neighbour classifier
– The commonly used one: Euclidean distance d(x, x′) = √(Σᵢ (xᵢ − x′ᵢ)²)
– Mahalanobis distance d_M(x, x′) = √((x − x′)ᵀ M (x − x′)): the matrix M could be a parameter learned with supervised learning (e.g., metric learning)
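A minimal sketch of the two distances in plain Python (function names are illustrative; in practice one would use a numerical library):

```python
import math

def euclidean(x, xp):
    # d(x, x') = sqrt(sum_i (x_i - x'_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))

def mahalanobis(x, xp, M):
    # d_M(x, x') = sqrt((x - x')^T M (x - x')), with M a positive
    # semi-definite matrix given as a list of rows
    d = [a - b for a, b in zip(x, xp)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(sum(d[i] * Md[i] for i in range(len(d))))
```

With M set to the identity matrix, the Mahalanobis distance reduces to the Euclidean distance, which is one way to see it as a learnable generalisation.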
Voronoi Diagram
• An interesting pattern that appears in many areas
– Voronoi Explained! – YouTube
• Given a set of points, a Voronoi diagram describes the areas that are nearest to each given point
• These areas can be viewed as zones of control
– Each zone is controlled by one of the points
Decision boundary of Nearest Neighbour
• Each training example controls its own neighbourhood
• Create the Voronoi diagram
Decision boundary of Nearest Neighbour
• The decision boundary is formed by retaining only those line segments that separate different classes
• The more training examples we have stored, the more complex the decision boundary can become
K-nearest neighbour classifier
• Classify a query by a majority vote over the class labels of its k nearest neighbours
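The majority-vote rule can be sketched in a few lines of Python (names are illustrative; this is a sketch, not an optimised implementation):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (x, y) pairs. Classify query by majority vote
    among its k nearest neighbours under Euclidean distance."""
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(y for _, y in neighbours)
    return votes.most_common(1)[0][0]
```

With k = 1 this reduces to the nearest neighbour classifier of the previous slides.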
The effect of K in KNN classifiers
• Different values of k can change the predicted label for the same query
The impact of k
• A small k gives a complex decision boundary that is sensitive to noise; a large k gives a smoother boundary but may ignore local structure
Question: How to choose k?
• k is a hyper-parameter: it is not learned from the training set but chosen by evaluating performance on held-out data
More general: Model selection problem
• Choosing k is an instance of model selection: picking the model or hyper-parameter that generalizes best
Overfitting
• The model fits the training data too closely, including its noise, and performs poorly on unseen data
Under-fitting
• The model is too simple to capture the underlying pattern, performing poorly even on the training data
Model selection: Validation set
• Hold out part of the training data as a validation set; choose the hyper-parameter that performs best on it
The impact of the validation set size
• A small validation set gives a noisy performance estimate; a large one leaves too little data for training
Model selection: K-fold Cross Validation
• Split the data into K folds; train on K−1 folds and validate on the remaining one, rotating through all folds and averaging the results
Special case: Leave-one-out cross validation
• K-fold cross validation with K equal to the number of samples: each sample is held out exactly once
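The K-fold procedure can be sketched as follows, using a simple 1-NN classifier as the model being validated (all names are illustrative; leave-one-out is the special case num_folds == len(data)):

```python
import math

def nn_classify(train, x):
    # 1-NN: predict the label of the nearest stored training sample
    dist = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    return min(train, key=lambda pair: dist(pair[0], x))[1]

def kfold_cv_error(data, num_folds, classify):
    """Split data into num_folds folds; train on all but one fold,
    validate on the held-out fold, and average the error rate."""
    errors = 0
    for i in range(num_folds):
        val = [p for j, p in enumerate(data) if j % num_folds == i]
        train = [p for j, p in enumerate(data) if j % num_folds != i]
        errors += sum(classify(train, x) != y for x, y in val)
    return errors / len(data)
```

To select a hyper-parameter such as k, one would run this once per candidate value and keep the value with the lowest cross-validation error.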
Practical issues with KNN classifier
• Feature scaling matters: the distance is dominated by features with large value ranges
• Prediction is expensive: it requires computing distances to all stored training samples at test time
• Performance degrades in high dimensions (the curse of dimensionality)
Properties of (K) nearest neighbour
classifiers
• Non-parametric and instance-based: no training phase, but all training data must be stored
• The decision boundary can become arbitrarily complex as more training data is stored
Model the data
• Conditional probability of the data given the class: P(x|y)
• It answers the question “what is the likelihood of a particular sample under a given class?”
• It can be modelled by various models; the parameters are usually estimated by Maximum Likelihood estimation
Generative Model and Bayesian view of
classification
• Bayes rule: P(y|x) = P(x|y) P(y) / P(x)
• Bayesian view of classification: predict the class with the highest posterior probability, ŷ = argmax_y P(x|y) P(y), since the denominator P(x) is the same for every class
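A small numeric sketch of the Bayes rule and the resulting decision rule (dictionary-based; the class names and probabilities are invented for illustration):

```python
def posterior(prior, likelihood):
    # Bayes rule: P(y|x) = P(x|y) P(y) / P(x),
    # where the evidence is P(x) = sum_y P(x|y) P(y)
    evidence = sum(likelihood[y] * prior[y] for y in prior)
    return {y: likelihood[y] * prior[y] / evidence for y in prior}

def decide(prior, likelihood):
    # Bayesian decision rule: pick the class with the largest posterior
    post = posterior(prior, likelihood)
    return max(post, key=post.get)
```

Because the evidence P(x) is shared by all classes, `decide` would give the same answer if it compared the unnormalised products P(x|y) P(y) directly.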
Naïve Bayes [optional]
• Assume the input/features can be modelled as n random variables x₁, …, xₙ
• Assume the features are conditionally independent given the class: P(x₁, …, xₙ | y) = Πᵢ P(xᵢ | y)
– Note that conditional independence (given y) is different from normal independence, P(x₁, …, xₙ) = Πᵢ P(xᵢ)
Bag-of-words representation of text
• Ignore the order of words; record only their occurrence frequency
– A simple but effective way of representing text
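A minimal BoW sketch, applied to the mini document used on the next slide (function name illustrative):

```python
from collections import Counter

def bag_of_words(text):
    # Ignore word order; keep only per-word occurrence counts
    return Counter(text.lower().split())

bow = bag_of_words("the most interesting film in the summer")
# "the" occurs twice; every other word occurs once
```

A real system would also handle punctuation and tokenisation more carefully, but the counts are all the Naïve Bayes classifier below needs.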
Naïve Bayes with the BoW model
• Example: a mini document: “the most interesting film in the summer”
P(doc|c) = P(the|c)P(most|c)P(interesting|c)P(film|c)P(in|c)P(the|c)P(summer|c)
• Formally, P(doc|c) = Π_{w ∈ doc} P(w|c)
• Recall that the predicted class is the one maximizing P(c) P(doc|c)
Naïve Bayes Classifier [optional]
• We usually use the logarithm of the probability in practice
• We can merge the terms with the same word: log P(doc|c) = Σ_k n_k log(P(w_k|c)), where n_k is the number of occurrences of the k-th word in the vocabulary; it can be 0 or non-zero
• Example: log(P(the|c)) + log(P(more|c)) + log(P(the|c)) + log(P(better|c))
= 2 × log(P(the|c)) + 1 × log(P(more|c)) + 1 × log(P(better|c))
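The merged log-probability score can be sketched as follows. The tables of log-probabilities are assumed to be given (in a full classifier they would be estimated from training counts, typically with smoothing); all names are illustrative:

```python
import math
from collections import Counter

def nb_log_score(doc_words, log_prior, log_cond, c):
    # Score log P(c) + sum_k n_k * log P(w_k|c), merging repeated
    # words into counts n_k exactly as on the slide
    counts = Counter(doc_words)
    return log_prior[c] + sum(n * log_cond[c][w] for w, n in counts.items())
```

Classification then picks the class c with the largest score; working in log space avoids the numerical underflow that multiplying many small probabilities would cause.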
Types of classifiers (Recap)
• We can divide the large variety of classification approaches into roughly three main types
– Instance-based: use the training observations directly (no model), e.g., K nearest neighbours
– Generative: build a generative statistical model, e.g., Naïve Bayes
– Discriminative: directly estimate a decision rule/boundary, e.g., decision tree
Discriminative classifier overview
• Directly model the decision boundary, without an explicit statistical model of the data
– Model P(y|x) directly, without modelling P(x|y)
• We will learn many discriminative models in this course!
– Support vector machine
– Boosting
– Random forest
– Neural networks
– …
Discriminative models vs. Generative
Models
• Discriminative classifier vs. generative classifier
– Discriminative classifiers usually work better in terms of prediction accuracy
– Modelling the decision boundary is usually easier than modelling the data distribution
– However, generative classifiers have some benefits, e.g., they can work in an open-world setting
Summary
• Classifier overview
– Workflow
– Basic assumption
– Types of classifiers
• K-NN classifiers
– 1-NN classifier and K-NN classifier
– Impact of k
– How to choose hyper-parameter, overfitting, under-fitting
– Validation set and K-fold cross validation
• Generative classifier
– Bayesian decision rule
– Naïve Bayes
• Discriminative classifier