
COMP9517: Computer Vision
Pattern Recognition Part 1

Introduction
• Pattern recognition is the scientific discipline whose goal is to automatically recognise patterns and regularities in data (e.g. images).
• Examples:
• Object recognition / object classification
• Text classification (e.g. spam/non-spam emails)
• Speech recognition
• Event detection
• Recommender systems

Pattern recognition categories
Based on learning paradigm:
• Supervised learning: learning patterns in a set of data together with available labels (ground truth)
• Unsupervised learning: finding patterns in a set of data without any labels available
• Semi-supervised learning: uses a combination of labelled and unlabelled data to learn patterns
• Weakly supervised learning: uses noisy, limited or imprecise labels for the data to learn patterns in a supervised setting

Applications in Computer Vision
• Making decisions about image content
• Classifying objects in an image
• Recognising activities

Applications in Computer Vision
• Character recognition
• Human activity recognition
• Face detection/recognition
• Image-based medical diagnosis
• Image segmentation
• Biometric authentication

Pattern Recognition Systems
• Prototype System

Pattern Recognition Systems
• Basic stages involved in the design of a classification system in computer vision:
Image acquisition → Pre-processing → Feature engineering & selection → Feature extraction → Learning algorithm → System evaluation (test)
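As an illustration, these stages can be wired together with scikit-learn's Pipeline; the toy mean/standard-deviation feature extractor, the random images and the K-NN classifier below are hypothetical placeholders, a minimal sketch rather than a prescribed design.

```python
# A minimal sketch of the classification pipeline above, assuming scikit-learn
# is available. The per-image mean/std "features" are illustrative only.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

class MeanStdFeatures(BaseEstimator, TransformerMixin):
    """Toy feature extraction: per-image mean and standard deviation."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        X = np.asarray(X, dtype=float)               # X: (n_images, height, width)
        return np.stack([X.mean(axis=(1, 2)),        # one mean feature per image
                         X.std(axis=(1, 2))], axis=1)  # one std feature per image

pipeline = Pipeline([
    ("features", MeanStdFeatures()),                  # feature extraction
    ("scale", StandardScaler()),                      # pre-processing of feature values
    ("classify", KNeighborsClassifier(n_neighbors=3)),  # learning algorithm
])

# Hypothetical data: 10 tiny grey-level "images" with binary labels.
rng = np.random.default_rng(0)
images, labels = rng.random((10, 8, 8)), rng.integers(0, 2, 10)
pipeline.fit(images, labels)            # training
print(pipeline.predict(images[:3]))     # evaluation on (here, reused) samples
```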

Pattern Recognition Concepts
• Object is a physical unit (an identifiable portion)
• Regions (ideally) correspond to objects, obtained after segmentation of an image
• Classes: the set of objects can be divided into disjoint subsets that may have some common features; such sets are called classes
• Class to which an object belongs is denoted by a class label
• Classification is a process that assigns labels to objects, based on object properties
• Classifier: the corresponding algorithm/method is called the classifier
• Pattern: the classifier bases its decision on object features, called the pattern
(Figure: features → classifier → class label, e.g. “a man” / “Bob”.)

More Concepts
• Features: description/properties of objects
• Model: description of classes
• Pre-processing: noise removal, segmentation
• Feature extraction: reduces the data by measuring certain “features” or properties
• Training samples: objects with known ground truth / labels, used to build the model
• Cost: the consequence of making an incorrect decision
• Decision boundary: the boundary between class regions in feature space

Pattern Recognition Overview-I
• Model Types
• Generative model: builds a model for each class
• probabilistic “model” of each class
• models the joint probability p(x,y): it learns p(x|y) and p(y), and p(x,y) = p(x|y)p(y)
• the “mechanism” by which the data is generated
• decision boundary: where one class model is more likely than the others (via p(y|x))
• applicable to unsupervised learning tasks (unlabelled data)
• Discriminative model
• focuses on the decision boundary
• models the posterior probability p(y|x) directly
• suited for supervised learning tasks (labelled data)

Pattern Recognition Overview

Features and Descriptions
• Features
• descriptions representing scalar properties of objects are called features
• used to represent knowledge as part of a more complex representation structure
• Feature vector
• combines many features, e.g. a size feature represents the area property, a compactness feature represents circularity
• a good representation is important for solving the problem

Feature Vector Representation
• x = [x_1, x_2, …, x_d], where each x_j is a descriptor
• x_j may be an object measurement
• x_j may be a count of object parts
• x_j may be the colour of the object
• Features go by other names such as predictors, descriptors, covariates, or independent variables
• Examples:
• fish type example: [length, colour, lightness, …]
• letter/digit recognition: [holes, moments, SIFT, …]
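As a toy illustration, a feature vector for one segmented object given as a binary mask might combine an area feature and a compactness feature; the compactness definition (4πA/P²) used below is one common convention, the perimeter estimate is deliberately crude, and the mask is made up.

```python
# Minimal sketch of building a feature vector [area, compactness] for one
# segmented object, assuming the object is given as a binary mask.
import numpy as np

def region_feature_vector(mask: np.ndarray) -> np.ndarray:
    mask = mask.astype(bool)
    area = mask.sum()                                   # size feature
    # crude perimeter estimate: foreground pixels with a background 4-neighbour
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    compactness = 4 * np.pi * area / max(perimeter, 1) ** 2  # circularity feature
    return np.array([area, compactness])

# Hypothetical 7x7 mask of a small square object.
mask = np.zeros((7, 7), dtype=int)
mask[2:5, 2:5] = 1
print(region_feature_vector(mask))  # [9.0, ~1.77] for this toy mask
```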

Feature Extraction
• The goal of feature extraction is to characterise an object by measurements that are
• similar for objects in the same class/category, and
• different for objects in different classes
• Must find distinguishing features that are invariant to input transformations
• Design of features often based on prior experience or intuition

Feature Extraction
• Selecting features that are
• translation, rotation and scale invariant in images, e.g. shape, colour, texture
• handling occlusion and projective distortion for 3-D objects in images when the distance between object and camera changes
• handling non-rigid deformations common in 3-D vision, e.g. fingers around a cup
• handling variations in illumination and shadow effects
• Feature selection is problem- and domain-dependent, and requires domain knowledge
• But classification techniques can help to make feature values less noise sensitive, and to select valuable features out of a larger set

Supervised Learning Overview
(Diagram: given a training set {(x_i, y_i)} and a space of candidate functions, learning finds a function f such that f(x_i) ≈ y_i; prediction then applies f to a new test sample x to obtain its label y = f(x).)

Classification
• A classifier performs object recognition by assigning a class label to an object, using the object description in the form of features
• Perfect classification is often impossible; instead, determine the probability for each possible category
• The variability of feature values for objects in the same class, compared with objects in different classes, determines the difficulty of the classification problem
• Variability in feature values may arise due to complexity, but also due to noise
• Noisy features and missing features are major issues: it is not always possible to determine the values of all features for an input

Classification
• If we have a training set of N observations {(x_i, y_i)}, i = 1, …, N
• The classification problem is to estimate a function f from this data such that f(x_i) = y_i

Nearest Class Mean Classifier
• This is a classifier based on the minimum distance principle, where the class exemplars are just the centroids (or means): the model of each class is described by its mean
• Training: for training sample pairs {(x_i, y_i)}, where x_i is the feature vector of sample i and y_i is its class label, the class centroids are
  μ_k = (1/N_k) Σ_{i : y_i = k} x_i, where N_k is the number of training samples in class k
• Test: a new unknown object with feature vector x is classified as class k if it is closer to the mean vector μ_k of class k than to any other class mean vector

Nearest Class Mean Classifier
• Compute the Euclidean distance between feature vector X and the mean of each class
• Choose closest class, if close enough (reject otherwise)
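A minimal NumPy sketch of the nearest class mean classifier described above; the 2-D training data are made-up toy values for two compact, well-separated classes.

```python
# Nearest class mean classifier: a minimal NumPy sketch (toy data).
import numpy as np

def fit_class_means(X, y):
    """Return the class labels and the centroid (mean vector) of each class."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, means

def predict_nearest_mean(X, classes, means):
    """Assign each sample to the class whose mean is closest (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (n, n_classes)
    return classes[np.argmin(d, axis=1)]

X_train = np.array([[1.0, 1.2], [0.8, 0.9], [1.1, 1.0],
                    [5.0, 4.8], [5.2, 5.1], [4.9, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, means = fit_class_means(X_train, y_train)
print(predict_nearest_mean(np.array([[1.0, 1.0], [5.0, 5.0]]), classes, means))
# expected: [0 1]
```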

Nearest Class Mean Classifier
• Class 2 has two modes; where is its mean?
• But if modes are detected, two subclass mean vectors can be used

Nearest Class Mean Classifier
• Pros:
• Simple
• Fast
• Works well when classes are compact and far from each other
• Cons:
• For complex classes (e.g. multimodal, non-spherical) it may give very poor results
• Cannot handle outliers and noisy data well
• Cannot handle missing data

K-nearest Neighbours
• K-NN is a classifier that decides class label for a sample, based on the K nearest samples
• The sample is assigned to the class that has the majority of members in the neighbourhood
• The neighbours are selected from a set of samples for which the class is known
• For every new test sample, the distances between the test sample and all training samples are computed, and the K nearest training samples are used to decide the class label of the test sample

K-nearest Neighbours
• The commonly used distance for continuous variables is the Euclidean distance; for discrete variables, the Hamming distance
(Figure: K-NN decision regions for K = 12; source: https://towardsdatascience.com/knn-using-scikit-learn-c6bed765be75)
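A short scikit-learn sketch of K-NN classification on made-up 2-D data; KNeighborsClassifier uses the Euclidean distance by default, matching the continuous-variable case above.

```python
# K-nearest neighbours with scikit-learn; the toy data are hypothetical.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # class 0
                    [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]])  # class 1
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)   # K = 3; Euclidean distance by default
knn.fit(X_train, y_train)                   # no real training: samples are stored

X_test = np.array([[1.1, 0.9], [3.9, 4.1]])
print(knn.predict(X_test))        # majority vote among the 3 nearest -> [0 1]
print(knn.predict_proba(X_test))  # fraction of neighbours from each class
```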

K-nearest Neighbours
• Pros:
• Very simple and intuitive
• Easy to implement
• No a priori assumptions
• No training step
• Decision surfaces are non-linear
• Cons:
• Slow algorithm for big datasets
• Does not perform well when the number of variables grows (curse of dimensionality)
• Needs homogeneous feature types and scales
• Finding the optimal number of neighbours can be challenging

K-nearest Neighbours: An Application
• Automated MS-lesion segmentation by KNN
• Manually labeled image used as training set
• 4 features used: intensity and voxel locations (x, y, z coordinates)
• Ref: Anbeek et al., “Automated MS-lesion segmentation by K-nearest neighbor classification”, MIDAS Journal, 2008

Bayesian Decision Theory
• A classifier’s decision may or may not be correct, so the setting should be probabilistic (soft decisions)
• Probability distributions may be used to make classification decisions with the least expected error rate

Bayesian Decision Theory
• Bayesian classifier classifies an object into the class to which it is most likely to belong, based on observed features
• Assume:
• a priori probability P(c_i) for each class c_i
• unconditional distribution P(x)
• class conditional distribution P(x | c_i)
• We shall compute the posterior probability distribution P(c_i | x) as follows:
• If all the classes are disjoint, by Bayes’ rule the a posteriori probabilities are given by
  P(c_i | x) = P(x | c_i) P(c_i) / Σ_j P(x | c_j) P(c_j)

Bayesian Decision Theory
• If we have an observation x for which P(c_1 | x) is greater than P(c_2 | x), we would naturally prefer to decide that the true class is c_1
• So the decision rule is: decide c_i if P(c_i | x) > P(c_j | x) for all j ≠ i
• Since P(c_i | x) = P(x | c_i) P(c_i) / Σ_j P(x | c_j) P(c_j), and the denominator Σ_j P(x | c_j) P(c_j) is the same for all classes, we can equivalently write: decide c_i if P(x | c_i) P(c_i) > P(x | c_j) P(c_j) for all j ≠ i

Bayesian Decision Theory
• Whenever we observe a particular x, the probability of error is P(error | x) = P(c_1 | x) if we decide c_2, and P(c_2 | x) if we decide c_1
• Clearly, for a given x we can minimise the probability of error by deciding c_1 if P(c_1 | x) > P(c_2 | x)
• This is the Bayes decision rule
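A tiny NumPy sketch of the Bayes decision rule above; the prior and class-conditional values are hypothetical, chosen only to illustrate the computation.

```python
# Bayes decision rule: pick the class maximising P(x|c_i) * P(c_i).
# The priors and class-conditional values below are hypothetical.
import numpy as np

priors = np.array([0.4, 0.6])            # P(c_1), P(c_2)
likelihoods = np.array([0.2, 0.7])       # P(x | c_1), P(x | c_2) for the observed x

joint = likelihoods * priors             # proportional to the posteriors
posteriors = joint / joint.sum()         # P(c_i | x) after normalisation

decision = np.argmax(posteriors)         # Bayes decision rule
error = 1.0 - posteriors[decision]       # P(error | x): posterior of the other class
print(posteriors, decision, error)       # [0.16 0.84] 1 0.16
```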

Bayesian Decision Theory: Example
• We want to classify the fish type: salmon, sea bass, or other
• From past experience we already know the prior probability P(c_i) of each class:

  Class:   Salmon   Sea bass   Other
  Prior:   0.3      0.6        0.1
• If we decide based only on the prior, we would always have to choose “sea bass”; this is called the “decision rule based on prior”
• This can behave very poorly
• It never predicts other classes

Bayesian Decision Theory: Example
• Let us use some features to add more information, e.g. length
• From experience we know the “class conditionals” for feature “length”
  P(x | c_i)               Salmon   Sea bass   Other
  length > 100 cm          0.5      0.3        0.2
  50 cm < length < 100 cm  0.4      0.5        0.2
  length < 50 cm           0.1      0.2        0.6

• Now we can estimate the posterior probability for each class

Bayesian Decision Theory: Example
• If we have a fish with length 70 cm, what would be our prediction?
• Compare P(x | c_i) P(c_i) for each class:
  salmon: 0.4 × 0.3 = 0.12,  sea bass: 0.5 × 0.6 = 0.30,  other: 0.2 × 0.1 = 0.02
• So based on these probabilities, our model predicts the type as “sea bass”
• Question: if the price of salmon is twice that of sea bass, and sea bass is also more expensive than the other types of fish, is the cost of a wrong decision the same for any misclassification?

Bayesian Decision Theory Risk
• If the prices of all types of fish are the same, then we can make the decision by maximising the posterior probability
• But if the prices are not the same, we have to minimise the loss
• The loss λ(α_i | c_j) is the cost of taking action α_i when the true class is c_j
• The expected loss associated with action α_i, also called the conditional risk, is
  R(α_i | x) = Σ_{j=1}^{C} λ(α_i | c_j) P(c_j | x)
• An optimal Bayes decision strategy is to minimise the conditional risk

Bayesian Decision Theory Risk: Example
• Continuing with the same example, assume we have the loss function λ(α_i | c_j):

  λ(α_i | c_j)             Salmon   Sea bass   Other
  If predicted salmon      0        2          3
  If predicted sea bass    10       0          4
  If predicted other       20       7          0

• For a fish with 50 cm < length < 100 cm:
  R(sell as salmon | 50 < x < 100)
    = λ(salmon | salmon) P(salmon | 50 < x < 100) + λ(salmon | sea bass) P(sea bass | 50 < x < 100) + λ(salmon | other) P(other | 50 < x < 100)
    ∝ 0 × P(50 < x < 100 | salmon) P(salmon) + 2 × P(50 < x < 100 | sea bass) P(sea bass) + 3 × P(50 < x < 100 | other) P(other)
    = 0 + 2 × 0.5 × 0.6 + 3 × 0.2 × 0.1 = 0.66
  R(sell as sea bass | 50 < x < 100) ∝ 1.28
  R(sell as other | 50 < x < 100) ∝ 4.5

Bayesian Decision Theory Risk: Example
• So, if the length of your fish is in the range 50–100 cm, the loss is minimised if you predict it as “salmon”
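A short NumPy check of the risk computation above, using the priors, class conditionals and loss table from the slides.

```python
# Conditional risk for the fish example (numbers taken from the slides).
import numpy as np

priors = np.array([0.3, 0.6, 0.1])          # P(salmon), P(sea bass), P(other)
lik_50_100 = np.array([0.4, 0.5, 0.2])      # P(50 < length < 100 | class)
loss = np.array([[0, 2, 3],                 # predict salmon:   loss vs true class
                 [10, 0, 4],                # predict sea bass
                 [20, 7, 0]])               # predict other

joint = lik_50_100 * priors                 # proportional to P(class | x)
risk = loss @ joint                         # R(alpha_i | x), up to a common factor
print(joint)                                # [0.12 0.3  0.02]
print(risk)                                 # [0.66 1.28 4.5 ]
print(np.argmin(risk))                      # 0 -> predicting "salmon" minimises risk
```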
Bayesian Decision Theory: Parametric Models for Distributions
• To compute P(x | c_i) and P(c_i), we can use an empirical method based on the given samples
• Or, if we know that the distribution of x follows a parametric model, we may estimate its parameters from the samples
• Example:
• assume the patterns in the i-th class can be described by a normal distribution whose covariance matrix Σ_i is known but whose mean μ_i is unknown
• then an estimate of the mean is the average of the labelled samples of that class available in the training set: μ̂_i = x̄

Bayesian Decision Theory
• Pros:
• Considers uncertainties
• Permits combining new information with the current knowledge
• Simple and intuitive
• Cons:
• Computationally expensive
• Choice of priors can be subjective

Decision Trees: Introduction
• Most practical pattern recognition methods address problems where feature vectors are real-valued and there exists some notion of a metric
• There are also classification problems involving nominal data, where instance descriptions are discrete and without any natural notion of similarity or even ordering
• For example: {high, medium, low}, {red, green, blue}
• Nominal data are also called categorical data
• How can we use such nominal data for classification? Use rule-based methods

Decision Trees: Example
• Let us go back to the fish example with two types of fish, “salmon” and “sea bass”, and assume we have two features: x1 = length and x2 = width
• We want to classify fish based on these two features
(Figure: a scatter plot of the training samples in the (length, width) plane with thresholds at length 80 and 110 and width 10 and 30, and the corresponding decision tree with tests x1 > 80, x2 > 30, x2 > 10, x1 > 110, whose leaves store the (salmon, sea bass) counts.)

Decision Trees: Example
(Figure: the same decision tree with a class label at each leaf: every leaf is labelled “salmon” or “sea bass” according to the majority class of its (salmon, sea bass) counts.)
• For any new sample, we start from the top of the tree, answer the questions, follow the appropriate path down to a leaf, and then assign that leaf’s label

Decision Trees Overview
• Approach
• classify a pattern through a sequence of questions, where the next question asked depends on the answer to the current question
• the sequence of questions is displayed in a directed decision tree, or simply a tree
• Structure
• nodes in the tree represent features
• leaf nodes contain the class labels
• one feature (or a few) at a time to split search space of patterns
• each branching node has one child for each possible value of the parent feature
• Classification
• begins at the root node and follows the appropriate links down to a leaf node
• assigns the class label of that leaf node to the test pattern

Decision Trees Construction
• Binary decision tree is a binary tree structure that has a decision function associated with each node
• Simple case: numeric feature values, and the decision function compares the value of a feature to a threshold, selecting the left or right branch if the value is less than or greater than the threshold, respectively
• Advantage: at each node only one feature is used, and only its threshold value needs to be stored
• For any given set of training examples there may be more than one possible decision tree that classifies them, depending on the feature order
• We must select the features that give the ‘best’ tree based on some criterion
• The smallest tree is preferred

Decision Trees Construction
• Algorithm

1. Select a feature to place at the node (the first one is the root)
2. Make one branch for each possible value
3. For each branch node, repeat step 1 and 2 using only those instances that actually reach the branch
4. When all instances at a node have the same classification (or label), stop growing that part of the tree
How to determine which feature to split on?
– One way is to use measures from information theory – Entropy and Information Gain

Constructing Optimal Decision Tree
• Entropy
• To construct optimal decision trees from training data, we need a definition of optimality
• One simple criterion is entropy, based on information theory
• The entropy of a set of events x = {x_1, x_2, …, x_n} is
  H(x) = − Σ_{i=1}^{n} P(x_i) log_2 P(x_i)
  where P(x_i) is the probability of event x_i
• Entropy may be viewed as the average uncertainty of the information source: if the source is homogeneous and has no uncertainty the entropy is 0, and if the source is uncertain the entropy is greater than zero

Decision Trees: Entropy
Example of entropy computation: consider the fish example with the class priors

  Class:   Salmon   Sea bass   Other
  Prior:   0.3      0.6        0.1

The entropy, or uncertainty, of the information source (the type of fish) is
  H = −(0.3 log_2 0.3 + 0.6 log_2 0.6 + 0.1 log_2 0.1) ≈ 1.29
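The same entropy value can be checked in a few lines of NumPy.

```python
# Entropy of the fish-type source with priors 0.3, 0.6, 0.1 (from the slides).
import numpy as np

p = np.array([0.3, 0.6, 0.1])
H = -np.sum(p * np.log2(p))
print(H)  # 1.2954... (~1.29)
```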

Decision Trees: Information Gain
Information gain is an entropy-based measure used to evaluate features and produce optimal decision trees.
If S is a set of training samples and x is one feature of the samples, then the information gain with respect to feature x is
  IG(S, x) = H(S) − Σ_{v ∈ values(x)} (|S_v| / |S|) H(S_v)
where S_v is the subset of S for which feature x has value v.
Use the feature with the highest information gain to split on.
• The required probabilities can be estimated using the frequencies of the associated events in the training data

Decision Trees: Information Gain Example
• Let us look at the fish example with the two features “length” (x_1) and “width” (x_2) again, but for the sake of simplicity assume three categories (S, M, L) for each feature
• The table below lists our 15 observations/samples from the two classes “salmon” and “sea bass”:

  x_1   x_2   Type
  S     S     Salmon
  M     S     Salmon
  M     S     Salmon
  S     M     Sea bass
  S     L     Sea bass
  S     M     Sea bass
  M     M     Sea bass
  M     L     Sea bass
  L     M     Salmon
  L     M     Salmon
  L     L     Salmon
  S     L     Sea bass
  M     L     Sea bass
  M     M     Sea bass
  M     L     Sea bass

Decision Trees: Information Gain Example
• Before selecting the first feature we need to know the base entropy. There are 6 salmon and 9 sea bass among the 15 samples, so
  H_base = −(6/15 log_2 6/15 + 9/15 log_2 9/15) ≈ 0.97
• To compute H(S | x_1) we need the frequency table for x_1:

  x_1   Salmon   Sea bass   Total
  S     1        4          5
  M     2        5          7
  L     3        0          3
                            15

Decision Trees: Information Gain Example
  H(S | x_1) = (5/15) Ent(1,4) + (7/15) Ent(2,5) + (3/15) Ent(3,0) ≈ 0.64
  IG(S, x_1) = H_base − H(S | x_1) ≈ 0.97 − 0.64 = 0.33
  where Ent(a,b) denotes the entropy of a subset containing a salmon and b sea bass

• Similarly for x_2, whose frequency table is S → (3,0), M → (2,4), L → (1,5):
  H(S | x_2) = (6/15) Ent(2,4) + (6/15) Ent(1,5) + (3/15) Ent(3,0) ≈ 0.63
  IG(S, x_2) = H_base − H(S | x_2) ≈ 0.97 − 0.63 ≈ 0.34

• IG(S, x_2) > IG(S, x_1), so the decision tree starts by splitting on x_2 and then repeats the same procedure at every node:
• divide the dataset by the branches of the split and repeat the same process on every branch
• a branch with entropy greater than 0 needs further splitting
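A small Python sketch that recomputes the base entropy and the information gains for x_1 and x_2 from the 15 samples above.

```python
# Recompute the information gains for the 15-sample fish table above.
from collections import Counter
from math import log2

samples = [  # (x1, x2, type), transcribed from the table in the slides
    ("S", "S", "salmon"), ("M", "S", "salmon"), ("M", "S", "salmon"),
    ("S", "M", "seabass"), ("S", "L", "seabass"), ("S", "M", "seabass"),
    ("M", "M", "seabass"), ("M", "L", "seabass"), ("L", "M", "salmon"),
    ("L", "M", "salmon"), ("L", "L", "salmon"), ("S", "L", "seabass"),
    ("M", "L", "seabass"), ("M", "M", "seabass"), ("M", "L", "seabass"),
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def information_gain(samples, feature_index):
    labels = [s[2] for s in samples]
    base = entropy(labels)
    remainder = 0.0
    for value in set(s[feature_index] for s in samples):
        subset = [s[2] for s in samples if s[feature_index] == value]
        remainder += len(subset) / len(samples) * entropy(subset)
    return base - remainder

print(round(entropy([s[2] for s in samples]), 2))   # 0.97 (base entropy)
print(round(information_gain(samples, 0), 2))       # IG for x1 ~ 0.33
print(round(information_gain(samples, 1), 2))       # IG for x2 ~ 0.34
```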

Decision Trees Summary
• Pros:
• Easy to interpret
• Can handle both numerical and categorical data
• Can handle outliers and missing values
• Gives information on importance of features (feature selection)
• Cons:
• Tends to overfit
• Only axis-aligned splits
• Greedy algorithm (may not find the best tree)
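For completeness, a scikit-learn sketch of fitting a small decision tree with the entropy (information gain) criterion; the numeric length/width training values are made up and simply stand in for the categorical S/M/L example above.

```python
# Decision tree with the entropy (information gain) criterion in scikit-learn.
# The numeric length/width training values are hypothetical stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

X = np.array([[60, 8], [90, 9], [85, 7],        # salmon (length cm, width cm)
              [70, 20], [65, 35], [75, 25],     # sea bass
              [120, 25], [130, 32], [125, 35]]) # salmon
y = np.array(["salmon", "salmon", "salmon",
              "seabass", "seabass", "seabass",
              "salmon", "salmon", "salmon"])

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
print(export_text(tree, feature_names=["length", "width"]))  # axis-aligned splits
print(tree.predict([[68, 30]]))   # ['seabass'] for this toy tree
```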

Ensemble Learning
• Ensemble learning combines multiple models to improve predictive performance compared to that obtained from any of the constituent models alone
• Multiple models can be created using
• different classifiers/learning algorithms
• different parameters for the same algorithm
• different training examples
• different feature sets

Random Forests
• Random forests is an ensemble learning method
• constructs an ensemble of decision trees during training
• the output is the mode (majority vote) of the classes output by the individual trees
• they overcome the decision trees’ habit of overfitting
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d

Random Forests: Breiman’s algorithm
Training
• Let N be the number of training instances and M the number of features
• Sample N instances at random with replacement from the original data (bootstrap aggregating, or bagging)
• At each node, m << M features are selected at random out of the M, and the best split on these m features is used to split the node
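A minimal scikit-learn sketch of a random forest along these lines; the synthetic data and the parameter choices below are illustrative only, not prescribed values.

```python
# Random forest in scikit-learn: bagging over trees, with a random subset of
# features considered at each split (max_features). The toy data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # 200 samples, M = 5 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # simple synthetic labels

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees, each grown on a bootstrap sample
    max_features="sqrt",    # m ~ sqrt(M) random features considered per split
    random_state=0,
)
forest.fit(X, y)
print(forest.predict(X[:5]))            # majority vote (mode) over the 100 trees
print(forest.feature_importances_)      # features 0 and 1 should dominate
```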