COMP9517: Computer Vision
Pattern Recognition Part 1
Introduction
• Pattern recognition is the scientific discipline whose goal is to automatically recognise patterns and regularities in data (e.g. images).
• Examples:
• Object recognition / object classification
• Text classification (e.g. spam/non-spam emails)
• Speech recognition
• Event detection
• Recommender systems
Pattern recognition categories
Based on learning paradigm:
• Supervised learning: learning patterns in a set of data together with available labels (ground truth)
• Unsupervised learning: finding patterns in a set of data without any labels available
• Semi-supervised learning: uses a combination of labelled and unlabelled data to learn patterns
• Weakly supervised learning: uses noisy, limited or imprecise labels for the data to learn patterns in a supervised setting
Applications in Computer Vision
• Making decisions about image content
• Classifying objects in an image
• Recognising activities
Applications in Computer Vision
• Character recognition
• Human activity recognition
• Face detection/recognition
• Image-based medical diagnosis
• Image segmentation
• Biometric authentication
Pattern Recognition Systems
• Prototype System
Pattern Recognition Systems
• Basic stages involved in the design of a classification system in computer vision:
Image acquisition → Pre-processing → Feature engineering & selection → Feature extraction → Learning algorithm → System evaluation (test)
Pattern Recognition Concepts
• Object: a physical unit (an identifiable portion of the image)
• Regions (ideally) correspond to objects and are obtained after segmentation of an image
• Classes: the set of objects can be divided into disjoint subsets whose members share some common features; such subsets are called classes
• The class to which an object belongs is denoted by its class label
• Classification is the process that assigns labels to objects, based on object properties
• Classifier: the corresponding algorithm/method is called the classifier
• Pattern: the classifier bases its decision on object features, collectively called the pattern
[Figure: object features are fed into a classifier, which outputs a class label such as "A man" or "Bob".]
More Concepts
• Features: descriptions/properties of objects
• Model: description of classes
• Pre-processing: e.g. noise removal, segmentation
• Feature extraction: reduces the data by measuring certain "features" or properties
• Training samples: objects with known ground truth/labels, used to build the model
• Cost: the consequence of making an incorrect decision
• Decision boundary: the boundary between class regions in feature space
Pattern Recognition Overview-I
• Model Types
• Generative Model: builds some models for each class
• probabilistic “model” of each class
• models the joint probability p(x,y) (it learns p(x|y) and
p(y) and p(x,y) = p(x|y)p(y))
• “mechanism” by which the data is generated
• Decision boundary: one class model more likely than the others (via p(y|x))
• applicable to unsupervised learning tasks (unlabeled data)
• Discriminative Model
• focuses on the decision boundary
• models the posterior probability p(y|x) directly
• suited for supervised learning tasks (labeled data)
Pattern Recognition Overview
Features and Descriptions
• Features
• descriptions representing scalar properties of objects are called features
• used to represent knowledge as part of a more complex representation structure
• Feature vector
• combines many features, e.g. a size feature represents the area property, a compactness feature represents circularity
• Good representation is important for solving a problem
Feature Vector Representation
• x = [x1, x2, ..., xd], where each xj is a descriptor:
• xj may be an object measurement
• xj may be a count of object parts
• xj may be the colour of the object
• Features go by other names: predictors, descriptors, covariates, independent variables
• Examples:
• For the fish type example: [length, colour, lightness, ...]
• Letter/digit recognition: [holes, moments, SIFT, ...]
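As a small illustration (hypothetical values, not from the lecture), such feature vectors are typically stored as numeric arrays, with a dataset forming an N × d matrix:

```python
import numpy as np

# Hypothetical feature vector for one fish: [length (cm), colour index, lightness]
x = np.array([70.0, 2.0, 0.45])

# A dataset of N samples is an N x d matrix, one feature vector per row
X = np.array([[70.0, 2.0, 0.45],
              [95.0, 1.0, 0.60],
              [40.0, 3.0, 0.30]])
print(X.shape)   # (3, 3): N = 3 samples, d = 3 features
```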
Feature Extraction
• Goal of feature extraction is to characterise an object by measurements that are
• similar for objects in the same class/category, and
• different for objects in different classes
• Must find distinguishing features that are invariant to input transformations
• Design of features often based on prior experience or intuition
Feature Extraction
• Selecting features that are
• translation, rotation and scale invariant in images, e.g. shape, colour, texture
• handling occlusion and projective distortion for 3-D objects in images when the distance between object and camera changes
• handling non-rigid deformations common in 3-D vision, e.g. fingers around a cup
• handling variations in illumination and shadow effects
• Feature selection is problem- and domain-dependent and requires domain knowledge
• But classification techniques can help to make feature values less noise sensitive, and to select valuable features out of a larger set
Supervised Learning Overview
[Figure: a training set of sample/label pairs (x_i, y_i) is passed to a learning algorithm, which searches a space of candidate functions to find f such that f(x_i) ≈ y_i; prediction then applies the learned f to a new test sample x to obtain its predicted label f(x).]
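A minimal sketch of this train/predict workflow with scikit-learn (hypothetical synthetic data; any of the classifiers introduced in the following slides could take the place of the K-NN model used here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D feature vectors (e.g. [length, width]) for two fish classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([90, 25], 5, (50, 2)),    # class 0
               rng.normal([60, 12], 5, (50, 2))])   # class 1
y = np.array([0] * 50 + [1] * 50)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier()       # "learning": fit a function f on the training set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)       # "prediction": apply f to unseen test samples
print((y_pred == y_test).mean())   # fraction of correctly classified test samples
```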
Classification
• Classifier performs object recognition by assigning a class label to an object, using the object description in the form of features
• Perfect classification is often impossible, instead determine probability for each possible category
• Variability of feature values within a class, relative to the variability between classes, determines how difficult the classification problem is
• Variability in feature values may arise due to complexity, but also due to noise
• Noisy features and missing features are major issues: it is not always possible to determine the values of all features for an input
Classification
• If we have a training set of N observations {(x_i, y_i)}, i = 1, ..., N, with feature vectors x_i and class labels y_i,
• the classification problem is to estimate a function f from this data such that f(x_i) = y_i
Nearest Class Mean Classifier
• This is a classifier based on the minimum-distance principle, where the class exemplars are just the class centroids (means): each class is modelled by its mean
• Training: for training sample pairs {(x_i, y_i)}, i = 1, ..., N, where x_i is the feature vector for sample i and y_i ∈ {1, ..., K} is its class label, the class centroids are
  μ_k = (1 / N_k) Σ_{i : y_i = k} x_i,  k = 1, ..., K
• Test: a new unknown object with feature vector x is classified as class k if it is (much) closer to the mean vector μ_k of class k than to any other class mean vector
Nearest Class Mean Classifier
• Compute the Euclidean distance between feature vector X and the mean of each class
• Choose closest class, if close enough (reject otherwise)
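A minimal NumPy sketch of this rule (hypothetical toy data and variable names; the optional reject threshold implements the "if close enough" condition):

```python
import numpy as np

def fit_class_means(X_train, y_train):
    """Compute one centroid (mean feature vector) per class."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_mean(X, classes, means, reject_dist=None):
    """Assign each sample to the class with the closest centroid (Euclidean distance)."""
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (n_samples, n_classes)
    labels = classes[np.argmin(dists, axis=1)]
    if reject_dist is not None:                      # optional reject option
        labels = np.where(dists.min(axis=1) > reject_dist, -1, labels)
    return labels

# Hypothetical toy data: two 2-D classes
X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array([0, 0, 1, 1])
classes, means = fit_class_means(X_train, y_train)
print(predict_nearest_mean(np.array([[1.1, 1.0], [4.9, 5.2]]), classes, means))
```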
Nearest Class Mean Classifier
• Class 2 has two modes; where is its mean?
• But if modes are detected, two subclass mean vectors can be used
Nearest Class Mean Classifier
• Pros:
• Simple
• Fast
• Works well when classes are compact and far from each other
• Cons:
• For complex classes (e.g. multimodal or non-spherical distributions) it may give very poor results
• Cannot handle outliers and noisy data well
• Cannot handle missing data
K-nearest Neighbours
• K-NN is a classifier that decides class label for a sample, based on the K nearest samples
• The sample is assigned to the class that has the majority among those K nearest neighbours
• The neighbours are selected from a set of samples for which the class is known
• For every new test sample, the distances between the test sample and all training samples are computed, and the K nearest training samples are used to decide the class label of test sample
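A minimal NumPy sketch of this procedure, using the Euclidean distance and a majority vote (hypothetical toy data):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify one test sample by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x_test, axis=1)   # Euclidean distance to every training sample
    nearest = np.argsort(dists)[:k]                    # indices of the k closest training samples
    votes = Counter(y_train[nearest])                  # count class labels among the neighbours
    return votes.most_common(1)[0][0]

# Hypothetical toy data
X_train = np.array([[1.0, 1.0], [1.5, 1.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["salmon", "salmon", "sea bass", "sea bass"])
print(knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3))  # likely "sea bass"
```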
K-nearest Neighbours
• A commonly used distance measure for continuous variables is the Euclidean distance; for discrete variables, the Hamming distance is often used
• (The figure at the link below illustrates K-NN with K = 12.)
https://towardsdatascience.com/knn-using-scikit-learn-c6bed765be75
K-nearest Neighbours
• Pros:
• Very simple and intuitive
• Easy to implement
• No a priori assumptions
• No training step
• Decision surfaces are non-linear
• Cons:
• Slow algorithm for big datasets
• Does not perform well when the number of variables grows (curse of dimensionality)
• Needs homogeneous feature types and scales
• Finding the optimal number of neighbours can be challenging
K-nearest Neighbours: An Application
• Automated MS-lesion segmentation by KNN
• Manually labeled image used as training set
• 4 features used: intensity and voxel locations (x, y, z coordinates)
• Ref: Anbeek et al., "Automated MS-lesion segmentation by K-nearest neighbor classification", The MIDAS Journal, 2008
Bayesian Decision Theory
• A classifier’s decision may or may not be correct, so the setting should be probabilistic (soft decisions)
• Probability distributions may be used to make classification decisions with least expected error rate
Bayesian Decision Theory
• Bayesian classifier classifies an object into the class to which it is most likely to belong, based on observed features
• Assume:
• a priori probability P(c_i) for each class c_i
• unconditional distribution P(x)
• class-conditional distribution P(x|c_i)
• We shall compute the posterior probability distribution as follows:
• If all the classes are disjoint, by Bayes' rule the a posteriori probabilities are given by:
  P(c_i|x) = P(x|c_i) P(c_i) / Σ_j P(x|c_j) P(c_j)
Bayesian Decision Theory
• If we have an observation x for which P(c_1|x) is greater than P(c_2|x), we would naturally prefer to decide that the true state of nature is c_1
• So the decision is:
  c* = argmax_i P(c_i|x)
• By Bayes' rule P(c_i|x) ∝ P(x|c_i) P(c_i), and the denominator Σ_j P(x|c_j) P(c_j) is equal for all classes, so we can write:
  c* = argmax_i P(x|c_i) P(c_i)
Bayesian Decision Theory
• Whenever we observe a particular x, the probability of error is:
  P(error|x) = P(c_1|x) if we decide c_2, and P(c_2|x) if we decide c_1
• Clearly, for a given x we can minimise the probability of error by deciding c_1 if P(c_1|x) > P(c_2|x), and c_2 otherwise
• This is the Bayes decision rule
Bayesian Decision Theory: Example
• We want to classify fish type: Salmon, Sea bass, Other
• From past experience we already know the probability of each class:

  P(c_i)    Salmon   Sea bass   Others
  Prior      0.3       0.6        0.1
• If we decide based only on the prior, we would always choose "sea bass". This is called the decision rule based on the prior
• This can behave very poorly
• It never predicts other classes
Bayesian Decision Theory: Example
• Let us use some features to add more information
• E.g.: length
• From experience we know the “class conditionals” for feature “length”
  P(x|c_i)                   Salmon   Sea bass   Other
  length > 100 cm             0.5       0.3       0.2
  50 cm < length < 100 cm     0.4       0.5       0.2
  length < 50 cm              0.1       0.2       0.6
• Now we can estimate the posterior probability for each class
Bayesian Decision Theory: Example
• If we have a fish with length 70 cm, what would be our prediction? For each class, P(x|c) P(c) is proportional to the posterior:
  Salmon:   0.4 × 0.3 = 0.12
  Sea bass: 0.5 × 0.6 = 0.30
  Other:    0.2 × 0.1 = 0.02
• So based on these probabilities, our model predicts the type as "sea bass"
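A small Python sketch of this computation, using the priors and class conditionals from the tables above:

```python
priors = {"salmon": 0.3, "sea bass": 0.6, "other": 0.1}
# P(50 cm < length < 100 cm | class), from the class-conditional table
likelihood_50_100 = {"salmon": 0.4, "sea bass": 0.5, "other": 0.2}

# Unnormalised posteriors P(x|c) * P(c) for a 70 cm fish
scores = {c: likelihood_50_100[c] * priors[c] for c in priors}
evidence = sum(scores.values())                     # P(x) = sum over classes
posteriors = {c: s / evidence for c, s in scores.items()}

print(scores)                               # unnormalised: salmon 0.12, sea bass 0.30, other 0.02
print(max(posteriors, key=posteriors.get))  # 'sea bass' has the largest posterior
```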
• Question: If the price of salmon is twice that of sea bass, and sea bass is also more expensive than the other types of fish, is the cost of a wrong decision the same for any misclassification?
Bayesian Decision Theory Risk
• If the prices of all types of fish are the same, then we can make the decision by maximizing the posterior probability
• But if the prices are not the same, we have to minimise the loss
• Loss is the cost of an action based on our decision; λ(α_i|c_j) denotes the loss of taking action α_i when the true class is c_j
• The expected loss associated with action α_i is:
  R(α_i|x) = Σ_j λ(α_i|c_j) P(c_j|x)
• R(α_i|x) is also called the conditional risk
• An optimal Bayes decision strategy is to minimize the conditional risk
Bayesian Decision Theory Risk: Example
• Continuing with the same example, and assuming we have the loss function λ(α_i|c_j):

  λ(α_i|c_j)               Salmon   Sea bass   Other
  If predicted salmon         0         2         3
  If predicted sea bass      10         0         4
  If predicted other         20         7         0

  R(sell as salmon | 50 < x < 100)
    = λ(sell as salmon | salmon) P(salmon | 50 < x < 100)
    + λ(sell as salmon | sea bass) P(sea bass | 50 < x < 100)
    + λ(sell as salmon | other) P(other | 50 < x < 100)
    ∝ 0 × P(50 < x < 100 | salmon) P(salmon)
    + 2 × P(50 < x < 100 | sea bass) P(sea bass)
    + 3 × P(50 < x < 100 | other) P(other)
    = 0 + 2 × 0.5 × 0.6 + 3 × 0.2 × 0.1 = 0.66

  R(sell as sea bass | 50 < x < 100) ∝ 1.28
  R(sell as other | 50 < x < 100) ∝ 4.5
Bayesian Decision Theory Risk: Example
• So, if the length of your fish is in the range 50 cm < x < 100 cm, the expected loss is minimised if you predict it as "salmon"
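A small Python sketch of the conditional-risk computation for this length range, using the loss matrix and probabilities from the preceding slides (unnormalised posteriors are used, so the risks are only proportional to the true conditional risks, as on the slide):

```python
classes = ["salmon", "sea bass", "other"]
priors = {"salmon": 0.3, "sea bass": 0.6, "other": 0.1}
likelihood_50_100 = {"salmon": 0.4, "sea bass": 0.5, "other": 0.2}

# loss[action][true_class] = lambda(action | true class)
loss = {
    "salmon":   {"salmon": 0,  "sea bass": 2, "other": 3},
    "sea bass": {"salmon": 10, "sea bass": 0, "other": 4},
    "other":    {"salmon": 20, "sea bass": 7, "other": 0},
}

# Unnormalised posteriors P(c|x) ∝ P(x|c) P(c) for 50 cm < length < 100 cm
post = {c: likelihood_50_100[c] * priors[c] for c in classes}

# Conditional risk (up to a common normalising factor) for each action
risk = {a: sum(loss[a][c] * post[c] for c in classes) for a in classes}
print(risk)                      # approx. salmon 0.66, sea bass 1.28, other 4.5
print(min(risk, key=risk.get))   # 'salmon' minimises the expected loss
```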
Bayesian Decision Theory
Parametric Models for Distributions
• To compute P(c_i) and P(x|c_i), we can use an empirical method based on the given samples
• Or, if we know that the distribution of x follows a parametric model, we may estimate the parameters of that model from the samples
• An Example:
• Assume that the patterns in the class can be described by a normal distribution, whose covariance matrix is known but the mean is unknown
• Then, an estimate of the mean is the average of the labelled samples of that class available in the training set:
  μ̂ = x̄ = (1/n) Σ_{i=1}^{n} x_i
Bayesian Decision Theory
• Pros:
• Considers uncertainties
• Permits combining new information with the current knowledge
• Simple and intuitive
• Cons
• Computationally expensive
• Choice of priors can be subjective
Decision Trees Introduction
• Most practical pattern recognition methods address problems where feature vectors are real-valued and there exists some notion of a metric
• There are classification problems involving nominal data where instance descriptions are discrete and without any natural notion of similarity or even ordering
• For example: {high, medium, low}, {red, green, blue}
• Nominal data are also called categorical data
• How can we use such nominal data for classification? Use rule-based methods
Decision Trees: Example
• Let’s go back to the fish example with two types of fish, i.e. “salmon” and “sea bass”, and assume we have two features:
• Length
• Width
• We want to classify fishes based on these two features
[Figure: training samples plotted in the feature space (x1 = length, x2 = width), with thresholds at x1 = 80, x1 = 110, x2 = 10 and x2 = 30, and the corresponding decision tree whose internal nodes test X1 > 80, X1 > 110, X2 > 10, X2 > 30 and whose leaves contain the (salmon, sea bass) sample counts (0,4), (3,0), (1,0), (1,5), (7,0).]
Decision Trees: Example
[Figure: the same decision tree, with each leaf node labelled Salmon or Sea bass according to the samples it contains.]
• For any new sample, we start from the top of the tree, answer the questions, follow the appropriate path down to a leaf, and assign the leaf's label (see the sketch below)
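As an illustration only, such a tree can be written as nested if/else tests on the two features. The thresholds below are the ones shown in the figure, but the exact nesting and leaf labels are an assumption, since the tree topology is hard to read from the extracted slide:

```python
def classify_fish(length, width):
    """Illustrative decision-tree traversal (thresholds from the figure, topology assumed)."""
    if length > 80:
        if length > 110:
            return "salmon"       # leaf
        return "sea bass"         # leaf
    if width > 30:
        return "sea bass"         # leaf
    if width > 10:
        return "salmon"           # leaf
    return "sea bass"             # leaf

print(classify_fish(120, 25))     # "salmon" under this assumed topology
print(classify_fish(70, 35))      # "sea bass" under this assumed topology
```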
Decision Trees Overview
• Approach
• classify a pattern through a sequence of questions, where the next question asked depends on the answer to the current question
• sequence of questions displayed in a directed decision tree or simply tree
• Structure
• nodes in the tree represent features
• leaf nodes contain the class labels
• one feature (or a few) at a time to split search space of patterns
• each branching node has one child for each possible value of the parent feature
• Classification
• begins at the root node and follows the appropriate links down to a leaf node
• assigns the class label of that leaf node to the test pattern
Decision Trees Construction
• Binary decision tree is a binary tree structure that has a decision function associated with each node
• Simple case: numeric feature values; the decision function compares the value of a feature to a threshold and selects the left or right branch depending on whether the value is less than or greater than the threshold
• Advantage: at each node, only one feature is used and only its threshold value needs to be stored
• For any given set of training examples, there may be more than one possible decision tree to classify them, depending on feature order
• We must select features that give the `best’ tree based on some criterion
• Smallest tree preferred
Decision Trees Construction
• Algorithm
1. Select a feature to place at the node (the first one is the root)
2. Make one branch for each possible value
3. For each branch node, repeat step 1 and 2 using only those instances that actually reach the branch
4. When all instances at a node have the same classification (or label), stop growing that part of the tree
• How to determine which feature to split on?
• One way is to use measures from information theory: entropy and information gain
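Before turning to entropy and information gain, here is a minimal scikit-learn sketch of learning such a tree from data (hypothetical numeric fish measurements; scikit-learn's "entropy" criterion corresponds to the information-gain idea introduced next):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data: [length, width] and fish type
X = np.array([[120, 25], [95, 20], [70, 35], [60, 12], [130, 28], [75, 8]])
y = np.array(["salmon", "sea bass", "sea bass", "salmon", "salmon", "salmon"])

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=["length", "width"]))  # inspect the learned splits
print(tree.predict([[100, 15]]))                             # classify a new fish
```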
Constructing Optimal Decision Tree
• Entropy
• To construct optimal decision trees from training data, we need a definition of optimality
• One simple criterion is entropy, based on information theory
• The entropy of a set of events x = {x1, x2, ..., xn} is:
  H(x) = − Σ_{i=1}^{n} p(x_i) log2 p(x_i)
  where p(x_i) is the probability of event x_i
• Entropy may be viewed as the average uncertainty of the information source: if the source is homogeneous and has no uncertainty, the entropy is 0; the more uncertain the source, the higher the entropy
Decision Trees: Entropy
Example of entropy computation: let us look at the fish example with priors

  P(class)   Salmon   Sea bass   Others
  Prior       0.3       0.6        0.1

The entropy (uncertainty) of the information source "type of fish" is:

  H = −(0.3 log2 0.3 + 0.6 log2 0.6 + 0.1 log2 0.1) ≈ 1.29
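A small sketch of this entropy computation:

```python
import numpy as np

def entropy(probs):
    """Shannon entropy (base 2) of a discrete distribution; 0*log(0) is treated as 0."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.3, 0.6, 0.1]))   # ≈ 1.295, the value quoted on the slide
print(entropy([1.0]))             # 0: a homogeneous source has no uncertainty
```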
Decision Trees: Information Gain
• Information gain is an entropy-based measure used to evaluate features and produce optimal decision trees
• If S is a set of training samples and A is one feature of the samples, then the information gain with respect to feature A is:
  IG(S, A) = H(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) H(S_v)
  where S_v is the subset of S for which feature A has value v
• Use the feature with the highest information gain to split on
• The probabilities involved can be estimated from the frequencies of the associated events in the training data
Decision Trees: Information Gain Example
• Let us look at the fish example with the two features "length" (x1) and "width" (x2) again, but for the sake of simplicity we assume three categories (S, M, L) for each feature:
• This table is the list of our 15 observations/samples from two classes of “salmon” and “sea bass”
  x1   x2   Type
  S    S    Salmon
  M    S    Salmon
  M    S    Salmon
  S    M    Sea bass
  S    L    Sea bass
  S    M    Sea bass
  M    M    Sea bass
  M    L    Sea bass
  L    M    Salmon
  L    M    Salmon
  L    L    Salmon
  S    L    Sea bass
  M    L    Sea bass
  M    M    Sea bass
  M    L    Sea bass
Decision Trees: Information Gain Example
• Before selecting the first feature we need to know the base entropy. There are 6 salmon and 9 sea bass in the sample, so:
  H(S) = −(6/15 log2(6/15) + 9/15 log2(9/15)) ≈ 0.97
• To estimate H(S|x1) we need to use the frequency table for x1 to compute the H(S_v) terms:
  x1     Salmon   Sea bass   Total
  S         1         4        5
  M         2         5        7
  L         3         0        3
                               15
Decision Trees: Information Gain Example
  H(S|x1) = (5/15) Ent(1,4) + (7/15) Ent(2,5) + (3/15) Ent(3,0) ≈ 0.64
  IG(S, x1) = H(S) − H(S|x1) = 0.97 − 0.64 = 0.33

  (Ent(a, b) denotes the entropy of a subset containing a salmon and b sea bass.)

• Similarly, for x2:
  H(S|x2) = (6/15) Ent(2,4) + (6/15) Ent(1,5) + (3/15) Ent(3,0) ≈ 0.62
  IG(S, x2) = H(S) − H(S|x2) = 0.97 − 0.62 = 0.35

• IG(S, x2) > IG(S, x1), so the decision tree starts by splitting on x2, and the same procedure is repeated at every node:
• divide the dataset by the branches of the split and repeat the same process on every branch
• a branch with entropy greater than 0 needs further splitting
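A small sketch reproducing these computations from the 15-sample table (exact values differ slightly from the slide, which rounds intermediate results):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(values, labels):
    """IG(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v), where 'values' are feature A's values."""
    total, n, remainder = entropy(labels), len(labels), 0.0
    for v in set(values):
        subset = [l for val, l in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# The 15 samples from the table: (x1, x2, type)
data = [("S","S","salmon"), ("M","S","salmon"), ("M","S","salmon"),
        ("S","M","sea bass"), ("S","L","sea bass"), ("S","M","sea bass"),
        ("M","M","sea bass"), ("M","L","sea bass"), ("L","M","salmon"),
        ("L","M","salmon"), ("L","L","salmon"), ("S","L","sea bass"),
        ("M","L","sea bass"), ("M","M","sea bass"), ("M","L","sea bass")]
x1, x2, y = zip(*data)

print(entropy(y))                # ≈ 0.971 (the slide reports 0.97)
print(information_gain(x1, y))   # ≈ 0.328 (the slide reports 0.33)
print(information_gain(x2, y))   # ≈ 0.344; x2 has the higher gain (slide: 0.35 after rounding)
```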
Decision Trees Summary
• Pros:
• Easy to interpret
• Can handle both numerical and categorical data
• Can handle outliers and missing values
• Gives information on importance of features (feature selection)
• Cons:
• Tends to overfit
• Only axis-aligned splits
• Greedy algorithm (may not find the best tree)
Ensemble Learning
• Ensemble learning combines multiple models to improve predictive performance, compared to those obtained from any of the constituent models
• Multiple models can be created using
• different classifiers/learning algorithms
• different parameters for the same algorithm
• different training examples
• different feature sets
Random Forests
• A random forest is an ensemble learning method that
• constructs an ensemble of decision trees at training time
• outputs the mode of the classes output by the individual trees
• They overcome the decision trees' habit of overfitting
https://medium.com/@williamkoehrsen/random-forest-simple-explanation-377895a60d2d
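A minimal scikit-learn sketch of a random forest (hypothetical synthetic data; the bootstrap sampling and per-node feature subsampling described in Breiman's algorithm below are handled internally by the library):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical data: 200 samples with 5 features, two classes
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees in the ensemble
    max_features="sqrt",    # features considered at each split (m much smaller than M)
    bootstrap=True,         # each tree is trained on a bootstrap sample (bagging)
    random_state=0,
)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # accuracy of the trees' majority vote on the test set
```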
Random Forests: Breiman’s algorithm
Training
• Let N be the number of training instances and M the number of features
• Sample N instances at random with replacement from the original data (bootstrap aggregating or bagging)
• At each node, m<