3/25/2021
CSE 473/4573
Introduction to Computer Vision and Image Processing
CLASSIFICATION AND RECOGNITION
Slide Credit: Hays, et al.
Local-feature Alignment
3
Recall: Hypothesize and test
• Given model of object
• New image: hypothesize object identity and pose
• Render object in camera
• Compare rendering to actual image: if close, good hypothesis.
4
Recall: Alignment
• Alignment: fitting a model to a transformation between pairs of features (matches) in two images
• Find the transformation $T$ that minimizes $\sum_i \mathrm{residual}(T(x_i), x_i')$
5
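The minimization on this slide can be made concrete with a short sketch, not from the slides: assuming an affine transformation model, NumPy, and made-up point correspondences, the least-squares T can be found as follows.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine T minimizing sum_i ||T(x_i) - x'_i||^2.

    src, dst: (N, 2) arrays of matched points x_i and x'_i, N >= 3.
    Returns a 2x3 matrix A such that T(x) = A @ [x, y, 1].
    """
    n = src.shape[0]
    M = np.zeros((2 * n, 6))
    b = dst.reshape(-1)
    M[0::2, 0:2] = src      # equations for the x' coordinates
    M[0::2, 2] = 1
    M[1::2, 3:5] = src      # equations for the y' coordinates
    M[1::2, 5] = 1
    params, *_ = np.linalg.lstsq(M, b, rcond=None)
    return params.reshape(2, 3)

# Toy correspondences: a pure translation by (5, -2).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src + np.array([5.0, -2.0])
A = fit_affine(src, dst)
residual = np.c_[src, np.ones(len(src))] @ A.T - dst
print(A)                       # approximately [[1 0 5], [0 1 -2]]
print(np.abs(residual).max())  # approximately 0
```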
Alignment with constraints
6
L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
Alignment with Features
Huttenlocher & Ullman (1987)
7
Source: Lana Lazebnik
Machine Learning
8
The machine learning framework
• Apply a prediction function to a feature representation of the image to get the desired output:
f([image]) = “apple”   f([image]) = “tomato”   f([image]) = “cow”
9
Slide credit: L. Lazebnik
The machine learning framework
y = f(x), where x is the image feature, f is the prediction function, and y is the output label
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x)
10
Slide credit: L. Lazebnik
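As a concrete (hypothetical) illustration of the training/testing framework above, here is a minimal sketch assuming scikit-learn and NumPy; the feature vectors and labels are synthetic stand-ins for real image features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training set {(x1, y1), ..., (xN, yN)}: N feature vectors with labels.
X_train = rng.normal(size=(100, 16))                        # 16-D image features
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # 0 = "tomato", 1 = "apple"

# Training: estimate the prediction function f by minimizing error on the training set.
f = LogisticRegression().fit(X_train, y_train)

# Testing: apply f to a never-before-seen example x and output y = f(x).
x_test = rng.normal(size=(1, 16))
y_pred = f.predict(x_test)
print(y_pred)
```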
Image Classification
Training: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier
11
Image Categorization
Training: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier
Testing: Test Image → Image Features → Trained Classifier → Prediction (e.g., “Outdoor”)
12
Example: Scene Classification
• Is this a kitchen?
13
Image features
Training: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier
14
Types of Features
• Raw pixels
• Histograms
• GIST descriptors
• …
15
Slide credit: L. Lazebnik
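A minimal sketch of one feature type from this list, a color histogram, assuming NumPy and an image given as an H×W×3 uint8 array (loading the image is left out).

```python
import numpy as np

def color_histogram(image, bins=8):
    """Concatenated per-channel histograms, normalized to sum to 1."""
    feats = []
    for c in range(3):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist)
    feat = np.concatenate(feats).astype(np.float64)
    return feat / feat.sum()

# Synthetic "image" just to show the shapes involved.
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
print(color_histogram(image).shape)   # (24,) for 8 bins x 3 channels
```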
Desirable Properties of Features
• Coverage
– Ensure that all relevant info is captured
• Concision
– Minimize number of features without sacrificing coverage
• Directness/Uniqueness
– Ideal features are independently useful for prediction
16
Feature Descriptors/Representations
• What types of features would be useful for detecting faces?
• Templates
• Intensity, gradients, etc.
• Histograms
• Color, texture, SIFT descriptors, etc.
17
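As a hedged example of a gradient-based descriptor of the kind mentioned above, here is a HOG sketch assuming scikit-image is installed; the sample image stands in for a face crop.

```python
from skimage import data, color
from skimage.feature import hog

gray = color.rgb2gray(data.astronaut())        # stand-in for a grayscale face crop
features = hog(gray,
               orientations=9,                 # gradient orientation bins
               pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
print(features.shape)                          # one long gradient-histogram vector
```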
Classifiers
Training: Training Images → Image Features; Image Features + Training Labels → Classifier Training → Trained Classifier
18
Learning a classifier
Given some set of features with corresponding labels, learn a function to predict the labels from the features
[Figure: labeled training points from two classes (x and o) scattered in a 2D feature space (x1, x2)]
19
Desirable properties of classifiers
• Name as many desirable properties of a classifier as you can.
20
Desirable properties of classifiers
• On-line adaptation
• Nonlinear Separability
• Non-Parametric
• Minimize Training Time and Data
• Minimize Error
• Makes Hard and Soft Decisions
• Verifiable and Validatable
• Can be Tuned
• Generalizable
21
Generalization
• How well does a learned model generalize from the data it was trained on to a new test set?
Training set (labels known)
Test set (labels unknown)
22
Slide credit: L. Lazebnik
Generalization
• Components of generalization error
• Bias: how much the average model over all training sets differs from the true model
– Error due to inaccurate assumptions/simplifications made by the model
• Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias (few degrees of freedom) and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias (many degrees of freedom) and high variance
– Low training error and high test error
23
Slide credit: L. Lazebnik
No Free Lunch Theorem
You can only get generalization through assumptions.
24
Slide credit: D. Hoiem
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
25
Slide credit: D. Hoiem
Bias-Variance Trade-off
$E(\mathrm{MSE}) = \mathrm{noise}^2 + \mathrm{bias}^2 + \mathrm{variance}$
noise²: unavoidable error; bias²: error due to incorrect assumptions; variance: error due to variance of the training samples
See the following for explanations of bias-variance (also Bishop’s “Neural Networks” book):
• http://www.inf.ed.ac.uk/teaching/courses/mlsc/Notes/Lecture4/BiasVariance.pdf
26
Slide credit: D. Hoiem
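A small simulation (not from the slides) that checks this decomposition empirically: a polynomial of fixed degree is fit on many resampled training sets, and noise², bias², and variance are measured at one test point, assuming NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: np.sin(2 * np.pi * x)
noise_std = 0.2
degree = 3            # model complexity: polynomial degree
x0 = 0.3              # test point
n_train, n_trials = 30, 2000

preds = np.empty(n_trials)
for t in range(n_trials):
    x = rng.uniform(0, 1, n_train)
    y = true_f(x) + rng.normal(0, noise_std, n_train)
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    preds[t] = np.polyval(coeffs, x0)

bias_sq = (preds.mean() - true_f(x0)) ** 2     # (average model - true model)^2
variance = preds.var()                         # spread of models across training sets
noise_sq = noise_std ** 2                      # unavoidable error
mse = np.mean((true_f(x0) + rng.normal(0, noise_std, n_trials) - preds) ** 2)

print(f"noise^2 + bias^2 + variance = {noise_sq + bias_sq + variance:.4f}")
print(f"empirical E[MSE]            = {mse:.4f}")
```

The two printed values should roughly agree, which is exactly the decomposition stated on the slide.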
Bias-variance tradeoff
[Figure: training and test error vs. model complexity; underfitting (high bias, low variance) at low complexity, overfitting (low bias, high variance) at high complexity]
27
Slide credit: D. Hoiem
Bias-variance tradeoff
[Figure: test error vs. model complexity for few vs. many training examples (high bias/low variance at low complexity, low bias/high variance at high complexity)]
28
Slide credit: D. Hoiem
Effect of Training Size
[Figure: for a fixed prediction model, error vs. number of training examples; testing error decreases and training error increases, converging toward the generalization error]
29
Slide credit: D. Hoiem
The perfect classification algorithm
• Objective function: encodes the right loss for the problem
• Parameterization: makes assumptions that fit the problem
• Regularization: right level of regularization for amount of training data
• Training algorithm: can find parameters that maximize objective on training set
• Inference algorithm: can solve for objective function in evaluation
30
Slide credit: D. Hoiem
Remember…
• No classifier is inherently better than any other: you need to make assumptions to generalize
• Three kinds of error
– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to perfectly estimate parameters from limited data
31
Slide credit: D. Hoiem
How to reduce variance?
• Choose a simpler classifier
• Regularize the parameters
• Get more training data
32
Slide credit: D. Hoiem
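A brief sketch of the “regularize the parameters” option, assuming scikit-learn: in LogisticRegression, C is the inverse regularization strength, so shrinking C shrinks the learned weights (lower variance, higher bias).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 50))                  # few examples, many features
y = (X[:, 0] > 0).astype(int)

for C in (100.0, 1.0, 0.01):                   # weak -> strong regularization
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    print(C, np.abs(clf.coef_).mean())         # weights shrink as C decreases
```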
One way to think about it…
• Training labels dictate that two examples are the same or different, in some sense
• Features and distance measures define visual similarity
• Classifiers try to learn weights or parameters for features and distance measures so that visual similarity predicts label similarity
33
Very brief tour of some classifiers
• K-nearest neighbor
• SVM
• Boosted Decision Trees
• Neural networks
• Naïve Bayes
• Bayesian network
• Logistic regression
• Randomized Forests
• RBMs
• Etc.
34
Generative vs. Discriminative Classifiers
Generative Models
• Represent both the data and the labels
• Often, makes use of conditional independence and priors
• Examples
– Naïve Bayes classifier
– Bayesian network
• Models of data may apply to future prediction problems

Discriminative Models
• Learn to directly predict the labels from the data
• Often, assume a simple boundary (e.g., linear)
• Often easier to predict a label from the data than to model the data
• Examples
– Logistic regression
– SVM
– Boosted decision trees
35
Slide credit: D. Hoiem
Classification
• Assign input vector to one of two or more classes
• Any decision rule divides input space into decision regions separated by decision boundaries
36
Slide credit: L. Lazebnik
Nearest Neighbor Classifier
• Assign label of nearest training data point to each test data point
from Duda et al.
Voronoi partitioning of feature space for two-category 2D and 3D data
37
Source: D. Lowe
K-nearest neighbor
[Figure: 2D feature space (x1, x2) with training points from two classes (x and o) and query points (+) to be classified]
38
1-nearest neighbor
[Figure: the same 2D example; each query point (+) takes the label of its single nearest neighbor]
39
3-nearest neighbor
[Figure: each query point (+) takes the majority label of its 3 nearest neighbors]
40
5-nearest neighbor
[Figure: each query point (+) takes the majority label of its 5 nearest neighbors]
41
Using K-NN
• Simple, a good one to try first
• With infinite examples, 1-NN provably has an error rate that is at most twice the Bayes optimal error
42
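A minimal from-scratch k-NN sketch in NumPy (brute-force distances plus a majority vote); the toy 2-D points echo the x/o figures above, and real image features would just be longer vectors.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    # Pairwise Euclidean distances between test and training points.
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbors
    votes = y_train[nearest]                        # their labels
    # Majority vote per test point.
    return np.array([np.bincount(v).argmax() for v in votes])

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 0.8]])
y_train = np.array([0, 0, 1, 1])                    # 0 = "o", 1 = "x"
X_test = np.array([[0.1, 0.0], [0.95, 0.9]])
print(knn_predict(X_train, y_train, X_test, k=3))   # [0 1]
```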
Classifiers: Linear SVM
[Figure: two classes (x and o) in a 2D feature space (x1, x2)]
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)
43
Classifiers: Linear SVM
[Figure: two classes (x and o) in a 2D feature space (x1, x2)]
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)
44
Classifiers: Linear SVM
[Figure: two classes (x and o) in a 2D feature space (x1, x2)]
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)
45
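A sketch of the decision rule f(x) = sgn(w · x + b) on synthetic 2-D data, assuming scikit-learn for training; the learned w and b are then applied by hand to a new point.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)),    # class "o"
               rng.normal(+2, 1, size=(50, 2))])   # class "x"
y = np.array([0] * 50 + [1] * 50)

svm = LinearSVC(C=1.0).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

x_new = np.array([1.5, 2.0])
f = np.sign(w @ x_new + b)                         # +1 -> class "x", -1 -> class "o"
print(f, svm.predict([x_new]))                     # the two should agree
```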
Nonlinear SVMs
• Datasets that are linearly separable work out great:
[Figure: 1D points on the x axis, separable by a single threshold at 0]
• But what if the dataset is just too hard?
[Figure: 1D points whose classes are interleaved, not separable by any single threshold]
• We can map it to a higher-dimensional space:
[Figure: mapping each point x to (x, x²) makes the data linearly separable in 2D]
46
Slide credit: Andrew Moore
Summary: SVMs for image classification
1. Pick an image representation (in our case, bag of features)
2. Pick a kernel function for that representation
3. Compute the matrix of kernel values between every pair of training examples
4. Feed the kernel matrix into your favorite SVM solver to obtain support vectors and weights
5. At test time: compute kernel values for your test example and each support vector, and combine them with the learned weights to get the value of the decision function
47
Slide credit: L. Lazebnik
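A sketch of steps 3–5 using scikit-learn's precomputed-kernel interface; the bag-of-features histograms are random placeholders and the histogram-intersection kernel is just one possible choice for step 2.

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection(A, B):
    """Kernel matrix K[i, j] = sum_k min(A[i, k], B[j, k])."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)
X_train = rng.dirichlet(np.ones(50), size=40)      # 40 normalized BoF histograms
y_train = rng.integers(0, 2, size=40)              # random labels: mechanics only
X_test = rng.dirichlet(np.ones(50), size=5)

K_train = hist_intersection(X_train, X_train)      # step 3: train-vs-train kernel
svm = SVC(kernel="precomputed").fit(K_train, y_train)   # step 4: SVM solver

K_test = hist_intersection(X_test, X_train)        # step 5: test-vs-train kernel
print(svm.predict(K_test))
```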
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. one
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM “votes” for a class to assign to the test example
• One vs. others
– Training: learn an SVM for each class vs. the others
– Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value
48
Slide credit: L. Lazebnik
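A sketch of both combination schemes, assuming scikit-learn's multiclass wrappers around a two-class linear SVM; the 3-class data is synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

rng = np.random.default_rng(0)
centers = np.array([[0, 0], [4, 0], [0, 4]])
X = np.vstack([rng.normal(c, 1.0, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)    # one SVM per pair of classes (3 here)
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)   # one SVM per class vs. the rest

x_new = np.array([[3.5, 0.5]])
print(ovo.predict(x_new), ovr.predict(x_new))      # both should output class 1
```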
SVMs: Pros and cons
• Pros
– Many publicly available SVM packages: http://www.kernel-machines.org/software
– Kernel-based framework is very powerful, flexible
– SVMs work very well in practice, even with very small training sample sizes
• Cons
– No “direct” multi-class SVM, must combine two-class SVMs
– Computation, memory
– During training time, must compute matrix of kernel values for every pair of examples
– Learning can take a very long time for large-scale problems
49
What to remember about classifiers
• No free lunch: machine learning algorithms are tools, not dogmas
• Try simple classifiers first
• Better to have smart features and simple classifiers than simple features and smart classifiers
• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)
Slide credit: D. Hoiem
50
Machine Learning Considerations
• 3 important design decisions:
1) What data do I use?
2) How do I represent my data (what feature)?
3) What classifier / regressor / machine learning tool do I use?
• These are in decreasing order of importance
• Deep learning addresses 2 and 3 simultaneously (and blurs the boundary between them).
• You can take the representation from deep learning and use it with any classifier.
51