Changjae Oh
Computer Vision
– Machine learning basics and classification –
Semester 1, 22/23
Today’s lecture: Objectives
• To review the past recording
̶ with quizzes
• More details about
̶ K-NN classification
̶ SVM classification
Machine learning problems
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
Slide credit: L. Lazebnik
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Machine learning framework
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error
on the training set
• Testing: apply f to a never before seen test example x and output the
predicted value y = f(x)
y = f(x)   (y: output, f: prediction function, x: image feature)
Slide credit: L. Lazebnik
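Written out explicitly (a standard formalization added here for clarity, not taken from the slide itself), training picks the function that minimizes the average loss over the training set, and testing simply evaluates that function on a new example:

```latex
\hat{f} \;=\; \arg\min_{f} \; \frac{1}{N} \sum_{i=1}^{N} L\big(f(x_i),\, y_i\big),
\qquad
\hat{y} \;=\; \hat{f}(x) \ \text{ for a new test example } x .
```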
Machine learning framework
[Diagram: labelled training images → trained classifier; test image → trained classifier → prediction]
Example image features:
• Raw pixels
• Histograms
• GIST descriptors
Learning a classifier
• Given some set of features with corresponding labels, learn a function to predict the labels from the features
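As a minimal sketch (an assumed example using scikit-learn, not part of the original slides), with placeholder features and labels standing in for real image descriptors:

```python
# Minimal sketch: learn a prediction function f from labelled feature
# vectors, then predict labels for unseen features. Data are random stand-ins.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.random.rand(100, 128)           # 100 feature vectors (e.g. image descriptors)
y_train = np.random.randint(0, 3, size=100)  # class label for each training image

f = KNeighborsClassifier(n_neighbors=5)      # the prediction function f
f.fit(X_train, y_train)                      # "training": estimate f from {(x_i, y_i)}

x_test = np.random.rand(1, 128)              # a never-before-seen feature vector
y_pred = f.predict(x_test)                   # "testing": y = f(x)
```

Any classifier exposing fit/predict could be substituted for the K-NN model here.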
Example: Image Classification by K-NN
[Pipeline diagram, built up stage by stage: Image → Feature Extraction → Classifier (K-NN) → Prediction]
Spectrum of supervision
[Diagram: spectrum of supervision covering supervised, semi-supervised, unsupervised, and reinforcement learning, with computer vision problems placed along it]
Spectrum of supervision
Slide credit: S. Lazebnik
Spectrum of supervision
• Which type of machine learning is this?
https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-semisupervised-learning
Spectrum of supervision
• Using reinforcement learning, you can move an AI agent, such as:
Changjae Oh
Computer Vision
– Classification –
Semester 1, 22/23
• Overview of recognition tasks
• A statistical learning approach
• “Classic” or “shallow” classification pipeline
• “Bag of features” representation
• Classifiers: nearest neighbor, linear, SVM
Object recognition
• A collection of related tasks for identifying objects in digital photographs.
• Consists of recognizing, identifying, and locating objects within a picture with a given degree of confidence.
[Figure: examples of image classification, object detection, semantic segmentation, and instance segmentation]
Image source: https://arxiv.org/pdf/1405.0312.pdf
Image classification vs Object detection
• Image classification
̶ Identifying what is in the image and the associated level of confidence.
̶ can be binary label or multi-label classification
• Object detection
̶ Localising and classifying one or more objects in an image
̶ Object localisation and image classification
Semantic segmentation vs Instance segmentation
• Semantic segmentation
̶ Assigning a label to every pixel in the image.
̶ Treating multiple objects of the same class as a single entity
• Instance segmentation
̶ Similar process as semantic segmentation, but identifies, for each pixel, the object instance it belongs to.
̶ Treating multiple objects of the same class as distinct individual objects (or instances)
̶ typically, instance segmentation is harder than semantic segmentation
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
Slide credit: L. Lazebnik
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Machine learning framework
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error
on the training set
• Testing: apply f to a never before seen test example x and output the
predicted value y = f(x)
y = f(x)   (y: output, f: prediction function, x: image feature)
Slide credit: S. Lazebnik
Machine learning framework
[Diagram: labelled training images → trained classifier; test image → trained classifier → prediction]
“Classic” recognition pipeline
• Hand-crafted feature representation
• Off-the-shelf trainable classifier
[Pipeline: image → feature representation → trainable classifier → prediction]
“Classic” representation: Bag of features
• Representing images as orderless collections of local features
Motivation 1: Part-based models
• Various parts of the image are used separately to determine if and where an object of interest exists
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Motivation 2: Texture models
• Texture is characterised by the repetition of basic elements or textons
[Figure: texton dictionary and texton histogram]
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Motivation 3: Bags of words
• Orderless document representation:
̶ Frequencies of words from a dictionary (Salton & McGill, 1983)
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
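A compact sketch of steps 2–4 (illustrative only, not from the slides; it assumes the local descriptors from step 1 have already been extracted, here replaced by random 128-D stand-ins):

```python
# Bag-of-features sketch: cluster training descriptors into a visual
# vocabulary, then represent each image as a histogram of visual words.
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, num_words=200):
    """Step 2: cluster descriptors from the training set into visual words."""
    kmeans = KMeans(n_clusters=num_words, n_init=10, random_state=0)
    kmeans.fit(all_descriptors)              # cluster centres = visual vocabulary
    return kmeans

def bag_of_words_histogram(descriptors, vocabulary):
    """Steps 3-4: quantize descriptors and build a normalized word histogram."""
    words = vocabulary.predict(descriptors)  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()                 # image representation

# Usage with random stand-ins for SIFT-like descriptors:
train_descriptors = np.random.rand(5000, 128)
vocab = build_vocabulary(train_descriptors, num_words=200)
image_descriptors = np.random.rand(300, 128)
h = bag_of_words_histogram(image_descriptors, vocab)
```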
1. Local feature extraction
• Sample patches and extract descriptors
2. Learning the visual vocabulary
Extracted descriptors from the training set
[Diagram: the extracted descriptors are clustered; the cluster centres form the visual vocabulary]
Recall: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:
  D(X, M) = Σ_k Σ_{i in cluster k} ||x_i − m_k||²
• Algorithm:
̶ Randomly initialize K cluster centers
̶ Iterate until convergence:
1. Assign each feature to the nearest center
2. Recompute each cluster center as the mean of all features assigned to it
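The same two-step loop written as a small NumPy sketch (illustrative; features are assumed to be a float array of shape (N, D)):

```python
import numpy as np

def kmeans(features, k, num_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers from the data points
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(num_iters):
        # 1. Assign each feature to the nearest center
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Recompute each center as the mean of the features assigned to it
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers, labels
```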
Visual vocabularies
Source: B. Leibe
Appearance codebook
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
Advanced approach: Spatial pyramids
[Figure: spatial pyramid representation at level 0, level 1, and level 2]
Lazebnik, Schmid & Ponce (CVPR 2006)
Advanced approach: Spatial pyramids
• Caltech101 classification results
“Classic” recognition pipeline
• Hand-crafted feature representation
• Trainable classifier
̶ Nearest Neighbor classifiers
̶ Support Vector machines
[Pipeline: image → feature representation → trainable classifier → prediction]
Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x
• All we need is a distance or similarity function for our inputs
• No training required!
[Figure: training examples from two classes]
Functions for comparing histograms
• L1 distance: D(h1, h2) = Σ_i |h1(i) − h2(i)|
• χ2 distance: D(h1, h2) = Σ_i (h1(i) − h2(i))² / (h1(i) + h2(i))
• Quadratic distance (cross-bin distance): D(h1, h2) = Σ_{i,j} A_ij (h1(i) − h2(j))²
• Histogram intersection (similarity function): I(h1, h2) = Σ_i min(h1(i), h2(i))
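Possible NumPy implementations of these comparisons (a sketch; h1 and h2 are assumed to be equal-length 1-D histograms, and A is the bin-similarity matrix for the cross-bin distance):

```python
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    # eps guards against empty bins in the denominator
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    diff = h1[:, None] - h2[None, :]         # all pairwise bin differences
    return (A * diff ** 2).sum()

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()          # similarity, not a distance
```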
K-nearest neighbor classifier
• For a new point, find the k closest points from training data
• Vote for class label with labels of the k points
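A minimal k-NN prediction sketch (illustrative, not from the slides; train_feats is an (N, D) array and train_labels an array of N class labels):

```python
# Find the k closest training points and take a majority vote on their labels.
import numpy as np
from collections import Counter

def knn_predict(x, train_feats, train_labels, k=3):
    dists = np.linalg.norm(train_feats - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(train_labels[nearest])           # vote with their labels
    return votes.most_common(1)[0][0]
```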
K-nearest neighbor classifier
• Which classifier is more robust to outliers?
Credit: http://cs231n.github.io/classification/
K-nearest neighbor classifier
Credit: http://cs231n.github.io/classification/
Example: Image Classification by K-NN
[Pipeline diagram, built up stage by stage: Image → Feature Extraction → Classifier (K-NN) → Prediction]
Where else can we use K-NN?
• Image Matting (Soft segmentation)
̶ Instead of fixed-window-based filtering, we can search for the K nearest neighbours to aggregate over
[Chen 2012]
Where else can we use K-NN?
• Deep learning based 3D Computer Vision
[Qiu 2021]
Linear classifiers
• Find a linear function to separate the classes:
f(x) = sgn(w·x + b)
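In code, the decision rule is one line (a sketch; w and b are assumed to come from some training procedure, such as the SVM discussed below):

```python
import numpy as np

def linear_classify(x, w, b):
    # f(x) = sgn(w·x + b); ties at zero are assigned to the positive class here
    return 1 if np.dot(w, x) + b >= 0 else -1
```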
Visualizing linear classifiers
Credit: http://cs231n.github.io/classification/
Example learned weights at the end of learning for CIFAR-10.
Nearest neighbor vs. linear classifiers
• NN pros:
̶ Simple to implement
̶ Decision boundaries not necessarily linear
̶ Works for any number of classes
̶ Nonparametric method
• NN cons:
̶ Need good distance function
̶ Slow at test time
• Linear pros:
̶ Low-dimensional parametric representation
̶ Very fast at test time
• Linear cons:
̶ Works for two classes
̶ How to train the linear function?
̶ What if data is not linearly separable?
Nonparametric methods are good when you have a lot of
data and no prior knowledge, and when you don’t want to
worry too much about choosing just the right features.
[Artificial Intelligence: A Modern Approach]
A learning model that summarizes data with a set of parameters of fixed size (independent of the number of training examples) is called a parametric model. No matter how much data you throw at a parametric model, it won’t change its mind about how many parameters it needs.
[Artificial Intelligence: A Modern Approach]
Linear classifiers
• When the data is linearly separable, there may be more than one separator (hyperplane)
Which separator is the best?
Support vector machines
• Find a hyperplane that maximizes the margin between the positive and
negative examples
• Positive examples (y_i = 1): w·x_i + b ≥ 1
• Negative examples (y_i = −1): w·x_i + b ≤ −1
• Distance between a point x_i and the hyperplane: |w·x_i + b| / ||w||
• For support vectors, w·x_i + b = ±1
Therefore, the margin is 2 / ||w||
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
Finding the maximum margin hyperplane
1. Maximize the margin 2 / ||w||
2. Correctly classify all training data: y_i(w·x_i + b) ≥ 1
• Quadratic optimization problem: minimize (1/2) ||w||² over w, b subject to y_i(w·x_i + b) ≥ 1 for all i
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
SVM parameter learning
• Separable data: minimize (1/2) ||w||² over w, b subject to y_i(w·x_i + b) ≥ 1
  (maximize the margin while classifying all training data correctly)
• Non-separable data: minimize (1/2) ||w||² + C Σ_i max(0, 1 − y_i(w·x_i + b)) over w, b
  (maximize the margin while minimizing classification mistakes via the hinge loss)
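A minimal training sketch for the non-separable objective above, using stochastic subgradient steps on the hinge loss (an illustrative implementation, not the method prescribed in the slides):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=1e-3, epochs=100, seed=0):
    """X: (N, D) features, y: labels in {-1, +1}.
    Approximately minimizes 0.5*||w||^2 + C * sum_i max(0, 1 - y_i (w·x_i + b))."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w, b = np.zeros(D), 0.0
    for _ in range(epochs):
        for i in rng.permutation(N):
            margin = y[i] * (X[i] @ w + b)
            grad_w, grad_b = w, 0.0          # gradient of the regularizer
            if margin < 1:                   # hinge loss is active for this sample
                grad_w = w - C * y[i] * X[i]
                grad_b = -C * y[i]
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```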
SVM parameter learning
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
minimize (1/2) ||w||² + C Σ_i max(0, 1 − y_i(w·x_i + b)) over w, b
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Φ: x→ φ(x)
Nonlinear SVMs
• Linearly separable dataset in 1D:
• Non-separable dataset in 1D:
• We can map the data to a higher-dimensional space:
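For example (a standard illustration using an assumed quadratic lifting, since the slide's figure is not reproduced here):

```latex
\varphi : x \mapsto (x,\; x^2)
\qquad\Rightarrow\qquad
K(x, z) = \varphi(x) \cdot \varphi(z) = xz + x^2 z^2 .
```

Data that cannot be split by a single threshold on x can become separable by a line in the (x, x²) plane, e.g. by thresholding x².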
The kernel trick
• General idea:
• The original input space can always be mapped to some higher-dimensional feature space where the training set is separable
• The kernel trick:
• Instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that
K(x,y) = φ(x) · φ(y)
The kernel trick
• Linear SVM decision function:
w·x + b = Σ_i α_i y_i (x_i · x) + b
The kernel trick
• Linear SVM decision function: w·x + b = Σ_i α_i y_i (x_i · x) + b
• Kernel SVM decision function: Σ_i α_i y_i φ(x_i) · φ(x) + b = Σ_i α_i y_i K(x_i, x) + b
• This gives a nonlinear decision boundary in the original feature space
Polynomial kernel:
K(x, y) = (c + x · y)^d
Gaussian kernel
• Also known as the radial basis function (RBF) kernel:
K(x, y) = exp(−||x − y||² / (2σ²))
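In code (a sketch; c, d and σ are assumed hyperparameters, not values from the slides):

```python
import numpy as np

def polynomial_kernel(x, y, c=1.0, d=2):
    return (c + np.dot(x, y)) ** d

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
```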
Gaussian kernel
Kernels for bags of features
• Histogram intersection: K(h1, h2) = Σ_i min(h1(i), h2(i))
• Square root (Bhattacharyya kernel): K(h1, h2) = Σ_i sqrt(h1(i) h2(i))
• Generalized Gaussian kernel: K(h1, h2) = exp(−(1/A) D(h1, h2)²)
̶ D can be L1 distance, Euclidean distance, χ2 distance, etc.
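A sketch of the generalized Gaussian kernel for bag-of-features histograms, here with the χ2 distance plugged in for D (A is an assumed normalization constant, e.g. the mean distance over the training set):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def generalized_gaussian_kernel(h1, h2, A=1.0):
    # K(h1, h2) = exp(-(1/A) * D(h1, h2)^2)
    return np.exp(-chi2_distance(h1, h2) ** 2 / A)
```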
Another way?
• Use good features
R. Girshick et al.”Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” CVPR 2014
SVMs: Pros and cons
• Pros:
̶ Non-linear SVM framework is very powerful, flexible
̶ Training is convex optimization; a globally optimal solution can be found
̶ SVMs work very well in practice, even with very small training sample sizes
• Cons:
̶ No “direct” multi-class SVM; must combine two-class SVMs (e.g., with one-vs-others)
̶ Computation and memory cost (esp. for nonlinear SVMs)
SVM for image processing?
C. Oh et al., Sparse Edit Propagation for High Resolution Image using Support Vector Machines, ICIP 2015