Changjae Oh
Computer Vision
– Machine learning basics and recognition –
Semester 1, 22/23
Objectives
• To understand machine learning basics for high-level vision problems
Machine learning problems
Slide credit: J. Hays
Machine learning problems
Slide credit: J. Hays
Dimensionality Reduction
• Principal component analysis (PCA)
̶ PCA takes advantage of correlations between data dimensions to produce the best possible lower-dimensional representation based on linear projections (it minimizes the reconstruction error); a minimal sketch follows below.
̶ PCA should be used for dimensionality reduction, not for discovering patterns or making predictions. Don't try to assign semantic meaning to the bases.
• Other methods: Locally Linear Embedding, Isomap, Autoencoder
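A minimal sketch of PCA-based dimensionality reduction, assuming NumPy and scikit-learn are available (the data here is synthetic and not from the slides):

```python
# PCA sketch: project onto the directions of largest variance, then reconstruct.
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples in 3D that mostly vary along two correlated directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) @ np.array([[2.0, 0.5, 1.0],
                                          [0.0, 1.0, -0.5]])

pca = PCA(n_components=2)          # keep the 2 directions of largest variance
Z = pca.fit_transform(X)           # lower-dimensional representation (100 x 2)
X_rec = pca.inverse_transform(Z)   # linear reconstruction from the projection

print(pca.explained_variance_ratio_)   # variance captured per component
print(np.mean((X - X_rec) ** 2))       # reconstruction error that PCA minimizes
```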
Machine learning problems
K-means clustering
[Figure: input image, clusters on intensity, clusters on color]
Mean shift algorithm
Spectral clustering
Group points based on links in a graph
Visual PageRank
• Determining importance by random walk
̶ What’s the probability that you will randomly walk to a given node?
• Create an adjacency matrix based on visual similarity
• Edge weights determine the probability of transition (a random-walk sketch follows below)
C. Oh et al., Probabilistic Correspondence Matching using Random Walk with Restart, BMVC 2012
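An illustrative sketch of importance-by-random-walk (power iteration with restart), assuming NumPy; the similarity matrix W is a made-up toy example, not the matching method of the cited paper:

```python
# Importance by random walk: stationary distribution of a random walker with restart.
import numpy as np

W = np.array([[0.0, 0.9, 0.1],      # pairwise visual similarities (hypothetical)
              [0.9, 0.0, 0.8],
              [0.1, 0.8, 0.0]])

P = W / W.sum(axis=1, keepdims=True)   # row-normalize: transition probabilities
alpha = 0.85                           # probability of following an edge vs. restarting
n = P.shape[0]
r = np.full(n, 1.0 / n)                # start from the uniform distribution

for _ in range(100):                   # power iteration until (approximately) stationary
    r = alpha * r @ P + (1 - alpha) / n

print(r)   # stationary probabilities = node importance
```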
Machine learning problems
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
Slide credit: L. Lazebnik
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Machine learning framework
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error
on the training set
• Testing: apply f to a never before seen test example x and output the
predicted value y = f(x)
y = f(x), where y is the output, f is the prediction function, and x is the image feature (a minimal sketch of this framework follows below)
Slide credit: L. Lazebnik
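A minimal sketch of the training/testing framework above, assuming scikit-learn; the features and labels are synthetic stand-ins for image features and annotations:

```python
# Learn f on labeled (x_i, y_i) pairs, then predict y = f(x) on unseen examples.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))              # stand-in image feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)     # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                     random_state=0)

f = LogisticRegression().fit(X_train, y_train)  # training: minimize error on the training set
y_pred = f.predict(X_test)                      # testing: apply f to never-before-seen examples
print((y_pred == y_test).mean())                # accuracy on the test set
```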
Machine learning framework
[Pipeline diagram: test image → image features → trained classifier → prediction]
Image features
• Raw pixels
• Histograms
• GIST descriptors
Learning a classifier
• Given some set of features with corresponding labels, learn a function to predict the labels from the features
Many classifiers to choose from
• Neural networks
• Naïve Bayes
• K-nearest neighbour
• Bayesian network
• Logistic regression
• Randomized Forests
• Boosted Decision Trees
• Deep Convolutional Network
Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
[Figure: 2D feature space with training examples from two classes; a test point takes the label of its nearest training example]
Slide credit: S. Lazebnik
Classifiers: Linear
• Find a linear function to separate the classes:
f(x) = sgn(w · x + b)
Slide credit: L. Lazebnik
Example: Image Classification by K-NN
[Pipeline diagram, stepped through over three slides: image → feature extraction → classifier → prediction]
Recognition task and supervision
• Images in the training set must be annotated with the “correct answer”
that the model is expected to produce
Contains a motorbike
Slide credit: S. Lazebnik
Spectrum of supervision
Supervised, Semi-supervised, Unsupervised, Reinforcement
[Figure: where typical computer vision problems fall on this supervision spectrum]
Spectrum of supervision
Slide credit: S. Lazebnik
Generalisation
• How well does a learned model generalise from the data it was trained on
to a new test set?
Training set (labels known) Test set (labels unknown)
Generalisation
• How well does a learned model generalise from the data it was trained on
to a new test set?
Changjae Oh
Computer Vision
– Classification –
Semester 1, 22/23
• Overview of recognition tasks
• A statistical learning approach
• “Classic” or “shallow” classification pipeline
• “Bag of features” representation
• Classifiers: nearest neighbor, linear, SVM
Verification/Classification
Adapted from Fei-Fei Li
Is this a building?
Detection
Adapted from Fei-Fei Li
Where are the people?
Identification
Adapted from Fei-Fei Li
Is this 天安門 (Tiananmen)?
Semantic Segmentation
Adapted from Fei-Fei Li
Object recognition
• A collection of related tasks for identifying objects in digital photographs.
• Consists of recognizing, identifying, and locating objects within a picture with a given degree of confidence.
[Figure: examples of image classification, object detection, semantic segmentation, and instance segmentation]
Image source
https://arxiv.org/pdf/1405.0312.pdf
Image classification vs Object detection
• Image classification
̶ Identifying what is in the image and the associated level of confidence.
̶ can be binary label or multi-label classification
• Object detection
̶ Localising and classifying one or more objects in an image
̶ Object localisation and image classification
Semantic segmentation vs Instance segmentation
• Semantic segmentation
̶ Assigning a label to every pixel in the image.
̶ Treating multiple objects of the same class as a single entity
• Instance segmentation
̶ Similar to semantic segmentation, but identifies, for each pixel, the object instance it belongs to.
̶ Treats multiple objects of the same class as distinct individual objects (or instances)
̶ Typically harder than semantic segmentation
Image classification
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
Slide credit: L. Lazebnik
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Machine learning framework
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error
on the training set
• Testing: apply f to a never before seen test example x and output the
predicted value y = f(x)
y = f(x), where y is the output, f is the prediction function, and x is the image feature
Slide credit: S. Lazebnik
Machine learning framework
[Pipeline diagram: test image → image features → trained classifier → prediction]
“Classic” recognition pipeline
• Hand-crafted feature representation
• Off-the-shelf trainable classifier
[Pipeline: image → hand-crafted feature representation → off-the-shelf trainable classifier → prediction]
“Classic” representation: Bag of features
• Representing images as orderless collections of local features
Motivation 1: Part-based models
• Various parts of the image are used separately to determine if and where an object of interest exists
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Motivation 2: Texture models
• Texture is characterised by the repetition of basic elements or textons
Texton histogram
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
“Texton dictionary”
Motivation 3: Bags of words
• Orderless document representation:
̶ Frequencies of words from a dictionary Salton & McGill (1983)
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of "visual words" (a sketch of the full pipeline follows below)
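A hedged sketch of the four steps above, assuming opencv-python (ORB keypoint descriptors stand in for whatever local features the slides use) and scikit-learn's KMeans; the image paths are placeholders:

```python
# Bag-of-features sketch: local features -> k-means vocabulary -> word histograms.
import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def local_descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)        # step 1: extract local features
    return desc if desc is not None else np.zeros((0, 32), np.uint8)

train_images = ["img1.jpg", "img2.jpg"]              # placeholder training set
all_desc = np.vstack([local_descriptors(p) for p in train_images]).astype(np.float32)

K = 100                                              # vocabulary size (number of visual words)
vocab = KMeans(n_clusters=K, n_init=4, random_state=0).fit(all_desc)   # step 2: learn vocabulary

def bag_of_words(path):
    desc = local_descriptors(path).astype(np.float32)
    words = vocab.predict(desc)                      # step 3: quantize to the nearest word
    hist, _ = np.histogram(words, bins=np.arange(K + 1))
    return hist / max(hist.sum(), 1)                 # step 4: normalized word frequencies

print(bag_of_words("img1.jpg"))
```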
1. Local feature extraction
• Sample patches and extract descriptors
2. Learning the visual vocabulary
Extracted descriptors from the training set
[Figure: clustering the extracted descriptors; the resulting cluster centres form the visual vocabulary]
Recall: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:
   D = Σ_k Σ_{i in cluster k} ||x_i − m_k||²
• Algorithm:
̶ Randomly initialize K cluster centers
̶ Iterate until convergence:
1. Assign each feature to the nearest center
2. Recompute each cluster center as the mean of all features assigned to it (a minimal sketch follows below)
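A minimal NumPy sketch of the k-means loop described above; it follows the assign/recompute steps directly rather than any particular library implementation:

```python
# K-means sketch: random initialization, then alternate assignment and mean update.
import numpy as np

def kmeans(X, K, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]   # randomly initialize K centers
    for _ in range(iters):
        # 1. Assign each feature to the nearest center (squared Euclidean distance)
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # 2. Recompute each center as the mean of the features assigned to it
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels

X = np.random.default_rng(1).normal(size=(300, 2))
centers, labels = kmeans(X, K=3)
print(centers)
```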
Visual vocabularies
Source: B. Leibe
Appearance codebook
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
“Classic” recognition pipeline
• Hand-crafted feature representation
• Trainable classifier
̶ Nearest Neighbor classifiers
̶ Support Vector machines
[Pipeline: image → hand-crafted feature representation → trainable classifier → prediction]
Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x
• All we need is a distance or similarity function for our inputs
• No training required!
[Figure: 2D feature space with training examples from two classes; a test point takes the label of its nearest training example]
Functions for comparing histograms
• L1 distance: D(h1, h2) = Σ_i |h1(i) − h2(i)|
• χ2 distance: D(h1, h2) = Σ_i (h1(i) − h2(i))² / (h1(i) + h2(i))
• Quadratic distance (cross-bin distance): D(h1, h2) = Σ_{i,j} A_ij (h1(i) − h2(j))²
• Histogram intersection (similarity function): I(h1, h2) = Σ_i min(h1(i), h2(i))
(a NumPy sketch of these functions follows below)
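A small NumPy sketch of these comparison functions; the cross-bin similarity matrix A is a hypothetical example:

```python
# Histogram comparison functions from the slide, sketched in NumPy.
import numpy as np

def l1_distance(h1, h2):
    return np.abs(h1 - h2).sum()

def chi2_distance(h1, h2, eps=1e-10):
    return (((h1 - h2) ** 2) / (h1 + h2 + eps)).sum()

def quadratic_distance(h1, h2, A):
    # Cross-bin distance: sum_ij A_ij * (h1[i] - h2[j])^2
    diff = h1[:, None] - h2[None, :]
    return (A * diff ** 2).sum()

def histogram_intersection(h1, h2):
    return np.minimum(h1, h2).sum()

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.3, 0.3, 0.4])
A = np.eye(3)                      # hypothetical bin-similarity matrix
print(l1_distance(h1, h2), chi2_distance(h1, h2),
      quadratic_distance(h1, h2, A), histogram_intersection(h1, h2))
```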
K-nearest neighbor classifier
• For a new point, find the k closest points from training data
• Vote for the class label using the labels of the k points (a sketch follows below)
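A minimal NumPy sketch of this k-NN rule (Euclidean distance, majority vote); the data and labels are toy placeholders:

```python
# K-NN sketch: find the k closest training points and vote on their labels.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)          # distance function (Euclidean)
    nearest = np.argsort(d)[:k]                      # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[counts.argmax()]                   # majority vote

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = (X_train[:, 0] > 0).astype(int)            # toy labels
print(knn_predict(X_train, y_train, np.array([0.5, -0.2]), k=5))
```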
K-nearest neighbor classifier
• Which classifier is more robust to outliers?
Credit: http://cs231n.github.io/classification/
K-nearest neighbor classifier
Credit: http://cs231n.github.io/classification/
Linear classifiers
• Find a linear function to separate the classes (a minimal sketch follows below):
f(x) = sgn(w · x + b)
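A minimal sketch of the decision rule f(x) = sgn(w · x + b) with hypothetical weights, just to make the notation concrete:

```python
# Linear classifier decision rule with assumed, illustrative parameters w and b.
import numpy as np

w = np.array([1.0, -2.0])      # weight vector (hypothetical)
b = 0.5                        # bias (hypothetical)

def f(x):
    return np.sign(w @ x + b)  # +1 on one side of the hyperplane, -1 on the other

print(f(np.array([2.0, 0.0])), f(np.array([-1.0, 1.0])))
```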
Visualizing linear classifiers
Credit: http://cs231n.github.io/classification/
Example learned weights at the end of learning for CIFAR-10.
Nearest neighbor vs. linear classifiers
• NN pros:
̶ Simple to implement
̶ Decision boundaries not necessarily linear
̶ Works for any number of classes
̶ Nonparametric method
• NN cons:
̶ Need good distance function
̶ Slow at test time
• Linear pros:
̶ Low-dimensional parametric representation
̶ Very fast at test time
• Linear cons:
̶ Works only for two classes (multi-class needs extensions, e.g. one-vs-rest)
̶ How to train the linear function?
̶ What if data is not linearly separable?
Linear classifiers
• When the data is linearly separable, there may be more than one separator (hyperplane)
Which separator is the best?
Support vector machines
• Find a hyperplane that maximizes the margin between the positive and
negative examples
x_i positive (y_i = +1): w · x_i + b ≥ 1
x_i negative (y_i = −1): w · x_i + b ≤ −1
Support vectors: the training points closest to the hyperplane
Distance between a point x_i and the hyperplane: |w · x_i + b| / ||w||
For support vectors, |w · x_i + b| = 1
Therefore, the margin is 2 / ||w||
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
Finding the maximum margin hyperplane
1. Maximize the margin 2 / ||w||
2. Correctly classify all training data:
   x_i positive (y_i = +1): w · x_i + b ≥ 1
   x_i negative (y_i = −1): w · x_i + b ≤ −1
• Quadratic optimization problem:
   minimize (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1 for all i
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf
SVM parameter learning
• Separable data:
   minimize (1/2) ||w||²  subject to  y_i (w · x_i + b) ≥ 1
   (maximize the margin while classifying all training data correctly)
• Non-separable data:
   minimize (1/2) ||w||² + C Σ_i max(0, 1 − y_i (w · x_i + b))
   (maximize the margin while minimizing classification mistakes; a subgradient sketch follows below)
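A hedged sketch of minimizing the non-separable (soft-margin) objective with plain subgradient descent in NumPy; practical SVM training uses dedicated solvers (e.g. scikit-learn's LinearSVC), so this is only illustrative:

```python
# Soft-margin linear SVM trained by subgradient descent on
# (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w·x_i + b)).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # toy, linearly separable labels

w, b, C = np.zeros(2), 0.0, 1.0
for t in range(1, 501):
    lr = 1.0 / (t + 100)                        # decreasing step size
    margins = y * (X @ w + b)
    viol = margins < 1                          # points inside the margin (hinge loss active)
    grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -C * y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print((np.sign(X @ w + b) == y).mean())         # training accuracy of the sketch
```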
SVM parameter learning
minimize (1/2) ||w||² + C Σ_i max(0, 1 − y_i (w · x_i + b))
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Φ: x→ φ(x)
Nonlinear SVMs
• Linearly separable dataset in 1D:
• Non-separable dataset in 1D:
• We can map the data to a higher-dimensional space, e.g. x → (x, x²); a sketch follows below
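A sketch of the 1D example above under the assumed mapping x → (x, x²), using scikit-learn's LinearSVC on the lifted features; the data points are made up:

```python
# A 1D dataset that is not linearly separable becomes separable after x -> (x, x^2).
import numpy as np
from sklearn.svm import LinearSVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.4, 2.0, 3.1])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1 ])   # middle points form one class

phi = np.stack([x, x ** 2], axis=1)      # lift to 2D: a line can now separate the classes
clf = LinearSVC(C=1.0).fit(phi, y)
print(clf.predict(phi))                  # should reproduce the labels on this toy set
```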
SVMs: Pros and cons
• Pros:
̶ The non-linear SVM framework is very powerful and flexible
̶ Training is a convex optimization problem, so the globally optimal solution can be found
̶ SVMs work very well in practice, even with very small training sample sizes
• Cons:
̶ No "direct" multi-class SVM; two-class SVMs must be combined (e.g., one-vs-others)
̶ Computation and memory cost (especially for nonlinear SVMs)