Computer Vision
Machine learning basics and recognition
Semester 1
Changjae Oh
Objectives
• To understand machine learning basics for high-level vision problems
Machine learning problems
Slide credit: J. Hays
Machine learning problems
Slide credit: J. Hays
Dimensionality Reduction
• Principal component analysis (PCA)
̶ PCA takes advantage of correlations in data dimensions to produce the best possible lower-dimensional representation based on linear projections (minimizes reconstruction error).
̶ PCA should be used for dimensionality reduction, not for discovering patterns or making predictions. Don’t try to assign semantic meaning to the bases. (A sketch follows after this list.)
• Locally Linear Embedding, Isomap, Autoencoder, etc.
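To make the projection concrete, here is a minimal NumPy sketch of PCA as a linear projection that minimizes reconstruction error; the data matrix X and the choice of k = 2 are illustrative assumptions, not from the slides.

```python
import numpy as np

def pca(X, k):
    """Project an N x D data matrix X onto its top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # centre the data
    # Principal directions via SVD of the centred data matrix
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                         # k x D basis of the subspace
    Z = Xc @ components.T                       # N x k low-dimensional codes
    X_rec = Z @ components + mean               # best rank-k linear reconstruction
    return Z, X_rec

# Example: reduce 100 points in 10-D to 2-D
X = np.random.randn(100, 10)
Z, X_rec = pca(X, k=2)
print(Z.shape, np.mean((X - X_rec) ** 2))       # codes and reconstruction error
```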
Machine learning problems
K-means clustering
(Figure: input image, clusters on intensity, clusters on color)
Mean shift algorithm
Spectral clustering
Group points based on links in a graph
Visual PageRank
• Determining importance by random walk
̶ What’s the probability that you will randomly walk to a given node?
̶ Create an adjacency matrix based on visual similarity
̶ Edge weights determine the probability of transition (see the sketch below)
C. Oh et al., Probabilistic Correspondence Matching using Random Walk with Restart, BMVC 2012
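A rough sketch of the random-walk scoring idea, not the authors' BMVC 2012 implementation; it assumes a non-negative visual-similarity adjacency matrix A and a standard damping factor alpha.

```python
import numpy as np

def random_walk_scores(A, alpha=0.85, iters=100):
    """Stationary distribution of a random walk on a similarity graph.
    A[i, j] is the (non-negative) visual similarity between nodes i and j."""
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)     # edge weights -> transition probabilities
    r = np.full(n, 1.0 / n)                  # start from a uniform distribution
    for _ in range(iters):
        # with probability alpha follow an edge, otherwise jump to a random node
        r = alpha * r @ P + (1 - alpha) / n
    return r                                  # higher score = more "important" node

# Toy example with 4 nodes
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(random_walk_scores(A))
```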
Machine learning problems
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Slide credit: L. Lazebnik
Machine learning framework
y = f(x)   (y: output, f: prediction function, x: image feature)
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never before seen test example x and output the predicted value y = f(x)
Slide credit: L. Lazebnik
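A schematic sketch of the training/testing protocol above, using scikit-learn's LogisticRegression as a stand-in prediction function and random arrays as stand-in features and labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training set {(x1, y1), ..., (xN, yN)}: features and labels
X_train = np.random.rand(200, 64)           # stand-in for extracted image features
y_train = np.random.randint(0, 3, 200)      # stand-in labels (e.g. apple/tomato/cow)

# Training: estimate f by minimising prediction error on the training set
f = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Testing: apply f to a never-before-seen example x and output y = f(x)
x_test = np.random.rand(1, 64)
print(f.predict(x_test))
```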
Machine learning framework
(Diagram. Training: Training Images + Training Labels → Image Features → Classifier Training → Trained Classifier.
Testing: Test Image → Image Features → Trained Classifier → Prediction.)
Image features
• Raw pixels
• Histograms
• GIST descriptors
• CNNs
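As one concrete example of a hand-crafted feature from this list, a small sketch that converts an image into a per-channel colour histogram; the bin count of 8 is an arbitrary choice.

```python
import numpy as np

def color_histogram(image, bins=8):
    """image: H x W x 3 uint8 array. Returns a normalised 3*bins feature vector."""
    feats = []
    for c in range(3):                                   # one histogram per channel
        h, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(h)
    feat = np.concatenate(feats).astype(float)
    return feat / feat.sum()                             # normalise so image size does not matter

image = (np.random.rand(120, 160, 3) * 255).astype(np.uint8)   # stand-in image
print(color_histogram(image).shape)                            # (24,)
```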
Learning a classifier
• Given some set of features with corresponding labels, learn a function to predict the labels from the features
Many classifiers to choose from
• Neural networks
• Naïve Bayes
• K-nearest neighbour
• Bayesian network
• Logistic regression
• Randomized Forests
• Boosted Decision Trees
• Deep Convolutional Network
Classifiers: Nearest neighbor
(Figure: a test example among training examples from class 1 and class 2)
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
Slide credit: S. Lazebnik
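A minimal sketch of this nearest-neighbour rule, assuming Euclidean distance as the distance function.

```python
import numpy as np

def nearest_neighbor_classify(X_train, y_train, x):
    """Return the label of the training example nearest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)     # distance to every training point
    return y_train[np.argmin(dists)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array([1, 1, 2, 2])                    # class 1 vs class 2
print(nearest_neighbor_classify(X_train, y_train, np.array([4.8, 5.1])))  # -> 2
```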
Classifiers: Linear
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)
Slide credit: L. Lazebnik
Example: Image Classification by K-NN Classifier
(Diagram: image → Feature Extraction (HOG) → Image Feature → K-NN Classifier → Prediction)
Recognition task and supervision
• Images in the training set must be annotated with the “correct answer” that the model is expected to produce
Contains a motorbike
Slide credit: S. Lazebnik
Spectrum of supervision
Computer vision
Supervised learning, Semi-supervised learning, Unsupervised learning, Reinforcement learning
Spectrum of supervision
Slide credit: S. Lazebnik
Generalisation
• How well does a learned model generalise from the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
Generalisation
• How well does a learned model generalise from the data it was trained on to a new test set?
EBU7240 Computer Vision
Changjae Oh
Classification
Semester 1, 2021
Overview of recognition tasks
A statistical learning approach
“Classic” or “shallow” classification pipeline
• “Bag of features” representation
• Classifiers: nearest neighbor, linear, SVM
Verification/Classification
Is this a building?
Adapted from Fei-Fei Li
Detection
Where are the people?
Adapted from Fei-Fei Li
Identification
Is this 天安門?
Adapted from Fei-Fei Li
Semantic Segmentation
Adapted from Fei-Fei Li
Object recognition
• A collection of related tasks for identifying objects in digital photographs.
• Consists of recognizing, identifying, and locating objects within a picture with a given degree of confidence.
(Figure: image classification, object detection, semantic segmentation, instance segmentation)
Image classification vs Object detection
• Image classification
̶ Identifying what is in the image and the associated level of confidence
̶ Can be binary-label or multi-label classification
• Object detection
̶ Localising and classifying one or more objects in an image
̶ Object localisation and image classification combined
Semantic segmentation vs Instance segmentation
• Semantic segmentation
̶ Assigning a label to every pixel in the image.
̶ Treating multiple objects of the same class as a single entity
• Instance segmentation
̶ Similar process to semantic segmentation, but identifies, for each pixel, the object instance it belongs to
̶ Treating multiple objects of the same class as distinct individual objects (or instances)
̶ Typically, instance segmentation is harder than semantic segmentation
Image classification
The machine learning framework
• Apply a prediction function to a feature representation of the image to
get the desired output:
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
Slide credit: L. Lazebnik
Machine learning framework
y = f(x)   (y: output, f: prediction function, x: image feature)
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never before seen test example x and output the predicted value y = f(x)
Slide credit: S. Lazebnik
Machine learning framework
(Diagram. Training: Training Images + Training Labels → Image Features → Classifier Training → Trained Classifier.
Testing: Test Image → Image Features → Trained Classifier → Prediction.)
“Classic” recognition pipeline
• Hand-crafted feature representation
• Off-the-shelf trainable classifier
(Diagram: Image Pixels → Feature representation → Trainable classifier → Class label)
“Classic” representation: Bag of features
• Representing images as orderless collections of local features
Motivation 1: Part-based models
• Various parts of the image are used separately to determine if and where an object of interest exists
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Motivation 2: Texture models
• Texture is characterised by the repetition of basic elements or textons
Texton histogram
“Texton dictionary”
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
Motivation 3: Bags of words
• Orderless document representation:
̶ Frequencies of words from a dictionary Salton & McGill (1983)
Motivation 3: Bags of words
• Orderless document representation:
̶ Frequencies of words from a dictionary Salton & McGill (1983)
Motivation 3: Bags of words
• Orderless document representation:
̶ Frequencies of words from a dictionary Salton & McGill (1983)
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
1. Local feature extraction
• Sample patches and extract descriptors
2. Learning the visual vocabulary
Extracted descriptors from the training set
Slide credit:
2. Learning the visual vocabulary
Clustering
2. Learning the visual vocabulary
Visual vocabulary
Clustering
K-means clustering
• Want to minimize the sum of squared Euclidean distances between features x_i and their nearest cluster centers m_k:
  D(X, M) = Σ_{clusters k} Σ_{points i in cluster k} (x_i − m_k)²
• Algorithm:
̶ Randomly initialize K cluster centers
̶ Iterate until convergence:
  - Assign each feature to the nearest center
  - Recompute each cluster center as the mean of all features assigned to it
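A compact NumPy sketch of the loop above; random initialisation from the data and a fixed iteration count stand in for a proper convergence test.

```python
import numpy as np

def kmeans(X, K, iters=20, seed=0):
    """X: N x D feature matrix. Returns K x D cluster centres and N assignments."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]    # randomly initialise K centres
    for _ in range(iters):
        # assign each feature to the nearest centre
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centre as the mean of the features assigned to it
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = kmeans(X, K=2)
print(centers)
```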
Visual vocabularies
Appearance codebook
Source: B. Leibe
Bag of features: Outline
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
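Putting the four steps together, a schematic sketch of the bag-of-features pipeline; the random descriptor arrays stand in for real local descriptors (e.g. SIFT), and the vocabulary size of 100 is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans

# 1-2. Extract local descriptors from training images and learn the visual vocabulary
train_descriptors = np.random.rand(5000, 128)         # stand-in for e.g. SIFT descriptors
vocab_size = 100
vocab = KMeans(n_clusters=vocab_size, n_init=10).fit(train_descriptors)

def bag_of_words(image_descriptors):
    """3-4. Quantise an image's descriptors and build a visual-word histogram."""
    words = vocab.predict(image_descriptors)          # nearest visual word for each descriptor
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()                          # normalised frequencies of visual words

image_descriptors = np.random.rand(300, 128)          # descriptors from one image
print(bag_of_words(image_descriptors).shape)          # (100,)
```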
“Classic” recognition pipeline
• Hand-crafted feature representation
• Trainable classifier
̶ Nearest neighbor classifiers
̶ Support vector machines
(Diagram: Image Pixels → Feature representation → Trainable classifier → Class label)
Classifiers: Nearest neighbor
(Figure: a test example among training examples from class 1 and class 2)
f(x) = label of the training example nearest to x
• All we need is a distance or similarity function for our inputs
• No training required!
Functions for comparing histograms
• L1 distance:  D(h1, h2) = Σ_{i=1..N} |h1(i) − h2(i)|
• χ² distance:  D(h1, h2) = Σ_{i=1..N} (h1(i) − h2(i))² / (h1(i) + h2(i))
• Quadratic distance (cross-bin distance):  D(h1, h2) = Σ_{i,j} A_{ij} (h1(i) − h2(j))²
• Histogram intersection (similarity function):  I(h1, h2) = Σ_{i=1..N} min(h1(i), h2(i))
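The same comparison functions written out in NumPy; h1 and h2 are assumed to be equal-length histograms, and A in the quadratic distance is a user-supplied bin-similarity matrix.

```python
import numpy as np

def l1_distance(h1, h2):
    return np.sum(np.abs(h1 - h2))

def chi2_distance(h1, h2, eps=1e-10):
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))      # eps avoids division by zero

def quadratic_distance(h1, h2, A):
    # cross-bin distance: A[i, j] weights bin i of h1 against bin j of h2
    diff = h1[:, None] - h2[None, :]
    return np.sum(A * diff ** 2)

def histogram_intersection(h1, h2):
    return np.sum(np.minimum(h1, h2))                    # similarity, not a distance

h1 = np.array([0.2, 0.5, 0.3])
h2 = np.array([0.1, 0.6, 0.3])
print(l1_distance(h1, h2), chi2_distance(h1, h2), histogram_intersection(h1, h2))
```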
K-nearest neighbor classifier
• For a new point, find the k closest points from the training data
• Vote for the class label with the labels of the k points
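A small sketch of the k-NN vote with Euclidean distance and a simple majority vote.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Vote among the labels of the k training points closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k closest points
    votes = np.bincount(y_train[nearest])
    return votes.argmax()                         # majority class label

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([4.5, 5.0]), k=3))   # -> 1
```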
• Which classifier is more robust to outliers?
Credit: http://cs231n.github.io/classification/
K-nearest neighbor classifier
Credit: http://cs231n.github.io/classification/
Linear classifiers
• Find a linear function to separate the classes: f(x) = sgn(w · x + b)
Visualizing linear classifiers
Example learned weights at the end of learning for CIFAR-10.
Credit: http://cs231n.github.io/classification/
Nearest neighbor vs. linear classifiers
• NN pros:
̶ Simple to implement
̶ Decision boundaries not necessarily linear
̶ Works for any number of classes
̶ Nonparametric method
• NN cons:
̶ Need good distance function
̶ Slow at test time
• Linear pros:
̶ Low-dimensional parametric representation
̶ Very fast at test time
• Linear cons:
̶ Works for two classes
̶ How to train the linear function?
̶ What if data is not linearly separable?
Linear classifiers
• When the data is linearly separable, there may be more than one separator (hyperplane)
Which separator is the best?
Support vector machines
• Find a hyperplane that maximizes the margin between the positive and negative examples
̶ x_i positive (y_i = 1):  x_i · w + b ≥ 1
̶ x_i negative (y_i = −1):  x_i · w + b ≤ −1
̶ For support vectors:  x_i · w + b = ±1
̶ Distance between a point and the hyperplane:  |x_i · w + b| / ||w||
̶ Therefore, the margin is 2 / ||w||
(Figure: margin and support vectors)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin hyperplane
1. Maximize the margin 2 / ||w||
2. Correctly classify all training data:
   x_i positive (y_i = 1):  x_i · w + b ≥ 1
   x_i negative (y_i = −1):  x_i · w + b ≤ −1
• Quadratic optimization problem:
   min_{w,b} (1/2) ||w||²   subject to   y_i (x_i · w + b) ≥ 1
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
SVM parameter learning
• Separable data:
   min_{w,b} (1/2) ||w||²   subject to   y_i (x_i · w + b) ≥ 1
   (maximize margin; classify training data correctly)
• Non-separable data:
   min_{w,b} (1/2) ||w||² + C Σ_i max(0, 1 − y_i (x_i · w + b))
   (maximize margin; minimize classification mistakes via the hinge loss)
SVM parameter learning
min_{w,b} (1/2) ||w||² + C Σ_i max(0, 1 − y_i (x_i · w + b))
Demo: http://cs.stanford.edu/people/karpathy/svmjs/demo
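One practical way to optimise this objective (not shown on the original slides) is scikit-learn's SVC with a linear kernel, whose C parameter plays the role of C above; the toy data here are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Two noisy, roughly linearly separable classes (toy data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

# C trades off a large margin (small ||w||) against hinge-loss violations
clf = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b, clf.score(X, y))        # learned hyperplane parameters and training accuracy
print(len(clf.support_))            # number of support vectors found
```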
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable
Nonlinear SVMs
• Linearly separable dataset in 1D
• Non-separable dataset in 1D
• We can map the data to a higher-dimensional space, e.g. x ↦ (x, x²)
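A sketch of the 1-D example above: explicitly mapping x to (x, x²) makes the classes linearly separable, which is what a kernel SVM does implicitly.

```python
import numpy as np
from sklearn.svm import SVC

# Non-separable 1-D data: class +1 in the middle, class -1 on both sides
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Explicit map to a higher-dimensional space: x -> (x, x^2)
phi = np.stack([x, x ** 2], axis=1)
clf = SVC(kernel='linear').fit(phi, y)
print(clf.score(phi, y))            # the mapped data are linearly separable

# A kernel SVM, e.g. SVC(kernel='rbf'), performs a similar mapping implicitly,
# without ever computing phi(x) explicitly.
```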
SVMs: Pros and cons
• Pros:
̶ Non-linear SVM framework is very powerful and flexible
̶ Training is convex optimization; a globally optimal solution can be found
̶ SVMs work very well in practice, even with very small training sample sizes
• Cons:
̶ No “direct” multi-class SVM; must combine two-class SVMs (e.g., with one-vs-others)
̶ Computation and memory cost (especially for nonlinear SVMs)