CSE 473/573
Introduction to Computer Vision and Image Processing
1
Using K-NN
• Simple, a good one to try first
• With infinite examples, the 1-NN rule provably has an error rate that is at most twice the Bayes-optimal error
2
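As a minimal, hedged sketch of what "using k-NN" looks like in practice (the data, feature dimension, and choice of k below are made up; scikit-learn is just one convenient implementation):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: rows are feature vectors (e.g., flattened patches or descriptors),
# labels are integer class ids. Both are synthetic, for illustration only.
X_train = np.random.rand(100, 128)
y_train = np.random.randint(0, 3, size=100)

# k=1 gives the 1-NN rule referred to above; larger k smooths the decision.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

X_test = np.random.rand(5, 128)
print(knn.predict(X_test))  # predicted class id for each test vector
```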
Sec. 15.1
Support Vector Machine (SVM)
• SVMs maximize the margin around the separating hyperplane.
‐ A.k.a. large margin classifiers
• The decision function is fully specified by a subset of training samples, the support vectors.
• Solving SVMs is a quadratic programming problem
• Seen by many as the most successful current text classification method*
[Figure: the maximum-margin separating hyperplane, with support vectors on the margin, compared to a hyperplane with a narrower margin]
*but other discriminative methods often perform very similarly
3
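A rough illustration of the margin idea on synthetic two-class data, using scikit-learn's SVC (the data and the linear kernel are my choices; the support vectors the slide mentions are exposed as `support_vectors_`):

```python
import numpy as np
from sklearn.svm import SVC

# Two synthetic, roughly linearly separable classes (illustrative only).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([1] * 20 + [-1] * 20)

# A linear SVM: fitting it solves the quadratic program mentioned above.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors determine the decision function.
print(clf.support_vectors_.shape)
print(clf.decision_function(X[:3]))  # signed margin-like scores
```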
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple
two-class SVMs
• One vs. others
• Training: learn an SVM for each class vs. the others
• Testing: apply each SVM to test example and assign to it the class of the SVM that returns the highest decision value
• One vs. one
• Training: learn an SVM for each pair of classes
• Testing: each learned SVM “votes” for a class to assign to the test example
4
Slide credit: L. Lazebnik
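A sketch of both combination schemes using scikit-learn wrappers (the data and class count are placeholders; `OneVsRestClassifier` and `OneVsOneClassifier` implement the "highest decision value" and "voting" rules described above):

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import LinearSVC

# Synthetic 4-class problem, purely for illustration.
rng = np.random.RandomState(1)
X = rng.randn(200, 16)
y = rng.randint(0, 4, size=200)

# One vs. others: one SVM per class; predict the class with the highest score.
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)

# One vs. one: one SVM per pair of classes; each SVM votes for a class.
ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)

print(ovr.predict(X[:5]), ovo.predict(X[:5]))
```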
SVMs: Pros and cons
• Pros
• Many publicly available SVM packages:
http://www.kernel-machines.org/software
• Kernel-based framework is very powerful, flexible
• SVMs work very well in practice, even with very small training sample sizes
• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
• Computation, memory
‐ During training time, must compute matrix of kernel values for every pair of examples
‐ Learning can take a very long time for large-scale problems
5
Machine Learning Considerations
• 3 important design decisions:
1) What data do I use?
2) How do I represent my data (what feature)?
3) What classifier / regressor / machine learning tool do I use?
• These are in decreasing order of importance
• Deep learning addresses 2 and 3 simultaneously (and blurs the
boundary between them).
• You can take the representation from deep learning and use it with any classifier.
6
What to remember about classifiers
• No free lunch: machine learning algorithms are tools, not dogmas
• Try simple classifiers first
• Better to have smart features and simple classifiers than simple
features and smart classifiers
• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)
7
Slide credit: D. Hoiem
RECOGNITION
Slide Credits: Hays
8
9
[Figure: object category hierarchy. OBJECTS split into ANIMALS, PLANTS, INANIMATE, …; ANIMALS include VERTEBRATE, then MAMMALS (tapir, boar) and BIRDS (grouse); INANIMATE includes NATURAL and MAN-MADE (camera).]
10
11
Computer Vision?
• Object Detection
• Scene Categorization
• Scene Tagging
• Image Parsing & Segmentation
• Scene Understanding?
Svetlana Lazebnik
[Figure: a market scene annotated with region labels: sky, tree, banner, people, building, building, street lamp, market; detection query: find pedestrians?]
Example scene tags:
• outdoor/indoor
• city/forest/factory/etc.
• street
• people
• building
• mountain
• tourism
• cloudy
• brick
• …
12
Recognition is all about modeling variability
Variability:
• Camera position
• Illumination
• Shape parameters
• Within-class variations?
13
Svetlana Lazebnik
Within-class variations
14
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
Variability: camera position, illumination
Shape: assumed known
Focus: alignment
Roberts (1965); Lowe (1987); Faugeras & Hebert (1986); Grimson & Lozano-Perez (1986); Huttenlocher & Ullman (1987)
15
Svetlana Lazebnik
Recognition as an alignment problem: Block world
16
L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
J. Mundy, Object Recognition in the Geometric Era: a Retrospective, 2006
History of ideas in recognition
• 1960s – early 1990s: the geometric era
Variability: invariance to camera position, illumination, internal parameters
Duda & Hart ( 1972); Weiss (1987); Mundy et al. (1992-94);
Rothwell et al. (1992); Burns et al. (1993)
17
Svetlana Lazebnik
Example: invariant to similarity transformations computed from four points (labeled A, B, C, D in the figure)
Projective invariants (Rothwell et al., 1992):
General 3D objects do not admit monocular viewpoint invariants (Burns et al., 1993)
18
Representing and recognizing object categories is harder…
ACRONYM (Brooks and Binford, 1981)
Binford (1971), Nevatia & Binford (1972), Marr & Nishihara (1978)
19
Recognition by components
Primitives (geons) and objects, Biederman (1987)
http://en.wikipedia.org/wiki/Recognition_by_Components_Theory
20
Svetlana Lazebnik
General shape primitives?
Generalized cylinders: Ponce et al. (1989); Zisserman et al. (1995); Forsyth (2000)
21
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
Empirical models of image variability
Appearance-based techniques
Turk & Pentland (1991); Murase & Nayar (1995); etc.
22
Svetlana Lazebnik
Eigenfaces (Turk & Pentland, 1991)
23
Svetlana Lazebnik
Color Histograms
Swain and Ballard, Color Indexing, IJCV 1991.
24
Svetlana Lazebnik
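A hedged sketch of a Swain & Ballard-style color histogram descriptor with histogram intersection as the similarity (the bin counts and the use of OpenCV are my choices, not the paper's exact setup; file names are placeholders):

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=(8, 8, 8)):
    """Joint 3D histogram over the B, G, R channels, L1-normalized."""
    hist = cv2.calcHist([image_bgr], [0, 1, 2], None, list(bins),
                        [0, 256, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / (hist.sum() + 1e-8)

def intersection(h1, h2):
    """Histogram intersection, the matching score used for color indexing."""
    return np.minimum(h1, h2).sum()

# img = cv2.imread("query.jpg")  # hypothetical file name
# print(intersection(color_histogram(img), color_histogram(img)))
```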
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• 1990s – present: sliding window approaches
25
Svetlana Lazebnik
Sliding window approaches
26
Sliding window approaches
• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Agarwal and Roth, 2002
• Poggio et al., 1993
27
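A generic sliding-window loop, independent of any particular paper above: a scoring function (the `score_window` argument is a hypothetical stand-in for any trained classifier) is evaluated at every window position, and high-scoring windows are kept. In practice this is repeated over an image pyramid to handle scale.

```python
import numpy as np

def sliding_window_detect(image, window=(64, 128), stride=8,
                          score_window=None, thresh=0.5):
    """Slide a fixed-size (width, height) window over the image and keep
    windows whose classifier score exceeds a threshold."""
    h, w = image.shape[:2]
    ww, wh = window
    detections = []
    for y in range(0, h - wh + 1, stride):
        for x in range(0, w - ww + 1, stride):
            patch = image[y:y + wh, x:x + ww]
            score = score_window(patch)
            if score > thresh:
                detections.append((x, y, ww, wh, score))
    return detections

# Example with a dummy scorer; real systems plug in an SVM, cascade, etc.
img = np.zeros((240, 320), dtype=np.uint8)
dets = sliding_window_detect(img, score_window=lambda p: p.mean() / 255.0)
```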
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
28
Svetlana Lazebnik
Local features for object instance recognition
D. Lowe (1999, 2004)
29
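A sketch of instance recognition with local features in the spirit of Lowe's approach, using OpenCV's ORB detector and brute-force matching with a ratio test (ORB is substituted here simply because it ships with OpenCV; the image file names are placeholders):

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)

img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder paths
img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with a Lowe-style ratio test to drop ambiguous matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), "tentative correspondences")
```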
Large-scale image search
Combining local features, indexing, and spatial constraints
Image credit: K. Grauman and B. Leibe
30
Large-scale image search
Combining local features, indexing, and spatial constraints
Philbin et al. ‘07
31
Large-scale image search
Combining local features, indexing, and spatial constraints
32
Svetlana Lazebnik
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
33
Parts-and-shape models
• Model:
• Object as a set of parts
• Relative locations between parts
• Appearance of each part
34
Figure from [Fischler & Elschlager 73]
Constellation models
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
35
Representing people
36
Discriminatively trained part-based models
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, “Object Detection with Discriminatively Trained Part-Based Models,” PAMI 2009
37
History of ideas in recognition
• 1960s – early 1990s: the geometric era
• 1990s: appearance-based models
• Mid-1990s: sliding window approaches
• Late 1990s: local features
• Early 2000s: parts-and-shape models
• Mid-2000s: bags of features
38
Svetlana Lazebnik
Bag-of-features models
39
Svetlana Lazebnik
Bag-of-features models
Object
Bag of ‘words’
40
Svetlana Lazebnik
Objects as texture
• All of these are treated as being the same
• No distinction between foreground and background: scene recognition?
41
Svetlana Lazebnik
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic elements or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
42
Origin 1: Texture recognition
[Figure: per-image texton histograms computed against a universal texton dictionary]
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
43
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
44
Origin 2a: Noisy Bag-of-words models
• What if we have noisy text (e.g., OCR output)?
• Consider just relationships between characters…
• Bi-Gram Models
• Tri-Gram Models
Computer Vision = CO + OM + MP + PU + UT + TE + ER + VI + IS + SI + IO + ON; build a histogram of these bigrams …
45
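A small sketch of the character bi-gram idea above (the choice to uppercase and to count bigrams within each word separately is mine, matching the Computer Vision example):

```python
from collections import Counter

def bigram_histogram(text):
    """Histogram of adjacent character pairs, computed within each word."""
    counts = Counter()
    for word in text.upper().split():
        counts.update(a + b for a, b in zip(word, word[1:]))
    return counts

print(bigram_histogram("Computer Vision"))
# Counter with CO, OM, MP, PU, UT, TE, ER, VI, IS, SI, IO, ON (each once)
```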
Bag-of-features steps
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
46
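A compact sketch of steps 2–4 from the list above, using k-means over a stack of local descriptors (the descriptor source, vocabulary size, and use of scikit-learn are assumptions, not part of the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

# Step 1 (assumed done elsewhere): descriptors_per_image is a list of
# (n_i x d) arrays of local descriptors, one array per training image.
rng = np.random.RandomState(0)
descriptors_per_image = [rng.rand(rng.randint(50, 200), 128) for _ in range(10)]

# Step 2: learn the visual vocabulary by clustering all descriptors.
vocab_size = 64
kmeans = KMeans(n_clusters=vocab_size, n_init=10, random_state=0)
kmeans.fit(np.vstack(descriptors_per_image))

# Steps 3-4: quantize each image's descriptors and build a word histogram.
def bof_histogram(descriptors):
    words = kmeans.predict(descriptors)           # index of nearest codevector
    hist = np.bincount(words, minlength=vocab_size).astype(float)
    return hist / hist.sum()                      # frequencies of visual words

print(bof_histogram(descriptors_per_image[0]).shape)  # (64,)
```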
23
3/30/2021
1. Feature extraction
• Regular grid or interest regions
47
1. Feature extraction
[Figure: detect patches → normalize patch → compute descriptor]
Slide credit: Josef Sivic
48
24
3/30/2021
1. Feature extraction
…
Slide credit: Josef Sivic
49
2. Learning the visual vocabulary
…
Slide credit: Josef Sivic
50
25
3/30/2021
2. Learning the visual vocabulary
…
Clustering
Slide credit: Josef Sivic
51
…
Visual vocabulary
Clustering
Slide credit: Josef Sivic
52
26
3/30/2021
Clustering and vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
• Unsupervised learning process
• Each cluster center produced by k-means becomes a codevector
• Codebook can be learned on a separate training set
• Provided the training set is sufficiently representative, the codebook will be “universal”
• The codebook is used for quantizing features
• A vector quantizer takes a feature vector and maps it to the index
of the nearest codevector in a codebook
• Codebook = visual vocabulary
• Codevector = visual word
53
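The quantization step in isolation: mapping a feature vector to the index of its nearest codevector, as a plain NumPy sketch (the codebook here is a made-up array standing in for one learned by clustering):

```python
import numpy as np

codebook = np.random.rand(64, 128)   # hypothetical: 64 codevectors of dim 128

def quantize(feature, codebook):
    """Return the index of the nearest codevector (the 'visual word' id)."""
    dists = np.linalg.norm(codebook - feature, axis=1)
    return int(np.argmin(dists))

word = quantize(np.random.rand(128), codebook)
```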
Example Codebook
Appearance codebook
54
Source: B. Leibe
…
Another codebook
… … … …
Appearance codebook
…
55
Source: B. Leibe
Visual vocabularies: Issues
• How to choose vocabulary size?
• Too small: visual words not representative of all patches
• Too large: quantization artifacts, overfitting
• Computational efficiency
• Vocabulary trees
(Nister & Stewenius, 2006)
56
But what about layout?
All of these images have the same color histogram
57
Spend 30 minutes writing up ideas of how the following may be solved. Think about how we moved from pixels to features to ????
• What other tools do we have in our tool bag that can now be applied to “objects”?
58
• What can we do to deal with layout for recognition?
• What can we do about the brute-force matching problem you may have seen in the project?