CMT107 Visual Computing
Object Recognition
School of Computer Science and Informatics
Cardiff University
• Object Recognition
• Overview
• What “Works” Today
• Machine Learning Approach for Recognition
• The Machine Learning Framework
• Classifiers
• Nearest neighbour
• Recognition Task and Supervision
• Generalization
• Datasets
• Face Detection and Recognition
• The Viola/Jones Face Detector
• Face Recognition
Object Recognition
• Object recognition is the task of finding a given object in an image or a video.
• The object recognition problem can be defined as a labelling problem based
on models of known objects.
• Object recognition approaches:
• Geometric model-based methods
• Appearance-based methods
• Feature-based methods
How many Visual Object Categories are there?
Biederman 1987
Slides adapted from Fei-Fei Li and others
How many Visual Object Categories are there?
Visual Object Categories
[Taxonomy figure: ANIMALS / PLANTS / INANIMATE; INANIMATE → NATURAL / MAN-MADE (e.g. CAMERA); ANIMALS → VERTEBRATE → MAMMALS (BOAR, TAPIR), BIRDS (GROUSE), …]
Specific Recognition Tasks
Scene Categorisation
• outdoor/indoor
• city/forest/factory/etc.
Image Annotation/Tagging
• building
• mountain
Object Detection
• find pedestrians
Image Parsing
street lamp
Image Understanding?
Recognition is All about Modelling Variability
• Variability
• Camera position
• Illumination
• Shape parameters
Within-class variations?
Within-class Variations
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
Recognition is All about Modelling Variability
• Variability
• Camera position
• Illumination
• Shape parameters: assumed known
Roberts (1965); Lowe (1987);
Faugeras & Hebert (1986);
Grimson & Lozano-Perez (1986);
Huttenlocher & Ullman (1987)
• Alignment: fitting a model to a transformation between pairs of features
(matches) in two images
• Find the transformation T that minimizes Σᵢ residual(T(xᵢ), x′ᵢ)
Recognition as an Alignment Problem
J. Mundy, Object Recognition in the Geometric Era: a Retrospective, 2006
L. G. Roberts, Machine
Perception of Three
Dimensional Solids, Ph.D.
thesis, MIT Department of
Electrical Engineering, 1963.
http://www.di.ens.fr/~ponce/mundy.pdf
http://www.packet.cc/files/mach-per-3D-solids.html
Recognition is All about Modelling Variability
• Variability
• Camera position
• Illumination
• Internal parameters: handled using geometric invariants
Duda & Hart (1972);
Weiss (1987);
Mundy et al. (1992-94);
Rothwell et al. (1992);
Burns et al. (1993)
Representing and Recognising Object Categories is Harder…
ACRONYM (Brooks and Binford, 1981)
Binford (1971), Nevatia & Binford (1972), Marr & Nishihara (1978)
Recognition by Components
Primitives (geons) Objects
http://en.wikipedia.org/wiki/Recognition_by_Components_Theory
Biederman (1987)
General Shape Primitives
Zisserman et al. (1995)
Generalized cylinders
Ponce et al. (1989)
Forsyth (2000)
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
Recognition is All about Modelling Variability
• Empirical models of image variability
• Appearance-based models:
Turk & Pentland (1991); Murase & Nayar (1995); etc.
Eigenfaces (Turk & Pentland 1991)
Colour Histograms
Swain and Ballard, Color Indexing, IJCV 1991.
http://www.inf.ed.ac.uk/teaching/courses/av/LECTURE_NOTES/swainballard91.pdf
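Swain and Ballard's colour-indexing idea can be sketched in a few lines of Python. The 8-bins-per-channel quantisation and the toy pixel lists below are illustrative choices, not the paper's exact setup:

```python
def colour_histogram(pixels, bins=8):
    """Quantise a list of (r, g, b) pixels into a normalised bins^3 histogram."""
    hist = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Swain & Ballard's match score: sum of element-wise minima (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Because the histogram discards all spatial layout, the score is invariant to translation and rotation of the object, which is exactly the strength and the weakness noted above.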
Appearance Manifolds
H. Murase and S. Nayar, Visual learning and recognition of 3-d objects
from appearance, IJCV 1995
Limitations of Global Appearance Models
• Requires global registration of patterns
• Not robust to clutter, occlusion, geometric transformations
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
• Mid-1990s – sliding window approaches
Sliding Window Approaches
• Turk and Pentland, 1991
• Belhumeur, Hespanha, & Kriegman, 1997
• Schneiderman & Kanade, 2004
• Viola and Jones, 2000
• Agarwal and Roth, 2002
• Poggio et al., 1993
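The sliding-window scheme itself is just two nested loops over window positions at each scale. A minimal sketch, where the window size, stride, and scoring function are placeholder choices:

```python
def sliding_windows(width, height, win=24, step=4):
    """Yield the top-left corner of every window position at one scale."""
    for y in range(0, height - win + 1, step):
        for x in range(0, width - win + 1, step):
            yield (x, y)

def detect(score, width, height, threshold=0.5, win=24, step=4):
    """Evaluate a scoring function on every window; keep windows above threshold.
    A full detector would repeat this over a pyramid of image scales."""
    return [(x, y) for (x, y) in sliding_windows(width, height, win, step)
            if score(x, y, win) > threshold]
```

The cost of this loop is what motivates the fast per-window evaluation and early rejection ideas in the Viola/Jones detector later in these slides.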
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
• Mid-1990s – sliding window approaches
• Late-1990s – local features
Local features for Object Instance Recognition
D. Lowe (1999, 2004)
Large-Scale Image Search
• Combine local features, indexing, and spatial constraints
Image credit: K. Grauman and B. Leibe
Large-Scale Image Search
• Combine local features, indexing, and spatial constraints
Philbin et al. ‘07
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
• Mid-1990s – sliding window approaches
• Late-1990s – local features
• Early-2000s – parts-and-shape models
Parts-and-Shape Models
• Object as a set of parts
• Relative locations between parts
• Appearance of parts
Figure from [Fischler & Elschlager 73]
Constellation Models
Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)
Pictorial Structure Models
• Representing people
Part Appearance / Part Geometry
Fischler and Elschlager (1973), Felzenszwalb and Huttenlocher (2000)
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
• Mid-1990s – sliding window approaches
• Late-1990s – local features
• Early-2000s – parts-and-shape models
• Mid-2000s – bags of features
Bag-of-Features Model
Bag of ‘words’
Objects as Texture
• All of these are being treated as the same
• No distinction between foreground and background: Scene recognition?
History of Ideas in Recognition
• 1960s – early 1990s: the geometric era
• 1990s – appearance-based models
• Mid-1990s – sliding window approaches
• Late-1990s – local features
• Early-2000s – parts-and-shape models
• Mid-2000s – bags of features
• Present trends: combination of local and global methods, data-driven
methods, context
Global Scene Descriptors
• The “gist” of a scene: Oliva & Torralba (2001)
http://people.csail.mit.edu/torralba/code/spatialenvelope/
Data Driven Methods
J. Hays and A. Efros,
Scene Completion using
Millions of Photographs,
SIGGRAPH 2007
http://graphics.cs.cmu.edu/projects/scene-completion/
Data Driven Methods
J. Tighe and S. Lazebnik, ECCV 2010
Geometric Context
D. Hoiem, A. Efros, and M. Herbert. Putting Objects in Perspective. CVPR 2006.
http://www.cs.uiuc.edu/homes/dhoiem/projects/pop/
Discriminatively Trained Part-based Models
P. Felzenszwalb, R. Girshick, D.
McAllester, D. Ramanan, “Object
Detection with Discriminatively
Trained Part-Based Models,”
http://www.ics.uci.edu/~dramanan/papers/latentmix.pdf
What “Works” Today
• Reading license plates, postcodes, cheques
• Fingerprint recognition
• Face detection
• Recognition of flat textured objects (CD covers, book covers, etc.)
Recognition: A Machine Learning Approach
Slides adapted from Fei-Fei Li, Derek Hoiem, and others
The Machine Learning Framework
• Apply a prediction function to a feature representation of the image to get
the desired output
f( ) = “apple”
f( ) = “tomato”
f( ) = “cow”
The Machine Learning Framework
• Training: given a training set of labelled examples {(x1,y1), …, (xN,yN)},
estimate the prediction function f by minimizing the prediction error on the
training set
• Testing: apply f to a never-before-seen test example x (its image features) and output the predicted value y = f(x)
The Machine Learning Framework
• Example image features:
• Raw pixels
• Histograms
• GIST descriptors
Classifier: Nearest Neighbour
f(x) = label of the training sample nearest to x
• All we need is a distance function for the inputs
• No training required!
Training examples
from class 1
Training examples
from class 2
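The nearest-neighbour rule can be written in a few lines. This sketch uses Euclidean distance on toy 2-D feature vectors; a real system would use high-dimensional image features, but the logic is the same:

```python
import math

def nearest_neighbour(training_set, x):
    """training_set: list of (feature_vector, label) pairs.
    Returns the label of the training sample closest to x."""
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    features, label = min(training_set, key=lambda s: euclidean(s[0], x))
    return label
```

Note that all the work happens at test time: there is no training step, but classification requires comparing against every stored example.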
Classifier: Linear
• Find a linear function to separate the classes:
f(x) = sgn(w·x + b)
Training examples
from class 1
Training examples
from class 2
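A sketch of the linear classifier, paired with a simple perceptron update to fit w and b. The perceptron is one illustrative way to learn the separating hyperplane; the slides do not prescribe a particular training algorithm:

```python
def linear_classify(w, b, x):
    """f(x) = sgn(w·x + b): +1 for class 1, -1 for class 2."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def perceptron_train(data, epochs=100, lr=1.0):
    """data: list of (x, y) with y in {+1, -1}. Nudges the hyperplane
    towards each misclassified example; converges if the data are separable."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in data:
            if linear_classify(w, b, x) != y:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b
```

Unlike nearest neighbour, training does the expensive work here; testing is a single dot product.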
Recognition Task and Supervision
• Images in the training set must be annotated with the “correct answer” that
the model is expected to produce
“Contains a motorbike”
Spectrum of Supervision
Unsupervised “Weakly” supervised Fully supervised
Definition depends on task
Generalisation
• How well does a learned model generalise from the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
Generalisation
• Components of generalisation error
• Bias: how much the average model over all training sets differs from the true model
– Error due to inaccurate assumptions/simplifications made by the model
• Variance: how much models estimated from different training sets differ from each other
• Underfitting: model is too “simple” to represent all the relevant class
characteristics
• High bias and low variance
• High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant characteristics (noise)
in the data
• Low bias and high variance
• Low training error and high test error
Bias-Variance Tradeoff
[Figure: training error falls and test error rises as model complexity grows — low complexity gives high bias/low variance, high complexity gives low bias/high variance]
Effect of Training Size
[Figure: test error vs. complexity for many vs. few training examples — complex, low-bias models need more data to keep variance down]
Effect of Training Size
[Figure: for a fixed prediction model, generalisation error decreases as the number of training examples grows]
Datasets
• Circa 2001: 5 categories, 100s of images per category
• Circa 2004: 101 categories
• Today: up to thousands of categories, millions of images
Caltech 101 & 256
Griffin, Holub, Perona, 2007
Fei-Fei, Fergus, Perona, 2004
http://www.vision.caltech.edu/Image_Datasets/Caltech101/
http://www.vision.caltech.edu/Image_Datasets/Caltech256/
Caltech 101: Intraclass Variability
Face Detection
Behold a state-of-the-art face detector!
(Courtesy Boris Babenko)
http://vision.ucsd.edu/~bbabenko/
Face Detection and Recognition
Detection Recognition “Sally”
Consumer Application: Apple iPhoto
http://www.apple.com/ilife/iphoto/
Consumer Application: Apple iPhoto
Consumer Application: Apple iPhoto
• Things iPhoto thinks are faces
Funny Nikon Ads
“The Nikon S60 detects up to 12 faces.”
Challenges of Face Detection
• A sliding window detector must evaluate tens of thousands of location/scale combinations
• Faces are rare: 0 – 10 per image
• For computational efficiency, we should spend as little time as possible on the non-face windows
• A megapixel image has ~10^6 pixels and a comparable number of candidate face windows
• To avoid a false positive in every image, the false positive rate must be below ~10^-6
The Viola/Jones Face Detector
• A seminal approach to real-time object detection
• Training is slow but detection is very fast
• Key ideas
• Integral images for fast feature evaluation
• Boosting for feature selection
• Attentional cascade for fast rejection of non-face windows
P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features.
CVPR 2001.
P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
http://research.microsoft.com/en-us/um/people/viola/pubs/detect/violajones_cvpr2001.pdf
http://www.vision.caltech.edu/html-files/EE148-2005-Spring/pprs/viola04ijcv.pdf
Image Features
• “Rectangular filters”
• Value = ∑(pixels in white area) − ∑(pixels in black area)
Fast Computation with Integral Images
• The integral image computes a
value at each pixel (x,y) that is the
sum of the pixel values above and
to the left of (x,y), inclusive
• This can quickly be computed in
one pass through the image
Computing the Integral Image
Computing the Integral Image
• Cumulative row sum: s(x, y) = s(x−1, y) + i(x, y)
• Integral image: ii(x, y) = ii(x, y−1) + s(x, y)
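The one-pass recurrence above translates directly into code. A minimal sketch using plain nested lists for the image:

```python
def integral_image(img):
    """img: 2D list of pixel values. Returns ii with ii[y][x] equal to the
    sum of img over the rectangle from (0, 0) to (x, y), inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        s = 0                                  # cumulative row sum s(x, y)
        for x in range(w):
            s += img[y][x]
            ii[y][x] = s + (ii[y - 1][x] if y > 0 else 0)
    return ii
```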
Computing Sum within a Rectangle
• Let A, B, C, D be the values of the integral image at the corners of a rectangle
• Then the sum of original image values within the rectangle can be computed as:
sum = A − B − C + D
• Only 3 additions/subtractions are required for any size of rectangle!
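Given an integral image, any rectangle sum takes constant time. This sketch uses the common convention where (x0, y0) is the top-left and (x1, y1) the bottom-right corner, inclusive; the A, B, C, D labelling on the slide depends on its figure:

```python
def rect_sum(ii, x0, y0, x1, y1):
    """Sum of original pixel values inside the rectangle, in O(1):
    at most three additions/subtractions whatever the rectangle size."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]          # strip to the left of the rectangle
    if y0 > 0:
        total -= ii[y0 - 1][x1]          # strip above the rectangle
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]      # added back: it was subtracted twice
    return total
```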
Integral Image
Black = A-B-C+D
White = C-D-E+F
Value = White-Black = -A+B+2C-2D-E+F
Feature Selection
• For a 24×24 detection region, the number of possible rectangle features is ~160,000
• At test time, it is impractical to evaluate the entire feature set
• Can we create a good classifier using just a small subset of all possible features?
• How do we select such a subset?
Boosting
• Boosting is a classification scheme that combines weak learners into a more accurate ensemble classifier
• Training procedure
• Initially, weight each training example equally
• In each boosting round:
• Find the weak learner that achieves the lowest weighted training error
• Raise the weights of training examples misclassified by current weak learner
• Compute final classifier as linear combination of all weak learners (weight of each
learner is directly proportional to its accuracy)
• Exact formulas for re-weighting and combining weak learners depend on the particular boosting
scheme (e.g., AdaBoost)
Y. Freund and R. Schapire, A short introduction to boosting, Journal of Japanese Society for Artificial Intelligence,
14(5):771-780, September, 1999.
http://www.cs.princeton.edu/~schapire/uncompress-papers.cgi/FreundSc99.ps
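The boosting loop above can be sketched with decision stumps as weak learners. This is a generic discrete AdaBoost illustration on toy feature vectors, not Viola and Jones's exact implementation (which uses rectangle-feature stumps over tens of thousands of features):

```python
import math

def stump_predict(f, t, p, x):
    """Weak learner: +1 if parity p times feature f of x is below p times threshold t."""
    return 1 if p * x[f] < p * t else -1

def train_stump(data, weights):
    """Pick the (feature, threshold, parity) with the lowest weighted error."""
    best = None
    for f in range(len(data[0][0])):
        for t in sorted({x[f] for x, _ in data}):
            for p in (1, -1):
                err = sum(w for (x, y), w in zip(data, weights)
                          if stump_predict(f, t, p, x) != y)
                if best is None or err < best[3]:
                    best = (f, t, p, err)
    return best

def adaboost(data, rounds=5):
    """data: list of (x, y) with y in {+1, -1}. Returns weighted weak learners."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        f, t, p, err = train_stump(data, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)   # learner weight grows with accuracy
        ensemble.append((alpha, f, t, p))
        # raise the weights of misclassified examples, lower the rest, renormalise
        weights = [w * math.exp(-alpha * y * stump_predict(f, t, p, x))
                   for (x, y), w in zip(data, weights)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble

def strong_classify(ensemble, x):
    """Final classifier: sign of the weighted vote of the weak learners."""
    score = sum(a * stump_predict(f, t, p, x) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1
```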
Boosting for Face Detection
• Define weak learners based on rectangle features
• For each round of boosting:
• Evaluate each rectangle filter on each example
• Select best filter/threshold combination based on weighted training error
• Reweight examples
h(x) = 1 if p·f(x) < p·θ, and 0 otherwise
where f(x) is the value of the rectangle feature, θ is the threshold, and p is the parity
Boosting for Face Detection
• First two features selected by boosting
• This feature combination can yield a 100% detection rate and a 50% false positive rate
Boosting for Face Detection
• A 200-feature classifier can yield 95% detection rate and a false positive rate
of 1 in 14084
Receiver operating characteristic (ROC) curve
Not good enough!
Attentional Cascade
• We start with simple classifiers which reject many of the negative sub-windows while detecting almost all positive sub-windows
• A positive response from the first classifier triggers the evaluation of a second (more complex) classifier, and so on
• A negative outcome at any point leads to the immediate rejection of the sub-window
[Cascade diagram: sub-window → Classifier 1 → Classifier 2 → Classifier 3; any stage can reject]
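The early-rejection logic is a short loop. In this sketch the stage classifiers and thresholds are placeholders (each stage would really be a boosted classifier over rectangle features):

```python
def cascade_classify(stages, window):
    """stages: list of (scorer, threshold) pairs, simplest stage first.
    A window is rejected the moment any stage scores it below its
    threshold; only windows that survive every stage are detections."""
    for scorer, threshold in stages:
        if scorer(window) < threshold:
            return False          # immediate rejection: later stages never run
    return True
```

Because most sub-windows fail an early, cheap stage, the average cost per window stays very low even though the full cascade is deep.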
Attentional Cascade
• Chain classifiers that are progressively
more complex and have lower false
positive rates:
[Cascade diagram: sub-window → Classifier 1 → Classifier 2 → Classifier 3]
[ROC curve: each stage's threshold sets its trade-off between detection rate and false positives]
Attentional Cascade
• The detection rate and the false positive rate of the cascade are found by
multiplying the respective rates of the individual stages
• A detection rate of 0.9 and a false positive rate on the order of 10^-6 can be achieved by a 10-stage cascade if each stage has a detection rate of 0.99 (0.99^10 ≈ 0.9) and a false positive rate of about 0.30 (0.3^10 ≈ 6×10^-6)
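The arithmetic behind these cascade rates is worth checking directly: both overall rates are simply products of the per-stage rates.

```python
stages = 10
detection_per_stage = 0.99
false_pos_per_stage = 0.30

# overall rates are the products of the per-stage rates across all stages
overall_detection = detection_per_stage ** stages    # ≈ 0.904
overall_false_pos = false_pos_per_stage ** stages    # ≈ 5.9e-6
```

This is why each stage can afford a very loose false positive rate (30%) as long as it almost never misses a face.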
Training the Cascade
• Set target detection and false positive rates for each stage
• Keep adding features to the current stage until its target rates have been met
• Need to lower AdaBoost threshold to maximize detection (as opposed to minimizing
total classification error)
• Test on a validation set
• If the overall false positive rate is not low enough, then add another stage
• Use false positives from current stage as the negative training examples for
the next stage
The Implemented System
• Training data
• 5000 faces: all frontal, rescaled to 24×24 pixels
• 300 million non-face sub-windows, sampled from 9500 non-face images
• Faces are normalized for scale and translation
• Many variations
• Across individuals
• Illumination
System Performance
• Training time: “weeks” on 466 MHz Sun workstation
• 38 layers, total of 6061 features
• Average of 10 features evaluated per window on test set
• “On a 700 Mhz Pentium III processor, the face detector can process a 384 by
288 pixel image in about .067 seconds”
• 15 times faster than previous detector of comparable accuracy (Rowley et al., 1998)
Output of Face Detector on Test Images
Other Detection Tasks
Facial Feature Localization
Profile Detection
Viola/Jones Detector Summary
• Rectangle features
• Integral images for fast computation
• Boosting for feature selection
• Attentional cascade for fast rejection of negative windows
Face Recognition
• N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Attribute
and Simile Classifiers for Face Verification,” ICCV 2009.
http://www.cs.columbia.edu/CAVE/projects/faceverification/
Face Recognition
• N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Attribute
and Simile Classifiers for Face Verification,” ICCV 2009.
Attributes for training Similes for training
http://www.cs.columbia.edu/CAVE/projects/faceverification/
Face Recognition
• Results on Labeled Faces in the Wild Dataset
• N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Attribute
and Simile Classifiers for Face Verification,” ICCV 2009.
http://vis-www.cs.umass.edu/lfw/
http://www.cs.columbia.edu/CAVE/projects/faceverification/
Review Questions
• What is object recognition?
• Briefly describe the history of ideas of object recognition.
• Describe the machine learning framework.
• Describe nearest neighbour and linear classifiers.
• What is the task of face detection and recognition?
• Describe the Viola/Jones Face Detector.