Fundamentals of Computer Vision
Lecture
Overview of today's lecture
• Harris corner detector.
• Multi-scale detection.
• Why do we need feature descriptors?
• Designing feature descriptors.
Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016).
• Ioannis Gkioulekas (16-385, Spring 2019).
Some slides were inspired by or taken from:
• Fredo Durand (MIT).
• James Hays (Georgia Tech).
Harris corner detection: the math
Consider shifting the window by (u, v):
• How do the pixels in the window change?
• Compare each pixel before and after by summing up the squared differences (SSD).
• This defines an SSD "error" E(u, v):

E(u,v) = \sum_{(x,y) \in W} w(x,y) \, [I(x+u, y+v) - I(x,y)]^2

Here I(x+u, y+v) is the shifted intensity, I(x, y) the original intensity, and w(x, y) the window function: either 1 inside the window and 0 outside, or a Gaussian.
Bilinear approximation
For small shifts [u, v] we have a 'bilinear approximation' of the change in appearance:

E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}

where M is a 2×2 matrix computed from image derivatives, called the 'second moment matrix' or 'structure tensor':

M = \sum_{(x,y) \in W} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
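As a concrete sketch, here is a minimal SciPy version of computing the entries of M at every pixel; the Sobel derivative and the Gaussian window width sigma are assumed choices (consistent with the example later in the lecture), not prescribed here:

```python
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor(img, sigma=1.0):
    """Per-pixel entries of the second moment matrix M."""
    img = img.astype(float)
    Ix = sobel(img, axis=1)          # horizontal derivative I_x
    Iy = sobel(img, axis=0)          # vertical derivative I_y
    # Gaussian window w(x, y) applied to the products of derivatives
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    return Sxx, Sxy, Syy             # M = [[Sxx, Sxy], [Sxy, Syy]]
```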
Visualization as an ellipse
Since M is symmetric, we have

M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R

We can visualize M as an ellipse, with axis lengths determined by the eigenvalues and orientation determined by R.

Ellipse equation: \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \mathrm{const}

[Figure: ellipse with axis lengths (λmax)^{-1/2} and (λmin)^{-1/2}; the minor axis points in the direction of fastest change, the major axis in the direction of slowest change.]
Interpreting eigenvalues
[Figure: regions of the (λ1, λ2) plane]
• λ2 >> λ1: 'horizontal' edge
• λ1 >> λ2: 'vertical' edge
• λ1 ~ λ2, both large: corner
• λ1, λ2 both small: flat
Use threshold on eigenvalues to detect corners
Think of a function R(λ1, λ2) that scores 'cornerness':
• R large and positive: strong corner
• R < 0: edge
• |R| small: flat region
A threshold on R then separates corners from edges and flat regions.
[Figure: (λ1, λ2) plane with R < 0 in the two edge regions, R > 0 in the corner region, and |R| small in the flat region]
Interpreting the eigenvalues
Classification of image points using the eigenvalues of M:
• "Corner": λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions.
• "Edge": λ2 >> λ1, or λ1 >> λ2.
• "Flat" region: λ1 and λ2 are small; E is almost constant in all directions.
Classic choices of the cornerness score R:
• Harris & Stephens (1988): R = λ1 λ2 − k (λ1 + λ2)² = det M − k (tr M)²
• Kanade & Tomasi (1994): R = min(λ1, λ2)
• Noble (1998): R = λ1 λ2 / (λ1 + λ2 + ε)
Harris Detector
C. Harris and M. Stephens. "A Combined Corner and Edge Detector." 1988.
1. Compute x and y derivatives of the image: Ix, Iy.
2. Compute products of derivatives at every pixel: Ix², Iy², Ix·Iy.
3. Compute the sums of the products of derivatives at each pixel, weighted by the window function w.
4. Define the matrix M at each pixel from these sums.
5. Compute the response of the detector at each pixel: R = det M − k (tr M)².
6. Threshold on the value of R; apply non-maximum suppression.
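A minimal sketch of steps 1–5, reusing structure_tensor from above; the constant k = 0.05 is a typical choice, not prescribed by the slides:

```python
def harris_response(img, sigma=1.0, k=0.05):
    """Steps 1-5: Harris cornerness R = det(M) - k * trace(M)^2 per pixel."""
    Sxx, Sxy, Syy = structure_tensor(img, sigma)  # sums of derivative products
    det_M = Sxx * Syy - Sxy * Sxy
    trace_M = Sxx + Syy
    return det_M - k * trace_M ** 2
```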
Corner Response Example
Harris R score; Ix, Iy computed using the Sobel operator; window function w = Gaussian with sigma = 1.
• Threshold R < −10000: edges.
• Threshold R > 10000: corners.
[Figures: corner response, thresholded corner response, non-maximum suppression]
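Step 6, thresholding and non-maximum suppression, can be sketched with a maximum filter; the 3×3 neighborhood is an assumed choice, and the threshold is the example value from the slide:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_corners(R, thresh=10000.0):
    """Keep pixels that exceed the threshold and are 3x3 local maxima of R."""
    is_max = (R == maximum_filter(R, size=3))  # non-maximum suppression
    ys, xs = np.nonzero((R > thresh) & is_max)
    return list(zip(xs, ys))
```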
Harris Detector: Invariance Properties
• Rotation
The ellipse rotates, but its shape (i.e., the eigenvalues) remains the same.
The corner response is invariant to image rotation.
Harris Detector: Invariance Properties
• Affine intensity change: I → aI + b
✓ Only derivatives are used ⇒ invariance to the intensity shift I → I + b.
✓ Intensity scaling I → aI scales R as well, so a fixed threshold can pick different points.
[Figure: R vs. x (image coordinate) before and after intensity scaling, with the threshold line crossing different peaks]
Partially invariant to affine intensity change.
The Harris detector is not invariant to changes in …
Harris Detector: Invariance Properties
• Scaling
[Figure: a corner, and the same corner magnified so that each detection window sees only a nearly straight edge]
All points will be classified as edges ⇒ not invariant to scaling.
Multi-scale detection
How can we automatically select the scale?
How can we make a feature detector scale-invariant?
Intuitively…
Find local maxima in both position and scale.
[Figure: response f vs. region size for image 1 and image 2; the peaks occur at region sizes s1 and s2 respectively]
• Design a function f on the region which is "scale invariant" (has the same shape even if the image is resized).
• Take a local maximum of this function; the region size at the peak (s1 in image 1, s2 in image 2) gives the scale of the feature.
Formally…
Laplacian filter: highest response when the signal has the same characteristic scale as the filter.
Characteristic scale: the scale that produces the peak filter response.
We need to search over characteristic scales.
Blob detection in 2D: scale selection
• Laplacian-of-Gaussian = "blob" detector:

\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}

[Figure: LoG responses of three images at several filter scales; slide credit: Bastian Leibe]
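A sketch of searching over scales with scale-normalized LoG responses; the sigma² factor is a standard normalization (an assumption here) that keeps peak responses comparable across scales, and the scale list is illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_scale_space(img, sigmas=(2.1, 4.2, 6.0, 9.8, 15.5, 17.0)):
    """Stack of scale-normalized Laplacian-of-Gaussian responses."""
    # sigma**2 normalization keeps peak responses comparable across scales
    return np.stack([s ** 2 * gaussian_laplace(img.astype(float), s)
                     for s in sigmas])
```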
Locating Blobs in Scale Space
Scale-space = stack of images convolved with LoG/DoG filters at varying scales (sigma values). Extremal points (local maxima or minima) in this volume determine the location and scale/size of light and dark blobs in the image.
[Figure: DoG filters at different scales applied to a 1D slice of the image; axes are space and scale (sigma)]
Locate Key Points
[Figure: 3×3×3 neighborhood spanning DoG levels L−1, L, L+1]
Find maxima and minima in an LoG (or DoG) scale space. The 2D location of the extremal point gives the location of the key point; the scale level of the extremal point gives the canonical scale of the image patch surrounding it. A pixel x is an extremal point if it is either larger or smaller in value than all 26 pixels surrounding it (8 in its own level, 9 in each of the levels above and below).
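A direct sketch of that 26-neighbor test on a DoG stack of shape (levels, height, width); it assumes (lvl, y, x) is at least one step away from every border:

```python
import numpy as np

def is_extremum(dog, lvl, y, x):
    """True if dog[lvl, y, x] is strictly larger or strictly smaller than
    all 26 neighbors in the 3x3x3 cube over levels lvl-1, lvl, lvl+1."""
    cube = dog[lvl - 1:lvl + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[lvl, y, x]
    others = np.delete(cube.ravel(), 13)  # drop the center (index 13 of 27)
    return bool((center > others).all() or (center < others).all())
```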
What happens if you apply different Laplacian filters?
[Figure: Laplacian responses at scales 2.1, 4.2, 6.0, 9.8, 15.5, 17.0 for a full-size image and a 3/4-size image. Each image peaks at a different optimal scale, and the two maximum responses pick out corresponding scales. Responses at 4.2, 6.0, and 9.8 are local maxima in space, but only 6.0 is also a cross-scale maximum.]
How would you implement scale selection?
Implementation:
    for each level of the Gaussian pyramid:
        compute feature response (e.g., Harris, Laplacian)
        if local maximum and cross-scale maximum:
            save scale and location of feature
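A minimal runnable version of this loop, reusing log_scale_space from above as the feature response; the threshold value is an assumption, and a 3×3×3 maximum filter finds maxima in position and scale at once:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def select_scales(img, sigmas=(2.1, 4.2, 6.0, 9.8, 15.5, 17.0), thresh=0.03):
    """Return (x, y, sigma) where the response is a maximum in position and scale."""
    stack = log_scale_space(img, sigmas)               # response per pyramid level
    is_max = (stack == maximum_filter(stack, size=3))  # 3x3x3 local maxima
    lvls, ys, xs = np.nonzero(is_max & (stack > thresh))
    return [(x, y, sigmas[l]) for l, y, x in zip(lvls, ys, xs)]
```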
References
Basic reading:
• Szeliski textbook, Section 4.1.
Feature descriptors
We know how to detect good points. Next question: how do we match them?
[Figure: corresponding points marked in two images]
Answer: come up with a descriptor for each point, then find similar descriptors between the two images.
• State-of-the-art approach: SIFT
• David Lowe, UBC. http://www.cs.ubc.ca/~lowe/keypoints/
Image transformations
• Geometric: rotation, scale
• Photometric: intensity change
Invariance vs. discriminability
• Invariance: the descriptor shouldn't change even if the image is transformed.
• Discriminability: the descriptor should be highly distinctive for each point.
Designing feature descriptors
Image patch
Just use the pixel values of the patch.
[Figure: a 3×3 patch with pixels numbered 1–9, flattened into a vector of intensity values (1, 2, 3, 4, 5, 6, 7, 8, 9)]
Perfectly fine if geometry and appearance are unchanged (a.k.a. template matching).
What are the problems? How can you be less sensitive to absolute intensity values?
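A sketch of this raw-pixel descriptor and of SSD matching between two descriptor sets; the patch size is illustrative:

```python
import numpy as np

def patch_descriptor(img, x, y, half=4):
    """Flatten the (2*half+1)^2 pixel values around (x, y) into a vector."""
    return img[y - half:y + half + 1, x - half:x + half + 1].astype(float).ravel()

def match_ssd(desc, descs2):
    """Index of the descriptor in descs2 with the smallest SSD to desc."""
    return int(np.argmin(((descs2 - desc) ** 2).sum(axis=1)))
```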
Image gradients
Use pixel differences.
[Figure: a 3×3 patch; the vector of x derivatives, keeping only the signs (−, +, +, −, −, +), gives a 'binary descriptor']
Feature is invariant to absolute intensity values.
What are the problems? How can you be less sensitive to deformations?
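A tiny sketch of that sign-of-difference 'binary descriptor' (a generic version of the pattern in the figure):

```python
import numpy as np

def binary_gradient_descriptor(patch):
    """Signs of horizontal pixel differences; invariant to absolute intensity."""
    dx = np.diff(patch.astype(float), axis=1)  # x derivatives
    return (dx > 0).ravel()                    # keep only the signs
```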
Color histogram
Count the colors in the image using a histogram.
[Figure: histogram over colors]
Invariant to changes in scale and rotation.
What are the problems? How can you be more sensitive to spatial layout?
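A sketch assuming an 8-bit RGB image; the bin count is illustrative:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Joint RGB histogram, normalized; discards all spatial layout."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()
```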
Spatial histograms
Compute histograms over spatial 'cells'.
Retains rough spatial layout; some invariance to deformations.
What are the problems? How can you be completely invariant to rotation?
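A sketch reusing color_histogram over a grid of cells; the grid size is illustrative:

```python
import numpy as np

def spatial_histogram(img, grid=2, bins=8):
    """Concatenate per-cell color histograms over a grid x grid partition."""
    h, w = img.shape[:2]
    cells = [img[i * h // grid:(i + 1) * h // grid,
                 j * w // grid:(j + 1) * w // grid]
             for i in range(grid) for j in range(grid)]
    return np.concatenate([color_histogram(c, bins) for c in cells])
```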
Orientation normalization
• Use the dominant image gradient direction to normalize the orientation of the patch.
• Save the orientation angle θ along with the feature's location and scale (x, y, s).
What are the problems? Multi-scale?
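A sketch of estimating the dominant orientation from a magnitude-weighted gradient-orientation histogram; the 36-bin count is a common choice (e.g. in SIFT), assumed here:

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Peak of the gradient-orientation histogram, weighted by magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hist, edges = np.histogram(ang, bins=nbins, range=(0, 2 * np.pi),
                               weights=mag)
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])   # bin center, in radians
```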
MOPS Descriptor
Multi-Scale Oriented Patches (MOPS)
M. Brown, R. Szeliski, and S. Winder. "Multi-Image Matching using Multi-Scale Oriented Patches." CVPR 2005, pp. 510–517.
Given a feature:
1. Get a 40×40 image patch, subsample every 5th pixel (low-frequency filtering; absorbs localization errors).
2. Subtract the mean, divide by the standard deviation (removes bias and gain).
3. Haar wavelet transform (low-frequency projection).
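A partial sketch of steps 1–2, omitting the rotation to the dominant orientation and the Haar wavelet step; the blur sigma is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mops_descriptor(img, x, y):
    """40x40 patch around (x, y), blurred, sampled every 5th pixel,
    then normalized to zero mean and unit standard deviation."""
    patch = img[y - 20:y + 20, x - 20:x + 20].astype(float)
    patch = gaussian_filter(patch, 2.0)        # low-frequency filtering
    d = patch[::5, ::5].ravel()                # 8x8 = 64-dim descriptor
    return (d - d.mean()) / (d.std() + 1e-8)   # removes bias and gain
```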
SIFT
(Scale Invariant Feature Transform)
SIFT describes both a detector and descriptor
1. Multi-scale extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor
1. Multi-scale extrema detection
• Build a Gaussian pyramid over several octaves, with several scales of Gaussian variance per octave (first octave, second octave, …).
• Compute Difference-of-Gaussian (DoG) images by subtracting adjacent Gaussian levels; DoG approximates the Laplacian.
• Scale-space extrema: a point is selected if it is larger (or smaller) than all 26 neighbors in the DoG stack.
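A sketch of one DoG octave, following Lowe's convention of s + 3 Gaussian images per octave so that s DoG levels can be searched for extrema; sigma0 = 1.6 is Lowe's value, used here as an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(img, s=3, sigma0=1.6):
    """One octave: s + 3 Gaussian images; adjacent differences give the DoG stack."""
    sigmas = [sigma0 * 2 ** (i / s) for i in range(s + 3)]
    gauss = np.stack([gaussian_filter(img.astype(float), sg) for sg in sigmas])
    return gauss[1:] - gauss[:-1]              # Difference of Gaussian
```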
Keypoint descriptor
Scale Invariant Feature Transform
Basic idea:
• Take a 16×16 square window around the detected feature.
• Compute the edge orientation (angle of the gradient minus 90°) for each pixel.
• Throw out weak edges (threshold on gradient magnitude).
• Create a histogram of the surviving edge orientations.
[Figure: angle histogram over 0 to 2π]
Adapted from slide by David Lowe
SIFT descriptor
Full version:
• Divide the 16×16 window into a 4×4 grid of cells (the figure shows the 2×2 case).
• Compute an orientation histogram for each cell.
• 16 cells × 8 orientations = 128-dimensional descriptor.
Adapted from slide by David Lowe
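A simplified sketch of the full descriptor; it omits real SIFT's Gaussian weighting, trilinear interpolation, rotation to the keypoint orientation, and the clip-and-renormalize step:

```python
import numpy as np

def sift_like_descriptor(patch):
    """128-dim descriptor: 8-bin orientation histograms over a 4x4 grid
    of cells in a 16x16 patch, weighted by gradient magnitude."""
    assert patch.shape == (16, 16)
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    hists = []
    for i in range(0, 16, 4):                  # 4x4 grid of 4x4-pixel cells
        for j in range(0, 16, 4):
            h, _ = np.histogram(ang[i:i + 4, j:j + 4], bins=8,
                                range=(0, 2 * np.pi),
                                weights=mag[i:i + 4, j:j + 4])
            hists.append(h)
    d = np.concatenate(hists)                  # 16 cells * 8 bins = 128 dims
    return d / (np.linalg.norm(d) + 1e-8)      # normalize for illumination
```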
Properties of SIFT
Extraordinarily robust matching technique:
• Can handle changes in viewpoint (up to about 60 degrees of out-of-plane rotation).
• Can handle significant changes in illumination (sometimes even day vs. night).
• Fast and efficient; can run in real time.
• Lots of code available:
http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT