[06-30213][06-30241][06-25024]
Computer Vision and Imaging &
Robot Vision
Dr Hyung Jin Chang Dr Yixing Gao
h.j.chang@bham.ac.uk y.gao.8@bham.ac.uk
School of Computer Science
LOCAL FEATURES: DETECTION AND DESCRIPTION
(SZELISKI 4.1)
2
Scale invariant interest points
How can we independently select interest points in each image, such that the detections are repeatable across different scales?
Kristen Grauman
3
From key
• Harris and Hessian determine the key-point location
• Accurate localization
• High repeatability of detection
• To compare these points, a descriptor calculated in the region around the points is required.
• So how to choose the right scale of these regions?
–
points to features
Slide credit: B. Leibe
Naive approach: check all scales
• Checkallscalesexhaustively
• Varytheregionsizeandcomparethedescriptors…
fA
similarity measure
fB
e.g. color
e.g. color
1
d(fA, fB)
Slide credit: Krystian Mikolajczyk
Naive approach: check all scales
• Checkallscalesexhaustively
• Varytheregionsizeandcomparethedescriptors…
fA
similarity measure
fB
e.g. color
e.g. color
1
d(fA, fB)
Slide credit: Krystian Mikolajczyk
Naive approach: check all scales
• Checkallscalesexhaustively
• Varytheregionsizeandcomparethedescriptors…
fA
similarity measure
fB
e.g. color
e.g. color
1
d(fA, fB)
Slide credit: Krystian Mikolajczyk
Naive approach: check all scales
• Checkallscalesexhaustively
• Varytheregionsizeandcomparethedescriptors…
fA similarity measure =
e.g. color
fB
e.g. color
d(fA, fB)
Slide credit: Krystian Mikolajczyk
Naive approach: check all scales
• Changetheregionsizeandcomparethedescriptors
• Computationallyinefficient
• Inefficient,butplausibleformatching selected feature pairs.
• Inappropriateforkey-point-based search in large-scale image retrieval
• Inappropriateforrecognition
fA
e.g. color
e.g. color
similarity measure
e.g. color
fB
=
e.g. color
e.g. color
d(fA, fB)
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Solution:
• Constructascaleinvariantfunctiononaselectedregion
• Outputs the same value for regions with the same content, even if the regions are located at different scales.
Example: Average intensity of the gray-scale region.
Even if two corresponding regions are at different scales, we will get the same output.
241.3
241.3
Automatic scale selection
• Function responses to different scales (scale signatures)
f(Ii1!im (x,s)) f(Ii1!im (x¢,s))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Function responses to different scales (scale signatures)
f(Ii1!im (x,s)) f(Ii1!im (x¢,s))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Function responses to different scales (scale signatures)
f(Ii1!im (x,s)) f(Ii1!im (x¢,s))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Function responses to different scales (scale signatures)
f(Ii1!im (x,s)) f(Ii1!im (x¢,s))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Function responses to different scales (scale signatures)
f(Ii1!im (x,s)) f(Ii1!im (x¢,s))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Function responses to different scales (scale signatures)
f (Ii1!im (x,s)) f (Ii1!im (x¢,s¢))
Slide credit: Krystian Mikolajczyk
Automatic scale selection
• Normalize:Rescaletoapredefinedsize
f (Ii1!im (x,s)) f (Ii1!im (x¢,s¢))
Slide credit: Tinne Tuytelaars
Automatic scale selection
• Standardprocedure:
• Formasignaturefunctionaroundtheselectedkey-point.
f
Image 1 Assume a scaled f image is observed,
e.g.,scale = 1⁄2
s2 = 1⁄2 s1
Image 2
• Takethelocalmaximumofthesignaturefunction.
• Note:Maximumcorrespondstoascaleinvariantregion.
Important: the region size (scale) is determined independently in each image for each key-point!
s1
Region size
s2 Region size
Recall: Edge detection
f
d2 dx2 g
d 2
f * dx2 g
Edge
Second derivative of Gaussian (Laplacian)
Edge = zero crossing of second derivative
Source: S. Seitz
From edges to blobs
• Edge = ripple
• Blob = superposition of two ripples
Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is “matched” to the scale of the blob
maximum
Slide credit: Lana Lazebnik
What is a useful signature function?
• Blob detection – find regions that look like “spots”
• Laplacian of Gaussian (LoG):
Circular-symmetric operator for blob detection…
Blob detection IS scale selection
• Laplacian of Gaussian = blob detector
Note the maximal Response at different scales!
img1 img2 img3
Slide credit: Bastian Leibe
Filter scales
Characteristic scale
• ThecharacteristicscaleisthescaleatwhichtheLoG filter yields a maximum response.
x
characteristic scale
T. Lindeberg “Feature detection with automatic scale selection.” International Journal of Computer Vision 30 (2): pp 77—116, 1998.
filter „diameter“
Laplacian
• Key-points:
• Local maxima in scale space of the LoG filter.
Lxx (s ) + Lyy (s )
s2
s
of Gaussian (
LoG
s5 s4
)
s3
Laplacian pyramid!
Slide adapted from Krystian Mikolajczyk
Laplacian
• Key-points:
• Local maxima in scale
of Gaussian (
LoG
s5
)
space of the LoG filter.
s4 s3
Lxx (s ) + Lyy (s )
s2
s
Compare LoG at each point to its 8+9+9 neighbors (same scale + upper/lower scale.)
Slide adapted from Krystian Mikolajczyk
Laplacian
• Key-points:
• Local maxima in scale
of Gaussian (
LoG
s5
)
space of the LoG filter.
s4 s3
Lxx (s ) + Lyy (s )
s2
s
Compare LoG at each point to its 8+9+9 neighbors (same scale + upper/lower scale.)
Slide adapted from Krystian Mikolajczyk
Laplacian
• Key-points:
• Local maxima in scale
of Gaussian (
LoG
s5
)
space of the LoG filter.
s4 s3
Lxx (s ) + Lyy (s )
s2
Þ List (x, y, σ)
See an example…
s
Compare LoG at each point to its 8+9+9 neighbors (same scale + upper/lower scale.)
Slide adapted from Krystian Mikolajczyk
LoG
detector in action
Input image
Slide credit: Svetlana Lazebnik
LoG
detector in action
LoG filtered image (varying sigma)
Slide credit: Svetlana Lazebnik
LoG
detector in action
Local maxima across scales
Slide credit: Svetlana Lazebnik
Technical details
• TheLoGcanbewellapproximatedwithadifference of Gaussians at different values of 𝜎.
(normalized Laplacian of Gaussian)
(Difference of Gaussians)
Slide credit:B. Leibe
Difference of Gaussians (
• Difference of Gaussians is an approximation of the LoG • Used in the popular D. Lowe’s SIFT
• Advantages
• Does not require computation of second derivatives
• Results of Gaussian filtering already calculated during calculation of image resizing (Gaussian Pyramid!).
DoG
)
–
=
How to efficiently localize the key-points using DoG?
Slide credit:B. Leibe
DoG
pyramid
–
Efficient calculation
•
Calculated from a Gaussian pyramid (sequential octaves equivalent to
filtering with
)
Subsample by step size 𝜎4 =2
𝑘!𝜎 𝑘#𝜎
𝑘”𝜎 𝑘!𝜎
𝑘𝜎
𝜎 𝑘=2
Reference image
1 s = 24
David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
Slide adapted from Krystian Mikolajczyk
s intervals
Key
• Find local maxima of DoG in the scale-space.
• Check 8+2*9=26 neighbors • Remove the low contrast
points (threshold dependent)
• If local change in response is
small compared to neighbors.
• Remove points detected at the edges
• Test using the Hessian matrix.
–
point localization using
DoG
Key-point candidates: List of triplets (x,y,σ)
Fit a quadratic function to each key- point and its neighbors to improve localization of the maxima (x,y ,σ).
Slide credit: David Lowe
Results: Lowe‘s
DoG
–
based detector
B. Leibe
Results: Lowe‘s
DoG
–
based detector
David G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, 60, 2 (2004), pp. 91-110
(a) 233×189 image
(b) 832 extremes in DoG
(c) 729 remain after contrast verification
(d) 536 remain after verification of the Hessian matrices.
Slide credit: David Lowe
Summary:
• Input:Twoimagesofthesamescenetakenat significantly different scales.
• Goal:Findstablekey-pointsindependentlyineach image.
• Solution:Findmaximaofspecializedfunctionsin scale-space and image coordinates.
• Twostrategies
• LaplacianofGaussian(LoG)
• DifferenceofGaussians(DoG)asanefficientapproximation
scale
–
invariant key
–
point detection
Slide credit: B. Leibe
Local features: main components
1) Detection: Identify the interest points
2) Description:Extractvectorx =[x(1),!,x(1)]
feature descriptor surrounding each interest point.
3) Matching: Determine correspondence between descriptors in two views
11d
Slide credit: Kristen Grauman
x =[x(2),!,x(2)] 21d
39
Geometric transformations
Slide credit: Kristen Grauman
e.g. scale, translation, rotation40
Photometric transformations
Figure from T. Tuytelaars ECCV 2006 tutorial
41
Local descriptors
• The simplest descriptor: a vector of region intensities.
• Analyze the invariance of such descriptor…
• Small shifts may cause a large change in the descriptor.
• Sensitive to photometric changes.
Descriptor: SIFT
• Scale Invariant Feature Transform:
• Split region into 4×4 sub-regions: 16 cells
• Calculate gradients on each pixel and smooth over a few neighbors.
• In each cell calculate a histogram of gradient orientations (8 directions)
• Each point contributes proportionally to its gradient magnitude
• The contribution is weighted by a Gaussian centered at the region center
• Descriptor (Stack histograms into a vector and normalize): 4x4x8 = 128 dim SIFT
Actually, there are a few important subtle details:
David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Slide adapted from Svetlana Lazebnik
Invariance:
Orientation normalization
• •
• •
The SIFT from previous slide is not rotation-invariant. Calculate the histogram of orientations
• 36 bins by angle, each point contributes proportionally to its gradient magnitude and distance from the center.
Determine the dominant orientation from histogram Normalize: rotate gradients into a rectified orientation
Calculate the SIFT using these rectified gradients.
Gradient orientation histogram
Some keypoint Region considered
0
2p 45
Invariance:
• The SIFT from previous slide is not rotation-invariant.
• Normalize: rotate gradients into a rectified orientation
• Find all orientations in histogram, whose amplitude is, e.g., 80% of the strongest bin.
• Form a separate SIFT for each detected orientation. Why subpatches? To preserve spatial structure
Why does SIFT have some illumination invariance? Gradients
Orientation normalization
Gradient orientation histogram
Some keypoint Region considered
0
2p 45 80
SIFT descriptor [Lowe 2004]
• Extraordinarily robust matching technique
• Can handle changes in viewpoint
• Up to about 60 degree out of plane rotation
• Can handle significant changes in illumination • Sometimes even day vs. night (below)
• Fast and efficient—can run in real time
• Lots of code available
• http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT
Steve Seitz
46
Example
NASA Mars Rover images
47
Example
NASA Mars Rover images with SIFT feature matches Figure by Noah Snavely
48
SIFT descriptor properties
Invariant to
• Scale
• Rotation
Partially invariant to
• Illumination changes • Camera viewpoint
• Occlusion, clutter
49
What makes SIFT descriptor invariant?
• Scale: Finding scale that gives local maxima of some function in both position and scale
• Rotation: Rotating patch according to its dominant gradient orientation
• Illumination: Using gradients of sub-patches
• Translations: Histograms for robustness to small shifts and translations
50
Affine adaptation
• We have addressed invariance to • Translation
• Scale
• Rotation
• But that’s not enough for very large changes in viewpoint
• We require an affine adaptation!
Slide credit: Tinne Tuytelaars
Affine adaptation
• Problem:
• Determine the characteristic shape of local region.
• Assumption: shape described by an affine local window.
• Solution: iterative approach
• In circular window calculate a gradient covariance matrix (similar to Harris)
• Estimate an ellipse from the covariance matrix
• Using the new window calculate the new covariance matrix and iterate.
K. Mikolajczyk and C. Schmid, Scale and affine invariant interest point detectors, IJCV 60(1):63-86, 2004.
Slide credit: Svetlana Lazebnik
Affine adaptation: Example
Detect blobs accross scales
Slide credit: Svetlana Lazebnik
Affine adaptation: Example
Affine-adapted regions
Slide credit: Svetlana Lazebnik
Affine normalization
rotate scale
• Steps
• Rotate the elipse into a horizontal position
• Scale the region such that the elipse transforms into a circle
(Note: Rotation+Scaling computed from eigen vectors and eigen values )
(
deskewing
)
Slide credit: Tinne Tuytelaars
Summary:
• For each key-point determine the affine adaptation, and calculate the descriptor from the rectified region.
Determine the affine Remove the Form a descriptor region Normalize region rotational ambiguity from normalized region
Affine invariance
SIFT (Lowe ’04)
Slide credit: Svetlana Lazebnik
De-skew
Local features: main components
1) Detection: Identify the interest points
2) Description:Extract vector
feature descriptor surrounding each interest point.
3) Matching: Determine correspondence between descriptors in two views
Slide credit: Kristen Grauman
57
Correspondences using
• Compare keypoints by calculating the Euclidian distance (𝐿! norm) between descriptors.
left image right image
• Strategy 1: For each keypoint in the left image identify the most similar keypoint in the right image.
• Result: potential (putative) matches/correspondences
keypoints
Correspondences using
• Strategy 2: Keep only symmetric matches
left image right image
keypoints
Definition of a symmetric match:
• “Let point A be a point in the left image and point B its match in the right image. If B is most similar to A among all points in the right image and vice versa, they form a symmetric match.”
Correspondences using
• Strategy3:CalculatethesimilarityofAtothemostsimilar keypoint and the second-most similar keypoint in the right image.
• Ratioofthesetwosimilaritieswillbehighfordistinctivekey- points and low for non-distinctive ones.
• Threshold ~0.8 gives good results with SIFT.
keypoints
David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Back to stitching example
• Detectkey-pointsindependentlyineachimage
• Determinepotentialcorrespondences
• Rejectimprobablecorrespondencesbystrategy1,2,or3 • PerformRANSACtofindtheinliersandfitmodel
Affine
• Widebase-linestereo
• Trackingandmotiondetection • Panoramas
• Mobilerobotnavigation
• 3Dreconstruction
• Recognition
•…
–
adapted descriptors: Applications
Slide credit: Kristen Grauman
Numerous descriptors exist
• Wehaveonlyconsideredamostpopulardescriptor (SIFT, Lowe2004)
• NotethatLoweproposedDoGforkeypointdetectionand SIFT for descriptor – don’t mix these!
• SIFThasbeenpatentedbyLowe
• However,manyefficientandreallyfastdescriptors
have been presented since.
• Manykeypointdetectorsalsoexistandcanbe combined with the descriptors.
• Seeamplerelatedworkfordetails.
Implementations…
• ExcellentimplementationsforMatlab/C http://www.vlfeat.org/
• Averygoodtutorialonfeature-points:
• Modernfeatures:advances,applications,andsoftware
• Moretutorialsonconferencepagesoverlastyears ECCV, ICCV, CVPR, ACCV, BMVC:
• E.g.:http://eccv2012.unifi.it/program/tutorials/
References
• David A. Forsyth, Jean Ponce, Computer Vision: A Modern Approach (2nd Edition), (prva izdaja dostopna na spletu)
• R. Szeliski,Computer Vision: Algorithms and Applications, 2010
• R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, 2004
• D. Lowe, Distinctive image features from scale-invariant keypoints, IJCV 60(2), pp. 91-110, 2004
• T. Tuytelaars, K. Mikolajczyk, Local Invariant Feature Detectors: A Survey, Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3, pp 177-280, 2008.
Summary
Interest point detection
• Harris corner detector
• Laplacian of Gaussian, automatic scale selection
Invariant descriptors
• Rotation according to dominant gradient direction
• Histograms for robustness to small shifts and translations (SIFT descriptor)
66
Thank you
Hyung Jin Chang
28/02/2021