Feature-Based Methods
Matching based on sparse set of features within each image.
Detect interest points in both images
Find corresponding pairs of points:
  for each interest point in image one:
    for each interest point within the search area of image two:
      compute the similarity between their features
    select the interest point in image two that maximizes the similarity measure
Computer Vision / Mid-Level Vision / Correspondence 12
Feature-Based Methods: performance
Advantages
− Relatively insensitive to illumination and size changes
− Less computationally expensive than correlation-based methods (only need to match selected locations rather than every pixel)
Disadvantages
− Provides sparse correspondence map
  • if correspondence is required at intermediate locations, it must be interpolated
  • sufficient for many applications
− Only suitable when “good” interest points can be extracted from the scene
  • doesn’t perform well with textured/random regions
  • choice of interest points is important
Feature-Based Methods: interest points
1. Need to detect the same point independently in both images
   We need a repeatable detector.
   Interest points should be invariant to image scaling, rotation and translation (caused by changing the 3D camera viewpoint) and to changes in illumination.
2. Need to correctly recognize corresponding points in both images
   We need a distinctive descriptor.
   Feature descriptors should be sufficiently complex so that corresponding points can be correctly matched with high probability.
Feature-Based Methods: interest points
BAD interest point: hard to localize, low information content
GOOD interest point: easy to localize, high information content
Interest points: corners
Corners provide repeatable points that are well suited for matching.
A corner is a point where two edges meet.
Hence, in the region around a corner, the intensity gradient is high in two directions.
− “flat” region: no change in any direction
− “edge”: no change along the edge direction
− “corner”: significant change in all directions
Corner Detection
Compute intensity gradients (Ix and Iy) in x and y directions by convolving image with derivative of Gaussian masks
Sum the gradient products over a small neighbourhood (Gaussian window) around each pixel to generate the matrix H (the second-moment matrix, often called the Hessian in this context)
H = | ∑Ix²    ∑IxIy |
    | ∑IxIy   ∑Iy²  |
Find eigenvalues, λ1 and λ2, of H.
The eigenvalues correspond to the maximum and minimum slopes of the intensity gradient, along two orthogonal directions.
Corner is where min(λ1 , λ2) > threshold
Classification by eigenvalues:
− “flat”: λ1 and λ2 are both small
− “edge”: λ1 >> λ2 (or λ2 >> λ1)
− “corner”: λ1 and λ2 are both large, λ1/λ2 ≈ 1
Harris corner detector
Avoids calculating eigenvalues. Defines a measure, R, as:
R = det(H) − k·(trace(H))²
  = [∑Ix²·∑Iy² − (∑IxIy)²] − k·[∑Ix² + ∑Iy²]²

(k = 0.04–0.06 found to work empirically)

R is large and positive for a corner.
R is negative with large magnitude for an edge.
|R| is small for a flat region.
Hence,
Corner is where R > threshold
Take the points of local maxima of R as interest points
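The identity used to define R can be checked numerically. This is a sketch, not part of the slides; `Sxx`, `Syy`, `Sxy` stand for the windowed sums ∑Ix², ∑Iy², ∑IxIy:

```python
import numpy as np

def harris_R(Sxx, Syy, Sxy, k=0.05):
    # R via the determinant and trace of H
    H = np.array([[Sxx, Sxy], [Sxy, Syy]])
    return np.linalg.det(H) - k * np.trace(H) ** 2

def harris_R_expanded(Sxx, Syy, Sxy, k=0.05):
    # R via the expanded expression, avoiding building H explicitly
    return (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
```

Both forms agree; the expanded one is the version usually implemented, since it avoids any matrix operations.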
Harris corner detector
Take the points of local maxima of R as interest points.
Use non-maximum suppression:
For each pixel, if it has a neighbour with a larger R value, set its value to 0.
Example (R values before and after suppression — only the local maximum, 4, survives):

1 2 2 2        0 0 0 0
0 1 4 3   →    0 0 4 0
0 2 2 2        0 0 0 0
0 1 1 0        0 0 0 0
Non-maximum suppression is commonly used in computer vision, not just for interest point detection
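A minimal 8-neighbour implementation of this rule, reproducing the worked example above (a sketch, not an optimized implementation):

```python
import numpy as np

def non_max_suppression(R):
    """Zero every pixel that has a neighbour with a strictly larger R value."""
    R = np.asarray(R, dtype=float)
    out = R.copy()
    # pad with -inf so border pixels compare only against real neighbours
    padded = np.pad(R, 1, constant_values=-np.inf)
    h, w = R.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 3, j:j + 3]   # 3x3 neighbourhood (incl. centre)
            if R[i, j] < window.max():          # some neighbour is strictly larger
                out[i, j] = 0
    return out
```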
Harris corner detector: algorithm
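The algorithm can be sketched end-to-end in NumPy. Two simplifications are assumed here to keep the example dependency-free: `np.gradient` stands in for the derivative-of-Gaussian masks, and an unweighted box window stands in for the Gaussian window:

```python
import numpy as np

def harris_corners(img, k=0.05, win=1):
    """Return the Harris corner response R for every pixel."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)                 # intensity gradients (rows = y)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    h, w = img.shape
    R = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            # sum gradient products over a (2*win+1)^2 window around (i, j)
            sl = (slice(max(i - win, 0), i + win + 1),
                  slice(max(j - win, 0), j + win + 1))
            Sxx, Syy, Sxy = Ixx[sl].sum(), Iyy[sl].sum(), Ixy[sl].sum()
            R[i, j] = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
    return R
```

Thresholding and non-maximum suppression (previous slides) are then applied to the returned R map to obtain the interest points.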
Harris corner detector: example
Two images
Harris corner detector: example
Corner response R (red = positive, black = negative)
Harris corner detector: example
Thresholded R
Harris corner detector: example
Local maxima of R
Interest points: corners
Harris corner detector is translation and rotation invariant
Eigenvectors rotate, but eigenvalues (and R values) remain the same.
Harris corner detector is partly invariant to changes in illumination and to changes in viewpoint.
Harris corner detector is not scale invariant
Points originally classified as edges become corners at coarser scale.
Interest points: scale invariant
To overcome sensitivity to scale, we can perform corner detection across a range of scales using an image pyramid.
Harris-Laplacian
− Find local maxima of the Harris corner measure in space, and of the Laplacian in scale.

An alternative algorithm for scale-invariant interest point detection is the Scale Invariant Feature Transform (SIFT).

SIFT
− Find local maxima of the Difference of Gaussians in both space and scale.
Interest points: scale invariant
Both the Harris-Laplacian and SIFT algorithms search for the maxima of suitable functions (interest point detectors) in scale and in space
This will enable us to find the same interest points independently in two images which differ in scale
SIFT: interest point detection
• Convolve image with Difference of Gaussians (DoG) mask
• Repeat for different image resolutions (i.e. create a Laplacian image pyramid)
• Detect maxima and minima of the difference-of-Gaussian across scale space (a point is selected if it is larger or smaller than all 26 neighbours in its 3x3x3 neighbourhood)

Example: a 233x189 image yields 832 DoG extrema.
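The 26-neighbour test can be sketched as follows, assuming the DoG values are stacked into a 3-D array `D[scale, y, x]` (a hypothetical helper, not from the slides):

```python
import numpy as np

def is_dog_extremum(D, s, y, x):
    """True if D[s, y, x] is strictly greater or strictly smaller than
    all 26 neighbours in its 3x3x3 scale-space neighbourhood."""
    cube = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    centre = D[s, y, x]
    others = np.delete(cube.ravel(), 13)   # drop the centre value (index 13 of 27)
    return bool((centre > others).all() or (centre < others).all())
```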
SIFT: interest point detection
• Keep points with high contrast.
• Keep points with sufficient structure: the approach is similar to the Harris corner detector, but uses the ratio of the trace and determinant of the Hessian matrix:

trace(H)² / det(H) < (r + 1)² / r

(r = 10 found to work empirically)

Example: 729 points left after the peak-value threshold, 536 left after testing the ratio of principal curvatures.
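The edge-rejection test can be sketched as follows (a hypothetical helper; `Dxx`, `Dyy`, `Dxy` are assumed to be the second derivatives of the DoG at the interest point):

```python
def passes_edge_test(Dxx, Dyy, Dxy, r=10.0):
    """Keep a point only if the ratio of principal curvatures is below r:
    trace(H)^2 / det(H) < (r + 1)^2 / r."""
    tr = Dxx + Dyy
    det = Dxx * Dyy - Dxy * Dxy
    if det <= 0:                     # curvatures of opposite sign: reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

Edge-like points have one large and one small curvature, making the trace²/determinant ratio large, so they fail the test.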
Feature-Based Methods
Matching based on sparse set of features within each image.
Detect interest points in both images
− Harris corner detector, or
− SIFT detector
Find corresponding pairs of points
Requires a measure of similarity between points.
Need a “descriptor” (a list of features) for each interest point.
Harris: feature descriptor and matching
Descriptor: a small window around the interest point (i.e. a set of pixel intensity values).
Similarity measure: Euclidean distance, SSD, SAD, etc.
Robust to translation, but not to rotation, scale, changes in viewpoint or illumination.
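A sketch of this descriptor and similarity measure (hypothetical helpers, assuming the interest point lies away from the image border):

```python
import numpy as np

def window_descriptor(img, y, x, half=2):
    """Descriptor: raw intensities in a (2*half+1)^2 window around (y, x)."""
    return img[y - half:y + half + 1, x - half:x + half + 1].astype(float).ravel()

def ssd(d1, d2):
    """Sum of squared differences: smaller means more similar."""
    return float(((d1 - d2) ** 2).sum())
```

Because the descriptor is a raw intensity patch, any rotation, scale or illumination change between the two images alters its values directly, which is why this matching is robust only to translation.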
SIFT: feature descriptor
Descriptor: the SIFT algorithm specifies a method for deriving a set of features for each interest point.
Step 1: Calculate the orientation and magnitude of the intensity gradient at all pixels surrounding the interest point.
This is done using the Gaussian smoothed image at the scale where the interest point was found.
Magnitude and orientation approximated using pixel differences.
mag = √((Ix1 − Ix2)² + (Iy1 − Iy2)²)
ori = tan⁻¹((Iy1 − Iy2) / (Ix1 − Ix2))

where Ix1, Ix2 are the intensities of the pixels on either side of the interest point in the x direction, and Iy1, Iy2 those in the y direction.
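The pixel-difference approximation can be written as follows (`np.arctan2` is used for tan⁻¹ so the Ix1 = Ix2 case is handled without a division by zero):

```python
import numpy as np

def grad_mag_ori(Ix1, Ix2, Iy1, Iy2):
    """Gradient magnitude and orientation from the four neighbouring pixels."""
    dx, dy = Ix1 - Ix2, Iy1 - Iy2
    mag = np.hypot(dx, dy)        # sqrt(dx^2 + dy^2)
    ori = np.arctan2(dy, dx)      # orientation in radians
    return mag, ori
```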
SIFT: feature descriptor
Step 2: Create a histogram of all the orientations around the interest point.
Each sample added to the histogram is weighted by its gradient magnitude and a Gaussian centred on the interest point.
Find dominant orientation, by finding peak in histogram.
Rotate all orientations so that dominant orientation points up.
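The steps above can be sketched with a weighted orientation histogram. The 36-bin (10-degree) layout is an assumption following Lowe's SIFT paper; the magnitude-and-Gaussian weighting is assumed precomputed in `weights`:

```python
import numpy as np

def dominant_orientation(orientations_deg, weights, n_bins=36):
    """Histogram the orientations (weighted) and return the peak bin's centre."""
    hist = np.zeros(n_bins)
    for ori, w in zip(orientations_deg, weights):
        hist[int(ori % 360) * n_bins // 360] += w
    peak = int(np.argmax(hist))
    return (peak + 0.5) * (360 / n_bins)   # bin centre, in degrees
```

Subtracting the returned angle from every orientation then "rotates" them so the dominant orientation points up, making the descriptor rotation invariant.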
SIFT: feature descriptor
Step 3: Create separate histograms of all the orientations in 4x4 sub-windows around the interest point.
Each sample added to each histogram is weighted by its gradient magnitude and a Gaussian centred on the interest point.
Use 8 orientation bins in each histogram.
The descriptor for each interest point is therefore a 4x4x8 = 128 element vector.
This vector is normalised to unit length.
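The descriptor assembly can be sketched as follows. The 16x16 patch size is an assumption (4x4 sub-windows of 4x4 pixels each); `ori` holds orientations in radians already rotated to the dominant orientation, and `weight` the precomputed magnitude-and-Gaussian weights:

```python
import numpy as np

def sift_descriptor(ori, weight):
    """Build the 4x4x8 = 128-element descriptor from a 16x16 patch."""
    assert ori.shape == weight.shape == (16, 16)
    desc = np.zeros((4, 4, 8))
    # quantize each orientation into one of 8 bins
    bins = (((ori % (2 * np.pi)) / (2 * np.pi)) * 8).astype(int) % 8
    for y in range(16):
        for x in range(16):
            desc[y // 4, x // 4, bins[y, x]] += weight[y, x]
    v = desc.ravel()
    return v / np.linalg.norm(v)           # normalise to unit length
```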
SIFT: feature descriptor and matching
Descriptor: 128 element vector of intensity gradient orientations around the interest point.
Similarity measure: Euclidean distance between vectors.
Robust to translation, rotation, scale, changes in viewpoint and illumination.