
Fundamentals of Computer Vision
Lecture

Overview of today's lecture
• Harris corner detector.
• Multi-scale detection.
• Why do we need feature descriptors?
• Designing feature descriptors.

Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016).
• Ioannis Gkioulekas (16-385, Spring 2019).
Some slides were inspired or taken from:
• Fredo Durand (MIT).
• James Hays (Georgia Tech).

Harris corner detection: the math
Consider shifting a window $W$ centred at a point $p$ by $(u,v)$:
• how do the pixels in the window change?
• compare each pixel before and after by summing up the squared differences (SSD)
• this defines an SSD "error" $E(u,v)$:
$E(u,v) = \sum_{(x,y) \in W} \big[\, I(x+u,\, y+v) - I(x,y) \,\big]^2$

Error function
Change of intensity for the shift $[u,v]$:
$E(u,v) = \sum_{x,y} w(x,y)\,\big[\, I(x+u,\, y+v) - I(x,y) \,\big]^2$
where $w(x,y)$ is the window function, $I(x+u, y+v)$ the shifted intensity, and $I(x,y)$ the intensity.
Window function $w(x,y)$: either 1 inside the window and 0 outside, or a Gaussian.
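For concreteness, here is a minimal NumPy sketch (not from the slides) that evaluates this SSD error over a small range of shifts; the image, anchor point, window half-size, and shift range are all illustrative assumptions.

```python
import numpy as np

def ssd_error(img, x, y, u, v, r=3):
    """SSD error E(u, v) for the window of half-size r centred at (x, y).

    Compares each pixel in the window with the pixel at the shifted
    location, i.e. a box window w(x, y) = 1 inside the window, 0 outside."""
    win = img[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    shifted = img[y + v - r:y + v + r + 1, x + u - r:x + u + r + 1].astype(float)
    return np.sum((shifted - win) ** 2)

# Example: the error surface around one (hypothetical) point for small shifts.
rng = np.random.default_rng(0)
img = rng.random((64, 64))                      # placeholder grayscale image
E = np.array([[ssd_error(img, 32, 32, u, v) for u in range(-2, 3)]
              for v in range(-2, 3)])
print(E)                                        # small everywhere for a flat region
```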

Bilinear approximation
For small shifts $[u,v]$ we have a 'bilinear approximation' of the change in appearance:
$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$
where $M$ is a 2×2 matrix computed from image derivatives, called the 'second moment' matrix or 'structure tensor':
$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$
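The approximation follows from a first-order Taylor expansion of the shifted intensity; this worked derivation is added here for completeness and uses only the definitions above.

```latex
I(x+u,\,y+v) \;\approx\; I(x,y) + I_x(x,y)\,u + I_y(x,y)\,v
\;\;\Longrightarrow\;\;
E(u,v) \;\approx\; \sum_{x,y} w(x,y)\,\big(I_x u + I_y v\big)^2
\;=\; \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}.
```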

Visualization as an ellipse
Since $M$ is symmetric, we can diagonalize it: $M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$
We can visualize $M$ as an ellipse with axis lengths determined by the eigenvalues and orientation determined by $R$.
Ellipse equation: $\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$
[figure: ellipse with semi-axis (λmax)^(-1/2) along the direction of the fastest change and (λmin)^(-1/2) along the direction of the slowest change]

Interpreting eigenvalues
[figure: the (λ1, λ2) plane: λ2 >> λ1 gives a 'horizontal' edge, λ1 >> λ2 gives a 'vertical' edge, λ1 ~ λ2 with both large gives a corner, both small gives a flat region]

Use threshold on eigenvalues to detect corners
Think of a function R(λ1, λ2) to score 'cornerness'.
[figure: the (λ1, λ2) plane: both eigenvalues small gives a flat region; both eigenvalues large (strong corner) gives R > 0; one eigenvalue much larger than the other (edge) gives R < 0]
R<0 l1 Interpreting the eigenvalues Classification of image points using eigenvalues of M: l2 “Edge” l2 >> l1
“Corner”
l1 and l2 are large, l1 ~ l2;
E increases in all directions
“Edge”
l1 >> l2
“Flat” region
l1 and l2 are small;
E is almost constant in all directions
l1

Commonly used cornerness measures:
• Harris & Stephens (1988): $R = \det(M) - \kappa\,\operatorname{trace}(M)^2$
• Kanade & Tomasi (1994): $R = \min(\lambda_1, \lambda_2)$
• Noble (1998): $R = \det(M) / (\operatorname{trace}(M) + \epsilon)$

Harris Detector
C. Harris and M. Stephens. "A Combined Corner and Edge Detector." 1988.
1. Compute x and y derivatives of the image.
2. Compute products of derivatives at every pixel.
3. Compute the sums of the products of derivatives at each pixel (weighted by the window function).
4. Define the matrix M at each pixel.
5. Compute the response R of the detector at each pixel.
6. Threshold on the value of R; apply non-maximum suppression.
A minimal implementation of these six steps is sketched below.
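The following NumPy/SciPy sketch is an illustrative re-implementation, not the authors' code; the Sobel derivatives, Gaussian window sigma, the constant k = 0.05, and the relative threshold are assumed parameter choices.

```python
import numpy as np
from scipy import ndimage

def harris_corners(img, sigma=1.0, k=0.05, thresh=1e-2):
    img = img.astype(float)
    # 1. Compute x and y derivatives of the image (Sobel).
    Ix = ndimage.sobel(img, axis=1)
    Iy = ndimage.sobel(img, axis=0)
    # 2. Compute products of derivatives at every pixel.
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy
    # 3. Sum the products over a Gaussian window w(x, y).
    Sxx = ndimage.gaussian_filter(Ixx, sigma)
    Syy = ndimage.gaussian_filter(Iyy, sigma)
    Sxy = ndimage.gaussian_filter(Ixy, sigma)
    # 4.-5. M per pixel is [[Sxx, Sxy], [Sxy, Syy]];
    # Harris response R = det(M) - k * trace(M)^2.
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    R = det - k * trace ** 2
    # 6. Threshold on R and keep only local maxima (non-maximum suppression).
    local_max = (R == ndimage.maximum_filter(R, size=5))
    corners = np.argwhere(local_max & (R > thresh * R.max()))
    return corners, R   # corners as (row, col) coordinates
```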

Corner Response Example
Harris R score.
• Ix, Iy computed using the Sobel operator
• Windowing function w = Gaussian, sigma = 1

Corner Response Example
Thresholds: R < -10000 (edges); R > 10000 (corners).

Corner response

Thresholded corner response

Non-maximal suppression

Harris Detector: Invariance Properties
• Rotation
Ellipse rotates but its shape (i.e. eigenvalues) remains the same
Corner response is invariant to image rotation

Harris Detector: Invariance Properties
• Affine intensity change: I → aI + b
✓ Only derivatives are used ⇒ invariance to intensity shift I → I + b
✓ Intensity scale: I → aI
[figure: response R vs. x (image coordinate), with a fixed threshold, before and after an intensity change]
Partially invariant to affine intensity change

The Harris detector is not invariant to changes in …

Harris Detector: Invariance Properties
• Scaling
[figure: the same corner at a larger scale, viewed through fixed-size windows]
All points will be classified as edges
Not invariant to scaling

Multi-scale detection

How can we automatically select the scale?

How can we make a feature detector scale-invariant?

Intuitively…
Find local maxima in both position and scale.
• Design a function on the region which is "scale invariant" (has the same shape even if the image is resized)
• Take a local maximum of this function
[figure: response f vs. region size for Image 1 and Image 2; the maxima occur at region sizes s1 and s2, respectively]

Formally…
Laplacian filter: highest response when the signal has the same characteristic scale as the filter.

Characteristic scale: the scale that produces the peak filter response.
We need to search over characteristic scales.

Blob detection in 2D: scale selection
• Laplacian-of-Gaussian = "blob" detector: $\nabla^2 g = \dfrac{\partial^2 g}{\partial x^2} + \dfrac{\partial^2 g}{\partial y^2}$
[figure (Bastian Leibe): the same images filtered at several filter scales]

Locating Blobs in Scale Space
Scale-space = stack of images convolved with LoG/DoG filters at varying scales (sigma values). Extremal points (local maxima or minima) in this volume determine the location and scale/size of light and dark blobs in the image.
[figure: a 1D slice of the scale-space volume built from DoG filters at different scales, with space on one axis and scale on the other]

Locate Key Points
[figure: three adjacent DoG levels (L-1, L, L+1) stacked along the scale (sigma) axis over image space]
Find maxima and minima in an LoG (or DoG) scale space. The 2D location of the extremal point gives the location of the key point. The scale level of the extremal point gives the canonical scale of the image patch surrounding the key point. Pixel x is an extremal point if it is either larger or smaller in value than the 26 pixels surrounding it.
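A small sketch of the 26-neighbor test described above, assuming `dog` is a 3-D array indexed as [scale, row, col] and that the query point is not on a border (an illustrative helper, not code from SIFT):

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s, y, x] is strictly larger or strictly smaller than
    its 26 neighbors in the 3x3x3 scale-space cube around it.
    Assumes (s, y, x) is an interior point of the DoG stack."""
    cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    center = dog[s, y, x]
    others = np.delete(cube.ravel(), 13)   # drop the center pixel itself
    return (center > others).all() or (center < others).all()
```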

What happens if you apply different Laplacian filters?
[figure: Laplacian responses at scales 2.1, 4.2, 6.0, 9.8, 15.5, 17.0 for a full-size image and its 3/4-size version; each image has a peak (maximum response) at a different, optimal scale]
[figure: among the local maxima at scales 4.2, 6.0, and 9.8, the cross-scale maximum picks out the characteristic scale]

How would you implement scale selection?

Implementation
for each level of the Gaussian pyramid:
    compute the feature response (e.g. Harris, Laplacian)
    if a point is a local maximum in space and across scales:
        save the scale and location of the feature
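A sketch of this loop, using a scale-normalized Laplacian-of-Gaussian response over a list of scales in place of an explicit pyramid; the scale list and threshold are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def multiscale_log_detect(img, sigmas=(1.6, 2.2, 3.2, 4.5, 6.4), thresh=0.03):
    img = img.astype(float)
    # Scale-normalized LoG response at every scale "level".
    resp = np.stack([sigma ** 2 * ndimage.gaussian_laplace(img, sigma)
                     for sigma in sigmas])
    mag = np.abs(resp)                      # respond to bright and dark blobs alike
    # Keep points that are local maxima both in space and across scales.
    is_max = (mag == ndimage.maximum_filter(mag, size=3)) & (mag > thresh)
    # Return (x, y, characteristic scale) for each detected feature.
    return [(x, y, sigmas[s]) for s, y, x in np.argwhere(is_max)]
```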

References
Basic reading:
• Szeliski textbook, Section 4.1.

Feature descriptors
We know how to detect good points. Next question: how do we match them?
Answer: come up with a descriptor for each point, then find similar descriptors between the two images.
• State-of-the-art approach: SIFT
• David Lowe, UBC. http://www.cs.ubc.ca/~lowe/keypoints/

Image transformations
• Geometric: rotation, scale
• Photometric: intensity change

Invariance vs. discriminability
• Invariance:
• Descriptor shouldn’t change even if image is transformed
• Discriminability:
• Descriptor should be highly unique for each point

Designing feature descriptors

Image patch
Just use the pixel values of the patch.
[figure: 3×3 patch with pixels numbered 1-9, flattened into a vector of intensity values]
Perfectly fine if geometry and appearance are unchanged (a.k.a. template matching).
What are the problems?
How can you be less sensitive to absolute intensity values?
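A minimal sketch of this simplest descriptor: flatten the patch into a vector of raw intensity values and compare two descriptors with SSD (template matching); the patch size is an arbitrary choice.

```python
import numpy as np

def patch_descriptor(img, x, y, r=4):
    """Raw pixel values of a (2r+1) x (2r+1) patch, flattened into a vector."""
    return img[y - r:y + r + 1, x - r:x + r + 1].astype(float).ravel()

def match_ssd(d1, d2):
    """Template matching: a smaller SSD means a better match."""
    return np.sum((d1 - d2) ** 2)
```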

Image gradients
Use pixel differences.
[figure: 3×3 patch; the signs (+/−) of the x-derivatives form a 'binary descriptor' vector]
Feature is invariant to absolute intensity values.
What are the problems?
How can you be less sensitive to deformations?
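A sketch of this idea, assuming we keep only the signs of horizontal pixel differences inside the patch (a hypothetical 'binary descriptor' compared with Hamming distance):

```python
import numpy as np

def binary_gradient_descriptor(img, x, y, r=4):
    """Signs of horizontal pixel differences inside the patch (0/1 pattern).
    Absolute intensity cancels out, so the descriptor is invariant to I -> I + b."""
    patch = img[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    dx = np.diff(patch, axis=1)             # pixel differences along x
    return (dx > 0).astype(np.uint8).ravel()

def hamming_distance(d1, d2):
    """Number of differing bits; smaller means a better match."""
    return np.count_nonzero(d1 != d2)
```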

Color histogram
Count the colors in the image using a histogram.
[figure: histogram of pixel counts over color bins]
Invariant to changes in scale and rotation.
What are the problems?
How can you be more sensitive to spatial layout?
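A sketch of a color-histogram descriptor over quantized RGB bins (the bin count is an illustrative choice); because only counts are kept, pixel positions, and hence rotation and scale of the region, do not affect it.

```python
import numpy as np

def color_histogram(img_rgb, bins=8):
    """Joint histogram over quantized R, G, B values, normalized to sum to 1."""
    pixels = img_rgb.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=[(0, 256)] * 3)   # assumes 8-bit color values
    return (hist / hist.sum()).ravel()               # 8*8*8 = 512-D descriptor
```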

Spatial histograms
Compute histograms over spatial 'cells'.
Retains rough spatial layout; some invariance to deformations.
What are the problems?
How can you be completely invariant to rotation?

Orientation normalization
Use the dominant image gradient direction to normalize the orientation of the patch; save the orientation angle together with the feature's location and scale.
What are the problems? Multi-scale?
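A sketch of orientation normalization: estimate the dominant gradient direction from an orientation histogram and rotate the patch accordingly (the histogram binning, interpolation order, and rotation sign convention are assumptions and may need adjusting to your coordinate conventions):

```python
import numpy as np
from scipy import ndimage

def dominant_orientation(patch):
    """Dominant gradient direction (radians) from a magnitude-weighted histogram."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                 # angles in [-pi, pi]
    hist, edges = np.histogram(ang, bins=36, range=(-np.pi, np.pi), weights=mag)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])   # bin center of the peak

def orientation_normalized(patch):
    """Rotate the patch so the dominant gradient direction is aligned with +x.
    The sign of the rotation depends on the y-axis convention (y down assumed)."""
    theta = dominant_orientation(patch)
    return ndimage.rotate(patch, np.degrees(theta), reshape=False, order=1)
```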

MOPS Descriptor

Multi-Scale Oriented Patches (MOPS)
Multi-Image Matching using Multi-Scale Oriented Patches. M. Brown, R. Szeliski and S. Winder. International Conference on Computer Vision and Pattern Recognition (CVPR 2005), pages 510-517.
Given a feature:
1. Get a 40×40 image patch, subsample every 5th pixel (low-frequency filtering, absorbs localization errors).
2. Subtract the mean, divide by the standard deviation (removes bias and gain).
3. Haar wavelet transform (low-frequency projection).
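A sketch of the first two MOPS steps (patch extraction with subsampling, then bias/gain normalization); the Haar wavelet projection is omitted, and the sketch assumes the patch has already been resampled to the feature's scale and orientation.

```python
import numpy as np

def mops_descriptor(img, x, y):
    """40x40 patch around (x, y), subsampled every 5th pixel -> 8x8,
    then normalized to zero mean and unit standard deviation (removes bias/gain)."""
    patch = img[y - 20:y + 20, x - 20:x + 20].astype(float)   # 40 x 40 patch
    sub = patch[::5, ::5]                                      # 8 x 8 = 64 samples
    sub = (sub - sub.mean()) / (sub.std() + 1e-8)
    return sub.ravel()                                         # 64-D descriptor
```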

SIFT

SIFT
(Scale Invariant Feature Transform)
SIFT describes both a detector and descriptor
1. Multi-scale extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor

1. Multi-scale extrema detection
[figure: Gaussian and Difference-of-Gaussian (DoG) images for the first and second octaves]
[figure: the DoG approximates the Laplacian of Gaussian]

Scale-space extrema
A point in the Difference-of-Gaussian (DoG) stack is selected if it is larger (or smaller) than all 26 neighbors, across space and the scale of the Gaussian variance.
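A sketch of building the Gaussian and DoG images for one octave (the base sigma and number of intervals are illustrative choices in the spirit of common SIFT settings):

```python
import numpy as np
from scipy import ndimage

def dog_octave(img, sigma0=1.6, intervals=3):
    """Gaussian images at sigma0 * k^i (k = 2^(1/intervals)) and their differences."""
    k = 2.0 ** (1.0 / intervals)
    gaussians = [ndimage.gaussian_filter(img.astype(float), sigma0 * k ** i)
                 for i in range(intervals + 3)]
    dogs = np.stack([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])
    # Extrema are then found by comparing each DoG pixel with its 26 neighbors
    # in space and scale; the next octave starts from a 2x-downsampled image.
    return dogs
```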

Keypoint descriptor
Scale Invariant Feature Transform
Basic idea:
• Take a 16×16 square window around the detected feature
• Compute the edge orientation (angle of the gradient − 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create a histogram of surviving edge orientations
[figure: angle histogram over 0 to 2π]
Adapted from slide by David Lowe

SIFT descriptor
Full version
• Divide the 16×16 window into a 4×4 grid of cells (2×2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
Adapted from slide by David Lowe
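A simplified sketch of this 128-D descriptor: a 16×16 window split into a 4×4 grid of 4×4-pixel cells, each contributing an 8-bin, gradient-magnitude-weighted orientation histogram (Gaussian weighting, trilinear interpolation, rotation to the keypoint orientation, and clipping are omitted for brevity).

```python
import numpy as np

def sift_like_descriptor(img, x, y):
    """128-D descriptor: 4x4 grid of cells x 8 orientation bins over a 16x16 window."""
    win = img[y - 8:y + 8, x - 8:x + 8].astype(float)          # 16 x 16 window
    gy, gx = np.gradient(win)
    mag = np.hypot(gx, gy)                                     # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)                # angles in [0, 2*pi)
    desc = []
    for cy in range(0, 16, 4):                                 # 4x4 grid of 4x4 cells
        for cx in range(0, 16, 4):
            cell_ang = ang[cy:cy + 4, cx:cx + 4]
            cell_mag = mag[cy:cy + 4, cx:cx + 4]
            hist, _ = np.histogram(cell_ang, bins=8, range=(0, 2 * np.pi),
                                   weights=cell_mag)           # 8 orientation bins
            desc.extend(hist)
    desc = np.asarray(desc)                                    # 16 cells * 8 = 128-D
    return desc / (np.linalg.norm(desc) + 1e-8)                # unit length
```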

Properties of SIFT
Extraordinarily robust matching technique
• Can handle changes in viewpoint
  • Up to about 60 degrees of out-of-plane rotation
• Can handle significant changes in illumination
  • Sometimes even day vs. night
• Fast and efficient; can run in real time
• Lots of code available
  • http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT