
Fundamentals of Computer Vision
Lecture

Overview of today’s lecture
• Feature matching
• Reminder: image transformations.
• 2D transformations.
• Projective geometry.
• Transformations in projective geometry.
• Classification of 2D transformations.

Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016), Ioannis Gkioulekas (16-385, Spring 2019), Robert Collins (454, Fall 2019).
Some slides were inspired or taken from:
• Fredo Durand (MIT).
• James Hays (Georgia Tech).

Hint for previous class

Formally…
Laplacian filter
Highest response when the signal has the same characteristic scale as the filter

Scale selection
• We want to find the characteristic scale of the blob by convolving it with Laplacians at several scales and looking for the maximum response
• However, Laplacian response decays as scale increases:
original signal (radius=8)
increasing σ
Why does this happen?

Scale normalization
• The response of a derivative of Gaussian filter to a perfect step edge decreases as σ increases
1 s 2p

Scale normalization
• The response of a derivative of Gaussian filter to a perfect step edge decreases as σ increases
• To keep response the same (scale-invariant), must multiply Gaussian derivative by σ
• Laplacian is the second Gaussian derivative, so it must be multiplied by σ²

Original signal
Effect of scale normalization
Unnormalized Laplacian response
Scale-normalized Laplacian response
maximum
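The scale-selection idea above can be sketched in code: convolve with a scale-normalized Laplacian of Gaussian at several scales and pick the scale with the strongest response. A minimal 1D numpy sketch (all function names are illustrative, not from any course code):

```python
# Scale selection with a scale-normalized Laplacian of Gaussian (LoG)
# on a 1D "blob" signal.
import numpy as np

def log_kernel(sigma):
    # Second derivative of a 1D Gaussian: g''(x) = (x^2/sigma^4 - 1/sigma^2) g(x).
    radius = int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return (x**2 / sigma**4 - 1 / sigma**2) * g

def characteristic_scale(signal, sigmas):
    # Scale whose sigma^2-normalized LoG response is strongest at the blob center.
    center = len(signal) // 2
    responses = [abs(sigma**2 * np.convolve(signal, log_kernel(sigma), mode="same")[center])
                 for sigma in sigmas]
    return sigmas[int(np.argmax(responses))]

# A box "blob" of radius 8: the maximum normalized response lands near the blob radius.
signal = np.zeros(101)
signal[42:59] = 1.0
best = characteristic_scale(signal, np.arange(1.0, 16.0, 0.5))
```

Without the σ² factor, the largest |response| would always occur at the smallest scale, which is exactly the decay problem the slide describes.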

Back to SIFT

Scale-Invariant Feature Transform (SIFT) descriptor
[Lowe, ICCV 1999]
Histogram of oriented gradients
• Captures important texture information
• Robust to small translations /
affine deformations
K. Grauman, B. Leibe

Computing gradients
L = the image intensity
• tan(α) = opposite side / adjacent side

Gradients
m(x, y) = sqrt(1 + 0) = 1
Θ(x, y) = atan(0/1) = 0

Gradients
m(x, y) = sqrt(0 + 1) = 1
Θ(x, y) = atan(1/0) = 90

Gradients
m(x, y) = sqrt(1 + 1) = 1.41
Θ(x, y) = atan(1/1) = 45
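The gradient magnitude and orientation computations in these examples can be sketched with simple central differences (a minimal sketch; names are illustrative):

```python
# Gradient magnitude m(x,y) and orientation theta(x,y) from finite differences.
import numpy as np

def gradient_mag_ori(L):
    # Central differences on image intensity L.
    dx = np.zeros_like(L, dtype=float)
    dy = np.zeros_like(L, dtype=float)
    dx[:, 1:-1] = (L[:, 2:] - L[:, :-2]) / 2.0
    dy[1:-1, :] = (L[2:, :] - L[:-2, :]) / 2.0
    m = np.sqrt(dx**2 + dy**2)              # magnitude
    theta = np.degrees(np.arctan2(dy, dx))  # orientation in degrees
    return m, theta

# A horizontal intensity ramp: dx = 1, dy = 0, so m = 1 and theta = 0
# in the interior, matching the first example above.
m, theta = gradient_mag_ori(np.tile(np.arange(5.0), (5, 1)))
```

Using `arctan2` rather than `atan` avoids the division by zero in the atan(1/0) case and resolves the correct quadrant.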

Scale Invariant Feature Transform
Basic idea:
• Take 16×16 square window around detected feature
• Compute edge orientation (angle of the gradient – 90°) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
0 → 2π angle histogram
Adapted from slide by David Lowe

Full version
SIFT descriptor
• Divide the 16×16 window into a 4×4 grid of cells (2×2 case shown below)
• Compute an orientation histogram for each cell
• 16 cells * 8 orientations = 128 dimensional descriptor
Adapted from slide by David Lowe

Scale Invariant Feature Transform
Full version
• Divide the 16×16 window into a 4×4 grid of cells (2×2 case shown below)
• Quantize the gradient orientations, i.e. snap each gradient to one of 8 angles
• Each gradient contributes not just 1, but magnitude(gradient) to the histogram, i.e. stronger gradients contribute more
• 16 cells * 8 orientations = 128 dimensional descriptor for each detected feature
• Normalize, clip (threshold values at 0.2), then renormalize the descriptor
• After normalizing we have ‖d‖ = 1; clip each entry to dᵢ ← min(dᵢ, 0.2), then renormalize such that ‖d‖ = 1 again
Adapted from L. Zitnick, D. Lowe
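The descriptor construction described above can be sketched as follows. This is a simplified sketch: it keeps the 4×4 grid, 8 orientation bins, magnitude weighting, and normalize → clip → renormalize steps, but omits the Gaussian weighting and trilinear interpolation of Lowe's full method (all names are illustrative):

```python
# Simplified SIFT-style descriptor from a 16x16 patch of gradients.
import numpy as np

def sift_descriptor(mag, ori):
    # mag, ori: 16x16 arrays of gradient magnitude and orientation (degrees).
    bins = (ori % 360) // 45                 # quantize orientations to 8 bins
    desc = []
    for i in range(0, 16, 4):                # 4x4 grid of 4x4-pixel cells
        for j in range(0, 16, 4):
            cell_bins = bins[i:i+4, j:j+4].astype(int).ravel()
            cell_mag = mag[i:i+4, j:j+4].ravel()
            # Magnitude-weighted orientation histogram: stronger gradients
            # contribute more.
            desc.append(np.bincount(cell_bins, weights=cell_mag, minlength=8))
    desc = np.concatenate(desc)              # 16 cells * 8 bins = 128 dims
    desc /= np.linalg.norm(desc) + 1e-12     # normalize
    desc = np.minimum(desc, 0.2)             # clip large entries at 0.2
    desc /= np.linalg.norm(desc) + 1e-12     # renormalize
    return desc
```

The clipping step limits the influence of any single large gradient, which improves robustness to non-linear illumination changes.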

Feature descriptors
• Recall: covariant detectors => invariant descriptors Detect regions Normalize regions
Compute appearance descriptors

Local features: main components
1) Detection: Identify the interest points
2) Description: Extract vector feature descriptor surrounding each interest point.
3) Matching: Determine correspondence between descriptors in two views
Slide credit: Kristen Grauman

Feature matching
Given a feature in I1, how to find the best match in I2?
1. Define distance function that compares two descriptors
2. Test all the features in I2, find the one with min distance
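The two steps above can be sketched directly: an SSD distance function plus an exhaustive search over the features in I2 (arrays and names are illustrative):

```python
# Brute-force feature matching with sum-of-squared-differences distance.
import numpy as np

def ssd(f1, f2):
    # Step 1: distance function comparing two descriptors.
    return float(np.sum((f1 - f2) ** 2))

def best_match(f, features2):
    # Step 2: test all features in I2, return index and distance of the minimum.
    dists = [ssd(f, g) for g in features2]
    return int(np.argmin(dists)), min(dists)

f = np.array([1.0, 0.0])
candidates = [np.array([0.0, 0.0]), np.array([1.0, 0.1]), np.array([2.0, 2.0])]
idx, d = best_match(f, candidates)
```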

Normalized Correlation
When a scene is imaged by different sensors, or under different illumination intensities, both SSD and correlation can be misleading for patches representing the same area in the scene!
•A solution is to NORMALIZE the pixels in both patches before correlating them by subtracting the mean of the patch intensities and dividing by a constant proportional to the standard deviation.
åi (ui -u)(vi -v)
æå öæ ö
r(u,v)=
å
èj øèj ø
çç (u -u)2 ÷÷çç j
(v -v)2 ÷÷ j

Understanding NCC
Important point about NCC:
Score values range from 1 (perfect match) to −1 (completely anti-correlated)
Intuition: treating the normalized patches as vectors, we see they are unit vectors. Therefore, correlation becomes dot product of unit vectors, and thus must range between -1 and 1.
Consequence: determining whether a match is “good” or not becomes easier, because the range of score values is tightly bounded and well- understood.
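The intuition above translates directly to code: subtract each patch's mean, divide by the norms, and take the dot product (a minimal sketch; names are illustrative):

```python
# Normalized cross-correlation (NCC) between two patches.
import numpy as np

def ncc(u, v):
    # Subtract the mean of each patch, then correlate as unit vectors.
    u = u.astype(float).ravel() - u.mean()
    v = v.astype(float).ravel() - v.mean()
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom)

patch = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
brighter = 2.0 * patch + 10.0   # affine intensity change: gain 2, bias 10
```

Here `ncc(patch, brighter)` is exactly 1, illustrating the invariance to affine intensity changes, while `ncc(patch, -patch)` is −1 (complete anti-correlation).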

Feature descriptors
• Simplest descriptor: vector of raw intensity values
• How to compare two such vectors?
• Sum of squared differences (SSD): SSD(u, v) = Σᵢ (uᵢ − vᵢ)²
• Not invariant to intensity change
• Normalized correlation: r(u, v) = Σᵢ (uᵢ − ū)(vᵢ − v̄) / √( (Σⱼ (uⱼ − ū)²) (Σⱼ (vⱼ − v̄)²) )
• Invariant to affine intensity change

Feature distance
How to define the difference between two features f1, f2?
• Simple approach: L2 distance, ||f1 − f2||
• Can give good scores to ambiguous (incorrect) matches
• At what SSD value do we have a good match?
f1 f2
I1 I2

Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = ||f1 – f2 || / || f1 – f2’ ||
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
• Calculate the distance to best match / distance to second best match
• If low, first match looks good.
• If high, could be ambiguous match.
f1 f2′ f2
I1 I2

Matching SIFT Descriptors
• How can we tell which putative matches are more reliable?
• Heuristic: compare distance of nearest neighbor to that of second nearest neighbor
• Ratio of closest distance to second-closest distance will be high for features that are not distinctive.
David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Threshold of 0.8 provides good separation
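The ratio heuristic above can be sketched as follows: accept the nearest neighbor only when it is clearly better than the second nearest, using the 0.8 threshold from the slide (all names are illustrative):

```python
# Lowe's ratio test for deciding whether a putative match is reliable.
import numpy as np

def ratio_test_match(f1, features2, threshold=0.8):
    dists = np.array([np.linalg.norm(f1 - g) for g in features2])
    order = np.argsort(dists)
    best, second = dists[order[0]], dists[order[1]]
    ratio = best / (second + 1e-12)
    if ratio < threshold:
        return int(order[0]), ratio     # distinctive: accept the match
    return None, ratio                  # ambiguous: reject

f1 = np.array([0.0, 0.0])
candidates = [np.array([0.1, 0.0]), np.array([5.0, 5.0]), np.array([3.0, 4.0])]
idx, ratio = ratio_test_match(f1, candidates)
```

With two near-identical candidates the ratio approaches 1 and the match is rejected, which is exactly the "not distinctive" case described above.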

Reminder: image transformations

Warping example: feature matching
Given a set of matched feature points:
point in one image
and a transformation:
point in the other image
transformation function
parameters find the best estimate of the parameters
What kind of transformation functions are there?

2D transformations

2D transformations
translation rotation aspect
affine perspective cylindrical

2D planar transformations

How would you implement scaling?
• Each component multiplied by a scalar
• Uniform scaling – same scalar for each component
Scale

Scale
• Each component multiplied by a scalar
• Uniform scaling – same scalar for each component
What’s the effect of using different scale factors?

Scale
matrix representation of scaling:
scaling matrix S
• Each component multiplied by a scalar
• Uniform scaling – same scalar for each component

How would you implement shearing?
Shear

Shear
or in matrix form:

How would you implement rotation?
rotation around the origin

rotation around the origin

𝑟
rotation around the origin
𝑟 φ
Polar coordinates…
x = r cos (φ)
y = r sin (φ)
x’ = r cos (φ + θ) y’ = r sin (φ + θ)
Trigonometric Identity…
x’ = r cos(φ) cos(θ) – r sin(φ) sin(θ) y’ = r sin(φ) cos(θ) + r cos(φ) sin(θ)
Substitute…
x’ = x cos(θ) – y sin(θ) y’ = x sin(θ) + y cos(θ)

or in matrix form:
rotation around the origin
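The derivation above gives the rotation matrix directly; in code (a minimal sketch):

```python
# Rotate a 2D point about the origin by angle theta,
# using x' = x cos(θ) - y sin(θ), y' = x sin(θ) + y cos(θ).
import numpy as np

def rotate(point, theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])
    return R @ np.asarray(point, dtype=float)

p = rotate([1.0, 0.0], np.pi / 2)   # (1, 0) rotated 90° about the origin
```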

2D planar and linear transformations
𝒙′ = 𝑓(𝒙; 𝑝)
(𝑥′, 𝑦′)ᵀ = 𝑴 (𝑥, 𝑦)ᵀ
point 𝒙, parameters 𝑝 (the matrix 𝑴)

2D planar and linear transformations
Scale
Rotate
Shear
Flip across y
Flip across origin
Identity

2D translation
How would you implement translation?

2D translation
What about matrix representation?

2D translation
What about matrix representation? Not possible.

Projective geometry

Homogeneous coordinates
heterogeneous coordinates ⇒ homogeneous coordinates
(𝑥, 𝑦) ⇒ (𝑥, 𝑦, 1)
• Represent 2D point with a 3D vector: add a 1 as the third coordinate
Homogeneous coordinates
heterogeneous coordinates ⇒ homogeneous coordinates
(𝑥, 𝑦) ⇒ (𝑥, 𝑦, 1) ≝ (𝑎𝑥, 𝑎𝑦, 𝑎)
• Represent 2D point with a 3D vector
• 3D vectors are only defined up to scale

2D translation
What about matrix representation using homogeneous coordinates?

2D translation
What about matrix representation using homogeneous coordinates?

2D translation using homogeneous coordinates

Conversion:
• heterogeneous → homogeneous
Homogeneous coordinates
• homogeneous → heterogeneous
• scale invariance
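The conversions above can be sketched in a few lines; the scale-invariance check shows that multiplying homogeneous coordinates by a nonzero constant leaves the represented 2D point unchanged (names are illustrative):

```python
# Conversions between heterogeneous and homogeneous 2D coordinates.
import numpy as np

def to_homogeneous(p):
    # heterogeneous -> homogeneous: append a 1.
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(h):
    # homogeneous -> heterogeneous: divide by the last component.
    return np.asarray(h[:-1], dtype=float) / h[-1]

h = to_homogeneous([2.0, 3.0])   # (2, 3) -> (2, 3, 1)
```

Both `h` and `5 * h` convert back to the same point (2, 3), because homogeneous vectors are only defined up to scale.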

image point in pixel coordinates
Projective geometry
image plane
image point in homogeneous coordinates
X is a projection of a point P on the image plane

Transformations in projective geometry

2D transformations in homogeneous coordinates
Re-write these transformations as 3×3 matrices:
translation
? rotation
? scaling
? shearing

2D transformations in homogeneous coordinates
Re-write these transformations as 3×3 matrices:
translation scaling
?
rotation shearing
?

2D transformations in homogeneous coordinates
Re-write these transformations as 3×3 matrices:
translation scaling
?
rotation shearing

2D transformations in homogeneous coordinates
Re-write these transformations as 3×3 matrices:
translation scaling
rotation shearing

Transformations can be combined by matrix multiplication:
Matrix composition
p’=? ? ?p

Matrix composition
Transformations can be combined by matrix multiplication:
p’ = translation(tx,ty) rotation(θ) scale(s,s) p
Does the multiplication order matter?
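Yes, order matters: matrix multiplication is not commutative, so translate-then-rotate and rotate-then-translate give different results. A minimal sketch with 3×3 homogeneous matrices (names are illustrative):

```python
# Composition of 2D transformations by 3x3 matrix multiplication.
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

p = np.array([1.0, 0.0, 1.0])                     # homogeneous point (1, 0)
a = translation(2, 0) @ rotation(np.pi / 2) @ p   # rotate first, then translate
b = rotation(np.pi / 2) @ translation(2, 0) @ p   # translate first, then rotate
```

Note that matrices apply right-to-left: in `a`, the rotation acts on `p` first, even though the translation is written first.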

Classification of 2D transformations

Classification of 2D transformations

Degrees of Freedom (DOF)
Classification of 2D transformations

Classification of 2D transformations
Translation:
How many degrees of freedom?

Classification of 2D transformations
Euclidean (rigid): rotation + translation
How many degrees of freedom?

Classification of 2D transformations
what will happen to the image if this increases?
Euclidean (rigid): rotation + translation

Classification of 2D transformations
what will happen to the image if this increases?
Euclidean (rigid): rotation + translation

Classification of 2D transformations
Similarity: uniform scaling + rotation + translation
multiply these four by scale s
How many degrees of freedom?

Classification of 2D transformations
Affine transform: uniform scaling + shearing + rotation + translation
similarity shear
How many degrees of freedom?

Affine transformations
Affine transformations are combinations of
• arbitrary linear transformations; and
• translations.
This results in 6 degrees of freedom (6-DOF).
Properties of affine transformations:
• origin does not necessarily map to origin
• lines map to lines
• parallel lines map to parallel lines
• ratios are preserved
• compositions of affine transforms are also affine transforms

Projective transformations
Projective transformations are combinations of
• affine transformations; and
• projective warps
Properties of projective transformations:
• origin does not necessarily map to origin
• lines map to lines
• parallel lines do not necessarily map to parallel lines
• ratios are not necessarily preserved
• compositions of projective transforms are also projective transforms
How many degrees of freedom?

Projective transformations
Projective transformations are combinations of
• affine transformations; and
• projective warps
Properties of projective transformations:
• origin does not necessarily map to origin
• lines map to lines
8 DOF: vectors (and therefore matrices) are defined up to scale
• parallel lines do not necessarily map to parallel lines
• ratios are not necessarily preserved
• compositions of projective transforms are also projective transforms
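Applying a projective transform in code means multiplying in homogeneous coordinates and then dividing by the third component. A minimal sketch (the matrix H below is illustrative, not from the slides); scaling H by any nonzero constant gives the same mapping, which is why a 3×3 homography has 8, not 9, degrees of freedom:

```python
# Apply a projective transformation (homography) to a 2D point.
import numpy as np

def apply_homography(H, p):
    # Lift to homogeneous coordinates, multiply, divide out the scale.
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return np.array([x / w, y / w])

H = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [0.1, 0.0, 1.0]])   # nonzero bottom row: a non-affine warp
q = apply_homography(H, [1.0, 1.0])
```

The division by `w` is what makes parallel lines fail to stay parallel, unlike in the affine case where the bottom row is (0, 0, 1) and `w` is always 1.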

Degrees of Freedom (DOF)
Classification of 2D transformations

How to interpret projective transformations here?
image point in pixel coordinates
image plane
image point in homogeneous coordinates
X is a projection of a point P on the image plane

Basic reading:
• Szeliski textbook, Section 3.6.
References
Additional reading:
• Hartley and Zisserman, “Multiple View Geometry in Computer Vision,” Cambridge University
Press 2004.
a comprehensive treatment of all aspects of projective geometry relating to computer vision, and also a very useful reference for the second part of the class.
• Richter-Gebert, “Perspectives on projective geometry,” Springer 2011.
a beautiful, thorough, and very accessible mathematics textbook on projective geometry (available online for free from CMU’s library).