Fundamentals of Computer Vision
Lecture
Overview of today’s lecture
• Leftover from previous lecture : RANSAC.
• Some motivational imaging experiments.
• Pinhole camera.
• Accidental pinholes.
• Camera matrix.
• Perspective.
• Other camera models.
• Pose estimation.
Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016), Ioannis Gkioulekas (16-385, Spring 2019), Robert Colin (454, Fall 2019).
Some slides were inspired or taken from:
• Fredo Durand (MIT).
Given two images…
find matching features (e.g., SIFT) and a translation transform
Matched points will usually contain bad correspondences
good correspondence
how should we estimate the transform?
Linear least squares (LLS) will find the ‘average’ transform
solution is corrupted by bad correspondences
Use RANSAC
How many correspondences do we need to compute a translation transform?
Only one correspondence is needed to fit the translation model
Pick one correspondence, count inliers
[Figure sequence: each trial picks one correspondence and counts the inliers of the resulting translation; the trials shown yield 2, 5, and 5 inliers.]
Pick the model with the highest number of inliers!
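The translation example above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's reference code: the function name `ransac_translation`, the 3-pixel threshold, the iteration count, and the toy matches are all assumptions made for the example.

```python
import math
import random

def ransac_translation(matches, threshold=3.0, iterations=100, seed=0):
    """Estimate a 2D translation from matches [((x1, y1), (x2, y2)), ...]."""
    rng = random.Random(seed)
    best_t, best_inliers = None, []
    for _ in range(iterations):
        # One correspondence is enough to hypothesize a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Count the matches that agree with this hypothesis.
        inliers = [
            (p, q) for p, q in matches
            if math.dist((p[0] + tx, p[1] + ty), q) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = (tx, ty), inliers
    # Refit on all inliers: the least-squares translation is the mean offset.
    n = len(best_inliers)
    refined = (
        sum(q[0] - p[0] for p, q in best_inliers) / n,
        sum(q[1] - p[1] for p, q in best_inliers) / n,
    )
    return best_t, best_inliers, refined

# Five good matches shifted by (10, 5), plus two bad correspondences.
matches = [
    ((0, 0), (10, 5)), ((1, 2), (11, 7)), ((3, 1), (13, 6)),
    ((4, 4), (14, 9)), ((2, 5), (12, 10)),
    ((5, 5), (40, -20)), ((6, 1), (-30, 25)),
]
t, inliers, refined = ransac_translation(matches)
print(t, len(inliers), refined)
```

A plain mean over all seven offsets would be pulled far from (10, 5) by the two bad matches, which is the ‘average’ transform problem described earlier.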
Estimating homography using RANSAC
• RANSAC loop
1. Get four point correspondences (randomly)
2. Compute H using DLT
3. Count inliers
4. Keep H if largest number of inliers
• Recompute H using all inliers
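The loop above might look roughly like the following sketch. To stay within the Python standard library, it does not use the SVD-based DLT. Instead it fixes H[2][2] = 1 and solves the resulting linear system through the normal equations, which is a simplification that fails when the true H[2][2] is close to zero. All function names, the 1-pixel threshold, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("degenerate sample")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            k = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= k * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_homography(pairs):
    """Least-squares H from >= 4 correspondences, fixing H[2][2] = 1."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    # Normal equations (A^T A) h = A^T b.
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * bi for r, bi in zip(A, b)) for i in range(8)]
    h = solve(AtA, Atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_h(H, p):
    """Map point p through H and divide by the last coordinate."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return (math.inf, math.inf)
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def ransac_homography(pairs, threshold=1.0, iterations=200, seed=0):
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(pairs, 4)          # 1. four random correspondences
        try:
            H = fit_homography(sample)         # 2. compute H
        except ValueError:
            continue                           # skip degenerate samples
        inliers = [(p, q) for p, q in pairs    # 3. count inliers
                   if math.dist(apply_h(H, p), q) <= threshold]
        if len(inliers) > len(best_inliers):   # 4. keep H if best so far
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# Synthetic test: eight points mapped by a known H, plus three bad matches.
H_true = [[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.002, 1.0]]
good = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3), (2, 7), (7, 6), (3, 4)]
pairs = [(p, apply_h(H_true, p)) for p in good]
pairs += [((1, 1), (50.0, -40.0)), ((9, 2), (-20.0, 30.0)), ((4, 9), (0.0, 60.0))]
H, inliers = ransac_homography(pairs)
H_refit = fit_homography(inliers)              # recompute H using all inliers
print(len(inliers))
```

In practice the four-point fit and the final refit would typically use a normalized DLT with an SVD, which is better conditioned than the normal equations used here.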
Feature matching and homography estimation
Do both simultaneously using RANSAC.
RANSAC pros and cons
• Pros
• Simple and general
• Applicable to many different problems
• Often works well in practice
• Cons
• Lots of parameters to tune
• Doesn’t work well for low inlier ratios (too many iterations, or can fail completely)
• Can’t always get a good initialization of the model based on the minimum number of samples
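To make the "too many iterations" point concrete, a common back-of-the-envelope estimate (not stated on the slides, so treat it as an added assumption) is that with inlier ratio w and sample size s, the number of trials N needed to draw at least one all-inlier sample with probability p satisfies N = log(1 - p) / log(1 - w^s). The function name and the default p = 0.99 below are illustrative choices.

```python
import math

def ransac_iterations(inlier_ratio, sample_size, success_prob=0.99):
    """Trials needed so that, with probability success_prob,
    at least one random sample contains only inliers."""
    p_good_sample = inlier_ratio ** sample_size
    return math.ceil(math.log(1 - success_prob) / math.log(1 - p_good_sample))

# Translation needs 1 correspondence per sample, a homography needs 4.
for w in (0.9, 0.5, 0.2):
    print(w, ransac_iterations(w, 1), ransac_iterations(w, 4))
```

The count grows quickly as the inlier ratio drops and as the minimal sample gets larger, which is why low inlier ratios are listed as a weakness.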
Geometric camera models
Let’s say we have a sensor…
digital sensor (CCD or CMOS)
… and an object we would like to photograph
real-world object
digital sensor (CCD or CMOS)
What would an image taken like this look like?
Bare-sensor imaging
real-world object
digital sensor (CCD or CMOS)
All scene points contribute to all sensor pixels
What does the image on the sensor look like?
Bare-sensor imaging
All scene points contribute to all sensor pixels
Let’s add something to this scene
barrier (diaphragm)
pinhole (aperture)
real-world object
digital sensor (CCD or CMOS)
What would an image taken like this look like?
Pinhole imaging
most rays are blocked
real-world object
digital sensor (CCD or CMOS)
one makes it through
Pinhole imaging
real-world object
digital sensor (CCD or CMOS)
Each scene point contributes to only one sensor pixel
What does the image on the sensor look like?
Pinhole imaging
real-world object
copy of real-world object (inverted and scaled)
Pinhole camera
Pinhole camera a.k.a. camera obscura
Pinhole camera terms
barrier (diaphragm)
pinhole (aperture)
camera center (center of projection)
image plane
real-world object
digital sensor (CCD or CMOS)
Focal length
real-world object
focal length f
Focal length
What happens as we change the focal length?
real-world object
focal length 0.5 f
object projection is half the size
Pinhole size
real-world object
pinhole diameter
Ideal pinhole has infinitesimally small size
• In practice that is impossible.
What happens as we change the pinhole diameter?
Pinhole size
What happens as we change the pinhole diameter?
real-world object
object projection becomes blurrier
real-world object
pinhole diameter
What about light efficiency?
• What is the effect of doubling the pinhole diameter?
  2x pinhole diameter → 4x light
focal length f
real-world object
pinhole diameter
What about light efficiency?
• What is the effect of doubling the focal length?
  2x focal length → 1/4x light
focal length f
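Both scaling rules follow from one relation: the light reaching a unit area of the sensor is proportional to the pinhole area (diameter squared) divided by the image area (focal length squared). The sketch below only checks the ratios. The function name and the numeric values are illustrative assumptions, and the constant of proportionality is ignored.

```python
def relative_light(diameter, focal_length):
    """Light per unit sensor area, up to a constant factor."""
    return (diameter / focal_length) ** 2

base = relative_light(1.0, 50.0)
print(relative_light(2.0, 50.0) / base)    # doubling the pinhole diameter
print(relative_light(1.0, 100.0) / base)   # doubling the focal length
```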
The lens camera
Lenses map “bundles” of rays from points on the scene to the sensor.
How does this mapping work exactly?
The pinhole camera
Central rays propagate in the same way for both models!
Important difference: focal length
In a pinhole camera, focal length is the distance between aperture and sensor
focal length f
Important difference: focal length
In a lens camera, focal length is the distance at which parallel rays intersect
object distance D
focal length f
focus distance D’
Describing both lens and pinhole cameras
We can derive properties and descriptions that hold for both camera models if:
• We use only central rays.
• We assume the lens camera is in focus.
• We assume that the focus distance of the lens camera is equal to the focal length of the pinhole camera.
Remember: focal length f refers to different things for lens and pinhole cameras.
• In this lecture, we use it to refer to the aperture-sensor distance, as in the pinhole camera case.
Camera matrix
The camera as a coordinate transformation
A camera is a mapping from the 3D world to a 2D image.
[Figure: 3D object → 3D to 2D transform (camera) → 2D image → 2D to 2D transform (image warping) → 2D image]
The camera as a coordinate transformation
A camera is a mapping from the 3D world to a 2D image. In homogeneous coordinates:
$x = P X$
where $x$ is the 2D image point, $P$ is the camera matrix, and $X$ is the 3D world point.
What are the dimensions of each variable?
The camera as a coordinate transformation
$x = P X$: homogeneous image coordinates $x$ (3×1), camera matrix $P$ (3×4), homogeneous world coordinates $X$ (4×1)
The pinhole camera
image plane
real-world object
camera center
focal length f
real-world object
The (rearranged) pinhole camera
image plane
camera center
focal length f
The (rearranged) pinhole camera
image plane
camera center
principal axis
• Principal axis: line from the camera center perpendicular to the image plane
• Normalized (camera) coordinate system: camera center is at the origin and the principal axis is the z-axis
The (rearranged) pinhole camera
image plane
camera center
principal axis
What is the equation for image coordinate x in terms of X?
The 2D view of the (rearranged) pinhole camera
image plane
What is the equation for image coordinate x in terms of X?
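Basic Perspective Projection
Scene point: $(X, Y, Z)$
Image point: $(x, y, f)$
Perspective projection equations (derived via the similar triangles rule):
$x = f \frac{X}{Z}, \quad y = f \frac{Y}{Z}$
[Figure: camera center O with the similar triangles formed by X (or Y) at depth Z and x (or y) at distance f. Credit: O. Camps, PSU]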
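As a quick numerical check of the perspective projection equations x = fX/Z and y = fY/Z, here is a small sketch. The function name `project` and the sample point are illustrative assumptions, and the point is assumed to be given in camera coordinates with Z > 0.

```python
def project(point, f):
    """Perspective projection of a camera-frame point (X, Y, Z), Z > 0."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)

p = (2.0, 1.0, 4.0)
print(project(p, f=1.0))                  # reference projection
print(project(p, f=0.5))                  # half the focal length
print(project((2.0, 1.0, 8.0), f=1.0))    # same point, twice as far away
```

Halving the focal length and doubling the depth have the same effect on the image coordinates: both halve the size of the projection.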
The 2D view of the (rearranged) pinhole camera
image plane
The (rearranged) pinhole camera
image plane
camera center
What is the camera matrix P for a pinhole camera?
principal axis
The pinhole camera matrix
Relationship from similar triangles: $[X \;\; Y \;\; Z]^\top \mapsto [fX/Z \;\; fY/Z]^\top$
General camera model: $x = P X$
What does the pinhole camera projection look like?
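The pinhole camera matrix
The pinhole camera projection is:
$P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
so that $P \, [X \;\; Y \;\; Z \;\; 1]^\top = [fX \;\; fY \;\; Z]^\top$, which corresponds to the image point $(fX/Z, \; fY/Z)$.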
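One way to see the pinhole projection as a 3×4 matrix is to apply it to a homogeneous world point and then divide by the last coordinate. The sketch below assumes the matrix with f on the first two diagonal entries and 1 on the third, which reproduces x = fX/Z and y = fY/Z. The helper names `matvec` and `pinhole_matrix` are illustrative.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pinhole_matrix(f):
    """3x4 camera matrix of an ideal pinhole camera with focal length f."""
    return [[f, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, 1, 0]]

X = [2.0, 1.0, 4.0, 1.0]               # homogeneous world point (4x1)
x = matvec(pinhole_matrix(0.5), X)     # homogeneous image point (3x1)
print(x)                               # equals [fX, fY, Z]
print(x[0] / x[2], x[1] / x[2])        # divide by the last coordinate
```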
Generalizing the camera matrix
In general, the camera and image have different coordinate systems.
world point
image point
Generalizing the camera matrix
In particular, the camera origin and image origin may be different:
image plane
image coordinate system
How does the camera matrix change?
camera coordinate system
shift vector transforming camera origin to image origin
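Camera matrix decomposition
We can decompose the camera matrix like this:
$P = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
What does each part of the matrix represent?
• First factor: (homogeneous) transformation from 2D to 2D, accounting for non-unit focal length and the origin shift $(p_x, p_y)$
• Second factor: (homogeneous) projection from 3D to 2D, assuming image plane at z = 1 and shared camera/image origin
Also written as: $P = K [I \mid 0]$
where $K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}$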
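A small sketch of this decomposition, under the assumption that the 2D-to-2D factor is K = [[f, 0, p_x], [0, f, p_y], [0, 0, 1]] and the 3D-to-2D factor is [I | 0]. The symbols p_x and p_y for the origin shift, the helper `matmul`, and the numeric values are illustrative assumptions.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

f, px, py = 2.0, 3.0, 1.0           # hypothetical focal length and origin shift
K = [[f, 0, px],
     [0, f, py],
     [0, 0, 1]]                     # 2D-to-2D: focal length and origin shift
I0 = [[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 1, 0]]                 # 3D-to-2D: projection onto plane z = 1
P = matmul(K, I0)                   # full 3x4 camera matrix

X = [[4.0], [2.0], [8.0], [1.0]]    # homogeneous point in camera coordinates
x = matmul(P, X)
u, v = x[0][0] / x[2][0], x[1][0] / x[2][0]
print(P)
print(u, v)                         # f*X/Z + px, f*Y/Z + py
```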
A Tale of Two Coordinate Systems
Two important coordinate systems:
1. World coordinate system
2. Camera coordinate system
[Figure: “The World” with origin o and axes x, y, z; the camera with center of projection (COP) and axes u, v, w]
Generalizing the camera matrix
In general, there are three, generally different, coordinate systems.
world point
image point
We need to know the transformations between them.
World-to-camera coordinate system transformation
Camera coordinate system
$\tilde{X}_w$ (tilde means heterogeneous coordinates)
World coordinate system
World-to-camera coordinate system transformation
$\tilde{X}_w$
Camera coordinate system
$\tilde{C}$: coordinate of the camera center in the world coordinate frame
World coordinate system
World-to-camera coordinate system transformation
$\tilde{X}_c$ and $\tilde{X}_w$ are not two different points. They are the same physical point, described in two different coordinate systems.
Why aren’t the points aligned?
Camera coordinate system
Translate by $-\tilde{C}$ (align origins): $\tilde{X}_w - \tilde{C}$
World coordinate system
$\tilde{C}$: coordinate of the camera center in the world coordinate frame
World-to-camera coordinate system transformation
$\tilde{X}_c$, $\tilde{X}_w$: points now coincide
Camera coordinate system
$\tilde{C}$
$R \cdot (\tilde{X}_w - \tilde{C})$ (translate, then rotate)
World coordinate system
3D Rotation of Points
Slide Credit: Savarese
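Rotation around the coordinate axes, counter-clockwise:
$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}$
$R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}$
$R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$
[Figure: rotation by angle $\gamma$ about the z-axis, with points $\tilde{X}_w$ and $\tilde{X}_c$]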
Slide source: Derek Hoiem
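The three axis rotations can be checked numerically. The sketch below builds each matrix and verifies that a counter-clockwise quarter turn about one axis carries the next axis onto the following one (x to y, y to z, z to x). The function names are illustrative.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(b):
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

quarter = math.pi / 2
x_to_y = matvec(rot_z(quarter), [1.0, 0.0, 0.0])   # approximately [0, 1, 0]
y_to_z = matvec(rot_x(quarter), [0.0, 1.0, 0.0])   # approximately [0, 0, 1]
z_to_x = matvec(rot_y(quarter), [0.0, 0.0, 1.0])   # approximately [1, 0, 0]
print(x_to_y, y_to_z, z_to_x)
```

The results are only approximately exact because cos(pi/2) is not exactly zero in floating point.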
Modeling the coordinate system transformation
In heterogeneous coordinates, we have:
$\tilde{X}_c = R \cdot (\tilde{X}_w - \tilde{C})$
How do we write this transformation in homogeneous coordinates?
Modeling the coordinate system transformation
In heterogeneous coordinates, we have:
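$\tilde{X}_c = R \cdot (\tilde{X}_w - \tilde{C})$
In homogeneous coordinates, we have:
$\begin{bmatrix} \tilde{X}_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \tilde{X}_w \\ 1 \end{bmatrix}$
or
$X_c = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} X_w$
where $R$ is 3×3, $-R\tilde{C}$ is 3×1, and the $0$ block is 1×3.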
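The two forms of the world-to-camera transformation can be compared numerically: translate by the camera center and rotate, or multiply by the 4×4 matrix with blocks R, -RC, 0, and 1. In this sketch the function names, the 90-degree rotation about z, and the sample coordinates are all illustrative assumptions.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def world_to_camera(R, C, Xw):
    """Heterogeneous form: X_c = R (X_w - C)."""
    return matvec(R, [x - c for x, c in zip(Xw, C)])

def extrinsic_matrix(R, C):
    """4x4 homogeneous form with blocks [[R, -R C], [0, 1]]."""
    t = [-v for v in matvec(R, C)]
    return [R[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

g = math.pi / 2
R = [[math.cos(g), -math.sin(g), 0.0],
     [math.sin(g),  math.cos(g), 0.0],
     [0.0,          0.0,         1.0]]
C = [1.0, 2.0, 3.0]      # camera center in world coordinates
Xw = [2.0, 2.0, 5.0]     # a world point

a = world_to_camera(R, C, Xw)
b = matvec(extrinsic_matrix(R, C), Xw + [1.0])
print(a)
print(b)                 # same first three entries, last entry 1
```

As a sanity check, the camera center itself maps to the origin of the camera coordinate system, since X_w - C is then the zero vector.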