Midterm Review
CPSC 425: Computer Vision
(unless otherwise stated, slides are taken or adapted from Bob Woodham, Jim Little and Fred Tung)
Overview: Image Formation, Cameras and Lenses
The image formation process that produces a particular image depends on
— Lighting conditions
— Scene geometry
— Surface properties
— Camera optics
The sensor (or eye) captures the amount of light reflected from the object.
[Figure: light from a source reflects off a surface element (with its surface normal) and reaches the sensor or eye]
Camera Obscura (latin for “dark chamber”)
Reinerus Gemma-Frisius observed an eclipse of the sun at Louvain on January
24, 1544. He used this illustration in his book, “De Radio Astronomica et
Geometrica,” 1545. It is thought to be the first published illustration of a camera
obscura.
Credit: John H. Hammond, “The Camera Obscura, A Chronicle”
The principles behind the pinhole camera, or camera obscura, were first mentioned by the Chinese philosopher Mozi (Mo-Ti) (470 to 390 BCE).
Pinhole Camera (Simplified)
[Figure: an object at distance z from the pinhole; the image plane sits at distance f’ behind the pinhole; object point x projects to image point x’]
f’ is the focal length of the camera
Note: In a pinhole camera we can adjust the focal length; all this will do is change the size of the resulting image.
Pinhole Camera (Simplified)
[Figure: the same pinhole geometry, redrawn with an equivalent (virtual) image plane placed at distance f’ in front of the pinhole]
It is convenient to think of the image plane as being in front of the pinhole.
What happens if object moves towards the camera? Away from the camera?
Perspective Projection
3D object point
$$P = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$
projects to 2D image point
$$P' = \begin{bmatrix} x' \\ y' \end{bmatrix}$$
where
$$x' = f'\,\frac{x}{z}, \qquad y' = f'\,\frac{y}{z}$$

Forsyth & Ponce (1st ed.) Figure 1.4

Note: this assumes the world coordinate frame is at the optical center (pinhole) and aligned with the image plane, and the image coordinate frame is aligned with the camera coordinate frame.
Summary of Projection Equations

3D object point $P = [x, y, z]^T$ projects to 2D image point $P' = [x', y']^T$ where:

Perspective: $x' = f'\,\dfrac{x}{z}, \quad y' = f'\,\dfrac{y}{z}$

Weak Perspective: $x' = \dfrac{f'}{z_0}\,x, \quad y' = \dfrac{f'}{z_0}\,y$ (with $z_0$ a fixed reference depth for the object)

Orthographic: $x' = x, \quad y' = y$
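As a quick numerical check of the perspective equations above, here is a minimal sketch (not from the course materials); the point coordinates and the focal length `f_prime` are made-up values for illustration.

```python
import numpy as np

def project_perspective(points, f_prime):
    """Perspective projection: x' = f' * x / z, y' = f' * y / z.

    points: (N, 3) array of 3D points [x, y, z] in the camera frame.
    Returns an (N, 2) array of image points [x', y'].
    """
    points = np.asarray(points, dtype=float)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([f_prime * x / z, f_prime * y / z], axis=1)

# Doubling the depth z halves the image coordinates (objects shrink with distance).
pts = np.array([[1.0, 2.0, 4.0],
                [1.0, 2.0, 8.0]])
print(project_perspective(pts, f_prime=1.0))
```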
Sample Question: Image Formation
True or false: A pinhole camera uses an orthographic projection.
Why Not a Pinhole Camera?
— If pinhole is too big then many directions
are averaged, blurring the image
— If pinhole is too small then diffraction
becomes a factor, also blurring the image
— Generally, pinhole cameras are dark,
because only a very small set of rays from a
particular scene point hits the image plane
— Pinhole cameras are slow, because only a
very small amount of light from a particular
scene point hits the image plane per unit time
Image Credit: E. Hecht, “Optics,” Addison-Wesley, 1987
Pinhole Model (Simplified) with Lens
[Figure: thin-lens imaging geometry; object at distance z from the lens, image plane at distance z’ behind it; object point x maps to image point x’]
Vignetting
Image Credit: Cambridge in Colour
Chromatic Aberration
— Index of refraction depends on wavelength, λ, of light
— Light of different colours follows different paths
— Therefore, not all colours can be in equal focus
Image Credit: Trevor Darrell
Lens Distortion
Szeliski (1st ed.) Figure 2.13
Fish-eye Lens
Lines in the world are no longer lines in the image; they become curves!
Sample Question: Cameras and Lenses
True or false: Snell’s Law describes how much light is reflected and how much passes through the boundary between two materials.
Linear Filters

Let $I(X,Y)$ be an $n \times n$ image and $F(X,Y)$ an $m \times m$ filter (e.g., $m = 5$).

For a given $X$ and $Y$, superimpose the filter on the image centered at $I(X,Y)$.

Compute the new pixel value, $I'(X,Y)$, as the sum of $m \times m$ values, where each value is the product of the original pixel value in $I(X,Y)$ and the corresponding value in the filter:

$$\underbrace{I'(X,Y)}_{\text{output}} = \sum_{j=-k}^{k} \sum_{i=-k}^{k} \underbrace{F(i,j)}_{\text{filter}}\, \underbrace{I(X+i, Y+j)}_{\text{image (signal)}} \qquad \text{where } k = \lfloor m/2 \rfloor$$
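As a concrete illustration of the summation above, here is a minimal, unoptimized sketch of linear filtering by cross-correlation; it assumes an odd filter size m and simply leaves the k-pixel border untouched (boundary handling is discussed on a later slide).

```python
import numpy as np

def linear_filter(I, F):
    """Apply an m x m filter F to image I by cross-correlation.

    Output pixels within k = m // 2 of the border are left at zero
    (i.e., the computation is simply not performed there).
    """
    m = F.shape[0]
    k = m // 2
    out = np.zeros_like(I, dtype=float)
    rows, cols = I.shape
    for X in range(k, rows - k):
        for Y in range(k, cols - k):
            # Element-wise product of the filter and the local patch, then sum.
            patch = I[X - k:X + k + 1, Y - k:Y + k + 1]
            out[X, Y] = np.sum(F * patch)
    return out

I = np.random.rand(16, 16)
F = np.ones((5, 5)) / 25.0   # 5 x 5 box (mean) filter
print(linear_filter(I, F).shape)
```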
Linear Filter Example

[Figure: an $m \times m$ filter $F(X,Y)$ (here $m = 5$) slides over the $n \times n$ image $I(X,Y)$; at each position the element-wise products are summed to produce one output value $I'(X,Y)$]

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X+i, Y+j)$$

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
Linear Filters: Boundary Effects

Three standard ways to deal with boundaries:
1. Ignore these locations: Make the computation undefined for the top and bottom k rows and the leftmost and rightmost k columns
2. Pad the image with zeros: Return zero whenever a value of I is required at some position outside the defined limits of X and Y
3. Assume periodicity: The top row wraps around to the bottom row; the leftmost column wraps around to the rightmost column
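The three strategies map directly onto standard padding modes. The sketch below uses scipy conventions (an assumed tool choice, not course code) to show zero padding, periodic wrap-around, and the "ignore the border" option.

```python
import numpy as np
from scipy.ndimage import correlate

I = np.arange(25, dtype=float).reshape(5, 5)
F = np.ones((3, 3)) / 9.0

# Option 2: pad the image with zeros outside its defined limits.
zero_padded = correlate(I, F, mode='constant', cval=0.0)

# Option 3: assume periodicity (rows and columns wrap around).
periodic = correlate(I, F, mode='wrap')

# Option 1 (ignore boundary locations) amounts to computing the result
# everywhere and then discarding the outermost k rows and columns.
k = F.shape[0] // 2
valid_only = zero_padded[k:-k, k:-k]
print(zero_padded.shape, periodic.shape, valid_only.shape)
```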
Linear Filters

— The correlation of $F(X,Y)$ and $I(X,Y)$ is:

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X+i, Y+j)$$

— Visual interpretation: Superimpose the filter on the image at $(X,Y)$, perform an element-wise multiply, and sum up the values

— Convolution is like correlation except the filter is “flipped”

— If $F(X,Y) = F(-X,-Y)$, then correlation = convolution.
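A quick way to see the "flipped filter" relationship: convolving with F gives the same result as correlating with F flipped in both X and Y, and for a symmetric filter the two operations coincide. The check below uses scipy (an assumed tool, not part of the slides).

```python
import numpy as np
from scipy.ndimage import correlate, convolve

rng = np.random.default_rng(0)
I = rng.random((8, 8))
F = rng.random((3, 3))          # deliberately non-symmetric filter

conv = convolve(I, F, mode='constant')
corr_flipped = correlate(I, F[::-1, ::-1], mode='constant')

# Convolution == correlation with the filter flipped in X and Y.
print(np.allclose(conv, corr_flipped))   # True

# For a symmetric filter, F(X, Y) == F(-X, -Y), so correlation == convolution.
G = np.outer([1, 2, 1], [1, 2, 1]) / 16.0
print(np.allclose(convolve(I, G, mode='constant'),
                  correlate(I, G, mode='constant')))  # True
```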
Linear System: Characterization Theorem
Any linear, shift-invariant operation can be expressed as a convolution (an “if and only if” result).
Smoothing with a Gaussian

Idea: Weight contributions of pixels by spatial proximity (nearness)

2D Gaussian (continuous case):

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$

Forsyth & Ponce (2nd ed.) Figure 4.2
Gaussian: Area Under the Curve

[Figure: fraction of the area under a Gaussian within ±kσ of the mean: 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ, 99.99% within ±4σ]
Efficient Implementation: Separability
A 2D function of x and y is separable if it can be written as the product of two
functions, one a function only of x and the other a function only of y
Both the 2D box filter and the 2D Gaussian filter are separable
Both can be implemented as two 1D convolutions:
— First, convolve each row with a 1D filter
— Then, convolve each column with a 1D filter
— Aside: or vice versa
The 2D Gaussian is the only (non-trivial) 2D function that is both separable and rotationally invariant.
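Separability in practice: the 2D Gaussian convolution can be computed as a 1D convolution along rows followed by a 1D convolution along columns. A minimal sketch (kernel radius and σ are arbitrary choices, and scipy is an assumed tool):

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

sigma, radius = 1.5, 4
g = gaussian_1d(sigma, radius)
G2d = np.outer(g, g)                     # separable: G(x, y) = g(x) g(y)

I = np.random.rand(64, 64)

# Direct 2D convolution: roughly (2*radius+1)^2 multiplies per pixel.
full_2d = convolve(I, G2d, mode='nearest')

# Two 1D convolutions: roughly 2*(2*radius+1) multiplies per pixel.
rows_then_cols = convolve1d(convolve1d(I, g, axis=1, mode='nearest'),
                            g, axis=0, mode='nearest')

print(np.allclose(full_2d, rows_then_cols))   # True (up to floating point)
```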
Linear Filters: Additional Properties

Let $\otimes$ denote convolution. Let $I(X,Y)$ be a digital image. Let $F$ and $G$ be digital filters.

— Convolution is linear. For a scalar $k$ and filters $F_1$, $F_2$:
$$(F_1 + F_2) \otimes I(X,Y) = F_1 \otimes I(X,Y) + F_2 \otimes I(X,Y)$$
$$(kF) \otimes I(X,Y) = F \otimes (k\,I(X,Y)) = k\,(F \otimes I(X,Y))$$

— Convolution is symmetric. That is,
$$(G \otimes F) \otimes I(X,Y) = (F \otimes G) \otimes I(X,Y)$$

— Convolution is associative. That is,
$$G \otimes (F \otimes I(X,Y)) = (G \otimes F) \otimes I(X,Y)$$

Convolving with filter F and then convolving the result with filter G can be achieved in a single step, namely convolving with the filter $G \otimes F = F \otimes G$.
Bilateral Filter
An edge-preserving non-linear filter
Like a Gaussian filter:
— The filter weights depend on spatial distance from the center pixel
— Pixels nearby (in space) should have greater influence than pixels far away
Unlike a Gaussian filter:
— The filter weights also depend on range distance from the center pixel
— Pixels with similar brightness value should have greater influence than pixels
with dissimilar brightness value
Bilateral Filter

Gaussian filter: the weight of a neighbour at a spatial offset $(x, y)$ away from the center pixel $I(X,Y)$ is given by:

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right)\left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right)\right)$$

(with appropriate normalization)

Bilateral filter: the weight of a neighbour at a spatial offset $(x, y)$ away from the center pixel $I(X,Y)$ is given by a product of a domain kernel and a range kernel:

$$\underbrace{\exp\left(-\frac{x^2+y^2}{2\sigma_d^2}\right)}_{\text{domain kernel}} \; \underbrace{\exp\left(-\frac{(I(X+x, Y+y) - I(X,Y))^2}{2\sigma_r^2}\right)}_{\text{range kernel}}$$

(with appropriate normalization)
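A brute-force sketch of the bilateral weighting above (the window radius, σ_d, and σ_r are placeholder values, not course settings): for each pixel it combines the domain kernel with the range kernel computed from brightness differences, then normalizes.

```python
import numpy as np

def bilateral_filter(I, radius=3, sigma_d=2.0, sigma_r=0.1):
    """Brute-force bilateral filter on a grayscale image I with values in [0, 1]."""
    rows, cols = I.shape
    out = np.zeros_like(I)
    x, y = np.meshgrid(np.arange(-radius, radius + 1),
                       np.arange(-radius, radius + 1), indexing='ij')
    domain = np.exp(-(x**2 + y**2) / (2 * sigma_d**2))   # spatial proximity

    padded = np.pad(I, radius, mode='edge')
    for r in range(rows):
        for c in range(cols):
            patch = padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]
            # Range kernel: neighbours with similar brightness get larger weight.
            range_kernel = np.exp(-(patch - I[r, c])**2 / (2 * sigma_r**2))
            w = domain * range_kernel
            out[r, c] = np.sum(w * patch) / np.sum(w)     # normalization
    return out

noisy = np.clip(np.eye(32) + 0.05 * np.random.randn(32, 32), 0, 1)
print(bilateral_filter(noisy).shape)
```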
Bilateral Filter Application: Denoising
Noisy Image Gaussian Filter Bilateral Filter
Slide Credit: Alexander Wong
Sample Question: Filters
What does the following 3 × 3 linear, shift invariant filter compute when applied
to an image?
$$\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$
Continuous Case

Denote the image as a function, $i(x, y)$, where $x$ and $y$ are spatial variables.

Aside: The convention for this section is to use lower case letters for the continuous case and upper case letters for the discrete case.
Discrete Case

Idea: Superimpose a (regular) grid on the continuous image. Sample the underlying continuous image $i(x, y)$ according to the tessellation imposed by the grid.

Each grid cell is called a picture element (pixel).

Denote the discrete image as $I(X,Y)$. We can store the pixels in a matrix or array.
Sampling

It is clear that some information may be lost when we work on a discrete pixel grid.

Forsyth & Ponce (2nd ed.) Figure 4.7
Sampling Theory

Exact reconstruction requires a constraint on the rate at which $i(x, y)$ can change between samples
— “rate of change” means derivative
— the formal concept is a bandlimited signal
— “bandlimit” and “constraint on derivative” are linked

An image is bandlimited if it has some maximum spatial frequency

A fundamental result (Sampling Theorem) is:
For bandlimited signals, if you sample regularly at or above twice the maximum frequency (called the Nyquist rate), then you can reconstruct the original signal exactly
Sampling Theory

Sometimes undersampling is unavoidable, and there is a trade-off between “things missing” and “artifacts.”

Medical imaging: usually try to maximize information content, tolerate some artifacts

Computer graphics: usually try to minimize artifacts, tolerate some missing information
Template Matching
Slide Credit: Kristen Grauman
Template Matching

We can think of convolution/correlation as comparing a template (the filter) with each local image patch.
— Consider the filter and image patch as vectors.
— Applying a filter at an image location can be interpreted as computing the dot product between the filter and the local image patch.

[Figure: a small binary template element-wise multiplied and summed against two image patches; the dot product is 3 for Image Patch 1 (which matches the template) and 1 for Image Patch 2. If the patch values are scaled by 255, the dot products scale by 255 as well.]

The dot product may be large simply because the image region is bright. We need to normalize the result in some way.
Template Matching

Let $\mathbf{a}$ and $\mathbf{b}$ be vectors. Let $\theta$ be the angle between them. We know

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{\mathbf{a} \cdot \mathbf{b}}{\sqrt{(\mathbf{a} \cdot \mathbf{a})(\mathbf{b} \cdot \mathbf{b})}} = \frac{\mathbf{a}}{|\mathbf{a}|} \cdot \frac{\mathbf{b}}{|\mathbf{b}|}$$

where $\cdot$ is the dot product and $|\,|$ is the vector magnitude

Correlation is a dot product

Correlation measures similarity between the filter and each local image region

Normalized correlation varies between −1 and 1

Normalized correlation attains the value 1 when the filter and image region are identical (up to a scale factor)
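A sketch of normalized correlation for template matching (not the course's reference implementation): each local patch and the template are treated as vectors, and the cosine of the angle between them is computed, so scaling the patch brightness by a constant leaves the score unchanged.

```python
import numpy as np

def normalized_correlation(I, T):
    """Slide template T over image I; return the cosine-similarity map."""
    th, tw = T.shape
    t = T.ravel()
    t_norm = np.linalg.norm(t)
    out = np.zeros((I.shape[0] - th + 1, I.shape[1] - tw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = I[r:r + th, c:c + tw].ravel()
            denom = np.linalg.norm(patch) * t_norm
            out[r, c] = patch @ t / denom if denom > 0 else 0.0
    return out

T = np.array([[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]], dtype=float)
I = np.zeros((10, 10)); I[4:7, 4:7] = 255 * T   # a bright copy of the template
scores = normalized_correlation(I, T)
print(scores.max(), np.unravel_index(scores.argmax(), scores.shape))  # ~1.0 at (4, 4)
```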
Template Matching
Slide Credit: Kristen Grauman
Example 1:
Credit: W. Freeman et al., “Computer Vision for Interactive Computer Graphics,”
IEEE Computer Graphics and Applications, 1998
Template (left), image (middle),
normalized correlation (right)
Note peak value at the true
position of the hand
Sample Question: Template Matching
True or false: Normalized correlation is robust to a constant scaling in the
image brightness.
Scaled Representations: Goals
to find template matches at all scales
— template size constant, image scale varies
— finding hands or faces when we don’t know what size they are in the image
efficient search for image–to–image correspondences
— look first at coarse scales, refine at finer scales
— much less cost (but may miss best match)
to examine all levels of detail
— find edges with different amounts of blur
— find textures with different spatial frequencies (i.e., different levels of detail)
Shrinking the Image
Forsyth & Ponce (2nd ed.) Figure 4.12-4.14 (top rows)
Image Pyramid
An image pyramid is a collection of representations of an image. Typically,
each layer of the pyramid is half the width and half the height
of the previous layer.
In a Gaussian pyramid, each layer is smoothed by a Gaussian filter and
resampled to get the next layer
Gaussian Pyramid
Forsyth & Ponce (2nd ed.) Figure 4.17
From Template Matching to Local Feature Detection
Slide Credit: Li Fei-Fei, Rob Fergus, and Antonio Torralba
Estimating Derivatives

Recall, for a 2D (continuous) function, $f(x, y)$:

$$\frac{\partial f}{\partial x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon, y) - f(x, y)}{\epsilon}$$

Differentiation is linear and shift invariant, and therefore can be implemented as a convolution

A (discrete) approximation is

$$\frac{\partial f}{\partial x} \approx \frac{F(X+1, Y) - F(X, Y)}{\Delta x}$$

which corresponds to the filter $\begin{bmatrix} -1 & 1 \end{bmatrix}$

A similar definition (and approximation) holds for $\frac{\partial f}{\partial y}$

Image noise tends to result in pixels not looking exactly like their neighbours, so simple “finite differences” are sensitive to noise.

The usual way to deal with this problem is to smooth the image prior to derivative estimation.
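A small sketch of the discrete approximation: the [-1 1] finite-difference filter applied by correlation, with and without Gaussian pre-smoothing (the noise level and σ are arbitrary choices for illustration, and scipy is an assumed tool).

```python
import numpy as np
from scipy.ndimage import correlate, gaussian_filter

# A vertical step edge plus noise.
I = np.zeros((32, 32)); I[:, 16:] = 1.0
I_noisy = I + 0.2 * np.random.randn(32, 32)

# Finite-difference filter for d/dx (here x is the column direction).
dx_filter = np.array([[-1.0, 1.0]])

raw_dx = correlate(I_noisy, dx_filter, mode='nearest')

# Smoothing first suppresses the noise response; by associativity this is
# equivalent to filtering once with a "derivative of Gaussian" filter.
smoothed_dx = correlate(gaussian_filter(I_noisy, sigma=2.0),
                        dx_filter, mode='nearest')

print(np.abs(raw_dx).mean(), np.abs(smoothed_dx).mean())
```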
What Causes Edges?
What causes an edge?
• Depth discontinuity
• Surface orientation
discontinuity
• Reflectance
discontinuity (i.e.,
change in surface
material properties)
• Illumination
discontinuity (e.g.,
shadow)
Slide Credit: Christopher Rasmussen
Smoothing and Differentiation
Edge: a location with high gradient (derivative)
Need smoothing to reduce noise prior to taking derivative
Need two derivatives, in x and y direction
We can use derivative of Gaussian filters
— because differentiation is convolution, and
— convolution is associative
Let $\otimes$ denote convolution. Then:

$$D \otimes (G \otimes I(X,Y)) = (D \otimes G) \otimes I(X,Y)$$
Gradient Magnitude

Let $I(X,Y)$ be a (digital) image

Let $I_x(X,Y)$ and $I_y(X,Y)$ be estimates of the partial derivatives in the $x$ and $y$ directions, respectively. Call these estimates $I_x$ and $I_y$ (for short)

The vector $[I_x, I_y]$ is the gradient

The scalar $\sqrt{I_x^2 + I_y^2}$ is the gradient magnitude

The gradient direction is given by: $\theta = \tan^{-1}\left(\dfrac{I_y}{I_x}\right)$
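A short sketch computing I_x, I_y, the gradient magnitude, and the gradient direction using derivative-of-Gaussian filtering (σ and the test image are arbitrary choices; scipy is an assumed tool).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

I = np.zeros((64, 64)); I[16:48, 16:48] = 1.0   # a bright square

# Derivative-of-Gaussian estimates of the partial derivatives.
Ix = gaussian_filter(I, sigma=1.5, order=(0, 1))   # d/dx (columns)
Iy = gaussian_filter(I, sigma=1.5, order=(1, 0))   # d/dy (rows)

magnitude = np.sqrt(Ix**2 + Iy**2)
direction = np.arctan2(Iy, Ix)    # theta = atan2(Iy, Ix); well defined even when Ix = 0

print(magnitude.max(), direction.shape)
```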
Two Generic Approaches for Edge Detection

[Figure: a 1D image slice $i(r)$ along a direction $r$ in the $(x, y)$ plane, its first derivative $\frac{d\,i(r)}{dr}$, and its second derivative $\frac{d^2 i(r)}{dr^2}$]

Two generic approaches to edge point detection:
— (significant) local extrema of a first derivative operator
— zero crossings of a second derivative operator
Marr / Hildreth Laplacian of Gaussian

A “zero crossings of a second derivative operator” approach

Steps:
1. Gaussian for smoothing
2. Laplacian ($\nabla^2$) for differentiation, where
$$\nabla^2 f(x, y) = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2}$$
3. Locate zero-crossings in the Laplacian of the Gaussian ($\nabla^2 G$), where
$$\nabla^2 G(x, y) = \frac{-1}{2\pi\sigma^4}\left(2 - \frac{x^2 + y^2}{\sigma^2}\right) \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$$
Marr / Hildreth Laplacian of Gaussian

Here’s a 3D plot of the Laplacian of the Gaussian ($\nabla^2 G$) . . . with its characteristic “Mexican hat” shape
Canny Edge Detector
Steps:
1. Apply directional derivatives of Gaussian
2. Compute gradient magnitude and gradient direction
3. Non-maximum suppression
— thin multi-pixel wide “ridges” down to single pixel width
4. Linking and thresholding
— Low, high edge-strength thresholds
— Accept all edges over low threshold that are connected to edge over high
threshold
Edge Hysteresis

One way to deal with broken edge chains is to use hysteresis

Hysteresis: A lag or momentum factor

Idea: Maintain two thresholds, $k_{high}$ and $k_{low}$
— Use $k_{high}$ to find strong edges to start an edge chain
— Use $k_{low}$ to find weak edges which continue an edge chain

Typical ratio of thresholds is (roughly): $\dfrac{k_{high}}{k_{low}} = 2$
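A minimal sketch of the linking step: accept every pixel above k_high, then repeatedly grow the accepted set into 8-connected neighbours that are above k_low. The threshold values here are arbitrary, and a full Canny implementation would apply this only after non-maximum suppression.

```python
import numpy as np

def hysteresis(strength, k_low, k_high):
    """Return a boolean edge map from an edge-strength image."""
    strong = strength >= k_high
    weak = strength >= k_low
    edges = strong.copy()
    changed = True
    while changed:
        changed = False
        # Grow accepted edges into 8-connected weak neighbours.
        # (np.roll wraps at the border; acceptable for this small demo.)
        grown = edges.copy()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                grown |= np.roll(np.roll(edges, dr, axis=0), dc, axis=1)
        new_edges = grown & weak
        if (new_edges != edges).any():
            edges = new_edges
            changed = True
    return edges

strength = np.array([[0.1, 0.4, 0.9, 0.4, 0.1],
                     [0.1, 0.4, 0.5, 0.4, 0.1],
                     [0.1, 0.1, 0.1, 0.1, 0.1]])
print(hysteresis(strength, k_low=0.3, k_high=0.8).astype(int))
```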
How do humans perceive boundaries?

“Divide the image into some number of segments, where the segments represent ’things’ or ’parts of things’ in the scene. The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate. It is important that all of the segments have approximately equal importance.” (Martin et al. 2004)

Each image shows multiple (4-8) human-marked boundaries. Pixels are darker where more humans marked a boundary.

Figure Credit: Szeliski Fig. 4.31. Original: Martin et al. 2004
Sample Question: Edges
Why is non-maximum suppression applied in the Canny edge detector?
What is a corner?
We can think of a corner as any locally distinct 2D image feature that (hopefully) corresponds to a distinct position on a 3D object of interest in the scene.
Image Credit: John Shakespeare, Sydney Morning Herald
Why are corners distinct?

A corner can be localized reliably.

Thought experiment:
— Place a small window over a patch of constant image value. If you slide the window in any direction, the image in the window will not change.
— Place a small window over an edge. If you slide the window in the direction of the edge, the image in the window will not change.
→ Cannot estimate location along an edge (a.k.a. the aperture problem)
— Place a small window over a corner. If you slide the window in any direction, the image in the window changes.
→ A “corner” shows significant change in all directions.

Image Credit: Ioannis (Yannis) Gkioulekas (CMU)
Autocorrelation
Szeliski, Figure 4.5
Corner Detection
Edge detectors perform poorly at corners
Observations:
— The gradient is ill defined exactly at a corner
— Near a corner, the gradient has two (or more) distinct values
Harris Corner Detection

1. Compute image gradients over a small region
2. Compute the covariance matrix
3. Compute eigenvectors and eigenvalues
4. Use a threshold on the eigenvalues to detect corners

Slide Adapted: Ioannis (Yannis) Gkioulekas (CMU)
2. Compute the covariance matrix (a.k.a. the 2nd moment matrix)

$$C = \sum_{(x,y)\,\in\,\text{window}} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

— Sum over a small region around the corner
— $I_x I_y$ is the gradient with respect to $x$ times the gradient with respect to $y$
— The matrix is symmetric
Harris Corner Detection

— Filter the image with a Gaussian
— Compute the magnitude of the x and y gradients at each pixel
— Construct C in a window around each pixel
— Harris uses a Gaussian window
— Solve for the product of the λ’s
— If both λ’s are big (the product reaches a local maximum above a threshold), then we have a corner
— Harris also checks that the ratio of the λ’s is not too high
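A compact sketch of steps 1-4 above (the σ values and the eigenvalue threshold are arbitrary choices, not Harris's original settings). It builds the covariance-matrix entries under a Gaussian window, computes the two eigenvalues in closed form, and thresholds the smaller one, which is one common variant of the eigenvalue test described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_eigenvalues(I, sigma_grad=1.0, sigma_window=2.0):
    """Per-pixel eigenvalues of the 2x2 covariance (2nd moment) matrix C."""
    # 1. Image gradients (derivative-of-Gaussian).
    Ix = gaussian_filter(I, sigma_grad, order=(0, 1))
    Iy = gaussian_filter(I, sigma_grad, order=(1, 0))

    # 2. Covariance-matrix entries, summed over a Gaussian window.
    Sxx = gaussian_filter(Ix * Ix, sigma_window)
    Syy = gaussian_filter(Iy * Iy, sigma_window)
    Sxy = gaussian_filter(Ix * Iy, sigma_window)

    # 3. Closed-form eigenvalues of [[Sxx, Sxy], [Sxy, Syy]].
    trace = Sxx + Syy
    det = Sxx * Syy - Sxy**2
    disc = np.sqrt(np.maximum(trace**2 - 4 * det, 0.0))
    return (trace + disc) / 2, (trace - disc) / 2

I = np.zeros((64, 64)); I[20:44, 20:44] = 1.0      # a square: 4 corners
lam1, lam2 = harris_eigenvalues(I)

# 4. Corner where both eigenvalues are large (threshold chosen by hand here).
corners = lam2 > 0.3 * lam2.max()
print(int(corners.sum()), "corner pixels detected")
```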
Harris corners
• Originally developed as features for motion tracking
• Greatly reduces amount of computation compared to
tracking every pixel
• Translation and rotation invariant (but not scale invariant)
Harris Corner Detection
Properties: NOT Invariant to Scale Changes
[Figure: the same curved corner viewed at two scales; at one scale each detection window sees only an edge (“edge!”), while at the other the whole structure fits in the window and is detected as a corner (“corner!”)]
Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
Sample Questions: Corners
The Harris corner detector is stable under some image transformations
(features are considered stable if the same locations on an object are typically
selected in the transformed image).
True or false: The Harris corner detector is stable under image blur.
Texture
We will look at two main questions:
1. How do we represent texture?
→ Texture analysis
2. How do we generate new examples of a texture?
→ Texture synthesis
Texture Synthesis
Why might we want to synthesize texture?
1. To fill holes in images (inpainting)
— Art directors might want to remove telephone wires. Restorers might want to
remove scratches or marks.
— We need to find something to put in place of the pixels that were removed
— We synthesize regions of texture that fit in and look convincing
2. To produce large quantities of texture for computer graphics
— Good textures make object models look more realistic
Texture Synthesis
Szeliski, Fig. 10.49
Texture Synthesis

Photo Credit: Associated Press
Efros and Leung: Synthesizing One Pixel

[Figure: a pixel p being synthesized, its partially filled neighbourhood window, and an (idealized) infinite sample image]

— What is the conditional probability distribution of p, given the neighbourhood window?
— Directly search the input image for all such neighbourhoods to produce a histogram for p
— To synthesize p, pick one match at random
Efros and Leung: Synthesizing One Pixel

— Since the sample image is finite, an exact neighbourhood match might not be present
— Find the best match using SSD error, weighted by a Gaussian to emphasize local structure, and take all samples within some distance from that match
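A toy sketch of the single-pixel step (the window size, the Gaussian weighting, and the acceptance threshold eps are illustrative choices, not the authors' exact parameters): collect every neighbourhood in the sample image, compute the Gaussian-weighted SSD against the known pixels around p, and sample p's value from the near-best matches.

```python
import numpy as np

def synthesize_pixel(sample, neighbourhood, known_mask, eps=0.1, rng=None):
    """Pick a value for the centre pixel given a partially known neighbourhood.

    sample:        2D array, the (finite) sample texture image.
    neighbourhood: (w, w) array of values around the pixel being synthesized.
    known_mask:    (w, w) boolean array, True where the neighbourhood is filled in.
    """
    rng = rng or np.random.default_rng()
    w = neighbourhood.shape[0]
    half = w // 2

    # Gaussian weighting emphasizes local structure near the centre pixel.
    ax = np.arange(-half, half + 1)
    gx, gy = np.meshgrid(ax, ax)
    weights = np.exp(-(gx**2 + gy**2) / (2 * (w / 6.4)**2)) * known_mask
    weights /= weights.sum()

    candidates, errors = [], []
    for r in range(half, sample.shape[0] - half):
        for c in range(half, sample.shape[1] - half):
            patch = sample[r - half:r + half + 1, c - half:c + half + 1]
            errors.append(np.sum(weights * (patch - neighbourhood)**2))
            candidates.append(sample[r, c])
    errors = np.array(errors)
    # Take all samples within (1 + eps) of the best match; pick one at random.
    pool = np.array(candidates)[errors <= errors.min() * (1 + eps) + 1e-12]
    return rng.choice(pool)

sample = np.tile(np.array([[0.0, 1.0], [1.0, 0.0]]), (8, 8))   # checkerboard texture
nbhd = np.zeros((5, 5)); mask = np.zeros((5, 5), bool)
nbhd[2, 1], mask[2, 1] = 1.0, True                              # one known neighbour
print(synthesize_pixel(sample, nbhd, mask))
```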
Efros and Leung: Synthesizing Many Pixels

For multiple pixels, “grow” the texture in layers
— In the case of hole-filling, start from the edges of the hole

For an interactive demo, see https://una-dinosauria.github.io/efros-and-leung-js/ (written by Julieta Martinez, a previous CPSC 425 TA)
Randomness Parameter
Slide Credit: http://graphics.cs.cmu.edu/people/efros/research/NPS/efros-iccv99.ppt
“Big Data” Meets Inpainting

[Figure: Original Image and Input (with a region to fill removed)] Figure Credit: Hays and Efros 2007

[Figure: Input, Scene Matches, Output] Figure Credit: Hays and Efros 2007

Algorithm sketch (Hays and Efros 2007):
1. Create a short list of a few hundred “best matching” images based on global image statistics
2. Find patches in the short list that match the context surrounding the image region we want to fill
3. Blend the match into the original image

Purely data-driven, requires no manual labeling of images
Goal of Texture Analysis

Compare textures and decide if they’re made of the same “stuff”

Credit: Bill Freeman
Texture Representation
Observation: Textures are made up of generic sub-elements, repeated over a
region with similar statistical properties
Idea: Find the sub-elements with filters, then represent each point in the image
with a summary of the pattern of sub-elements in the local region
Question: What filters should we use?
Answer: Human vision suggests spots and oriented edge filters at a variety of
different orientations and scales
Question: How do we “summarize”?
Answer: Compute the mean or maximum of each filter response over the region
— Other statistics can also be useful
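A minimal sketch of the representation just described, using a tiny hand-built filter bank (a centre-surround "spot" plus horizontal and vertical derivative-of-Gaussian "bars" at two scales); real filter banks such as Leung-Malik's are much larger, and the σ values here are arbitrary. Responses are squared, blurred over a neighbourhood, and summarized by their mean.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace

def texture_descriptor(I, scales=(1.0, 2.0), summary_sigma=4.0):
    """Per-pixel texture representation: blurred, squared filter-bank responses."""
    channels = []
    for s in scales:
        responses = [
            gaussian_laplace(I, s),                  # spot (centre-surround)
            gaussian_filter(I, s, order=(0, 1)),     # vertical edge / bar
            gaussian_filter(I, s, order=(1, 0)),     # horizontal edge / bar
        ]
        for r in responses:
            # Square (make positive), then average locally by Gaussian blurring.
            channels.append(gaussian_filter(r**2, summary_sigma))
    return np.stack(channels, axis=-1)   # shape (H, W, n_filters)

# Two synthetic "textures": vertical stripes vs. horizontal stripes.
xx, yy = np.meshgrid(np.arange(64), np.arange(64))
vertical = np.sin(xx / 2.0)
horizontal = np.sin(yy / 2.0)

dv = texture_descriptor(vertical).mean(axis=(0, 1))    # summary statistics
dh = texture_descriptor(horizontal).mean(axis=(0, 1))
print(np.round(dv, 3), np.round(dh, 3))                # clearly different vectors
```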
Texture Representation
Figure Credit: Leung and Malik, 2001
Spots and Bars (Fine Scale)
Forsyth & Ponce (1st ed.) Figures 9.3–9.4
Spots and Bars (Coarse Scale)
Forsyth & Ponce (1st ed.) Figures 9.3 and 9.5
Laplacian Pyramid
Building a Laplacian pyramid:
— Create a Gaussian pyramid
— Take the difference between one Gaussian pyramid level and the next
(before subsampling)
Properties
— Also known as the difference-of-Gaussian (DOG) function, a close
approximation to the Laplacian
— It is a band pass filter – each level represents a different band of spatial
frequencies
Reconstructing the original image:
— Reconstruct the Gaussian pyramid starting at top
Constructing a Laplacian Pyramid

Algorithm:
repeat:
  filter
  subsample
  compute residual
until min resolution reached

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
Reconstructing the Original Image

Algorithm:
repeat:
  upsample
  sum with residual
until orig resolution reached

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
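A minimal sketch of both algorithms (scipy's zoom is an assumed stand-in for the resampling routine, and the image size, number of levels, and σ are arbitrary). Each Laplacian level is the residual between a Gaussian-pyramid level and the upsampled next level; summing residuals back up reconstructs the original image exactly.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_laplacian_pyramid(I, levels=4, sigma=1.0):
    pyramid = []
    current = I.astype(float)
    for _ in range(levels - 1):
        blurred = gaussian_filter(current, sigma)
        down = blurred[::2, ::2]                      # filter, then subsample
        up = zoom(down, 2.0, order=1)[:current.shape[0], :current.shape[1]]
        pyramid.append(current - up)                  # residual (band-pass level)
        current = down
    pyramid.append(current)                           # coarsest level (low-pass)
    return pyramid

def reconstruct(pyramid):
    current = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        up = zoom(current, 2.0, order=1)[:residual.shape[0], :residual.shape[1]]
        current = up + residual                       # upsample, sum with residual
    return current

I = np.random.rand(64, 64)
pyr = build_laplacian_pyramid(I)
print([lvl.shape for lvl in pyr])
print(np.allclose(reconstruct(pyr), I))               # True: exact reconstruction
```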
Gaussian vs Laplacian Pyramid
Shown in opposite
order for space
Which one takes
more space to
store?
Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
Oriented Pyramids
Forsyth & Ponce (1st ed.) Figure 9.13
Final Texture Representation
Steps:
1. Form a Laplacian and oriented pyramid (or equivalent set of responses to
filters at different scales and orientations)
2. Square the output (makes values positive)
3. Average responses over a neighborhood by blurring with a Gaussian
4. Take statistics of responses
— Mean of each filter output
— Possibly standard deviation of each filter
Sample Question: Texture
How does the top-most image in a Laplacian pyramid differ from the others?