
Midterm Review

CPSC 425: Computer Vision

(Unless otherwise stated, slides are taken or adapted from Bob Woodham, Jim Little and Fred Tung)

Overview: Image Formation, Cameras and Lenses

The image formation process that produces a particular image depends on
— Lighting conditions
— Scene geometry
— Surface properties
— Camera optics

The sensor (or eye) captures the amount of light reflected from the object

[Figure: light from a source reflects off a surface element with a given surface normal and reaches the sensor (eye)]

!2


Camera Obscura (Latin for “dark chamber”)

Reinerus Gemma-Frisius observed an eclipse of the sun at Louvain on January 24, 1544. He used this illustration in his book, “De Radio Astronomica et Geometrica,” 1545. It is thought to be the first published illustration of a camera obscura.

Credit: John H. Hammond, “The Camera Obscura, A Chronicle”

The principles behind the pinhole camera, or camera obscura, were first mentioned by the Chinese philosopher Mozi (Mo-Ti) (470 to 390 BCE)

!3

Pinhole Camera (Simplified)

[Figure: pinhole camera geometry — an object of height x at distance z from the pinhole forms an inverted image of height x’ on the image plane at distance f’ behind the pinhole]

f’ is the focal length of the camera

Note: In a pinhole camera we can adjust the focal length; all this does is change the size of the resulting image

!4

Pinhole Camera (Simplified)

[Figure: the same pinhole geometry drawn with an equivalent virtual image plane placed in front of the pinhole, at distance f’]

It is convenient to think of a virtual image plane that is in front of the pinhole

What happens if the object moves towards the camera? Away from the camera?
!5

Perspective Projection

Forsyth & Ponce (1st ed.) Figure 1.4

Note: this assumes the world coordinate frame is at the optical center (pinhole) and aligned with the image plane, and the image coordinate frame is aligned with the camera coordinate frame

3D object point $P = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ projects to 2D image point $P' = \begin{pmatrix} x' \\ y' \end{pmatrix}$ where

$$x' = f'\,\frac{x}{z} \qquad y' = f'\,\frac{y}{z}$$

!6

Summary of Projection Equations

3D object point $P = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ projects to 2D image point $P' = \begin{pmatrix} x' \\ y' \end{pmatrix}$ where:

Perspective: $\quad x' = f'\,\dfrac{x}{z} \qquad y' = f'\,\dfrac{y}{z}$

Weak Perspective: $\quad x' = f'\,\dfrac{x}{z_0} \qquad y' = f'\,\dfrac{y}{z_0}$ (all points share a reference depth $z_0$)

Orthographic: $\quad x' = x \qquad y' = y$

!7
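To make the three models concrete, here is a minimal Python sketch (not from the slides; the function name `project` and the reference-depth argument `z0` are just illustrative) that projects a 3D point under each model:

```python
import numpy as np

def project(P, f_prime, model="perspective", z0=None):
    """Project a 3D point P = (x, y, z) to a 2D image point (x', y')."""
    x, y, z = P
    if model == "perspective":       # x' = f' x / z, y' = f' y / z
        return f_prime * x / z, f_prime * y / z
    if model == "weak":              # a single reference depth z0 stands in for every z
        return f_prime * x / z0, f_prime * y / z0
    if model == "orthographic":      # simply drop the z coordinate
        return x, y
    raise ValueError("unknown model")

# The same point imaged closer to the pinhole projects to a larger offset:
print(project(np.array([1.0, 2.0, 10.0]), f_prime=1.0))  # (0.1, 0.2)
print(project(np.array([1.0, 2.0, 5.0]), f_prime=1.0))   # (0.2, 0.4)
```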

Sample Question: Image Formation

True or false: A pinhole camera uses an orthographic projection.

!8

Why Not a Pinhole Camera?

— If pinhole is too big then many directions
are averaged, blurring the image

— If pinhole is too small then diffraction
becomes a factor, also blurring the image

— Generally, pinhole cameras are dark,
because only a very small set of rays from a
particular scene point hits the image plane

— Pinhole cameras are slow, because only a
very small amount of light from a particular
scene point hits the image plane per unit time

Image Credit: E. Hecht, “Optics,” Addison-Wesley, 1987
!9

Pinhole Model (Simplified) with Lens

[Figure: a thin lens focuses rays from an object at distance z onto the image plane at distance z’, producing an image of height x’]

!10

Vignetting

Image Credit: Cambridge in Colour
!11

Chromatic Aberration
— Index of refraction depends on wavelength, λ, of light

— Light of different colours follows different paths

— Therefore, not all colours can be in equal focus

Image Credit: Trevor Darrell
!12

Lens Distortion

Szeliski (1st ed.) Figure 2.13

Fish-eye Lens

Lines in the world are no longer lines in the image; they become curves!
!13

Sample Question: Cameras and Lenses

True or false: Snell’s Law describes how much light is reflected and how much
passes through the boundary between two materials.

!14

Linear Filters

For a given $X$ and $Y$, superimpose the $m \times m$ filter $F(X,Y)$ on the image $I(X,Y)$, centered at $(X,Y)$.

Compute the new pixel value, $I'(X,Y)$, as the sum of $m \times m$ values, where each value is the product of the original pixel value in $I(X,Y)$ and the corresponding value in the filter:

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X+i, Y+j)$$

Here $I'$ is the output, $F$ is the filter, $I$ is the image (signal), and $m = 2k+1$.

!15
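As a sanity check on the formula, here is a minimal NumPy sketch (not from the slides) that applies the double sum directly, leaving the outer k rows and columns at zero (the “ignore boundaries” strategy discussed later):

```python
import numpy as np

def apply_filter(I, F):
    """Cross-correlate image I with an m x m filter F (m odd), per the formula above."""
    m = F.shape[0]
    k = m // 2
    out = np.zeros_like(I, dtype=float)
    for Y in range(k, I.shape[0] - k):
        for X in range(k, I.shape[1] - k):
            patch = I[Y - k:Y + k + 1, X - k:X + k + 1]
            out[Y, X] = np.sum(F * patch)   # element-wise multiply, then sum
    return out

I = np.arange(49, dtype=float).reshape(7, 7)
box = np.ones((3, 3)) / 9.0                  # 3x3 box (mean) filter
print(apply_filter(I, box)[3, 3])            # ~24.0, the mean of the 3x3 patch centred there
```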

Linear Filter Example

image $I(X,Y)$, filter $F(X,Y)$ of size $m \times m$ with $m = 5$, output $I'(X,Y)$

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X+i, Y+j)$$

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)

!16


Linear Filters: Boundary Effects

Three standard ways to deal with boundaries:

1. Ignore these locations: Make the computation undefined for the top and bottom k rows and the leftmost and rightmost k columns

2. Pad the image with zeros: Return zero whenever a value of I is required at some position outside the defined limits of X and Y

3. Assume periodicity: The top row wraps around to the bottom row; the leftmost column wraps around to the rightmost column

!36
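A small NumPy sketch (not from the slides) showing how strategies 2 and 3 look in practice with `np.pad`:

```python
import numpy as np

I = np.arange(16, dtype=float).reshape(4, 4)
k = 1                                              # a 3x3 filter needs k = 1 extra pixels per side

padded_zeros = np.pad(I, k, mode="constant", constant_values=0)   # strategy 2: zero padding
padded_wrap = np.pad(I, k, mode="wrap")                           # strategy 3: periodicity

print(padded_zeros[0])   # [0. 0. 0. 0. 0. 0.]
print(padded_wrap[0])    # [15. 12. 13. 14. 15. 12.]  (top padding comes from the bottom row)
```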

Linear Filters

— The correlation of $F(X,Y)$ and $I(X,Y)$ is:

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X+i, Y+j)$$

(here $I'$ is the output, $F$ the filter, and $I$ the image, i.e. the signal)

— Visual interpretation: Superimpose the filter on the image at $(X,Y)$, perform an element-wise multiply, and sum up the values

— Convolution is like correlation except the filter is “flipped”:

$$I'(X,Y) = \sum_{j=-k}^{k} \sum_{i=-k}^{k} F(i,j)\, I(X-i, Y-j)$$

If $F(X,Y) = F(-X,-Y)$ then correlation = convolution.

!37

Linear System: Characterization Theorem

Any linear, shift-invariant operation can be expressed as a convolution
(an “if and only if” result)

!39

Smoothing with a Gaussian

!40

Idea: Weight contributions of pixels by spatial proximity (nearness)

2D Gaussian (continuous case):

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\,\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$
Forsyth & Ponce (2nd ed.)
Figure 4.2

!41

Gaussian: Area Under the Curve

[Figure: area under a 1D Gaussian — about 68% lies within ±1σ, 95% within ±2σ, 99.7% within ±3σ, and 99.99% within ±4σ]

Efficient Implementation: Separability

A 2D function of x and y is separable if it can be written as the product of two
functions, one a function only of x and the other a function only of y

Both the 2D box filter and the 2D Gaussian filter are separable

Both can be implemented as two 1D convolutions:
— First, convolve each row with a 1D filter
— Then, convolve each column with a 1D filter
— Aside: or vice versa

The 2D Gaussian is the only (non-trivial) 2D function that is both separable and rotationally invariant.

!42
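A minimal SciPy/NumPy sketch (not from the slides) confirming that two 1D Gaussian passes reproduce the full 2D convolution; the helper `gaussian_1d` is just for illustration:

```python
import numpy as np
from scipy.ndimage import convolve1d
from scipy.signal import convolve2d

def gaussian_1d(sigma, radius):
    """Sampled 1D Gaussian, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

rng = np.random.default_rng(0)
I = rng.random((64, 64))
g = gaussian_1d(sigma=2.0, radius=6)

rows = convolve1d(I, g, axis=1, mode="constant")          # convolve each row
separable = convolve1d(rows, g, axis=0, mode="constant")  # then each column

full_2d = convolve2d(I, np.outer(g, g), mode="same", boundary="fill")  # single 2D pass

print(np.allclose(separable, full_2d))   # True (up to floating point error)
```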

Linear Filters: Additional Properties

Let $\otimes$ denote convolution. Let $I(X,Y)$ be a digital image. Let $F$ and $G$ be digital filters.

— Convolution is linear (superposition and scaling). That is,
$$(F_1 + F_2) \otimes I(X,Y) = F_1 \otimes I(X,Y) + F_2 \otimes I(X,Y)$$
$$(kF) \otimes I(X,Y) = F \otimes (kI(X,Y)) = k\,(F \otimes I(X,Y))$$

— Convolution is associative. That is,
$$G \otimes (F \otimes I(X,Y)) = (G \otimes F) \otimes I(X,Y)$$

— Convolution is symmetric. That is,
$$(G \otimes F) \otimes I(X,Y) = (F \otimes G) \otimes I(X,Y)$$

Convolving with filter F and then convolving the result with filter G can be achieved in a single step, namely convolving with the filter $G \otimes F = F \otimes G$

!43

Bilateral Filter

An edge-preserving non-linear filter

Like a Gaussian filter:
— The filter weights depend on spatial distance from the center pixel

— Pixels nearby (in space) should have greater influence than pixels far away

Unlike a Gaussian filter:
— The filter weights also depend on range distance from the center pixel 

— Pixels with similar brightness value should have greater influence than pixels
with dissimilar brightness value

!44

Bilateral Filter

Gaussian filter: weights of a neighbour at spatial offset $(x, y)$ away from the center pixel $I(X,Y)$ are given by:

$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)\right)\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{y^2}{2\sigma^2}\right)\right)$$

(with appropriate normalization)

Bilateral filter: weights of a neighbour at spatial offset $(x, y)$ away from the center pixel $I(X,Y)$ are given by a product of a domain kernel and a range kernel:

$$\underbrace{\exp\!\left(-\frac{x^2+y^2}{2\sigma_d^2}\right)}_{\text{domain kernel}} \; \underbrace{\exp\!\left(-\frac{(I(X+x, Y+y) - I(X,Y))^2}{2\sigma_r^2}\right)}_{\text{range kernel}}$$

(with appropriate normalization)

!45
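A minimal NumPy sketch (not from the slides) of the bilateral weighting at a single interior pixel; `sigma_d` and `sigma_r` stand for the domain and range standard deviations:

```python
import numpy as np

def bilateral_pixel(I, X, Y, k, sigma_d, sigma_r):
    """Bilateral-filtered value at interior pixel (row Y, column X) of grayscale image I."""
    offsets = np.arange(-k, k + 1)
    yy, xx = np.meshgrid(offsets, offsets, indexing="ij")
    patch = I[Y - k:Y + k + 1, X - k:X + k + 1]

    domain = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_d**2))        # nearness in space
    rng_w = np.exp(-(patch - I[Y, X])**2 / (2.0 * sigma_r**2))    # similarity in brightness
    w = domain * rng_w
    return np.sum(w * patch) / np.sum(w)                          # normalized weighted average

I = np.zeros((11, 11)); I[:, 6:] = 100.0                          # a sharp step edge
print(bilateral_pixel(I, X=5, Y=5, k=2, sigma_d=2.0, sigma_r=10.0))  # stays near 0: the edge is preserved
```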

Bilateral Filter Application: Denoising

!46

Noisy Image Gaussian Filter Bilateral Filter

Slide Credit: Alexander Wong

Sample Question: Filters

What does the following 3 × 3 linear, shift invariant filter compute when applied
to an image?

!47

$$\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$$

Continuous Case

Denote the image as a function, $i(x, y)$, where $x$ and $y$ are spatial variables

Aside: The convention for this section is to use lower case letters for the continuous case and upper case letters for the discrete case

!48

Discrete Case

Idea: Superimpose a (regular) grid on the continuous image $i(x, y)$

Sample the underlying continuous image according to the tessellation imposed by the grid

Each grid cell is called a picture element (pixel)

Denote the discrete image as $I(X, Y)$

We can store the pixels in a matrix or array

!49–!50

Sampling

It is clear that some information may be lost when we work on a discrete pixel grid.

Forsyth & Ponce (2nd ed.) Figure 4.7

!51


Sampling Theory

Exact reconstruction requires a constraint on the rate at which $i(x, y)$ can change between samples
— “rate of change” means derivative
— the formal concept is a bandlimited signal
— “bandlimit” and “constraint on derivative” are linked

An image is bandlimited if it has some maximum spatial frequency

A fundamental result (the Sampling Theorem) is:
For bandlimited signals, if you sample regularly at or above twice the maximum frequency (called the Nyquist rate), then you can reconstruct the original signal exactly

!56

Sampling Theory

Sometimes undersampling is unavoidable, and there is a trade-off between “things missing” and “artifacts.”

Medical imaging: usually try to maximize information content, tolerate some artifacts

Computer graphics: usually try to minimize artifacts, tolerate some information missing

!57

!58

Template Matching

Slide Credit: Kristen Grauman

We can think of convolution/correlation as comparing a template (the filter)
with each local image patch.
— Consider the filter and image patch as vectors.
— Applying a filter at an image location can be interpreted as computing the
dot product between the filter and the local image patch.

!59


Template Matching

[Figure: a 3×3 binary template is compared against two 3×3 image patches by superimposing and element-wise multiplying; summing gives a dot product of 3 for one patch and 1 for the other, and scaling a patch’s brightness by 255 scales its dot product accordingly]

We can think of convolution/correlation as comparing a template (the filter) with each local image patch.
— Consider the filter and image patch as vectors.
— Applying a filter at an image location can be interpreted as computing the dot product between the filter and the local image patch.

!61–!65

Template Matching

The dot product may be large simply because the image region is bright. We need to normalize the result in some way.

!66

Template Matching

Let $a$ and $b$ be vectors. Let $\theta$ be the angle between them. We know

$$\cos\theta = \frac{a \cdot b}{|a|\,|b|} = \frac{a \cdot b}{\sqrt{(a \cdot a)(b \cdot b)}} = \frac{a}{|a|} \cdot \frac{b}{|b|}$$

where $\cdot$ is the dot product and $|\,|$ is the vector magnitude

Correlation is a dot product

Correlation measures similarity between the filter and each local image region

Normalized correlation varies between −1 and 1

Normalized correlation attains the value 1 when the filter and image region are identical (up to a scale factor)

!67
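A minimal NumPy sketch (not from the slides; the 3×3 template values are made up for illustration) of normalized correlation between a template and one image patch, illustrating the scale invariance mentioned above:

```python
import numpy as np

def normalized_correlation(patch, template):
    """cos(theta) between the patch and template viewed as vectors; lies in [-1, 1]."""
    a = patch.astype(float).ravel()
    b = template.astype(float).ravel()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

template = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [1, 1, 1]], dtype=float)

print(normalized_correlation(template, template))         # 1.0
print(normalized_correlation(255 * template, template))   # still 1.0: brightness scaling does not matter
```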

Template Matching

Slide Credit: Kristen Grauman

Example 1:

!68

Credit: W. Freeman et al., “Computer Vision for Interactive Computer Graphics,”
IEEE Computer Graphics and Applications, 1998

Template (left), image (middle),
normalized correlation (right)

Note peak value at the true
position of the hand

Sample Question: Template Matching

True or false: Normalized correlation is robust to a constant scaling in the
image brightness.

!69

Scaled Representations: Goals

to find template matches at all scales
— template size constant, image scale varies
— finding hands or faces when we don’t know what size they are in the image

efficient search for image–to–image correspondences
— look first at coarse scales, refine at finer scales
— much less cost (but may miss best match)

to examine all levels of detail
—  find edges with different amounts of blur
—  find textures with different spatial frequencies (i.e., different levels of detail)

!70

!71

Shrinking the Image

Forsyth & Ponce (2nd ed.) Figure 4.12-4.14 (top rows)

Image Pyramid

An image pyramid is a collection of representations of an image. Typically,
each layer of the pyramid is half the width and half the height
of the previous layer.

In a Gaussian pyramid, each layer is smoothed by a Gaussian filter and
resampled to get the next layer

!72

Gaussian Pyramid

!73

Forsyth & Ponce (2nd ed.) Figure 4.17
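A minimal SciPy sketch (not from the slides) of the smooth-then-resample loop; the sigma value and keeping every other pixel are one common convention, not the only one:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(I, levels, sigma=1.0):
    """Each new layer smooths the previous one, then keeps every other row and column."""
    pyramid = [I.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(blurred[::2, ::2])            # half the width and half the height
    return pyramid

levels = gaussian_pyramid(np.random.default_rng(0).random((256, 256)), levels=4)
print([lvl.shape for lvl in levels])   # [(256, 256), (128, 128), (64, 64), (32, 32)]
```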

!74

From Template Matching to Local Feature Detection

Slide Credit: Li Fei-Fei, Rob Fergus, and Antonio Torralba

Estimating Derivatives

Recall, for a 2D (continuous) function, $f(x, y)$:

$$\frac{\partial f}{\partial x} = \lim_{\epsilon \to 0} \frac{f(x + \epsilon, y) - f(x, y)}{\epsilon}$$

Differentiation is linear and shift invariant, and therefore can be implemented as a convolution

A (discrete) approximation is

$$\frac{\partial f}{\partial x} \approx \frac{F(X+1, Y) - F(X, Y)}{\Delta x}$$

!75

Estimating Derivatives

$$\frac{\partial f}{\partial x} \approx \frac{F(X+1, Y) - F(X, Y)}{\Delta x}$$

This corresponds to the filter $\begin{bmatrix} -1 & 1 \end{bmatrix}$

A similar definition (and approximation) holds for $\dfrac{\partial f}{\partial y}$

Image noise tends to result in pixels not looking exactly like their neighbours, so simple “finite differences” are sensitive to noise.

The usual way to deal with this problem is to smooth the image prior to derivative estimation.

!76
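A small NumPy/SciPy sketch (not from the slides) of the forward difference and of smoothing before differentiating:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

I = np.array([[0, 0, 10, 10],
              [0, 0, 10, 10],
              [0, 0, 10, 10]], dtype=float)

# Forward difference I(X+1, Y) - I(X, Y), i.e. correlation with the filter [-1  1]
Ix = I[:, 1:] - I[:, :-1]
print(Ix)   # non-zero (value 10) only at the step between the second and third columns

# Smoothing before differentiating suppresses noise; by associativity this is
# equivalent to filtering once with a derivative-of-Gaussian filter.
Ix_smoothed = np.diff(gaussian_filter(I, sigma=1.0), axis=1)
```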

What Causes Edges?

!77

What causes an edge?

• Depth discontinuity
• Surface orientation

discontinuity
• Reflectance

discontinuity (i.e.,
change in surface
material properties)

• Illumination
discontinuity (e.g.,
shadow)

Slide Credit: Christopher Rasmussen

Smoothing and Differentiation

Edge: a location with high gradient (derivative)

Need smoothing to reduce noise prior to taking derivative

Need two derivatives, in x and y direction

We can use derivative of Gaussian filters
— because differentiation is convolution, and
— convolution is associative

Let $\otimes$ denote convolution:

$$D \otimes (G \otimes I(X,Y)) = (D \otimes G) \otimes I(X,Y)$$

!78

Gradient Magnitude

Let $I(X,Y)$ be a (digital) image

Let $I_x(X,Y)$ and $I_y(X,Y)$ be estimates of the partial derivatives in the $x$ and $y$ directions, respectively.

Call these estimates $I_x$ and $I_y$ (for short). The vector $[I_x, I_y]$ is the gradient

The scalar $\sqrt{I_x^2 + I_y^2}$ is the gradient magnitude

The gradient direction is given by: $\theta = \tan^{-1}\!\left(\dfrac{I_y}{I_x}\right)$

!79

Two Generic Approaches for Edge Detection

[Figure: a 1D image profile $i(r)$ along a scanline, its first derivative $\frac{d\,i(r)}{dr}$, and its second derivative $\frac{d^2 i(r)}{dr^2}$]

Two generic approaches to edge point detection:
— (significant) local extrema of a first derivative operator
— zero crossings of a second derivative operator

!80

Marr / Hildreth Laplacian of Gaussian
A “zero crossings of a second derivative operator” approach

Steps:
1. Gaussian for smoothing

2. Laplacian ($\nabla^2$) for differentiation, where
$$\nabla^2 f(x,y) = \frac{\partial^2 f(x,y)}{\partial x^2} + \frac{\partial^2 f(x,y)}{\partial y^2}$$

3. Locate zero-crossings in the Laplacian of the Gaussian ($\nabla^2 G$), where
$$\nabla^2 G(x,y) = -\frac{1}{2\pi\sigma^4}\left(2 - \frac{x^2+y^2}{\sigma^2}\right)\exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$

!81

Marr / Hildreth Laplacian of Gaussian

Here’s a 3D plot of the Laplacian of the Gaussian ($\nabla^2 G$)
. . . with its characteristic “Mexican hat” shape

!82
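A minimal SciPy sketch (not from the slides) of the Marr/Hildreth pipeline: filter with a Laplacian of Gaussian, then mark sign changes between horizontal or vertical neighbours as zero-crossings:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_zero_crossings(I, sigma):
    """Boolean map of zero-crossings in the Laplacian-of-Gaussian response of I."""
    response = gaussian_laplace(I.astype(float), sigma)
    sign = np.sign(response)
    crossings = np.zeros(I.shape, dtype=bool)
    crossings[:, :-1] |= sign[:, :-1] != sign[:, 1:]   # sign change to the right
    crossings[:-1, :] |= sign[:-1, :] != sign[1:, :]   # sign change below
    return crossings

I = np.zeros((32, 32)); I[:, 16:] = 1.0                # vertical step edge
edges = log_zero_crossings(I, sigma=2.0)               # crossings cluster along the step
```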

Canny Edge Detector
Steps:

1. Apply directional derivatives of Gaussian

2. Compute gradient magnitude and gradient direction

3. Non-maximum suppression
— thin multi-pixel wide “ridges” down to single pixel width

4. Linking and thresholding
— Low, high edge-strength thresholds
— Accept all edges over low threshold that are connected to edge over high
threshold

!83

Edge Hysteresis

One way to deal with broken edge chains is to use hysteresis

Hysteresis: A lag or momentum factor

Idea: Maintain two thresholds, $k_{high}$ and $k_{low}$
— Use $k_{high}$ to find strong edges to start an edge chain
— Use $k_{low}$ to find weak edges which continue an edge chain

Typical ratio of thresholds is (roughly): $\dfrac{k_{high}}{k_{low}} = 2$

!84
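A minimal SciPy sketch (not from the slides) of hysteresis thresholding on a gradient-magnitude map, keeping weak-edge pixels only when their connected component contains a strong pixel:

```python
import numpy as np
from scipy.ndimage import label

def hysteresis(strength, k_low, k_high):
    """Keep pixels above k_low that are connected to at least one pixel above k_high."""
    weak = strength > k_low
    strong = strength > k_high
    labels, n = label(weak)                        # connected components of weak pixels
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True         # components that contain a strong pixel
    keep[0] = False                                # label 0 is the background
    return keep[labels]

strength = np.array([[0, 3, 3, 0, 3],
                     [0, 9, 3, 0, 3]], dtype=float)
print(hysteresis(strength, k_low=2, k_high=5).astype(int))
# [[0 1 1 0 0]
#  [0 1 1 0 0]]   the weak chain on the right has no strong pixel and is dropped
```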

!85

“Divide the image into some number of segments, where the segments
represent ’things’ or ’parts of things’ in the scene. The number of segments is
up to you, as it depends on the image. Something between 2 and 30 is likely to
be appropriate. It is important that all of the segments have approximately equal
importance.”

(Martin et al. 2004)

How do humans perceive boundaries?

!86
Figure Credit: Szeliski Fig. 4.31. Original: Martin et al. 2004

Each image shows multiple (4-8) human-marked boundaries. Pixels are darker
where more humans marked a boundary.

How do humans perceive boundaries?

Sample Question: Edges

Why is non-maximum suppression applied in the Canny edge detector?

!87

What is a corner?

We can think of a corner as any locally distinct 2D image feature that (hopefully) corresponds to a distinct position on a 3D object of interest in the scene.

!88

Image Credit: John Shakespeare, Sydney Morning Herald

Why are corners distinct?

!89

A corner can be localized reliably.

Thought experiment:

— Place a small window over a patch of constant image value.
If you slide the window in any direction, the image in the
window will not change.

“corner”: significant change in all directions

— Place a small window over an edge. If you slide the window in the direction of
the edge, the image in the window will not change
→ Cannot estimate location along an edge (a.k.a., aperture problem)

— Place a small window over a corner. If you slide the window in any direction,
the image in the window changes.

Image Credit: Ioannis (Yannis) Gkioulekas (CMU)

Autocorrelation

!90

Szeliski, Figure 4.5

Corner Detection

Edge detectors perform poorly at corners

Observations:
— The gradient is ill defined exactly at a corner
— Near a corner, the gradient has two (or more) distinct values

!91

Harris Corner Detection

!92

1. Compute image gradients over a small region

2. Compute the covariance matrix

3. Compute eigenvectors and eigenvalues

4. Use a threshold on eigenvalues to detect corners

Slide Adopted: Ioannis (Yannis) Gkioulekas (CMU)

2. Compute the covariance matrix (a.k.a. 2nd moment matrix), summing over a small region around each candidate corner:

$$C = \begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix}$$

The off-diagonal entries are the gradient with respect to x times the gradient with respect to y; the matrix is symmetric

!93

Harris Corner Detection

— Filter image with Gaussian

— Compute magnitude of the x and y gradients at each pixel

— Construct C in a window around each pixel
— Harris uses a Gaussian window

— Solve for product of the λ’s

— If λ’s both are big (product reaches local maximum above threshold) then we
have a corner
— Harris also checks that ratio of λs is not too high

!94
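A compact SciPy sketch (not from the slides) of the pipeline above. It uses the common det(C) − α·trace(C)² corner score, which is large only when both eigenvalues are large, instead of thresholding the eigenvalues explicitly; the parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(I, sigma=1.0, window_sigma=2.0, alpha=0.05):
    """Per-pixel Harris corner response for a grayscale image I."""
    I = gaussian_filter(I.astype(float), sigma)    # smooth the image first
    Ix = sobel(I, axis=1)                          # x gradient
    Iy = sobel(I, axis=0)                          # y gradient
    # Entries of the covariance matrix C, summed with a Gaussian window around each pixel
    Sxx = gaussian_filter(Ix * Ix, window_sigma)
    Syy = gaussian_filter(Iy * Iy, window_sigma)
    Sxy = gaussian_filter(Ix * Iy, window_sigma)
    det = Sxx * Syy - Sxy**2                       # product of the eigenvalues
    trace = Sxx + Syy                              # sum of the eigenvalues
    return det - alpha * trace**2                  # large where both eigenvalues are large

# Corners are local maxima of the response above a threshold.
```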

!95

Harris Corner Detection

Harris corners
• Originally developed as features for motion tracking
• Greatly reduces amount of computation compared to tracking every pixel
• Translation and rotation invariant (but not scale invariant)

Properties: NOT Invariant to Scale Changes

!96

[Figure: the same structure responds as an edge at one scale and as a corner at another]

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)

Sample Questions: Corners

The Harris corner detector is stable under some image transformations
(features are considered stable if the same locations on an object are typically
selected in the transformed image).


True or false: The Harris corner detector is stable under image blur.

!97

Texture

We will look at two main questions:

1.  How do we represent texture? 

→ Texture analysis 


2. How do we generate new examples of a texture? 

→ Texture synthesis 


!98

Texture Synthesis

Why might we want to synthesize texture?

1. To fill holes in images (inpainting)
— Art directors might want to remove telephone wires. Restorers might want to
remove scratches or marks.
— We need to find something to put in place of the pixels that were removed
— We synthesize regions of texture that fit in and look convincing

2. To produce large quantities of texture for computer graphics
— Good textures make object models look more realistic 


!99

Texture Synthesis

!100

Szeliski, Fig. 10.49

!101

Texture Synthesis

Photo Credit: Associated Press

[Figure: a pixel p to be synthesized, its neighbourhood window, and a finite SAMPLE drawn from an (ideally infinite) sample image]

— What is conditional probability distribution of p, given the neighbourhood
window?
— Directly search the input image for all such neighbourhoods to produce a
histogram for p
— To synthesize p, pick one match at random

!102

Efros and Leung: Synthesizing One Pixel


— Since the sample image is finite, an exact neighbourhood match might not
be present
— Find the best match using SSD error, weighted by Gaussian to emphasize
local structure, and take all samples within some distance from that match

!103

Efros and Leung: Synthesizing One Pixel
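A simplified NumPy sketch (not the authors' code, and it omits the Gaussian weighting of the SSD) of choosing a value for one pixel from its partially known m × m neighbourhood:

```python
import numpy as np

def synthesize_pixel(sample, neighborhood, known, rng, eps=0.1):
    """Pick a value for the centre of `neighborhood` (an m x m window whose already
    synthesized pixels are marked True in `known`) by matching against `sample`."""
    k = neighborhood.shape[0] // 2
    H, W = sample.shape
    matches = []
    for y in range(k, H - k):
        for x in range(k, W - k):
            window = sample[y - k:y + k + 1, x - k:x + k + 1]
            ssd = np.sum(known * (window - neighborhood) ** 2)   # compare known pixels only
            matches.append((ssd, sample[y, x]))
    matches.sort(key=lambda t: t[0])
    d_min = matches[0][0]
    candidates = [v for d, v in matches if d <= d_min * (1 + eps)]  # all near-best matches
    return rng.choice(candidates)                                    # pick one at random
```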

For multiple pixels, “grow” the texture in layers
— In the case of hole-filling, start from the edges of the hole

For an interactive demo, see
https://una-dinosauria.github.io/efros-and-leung-js/
(written by Julieta Martinez, a previous CPSC 425 TA)

!104

Efros and Leung: Synthesizing Many Pixels


Randomness Parameter

!105
Slide Credit: http://graphics.cs.cmu.edu/people/efros/research/NPS/efros-iccv99.ppt


!106

“Big Data” Meets Inpainting

Original Image Input

Figure Credit: Hays and Efros 2007

!107 Figure Credit: Hays and Efros 2007

Input, Scene Matches, Output

“Big Data” Meets Inpainting

Algorithm sketch (Hays and Efros 2007):

1.  Create a short list of a few hundred “best matching” images based on global
image statistics

2.  Find patches in the short list that match the context surrounding the image
region we want to fill

3.  Blend the match into the original image 


Purely data-driven, requires no manual labeling of images

!108

“Big Data” Meets Inpainting

!109
Credit: Bill Freeman

Goal of Texture Analysis

Compare textures and decide if they’re made of the same “stuff”


Texture Representation
Observation: Textures are made up of generic sub-elements, repeated over a
region with similar statistical properties

Idea: Find the sub-elements with filters, then represent each point in the image
with a summary of the pattern of sub-elements in the local region

Question: What filters should we use?

Answer: Human vision suggests spots and oriented edge filters at a variety of
different orientations and scales

Question: How do we “summarize”?

Answer: Compute the mean or maximum of each filter response over the region

— Other statistics can also be useful

!112

Texture Representation

!113
Figure Credit: Leung and Malik, 2001

Spots and Bars (Fine Scale)

!114

Forsyth & Ponce (1st ed.) Figures 9.3–9.4

Spots and Bars (Coarse Scale)

!115

Forsyth & Ponce (1st ed.) Figures 9.3 and 9.5

!116

Laplacian Pyramid

Laplacian Pyramid
Building a Laplacian pyramid:
— Create a Gaussian pyramid
— Take the difference between one Gaussian pyramid level and the next
(before subsampling)

Properties
— Also known as the difference-of-Gaussian (DOG) function, a close
approximation to the Laplacian
— It is a band pass filter – each level represents a different band of spatial
frequencies

Reconstructing the original image:
— Reconstruct the Gaussian pyramid starting at top

!117

Constructing a Laplacian Pyramid

Algorithm:
repeat:
    filter
    compute residual
    subsample
until min resolution reached

!118

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)

Reconstructing the Original Image

Algorithm:
repeat:
    upsample
    sum with residual
until orig resolution reached

!119

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)
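A minimal SciPy sketch (not from the slides) of both algorithms; storing the residual against an upsampled version of the next level makes the reconstruction exact by construction:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def build_laplacian_pyramid(I, levels, sigma=1.0):
    """Residuals between each Gaussian level and the upsampled next level; top is low-pass."""
    gaussian = [I.astype(float)]
    for _ in range(levels - 1):
        gaussian.append(gaussian_filter(gaussian[-1], sigma)[::2, ::2])   # filter, subsample
    laplacian = [fine - zoom(coarse, 2, order=1)                           # band-pass residuals
                 for fine, coarse in zip(gaussian[:-1], gaussian[1:])]
    laplacian.append(gaussian[-1])                                         # low-pass top level
    return laplacian

def reconstruct(laplacian):
    """Start at the top, repeatedly upsample and sum with the stored residual."""
    image = laplacian[-1]
    for residual in reversed(laplacian[:-1]):
        image = zoom(image, 2, order=1) + residual
    return image

I = np.random.default_rng(0).random((64, 64))
pyr = build_laplacian_pyramid(I, levels=4)
print(np.allclose(reconstruct(pyr), I))   # True: the residuals make reconstruction exact
```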

Gaussian vs Laplacian Pyramid

!120

Shown in opposite order for space

Which one takes more space to store?

Slide Credit: Ioannis (Yannis) Gkioulekas (CMU)

Oriented Pyramids

!121

Forsyth & Ponce (1st ed.) Figure 9.13

Final Texture Representation

Steps:

1. Form a Laplacian and oriented pyramid (or equivalent set of responses to
filters at different scales and orientations)

2. Square the output (makes values positive)

3. Average responses over a neighborhood by blurring with a Gaussian

4. Take statistics of responses
— Mean of each filter output
— Possibly standard deviation of each filter

!122
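A rough SciPy sketch (not from the slides) of the pipeline, standing in for the full oriented pyramid with a small bank of spot (LoG) and oriented derivative filters at a few scales; all parameter values are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_laplace, sobel

def texture_descriptor(I, sigmas=(1.0, 2.0, 4.0), pool_sigma=8.0):
    """Per-pixel texture description of shape (H, W, n_filters)."""
    I = I.astype(float)
    channels = []
    for sigma in sigmas:
        smoothed = gaussian_filter(I, sigma)
        responses = [gaussian_laplace(I, sigma),   # spot filter at this scale
                     sobel(smoothed, axis=1),      # horizontally oriented structure
                     sobel(smoothed, axis=0)]      # vertically oriented structure
        for r in responses:
            channels.append(gaussian_filter(r**2, pool_sigma))   # square, then blur to summarize
    return np.stack(channels, axis=-1)
```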

Sample Question: Texture

How does the top-most image in a Laplacian pyramid differ from the others?

!123