Fundamentals of Computer Vision
Lecture
Overview of today's lecture
• Stereo matching.
• Improving stereo matching.
• Structured light.
• A note on normalization.
Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016, Fall 2017), Ioannis Gkioulekas (16-385, Spring 2019), Robert Collins (454, Fall 2019).
Some slides were inspired by or taken from:
• Fredo Durand (MIT).
Stereo
[Figure: stereo geometry. A scene point X projects to x in the left image and x' in the right image; the camera centers are separated by the baseline.]
Disparity is inversely proportional to depth.
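For a rectified pair this inverse relation is Z = f·B / d. A minimal sketch (the focal length, baseline, and disparity values below are illustrative, not from the slides):

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair.
    f_px: focal length in pixels; baseline_m: camera separation in meters;
    disparity_px: x - x' in pixels. Returns depth in meters."""
    return f_px * baseline_m / disparity_px

# Doubling the disparity halves the depth: inverse proportionality.
z_far = depth_from_disparity(700.0, 0.1, 10.0)   # 7.0 m
z_near = depth_from_disparity(700.0, 0.1, 20.0)  # 3.5 m
```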
What is stereo rectification?
Reproject image planes onto a common plane parallel to the line between camera centers
Need two homographies (3×3 transform), one for each input image reprojection
C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. Computer Vision and Pattern Recognition, 1999.
Stereo Rectification:
1. Compute E to get R
2. Rotate right image by R
3. Rotate both images by Rrect
4. Scale both images by H
1. Rectify images (make epipolar lines horizontal)
2. For each pixel:
   a. Find the epipolar scanline in the right image
   b. Search the scanline and pick the best match
   c. Compute disparity x − x'
   d. Compute depth from disparity
How would you do this?
Reminder from filtering
How do we detect an edge?
• We filter with something that looks like an edge.
[Figure: the original image filtered with a horizontal edge filter (column kernel [1; 0; −1]) and a vertical edge filter (row kernel [1 0 −1]).]
We can think of linear filtering as a way to evaluate how similar an image is locally to some template.
Find this template
How do we detect the template in the following image?
What will the output look like?
Solution 1: Filter the image using the template as the filter kernel.
What went wrong? The filter output simply increases in areas of higher local intensity, regardless of the template.
Solution 2: Filter the image using a zero-mean template (subtract the template mean before filtering).
What will the output look like?
Thresholding the output gives the true detection, but also false detections.
What went wrong?
The zero-mean filter is not robust to high-contrast areas.
Solution 3: Use the sum of squared differences (SSD) between the template and each local patch.
What will the output look like?
Thresholding 1 − output gives the true detection.
What could go wrong?
SSD is not robust to local intensity changes.
Observations so far:
• Subtracting the mean deals with the brightness bias.
• Dividing by the standard deviation removes the contrast bias.
Can we combine the two effects?
Solution 4: Normalized cross-correlation (NCC). Subtract both the template mean and the local patch mean, and divide by the standard deviations, before correlating.
What will the output look like?
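The NCC score for one template/patch pair can be sketched as follows (a minimal NumPy version; it assumes neither patch is constant, since a zero standard deviation would divide by zero):

```python
import numpy as np

def ncc(template, patch):
    """Normalized cross-correlation of two equal-size patches, in [-1, 1].
    Subtracting each mean removes brightness bias; dividing by the norms
    of the zero-mean patches removes contrast bias."""
    t = template - template.mean()
    p = patch - patch.mean()
    return float((t * p).sum() / np.sqrt((t * t).sum() * (p * p).sum()))
```

Because any affine intensity change a·P + b cancels out, ncc(T, 3*T + 10) is exactly 1.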
Thresholding 1 − output now gives only the true detections.
What is the best method?
It depends on whether you care about speed or invariance.
• Zero-mean: Fastest, very sensitive to local intensity.
• Sum of squared differences: Medium speed, sensitive to intensity offsets.
• Normalized cross-correlation: Slowest, invariant to contrast and brightness.
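The trade-off can be checked numerically. A small sketch (random template, illustrative distortions) showing that SSD is fooled by a brightness offset while NCC is not:

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: 0 only for an exact match."""
    return float(((a - b) ** 2).sum())

def ncc(a, b):
    """Normalized cross-correlation: 1 under any affine intensity change."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float((a0 * b0).sum() / np.sqrt((a0 * a0).sum() * (b0 * b0).sum()))

rng = np.random.default_rng(0)
t = rng.random((8, 8))     # template
bright = t + 0.5           # brightness offset
contrast = 1.8 * t         # contrast change

assert ssd(t, bright) > 0                    # SSD penalizes the offset
assert abs(ncc(t, bright) - 1.0) < 1e-9      # NCC is invariant to it
assert abs(ncc(t, contrast) - 1.0) < 1e-9    # ...and to the contrast change
```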
Stereo Block Matching
• Slide a window along the epipolar scanline in the right image and compare its contents with the reference window in the left image.
• Matching cost: SSD or normalized cross-correlation, plotted as a function of disparity.
[Figure: left and right images with a scanline; matching cost vs. disparity curves for SSD and normalized cross-correlation.]
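The search above can be sketched as follows (a minimal SSD version on a rectified pair; the window half-size and disparity range are illustrative):

```python
import numpy as np

def block_match_scanline(left, right, row, x, half=2, max_disp=16):
    """Disparity of left pixel (row, x): slide a window along the same
    scanline in the right image and keep the lowest-SSD candidate."""
    ref = left[row - half:row + half + 1, x - half:x + half + 1].astype(float)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        xr = x - d                        # candidate column in the right image
        if xr - half < 0:                 # window would leave the image
            break
        cand = right[row - half:row + half + 1, xr - half:xr + half + 1].astype(float)
        cost = ((ref - cand) ** 2).sum()  # SSD matching cost
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```

On a synthetic pair where the right image is the left image shifted by 3 pixels, the function recovers a disparity of 3.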
Similarity Measure                      Formula (sums over window pixels i)
Sum of Absolute Differences (SAD)       Σ |I1(i) − I2(i + d)|
Sum of Squared Differences (SSD)        Σ (I1(i) − I2(i + d))²
Zero-mean SAD                           Σ |(I1(i) − Ī1) − (I2(i + d) − Ī2)|
Locally scaled SAD                      Σ |I1(i) − (Ī1 / Ī2) I2(i + d)|
Normalized Cross-Correlation (NCC)      Σ (I1(i) − Ī1)(I2(i + d) − Ī2) / √(Σ (I1(i) − Ī1)² · Σ (I2(i + d) − Ī2)²)
[Figure: disparity maps computed with SAD, SSD, and NCC, alongside the ground truth.]
Effect of window size
• W = 3 (smaller window): + more detail, − more noise.
• W = 20 (larger window): + smoother disparity maps, − less detail, − fails near boundaries (loss of finer details).
When will stereo block matching fail?
• textureless regions
• repeated patterns
• specularities
Limitations
So far, each left image patch has been matched independently along the right epipolar line.
We would like to at least enforce some consistency among matches in the same row (scanline).
Disparity Space Image
First, we introduce the concept of the disparity space image (DSI).
The DSI for one row represents pairwise match scores between patches along that row in the left and right image.
C(i, j) = match score for the patch centered at left pixel i with the patch centered at right pixel j, where i indexes pixels along the left scanline and j pixels along the right scanline.
Disparity Space Image (DSI)
[Figure: left and right images; dissimilarity values, (1 − NCC) or SSD.]
Enter each vector of match scores as a column in the DSI.
Invalid entries arise from the constraint that disparity ≥ a low value (0 in this case).
Invalid entries also arise from the constraint that disparity ≤ a high value (64 in this case).
If we rearrange the diagonal band of valid values into a rectangular array (in this case of size 64 × N, where N is the number of columns in the left scanline), that is what is traditionally known as the DSI. However, we're going to keep the full image around, including invalid values.
[Figure: DSI axes, coordinate in the left scanline (e.g. N) vs. disparity (e.g. 64), with M columns in the right scanline.]
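Building the DSI for one scanline can be sketched as follows (a simplified version using 1-D windows along the row; the window size and disparity range are illustrative):

```python
import numpy as np

def dsi_row(left_row, right_row, half=2, max_disp=16):
    """DSI for one scanline: cost[d, x] is the SSD between the left window
    at x and the right window at x - d; entries violating the disparity
    bounds or the image border stay at infinity (invalid)."""
    n = len(left_row)
    cost = np.full((max_disp + 1, n), np.inf)
    for x in range(half, n - half):
        ref = left_row[x - half:x + half + 1].astype(float)
        for d in range(max_disp + 1):
            xr = x - d
            if xr - half < 0:         # window would leave the right image
                break
            cand = right_row[xr - half:xr + half + 1].astype(float)
            cost[d, x] = ((ref - cand) ** 2).sum()
    return cost
```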
DSI and Scanline Consistency
Assigning disparities to all pixels in the left scanline now amounts to finding a connected path through the DSI, from start to end.
Lowest-cost path: we would like to choose the "best" path, the one with the lowest "cost" (lowest sum of dissimilarity scores along the path).
Other Constraints on Path
It is common to impose an ordering constraint on the path.
Intuitively, the path is not allowed to “double back” on itself.
[Figure: scene points A, B, C, D seen in both images. Under the ordering constraint the matches appear in the same order (e.g. ABCD) along both scanlines; a thin foreground object can reverse the order seen in one image (e.g. DABC), which is where the constraint fails.]
Occlusions Can Occur
[Figure: matches between left and right scanlines; some pixels are occluded from the right scanline, others from the left scanline.]
However, note that the order of matching patches is preserved.
Occlusions: no matches
[Figure: left and right images with occluded regions that have no match in the other image.]
An Optimal Scanline Strategy
• We want to find best connected path, taking into account ordering constraint and the possibility of occlusions.
Practical algorithm:
Cox, Hingorani, Rao, Maggs, "A Maximum Likelihood Stereo Algorithm," Computer Vision and Image Understanding, Vol. 63(3), May 1996, pp. 542-567.
See also Ohta & Kanade ’85
Cox et al. Stereo Matching
Recap: we want to find the lowest-cost path from the upper left to the lower right of the DSI image.
At each point on the path we have three choices: step left, step down, or step diagonally.
Each choice has a well-defined cost associated with it.
This problem calls for dynamic programming (which, indeed, is how Cox et al. solve it)!
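As a sketch of the idea (not the full maximum-likelihood formulation of Cox et al.), the lowest-cost monotone path can be computed by dynamic programming over the match-cost matrix, charging a constant penalty `occ` (an illustrative value) for each occluded pixel:

```python
import numpy as np

def scanline_dp(C, occ=1.0):
    """Cost of the best monotone path through a match-cost matrix C[i, j]
    (left pixel i vs right pixel j), with a fixed occlusion penalty `occ`
    for every skipped pixel."""
    n, m = C.shape
    D = np.zeros((n + 1, m + 1))
    D[1:, 0] = occ * np.arange(1, n + 1)   # all left pixels occluded
    D[0, 1:] = occ * np.arange(1, m + 1)   # all right pixels occluded
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i-1, j-1] + C[i-1, j-1],  # match (diagonal step)
                          D[i-1, j] + occ,            # occlude a left pixel
                          D[i, j-1] + occ)            # occlude a right pixel
    return D[n, m]
```

With all match costs zero the path matches everything (cost 0); when matching is very expensive, the cheapest path occludes every pixel instead.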
Example: 5×5 windows, NCC match score.
[Figure: computed disparities vs. ground truth. Black pixels: bad disparity values, or no matching patch in the other image.]
Example: result of the dynamic programming algorithm vs. ground truth. Black pixels = occluded.
Scanline Algorithm Limitations
• Each scanline is matched independently.
• There is no obvious way to generalize dynamic programming from a 1D scanline to the 2D grid.
What are some problems with the result?
Too many discontinuities.
How can we improve depth estimation?
We expect disparity values to change slowly.
Let’s make an assumption:
depth should change smoothly
Stereo matching as energy minimization
What defines a good stereo correspondence?
1. Match quality: want each pixel to find a good match in the other image.
2. Smoothness: if two pixels are adjacent, they should (usually) move about the same amount.
Energy Minimization
[Figure: windows W1(i) in image I1 and W2(i + D(i)) in image I2, related by the disparity map D.]
E = α E_data(I1, I2, D) + β E_smooth(D)
• Data term: want each pixel to find a good match in the other image (block matching result).
• Smoothness term: adjacent pixels should (usually) move about the same amount (smoothness function).
(Lazebnik, UNC)
E = α E_data(I1, I2, D) + β E_smooth(D)
E_data = Σᵢ (W1(i) − W2(i + D(i)))²
Data term: SSD distance between windows centered at I(x, y) and J(x + d(x, y), y).
E_smooth = Σ_{neighbors i, j} ρ(D(i) − D(j))
where ρ is a robust penalty, e.g. the Potts model or a truncated linear model.
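Evaluating this energy for a 1-D scanline labeling can be sketched as follows (assumes a precomputed match-cost matrix C[d, x]; the weights and the Potts penalty value are illustrative):

```python
import numpy as np

def energy(D, C, alpha=1.0, beta=1.0, potts=1.0):
    """E = alpha * E_data + beta * E_smooth for disparities D over one scanline.
    C[d, x]: match cost of assigning disparity d at pixel x (data term).
    Potts smoothness: a fixed penalty whenever two neighbors disagree."""
    e_data = sum(C[D[x], x] for x in range(len(D)))
    e_smooth = sum(potts * (D[x] != D[x + 1]) for x in range(len(D) - 1))
    return alpha * e_data + beta * e_smooth
```

With zero data costs, a labeling with one disparity jump pays exactly one Potts penalty; a constant labeling pays none.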
Energy functions of this form can be minimized efficiently using an algorithm known as graph cuts.
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.
V. Kolmogorov and R. Zabih, Computing Visual Correspondence with Occlusions using Graph Cuts, ICCV 2001.
State-of-the-Art Results (as of 2015)
Algorithm Results Ground truth
M. Mozerov and J. van Weijer. Accurate stereo matching by two step global optimization. IEEE Transactions on Image Processing 24(3), January 2015.
[Figure: match only vs. match & smoothness (via graph cut) vs. ground truth.]
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.
All of these cases remain difficult, what can we do?
• textureless regions
• repeated patterns
• specularities
Structured light
Use controlled (“structured”) light to make correspondences easier
The disparity between laser points on the same scanline in the two images determines the 3D coordinates of the laser point on the object.
Structured light and two cameras
[Figure: a laser illuminating the scene, observed by two cameras.]
Structured light and one camera
The projector acts like a "reverse" camera.
Example: Laser scanner
Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/
A note on normalization
Estimating F – 8-point algorithm
• The fundamental matrix F is defined by
x'TFx = 0
for any pair of matches x and x’ in two images.
• Let x = (u, v, 1)ᵀ and x' = (u', v', 1)ᵀ, and
F = [f11 f12 f13; f21 f22 f23; f31 f32 f33].
Each match gives one linear equation:
uu' f11 + vu' f12 + u' f13 + uv' f21 + vv' f22 + v' f23 + u f31 + v f32 + f33 = 0
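Each match contributes one row of the data matrix used by the 8-point algorithm; a sketch of building that row (coefficients ordered f11 … f33):

```python
import numpy as np

def epipolar_row(x, xp):
    """Row of the 8-point data matrix for a match x = (u, v), x' = (u', v'),
    encoding uu' f11 + vu' f12 + u' f13 + ... + v f32 + f33 = 0."""
    u, v = x
    up, vp = xp
    return np.array([u * up, v * up, up, u * vp, v * vp, vp, u, v, 1.0])
```

By construction, epipolar_row(x, x') dotted with the row-major flattening of F equals x'ᵀ F x.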
Problem with the 8-point algorithm
Stacking one equation per match gives a linear system A f = 0, where each row of A is
[uᵢu'ᵢ  vᵢu'ᵢ  u'ᵢ  uᵢv'ᵢ  vᵢv'ᵢ  v'ᵢ  uᵢ  vᵢ  1]
and f = (f11, f12, f13, f21, f22, f23, f31, f32, f33)ᵀ.
For typical pixel coordinates the columns of the data matrix differ by orders of magnitude (entries of order ~10000, ~100, and 1), so least squares yields poor results.
Normalized 8-point algorithm
Normalized least squares yields good results: transform the image coordinates to roughly [−1, 1] × [−1, 1].
For a 700 × 500 image, the corners (0, 0) and (700, 500) map to (−1, −1) and (1, 1) via
T = [2/700  0  −1;  0  2/500  −1;  0  0  1].
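The scaling transform can be written directly; a sketch for a w × h image (the 700 × 500 example above):

```python
import numpy as np

def scale_to_unit_square(width, height):
    """Homography mapping pixel coordinates in [0, w] x [0, h]
    to the square [-1, 1] x [-1, 1]."""
    return np.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]])

T = scale_to_unit_square(700, 500)
# Corner (700, 500) maps to (1, 1); corner (0, 0) maps to (-1, -1).
```

(In practice, the normalise2dpts routine used later also centers the points on their centroid before scaling; the fixed mapping here is the simpler variant shown on the slide.)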
Normalized 8-point algorithm
1. Transform the input points: x̂ᵢ = T xᵢ, x̂'ᵢ = T' x'ᵢ.
2. Run the 8-point algorithm on x̂ᵢ, x̂'ᵢ to obtain F̂.
3. De-normalize: F = T'ᵀ F̂ T, so that x'ᵀ F x = x̂'ᵀ F̂ x̂ = 0 holds for the original image-coordinate points.
Normalized 8-point algorithm (MATLAB):

[x1, T1] = normalise2dpts(x1);
[x2, T2] = normalise2dpts(x2);
A = [x2(1,:)'.*x1(1,:)'  x2(1,:)'.*x1(2,:)'  x2(1,:)' ...
     x2(2,:)'.*x1(1,:)'  x2(2,:)'.*x1(2,:)'  x2(2,:)' ...
     x1(1,:)'  x1(2,:)'  ones(npts,1)];

[U,D,V] = svd(A);
F = reshape(V(:,9),3,3)';    % null vector of A, reshaped into a 3x3 matrix

% Enforce rank 2
[U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';

% Denormalise
F = T2'*F*T1;
F = U Σ Vᵀ
Rank enforcement via SVD: form Σ' by replacing all but the k largest singular values in Σ with 0 (here k = 2), and set F = U Σ' Vᵀ.
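The truncation step (with k = 2 for the fundamental matrix) can be sketched in NumPy:

```python
import numpy as np

def enforce_rank(M, k):
    """Nearest rank-k approximation of a square matrix M (in the
    least-squares sense): zero out all but the k largest singular
    values and recompose."""
    U, S, Vt = np.linalg.svd(M)
    S[k:] = 0.0
    return U @ np.diag(S) @ Vt
```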
References
Basic reading:
• Szeliski textbook, Section 8.1 (not 8.1.1-8.1.3), Chapter 11, Section 12.2.
• Hartley and Zisserman, Section 11.12.