
Fundamentals of Computer Vision
Lecture

Overview of today's lecture:
• Stereo matching.
• Improving stereo matching.
• Structured light.
• A note on normalization.

Slide credits
Most of these slides were adapted from:
• Kris Kitani (15-463, Fall 2016, Fall 2017), Ioannis Gkioulekas (16-385, Spring 2019), Robert Colin (454, Fall 2019s).
Some slides were inspired or taken from:
• Fredo Durand (MIT).

Stereo

[Figure: a scene point X projects to x in the left image and x' in the right image; the two camera centers are separated by the baseline.]
Disparity is inversely proportional to depth.
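
For reference, in the standard rectified two-camera model with focal length $f$ and baseline $b$, a scene point at depth $Z$ has

$$d = x - x' = \frac{b\,f}{Z} \qquad\Longrightarrow\qquad Z = \frac{b\,f}{d}.$$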

What is stereo rectification?
Reproject image planes onto a common plane parallel to the line between camera centers
Need two homographies (3×3 transform), one for each input image reprojection
C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. Computer Vision and Pattern Recognition, 1999.

Stereo Rectification:
1. Compute E to get R
2. Rotate right image by R
3. Rotate both images by Rrect
4. Scale both images by H
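
Step 3 needs the rectifying rotation $R_{\mathrm{rect}}$. One common construction (not given on the slide) builds it from the baseline (translation) vector $T$, so that the new $x$-axis points along the baseline:

$$r_1 = \frac{T}{\|T\|}, \qquad r_2 = \frac{1}{\sqrt{T_x^2 + T_y^2}}\begin{bmatrix}-T_y \\ T_x \\ 0\end{bmatrix}, \qquad r_3 = r_1 \times r_2, \qquad R_{\mathrm{rect}} = \begin{bmatrix} r_1^\top \\ r_2^\top \\ r_3^\top \end{bmatrix}.$$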

Basic stereo matching algorithm:
1. Rectify the images (make epipolar lines horizontal).
2. For each pixel:
   a. Find the corresponding epipolar scanline in the right image.
   b. Search the scanline and pick the best match.
   c. Compute the disparity x − x'.
   d. Compute depth from disparity.
How would you do this?

Reminder from filtering
How do we detect an edge?

Reminder from filtering
How do we detect an edge?
• We filter with something that looks like an edge.
[Figure: the original image filtered with a horizontal edge filter (the column [1; 0; −1]) and with a vertical edge filter (the row [1 0 −1]).]
We can think of linear filtering as a way to evaluate how similar an image is locally to some template.
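
In MATLAB, for instance, these filters can be applied with filter2 (which performs correlation), assuming a grayscale double image im:

vert_edges = filter2([1 0 -1], im);           % vertical edge filter (horizontal intensity change)
horz_edges = filter2([1; 0; -1], im);         % horizontal edge filter (vertical intensity change)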

Find this template
How do we detect the template in the following image?

Find this template
How do we detect the template in the following image?
Solution 1: Filter the image using the template as the filter kernel.
What will the output look like?
[Figure: image, filter output]

Find this template
How do we detect the template in the following image?
Solution 1: Filter the image using the template as the filter kernel.
[Figure: image, filter output]
What went wrong?

Find this template
How do we detect the template in the following image?
Solution 1: Filter the image using the template as the filter kernel.
[Figure: image, filter output]
The output increases for higher local intensities, not just where the template matches.

Find this template
How do we detect the template in the following image?
Solution 2: Filter the image using a zero-mean template (the template minus its mean).
What will the output look like?
[Figure: image, filter (template − template mean), output]

Find this template
How do we detect the template in the following image?
Solution 2: Filter the image using a zero-mean template.
[Figure: image, filter (template − template mean), output, thresholded output]
Thresholding the output gives the true detection, but also false detections.
What went wrong?

Find this template
How do we detect the template in the following image?
Solution 2: Filter the image using a zero-mean template.
[Figure: image, filter (template − template mean), output]
Not robust to high-contrast areas.

Find this template
How do we detect the template in the following image?
Solution 3: Use the sum of squared differences (SSD).
What will the output look like?
[Figure: image, filter output]

Find this template
How do we detect the template in the following image?
Solution 3: Use the sum of squared differences (SSD).
[Figure: image, filter output, 1 − output, thresholded]
Thresholding 1 − output gives the true detection.
What could go wrong?

Find this template
How do we detect the template in the following image?
Solution 3: Use the sum of squared differences (SSD).
[Figure: image, filter output, 1 − output]
Not robust to local intensity changes.

Find this template
How do we detect the template in the following image?
Observations so far:
• Subtracting the mean removes the brightness bias.
• Dividing by the standard deviation removes the contrast bias.
Can we combine the two effects?

Find this template
How do we detect the template in the following image?
Solution 4: Normalized cross-correlation (NCC): subtract the template mean and the local patch mean, then normalize.
What will the output look like?
[Figure: image, filter (template − template mean, patch − local patch mean), output]

Find this template
How do we detect the template in the following image?
Solution 4: Normalized cross-correlation (NCC).
[Figure: 1 − output, thresholded]
Thresholding gives only the true detections.

What is the best method?
It depends on whether you care about speed or invariance.
• Zero-mean: Fastest, very sensitive to local intensity.
• Sum of squared differences: Medium speed, sensitive to intensity offsets.
• Normalized cross-correlation: Slowest, invariant to contrast and brightness.
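
To make the trade-off concrete, here is a minimal MATLAB sketch (not from the lecture) that computes all three scores at every valid window position; im and tmpl are assumed to be grayscale double arrays, with tmpl smaller than im, and all names are illustrative:

[H, W] = size(im);  [h, w] = size(tmpl);
zt  = tmpl - mean(tmpl(:));                  % zero-mean template
zm  = zeros(H-h+1, W-w+1);                   % zero-mean correlation score
ssd = zeros(H-h+1, W-w+1);                   % sum of squared differences
ncc = zeros(H-h+1, W-w+1);                   % normalized cross-correlation
for r = 1:H-h+1
  for c = 1:W-w+1
    P        = im(r:r+h-1, c:c+w-1);         % local image patch
    zm(r,c)  = sum(sum(P .* zt));
    ssd(r,c) = sum(sum((P - tmpl).^2));
    zp       = P - mean(P(:));               % zero-mean patch
    ncc(r,c) = sum(sum(zp .* zt)) / (norm(zp(:)) * norm(zt(:)) + eps);
  end
end
% Detect by thresholding, e.g. peaks = ncc > 0.9 (or low values of ssd).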

Stereo Block Matching
[Figure: left and right images with a scanline highlighted, and the matching cost plotted against disparity]
• Slide a window along the epipolar line and compare the contents of that window with the reference window in the left image.
• Matching cost: SSD or normalized cross-correlation.
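
As a concrete illustration (a minimal sketch, not the lecture's implementation), block matching with an SSD cost could look like this in MATLAB; imL and imR are assumed rectified grayscale double images, and the half-window hw and disparity range maxd are arbitrary choices:

hw = 5; maxd = 64;                            % assumed window/disparity settings
[H, W] = size(imL);
disp_map = zeros(H, W);
for r = 1+hw : H-hw
  for c = 1+hw+maxd : W-hw
    wL = imL(r-hw:r+hw, c-hw:c+hw);           % reference window in the left image
    best = inf; bestd = 0;
    for d = 0:maxd
      wR = imR(r-hw:r+hw, c-d-hw:c-d+hw);     % candidate window at disparity d
      cost = sum(sum((wL - wR).^2));          % SSD matching cost
      if cost < best, best = cost; bestd = d; end
    end
    disp_map(r,c) = bestd;                    % winning disparity for this pixel
  end
end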

[Figure: disparity maps computed with SSD and with normalized cross-correlation]

Common window-based similarity measures (standard definitions, with left/right windows $W_1$, $W_2$ and window means $\bar{W}_1$, $\bar{W}_2$):
• Sum of Absolute Differences (SAD): $\sum_i |W_1(i) - W_2(i)|$
• Sum of Squared Differences (SSD): $\sum_i (W_1(i) - W_2(i))^2$
• Zero-mean SAD: $\sum_i |(W_1(i) - \bar{W}_1) - (W_2(i) - \bar{W}_2)|$
• Locally scaled SAD: $\sum_i |W_1(i) - \tfrac{\bar{W}_1}{\bar{W}_2} W_2(i)|$
• Normalized Cross Correlation (NCC): $\dfrac{\sum_i (W_1(i) - \bar{W}_1)(W_2(i) - \bar{W}_2)}{\sqrt{\sum_i (W_1(i) - \bar{W}_1)^2 \sum_i (W_2(i) - \bar{W}_2)^2}}$

[Figure: SAD, SSD, and NCC disparity maps compared with the ground truth]

Effect of window size
[Figure: disparity maps computed with W = 3 and W = 20]

Effect of window size
Smaller window (W = 3):
+ More detail
– More noise
Larger window (W = 20):
+ Smoother disparity maps
– Less detail (loss of finer details)
– Fails near boundaries

When will stereo block matching fail?

When will stereo block matching fail?
• textureless regions
• repeated patterns
• specularities

Limitations
So far, each left image patch has been matched independently along the right epipolar line.
We would like to at least enforce some consistency among matches in the same row (scanline).

Disparity Space Image
First we introduce the concept of DSI.
The DSI for one row represents pairwise match scores between patches along that row in the left and right image.
[Figure: a matrix indexed by pixels along the left scanline (pixel i) and pixels along the right scanline (pixel j)]
C(i, j) = match score for the patch centered at left pixel i with the patch centered at right pixel j.
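
A minimal sketch (my own, with assumed names) of how C could be filled for one image row r, given rectified grayscale double images imL and imR of equal width:

hw = 3;                                       % assumed half-window size
N  = size(imL, 2);
C  = inf(N, N);                               % full N x N DSI for row r
for i = 1+hw : N-hw
  wL = imL(r-hw:r+hw, i-hw:i+hw);             % patch around left pixel i
  for j = 1+hw : N-hw
    wR = imR(r-hw:r+hw, j-hw:j+hw);           % patch around right pixel j
    C(i,j) = sum(sum((wL - wR).^2));          % SSD; (1 - NCC) also works
  end
end
% Only entries with 0 <= i - j <= (maximum disparity) correspond to valid
% disparities: the diagonal band discussed below.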

Disparity Space Image (DSI)
[Figure: left image, right image, and the dissimilarity values, (1 − NCC) or SSD.]

Disparity Space Image (DSI)
[Figure: left image and the DSI built from the dissimilarity values]
Enter each vector of match scores as a column in the DSI.

Disparity Space Image
[Figure: the full DSI for one row, with the left scanline along one axis and the right scanline along the other. Invalid entries arise from the constraint that disparity >= some low value (0 in this case) and from the constraint that disparity <= some high value (64 in this case); N is the number of columns in the left scanline.]
If we rearrange the diagonal band of valid values into a rectangular array (in this case of size 64 x N), that is what is traditionally known as the DSI. However, we are going to keep the full image around, including the invalid values.

Disparity Space Image
[Figure: the rearranged DSI, indexed by coordinate in the left scanline (e.g. N) and by disparity (e.g. 64); M is the number of columns in the right scanline.]

DSI and Scanline Consistency
Assigning disparities to all pixels in the left scanline now amounts to finding a connected path through the DSI, from Start (upper left) to End (lower right).

Lowest Cost Path
We would like to choose the "best" path: the one with the lowest "cost", i.e. the lowest sum of dissimilarity scores along the path.

Other Constraints on the Path
It is common to impose an ordering constraint on the path. Intuitively, the path is not allowed to "double back" on itself.
[Figure: scene points A, B, C, D. The ordering constraint holds when they appear in the same order (A B C D) in both images; it fails when a nearer point reverses the order in one image (e.g. D A B C).]

Occlusions Can Occur
[Figure: left and right scanlines with matched segments; some pixels are occluded from the right scanline, others are occluded from the left scanline.]
However, note that the order of the matching patches is preserved. Occluded pixels simply have no matches.

An Optimal Scanline Strategy
• We want to find the best connected path, taking into account the ordering constraint and the possibility of occlusions.
Practical algorithm: Cox, Hingorani, Rao, and Maggs, "A Maximum Likelihood Stereo Algorithm," Computer Vision and Image Understanding, Vol. 63(3), May 1996, pp. 542-567. See also Ohta and Kanade, 1985.

Cox et al. Stereo Matching
Recap: we want to find the lowest-cost path from the upper left to the lower right of the DSI image. At each point on the path we have three choices: step left, step down, or step diagonally. Each choice has a well-defined cost associated with it.
This problem calls for Dynamic Programming! (which, indeed, is how Cox et al. solve the problem)
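
The lecture gives no code for this step, but the following is a minimal MATLAB sketch of the three-move dynamic program over the DSI (illustrative only, not Cox et al.'s exact formulation). It assumes the DSI C from the earlier sketch and an occlusion penalty occ:

occ = 0.5;                                    % assumed per-pixel occlusion cost
[N, M] = size(C);
acc = inf(N, M);                              % acc(i,j): best cost to reach (i,j)
acc(1,1) = C(1,1);
for i = 1:N
  for j = 1:M
    if i > 1 && j > 1, acc(i,j) = min(acc(i,j), acc(i-1,j-1) + C(i,j)); end  % diagonal step: match
    if i > 1,          acc(i,j) = min(acc(i,j), acc(i-1,j)   + occ);    end  % skip a left pixel (occlusion)
    if j > 1,          acc(i,j) = min(acc(i,j), acc(i,j-1)   + occ);    end  % skip a right pixel (occlusion)
  end
end
% acc(N,M) is the cost of the best path; backtracking the choices that
% produced each minimum recovers the disparity d = i - j along the diagonal
% (matched) steps.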
Example: 5x5 windows, NCC match score
[Figure: computed disparities next to the ground truth; black pixels mark bad disparity values or pixels with no matching patch in the other image.]

Example: Result of the Dynamic Programming algorithm
[Figure: the DP result next to the ground truth; black pixels are occluded.]

Scanline Algorithm Limitations
• Each scanline is matched independently of the others.
• There is no obvious way to generalize dynamic programming from a 1D scanline to the 2D grid.
What are some problems with the result? Too many discontinuities.

How can we improve depth estimation?
We expect disparity values to change slowly, so let's make an assumption: depth should change smoothly.

Stereo matching as ... Energy Minimization
What defines a good stereo correspondence?
1. Match quality: we want each pixel to find a good match in the other image.
2. Smoothness: if two pixels are adjacent, they should (usually) move about the same amount.

Energy Minimization
$$E = \alpha\, E_{\text{data}}(I_1, I_2, D) + \beta\, E_{\text{smooth}}(D)$$
• Data term: each pixel should find a good match in the other image (the block matching result).
• Smoothness term: adjacent pixels should (usually) move about the same amount (the smoothness function).
$$E_{\text{data}} = \sum_i \bigl(W_1(i) - W_2(i + D(i))\bigr)^2$$
The data term is the SSD distance between windows centered at $I(x, y)$ and $J(x + d(x, y),\, y)$.
$$E_{\text{smooth}} = \sum_{\text{neighbors } i, j} \rho\bigl(D(i) - D(j)\bigr)$$
where $\rho$ is, e.g., the Potts model or a truncated linear model.
Energy functions of this form can be minimized efficiently using an algorithm known as graph cuts.
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.
V. Kolmogorov and R. Zabih, Computing Visual Correspondence with Occlusions using Graph Cuts, ICCV 2001.
(Slide credit: Lazebnik, UNC)
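
To make the energy concrete, here is a minimal MATLAB sketch (mine, not the lecture's) that only evaluates E for a given integer disparity map D; minimizing it requires graph cuts or a similar solver. The pixelwise data term, the Potts smoothness term, and the weights alpha and beta are illustrative simplifications:

alpha = 1; beta = 20;                         % assumed weights
[H, W] = size(D);
E_data = 0; E_smooth = 0;
for r = 1:H
  for c = 1:W
    d = D(r,c);
    if c + d >= 1 && c + d <= W               % slide convention: compare I1(i) with I2(i + D(i))
      E_data = E_data + (imL(r,c) - imR(r, c + d))^2;   % pixelwise SSD data term
    end
    if c < W, E_smooth = E_smooth + (D(r,c) ~= D(r,c+1)); end   % Potts model: penalize
    if r < H, E_smooth = E_smooth + (D(r,c) ~= D(r+1,c)); end   % any disparity change
  end
end
E = alpha * E_data + beta * E_smooth;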
State-of-the-Art Results (as of 2015)
[Figure: algorithm results next to the ground truth.]
M. Mozerov and J. van de Weijer. Accurate stereo matching by two step global optimization. IEEE Transactions on Image Processing 24(3), January 2015.

[Figure: match-only result, match & smoothness (via graph cut), and the ground truth.]
Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001.

All of these cases remain difficult. What can we do?
• textureless regions
• repeated patterns
• specularities

Structured light
Use controlled ("structured") light to make correspondences easier.

Structured light and two cameras
[Figure: a laser stripe observed by two cameras.]
The disparity between laser points on the same scanline in the two images determines the 3-D coordinates of the laser point on the object.

Structured light and one camera
The projector acts like a "reverse" camera.

Example: Laser scanner
Digital Michelangelo Project
http://graphics.stanford.edu/projects/mich/

A note on normalization

Estimating F – the 8-point algorithm
• The fundamental matrix F is defined by $x'^\top F x = 0$ for any pair of matches $x$ and $x'$ in the two images.
• Let $x = (u, v, 1)^\top$ and $x' = (u', v', 1)^\top$ and
$$F = \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix}.$$
Each match gives one linear equation:
$$u u' f_{11} + v u' f_{12} + u' f_{13} + u v' f_{21} + v v' f_{22} + v' f_{23} + u f_{31} + v f_{32} + f_{33} = 0$$

Problem with the 8-point algorithm
Stacking the equations for n matches gives a homogeneous linear system:
$$\begin{bmatrix}
u_1 u'_1 & v_1 u'_1 & u'_1 & u_1 v'_1 & v_1 v'_1 & v'_1 & u_1 & v_1 & 1 \\
u_2 u'_2 & v_2 u'_2 & u'_2 & u_2 v'_2 & v_2 v'_2 & v'_2 & u_2 & v_2 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
u_n u'_n & v_n u'_n & u'_n & u_n v'_n & v_n v'_n & v'_n & u_n & v_n & 1
\end{bmatrix}
\begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = 0$$
The columns have entries of very different magnitudes (~10000, ~10000, ~100, ~10000, ~10000, ~100, ~100, ~100, 1). These orders-of-magnitude differences between the columns of the data matrix mean that least squares yields poor results.

Normalized 8-point algorithm
Normalized least squares yields good results: transform each image so that its points lie roughly in [-1, 1] x [-1, 1]. For a 700 x 500 image (corners (0, 0) to (700, 500) map to (-1, -1) to (1, 1)):
$$T = \begin{bmatrix} \tfrac{2}{700} & 0 & -1 \\ 0 & \tfrac{2}{500} & -1 \\ 0 & 0 & 1 \end{bmatrix}$$

Normalized 8-point algorithm
1. Transform the input points by $\hat{x}_i = T x_i$ and $\hat{x}'_i = T' x'_i$.
2. Run the 8-point algorithm on $\hat{x}_i, \hat{x}'_i$ to obtain $\hat{F}$.
3. De-normalize: since $x'^\top F x = \hat{x}'^\top \hat{F} \hat{x} = x'^\top T'^\top \hat{F}\, T\, x = 0$, we have $F = T'^\top \hat{F}\, T$.

Normalized 8-point algorithm (MATLAB):

% Normalize the points in each image
[x1, T1] = normalise2dpts(x1);
[x2, T2] = normalise2dpts(x2);
% Build the data matrix, one row per correspondence
A = [x2(1,:)'.*x1(1,:)'   x2(1,:)'.*x1(2,:)'  x2(1,:)' ...
     x2(2,:)'.*x1(1,:)'   x2(2,:)'.*x1(2,:)'  x2(2,:)' ...
     x1(1,:)'             x1(2,:)'            ones(npts,1) ];
% Least-squares solution: singular vector of the smallest singular value
[U,D,V] = svd(A);
F = reshape(V(:,9),3,3)';
% Enforce the rank-2 constraint on F
[U,D,V] = svd(F);
F = U*diag([D(1,1) D(2,2) 0])*V';
% Denormalise
F = T2'*F*T1;

Rank enforcement: given $F = U \Sigma V^\top$, form a matrix $\Sigma'$ by replacing all but the k largest singular values in $\Sigma$ with 0 (k = 2 in the code above), and use $U \Sigma' V^\top$.

References
Basic reading:
• Szeliski textbook, Section 8.1 (not 8.1.1-8.1.3), Chapter 11, Section 12.2.
• Hartley and Zisserman, Section 11.12.