Mid-Level Vision (Multiview): Correspondence
1. Describe what is meant by the “correspondence problem” and briefly describe three scenarios which might require a solution to this problem.
The correspondence problem is the problem of determining which points in two or more images are projections of the same 3D location in the scene.
This problem might arise:
1. When using multiple cameras to obtain two, or more, images of the same scene from different locations –
solving the correspondence problem would enable the 3D structure of the scene to be recovered.
2. When using a single camera to obtain two, or more, images of the same scene at different times – solving the
correspondence problem would enable estimation of camera and object motion.
3. When comparing an image with one or more stored images – solving the correspondence problem would
enable the similarity between the images to be determined, and hence, allow object recognition.
2. Briefly describe the methodology used in correlation-based and feature-based methods of solving the correspondence problem.
Correlation-based methods
Attempt to establish a correspondence for every pixel by matching image intensities in a window around each
pixel.
Feature-based methods
Attempt to establish a correspondence for a sparse set of image locations, usually corners, using a feature
description extracted from around each of those locations.
3. For feature-based methods of solving the correspondence problem, briefly explain what is meant by a “detector” and a “descriptor”.
The detector is the method used to locate image features (or interest points) which are suitable for matching.
The descriptor is an array of feature values associated with each interest point. These descriptors are compared
to determine which points match.
4. The two arrays below show the intensity values for each pixel in a stereo pair of 4 by 3 pixel images.
4767 7675 left:3 4 5 4 right:4 5 4 5
8768 7687
Calculate the similarity of the pixel at coordinates (2,2) in the left image to all pixel locations in the right image, and hence, calculate the disparity at that point. Repeat this calculation for the pixel at coordinates (3,2) in the left image. Assume that (a) a 3 by 3 pixel window is used, (b) similarity is measured using the Sum of Absolute Differences (SAD), (c) the image is padded with zeros to allow calculation of similarity at the edges, (d) the cameras have coplanar image planes, (e) disparity is calculated as the translation from right to left.
For coplanar cameras, assuming the x-axes are also collinear, the search for a correspondence can be restricted to the row of the right image that the left-image pixel lies in. The SAD values along that row are shown below.
For point (2,2), the SAD values along the corresponding row of the right image are:
(1,2): 15   (2,2): 12   (3,2): 9   (4,2): 24
Hence, the best match is at location (3,2) in the right image. Disparity is left − right = (2,2) − (3,2) = (−1, 0).

For point (3,2), the SAD values along the corresponding row of the right image are:
(1,2): 25   (2,2): 0   (3,2): 11   (4,2): 22
Hence, the best match is at location (2,2) in the right image. Disparity is left − right = (3,2) − (2,2) = (1, 0).
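As a cross-check, here is a minimal Python sketch (not part of the original answer, and assuming NumPy is available) that reproduces this calculation: it zero-pads both images, compares the 3 by 3 window around the chosen left pixel with the window around every location in the right image using SAD, and reads off the best match and disparity. The helper name sad_map is just illustrative.

import numpy as np

# Intensity values from Question 4 (rows are image rows, columns run left to right).
left = np.array([[3, 4, 5, 4],
                 [4, 7, 6, 7],
                 [8, 7, 6, 8]])
right = np.array([[4, 5, 4, 5],
                  [7, 6, 7, 5],
                  [7, 6, 8, 7]])

def sad_map(left, right, x, y):
    """SAD between the 3x3 window around left pixel (x, y) (1-indexed, x = column)
    and the 3x3 window around every right-image location, with zero padding."""
    lp = np.pad(left, 1)                      # zero-pad both images by one pixel
    rp = np.pad(right, 1)
    w = lp[y - 1:y + 2, x - 1:x + 2]          # window around the left pixel
    h, wd = right.shape
    sad = np.zeros((h, wd), dtype=int)
    for ry in range(1, h + 1):
        for rx in range(1, wd + 1):
            win = rp[ry - 1:ry + 2, rx - 1:rx + 2]
            sad[ry - 1, rx - 1] = int(np.abs(w - win).sum())
    return sad

for (x, y) in [(2, 2), (3, 2)]:
    sad = sad_map(left, right, x, y)
    ry, rx = np.unravel_index(sad.argmin(), sad.shape)
    best = (int(rx) + 1, int(ry) + 1)         # best-matching right location, 1-indexed
    disparity = (x - best[0], y - best[1])    # translation from right to left
    print(f"left ({x},{y}): SAD along row {y} = {sad[y - 1]}, "
          f"best match {best}, disparity {disparity}")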

5. Below is a pair of images showing different views of the same scene.
[Figure: left image with interest points L1–L5 marked; right image with interest points R1–R5 marked.]
The locations of the interest points are indicated on each image, and a vector of feature values for each interest point is given below:

Point   Feature Values      Point   Feature Values
L1      (10, 4)             R1      (3, 7)
L2      (3, 8)              R2      (1, 1)
L3      (0, 2)              R3      (5, 7)
L4      (6, 9)              R4      (8, 0)
L5      (9, 1)              R5      (1, 2)
For each interest point in the left image, find the best matching interest point in the right image assuming that similarity is measured using the sum of absolute differences (SAD).
For L1, SAD: R1: 10; R2: 12; R3: 8; R4: 6; R5: 11. Therefore the best match is R4.
For L2, SAD: R1: 1; R2: 9; R3: 3; R4: 13; R5: 8. Therefore the best match is R1.
For L3, SAD: R1: 8; R2: 2; R3: 10; R4: 10; R5: 1. Therefore the best match is R5.
For L4, SAD: R1: 5; R2: 13; R3: 3; R4: 11; R5: 12. Therefore the best match is R3.
For L5, SAD: R1: 12; R2: 8; R3: 10; R4: 2; R5: 9. Therefore the best match is R4.
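The same matching can be scripted. Below is a minimal Python sketch (not part of the original answer); the dictionaries simply hold the feature vectors from the table above, and each left point is paired with the right point whose descriptor gives the smallest SAD.

# Feature vectors from Question 5.
left_desc = {"L1": (10, 4), "L2": (3, 8), "L3": (0, 2), "L4": (6, 9), "L5": (9, 1)}
right_desc = {"R1": (3, 7), "R2": (1, 1), "R3": (5, 7), "R4": (8, 0), "R5": (1, 2)}

for lname, lvec in left_desc.items():
    # SAD between this left descriptor and every right descriptor.
    sads = {rname: sum(abs(a - b) for a, b in zip(lvec, rvec))
            for rname, rvec in right_desc.items()}
    best = min(sads, key=sads.get)
    print(f"{lname}: {sads} -> best match {best}")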
6. The coordinates of the interest points in Question 5 are as follows:

Point   Coordinates     Point   Coordinates
L1      (187, 168)      R1      (101, 394)
L2      (203, 290)      R2      (115, 186)
L3      (215, 87)       R3      (135, 128)
L4      (234, 28)       R4      (269, 243)
L5      (366, 142)      R5      (336, 178)
Calculate the disparity at each point in the left image. Note that if the cameras have coplanar image planes (although not necessarily collinear x-axes), calculating the disparity is equivalent to calculating the translation from right to left.

Translation from R4 to L1 is (187, 168) − (269, 243) = (−82, −75).
Translation from R1 to L2 is (203, 290) − (101, 394) = (102, −104).
Translation from R5 to L3 is (215, 87) − (336, 178) = (−121, −91).
Translation from R3 to L4 is (234, 28) − (135, 128) = (99, −100).
Translation from R4 to L5 is (366, 142) − (269, 243) = (97, −101).
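A short Python sketch of the same arithmetic (not part of the original answer; the coordinate and match dictionaries just restate the tables above):

# Interest point coordinates from Question 6, as (x, y) pixels.
left_xy = {"L1": (187, 168), "L2": (203, 290), "L3": (215, 87),
           "L4": (234, 28), "L5": (366, 142)}
right_xy = {"R1": (101, 394), "R2": (115, 186), "R3": (135, 128),
            "R4": (269, 243), "R5": (336, 178)}
# Best matches found in Question 5.
matches = {"L1": "R4", "L2": "R1", "L3": "R5", "L4": "R3", "L5": "R4"}

# Disparity = translation from right to left = left coordinate minus right coordinate.
for l, r in matches.items():
    dx = left_xy[l][0] - right_xy[r][0]
    dy = left_xy[l][1] - right_xy[r][1]
    print(f"Translation from {r} to {l}: ({dx}, {dy})")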
7. Write pseudo-code for the RANSAC algorithm.
1. Randomly choose a minimal subset (a sample) of data points necessary to fit the model
2. Fit the model to this subset of data
3. Test all the other data points to determine if they are consistent with the fitted model (i.e. if they lie within a distance t of the model’s prediction).
4. Count the number of inliers (the consensus set). The size of the consensus set is the model’s support.
5. Repeat from step 1 for N trials.
After N trials, select the model parameters with the highest support and re-estimate the model using all the points in the corresponding consensus set.
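The pseudo-code can be turned into a short, generic Python sketch (an illustration only, not a definitive implementation): fit_model and error stand for whatever model-fitting and error functions suit the problem, t is the inlier threshold, and sample_size is the size of the minimal sample.

import random

def ransac(data, fit_model, error, t, n_trials, sample_size=1):
    """Return (model, inliers) for the trial with the largest consensus set."""
    best_inliers = []
    for _ in range(n_trials):
        sample = random.sample(data, sample_size)            # 1. random minimal sample
        model = fit_model(sample)                            # 2. fit the model
        inliers = [p for p in data if error(model, p) <= t]  # 3-4. consensus set
        if len(inliers) > len(best_inliers):                 # keep the largest support
            best_inliers = inliers
    return fit_model(best_inliers), best_inliers             # 5. re-estimate from all inliers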
8. Apply the RANSAC algorithm to find the true correspondence between the two images in Question 5. Assume (a) that the images are related by a pure translation in the x-y plane, (b) that t (the threshold for comparing the model’s prediction with the data) is 20 pixels, (c) 3 trials are performed and these samples are chosen in the order L1, L2, L3 rather than being randomly chosen.
Choose L1. Model is a translation of (−82, −75). Locations of matching points predicted by this model are:
For L2: (203, 290) − (−82, −75) = (285, 365); the actual match is at (101, 394), hence this is an outlier for this model.
For L3: (215, 87) − (−82, −75) = (297, 162); the actual match is at (336, 178), hence this is an outlier for this model.
For L4: (234, 28) − (−82, −75) = (316, 103); the actual match is at (135, 128), hence this is an outlier for this model.
For L5: (366, 142) − (−82, −75) = (448, 217); the actual match is at (269, 243), hence this is an outlier for this model.
Hence, consensus set = 0.

Choose L2. Model is a translation of (102, −104). Locations of matching points predicted by this model are:
For L1: (187, 168) − (102, −104) = (85, 272); the actual match is at (269, 243), hence this is an outlier for this model.
For L3: (215, 87) − (102, −104) = (113, 191); the actual match is at (336, 178), hence this is an outlier for this model.
For L4: (234, 28) − (102, −104) = (132, 132); the actual match is at (135, 128), hence this is an inlier for this model.
For L5: (366, 142) − (102, −104) = (264, 246); the actual match is at (269, 243), hence this is an inlier for this model.
Hence, consensus set = 2.

Choose L3. Model is a translation of (−121, −91). Locations of matching points predicted by this model are:
For L1: (187, 168) − (−121, −91) = (308, 259); the actual match is at (269, 243), hence this is an outlier for this model.
For L2: (203, 290) − (−121, −91) = (324, 381); the actual match is at (101, 394), hence this is an outlier for this model.
For L4: (234, 28) − (−121, −91) = (355, 119); the actual match is at (135, 128), hence this is an outlier for this model.
For L5: (366, 142) − (−121, −91) = (487, 233); the actual match is at (269, 243), hence this is an outlier for this model.
Hence, consensus set = 0.

Therefore the true correspondence is given by the matches for L2, L4, and L5. The best estimate of the model is
(1/3)[(102, −104) + (99, −100) + (97, −101)] = (99.33, −101.67).
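For reference, a minimal Python sketch that reproduces these three trials (not part of the original answer; the match list, the translation helper and the convention that the sample itself does not vote for its own model are assumptions chosen to match the consensus counts above):

import math

# Candidate matches from Question 5, with coordinates from Question 6:
# each entry is (left point (x, y), matched right point (x, y)).
matches = [((187, 168), (269, 243)),   # L1 -> R4
           ((203, 290), (101, 394)),   # L2 -> R1
           ((215, 87),  (336, 178)),   # L3 -> R5
           ((234, 28),  (135, 128)),   # L4 -> R3
           ((366, 142), (269, 243))]   # L5 -> R4

t = 20  # inlier threshold in pixels

def translation(match):
    """Right-to-left translation implied by one match: left minus right."""
    (lx, ly), (rx, ry) = match
    return (lx - rx, ly - ry)

best_inliers = []
for sample in matches[:3]:                # trials seeded by L1, L2, L3 in order
    tx, ty = translation(sample)
    inliers = []
    for m in matches:
        if m is sample:                   # only the other points are tested
            continue
        (lx, ly), (rx, ry) = m
        px, py = lx - tx, ly - ty         # right-image location predicted by the model
        if math.hypot(px - rx, py - ry) <= t:
            inliers.append(m)
    print(f"model ({tx}, {ty}): consensus set size {len(inliers)}")
    if len(inliers) > len(best_inliers):
        best_inliers = inliers + [sample]

# Re-estimate the translation from all matches supporting the best model.
txs = [translation(m)[0] for m in best_inliers]
tys = [translation(m)[1] for m in best_inliers]
print(f"refined model: ({sum(txs) / len(txs):.2f}, {sum(tys) / len(tys):.2f})")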
9. Below is shown a simple 5 by 4 pixel binary image. The two arrays show the derivatives of the image intensities in the x and y directions.

[Figure: the 5 by 4 pixel binary image.]

Ix =
 1  0  0  0  0
−1  1  0  0  0
 0  0  1  0  0
 0  0  0  0  0

Iy =
 1 −1  0  0  0
−1  0 −1  0  0
 1  1  1  0  0
 0  0  0  0  0
Given the x and y derivatives of the image intensities shown above, calculate the response of the Harris corner detector at each of the six central pixels, assuming (a) a value of k = 0.05, (b) that products of derivatives are summed over an equally weighted, 3 by 3 pixel, window around each pixel.

R = [ΣIx²·ΣIy² − (ΣIxIy)²] − k·[ΣIx² + ΣIy²]²
10000 11000 10000
I2=1 1 0 0 0I2=1 0 1 0 0II=1 0 0 0 0 x  0 0 1 0 0  y  1 1 1 0 0  x y  0 0 1 0 0 
00000 00000 00000 33100 34210 22000
􏰋I2=3 4 2 1 0􏰋I2=5 7 4 2 0􏰋II=2 3 1 1 0
4
12345
x
 2 3 2 1 0  y  3 5 3 2 0  x y  1 2 1 1 0  01110 23210 01110
 9 12 2 0 0   4 4 0 0 0  􏰋I2􏰋I2=15 28 8 2 0(􏰋II)2=4 9 1 1 0 x y  6 15 6 2 0  x y  1 4 1 1 0 
03210 01110  36 49 9 1 0 
(􏰋I2+􏰋I2)2= 64 121 36 9 0  
x y 25642590 4 16 9 4 0
R = 􏰌 􏰋 I x2 􏰋 I y2 − ( 􏰋 I x I y ) 2 􏰍 − k 􏰌 􏰋 I x2 + 􏰋 I y2 􏰍 2
12 2 0 0   4 4 0 0 0   36 49 9
 9
R=15 28 8 2 0−4 9 1 1 0−0.05×64 121 36 9 0
 6 1562014110 25 64 2590
0 3 2 1 0 0 1 1 1 0
 3.2 5.55 1.55 −0.05 0 R= 7.8 12.95 5.2 0.55 0  3.75 7.8 3.75 0.55 0 −0.2 1.2 0.55 −0.2 0
4 16 9 4 0

 
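The calculation can be checked with a short NumPy sketch (not part of the original answer; box_sum is just an explicit, equally weighted 3 by 3 box filter with zero padding at the borders):

import numpy as np

# Image derivatives from Question 9.
Ix = np.array([[ 1,  0,  0, 0, 0],
               [-1,  1,  0, 0, 0],
               [ 0,  0,  1, 0, 0],
               [ 0,  0,  0, 0, 0]])
Iy = np.array([[ 1, -1,  0, 0, 0],
               [-1,  0, -1, 0, 0],
               [ 1,  1,  1, 0, 0],
               [ 0,  0,  0, 0, 0]])
k = 0.05

def box_sum(a):
    """Sum of a over a 3x3 window around each pixel (zero padded at the borders)."""
    p = np.pad(a, 1)
    h, w = a.shape
    out = np.zeros_like(a, dtype=float)
    for y in range(h):
        for x in range(w):
            out[y, x] = p[y:y + 3, x:x + 3].sum()
    return out

Sxx = box_sum(Ix * Ix)   # sum of Ix^2
Syy = box_sum(Iy * Iy)   # sum of Iy^2
Sxy = box_sum(Ix * Iy)   # sum of Ix*Iy

# Harris response, evaluated element-wise at every pixel.
R = (Sxx * Syy - Sxy ** 2) - k * (Sxx + Syy) ** 2
print(np.round(R, 2))
print("six central pixels:")
print(np.round(R[1:3, 1:4], 2))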
10. For the Harris corner detector, describe what type of image feature will give rise to the following values of R: (a) R ≈ 0, (b) R < 0, (c) R > 0.
(a) R ≈ 0 occurs in regions where the intensity values are unchanging (flat regions).
(b) R < 0 occurs at edges.
(c) R > 0 occurs at corners.