Mid-Level Vision (Multiview): Video
1. List constraints typically applied to solving the video correspondence problem, and note circumstances in which they fail.
• Spatial coherence (assume neighbouring points have similar optical flow). Fails at discontinuities between surfaces at different depths, or surfaces with different motion.
• Small motion (assume optical flow vectors have small magnitude). Fails if relative motion is fast or frame rate is slow.
2. Define what is meant by the “aperture problem” and suggest how this problem can be overcome.
The aperture problem refers to the fact that the direction of motion of a small image patch can be ambiguous. In particular, for an edge, information is available only about the component of motion perpendicular to the edge; no information is available about the component of motion parallel to the edge.
The aperture problem might be overcome by:
1. integrating information from many local motion detectors / image patches, or
2. giving preference to image locations where the image structure provides unambiguous information about optic flow (e.g. corners).
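The ambiguity described above can be illustrated with a minimal sketch. It assumes a straight edge seen through a small aperture, so that only the component of velocity along the edge normal is measurable. The function name `normal_flow` and the example velocities are illustrative choices, not part of the original answer.

```python
import math

def normal_flow(velocity, edge_angle_deg):
    """Return the component of `velocity` perpendicular to a straight edge.

    `edge_angle_deg` is the orientation of the edge measured from the x-axis.
    This is the only part of the motion a local detector can measure.
    """
    theta = math.radians(edge_angle_deg)
    # Unit normal to the edge: the edge direction rotated by 90 degrees.
    nx, ny = -math.sin(theta), math.cos(theta)
    speed = velocity[0] * nx + velocity[1] * ny
    return (speed * nx, speed * ny)

# Two different true motions of the same 45-degree edge (hypothetical values).
a = normal_flow((1.0, 0.0), 45.0)   # purely along the x-axis
b = normal_flow((0.0, -1.0), 45.0)  # purely along the y-axis
print(a, b)  # the two measured flows coincide (up to rounding)
```

Both motions produce the same measured flow, which is why information from other patches, or from corners, is needed to resolve the true direction.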
3. Consider a traditional barber’s pole as shown in this image:
When the pole rotates on its axis, in which direction is (a) the motion field, (b) the optic flow? How might this be explained by the aperture problem?
(a) the motion field is horizontal
(b) the optic flow is (predominantly) vertical
The stripes of the pole provide ambiguous information. Corners are present only where the stripes meet the top, bottom and sides of the pole, and hence only there is (seemingly) unambiguous information available.
If the visual system integrates information, then because there are more corners moving vertically, and there is a large component of vertical movement perpendicular to the stripes, the overall motion is seen as vertical.
If the visual system relies more heavily on unambiguous motion cues (i.e. corners), then because there are more corners at the sides of the pole moving vertically than there are corners at the top and bottom of the pole moving horizontally, the overall motion is again seen as vertical.
4. Two frames in a video sequence were taken at times t and t+1s. The point (50,50,t) in the first image has been found to correspond to the point (25,50,t+1) in the second image. Given that the camera is moving at 0.1 m/s along the camera x-axis, the focal length of the camera is 35 mm, and the pixel size of the camera is 0.1 mm/pixel, calculate the depth of the identified scene point.
The depth is given by: $Z = -\frac{f V_x}{\dot{x}}$.
The velocity of the image point is $\dot{x} = \frac{25 - 50}{1} = -25$ pixels/s.
Given the pixel size, this is equivalent to $0.0001 \times (-25) = -0.0025$ m/s.
Hence, the depth is $Z = -\frac{0.035 \times 0.1}{-0.0025} = 1.4$ m.
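The calculation above can be checked with a short sketch. The function and parameter names are illustrative assumptions. The formula is the one stated in the answer, with all lengths converted to metres before use.

```python
def depth_from_lateral_motion(x1_px, x2_px, dt_s, pixel_size_m,
                              focal_length_m, vx_m_per_s):
    """Depth of a point seen by a camera translating along its x-axis.

    Implements Z = -f * Vx / x_dot, with x_dot the image velocity in m/s.
    """
    # Image velocity: pixels per second, converted to metres per second.
    x_dot = (x2_px - x1_px) / dt_s * pixel_size_m
    if x_dot == 0:
        raise ValueError("no image motion: depth cannot be recovered")
    return -focal_length_m * vx_m_per_s / x_dot

# Values from the question: 35 mm focal length, 0.1 mm pixels, 0.1 m/s.
z = depth_from_lateral_motion(50, 25, 1.0, 0.1e-3, 35e-3, 0.1)
print(round(z, 6))  # depth in metres
```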
5. Two frames in a video sequence were taken at times t and t+1s. The point (50,70,t) in the first image has been found to correspond to the point (45,63,t+1) in the second image. Given that the camera is moving at 0.1 m/s along the optical axis of the camera (i.e., the z-axis), and the centre of the image is at pixel coordinates (100,140), calculate the depth of the identified scene point.
The depth is given by: $Z_2 = \frac{x_1 V_z}{\dot{x}}$.
The coordinates of the points with respect to the centre of the image are: (-50,-70,t) and (-55,-77,t+1).
The velocity of the image point is $\dot{x} = \frac{-55 - (-50)}{1} = -5$ pixels/s.
Hence, the depth is $Z_2 = \frac{-50 \times 0.1}{-5} = 1$ m.
Alternatively, using the y-coordinates: $Z_2 = \frac{y_1 V_z}{\dot{y}}$.
The velocity of the image point is $\dot{y} = \frac{-77 - (-70)}{1} = -7$ pixels/s.
Hence, the depth is $Z_2 = \frac{-70 \times 0.1}{-7} = 1$ m.
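As a check on the arithmetic, the sketch below applies the same formula to both image coordinates. The function name and argument layout are illustrative assumptions. Because the pixel units cancel in the ratio, no pixel size is needed.

```python
def depth_from_forward_motion(p1, p2, centre, dt_s, vz_m_per_s):
    """Depth at the second frame for a camera moving along its optical axis.

    Implements Z2 = x1 * Vz / x_dot (and the y-coordinate equivalent),
    with image coordinates measured relative to the image centre.
    Returns the two estimates (from x, from y).
    """
    estimates = []
    for axis in (0, 1):
        c1 = p1[axis] - centre[axis]   # centred coordinate in frame 1
        c2 = p2[axis] - centre[axis]   # centred coordinate in frame 2
        c_dot = (c2 - c1) / dt_s       # image velocity in pixels/s
        estimates.append(c1 * vz_m_per_s / c_dot)
    return tuple(estimates)

# Values from the question.
z_from_x, z_from_y = depth_from_forward_motion(
    (50, 70), (45, 63), (100, 140), 1.0, 0.1)
print(z_from_x, z_from_y)  # both estimates are in metres
```

The two coordinates giving the same depth is a useful consistency check on the correspondence.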
6. Give an equation for the time-to-collision of a camera and a scene point which does not require the recovery of the depth of the scene point.
Using this equation, calculate the time-to-collision of the camera and the scene point in the previous question, assuming the camera velocity remains constant.
time-to-collision $= \frac{x_1}{\dot{x}}$.
Hence, for the point in the previous question, time-to-collision $= \frac{-50}{-5} = 10$ s.
(From the answer to the previous question, we know that the camera is 1 m from the point and moving at 0.1 m/s, so this agrees with the result above.)
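A minimal sketch of the time-to-collision calculation follows. The function name is an illustrative assumption. It uses only image measurements, so neither the depth nor the camera speed appears.

```python
def time_to_collision(x1_px, x2_px, dt_s):
    """Time-to-collision from image measurements alone: x1 / x_dot.

    Coordinates must be measured relative to the image centre.
    """
    x_dot = (x2_px - x1_px) / dt_s  # image velocity in pixels/s
    return x1_px / x_dot

# Centred x-coordinates from the previous question: -50 then -55.
ttc = time_to_collision(-50, -55, 1.0)
print(ttc)  # seconds
```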
7. The arrays below show the pixel intensities in the same 1 by 5 pixel patch taken from four frames of a greyscale video. In order to segment any moving object from the background, calculate the result of performing (a) image differencing, (b) background subtraction. In both cases assume that the threshold is 50 and in (b) that the background is calculated using a moving average which is initialised to zero everywhere and which is updated using the formula B(x,y) = (1−β)B(x,y) + βI(x,y,t), where β = 0.5.
I(x,y,t1) = [190,200,90,110,90]
I(x,y,t2) = [110,170,160,70,70]
I(x,y,t3) = [100,60,170,200,90]
I(x,y,t4) = [90,100,100,190,190]
(a) Image differencing.
abs(I(x,y,t1)−I(x,y,t2)) = [80,30,70,40,20]
abs(I(x,y,t2)−I(x,y,t3)) = [10,110,10,130,20]
abs(I(x,y,t3)−I(x,y,t4)) = [10,40,70,10,100]
applying threshold:
abs(I(x,y,t1)−I(x,y,t2)) > 50 = [1,0,1,0,0]
abs(I(x,y,t2)−I(x,y,t3)) > 50 = [0,1,0,1,0]
abs(I(x,y,t3)−I(x,y,t4)) > 50 = [0,0,1,0,1]
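The image-differencing result can be reproduced with a short sketch that uses plain Python lists. The names `frames` and `difference_mask` are illustrative assumptions; the data and threshold are taken from the question.

```python
frames = [
    [190, 200, 90, 110, 90],   # I(x,y,t1)
    [110, 170, 160, 70, 70],   # I(x,y,t2)
    [100, 60, 170, 200, 90],   # I(x,y,t3)
    [90, 100, 100, 190, 190],  # I(x,y,t4)
]

def difference_mask(prev, curr, threshold):
    """Mark pixels whose absolute change between frames exceeds threshold."""
    return [1 if abs(c - p) > threshold else 0 for p, c in zip(prev, curr)]

# Compare each frame with the one that follows it.
diff_masks = [difference_mask(a, b, 50) for a, b in zip(frames, frames[1:])]
for mask in diff_masks:
    print(mask)
```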
(b) Background subtraction.
at t1: B = 0.5I(x,y,t1) = [95,100,45,55,45]
abs(I(x,y,t1)−B) = [95,100,45,55,45]
at t2: B = 0.5B + 0.5I(x,y,t2) = [102.5,135,102.5,62.5,57.5]
abs(I(x,y,t2)−B) = [7.5,35,57.5,7.5,12.5]
at t3: B = 0.5B + 0.5I(x,y,t3) = [101.25,97.5,136.25,131.25,73.75]
abs(I(x,y,t3)−B) = [1.25,37.5,33.75,68.75,16.25]
at t4: B = 0.5B + 0.5I(x,y,t4) = [95.625,98.75,118.125,160.625,131.875]
abs(I(x,y,t4)−B) = [5.625,1.25,18.125,29.375,58.125]
applying threshold:
abs(I(x,y,t1)−B) > 50 = [1,1,0,1,0]
abs(I(x,y,t2)−B) > 50 = [0,0,1,0,0]
abs(I(x,y,t3)−B) > 50 = [0,0,0,1,0]
abs(I(x,y,t4)−B) > 50 = [0,0,0,0,1]
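The background-subtraction steps above can be sketched as follows. The function name `background_subtraction` is an illustrative assumption. As in the worked answer, the background is updated with the current frame before the difference is taken.

```python
def background_subtraction(frames, beta, threshold):
    """Segment each frame against a moving-average background.

    The background starts at zero and is updated as
    B = (1 - beta) * B + beta * I before each comparison.
    """
    background = [0.0] * len(frames[0])
    masks = []
    for frame in frames:
        background = [(1 - beta) * b + beta * i
                      for b, i in zip(background, frame)]
        masks.append([1 if abs(i - b) > threshold else 0
                      for b, i in zip(background, frame)])
    return masks

video = [
    [190, 200, 90, 110, 90],   # I(x,y,t1)
    [110, 170, 160, 70, 70],   # I(x,y,t2)
    [100, 60, 170, 200, 90],   # I(x,y,t3)
    [90, 100, 100, 190, 190],  # I(x,y,t4)
]
bg_masks = background_subtraction(video, beta=0.5, threshold=50)
for mask in bg_masks:
    print(mask)
```

The many detections at t1 are a consequence of initialising the background to zero: the first few frames are compared against a background that has not yet converged.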