Low-Level Vision (Artificial)
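1. Below is shown a convolution mask, H. Calculate the result of convolving this mask with (a) image I1, and (b) image I2. In both cases only calculate results for locations where the mask fits entirely inside the image.
$$
H = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad
I_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
I_2 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}
$$
Rotate the mask:
$$
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
$$
(a)
$$
H * I_1 = \begin{bmatrix}
(1 \times 0)+(1 \times 0)+(0 \times 0)+(1 \times 1) & (1 \times 0)+(1 \times 0)+(0 \times 1)+(1 \times 0) \\
(1 \times 0)+(1 \times 1)+(0 \times 0)+(1 \times 0) & (1 \times 1)+(1 \times 0)+(0 \times 0)+(1 \times 0)
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
$$
(b)
$$
H * I_2 = \begin{bmatrix}
(1 \times 0)+(1 \times 0)+(0 \times 1)+(1 \times 1) & (1 \times 0)+(1 \times 0)+(0 \times 1)+(1 \times 0) \\
(1 \times 1)+(1 \times 1)+(0 \times 0)+(1 \times 1) & (1 \times 1)+(1 \times 0)+(0 \times 1)+(1 \times 0)
\end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
$$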
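The two results for question 1 can be checked numerically. The following is a minimal pure-Python sketch, not a reference implementation; the function name `conv2d_valid` is a hypothetical helper introduced here, and it assumes images and masks are given as lists of rows.

```python
def conv2d_valid(mask, image):
    """2D convolution, evaluated only where the mask fits inside the image."""
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    # Convolution rotates the mask by 180 degrees before sliding it.
    rot = [row[::-1] for row in mask[::-1]]
    out = []
    for r in range(ih - mh + 1):
        out_row = []
        for c in range(iw - mw + 1):
            total = 0
            for i in range(mh):
                for j in range(mw):
                    total += rot[i][j] * image[r + i][c + j]
            out_row.append(total)
        out.append(out_row)
    return out


H = [[1, 0], [1, 1]]
I1 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
I2 = [[0, 0, 0], [1, 1, 0], [0, 1, 0]]

print(conv2d_valid(H, I1))  # [[1, 0], [1, 1]]
print(conv2d_valid(H, I2))  # [[1, 0], [3, 1]]
```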
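2. Calculate H ∗ I, padding the image with zeros where necessary to produce a result the same size as I, when:
$$
H = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad
I = \begin{bmatrix} 0.25 & 1 & 0.8 \\ 0.75 & 1 & 1 \\ 0 & 1 & 0.4 \end{bmatrix}
$$
Rotate the mask:
$$
\begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$
$$
H * I = \begin{bmatrix} 0 & 0.25 & 1 \\ 0 & 0.75 & 1 \\ 0 & 0 & 1 \end{bmatrix}
$$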
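The zero-padded case in question 2 can be sketched in the same way. This is an illustrative pure-Python version under the assumption that the mask has odd height and width; `conv2d_same` is a hypothetical name, and pixels outside the image are treated as zero rather than physically padding the array.

```python
def conv2d_same(mask, image):
    """2D convolution with zero padding; output is the same size as image.

    Assumes the mask has odd height and width so that it has a centre pixel.
    """
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    ph, pw = mh // 2, mw // 2
    rot = [row[::-1] for row in mask[::-1]]  # 180-degree rotation
    out = [[0.0] * iw for _ in range(ih)]
    for r in range(ih):
        for c in range(iw):
            total = 0.0
            for i in range(mh):
                for j in range(mw):
                    rr, cc = r + i - ph, c + j - pw
                    # Locations outside the image contribute zero.
                    if 0 <= rr < ih and 0 <= cc < iw:
                        total += rot[i][j] * image[rr][cc]
            out[r][c] = total
    return out


H = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
image = [[0.25, 1, 0.8], [0.75, 1, 1], [0, 1, 0.4]]

for row in conv2d_same(H, image):
    print(row)
```

Each output row is the corresponding image row shifted one pixel to the right, with a zero entering on the left.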
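3. Calculate $h * h^T$ (giving the result as a 3-by-3 pixel image), where $h = [1, 0.5, 0.1]$. Show that this is equal to $h^T \times h$. Hence, calculate $I * H$, where:
$$
H = \begin{bmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 0.25 & 0.05 \\ 0.1 & 0.05 & 0.01 \end{bmatrix}, \quad
I = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
$$
Treat $h$ as the mask and $h^T$ as the image:
$$
h^T = \begin{bmatrix} 1 \\ 0.5 \\ 0.1 \end{bmatrix}
$$
Rotate the mask: $\begin{bmatrix} 0.1 & 0.5 & 1 \end{bmatrix}$
$$
h * h^T = \begin{bmatrix}
1 \times 1 & 0.5 \times 1 & 0.1 \times 1 \\
1 \times 0.5 & 0.5 \times 0.5 & 0.1 \times 0.5 \\
1 \times 0.1 & 0.5 \times 0.1 & 0.1 \times 0.1
\end{bmatrix}
= \begin{bmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 0.25 & 0.05 \\ 0.1 & 0.05 & 0.01 \end{bmatrix}
$$
$$
h^T \times h = \begin{bmatrix} 1 \\ 0.5 \\ 0.1 \end{bmatrix}
\begin{bmatrix} 1 & 0.5 & 0.1 \end{bmatrix}
= \begin{bmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 0.25 & 0.05 \\ 0.1 & 0.05 & 0.01 \end{bmatrix}
$$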
Hence, $h * h^T = h^T \times h$ (more generally, the convolution of a row vector with a column vector can be written as vector multiplication).
$$
I * H = I * (h * h^T) = (I * h) * h^T
$$
$$
(I * h) = \begin{bmatrix} 1.5 & 1.6 & 0.6 \\ 1.5 & 1.6 & 0.6 \\ 1.5 & 1.6 & 0.6 \end{bmatrix}
$$
$$
(I * h) * h^T = \begin{bmatrix} 2.25 & 2.4 & 0.9 \\ 2.4 & 2.56 & 0.96 \\ 0.9 & 0.96 & 0.36 \end{bmatrix}
$$
This is the same as I ∗ H (confirm by doing 2D convolution by hand, or by using MATLAB).
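As an alternative to MATLAB, the separability claim can be confirmed with a short pure-Python sketch. The helper `conv2d_same` is a hypothetical zero-padded convolution written for this check (it assumes odd mask dimensions), and the comparison uses a small tolerance because the values are floating point.

```python
def conv2d_same(mask, image):
    """Zero-padded 2D convolution; output is the same size as image."""
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    ph, pw = mh // 2, mw // 2
    rot = [row[::-1] for row in mask[::-1]]  # 180-degree rotation
    out = [[0.0] * iw for _ in range(ih)]
    for r in range(ih):
        for c in range(iw):
            for i in range(mh):
                for j in range(mw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < ih and 0 <= cc < iw:
                        out[r][c] += rot[i][j] * image[rr][cc]
    return out


h = [[1, 0.5, 0.1]]                        # row vector (1-by-3 mask)
hT = [[v] for v in h[0]]                   # column vector (3-by-1 mask)
H = [[a * b for b in h[0]] for a in h[0]]  # outer product hT x h
image = [[1, 1, 1] for _ in range(3)]

direct = conv2d_same(H, image)                       # one 2D convolution
separable = conv2d_same(hT, conv2d_same(h, image))   # two 1D convolutions

same = all(
    abs(a - b) < 1e-12
    for row_d, row_s in zip(direct, separable)
    for a, b in zip(row_d, row_s)
)
print(same)  # True
```

The separable route uses 3 + 3 multiplications per pixel instead of 9, which is why separable masks are attractive for larger mask sizes.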
4. List the categories of image features that can produce intensity-level discontinuities in an image.
• Depth discontinuities – due to surfaces at different distances
• Orientation discontinuities – due to changes in the orientation of a surface
• Reflectance discontinuities – due to change in surface material properties
• Illumination discontinuities – e.g. shadow boundaries
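5. Convolution masks can be used to provide a finite difference approximation to first and second order directional derivatives. Write down the masks that approximate the following directional derivatives: (a) $-\frac{\delta}{\delta x}$, (b) $-\frac{\delta}{\delta y}$, (c) $-\frac{\delta^2}{\delta x^2}$, (d) $-\frac{\delta^2}{\delta y^2}$, (e) $-\frac{\delta^2}{\delta x^2} - \frac{\delta^2}{\delta y^2}$.
(a)
$$
-\frac{\delta}{\delta x} \approx \begin{bmatrix} -1 & 1 \end{bmatrix}
$$
(b)
$$
-\frac{\delta}{\delta y} \approx \begin{bmatrix} -1 \\ 1 \end{bmatrix}
$$
(c)
$$
-\frac{\delta^2}{\delta x^2} \approx \begin{bmatrix} -1 & 2 & -1 \end{bmatrix}
$$
(d)
$$
-\frac{\delta^2}{\delta y^2} \approx \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}
$$
(e)
$$
-\frac{\delta^2}{\delta x^2} - \frac{\delta^2}{\delta y^2} \approx
\begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}
\quad \text{or} \quad
\begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}
$$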
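6. Convolve the mask $\begin{bmatrix} -1 & 1 \end{bmatrix}$ with itself to produce a 3-by-1 pixel result.
The "image" padded with zeros is $\begin{bmatrix} 0 & -1 & 1 & 0 \end{bmatrix}$ and the rotated mask is $\begin{bmatrix} 1 & -1 \end{bmatrix}$.
Hence, the result is:
$$
\begin{bmatrix} (0 \times 1 + -1 \times -1) & (-1 \times 1 + 1 \times -1) & (1 \times 1 + 0 \times -1) \end{bmatrix}
= \begin{bmatrix} 1 & -2 & 1 \end{bmatrix}
$$
i.e., $-\frac{\delta}{\delta x} * -\frac{\delta}{\delta x} = \frac{\delta^2}{\delta x^2}$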
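The same calculation can be written as a full 1D convolution. This is a small sketch for checking the arithmetic; `conv1d_full` is a hypothetical helper name, and it uses the sum-of-products definition directly instead of explicitly padding and rotating.

```python
def conv1d_full(a, b):
    """Full 1D convolution: output length is len(a) + len(b) - 1."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            # Each pair of samples contributes to output position i + j.
            out[i + j] += x * y
    return out


d = [-1, 1]  # first-derivative mask from question 5(a)
print(conv1d_full(d, d))  # [1, -2, 1]
```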
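7. (a) Write down a mathematical expression describing the effect of convolving an image I with a Laplacian mask, i.e.
$$
L = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}
$$
(b) Hence, write down a mathematical expression describing the effect of convolving an image I with the following mask:
$$
L' = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 9 & -1 \\ -1 & -1 & -1 \end{bmatrix}
$$
(a)
$$
I * L \approx -\left( \frac{\delta^2 I}{\delta x^2} + \frac{\delta^2 I}{\delta y^2} \right)
$$
(b) Notice that
$$
L' = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
+ \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}
= L'' + L.
$$
So $I * L' = I * (L'' + L) = (I * L'') + (I * L)$.
$(I * L'') = I$ and $I * L \approx -\left( \frac{\delta^2 I}{\delta x^2} + \frac{\delta^2 I}{\delta y^2} \right)$.
Note that
$$
\begin{bmatrix} -1/8 & -1/8 & -1/8 \\ -1/8 & 1 & -1/8 \\ -1/8 & -1/8 & -1/8 \end{bmatrix}
\quad \text{is also} \quad
\approx -\left( \frac{\delta^2 I}{\delta x^2} + \frac{\delta^2 I}{\delta y^2} \right),
$$
so the mask approximates this expression only up to a scale factor. We denote this variable scale by c.
Therefore,
$$
I * L' \approx I - c \left( \frac{\delta^2 I}{\delta x^2} + \frac{\delta^2 I}{\delta y^2} \right)
$$
Note: $(h_1 * I) + (h_2 * I) = (h_1 + h_2) * I = H * I$ where $H = h_1 + h_2$.
Whereas: $h_1 * (h_2 * I) = (h_1 * h_2) * I = H * I$ where $H = h_1 * h_2$.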
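The distributive step used in question 7(b) can be checked numerically. The sketch below is illustrative only: `conv2d_same` is a hypothetical zero-padded convolution helper, and the 4-by-4 integer test image is an arbitrary assumption chosen so that the comparison is exact.

```python
def conv2d_same(mask, image):
    """Zero-padded 2D convolution; output is the same size as image."""
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    ph, pw = mh // 2, mw // 2
    rot = [row[::-1] for row in mask[::-1]]  # 180-degree rotation
    out = [[0] * iw for _ in range(ih)]
    for r in range(ih):
        for c in range(iw):
            for i in range(mh):
                for j in range(mw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < ih and 0 <= cc < iw:
                        out[r][c] += rot[i][j] * image[rr][cc]
    return out


L = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
L_prime = [[-1, -1, -1], [-1, 9, -1], [-1, -1, -1]]

# Arbitrary integer test image (an assumption, not taken from the question).
image = [[1, 2, 3, 4], [5, 6, 8, 7], [9, 3, 2, 1], [4, 4, 6, 5]]

lhs = conv2d_same(L_prime, image)
laplacian = conv2d_same(L, image)
rhs = [[p + q for p, q in zip(row_i, row_l)]
       for row_i, row_l in zip(image, laplacian)]

print(lhs == rhs)  # True
```

Convolving with L′ therefore adds the Laplacian response back onto the image, which is why this mask sharpens edges.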
8. For edge detection, a Laplacian mask is usually “combined” with a Gaussian mask to create a Laplacian of Gaussian (or LoG) mask. (a) How are these masks “combined”? (b) Why is this advantageous for edge detection? (c) What other mathematical function can be used to approximate a LoG mask?
(a) Using convolution.
(b) The Laplacian is sensitive to noise as well as other intensity-level discontinuities.
The Gaussian is a smoothing mask that suppresses noise.
The combination of the two produces a mask that is sensitive to intensity-level discontinuities that are image
features rather than noise.
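(c) A Difference of Gaussians (DoG) mask:
$$
\frac{1}{2\pi\sigma_1^2} \exp\left( \frac{-(x^2 + y^2)}{2\sigma_1^2} \right)
- \frac{1}{2\pi\sigma_2^2} \exp\left( \frac{-(x^2 + y^2)}{2\sigma_2^2} \right)
$$
where $\sigma_2 > \sigma_1$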
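A DoG mask can be sampled directly from the two Gaussians. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the values σ1 = 1.0, σ2 = 1.6 and the 5-by-5 mask size are illustrative choices, not values given in the question.

```python
import math


def gaussian(x, y, sigma):
    """2D Gaussian with standard deviation sigma, evaluated at (x, y)."""
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def dog(x, y, sigma1, sigma2):
    """Difference of Gaussians: narrow Gaussian minus wide Gaussian."""
    return gaussian(x, y, sigma1) - gaussian(x, y, sigma2)


# Assumed parameters for illustration only (sigma2 > sigma1).
SIGMA1, SIGMA2 = 1.0, 1.6
mask = [[dog(x, y, SIGMA1, SIGMA2) for x in range(-2, 3)]
        for y in range(-2, 3)]

for row in mask:
    print(" ".join(f"{v:+.4f}" for v in row))
```

The mask is positive at the centre and negative in the surround, the same centre-surround shape as a (negated) LoG.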

9. Use the following formula for a 2D Gaussian to calculate a 3-by-3 pixel numerical approximation to a Gaussian with standard deviation of 0.46 pixels, rounding values to two decimal places.
$$
G(x, y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right)
$$
For central pixel x=0, y=0:
$$
G(0, 0) = \frac{1}{2\pi \times 0.46^2} \exp\left( -\frac{0^2 + 0^2}{2 \times 0.46^2} \right) = 0.7522
$$
For middle-left pixel x=-1, y=0:
$$
G(-1, 0) = \frac{1}{2\pi \times 0.46^2} \exp\left( -\frac{(-1)^2 + 0^2}{2 \times 0.46^2} \right) = 0.0708
$$
(the value will be the same for G(1, 0), G(0, −1), G(0, 1)).
For top-left pixel x=-1, y=-1:
$$
G(-1, -1) = \frac{1}{2\pi \times 0.46^2} \exp\left( -\frac{(-1)^2 + (-1)^2}{2 \times 0.46^2} \right) = 0.0067
$$
(the value will be the same for G(−1, 1), G(1, −1), G(1, 1)).
Rounding to 2 d.p. and placing in mask:
$$
H = \begin{bmatrix} 0.01 & 0.07 & 0.01 \\ 0.07 & 0.75 & 0.07 \\ 0.01 & 0.07 & 0.01 \end{bmatrix}
$$
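The mask values for question 9 can be reproduced by evaluating the Gaussian formula on a 3-by-3 grid. This is a small sketch; the function name `gaussian_mask` and its `radius` parameter are assumptions introduced for illustration.

```python
import math


def gaussian_mask(sigma, radius=1, decimals=2):
    """Sample the 2D Gaussian on a (2*radius+1)-square grid and round."""
    norm = 1.0 / (2 * math.pi * sigma ** 2)
    return [
        [round(norm * math.exp(-(x * x + y * y) / (2 * sigma ** 2)), decimals)
         for x in range(-radius, radius + 1)]
        for y in range(-radius, radius + 1)
    ]


for row in gaussian_mask(0.46):
    print(row)
# [0.01, 0.07, 0.01]
# [0.07, 0.75, 0.07]
# [0.01, 0.07, 0.01]
```

Note that the rounded values sum to 1.07, not 1, so a mask like this slightly brightens the image unless it is renormalised.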
10. To perform multiscale feature analysis, it would be possible to either (1) keep the image size fixed and vary the size of the mask, or (2) keep the mask size fixed and vary the size of the image. (a) Why is the latter preferred? (b) Give an explicit example of the advantage of method (2), assuming that we have a 100 by 100 pixel image and a 3 by 3 pixel mask and we want to detect features at this scale and at double this scale.
(a) It is computationally cheaper.
(b) Convolving an m × m pixel mask with an n × n pixel image requires $m^2 n^2$ multiplication operations (assuming the output image is also n × n pixels).
For both methods we need to convolve a 100 × 100 pixel image with a 3 × 3 pixel mask, which requires $100^2 \times 3^2 = 90000$ multiplications. In addition:
• For method (1) we need to convolve a 100 × 100 pixel image with a 6 × 6 pixel mask, which requires $100^2 \times 6^2 = 360000$ multiplications.
• For method (2) we need to convolve a 50 × 50 pixel image with a 3 × 3 pixel mask, which requires $50^2 \times 3^2 = 22500$ multiplications (i.e., $2^4 = 16$ times fewer multiplications than for method (1)).
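The operation counts above follow directly from the $m^2 n^2$ formula, as this short sketch shows. The function name `conv_multiplications` is a hypothetical helper.

```python
def conv_multiplications(n, m):
    """Multiplications needed to convolve an m-by-m mask with an n-by-n image,
    assuming the output is also n-by-n."""
    return (n ** 2) * (m ** 2)


base = conv_multiplications(100, 3)     # needed by both methods
method1 = conv_multiplications(100, 6)  # fixed image, doubled mask
method2 = conv_multiplications(50, 3)   # halved image, fixed mask

print(base, method1, method2)  # 90000 360000 22500
print(method1 // method2)      # 16
```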
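11. Downsample the following image by a factor of 2.
$$
I = \begin{bmatrix}
0.3 & 0.2 & 0.1 & 0.5 & 0.5 & 0.4 \\
0.4 & 0.1 & 0.3 & 0.6 & 0.2 & 1 \\
0.6 & 0.3 & 0.8 & 0.2 & 0.5 & 0 \\
0.3 & 0.3 & 0 & 0.5 & 0.6 & 0.9 \\
0.6 & 0.4 & 0.9 & 1 & 0.7 & 0.9 \\
0.7 & 0.5 & 0.7 & 0.5 & 0.4 & 0.8
\end{bmatrix}
$$
$$
2{\downarrow}(I) = \begin{bmatrix} 0.1 & 0.6 & 1 \\ 0.3 & 0.5 & 0.9 \\ 0.5 & 0.5 & 0.8 \end{bmatrix}
$$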
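Downsampling by a factor of 2 amounts to keeping every second row and column. In the sketch below the starting offset is chosen to match the worked answer, which keeps the 2nd, 4th and 6th rows and columns; starting from the first pixel instead is an equally valid convention. The function name `downsample` is hypothetical.

```python
def downsample(image, factor=2):
    """Keep every factor-th row and column, starting at index factor - 1."""
    start = factor - 1
    return [row[start::factor] for row in image[start::factor]]


image = [
    [0.3, 0.2, 0.1, 0.5, 0.5, 0.4],
    [0.4, 0.1, 0.3, 0.6, 0.2, 1.0],
    [0.6, 0.3, 0.8, 0.2, 0.5, 0.0],
    [0.3, 0.3, 0.0, 0.5, 0.6, 0.9],
    [0.6, 0.4, 0.9, 1.0, 0.7, 0.9],
    [0.7, 0.5, 0.7, 0.5, 0.4, 0.8],
]

print(downsample(image))
# [[0.1, 0.6, 1.0], [0.3, 0.5, 0.9], [0.5, 0.5, 0.8]]
```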
12. What is aliasing and how is this avoided when down-sampling images to create an image pyramid?
Aliasing refers to the distortion, or misrepresentation, that can occur due to the sampling of an image.
To avoid aliasing, images are smoothed (by convolving with a Gaussian mask) prior to down-sampling.
13. Briefly describe what is meant by (a) a Gaussian image pyramid, and (b) a Laplacian image pyramid.
(a) A Gaussian image pyramid is a multiscale representation of a single image at different resolutions, obtained by iteratively convolving an image with a Gaussian filter and down-sampling.
(b) A Laplacian image pyramid is a multiscale representation of a single image that highlights intensity discontinuities at multiple scales. It is obtained by iteratively convolving an image with a Gaussian filter, subtracting the smoothed image from the previous one, and down-sampling the smoothed image.
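The two constructions can be sketched together in a few lines of pure Python. This is an illustration under stated assumptions and not a definitive implementation: the 3-by-3 binomial smoothing mask stands in for a Gaussian, borders are zero padded, the sign convention for each Laplacian level is taken to be current image minus smoothed image, and all function names are hypothetical.

```python
# Assumed smoothing mask: a 3-by-3 binomial approximation to a Gaussian.
SMOOTH = [[1 / 16, 2 / 16, 1 / 16],
          [2 / 16, 4 / 16, 2 / 16],
          [1 / 16, 2 / 16, 1 / 16]]


def conv2d_same(mask, image):
    """Zero-padded 2D convolution; output is the same size as image."""
    mh, mw = len(mask), len(mask[0])
    ih, iw = len(image), len(image[0])
    ph, pw = mh // 2, mw // 2
    rot = [row[::-1] for row in mask[::-1]]  # 180-degree rotation
    out = [[0.0] * iw for _ in range(ih)]
    for r in range(ih):
        for c in range(iw):
            for i in range(mh):
                for j in range(mw):
                    rr, cc = r + i - ph, c + j - pw
                    if 0 <= rr < ih and 0 <= cc < iw:
                        out[r][c] += rot[i][j] * image[rr][cc]
    return out


def downsample(image):
    """Keep every second row and column."""
    return [row[::2] for row in image[::2]]


def build_pyramids(image, levels):
    """Return (gaussian_pyramid, laplacian_pyramid) as lists of images."""
    gaussian_pyr, laplacian_pyr = [image], []
    current = image
    for _ in range(levels - 1):
        smoothed = conv2d_same(SMOOTH, current)
        # Laplacian level: the detail that smoothing removed at this scale.
        laplacian_pyr.append(
            [[a - b for a, b in zip(row_c, row_s)]
             for row_c, row_s in zip(current, smoothed)]
        )
        # Smooth before down-sampling to avoid aliasing.
        current = downsample(smoothed)
        gaussian_pyr.append(current)
    return gaussian_pyr, laplacian_pyr


# Example: an 8-by-8 image containing a vertical step edge.
step = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
gauss, lap = build_pyramids(step, levels=3)
print([len(level) for level in gauss])  # [8, 4, 2]
```

In the example, the first Laplacian level is zero in the flat left-hand region and non-zero next to the step, which is the sense in which it highlights intensity discontinuities.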