This week
• Edge and Feature Detection
• important for subsequent analysis
• Requires Filtering
• linear filter – replaces each pixel by a linear combination of itself and its neighbours
• generates a new image
• low-level vision = image processing
• Requires Convolution
• the process of applying the filter to the image
Part a
Convolution
7CCSMCVI / 6CCS3COV: Computer Vision
Low-Level Vision (Artificial)
Convolution
I'(i,j) = I∗H = ∑_{k,l} I(i+k, j+l) H(−k,−l)
I' is the filtered image: at each image location (i,j) a new value is calculated as a weighted sum of pixel values in the neighbourhood (defined by the range of values taken by k and l – in this example k and l range from −1 to +1).
Convolution
mask (or template, or kernel)
an array of weights (like a small piece of image)
I'(i,j)=I∗H=∑k,l I(i+k,j+l)H(−k,−l)
To calculate values at each location in the filtered image:
You can imagine sliding the mask across the input image, filling in the values for the output (filtered) image as you go.
Alternatively, you can imagine the mask replicated at every pixel location in the output image, and the results generated in parallel (like the DoG filters in the retina).
Contents
Convolution
• method
• interpretation
• mask (separability and weights)
Convolution: method
For each image pixel in turn:
1. Centre the rotated mask over that pixel
2. Multiply each mask element by the corresponding image pixel value
3. Sum these products and write the answer in the corresponding pixel location in the output image
[figure: the rotated mask positioned over the example image, with output values filled in one at a time]
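The three steps above can be sketched directly in code. This is an illustrative NumPy implementation (function and variable names are my own, not from the lecture), using the zero-padding boundary convention discussed later; the mask and image are those of the worked example.

```python
import numpy as np

def convolve2d(image, mask):
    """Convolve `image` with `mask`, zero-padding the boundaries."""
    mask = np.flipud(np.fliplr(mask))             # rotate the mask by 180 degrees
    mh, mw = mask.shape
    ph, pw = mh // 2, mw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))  # pad the input with zeros
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            region = padded[i:i + mh, j:j + mw]   # neighbourhood under the mask
            out[i, j] = np.sum(region * mask)     # multiply and sum
    return out

img = np.array([[1., 2., 2., 2.],
                [0., 1., 4., 3.],
                [0., 2., 2., 2.],
                [0., 1., 1., 0.]])
H = np.array([[1., 1., 0.],
              [1., 0., -1.],
              [0., -1., -1.]])
print(convolve2d(img, H))   # reproduces the worked example (first row: 3 6 7 1)
```

The double loop is deliberately literal; real implementations vectorise it or use a library routine, but the arithmetic is identical.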
Convolution: method
110 1 0 -1 0 -1 -1
Rotate the mask:
-1 -1 0 -1 0 1 011
1222 0 1 4 3 0 2 2 2 0110
H∗I
Convolution: method
Skipping to the end, the complete output image is:
 3  6  7  1
 2  5  2 -6
 3  3 -4 -9
 1 -1 -5 -5
Convolution at image boundaries
How to deal with image boundaries (where mask “falls off” image)?
Most common methods:
1. pad the input image with zeros (area outside the black square in example below). i.e. the implicit assumption made in the preceding example.
2. make the output image smaller than the input image (red area in example below). i.e. only apply mask at locations on the input image where it does not fall off the input image.
[figure: the input image padded with zeros (outside the black square), with the red area marking the smaller-output option]
With zero padding the full output image is:
 3  6  7  1
 2  5  2 -6
 3  3 -4 -9
 1 -1 -5 -5
Convolution and Correlation
convolution:
I'(i,j) = I∗H = ∑_{k,l} I(i+k, j+l) H(−k,−l)    (H(−k,−l) is the rotated mask)
cross-correlation:
I'(i,j) = I⋆H = ∑_{k,l} I(i+k, j+l) H(k,l)
Hence, if the mask is not rotated the result will be the cross-correlation rather than the convolution of I and H.
If the mask is symmetrical (i.e. H(k,l) = H(−k,−l)) then cross-correlation = convolution.
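A quick check of the symmetry condition (the mask values here are illustrative, not from the slides): rotating a mask by 180° turns cross-correlation into convolution, so for a point-symmetric mask the two operations coincide.

```python
import numpy as np

box = np.full((3, 3), 1/9)                       # symmetric smoothing mask
laplacian = np.array([[-1., -1., -1.],
                      [-1.,  8., -1.],
                      [-1., -1., -1.]])          # also symmetric
sobel_like = np.array([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]])           # NOT symmetric under rotation

def rotated(H):
    return np.flipud(np.fliplr(H))               # 180-degree rotation

print(np.array_equal(rotated(box), box))                 # True: conv == corr
print(np.array_equal(rotated(laplacian), laplacian))     # True: conv == corr
print(np.array_equal(rotated(sobel_like), sobel_like))   # False: conv != corr
```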
Convolution and Correlation
convolution is:
• commutative I∗H=H∗I
• associative (I∗H )∗G=I∗(H∗G)
cross-correlation is not:
• commutative I⋆H≠H⋆I
• associative (I⋆H)⋆G ≠ I⋆(H⋆G)
For this reason convolution is used in preference to cross-correlation (despite the inconvenience of needing to rotate the mask)
Convolution at image boundaries
With method 2 the output contains only the locations where the mask fits entirely inside the input; for the 4×4 example this leaves a 2×2 output:
 5  2
 3 -4
Masks as point-spread functions
Convolve a mask with a simple image that has just an isolated white dot on a black background.
The output will be the mask itself shifted by the row and column numbers of the isolated pixel in the input.
An ordinary image can be thought of as a combination of such points – one for each pixel. So the result of a convolution can be thought of as a superimposition of masks, each one weighted by the intensity of an image pixel.
[figure: input image containing an isolated white dot, and the output image showing the mask shifted to the dot's position]
The convolution output is a maximum when large values in the input get multiplied by large values in the mask.
This means that convolution masks respond most strongly to image features that resemble the rotated mask.
The rotated mask is like a template which is scanned across the image to find image features that match that template.
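This point-spread behaviour is easy to verify. The sketch below (helper name and mask values are my own, not from the lecture) convolves an isolated impulse with a mask and recovers the mask, shifted to the impulse's position.

```python
import numpy as np

def convolve2d(image, mask):
    mask = np.flipud(np.fliplr(mask))            # rotate mask 180 degrees
    mh, mw = mask.shape
    padded = np.pad(image, ((mh // 2,) * 2, (mw // 2,) * 2))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + mh, j:j + mw] * mask)
    return out

H = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
impulse = np.zeros((7, 7))
impulse[3, 4] = 1.0                              # isolated white dot
out = convolve2d(impulse, H)
print(np.array_equal(out[2:5, 3:6], H))          # True: the mask appears centred at (3, 4)
```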
Masks as templates
[figure: two different masks each convolved with the same input image; each output image is brightest where the image contains features matching the rotated mask]
Separable Masks
A 2D convolution is separable if the kernel H can be written as a convolution of two (row) vectors,
i.e. if H = h₁ ∗ h₂ᵀ
A separable convolution can be performed as two 1D convolutions:
I∗H = I ∗ (h₁ ∗ h₂ᵀ)
    = (I ∗ h₁) ∗ h₂ᵀ
Note h₁ ∗ h₂ᵀ = h₂ᵀ × h₁ (i.e. the convolution of the two vectors equals the product of those vectors).
Separable convolutions are much more efficient to compute. Convolving an image with an m×n pixel kernel takes:
• m×n products per pixel for the 2D mask
• m+n products per pixel for two 1D masks
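As a concrete sketch (σ and the kernel width are assumed values): a 2D Gaussian kernel is the outer product of two 1D Gaussians, and the operation counts follow directly from the kernel size.

```python
import numpy as np

sigma = 1.0
x = np.arange(-3, 4)                              # width 7, i.e. >= 6 sigma
g1 = np.exp(-x**2 / (2 * sigma**2))
g1 /= g1.sum()                                    # normalise to sum to 1
G2 = np.outer(g1, g1)                             # 2D kernel = h1 * h2^T

print(np.allclose(G2.sum(), 1.0))                 # True: still sums to 1
m, n = G2.shape
print(m * n, "products per pixel for the 2D mask")    # 49
print(m + n, "products per pixel for two 1D passes")  # 14
```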
Mask weight values
Smoothing mask, e.g.
1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
Differencing mask, e.g.
-1 -1 -1 -1 8 -1 -1 -1 -1
The weights in a smoothing mask are:
• all positive, and
• add up to 1
to preserve average intensity values.
The weights in differencing masks are:
• positive and negative, and
• add up to 0
to generate a zero response when intensity values in a region are all the same.
Part b
Smoothing
Mask weight values
The values chosen for the mask determine the effects of the convolution.
• They are chosen to match/detect the features we want to locate
In theory, we could choose any values for the weights. In practice, two categories of mask are commonly used:
• smoothing masks
• differencing masks
Spatial Frequency
High frequency
Low frequency
Frequency
• in temporal domain
• in spatial domain
Small-scale texture is said to have a high spatial frequency.
Smoothing removes this, leaving low spatial frequencies.
[figure: high- and low-frequency sinusoids plotted against time and against space (x); amplitude = greyscale]
Spatial Frequency: example
[figure: example images with high and low spatial frequency content]
Contents
Smoothing Images
• spatial frequency
• box masks
• Gaussian masks
Smoothing mask example Original Images
Images convolved with 3×3 box mask
1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
Smoothing mask example Original Images
Images convolved with 9×9 box mask
Box mask
1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9 1/9
Each pixel replaced by average of itself and its eight neighbours.
Generates a smoothed (blurred) image. Useful.
Called “mean filter” or “box mask”.
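A minimal sketch of the 3×3 mean filter (names are illustrative; boundary pixels are simply left unfiltered here):

```python
import numpy as np

def box_filter3(image):
    """Replace each interior pixel by the mean of its 3x3 neighbourhood."""
    out = image.astype(float).copy()
    for i in range(1, image.shape[0] - 1):       # interior pixels only
        for j in range(1, image.shape[1] - 1):
            out[i, j] = image[i-1:i+2, j-1:j+2].mean()
    return out

flat = np.full((5, 5), 7.0)
print(np.array_equal(box_filter3(flat), flat))   # True: weights sum to 1, so flat stays flat

noisy = flat.copy()
noisy[2, 2] = 16.0                               # one noisy pixel
print(box_filter3(noisy)[2, 2])                  # 8.0 -- the spike is averaged down
```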
Gaussian mask
The box mask is not very good at smoothing as it has sharp edges.
The Gaussian mask is more effective as it falls off gently at the edges. It gives more weight to nearby pixels.
[figure: 2D Gaussian mask and its cross-section]
G(x,y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
The Gaussian mask effectively removes any spatial frequencies with a period or wavelength smaller than the mask dimensions (defined by σ = standard deviation, or “scale”).
Smoothing with Gaussian mask
Original image σ=1 σ=2 σ=4
“Scale” = the standard deviation of the Gaussian (i.e. σ)
as this parameter goes up:
– the size of the mask needs to increase (mask width should be ≥ 6σ)
– more pixels are involved in the average
– the image gets more blurred (more high frequencies suppressed)
– noise is more effectively suppressed
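Building a Gaussian mask from the formula above can be sketched as follows (the function name and the exact truncation rule are assumptions; a half-width of 3σ gives the ≥ 6σ mask width mentioned above):

```python
import numpy as np

def gaussian_mask(sigma):
    radius = int(np.ceil(3 * sigma))             # half-width 3 sigma -> width >= 6 sigma
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    G = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return G / G.sum()                           # renormalise the truncated mask to sum to 1

for sigma in (1, 2, 4):
    G = gaussian_mask(sigma)
    print(sigma, G.shape, round(G.sum(), 6))     # mask width grows with sigma; sum stays 1
```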
Smoothing mask example Original Images
Images convolved with 17×17 box mask
Part c
Differencing
Contents
Differencing Masks
• 1st derivative masks • 2nd derivative masks • Laplacian mask
Separability of the Gaussian mask
G(x,y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))
       = { (1 / (√(2π)σ)) exp(−x² / (2σ²)) } · { (1 / (√(2π)σ)) exp(−y² / (2σ²)) }
i.e. 2D Gaussian is the product of two 1D Gaussians.
Each 1D Gaussian is a function of one variable only (either x or y)
1st derivative mask
Estimate of gradient is Δy/Δx, i.e. i2 − i1 for neighbouring pixel values i1 and i2
[figure: intensity plotted against x, with neighbouring samples i1 and i2]
Multiply pixel values by this mask → [-1 1]
Convolution will:
• repeat this calculation at all locations, but
• rotate the mask, so we get i1 − i2, i.e. ≈ −δ/δx
2nd derivative mask
Estimate of change of gradient is (i3 − i2) − (i2 − i1) = i1 − 2i2 + i3
[figure: intensity plotted against x, with neighbouring samples i1, i2 and i3]
Multiply pixel values by this mask → [1 -2 1]
Convention is to actually use this mask: [-1 2 -1], which calculates ≈ −δ²/δx²
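The algebra above can be checked directly (a minimal pure-Python sketch; the signal values are made up):

```python
# (i3 - i2) - (i2 - i1) = i1 - 2*i2 + i3, i.e. the mask [1, -2, 1];
# the slide's convention negates this to [-1, 2, -1].
i1, i2, i3 = 5, 9, 11
assert (i3 - i2) - (i2 - i1) == i1 - 2 * i2 + i3

def second_diff(signal):
    """Apply the [-1, 2, -1] mask at interior positions of a 1D signal."""
    return [-signal[k-1] + 2 * signal[k] - signal[k+1]
            for k in range(1, len(signal) - 1)]

ramp = [0, 1, 2, 3, 4]           # constant gradient
print(second_diff(ramp))         # [0, 0, 0] -- no change of gradient
step = [0, 0, 1, 1]
print(second_diff(step))         # [-1, 1] -- responds at the step
```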
Difference Masks
As well as calculating averages (as with box and Gaussian masks), convolution can also be used to calculate differences.
The difference between pixel values measures the gradient of the intensity values.
Hence:
– smoothing masks approximate integration
– difference masks approximate differentiation
[figure: intensity profile illustrating gradients in the x and y directions]
Difference mask example
[figure: original images convolved with the vertical difference mask (≈ −δ²/δy²) and with the horizontal difference mask (≈ −δ²/δx²); each mask has the weights -1, 2, -1]
Difference masks for different directions
The orientation of the mask determines the orientation of intensity change detected.
1st derivative masks:
[-1 1] (row) ≈ −δ/δx
[-1 1]ᵀ (column) ≈ −δ/δy
(plus corresponding masks for the two diagonals)
2nd derivative masks:
[-1 2 -1] (row) ≈ −δ²/δx²
[-1 2 -1]ᵀ (column) ≈ −δ²/δy²
and for the two diagonals:
-1  0  0        0  0 -1
 0  2  0        0  2  0
 0  0 -1       -1  0  0
Difference mask example
Images convolved with a diagonal difference mask:
 0  0 -1
 0  2  0
-1  0  0
Difference mask example
Images convolved with the vertical + horizontal + both diagonal difference masks combined:
-1 -1 -1
-1  8 -1
-1 -1 -1
≈ −δ²/δx² − δ²/δy²
Difference mask example
Images convolved with the other diagonal difference mask:
-1  0  0
 0  2  0
 0  0 -1
Part d
Edge Detection
Contents
Edge Detection
• Intensity discontinuities
• Laplacian of Gaussian (LoG) / DoG mask • Gaussian derivative masks
• The Canny edge detector
The Laplacian mask
The final example can be seen as a combination of 2nd derivative difference masks in each direction.
It therefore detects intensity discontinuities at all orientations.
-1 -1 -1
-1  8 -1
-1 -1 -1
≈ −δ²/δx² − δ²/δy²
Called the Laplacian mask.
However, note that strictly the Laplacian should be the additive inverse of this mask.
Causes of intensity discontinuities
Not just object boundary “edges”
Depth discontinuity (D) – due to surfaces at different distances
Orientation discontinuity (O) – due to changes in the orientation of a surface Reflectance discontinuity (R) – due to change in surface material properties Illumination discontinuity (I) – e.g. shadow boundaries
Causes of intensity discontinuities
[figure: scene with the discontinuities labelled D, O, R and I]
Note: illumination discontinuities (e.g. shadow boundaries) are not helpful for object recognition.
Intensity discontinuities
Discontinuities in the intensity (pixel) values of an image often correspond to useful (meaningful) physical characteristics, such as the boundaries of objects.
These features are useful, and can be sufficient, for recognising the content of an image.
e.g. we can understand an image that only contains edges (but not always!)
Edge detection
An edge is an image region in which the intensity gradients have a common direction and large magnitudes (in a direction orthogonal to the edge).
Idealised Edge Types (white=1, black=0):
[figure: 1D intensity profiles of a step edge and a bar]
Edge detection: Idealised Edges
The difference masks considered previously will detect these idealised edges (white=1, black=0).
1st derivative mask [-1 1] (≈ −δ/δx):
step response: 0 1 0
bar response: 0 -1 1 0
Edge detection: Idealised Edges
2nd derivative mask [-1 2 -1] (≈ −δ²/δx²):
step response: 0 1 -1 0
bar response: 0 -1 2 -1 0
Edge detection: Noise
However, difference masks are also sensitive to a single high intensity dot, one pixel in size (i.e. noise).
[figure: response of the [-1 1] mask to a single noise pixel (white=1, black=0)]
Edge detection: Noise vs Edges
However, we want to be able to distinguish these two conditions (white=1, black=0):
• an isolated noise pixel – an uninteresting cause of a mask response
• a true edge – a potentially interesting cause
Both produce strong responses from the 1st derivative [-1 1] and 2nd derivative [-1 2 -1] masks.
Edge detection: Idealised Edges
Laplacian mask (≈ −δ²/δx² − δ²/δy²):
-1 -1 -1
-1  8 -1
-1 -1 -1
step response: 0 3 -3 0
bar response: 0 -3 6 -3 0
Edge detection
Convolving with a differencing mask:
• enhances edges
• but also enhances noise
Convolving with a smoothing mask:
• removes noise
• but also blurs edges
There is a trade-off between edge detection and noise suppression.
Edge detection requires:
– a smoothing operator (a Gaussian mask) which establishes a spatial scale
– a differencing operator to find significant grey-level changes (at that spatial scale)
Effects of scale
Results depend on the standard deviation of the Gaussian smoothing mask used, e.g.:
[figure: original image; images smoothed with σ=0.5 and σ=2; Laplacian of the smoothed images]
Edge detection: Noise
The Laplacian mask is likewise sensitive to a single high intensity dot, one pixel in size (i.e. noise):
-1 -1 -1
-1  8 -1
-1 -1 -1
[figure: response of the Laplacian mask to a single bright pixel (white=1, black=0)]
Edge detectors
Sequential method: (I ∗ G) ∗ ∂  – smooth the image, then apply the differencing mask
Edge detector method: I ∗ (G ∗ ∂)  – convolve once with a combined edge-detection mask
These are equivalent because convolution is associative.
[figure: the image smoothed then differentiated, versus the image convolved with the single combined mask]
Laplacian of Gaussian / DoG mask
A standard approach is to combine Gaussian smoothing with a Laplacian operator.
The two masks can be combined by convolution into a single mask. This is called the Laplacian of Gaussian (LoG) mask.
-1/8 -1/8 -1/8
-1/8   1  -1/8
-1/8 -1/8 -1/8
convolved with a Gaussian mask gives the LoG mask:
≈ −δ²G/δx² − δ²G/δy²
It is similar to the Difference of Gaussians or DoG mask (or “centre-surround” or “Mexican hat”):
DoG = G₁ − G₂
[figure: two Gaussians of different widths and their difference, the Mexican-hat profile]
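A 1D sketch of the DoG idea (the σ values are assumptions): subtracting a wider Gaussian from a narrower one gives the centre-surround profile, with weights summing to zero.

```python
import numpy as np

def gaussian1d(sigma, radius):
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()                            # normalise to sum to 1

radius = 8
dog = gaussian1d(1.0, radius) - gaussian1d(2.0, radius)   # G1 - G2
print(round(dog.sum(), 6))        # 0.0 -- zero response to constant regions
print(dog[radius] > 0)            # True -- positive (excitatory) centre
print(dog[0] < 0)                 # True -- negative (inhibitory) surround
```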
Effects of scale
Results depend on the standard deviation of the Gaussian smoothing mask used, e.g.:
[figure: original image; images smoothed with σ=2 and σ=8; Laplacian of the smoothed images]
Edge detection comparison
original image (I)
Derivative of Gaussian masks:
Ix = −(δG/δx) ∗ I,  Iy = −(δG/δy) ∗ I   (noisy)
gradient magnitude √(Ix² + Iy²)   (less noisy – edges more prominent)
Laplacian mask:
I' = (−δ²G/δx² − δ²G/δy²) ∗ I   (even less noisy)
The Canny edge detector
One of the most popular edge detectors is based on Gaussian derivative masks:
1. convolution with derivatives of Gaussian masks
2. calculation of magnitude M = √(Ix² + Iy²) and direction D = arctan(Iy / Ix)
3. non-maximum suppression (thin multi-pixel wide “ridges” down to a single pixel by removing the current pixel if neighbouring pixels perpendicular to the direction of the edge have a higher magnitude)
4. recursive hysteresis thresholding (accept/reject pixels above/below the high/low threshold; other pixels are accepted if they neighbour an edge, or neighbour a pixel with a value between the thresholds that itself becomes an edge through the recursion)
[figure: x gradient (Ix), y gradient (Iy), magnitude, non-maximum suppression]
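Steps 1–2 can be sketched as follows (an illustrative example using plain central differences in place of Gaussian-derivative masks):

```python
import numpy as np

img = np.array([[0., 0., 1., 1.],
                [0., 0., 1., 1.],
                [0., 0., 1., 1.],
                [0., 0., 1., 1.]])                 # vertical step edge

Ix = np.zeros_like(img)
Iy = np.zeros_like(img)
Ix[:, 1:-1] = (img[:, 2:] - img[:, :-2]) / 2       # central difference in x
Iy[1:-1, :] = (img[2:, :] - img[:-2, :]) / 2       # central difference in y

M = np.sqrt(Ix**2 + Iy**2)                         # gradient magnitude
D = np.arctan2(Iy, Ix)                             # gradient direction

print(M[1, 1], M[1, 2])     # 0.5 0.5 -- strong response on either side of the edge
print(D[1, 1])              # 0.0 -- gradient points along +x, across the edge
```

Non-maximum suppression would then keep only the larger of the two responses along the gradient direction, thinning the edge to one pixel.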
Gaussian derivative masks
Alternatively, the 1st derivative difference operators can be combined with Gaussian smoothing. These tend to produce more robust results than the LoG / DoG mask.
[-1 1] ∗ G = −δG/δx    (x gradient at scale σ)
[-1 1]ᵀ ∗ G = −δG/δy    (y gradient at scale σ)
Such masks emphasise vertical and horizontal structure at the scale set by the σ parameter of the Gaussian component.
Edge detection: issues
Parameters that work well for one image may be poor for another, e.g.:
Edge detection: issues
Not all detected edges correspond to meaningful/useful features in the image, such as the contours of objects
Edge detection: issues
Results of edge detection are highly dependent on the parameter values, e.g.:
[figure: outputs for thres=0.1–0.3, sig=0.7; thres=0.2–0.3, sig=2.0; thres=0.1–0.3, sig=2.0; thres=0.1–0.3, sig=5.0; thres=0.1–0.5, sig=2.0]
Part e
Multi-Scale Feature Detection
Contents
Multi-Scale Feature Detection
• Image Pyramids
– Gaussian Image Pyramids
– Laplacian Image Pyramids
Template matching across scales
Convolution scans the template across the image to find locations where the image matches the template
However, we don’t know the scale of the feature we are trying to find, as this will depend on the unknown parameters of the image formation process
How to find image features with invariance to scale?
Image Pyramid
To find image features with invariance to scale, we could:
1. apply filters of different sizes to the image, or
2. apply filters of a fixed size to the image presented at different sizes
The 2nd method is usually used. This is called an image pyramid.
[figure: a fixed-size mask convolved with the original image and with progressively resized copies of it, producing output images at each scale]
General algorithm for feature detection
Recall that a mask can be viewed as a template for a particular image feature (one identical to the rotated mask).
[figure: input image convolved with a mask to give an output image]
• Convolve the image with a mask for the feature.
• Choose a threshold.
• Detect the feature in those parts of the image where the convolved image exceeds the threshold.
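A sketch of this convolve-then-threshold recipe (the mask, image, and threshold are made-up examples; the mask is stored pre-rotated, so the loop is a plain correlation):

```python
import numpy as np

def detect(image, rotated_mask, threshold):
    """Convolve with a feature mask and threshold the response."""
    mh, mw = rotated_mask.shape
    padded = np.pad(image, ((mh // 2,) * 2, (mw // 2,) * 2))
    response = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            response[i, j] = np.sum(padded[i:i + mh, j:j + mw] * rotated_mask)
    return response > threshold                    # boolean feature map

img = np.zeros((5, 5))
img[2, 2] = 1.0                                    # the feature: a bright dot
dot_mask = np.array([[-1., -1., -1.],
                     [-1.,  8., -1.],
                     [-1., -1., -1.]])             # responds to isolated dots
hits = detect(img, dot_mask, threshold=4.0)
print(np.argwhere(hits))                           # [[2 2]] -- only the dot's location
```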
Image Pyramid creation
To find image features across a range of scales, we need a number of images at different scales.
[figure: original image down-sampled by 2, 4 and 8; equivalently, each level down-sampled by 2 from the previous level]
Rather than resampling the original image, we can sub-sample the previous sub-sampled image (this allows us to write recursive code).
Down-sampling problem: aliasing
Sample chess board image at each circle:
← Good sampling (output image resembles input)
← Bad sampling (aliased)
A sample may not be representative of the image region it comes from.
We need to take account of a larger region, rather than just taking the value from a single pixel.
Down-sampling
Decreasing the size of an image is achieved by down-sampling (or sub-sampling):
s↓(I)(i, j) = I(si, sj)    (also written ↓s)
(create a new image by taking every sth pixel of the original image). e.g.:
[figure: an image down-sampled by a factor of 2]
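The operator s↓ maps directly onto array slicing (a minimal sketch; the function name is my own):

```python
import numpy as np

def downsample(image, s=2):
    """Create a new image by taking every s-th pixel of the original."""
    return image[::s, ::s]

img = np.arange(36).reshape(6, 6)
small = downsample(img, 2)
print(small.shape)        # (3, 3)
print(small)              # rows and columns 0, 2, 4 of the original
```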
Gaussian Pyramid creation
If we smooth an image with a Gaussian having a standard deviation of σ, and then convolve the result with a Gaussian having a standard deviation of σ, we get the same result as smoothing the original image with a Gaussian with a standard deviation of σ√2.
i.e. Gaussian ∗ Gaussian = another Gaussian:
Gσ ∗ Gσ = Gσ√2
[figure: pyramid levels I0; I1 = s↓(I0∗G); I2 = s↓(I1∗G); I3 = s↓(I2∗G)]
In general: Ii = s↓(Ii−1 ∗ Gσ)
Gaussian Pyramid example
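The recursion Ii = s↓(Ii−1 ∗ Gσ) can be sketched as follows (helper names, the σ value, and the boundary handling are assumptions; smoothing is done separably as two 1D passes):

```python
import numpy as np

def gaussian1d(sigma):
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing: filter rows, then columns (zero-padded edges)."""
    g = gaussian1d(sigma)
    tmp = np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 1, image)
    return np.apply_along_axis(lambda v: np.convolve(v, g, mode='same'), 0, tmp)

def gaussian_pyramid(image, levels, sigma=1.0):
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        # I_i = s-down(I_{i-1} * G_sigma): smooth, then sub-sample by 2
        pyramid.append(smooth(pyramid[-1], sigma)[::2, ::2])
    return pyramid

img = np.random.rand(32, 32)
pyr = gaussian_pyramid(img, levels=4)
print([p.shape for p in pyr])    # [(32, 32), (16, 16), (8, 8), (4, 4)]
```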
Down-sampling solution: smoothing
[figure comparing: sampling at every second pixel (the down-sampled image misrepresents the original); smoothing with a Gaussian (σ=1 pixel) then sampling every 2nd pixel; smoothing with a Gaussian (σ=2 pixels) then sampling every 2nd pixel]
Laplacian Pyramid creation
Gaussian Pyramid
[figure: Gaussian pyramid levels I0, I1, I2; each Laplacian level Li = Ii − (Ii∗G), giving L0, L1, L2 of the Laplacian pyramid]
Summary
Linear Filtering:
Creates a new image with pixel values that are weighted sums of the original pixel values.
Requires:
• Convolution – the procedure for calculating the new pixel values
• Mask – the weights used to calculate the new pixel values
Simple (common) Masks:
Smoothing – box, Gaussian
Difference – 1st and 2nd derivatives in x and y directions; 2nd derivative in all directions (Laplacian)
Edge Detection:
Combines smoothing and differencing (to find intensity changes at a certain scale)
– Gaussian derivative mask – 1st derivative of Gaussian in x and y directions
– LoG/DoG mask – 2nd derivative of Gaussian in all directions
Image Pyramids:
Important for describing and searching an image at all scales
• Gaussian pyramid – multi-scale representation of an image
• Laplacian pyramid – multi-scale representation of intensity discontinuities
Laplacian Pyramid
Recall that:
– a Laplacian of Gaussian (LoG) mask detects intensity discontinuities (e.g. edges) at all orientations
– can be approximated by a Difference of Gaussians (DoG)
A Laplacian image pyramid is an image pyramid that highlights intensity discontinuities at multiple scales.
A Gaussian pyramid already provides us with images that have been convolved with Gaussians of different scales.
A difference of Gaussians can therefore be obtained by subtracting images taken from the Gaussian pyramid.
In general: Li = Ii − (Ii ∗ Gσ)    (an image in the Laplacian pyramid)
where: Ii = s↓(Ii−1 ∗ Gσ)    (an image in the Gaussian pyramid)