COMP9517: Computer Vision
Feature Representation Part 1
Week 3 COMP9517 2021 T1 1
Outline
• Need for feature representation
• Major types of features
– Colour
– Texture
– Shape
• Feature descriptors and their use in various computer vision applications
Image Features
• Image features are essentially vectors that are a compact representation of images
• They represent important information shown in an image
• Intuitive examples of image features:
– Blobs
– Edges
– Corners
– Ridges
– Circles
– Ellipses
– Lines
– Etc…
Image Features
• We need to represent images as feature vectors for further processing in a more efficient and robust way
• Examples of further processing include:
– Object detection
– Image segmentation
– Image classification
– Content-based image retrieval
– Image stitching
– Object tracking
Object Detection
Segmentation
Image Classification
Content-Based Image Retrieval
Image Stitching
Object Tracking
https://heartbeat.fritz.ai/
Properties of Features
• Why not just use pixel values directly?
– Pixel values change with light intensity, colour and direction
– They also change with camera orientation
– And they are highly redundant
• Repeatability (robustness)
– Should be detectable at the same locations in different images despite changes in illumination and viewpoint
• Saliency (descriptiveness)
– Similar salient points in different images should have similar features
• Compactness (efficiency)
– Fewer features
– Smaller features
General Framework
• Processing stages shown in the diagram: Image Pre-processing, Feature Representation, Pattern Recognition, Post-processing
• Deep Learning is also shown in the diagram
• Example applications: object detection, image segmentation, image classification, image retrieval, image stitching, object tracking, …
Feature Types
• Colour features
– Colour histogram
– Colour moments
• Texture features
– Haralick texture features
– Local binary patterns (LBP)
– Scale-invariant feature transform (SIFT)
– Texture feature encoding
• Shape features
– Basic shape features
– Shape context
– Histogram of oriented gradients (HOG)
Colour Features
• Colour is the simplest feature to compute, and is invariant to image scaling, translation and rotation.
• Example: colour-based image retrieval
http://labs.tineye.com/multicolr/
Colour Histogram
• Represent the global distribution of pixel colours in an image
– Step 1: Construct a histogram for each colour channel (R, G, B)
– Step 2: Concatenate the histograms (vectors) of all channels as the final feature vector
[Figure: histograms of the R, G and B channels]
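The two steps above can be sketched with NumPy as follows. This is a minimal illustration, not the course's reference code: the function name `colour_histogram`, the number of bins, and the final normalisation are assumptions added for the example.

```python
import numpy as np

def colour_histogram(image, bins=8):
    """Concatenate per-channel histograms of an H x W x 3 image with values 0..255."""
    image = np.asarray(image)
    channel_hists = []
    for channel in range(3):
        # Step 1: histogram of one colour channel over the full 0..255 range
        hist, _ = np.histogram(image[..., channel], bins=bins, range=(0, 256))
        channel_hists.append(hist)
    # Step 2: concatenate the three histograms into one feature vector
    feature = np.concatenate(channel_hists).astype(float)
    # Normalising to sum 1 is an optional extra (assumption), useful for comparing image sizes
    return feature / feature.sum()

# Toy 2 x 2 image: two red pixels, one green pixel, one blue pixel
toy = np.array([[[255, 0, 0], [255, 0, 0]],
                [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
feature = colour_histogram(toy, bins=4)
print(feature.shape)   # 3 channels x 4 bins
```

Fewer bins give a more compact but coarser descriptor; the bin count is a design choice rather than something fixed by the method.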
Colour Moments
• Another way of representing colour distributions
– First-order moment (mean): $\mu_i = \frac{1}{N} \sum_{j=1}^{N} f_{ij}$
– Second-order moment (variance; as written, $\sigma_i$ is its square root, i.e. the standard deviation): $\sigma_i = \left( \frac{1}{N} \sum_{j=1}^{N} (f_{ij} - \mu_i)^2 \right)^{1/2}$
– Third-order moment (skewness): $s_i = \left( \frac{1}{N} \sum_{j=1}^{N} (f_{ij} - \mu_i)^3 \right)^{1/3}$
where $f_{ij}$ is the value of the i-th colour component of pixel j and N is the number of pixels in the image
• Moments based representation of colour distributions
– Gives a feature vector of only 9 elements (for RGB images)
– Lower representation capability than the colour histogram
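The three moments can be computed per channel in a few lines. The sketch below assumes an H x W x 3 array and uses a hypothetical function name `colour_moments`; the real cube root (`np.cbrt`) is used so that negative skewness keeps its sign.

```python
import numpy as np

def colour_moments(image):
    """Return the 9-element vector [means, standard deviations, skewness roots]."""
    pixels = np.asarray(image, dtype=float).reshape(-1, 3)   # N x 3, one row per pixel
    mean = pixels.mean(axis=0)                               # first-order moment
    centred = pixels - mean
    std = np.sqrt((centred ** 2).mean(axis=0))               # second-order moment
    skew = np.cbrt((centred ** 3).mean(axis=0))              # third-order moment
    return np.concatenate([mean, std, skew])

# Toy 1 x 4 image (values chosen by hand for illustration)
toy = np.array([[[0, 2, 1], [0, 2, 3], [0, 2, 1], [4, 2, 3]]])
moments = colour_moments(toy)
print(moments)
```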
Application Example
• Colour-based image retrieval
Texture Features
• Texture is a powerful discriminating feature for identifying visual patterns with properties of homogeneity that cannot result from the presence of only a single colour or intensity
• Example: texture classification
https://arxiv.org/abs/1801.10324
Haralick Features
• Haralick features give an array of statistical descriptors of image patterns to capture the spatial relationship between neighbouring pixels, that is, textures
– Step 1: Construct the gray-level co-occurrence matrix (GLCM)
– Step 2: Compute the Haralick feature descriptors from the GLCM
Haralick Features
• Step 1: Construct the GLCMs
– Given a distance d at an orientation angle θ, p(d, θ)(l1, l2), the (l1, l2) coefficient of the corresponding matrix P(d, θ), is the co-occurrence count or probability of going from a grey level l1 to another grey level l2 with an inter-sample spacing of d along the axis making an angle θ with the x axis.
– If the number of distinct gray levels in the quantized image is L, then the co-occurrence matrix P will be of size L×L.
Original 4×4 image:
0 0 1 1
0 0 1 1
0 2 2 2
2 2 3 3
Co-occurrence matrix construction:
$P_{(1,0^\circ)} = \begin{pmatrix} 4 & 2 & 1 & 0 \\ 2 & 4 & 0 & 0 \\ 1 & 0 & 6 & 1 \\ 0 & 0 & 1 & 2 \end{pmatrix}$
$P_{(1,135^\circ)} = \begin{pmatrix} 2 & 1 & 3 & 0 \\ 1 & 2 & 1 & 0 \\ 3 & 1 & 0 & 2 \\ 0 & 0 & 2 & 0 \end{pmatrix}$
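The construction can be checked with a short script. This sketch assumes that each pixel pair is counted in both directions (which makes the matrix symmetric); under that assumption it reproduces the two example matrices for the 4×4 image. The function name `glcm` and the (row, column) offset convention are choices made for this illustration.

```python
import numpy as np

def glcm(image, dr, dc, levels):
    """Symmetric co-occurrence counts for the pixel offset (dr, dc)."""
    image = np.asarray(image)
    rows, cols = image.shape
    P = np.zeros((levels, levels), dtype=int)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r, c], image[r2, c2]
                P[a, b] += 1
                P[b, a] += 1   # count the pair in both directions (assumption)
    return P

image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
P_0 = glcm(image, 0, 1, levels=4)       # d = 1, angle 0 degrees: right neighbour
P_135 = glcm(image, -1, -1, levels=4)   # d = 1, angle 135 degrees: upper-left neighbour
print(P_0)
print(P_135)
```

Dividing a matrix by its sum turns the counts into co-occurrence probabilities.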
Haralick Features
• Step 1: Construct the GLCMs (cont.)
– For computational efficiency, the number of gray levels (L) can be reduced by binning (similar to histogram binning), e.g. L = 256/n, with n a constant factor.
– Different co-occurrence matrices can be constructed by using various combinations of distance (d) and angular directions (θ).
– On their own, these co-occurrence matrices do not provide any measure of texture that can easily be used as texture descriptors.
– The information in the co-occurrence matrices needs to be further extracted as a set of feature values => Haralick descriptors.
Haralick Features
• Step 2: Compute the Haralick descriptors from the GLCMs
– One set of Haralick descriptors for each GLCM corresponding to a particular distance (d) and angular direction (θ)
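The table of descriptor formulas from the slide is not recoverable from this text, so the sketch below only illustrates the idea with four commonly used Haralick descriptors (angular second moment, contrast, homogeneity, entropy) computed from a normalised GLCM. The choice of these four and the name `haralick_subset` are assumptions for the example, not the slide's full list.

```python
import numpy as np

def haralick_subset(P):
    """A few Haralick descriptors from a co-occurrence count matrix P."""
    p = np.asarray(P, dtype=float)
    p = p / p.sum()                      # counts -> probabilities
    i, j = np.indices(p.shape)           # grey-level index grids
    nonzero = p[p > 0]                   # avoid log(0) in the entropy
    return {
        "angular_second_moment": float((p ** 2).sum()),
        "contrast": float(((i - j) ** 2 * p).sum()),
        "homogeneity": float((p / (1.0 + (i - j) ** 2)).sum()),
        "entropy": float(-(nonzero * np.log(nonzero)).sum()),
    }

# GLCM for d = 1, angle 0 degrees, of the 4 x 4 example image
P_0 = np.array([[4, 2, 1, 0],
                [2, 4, 0, 0],
                [1, 0, 6, 1],
                [0, 0, 1, 2]])
features = haralick_subset(P_0)
print(features)
```

Computing such a set for every (d, θ) combination and concatenating (or averaging) the results gives the final texture descriptor.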
Application Example
https://doi.org/10.1016/j.patrec.2008.04.013
Application Example
• Haralick features are commonly used nowadays in medical imaging studies due to their simplicity and interpretability
C. Jensen et al., “Assessment of prostate cancer prognostic Gleason grade group using zonal-specific features extracted from biparametric MRI using a KNN classifier,” Journal of Applied Clinical Medical Physics, 2019. https://doi.org/10.1002/acm2.12542
1. Pre-processing
2. Extract Haralick, run-length, and histogram features from the region of interest
3. Feature selection
4. Classification using kNN
Local Binary Patterns
• Describe the spatial structure of local image texture
– Divide the image into cells of N x N pixels (e.g. N = 16 or 32)
– Compare each pixel in a cell to each of its 8 neighbouring pixels: if the centre pixel’s value is greater than the neighbour’s value, write “0”, otherwise write “1”
– This gives an 8-digit binary pattern per pixel after comparing with all 8 neighbouring pixels, representing a value in the range 0…255
Example 4×4 image:
0 0 1 1
0 0 1 1
0 2 2 2
2 2 3 3
Example pattern: 11110000
Local Binary Patterns
• Describe the spatial structure of local image texture (cont.)
– Generate the histogram for all pixels in the cell, computing the frequency of each 8-digit binary number occurring in the cell
– This gives a 256-bin histogram (the LBP feature vector)
– Combine the histograms of all cells to obtain the image-level LBP feature descriptor
[Figure: the example 4×4 image and its LBP histogram of 256 elements]
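A basic LBP computation for one cell can be sketched as below. The neighbour ordering (starting at the right neighbour and going clockwise) is an assumption, since the text does not fix it; with this ordering, the interior pixel at row 2, column 1 of the 4×4 example image gets the pattern 11110000. Border pixels are skipped here for simplicity, which is also an assumption.

```python
import numpy as np

# Neighbour offsets (row, col): right, then clockwise. This bit order is an assumption.
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def lbp_codes(image):
    """LBP code (0..255) for every interior pixel of a 2-D grey-level image."""
    image = np.asarray(image, dtype=int)
    rows, cols = image.shape
    centre = image[1:-1, 1:-1]
    codes = np.zeros_like(centre)
    for dr, dc in OFFSETS:
        neighbour = image[1 + dr:rows - 1 + dr, 1 + dc:cols - 1 + dc]
        bit = (centre <= neighbour).astype(int)   # "0" if centre > neighbour, else "1"
        codes = (codes << 1) | bit                # append the bit to the pattern
    return codes

def lbp_histogram(image):
    """256-bin histogram of the LBP codes in one cell."""
    return np.bincount(lbp_codes(image).ravel(), minlength=256)

cell = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [0, 2, 2, 2],
                 [2, 2, 3, 3]])
codes = lbp_codes(cell)
hist = lbp_histogram(cell)
print(codes)
```

For a full image, the same histogram would be computed per cell and the cell histograms concatenated.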
Local Binary Patterns
• LBP can be multi-resolution and rotation-invariant
– Multi-resolution: varying the distance between the centre pixel and neighbouring pixels, and the number of neighbouring pixels
T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7):971-987, 2002. https://doi.org/10.1109/TPAMI.2002.1017623
Local Binary Patterns
• LBP can be multi-resolution and rotation-invariant
– Rotation-invariant: varying the way of constructing the 8-digit binary number, e.g. performing bitwise shift to derive the smallest number
Example:
11110000 = 240
11100001 = 225
11000011 = 195
10000111 = 135
00001111 = 15
00011110 = 30
00111100 = 60
01111000 = 120
Smallest value: 00001111 = 15
Note: not all patterns have 8 shifted variants (e.g. 11001100 has only 4)
Local Binary Patterns
• LBP can be multi-resolution and rotation-invariant (cont.)
– Rotation-invariant: varying the way of constructing the 8-digit binary number, e.g. performing bitwise shift to derive the smallest number => this reduces the LBP feature dimension from 256 to 36
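The bitwise-shift idea can be illustrated directly: take the minimum over all circular shifts of the 8-bit code. The function name below is hypothetical; counting the distinct minima over all 256 codes gives the reduced dimension.

```python
def rotation_invariant_code(code, bits=8):
    """Smallest value among all circular bit shifts of an LBP code."""
    mask = (1 << bits) - 1
    best = code
    for _ in range(bits - 1):
        # Circular right shift by one position
        code = ((code >> 1) | (code << (bits - 1))) & mask
        best = min(best, code)
    return best

# All rotation-invariant codes that can occur for 8 neighbours
distinct = {rotation_invariant_code(c) for c in range(256)}

print(rotation_invariant_code(0b11110000))   # 15, as in the example
print(len(distinct))                         # 36 distinct patterns
```

A rotation-invariant histogram then needs one bin per element of `distinct` instead of 256 bins.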
Application Example
• Texture classification
Scale-Invariant Feature Transform
• A SIFT feature describes the texture in a localised region around a keypoint
• The SIFT descriptor is invariant to uniform scaling and orientation, and partially invariant to affine distortion and illumination changes
• Main steps:
– Scale-space extrema detection: find maxima/minima in difference-of-Gaussians (DoG) images across scales
– Keypoint localization: discard low-contrast keypoints and eliminate edge responses
– Orientation assignment: achieve rotation invariance
– Keypoint descriptor: compute gradient orientation histograms
SIFT Extrema Detection
• Detect maxima and minima in the scale space of the image
[Figure: Gaussian scale]
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2):91-110, November 2004. https://doi.org/10.1023/B:VISI.0000029664.99615.94
SIFT Keypoint Localization
• Improve and reduce the set of found keypoints
– Use 3D quadratic fitting in scale-space to get subpixel optima
– Reject low-contrast and edge points using Hessian analysis
[Figure panels: initial keypoints from scale-space optima; keypoints after rejecting low-contrast points; final keypoints after rejecting edge points]
SIFT Orientation Assignment
• Estimate keypoint orientation using local gradient vectors
– Make an orientation histogram of local gradient vectors
– Find the dominant orientations from the peaks of the histogram
SIFT Keypoint Descriptor
• 4 x 4 array of gradient orientation histograms, weighted by gradient magnitude
• 8 bins in each gradient orientation histogram
• Total 8 x 4 x 4 array = 128 dimensions
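To make the 4 x 4 x 8 = 128 layout concrete, here is a deliberately simplified sketch that builds such a vector from a 16 x 16 patch. It is not the full SIFT descriptor: Gaussian weighting, interpolation between bins, and rotation to the dominant orientation are all omitted, and the patch size and function name are assumptions for the illustration.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified 128-D descriptor from a 16 x 16 grey-level patch."""
    patch = np.asarray(patch, dtype=float)
    if patch.shape != (16, 16):
        raise ValueError("expected a 16 x 16 patch")
    gy, gx = np.gradient(patch)                    # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)       # orientation in [0, 2*pi)
    bins = np.minimum((angle / (2 * np.pi) * 8).astype(int), 7)   # 8 orientation bins
    histograms = []
    for r in range(0, 16, 4):                      # 4 x 4 grid of 4 x 4-pixel cells
        for c in range(0, 16, 4):
            hist = np.bincount(bins[r:r + 4, c:c + 4].ravel(),
                               weights=magnitude[r:r + 4, c:c + 4].ravel(),
                               minlength=8)        # magnitude-weighted orientation histogram
            histograms.append(hist)
    descriptor = np.concatenate(histograms)
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Horizontal intensity ramp: every gradient points along +x (orientation bin 0)
ramp = np.tile(np.arange(16.0), (16, 1))
descriptor = sift_like_descriptor(ramp)
print(descriptor.shape)
```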
Application Example
• Image matching
– Find SIFT keypoints
– Find best matching between SIFT keypoints
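Descriptor Matching
• Nearest Neighbour Distance Ratio
$\mathrm{NNDR} = \frac{d_1}{d_2} = \frac{\lVert D_A - D_B \rVert}{\lVert D_A - D_C \rVert}$
where $D_A$ is the descriptor being matched, and $D_B$ and $D_C$ are its first and second nearest neighbours
– d1 is the distance to the first nearest neighbour
– d2 is the distance to the second nearest neighbour
– Neighbours in feature space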
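A brute-force version of the ratio test can be sketched as follows. The acceptance threshold of 0.8 is an assumption chosen for the example (the text does not give one), and the function name is hypothetical; the second descriptor set must contain at least two descriptors.

```python
import numpy as np

def nndr_matches(desc_a, desc_b, max_ratio=0.8):
    """Match each descriptor in desc_a to desc_b using the NNDR test."""
    desc_a = np.asarray(desc_a, dtype=float)
    desc_b = np.asarray(desc_b, dtype=float)
    # Pairwise Euclidean distances in feature space, shape (len(a), len(b))
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(dists):
        first, second = np.argsort(row)[:2]        # two nearest neighbours
        d1, d2 = row[first], row[second]
        if d2 > 0 and d1 / d2 < max_ratio:         # keep only distinctive matches
            matches.append((i, int(first), float(d1 / d2)))
    return matches

# Toy 2-D "descriptors": the third query is equally close to two candidates
desc_a = [[0, 0], [10, 10], [2.5, 3]]
desc_b = [[0, 1], [5, 5], [10, 9]]
matches = nndr_matches(desc_a, desc_b)
print(matches)
```

A low ratio means the best match is much closer than the runner-up; ambiguous matches with a ratio near 1 are rejected.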
Application Example
• Image stitching
– Find SIFT keypoints and feature correspondences
– Find the right spatial transformation
Transformations
[Figure: an original image under translation, rotation, scale, affine and perspective transformations]
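Transformations
• Scale: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
• Rotate: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
• Shear: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & \alpha_x \\ \alpha_y & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$
• Translate: $\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
• Affine: $\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
• Projective: $\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$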
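Written as 3 x 3 matrices in homogeneous coordinates, these transformations can be chained by matrix multiplication. The helper names below are hypothetical and the sketch covers only scale, rotation and translation; the final division by the third coordinate only matters for projective matrices.

```python
import numpy as np

def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def translate(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def apply_transform(T, points):
    """Apply a 3 x 3 homogeneous transform to an N x 2 array of points."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])   # append w = 1
    mapped = homogeneous @ T.T
    return mapped[:, :2] / mapped[:, 2:3]                          # divide by w

# Rotate by 90 degrees first, then translate by (2, 3)
T = translate(2, 3) @ rotate(np.pi / 2)
result = apply_transform(T, [[1, 0]])
print(result)
```

Note the order: the rightmost matrix in the product is applied first.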
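Fitting and Alignment
• Least-squares (LS) fitting of corresponding keypoints $(\mathbf{x}_i, \mathbf{x}'_i)$
$E_{LS} = \sum_i \lVert \mathbf{r}_i \rVert^2 = \sum_i \lVert f(\mathbf{x}_i; \mathbf{p}) - \mathbf{x}'_i \rVert^2$
where $\mathbf{p}$ are the parameters of the transformation f
• Example: affine transformation
$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$
$\begin{bmatrix} x_i & y_i & 0 & 0 & 1 & 0 \\ 0 & 0 & x_i & y_i & 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ d \\ e \\ c \\ f \end{bmatrix} = \begin{bmatrix} x'_i \\ y'_i \end{bmatrix}$
$\mathbf{A}\mathbf{p} = \mathbf{b}$
$\mathbf{p} = [\mathbf{A}^T \mathbf{A}]^{-1} \mathbf{A}^T \mathbf{b}$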
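The affine case can be solved numerically by stacking two rows per correspondence and calling a least-squares solver. The sketch below uses `np.linalg.lstsq`, which minimises the same squared error as the normal-equation formula but is numerically more robust; the function name `fit_affine` and the test data are assumptions for the example.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (N x 2) onto dst (N x 2)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src      # rows for x': [x y 0 0 1 0]
    A[0::2, 4] = 1
    A[1::2, 2:4] = src      # rows for y': [0 0 x y 0 1]
    A[1::2, 5] = 1
    b = dst.reshape(-1)     # x'_1, y'_1, x'_2, y'_2, ...
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    a, b_, d, e, c, f = p   # parameter order (a, b, d, e, c, f)
    return np.array([[a, b_, c], [d, e, f], [0, 0, 1]])

# Synthetic correspondences generated from a known transform
M_true = np.array([[2.0, 0.5, 1.0], [-0.3, 1.5, -2.0], [0.0, 0.0, 1.0]])
src = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
dst = src @ M_true[:2, :2].T + M_true[:2, 2]
M_est = fit_affine(src, dst)
print(M_est)
```

At least three non-collinear correspondences are needed, since the affine transform has six unknowns and each pair gives two equations.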
Fitting and Alignment
• RANdom SAmple Consensus (RANSAC) fitting
– Least-squares fitting is hampered by outliers
– Some kind of outlier detection and rejection is needed
– Better use a subset of the data and check inlier agreement
– RANSAC does this in an iterative way to find the optimum
Fitting and Alignment
• RANSAC (line fitting example)
Algorithm:
1. Sample (randomly) the number of points required to fit the model
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold (δ) of the model
Repeat 1-3 until the best model is found with high confidence
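The line-fitting loop can be sketched as below. Two simplifications are assumptions of this example: a fixed number of iterations replaces the "high confidence" stopping rule, and the score is the inlier count (equivalent to the fraction for a fixed data set). The line is stored as a unit normal n and an offset c with n · x = c.

```python
import numpy as np

def ransac_line(points, threshold=0.5, iterations=200, seed=0):
    """Fit a 2-D line with RANSAC; returns ((normal, offset), inlier_mask)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best_line = None
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        # 1. Sample the minimum number of points for a line (two)
        i, j = rng.choice(len(points), size=2, replace=False)
        direction = points[j] - points[i]
        length = np.linalg.norm(direction)
        if length == 0:
            continue
        # 2. Solve for the model: unit normal and offset of the line
        normal = np.array([-direction[1], direction[0]]) / length
        offset = float(normal @ points[i])
        # 3. Score: points closer than the threshold are inliers
        inliers = np.abs(points @ normal - offset) < threshold
        if inliers.sum() > best_inliers.sum():
            best_line, best_inliers = (normal, offset), inliers
    return best_line, best_inliers

# 20 points on y = 2x + 1 plus 5 gross outliers
xs = np.arange(20.0)
line_points = np.column_stack([xs, 2 * xs + 1])
outliers = np.array([[0, 30], [5, -20], [10, 50], [15, -10], [3, 40]], dtype=float)
line, inlier_mask = ransac_line(np.vstack([line_points, outliers]))
print(inlier_mask.sum())
```

A common follow-up, not shown here, is to refit the model by least squares on the final inlier set.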
Fitting and Alignment
• Given matched points A and B, estimate the translation
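$\begin{bmatrix} x_i^B \\ y_i^B \end{bmatrix} = \begin{bmatrix} x_i^A \\ y_i^A \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$
[Figure: matched point pairs (A1, B1), (A2, B2), (A3, B3)]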
Alignment by Least Squares
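1. Write down the objective function
2. Obtain the analytical solution
a) Compute derivative
b) Compute solution
3. Obtain computational solution
a) Write in form Ap = b
b) Solve using pseudo-inverse
$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} t_x \\ t_y \end{bmatrix} = \begin{bmatrix} x_1^B - x_1^A \\ y_1^B - y_1^A \\ \vdots \\ x_n^B - x_n^A \\ y_n^B - y_n^A \end{bmatrix}$
[Figure: matched point pairs (A1, B1), (A2, B2), (A3, B3)]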
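The computational solution for the translation can be written directly in the Ap = b form and solved with the pseudo-inverse. In this sketch (function name and data are illustrative assumptions) the result coincides with the mean of the point differences, which is also what the analytical solution gives.

```python
import numpy as np

def fit_translation(points_a, points_b):
    """Least-squares translation t such that points_a + t approximates points_b."""
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    n = len(points_a)
    A = np.tile(np.eye(2), (n, 1))            # rows [1 0], [0 1] repeated n times
    b = (points_b - points_a).reshape(-1)     # x1B-x1A, y1B-y1A, ..., xnB-xnA, ynB-ynA
    return np.linalg.pinv(A) @ b              # p = pseudo-inverse(A) b

points_a = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
# Slightly noisy translations around (2, -1)
points_b = points_a + np.array([[2.1, -1.0], [1.9, -0.9], [2.0, -1.1]])
t = fit_translation(points_a, points_b)
print(t)
```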
Alignment by RANSAC
1. Sample a set of matching points (1 pair)
2. Solve for transformation parameters
3. Score parameters with number of inliers
4. Repeat steps 1-3 N times
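$\begin{bmatrix} x_i^B \\ y_i^B \end{bmatrix} = \begin{bmatrix} x_i^A \\ y_i^A \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \end{bmatrix}$
[Figure: matched point pairs (A1, B1), (A2, B2), (A3, B3)]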
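The four steps can be sketched for the translation model, where a single matched pair is enough to determine the parameters. The inlier threshold, the iteration count N and the function name are assumptions of this example.

```python
import numpy as np

def ransac_translation(points_a, points_b, threshold=1.0, n_iter=50, seed=0):
    """Estimate a translation from matched points that may contain wrong matches."""
    rng = np.random.default_rng(seed)
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    best_t, best_count = None, -1
    for _ in range(n_iter):                       # 4. repeat N times
        i = rng.integers(len(points_a))           # 1. sample one matching pair
        t = points_b[i] - points_a[i]             # 2. solve for the translation
        residuals = np.linalg.norm(points_a + t - points_b, axis=1)
        count = int((residuals < threshold).sum())   # 3. score by number of inliers
        if count > best_count:
            best_t, best_count = t, count
    return best_t, best_count

# 8 correct matches with translation (5, -3) and 3 wrong matches
points_a = np.array([[0, 0], [1, 4], [2, 2], [3, 7], [5, 1], [6, 6], [8, 3], [9, 9],
                     [4, 4], [7, 0], [2, 9]], dtype=float)
shifts = np.array([[5, -3]] * 8 + [[50, 20], [-30, 7], [0, 40]], dtype=float)
points_b = points_a + shifts
t_best, n_inliers = ransac_translation(points_a, points_b)
print(t_best, n_inliers)
```

Unlike the plain least-squares fit, the wrong matches here do not pull the estimate away from the true translation.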
Application Example
• SIFT-based texture classification – how to do this?
[Figure: example texture images labelled "bread" and "cracker"]
• Problem: the number of SIFT keypoints (and thus the number of SIFT feature descriptors) may vary highly between images
Feature Encoding
• Global encoding of local SIFT features
– Integrate the local features (SIFT keypoint descriptors) of an image into a global vector to represent the whole image
Feature Encoding
• Most popular method: Bag-of-Words (BoW)
– A variable number of local image features is encoded into a fixed-dimensional histogram to represent each image
http://cs.brown.edu/courses/cs143/2011/results/proj3/hangsu/
Feature Encoding
• Bag-of-Words (BoW) – step 1
– Create the vocabulary from the set of local descriptors (SIFT keypoint descriptors) extracted from the training data
– This vocabulary represents the categories of local descriptors
Feature Encoding
• Bag-of-Words (BoW) – step 1
– Main technique used to create the vocabulary: k-means clustering
– k-means clustering is one of the simplest and most popular unsupervised learning approaches that perform automatic clustering (partitioning) of the training data into multiple categories
Feature Encoding
• Bag-of-Words (BoW) – step 1
– K-means clustering:
o Initialize: k cluster centres, typically randomly
o Iterate:
1) Assign data (feature vectors) to the closest cluster (Euclidean distance)
2) Update cluster centres as the mean of the data samples in each cluster
o Terminate: When converged or the number of iterations reaches the maximum
K-Means Clustering
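The initialise / iterate / terminate scheme can be sketched in NumPy as follows. Initialising the centres from randomly chosen data points and leaving an empty cluster's centre unchanged are assumptions of this example; k-means can also converge to a poor local optimum for an unlucky initialisation, so practical implementations often run several restarts.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Plain k-means; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise: pick k distinct data points as the starting centres
    centres = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # 1) Assign each sample to the closest centre (Euclidean distance)
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2) Update each centre as the mean of its assigned samples
        new_centres = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Terminate when the centres stop moving
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Two well-separated groups of points
data = np.array([[0, 0], [1, 0], [2, 0], [100, 0], [101, 0], [102, 0]], dtype=float)
centres, labels = kmeans(data, k=2)
print(np.sort(centres[:, 0]))
```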
Feature Encoding
• Bag-of-Words (BoW) – step 2
– The cluster centres are the “visual words” which form the “vocabulary” that is used to represent an image
– An individual local feature descriptor (e.g. SIFT keypoint descriptor) is assigned to the visual word with the smallest distance
– For an image, the number of local feature descriptors assigned to each visual word is computed
– The numbers are concatenated into a vector which forms the BoW representation of the image
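Given a vocabulary, the encoding step reduces to a nearest-centre assignment followed by counting. In this sketch the function name, the optional normalisation and the toy 2-D "descriptors" are assumptions for illustration; real SIFT descriptors would be 128-dimensional.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary, normalise=False):
    """Encode a variable number of local descriptors as a fixed-length histogram."""
    descriptors = np.asarray(descriptors, dtype=float)
    vocabulary = np.asarray(vocabulary, dtype=float)
    # Distance from every descriptor to every visual word
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)                     # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    if normalise and hist.sum() > 0:
        hist /= hist.sum()                           # optional: make images comparable
    return hist

vocabulary = [[0, 0], [10, 0], [0, 10]]              # three visual words (cluster centres)
descriptors = [[1, 1], [0, 2], [9, 1], [1, 9], [2, 8], [0, 11]]
hist = bow_histogram(descriptors, vocabulary)
print(hist)
```

The histogram length equals the vocabulary size regardless of how many keypoints the image has, which is what makes it usable as input to a standard classifier.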
Application Example
• SIFT-based texture classification
1. SIFT feature extraction
2. BoW encoding
3. Classification
[Figure: vocabulary and classification model; example result: bread]
[Figure: build vocabulary, train classifier, classify image]
http://heraqi.blogspot.com/2017/03/BoW.html
Feature Encoding
• Local features can be other types of features, not just SIFT
– LBP, SURF, BRIEF, ORB
• There are also more advanced techniques than BoW
– VLAD, Fisher Vector
• A very good source of additional information is VLFeat.org
– http://www.vlfeat.org/
Summary
• Feature representation is essential in solving almost all types of computer vision problems
• Most commonly used image features:
– Colour features (Part 1): colour moments and histogram
– Texture features (Part 1): Haralick, LBP, SIFT
– Shape features (Part 2): basic, shape context, HOG
Summary
• Other techniques described (Part 1)
– Descriptor matching
– Feature encoding (Bag-of-Words)
– k-means clustering
– Alignment and RANSAC
– Spatial transformations
• To be discussed (Part 2)
– Shape features
– Shape matching
– Sliding window detection
References and Acknowledgements
• Szeliski, Chapter 4 (in particular Sections 4.1.1 to 4.1.3 and 4.3.2), Chapter 6 (in particular Sections 6.1.1 to 6.1.4)
• Some content is extracted from the above resource, James Hays' slides, and slides from Michael A. Wirth
• L. Liu et al., From BoW to CNN: two decades of texture representation for texture classification, International Journal of Computer Vision, 2019
• And other resources as indicated by the hyperlinks