COMP 9517 Final Recap
2021 T1
Topics
• Image formation
• Image processing
• Feature representation
• Segmentation
• Pattern recognition
• Motion and Tracking
• Deep Learning
Image Formation
Geometry of Image Formation
Mapping between image and world coordinates
• Pinhole camera model
• Projective geometry
• Projection matrix
Pinhole Camera
Projective Geometry
Vanishing points and lines
Perspective Projection
Three colour spaces
• RGB (Red, Green, Blue)
The default colour space, but not convenient for colour-based processing because chromaticity and intensity are mixed across the channels
• HSV (Hue, Saturation, Value, a.k.a. Lightness)
Very useful in colour segmentation
• YCbCr (a.k.a. YUV)
Y is the luma (brightness) component
Very useful in video processing and digital camera systems
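A minimal sketch of converting between these colour spaces, assuming OpenCV and a hypothetical image file "photo.jpg" (note that OpenCV loads images in BGR order):

```python
import cv2

bgr = cv2.imread("photo.jpg")                    # OpenCV uses BGR channel order
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)       # handy for colour segmentation
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)   # Y = luma, Cr/Cb = chroma

# Example: threshold strongly saturated, red-ish pixels in HSV
mask = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
```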
Spatial Resolution
• Spatial Resolution: number of pixels per unit of length
• Human faces can be recognized at 64 x 64 pixels per face.
• Appropriate resolution is essential:
• Too little resolution: poor recognition
• Too much resolution: slow processing and wasted memory
Quantisation
• Quantisation digitises the intensity or amplitude values, i.e., F(x, y)
• Called intensity or gray level quantisation
• Gray-Level resolution:
• Usually 16, 32, 64, 128, or 256 levels
Image Processing
Image Processing
• Two types of image processing
• Spatial domain
• Frequency domain
• Two principal categories in spatial processing
• Intensity transformation
• Spatial filtering
Image processing in the spatial domain
• Some basic gray-level transformation functions
• Histogram processing
• Spatial filtering
• Smoothing spatial filters
• Sharpening spatial filters
Basic grey-level transformation
• Image reversal: s = L − 1 − r
• Log transformation: s = c log(1 + r)
• Power transformation: s = c r^γ
• Contrast stretching
Image reversal
s = L − 1 − r
• r and s represent the pixel values before and after processing, respectively
• Image reversal is particularly suitable for enhancing white or gray details embedded in the dark areas of an image
Log transformation
s = c log(1 + r)
• c is a constant
• To expand the values of dark pixels and compress higher grey-level values in the image
Power transformation
s = c r^γ
• Similar to the log transformation in the way it maps input to output values
• Family of possible transforms obtained by varying γ
• Useful in displaying an image accurately on a computer screen (for example on web sites!) by pre-processing images appropriately before display.
• Also useful for general-purpose contrast manipulation
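A minimal NumPy sketch of the three point transformations above (the constants c, γ, and L are illustrative, not from the slides):

```python
import numpy as np

def negative(r, L=256):
    """Image reversal: s = L - 1 - r."""
    return (L - 1) - r

def log_transform(r, c=1.0):
    """Log transformation: s = c * log(1 + r); expands dark values."""
    return c * np.log1p(r.astype(np.float64))

def gamma_transform(r, c=1.0, gamma=0.5, L=256):
    """Power transformation: s = c * r^gamma, applied to normalised intensities."""
    rn = r.astype(np.float64) / (L - 1)
    return c * np.power(rn, gamma) * (L - 1)
```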
Contrast Stretching
• One of the simplest piecewise linear transformations
• To increase the dynamic range of grey levels in image
• Produces images of higher contrast
• Puts values below L in the input to black in the output
• Puts values above H in the input to white in the output
• Linearly scales values between L and H in the input to the maximum range in the output
• Used in display devices or recording mediums to span the full intensity range
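A minimal sketch of the piecewise linear stretch described above, with hypothetical thresholds low (L) and high (H):

```python
import numpy as np

def contrast_stretch(img, low, high, levels=256):
    """Map values <= low to black, values >= high to white, and linearly
    scale everything in between to the full output range."""
    out = (img.astype(np.float64) - low) / (high - low) * (levels - 1)
    return np.clip(out, 0, levels - 1).astype(np.uint8)
```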
Contrast Stretching
Grey-Level Slicing
• Highlighting of specific range of grey levels
• Displaying a high value for all grey levels in the range of interest and a low value for all others produces a binary image
• Brighten the desired range of grey levels while preserving the background and other grey-scale tones of the image
Grey-Level Slicing
Histogram Processing
• Histogram Equalization
• To get an image with equally distributed brightness levels over the whole brightness scale
• Histogram Matching
• To get an image with a specified histogram (brightness distribution)
Histogram Equalization
Example (Histogram Equalization)
Histogram Matching
Example (Histogram Matching)
The difference
• Histogram equalization derives the transformation T(r) from the image itself
• In histogram matching, the target distribution (and hence T(r)) is specified in advance
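A minimal sketch of both operations, assuming OpenCV/NumPy and hypothetical grey-scale images "photo.jpg" and "reference.jpg":

```python
import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("reference.jpg", cv2.IMREAD_GRAYSCALE)

# Histogram equalization: T(r) is derived from the image's own CDF
equalized = cv2.equalizeHist(gray)

# Histogram matching: map the image's CDF onto the CDF of the reference image
def match_histograms(source, reference):
    s_vals, s_counts = np.unique(source, return_counts=True)
    r_vals, r_counts = np.unique(reference, return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    mapping = np.interp(s_cdf, r_cdf, r_vals)        # invert the reference CDF
    return mapping[np.searchsorted(s_vals, source)].astype(source.dtype)

matched = match_histograms(gray, reference)
```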
Smooth spatial filter
• Neighbourhood Averaging
• Gaussian filter
• Median filter (non-linear filter)
• Max filter (non-linear filter)
• Min filter (non-linear filter)
Smooth spatial filter
Neighbourhood Averaging
Gaussian Filter
Non-linear spatial filters
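A minimal sketch of these smoothing filters with OpenCV (5x5 neighbourhood and sigma chosen for illustration):

```python
import cv2
import numpy as np

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
se = np.ones((5, 5), np.uint8)

mean_smoothed   = cv2.blur(gray, (5, 5))               # neighbourhood averaging
gauss_smoothed  = cv2.GaussianBlur(gray, (5, 5), 1.5)  # Gaussian filter
median_smoothed = cv2.medianBlur(gray, 5)              # non-linear: median filter
max_filtered    = cv2.dilate(gray, se)                 # non-linear: max filter
min_filtered    = cv2.erode(gray, se)                  # non-linear: min filter
```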
Sharpening spatial filters
• Sharpening spatial filters are based on spatial derivatives (differentiation)
• Gradient operator
• The Laplacian
• The Sobel operator
• Unsharp masking
Gradient Operator
Basic idea – Derivatives
• Horizontal scan of the image
• Edge modelled as a ramp, to represent blurring due to sampling
• First derivative is
• Non-zero along ramp
• Zero in regions of constant intensity
• Constant during an intensity transition
• Second derivative is
• Nonzero at onset and end of ramp
• Stronger response at isolated noise points
• Zero everywhere except at onset and termination of intensity transition
• Thus, magnitude of first derivative can be used to detect the presence of an edge, and sign of second derivative to determine whether a pixel lies on dark or light side of an edge.
Basic idea – Derivatives
The Sobel
The Laplacian
Unsharp masking (sharpening process)
• The procedure:
• Blur the original image
• Obtain the mask by subtracting the blurred image from the original
• Add the (weighted) mask back onto the original
g_mask(x, y) = f(x, y) − f_blurred(x, y)
g(x, y) = f(x, y) + k · g_mask(x, y)
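A minimal sketch of this procedure, assuming OpenCV, a Gaussian blur, and an illustrative weight k:

```python
import cv2
import numpy as np

f = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)

f_blurred = cv2.GaussianBlur(f, (5, 5), 2.0)            # step 1: blur the original
g_mask = f - f_blurred                                  # step 2: mask = original - blurred
k = 1.0                                                 # k = 1: unsharp masking, k > 1: highboost
g = np.clip(f + k * g_mask, 0, 255).astype(np.uint8)    # step 3: add weighted mask back
```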
Padding
• When we apply a spatial filter to pixels on the boundary of an image, we do not have enough neighbours.
• To get an image with the same size as input
• Zero: set all pixels outside the source image to 0
• Constant: set all pixels outside the source image to a specified border value
• Clamp: repeat edge pixels indefinitely
• Wrap: copy pixels from opposite side of the image
• Mirror: reflect pixels across the image edge
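A minimal sketch of these padding modes using OpenCV's border flags:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
p = 2  # border width in pixels

zero     = cv2.copyMakeBorder(gray, p, p, p, p, cv2.BORDER_CONSTANT, value=0)
constant = cv2.copyMakeBorder(gray, p, p, p, p, cv2.BORDER_CONSTANT, value=128)
clamp    = cv2.copyMakeBorder(gray, p, p, p, p, cv2.BORDER_REPLICATE)
wrap     = cv2.copyMakeBorder(gray, p, p, p, p, cv2.BORDER_WRAP)
mirror   = cv2.copyMakeBorder(gray, p, p, p, p, cv2.BORDER_REFLECT)
```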
Image Processing in the Frequency Domain
• Fourier Transform
• Frequency Domain Filtering
• Notch Filter
• Gaussian Filter
• DoG Filter
One-Dim Discrete Fourier Transform and its Inverse
Frequency Domain Filtering
• Frequency is directly related to rate of change, so frequencies in the Fourier transform may be related to patterns of intensity variations in the image.
• Slowest varying frequency at u=v=0 corresponds to average grey level of the image.
• Low frequencies correspond to slowly varying components in the image- for example, large areas of similar grey levels.
• Higher frequencies correspond to faster grey level changes such as edges, noise etc.
Procedure for Filtering in the Frequency Domain
Notch Filter
Gaussian Filter
DoG Filter
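A minimal NumPy sketch of the standard procedure (forward FFT, multiply by a transfer function, inverse FFT), here with a Gaussian low-pass filter and an illustrative sigma:

```python
import numpy as np

def gaussian_lowpass(img, sigma=20):
    """Filter in the frequency domain: FFT, multiply by a Gaussian transfer
    function centred on the zero frequency, then inverse FFT."""
    F = np.fft.fftshift(np.fft.fft2(img))           # centre the low frequencies
    rows, cols = img.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    H = np.exp(-(U**2 + V**2) / (2 * sigma**2))     # Gaussian low-pass transfer function
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))
```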
Image Pyramids
• Multiple resolutions may be useful
• Local statistics such as intensity averages can vary in different parts of an image
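A minimal sketch of a Gaussian image pyramid with OpenCV (each level is smoothed and downsampled by a factor of 2):

```python
import cv2

img = cv2.imread("photo.jpg")

pyramid = [img]
for _ in range(3):                      # 4 levels in total
    pyramid.append(cv2.pyrDown(pyramid[-1]))
```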
Feature Representation
Feature Representation
• Colour features
• Colour histogram
• Colour moments
• Texture features
• Haralick texture features
• Local binary patterns
• Scale-invariant feature transform (SIFT)
• Texture feature encoding
• Shape features
• Basic shape features
• Histogram of oriented gradients (HOG)
Colour Features
• Represent the global distribution of pixel colours in an image
• Step 1: Construct a histogram for each colour channel (R, G, B)
• Step 2: Concatenate the histogram (vectors) of all channels as the final feature vector
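A minimal sketch of the two steps with OpenCV (32 bins per channel chosen for illustration):

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg")   # three colour channels (BGR in OpenCV)

# Step 1: one histogram per channel
hists = [cv2.calcHist([img], [c], None, [32], [0, 256]).flatten() for c in range(3)]

# Step 2: concatenate (and normalise) into a single feature vector
feature = np.concatenate(hists)
feature = feature / feature.sum()
```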
Colour Moments
Haralick Features
• Haralick features give an array of statistical descriptors of image patterns to capture the spatial relationship between neighbouring pixels, that is, textures
• Step 1: Construct the gray-level co-occurrence matrix (GLCM)
• Step 2: Compute the Haralick feature descriptors from the GLCM
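A minimal sketch of the two steps using scikit-image (graycomatrix/graycoprops; older releases spell these greycomatrix/greycoprops), on a placeholder image:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # placeholder image

# Step 1: grey-level co-occurrence matrix for 1-pixel offsets at 4 orientations
glcm = graycomatrix(gray, distances=[1],
                    angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                    levels=256, symmetric=True, normed=True)

# Step 2: Haralick-style descriptors computed from the GLCM
features = [graycoprops(glcm, prop).mean()
            for prop in ("contrast", "homogeneity", "energy", "correlation")]
```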
Local Binary Patterns
SIFT
SIFT Extrema Detection
SIFT Keypoints Localization
SIFT Orientation Assignment
SIFT Keypoint Descriptor
SIFT procedure
• Find SIFT keypoints
• Find the best matches between SIFT keypoints:
• Describe the keypoints
• Descriptor matching
Descriptor matching
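A minimal sketch of the SIFT procedure with OpenCV (requires a build that includes SIFT, e.g. opencv-python >= 4.4); the image names and the 0.75 ratio threshold are illustrative:

```python
import cv2

img1 = cv2.imread("scene1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene2.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-D descriptors
kp2, des2 = sift.detectAndCompute(img2, None)

# Descriptor matching with Lowe's ratio test
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
```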
Feature Encoding
• The most popular method: Bag-of-words (BoW)
• Local image features are encoded into a histogram to represent the overall image feature
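A minimal sketch of BoW encoding, assuming scikit-learn for the clustering and a placeholder descriptor matrix (a real pipeline would stack SIFT or other local descriptors from the training images):

```python
import numpy as np
from sklearn.cluster import KMeans

all_descriptors = np.random.rand(5000, 128)          # placeholder local descriptors

# Build the visual vocabulary (codebook) by clustering the descriptors
kmeans = KMeans(n_clusters=100, n_init=10).fit(all_descriptors)

def bow_histogram(image_descriptors, kmeans):
    """Assign each local descriptor to its nearest visual word and build a
    normalised histogram over the vocabulary."""
    words = kmeans.predict(image_descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()
```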
Application Example
Feature Encoding
• Local features can be other types of features, not just SIFT
• LBP, SURF, BRIEF, ORB
• There are also more advanced techniques than BoW
• VLAD, Fisher Vector
Shape Features
• Shape is an essential feature of material objects that can be used to identify and classify them
• Example: object recognition
Shape Features
• Human perception of an object or region involves capturing prominent / salient aspects of shape
• Shape features in an image are normally extracted after the image has been segmented into object regions
Boundary Descriptors
• Chain code descriptor
• The shape of a region can be represented by labelling the relative position of consecutive points on its boundary
• A chain code consists of a list of directions from a starting point and provides a compact boundary representation
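A minimal sketch of an 8-connected chain code; the direction numbering (0 = east, counter-clockwise) and the example boundary are illustrative assumptions:

```python
# Direction index for each step between consecutive 8-connected boundary points,
# with x increasing to the right and y increasing upwards.
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(boundary):
    """boundary: ordered list of (x, y) points along the region border."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]

print(chain_code([(0, 0), (1, 0), (2, 1), (2, 2)]))   # -> [0, 1, 2]
```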
Boundary Descriptors
Application Example
Histogram of Oriented Gradients
• HOG describes the distributions of gradient orientations in localized areas and does not require initial segmentation
Histogram of Oriented Gradients
• Step 1: Calculate gradient magnitude and orientation at each pixel with a gradient operator => gradient vector
Histogram of Oriented Gradients
• Step 2: Divide orientations into N bins and assign the gradient magnitude of each pixel to the bin corresponding to its orientation => cell histogram
Histogram of Oriented Gradients
• Step 3: Concatenate and block-normalise cell histograms to generate detection-window level HOG descriptor
Histogram of Oriented Gradients
• Detection via sliding window on the image
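A minimal sketch of the three steps using scikit-image's hog function (cell and block sizes are common defaults, chosen for illustration):

```python
from skimage import data
from skimage.feature import hog

img = data.astronaut()[:, :, 0]   # any grey-scale image

# Gradients per pixel, orientation histograms per cell, block normalisation,
# then concatenation into one detection-window descriptor.
descriptor = hog(img, orientations=9, pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2), block_norm="L2-Hys")
```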
Image Segmentation
• The partition of an image into a set of regions
• Meaningful areas
• Border pixels grouped into structures
• Groups of pixels with shapes
• Foreground and background
Segmentation approaches
• Region-based
• Curve-based
• Early techniques tend to use region splitting and/or merging
• Recent algorithms optimize some global criterion
Segmentation approaches
• Region Split and Merge
• Watershed
• Mean Shift
• Superpixel Segmentation
• Conditional Random Field
• Active Contours
Connected Components
• Number of components depends on the chosen connectivity
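A minimal sketch with OpenCV, showing that the chosen connectivity (4 or 8) determines how many components are found:

```python
import cv2
import numpy as np

binary = np.zeros((100, 100), np.uint8)
binary[10:30, 10:30] = 255                 # two separate foreground blobs
binary[50:80, 60:90] = 255

num_labels, labels = cv2.connectedComponents(binary, connectivity=8)
```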
Region split and Merge
• The simplest possible techniques
• Use a threshold and then compute connected components
• Rarely sufficient due to lighting and intra-object statistical variations
Region Splitting
• One of the oldest techniques in computer vision
• First computes a histogram for the whole image
• Then finds a threshold that best separates the large peaks in the histogram
Region splitting
• Otsu’s method
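A minimal sketch of histogram-based splitting with Otsu's method in OpenCV:

```python
import cv2

gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu picks the threshold that best separates the two main histogram peaks
# (maximising between-class variance); the chosen value is returned as t.
t, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```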
Region Merging
Watershed Segmentation
Mean Shift
• Mean shift is a variant of the iterative steepest-ascent method that seeks stationary points (i.e. peaks) in a density function, and is applicable in many areas of multi-dimensional data analysis
• Attempts to find all possible cluster centers in feature space (without needing to know the number of clusters, unlike k-means)
• K-means clustering has limitations:
• Needs to choose K
• Sensitive to outliers
• Prone to local minima
Mean Shift
• Iterative mode searching
1. Initialize a random seed point x and window N
2. Calculate the mean (center of gravity) m(x) within N
3. Shift the search window to the mean
4. Repeat Steps 2 and 3 until convergence
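A minimal NumPy sketch of this iteration for one seed, with a flat (uniform) window of radius `bandwidth`:

```python
import numpy as np

def mean_shift_mode(points, seed, bandwidth=1.0, max_iter=100, tol=1e-4):
    """Shift a window towards the local density peak: at every step the window
    centre moves to the mean of all points currently inside the window."""
    x = np.asarray(seed, dtype=float)
    for _ in range(max_iter):
        in_window = points[np.linalg.norm(points - x, axis=1) < bandwidth]
        m = in_window.mean(axis=0)          # centre of gravity within the window
        if np.linalg.norm(m - x) < tol:     # converged to a mode
            break
        x = m
    return x

points = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
mode = mean_shift_mode(points, seed=points[0], bandwidth=2.0)
```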
Mean Shift
• Advantages:
• Model-free, does not assume any prior shape for data clusters
• Just a single parameter (window size)
• Finds a variable number of modes (clusters)
• Robust to outliers
• Limitations:
• Computationally expensive (need to shift many windows)
• Output depends on window size
• Window size (bandwidth) selection is not trivial
• Does not scale well with dimensions of feature space
Superpixel Segmentation
• Superpixel-based segmentation improves efficiency
• Group similar pixels into one superpixel
• Segmentation (classification) is performed on superpixels
• Also called over-segmentation
• The method: Simple linear iterative clustering (SLIC)
• A popular superpixel generation algorithm
• Pros: preserves image boundaries, fast, and memory efficient
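A minimal sketch of SLIC superpixels with scikit-image (segment count and compactness are illustrative):

```python
from skimage import data
from skimage.segmentation import slic

img = data.astronaut()

# Group pixels into roughly uniform superpixels; `compactness` trades off
# colour similarity against spatial proximity.
segments = slic(img, n_segments=200, compactness=10, start_label=1)
```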
Conditional Random Field
• An undirected graphical structure
• Nodes: superpixels (feature representation of superpixels)
• Edges: adjacent superpixels (similarity between superpixels)
Conditional Random Field
Active Contours
• Aim
• To locate boundary curves in images
• How
• Boundary detectors iteratively move towards their final solution under the combination of image, smoothness, and optional user-guidance forces.
Active Contours
• Active contours / Snakes are parametric models
• Level-set methods have become more popular
• Level sets evolve to fit and track objects of interest by modifying the underlying embedding function instead of the curve function
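A minimal sketch of a parametric snake with scikit-image's active_contour, closely following its documentation example; the initial circle and the alpha/beta/gamma weights are illustrative:

```python
import numpy as np
from skimage import data
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = gaussian(data.astronaut()[:, :, 0], sigma=3)   # smooth to stabilise the image force

t = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([100 + 80 * np.sin(t), 220 + 80 * np.cos(t)])   # initial circle (row, col)

# alpha/beta control smoothness, the image term pulls the snake towards edges
snake = active_contour(img, init, alpha=0.015, beta=10, gamma=0.001)
```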
Mathematical morphology
• Erosion
• Dilation
Basic set operations
Dilation of binary images
Erosion of binary images
Opening of binary images
Closing of binary images
Morphological edge detection
Reconstruction of binary objects
Distance transform of binary images
Ultimate erosion and reconstruction
Dilation of grey-scale images
Erosion of grey-scale images
Opening of grey-scale images
Closing of grey-scale images
Summary of mathematical morphology
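A minimal sketch of the basic binary operations with OpenCV and a 3x3 structuring element, assuming a hypothetical binary mask "mask.png":

```python
import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
se = np.ones((3, 3), np.uint8)                            # structuring element

dilated = cv2.dilate(binary, se)
eroded  = cv2.erode(binary, se)
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, se)    # erosion then dilation
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, se)   # dilation then erosion
edges   = cv2.subtract(dilated, eroded)                   # morphological edge detection
```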
Motion
• Change detection
• Using image subtraction to detect changes in scenes
• Sparse motion estimation
• Using template matching to estimate local displacements
• Dense motion estimation
• Using optical flow to compute a dense motion vector field
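A minimal sketch of change detection and dense optical flow with OpenCV, assuming two hypothetical consecutive frames:

```python
import cv2

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Change detection: simple image subtraction
diff = cv2.absdiff(curr, prev)

# Dense motion estimation: Farneback optical flow gives a per-pixel (dx, dy) field
# (arguments: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags)
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
```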
Tracking
• Bayesian inference
• Using probabilistic models to perform tracking
• Kalman filtering
• Using linear model assumptions for tracking
• Particle filtering
• Using nonlinear models for tracking
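A minimal NumPy sketch of a constant-velocity Kalman filter for 1-D tracking; the state layout, noise levels, and measurement value are illustrative assumptions:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition: [position, velocity], dt = 1
H = np.array([[1.0, 0.0]])               # we only measure position
Q = 1e-3 * np.eye(2)                     # process noise covariance
R = np.array([[1e-1]])                   # measurement noise covariance

x = np.zeros((2, 1))                     # initial state estimate
P = np.eye(2)                            # initial state covariance

def kalman_step(x, P, z):
    x_pred = F @ x                                            # predict state
    P_pred = F @ P @ F.T + Q                                  # predict covariance
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)                     # update with measurement z
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

x, P = kalman_step(x, P, np.array([[1.2]]))
```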