
2812ICT Perceptual Computing
Local Image Descriptors

Outline
• Image pyramids
• Scale-Invariant Feature Transform (SIFT) features and descriptors

Scaled representations
• Big bars (resp. spots, hands, etc.) and little bars are both interesting
• Stripes and hairs, for example
• Inefficient to detect big bars with big filters
• And there is superfluous detail in the filter kernel
• Alternative:
• Apply filters of fixed size to images of different sizes
• Typically, a collection of images whose edge length changes by a factor of 2 (or root 2)
• This is a pyramid (or Gaussian pyramid) by visual analogy

Computer Vision – A Modern Approach Set: Pyramids and Texture Slides by D.A. Forsyth
A bar in the big images is a hair on the zebra’s nose; in smaller images, a stripe; in the smallest, the animal’s nose

Aliasing
• Can’t shrink an image by taking every second pixel
• If we do, characteristic errors appear
• In the next few slides
• Typically, small phenomena look bigger; fast phenomena can look slower
• Common phenomenon
• Wagon wheels rolling the wrong way in movies
• Checkerboards misrepresented in ray tracing
• Striped shirts look funny on colour television
• Need to lowpass filter (aka smooth) the image before subsampling

Gaussian pre-filtering (low pass)


Compared to downsampling without low-pass filtering…
Low-pass filtering (i.e. smoothing with a Gaussian kernel) before subsampling the image!
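The smooth-then-subsample recipe can be sketched in a few lines. This is a minimal sketch using NumPy and SciPy; the function name `gaussian_pyramid` and its `levels`/`sigma` parameters are illustrative choices, not from the slides:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=4, sigma=1.0):
    """Build a Gaussian pyramid: low-pass filter first, then take every second pixel."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)  # smooth to avoid aliasing
        pyramid.append(blurred[::2, ::2])              # then subsample by a factor of 2
    return pyramid

img = np.random.rand(64, 64)
pyr = gaussian_pyramid(img)
print([p.shape for p in pyr])  # [(64, 64), (32, 32), (16, 16), (8, 8)]
```

Dropping the `gaussian_filter` call reproduces the naive every-second-pixel shrink that causes the aliasing artefacts described above.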

Scale-Invariant Feature Transform (SIFT)
David G. Lowe, “Distinctive image features from scale-invariant keypoints”, International Journal of Computer Vision, vol. 60(2): 91-110, 2004

SIFT: Motivation
• The Harris operator is not invariant to scale and correlation is not invariant to rotation
• For better image matching, Lowe’s goal was to develop an interest operator that is invariant to scale and rotation.
• Also, Lowe aimed to create a descriptor that was robust to the variations corresponding to typical viewing conditions. The descriptor is the most-used part of SIFT.

Object Instance Recognition (matching)

Challenges
• Scale change
• Rotation
• Occlusion
• Illumination
……

Strategy
• Matching by stable, robust and distinctive local features.
• SIFT: Scale Invariant Feature Transform; transform image data into scale-invariant coordinates relative to local features

Idea of SIFT
• Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features

Advantages of SIFT
• Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
• Distinctiveness: individual features can be matched to a large database of objects
• Quantity: many features can be generated for even small objects
• Efficiency: close to real-time performance
• Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness

SIFT: Overall Procedure
1. Scale-space extrema detection
Search over multiple scales and image locations
2. Keypoint localization
Fit a model to determine location and scale; select keypoints based on a measure of stability
3. Orientation assignment
Compute best orientation(s) for each keypoint region
4. Keypoint descriptor
Use local image gradients at selected scale and rotation to describe each keypoint region

Lowe’s Scale-space Interest Points: Difference of Gaussians
• The Gaussian is the solution of the heat diffusion equation: ∂G/∂σ = σ∇²G
• Hence, by finite differences: σ∇²G ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)
• So G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G: the difference of Gaussians approximates the scale-normalized Laplacian up to the constant factor (k − 1)
• k is not necessarily very small in practice

Scale-space Extrema Detection
• Find points whose surrounding patches (at some scale) are distinctive
• The difference of Gaussians is an approximation to the scale-normalized Laplacian of Gaussian

Lowe’s Pyramid Scheme

Scale space is separated into octaves:
• Octave 1 uses scale σ
• Octave 2 uses scale 2σ
• etc.
In each octave, the initial image is repeatedly convolved with Gaussians to produce a set of scale-space images.
Adjacent Gaussian images are subtracted to produce the DoG (difference-of-Gaussian) images.
After each octave, the Gaussian image is down-sampled by a factor of 2 to produce an image 1⁄4 the size to start the next octave.

Lowe's Pyramid Scheme
• s + 2 filters per octave: σ₁ = 2^(1/s) σ₀, σ₂ = 2^(2/s) σ₀, …, σᵢ = 2^(i/s) σ₀, …, σ_{s+1} = 2^((s+1)/s) σ₀
• s + 3 images per octave, including the original at σ₀
• s + 2 difference images per octave
The parameter s determines the number of images per octave.
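One octave of this scheme can be sketched as follows. This is a minimal sketch with NumPy/SciPy; `dog_octave` is an illustrative name, and it blurs the octave's base image directly to each absolute scale rather than incrementally as in Lowe's implementation (the two are equivalent for Gaussians). The base scale `sigma0 = 1.6` follows Lowe's paper:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_octave(image, s=3, sigma0=1.6):
    """One octave: s+3 Gaussian images at scales sigma0 * 2^(i/s),
    and s+2 DoG images from adjacent pairs."""
    k = 2 ** (1.0 / s)
    gaussians = [gaussian_filter(image, sigma0 * k**i) for i in range(s + 3)]
    dogs = [g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])]
    return gaussians, dogs

img = np.random.rand(32, 32)
gauss, dogs = dog_octave(img, s=3)
print(len(gauss), len(dogs))  # 6 5
```

With s = 3 this yields the s + 3 = 6 Gaussian images and s + 2 = 5 difference images per octave described above; the next octave would start from a 2× down-sampled image.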

Key point localization
• Detect maxima and minima of difference-of-Gaussian in scale space
• Each point is compared to its 8 neighbors in the current image and 9 neighbors each in the scales above and below
s + 2 difference images per octave; the top and bottom scales are ignored, so s planes are searched.
(Figure: each octave is built by repeated blurring; adjacent Gaussian images are subtracted to form the DoG, then the stack is resampled for the next octave.)
For each max or min found, output is the location and the scale.
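The 26-neighbour comparison (8 in the same scale plane, 9 in each of the scales above and below) can be sketched like this; `is_extremum` and its argument layout (a list of DoG planes indexed by scale) are illustrative assumptions:

```python
import numpy as np

def is_extremum(dog, s, y, x):
    """True if dog[s][y, x] is strictly greater or strictly smaller than
    all 26 neighbours in the 3x3x3 cube around it."""
    cube = np.stack([dog[s + ds][y - 1:y + 2, x - 1:x + 2]
                     for ds in (-1, 0, 1)]).ravel()
    centre = cube[13]                    # middle of the 3x3x3 cube
    neighbours = np.delete(cube, 13)     # the 26 surrounding samples
    return centre > neighbours.max() or centre < neighbours.min()

# toy example: a single bright sample in the middle scale plane
dog = [np.zeros((5, 5)) for _ in range(3)]
dog[1][2, 2] = 1.0
print(is_extremum(dog, 1, 2, 2), is_extremum(dog, 1, 2, 3))  # True False
```

Strict comparison is used so that flat (tied) regions do not produce spurious extrema.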

Scale-space extrema detection: experimental results over 32 images that were synthetically transformed and had noise added.
(Plot: % detected, % correctly matched, average number detected, and average number matched, illustrating the expense vs. stability trade-off.)
• Sampling in scale for efficiency
• How many scales should be used per octave? S = ?
• The more scales evaluated, the more keypoints are found
• S < 3: the number of stable keypoints still increases with S
• S > 3: the number of stable keypoints decreases
• S = 3: the maximum number of stable keypoints is found

Keypoint localization
Issues:
• There are a lot of points, and some of them are not good enough.
• The locations of keypoints may not be accurate.
• Keypoints due to edges are unstable, so we need to eliminate them.

Inaccurate Keypoint Localization

To locate an extremum accurately, use the Taylor expansion of D around the sample point:

  D(x) = D + (∂D/∂x)ᵀ x + ½ xᵀ (∂²D/∂x²) x    (1)

Setting the derivative of (1) to zero gives the sub-pixel offset of the extremum:

  x̂ = −(∂²D/∂x²)⁻¹ (∂D/∂x)    (2)

The function value at the extremum is given by substituting (2) into (1):

  D(x̂) = D + ½ (∂D/∂x)ᵀ x̂    (3)
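In practice the gradient and Hessian in (2) and (3) are estimated by finite differences on the DoG stack. A minimal sketch, with the illustrative name `refine_keypoint` and a list of DoG planes indexed by scale:

```python
import numpy as np

def refine_keypoint(dog, s, y, x):
    """Fit the 3D quadratic (Taylor expansion of D) at a sample point and
    return the sub-pixel offset x_hat = -H^{-1} g and the value D(x_hat)."""
    D = lambda ds, dy, dx: dog[s + ds][y + dy, x + dx]
    # gradient g = dD/dx by central differences, in (x, y, scale) order
    g = 0.5 * np.array([D(0, 0, 1) - D(0, 0, -1),
                        D(0, 1, 0) - D(0, -1, 0),
                        D(1, 0, 0) - D(-1, 0, 0)])
    # Hessian H by finite differences
    dxx = D(0, 0, 1) - 2 * D(0, 0, 0) + D(0, 0, -1)
    dyy = D(0, 1, 0) - 2 * D(0, 0, 0) + D(0, -1, 0)
    dss = D(1, 0, 0) - 2 * D(0, 0, 0) + D(-1, 0, 0)
    dxy = 0.25 * (D(0, 1, 1) - D(0, 1, -1) - D(0, -1, 1) + D(0, -1, -1))
    dxs = 0.25 * (D(1, 0, 1) - D(1, 0, -1) - D(-1, 0, 1) + D(-1, 0, -1))
    dys = 0.25 * (D(1, 1, 0) - D(1, -1, 0) - D(-1, 1, 0) + D(-1, -1, 0))
    H = np.array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
    x_hat = -np.linalg.solve(H, g)        # equation (2)
    value = D(0, 0, 0) + 0.5 * g @ x_hat  # equation (3)
    return x_hat, value

# synthetic quadratic DoG whose true extremum is offset 0.25 px in x
dog = [np.array([[-((x - 1.25)**2 + (y - 1)**2 + (s - 1)**2)
                  for x in range(3)] for y in range(3)]) for s in range(3)]
offset, value = refine_keypoint(dog, 1, 1, 1)
print(offset)  # approximately [0.25, 0, 0]
```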

(b) The initial 832 keypoints locations at maxima and minima of the difference-of- Gaussian function. Keypoints are displayed as vectors indicating scale, orientation, and location.
Each keypoint is represented as (x, y, σ)
But, not all of them are good…

If |D(x̂)| < 0.03 (pixel values assumed in range [0, 1]), the keypoint is discarded as low contrast.
(c) After applying a threshold on minimum contrast, 729 keypoints remain

Eliminating edge points
• An edge point has a large principal curvature across the edge but a small one in the perpendicular direction
• High "cornerness" means there is no single dominant principal curvature component
• The principal curvatures can be calculated from the Hessian matrix H
• The eigenvalues of H are proportional to the principal curvatures, so the two eigenvalues shouldn't differ too much

Finding “Cornerness”
• The principal curvatures are proportional to the eigenvalues λmax, λmin of H
• Harris (1988) showed that cornerness can be determined from the determinant and trace of H
• Let α = λmax and β = λmin, and let r be the ratio such that α = rβ
• Then Tr(H)² / Det(H) = (α + β)² / (αβ) = (r + 1)² / r
• This depends only on the ratio of the eigenvalues, so we can just check for the desired ratio r
Threshold: if r > 10 (i.e. Tr(H)²/Det(H) ≥ 11²/10 = 12.1), the ratio is too great and the keypoint is discarded.
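This edge test needs only the 2×2 Hessian of a single DoG plane, again estimated by finite differences. A minimal sketch; `passes_edge_test` is an illustrative name:

```python
import numpy as np

def passes_edge_test(dog_img, y, x, r=10.0):
    """Keep a keypoint only if Tr(H)^2 / Det(H) < (r+1)^2 / r,
    i.e. the two principal curvatures do not differ too much."""
    dxx = dog_img[y, x + 1] - 2 * dog_img[y, x] + dog_img[y, x - 1]
    dyy = dog_img[y + 1, x] - 2 * dog_img[y, x] + dog_img[y - 1, x]
    dxy = 0.25 * (dog_img[y + 1, x + 1] - dog_img[y + 1, x - 1]
                  - dog_img[y - 1, x + 1] + dog_img[y - 1, x - 1])
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:      # curvatures of opposite sign: reject outright
        return False
    return tr * tr / det < (r + 1) ** 2 / r

# an isotropic blob passes; a pure edge (curvature in one direction) fails
blob = -np.add.outer((np.arange(3) - 1) ** 2, (np.arange(3) - 1) ** 2).astype(float)
edge = -np.tile((np.arange(3) - 1.0) ** 2, (3, 1))
print(passes_edge_test(blob, 1, 1), passes_edge_test(edge, 1, 1))  # True False
```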

(d) The final 536 keypoints that remain following an additional threshold on ratio of principal curvatures.

Orientation assignment
• Assign an orientation to each keypoint; the keypoint descriptor can then be represented relative to this orientation, achieving invariance to image rotation
• Compute magnitude and orientation on the Gaussian smoothed images

Orientation assignment
• Create a gradient histogram (36 bins) weighted by magnitude and a Gaussian window (σ is 1.5 times the scale of the keypoint)
• Peaks in the histogram correspond to the orientations of the patch
• For the same scale and location, there could be multiple keypoints with different orientations
• Any histogram peak within 80% of the highest peak is assigned to the keypoint
• About 15% of keypoints have multiple orientations assigned

Stability of location, scale, and orientation (within 15 degrees) under noise
The top line in the graph shows the percent of keypoint locations and scales that are repeatably detected as a function of pixel noise.
The second line shows the repeatability after also requiring agreement in orientation.
The bottom line shows the final percent of descriptors correctly matched to a large database.

Keypoint Descriptors
• At this point, each keypoint has • location
• scale
• orientation
• Next is to compute a descriptor for the local image region about each keypoint that is
• highly distinctive
• as invariant as possible to variations such as changes in viewpoint and illumination

Keypoint Descriptor
Create 16 gradient histograms (8 bins each) weighted by magnitude and a Gaussian window (σ is 0.5 times the width of the descriptor window)
Keypoint Descriptor – 128 (4x4x8) element vector

Keypoint descriptor
• Descriptor depends on two main parameters:
• number of orientations r
• n × n array of orientation histograms
• giving r·n² features
• SIFT: r = 8, n = 4, giving 128 features

Lowe’s Keypoint descriptor
• Based on 16×16 patches
• 4×4 subregions
• 8 bins in each subregion
• 4×4×8 = 128 dimensions in total

Change of Illumination
• Change of brightness => doesn't affect gradients (difference of pixel values)
• Change of contrast => doesn't affect gradients (up to normalization)
• Invariance to linear illumination changes: normalization to unit length is sufficient
• Saturation (non-linear change of illumination) => affects magnitudes much more than orientations
• Threshold gradient magnitudes at 0.2 and renormalize
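The normalize, clamp at 0.2, renormalize step can be sketched in a few lines; `normalize_descriptor` is an illustrative name:

```python
import numpy as np

def normalize_descriptor(vec, clamp=0.2):
    """Unit-normalize (contrast invariance), clamp large components
    (robustness to saturation), then renormalize to unit length."""
    v = vec / np.linalg.norm(vec)
    v = np.minimum(v, clamp)    # clamp entries above 0.2
    return v / np.linalg.norm(v)

# a descriptor dominated by one large gradient gets flattened by the clamp
v = normalize_descriptor(np.array([10.0, 1.0, 1.0, 1.0]))
print(round(float(np.linalg.norm(v)), 6))  # 1.0
```

After renormalization, no single gradient can dominate the distance computation during matching, which is what gives the robustness to saturation described above.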

Robustness to viewpoint changes:
Match features after random change in image scale and orientation, with 2% image noise, and affine distortion. Find nearest neighbor in database of 30,000 features.

Matching SIFT features



• Given a feature in I1, how do we find the best match in I2?
• Define a distance function that compares two descriptors
• Test all the features in I2 and find the one with minimum distance
I1 I2



What distance function should we use? Sum-of-squared differences:

  SSD(f₁, f₂) = Σᵢ₌₁ᴺ (f₁ᵢ − f₂ᵢ)²

• Can give a good score to very ambiguous matches
Matching SIFT features

I1 I2

Matching SIFT features
• Consider the ratio:
SSD(f1, f2) / SSD(f1, f2’)
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
f1
f2′ f2
I1
I2
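The ratio test can be sketched as follows, using a default threshold of 0.8; `match_ratio_test` is an illustrative name, and a real implementation would use an efficient nearest-neighbour index rather than brute force:

```python
import numpy as np

def match_ratio_test(desc1, desc2, threshold=0.8):
    """For each descriptor in desc1, accept its nearest neighbour in desc2
    only if SSD(best) / SSD(second best) < threshold."""
    matches = []
    for i, f1 in enumerate(desc1):
        ssd = np.sum((desc2 - f1) ** 2, axis=1)
        order = np.argsort(ssd)
        best, second = order[0], order[1]
        if ssd[best] < threshold * ssd[second]:   # ratio test
            matches.append((i, int(best)))
    return matches

desc1 = np.array([[0.0, 0.0]])
desc2 = np.array([[0.1, 0.0], [5.0, 5.0], [6.0, 6.0]])   # one clear winner
desc2b = np.array([[1.0, 0.0], [0.0, 1.0]])              # two ambiguous candidates
print(match_ratio_test(desc1, desc2), match_ratio_test(desc1, desc2b))
# [(0, 0)] []
```

The second call returns no match: both candidates are equally close, so the ratio is 1 and the ambiguous match is rejected, which is exactly the behaviour the 0.8 threshold is designed to enforce.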

Matching SIFT features
Accept a match if SSD(f1, f2) / SSD(f1, f2′) < 0.8
• 90% of false matches were eliminated
• Less than 5% of correct matches were discarded

The probability that a match is correct can be determined by taking the ratio of the distance to the closest neighbour over the distance to the second closest. Using a database of 40,000 keypoints, the solid line shows the PDF of this ratio for correct matches, while the dotted line is for matches that were incorrect.

Application: object recognition
• The SIFT features of training images are extracted and stored
• For a query image:
1. Extract SIFT features
2. Efficient nearest-neighbour indexing
3. Geometric verification using at least 3 keypoints (clustering with the Hough transform)

Object Recognition

Object Models

Image panoramas

SIFT Steps Summary
(1) Scale-space extrema detection
• Extract scale and rotation invariant interest points (i.e., keypoints).
(2) Keypoint localization
• Determine location and scale for each interest point.
• Eliminate "weak" keypoints.
(3) Orientation assignment
• Assign one or more orientations to each keypoint.
(4) Keypoint descriptor
• Use local image gradients at the selected scale.

The most successful feature (probably the most successful paper* in computer vision, cited over 48,000 times as of 19/8/2018). It contains a lot of heuristics, and its parameters were optimized on a small and specific dataset; different tasks may need different parameter settings.
*D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints", International Journal of Computer Vision, 60(2):91-110, 2004.

Extensions to SIFT
• PCA-SIFT
1. Works on 41×41 patches
2. Gradient vector of 2×39×39 dimensions
3. Uses PCA to project it to 20 dimensions
• SURF
• Approximates SIFT
• Works almost equally well
• Very fast

SIFT: Illustrative Example

SIFT detector
• SIFT detections shown as patches at the detected position, scale, and orientation

SIFT descriptor
• Extract a patch around the detected keypoint
• Normalize the patch to canonical scale and orientation
• Resize the patch to 16×16 pixels
• Compute the gradients (unaffected by additive intensity change)
• Apply a Gaussian weighting function
• Weighs down gradients far from the centre
• Avoids sudden changes in the descriptor with small changes in the window position
• Divide the patch into 16 squares of 4×4 pixels
• Compute gradient direction histograms over 8 directions in each square
• Trilinear interpolation: robust to small shifts, while preserving some spatial information
• Concatenate the histograms to obtain a 128-dimensional feature vector
• Normalize to unit length (invariant to multiplicative contrast change)
• Threshold gradient magnitudes to avoid excessive influence of high gradients (clamp gradients > 0.2, then renormalize)

Feature comparison

Feature comparison

Feature comparison

SIFT

Next Week
• Texture features
• Image segmentation