EBU7240 Computer Vision
Fitting: Least squares, RANSAC, Hough transform
Semester 1, 2021
Changjae Oh
Raguram ‘11
• We’ve learned how to detect edges, corners, and blobs. Now what?
• We would like to form a higher-level, more compact representation of the features in the image by grouping multiple features according to a simple model
• Choose a parametric model to represent a set of features
simple model: lines
simple model: circles
complicated model: car
Credit: K. Grauman
Fitting: Challenges
Case study: Line detection
• Noise in the measured feature locations
• Extraneous data: clutter (outliers), multiple lines
• Missing data: occlusions
Fitting: Overview
• If we know which points belong to the line, how do we find the “optimal” line parameters?
̶ Least squares
• What if there are outliers?
̶ Robust fitting, RANSAC
• What if there are many lines?
̶ Voting methods: RANSAC, Hough transform
• What if we’re not even sure it’s a line?
̶ Model selection (not covered)
• Least squares
Least squares line fitting
• Data: (x₁, y₁), …, (xₙ, yₙ)
• Line equation: yᵢ = m xᵢ + b
• Find (m, b) to minimize E = Σᵢ (yᵢ − m xᵢ − b)²
• Matrix form: E = ‖Y − XB‖², where
  Y = [y₁; …; yₙ],  X = [x₁ 1; …; xₙ 1],  B = [m; b]
• E = ‖Y − XB‖² = (Y − XB)ᵀ(Y − XB) = YᵀY − 2(XB)ᵀY + (XB)ᵀ(XB)
• dE/dB = 2XᵀXB − 2XᵀY = 0
• Normal equations: XᵀXB = XᵀY (least squares solution to XB = Y)
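A minimal NumPy sketch of this “vertical” least squares fit (data values and names are illustrative, not from the slides):

import numpy as np

def fit_line_lsq(x, y):
    """Fit y = m x + b by solving the normal equations X^T X B = X^T Y."""
    X = np.column_stack([x, np.ones_like(x)])   # n x 2 design matrix
    B, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares solution
    return B                                    # (m, b)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.1, 5.9])              # roughly y = 2x
print(fit_line_lsq(x, y))                       # close to (2, 0)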
Problem with “vertical” least squares
• Not rotation-invariant
• Fails completely for vertical lines
Total least squares
• Distance between point (xᵢ, yᵢ) and line ax + by = d (a² + b² = 1): |axᵢ + byᵢ − d|
• Unit normal: N = (a, b)
• Find (a, b, d) to minimize the sum of squared perpendicular distances
  E = Σᵢ (axᵢ + byᵢ − d)²
• Setting ∂E/∂d = Σᵢ −2(axᵢ + byᵢ − d) = 0 gives
  d = (a/n) Σᵢ xᵢ + (b/n) Σᵢ yᵢ = a x̄ + b ȳ
• Substituting back:
  E = Σᵢ (a(xᵢ − x̄) + b(yᵢ − ȳ))² = ‖UN‖² = (UN)ᵀ(UN), where
  U = [x₁ − x̄  y₁ − ȳ; …; xₙ − x̄  yₙ − ȳ],  N = [a; b]
• dE/dN = 2(UᵀU)N = 0
• Solution to (UᵀU)N = 0, subject to ‖N‖² = 1: eigenvector of UᵀU associated with the smallest eigenvalue (least squares solution to the homogeneous linear system UN = 0)
• UᵀU = [Σᵢ(xᵢ − x̄)²  Σᵢ(xᵢ − x̄)(yᵢ − ȳ); Σᵢ(xᵢ − x̄)(yᵢ − ȳ)  Σᵢ(yᵢ − ȳ)²] is the second moment matrix of the centered points (xᵢ − x̄, yᵢ − ȳ)
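A minimal sketch of this eigenvector solution (the data below illustrate the vertical-line case that “vertical” least squares cannot handle):

import numpy as np

def fit_line_tls(x, y):
    """Fit a x + b y = d (a^2 + b^2 = 1) via the smallest eigenvector of U^T U."""
    xm, ym = x.mean(), y.mean()
    U = np.column_stack([x - xm, y - ym])   # centered data
    w, V = np.linalg.eigh(U.T @ U)          # eigenvalues in ascending order
    a, b = V[:, 0]                          # eigenvector of the smallest eigenvalue
    return a, b, a * xm + b * ym            # line: a*x + b*y = d

x = np.array([0.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(fit_line_tls(x, y))                   # recovers the vertical line x = 0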
Least squares: Robustness to noise
• Least squares fit to the red points:
Least squares: Robustness to noise
• Least squares fit with an outlier:
Problem: squared error heavily penalizes outliers
Robust estimators
• General approach: find model parameters θ that minimize Σᵢ ρ(r(xᵢ, θ); σ)
̶ r(xᵢ, θ): residual of the i-th point w.r.t. model parameters θ
̶ ρ(·; σ): robust function with scale parameter σ
• The robust function ρ behaves like squared distance for small values of the residual u but saturates for larger values of u
Slide from Lazebnik
Robust estimators
• General approach: find model parameters θ that minimize Σᵢ ρ(r(xᵢ, θ); σ)
̶ r(xᵢ, θ): residual of the i-th point w.r.t. model parameters θ
̶ ρ(·; σ): robust function with scale parameter σ
• Robust fitting is a nonlinear optimization problem that must be solved iteratively
• The least squares solution can be used for initialization
• The scale of the robust function should be chosen carefully
Choosing the scale: Just right
The effect of the outlier is minimized
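A hedged sketch of robust line fitting: the slides do not fix a particular ρ, so this example uses SciPy’s built-in 'cauchy' loss (one common saturating robust function), with f_scale playing the role of σ. Data and parameter values are illustrative.

import numpy as np
from scipy.optimize import least_squares

def residuals(theta, x, y):
    m, b = theta
    return y - (m * x + b)           # r(x_i, theta) for the line model y = m x + b

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + 0.1 * rng.standard_normal(50)
y[::10] += 20                        # a few gross outliers

theta0 = np.array([0.0, 0.0])        # could also initialize from ordinary least squares
fit = least_squares(residuals, theta0, loss='cauchy', f_scale=1.0, args=(x, y))
print(fit.x)                         # close to (2, 1) despite the outliers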
• Least squares
• Robust fitting can deal with a few outliers – what if we have very many?
• Random sample consensus (RANSAC):
̶ Very general framework for model fitting in the presence of outliers
Choose a small subset of points uniformly at random
Fit a model to that subset
Find all remaining points that are “close” to the model and reject the rest as outliers
Do this many times and choose the best model
M. A. Fischler, R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. of the ACM, Vol 24, pp 381-395, 1981.
RANSAC (RANdom SAmple Consensus): learning technique to estimate parameters of a model by random sampling of observed data (Fischler & Bolles, ’81)
Model parameters θ are estimated from a sampled subset of points P and observations O, e.g., by the least-squares fit θ = (PᵀP)⁻¹PᵀO
Slide from Savarese
RANSAC (RANdom SAmple Consensus): Fischler & Bolles, ’81
Algorithm:
1. Sample (randomly) the number of points required to fit the model
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
Line fitting example
Algorithm:
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
1. Sample (randomly) the number of points required to fit the model (#=2)
Line fitting example
Algorithm:
1. Sample (randomly) the number of points required to fit the model (#=2)
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
2. Solve for model parameters using samples
Line fitting example
N_I = 6
Algorithm:
1. Sample (randomly) the number of points required to fit the model (#=2)
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
Algorithm:
1. Sample (randomly) the number of points required to fit the model (#=2)
2. Solve for model parameters using samples
3. Score by the fraction of inliers within a preset threshold of the model
Repeat 1-3 until the best model is found with high confidence
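A minimal sketch of this loop for line fitting (the threshold δ and trial count S are illustrative choices, not from the slides):

import numpy as np

def ransac_line(points, S=100, delta=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(S):
        i, j = rng.choice(len(points), size=2, replace=False)   # 1. sample 2 points
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                                             # skip degenerate (vertical) samples
        m = (y2 - y1) / (x2 - x1)                                # 2. solve for (m, b)
        b = y1 - m * x1
        # 3. score by counting points within perpendicular distance delta of the line
        dist = np.abs(points[:, 1] - (m * points[:, 0] + b)) / np.sqrt(1 + m ** 2)
        n_inliers = np.sum(dist < delta)
        if n_inliers > best_inliers:
            best_model, best_inliers = (m, b), n_inliers
    return best_model, best_inliers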
Parameters in RANSAC
• Number of trials 𝑆
• Number of sampled points 𝑘
• Fitting the line to these 𝑘 points
• Finding inliers to this line among the remaining points (i.e., points whose distance from the line is less than 𝛿)
How to choose parameters?
• Number of trials 𝑆
̶ Choose S so that, with probability P, at least one random sample is free from outliers (e.g. P=0.99) (outlier ratio: e )
• Number of sampled points 𝑘
̶ Minimum number needed to fit the model
• Distance threshold 𝛿
̶ Choose 𝛿 so that a good point with noise is likely (e.g., with probability 0.95) to fall within the threshold
1−𝑃 = (1−(1−𝑒)𝑘)𝑆
proportion of outliers e
k     5%   10%   20%   25%   30%   40%   50%
2      2    3     5     6     7    11    17
3      3    4     7     9    11    19    35
4      3    5     9    13    17    34    72
5      4    6    12    17    26    57   146
6      4    7    16    24    37    97   293
7      4    8    20    33    54   163   588
8      5    9    26    44    78   272  1177
𝑆 = log(1 − 𝑃) / log(1 − (1 − 𝑒)^𝑘)
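A one-line sketch of this formula (the table above appears to assume P = 0.99):

import math

def num_trials(P=0.99, e=0.30, k=2):
    """S from 1 - P = (1 - (1 - e)^k)^S."""
    return math.ceil(math.log(1 - P) / math.log(1 - (1 - e) ** k))

print(num_trials(P=0.99, e=0.30, k=2))   # 7, matching the k=2, 30% entry above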
Image Stitching using Affine Transform
• Affine transformation estimation with RANSAC
1. Randomly sample 𝑘 data points (𝑘 ≥ 3). Here, compare when 𝑘 = 3, 4.
2. Estimate the affine transformation 𝑇 by solving 𝑴𝒙 = 𝒃.
3. Score by computing the number of inliers satisfying |𝑇𝑝ᵢ − 𝑝′ᵢ|² < 𝛿² over all matches.
Repeat steps 1~3 𝑆 times, giving (𝑇₁, ..., 𝑇_𝑆).
4. Select the best affine transformation 𝑇_𝐵 from (𝑇₁, ..., 𝑇_𝑆).
5. Re-estimate the affine transformation by solving 𝑴𝒙 = 𝒃 with 𝑇_𝐵’s inliers.
Image Stitching using Affine Transform
Input: a set of 𝑁 matched points 𝑀𝑃 = {(𝑝₀, 𝑝′₀), (𝑝₁, 𝑝′₁), ..., (𝑝_{𝑁−1}, 𝑝′_{𝑁−1})}
Output: affine transform 𝑇_𝐹
𝑆: the number of trials
count_mat: 𝑆 × 1 vector
𝐼𝑁: a set of inliers

Initialize count_mat to 0
for i = 0 ~ S-1
    Randomly select k matched points from 𝑀𝑃 (usually, k = 3 or 4)
    Estimate an affine transformation 𝑇ᵢ with the k matched points
    for j = 0 ~ N-1
        if |𝑇ᵢ 𝑝ⱼ − 𝑝′ⱼ|² < 𝛿²
            count_mat[i]++
Choose the best affine transformation 𝑇 ← 𝑇_𝐾 where 𝐾 = argmaxᵢ count_mat[i]
for j = 0 ~ N-1
    if |𝑇 𝑝ⱼ − 𝑝′ⱼ|² < 𝛿²
        𝐼𝑁 ← 𝐼𝑁 ∪ {(𝑝ⱼ, 𝑝′ⱼ)}
Re-estimate the affine transformation 𝑇_𝐹 with 𝐼𝑁
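A NumPy sketch of the loop above, where 𝑴𝒙 = 𝒃 is solved by least squares; S, k, and δ are illustrative values. (OpenCV’s cv2.estimateAffine2D provides a ready-made RANSAC-based alternative.)

import numpy as np

def fit_affine(p, q):
    """Solve M x = b for the 2x3 affine T mapping points p (k x 2) to q (k x 2)."""
    M = np.zeros((2 * len(p), 6))
    M[0::2, 0:2], M[0::2, 2] = p, 1
    M[1::2, 3:5], M[1::2, 5] = p, 1
    x, *_ = np.linalg.lstsq(M, q.reshape(-1), rcond=None)
    return x.reshape(2, 3)

def ransac_affine(p, q, S=200, k=3, delta=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best_T, best_count = None, -1
    for _ in range(S):
        idx = rng.choice(len(p), size=k, replace=False)          # 1. sample k matches
        T = fit_affine(p[idx], q[idx])                            # 2. estimate T
        err2 = np.sum((p @ T[:, :2].T + T[:, 2] - q) ** 2, axis=1)
        count = np.sum(err2 < delta ** 2)                         # 3. count inliers
        if count > best_count:
            best_T, best_count = T, count
    # 4-5. keep the best T and re-estimate it from all of its inliers
    err2 = np.sum((p @ best_T[:, :2].T + best_T[:, 2] - q) ** 2, axis=1)
    inliers = err2 < delta ** 2
    return fit_affine(p[inliers], q[inliers])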
Applications
RANSAC: Conclusions
• Good
Robust to outliers
Applicable to a larger number of objective function parameters than the Hough transform
Optimization parameters are easier to choose than for the Hough transform
• Bad
Computational time grows quickly with the fraction of outliers and the number of parameters
Not good for getting multiple fits
Lots of parameters to tune
Doesn’t work well for low inlier ratios (too many iterations, or can fail completely)
• Common applications
̶ Computing a homography (e.g., image stitching)
̶ Estimating fundamental matrix (relating two views)
Hough transform
Fitting: Review
• If we know which points belong to the line, how do we find the “optimal” line parameters?
̶ Least squares
• What if there are outliers?
̶ Robust fitting, RANSAC
• What if there are many lines?
̶ Voting methods: RANSAC, Hough transform
• Least squares
Voting schemes
• Let each feature vote for all the models that are compatible with it
̶ Hopefully the noise features will not vote consistently for any single model
̶ Missing data doesn’t matter as long as there are enough features remaining to agree on a good model
• An early type of voting scheme
Discretize parameter space into bins
For each feature point in the image, put a vote in every bin in the parameter space that could have generated this point
Find bins that have the most votes (the maximum or local maxima in the grid)
Image space
Hough parameter space
P.V.C. Hough, Machine Analysis of Bubble Chamber Pictures, Proc. Int. Conf. High Energy Accelerators and Instrumentation, 1959
Hough Transform: Fitting Multiple Lines
• One simple example using the Hough transform
̶ Fitting lines when a set of sparse points is given
• Key idea: transform (𝑥, 𝑦) → (𝑎, 𝑏)
Parameter space representation
• A line in the image corresponds to a point in Hough space
Image space Hough parameter space
Source: S. Seitz
Parameter space representation
• What does a point (𝒙𝟎, 𝒚𝟎) in the image space map to in the Hough space?
̶ Answer: the solutions of 𝑦₀ = 𝑚𝑥₀ + 𝑏
̶ This is a line in Hough space
Image space Hough parameter space
Parameter space representation
• Where is the line that contains both 𝒙𝟎, 𝒚𝟎 and 𝒙𝟏, 𝒚𝟏 ?
(x1, y1) (x0, y0)
Parameter space representation
• Where is the line that contains both 𝒙𝟎, 𝒚𝟎 and 𝒙𝟏, 𝒚𝟏 ?
̶ It is the intersection of the lines 𝑏 = −𝑥₀𝑚 + 𝑦₀ and 𝑏 = −𝑥₁𝑚 + 𝑦₁
(x1, y1) (x0, y0)
b = –x1m + y1
Hough Transform: Fitting Multiple Lines
• Suppose we have five points:
(1,0) → 𝑏 = −𝑎
(1,1) → 𝑏 = −𝑎 + 1
(2,1) → 𝑏 = −2𝑎 + 1
(4,1) → 𝑏 = −4𝑎 + 1
(3,2) → 𝑏 = −3𝑎 + 2
Maximum intersections!
(𝑎, 𝑏) = (0, 1) and (𝑎, 𝑏) = (1, −1)
Hough Transform: Fitting Multiple Lines
• Problem with (𝑎, 𝑏) space
Unbounded parameter domains
𝑦 = 𝑎𝑥 + 𝑏 is not able to model a vertical line
Let’s use a different type of parameterization: (𝑟, 𝜃)!
(𝑟𝑐𝑜𝑠𝜃, 𝑟𝑠𝑖𝑛𝜃) (𝑥, 𝑦)
(𝑦 − 𝑟 sin𝜃) / (𝑥 − 𝑟 cos𝜃) = −tan(90° − 𝜃) = −cos𝜃 / sin𝜃
𝑥𝑐𝑜𝑠𝜃 + 𝑦𝑠𝑖𝑛𝜃 = 𝑟
Hough Transform: Fitting Multiple Lines
Use a polar representation for the parameter space
Hough space
𝑥𝑐𝑜𝑠𝜃 + 𝑦𝑠𝑖𝑛𝜃 = 𝑟
Hough Transform: Fitting Multiple Lines
• One problem remains when handling the continuous parameters (𝑟, 𝜃) in a discrete domain.
̶ Too many possible (𝑟, 𝜃)!
̶ So, let’s just use a small number of quantized (𝑟, 𝜃) values
• If we discretize 𝜃 to use only four values: -45°, 0°, 45°, 90°
Hough Transform: Fitting Multiple Lines
• The accumulator array contains how many times each value of (r, θ) appears in the table
Hough Transform: Fitting Multiple Lines
Clean data
Features Votes
Slide from S. Savarese
Hough Transform: Fitting Multiple Lines
• Issue: noisy data
Noisy data
Features Votes
Slide from S. Savarese
Hough Transform: Fitting Multiple Lines
• Issue: spurious peaks due to uniform noise
Features Votes
Dealing with noise
• Choose a good grid / discretization
̶ Too coarse: large votes obtained when too many different lines correspond to a single bucket
̶ Too fine: miss lines because some points that are not exactly collinear cast votes for different buckets
• Increment neighboring bins (smoothing in accumulator array)
• Try to get rid of irrelevant features
̶ E.g., take only edge points with significant gradient magnitude
Hough Transform: Fitting Multiple Lines
Pipeline: Canny edge detector → Hough voting → find peaks and post-process
Algorithm outline
1: Initialize accumulator H to all zeros
2: for each feature point (x, y) in the image
3:     for θ = 0 to 180
4:         ρ = x cos θ + y sin θ
5:         H(θ, ρ) = H(θ, ρ) + 1
6: Find the value(s) of (θ, ρ) where H(θ, ρ) is a local maximum
7: The detected line in the image is given by ρ = x cos θ + y sin θ
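A direct NumPy sketch of this accumulator loop (bin sizes and the binary edge-mask input are illustrative; cv2.HoughLines is an optimized alternative):

import numpy as np

def hough_lines(edge_mask, n_theta=180):
    """Accumulate votes H(theta, rho) for every edge pixel in a binary mask."""
    h, w = edge_mask.shape
    diag = int(np.ceil(np.hypot(h, w)))                  # max possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))              # 0..179 degrees
    H = np.zeros((n_theta, 2 * diag + 1), dtype=np.int32)
    ys, xs = np.nonzero(edge_mask)
    for x, y in zip(xs, ys):
        for t, theta in enumerate(thetas):
            rho = int(round(x * np.cos(theta) + y * np.sin(theta)))
            H[t, rho + diag] += 1                        # shift rho so indices are >= 0
    return H, thetas
# Peaks (local maxima) of H give the detected lines rho = x cos(theta) + y sin(theta).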
Hough Transform: Fitting Multiple Lines
• Practical considerations
̶ Bin size
̶ Smoothing
̶ Finding multiple lines
̶ Finding line segments
Adjusting bin size and the amount of smoothing
Finding multiple lines & line segments
Application
Hough Transform: Conclusions
• Good
Robust to outliers: each point votes separately
Fairly efficient (much faster than trying all sets of parameters)
Provides multiple good fits
• Bad
Some sensitivity to noise
Bin size trades off between noise tolerance, precision, and speed/memory
Can be hard to find the sweet spot
Not suitable for more than a few parameters: grid size grows exponentially
• Common applications
̶ Line fitting (also circles, ellipses, etc.)
̶ Object instance recognition (parameters are affine transform)
Feature Matching and Fitting
Feature matching
Fundamental matrix estimation
Feature matching
Image stitching using geometric transform estimation
Feature Matching and Fitting
Feature Matching Modules

Robust Estimation
1) Robust least squares
2) RANSAC
3) M-estimator
4) Least Median of Squares (LMedS)
And so on...

Parameter Examples
1) 2D geometric transform
2) Fundamental matrix
3) 3D camera motion (3D rotation/translation)
And so on...

Parameter Estimation
1) Least squares fit (or total least squares)
2) Hough transform
3) Iterative Closest Points (ICP)
And so on...

Note) ICP does NOT require feature matching modules
Next topic
• Images are high-dimensional signals, consisting of redundant pixels. How can we group them?
̶ Prerequisite
• Review EBU6230 Image/Video Processing – Week3: Edges
EBU7240 Computer Vision
Grouping: thresholding, K-means clustering
Semester 1, 2021
Changjae Oh
• Unsupervised Segmentation
̶ Thresholding-based segmentation
̶ K-means clustering
• Interactive segmentation
• Unsupervised Segmentation
̶ Thresholding-based segmentation
̶ K-means clustering
• Interactive segmentation
Introduction
• Image segmentation
̶ Divide an image into non-overlapping regions (objects) with similar information
• Single threshold
Thresholding
• Single threshold
̶ Shows the hidden aspects of an image
Thresholding
• Double Thresholding
Thresholding
g(x, y) = a if f(x, y) > T₂
          b if T₁ < f(x, y) ≤ T₂
          c if f(x, y) ≤ T₁
How to predict appropriate threshold?
How to predict appropriate threshold?
• Look at the histogram
̶ The threshold can exist between two dominant distributions.
How to predict appropriate threshold?
• Look at the histogram
̶ The threshold can exist between two dominant distributions.
Otsu’s Method: Optimum Global Thresholding
• Basic assumption: bimodal signal (two representative distributions)
• Key idea: Exhaustively search for the threshold that minimizes the within-class variance (the variance within the class)
̶ Minimizing the within-class (intra-class) variance σ²_W(k), a weighted sum of the variances of the two classes C₁ and C₂, is equivalent to maximizing the between-class (inter-class) variance σ²_B(k)
Otsu’s Method: Optimum Global Thresholding
• The within-class (intra-class) variance:
  σ²_W(k) = q₁(k) σ₁²(k) + q₂(k) σ₂²(k)
  ̶ q₁(k), m₁(k), and σ₁²(k): probability sum, mean, and variance of class C₁ (levels 0 ~ k)
  ̶ q₂(k), m₂(k), and σ₂²(k): probability sum, mean, and variance of class C₂ (levels k+1 ~ L−1)
• For an M × N image with L intensity levels, with n(i) the number of pixels of intensity i:
  p(i) = n(i) / (MN),  Σᵢ p(i) = 1
• You can simply seek the optimal threshold k_opt by minimizing the within-class variance:
  k_opt = argmin_k σ²_W(k)
• Using the threshold k_opt, segment the image into the two regions C₁ and C₂
Otsu's method
Note) An image is assumed to be normalized to 0~1.
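A sketch of Otsu’s exhaustive search, written against 8-bit intensities (0-255) rather than the normalized 0~1 range noted above; cv2.threshold with cv2.THRESH_OTSU is an off-the-shelf alternative.

import numpy as np

def otsu_threshold(img, L=256):
    """Return k_opt minimizing q1*s1^2 + q2*s2^2 (img: uint8 grayscale)."""
    p = np.bincount(img.ravel(), minlength=L) / img.size     # p(i) = n(i) / MN
    i = np.arange(L)
    best_k, best_var = 0, np.inf
    for k in range(L - 1):
        q1, q2 = p[:k + 1].sum(), p[k + 1:].sum()
        if q1 == 0 or q2 == 0:
            continue                                          # skip empty classes
        m1 = (i[:k + 1] * p[:k + 1]).sum() / q1
        m2 = (i[k + 1:] * p[k + 1:]).sum() / q2
        v1 = (((i[:k + 1] - m1) ** 2) * p[:k + 1]).sum() / q1
        v2 = (((i[k + 1:] - m2) ** 2) * p[k + 1:]).sum() / q2
        var_w = q1 * v1 + q2 * v2                             # within-class variance
        if var_w < best_var:
            best_k, best_var = k, var_w
    return best_k                                             # C1: <= k, C2: > k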
Adaptive Thresholding Using Moving Averages
• Adaptive thresholding based on moving averages
̶ Works well when objects are small with respect to the image size
̶ Quite useful in document processing
g(i, j) = 1 if I(i, j) > T(i, j), 0 otherwise
T(i, j) = b × m(i, j)
Using the local mean intensity m(i, j) = Σ_{s=−a}^{a} Σ_{t=−b}^{b} w(s, t) I(i+s, j+t)
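A minimal sketch of this adaptive rule using a box-filter local mean; the window size and the factor b are illustrative choices (cv2.adaptiveThreshold offers a built-in variant).

import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_threshold(img, window=15, b=0.9):
    m = uniform_filter(img.astype(float), size=window)   # local mean m(i, j)
    T = b * m                                             # T(i, j) = b * m(i, j)
    return (img > T).astype(np.uint8)                     # g(i, j) = 1 if I(i, j) > T(i, j)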
Thresholding
Using Moving Averages
[From “Digital Image Processing”, Gonzalez and Woods]
Thresholding
Using Moving Averages
[From “Digital Image Processing”, Gonzalez and Woods]
• Image Segmentation
̶ Thresholding
  Simple thresholding
  Otsu’s method
  Adaptive thresholding
• K-means clustering
• Goal: choose three “centers” as the representative intensities, and label every pixel according to which of these centers it is nearest to.
• The best cluster centers are those that minimize the SSD between all points and their nearest cluster center cᵢ:
Clustering
• With this objective, it is a “chicken and egg” problem:
̶ If we knew the cluster centers, we could allocate points to groups by assigning each to its closest center.
̶ If we knew the group memberships, we could get the centers by computing the mean per group.
Slide from
• Basic idea: randomly initialize the k cluster centers, and iterate between the two steps we just saw.
1. Randomly initialize the cluster centers, c1, …, cK
2. Given cluster centers, determine points in each cluster
• For each point p, find the closest ci. Put p into cluster i
3. Given points in each cluster, solve for ci
• Set ci to be the mean of points in cluster i
4. If ci have changed, repeat Step 2
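A minimal NumPy sketch of steps 1-4 above (the iteration cap and convergence check are implementation choices; empty clusters are not handled in this sketch):

import numpy as np

def kmeans(points, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=K, replace=False)]   # 1. random init
    for _ in range(n_iters):
        # 2. assign each point to its closest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. recompute each center as the mean of its assigned points
        new_centers = np.array([points[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_centers, centers):                          # 4. stop if unchanged
            break
        centers = new_centers
    return centers, labels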
Properties
• Will always converge to some solution
• Can be a “local minimum”
̶ Does not always find the global minimum of the objective function
K-means Clustering
Feature Space on Image Segmentation
• Depending on what we choose as the feature space, we can group pixels in different ways.
Grouping pixels based on intensity similarity
Clusters based on intensity similarity don’t have to be spatially coherent