
Chapter 7
Bundle Adjustment &
Nonlinear Optimization
Multiple View Geometry
Summer 2021

Prof. Daniel Cremers
Chair for Computer Vision and Artificial Intelligence

Departments of Informatics & Mathematics
Technical University of Munich


Overview

1 Optimality in Noisy Real World Conditions

2 Bundle Adjustment

3 Nonlinear Optimization

4 Gradient Descent

5 Least Squares Estimation

6 Newton Methods

7 The Gauss-Newton Algorithm

8 The Levenberg-Marquardt Algorithm

9 Summary

10 Example Applications


Optimality in Noisy Real World Conditions
In the previous chapters we discussed linear approaches to
solve the structure and motion problem. In particular, the
eight-point algorithm provides closed-form solutions to
estimate the camera parameters and the 3D structure, based
on singular value decomposition.

However, if we have noisy data x̃1, x̃2 (correspondences not
exact or even incorrect), then we have no guarantee

• that R and T are as close as possible to the true solution.
• that we will get a consistent reconstruction.


Statistical Approaches to Cope with Noise

The linear approaches are elegant because optimal solutions
to respective problems can be computed in closed form.
However, they often fail when dealing with noisy and imprecise
point locations. Since measurement noise is not explicitly
considered or modeled, such spectral methods often provide
suboptimal performance in noisy real-world conditions.

In order to take noise and statistical fluctuations into account, one can revert to a Bayesian formulation and determine the most likely camera transformation R, T and ‘true’ 2D coordinates x given the measured coordinates x̃, by performing a maximum a posteriori estimate:

$$\arg\max_{x,R,T} P(x, R, T \mid \tilde{x}) = \arg\max_{x,R,T} P(\tilde{x} \mid x, R, T)\, P(x, R, T)$$

This approach will however involve modeling probability
densities P on the fairly complicated space SO(3)× S2 of
rotation and translation parameters, as R ∈ SO(3) and T ∈ S2
(3D translation with unit length).


Bundle Adjustment and Nonlinear Optimization

Under the assumption that the observed 2D point coordinates
x̃ are corrupted by zero-mean Gaussian noise, maximum
likelihood estimation leads to bundle adjustment:

$$E(R, T, X^1, \dots, X^N) = \sum_{j=1}^{N} \big\|\tilde{x}_1^j - \pi(X^j)\big\|^2 + \big\|\tilde{x}_2^j - \pi(R, T, X^j)\big\|^2$$

It aims at minimizing the reprojection error between the observed 2D coordinates $\tilde{x}_i^j$ and the projected 3D coordinates $X^j$ (given w.r.t. camera 1). Here $\pi(R, T, X^j)$ denotes the perspective projection of $X^j$ after rotation and translation.

For the general case of m images, we get:

$$E\big(\{R_i, T_i\}_{i=1..m}, \{X^j\}_{j=1..N}\big) = \sum_{i=1}^{m} \sum_{j=1}^{N} \theta_{ij}\, \big\|\tilde{x}_i^j - \pi(R_i, T_i, X^j)\big\|^2,$$

with T1 = 0 and R1 = 1. θij = 1 if point j is visible in image i ,
θij = 0 else. The above problems are non-convex.
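As a concrete illustration, the following Python sketch evaluates this cost for given poses and points, assuming calibrated (normalized) image coordinates so that the projection simply divides by depth; the function and variable names are illustrative and not part of the lecture material.

```python
import numpy as np

def project(R, T, X):
    """Perspective projection pi(R, T, X): transform the 3D point into the
    camera frame and divide by its depth (calibrated coordinates)."""
    Xc = R @ X + T
    return Xc[:2] / Xc[2]

def bundle_adjustment_cost(Rs, Ts, Xs, x_obs, visible):
    """Reprojection error E({R_i, T_i}, {X^j}) for m images and N points.

    Rs, Ts  : lists of m rotation matrices (3x3) and translations (3,)
    Xs      : array (N, 3) of 3D points X^j
    x_obs   : array (m, N, 2) of measured 2D coordinates x~_i^j
    visible : array (m, N) with theta_ij in {0, 1}
    """
    E = 0.0
    for i, (R, T) in enumerate(zip(Rs, Ts)):
        for j in range(len(Xs)):
            if visible[i, j]:
                E += np.sum((x_obs[i, j] - project(R, T, Xs[j])) ** 2)
    return E
```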


Different Parameterizations of the Problem

The same optimization problem can be parameterized
differently. For example, we can introduce $x_i^j$ to denote the true 2D coordinate associated with the measured coordinate $\tilde{x}_i^j$:

$$E\big(\{x_1^j, \lambda_1^j\}_{j=1..N}, R, T\big) = \sum_{j=1}^{N} \big\|x_1^j - \tilde{x}_1^j\big\|^2 + \big\|\tilde{x}_2^j - \pi(R\, \lambda_1^j x_1^j + T)\big\|^2.$$

Alternatively, we can perform a constrained optimization by
minimizing a cost function (similarity to measurements):

$$E\big(\{x_i^j\}_{j=1..N}, R, T\big) = \sum_{j=1}^{N} \sum_{i=1}^{2} \big\|x_i^j - \tilde{x}_i^j\big\|^2$$

subject to (consistent geometry):

$$x_2^{j\top} \widehat{T} R\, x_1^j = 0, \qquad x_1^{j\top} e_3 = 1, \qquad x_2^{j\top} e_3 = 1, \qquad j = 1, \dots, N.$$
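To make the consistency constraint concrete, here is a minimal sketch (with hypothetical helper names) of the hat operator and the residual $x_2^{j\top}\widehat{T}R\,x_1^j$ for homogeneous points whose third coordinate is 1:

```python
import numpy as np

def hat(T):
    """Skew-symmetric matrix T^ such that (T^) v = T x v (cross product)."""
    return np.array([[0.0, -T[2], T[1]],
                     [T[2], 0.0, -T[0]],
                     [-T[1], T[0], 0.0]])

def epipolar_residual(x1, x2, R, T):
    """Residual of the consistency constraint x2^T T^ R x1 = 0.
    x1, x2 are homogeneous image points with third coordinate 1
    (i.e. x^T e3 = 1), as required by the constraints above."""
    return x2 @ hat(T) @ R @ x1
```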


Some Comments on Bundle Adjustment

Bundle adjustment aims at jointly estimating 3D coordinates of
points and camera parameters – typically the rigid body
motion, but sometimes also intrinsic calibration parameters or
radial distortion. Different models of the noise in the observed 2D points lead to different cost functions, zero-mean Gaussian noise being the most common assumption.

The approach is called bundle adjustment (German: Bündelausgleich) because it aims at adjusting the bundles of
light rays emitted from the 3D points. Originally derived in the
field of photogrammetry in the 1950s, it is now used frequently
in computer vision. A good overview can be found in:
Triggs, McLauchlan, Hartley, Fitzgibbon, “Bundle Adjustment –
A Modern Synthesis”, ICCV Workshop 1999.

Typically it is used as the last step in a reconstruction pipeline
because the minimization of this highly non-convex cost
function requires a good initialization. The minimization of
non-convex energies is a challenging problem. Bundle
adjustment type cost functions are typically minimized by
nonlinear least squares algorithms.

http://lear.inrialpes.fr/pubs/2000/TMHF00/Triggs-va99.pdf

Nonlinear Programming
Nonlinear programming denotes the process of iteratively
solving a nonlinear optimization problem, i.e. a problem
involving the maximization or minimization of an objective
function over a set of real variables under a set of equality or
inequality constraints.

There are numerous methods and techniques. Good overviews of respective methods can be found, for example, in Bertsekas (1999), “Nonlinear Programming”; Nocedal & Wright (1999), “Numerical Optimization”; or Luenberger & Ye (2008), “Linear and Nonlinear Programming”.

Depending on the cost function, different algorithms are
employed. In the following, we will discuss (nonlinear) least
squares estimation and several popular iterative techniques for
nonlinear optimization:

• gradient descent,
• Newton methods,
• the Gauss-Newton algorithm,
• the Levenberg-Marquardt algorithm.


Gradient Descent
Gradient descent or steepest descent is a first-order
optimization method. It aims at computing a local minimum of a
(generally) non-convex cost function by iteratively stepping in
the direction in which the energy decreases most. This is given
by the negative energy gradient.

To minimize a real-valued cost $E: \mathbb{R}^n \to \mathbb{R}$, the gradient flow for $E(x)$ is defined by the differential equation:

$$\begin{cases} x(0) = x_0 \\[4pt] \dfrac{dx}{dt} = -\dfrac{dE}{dx}(x) \end{cases}$$

Discretization: $x_{k+1} = x_k - \epsilon\, \dfrac{dE}{dx}(x_k), \quad k = 0, 1, 2, \dots$
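A minimal sketch of this discretized gradient flow in Python; the fixed step size and the quadratic toy cost are illustrative assumptions:

```python
import numpy as np

def gradient_descent(grad_E, x0, eps=0.1, num_iter=100):
    """Discretized gradient flow: x_{k+1} = x_k - eps * dE/dx(x_k).
    The step size eps could also be chosen per iteration (e.g. by line search)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        x = x - eps * grad_E(x)
    return x

# Toy example: E(x) = ||x - c||^2 with gradient dE/dx = 2 (x - c)
c = np.array([1.0, -2.0])
x_min = gradient_descent(lambda x: 2.0 * (x - c), x0=np.zeros(2))  # converges to c
```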

[Figure: two example cost functions E(x)]


Gradient Descent
Under certain conditions on $E(x)$, the gradient descent iteration

$$x_{k+1} = x_k - \epsilon\, \frac{dE}{dx}(x_k), \qquad k = 0, 1, 2, \dots$$

converges to a local minimum. For the case of convex $E$, this will also be the global minimum. The step size $\epsilon$ can be chosen differently in each iteration.

Gradient descent is a popular and broadly applicable method.
It is typically not the fastest solution to compute minimizers
because the asymptotic convergence rate is often inferior to
that of more specialized algorithms. First-order methods with
optimal convergence rates were pioneered by Yuri Nesterov.

In particular, highly anisotropic cost functions (with strongly
different curvatures in different directions) require many
iterations and trajectories tend to zig-zag. Locally optimal step
sizes in each iteration can be computed by line search. For
specific cost functions, alternative techniques such as the
conjugate gradient method, Newton methods, or the BFGS
method are preferable.


Linear Least Squares Estimation

Ordinary least squares or linear least squares is a method for estimating a set of parameters $x \in \mathbb{R}^d$ in a linear regression model. Assume that for each input vector $b_i \in \mathbb{R}^d$, $i \in \{1, \dots, n\}$, we observe a scalar response $a_i \in \mathbb{R}$, and that there is a linear relationship of the form

$$a_i = b_i^\top x + \eta_i$$

with an unknown vector $x \in \mathbb{R}^d$ and zero-mean Gaussian noise $\eta \sim \mathcal{N}(0, \Sigma)$ with a diagonal covariance matrix of the form $\Sigma = \sigma^2 I_n$. Maximum likelihood estimation of $x$ leads to the ordinary least squares problem:

$$\min_x \sum_i (a_i - x^\top b_i)^2 = (a - Bx)^\top (a - Bx).$$

Linear least squares estimation was introduced by Legendre
(1805) and Gauss (1795/1809). When asking for which noise
distribution the optimal estimator was the arithmetic mean,
Gauss invented the normal distribution.


Linear Least Squares Estimation

For general Σ, we get the generalized least squares problem:

$$\min_x\, (a - Bx)^\top \Sigma^{-1} (a - Bx).$$

This is a quadratic cost function with positive definite Σ−1. It
has the closed-form solution:

$$\hat{x} = \arg\min_x\, (a - Bx)^\top \Sigma^{-1} (a - Bx) = (B^\top \Sigma^{-1} B)^{-1} B^\top \Sigma^{-1} a.$$

If there is no correlation among the observation errors, the matrix $\Sigma$ is diagonal. This case is referred to as weighted least squares:

$$\min_x \sum_i w_i\, (a_i - x^\top b_i)^2, \qquad \text{with } w_i = \sigma_i^{-2}.$$

For the case of unknown matrix Σ, there exist iterative
estimation algorithms such as feasible generalized least
squares or iteratively reweighted least squares.
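The closed-form solution above translates directly into code. The following sketch (names are illustrative) solves the normal equations rather than explicitly inverting $B^\top \Sigma^{-1} B$; weighted least squares is recovered with a diagonal $\Sigma$:

```python
import numpy as np

def generalized_least_squares(B, a, Sigma):
    """Closed-form GLS estimate x_hat = (B^T Sigma^-1 B)^-1 B^T Sigma^-1 a.
    The normal equations are solved with np.linalg.solve instead of forming
    the inverse of B^T Sigma^-1 B explicitly."""
    Sinv = np.linalg.inv(Sigma)
    return np.linalg.solve(B.T @ Sinv @ B, B.T @ Sinv @ a)

# Weighted least squares is the special case of a diagonal covariance:
# Sigma = np.diag(sigma ** 2), i.e. weights w_i = sigma_i ** -2.
```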


Iteratively Reweighted Least Squares

The method of iteratively reweighted least squares aims at
minimizing generally non-convex optimization problems of the
form

$$\min_x \sum_i w_i(x)\, |a_i - f_i(x)|^2,$$

with some known weighting function wi (x). A solution is
obtained by iterating the following problem:

$$x_{t+1} = \arg\min_x \sum_i w_i(x_t)\, |a_i - f_i(x)|^2$$

For the case that $f_i$ is linear, i.e. $f_i(x) = x^\top b_i$, each subproblem

$$x_{t+1} = \arg\min_x \sum_i w_i(x_t)\, |a_i - x^\top b_i|^2$$

is simply a weighted least squares problem that can be solved
in closed form. Nevertheless, this iterative approach will
generally not converge to a global minimum of the original
(nonconvex) problem.
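A minimal sketch of this iteration for the linear case $f_i(x) = x^\top b_i$; the weighting function interface and the fixed iteration count are assumptions made for illustration:

```python
import numpy as np

def irls(B, a, weight_fn, x0, num_iter=20):
    """Iteratively reweighted least squares for the linear case f_i(x) = b_i^T x.
    Each iteration fixes the weights w_i(x_t) and solves the resulting weighted
    least squares problem in closed form.

    weight_fn : maps the current residual vector a - B x_t to weights w_i(x_t)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        w = weight_fn(a - B @ x)   # w_i(x_t), held fixed for this subproblem
        BW = B.T * w               # equals B^T W with W = diag(w)
        x = np.linalg.solve(BW @ B, BW @ a)
    return x
```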


Nonlinear Least Squares Estimation

Nonlinear least squares estimation aims at fitting observations $(a_i, b_i)$ with a nonlinear model of the form $a_i \approx f(b_i, x)$ for some function $f$ parameterized with an unknown vector $x \in \mathbb{R}^d$. Minimizing the sum of squared errors

$$\min_x \sum_i r_i(x)^2, \qquad \text{with } r_i(x) = a_i - f(b_i, x),$$

is generally a non-convex optimization problem.

The optimality condition is given by

$$\sum_i r_i\, \frac{\partial r_i}{\partial x_j} = 0, \qquad \forall\, j \in \{1, \dots, d\}.$$

Typically one cannot solve these equations directly. Yet, there exist iterative algorithms for computing approximate solutions, including Newton methods, the Gauss-Newton algorithm and the Levenberg-Marquardt algorithm.


Newton Methods for Optimization
Newton methods are second order methods: In contrast to
first-order methods like gradient descent, they also make use
of second derivatives. Geometrically, the Newton method iteratively approximates the cost function E(x) quadratically and takes a step to the minimizer of this approximation.

Let xt be the estimated solution after t iterations. Then the
Taylor approximation of E(x) in the vicinity of this estimate is:

$$E(x) \approx E(x_t) + g^\top (x - x_t) + \frac{1}{2} (x - x_t)^\top H\, (x - x_t),$$

The first and second derivatives are denoted by the Jacobian $g = \frac{dE}{dx}(x_t)$ and the Hessian matrix $H = \frac{d^2E}{dx^2}(x_t)$. For this second-order approximation, the optimality condition is:

$$\frac{dE}{dx} = g + H\,(x - x_t) = 0 \qquad (*)$$

Setting the next iterate to the minimizer $x$ leads to:

$$x_{t+1} = x_t - H^{-1} g.$$
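A minimal sketch of this Newton iteration, assuming callables for the gradient and the Hessian are available; solving the linear system $H\,d = g$ is preferred over explicitly inverting $H$:

```python
import numpy as np

def newton_minimize(grad_E, hess_E, x0, num_iter=20):
    """Newton iteration x_{t+1} = x_t - H^{-1} g, where g and H are the
    gradient and Hessian of E evaluated at the current iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        g = grad_E(x)
        H = hess_E(x)
        x = x - np.linalg.solve(H, g)   # solve H d = g rather than inverting H
    return x
```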


Newton Methods for Optimization

In practice, one often chooses a more conservative step size $\gamma \in (0, 1)$:

$$x_{t+1} = x_t - \gamma\, H^{-1} g.$$

When applicable, second-order methods are often faster than
first-order methods, at least when measured in number of
iterations. In particular, there exists a local neighborhood
around each optimum where the Newton method converges
quadratically for γ = 1 (if the Hessian is invertible and Lipschitz
continuous).

For large optimization problems, computing and inverting the
Hessian may be challenging. Moreover, since this problem is
often not parallelizable, some second order methods do not
profit from GPU acceleration. In such cases, one can aim to
iteratively solve the extremality condition (∗).

In case that H is not positive definite, there exist quasi-Newton
methods which aim at approximating H or H−1 with a positive
definite matrix.


The Gauss-Newton Algorithm
The Gauss-Newton algorithm is a method to solve non-linear
least-squares problems of the form:

$$\min_x \sum_i r_i(x)^2.$$

It can be derived as an approximation to the Newton method.
The latter iterates:

$$x_{t+1} = x_t - H^{-1} g,$$

with the gradient $g$:

$$g_j = 2 \sum_i r_i\, \frac{\partial r_i}{\partial x_j},$$

and the Hessian $H$:

$$H_{jk} = 2 \sum_i \left( \frac{\partial r_i}{\partial x_j}\, \frac{\partial r_i}{\partial x_k} + r_i\, \frac{\partial^2 r_i}{\partial x_j\, \partial x_k} \right).$$

Dropping the second order term leads to the approximation:

$$H_{jk} \approx 2 \sum_i J_{ij}\, J_{ik}, \qquad \text{with } J_{ij} = \frac{\partial r_i}{\partial x_j}.$$


The Gauss-Newton Algorithm

The approximation

$$H \approx 2\, J^\top J, \qquad \text{with the Jacobian } J = \frac{dr}{dx},$$

together with $g = 2\, J^\top r$, leads to the Gauss-Newton algorithm:

$$x_{t+1} = x_t + \Delta, \qquad \text{with } \Delta = -(J^\top J)^{-1} J^\top r.$$

In contrast to the Newton algorithm, the Gauss-Newton algorithm does not require the computation of second derivatives. Moreover, the above approximation of the Hessian is by construction positive semi-definite (and positive definite if $J$ has full column rank).

This approximation of the Hessian is valid if

$$\left| r_i\, \frac{\partial^2 r_i}{\partial x_j\, \partial x_k} \right| \ll \left| \frac{\partial r_i}{\partial x_j}\, \frac{\partial r_i}{\partial x_k} \right|.$$

This is the case if the residuum $r_i$ is small or if it is close to linear (in which case the second derivatives are small).
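A minimal Gauss-Newton sketch under the assumption that the residual vector $r(x)$ and its Jacobian $J(x)$ can be evaluated; the names and the fixed iteration count are illustrative:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, num_iter=20):
    """Gauss-Newton iteration x_{t+1} = x_t - (J^T J)^{-1} J^T r.

    residual : callable returning the residual vector r(x), shape (n,)
    jacobian : callable returning the Jacobian J = dr/dx, shape (n, d)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        r = residual(x)
        J = jacobian(x)
        delta = -np.linalg.solve(J.T @ J, J.T @ r)
        x = x + delta
    return x
```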


The Levenberg-Marquardt Algorithm
The Newton algorithm

$$x_{t+1} = x_t - H^{-1} g$$

can be modified (damped):

$$x_{t+1} = x_t - \big(H + \lambda I_n\big)^{-1} g,$$

to create a hybrid between the Newton method ($\lambda = 0$) and gradient descent with step size $1/\lambda$ (for $\lambda \to \infty$).

In the same manner, Levenberg (1944) suggested damping the Gauss-Newton algorithm for nonlinear least squares:

$$x_{t+1} = x_t + \Delta, \qquad \text{with } \Delta = -\big(J^\top J + \lambda I_n\big)^{-1} J^\top r.$$

Marquardt (1963) suggested a more adaptive component-wise damping of the form:

$$\Delta = -\big(J^\top J + \lambda\, \mathrm{diag}(J^\top J)\big)^{-1} J^\top r,$$

which avoids slow convergence in directions of small gradient.
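A sketch of the Levenberg-Marquardt update with Marquardt's diagonal damping; the accept/reject rule for adapting $\lambda$ is a common heuristic and an assumption, not part of the slide:

```python
import numpy as np

def levenberg_marquardt(residual, jacobian, x0, lam=1e-3, num_iter=50):
    """Damped Gauss-Newton step with Marquardt's diagonal scaling:
    delta = -(J^T J + lam * diag(J^T J))^{-1} J^T r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        r = residual(x)
        J = jacobian(x)
        A = J.T @ J
        delta = -np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        x_new = x + delta
        # Common heuristic (assumption, not on the slide): accept the step and
        # decrease the damping if the cost went down, otherwise increase it.
        if np.sum(residual(x_new) ** 2) < np.sum(r ** 2):
            x, lam = x_new, lam * 0.5
        else:
            lam = lam * 2.0
    return x
```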


Summary
Bundle adjustment was pioneered in the 1950s as a technique
for structure and motion estimation in noisy real-world
conditions. It aims at estimating the locations of $N$ 3D points $X^j$ and the camera motions $(R_i, T_i)$, given noisy 2D projections $\tilde{x}_i^j$ in $m$ images.

The assumption of zero-mean Gaussian noise on the 2D
observations leads to the weighted nonlinear least squares
problem:

$$E\big(\{R_i, T_i\}_{i=1..m}, \{X^j\}_{j=1..N}\big) = \sum_{i=1}^{m} \sum_{j=1}^{N} \theta_{ij}\, \big\|\tilde{x}_i^j - \pi(R_i, T_i, X^j)\big\|^2,$$

with θij = 1 if point j is visible in image i , θij = 0 else.

Solutions of this nonconvex problem can be computed by
various iterative algorithms, most importantly the
Gauss-Newton algorithm or its damped version, the
Levenberg-Marquardt algorithm. Bundle adjustment is typically
initialized by an algorithm such as the eight-point or five-point
algorithm.


Example I: From Internet Photo Collections…

Flickr images for search term “Notre Dame”

Snavely, Seitz, Szeliski, “Modeling the world from Internet
photo collections,” IJCV 2008.


…to Sparse Reconstructions

Snavely, Seitz, Szeliski, “Modeling the world from Internet
photo collections,” IJCV 2008.


Example II: Realtime Structure and Motion

Klein & Murray, “Parallel Tracking and Mapping (PTAM) for
Small AR Workspaces,” ISMAR 2007.
