
Please read the announcement in CANVAS carefully!

• Image Filtering
– Linear filter


– Cross-correlation
– Convolution
– Mean filter
– Gaussian Filter
– Image Sharpening

What is an image?
• A grid (matrix) of intensity values
(common to use one byte per value: 0 = black, 255 = white)

• Filtering
– Form a new image whose pixels are a combination of the original pixels
• Why?
– To get useful information from images
• E.g., extract edges or contours (to understand shape)
– To enhance the image
• E.g., to remove noise
• E.g., to sharpen or to “enhance image”

Image filtering
• Modify the pixels in an image based on some function of a local neighborhood of each pixel
Some function
Local image data
Modified image data
Source: L. Zhang

Linear filtering
• One simple version of filtering: linear filtering (cross-correlation, convolution)
– Replace each pixel by a linear combination (a weighted sum) of its neighbors
• The prescription for the linear combination is called the “kernel” (or “mask”, “filter”)
Local image data kernel Modified image data
Source: L. Zhang

Cross-correlation
Let F be the image, H be the kernel (of size (2k+1) × (2k+1)), and G be the output image.
This is called a cross-correlation operation:
G[i, j] = Σ_{u=−k..k} Σ_{v=−k..k} H[u, v] · F[i + u, j + v]
• Can think of it as a “dot product” between the local neighborhood and the kernel at each pixel (see the sketch below)
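Not from the slides: a minimal NumPy sketch of cross-correlation filtering under zero padding; the function name and padding choice are illustrative, not part of the course material.

```python
# Minimal sketch of 2D cross-correlation (filtering), assuming a grayscale
# image F and an odd-sized square kernel H, both as NumPy float arrays.
import numpy as np

def cross_correlate(F, H):
    """G[i,j] = sum_{u,v} H[u,v] * F[i+u, j+v], zero-padded at the borders."""
    k = H.shape[0] // 2
    Fp = np.pad(F, k, mode="constant")              # zero padding
    G = np.zeros_like(F, dtype=float)
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            patch = Fp[i:i + 2 * k + 1, j:j + 2 * k + 1]
            G[i, j] = np.sum(H * patch)             # "dot product" with the kernel
    return G

# Convolution is the same operation with the kernel flipped in both directions:
# cross_correlate(F, np.flip(H))
```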

Convolution
• Same as cross-correlation, except that the kernel is “flipped” (horizontally and vertically)
This is called a convolution operation:
G[i, j] = Σ_{u=−k..k} Σ_{v=−k..k} H[u, v] · F[i − u, j − v], written G = H ∗ F

Mean filtering

Linear filters: Mean filter
Blur (with a mean filter)
Source: D. Lowe

Linear filters: examples
Sharpening filter
Source: D. Lowe

Gaussian Kernel
• 5 × 5, σ = 1
• Constant factor at front makes the volume sum to 1 (can be ignored, as we should re-normalize weights to sum to 1 in any case)

0.003 0.013 0.022 0.013 0.003
0.013 0.059 0.097 0.059 0.013
0.022 0.097 0.159 0.097 0.022
0.013 0.059 0.097 0.059 0.013
0.003 0.013 0.022 0.013 0.003
Source: C.

Sharpening revisited
• What does blurring take away?
  detail = original − smoothed (5×5)
• Let’s add it back: sharpened = original + α · (original − smoothed)
Source: S. Lazebnik
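Not from the slides: a minimal sketch of Gaussian smoothing plus “unsharp” sharpening, assuming the cross_correlate helper sketched earlier; the kernel size, σ, and α defaults are illustrative.

```python
# Minimal sketch of sharpening by adding back the detail removed by blurring.
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()                       # weights sum to 1

def sharpen(img, alpha=1.0, sigma=1.0):
    smoothed = cross_correlate(img, gaussian_kernel(5, sigma))
    detail = img - smoothed                  # what blurring took away
    return img + alpha * detail              # add it back
```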

• Edge Detection
– Image derivatives
– Image gradient
– Laplacian of Gaussian

Image derivatives
• How can we differentiate a digital image F[x,y]?
– Option 1: reconstruct a continuous image, f, then compute the derivative
– Option 2: take discrete derivative (finite difference), e.g., ∂F/∂x [x, y] ≈ F[x+1, y] − F[x, y]
Source: S.

Image gradient
• The gradient of an image: ∇f = [ ∂f/∂x, ∂f/∂y ]
• The gradient points in the direction of most rapid increase in intensity
• The edge strength is given by the gradient magnitude: ‖∇f‖ = √( (∂f/∂x)² + (∂f/∂y)² )
• The gradient direction is given by: θ = tan⁻¹( (∂f/∂y) / (∂f/∂x) )
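Not from the slides: a minimal NumPy sketch of discrete image gradients via central differences, assuming a grayscale float image `img`.

```python
# Compute gradient components, edge strength, and gradient direction.
import numpy as np

def image_gradient(img):
    gy, gx = np.gradient(img.astype(float))   # gy: d/d(row), gx: d/d(col)
    magnitude = np.hypot(gx, gy)              # edge strength
    direction = np.arctan2(gy, gx)            # gradient direction (radians)
    return gx, gy, magnitude, direction
```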

Edge detection: smooth first
To find edges, look for peaks in d/dx (f ∗ g), the derivative of the smoothed image
Source: S.

Derivative of Gaussian filter
x-direction
y-direction

The Sobel operator
• Common approximation of the derivative of Gaussian:
  G_x = (1/8) [ −1 0 +1 ; −2 0 +2 ; −1 0 +1 ],   G_y = (1/8) [ −1 −2 −1 ; 0 0 0 ; +1 +2 +1 ]
• The standard definition of the Sobel operator omits the 1/8 term
  – it doesn’t make a difference for edge detection
  – the 1/8 term is needed to get the right gradient magnitude

Laplacian of Gaussian (LoG)
• Consider the second derivative of the smoothed signal: ∂²/∂x² (f ∗ g)
Laplacian of Gaussian operator
Where is the edge? Zero-crossings of bottom graph

2D edge detection filters
• Gaussian, derivative of Gaussian, Laplacian of Gaussian
• ∇² is the Laplacian operator: ∇²f = ∂²f/∂x² + ∂²f/∂y²

• Image Sampling
– Gaussian pyramid
– Image upsampling

Gaussian pyramid
F0 → blur → subsample → F1 → blur → subsample → F2 → …
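Not from the slides: a minimal sketch of building a Gaussian pyramid, assuming the gaussian_kernel and cross_correlate helpers sketched earlier; the number of levels and kernel size are illustrative.

```python
# Repeatedly blur, then subsample by 2 to build the pyramid [F0, F1, F2, ...].
import numpy as np

def gaussian_pyramid(img, levels=4, sigma=1.0):
    pyramid = [img.astype(float)]             # F0 is the original image
    for _ in range(levels - 1):
        blurred = cross_correlate(pyramid[-1], gaussian_kernel(5, sigma))
        pyramid.append(blurred[::2, ::2])     # keep every other row and column
    return pyramid
```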

Upsampling
• This image is too small for this screen. How can we make it 10 times as big?
• Simplest approach: repeat each row and column 10 times (“nearest neighbor interpolation”)

• Color and Texture
– Histogram
– Edge-based Texture Measures
– Local Binary Pattern Measure
– Co-occurrence Matrix Features

Histograms
• A histogram of a gray-tone image is an array H[*] of bins, one for each gray tone.
• H[i] gives the count of how many pixels of an image have gray tone i.
• P[i] (the normalized histogram) gives the percentage of pixels that have gray tone i.

Two Edge-based Texture Measures
1. Edgeness per unit area:
   Fedgeness = |{ p | gradient_magnitude(p) ≥ threshold }| / N
   where N is the size of the unit area
2. Edge magnitude and direction histograms:
   Fmagdir = ( Hmagnitude, Hdirection )
   where these are the normalized histograms of gradient magnitudes and gradient directions, respectively.

Local Binary Pattern Measure
• For each pixel p, create an 8-bit number b1 b2 b3 b4 b5 b6 b7 b8, where bi = 0 if neighbor i has value less than or equal to p’s value and 1 otherwise.
• Represent the texture in the image (or a region) by the histogram of these numbers.
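Not from the slides: a minimal NumPy sketch of the 8-bit LBP code and its histogram, assuming a grayscale array `img`; the neighbor ordering (and hence bit assignment) is a fixed but arbitrary choice.

```python
import numpy as np

def lbp_image(img):
    img = img.astype(float)
    c = img[1:-1, 1:-1]                      # center pixels
    # 8 neighbors in a fixed order around the center
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, (dr, dc) in enumerate(offsets):
        neighbor = img[1 + dr: img.shape[0] - 1 + dr,
                       1 + dc: img.shape[1] - 1 + dc]
        codes |= ((neighbor > c).astype(np.int32) << bit)   # bit = 1 if neighbor > center
    hist = np.bincount(codes.ravel(), minlength=256)        # texture descriptor
    return codes, hist
```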

Local Binary Pattern (LBP)
LBP_{N,r} = Σ_{p=0}^{N−1} g(N_p − N_c) · 2^p
• N_c : center pixel
• N_p : neighbor pixel p
• r : radius (for a 3×3 cell, it is 1)
• The binary threshold function g(x) is:
  g(x) = 0 if x < 0,  1 if x ≥ 0

Co-occurrence Matrix Features
A co-occurrence matrix is a 2D array C in which
• Both the rows and columns represent a set of possible image values.
• Cd(i, j) indicates how many times value i co-occurs with value j in a particular spatial relationship d.
• The spatial relationship is specified by a vector d = (dr, dc).

Co-occurrence Example
gray-tone image:
1 1 0 0
1 1 0 0
0 0 2 2
0 0 2 2
0 0 2 2
0 0 2 2
[the original slide shows the resulting co-occurrence matrix Cd for a particular d alongside this image]
From Cd we can compute Nd, the normalized co-occurrence matrix, where each value is divided by the sum of all the values.

• Corners and blobs
– Harris corner detection
– Blob detection

Corner detection: the math
Consider shifting the window W by (u, v)
• Define an SSD “error” E(u, v):
  E(u, v) = Σ_{(x,y)∈W} [ I(x + u, y + v) − I(x, y) ]²
• Thus, E(u, v) is locally approximated as a quadratic error function:
  E(u, v) ≈ [u v] M [u v]ᵀ,  where M = Σ_{(x,y)∈W} [ Ix²  IxIy ; IxIy  Iy² ]

Corner detection: the math
The surface E(u, v) is locally approximated by a quadratic form.

Interpreting the eigenvalues
Classification of image points using the eigenvalues λ1, λ2 of M:
• R = λ1 λ2 − k (λ1 + λ2)²
• R is large for a corner
• R is negative (with large magnitude) for an edge
• R is small for a flat region
• “Flat” region: λ1 and λ2 are small; E is almost constant in all directions
• Corner: λ1 and λ2 are large, λ1 ~ λ2; E increases in all directions

Other Versions of the Harris operator
λmin is a variant of the “Harris operator” for feature detection

Harris detector: Invariance properties – Image translation
• Derivatives and window function are shift-invariant
→ Corner location is covariant w.r.t. translation

Harris detector: Invariance properties – Image rotation
→ Corner location is covariant w.r.t. rotation

Harris detector: Invariance properties – Affine intensity change I → aI + b
• Only derivatives are used ⇒
invariance to intensity shift I → I + b
• Intensity scaling: I → a I
[figure: corner response R vs. x (image coordinate), with a fixed threshold line]
Partially invariant to affine intensity change
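Not from the slides: a minimal sketch of the Harris corner response R = det(M) − k·trace(M)², assuming a grayscale float image `img` and the gaussian_kernel / cross_correlate helpers sketched earlier as the window function; k = 0.05 is an illustrative value.

```python
import numpy as np

def harris_response(img, k=0.05, sigma=1.0):
    gy, gx = np.gradient(img.astype(float))      # image derivatives Iy, Ix
    w = gaussian_kernel(5, sigma)                # Gaussian window
    Sxx = cross_correlate(gx * gx, w)            # windowed sums: entries of M
    Syy = cross_correlate(gy * gy, w)
    Sxy = cross_correlate(gx * gy, w)
    det_M = Sxx * Syy - Sxy ** 2                 # = lambda1 * lambda2
    trace_M = Sxx + Syy                          # = lambda1 + lambda2
    return det_M - k * trace_M ** 2              # large: corner, negative: edge
```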

Harris detector: Invariance properties – Scaling
• All points will be classified as edges
• Not invariant to scaling

Scale Selection
• Instead of computing f for larger and larger windows, we can implement this using a fixed window size with a Gaussian pyramid
  (sometimes need to create in-between levels, e.g., a 3/4-size image)

Laplacian of Gaussian
• “Blob” detector
• Find maxima and minima of LoG operator in space and scale

Scale-space blob detector
• Scale-normalized Laplacian response: σ² ( Lxx(σ) + Lyy(σ) )
→ List of (x, y, s)
K. Grauman, B. Leibe

• Scale Invariant Feature Transform
– Building scale-space
– Interest point detection
– Orientation assignment
– SIFT feature descriptor
– SIFT distance calculation

Scale Invariant Feature Transform
• Scale space peak selection
– Potential locations for finding features
• Key point localization
– Accurately locating the feature key points
• Orientation assignment
– Assigning orientation to the key points
• Key point descriptor
– Describing the key point as a high dimensional vector (128) (SIFT descriptor)
Adapted from slide by

Building the Scale Space
Adapted from IJCV 2004 by D. Lowe

Peak Detection
• A point is a peak if it is a local extremum of the DoG response among its 26 neighbors: 8 in the same scale and 9 in each of the two adjacent scales
Adapted from slide by

Assignment of the Orientation
• An orientation histogram is formed from the gradient orientations of sample points within a region around the keypoint.
• The orientation histogram has 36 bins covering the 360 degree range of orientations.
• The samples added to the histogram are weighted by the gradient magnitude.
• The dominant direction is the peak in the histogram.
Adapted from IJCV 2004 by D. Lowe

SIFT descriptor (full version)
• Divide the 16×16 window into a 4×4 grid of cells (2×2 case shown below)
• Compute an orientation histogram (8 bin) for each cell (relative orientation and magnitude)
• 16 cells * 8 orientations = 128 dimensional descriptor
Adapted from slide by

Feature distance
How to define the difference between two features f1, f2? – Simple approach: L2 distance, ||f1 – f2 || (aka SSD)
– can give good scores to ambiguous (incorrect) matches

Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = ||f1 – f2 || / || f1 – f2’ ||
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
• gives large values for ambiguous matches
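Not from the slides: a minimal sketch of matching descriptors with the ratio test, assuming two descriptor arrays desc1 (n1 × 128) and desc2 (n2 × 128) as NumPy floats; 0.8 is a commonly used threshold, not a value specified here.

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio_threshold=0.8):
    matches = []
    for i, f1 in enumerate(desc1):
        dists = np.linalg.norm(desc2 - f1, axis=1)      # L2 distance to all of I2
        best, second = np.argsort(dists)[:2]            # best and 2nd-best match
        if dists[best] / dists[second] < ratio_threshold:   # reject ambiguous matches
            matches.append((i, best))
    return matches
```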

• Bag of Words
– The BoW representation
– TF-IDF weighting
– Inverted file

Images as histograms of visual words
• Inspired by ideas from text retrieval
– [Sivic and Zisserman, ICCV 2003]
visual words

TF (term frequency)- IDF(inverse document frequency) weighting
• Instead of computing a regular histogram distance, we’ll weight each word by its inverse document frequency
• inverse document frequency (IDF) of word j =
  log( number of documents / number of documents in which j appears )

TF-IDF weighting
• To compute the value of bin j in image I:
term frequency of j in I x inverse document frequency of j
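Not from the slides: a minimal sketch of TF-IDF weighting for bag-of-visual-words histograms, assuming `counts` is an (n_images × n_words) matrix of raw visual-word counts.

```python
import numpy as np

def tfidf_weight(counts):
    counts = counts.astype(float)
    n_images = counts.shape[0]
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # term frequency per image
    df = np.maximum((counts > 0).sum(axis=0), 1)                     # document frequency per word
    idf = np.log(n_images / df)                                      # inverse document frequency
    return tf * idf                                                  # weighted histograms
```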

Inverted file
• Each image has ~1,000 features
• We have ~1,000,000 visual words
→each histogram is extremely sparse (mostly zeros)
• Inverted file
– mapping from words to documents

• Transformation and Alignment
– Image warping
– All 2D Linear Transformations
– Homogeneous coordinates
– Affine Transformations

Forward Warping
• Send each pixel f(x,y) to its corresponding
location (x’,y’) = T(x,y) in g(x’,y’)
• What if pixel lands “between” two pixels?
• Answer: add “contribution” to several pixels, normalize later
• Can still result in holes
[figure: pixel f(x, y) is sent through T(x, y) to location (x′, y′) in g(x′, y′)]

Inverse Warping
• Get each pixel g(x’,y’) from its corresponding
location (x,y) = T-1(x’,y’) in f(x,y)
• Requires taking the inverse of the transform
• What if pixel comes from “between” two pixels?
[figure: pixel g(x′, y′) is fetched from location T⁻¹(x′, y′) in f(x, y)]

All 2D Linear Transformations
• Linear transformations are combinations of …
– Scale,
– Rotation,
– Shear, and
– Mirror
• Represented by a 2×2 matrix:
  [ x′ ]   [ a  b ] [ x ]
  [ y′ ] = [ c  d ] [ y ]
• Any composition of linear transformations is again a 2×2 matrix (the product of the individual matrices):
  [ x′ ]   [ a  b ] [ e  f ] [ i  j ] [ x ]
  [ y′ ] = [ c  d ] [ g  h ] [ k  l ] [ y ]

Homogeneous coordinates
Trick: add one more coordinate:
  (x, y) ⇒ (x, y, 1)   (homogeneous image coordinates)
On the homogeneous plane w = 1, the point (x, y, w) corresponds to (x/w, y/w, 1).
Converting from homogeneous coordinates:
  (x, y, w) ⇒ (x/w, y/w)

Affine Transformations
• Affine transformations are combinations of …
– Linear transformations, and
– Translations
x’ 1 0 tx x y’ = 0 1 ty y
1 0 0 11     
x’ cos −sin 0x y’ = sin cos 0y
10 0 11     
x’ a b cx y’=d e fy
w 0 0 1w
x’ sx 0 0x y’ =  0 sy 0y 1 0 0 11
x’ 1 shx 0x y’ = shy 1 0y
1 0 0 11     
2D in-plane rotation
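Not from the slides: a minimal sketch of composing these transforms in homogeneous coordinates and applying them to points; the angle, scale, and translation values are illustrative.

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scale(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

# Compose: first scale, then rotate, then translate (applied right to left).
A = translation(10, 5) @ rotation(np.pi / 6) @ scale(2, 2)

pts = np.array([[0, 0, 1], [1, 0, 1], [0, 1, 1]], dtype=float).T   # homogeneous columns
warped = A @ pts
xy = warped[:2] / warped[2]            # convert back from homogeneous coordinates
```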

Simple case: translations
Displacement of match i = (x′_i − x_i, y′_i − y_i)
Mean displacement = ( (1/n) Σ_i (x′_i − x_i), (1/n) Σ_i (y′_i − y_i) )

RANSAC • General version:
1. Randomly choose s samples
• Typically s = minimum sample size that lets you fit a model
2. Fit a model (e.g., line) to those samples
3. Count the number of inliers that approximately fit the model
4. Repeat N times
5. Choose the model that has the largest set of inliers
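Not from the slides: a minimal sketch of RANSAC for the simple translation model above, assuming matched point arrays pts1 and pts2 of shape (n, 2); the iteration count and inlier threshold are illustrative choices.

```python
import numpy as np

def ransac_translation(pts1, pts2, n_iters=100, inlier_thresh=3.0):
    best_t, best_inliers = None, np.zeros(len(pts1), dtype=bool)
    for _ in range(n_iters):
        i = np.random.randint(len(pts1))          # s = 1 sample fits a translation
        t = pts2[i] - pts1[i]                     # candidate model
        residuals = np.linalg.norm(pts2 - (pts1 + t), axis=1)
        inliers = residuals < inlier_thresh       # points that approximately fit
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    if best_inliers.any():                        # refit on all inliers of best model
        best_t = pts2[best_inliers].mean(axis=0) - pts1[best_inliers].mean(axis=0)
    return best_t, best_inliers
```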

– Pinhole camera
– Camera parameters
– Modeling projection

Pinhole and the Perspective Projection
image plane
r=(x, y,z)
optical axis
effective focal length, f’
r’=(x’, y’, f ‘)
Perspective projection (from similar triangles):
r′ / f′ = r / z,  i.e.,  x′ = f′ · x / z,  y′ = f′ · y / z

Pinhole camera model
Figure from Forsyth
f = Focal length
c = Optical center of the camera
Real object

Camera parameters
• How can we model the geometry of a camera?
Two important coordinate systems:
1. World coordinate system
2. Camera coordinate system
“The World”

Camera parameters
• To project a point (x, y, z) in world coordinates into a camera image:
• First transform (x,y,z) into camera coordinates
• Need to know
– Camera position (in world coordinates)
– Camera orientation (in world coordinates)
• The formation of the image frame
  – Need to know camera intrinsics

Intrinsic Parameters
• In the image frame, denote the location of the principal point on the image plane as (c_x, c_y)
• Image principal point: intersection between the camera optical axis and the image plane
• P′ = (x′, y′) = ( f·x/z + c_x, f·y/z + c_y )

Intrinsic Parameters
• Points in the digital image are expressed in pixels
• Points in the image plane are represented in physical measurements (e.g., centimeters)
• The mapping between the digital image and the image plane is therefore something like pixels/cm
• We can use two parameters, k and l, to describe the mapping. If k = l, then the camera has “square pixels”.
• The equation now becomes:
  P′ = (x′, y′) = ( f·k·x/z + c_x, f·l·y/z + c_y ) = ( α·x/z + c_x, β·y/z + c_y ),  with α = f·k, β = f·l

Intrinsic Parameters
In matrix form (homogeneous coordinates):
  P′ = [ α  0  c_x  0 ]   [ x ]
       [ 0  β  c_y  0 ] · [ y ]  = M P
       [ 0  0   1   0 ]   [ z ]
                          [ 1 ]
  P′ = M P = K [ I  0 ] P,  where K = [ α  0  c_x ; 0  β  c_y ; 0  0  1 ]
K: camera matrix (or calibration matrix)
(Equivalently, P′ = (x′, y′) = ( α·x/z + c_x, β·y/z + c_y ))

Extrinsic Parameters
• What if the information about the 3D world is available in a different coordinate system?
• We need to relate the points from world reference system to the camera reference system
• Given a point P_w in the world reference system, the camera coordinate is computed as
  P = [ R  T ] P_w
  (R: rotation, T: translation between the two frames)

3D → 2D pipeline:
World coordinate → (Extrinsic Parameters) → Camera coordinate → (Intrinsic Parameters) → Pixel coordinate

Projection Matrix
• Combining intrinsic and extrinsic parameters, we have
  P′ = K [ R  T ] P_w = M P_w
  (K: intrinsic parameters; [ R  T ]: extrinsic parameters)
• K changes as the type of camera changes
• Extrinsic parameters are independent of the camera; they describe the camera’s pose in the world
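Not from the slides: a minimal sketch of projecting 3D world points with P′ = K [R | T] P_w; the intrinsics (focal length in pixels, principal point) and the identity pose below are purely illustrative.

```python
import numpy as np

def project(points_w, K, R, T):
    """points_w: (n, 3) world points; returns (n, 2) pixel coordinates."""
    P_w = np.hstack([points_w, np.ones((len(points_w), 1))])   # homogeneous (n, 4)
    M = K @ np.hstack([R, T.reshape(3, 1)])                    # 3x4 projection matrix
    p = (M @ P_w.T).T                                          # (n, 3) homogeneous pixels
    return p[:, :2] / p[:, 2:3]                                # divide by the last coordinate

# Example with illustrative numbers: square pixels, principal point at (320, 240)
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R, T = np.eye(3), np.zeros(3)
pix = project(np.array([[0.1, 0.2, 2.0]]), K, R, T)
```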

• Stereo Vision and Structure from Motion
– Depth and Disparity
– Epipolar geometry
– Stereo matching
– Structure from motion: problem definition

Optic axes of the 2 cameras are parallel
• Camera centers are separated by the baseline b; both image planes are at focal length f
• From similar triangles:
  x_l / f = x / z,   x_r / f = (x − b) / z,   y_l / f = y_r / f = y / z
• P = (x, y, z); the Y-axis is perpendicular to the page.

3D from Stereo Images
For stereo cameras with parallel optical axes, focal length f, baseline b, corresponding image points (xl,yl) and (xr,yr), the location of the 3D point can be derived from previous slide’s equations:
Depth: z = f·b / (x_l − x_r) = f·b / d
x = x_l·z / f  or  x = b + x_r·z / f
y = y_l·z / f  or  y = y_r·z / f
Note that depth is inversely proportional to disparity
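Not from the slides: a minimal sketch of depth from disparity for rectified stereo with parallel optical axes, assuming a disparity map d = x_l − x_r in pixels, focal length f in pixels, and baseline b in meters.

```python
import numpy as np

def depth_from_disparity(disparity, f, b):
    d = np.where(disparity > 0, disparity, np.nan)   # mark non-positive disparities invalid
    return f * b / d                                 # z = f*b/d (inversely proportional)
```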

Epipolar geometry: notation
• Baseline – line connecting the two camera centers
• Epipoles
= intersections of baseline with image planes = projections of the other camera center
• Epipolar Plane – plane containing baseline (1D family)
• Epipolar Lines – intersections of epipolar plane with image
planes (always come in corresponding pairs)

• Slide a window along the right scanline and compare contents of that window with the reference window in the left image
• Matching cost: SSD, SAD, or normalized cross correlation
[figure: matching cost plotted along the scanline]

Similarity Measure
Sum of Absolute Differences (SAD)
Sum of Squared Differences (SSD)
Zero-mean SAD
Normalized Cross Correlation (NCC)
[figure: matching results with SAD, SSD, and NCC windows, compared against the ground truth]
http://siddhantahuja.wordpress.com/category/stereo-vision/
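Not from the slides: a minimal sketch of the three window-based measures above, assuming two equally sized patches (NumPy float arrays) cut from the left and right images.

```python
import numpy as np

def sad(w1, w2):
    return np.sum(np.abs(w1 - w2))                   # Sum of Absolute Differences (cost)

def ssd(w1, w2):
    return np.sum((w1 - w2) ** 2)                    # Sum of Squared Differences (cost)

def ncc(w1, w2):
    a, b = w1 - w1.mean(), w2 - w2.mean()
    # Normalized Cross Correlation: a similarity score (higher is better)
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```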

Structure from motion
minimize g(R, T, X)

• Optical Flow
– Assumptions in Lucas-Kanade method
– Lucas-Kanade Algorithm (Brightness Constancy Equation)

Optical Flow
I(x,y,t) I(x,y,t+1)
• Given two subsequent frames, estimate the point
translation
• Key assumptions of Lucas-Kanade Tracker
• Brightness constancy: projection of the same point looks the same in every frame
• Small motion: points do not move very far
• Spatial coherence: points move like their neighbors

The brightness constancy constraint
• Brightness Constancy Equation:
I ( x , y , t ) = I ( x + u , y + v , t + 1)
Take the Taylor expansion of I(x+u, y+v, t+1) at (x, y, t) to linearize the right side:
  I(x+u, y+v, t+1) ≈ I(x, y, t) + Ix·u + Iy·v + It
  (Ix, Iy: image derivatives along x and y;  It = I(x, y, t+1) − I(x, y, t): difference over frames)
Hence
  I(x+u, y+v, t+1) − I(x, y, t) = Ix·u + Iy·v + It
  Ix·u + Iy·v + It ≈ 0   →   ∇I · [u v]ᵀ + It = 0

Iterative Refinement
• Iterative Lucas-Kanade Algorithm
1. Estimate velocity at each pixel by solving Lucas- Kanade equations
2. Warp I(t−1) towards I(t) using the estimated flow field
– use image warping techniques
3. Repeat until convergence

Coarse-to-fine optical flow estimation
• Build Gaussian pyramids of image 1 (t) and image 2 (t+1)
• Run iterative L-K at the coarsest level
• Warp & upsample, then run iterative L-K at the next finer level; repeat down the pyramid

A Few Details
• Top Level
– Apply L-K to get a flow field representing the flow from the first frame to the second frame.
– Apply this flow field to warp the first frame toward the second frame.
– Rerun L-K on the new warped image to get a flow field from it to the second frame.
– Repeat till convergence.
• Next Level
– Upsample the flow field to the next level as the first guess of the flow at that level.
– Apply this flow field to warp the first frame toward the second frame.
– Rerun L-K and warping till convergence as above.
• Etc.
