4/6/2021
CSE 473/573
Introduction to Computer Vision and Image Processing
Mobile Retriever: Use of context
• Represent each document page as a “bag of visual words”
• Build a reverse (inverted) index of visual words from the lexicon
• Perform ranked retrieval based on the number of “hits” for each document
• Verify top candidates
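
The indexing and ranking steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Mobile Retriever's actual implementation: visual words are assumed to be integer IDs, a "hit" is counted once per distinct query word that a page shares, and `build_inverted_index`, `rank_pages`, and the toy pages are hypothetical names and data.

```python
from collections import Counter, defaultdict


def build_inverted_index(pages):
    """Map each visual word ID to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, words in pages.items():
        for word in words:
            index[word].add(page_id)
    return index


def rank_pages(index, query_words):
    """Rank pages by the number of distinct query words they share ("hits")."""
    hits = Counter()
    for word in set(query_words):
        for page_id in index.get(word, ()):
            hits[page_id] += 1
    return hits.most_common()


# Toy database (hypothetical): each page is a bag of visual word IDs.
pages = {
    "page_a": [3, 17, 42, 42, 99],
    "page_b": [3, 8, 15],
    "page_c": [7, 8, 9],
}
index = build_inverted_index(pages)
ranking = rank_pages(index, [3, 42, 99, 8])
print(ranking)
```

Only the pages at the top of `ranking` would be passed on to the verification step.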
Layout Context: Visual Words without OCR
Relative coordinates of the n most visible neighbors
With n = 5, one layout context consists of 5 × 2 × 2 = 20 integers
Experiments show that approximately 100 accurate layout contexts are enough for retrieval, so each query has a small footprint: 100 × 20 × 2 bytes = 4,000 bytes ≈ 4 KB
Invariant to:
• Translation
• Rotation
• Scaling
Stable with “long” words only
Using Slope to Determine Orientation
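Consider the slopes…
[Figure: two point triplets, A, B, C and A’, B’, C’, with opposite orientations.]
The sign of Slope(BC) – Slope(AB) tells the two orientations apart: it is < 0 for one triplet and > 0 for the other.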
Verification
• Use the spatial relationship between “visual words”
• Orientation of triplet is invariant under rotation, translation, scaling, warping and even crinkling
[Figure: a reference triplet A, B, C compared with triplets A’, B’, C’ and A’’, B’’, C’’; each comparison is marked + or –.]
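
A minimal sketch of the orientation test follows. It uses the sign of a 2D cross product instead of the slope difference on the earlier slide (a substitution that avoids dividing by zero for vertical segments), and it assumes a coordinate system with the y-axis pointing up. The function names and the sample points are hypothetical.

```python
from itertools import combinations


def orientation(a, b, c):
    """Return +1 if A -> B -> C turns counterclockwise, -1 if clockwise, 0 if collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)


def count_consistent_triplets(query_pts, page_pts):
    """Count triplets whose orientation agrees or disagrees between matched point lists."""
    agree = disagree = 0
    for i, j, k in combinations(range(len(query_pts)), 3):
        o_query = orientation(query_pts[i], query_pts[j], query_pts[k])
        o_page = orientation(page_pts[i], page_pts[j], page_pts[k])
        if o_query == 0 or o_page == 0:
            continue  # collinear triplets carry no orientation information
        if o_query == o_page:
            agree += 1
        else:
            disagree += 1
    return agree, disagree


# Hypothetical matched points: page_pts is query_pts rotated by 90 degrees,
# scaled by 2, and translated, so every triplet keeps its orientation.
query_pts = [(0, 0), (1, 0), (1, 1), (0, 2)]
page_pts = [(-2 * y + 5, 2 * x + 3) for x, y in query_pts]
print(count_consistent_triplets(query_pts, page_pts))
```

Mirroring one of the point sets would flip every orientation, so all triplets would land in the `disagree` count.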
Scoring
• When viewed from another angle
• Score is simply
Verification
• Approximate estimation
[Figure: third point at C (50%) or at C’ (50%).]
One triplet:
P(A, B, C clockwise) = 0.5
P(A, B, C counterclockwise) = 0.5
Assuming triplets are independent, the probability that M out of N triplets accidentally satisfy orientation verification is
$$Q(N, M) = \binom{N}{M} \left(\frac{1}{2}\right)^{N}$$
N     M     Q(N, M)
10    5     0.25
30    20    0.027
50    40    0.0001
60    50    6.5×10^-8
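
The chance-agreement probability can be checked numerically. The sketch below assumes the intended expression is the binomial term C(N, M) · (1/2)^N, which reproduces the (10, 5) and (60, 50) rows of the table. The function name `chance_agreement` is hypothetical.

```python
from math import comb


def chance_agreement(n, m):
    """Probability that exactly m of n independent triplets agree by chance (p = 0.5 each)."""
    return comb(n, m) / 2 ** n


for n, m in [(10, 5), (30, 20), (50, 40), (60, 50)]:
    print(n, m, chance_agreement(n, m))
```

Under this formula the (50, 40) row evaluates to roughly 9.1 × 10^-6, which is smaller than the 0.0001 listed, so that entry may be a loose bound or a transcription slip.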
Good match: score = 1538
Bad match (page not in DB): score = 332
It is safe to set the score threshold to 1000.
[Figure: score of token triplet verification (vertical axis, about -1000 to 1500) versus query ID (horizontal axis, 0 to 100), shown without triplet verification and with triplet verification.]
Instance recognition: remaining issues
• How to summarize the content of an entire image? And gauge overall similarity?
• How large should the vocabulary be? How to perform quantization efficiently?
• Is having the same set of visual words enough to identify the object/scene? How to verify spatial agreement?
• How to score the retrieval results?
Kristen Grauman
Precision and Recall
• Precision – How precise are the answers you gave?
  • (# relevant returned)/(# total returned)
• Recall – How many did you find of the ones that could be found?
  • (# relevant returned)/(# total relevant)
• F Measure – Harmonic mean of precision and recall
  • (2*Precision*Recall)/(Precision+Recall)
Project #1, Character Detection:
• Relevant = Correct
• Returned = Guesses
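
The three measures can be computed directly from the set of returned items and the set of relevant items. This is a minimal sketch: the function name `precision_recall_f` and the example item labels are hypothetical, and empty inputs are mapped to 0.0 as a convention chosen here.

```python
def precision_recall_f(returned, relevant):
    """Compute precision, recall, and F measure for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)  # relevant items that were actually returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: 4 guesses, 5 correct answers, 2 in common.
p, r, f = precision_recall_f({"a", "b", "c", "d"}, {"a", "c", "e", "f", "g"})
print(p, r, f)
```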
Precision and Recall Curve
Database size: 10 images
Relevant (total): 5 images
Query results (ordered): [images not recoverable]
precision = #relevant / #returned
recall = #relevant / #total relevant
[Figure: precision (vertical axis, 0 to 1) versus recall (horizontal axis, 0 to 1) as more of the ordered results are returned.]
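
A precision-recall curve is obtained by walking down the ordered results and recomputing both measures after each returned item. The sketch below assumes a database of 10 images with 5 relevant ones, as on the slide. The relevance pattern in `ranked_relevance` is invented for illustration, because the slide's actual result images are not available.

```python
def precision_recall_curve(ranked_relevance, total_relevant):
    """Return (recall, precision) after each position in a ranked result list."""
    points = []
    hits = 0
    for k, is_relevant in enumerate(ranked_relevance, start=1):
        hits += bool(is_relevant)
        points.append((hits / total_relevant, hits / k))
    return points


# Hypothetical ranking: 1 marks a relevant image, 0 an irrelevant one.
ranked_relevance = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
curve = precision_recall_curve(ranked_relevance, total_relevant=5)
for recall, precision in curve:
    print(f"recall={recall:.1f} precision={precision:.2f}")
```

Recall never decreases as more results are returned, while precision drops each time an irrelevant image appears.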
What else can we borrow from text retrieval?
[Figure: a news article summarized by its most frequent words.]
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004’s $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China’s exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
Words: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
Slide credit: Ondrej Chum
tf-idf weighting
• Term frequency – inverse document frequency
• Describe image by frequency of each word within it, scale down words that appear often in the database
Document 1:
How can the grass on the greens at a golf course be so perfect? For example, a skilled golfer expects to reach the green on a par-four hole in …manufactures and sells synthetic golf putting greens and mats.
Document 2:
For Sale: Volkswagen Golf, 1999, Green, 2000cc, petrol, manual, Green Interior, great car, hatchback, 94000miles, 2.0 GTi, 2 Registered Keepers, HPI Checked, Air-Conditioning, Front and Rear Parking Sensors, ABS, Alarm, Alloy, Reliable Manufacturer
Kristen Grauman
tf-idf weighting
• Standard weighting for text retrieval:
$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$
where
• n_{id} = number of occurrences of word i in document d
• n_d = number of words in document d
• N = total number of documents in the database
• n_i = number of documents in the whole database in which word i occurs
Kristen Grauman
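
A minimal tf-idf sketch follows, built from the four quantities named on the slide: occurrences of a word in a document, words in the document, total documents, and documents containing the word. It assumes the standard product of term frequency and log inverse document frequency. The function name `tfidf_vectors` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {word: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for words in documents:
        doc_freq.update(set(words))  # count each word once per document
    vectors = []
    for words in documents:
        counts = Counter(words)
        n_d = len(words)
        if n_d == 0:
            vectors.append({})
            continue
        vectors.append({
            word: (count / n_d) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return vectors


# Toy corpus (hypothetical); for images, the words would be visual word IDs.
docs = [
    ["golf", "green", "golf", "grass"],
    ["golf", "car", "green", "green"],
    ["car", "alarm"],
]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

In the first document "grass" occurs once and "golf" twice, yet "grass" receives the larger weight because it appears in only one document of the corpus.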
Query expansion
Query: golf green
Results:
- How can the grass on the greens at a golf course be so perfect?
- For example, a skilled golfer expects to reach the green on a par-four hole in …
- Manufactures and sells synthetic golf putting greens and mats.
An irrelevant result can cause “topic drift”:
- Volkswagen Golf, 1999, Green, 2000cc, petrol, manual, hatchback, 94000miles, 2.0 GTi, 2 Registered Keepers, HPI Checked, Air-Conditioning, Front and Rear Parking Sensors, ABS, Alarm, Alloy
Slide credit: Ondrej Chum
Query Expansion
[Figure: query image → results → spatial verification → new query → new results → …]
Chum, Philbin, Sivic, Isard, Zisserman: Total Recall…, ICCV 2007
Slide credit: Ondrej Chum
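
One simple way to form the new query is to average the original query's word histogram with the histograms of the spatially verified results. The sketch below illustrates that averaging idea only. It is not claimed to be the exact procedure of the cited paper, and `expand_query` and the sample vectors are hypothetical.

```python
def expand_query(query_vec, verified_result_vecs):
    """Average the query histogram with those of spatially verified results."""
    vectors = [query_vec] + list(verified_result_vecs)
    words = set().union(*vectors)  # every word seen in any of the vectors
    return {
        word: sum(vec.get(word, 0.0) for vec in vectors) / len(vectors)
        for word in words
    }


# Hypothetical sparse histograms keyed by (visual) word.
query = {"golf": 1.0, "green": 1.0}
verified = [{"golf": 1.0, "grass": 2.0}]
new_query = expand_query(query, verified)
print(new_query)
```

Only results that pass spatial verification should be averaged in. Adding an unverified, irrelevant result is what produces topic drift.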
Recognition via alignment
Pros:
• Effective when we are able to find reliable features within clutter
• Great results for matching specific instances
Cons:
• Scaling with number of models
• Spatial verification as post-processing – not seamless, expensive for large-scale problems
• Not suited for category recognition.
Kristen Grauman
Summary
• Matching local invariant features
  • Useful not only to provide matches for multi-view geometry, but also to find objects and scenes.
• Bag of words representation: quantize feature space to make discrete set of visual words
  • Summarize image by distribution of words
  • Index individual words
• Inverted index: pre-compute index to enable faster search at query time
• Recognition of instances via alignment: matching local features followed by spatial verification
  • Robust fitting: RANSAC, GHT
Kristen Grauman
Lessons from a Decade Later
• For category recognition
  • Bag of feature models remained the state of the art until deep learning.
  • Spatial layout either isn’t that important or it’s too difficult to encode.
  • Quantization error is, in fact, the bigger problem. Advanced feature encoding methods address this.
  • Bag of feature models are nearly obsolete. At best they seem to be inspiring tweaks to deep models, e.g. NetVLAD.
James Hays
Lessons from a Decade Later
• For instance retrieval (this lecture)
  • Deep learning has taken over.
  • Learn better local features (replace SIFT), e.g. MatchNet
  • or learn better image embeddings (replace the histograms of visual features), e.g. Vo and Hays 2016.
  • or learn to do spatial verification, e.g. DeTone, Malisiewicz, and Rabinovich 2016.
  • or learn a monolithic deep network to recognize all locations, e.g. Google’s PlaNet 2016.
James Hays
Things to remember
• Object instance recognition
  – Find keypoints, compute descriptors
  – Match descriptors
  – Vote for / fit affine parameters
  – Return object if # inliers > T
• Keys to efficiency
  – Visual words
    • Used for many applications
  – Inverse document file
    • Used for web-scale search
Context
Three papers on computational models of context:
• A. Torralba, K. P. Murphy, and W. T. Freeman, “Contextual models for object detection using boosted random fields,” in Advances in Neural Information Processing Systems 17 (NIPS), 2005.
• D. Hoiem, A. A. Efros, and M. Hebert, “Putting objects in perspective,” in Computer Vision and Pattern Recognition, 2006.
• G. Heitz and D. Koller, “Learning spatial context: Using stuff to find things,” in ECCV 2008, pp. 30-43.
Why is detection hard?
[Figure: detection as a search over image position (x, y) and time.]
• 10,000 patches/object/image
• 1,000,000 images/day
• We want to do this for ~1000 objects
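
Multiplying the three figures on this slide gives a sense of the workload. The short calculation below is a back-of-the-envelope sketch that treats every patch of every object in every image as one classifier evaluation. The variable and function names are hypothetical.

```python
def patch_evaluations_per_day(patches_per_object_per_image, images_per_day, num_objects):
    """Total classifier evaluations per day for an exhaustive patch search."""
    return patches_per_object_per_image * images_per_day * num_objects


total = patch_evaluations_per_day(
    patches_per_object_per_image=10_000,
    images_per_day=1_000_000,
    num_objects=1_000,
)
print(f"{total:.0e} patch evaluations per day")
```

At this volume, even a tiny per-patch error rate translates into a very large absolute number of mistakes.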
Is local information enough?
Slide credit: A. Torralba
[Figure: scene with detector outputs labeled chair, table, car, road, keyboard, table, road.]
If we have 1000 categories (detectors), and each detector produces one false alarm every 10 images, we will have 100 false alarms per image… pretty much garbage…
Slide credit: A. Torralba
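
The false-alarm estimate is a one-line expectation, sketched below under the assumption that detectors fire independently and that each has the same false-alarm rate. The function name is hypothetical.

```python
def expected_false_alarms_per_image(num_detectors, images_per_false_alarm):
    """Expected false alarms per image when each detector errs once every k images."""
    return num_detectors / images_per_false_alarm


expected = expected_false_alarms_per_image(num_detectors=1000, images_per_false_alarm=10)
print(expected)
```

To get down to about one false alarm per image with 1000 detectors, each detector would need to produce at most one false alarm per 1000 images.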
Is local information even enough?
[Figure: information (vertical axis) versus distance (horizontal axis), comparing local features and contextual features.]
Slide credit: A. Torralba
We know there is a keyboard present in this scene even if we cannot see it clearly.
We know there is no keyboard present in this scene … even if there is one indeed.
Slide credit: A. Torralba
The multiple personalities of a blob
Slide credit: A. Torralba
Look-Alikes by Joan Steiner
Slide credit: A. Torralba
The context challenge
How far can you go without using an object detector?
Slide credit: A. Torralba
[Figure: two example images, labeled 1 and 2.]
Slide credit: A. Torralba
What are the hidden objects?
Chance ~ 1/30000
Slide credit: A. Torralba
The importance of context
• Cognitive psychology
  • Palmer 1975
  • Biederman 1981
  • …
• Computer vision
  • Noton and Stark (1971)
  • Hanson and Riseman (1978)
  • Barrow & Tenenbaum (1978)
  • Ohta, Kanade, Sakai (1978)
  • Haralick (1983)
  • Strat and Fischler (1991)
  • Bobick and Pinhanez (1995)
  • Campbell et al (1997)
Slide credit: A. Torralba
Next Lecture: Object Detection