Images as histograms of visual words
• Inspired by ideas from text retrieval
– [Sivic and Zisserman, ICCV 2003]
[Figure: examples of visual words]
Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
Quantize: approximate a continuous value by one whose amplitude is restricted to a prescribed set of values.
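A minimal sketch of steps 3–4 in NumPy (function and variable names are illustrative, not from the slides): each local descriptor is assigned to its nearest visual word, and the image is represented by the normalized word-frequency histogram.

import numpy as np

def bow_histogram(descriptors, vocabulary):
    # descriptors: (n, d) local features (e.g., SIFT); vocabulary: (k, d) cluster centers
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = dists.argmin(axis=1)  # index of the nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary))
    return hist / hist.sum()  # word frequencies (assumes at least one descriptor)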
1. Feature extraction
Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla ’02] [Sivic & Zisserman ’03]
Normalize patch
Compute SIFT descriptor
2. Learning the visual vocabulary
Cluster the extracted features; the cluster centers form the visual vocabulary.
Large-scale image search
• Build the database:
– Extract features from the database images
– Learn a vocabulary using k-means (typical k: 100,000)
– Compute weights for each word
– Create an inverted file mapping words → images
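A sketch of these database-building steps, assuming scikit-learn is available (build_database and its inputs are illustrative names, not from the slides):

import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_database(all_descriptors, image_ids, k=1000):
    # all_descriptors: list of (n_i, d) arrays, one per database image
    # (the slides suggest k around 100,000; a small k keeps this sketch fast)
    kmeans = MiniBatchKMeans(n_clusters=k).fit(np.vstack(all_descriptors))
    inverted_file = {}  # visual word -> set of images containing it
    for img, desc in zip(image_ids, all_descriptors):
        for w in set(kmeans.predict(desc)):
            inverted_file.setdefault(w, set()).add(img)
    # IDF weight per word: log(number of images / number of images containing the word)
    n_docs = len(image_ids)
    idf = {w: np.log(n_docs / len(imgs)) for w, imgs in inverted_file.items()}
    return kmeans, inverted_file, idf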
Weighting the words
• Just as with text, some visual words are more discriminative than others
– e.g., “the”, “and”, “or” vs. “cow”, “AT&T”, “Cher”
• The larger the fraction of documents a word appears in, the less useful it is for matching
– e.g., a word that appears in every document does not help us discriminate
TF-IDF weighting (term frequency × inverse document frequency)
• Instead of computing a regular histogram distance, we’ll weight each word by its inverse document frequency
• Inverse document frequency (IDF) of word j:
IDF(j) = log( number of documents / number of documents in which j appears )
TF-IDF weighting
• To compute the value of bin j in image I:
value(j, I) = (term frequency of j in I) × (inverse document frequency of j)
TF-IDF weighting
[Table: five example documents. Document 1 reads “my dog has flea problem help my dog” (8 words); the other documents survive only as fragments such as “I love him”, “don't buy dog”, “maybe not take … park stupid”.]
For word ‘dog’ in document 1:
TF (term frequency) = 2/8 = 0.25
IDF (inverse document frequency) = log(5/(3+1)) = 0.097 (the +1 in the denominator avoids division by zero)
TF-IDF = 0.25 × 0.097 = 0.02425
(The same five example documents as above.)
For word ‘flea’ in document 1:
TF (term frequency) = 1/8 = 0.125
IDF (inverse document frequency) = log(5/(1+1)) = 0.398
TF-IDF = 0.125 × 0.398 = 0.04975
Analysis: ‘dog’ appears twice in document 1 while ‘flea’ appears once. However, ‘dog’ also appears in documents 3 and 4, so ‘dog’ is less discriminative than ‘flea’: the weight of ‘dog’ (0.097) < the weight of ‘flea’ (0.398).
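The worked example above can be reproduced in a few lines of Python (base-10 logarithm, with +1 in the denominator to avoid division by zero, as in the slides):

import math

def tf_idf(count, doc_len, n_docs, n_docs_with_word):
    tf = count / doc_len
    idf = math.log10(n_docs / (n_docs_with_word + 1))
    return tf * idf

print(tf_idf(2, 8, 5, 3))  # 'dog':  0.25  * 0.097 = 0.0242...
print(tf_idf(1, 8, 5, 1))  # 'flea': 0.125 * 0.398 = 0.0497...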
Inverted file
• Each image has ~1,000 features
• We have ~1,000,000 visual words
→ each histogram is extremely sparse (mostly zeros)
• Inverted file
– mapping from words to documents
Inverted file
• Can quickly use the inverted file to compute similarity between a new image and all the images in the database
– Only consider database images whose bins overlap the query image
Inverted file
[Figure: example query image, “Fish in water?”]
Inverted file
Document 1: I like math
Document 2: I like programming
Document 3: programming is difficult
Mapping from words to documents:
I → {1, 2}
like → {1, 2}
math → {1}
programming → {2, 3}
is → {3}
difficult → {3}
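A minimal sketch of this example: build the word → documents mapping from the three documents, then answer a query by touching only the documents that share at least one word with it (variable names are illustrative):

docs = {1: "I like math", 2: "I like programming", 3: "programming is difficult"}

index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

# Candidates for a query are the union of the postings of its words;
# documents sharing no word with the query are never examined.
query = "is programming fun"
candidates = set().union(*(index.get(w, set()) for w in query.split()))
print(sorted(candidates))  # [2, 3]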
Spatial pyramid: BoW disregards all information about the spatial layout of the features
Compute histogram in each spatial bin
Slide credit: D. Hoiem
Spatial pyramid
[Lazebnik et al. CVPR 2006]
Slide credit: D. Hoiem
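A sketch of a two-level spatial pyramid (whole image plus a 2×2 grid), assuming each feature comes as NumPy arrays of (x, y) positions and visual-word indices; names are illustrative, not from [Lazebnik et al. CVPR 2006]:

import numpy as np

def spatial_pyramid(xs, ys, words, width, height, k, grid=2):
    # xs, ys, words: NumPy arrays of feature positions and word indices
    # Level 0: one histogram over the whole image
    hists = [np.bincount(words, minlength=k)]
    # Level 1: one histogram per cell of the grid
    col = np.minimum((xs * grid // width).astype(int), grid - 1)
    row = np.minimum((ys * grid // height).astype(int), grid - 1)
    for r in range(grid):
        for c in range(grid):
            in_bin = (row == r) & (col == c)
            hists.append(np.bincount(words[in_bin], minlength=k))
    return np.concatenate(hists)  # concatenated histograms of all bins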