
Images as histograms of visual words
• Inspired by ideas from text retrieval
– [Sivic and Zisserman, ICCV 2003]


Bag of features: outline
1. Extract features
2. Learn “visual vocabulary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
Quantize: approximate a signal by one whose amplitude is restricted to a prescribed set of values.
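To make the definition concrete, here is a minimal sketch of quantization against a prescribed set of values (the toy codebook and signal are hypothetical; this is the 1-D analogue of assigning a descriptor to its nearest visual word):

```python
import numpy as np

# Hypothetical toy example: quantize each value to the nearest entry
# of a prescribed set of values.
codebook = np.array([0.0, 0.5, 1.0])          # prescribed set of values
signal = np.array([0.1, 0.4, 0.8, 0.95])      # values to quantize

# For each value, pick the index of the closest codebook entry.
indices = np.argmin(np.abs(signal[:, None] - codebook[None, :]), axis=1)
quantized = codebook[indices]
print(quantized)  # [0.  0.5 1.  1. ]
```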

1. Feature extraction
Detect patches [Mikolajczyk and Schmid ’02] [Matas, Chum, Urban & Pajdla ’02] [Sivic & Zisserman ’03]
Normalize each patch
Compute a SIFT descriptor for each patch
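A minimal sketch of this step, assuming OpenCV’s SIFT implementation (the input filename is a hypothetical placeholder; OpenCV handles patch detection and normalization internally):

```python
import cv2

# Load a (hypothetical) database image in grayscale.
image = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT detects interest points and computes a 128-D descriptor
# for each normalized patch around them.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

print(descriptors.shape)  # (num_features, 128)
```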

2. Learning the visual vocabulary
• Cluster the extracted features
• The cluster centers become the visual vocabulary (a sketch follows)
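A minimal sketch of vocabulary learning with k-means, using scikit-learn’s MiniBatchKMeans (the descriptor array is a random placeholder, and the small k is illustrative only):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stack SIFT descriptors from all training images into one array.
# Placeholder standing in for a (num_features, 128) descriptor matrix.
all_descriptors = np.random.rand(10000, 128).astype(np.float32)

k = 1000  # vocabulary size (illustrative; large systems use ~100,000)
kmeans = MiniBatchKMeans(n_clusters=k, batch_size=1024)
kmeans.fit(all_descriptors)

vocabulary = kmeans.cluster_centers_  # (k, 128): one "visual word" per row
```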

Large-scale image search
• Build the database:
– Extract features from the database images
– Learn a vocabulary using k-means (typical k: 100,000)
– Compute weights for each word
– Create an inverted file mapping words → images (see the sketch below)
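A minimal sketch of the inverted file and its use at query time, assuming each image has already been quantized into a list of visual-word IDs (all names and the toy data are hypothetical):

```python
from collections import defaultdict

# Hypothetical: image_words[i] is the list of visual-word IDs
# appearing in database image i.
image_words = {
    0: [3, 17, 42, 17],
    1: [5, 42, 99],
    2: [3, 5, 5],
}

# Inverted file: word ID -> set of images containing that word.
inverted_file = defaultdict(set)
for image_id, words in image_words.items():
    for w in words:
        inverted_file[w].add(image_id)

# At query time, only images sharing at least one word with the
# query need to be scored.
query_words = [42, 3]
candidates = set().union(*(inverted_file[w] for w in query_words))
print(candidates)  # {0, 1, 2}
```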

Weighting the words
• Just as with text, some visual words are more discriminative than others
the, and, or vs. cow, AT&T, Cher
• The larger the fraction of documents a word appears in, the less useful it is for matching
– e.g., a word that appears in all documents does not help us discriminate

TF-IDF (term frequency-inverse document frequency) weighting
• Instead of computing a regular histogram distance, we’ll weight each word by its inverse document frequency
• Inverse document frequency (IDF) of word j:
IDF(j) = log( number of documents / number of documents in which j appears )

TF-IDF weighting
• To compute the value of bin j in image I:
bin value = (term frequency of j in I) × (inverse document frequency of j)
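A minimal sketch of this weighting, assuming per-image word counts and per-word document frequencies are already available (all names are hypothetical; the base-10 log matches the worked example below):

```python
import math

def tfidf_histogram(word_counts, doc_freq, num_docs):
    """Weight each visual-word bin by TF x IDF.

    word_counts: dict word -> count of that word in this image
    doc_freq:    dict word -> number of database images containing it
    num_docs:    total number of database images
    """
    total = sum(word_counts.values())
    hist = {}
    for w, count in word_counts.items():
        tf = count / total
        idf = math.log10(num_docs / doc_freq[w])
        hist[w] = tf * idf
    return hist
```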

TF-IDF weighting
(Example corpus of five documents; document 1 contains the eight words “my dog has flea problem help my dog”.)
For word ‘dog’ in document 1:
TF (term frequency) = 2/8 = 0.25
IDF (inverse document frequency) = log(5/(3+1)) = 0.097 (adding 1 to the denominator avoids division by zero)
TF-IDF = 0.25 × 0.097 = 0.02425

(Same five-document corpus as above.)
For word ‘flea’ in document 1:
TF (term frequency) = 1/8 = 0.125
IDF (inverse document frequency) = log(5/(1+1)) = 0.398
TF-IDF = 0.125 × 0.398 = 0.04975
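To check the arithmetic (the example uses base-10 logarithms and the +1 smoothing noted above):

```python
import math

num_docs = 5

# 'dog': 2 of 8 words in document 1; appears in 3 documents.
tf_dog = 2 / 8
idf_dog = math.log10(num_docs / (3 + 1))   # +1 avoids a zero denominator
print(tf_dog * idf_dog)                    # ~0.0242

# 'flea': 1 of 8 words in document 1; appears in 1 document.
tf_flea = 1 / 8
idf_flea = math.log10(num_docs / (1 + 1))
print(tf_flea * idf_flea)                  # ~0.0497
```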
Analysis: ‘dog’ appears twice in document 1 while ‘flea’ appears once. However, ‘dog’ also appears in documents 3 and 4, so ‘dog’ is less discriminative than ‘flea’ (weight of ‘dog’ (0.097) < weight of ‘flea’ (0.398)).

Inverted file
• Each image has ~1,000 features
• We have ~1,000,000 visual words
→ each histogram is extremely sparse (mostly zeros)
• Inverted file – mapping from words to documents

Inverted file
• Can quickly use the inverted file to compute similarity between a new image and all the images in the database
– Only consider database images whose bins overlap the query image

Inverted file
document1: I like math
document2: I like programming
document3: programming is difficult
Mapping from words to documents:
programming → documents 2, 3

Spatial pyramid
• BoW disregards all information about the spatial layout of the features
• Compute a histogram in each spatial bin
Slide credit: D. Hoiem

Spatial pyramid [Lazebnik et al. CVPR 2006]
Slide credit: D. Hoiem
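A minimal sketch of the spatial-binning idea, computing one BoW histogram per cell of a 2×2 grid (the word-map layout and toy data are hypothetical):

```python
import numpy as np

def spatial_pyramid_histograms(word_map, vocab_size, grid=2):
    """Compute a BoW histogram in each cell of a grid x grid layout.

    word_map: 2-D array giving the visual-word ID at each image location.
    Returns one histogram per spatial bin, concatenated.
    """
    h, w = word_map.shape
    hists = []
    for i in range(grid):
        for j in range(grid):
            cell = word_map[i * h // grid:(i + 1) * h // grid,
                            j * w // grid:(j + 1) * w // grid]
            hist = np.bincount(cell.ravel(), minlength=vocab_size)
            hists.append(hist)
    return np.concatenate(hists)

# Hypothetical toy word map: 4x4 image locations, vocabulary of 3 words.
word_map = np.array([[0, 0, 1, 1],
                     [0, 2, 1, 2],
                     [2, 2, 0, 0],
                     [2, 1, 0, 1]])
print(spatial_pyramid_histograms(word_map, vocab_size=3))
```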