BM25, Language Models, Divergence From Randomness

IR H/M Course
Recall: IR Model
An IR model defines how documents and queries are represented, and the matching mechanism that decides relevance.
There are many IR models:


− The relevance decision mechanism can be either strict or flexible (e.g. set retrieval vs. ranked retrieval)
− The representation of the data can have a varying degree of abstraction (e.g. unigrams vs. concepts)
Progress in retrieval models has corresponded with improvements in effectiveness.
Recall: Relevance
We have seen that relevance is a complex and subjective concept
− Many factors to consider
− People often disagree when making relevance judgments
Retrieval models make various assumptions about relevance to simplify the problem
− e.g., topical relevance (query and document are about the same topic) vs. user relevance (taking into account the usefulness of a document from a user's perspective, such as novelty, freshness, language, etc.)
− e.g., binary vs. multi-valued relevance

Classical Models (in brief):
• Boolean, Vector Space
Probabilistic Models:
• BM25, Language Models, Divergence From Randomness
• Semi-structured models: Fields-based Models
• Proximity Models
CLASSICAL MODELS
Boolean, Vector Space

Recall: Boolean Retrieval
Recall the example of using an inverted index:
− Query for 'time' AND 'dark'
There are 2 docs with "time" in the dictionary
− IDs 1 and 2 from the postings file
There is 1 doc with "dark" in the dictionary
− ID 2 from the postings file
Therefore, only doc ID 2 satisfies the query.
[Figure: inverted index with dictionary entries (e.g. "and", "come", "country", "dark"), their document frequencies, and the corresponding posting lists of document IDs.]
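As a minimal sketch (not from the slides) of how such an AND query can be evaluated, the toy postings below mirror the example above: 'time' occurs in docs 1 and 2, 'dark' only in doc 2.

```python
# Toy inverted index: term -> sorted list of document IDs (postings).
index = {
    "time": [1, 2],
    "dark": [2],
}

def boolean_and(term_a, term_b):
    """Intersect two sorted posting lists with a two-pointer merge."""
    a, b = index.get(term_a, []), index.get(term_b, [])
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])   # doc contains both terms
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result

print(boolean_and("time", "dark"))  # -> [2]: only doc 2 satisfies the query
```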
Standard Boolean Model
Assumptions
− A document is represented as a set of keywords (i.e. the model of documents)
− Queries are Boolean expressions of keywords, connected by AND, OR, and NOT (i.e. the model of queries)
− A document is judged to be relevant if the index terms in the document satisfy the logical expression in the query (i.e. the relevance decision mechanism)

R(D, Q) = D ⊨ Q (the document satisfies the query expression)

F.W. Lancaster, E.G. Fayen (1973). Information Retrieval On-Line. Melville Publishing Co., Los Angeles, California.
G. Salton, E.A. Fox, and H. Wu (1983). Extended Boolean information retrieval. Communications of the ACM, 26(11), 1022–1036.

Example (based on a Closed World Assumption, CWA)
Decision Mechanism
− Three index terms: A, B, C
− d1 = {A, B}
− d2 = {B, C}
− d3 = {A, B, C}
Query = "A ∧ B ∧ ¬C"
− Retrieved: d1
− Not retrieved: d2, d3
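As a small illustration (not from the slides), the decision mechanism can be sketched as set-membership tests; under the CWA, a term not indexed for a document is treated as absent.

```python
# Documents as sets of index terms.
docs = {
    "d1": {"A", "B"},
    "d2": {"B", "C"},
    "d3": {"A", "B", "C"},
}

def matches(terms):
    """Evaluate the query A AND B AND NOT C for one document."""
    return "A" in terms and "B" in terms and "C" not in terms

retrieved = [name for name, terms in docs.items() if matches(terms)]
print(retrieved)  # -> ['d1']; d2 and d3 are not retrieved
```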
Advantages and Disadvantages of the Boolean Model
Advantages
− Simple queries are easy to understand
− Exact and simple to program
− The whole panoply of Boolean algebra is available
Disadvantages
− Effectiveness depends entirely on the user
− Complex query syntax is often misunderstood (if understood at all)
− Problems of null output or information overload
− Output is not ordered in any useful fashion (set retrieval)

Classical Models #2
Vector Space Model
− Documents are represented by a term vector
− Queries are similarly represented by a term vector
The similarity metric is ad hoc
• Coordination matching
• Which one to use? What is the ideal one? How do you find it?
Ad hoc weightings (i.e., not theoretically based)
No guidance on when to stop ranking
Not necessarily optimal ranking

G. Salton, A. Wong, C.S. Yang (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
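As an illustration (not from the slides), a minimal sketch of vector-space scoring with raw term-frequency weights and cosine similarity; the terms and weights are made up for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc = {"weather": 3.0, "glasgow": 1.0, "forecast": 2.0}
query = {"glasgow": 1.0, "forecast": 1.0}
print(round(cosine(query, doc), 3))  # ranks docs by angle, not set membership
```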
PROBABILISTIC MODELS AND BEST MATCH
What are the drawbacks of the vector space retrieval model?

Why use probabilities?
IR deals with uncertain information
Explain why IR is an uncertain process:
− Document representation: how good is this representation?
− Information need and query representation: how representative is the query of the information need?
− Matching process: how relevant is the result to the query? Is the ranking optimal?
− Results presentation: what is the impact of the presentation of the results?
Uncertainty is everywhere!
Probability theory seems to be the most natural way to quantify uncertainty.
Probability Ranking Principle
W.S. Cooper argued:
If a reference retrieval system's response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request ...
... where the probabilities are estimated as accurately as possible on the basis of whatever data is made available to the system for this purpose ...
... then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data.
Ranking Criterion: the probability of relevance of the document with respect to the query
(Robertson 1977)

Another way to view ranking...
IR as a classification problem
− Given a document D, we want to decide whether it is relevant or not
− Classify as relevant or non-relevant, and retrieve the document if it is relevant
− If we can calculate the probability of relevance p(R|D) or non-relevance p(NR|D), then we can use the set with the highest probability

If p(R|D) > p(NR|D) then D is relevant; otherwise D is not relevant.
[Figure: a document d being assigned to one of two classes, Relevant Documents or Non-Relevant Documents.]
C.J. van Rijsbergen (1979). Information Retrieval (2nd edition). Butterworths.
Bayes Classifier
Bayes Decision Rule
• A document D is relevant if P(R|D) > P(NR|D)
Estimating probabilities
• Use Bayes' Rule:

p(R|D) = p(D|R) p(R) / p(D)
p(NR|D) = p(D|NR) p(NR) / p(D)

• Since p(D) appears in both expressions it cancels, so classify a document as relevant if

p(D|R) / p(D|NR) > p(NR) / p(R)

− The lhs is the likelihood ratio
p(D|R), p(D|NR): the probability that if a relevant (non-relevant) document is retrieved, it is D.

Estimating Probabilities
How do we compute all those probabilities?
− We cannot compute exact probabilities; we have to use estimates
Binary Independence Retrieval (BIR)
− "Binary" = Boolean
− Document represented by a vector of binary features indicating term occurrence (or non-occurrence)
Independence Assumption
− Terms occur in documents independently: p(a ∩ b) = p(a) · p(b)
− Binary vectors of terms
− Classify a document as relevant if p(D|R) p(R) > p(D|NR) p(NR), i.e. (after using Bayes' Rule):

p(D|R) / p(D|NR) > p(NR) / p(R)

− lhs: the likelihood ratio
Binary Independence Model
Assume independence of terms
− A doc is represented by a vector of binary features indicating term occurrence (or non-occurrence)
− p_i is the probability that term i occurs (i.e., has the value 1) in a document from the relevant set; s_i is the probability of occurrence of i in the non-relevant set
After a bit of math (see next slide):

p(D|R) / p(D|NR) = ∏_{i: d_i = 1} [ p_i (1 − s_i) / (s_i (1 − p_i)) ] · ∏_i [ (1 − p_i) / (1 − s_i) ]

The second product is constant for each query, so it affects only the cut-off point, not the ranking. To avoid multiplying lots of small numbers, take logarithms (the scoring function), giving the Retrieval Status Value (RSV):

RSV = ∑_{i: d_i = 1} log [ p_i (1 − s_i) / (s_i (1 − p_i)) ]

S.E. Robertson, K. Spärck Jones (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27, 129–146.
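A minimal sketch (not from the slides) of RSV scoring under the binary independence model; the p_i and s_i estimates here are made-up toy values.

```python
import math

def rsv(doc_terms, query_terms, p, s):
    """Sum the log odds-ratio weights over query terms present in the doc (d_i = 1)."""
    score = 0.0
    for t in query_terms & doc_terms:
        score += math.log((p[t] * (1 - s[t])) / (s[t] * (1 - p[t])))
    return score

# Toy estimates: "weather" is far more likely in relevant documents.
p = {"weather": 0.8, "glasgow": 0.4}  # p_i = P(term occurs | relevant)
s = {"weather": 0.2, "glasgow": 0.3}  # s_i = P(term occurs | non-relevant)
print(rsv({"weather", "glasgow", "rain"}, {"weather", "glasgow"}, p, s))
```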

Binary Independence Model
Notation used in the derivation:

p_i = p(d_i = 1 | R),   1 − p_i = p(d_i = 0 | R)
s_i = p(d_i = 1 | NR),  1 − s_i = p(d_i = 0 | NR)

− ∏_{i: d_i = 1} means that it is a product over the terms that have the value 1 in the document
− The factors added during the derivation cancel each other
Ranking Documents
How do we score documents in practice?
− The query provides information about relevant documents
− Relevance score for a document:

RSV = log ∏_{i: d_i = 1} p_i (1 − s_i) / (s_i (1 − p_i)) = ∑_{i: d_i = 1} log [ p_i (1 − s_i) / (s_i (1 − p_i)) ]

− If we assume p_i is constant (0.5) and s_i is approximated from the entire collection (s_i ≈ n_i / N), then we get an idf-like weight:

RSV = ∑_{i: d_i = 1} log [ 0.5 (1 − n_i/N) / ((n_i/N) (1 − 0.5)) ] = ∑_{i: d_i = 1} log [ (N − n_i) / n_i ]

Score Computation
• Estimates, from the relevant / non-relevant contingency table (N documents in total, n_i containing term i; R relevant documents, r_i of them containing term i):

c_i ≈ K(N, n_i, R, r_i) = log [ r_i / (R − r_i) ] / [ (n_i − r_i) / (N − n_i − R + r_i) ]

• Add 0.5 to every expression (see textbook)
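A small sketch (not from the slides) of the c_i term weight with the 0.5 correction added to every count; with no relevance information (R = r_i = 0) it behaves like an idf weight, as the next slide notes.

```python
import math

def term_weight(N, n_i, R=0, r_i=0):
    """BIM term weight c_i with 0.5 added to every expression."""
    return math.log(((r_i + 0.5) / (R - r_i + 0.5)) /
                    ((n_i - r_i + 0.5) / (N - n_i - R + r_i + 0.5)))

N = 100_000
print(round(term_weight(N, n_i=100), 2))     # rare term -> large positive weight
print(round(term_weight(N, n_i=60_000), 2))  # very common term -> negative weight
```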

Looking back
In the absence of any information about the relevant documents:
− The term weight derived from the binary independence model is very similar to the idf weight
− There is no tf component
• Because in BIR we consider only a binary representation
Performance Issues
− How good is this model?
• Not very good: if tf is ignored, most effectiveness measures could drop by up to ...
• Let's bring back tf!
So we started with a binary representation; that is throwing away all the frequency information.
We used Bayes' rule and came up with a ranking that, in the worst case, is equivalent to an idf-based ranking!
BM25
A widely used ranking approach based on the binary independence model
− Adds document and query term weights:

R(D, Q) = ∑_{i ∈ Q} log [ (r_i + 0.5) / (R − r_i + 0.5) ] / [ (n_i − r_i + 0.5) / (N − n_i − R + r_i + 0.5) ] · (k1 + 1) tf_i / (K + tf_i) · (k2 + 1) qf_i / (k2 + qf_i)

− k1, k2 and b are parameters whose values are set empirically
− K = k1 ((1 − b) + b · dl / avdl), where dl is the document length and avdl is the average document length
− Typical TREC values: k1 is 1.2, k2 varies from 0 to 1000, b = 0.75
Still one of the best performing ranking models.

K. Spärck Jones, S. Walker, S.E. Robertson (2000). A probabilistic model of information retrieval: development and comparative experiments. IPM, 36(6), 779–808.
S.E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, M. Gatford (1994). Okapi at TREC-3. TREC 1994.
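A runnable sketch (not the official Okapi implementation) of the formula above. It assumes no relevance information (R = r_i = 0), so the first factor collapses to an idf-like weight with the 0.5 corrections; k1 = 1.2 and b = 0.75 follow the TREC values on the slide, while k2 = 100 is an arbitrary choice within its stated range.

```python
import math

def bm25(query_tf, doc_tf, df, N, dl, avdl, k1=1.2, k2=100.0, b=0.75):
    """Score one document for a query; df maps term -> document frequency n_i."""
    K = k1 * ((1 - b) + b * dl / avdl)  # length-normalised k1
    score = 0.0
    for term, qf in query_tf.items():
        tf = doc_tf.get(term, 0)
        n = df.get(term, 0)
        # Relevance weight with R = r_i = 0: log[(N - n + 0.5) / (n + 0.5)]
        w = math.log((0.5 / 0.5) / ((n + 0.5) / (N - n + 0.5)))
        score += w * ((k1 + 1) * tf / (K + tf)) * ((k2 + 1) * qf / (k2 + qf))
    return score

doc_tf = {"weather": 3, "glasgow": 2}
df = {"weather": 40, "glasgow": 5}
print(bm25({"weather": 1, "glasgow": 1}, doc_tf, df, N=1000, dl=90, avdl=100))
```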

LANGUAGE MODELLING
Building language models, Smoothing
Language Model
Unigram language model
− A probability distribution over the words in a language

girl: 0.2, cat: 0.1, the: 0.35, boy: 0.25, meet: 0.1

P(s | M) = 0.2 × 0.1 = 0.02 (e.g., for a two-word string such as s = "girl cat")

− Models the probability of generating strings in the language
− A topic in a document or query can be represented as a language model
• i.e., words that tend to occur often when discussing a topic will have high probabilities in the corresponding language model
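A tiny sketch (not from the slides) of computing P(s | M) under the unigram model in the table above.

```python
# Unigram model M from the table above.
model = {"girl": 0.2, "cat": 0.1, "the": 0.35, "boy": 0.25, "meet": 0.1}

def string_probability(s):
    """P(s | M): product of per-word probabilities (unigram independence)."""
    p = 1.0
    for word in s.split():
        p *= model.get(word, 0.0)  # words outside the model get probability 0
    return p

print(string_probability("girl cat"))  # 0.2 * 0.1 = 0.02
```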

IR#H/M##Course 10/02/22
Unigram and higher-order models
For a four-word string w1 w2 w3 w4 (by the chain rule; the original slide showed word images in place of w1..w4):

P(w1 w2 w3 w4) = P(w1) P(w2 | w1) P(w3 | w1 w2) P(w4 | w1 w2 w3)

Unigram Language Models:

P(w1) P(w2) P(w3) P(w4)

Easy. Effective!
Bigram (generally, n-gram) Language Models:

P(w1) P(w2 | w1) P(w3 | w2) P(w4 | w3)
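To make the contrast concrete, here is a small sketch (not from the slides) that estimates unigram and bigram probabilities from a toy corpus and scores the same four-word string under both models.

```python
from collections import Counter

corpus = "the boy and the girl meet the cat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_unigram(words):
    """Independence: product of four unigram probabilities."""
    p = 1.0
    for w in words:
        p *= unigrams[w] / total
    return p

def p_bigram(words):
    """P(w1) * product of P(w_i | w_{i-1}) from bigram/unigram counts."""
    p = unigrams[words[0]] / total
    for prev, w in zip(words, words[1:]):
        p *= bigrams[(prev, w)] / unigrams[prev]
    return p

s = ["the", "girl", "meet", "the"]
print(p_unigram(s))
print(p_bigram(s))  # conditions each word on its predecessor
```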
Why Use Language Models?
Use language models to model the process of query generation:
− The user thinks of some relevant documents
− and picks some keywords to use as a query

Example document: "The Meteorological Office warned of high winds, rain across the west of Scotland. There will be local floods, traffic chaos. The bad weather will continue for the foreseeable future."
Example query: Forecast, Glasgow, weather
Note: some query words are not in the document.

How do different retrieval models handle missing query terms?
Why are we not using higher-order models in IR?

Generative Probabilistic Models
What is the probability of producing the query from a document? P(Q|D)
− Referred to as the query likelihood
Assumptions:
− The probability of a document being relevant is strongly correlated with the probability of a query given a document
• P(D|R) is correlated with p(Q|D)
− The user has a reasonable idea of the terms that are likely to appear in the "ideal" document
− The user's query terms can distinguish the "ideal" document from the rest of the corpus
The query is generated as a representative of the "ideal" document
− The system's task is to estimate, for each of the documents in the collection, which is most likely to be the "ideal" document

J.M. Ponte and W.B. Croft (1998). A Language Modeling Approach to Information Retrieval. SIGIR'98, 275–281.
W.B. Croft, J. Lafferty (2003). Language Modeling for Information Retrieval. Kluwer Academic Publishers.
Language Models
Basic Idea:
Query: Weather, Glasgow, Forecast
− Let's assume we point blindly, one at a time, at 3 words in a document
− What is the probability that I, by accident, pointed at the words "Weather", "Glasgow" and "Forecast"?
− Compute the probability, and use it to rank the documents.
Query-Likelihood Model
− Rank documents by the probability that the query could be generated by the document model
− Given a query, start with P(D|Q)
− Using Bayes' Rule:

p(D|Q) = p(D) p(Q|D) / p(Q), which is rank-equivalent to p(D) p(Q|D)

Standard LM Approach
Assume that query terms are drawn identically and independently from a document:

p(Q|D) = ∏_{q_i ∈ Q} p(q_i | D)

Maximum Likelihood Estimate of p(q_i|D)
− Simply use the number of times the query term q_i occurs in the document divided by the total number of term occurrences:

p(q_i | D) = f_{q_i,D} / |D|

Problem: the Zero Probability Problem
− If any of the query words are missing from the document, the score will be zero
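A short sketch (not from the slides) of the maximum-likelihood query likelihood; a single missing query term zeroes the whole score.

```python
from collections import Counter

doc = "weather forecast for glasgow weather warning".split()
counts = Counter(doc)

def ql_mle(query):
    """p(Q|D) = product over query terms of f_{q,D} / |D|."""
    p = 1.0
    for q in query:
        p *= counts[q] / len(doc)
    return p

print(ql_mle(["weather", "glasgow"]))          # > 0
print(ql_mle(["weather", "glasgow", "rain"]))  # 0.0: "rain" is not in the document
```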
Zero Probability Problem
We may not wish to assign a probability of zero to a document that is missing one or more of the query terms
− Missing 1 out of 4 query words would be treated the same as missing 3 out of 4
Document texts are a sample from the topic's model
− Missing words should not have zero probability of occurring
Smoothing is a technique for estimating probabilities for missing (or unseen) words
− Lower (or discount) the probability estimates for words that are seen in the document text
− Assign that "left-over" probability to the estimates for the words that are not seen in the text

The Need for Smoothing
Zero probabilities spell disaster
− We need to smooth probabilities
• Discount non-zero probabilities
• Give some probability mass to unseen terms
− E.g. adding 1, ½ or ε to counts, Dirichlet priors, discounting, and interpolation
− A simple idea that works well in practice is to use a mixture between the document and the collection distribution (Jelinek-Mercer):

p(q_i | D) = (1 − λ) p(q_i | D) + λ p(q_i | C)

A non-occurring term is possible, but no more likely than would be expected by chance in the collection.

C. Zhai, J. Lafferty (2004). A study of smoothing methods for language models applied to information retrieval. ACM TOIS, 22(2), 179–214.
Jelinek-Mercer Smoothing
Avoiding the estimation problem and overcoming data sparsity:
• Lower (discount) the probability estimates for words that are seen in a document text
− Assign the leftover probability to estimates for the words that are not seen in the text
− Unseen words can be estimated from the collection

p(q_i | D) = (1 − λ) f_{q_i,D} / |D| + λ c_{q_i} / |C|,   p(Q|D) = ∏_{q_i ∈ Q} p(q_i | D)

where p(q_i | C) = c_{q_i} / |C|: how many times the query term occurred in the collection, divided by the total number of word occurrences in the collection.

Justify the use of Jelinek-Mercer smoothing by considering the ratio P(tf+1 | d, D) / P(tf | d, D)
• In this way, increases in term frequency should be absorbed
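A minimal sketch (not from the slides) of query likelihood with Jelinek-Mercer smoothing; λ = 0.5 and the tiny "collection" are made-up choices for illustration.

```python
from collections import Counter

collection = "weather forecast for glasgow weather warning rain in the west".split()
doc = "weather forecast for glasgow weather warning".split()
c_counts, d_counts = Counter(collection), Counter(doc)
LAMBDA = 0.5  # mixture weight between document and collection models

def p_jm(term):
    """(1 - lambda) * f_{q,D}/|D| + lambda * c_q/|C|."""
    p_doc = d_counts[term] / len(doc)
    p_col = c_counts[term] / len(collection)
    return (1 - LAMBDA) * p_doc + LAMBDA * p_col

def ql_smoothed(query):
    p = 1.0
    for q in query:
        p *= p_jm(q)
    return p

# "rain" is absent from the document but present in the collection,
# so the score is small but no longer zero.
print(ql_smoothed(["weather", "glasgow", "rain"]))
```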
