COMP6714: Information Retrieval & Web Search
Introduction to Information Retrieval
Lecture 6: Scoring, Term Weighting and the Vector Space Model
Recap of lecture 5
Collection and vocabulary statistics: Heaps’ and Zipf’s laws
Dictionary compression for Boolean indexes
  Dictionary string, blocks, front coding
Postings compression: gap encoding, prefix-unique codes
  Variable-Byte, Gamma codes, Golomb/Rice codes
Data structure                            Size (MB)
collection (text, xml markup etc)           3,600.0
collection (text)                             960.0
Term-doc incidence matrix                  40,000.0
postings, uncompressed (32-bit words)         400.0
postings, uncompressed (20 bits)              250.0
postings, variable byte encoded               116.0
postings, γ-encoded                           101.0
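As a reminder of how the postings codes from the recap work, here is a minimal Python sketch of gap encoding, Variable-Byte coding and Gamma coding. The function names are illustrative and not from the lecture. The sketch assumes the textbook byte convention in which the high bit is set on the last byte of each number, and it writes Gamma codes as bit strings for readability rather than packing them into bytes.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Keep the first docID, then store differences between neighbours."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    return list(accumulate(gaps))


def vb_encode_number(n: int) -> bytes:
    """Variable-Byte code: 7 payload bits per byte, high bit marks the last byte."""
    if n < 0:
        raise ValueError("Variable-Byte codes are defined for non-negative integers")
    payload = [n % 128]
    n //= 128
    while n > 0:
        payload.insert(0, n % 128)
        n //= 128
    payload[-1] += 128  # assumed convention: terminating byte has its high bit set
    return bytes(payload)


def vb_encode(numbers: List[int]) -> bytes:
    """Concatenate the Variable-Byte codes of a list of numbers."""
    return b"".join(vb_encode_number(n) for n in numbers)


def vb_decode(data: bytes) -> List[int]:
    """Decode a Variable-Byte stream back into a list of numbers."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:            # continuation byte
            n = 128 * n + byte
        else:                     # last byte of the current number
            numbers.append(128 * n + (byte - 128))
            n = 0
    return numbers


def gamma_encode(n: int) -> str:
    """Gamma code: unary length of the offset, then the offset itself."""
    if n < 1:
        raise ValueError("Gamma codes are defined for positive integers")
    offset = bin(n)[3:]           # binary representation without its leading 1
    return "1" * len(offset) + "0" + offset


postings = [824, 829, 215406]
gaps = to_gaps(postings)          # [824, 5, 214577]
encoded = vb_encode(gaps)         # 6 bytes instead of three 32-bit words
assert from_gaps(vb_decode(encoded)) == postings
```

Gaps between neighbouring docIDs are usually much smaller than the docIDs themselves, which is why variable-length codes applied to the gaps shrink the postings sizes in the table above.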
This lecture; IIR Sections 6.2-6.4.3
Ranked retrieval