BM25 Retrieval

Generalising IR Operations
Many operations in an information retrieval pipeline can be thought of as “transformer” functions.
Example: BM25 Retrieval
Query: Glasgow weather BM25 Results:


Generalising IR Operations
Many operations in an information retrieval pipeline can be thought of as “transformer” functions.
Example: BM25 Retrieval
Query: Glasgow weather Inverted Index Results: k1=1.2 b=0.75
Transformers can have parameters.
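To make the “transformer with parameters” idea concrete, here is a minimal, illustrative BM25 scoring function in plain Python, exposing the k1 and b parameters from the slide. The function name and toy corpus are invented for this sketch; a real system scores candidates via an inverted index rather than looping over every document.

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Toy BM25: score one document (a token list) against a query.
    `corpus` is a list of token lists, used for document frequencies
    and the average document length. Illustrative only."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term appears nowhere; contributes nothing
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc_terms.count(term)
        # Length-normalised term frequency, controlled by k1 and b
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

# Toy corpus echoing the slide's example queries
corpus = [["glasgow", "weather", "rain"],
          ["flights", "to", "glasgow"],
          ["paris", "food"]]
```

Changing k1 or b changes the scores without changing the transformer's “shape”: it still maps a query to scored results.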

Generalising IR Operations
Many operations in an information retrieval pipeline can be thought of as “transformer” functions.
Example: BM25 Retrieval
Inverted Index
k1=1.2 b=0.75
More generally: a batch of queries maps to a batch of results
glasgow…
glasgow…
glasgow…
glasgow weather
flights to glasgow

Generalising IR Operations
Many operations in an information retrieval pipeline can be thought of as “transformer” functions.
Example: BM25 Retrieval
Inverted Index
k1=1.2 b=0.75
Shorthand: BM25 maps a Query (Q) frame to a Result (R) frame.
glasgow…
glasgow…
glasgow…
glasgow weather
flights to glasgow
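To make the Q-frame ↦ R-frame shorthand concrete, here is a hedged sketch using pandas DataFrames in the column shapes PyTerrier uses (`qid`/`query` for a Query frame; `qid`/`query`/`docno`/`score`/`rank` for a Result frame). The `toy_retrieve` function and its fixed scores are stand-ins invented for this example, not PyTerrier's API.

```python
import pandas as pd

# A Query (Q) frame: one row per query.
queries = pd.DataFrame([
    ["q1", "glasgow weather"],
    ["q2", "flights to glasgow"],
], columns=["qid", "query"])

def toy_retrieve(q_frame):
    """Fake Q -> R transform: in PyTerrier this role is played by a
    real retrieval transformer (e.g. BM25 over an inverted index)."""
    rows = []
    for _, q in q_frame.iterrows():
        # Pretend every query retrieves the same two documents.
        for rank, (docno, score) in enumerate([("d1", 12.2), ("d2", 8.4)]):
            rows.append([q["qid"], q["query"], docno, score, rank])
    return pd.DataFrame(rows, columns=["qid", "query", "docno", "score", "rank"])

results = toy_retrieve(queries)
```

The point is the shape: a frame goes in, a frame comes out, so transformers can be chained.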

Generalising IR Operations
The PyTerrier Data Model:
[Figure: the Q frame holds queries (“glasgow weather”, “flights to glasgow”); the R frame repeats each query (“glasgow weather” per result row) alongside retrieved documents (“Science & Mathematics Physi…”, “The hot glowing…”, “School-Age Kids Growth &…”, “Developmental…”)]

Generalising IR Operations
x↦y Transformer

Generalising IR Operations
x↦y Transformer
Transformer Class
Q ↦ R Retrieval
Find documents related to the queries

Generalising IR Operations
x↦y Transformer
Transformer Class
Q ↦ R Retrieval
R ↦ Q Rewrite queries based on the returned documents.

Generalising IR Operations
x↦y Transformer
Transformer Class
Q ↦ R Retrieval
R ↦ R Re-ranking
LambdaMART
Find a better order
for the results. (Usually a more expensive method than retrieval.)

Generalising IR Operations
x↦y Transformer
Transformer Class
Q ↦ R Retrieval
R ↦ R Re-ranking
LambdaMART
Q ↦ Q Query Re-writing
Build a better version of the user’s query

Generalising IR Operations
x↦y Transformer
Transformer Class
Q ↦ R Retrieval
R ↦ R Re-ranking
LambdaMART
Q ↦ Q Query Re-writing
D ↦ D Doc. Re-writing
Build a better version of documents to index.

Generalising IR Operations
These operations are composable!
Example: Perform SDM, then BM25, then RM3, then BM25 again.
Q ↦ (SDM) Q ↦ (BM25) R ↦ (RM3) Q ↦ (BM25) R
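A minimal sketch of how such composition can be implemented, assuming a toy `Transformer` class with an overloaded `>>` operator (PyTerrier composes its transformers in a similar spirit). The string “frames” and the SDM/BM25/RM3 stand-ins are invented for illustration only.

```python
class Transformer:
    """Toy composable transformer: wraps a frame -> frame function."""
    def __init__(self, fn, name=""):
        self.fn = fn
        self.name = name

    def transform(self, frame):
        return self.fn(frame)

    def __rshift__(self, other):
        # Compose: feed this transformer's output into the next one.
        return Transformer(lambda frame: other.transform(self.transform(frame)),
                           name=f"{self.name} >> {other.name}")

# Stand-ins that just tag the "frame" so we can see the order of application.
sdm  = Transformer(lambda q: q + " [sdm]",  "SDM")   # Q -> Q
bm25 = Transformer(lambda q: q + " [bm25]", "BM25")  # Q -> R
rm3  = Transformer(lambda r: r + " [rm3]",  "RM3")   # R -> Q

pipeline = sdm >> bm25 >> rm3 >> bm25
```

Because each stage consumes and produces a frame, the whole pipeline is itself a transformer.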

What’s the point of all this?
Transformer Class
Q ↦ R Retrieval
R ↦ R Re-ranking
LambdaMART
Q ↦ Q Query Re-writing
D ↦ D Doc. Re-writing

What’s the point of all this?
All of these transformations can be replaced with a Neural Network!
Transformer Class
Neural Examples
Q ↦ R Retrieval
ANCE, ColBERT
ColBERT-PRF
R ↦ R Re-ranking
LambdaMART
CEDR, monoT5
Q ↦ Q Query Re-writing
T5-QE, IntenT5
D ↦ D Doc. Re-writing
Doc2Query, DeepImpact

Why Neural Networks?
• Neural Natural Language Processing (NLP) techniques are highly effective: state-of-the-art at most NLP tasks.
− In particular, models trained on a “language modeling” objective transfer well to other tasks.
• Able to learn complex & subtle patterns from training data
− Automatically learns to overcome the lexical gap, proximity relations, term salience (importance), coreference, etc.
− This means less manual “feature engineering”

Building comprehensive rules/heuristics for language is challenging.
Query: where was iodine discovered?
Passage 1 (BM25 = 12.05): Iodine was discovered by in 1811 in France. Courtois was trying to extract potassium chloride from seaweed. After crystallizing the potassium chloride, he added sulfuric acid to the remaining liquid. This, rather surprisingly, produced a purple vapor, which condensed into dark crystals.
Passage 2 (BM25 = 12.18): Iodine was discovered in 1811 by French chemist Courtois (1777-1838). The element occurs primarily in seawater and in solids formed when seawater evaporates. Its single most important property may be the ability to kill germs.
Annotations: What about other usages of “where”? Should “where” in the queries match “in”? What about other usages of “in”? How close does it need to be to other query terms?
Example from MS-MARCO collection

Building comprehensive rules/heuristics for language is challenging.
iodine discovered
Passage 1 (BM25 = 12.05): Iodine was discovered by in France. Courtois was trying to extract potassium chloride from seaweed. After crystallizing the potassium chloride, he added sulfuric acid to the remaining liquid. This, rather surprisingly, produced a purple vapor, which condensed into dark crystals.
Passage 2 (BM25 = 12.18): Iodine was discovered in 1811 by French chemist (1777-1838). The element occurs primarily in seawater and in solids formed when seawater evaporates. Its single most important property may be the ability to kill germs.
Annotations: Would the document be less relevant if this phrase was later in the document? What about other formulations of the query? Or other formulations of the document?
Example from MS-MARCO collection

Building comprehensive rules/heuristics for language is challenging.
iodine discovered
Passage 1 (BM25 = 12.05): Iodine was discovered by in France. Courtois was trying to extract potassium chloride from seaweed. After crystallizing the potassium chloride, he added sulfuric acid to the remaining liquid. This, rather surprisingly, produced a purple vapor, which condensed into dark crystals.
Passage 2 (BM25 = 12.18): Iodine was discovered in 1811 by French chemist (1777-1838). The element occurs primarily in seawater and in solids formed when seawater evaporates. Its single most important property may be the ability to kill germs.
Annotations: Is “France” really enough? Would the city be better? Is “French” close enough?
Example from MS-MARCO collection

Natural language is messy
The same idea can be described by many sequences of words.
Meanwhile… A sequence of words can describe different ideas (ambiguity).
Even the grammar of languages themselves is challenging to fully describe.
We can instead let a neural network learn how to deal with this mess.

1. Review of LTR & Basics of Neural Networks for NLP
2. Neural Re-ranking
3. Neural Retrieval Q ↦ R
4. Neural Query Rewriting & PRF Q ↦ Q & R ↦ Q
5. Neural Document Rewriting D ↦ D
6. Neural IR in PyTerrier

Review of Supervised Learning
Decision boundary
Use training data to make a model that is able to predict something about new inputs.
Many algorithms exist (e.g., Logistic Regression, LambdaMART, etc.)
• Today we’re focusing on neural networks
These algorithms have settings (hyper-parameters)
that affect how the model is built
In reality: these algorithms operate over many features (figure shows 2: x and y axis)
Classification:

Learning to Rank
Relevant Query-doc pair
Q: Where was iodine discovered?
D: Iodine was discovered by in 1811 in France…
Non-Relevant Query-doc pair
Q: Where was iodine discovered?
D: Iodine was discovered in 1811 by French chemist …

Neural Networks (Intro.)
Consist of computational elements (neurons)
A neuron receives INPUTs from other nodes, and each INPUT x_i is associated with a learned weight w_i
The unit computes some function f of the weighted sum of its INPUTs: y = f(∑_i w_i x_i)
Also sometimes referred to as a perceptron
e.g. Tf-idf
f is called the “activation function”. It transforms the OUTPUT shape of the weighted sum.
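The formula y = f(∑_i w_i x_i) translates directly into code. This is a minimal sketch with a sigmoid activation; the feature values (mirroring the slide's Tf-idf and PageRank inputs) and the weights are invented for illustration rather than learned.

```python
import math

def neuron(inputs, weights, f=lambda z: 1 / (1 + math.exp(-z))):
    """A single neuron: y = f(sum_i w_i * x_i).
    `f` defaults to the sigmoid activation function."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return f(z)

# e.g. inputs = [Tf-idf, PageRank]; weights are made-up stand-ins
# for values that would normally be learned from training data.
y = neuron([2.0, 1.0], [0.5, -0.25])
```

Swapping `f` (identity, sigmoid, ReLU, …) changes the output shape without changing the weighted-sum core.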

Neural Networks (Intro.)
Consist of computational elements (neurons)
A neuron receives INPUTs from other nodes, and each INPUT x_i is associated with a learned weight w_i
The unit computes some function f of the weighted sum of its INPUTs: y = f(∑_i w_i x_i)
Also sometimes referred to as a perceptron
e.g. Tf-idf = 2
PageRank=1

Neural Networks (Intro.)
• Feed in training examples (x_i, y_i)
• The network adjusts the weights based on the training samples
• Uses a loss function to determine how “wrong” the predicted value is
• Propagates loss signals obtained from gradients of the OUTPUT with respect to the INPUTs
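A minimal illustration of this training loop, assuming a single linear neuron, squared loss, and plain gradient descent. All the sample values and the learning rate are invented for the example; real networks use many layers and automatic differentiation.

```python
def train_neuron(samples, lr=0.1, epochs=200):
    """Fit a single linear neuron pred = w1*x1 + w2*x2 with squared
    loss and stochastic gradient descent. Illustrative sketch only."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in samples:
            pred = w[0] * x[0] + w[1] * x[1]
            err = pred - target          # dLoss/dPred for 0.5*(pred-target)^2
            for i in range(2):
                w[i] -= lr * err * x[i]  # propagate the gradient to each weight
    return w

# Targets consistent with weights [2, 1], so training should recover them.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
weights = train_neuron(samples)
```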

Neural Networks (Intro.)
A neural network consists of multiple layers of neurons
This helps capture non-linear target functions
• Many interesting problems are non-linear in nature!
Several special structures allow functionality on certain types of data: e.g., convolutional, recurrent, transformer
This is the basic “feed forward” neural network:
hidden input

Learning to rank with neural networks
hidden input
Tf-idf = 0.18 PageRank = 0.21
Rel. score
This could work, but doesn’t provide much beyond what more basic methods can do (e.g., LambdaMART).

Learning to rank with neural networks
We explore feeding the text of the query and document as features directly into the neural networks.
Goal: Determine relevance scores based on the query and document text itself.
Let the neural network learn rules for vocabulary mismatch, proximity, etc.
Q: Where was iodine discovered?
D2: Iodine was discovered by in 1811 in France…
But how can we provide text as input to a neural network?

Representing Text
One option: One-hot encoding
animal coronavirus covid
early hypertension quarantine outside reopening symptoms test
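A one-hot encoding sketch over the small vocabulary shown above (the helper name is made up): each word becomes a vector that is all zeros except for a single 1 at that word's index.

```python
VOCAB = ["animal", "coronavirus", "covid", "early", "hypertension",
         "quarantine", "outside", "reopening", "symptoms", "test"]

def one_hot(word, vocab=VOCAB):
    """One-hot encoding: a |vocab|-length vector with a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec
```

Summing one-hot vectors for every token in a text gives a bag-of-words representation.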

Representing Text
One option: One-hot encoding
Coronavirus
animal coronavirus covid
early hypertension quarantine outside reopening symptoms test

Representing Text
One option: One-hot encoding
Coronavirus
Bag of words
coronavirus
hypertension quarantine outside reopening symptoms test

Representing Text
One option: One-hot encoding
Coronavirus
coronavirus
hypertension quarantine outside reopening symptoms test

Coronavirus
COVID Early Symptoms
Representing Text
Problem: no relationship between words
Completely different
coronavirus
hypertension quarantine outside reopening symptoms test
animal coronavirus covid
early hypertension quarantine outside reopening symptoms test

Representing Text
Word Vectors: Map each word to a dense vector Also known as a word “embedding”
coronavirus =

Representing Text
Building word vectors
• Based on word co-occurrences
• Trained using neural network
coronavirus =
• e.g. word2vec, gloVe, etc.
• Important: these vectors are the same, regardless of context that the word appears in (i.e., “static”).
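A sketch of comparing static word vectors with cosine similarity. The 4-dimensional embedding values below are invented for illustration; real word2vec/gloVe vectors have hundreds of dimensions and are learned from co-occurrence data.

```python
import math

# Toy "static" embeddings: similar words get similar (made-up) vectors.
embeddings = {
    "coronavirus": [0.9, 0.8, 0.1, 0.0],
    "covid":       [0.8, 0.9, 0.2, 0.1],
    "quarantine":  [0.1, 0.2, 0.9, 0.7],
}

def cosine(u, v):
    """Cosine similarity: dot product of u and v, divided by their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Unlike one-hot vectors, related words are no longer orthogonal: their similarity is high even though the tokens differ.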

Representing Text
Word Vectors: Map each word to a dense vector
Coronavirus Early Symptoms
Coronavirus
COVID Early Symptoms

Representing Text
Word Vectors: Map each word to a dense vector
Coronavirus Early Symptoms
Coronavirus
COVID Early Symptoms

Representing Text
Word Vectors: Map each word to a dense vector
Coronavirus Early Symptoms
Coronavirus
COVID Early Symptoms

Representing Text
Word Vectors: Map each word to a dense vector
Coronavirus Early Symptoms
Coronavirus
Not identical vectors, but close
COVID Early Symptoms

Problem: No Context
✅ Handles different tokens with similar meanings: “Coronavirus” has a similar vector to “COVID”
❌ Doesn’t handle a single token with multiple possible meanings:
“A bear is raiding homes in California.”
“Bear in mind that we do not have much time before the big show.”
The animal “bear” has the same vector as the “bear” in “bear in mind”.

BERT & Transformer Networks
ranking score
layer n layer l+1
[CLS] bear in mind that we … the big show . [SEP]
Static vectors
Contextualisation
Self-attention layers

BERT & Transformer Networks
By the end, a “contextualised” vector is produced – one that knows how the word is used in context.
layer n layer l+1
[CLS] bear in mind that we … the big show . [SEP]
At each layer, a token’s vector is a learned combination of all the other vectors in the text.
Static vectors
Contextualisation
Self-attention layers
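A minimal sketch of the self-attention idea described above: each token's new vector is a softmax-weighted combination of all the token vectors. This deliberately omits the learned query/key/value projections and multiple heads that a real BERT layer has.

```python
import math

def self_attention(vectors):
    """Single-head, unprojected self-attention over a list of token vectors.
    Each output is a softmax(score)-weighted mix of all inputs."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Scaled dot-product scores against every token (including itself)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted combination of all token vectors
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d)])
    return out

contextualised = self_attention([[1.0, 0.0], [0.0, 1.0]])
```

Stacking such layers (with learned projections) is what lets each token's vector absorb its context.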

BERT & Transformer Networks
ranking score
bear, keep, etc.
Wcombine layer n
Training objective: Predict what word goes where a masked token is. (Masked Language Modeling)
layer l+1 decomp.
[CLS] [MASK] in mind that we … the big show . [SEP]
Static vectors
Contextualisation
Self-attention layers

BERT & Transformer Networks
Special token representation used for text classification.
ranking score
layer n layer l+1
[CLS] bear in mind that we … the big show . [SEP]
Static vectors
Contextualisation
Self-attention layers

Problem: No Context
✅ Handles different tokens with similar meanings: “Coronavirus” has a similar vector to “COVID”
✅ Handles a single token with multiple possible meanings:
“A bear is raiding homes in California.”
“Bear in mind that we do not have much time before the big show.”
The animal “bear” has a different vector from the “bear” in “bear in mind”.

T5 & Text Generation
Transformer “Encoder”
[CLS] bear in mind that we … the big show . [SEP]
Self-attention layers

T5 & Text Generation
Transformer “Encoder”
Transformer “Decoder”
ind that we …
Build a new representation based on the encoder outputs.

T5 & Text Generation
Transformer “Encoder”
Transformer “Decoder”
ind that we … the big show .

T5 & Text Generation
Transformer “Encoder”
Transformer “Decoder”
Predict the next word in the sequence (e.g. “It”).
ind that we …

T5 & Text Generation
Transformer “Encoder”
Transformer “Decoder”
Continue the process iteratively
ind that we …
[START] It
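The iterative decoding loop can be sketched as follows. `toy_next_token` is an invented stand-in for the real decoder network, and greedy next-word selection is only one of several decoding strategies.

```python
def greedy_decode(next_token_fn, max_len=10, start="[START]", end="[END]"):
    """Sketch of iterative (greedy) decoding as in a T5-style decoder:
    repeatedly predict the next token given the tokens generated so far."""
    tokens = [start]
    while len(tokens) < max_len:
        nxt = next_token_fn(tokens)
        if nxt == end:
            break  # the model signalled the end of the sequence
        tokens.append(nxt)
    return tokens[1:]  # drop the start marker

# A toy "decoder" that just replays a fixed continuation.
CONTINUATION = ["It", "was", "great", "[END]"]
def toy_next_token(tokens):
    return CONTINUATION[len(tokens) - 1]

decoded = greedy_decode(toy_next_token)
```

In the real model, `next_token_fn` also attends over the encoder outputs at every step.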

Neural NLP techniques can:
• Build representations that place similar words near each other
• Build representations that can distinguish different meanings a single word may have
• Generate text sequences
We will now use these tools to perform IR operations!

1. Review of LTR & Basics of Neural Networks for NLP
2. Neural Re-ranking
3. Neural Retrieval Q ↦ R
4. Neural Query Rewriting & PRF Q ↦ Q & R ↦ Q
5. Neural Document Rewriting D ↦ D
6. Neural IR in PyTerrier

Neural Network

Why Neural Re-ranking?
Simple formulation: Given a query and document, assign
a new score.
[Figure: each retrieved (“glasgow weather”, document) pair is passed to the neural network for a new score]
Computational cost: Lexical retrieval methods like BM25 are simple and already achieve reasonable recall; you only need to re-score a set of 100-1000 query-document pairs using an expensive NN.
Test Collection Suitability: Most benchmarks are based on pooling of lexical systems like BM25, so there is a higher likelihood that documents in this set have relevance assessments.
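The re-ranking pattern itself is tiny. This sketch assumes the retriever's results arrive as (docno, score) pairs sorted by the cheap score, and `expensive_score` (plus the toy score values) stands in for the neural model.

```python
def rerank(results, expensive_score, top_k=100):
    """Re-ranking pattern: keep the cheap retriever's top-k candidates,
    re-score only those with an expensive model, and re-sort."""
    head = results[:top_k]  # everything past top_k is never re-scored
    rescored = [(doc, expensive_score(doc)) for doc, _ in head]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Toy BM25 results and invented neural scores.
bm25_results = [("d1", 12.0), ("d2", 11.0), ("d3", 3.0)]
neural_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.99}
reranked = rerank(bm25_results, lambda docno: neural_scores[docno], top_k=2)
```

Note the trade-off made explicit by `top_k`: d3 would have won under the neural score, but it was cut before re-ranking, so recall is capped by the first stage.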

Formulation
(query, document) -> score
Title: How can we evaluate an
interrelation of symptoms? Abstract: A pandemic of 2019 novel
coronavirus (COVID-19) is an international problem and factors associated with increased risk of mortality have been reported. However, there exists limited statistical method to estimate a comprehensive risk for a case in which a patient has several characteristics…
Rel. score = 2.142
Coronavirus Early Symptoms

“Vanilla BERT”
The simplest approach that works (really well)!
Idea: Concatenate query and document; let the model learn how to combine them into a ranking score.
ranking score
layer n layer l+1
At each layer, the [CLS] token gathers evidence about the relevance between Q and D
embed. tokens
[CLS] [tax] [evade] [SEP] [world] [news] … query
[for] [tax] [fraud][today] [SEP]
(Also sometimes called monoBERT, Cross BERT, etc.)
Meanwhile, tokens are being contextualised.
Self-attention layers
… … …

Usually: Pairwise training.
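One common pairwise objective can be sketched as follows (an illustrative choice, not necessarily the exact loss any particular model uses): score a relevant and a non-relevant document for the same query, then minimise the negative log-softmax of the relevant one's score.

```python
import math

def pairwise_softmax_loss(score_pos, score_neg):
    """Pairwise softmax cross-entropy: -log softmax probability of the
    relevant (positive) document's score against the non-relevant one.
    Low when score_pos >> score_neg; high when the order is wrong."""
    m = max(score_pos, score_neg)  # subtract max for numerical stability
    log_z = m + math.log(math.exp(score_pos - m) + math.exp(score_neg - m))
    return log_z - score_pos
```

Training repeatedly samples (query, relevant doc, non-relevant doc) triples and backpropagates this loss through the scoring model.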
