

IFN647 Assignment II
Demonstration


Introduction
Discover a good information filtering model that recommends relevant documents to users across all 50 topics
Approach and the Models
Use Python to implement three models (BM25, Model_1 and Model_2) and test them on the given data collection of 50 topics (50 datasets)

BM25, based on a classic information retrieval model (IRM)

Model_1, based on the pseudo-relevance model (PRM)

Model_2, also based on the PRM: a bag-of-words model is used to represent the documents,
and logistic regression classification is used
Library: math, sklearn, pandas

Introduction
This unit discusses several ways to extend query-based methods, such as pseudo-relevance
feedback, query expansion and hybrid methods.
We aim to discover a good information filtering model that recommends relevant documents to users across all 50 topics

Identification: an algorithm that evaluates the relevance between query terms and documents

Model explanation:


The pseudo-relevance model is a well-researched query expansion technique.
Library: math
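The slides do not show the scoring formula itself, but a standard BM25 scorer built on only the math library can be sketched as follows (the function name bm25_score and the default parameters k1 = 1.2, b = 0.75 are illustrative, not taken from the assignment code):

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):
    """Score one document against a query with the BM25 ranking function.

    query_terms : list of query terms
    doc_tf      : dict mapping term -> frequency in this document
    doc_len     : number of tokens in this document
    avg_doc_len : average document length in the collection
    df          : dict mapping term -> number of documents containing it
    n_docs      : total number of documents in the collection
    """
    score = 0.0
    for term in query_terms:
        f = doc_tf.get(term, 0)
        if f == 0:
            continue  # term absent from this document contributes nothing
        # IDF component with +0.5 smoothing to avoid division by zero
        idf = math.log((n_docs - df.get(term, 0) + 0.5)
                       / (df.get(term, 0) + 0.5) + 1)
        # Term-frequency saturation with document-length normalisation
        tf_part = (f * (k1 + 1)) / (f + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_part
    return score
```

Documents are then ranked per topic by this score in descending order.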

Advantages
Simple processing
Achieves good results in many experiments

Disadvantages
Accuracy is difficult to guarantee
Not all queries see improved performance

Pseudo-relevance feedback is a well-researched query expansion technique. It assumes that the highest-ranked documents in the initial retrieval result set are relevant, and then extracts expansion terms from those documents.

Most traditional models do not consider both the frequency of candidate words and their co-occurrence with query words when selecting expansion terms. Words that co-occur with a query term are more likely to be related to the query topic.

Without user involvement, the system simply assumes that the top-K documents in the returned ranking are relevant, and then feeds them back.
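The feedback step described above can be sketched as a small helper that harvests frequent terms from the assumed-relevant top-K documents (the function name and the choice of raw frequency as the term-selection weight are assumptions; the actual Model_1 may weight terms differently):

```python
from collections import Counter

def pseudo_relevance_expand(ranked_docs, k=5, n_terms=10, stopwords=frozenset()):
    """Pick expansion terms from the top-k documents of an initial ranking.

    ranked_docs : list of token lists, ordered by the initial retrieval score
    Returns the n_terms most frequent non-stopword terms found in the
    assumed-relevant top-k documents.
    """
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(t for t in doc if t not in stopwords)
    return [term for term, _ in counts.most_common(n_terms)]
```

The returned terms are appended to the original query before a second retrieval pass.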

(1) Advantages
No user involvement is needed, so processing is simple
Many experiments have also achieved good results

(2) Disadvantages
Without user judgment, accuracy is difficult to guarantee
Not all queries see improved performance

Information retrieval

Model_2 is also a PRM, but it uses a bag-of-words model to represent the documents and then logistic regression to classify them

Library: sklearn, pandas

For all the documents in a topic's dataset, use CountVectorizer() to encode them and obtain features, then use a method similar to Model_1 to obtain pseudo-labels for the documents, train the logistic regression model on these data, and finally use the model's predicted probability as the basis for ranking.
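Those steps can be sketched end to end with sklearn and pandas (the function name rank_with_logreg and the max_iter setting are illustrative; the pseudo-labels are assumed to come from a Model_1-style initial ranking):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def rank_with_logreg(doc_ids, texts, pseudo_labels):
    """Rank documents by the relevance probability predicted by a
    logistic regression classifier trained on pseudo-labels.

    pseudo_labels: 1 for documents assumed relevant (e.g. the top-k of
    an initial ranking), 0 otherwise.
    """
    vectorizer = CountVectorizer()           # bag-of-words features
    X = vectorizer.fit_transform(texts)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, pseudo_labels)
    probs = clf.predict_proba(X)[:, 1]       # P(relevant) per document
    ranking = pd.DataFrame({"doc_id": doc_ids, "score": probs})
    return ranking.sort_values("score", ascending=False).reset_index(drop=True)
```

The resulting DataFrame, sorted by score, is the filtered ranking for that topic.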

The result:

Results and Evaluation

MAP, F1, Precision
The performance of the BM25 model on the three measures

The best model: Model_1
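The three measures above can be computed per topic with a short helper like this (a sketch under common definitions, with precision and F1 taken at a cutoff k; the marking script's exact definitions may differ):

```python
def precision_f1_ap(ranked_ids, relevant, k=10):
    """Compute precision@k, F1@k and average precision for one topic.

    ranked_ids : document ids in ranked order
    relevant   : set of relevant document ids for this topic
    """
    top = ranked_ids[:k]
    hits = sum(1 for d in top if d in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Average precision: mean of precision at each relevant document's rank
    ap, found = 0.0, 0
    for i, d in enumerate(ranked_ids, start=1):
        if d in relevant:
            found += 1
            ap += found / i
    ap = ap / len(relevant) if relevant else 0.0
    return precision, f1, ap
```

MAP is then the mean of the per-topic average precision over all 50 topics.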

