IFN647 Assignment II
Demonstration
Introduction
Discover a good information filtering model that recommends relevant documents to users on all 50 topics
Approach and the Models
Use Python to implement three models (BM25, Model_1 and Model_2) and test them on the given data collection of 50 topics (50 datasets)
BM25: based on an information retrieval model (IRM)
Model_1: based on a pseudo-relevance model (PRM)
Model_2: based on the PRM; documents are represented with a bag-of-words model,
and logistic regression is used for classification
Libraries: math, sklearn, pandas
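As a sketch of the BM25 baseline listed above (the parameter values k1=1.2 and b=0.75 are common defaults, not taken from the slides):

```python
import math

def bm25_score(query_terms, doc_terms, df, n_docs, avg_dl, k1=1.2, b=0.75):
    """Score one document against a query with BM25.

    df: dict mapping a term to its document frequency in the collection.
    n_docs: total number of documents; avg_dl: average document length.
    """
    dl = len(doc_terms)
    score = 0.0
    for term in query_terms:
        f = doc_terms.count(term)  # term frequency in this document
        if f == 0:
            continue
        # Robertson-Sparck Jones style IDF, smoothed to stay positive.
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avg_dl))
    return score
```

Documents containing more (and rarer) query terms score higher; a document with no query terms scores zero.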
Introduction
This unit discusses several ways to extend query-based methods, such as pseudo-relevance
feedback, query expansion or hybrid methods.
We aim to discover a good information filtering model that recommends relevant documents to users on all 50 topics
Identification: an algorithm for evaluating the correlation between query terms and documents
Model explanation:
The pseudo-relevance model is a well-researched query expansion technique.
Library: math
Advantages
simple processing
achieved good results
Disadvantages
the accuracy is difficult to ensure
Not all queries improve performance
Pseudo-relevance feedback is a well-researched query expansion technique. It assumes that the highest-ranked documents in the initial retrieval result set are relevant, and then extracts expansion terms from those documents.
Most traditional models do not consider both the word frequency and the co-occurrence relationship between candidate words and query words when selecting expansion terms. Words that co-occur with a query term are more likely to be related to the query topic.
Without user involvement, the system simply assumes that the top-K documents in the returned results are relevant, and then feeds them back.
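The feedback step above can be sketched as follows (selecting expansion terms purely by frequency in the top-k documents is one simple choice; the slides do not specify the weighting scheme):

```python
from collections import Counter

def prf_expand(ranked_docs, k=3, n_terms=5, orig_query=None):
    """Pseudo-relevance feedback: assume the top-k ranked documents are
    relevant and pick the most frequent terms in them as expansion terms.

    ranked_docs: documents (as token lists) in initial ranking order.
    """
    orig_query = set(orig_query or [])
    counts = Counter()
    for doc in ranked_docs[:k]:
        counts.update(doc)
    # Skip terms already in the query; keep the n_terms most frequent.
    return [t for t, _ in counts.most_common() if t not in orig_query][:n_terms]
```

The returned terms are appended to the original query before a second retrieval pass.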
(1) Advantages
Does not consider the user factor, so processing is simple
Many experiments have achieved good results
(2) Disadvantages
No user judgment, so accuracy is difficult to ensure
Not all queries improve performance
Information retrieval
Model_2 is also a PRM, but it uses a bag-of-words model to represent the documents and then logistic regression to classify them
Libraries: sklearn, pandas
For all the files in a query's collection, use CountVectorizer() to encode them and obtain the features; then use a method similar to Model_1 to derive pseudo-labels for the files; train the logistic regression model on these data; finally, use the model's predicted probability as the basis for ranking.
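A minimal sketch of that pipeline (the document strings and pseudo-labels below are illustrative placeholders, not the assignment data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "information filtering recommends relevant documents",
    "bm25 ranks documents for a query",
    "cooking pasta with tomato sauce",
    "holiday travel tips and photos",
]
# Pseudo-labels: 1 for documents the first-stage model ranked highly, else 0.
pseudo_labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()  # bag-of-words features
X = vectorizer.fit_transform(docs)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, pseudo_labels)

# Rank documents by predicted probability of relevance.
probs = clf.predict_proba(X)[:, 1]
ranking = sorted(range(len(docs)), key=lambda i: probs[i], reverse=True)
```

In the real system the model would score unseen documents rather than its own training set; the probabilities then serve directly as ranking scores.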
Results and Evaluation
MAP, F1, Precision
The performance of the BM25 model on the 3 measures
The best model: Model_1