IFN647 – Assignment 2 Requirements
Weighting: 35% of the assessment for IFN647
Deliverable Items
1. A final report in PDF or Word format, which includes:
• Statement of completeness and your name(s) and student ID(s) on a cover page;
• Your solution for all steps (see the “Requirements for each step” section for details);
• A user manual section describing the structure of the data folder setup, the packages that need to be imported, and how to execute the Python code in a terminal, IDLE or PyCharm; and
• An appendix listing the top 10 documents for all topics and all models.
2. Your source code for all steps, containing all .py files needed to run the solution and perform the evaluation, packaged together in a single zip file “code.zip” (source code only, no executables); and
3. A poster and demo to your tutor during your week 13 workshop time.
Please zip all files (the final report, code and poster) into a single file named “studentID_Asm2.zip” and submit it through Blackboard (one submission per group) before the due date.
Please do not include the dataset folders extracted from “DataCollection.zip” or “RelevanceFeedback.zip” in your submission.
The following frameworks/libraries can be used for Assignment 2:
(a) sk-learn
(c) pandas
(e) Matplotlib
If you want to use another package or library, you need to get your tutor’s approval.
Due date of Blackboard submission: Sunday of Week 13 (12 June 2022)
Group: You are required to work on this assignment in a group of three people.
Due to the difficulty of automatically obtaining users’ information needs, most search systems use only queries (or topics) rather than full descriptions of users’ information requirements. The first reason is that users may not know how to represent the topics they are interested in. The second reason is that users may not want to spend a lot of effort mining relevant documents from the hundreds of thousands of candidates provided by the system.
This unit discusses several ways to extend query-based methods, such as pseudo-relevance feedback, query expansion and hybrid methods. In this assignment, for a given data collection (comprising 50 datasets), you need to discover a good information filtering model that recommends relevant documents to users for all 50 topics, where the documents in each dataset were collected for the corresponding topic. The methodology you will use for this assignment includes the following steps:
• Step 1 – Design an IR-based baseline model (BM_IR) that ranks documents in each dataset using the corresponding queries for all 50 datasets.
• Step 2 – Based on the knowledge you gained from this unit, design two different models to rank documents in each dataset using the corresponding queries for all 50 datasets. Call them Model_1 and Model_2, respectively.
• Step 3 – Use Python to implement the three models (BM_IR, Model_1 and Model_2), test them on the given data collection of 50 topics (50 datasets), and print out the top 10 documents for each dataset (put the output in the appendix of your final report).
• Step 4 – Choose three effectiveness measures and display the testing results against them in tables or graphs.
• Step 5 – Recommend the best model based on a significance test and your analysis.

Data Collection
It is a subset of the RCV1 data collection and is provided only for IFN647 students supervised by Prof. Li Yuefeng. Please do not release this data collection to others.
DataCollection.zip file – It includes 50 Datasets (folders “dataset101” to “dataset150”) for topic R101 to topic R150.
“Topics.txt” file – It contains the definitions of the 50 topics (numbered R101 to R150) for the 50 datasets in the data collection, where each topic is defined by its number, a title and a description of the information need (see the example below).
Example of topic R102 – “Convicts, repeat offenders” is defined as follows:
Search for information pertaining to crimes committed by people who have been previously convicted and later released or paroled from prison.
Relevant documents are those which cite actual crimes committed by “repeat offenders” or ex-convicts. Documents which only generally discuss the topic or efforts to prevent its occurrence with no specific cases cited are irrelevant.
RelevanceFeedback.zip file – It includes relevance judgements (file “dataset101.txt” to file “dataset150.txt”) for all documents used in the 50 datasets, where “1” in the third column of each .txt file indicates that the document (the second column) is relevant to the corresponding topic (the first column); and “0” means the document is non-relevant.
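Since each line of a relevance judgement file holds a topic ID, a document ID and a 0/1 label, the judgements for one dataset can be loaded with a small helper like the following. This is a minimal sketch only; the function name, the whitespace splitting and the example path are assumptions, not part of the specification.

# Minimal sketch (assumed names): load the relevance judgements for one topic,
# assuming each line of "dataset1xx.txt" holds three whitespace-separated values:
# topic ID, document ID, relevance label (1 = relevant, 0 = non-relevant).
def load_judgements(path):
    rel = {}
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            _topic, doc_id, label = parts[0], parts[1], parts[2]
            rel[doc_id] = int(label)
    return rel

# Example call (assumed folder layout): rel = load_judgements("RelevanceFeedback/dataset102.txt")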
Requirements for each step
Step 1: Design a BM25-based IR model as a baseline model (BM_IR). You need to use the following equation

BM25(Q, D) = Σ_{i∈Q} log( ((r_i + 0.5)/(R − r_i + 0.5)) / ((n_i − r_i + 0.5)/(N − n_i − R + r_i + 0.5)) ) · ((k_1 + 1) f_i)/(K + f_i) · ((k_2 + 1) qf_i)/(k_2 + qf_i),  where K = k_1((1 − b) + b · dl/avdl),

for all topics R101 to R150, where Q is the title of a topic, N is the number of documents in the dataset, n_i is the number of documents containing query term i, f_i and qf_i are the frequencies of term i in document D and in query Q, dl and avdl are the length of D and the average document length, r_i and R are relevance counts (both 0 when no relevance information is used), and k_1, k_2 and b are parameters.
Formally describe your design for BM_IR as an algorithm that ranks the documents in each dataset using the corresponding query, for all 50 datasets. You also need to determine the values of all parameters used in the above equation and explain how documents in each dataset are ranked.
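As an illustration only, the scoring step of the above equation might be sketched in Python as follows, assuming each document has already been tokenised into a term-frequency dictionary. The function name, argument names and the parameter values k1 = 1.2, k2 = 100 and b = 0.75 are assumptions, not prescribed settings.

import math

# Illustrative sketch of the BM_IR scoring step (assumed names and parameter values).
# query_terms: {term: query frequency}; doc_tf: {term: frequency} for one document;
# df: {term: document frequency} for the dataset; N: number of documents in the dataset.
def bm25_score(query_terms, doc_tf, doc_len, avg_len, df, N, k1=1.2, k2=100, b=0.75):
    K = k1 * ((1 - b) + b * doc_len / avg_len)
    score = 0.0
    for term, qf in query_terms.items():
        f = doc_tf.get(term, 0)
        n = df.get(term, 0)
        # With r_i = R = 0 (no relevance information), the log term of the equation
        # reduces to log((N - n + 0.5) / (n + 0.5)).
        score += (math.log((N - n + 0.5) / (n + 0.5))
                  * ((k1 + 1) * f) / (K + f)
                  * ((k2 + 1) * qf) / (k2 + qf))
    return score

Documents in each dataset would then be sorted by this score in descending order.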
Step 2. Design two different models, Model_1 and Model_2
In this step, you can use only the 50 datasets (DataCollection.zip) and the topics (Topics.txt). You cannot use the relevance judgements (RelevanceFeedback.zip). You may design IR-based models, pseudo-relevance feedback models or other hybrid methods.
Write your design (or ideas) for the two models as two algorithms. Your approach should be generic, meaning it is feasible to apply to other topics. You also need to discuss the differences between the three models.
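For example, one of your models could build on pseudo-relevance feedback: rank documents with a first-pass model, treat the top-k results as pseudo-relevant, and expand the query with frequent terms from those documents. The sketch below shows only one possible direction; the function names, k = 10 and the five expansion terms are illustrative assumptions, not requirements.

from collections import Counter

# Illustrative pseudo-relevance feedback sketch (assumed names and settings):
# expand the original query with frequent terms drawn from the top-k first-pass results.
def expand_query(query_terms, ranked_docs, doc_tfs, k=10, n_new_terms=5):
    """ranked_docs: list of (doc_id, score) sorted by score; doc_tfs: {doc_id: {term: freq}}."""
    pool = Counter()
    for doc_id, _score in ranked_docs[:k]:   # top-k pseudo-relevant documents
        pool.update(doc_tfs[doc_id])
    expanded = dict(query_terms)
    added = 0
    for term, _freq in pool.most_common():
        if added >= n_new_terms:
            break
        if term not in expanded:
            expanded[term] = 1               # new terms enter the query with weight 1
            added += 1
    return expanded                          # re-rank documents with the expanded query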
Step 3. Implement three models: BM_IR, Model_1 and Model_2
Design Python programs to implement these three models. You can use one .py file for each model. Discuss the data structures used to represent a single document and a set of documents in each model (you can use the same data structure for different models). You also need to test the three models on the given data collection of 50 datasets for the 50 topics and print out the top 10 documents for each dataset (in descending order of score); a small printing sketch is given after the sample output below. The output will be put in the appendix of the final report.
Below is the output format for all 50 topics for the model BM_IR only.
Topic R101: ...
Topic R102:
DocID    Weight
73038    5.8987
26061    4.2736
65414    4.1414
57914    3.9671
58476    3.7084
76635    3.5867
12769    3.4341
12767    3.3521
25096    2.7646
78836    2.6823
Topic R103: ...
...
Topic R150: ...
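For instance, assuming each model produces a dictionary mapping document IDs to weights for one topic, the output above could be printed with a helper like this (the names and column widths are assumptions):

# Minimal printing sketch (assumed names): output the top 10 documents for one topic
# in descending order of weight, matching the DocID/Weight layout shown above.
def print_top10(topic_id, scores):
    """scores: {doc_id: weight} produced by one model for one dataset."""
    print(f"Topic {topic_id}:")
    print(f"{'DocID':<10}Weight")
    top10 = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:10]
    for doc_id, weight in top10:
        print(f"{doc_id:<10}{weight:.4f}")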
Step 4. Display test results on effectiveness measures
In this step, you need to use the relevance judgements (RelevanceFeedback.zip) to display the test results for the selected effectiveness measures.
You need to choose three different effectiveness measures to evaluate the test results, such as top-10 precision, MAP, F1 or interpolated precision. Evaluation results can be summarised in tables or graphs (e.g., precision-recall curves). Below is an example summary table of the F1 measure for the three models; a sketch of some of these measures is given after the table.
Table 1. The performance of the three models on the F1 measure at position 25

           R101      R102      R103      ...      R150
BM_IR      0.2100    0.0320    0.0765    ...      ...
Model_1    0.2200    0.0350    0.0765    ...      ...
Model_2    0.2300    0.0370    0.0787    ...      ...
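As an illustration, the suggested measures could be computed per topic along the following lines, reusing the 0/1 judgements loaded earlier; the function names and cut-off values are assumptions.

# Illustrative per-topic measures (assumed names): ranking is a list of doc IDs in
# descending order of score, rel is the {doc_id: 0/1} judgement dictionary.
def precision_at_k(ranking, rel, k=10):
    return sum(rel.get(d, 0) for d in ranking[:k]) / k

def average_precision(ranking, rel):
    hits, ap = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if rel.get(d, 0):
            hits += 1
            ap += hits / i
    total_relevant = sum(rel.values())
    return ap / total_relevant if total_relevant else 0.0   # MAP = mean of AP over all topics

def f1_at_k(ranking, rel, k=25):
    tp = sum(rel.get(d, 0) for d in ranking[:k])
    total_relevant = sum(rel.values())
    precision = tp / k
    recall = tp / total_relevant if total_relevant else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0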
Step 5. Recommend the best model
You need a significance test to compare the models. You can choose a t-test to perform a significance test on the evaluation results you reported in Step 4. You can compare BM_IR with Model_1, BM_IR with Model_2, and/or Model_1 with Model_2. Based on the t-test results (p-value and t-statistic), you can recommend the best model. You can perform the t-test using a single effectiveness measure or multiple measures; generally, using more effectiveness measures provides stronger evidence against the null hypothesis.
Note that if the t-test is unsatisfactory, you can use the evaluation results to refine your model. For example, you can adjust parameter settings or update your design.
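A paired t-test over the 50 per-topic scores of two models could be sketched as below. Note that scipy is not in the library list above, so its use would need tutor approval (otherwise the t-statistic can be computed directly from the paired score differences); the function name, variable names and significance level are assumptions.

from scipy import stats

# Illustrative paired t-test sketch (assumed names): scores_a and scores_b are lists of
# 50 per-topic values (e.g. F1@25) for two models, in the same topic order.
def compare_models(scores_a, scores_b, alpha=0.05):
    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
    print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
    if p_value < alpha:
        print("The difference between the two models is statistically significant.")
    else:
        print("No statistically significant difference at this alpha level.")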
Please Note
• Your programs should be well laid out, easy to read and well commented.
• All items submitted should be clearly labelled with your name and student number.
• Marks will be awarded for design (algorithms), programs (correctness, programming style, elegance, commenting) and evaluation results, according to the marking guide.
• You will lose marks for missing or inaccurate statements of completeness or user manual, and for missing sections, files, or items.
• Your results do not need to be exactly the same as the sample output.
• We recommend that you use a fair workload distribution approach, such as one person per model, but the baseline model is simple, so the person responsible for the baseline may do more in the evaluation.
• If your group has team conflict issues, your individual contributions will be assessed at the Week 13 workshop (you may be asked to do a peer review); otherwise, all group members will participate equally in this assessment project.
• See the marking guide for more details.
END OF ASSIGNMENT 2