# INFS7410 Project – Part 2

_version 1.0_

### Preamble

The due date for this assignment is **21 October 2022 16:00 Eastern Australia Standard Time**.

This part of the project is worth 20% of the overall mark for INFS7410 (part 1 + part 2 = 40%). A detailed marking sheet for this assignment is provided alongside this notebook. The project is to be completed individually.

We recommend that you make an early start on this assignment and proceed in steps. There are several activities you may have already tackled, including setting up the pipeline, manipulating the queries, implementing some retrieval functions, and performing evaluation and analysis. Most of the assignment relies on knowledge and code you should have already encountered in the computer practicals; however, there are some hidden challenges here and there that may require some time to solve.

Project aim: The aim of this project is for you to implement several neural information retrieval methods, evaluate them and compare them in the context of a multi-stage ranking pipeline.

The specific objectives of Part 2 are to:

* Set up your infrastructure to index the collection and evaluate queries.
* Implement neural information retrieval models (only inference).
* Implement multi-stage ranking pipelines, i.e., BM25 + neural rankers.

### The Information Retrieval Task: Web Passage Ranking

As in part 1 of the project, in part 2 we will consider the problem of open-domain passage ranking in answer to web queries. In this context, users pose queries to the search engine and expect answers in the form of a ranked list of passages (maximum 1000 passages to be retrieved).

The provided queries are actual queries submitted to the Microsoft Bing search engine. There are approximately 8.8 million passages in the collection, and the goal is to rank them based on their relevance to the queries.

### What we provide you with:

#### Files from practical

* A collection of 8.8 million text passages extracted from web pages (`collection.tsv`, provided in Week 1).
* A query file that contains 43 queries for you to perform retrieval experiments (`queries.tsv`, provided in Week 2).
* A qrel file containing relevance judgements to tune your methods (`qrels.txt`, provided in Week 2).
* PyTorch model files for ANCE.

#### Extra files for this project

* A leaderboard system for you to evaluate how well your system performs.
* A test query file that contains 54 queries for you to generate run files to submit to the leaderboard (`test_queries.tsv`).
* This Jupyter notebook, in which you will include your implementation and report.
* An hdf5 file that contains TILDEv2 pre-computed term weights for the collection. Download it from this [link](https://drive.google.com/file/d/199IO4E2ThiyLkMWokfr3Y9JY3DWSoFLt/view?usp=sharing).

Put this notebook and provided files under the same directory.

#### What you need to produce

You need to produce:

* Correct implementations of the methods required by this project specification.
* An explanation of the retrieval methods used, including the formulas that represent the models you implemented and the code that implements those formulas, an explanation of the evaluation settings followed, and a discussion of the findings. Please refer to the marking sheet to understand how each of these requirements is graded.

You are required to produce both of these within this jupyter notebook.

#### Required methods to implement

In Part 2 of the project, you are required to implement the following retrieval methods. All implementations should be based on your own code (except for BM25, where you can use the Pyserini built-in SimpleSearcher).

1. Dense Retriever (ANCE): Use ANCE to re-rank BM25 top-k documents. See the practical in Week 10 for background information.
2. TILDEv2: Use TILDEv2 to re-rank BM25 top-k documents. See the practical in Week 10 for background information.
3. Three-stage ranking pipeline: Use TILDEv2 to re-rank BM25 top-k documents, then use monoBERT to re-rank TILDEv2 top-k documents. See the practicals in Week 9 and Week 10 for background information.
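The three methods above share one pattern: a cheap first stage produces candidates, and each later stage re-scores only the top-k of the previous one. The following is a minimal sketch of that pattern, not the required implementation. The names `rerank`, `three_stage`, `k1` and `k2`, and the toy scorers standing in for BM25, TILDEv2 and monoBERT, are all hypothetical.

```python
from typing import Callable, List, Tuple

Ranking = List[Tuple[str, float]]          # (passage id, score), best first
Scorer = Callable[[str, str], float]       # (query text, passage id) -> score
FirstStage = Callable[[str, int], Ranking]  # (query text, k) -> ranking


def rerank(query: str, candidates: Ranking, scorer: Scorer, k: int) -> Ranking:
    """Re-score the top-k candidates; keep the rest below them in their old order."""
    head, tail = candidates[:k], candidates[k:]
    rescored = [(pid, scorer(query, pid)) for pid, _ in head]
    # Python's sort is stable, so ties keep the order of the previous stage.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    # Note: scores in `tail` come from the previous stage and are on a different
    # scale, so use list position (not score) as the final rank.
    return rescored + tail


def three_stage(query: str, first_stage: FirstStage, second: Scorer,
                third: Scorer, k1: int, k2: int) -> Ranking:
    """First stage -> re-rank top k1 with `second` -> re-rank top k2 with `third`."""
    stage1 = first_stage(query, k1)
    stage2 = rerank(query, stage1, second, k1)
    return rerank(query, stage2, third, k2)


# Toy stand-ins (hypothetical) so the sketch runs without any model files.
def toy_bm25(query: str, k: int) -> Ranking:
    return [("p1", 3.0), ("p2", 2.0), ("p3", 1.0)][:k]

toy_tilde_scores = {"p1": 0.1, "p2": 0.9, "p3": 0.5}
toy_bert_scores = {"p1": 0.7, "p2": 0.2, "p3": 0.8}

final = three_stage(
    "example query",
    toy_bm25,
    lambda q, pid: toy_tilde_scores[pid],
    lambda q, pid: toy_bert_scores[pid],
    k1=3,
    k2=2,
)
print([pid for pid, _ in final])
```

Setting `k2` smaller than `k1` sends only the most promising candidates to the slowest model, which is the point of adding a middle stage.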

You can choose any value for the cut-off k, but be aware that these neural models are slow to perform inference on the CPU, so a large k might be infeasible. You are free to use Colab, but make sure you copy your code into this notebook.
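One way to choose k is to measure how latency grows with it before committing to a value. The sketch below times a pipeline over a set of queries for several cut-offs. `average_latency`, `latency_by_cutoff` and `toy_pipeline` are hypothetical names, and the toy pipeline only simulates work that grows with k.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def average_latency(run_query: Callable[[str], object], queries: Dict[str, str]) -> float:
    """Mean wall-clock seconds per query (raises StatisticsError if `queries` is empty)."""
    timings = []
    for text in queries.values():
        start = time.perf_counter()
        run_query(text)
        timings.append(time.perf_counter() - start)
    return mean(timings)


def latency_by_cutoff(pipeline: Callable[[str, int], object],
                      queries: Dict[str, str],
                      cutoffs: Iterable[int]) -> Dict[int, float]:
    """Average latency of `pipeline(query, k)` for each cut-off k."""
    return {
        k: average_latency(lambda q, k=k: pipeline(q, k), queries)
        for k in cutoffs
    }


def toy_pipeline(query: str, k: int) -> list:
    # Hypothetical stand-in: the amount of work grows with k,
    # as it would when a neural model scores k passages.
    return sorted(hash((query, i)) for i in range(k * 1000))


toy_queries = {"q1": "what is information retrieval", "q2": "neural ranking models"}
for k, seconds in latency_by_cutoff(toy_pipeline, toy_queries, [10, 100]).items():
    print(f"k={k}: {seconds * 1000:.2f} ms per query")
```

Wall-clock timings vary between runs and machines, so report the hardware used alongside the numbers.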

For TILDEv2, unlike what you did in the practical, we provide the pre-computed term weights for the whole collection (for more details, see the `Initial packages and functions` cell). This means TILDEv2 re-ranking can be fast. Use this advantage to trade off effectiveness and efficiency in your three-stage ranking pipeline implementation.
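With pre-computed weights, re-ranking reduces to lookups and additions, which is why it is fast. The sketch below is a simplified illustration and makes two assumptions: each passage's weights are held as a token-to-weight dictionary, and a passage's score is the sum of the weights of the query tokens it contains. The layout of the provided hdf5 file and the tokenisation used in the practical are not reproduced here, and `tilde_score`, `rerank_with_weights` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def tilde_score(query_tokens: List[str], passage_weights: Dict[str, float]) -> float:
    """Sum the stored weights of the query tokens found in the passage.

    Tokens absent from the passage contribute 0; a token repeated in the
    query is counted once per occurrence.
    """
    return sum(passage_weights.get(token, 0.0) for token in query_tokens)


def rerank_with_weights(query_tokens: List[str],
                        candidate_ids: List[str],
                        weights: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score each candidate passage from its stored weights and sort best first."""
    scored = [
        (pid, tilde_score(query_tokens, weights.get(pid, {})))
        for pid in candidate_ids
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy weights (hypothetical values, not taken from the provided file).
toy_weights = {
    "p1": {"neural": 1.5, "ranking": 0.5},
    "p2": {"neural": 0.2, "bm25": 2.0},
}
toy_query = ["neural", "ranking"]  # assumed to be tokenised already
print(rerank_with_weights(toy_query, ["p2", "p1"], toy_weights))
```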

You should have already attempted many of these implementations above as part of the computer pracs exercises.

#### Required evaluation to perform

In Part 2 of the project, you are required to perform the following evaluation:

1. For all methods, report effectiveness using `queries.tsv` and `qrels.txt`, then submit your runs on `test_queries.tsv` to the leaderboard system, using the parameter values you selected on `queries.tsv`.
2. Report every method’s effectiveness and efficiency (average query latency) on `queries.tsv`, together with the corresponding cut-off k, in a table. Perform statistical significance analysis across the results of the methods and report the outcomes in the table.
3. Produce a gain-loss plot that compares the most and least effective of the three required methods above in terms of nDCG at 10 on `queries.tsv`.
4. Comment on trends and differences observed when comparing your findings. Is there a method that consistently outperforms the others on the `queries.tsv` and the `test_queries.tsv`?
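A gain-loss plot is built from per-query differences between two methods, sorted from largest gain to largest loss. The sketch below assumes you already have per-query scores for each method as dictionaries keyed by query id. The function names, the toy scores and the default axis label are hypothetical, and matplotlib is imported only inside the plotting function.

```python
from typing import Dict, List, Tuple


def gain_loss(best: Dict[str, float], worst: Dict[str, float]) -> List[Tuple[str, float]]:
    """Per-query difference (best minus worst), sorted from largest gain to largest loss."""
    shared = sorted(set(best) & set(worst))  # only queries scored by both methods
    deltas = [(qid, best[qid] - worst[qid]) for qid in shared]
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas


def plot_gain_loss(deltas: List[Tuple[str, float]], measure: str = "ndcg_cut_10") -> None:
    """Draw one bar per query; bars above zero are gains, bars below are losses."""
    import matplotlib.pyplot as plt  # lazy import so gain_loss works without matplotlib

    qids = [qid for qid, _ in deltas]
    values = [delta for _, delta in deltas]
    plt.bar(range(len(values)), values)
    plt.xticks(range(len(qids)), qids, rotation=90)
    plt.axhline(0, color="black", linewidth=0.8)
    plt.xlabel("Query")
    plt.ylabel(f"Difference in {measure}")
    plt.tight_layout()
    plt.show()


# Toy per-query scores (hypothetical values).
toy_best = {"q1": 0.8, "q2": 0.3, "q3": 0.5}
toy_worst = {"q1": 0.5, "q2": 0.6, "q3": 0.5}
toy_deltas = gain_loss(toy_best, toy_worst)
print(toy_deltas)
# plot_gain_loss(toy_deltas)  # uncomment in the notebook to draw the plot
```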

Regarding evaluation measures, evaluate the retrieval methods with respect to nDCG at 10 (`ndcg_cut_10`). You should use this measure as the target measure for tuning. Also compute reciprocal rank at 1000 (`recip_rank`), MAP (`map`) and Recall at 1000 (`recall_1000`).
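To make the target measure concrete, the sketch below computes nDCG at a cut-off for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount; it is meant for understanding the measure, and reported numbers should come from the evaluation tooling used in the practicals. The function names and toy judgements are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), ranks from 1."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids: List[str], judgements: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query; unjudged passages are treated as non-relevant."""
    gains = [max(judgements.get(pid, 0), 0) for pid in ranked_ids[:k]]
    # The ideal ranking places the judged passages in decreasing order of relevance.
    ideal = sorted((rel for rel in judgements.values() if rel > 0), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy judgements for one query (hypothetical labels).
toy_judgements = {"a": 3, "b": 2, "c": 0}
print(ndcg_at_k(["a", "b", "c"], toy_judgements))  # ideal order
print(ndcg_at_k(["b", "a", "c"], toy_judgements))  # top two swapped
```

Averaging this value over all queries gives the figure that is compared across methods.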

For all statistical significance analyses, use a paired t-test and distinguish between p<0.05 and p<0.01.

#### How to submit

You will have to submit one file:

1. A zip file containing this notebook (.ipynb) and this notebook **as a PDF document**. The code should be able to be executed by us. Remember to include all your discussion and analysis also in this notebook and not as a separate file.

It needs to be submitted via the relevant Turnitin link in the INFS7410 BlackBoard site by **28 October 2021, 16:00 Eastern Australia Standard Time**, unless you have been given an extension (according to UQ policy), *before* the due date of the assignment.