Q1: Indexing Statistics
Question 1
Enter the number of indexed documents
Answer:
Question 2
Enter the size of the vocabulary
Answer:
Question 3
Enter the number of tokens indexed
Answer:
Question 4
Enter the number of pointers
Answer:
Question 5
Enter the time it took Terrier to index the collection (in seconds)
Answer:
Q2: Simple TF*IDF
Question 6
Paste your Java method code for your Simple TF*IDF weighting model
public double score(Posting p) {
//paste your code here.
}
Question 7
Consider a given document and two queries, namely:
• jury service
• jury service jury
Does your implemented Simple TF*IDF weighting model give the same score to this document for both queries?
Select one:
True
False
Question 8
On your created index, run the following query using the Simple TF*IDF weighting scheme: jury service
What is the docno of the second ranked document returned by the system for this query?
Answer:
Question 9
On your created index, run the following query using the PL2 weighting scheme: jury service
What is the docno of the second ranked document returned by the system for this query?
Answer:
Question 10
Inspect the top retrieved documents by Simple TF*IDF and PL2 for the query: jury service
Explain the differences in the retrieved documents by both weighting models.
Select one or more:
a. Simple TF*IDF favours longer documents
b. Simple TF*IDF will boost documents with higher term frequencies of related terms
c. Simple TF*IDF will boost documents with higher query term frequencies
d. Simple TF*IDF favours shorter documents
Q2: Vector Space TF*IDF
Question 11
Paste your Java method code for your Vector Space TF*IDF Model implementation.
Question 12
On your created index, run the following query using your Vector Space TF*IDF Model implementation:
Consumer food advice
What is the docno of the second ranked document returned for this query?
Answer:
Question 13
Consider the following two queries:
Query 1: a history of American agriculture
Query 2: american agriculture history in america
Using your Vector Space TF*IDF Model implementation, provide the top 2 documents ranked by the system for each of these two queries, separated by commas (without spaces), i.e.
Gxx-xx-xxxxxxx,Gyy-yy-yyyyyyy,Gaa-aa-aaaaaaa,Gbb-bb-bbbbbbb
where Gxx-xx-xxxxxxx and Gyy-yy-yyyyyyy are the top two ranked documents for Query 1 and Gaa-aa-aaaaaaa and Gbb-bb-bbbbbbb are the top two ranked documents for Query 2.
Answer:
Question 14
When instantiating a WeightingModel class in Terrier, what is the purpose of setEntryStatistics()?
Select one:
a. To provide information about the document length, which is important for normalisation
b. To inform the weighting model about the query, such as the length of the query
c. To inform the weighting model of the collection statistics, such as the number of documents
d. To inform the weighting model of the statistics of the term, such as the document frequency
Question 15
What is the term frequency of the query term ‘jury’ in the document ranked 10th by your Vector Space Model implementation for the query: jury service
Answer:
Q3: Weighting Model Results (Simple TF*IDF)
Question 16
Enter the MAP performance of Simple TF*IDF on HP04 topics
Answer:
Question 17
Enter the MAP performance of Simple TF*IDF on NP04 topics
Answer:
Question 18
Enter the MAP performance of Simple TF*IDF on TD04 topics
Answer:
Question 19
Enter the average MAP performance of Simple TF*IDF across the HP04, NP04 and TD04 topics
Answer:
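For reference, MAP is the mean over queries of each query's average precision. The sketch below is a minimal illustration, not the evaluation tool used in this exercise: the class name `MapSketch`, both method names, and the sample docnos are hypothetical. It also assumes that the "average across the HP04, NP04 and TD04 topics" is the unweighted mean of the three per-set MAP values.

```java
import java.util.List;
import java.util.Set;

public class MapSketch {

    /** Average precision of one ranked list, given the set of relevant docnos. */
    static double averagePrecision(List<String> ranking, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranking.size(); i++) {
            if (relevant.contains(ranking.get(i))) {
                hits++;
                // Precision at this rank (ranks are 1-based).
                sum += (double) hits / (i + 1);
            }
        }
        // Relevant documents that were never retrieved contribute zero.
        return sum / relevant.size();
    }

    /** Unweighted arithmetic mean; used for MAP over queries or over topic sets. */
    static double mean(double... values) {
        if (values.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total / values.length;
    }

    public static void main(String[] args) {
        // Hypothetical ranking: relevant documents appear at ranks 1 and 3.
        double ap = averagePrecision(List.of("d1", "d2", "d3", "d4"), Set.of("d1", "d3"));
        System.out.println("AP = " + ap);
        // Hypothetical per-topic-set MAP values, averaged without weighting.
        System.out.println("Average MAP = " + mean(0.2, 0.4, 0.6));
    }
}
```

The values you enter should come from your own evaluation output; the sketch only shows how the figures relate to one another.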
Q3: Weighting Model Results (Vector Space TF*IDF)
Question 20
Enter the MAP performance of Vector Space TF*IDF on HP04 topics
Answer:
Question 21
Enter the MAP performance of Vector Space TF*IDF on NP04 topics
Answer:
Question 22
Enter the MAP performance of Vector Space TF*IDF on TD04 topics
Answer:
Question 23
Enter the average MAP performance of Vector Space TF*IDF across the HP04, NP04 and TD04 topics
Answer:
Q3: Weighting Model Results (Terrier TF*IDF)
Question 24
Enter the MAP performance of Terrier’s TF*IDF on HP04 topics
Answer:
Question 25
Enter the MAP performance of Terrier’s TF*IDF on NP04 topics
Answer:
Question 26
Enter the MAP performance of Terrier’s TF*IDF on TD04 topics
Answer:
Question 27
Enter the average MAP performance of Terrier’s TF*IDF across the HP04, NP04 and TD04 topics
Answer:
Q3: Weighting Model Results (BM25)
Question 28
Enter the MAP performance of BM25 on HP04 topics
Answer:
Question 29
Enter the MAP performance of BM25 on NP04 topics
Answer:
Question 30
Enter the MAP performance of BM25 on TD04 topics
Answer:
Question 31
Enter the average MAP performance of BM25 across the HP04, NP04 and TD04 topics
Answer:
Q3: Weighting Model Results (PL2)
Question 32
Enter the MAP performance of PL2 on HP04 topics
Answer:
Question 33
Enter the MAP performance of PL2 on NP04 topics
Answer:
Question 34
Enter the MAP performance of PL2 on TD04 topics
Answer:
Question 35
Enter the average MAP performance of PL2 across the HP04, NP04 and TD04 topics
Answer:
Q3: Recall-Precision Graphs
Question 36
Upload your 3 Recall-Precision graphs, one for each of the HP04, NP04 and TD04 topic sets. Use a single PDF document to show the three graphs.
Maximum file size: 230MB, maximum number of files: 1
Question 37
Enter the precision of PL2 on the Homepage Finding Task (HP04) at the interpolated 0.2 Recall.
Answer:
Question 38
Enter the precision of Simple TF*IDF on the Named Page Finding Task (NP04) at the interpolated 0.5 Recall.
Answer:
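As a reminder of what "interpolated" means here: the interpolated precision at recall level r is commonly defined as the highest precision observed at any point in the ranking where recall is at least r. The sketch below applies that definition to a single query. The class name `InterpolatedPrecisionSketch`, the method name, and the sample relevance flags are hypothetical. Reported figures are normally averaged over all queries in a topic set, which this sketch does not do.

```java
public class InterpolatedPrecisionSketch {

    /**
     * Interpolated precision for one query at the given recall level.
     * relevantAtRank[i] is true if the document at rank i + 1 is relevant;
     * totalRelevant is the number of relevant documents for the query.
     */
    static double interpolatedPrecision(boolean[] relevantAtRank, int totalRelevant,
                                        double recallLevel) {
        if (totalRelevant <= 0) {
            return 0.0;
        }
        double best = 0.0;
        int hits = 0;
        for (int i = 0; i < relevantAtRank.length; i++) {
            if (relevantAtRank[i]) {
                hits++;
            }
            double recall = (double) hits / totalRelevant;
            double precision = (double) hits / (i + 1);
            // Keep the maximum precision among all ranks reaching the recall level.
            if (recall >= recallLevel - 1e-12 && precision > best) {
                best = precision;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical ranking: relevant documents at ranks 1 and 3, 2 relevant in total.
        boolean[] flags = {true, false, true, false};
        System.out.println(interpolatedPrecision(flags, 2, 0.2));
        System.out.println(interpolatedPrecision(flags, 2, 1.0));
    }
}
```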
Question 39
On the TD04 topic set, which of the 5 evaluated weighting models performs best at the early interpolated recall levels (i.e. recall >= 0.1 and <= 0.2)?
Select one:
A. BM25
B. Terrier's TF*IDF
C. PL2
D. Simple TF*IDF
E. Vector Space TF*IDF
Question 40
Identify the most effective weighting model in terms of MAP across the 3 topic sets
Select one:
1. PL2
2. Terrier's TF*IDF
3. BM25
4. Vector Space TF*IDF
5. Simple TF*IDF
Q4: Query Expansion with Best Weighting Model
Question 41
Enter the MAP performance of the identified best weighting model + Query Expansion on the HP04 topics
Answer:
Question 42
Enter the MAP performance of the identified best weighting model + Query Expansion on the TD04 topics
Answer:
Q4: Query Expansion with Simple TF*IDF
Question 43
Enter the MAP performance of Simple TF*IDF + Query Expansion on the HP04 topics
Answer:
Question 44
Enter the MAP performance of Simple TF*IDF + Query Expansion on the TD04 topics
Answer:
Q4: Query Expansion with Vector Space TF*IDF
Question 45
Enter the MAP performance of Vector Space TF*IDF + Query Expansion on the HP04 topics
Answer:
Question 46
Enter the MAP performance of Vector Space TF*IDF + Query Expansion on the TD04 topics
Answer:
Q4: Query-By-Query Histograms (QE vs No QE)
Question 47
Enter your 2 query-by-query histograms comparing your identified best weighting model in Q3 with and without query expansion on each of the topic sets (HP04, TD04). Upload a single PDF document showing the 2 histograms.
Maximum file size: 230MB, maximum number of files: 1
Question 48
Enter the number of queries whose performance degraded after the application of query expansion on the HP04 topics
Answer:
Question 49
Enter the number of queries whose performance improved after the application of query expansion on the TD04 topics
Answer:
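One way to obtain these counts is to compare per-query average precision with and without query expansion. The sketch below assumes you have already extracted per-query AP values into two maps keyed by query ID. The class name `QeDeltaSketch`, the method name, and the sample values are hypothetical, and treating any non-zero difference as a change is an assumption (you may prefer a small tolerance).

```java
import java.util.Map;

public class QeDeltaSketch {

    /**
     * Compares per-query AP before and after query expansion.
     * Returns an array {improved, degraded, unchanged}.
     * Queries missing from the QE run are skipped.
     */
    static int[] countChanges(Map<String, Double> baselineAp, Map<String, Double> qeAp) {
        int improved = 0;
        int degraded = 0;
        int unchanged = 0;
        for (Map.Entry<String, Double> entry : baselineAp.entrySet()) {
            Double after = qeAp.get(entry.getKey());
            if (after == null) {
                continue;
            }
            double delta = after - entry.getValue();
            if (delta > 0) {
                improved++;
            } else if (delta < 0) {
                degraded++;
            } else {
                unchanged++;
            }
        }
        return new int[] {improved, degraded, unchanged};
    }

    public static void main(String[] args) {
        // Hypothetical per-query AP values, not results from this exercise.
        Map<String, Double> baseline = Map.of("1", 0.5, "2", 0.3, "3", 0.1);
        Map<String, Double> withQe = Map.of("1", 0.6, "2", 0.2, "3", 0.1);
        int[] counts = countChanges(baseline, withQe);
        System.out.println("improved=" + counts[0]
                + " degraded=" + counts[1]
                + " unchanged=" + counts[2]);
    }
}
```

The same per-query differences are what a query-by-query histogram plots, one bar per query.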
Question 50
The application of query expansion on query ID 5 (American Music) of the TD04 topic set has improved its performance
Select one:
True
False
Q4: Analysis of QE vs No QE
Question 51
Consider the following query: Pop stars who once worked at McDonald's
Based on your understanding of the course material, the application of query expansion using the Vector Space Model on this query will enhance the Average Precision of the query.
Select one:
True
False
Question 52
Based on your understanding of the lecture material, when is query expansion likely to work when applied with a Vector Space Model:
Select one or more:
a. The query is well formed with a clear information need
b. The query vector is somehow close to the vector representations of the documents the user desires
c. All the relevant documents use different vocabulary from the query
d. The query is full of misspellings and vague terms
e. The relevant documents are tightly clustered in the vector space
Question 53
Based on your understanding of the course material, your expectation was that the application of query expansion on the Homepage Finding topics (HP04):
Select one:
A. will improve the system's MAP performance
B. will not help to improve the system's MAP performance
Question 54
Based on your understanding of the course material, the application of query expansion on the topic distillation (TD04) topics was expected to improve the system's MAP performance.
Select one:
True
False
To Conclude ....
Question 55
Enter the number of hours you spent on this exercise
Answer:
Question 56
Do you have any additional feedback on the exercise? For example, what did you find most difficult?