IFN647 Tutorial (Week 6): IR models
********************************************************
Task 1. TF-IDF weights a term in a document by the product of two statistics, term frequency (tf) and inverse document frequency (idf). Various ways of determining the exact values of both statistics exist.
Discuss the following recommended tf*idf weighting schemes and compare them with the one discussed in the lecture notes.
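This Python sketch does not assume which schemes the referenced table lists. It only illustrates the general pattern: the idf factor log(N/n) stays fixed while the tf factor varies (raw count, 1 + log f, or binary presence). The variant labels, the function names and the use of natural logarithms are assumptions made for illustration. A different log base rescales the idf factor by a constant. The example uses term1 in D1 from the Task 2 table (count 3; the term occurs in 4 of the 6 documents).

```python
import math


def tf_weight(f, variant="raw"):
    """Term-frequency factor for a term that occurs f times in a document."""
    if f <= 0:
        return 0.0                      # absent terms get zero weight in every variant
    if variant == "raw":
        return float(f)                 # raw count f
    if variant == "log":
        return 1.0 + math.log(f)        # sublinear scaling 1 + ln f
    if variant == "binary":
        return 1.0                      # presence only
    raise ValueError(f"unknown tf variant: {variant}")


def idf_weight(n, N):
    """Inverse document frequency ln(N / n); assumes 0 < n <= N."""
    return math.log(N / n)


def tfidf(f, n, N, variant="raw"):
    """tf*idf weight of a term with count f, document frequency n, collection size N."""
    return tf_weight(f, variant) * idf_weight(n, N)


# term1 in D1 of the Task 2 table: count 3, present in 4 of the 6 documents
for variant in ("raw", "log", "binary"):
    print(variant, round(tfidf(3, 4, 6, variant), 4))
```

Swapping the variant changes only the tf factor. Query-side weights, which some schemes define separately, are not modelled here.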

Task 2. Manually calculate the df value for each term in the following table.
       D1  D2  D3  D4  D5  D6
term1   3   5   0   9   0   3
term2   0   3   0   0   1   0
term3   0   4   5   0   0   2
term4   5   6   4   1   3   4
term5   7   0   6   2   2   4
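For reference, the standard definition of document frequency counts documents, not occurrences. A term contributes 1 to its df for each document in which its count is non-zero. The notation here is assumed: f_{t,d} is the count of term t in document d, and C is the collection.

```latex
\mathrm{df}_t = \bigl|\{\, d \in C : f_{t,d} > 0 \,\}\bigr|,
\qquad\text{e.g.}\quad
\mathrm{df}_{\text{term2}} = \bigl|\{D2, D5\}\bigr| = 2
```

The count of 3 for term2 in D2 still contributes only 1 to its df.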
Task 3. Design a Python function c_df(docs) that calculates the df value of each term in docs, and use it to check that you get the same results as in Task 2. The function returns a {term: df, …} dictionary. When testing your function, you can represent the above table as follows:
docs = {'D1': {'term1': 3, 'term4': 5, 'term5': 7},
        'D2': {'term1': 5, 'term2': 3, 'term3': 4, 'term4': 6},
        'D3': {'term3': 5, 'term4': 4, 'term5': 6},
        'D4': {'term1': 9, 'term4': 1, 'term5': 2},
        'D5': {'term2': 1, 'term4': 3, 'term5': 2},
        'D6': {'term1': 3, 'term3': 2, 'term4': 4, 'term5': 4}}
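Here is one possible implementation, offered as a sketch rather than the expected answer. It walks over every document's {term: count} dictionary and adds one to a term's df whenever the term appears with a positive count. The docs table is repeated so the block runs on its own.

```python
def c_df(docs):
    """Return a {term: df} dictionary, where df is the number of documents
    in which the term occurs with a count greater than zero."""
    df = {}
    for term_counts in docs.values():        # one {term: count} dict per document
        for term, count in term_counts.items():
            if count > 0:                    # skip explicit zero counts, if any
                df[term] = df.get(term, 0) + 1
    return df


# The Task 2 table in the dictionary representation suggested above
docs = {
    "D1": {"term1": 3, "term4": 5, "term5": 7},
    "D2": {"term1": 5, "term2": 3, "term3": 4, "term4": 6},
    "D3": {"term3": 5, "term4": 4, "term5": 6},
    "D4": {"term1": 9, "term4": 1, "term5": 2},
    "D5": {"term2": 1, "term4": 3, "term5": 2},
    "D6": {"term1": 3, "term3": 2, "term4": 4, "term5": 4},
}

print(dict(sorted(c_df(docs).items())))
```

The `count > 0` guard matters only if zero counts are stored explicitly. With the sparse representation above, every stored term is already present.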

Task 4. Let Q = {US, ECONOM, ESPIONAG} be a query, and
C = {D1, D2, D3, D4, D5, D6, D7} be a collection of documents, where
D1 = {GERMAN, VW}
D2 = {US, US, ECONOM, SPY}
D3 = {US, BILL, ECONOM, ESPIONAG}
D4 = {US, ECONOM, ESPIONAG, BILL}
D5 = {GERMAN, MAN, VW, ESPIONAG}
D6 = {GERMAN, GERMAN, MAN, VW, SPY}
D7 = {US, MAN, VW}
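In the binary independence model a document is represented only by which terms it contains. Repeated terms, such as the two occurrences of US in D2, therefore make no difference. This minimal Python sketch collapses each document to a set and counts n_i, the number of documents containing each query term. It assumes the collection is stored in the hypothetical variables `C` and `Q`.

```python
# Task 4 collection; repeated terms are kept exactly as listed above
C = {
    "D1": ["GERMAN", "VW"],
    "D2": ["US", "US", "ECONOM", "SPY"],
    "D3": ["US", "BILL", "ECONOM", "ESPIONAG"],
    "D4": ["US", "ECONOM", "ESPIONAG", "BILL"],
    "D5": ["GERMAN", "MAN", "VW", "ESPIONAG"],
    "D6": ["GERMAN", "GERMAN", "MAN", "VW", "SPY"],
    "D7": ["US", "MAN", "VW"],
}
Q = ["US", "ECONOM", "ESPIONAG"]

# Binary representation: d_i = 1 iff term i occurs, so each document becomes a set
presence = {doc_id: set(terms) for doc_id, terms in C.items()}

# n_i = number of documents in which query term i is present
n = {term: sum(term in terms for terms in presence.values()) for term in Q}
print(n)
```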
Assume the relevant and non-relevant documents (user feedback) are labeled as follows:
Document ID   Terms (dij)                      Relevance to Q
D1            GERMAN, VW                       0 (no)
D2            US, US, ECONOM, SPY              1 (yes)
D3            US, BILL, ECONOM, ESPIONAG       1 (yes)
D4            US, ECONOM, ESPIONAG, BILL       1 (yes)
D5            GERMAN, MAN, VW, ESPIONAG        0 (no)
D6            GERMAN, GERMAN, MAN, VW, SPY     0 (no)
D7            US, MAN, VW                      0 (no)
For a given incoming document D = {US, VW, ESPIONAG}, let term 1 = ‘US’, term 2 = ‘VW’ and term 3 = ‘ESPIONAG’. Based on the binary independence model, work out the missing values in the following contingency tables, where di = 1 if term i is present in a document and 0 otherwise.
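Term 1 (US):
          Relevant      Non-relevant                                Total
d1 = 1    ri = 3        ni - ri = 1                                 ni = 4
d1 = 0    R - ri = 0    (N - R) - (ni - ri) = N - ni - R + ri = 3   N - ni = 3
Total     R = 3         N - R = 4                                   N = 7

Term 2 (VW):
          Relevant      Non-relevant                                Total
d2 = 1    ri =          ni - ri =                                   ni =
d2 = 0    R - ri =      N - ni - R + ri =                           N - ni =
Total     R =           N - R =                                     N =

Term 3 (ESPIONAG):
          Relevant      Non-relevant                                Total
d3 = 1    ri =          ni - ri =                                   ni =
d3 = 0    R - ri =      N - ni - R + ri =                           N - ni =
Total     R =           N - R =                                     N =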
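To check the completed tables, you can derive the counts directly from the data. This is a minimal sketch. It assumes D2, D3 and D4 are the relevant documents, which is consistent with the term 1 table (ri = R = 3, with US present in four documents). `contingency` is a hypothetical helper name, and the dictionary keys simply mirror the cell labels used in the tables.

```python
# Task 4 collection and relevance feedback (assumed relevant set: D2, D3, D4)
C = {
    "D1": ["GERMAN", "VW"],
    "D2": ["US", "US", "ECONOM", "SPY"],
    "D3": ["US", "BILL", "ECONOM", "ESPIONAG"],
    "D4": ["US", "ECONOM", "ESPIONAG", "BILL"],
    "D5": ["GERMAN", "MAN", "VW", "ESPIONAG"],
    "D6": ["GERMAN", "GERMAN", "MAN", "VW", "SPY"],
    "D7": ["US", "MAN", "VW"],
}
RELEVANT = {"D2", "D3", "D4"}


def contingency(term, C, relevant):
    """Cells of the BIM contingency table for one term (d_i = presence of the term)."""
    N = len(C)                                   # documents in the collection
    R = len(relevant)                            # relevant documents
    containing = {d for d, terms in C.items() if term in terms}
    n_i = len(containing)                        # documents with d_i = 1
    r_i = len(containing & relevant)             # relevant documents with d_i = 1
    return {
        "ri": r_i, "ni-ri": n_i - r_i, "ni": n_i,
        "R-ri": R - r_i, "N-ni-R+ri": N - n_i - R + r_i, "N-ni": N - n_i,
        "R": R, "N-R": N - R, "N": N,
    }


for i, term in enumerate(["US", "VW", "ESPIONAG"], start=1):
    print(f"term {i} ({term}):", contingency(term, C, RELEVANT))
```

Every cell is derived from just ni, ri, R and N. As a sanity check, the four inner cells of each table should add up to N.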
