wk7_lec_support
Week 7 Lecture review question
© Professor Yuefeng Li
Question 3 (2)
Define a Python function df2(docs, x, y) to calculate n_{xy}, the number of documents that include both terms x and y.
# Define a function to calculate n_{ab}
def df2(docs, a, b):
    """Calculate the number of documents that include both terms."""
    nab = 0  # running count of documents that contain both a and b
    for doc_id, doc in docs.items():
        if (a in doc) and (b in doc):
            nab = nab + 1
    return nab
# Test the function
# The document collection is represented as a dict
docs = {'D1': {'term1': 3, 'term4': 5, 'term5': 7},
        'D2': {'term1': 5, 'term2': 3, 'term3': 4, 'term4': 6},
        'D3': {'term3': 5, 'term4': 4, 'term5': 6},
        'D4': {'term1': 9, 'term4': 1, 'term5': 2},
        'D5': {'term2': 1, 'term4': 3, 'term5': 2},
        'D6': {'term1': 3, 'term3': 2, 'term4': 4, 'term5': 4}}
# Count the documents that contain both term4 and term5
n45 = df2(docs, 'term4', 'term5')
print(n45)
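As a cross-check on df2, the same counts can be collected for every pair of terms in a single pass over the collection. The block below is only a sketch: the helper name pair_df is hypothetical and is not part of the lecture solution, and it assumes the same dict-of-dicts representation of docs used above.

```python
from collections import Counter
from itertools import combinations

def pair_df(docs):
    """Count, for every unordered pair of terms, the documents containing both."""
    counts = Counter()
    for doc in docs.values():
        # sorted() puts each pair in a canonical (alphabetical) order,
        # so ('term4', 'term5') and ('term5', 'term4') share one key
        for pair in combinations(sorted(doc), 2):
            counts[pair] += 1
    return counts

# Same collection as in the question
docs = {'D1': {'term1': 3, 'term4': 5, 'term5': 7},
        'D2': {'term1': 5, 'term2': 3, 'term3': 4, 'term4': 6},
        'D3': {'term3': 5, 'term4': 4, 'term5': 6},
        'D4': {'term1': 9, 'term4': 1, 'term5': 2},
        'D5': {'term2': 1, 'term4': 3, 'term5': 2},
        'D6': {'term1': 3, 'term3': 2, 'term4': 4, 'term5': 4}}

pairs = pair_df(docs)
print(pairs[('term4', 'term5')])  # 5, the same value df2 returns
print(pairs[('term1', 'term2')])  # 1, only D2 contains both
```

Keys must be given in alphabetical order when looking up a pair. Calling df2 once per pair gives the same numbers; the single pass is only worth it when many pairs are needed.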
Question 3 (3)
Assume the Python function c_df(docs) calculates the df value for every term in docs and returns a {term: df, ...} dictionary. Use both df2() and c_df() to define a Python function MIM(docs, a, b) that calculates the MIM value for terms a and b.
# Define a function to calculate df for all terms
def c_df(docs):
    """Calculate DF of each term in docs and return a {term:df, ...} dictionary."""
    df_ = {}  # maps each term to the number of documents containing it
    for doc_id, doc in docs.items():
        for term in doc.keys():
            try:
                df_[term] += 1
            except KeyError:
                # first time this term is seen
                df_[term] = 1
    return df_
# Define a function to calculate MIM for two terms
import math
def MIM(docs, a, b):
    t_df = c_df(docs)
    na = t_df[a]
    nb = t_df[b]
    nab = df2(docs, a, b)
    _mim = math.log((len(docs)*nab/(na*nb)), 2)
    return _mim
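Reading the expression off the MIM function above, the quantity it computes can be written as follows, where N stands for len(docs) and n_a, n_b, n_{ab} are the document counts used in the code. The symbol N is introduced here for illustration only.

```latex
\mathrm{MIM}(a, b) = \log_2 \frac{N \cdot n_{ab}}{n_a \cdot n_b}

% Worked on the sample collection (N = 6):
% term1 occurs in D1, D2, D4, D6 (n_1 = 4), term2 in D2, D5 (n_2 = 2),
% and both occur together only in D2 (n_{12} = 1)
\mathrm{MIM}(\text{term1}, \text{term2}) = \log_2 \frac{6 \cdot 1}{4 \cdot 2} = \log_2 0.75 \approx -0.415

% term4 occurs in all six documents (n_4 = 6), term5 in five (n_5 = 5),
% and every document with term5 also has term4 (n_{45} = 5)
\mathrm{MIM}(\text{term4}, \text{term5}) = \log_2 \frac{6 \cdot 5}{6 \cdot 5} = \log_2 1 = 0
```

A value of 0 means the two terms co-occur exactly as often as their individual document frequencies would predict; a negative value means they co-occur less often than that.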
Test the definitions
mim_12 = MIM(docs, 'term1', 'term2')
mim_45 = MIM(docs, 'term4', 'term5')
print(mim_12, mim_45)
print(len(docs))
-0.4150374992788438 0.0
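The MIM function above assumes that both terms occur in the collection and that they co-occur at least once: a missing term raises KeyError in the df lookup, and a pair with n_{ab} = 0 makes math.log raise ValueError. One possible way to guard against both cases is sketched below. The name safe_mim, the choice to return None, and the toy_docs collection are all assumptions made for illustration, not part of the lecture solution.

```python
import math

def safe_mim(docs, a, b):
    """MIM variant that returns None when the logarithm is undefined."""
    n_a = sum(1 for doc in docs.values() if a in doc)
    n_b = sum(1 for doc in docs.values() if b in doc)
    n_ab = sum(1 for doc in docs.values() if a in doc and b in doc)
    if n_ab == 0:
        # Covers a missing term as well as two terms that never co-occur;
        # log2(0) is undefined, so signal it explicitly instead of raising
        return None
    # n_ab > 0 implies n_a > 0 and n_b > 0, so the division is safe
    return math.log2(len(docs) * n_ab / (n_a * n_b))

# Hypothetical toy collection containing a pair that never co-occurs
toy_docs = {'D1': {'x': 1, 'y': 2},
            'D2': {'x': 1},
            'D3': {'z': 4}}

print(safe_mim(toy_docs, 'x', 'y'))        # log2(3*1 / (2*1)) = log2(1.5)
print(safe_mim(toy_docs, 'x', 'z'))        # None: x and z never co-occur
print(safe_mim(toy_docs, 'x', 'missing'))  # None: term not in the collection
```

Returning None is only one option; depending on how the scores are used downstream, returning float('-inf') or raising a descriptive exception may be more appropriate.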