
Week 9 Question Solutions
Professor Yuefeng Li
School of Computer Science, Queensland University of Technology (QUT)
Classification introduction


Classification, also referred to as categorization, is the task of automatically applying labels (e.g., class names) to data, such as emails, web pages, or images.
Classification has been studied for many years, and the classification task is a classic machine learning problem. In machine learning, learning algorithms are typically characterized as supervised, semi-supervised, or unsupervised. In supervised learning, a function or classifier is learned from a set of fully labeled data, which is often called a training set.
Once a classifier or function has been learned, it can be applied to a set of unlabeled data in a real application (called the test set in evaluation) in order to automatically apply labels.
Classification is often cast as a supervised learning problem. For example, given a set of emails that have been labeled as “spam” or “not spam” (the training set), a classifier can be learned. The learned model can then be applied to incoming emails in order to assign them to the “spam” or “not spam” class.
Normally, labeled data are obtained by asking human users to make judgments about a given piece of unlabeled data (e.g., “What is the topic of this news article?”). Providing a large amount of labeled data (samples) is too expensive because thousands of categories may be used and user information needs change dynamically. A popular technique for generating labeled data is semi-supervised learning (SSL), which usually assumes that a limited number of labeled examples (a small training set) can be obtained. SSL intends to enlarge the small training set using unlabeled data in order to boost performance. However, SSL has been overshadowed by the successes of purely supervised learning, as it is very hard to deal with the uncertainties in unlabeled data using existing techniques (e.g., co-training, data augmentation, or uncertainty sampling for classifiers).

Another potential technique is unsupervised learning, which has recently had a catalytic effect in reviving interest in deep learning; LeCun et al. [2015] predicted that unsupervised learning will become far more important in the longer term.
In terms of statistics, unsupervised learning intends to infer prior probability distributions p_C(x), whereas supervised learning intends to infer conditional probability distributions p_C(x|Y), for any input object x and class C, based on a large training set Y.
Priors can be created using a number of statistical methods (e.g., assuming a normal distribution) or determined from previous experiments. However, in real applications, priors are universal if the relevant background is not taken into account. Recently, one of my PhD students extended the concept of inferring prior probability distributions p_C(x) [Albishre et al., 2020]. We consider the relevant background by using a query Q; under this consideration, unsupervised learning intends to infer a probability distribution p(x, Q).
In this lecture, we mainly discuss supervised learning algorithms.
Classifiers
Naïve Bayes is one of the most straightforward yet effective classification techniques. It uses the Bayes decision rule and Bayes’ rule, as discussed in the IR models lectures. For binary classification (just two classes of interest, the relevant class and the non-relevant class), the obvious approach is to calculate the probability P(c|d), where d is a document and c is a class label. You can find more details in the lecture notes.
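For reference, Bayes’ rule gives

P(c|d) = P(d|c) P(c) / P(d)

and since P(d) is the same for every class, a document can be classified by choosing the class c that maximises P(d|c)P(c), or equivalently its logarithm.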
Question 1. (Multinomial Document Representation)
Table 1 shows an illustration of how documents are represented in the multinomial event space. In this table, there are 10 documents (each with a unique document id), two class labels (spam and not_spam), and a vocabulary that consists of the terms “cheap”, “buy”, “banking”, “dinner”, and “the”.
Table 1. A training set (multinomial event space)

id    cheap  buy  banking  dinner  the  class
d1    0      0    0        0       2    not spam
d2    3      0    1        0       1    spam
d3    0      0    0        0       1    not spam
d4    2      0    3        0       2    spam
d5    5      2    0        0       1    spam
d6    0      0    1        0       1    not spam
d7    0      1    1        0       1    not spam
d8    0      0    0        0       1    not spam
d9    0      0    0        0       1    not spam
d10   1      1    0        1       2    not spam
Based on Table 1, we can represent this training set for Naïve Bayes classification as follows: D = {d1, d2, …, d10}, C = {c1, c2} = {‘spam’, ‘not spam’},
V = {w1, w2, …, w5} = {‘cheap’, ‘buy’, ‘banking’, ‘dinner’, ‘the’}
where D is the training set, C is the set of class labels, and V is the set of terms (the vocabulary). We can use the following equation to calculate the conditional probability P(c|d):
P(c|d) ∝ log P(c) + Σ_{wi ∈ d} log P(wi|c)
(1) Based on the Naïve Bayes algorithm, calculate P(c) for all c ∈ C.
(2) Based on the Naïve Bayes algorithm, calculate P(wi|cj) for all wi ∈ V and all cj ∈ C. (A worked sketch of both parts is given below.)
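A worked sketch of both parts, assuming the add-one (Laplace) smoothing defined in Question 2 (the lecture notes’ algorithm may differ in notation); the counts come from Table 1:

(1) P(c) = Nc/|D|. Three of the ten training documents are labelled spam, so P(spam) = 3/10 = 0.3 and P(not spam) = 7/10 = 0.7.

(2) P(w|c) = (tf_{w,c} + 1)/(|c| + |V|), with |V| = 5 and total term occurrences |spam| = 20 and |not spam| = 15:

P(cheap|spam)   = (10+1)/25 = 0.44    P(cheap|not spam)   = (1+1)/20 = 0.10
P(buy|spam)     = (2+1)/25  = 0.12    P(buy|not spam)     = (2+1)/20 = 0.15
P(banking|spam) = (4+1)/25  = 0.20    P(banking|not spam) = (2+1)/20 = 0.15
P(dinner|spam)  = (0+1)/25  = 0.04    P(dinner|not spam)  = (1+1)/20 = 0.10
P(the|spam)     = (4+1)/25  = 0.20    P(the|not spam)     = (9+1)/20 = 0.50

As a check, each conditional distribution sums to 1 over the vocabulary.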
Question 2. (Naive Bayes classification)
Let {c0, c1} be a set of classes and D be a training set, where each element in D is a pair (dj, cj); dj is a document and cj is its class label. For example, Table 1 shows a training set D which contains 10 documents and their multinomial document representations.
Assume this training set is represented as two Python lists as follows:
D = [[0,0,0,0,2], [3,0,1,0,1], [0,0,0,0,1], [2,0,3,0,2], [5,2,0,0,1], [0,0,1,0,1], [0,1,1,0,1], [0,0,0,0,1], [0,0,0,0,1], [1,1,0,1,2]]
C = [0,1,0,1,1,0,0,0,0,0]
where a document is represented as a list of term frequency counts in the document (note that we assume the vocabulary V = [‘cheap’, ‘buy’, ‘banking’, ‘dinner’, ‘the’] is indexed from 0 to n−1), D is represented as a list of documents, C is represented as the list of class labels of the corresponding documents in D, and c1 (spam) and c0 (not spam) are labelled as 1 and 0 in C.

The training procedure of Naive Bayes classification calculates the prior probability P(c) and the conditional probabilities P(w|c) for all terms w in V and all classes c in {c0, c1}, where P(c) = Nc / |D|, P(w|c) = (tf_{w,c} + 1)/(|c| + |V|), Nc is the number of training documents with class label c, tf_{w,c} is the number of times that term w occurs in documents of class c in the training set, and |c| is the total number of term occurrences in training documents with class label c.
It then uses Bayes’ rule to compute the probability P(ci|d) (i = 0 or 1) of each class ci occurring for a given document d, and assigns the class label ci with the highest computed probability to d.
(1) Define a Python function TRAIN_MULTINOMIAL_NB(C, D, V) to calculate the prior probabilities P(c) and the conditional probabilities P(wj|ci). Please note the function parameters differ from those of the Naive Bayes algorithm in the lecture notes. (A sketch of both functions follows part (2).)
(2) Define a Python function APPLY_MULTINOMIAL_NB(V, prior, condprob, d) to assign a label (1 or 0) to document d, where d is represented as a list of words.
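A minimal sketch of both functions, assuming prior is returned as a dictionary keyed by class label and condprob as a dictionary keyed by term and then by class (the lecture notes may organise these structures differently):

import math

def TRAIN_MULTINOMIAL_NB(C, D, V):
    # Returns prior[c] = P(c) and condprob[w][c] = P(w|c) with add-one smoothing.
    N = len(D)
    prior = {}
    condprob = {w: {} for w in V}
    for c in set(C):
        Nc = sum(1 for label in C if label == c)  # number of documents labelled c
        prior[c] = Nc / N
        # tf[j]: total count of term V[j] across all documents labelled c
        tf = [sum(D[i][j] for i in range(N) if C[i] == c) for j in range(len(V))]
        total = sum(tf)  # |c|: total term occurrences in class c
        for j, w in enumerate(V):
            condprob[w][c] = (tf[j] + 1) / (total + len(V))
    return prior, condprob

def APPLY_MULTINOMIAL_NB(V, prior, condprob, d):
    # Returns the class label maximising log P(c) + sum of log P(w|c) over w in d.
    scores = {}
    for c in prior:
        score = math.log(prior[c])
        for w in d:
            if w in condprob:  # out-of-vocabulary words are ignored
                score += math.log(condprob[w][c])
        scores[c] = score
    return max(scores, key=scores.get)

For example, with D, C and V = ['cheap', 'buy', 'banking', 'dinner', 'the'] as above, TRAIN_MULTINOMIAL_NB(C, D, V) yields prior = {0: 0.7, 1: 0.3}, and APPLY_MULTINOMIAL_NB(V, prior, condprob, ['the', 'cheap', 'buy', 'cheap']) returns 1 (spam).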
Question 3. (Rocchio Classification)
Let C be a set of classes and D be a training set, where each element in D is a pair (dj, cj); dj is a document and cj is its class label. The training procedure of Rocchio classification for the input training set D can be found in the lecture notes. It uses centroids to define the boundaries between classes. The centroid of each class is computed as the vector average of its members.
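In symbols, the centroid of class c is μ_c = (1/|D_c|) Σ_{d ∈ D_c} v_d, where D_c is the set of training documents labelled c and v_d is the term-weight vector of document d.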
For the given topic “R101”, we have obtained the following training set, which includes seven documents, ten selected features (terms) (VM, US, Spy, Sale, Man, GM, Espionag, Econom, Chief, and Bill) and the corresponding term weights, where Class = 1 means relevant and Class = 0 means non-relevant.

Doc      VM    US    Spy   Sale  Man   GM    Espionag  Econom  Chief  Bill  Class
‘39496’  0.17  0.01  0.20  0.00  0.02  0.20  0.12      0.11    0.00   0.00  1
‘46547’  0.10  0.21  0.10  0.00  0.00  0.00  0.22      0.20    0.00   0.10  1
‘46974’  0.00  0.23  0.10  0.20  0.05  0.10  0.10      0.10    0.01   0.00  1
‘62325’  0.17  0.01  0.20  0.00  0.02  0.20  0.12      0.11    0.00   0.00  1
‘6146’   0.10  0.00  0.00  0.30  0.10  0.20  0.00      0.12    0.10   0.00  0
‘18586’  0.00  0.30  0.00  0.30  0.20  0.00  0.00      0.20    0.15   0.20  0
‘22170’  0.20  0.00  0.00  0.15  0.20  0.25  0.00      0.00    0.00   0.10  0

Assume the above training set is represented as two Python dictionaries as follows:
document_vectors = {
    '39496': {'VM': 0.17, 'US': 0.01, 'Spy': 0.20, 'Sale': 0.00, 'Man': 0.02, 'GM': 0.20, 'Espionag': 0.12, 'Econom': 0.11, 'Chief': 0.00, 'Bill': 0.00},
    '46547': {'VM': 0.10, 'US': 0.21, 'Spy': 0.10, 'Sale': 0.00, 'Man': 0.00, 'GM': 0.00, 'Espionag': 0.22, 'Econom': 0.20, 'Chief': 0.00, 'Bill': 0.10},
    '46974': {'VM': 0.00, 'US': 0.23, 'Spy': 0.10, 'Sale': 0.20, 'Man': 0.05, 'GM': 0.10, 'Espionag': 0.10, 'Econom': 0.10, 'Chief': 0.01, 'Bill': 0.00},
    '62325': {'VM': 0.17, 'US': 0.01, 'Spy': 0.20, 'Sale': 0.00, 'Man': 0.02, 'GM': 0.20, 'Espionag': 0.12, 'Econom': 0.11, 'Chief': 0.00, 'Bill': 0.00},
    '6146':  {'VM': 0.10, 'US': 0.00, 'Spy': 0.00, 'Sale': 0.30, 'Man': 0.10, 'GM': 0.20, 'Espionag': 0.00, 'Econom': 0.12, 'Chief': 0.10, 'Bill': 0.00},
    '18586': {'VM': 0.00, 'US': 0.30, 'Spy': 0.00, 'Sale': 0.30, 'Man': 0.20, 'GM': 0.00, 'Espionag': 0.00, 'Econom': 0.20, 'Chief': 0.15, 'Bill': 0.20},
    '22170': {'VM': 0.20, 'US': 0.00, 'Spy': 0.00, 'Sale': 0.15, 'Man': 0.20, 'GM': 0.25, 'Espionag': 0.00, 'Econom': 0.00, 'Chief': 0.00, 'Bill': 0.10}}

relevance_judgements = {'R101': {'0': ['6146', '18586', '22170'], '1': ['39496', '46547', '46974', '62325']}}
Let given_topic = ‘R101’. Design a Python function train_Rocchio(document_vectors, relevance_judgements, given_topic) to calculate the centroid of the relevant class (named ‘R101’) and the centroid of the non-relevant class (named ‘nR101’), respectively. For example, using the above table, we have ‘nR101’ = {‘VM’: 0.100, ‘US’: 0.100, ‘Spy’: 0.000, ‘Sale’: 0.250, ‘Man’: 0.167, ‘GM’: 0.150, ‘Espionag’: 0.000, ‘Econom’: 0.107, ‘Chief’: 0.083, ‘Bill’: 0.100}.
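A minimal sketch, assuming both centroids are returned in a single dictionary keyed by ‘R101’ and ‘nR101’ (other return conventions are equally possible):

def train_Rocchio(document_vectors, relevance_judgements, given_topic):
    # Computes each class centroid as the vector average of its member documents.
    centroids = {}
    for label, doc_ids in relevance_judgements[given_topic].items():
        # Name the relevant centroid after the topic; prefix the non-relevant one with 'n'.
        name = given_topic if label == '1' else 'n' + given_topic
        totals = {}
        for doc_id in doc_ids:
            for term, weight in document_vectors[doc_id].items():
                totals[term] = totals.get(term, 0.0) + weight
        centroids[name] = {term: total / len(doc_ids) for term, total in totals.items()}
    return centroids

Calling train_Rocchio(document_vectors, relevance_judgements, 'R101') reproduces the ‘nR101’ centroid shown above (up to rounding).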
