
School of Computing and Information Systems The University of Melbourne
COMP90049 Introduction to Machine Learning (Semester 1, 2022) Week 6: Solutions
1. Consider the following dataset:
   id  apple  ibm  lemon  sun  label
   A   4      0    1      1    fruit
   B   5      0    5      2    fruit
   C   2      5    0      0    comp
   D   1      2    1      7    comp
   E   2      0    3      1    ?
   F   1      0    1      0    ?


(a) Treat the problem as an unsupervised machine learning problem (excluding the label attributes) and calculate the clusters according to k-means with k = 2, using the Manhattan distance, and instances A and F as starting seeds (initialized cluster centers).
This is an unsupervised problem, so we ignore (or don’t have access to) the label attribute. (We’re going to ignore id as well because it obviously isn’t a meaningful point of comparison.)
We begin by setting the initial centroids of the two clusters: cluster 1 has centroid C1 = (4, 0, 1, 1) (instance A) and cluster 2 has centroid C2 = (1, 0, 1, 0) (instance F).
We now calculate the (Manhattan) distance for each instance (“training” and “test” are equivalent in this context) to the centroids of each cluster:
d(A, C1) = |4−4| + |0−0| + |1−1| + |1−1| = 0
d(A, C2) = |4−1| + |0−0| + |1−1| + |1−0| = 4
d(B, C1) = |5−4| + |0−0| + |5−1| + |2−1| = 6
d(B, C2) = |5−1| + |0−0| + |5−1| + |2−0| = 10
d(C, C1) = |2−4| + |5−0| + |0−1| + |0−1| = 9
d(C, C2) = |2−1| + |5−0| + |0−1| + |0−0| = 7
d(D, C1) = |1−4| + |2−0| + |1−1| + |7−1| = 11
d(D, C2) = |1−1| + |2−0| + |1−1| + |7−0| = 9
d(E, C1) = |2−4| + |0−0| + |3−1| + |1−1| = 4
d(E, C2) = |2−1| + |0−0| + |3−1| + |1−0| = 4
d(F, C1) = |1−4| + |0−0| + |1−1| + |0−1| = 4
d(F, C2) = |1−1| + |0−0| + |1−1| + |0−0| = 0
We now assign each instance to the cluster with the smallest (Manhattan) distance to the cluster's centroid: for A this is cluster 1 because 0 < 4, for B this is cluster 1 because 6 < 10, and so on. C, D and F are closer to cluster 2, and for E we have a tie. Let's say we break the tie randomly and assign E to cluster 2. (We'll see below what would have happened if we'd assigned E to cluster 1.)

So cluster 1 is {A, B} and cluster 2 is {C, D, E, F}. We re-calculate the centroids:

C1 = ((4+5)/2, (0+0)/2, (1+5)/2, (1+2)/2) = (4.5, 0, 3, 1.5)
C2 = ((2+1+2+1)/4, (5+2+0+0)/4, (0+1+3+1)/4, (0+7+1+0)/4) = (1.5, 1.75, 1.25, 2)

Now, let's re-calculate the distances to these new centroids:

d(A, C1) = |4−4.5| + |0−0| + |1−3| + |1−1.5| = 3
d(A, C2) = |4−1.5| + |0−1.75| + |1−1.25| + |1−2| = 5.5
d(B, C1) = |5−4.5| + |0−0| + |5−3| + |2−1.5| = 3
d(B, C2) = |5−1.5| + |0−1.75| + |5−1.25| + |2−2| = 9
d(C, C1) = |2−4.5| + |5−0| + |0−3| + |0−1.5| = 12
d(C, C2) = |2−1.5| + |5−1.75| + |0−1.25| + |0−2| = 7
d(D, C1) = |1−4.5| + |2−0| + |1−3| + |7−1.5| = 13
d(D, C2) = |1−1.5| + |2−1.75| + |1−1.25| + |7−2| = 6
d(E, C1) = |2−4.5| + |0−0| + |3−3| + |1−1.5| = 3
d(E, C2) = |2−1.5| + |0−1.75| + |3−1.25| + |1−2| = 5
d(F, C1) = |1−4.5| + |0−0| + |1−3| + |0−1.5| = 7
d(F, C2) = |1−1.5| + |0−1.75| + |1−1.25| + |0−2| = 4.5

What are the assignments of instances to clusters now? Cluster 1 is {A, B, E} and cluster 2 is {C, D, F}. (Note that this is exactly where we would have ended up if we had broken the tie for instance E towards cluster 1 earlier.) We calculate the new centroids based on these instances:

C1 = ((4+5+2)/3, (0+0+0)/3, (1+5+3)/3, (1+2+1)/3) ≈ (3.67, 0, 3, 1.33)
C2 = ((2+1+1)/3, (5+2+0)/3, (0+1+1)/3, (0+7+0)/3) ≈ (1.33, 2.33, 0.67, 2.33)

We re-calculate the distances to these new centroids:

d(A, C1) ≈ |4−3.67| + |0−0| + |1−3| + |1−1.33| ≈ 2.67
d(A, C2) ≈ |4−1.33| + |0−2.33| + |1−0.67| + |1−2.33| ≈ 6.67
d(B, C1) ≈ |5−3.67| + |0−0| + |5−3| + |2−1.33| ≈ 4
d(B, C2) ≈ |5−1.33| + |0−2.33| + |5−0.67| + |2−2.33| ≈ 10.67
d(C, C1) ≈ |2−3.67| + |5−0| + |0−3| + |0−1.33| ≈ 11
d(C, C2) ≈ |2−1.33| + |5−2.33| + |0−0.67| + |0−2.33| ≈ 6.33
d(D, C1) ≈ |1−3.67| + |2−0| + |1−3| + |7−1.33| ≈ 12.33
d(D, C2) ≈ |1−1.33| + |2−2.33| + |1−0.67| + |7−2.33| ≈ 5.67
d(E, C1) ≈ |2−3.67| + |0−0| + |3−3| + |1−1.33| ≈ 2
d(E, C2) ≈ |2−1.33| + |0−2.33| + |3−0.67| + |1−2.33| ≈ 6.67
d(F, C1) ≈ |1−3.67| + |0−0| + |1−3| + |0−1.33| ≈ 6
d(F, C2) ≈ |1−1.33| + |0−2.33| + |1−0.67| + |0−2.33| ≈ 5.33

The new assignments are again cluster 1 = {A, B, E} and cluster 2 = {C, D, F}. This is the same as the last iteration, so we stop; this is the final assignment of instances to clusters.
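If you want to verify these iterations programmatically, the following is a minimal Python sketch of k-means with the Manhattan distance, seeded with instances A and F. It is an illustration rather than part of the official solution; the variable names are my own, and ties are broken towards the lower cluster index rather than randomly as above (as noted above, both tie-breaks lead to the same final clusters).

    import numpy as np

    # Feature vectors for instances A-F (apple, ibm, lemon, sun).
    X = np.array([
        [4, 0, 1, 1],   # A
        [5, 0, 5, 2],   # B
        [2, 5, 0, 0],   # C
        [1, 2, 1, 7],   # D
        [2, 0, 3, 1],   # E
        [1, 0, 1, 0],   # F
    ], dtype=float)
    names = ["A", "B", "C", "D", "E", "F"]

    def kmeans_manhattan(X, seeds, n_iter=10):
        """k-means using the Manhattan (L1) distance, seeded with the given rows of X."""
        centroids = X[seeds].copy()
        assignment = None
        for _ in range(n_iter):
            # Manhattan distance from every point to every centroid.
            dists = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
            new_assignment = dists.argmin(axis=1)   # ties go to the lower cluster index
            if assignment is not None and np.array_equal(new_assignment, assignment):
                break                               # assignments are stable: converged
            assignment = new_assignment
            # Recompute each centroid as the mean of its assigned points.
            for k in range(len(centroids)):
                if np.any(assignment == k):
                    centroids[k] = X[assignment == k].mean(axis=0)
        return assignment, centroids

    assignment, centroids = kmeans_manhattan(X, seeds=[0, 5])   # seeds: instances A and F
    for k in range(2):
        members = [n for n, a in zip(names, assignment) if a == k]
        print(f"cluster {k + 1}: {members}")
    # Expected output: cluster 1: ['A', 'B', 'E'], cluster 2: ['C', 'D', 'F']

Note that, like the working above, the sketch recomputes centroids as plain means even though assignment uses the Manhattan distance (a k-medians variant would use the median instead).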
(b) Perform agglomerative clustering of the above dataset (excluding the id and label attributes), using the Euclidean distance and calculating the group average as the cluster centroid.

In the lectures you were introduced to the single link, complete link and group average methods for computing the distance between two clusters. Here we use the cluster centroid: we simply take the average of the instances in a cluster as the cluster centroid, and compute the distance between two clusters as the Euclidean distance between their centroids.

We begin by finding the pairwise similarities — or rather distances, in this case — between the instances. I'm going to skip the Euclidean distance calculations (you can work through them as an exercise) and go straight to the proximity matrix:

        A      B      C      D      E      F
   A    -     √18    √31    √49    √8     √10
   B   √18     -     √63    √61    √14    √36
   C   √31    √63     -     √60    √35    √27
   D   √49    √61    √60     -     √45    √53
   E   √8     √14    √35    √45     -     √6
   F   √10    √36    √27    √53    √6      -

We can immediately observe (without simplifying the square roots) that the most similar instances (with the smallest distance) are E and F. We therefore form a new cluster EF, calculate its centroid (1.5, 0, 2, 0.5), and recalculate the distances to this new cluster¹:

         A       B       C       D       EF
   A     -      √18     √31     √49     √7.5
   B    √18      -      √63     √61     √23.5
   C    √31     √63      -      √60     √29.5
   D    √49     √61     √60      -      √47.5
   EF   √7.5    √23.5   √29.5   √47.5    -

The smallest distance is now between A and the new cluster EF; the resulting cluster AEF has the centroid (7/3, 0, 5/3, 2/3).

          AEF      B       C       D
   AEF     -      √20     √28.3   √46.3
   B      √20      -      √63     √61
   C      √28.3   √63      -      √60
   D      √46.3   √61     √60      -

Now B gets merged with AEF; ABEF has the centroid (3, 0, 2.5, 1).

           ABEF      C        D
   ABEF     -       √33.25   √46.25
   C       √33.25    -       √60
   D       √46.25   √60       -

All that is left is to merge C with ABEF; there is no need to calculate another centroid, as only two clusters (ABCEF and D) remain. Hence the agglomerative clustering merges E and F first, then A, then B, then C, and finally D.

¹ There are other ways of performing this step, for example single link: take the shortest of the distances calculated above to the points in the cluster, so that the distance from A to EF is min(√8, √10) = √8.
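As a quick cross-check of the merge order, here is a small Python sketch of agglomerative clustering with centroid linkage (Euclidean distance between recomputed cluster centroids), mirroring the steps above. It is my own illustration rather than part of the official solution; the function and variable names are assumptions.

    import numpy as np
    from itertools import combinations

    # Feature vectors for instances A-F (apple, ibm, lemon, sun).
    points = {
        "A": np.array([4.0, 0, 1, 1]),
        "B": np.array([5.0, 0, 5, 2]),
        "C": np.array([2.0, 5, 0, 0]),
        "D": np.array([1.0, 2, 1, 7]),
        "E": np.array([2.0, 0, 3, 1]),
        "F": np.array([1.0, 0, 1, 0]),
    }

    # Start with every instance in its own cluster; a cluster is a list of member names.
    clusters = {name: [name] for name in points}

    def centroid(members):
        """Centroid of a cluster = mean of its members' feature vectors."""
        return np.mean([points[m] for m in members], axis=0)

    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest (Euclidean distance).
        k1, k2 = min(
            combinations(clusters, 2),
            key=lambda pair: np.linalg.norm(centroid(clusters[pair[0]]) - centroid(clusters[pair[1]])),
        )
        merged = clusters.pop(k1) + clusters.pop(k2)
        clusters["".join(sorted(merged))] = merged
        print("merged:", k1, "+", k2)

    # Expected merge order: E+F, then A+EF, then B+AEF, then C+ABEF, then D last.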
2. Revise the concepts of unsupervised and supervised evaluation for clustering.

(a) Explain the two main concepts that we use to measure the goodness of a clustering structure without respect to external information.

The two main concepts that we check in unsupervised clustering evaluation are cluster cohesion and cluster separation. We want the members of each cluster to be as tightly grouped and close to each other as possible, and at the same time we want the clusters to be as separate and independent from one another as possible. Using the SSE metric, we can use W-SSE (Within-Cluster Sum of Squared Errors) to measure intra-cluster cohesion and B-SSE (Between-Cluster Sum of Squared Errors) to measure inter-cluster separation.

(b) Explain the two main concepts that we use to measure how well cluster labels match externally supplied class labels.

The two main factors in supervised clustering evaluation are homogeneity and completeness. Homogeneity measures whether all the elements of each cluster have the same true label; we can measure the homogeneity of a cluster using entropy or purity. Completeness measures whether all the members of a class are assigned to the same cluster. In other words, homogeneity only considers whether all the members of a cluster share the same label; if there are many clusters with the same label, homogeneity cannot detect it. For example, in a dataset with ten zeros and ten ones, a sensible clustering would form two clusters of ten members each. But if our system makes 20 clusters (one per instance), the homogeneity measures (purity or entropy) will still report a perfect result, even though the clustering is not very useful. So we need another metric that indicates whether all the members of a given label end up in the same cluster: this metric is completeness. A small worked example is sketched below.
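The following Python sketch (my own illustration, not part of the original solution) contrasts purity with completeness on the ten-zeros/ten-ones example. It assumes scikit-learn's homogeneity_score and completeness_score, which implement the standard entropy-based definitions; the purity helper is written by hand.

    import numpy as np
    from sklearn.metrics import completeness_score, homogeneity_score

    # Ten instances of class 0 and ten of class 1.
    true_labels = np.array([0] * 10 + [1] * 10)

    # A sensible clustering: two clusters that mirror the classes.
    two_clusters = np.array([0] * 10 + [1] * 10)

    # A degenerate clustering: every instance in its own cluster.
    singletons = np.arange(20)

    def purity(labels_true, labels_pred):
        """Purity = fraction of instances carrying the majority label of their cluster."""
        total = 0
        for c in np.unique(labels_pred):
            members = labels_true[labels_pred == c]
            total += np.bincount(members).max()   # size of the majority class in cluster c
        return total / len(labels_true)

    for name, pred in [("two clusters", two_clusters), ("20 singletons", singletons)]:
        print(name,
              "purity =", purity(true_labels, pred),
              "homogeneity =", round(homogeneity_score(true_labels, pred), 2),
              "completeness =", round(completeness_score(true_labels, pred), 2))
    # The singleton clustering scores perfect purity and homogeneity,
    # but its completeness drops to about 0.23, exposing the useless clustering.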
3. Consider a Naive Bayes model trained using the following familiar weather dataset:

   ID  Outl  Temp  Humi  Wind  PLAY
   A   s     h     n     F     N
   B   s     h     h     T     N
   C   o     h     h     F     Y
   D   r     m     h     F     Y
   E   r     c     n     F     Y
   F   r     c     n     T     N

Suppose that you made additional observations of days and their features, but you don't have the PLAY label for these days:

   ID  Outl  Temp  Humi  Wind  PLAY
   G   o     m     n     T     ?
   H   s     m     h     F     ?

How could you incorporate this information into your Naive Bayes model without manually annotating the labels? If necessary, recompute your model parameters.

To incorporate the new unlabelled instances, we can use the self-training method to increase our training data. The self-training steps are as follows:

(a) Train the learner on the currently labelled instances.
(b) Use the learner to predict the labels of the unlabelled instances.
(c) Where the learner is very confident, add the newly labelled instances to the training set.
(d) Repeat until all instances are labelled, or no new instances can be labelled confidently.

For step (a), let's train our model using the labelled instances. In Naive Bayes we need the probability of each label (the prior probabilities):

P(Play = Y) = 3/6 = 1/2        P(Play = N) = 3/6 = 1/2

and all the conditional probabilities between the class labels (PLAY) and the other attribute values:

P(Outl = s | N) = 2/3    P(Outl = o | N) = 0      P(Outl = r | N) = 1/3
P(Outl = s | Y) = 0      P(Outl = o | Y) = 1/3    P(Outl = r | Y) = 2/3

P(Temp = h | N) = 2/3    P(Temp = m | N) = 0      P(Temp = c | N) = 1/3
P(Temp = h | Y) = 1/3    P(Temp = m | Y) = 1/3    P(Temp = c | Y) = 1/3

P(Humi = n | N) = 2/3    P(Humi = h | N) = 1/3
P(Humi = n | Y) = 1/3    P(Humi = h | Y) = 2/3

P(Wind = T | N) = 2/3    P(Wind = F | N) = 1/3
P(Wind = T | Y) = 0      P(Wind = F | Y) = 1

For step (b), using Laplace (add-1) smoothing we can find the label for instances G and H and check how confident the NB model is in classifying them. In each smoothed factor below, the denominator is the number of instances in the class plus the number of possible values of that attribute. For G = (o, m, n, T), this looks like:

P(N) × P(Outl=o|N) P(Temp=m|N) P(Humi=n|N) P(Wind=T|N)
  = 1/2 × (0+1)/(3+3) × (0+1)/(3+3) × (2+1)/(3+2) × (2+1)/(3+2)
  = 1/2 × 1/6 × 1/6 × 3/5 × 3/5 = 0.005

P(Y) × P(Outl=o|Y) P(Temp=m|Y) P(Humi=n|Y) P(Wind=T|Y)
  = 1/2 × (1+1)/(3+3) × (1+1)/(3+3) × (1+1)/(3+2) × (0+1)/(3+2)
  = 1/2 × 2/6 × 2/6 × 2/5 × 1/5 ≈ 0.004

Using these results, G will be classified as N with probability 0.005. For H = (s, m, h, F), we have:

P(N) × P(Outl=s|N) P(Temp=m|N) P(Humi=h|N) P(Wind=F|N)
  = 1/2 × (2+1)/(3+3) × (0+1)/(3+3) × (1+1)/(3+2) × (1+1)/(3+2)
  = 1/2 × 3/6 × 1/6 × 2/5 × 2/5 ≈ 0.007

P(Y) × P(Outl=s|Y) P(Temp=m|Y) P(Humi=h|Y) P(Wind=F|Y)
  = 1/2 × (0+1)/(3+3) × (1+1)/(3+3) × (2+1)/(3+2) × (3+1)/(3+2)
  = 1/2 × 1/6 × 2/6 × 3/5 × 4/5 ≈ 0.013

Using these results, H will be classified as Y with probability 0.013.

In step (c), we add the instances whose labels our model is confident about to the training set. Let's take a probability higher than 0.01 as our confidence threshold. According to the results, our model is not confident about instance G's label (0.005 < 0.01), but it is confident about instance H's label (0.013 > 0.01). So we add H (labelled Y) to our training data, recompute the model parameters, and repeat the steps. Our new dataset is as follows:

   ID  Outl  Temp  Humi  Wind  PLAY
   A   s     h     n     F     N
   B   s     h     h     T     N
   C   o     h     h     F     Y
   D   r     m     h     F     Y
   E   r     c     n     F     Y
   F   r     c     n     T     N
   H   s     m     h     F     Y
This time our prior probabilities will be
P(Play = Y) = 4/7        P(Play = N) = 3/7
Our conditional probabilities will also change to

P(Outl = s | N) = 2/3    P(Outl = o | N) = 0      P(Outl = r | N) = 1/3
P(Outl = s | Y) = 1/4    P(Outl = o | Y) = 1/4    P(Outl = r | Y) = 2/4

P(Temp = h | N) = 2/3    P(Temp = m | N) = 0      P(Temp = c | N) = 1/3
P(Temp = h | Y) = 1/4    P(Temp = m | Y) = 2/4    P(Temp = c | Y) = 1/4

P(Humi = n | N) = 2/3    P(Humi = h | N) = 1/3
P(Humi = n | Y) = 1/4    P(Humi = h | Y) = 3/4

P(Wind = T | N) = 2/3    P(Wind = F | N) = 1/3
P(Wind = T | Y) = 0      P(Wind = F | Y) = 1
Using these new parameters, we will recalculate the probability of labels for instance G.
P(N) × P(Outl=o|N) P(Temp=m|N) P(Humi=n|N) P(Wind=T|N)
  = 3/7 × (0+1)/(3+3) × (0+1)/(3+3) × (2+1)/(3+2) × (2+1)/(3+2)
  = 3/7 × 1/6 × 1/6 × 3/5 × 3/5 ≈ 0.0043

P(Y) × P(Outl=o|Y) P(Temp=m|Y) P(Humi=n|Y) P(Wind=T|Y)
  = 4/7 × (1+1)/(4+3) × (2+1)/(4+3) × (1+1)/(4+2) × (0+1)/(4+2)
  = 4/7 × 2/7 × 3/7 × 2/6 × 1/6 ≈ 0.0039
Instance G will still be classified as N, now with probability 0.0043, but this does not pass our confidence threshold (0.01), and therefore our self-training algorithm stops at this stage. A sketch of the whole procedure is given below.
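To make the procedure concrete, here is a minimal Python sketch of Laplace-smoothed Naive Bayes with self-training on this dataset. It is my own illustration rather than part of the official solution; the function names are assumptions, and the 0.01 threshold and attribute-value counts mirror the working above (Outl ∈ {s,o,r}, Temp ∈ {h,m,c}, Humi ∈ {n,h}, Wind ∈ {T,F}).

    # Labelled weather data: (Outl, Temp, Humi, Wind) -> PLAY
    train = {
        "A": (("s", "h", "n", "F"), "N"),
        "B": (("s", "h", "h", "T"), "N"),
        "C": (("o", "h", "h", "F"), "Y"),
        "D": (("r", "m", "h", "F"), "Y"),
        "E": (("r", "c", "n", "F"), "Y"),
        "F": (("r", "c", "n", "T"), "N"),
    }
    unlabelled = {"G": ("o", "m", "n", "T"), "H": ("s", "m", "h", "F")}
    n_values = [3, 3, 2, 2]   # number of possible values of Outl, Temp, Humi, Wind

    def nb_score(x, label, data):
        """Prior times the product of add-1 smoothed conditional probabilities."""
        in_class = [feats for feats, y in data.values() if y == label]
        score = len(in_class) / len(data)                        # prior P(label)
        for i, value in enumerate(x):
            count = sum(1 for feats in in_class if feats[i] == value)
            score *= (count + 1) / (len(in_class) + n_values[i])  # Laplace smoothing
        return score

    THRESHOLD = 0.01
    while unlabelled:
        added = False
        for name, feats in list(unlabelled.items()):
            scores = {y: nb_score(feats, y, train) for y in ("Y", "N")}
            best = max(scores, key=scores.get)
            print(name, {y: round(s, 4) for y, s in scores.items()}, "->", best)
            if scores[best] > THRESHOLD:        # confident enough: self-label it
                train[name] = (feats, best)
                del unlabelled[name]
                added = True
        if not added:                           # nothing confident left: stop
            break

Running this reproduces the working: in the first round H is labelled Y (0.0133 > 0.01) and added, G is not (0.005); in the second round G still falls below the threshold, so the loop stops with G unlabelled.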
4. One of the strategies for query sampling in active learning is query-by-committee (QBC), where a suite of classifiers is trained over a fixed training set, and the instance that results in the highest disagreement amongst the classifiers is selected for querying. Using the equation below, which captures vote entropy, determine the instance that our active learner would select first.
x* = argmax_x ( − Σ_i ( V(y_i)/C ) log₂ ( V(y_i)/C ) )
Here y_i, V(y_i), and C are, respectively, the possible labels, the number of "votes" that label y_i receives from the classifiers, and the total number of classifiers.
                   Instance 1              Instance 2               Instance 3
   classifier   y1     y2     y3        y1     y2     y3         y1     y2     y3
   C1           0.2    0.7    0.1       0.2    0.6    0.2        0.21   0.21   0.58
   C2           0.1    0.3    0.6       0.05   0.9    0.05       0.1    0.28   0.62
   C3           0.8    0.1    0.1       0.1    0.8    0.1        0.75   0.01   0.24
   C4           0.3    0.5    0.2       0.1    0.6    0.3        –      –      –
For each instance, we calculate the total number of votes that each class label receives:
                   Instance 1           Instance 2           Instance 3
   classifier   y1    y2    y3       y1    y2    y3       y1    y2    y3
   C1           0     1     0        0     1     0        0     0     1
   C2           0     0     1        0     1     0        0     0     1
   C3           1     0     0        0     1     0        1     0     0
   C4           0     1     0        0     1     0        1     0     0

Instance 1: V(1) = 1, V(2) = 2, V(3) = 1
Instance 2: V(1) = 0, V(2) = 4, V(3) = 0
Instance 3: V(1) = 2, V(2) = 0, V(3) = 2

We have 4 classifiers in total; substituting the vote counts into the vote entropy formula, we get the following for each instance:
Instance 1: H = −( 1/4 log₂ 1/4 + 2/4 log₂ 2/4 + 1/4 log₂ 1/4 ) = 1.5
Instance 2: H = −( 4/4 log₂ 4/4 ) = −( 1 log₂ 1 ) = 0
Instance 3: H = −( 2/4 log₂ 2/4 + 2/4 log₂ 2/4 ) = 1
The instance that we select is instance 1, for which the classifiers show the highest disagreement. This sample is the most difficult to classify and may lie on the boundary between the three classes; therefore, by querying this instance, we are likely to learn the most about the data space. A small sketch of this computation follows.
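As a final check, here is a short Python sketch (my own, not part of the original solution) that computes the vote entropy for each instance directly from the vote counts in the table above.

    import math

    # Vote counts per instance, taken from the vote table: (V(y1), V(y2), V(y3)).
    votes = {
        "instance 1": (1, 2, 1),
        "instance 2": (0, 4, 0),
        "instance 3": (2, 0, 2),
    }

    def vote_entropy(vote_counts):
        """H = -sum over labels of (V(y)/C) * log2(V(y)/C), skipping zero-vote labels."""
        total = sum(vote_counts)   # C, the committee size
        h = 0.0
        for v in vote_counts:
            if v > 0:
                h -= (v / total) * math.log2(v / total)
        return h

    for name, counts in votes.items():
        print(name, "vote entropy =", round(vote_entropy(counts), 3))
    print("query:", max(votes, key=lambda name: vote_entropy(votes[name])))
    # instance 1 -> 1.5, instance 2 -> 0.0, instance 3 -> 1.0; the learner queries instance 1.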
