School of Computing and Information Systems The University of Melbourne
COMP90049 Introduction to Machine Learning (Semester 2, 2021) Week 3: Sample Solution
For the following dataset:
id   apple  ibm  lemon  sun  CLASS
TRAINING INSTANCES
A    4      0    1      1    FRUIT
B    5      0    5      2    FRUIT
C    2      5    0      0    COMPUTER
D    1      2    1      7    COMPUTER
TEST INSTANCES
T1   2      0    3      1    ?
T2   1      2    1      0    ?
(i). Using the Euclidean distance measure, classify the test instances using the 1-NN method.
In this method we need to calculate the distance between a test instance and each prototype (training instance). To do so, we need to use a similarity/distance function. Here our distance function is the Euclidean distance:
d_E(A, B) = \sqrt{\sum_i (a_i - b_i)^2}
Using this function, we will calculate the distance between the test instance and each training instance:
d_E(T_1, A) = \sqrt{(2-4)^2 + (0-0)^2 + (3-1)^2 + (1-1)^2} = \sqrt{8} \approx 2.828
d_E(T_1, B) = \sqrt{(2-5)^2 + (0-0)^2 + (3-5)^2 + (1-2)^2} = \sqrt{14} \approx 3.742
d_E(T_1, C) = \sqrt{(2-2)^2 + (0-5)^2 + (3-0)^2 + (1-0)^2} = \sqrt{35} \approx 5.916
d_E(T_1, D) = \sqrt{(2-1)^2 + (0-2)^2 + (3-1)^2 + (1-7)^2} = \sqrt{45} \approx 6.708
The nearest neighbour is the one with the smallest distance — here, this is instance A, which is a FRUIT instance. Therefore, we will classify this instance as FRUIT.
The second test instance is similar:
d_E(T_2, A) = \sqrt{(1-4)^2 + (2-0)^2 + (1-1)^2 + (0-1)^2} = \sqrt{14} \approx 3.742
d_E(T_2, B) = \sqrt{(1-5)^2 + (2-0)^2 + (1-5)^2 + (0-2)^2} = \sqrt{40} \approx 6.325
d_E(T_2, C) = \sqrt{(1-2)^2 + (2-5)^2 + (1-0)^2 + (0-0)^2} = \sqrt{11} \approx 3.317
d_E(T_2, D) = \sqrt{(1-1)^2 + (2-2)^2 + (1-1)^2 + (0-7)^2} = \sqrt{49} = 7
Here, the nearest neighbour is instance C, which is a COMPUTER instance. Therefore, we will classify this instance as COMPUTER.
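To make the procedure concrete, here is a minimal Python sketch of 1-NN with the Euclidean distance on this toy dataset. It is only an illustration of the calculation above; the variable names and data layout are our own choices, not part of the original question.

```python
from math import sqrt

# Toy dataset from the question; feature order is (apple, ibm, lemon, sun).
train = {
    "A": ([4, 0, 1, 1], "FRUIT"),
    "B": ([5, 0, 5, 2], "FRUIT"),
    "C": ([2, 5, 0, 0], "COMPUTER"),
    "D": ([1, 2, 1, 7], "COMPUTER"),
}
tests = {"T1": [2, 0, 3, 1], "T2": [1, 2, 1, 0]}

def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name, x in tests.items():
    # 1-NN: predict the class of the single closest training instance.
    nearest = min(train, key=lambda t: euclidean(x, train[t][0]))
    print(name, "->", nearest, train[nearest][1])  # T1 -> A FRUIT, T2 -> C COMPUTER
```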
(ii). Using the Manhattan distance measure, classify the test instances using the 3-NN method, for the three weightings we discussed in the lectures: majority class, inverse distance, inverse linear distance.
The first thing to do is to calculate the Manhattan distances. The Manhattan distance is like the Euclidean distance, but uses absolute differences instead of squares and a square root:

d_M(A, B) = \sum_i |a_i - b_i|
d_M(T_1, A) = |2 - 4| + |0 - 0| + |3 - 1| + |1 - 1| = 4
d_M(T_1, B) = |2 - 5| + |0 - 0| + |3 - 5| + |1 - 2| = 6
d_M(T_1, C) = |2 - 2| + |0 - 5| + |3 - 0| + |1 - 0| = 9
d_M(T_1, D) = |2 - 1| + |0 - 2| + |3 - 1| + |1 - 7| = 11

d_M(T_2, A) = |1 - 4| + |2 - 0| + |1 - 1| + |0 - 1| = 6
d_M(T_2, B) = |1 - 5| + |2 - 0| + |1 - 5| + |0 - 2| = 12
d_M(T_2, C) = |1 - 2| + |2 - 5| + |1 - 0| + |0 - 0| = 5
d_M(T_2, D) = |1 - 1| + |2 - 2| + |1 - 1| + |0 - 7| = 7
The nearest neighbours for the first test instance are A, B, and C. For the second test instance, they are C, A, and D.
The majority class weighting method:
In this method we effectively assign a weight of 1 to every instance in the set of nearest neighbours:
• For the first test instance, there are 2 FRUIT instances and 1 COMPUTER instance. There are more FRUIT than COMPUTER, so we predict FRUIT.
• For the second test instance, there are 2 COMPUTER instances and 1 FRUIT instance. There are more COMPUTER than FRUIT, so we predict COMPUTER.
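A small Python sketch of this 3-NN majority vote with the Manhattan distance; the function and variable names are illustrative only.

```python
from collections import Counter

# Same toy data; feature order is (apple, ibm, lemon, sun).
train = [
    ([4, 0, 1, 1], "FRUIT"),
    ([5, 0, 5, 2], "FRUIT"),
    ([2, 5, 0, 0], "COMPUTER"),
    ([1, 2, 1, 7], "COMPUTER"),
]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_majority(x, k=3):
    # Take the k closest training instances and return the most common class.
    neighbours = sorted(train, key=lambda inst: manhattan(x, inst[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

print(knn_majority([2, 0, 3, 1]))  # FRUIT    (neighbours A, B, C)
print(knn_majority([1, 2, 1, 0]))  # COMPUTER (neighbours C, A, D)
```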
The inverse distance weighting method:
In this method each of the 3 nearest neighbours is weighted by the inverse of its distance, 1/(d_j + ε). We first need to choose a value for ε; let's say 1:
• For the first test instance:
o The first neighbour, A (a FRUIT), gets a weight of 1/(4+1) = 0.2
o The second neighbour, B (a FRUIT), gets a weight of 1/(6+1) ≈ 0.14
o The third neighbour, C (a COMPUTER), gets a weight of 1/(9+1) = 0.1
Overall, FRUIT instances have a score of 0.2 + 0.14 = 0.34, and COMPUTER instances have a score of 0.1, so we would predict FRUIT for this instance.
• For the second test instance:
o The first neighbour, C (a COMPUTER), gets a weight of 1/(5+1) ≈ 0.17
o The second neighbour, A (a FRUIT), gets a weight of 1/(6+1) ≈ 0.14
o The third neighbour, D (a COMPUTER), gets a weight of 1/(7+1) ≈ 0.12
Overall, FRUIT instances have a score of 0.14, and COMPUTER instances have a score of 0.17 + 0.12 = 0.29, so we would predict COMPUTER for this instance.
Note: Using the Euclidean distance (instead of the Manhattan distance) would give a different result here, because the 3rd nearest neighbour of the second test instance would then be B rather than D.
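The same inverse-distance calculation as a short Python sketch, using the Manhattan distance and ε = 1; the data and distance function are repeated so the snippet runs on its own, and the names are illustrative.

```python
train = [([4, 0, 1, 1], "FRUIT"), ([5, 0, 5, 2], "FRUIT"),
         ([2, 5, 0, 0], "COMPUTER"), ([1, 2, 1, 7], "COMPUTER")]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_inverse_distance(x, k=3, eps=1.0):
    # Weight each of the k Manhattan-nearest neighbours by 1 / (distance + eps).
    neighbours = sorted(train, key=lambda inst: manhattan(x, inst[0]))[:k]
    scores = {}
    for feats, label in neighbours:
        scores[label] = scores.get(label, 0.0) + 1.0 / (manhattan(x, feats) + eps)
    return scores

print(knn_inverse_distance([2, 0, 3, 1]))  # FRUIT ~0.34 vs COMPUTER 0.10 -> FRUIT
print(knn_inverse_distance([1, 2, 1, 0]))  # COMPUTER ~0.29 vs FRUIT ~0.14 -> COMPUTER
```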
The inverse linear distance weighting method:
In this method we are going to weight instances by re-scaling the distances according to the following formula, where d_j is the distance of the j-th nearest neighbour, d_1 is the distance of the nearest neighbour, and d_k is the distance of the furthest of the k neighbours:

w_j = \frac{d_k - d_j}{d_k - d_1}

Note: Compared to the lecture version, we have substituted k = 3 here, because we are using the 3-Nearest Neighbour method.
• For the first test instance:
o The first neighbour, A (a FRUIT), gets a weight of (d_3 - d_1)/(d_3 - d_1) = (9 - 4)/(9 - 4) = 1
o The second neighbour, B (a FRUIT), gets a weight of (d_3 - d_2)/(d_3 - d_1) = (9 - 6)/(9 - 4) = 0.6
o The third neighbour, C (a COMPUTER), gets a weight of (d_3 - d_3)/(d_3 - d_1) = (9 - 9)/(9 - 4) = 0
Overall, FRUIT instances have a score of 1 + 0.6 = 1.6, and COMPUTER instances have a score of 0, so we would predict FRUIT for this instance.
• For the second test instance:
o The first neighbour, C (a COMPUTER), gets a weight of (d_3 - d_1)/(d_3 - d_1) = (7 - 5)/(7 - 5) = 1
o The second neighbour, A (a FRUIT), gets a weight of (d_3 - d_2)/(d_3 - d_1) = (7 - 6)/(7 - 5) = 0.5
o The third neighbour, D (a COMPUTER), gets a weight of (d_3 - d_3)/(d_3 - d_1) = (7 - 7)/(7 - 5) = 0
Overall, FRUIT instances have a score of 0.5, and COMPUTER instances have a score of 1 + 0 = 1, so we would predict COMPUTER for this instance.
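And the inverse linear distance weighting as a sketch; the guard for d_1 = d_k is an extra safety detail of ours, not needed for this particular data.

```python
train = [([4, 0, 1, 1], "FRUIT"), ([5, 0, 5, 2], "FRUIT"),
         ([2, 5, 0, 0], "COMPUTER"), ([1, 2, 1, 7], "COMPUTER")]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_inverse_linear(x, k=3):
    # Sort (distance, label) pairs; d1 is the closest, dk the furthest of the k.
    neighbours = sorted((manhattan(x, feats), label) for feats, label in train)[:k]
    d1, dk = neighbours[0][0], neighbours[-1][0]
    scores = {}
    for dj, label in neighbours:
        w = 1.0 if dk == d1 else (dk - dj) / (dk - d1)
        scores[label] = scores.get(label, 0.0) + w
    return scores

print(knn_inverse_linear([2, 0, 3, 1]))  # FRUIT 1.6 vs COMPUTER 0.0 -> FRUIT
print(knn_inverse_linear([1, 2, 1, 0]))  # COMPUTER 1.0 vs FRUIT 0.5 -> COMPUTER
```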
Can we do weighted k-NN using cosine similarity?
Of course! If anything, this is easier than with a distance, because we can assign a weighting for each instance using the cosine similarity directly. An overall weighting for a class can be obtained by summing the cosine scores for the instances of the corresponding class, from among the set of nearest neighbours.
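A short sketch of this idea, summing cosine scores over the 3 most similar training instances for the second test instance; the names and the choice of k = 3 are illustrative.

```python
from math import sqrt

train = [([4, 0, 1, 1], "FRUIT"), ([5, 0, 5, 2], "FRUIT"),
         ([2, 5, 0, 0], "COMPUTER"), ([1, 2, 1, 7], "COMPUTER")]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def knn_cosine_sum(x, k=3):
    # Larger cosine = more similar, so sort in descending order of similarity.
    neighbours = sorted(train, key=lambda inst: cosine(x, inst[0]), reverse=True)[:k]
    scores = {}
    for feats, label in neighbours:
        scores[label] = scores.get(label, 0.0) + cosine(x, feats)
    return scores

print(knn_cosine_sum([1, 2, 1, 0]))  # FRUIT ~1.04 vs COMPUTER ~0.91 -> FRUIT
```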
Let's summarise all of these predictions in the table below. We can see that there is some divergence between these methods, depending on whether B or D is the 3rd neighbour for T2:
Inst  Measure  k  Weight  Prediction
T1    dE       1  -       FRUIT
T1    dE       3  Maj     FRUIT
T1    dE       3  ID      FRUIT
T1    dE       3  ILD     FRUIT
T1    dM       1  -       FRUIT
T1    dM       3  Maj     FRUIT
T1    dM       3  ID      FRUIT
T1    dM       3  ILD     FRUIT
T1    cos      1  -       FRUIT
T1    cos      3  Maj     FRUIT
T1    cos      3  Sum     FRUIT
T2    dE       1  -       COMPUTER
T2    dE       3  Maj     FRUIT
T2    dE       3  ID      FRUIT
T2    dE       3  ILD     COMPUTER
T2    dM       1  -       COMPUTER
T2    dM       3  Maj     COMPUTER
T2    dM       3  ID      COMPUTER
T2    dM       3  ILD     COMPUTER
T2    cos      1  -       COMPUTER
T2    cos      3  Maj     FRUIT
T2    cos      3  Sum     FRUIT
3. Approximately 1% of women aged between 40 and 50 have breast cancer. 80% of mammogram screening tests detect breast cancer when it is there. 90% of mammograms DO NOT show breast cancer when it is NOT there. Based on this information, complete the following table.
Cancer   Probability
No       99%
Yes      1%

Cancer  Test      Probability
Yes     Positive  80%
Yes     Negative  ?
No      Positive  ?
No      Negative  90%
Based on the probability rule of sum for mutually exclusive events (events that cannot both happen at the same time), we know that the probabilities of positive and negative test results must sum to 1 (or 100%).
Therefore, when a patient has cancer (Cancer = 'Yes') and we know that there is an 80% probability that the test detects it (Test returns 'Positive'), there must be a 20% chance (1 - 0.80 = 0.20) that the test does not detect the cancer (Test returns 'Negative'). We call this a False Negative (a wrong negative); you will learn more about it later in the lectures.
Similarly, when a patient does not have cancer (Cancer = 'No') and there is a 90% chance that the test confirms this (Test returns 'Negative'), there must be a 10% chance (1 - 0.9 = 0.1) that the test detects cancer (returns 'Positive') when it is not there! We call this a False Positive (a wrong positive), and again you will learn more about it later in the lectures when we talk about evaluation.
So, the filled table would be as follows:

Cancer  Test      Probability
Yes     Positive  80%
Yes     Negative  20%
No      Positive  10%
No      Negative  90%
4. Based on the results in question 3, calculate the marginal probability of a 'positive' result in a Mammogram Screening Test.
According to the law of total probability, we know that
P(A) = \sum_j P(A | B_j) P(B_j)
So, to calculate the probability of a 'positive' result for Test, we have:
P(Test = 'positive') = P(Test = 'positive' | Cancer = 'no') \cdot P(Cancer = 'no')
                     + P(Test = 'positive' | Cancer = 'yes') \cdot P(Cancer = 'yes')
Based on the question definition, we know that the chance of having breast cancer (for females aged between 40 and 50) is 1%. So P(Cancer = 'yes') = 0.01 and P(Cancer = 'no') = 0.99.
From question 3, we know that the probability of a positive test result is 80% for a patient with cancer (P(Test = 'positive' | Cancer = 'yes') = 0.8) and the probability of a positive test result is 10% for a patient with no cancer (P(Test = 'positive' | Cancer = 'no') = 0.1).
So, we have:
P(Test = 'positive') = 0.1 \times 0.99 + 0.8 \times 0.01 = 0.107
We can show all of these in a Joint Probability Distribution table as follows:

                 Test
Cancer    Positive              Negative              Total
Yes       0.01 x 0.8 = 0.008    0.01 x 0.2 = 0.002    0.01
No        0.99 x 0.1 = 0.099    0.99 x 0.9 = 0.891    0.99
Total     0.107                 0.893                 1

We call the totals (row and column) the Marginal Probability, because they are in the margin!
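The marginal-probability calculation above as a tiny Python sketch; the dictionary names are illustrative.

```python
p_cancer = {"yes": 0.01, "no": 0.99}     # prior P(Cancer)
p_pos_given = {"yes": 0.80, "no": 0.10}  # P(Test = 'positive' | Cancer)

# Law of total probability: sum over both values of Cancer.
p_positive = sum(p_pos_given[c] * p_cancer[c] for c in p_cancer)
print(p_positive)  # 0.107
```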
5. Based on the results in questions 3 and 4, calculate P(Cancer = 'Yes' | Test = 'Positive'), using the Bayes Rule.
According to the Bayes Rule, we can calculate the probability that a person actually has breast cancer, given that her mammography test result is positive, using the following formula:
P(Cancer = 'yes' | Test = 'positive') = \frac{P(Test = 'positive' | Cancer = 'yes') \cdot P(Cancer = 'yes')}{P(Test = 'positive')}
Based on the given information in the question text, we know that “80% of mammogram screening tests detect breast cancer when it is there”, so P(Test = ‘positive’ | Cancer = ‘yes’) is 0.8 (80%).
Also, there is a 1% chance of having breast cancer (for females aged between 40 and 50), so P(Cancer = 'yes') = 0.01.
Also, from question 4, we have P(Test = 'positive') = 0.107 (the marginal probability of a 'positive' result for a mammogram test).
So we can easily calculate P(Cancer = 'yes' | Test = 'positive'):

P(Cancer = 'yes' | Test = 'positive') = \frac{0.8 \times 0.01}{0.107} \approx 0.075 = 7.5\%
This result shows that even if a mammography test result comes back positive, there is only a 7.5% chance that the person actually has cancer!
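The same Bayes-rule arithmetic as a tiny sketch; the variable names are illustrative.

```python
p_cancer_yes = 0.01      # P(Cancer = 'yes')
p_pos_given_yes = 0.80   # P(Test = 'positive' | Cancer = 'yes')
p_positive = 0.107       # P(Test = 'positive'), the marginal from question 4

posterior = p_pos_given_yes * p_cancer_yes / p_positive
print(round(posterior, 3))  # 0.075, i.e. about 7.5%
```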