MTH6101 Introduction to Machine Learning
Coursework three: submit in Coursework box 14 by 12:00 (noon) on Wednesday 11th March 2020 at the latest.
Read the following instructions carefully:
• Coursework is to be submitted individually, as a single A4 sheet which may be written on both sides, so the maximum is two sides of A4. Submit a physical sheet of A4 paper, NOT an email, unless you have a good reason to submit by email, such as disability or ECs, in which case please contact me beforehand.
• Write your name and student number clearly at the top of the front side of your coursework.
• You are asked to submit answers depending on the LAST digit of your student number. Make sure you submit the answer to the correct question, as submitting an answer to a question not allocated to you will lead to zero marks. If in doubt, ask the Module organizer which question is yours.
• You will perform some computations numerically using R; your submission may be written using markdown, typed with a word processor, or even written by hand.
• You are expected to include only material relevant to the question, and anything you include must be there for a reason. You are not to include raw R output.
• Each coursework, including this one, is worth 6% of the Module mark, so polish what you submit.
Description of activities
Measurements of chemical impurities were taken in a sample of six underground aquifers, referred to by the acronyms Nw, Yk, Dn, Gr, As and Ba. The impurities were quantified by four variables, referred to by the acronyms Ub, Uc, Jd and Va, each measured in milligrammes per cubic metre. The data are not to be transformed.
1. Compute and report the Euclidean distance matrix for these data.
2. Perform agglomerative cluster analysis with your data using Euclidean distance and the prescribed linkage. Report the dendrogram from R and briefly comment on the possible number of clusters in the data. Use the R library cluster and its function agnes.
3. Report a step-by-step description of the agglomerative clustering procedure using your data, Euclidean distance and the prescribed linkage. In each step, report the clusters formed and the updated dissimilarity matrix between clusters. Your results must be fully compatible with your earlier dendrogram.
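The three activities above can be sketched in R on a small made-up example. This is only an illustrative sketch, not a model answer: the toy matrix x (four sites P1 to P4, two variables V1 and V2), the variable linkage, the helper cluster_diss and the vector heights are all hypothetical names and values introduced here, and none of them come from the coursework data. The sketch assumes that "average" linkage means the mean of all pairwise distances between the members of two clusters.

```r
library(cluster)  # provides agnes() and pltree()

# Hypothetical toy data (NOT the coursework data): four sites, two variables.
x <- matrix(c(0, 0,
              0, 1,
              3, 0,
              3, 2),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("P1", "P2", "P3", "P4"), c("V1", "V2")))

linkage <- "single"  # assumed switch; set to "average" for average linkage

# Activity 1 analogue: Euclidean distance matrix between the sites.
d <- as.matrix(dist(x, method = "euclidean"))
print(round(d, 3))

# Activity 2 analogue: agglomerative clustering with agnes and its dendrogram.
ag <- agnes(x, metric = "euclidean", stand = FALSE, method = linkage)
pltree(ag, main = paste("agnes,", linkage, "linkage"))

# Hypothetical helper: dissimilarity matrix between the current clusters.
# Each cluster is a character vector of site labels.
cluster_diss <- function(clusters, d, linkage) {
  k <- length(clusters)
  labs <- sapply(clusters, paste, collapse = "+")
  D <- matrix(0, k, k, dimnames = list(labs, labs))
  for (i in seq_len(k)) for (j in seq_len(k)) {
    if (i != j) {
      pair <- d[clusters[[i]], clusters[[j]], drop = FALSE]
      # single: smallest pairwise distance; average: mean pairwise distance
      D[i, j] <- if (linkage == "single") min(pair) else mean(pair)
    }
  }
  D
}

# Activity 3 analogue: merge the two closest clusters until one remains,
# printing each merge and the updated dissimilarity matrix.
clusters <- as.list(rownames(x))
heights <- numeric(0)
while (length(clusters) > 1) {
  D <- cluster_diss(clusters, d, linkage)
  diag(D) <- Inf                      # ignore self-dissimilarities
  idx <- which(D == min(D), arr.ind = TRUE)[1, ]
  i <- min(idx)
  j <- max(idx)
  heights <- c(heights, D[i, j])
  cat("Merge", rownames(D)[i], "and", rownames(D)[j],
      "at height", round(D[i, j], 3), "\n")
  clusters[[i]] <- c(clusters[[i]], clusters[[j]])
  clusters[[j]] <- NULL
  if (length(clusters) > 1) print(round(cluster_diss(clusters, d, linkage), 3))
}

# Compatibility check: manual merge heights against those stored by agnes.
print(all.equal(sort(heights), sort(ag$height)))
```

The final comparison is one way to check that a hand-computed sequence of merges is compatible with the dendrogram; with tied distances the order of merges may differ even when the heights agree.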
The data and linkage to be used
ID ending in 0; use single linkage
ID ending in
Ub Uc
Nw Yk Dn Gr As Ba
Nw Yk Dn Gr As Ba
1.45 3.93 1.64 3.85 1.82 4.13
1.6 3.72 2.51 2.87 2.14 2.96
1.71 1.38 1.64 1.27 1.03 1.84 1.82 1.25 1.96 2.55 1.39 2.27
Ub Uc 1.65 3.54 1.14 3.22
1 3.77 1.83 3.57 2.96 2.65 2.2 2.08
Jd Va 1.62 1.48 1.13 1.39 1.76 1.74 1.41 1.54 1.55 2.27 1.82 2.94
1; use average linkage Jd Va
ID ending in 2; use single linkage
     Ub    Uc    Jd    Va
Nw   1.82  3.8   1.12  1.54
Yk   1.18  3.79  1.07  1.04
Dn   1.37  3.91  1     1.16
Gr   1.14  3.67  1.61  1.12
As   2.77  2.51  1.6   2.85
Ba   2.84  2.57  1.43  2.14
ID ending in 3; use average linkage
     Ub    Uc    Jd    Va
Nw   1.47  3.49  1.16  1.4
Yk   1.03  3.27  1.43  1.62
Dn   1.26  3.63  1.86  1.15
Gr   1.18  4.15  1.84  1.14
As   2.34  2.89  1.32  2.71
Ba   2.63  2.44  1.34  2.93
ID ending in 4; use single linkage
ID ending in Ub Nw 1.73 Yk 1.78 Dn 1.64 Gr 1.65 As 2.37 Ba 2.35
ID ending in Ub Nw 1.17 Yk 1.29 Dn 1.97 Gr 1.14 As 2.06 Ba 2.08
ID ending in Ub Nw 1.09 Yk 1.41 Dn 1.68 Gr 1.93 As 2.88 Ba 2.29
5; use average linkage Uc Jd Va 3.63 1.51 1.5 3.65 1.9 1.13 3.72 1.44 1.95 3.17 1.47 1.86 2.37 1.76 2.11 2.81 1.04 2.72
7; use average linkage Uc Jd Va 3.55 1.24 1.7 3.46 1.78 1.38 4.06 1.77 1.58 3.84 1.26 1.43 2.58 1.23 2.72 2.89 1.3 2.99
9; use average linkage
Ub Uc Jd Nw 1.84 3.47 1.68 Yk 1.46 3.85 1.42 Dn 1.88 3.27 1.9 Gr 1.7 3.67 1.45 As 2.11 2.86 1.13 Ba 2.05 2.11 1.68
Va 1.7 1.3
1.95 1.84 2.3 2.85
ID ending in 6; use single linkage
     Ub    Uc    Jd    Va
Nw   1.7   3.71  1.14  1.29
Yk   1.56  3.18  1.47  1.86
Dn   1.25  3.73  1.01  1.69
Gr   1.23  4     1.15  1.36
As   2.55  2     1.32  2.02
Ba   2.34  2.34  1.23  2.06
ID ending in 8; use single linkage
     Ub    Uc    Jd    Va
Nw   1.54  4.02  1.62  1.8
Yk   1.01  3.61  1.97  1.32
Dn   1.99  4.15  1.5   1.07
Gr   1.19  3.92  1.04  1.08
As   2.58  2.1   1.46  2.4
Ba   2.06  2.47  1.88  2.96
Uc 3.94 3.79 3.58 3.95 2.29 2.29
Jd Va 1.04 1.57 1.45 1.02 1.86 1.55
1.8 1.69 1.89 2.11 1.31 2.9
Grading: Q1 15%, Q2 20%, Q3 50%, Presentation 15%