MLAI Week 8 Exercise: Unsupervised Learning

Note: An indicative mark is in front of each question. The total mark is 13. You may mark your own work when we release the solutions.
1. Consider 30-bit deep colour images of size 1200 × 1200. How many possible images of this size and bit depth are there?
Solution: For a 30-bit image, each pixel can be represented by a 30-bit integer, which can take 2^30 possible values. Therefore, the total number of distinct images of size 1200 × 1200 (i.e. 1200 × 1200 pixels) is

\[
\#\text{Distinct Images} = (2^{30})^{1200 \times 1200} = 2^{30 \times 1200 \times 1200}. \tag{1}
\]
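This count is far too large to write out in full, but its magnitude is easy to check. Below is a minimal Python sketch (an illustration, not part of the original solution) that evaluates the exponent and the approximate number of decimal digits:

```python
import math

# Each of the 1200 x 1200 pixels takes one of 2**30 values,
# so the count is (2**30)**(1200*1200) = 2**(30*1200*1200).
exponent = 30 * 1200 * 1200              # 43,200,000 bits in total
digits = math.floor(exponent * math.log10(2)) + 1
print(f"#Distinct images = 2**{exponent}, about {digits:,} decimal digits")
```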
2. We are using PCA to reduce data dimensionality from 3 to 2. The top two eigenvectors are the columns of

\[
\begin{pmatrix} 0.4729 & -0.8817 \\ -0.8817 & -0.4719 \\ 0 & 0 \end{pmatrix}.
\]

Use this PCA transformation to reduce the dimensionality of the two data points x1 = (2, 3, 3)⊤ and x2 = (4, 1, 0)⊤ to 2, giving x̂1 and x̂2. Show the procedure used to compute x̂1 and x̂2. Assume that all data points are centred already.
Solution: Let

\[
R = \begin{pmatrix} 0.4729 & -0.8817 \\ -0.8817 & -0.4719 \\ 0 & 0 \end{pmatrix}, \qquad X = (x_1, x_2) = \begin{pmatrix} 2 & 4 \\ 3 & 1 \\ 3 & 0 \end{pmatrix}.
\]

The projection of the data points x1 and x2 from the 3-dimensional space onto the 2-dimensional subspace spanned by the eigenvectors is calculated as follows:

\[
\hat{X} = (\hat{x}_1, \hat{x}_2) = R^\top X = \begin{pmatrix} 0.4729 & -0.8817 & 0 \\ -0.8817 & -0.4719 & 0 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 3 & 1 \\ 3 & 0 \end{pmatrix} = \begin{pmatrix} -1.6993 & 1.0099 \\ -3.1791 & -3.9987 \end{pmatrix}, \tag{2}
\]

so x̂1 = (−1.6993, −3.1791)⊤ and x̂2 = (1.0099, −3.9987)⊤.
The sentence "Assume that all data points are centred already." was added while preparing the solution. Centring the data, i.e. subtracting the mean (computed from the training data), before projection is good and common practice.
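As a quick sanity check, here is a minimal NumPy sketch (an illustration, not part of the original solution) that reproduces the projection above:

```python
import numpy as np

# Columns of R are the top-2 eigenvectors from the question
# (third row (0, 0) as in the reconstructed matrix above).
R = np.array([[ 0.4729, -0.8817],
              [-0.8817, -0.4719],
              [ 0.0,     0.0   ]])

X = np.array([[2.0, 4.0],   # x1 and x2 stored as columns
              [3.0, 1.0],
              [3.0, 0.0]])

X_hat = R.T @ X             # project each (already centred) point onto the 2-D subspace
print(X_hat)
# [[-1.6993  1.0099]
#  [-3.1791 -3.9987]]
```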
3. Given a dataset {0, 2, 4, 6, 24, 26}, initialise the k-means clustering algorithm with 2 cluster centres c1 = 3 and c2 = 4. What are the values of c1 and c2 after one iteration of k-means? What are the values of c1 and c2 after the second iteration of k-means?

Solution: We define the centre values of cluster 1 and cluster 2 as c1 and c2, and let X = {0, 2, 4, 6, 24, 26}. There are two steps in each iteration of the k-means algorithm, which together aim to minimise

\[
\sum_{j=1}^{2} \; \sum_{x^{(i)} \in X \text{ allocated to cluster } j} \left( x^{(i)} - c_j \right)^2. \tag{3}
\]

In the first step we assign each data point to the cluster whose centre is closest to it. With the initialisation c1 = 3 and c2 = 4, in the first iteration we allocate 0 and 2 to cluster 1, and 4, 6, 24, 26 to cluster 2. In the second step, we recompute the centres of the clusters found in the first step, setting each centre to the mean of the data points in its cluster:

\[
c_i = \mathbb{E}[X_i] = \frac{1}{|X_i|} \sum_{x^{(j)} \in X_i} x^{(j)}, \quad i = 1, 2, \tag{4}
\]

where X1 = {0, 2} and X2 = {4, 6, 24, 26}. Therefore, by Eqn. (4), the centres of cluster 1 and cluster 2 are updated to c1 = 1 and c2 = 15. In the next iteration, we repeat step 1 and step 2. With c1 = 1 and c2 = 15 from the previous iteration, cluster 1 and cluster 2 contain X1 = {0, 2, 4, 6} and X2 = {24, 26} in the second iteration, so the new centres are updated to c1 = 3 and c2 = 25.
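The two iterations can also be replayed numerically. The sketch below is an illustration assuming the standard Lloyd-style k-means updates described above (nearest-centre assignment, then mean update); it is not part of the original solution:

```python
import numpy as np

X = np.array([0, 2, 4, 6, 24, 26], dtype=float)
centres = np.array([3.0, 4.0])          # initial c1, c2

for it in range(2):
    # Step 1: assign each point to the nearest centre.
    labels = np.argmin(np.abs(X[:, None] - centres[None, :]), axis=1)
    # Step 2: move each centre to the mean of its assigned points.
    centres = np.array([X[labels == k].mean() for k in range(2)])
    print(f"iteration {it + 1}: c1 = {centres[0]}, c2 = {centres[1]}")

# iteration 1: c1 = 1.0, c2 = 15.0
# iteration 2: c1 = 3.0, c2 = 25.0
```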
(2 marks) 4. For the graph below, compute the normalised cut Ncut(A, B).

[Graph not reproduced here; its nodes are 1–6 with A = {1, 2, 3} and B = {4, 5, 6}, and its edge weights are the entries of the similarity matrix S given in the solution.]
Solution: The normalised cut is defined as

\[
Ncut(A, B) = cut(A, B)\,\frac{Vol(A) + Vol(B)}{Vol(A)\,Vol(B)}.
\]

We first show the similarity matrix for the data,

\[
S = \begin{pmatrix}
1 & 0.7 & 0.9 & 0 & 0.2 & 0 \\
0.7 & 1 & 0.5 & 0 & 0 & 0 \\
0.9 & 0.5 & 1 & 0.1 & 0 & 0 \\
0 & 0 & 0.1 & 1 & 0.4 & 0.6 \\
0.2 & 0 & 0 & 0.4 & 1 & 0.9 \\
0 & 0 & 0 & 0.6 & 0.9 & 1
\end{pmatrix},
\]

where the pairwise similarities are given by the Gaussian kernel

\[
W(i, j) = \exp\!\left( \frac{-|x_i - x_j|^2}{\sigma^2} \right). \tag{5}
\]

Note that the similarity of a data point with itself is 1 because the Gaussian kernel in Eqn. (5) equals 1 when xi = xj. The similarity of data points without a direct link in the graph is set to 0 because we assume |xi − xj|^2 → ∞.
By the definition of volume on page 38 of the Week 8 slides, \( Vol(A) = \sum_{i=1}^{3} \sum_{j=1}^{6} s_{ij} \), where \( s_{ij} \) denotes an element of S, so Vol(A) = 7.5. Similarly, \( Vol(B) = \sum_{i=4}^{6} \sum_{j=1}^{6} s_{ij} = 7.1 \). Finally, \( cut(A, B) = \sum_{i=1}^{3} \sum_{j=4}^{6} s_{ij} = 0.3 \). Note that S is symmetric, so cut(A, B) can also be calculated as \( \sum_{i=4}^{6} \sum_{j=1}^{3} s_{ij} \), giving the same value.
With Vol(A), Vol(B) and cut(A, B) calculated above, the normalised cut Ncut(A, B) is obtained as

\[
Ncut(A, B) = cut(A, B)\,\frac{Vol(A) + Vol(B)}{Vol(A)\,Vol(B)} = 0.3 \times \frac{7.5 + 7.1}{7.5 \times 7.1} \approx 0.08225. \tag{6}
\]
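The same numbers can be reproduced directly from S. The following NumPy sketch is an illustration (not part of the original solution); the index lists A and B are the zero-based versions of node sets {1, 2, 3} and {4, 5, 6}:

```python
import numpy as np

# Similarity matrix S from the solution.
S = np.array([[1.0, 0.7, 0.9, 0.0, 0.2, 0.0],
              [0.7, 1.0, 0.5, 0.0, 0.0, 0.0],
              [0.9, 0.5, 1.0, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.1, 1.0, 0.4, 0.6],
              [0.2, 0.0, 0.0, 0.4, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.6, 0.9, 1.0]])

A, B = [0, 1, 2], [3, 4, 5]              # zero-based node indices
vol_A = S[A, :].sum()                    # 7.5
vol_B = S[B, :].sum()                    # 7.1
cut_AB = S[np.ix_(A, B)].sum()           # 0.3
ncut = cut_AB * (vol_A + vol_B) / (vol_A * vol_B)
print(round(ncut, 5))                    # ~0.08225
```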
(3 marks) 5. An alternative way to derive PCA is to minimise the reconstruction error (Slide 26) over all N data samples x(i), i = 1, ..., N, assuming that the mean \( \mu = \frac{1}{N} \sum_{i} x^{(i)} \) is zero. Take this approach to derive the first principal component (as the first eigenvector of the data covariance matrix).
Solution: The most elegant proof is from https://people.eecs.berkeley.edu/~jordan/courses/294-fall09/lectures/dimensionality/paper-1x2.pdf.
Let us denote an orthonormal projection vector as u. It projects an input vector x to a scalar y = u⊤x, and this scalar is used to reconstruct x as x̂ = uy = uu⊤x. The reconstruction error over the N data samples is

\begin{align}
E &= \sum_{i=1}^{N} \left\| x^{(i)} - \hat{x}^{(i)} \right\|^2 \tag{7} \\
  &= \sum_{i=1}^{N} \left\| x^{(i)} - u u^\top x^{(i)} \right\|^2 \tag{8} \\
  &= \sum_{i=1}^{N} \left( x^{(i)} - u u^\top x^{(i)} \right)^\top \left( x^{(i)} - u u^\top x^{(i)} \right) \tag{9} \\
  &= \sum_{i=1}^{N} \left( x^{(i)\top} - x^{(i)\top} u u^\top \right) \left( x^{(i)} - u u^\top x^{(i)} \right) \tag{10} \\
  &= \sum_{i=1}^{N} \left( x^{(i)\top} x^{(i)} - x^{(i)\top} u u^\top x^{(i)} - x^{(i)\top} u u^\top x^{(i)} + x^{(i)\top} u u^\top u u^\top x^{(i)} \right) \tag{11} \\
  &= \sum_{i=1}^{N} \left( x^{(i)\top} x^{(i)} - x^{(i)\top} u u^\top x^{(i)} \right) \qquad \text{(note } u^\top u = 1\text{)} \tag{12} \\
  &= \text{constant} - \sum_{i=1}^{N} \left( u^\top x^{(i)} \right)^2. \tag{13}
\end{align}
Note that u⊤x(i) is the projection y(i) = u⊤x(i), so the summation in Eqn. (13) is proportional to the variance of the projected data (the data are zero-mean). Maximising this variance therefore minimises the reconstruction error, so we obtain the same solution as by variance maximisation: u is the first eigenvector of the data covariance matrix.
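A small numerical check of this conclusion: the sketch below (an illustration on randomly generated, centred toy data, not part of the original solution) compares the reconstruction error of the leading eigenvector of X⊤X with that of a random direction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])   # toy 3-D data
X = X - X.mean(axis=0)                                      # centre, so the zero-mean assumption holds

# Leading eigenvector of the scatter matrix X^T X (proportional to the covariance).
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
u_star = eigvecs[:, -1]                                     # eigenvector of the largest eigenvalue

def recon_error(u):
    u = u / np.linalg.norm(u)
    X_hat = (X @ u)[:, None] * u[None, :]                   # reconstruct each point from its 1-D projection
    return np.sum((X - X_hat) ** 2)

u_rand = rng.normal(size=3)
print(recon_error(u_star), "<=", recon_error(u_rand))       # the top eigenvector gives the smaller error
```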
(3 marks) 6. In spectral clustering, show that the smallest eigenvalue of the generalised eigenvalue problem formulated on Slide 41 is 0, with the corresponding generalised eigenvector y = 1, hence the same "representation/embedding" for all nodes.
Solution: The generalised eigenvalue problem is

\begin{align}
(D - W)y &= \lambda D y \\
(D - W)y &= \lambda D^{\frac{1}{2}} D^{\frac{1}{2}} y \\
D^{-\frac{1}{2}} (D - W) y &= \lambda D^{-\frac{1}{2}} D^{\frac{1}{2}} D^{\frac{1}{2}} y \\
D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} D^{\frac{1}{2}} y &= \lambda I D^{\frac{1}{2}} y.
\end{align}

Making the substitution \( z = D^{\frac{1}{2}} y \) gives

\[
D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} z = \lambda z.
\]

If we set y to 1 (the all-ones vector) we get

\begin{align}
D^{-\frac{1}{2}} (D - W) D^{-\frac{1}{2}} D^{\frac{1}{2}} \mathbf{1} &= \lambda D^{\frac{1}{2}} \mathbf{1} \\
D^{-\frac{1}{2}} (D - W) I \, \mathbf{1} &= \lambda D^{\frac{1}{2}} \mathbf{1}.
\end{align}
If we look at (D − W)1, we can see that it is the vector of row sums of D − W. In row i of the Laplacian D − W, we have the degree of the ith node, di, on the diagonal, and the negated weights of the edges connected to node i filling the rest of the row. Therefore, summing across row i gives

\[
d_i + \sum_{j=1}^{n} (-w_{i,j}) = \sum_{j=1}^{n} w_{i,j} + \sum_{j=1}^{n} (-w_{i,j}) = 0,
\]

which means (D − W)1 = 0, so y = 1 is a generalised eigenvector with eigenvalue λ = 0. Moreover, the Laplacian D − W is positive semi-definite, so all generalised eigenvalues are non-negative and λ = 0 is indeed the smallest, giving the same embedding value for every node.
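This can also be verified numerically. The sketch below (an illustration, not part of the original solution) reuses the similarity matrix from Question 4 as the weight matrix W and solves the generalised problem with scipy.linalg.eigh, which handles symmetric-definite generalised eigenproblems:

```python
import numpy as np
from scipy.linalg import eigh

# Weight matrix W (here: the similarity matrix S of Question 4, an assumption for illustration).
W = np.array([[1.0, 0.7, 0.9, 0.0, 0.2, 0.0],
              [0.7, 1.0, 0.5, 0.0, 0.0, 0.0],
              [0.9, 0.5, 1.0, 0.1, 0.0, 0.0],
              [0.0, 0.0, 0.1, 1.0, 0.4, 0.6],
              [0.2, 0.0, 0.0, 0.4, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.6, 0.9, 1.0]])
D = np.diag(W.sum(axis=1))               # degree matrix

# Solve the generalised problem (D - W) y = lambda * D * y.
eigvals, eigvecs = eigh(D - W, D)        # eigenvalues returned in ascending order
print(np.round(eigvals, 6))              # the smallest eigenvalue is 0 (up to numerical precision)
print(eigvecs[:, 0] / eigvecs[0, 0])     # the corresponding eigenvector is constant, i.e. proportional to 1
```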