CSE 404: Introduction to Machine Learning (Fall 2020)
Homework #11 (Optional)
Due 12/14/2020 by 11:59 pm
Note: Please upload a soft copy to D2L, and do not forget to upload your code.
1. (50 points) This question concerns Principal Component Analysis (PCA). You will apply this data
pre-processing technique to a collection of handwritten digit images from the USPS dataset.
You can load the whole dataset into Python using the function loadmat in scipy.io. The
matrix A contains all the images, each of size 16 by 16. Each of the 3000 rows of A corresponds to
the image of one handwritten digit (between 0 and 9).
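The loading step might look like the minimal sketch below. The handout does not give the file name, so the path is left as a parameter, and the helper names load_image_matrix and row_to_image are hypothetical. The sketch assumes that the matrix is stored under the key "A" and that each row holds a flattened 16 by 16 image (256 values). The pixel ordering is not stated, so a transpose may be needed before display.

```python
import numpy as np
from scipy.io import loadmat


def load_image_matrix(path, key="A"):
    """Load the matrix stored under `key` in a MATLAB .mat file.

    Assumption: the variable name inside the file is "A", as the
    handout calls the matrix A. The file name is not given there.
    """
    data = loadmat(path)  # dict mapping variable names to arrays
    return np.asarray(data[key], dtype=float)


def row_to_image(row, side=16):
    """Reshape one flattened row into a side-by-side image.

    Assumption: rows are stored in row-major order. If the digits
    look rotated, try row_to_image(row).T instead.
    """
    return np.asarray(row).reshape(side, side)
```

With the real data file, load_image_matrix(path) should return an array with 3000 rows if the file matches the description above.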
(a) Implement PCA and apply it to the data using d = 10, 50, 100, 200 principal components.
You may not use an existing PCA implementation, but you may use existing packages for
eigendecomposition.
(b) Reconstruct images using the selected principal components from (a).
(c) Report the first two reconstructed images for d = 10, 50, 100, 200.
NOTE: Please do NOT forget to upload your code. The output of the code alone will not be
sufficient to receive full credit.
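As a starting point for parts (a) and (b), the sketch below shows one common way to compute PCA from an eigendecomposition of the sample covariance matrix and to reconstruct the data from the top d components. It is an illustration under stated assumptions, not a reference solution. The function names pca_fit and pca_reconstruct are hypothetical, the data matrix is assumed to have one sample per row, and dividing by n - 1 is a convention choice that does not affect the principal directions.

```python
import numpy as np


def pca_fit(X):
    """Return the data mean, eigenvalues, and principal directions of X.

    X has one sample per row. Eigenvalues are sorted in descending
    order, and the columns of the returned matrix are the matching
    eigenvectors of the sample covariance matrix.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                            # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    return mean, eigvals[order], eigvecs[:, order]


def pca_reconstruct(X, mean, components, d):
    """Project X onto the top d components and map back to pixel space."""
    W = components[:, :d]                    # (features, d)
    Z = (X - mean) @ W                       # low-dimensional scores
    return Z @ W.T + mean                    # approximate reconstruction
```

For part (c), calling pca_reconstruct(A[:2], mean, components, d) for each d in 10, 50, 100, 200 would give the first two reconstructed rows, which can then be reshaped to 16 by 16 for display (for example, with matplotlib's imshow).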