Cardiff School of Computer Science and Informatics Coursework Assessment Pro‐forma
Module Code: Module Title: Lecturer: Assessment Title: Assessment Number: Date Set:
Submission Date and Time: Return Date:
CM3106
Multimedia
Dr Kirill Sidorov
Compressed representation of images for storage and retrieval 2/2
15 December 2020
29 January 2021 at 9:30am
Within three weeks of submission date
This assignment is worth 40% of the total marks available for this module. If the assign- ment is submitted late (and where there are no extenuating circumstances):
1. If it is submitted no later than 24 hours after the deadline, the mark for the assign- ment will be capped at the minimum pass mark.
2. If it is submitted more than 24 hours after the deadline, a mark of zero will be given for the assignment.
Your submission must include the official coursework submission cover sheet, which can be found here: https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf
Submission Instructions
You should submit the following files via Learning Central:
Description
Cover sheet Report Code
Compulsory Compulsory Compulsory
Type
One PDF file One PDF file One ZIP archive
Name
[student number]_coversheet.pdf [student number]_report.pdf
No restriction for individual files
Any code submitted will be tested using MATLAB® 2020a and must be submitted as stipulated in the instructions above. Your code should be platform-independent and must run on MATLAB® 2020a.
Staff reserve the right to invite students to a meeting to discuss coursework submissions.
1
Assignment
This exercise involves carrying out a number of experiments in MATLAB® in order to investigate how principal component analysis (PCA) can be used for efficient storage and retrieval of images. The experiments to be carried out are described later in this document (see page 4 onwards).
You must submit a typeset report in PDF format — a short (no more that 3,000 words) written description of your experimental approach and results, with illustrations, tables, plots etc. where appropriate to corroborate your findings. The report should detail your solutions to the exercises below, your experimental results and their discussion, and should especially highlight any novel ideas or contributions that you devised on top of the basic experiments.
You should also submit all MATLAB® code as text files (.m) or Live Scripts and include all other supporting files that are needed to run your code in a single ZIP file for submission. Please do not include the faces dataset in your submission.
Learning Outcomes Assessed
This coursework particularly addresses the following learning outcomes of the module:
• Showfamiliarity,includingpracticalinvestigation,witharangeofmultimediaformats for text, graphics, animation, audio, images, and video by detailed study of common multimedia formats.
• Show familiarity with a range of common data compression methods and how they are used in common compression formats (such as MP3, JPEG and MPEG) through practical investigation and detailed study of the relevant mathematical foundations.
• Possess an understanding of the underlying concepts, representation, and processing of multimedia data with relevance to the applications in digital audio, imagery, and video, including digital audio processing and synthesis, audio/image/video compres- sion and multimedia data retrieval.
• Show an understanding of the derivation from mathematical principles of underlying data compression algorithms. Possess an awareness of the underlying compression techniques utilised in common compression formats (e.g. JPEG, GIF, MPEG).
Criteria for Assessment
The exercises are decomposed in four parts. The indicative weights of each part are as follows:
Exercise Weight
PART 1: Warm-up exercises: eigenspace representation of images 15% PART 2: Experiments with the truncated basis 20%
PART 3: Applying quantisation PART 4: Image retrieval
30% 35%
2
The solutions to each of the exercises will be assessed on the following scale:
• Fail(0:::39%)meansthesubmissiondoesnotadequatelyaddressthestatedrequire- ments for the part. There are no discernible experimental results. The discussion is lacking.
• 3rd (40 : : : 59%) means the submission minimally addresses the stated requirements; for example, many experiments have not been carried out as specified, but some progress towards the solution is emerging.
• 2.2 (60:::64%) means the submission partially addresses the stated requirements; for example, some elements of the solution may be partially working, with some promising experimental results, but the problem on the whole has not been adequately solved. The discussion is naive.
• 2.1 (65:::69%) means the submission fully addresses the stated requirements, but has weaknesses in terms of the weakness indicators below. Good experimental results with careful justification and thorough discussion.
• 1st (70:::100%) means the submission fully addresses the stated requirements, as well as meets the excellence indicators below. The professionalism is evident. The solutions are complete, the experimental results are impressive, the discussion is deep.
Factor
Approach
Argument
Insight
Weakness indicator
Does not adopt a professional or defensible approach to the solution.
Unstructured and shows little or no evidence of research nor of logical reasoning. Methods and conclusions are poorly jus- tified.
Little or no insight and under- standing.
Excellence indicator
Adopts appropriate methods with full justification for the choices made. Clear evidence that they work.
Clearly justified and demonstrable out- come.
Considerable insight and understand- ing of the theory, methods, positive or negative results is evident. Good re- flection and understanding of what has been accomplished is provided.
Feedback and Suggestions for Future Learning
Feedback on your coursework will address the above criteria. Individual feedback will be given via Grade Centre within three weeks of submission. Further general cohort feedback will be provided on Learning Central in the “Coursework Feedback” section under “Learning Materials”, at the same time as individual feedback.
3
Compressed Storage and Retrieval of Images using PCA Introduction
Previously in the module we have seen how a simple linear transform can make signals more amenable to compression. One powerful example of such a transform that we considered was the Discrete Cosine Transform (DCT) which represents signals as a linear combination of cosine harmonics. A question naturally arises: is there a different linear transform, with a different basis, that may be even more powerful or useful than DCT? The answer is: maybe.
The DCT uses the same basis no matter what data are being transformed, and, as we have seen, typically has good energy compaction properties: for a wide class of natural signals, the first few DCT coefficients offer a good approximation.
A better basis?
Idea: given a particular set of data points (vectors), we can try to find a basis with which these particular points can be approximated most compactly. This can be accomplished with Principal Component Analysis (PCA). Like DCT, Principal Component Analysis pro- duces an orthonormal basis. However, PCA chooses the basis in such a way, that for a given set of data points, most of the variation in the data happens along the first basis vector; and most of the remaining variation, that is not accounted for by the first basis vector, happens along the second basis vector; and so on. Thus, PCA has the best energy compaction property1 by construction, and can be used to approximate points with the first few coefficients of the transformed representation.
Let x 2 Rd be data points (observations) in d-dimensional Euclidean space, i = 1 : : : N . i
Let us first shift these points so that their mean is zero. The mean μ of the original points xi is simply
1 XN
μ= xi (1)
N
i=1
and we can subtract if from our points, yielding zero-centered points x~i = xi μ. Next, compose the observation matrix A 2 RdN by stacking x~ as columns of A:
i
01 “””
A=B@x~ x~ x~ CA: (2) 12N
###
In other words, columns of A are data points (observations), each being a d-dimensional
vector, centered around the mean. The covariance matrix of A is
C = AAT : (3)
The basis that we are looking for, V, is obtained by decomposing C into C = VLV1, where V is the matrix of eigenvectors — which will be our basis2, and L is a matrix
1At least for the data that was used to compute the basis. 2Hence the name eigenspace often used in the literature.
4
with the corresponding eigenvalues on the main diagonal. This decomposition is accom- plished using singular value decomposition (SVD)3 or eigenvalue decomposition (EVD). Incidentally, VLV1 = VLVT , since V will be orthonormal.
Let j be the eigenvalues (found on the main diagonal of L). In order to ensure the
desirable energy compaction property, the eigenvalues are sorted by magnitude in de-
scending order, 1 2 : : : d, and the corresponding eigenvectors v1; v2; : : : ; vd,
constituting the columns of V, are also sorted in the same order4. Thus, most of the
variation in the original data will, by the choice of the basis, happen along v1, most of
the remaining variation — along v , and so on. Projection of a vector x 2 Rd onto this 2
basis V takes the form
and since orthonormality of V implies V1 = VT, the inverse transform is then simply
x = Vp + μ: (5)
Approximation of data
One can discard the least significant basis vectors, responsible for the least amount of variation, keeping only the k “most important” ones: the thus truncated basis is V~dk = V(:; 1 : : : k). Each original data point x 2 Rd can be approximately described by a lower- dimensional (k d) parameter vector, p~ 2 Rk, by projecting x onto the truncated basis:
p~ = V~ T (x μ); (6) and, conversely, the original data points can be approximated using the reduced dimen-
sionality parameters p~, by unprojecting them back to the original space:
x V~ p~ + μ: (7)
Example: Figure 1 shows a simple example for the case of 2D points. Note that here the distribution of data is fairly skewed. In the basis computed by PCA, most of the variation in data is along the first basis vector (long green arrow), and the rest is along the sec- ond vector (short green arrow). In this new green coordinate system, an approximate description of data can be obtained by throwing away the coordinate along the short arrow.
Remark on computational complexity
Singular value decomposition has O(d3) time complexity. When d is large, but N is small,
we can simplify computations as follows. Suppose AdN is a centered observation matrix
and C = AAT is the corresponding covariance matrix. Now let T be an N-by-N matrix
T=ATA. Thenife,e,…,e aretheeigenvectorsofTthenvectorsAe,Ae,…, 12N 12
AeN aretheeigenvectorsoftheoriginalC(butnotnecessarilynormalisedtounitlength). Thus, the complexity is reduced to O(N3).
3See the svd function in MATLAB®.
4 The svd function does this sorting for you.
p = VT (x μ); (4)
5
Figure 1: Example: 2D data points in the original space and the PCA basis.
Implementation of PCA
To get you started, an implementation of PCA is provided for you in the Appendix (you can also download it from Learning Central)5. As you can see, it is only a few lines long.
Approximation of images with PCA
Let Irc be an r-by-c grayscale image6. It can also be represented as a vector x 2 Rrc by scanning the pixels of the image in some predefined order and concatenating them into a vector, which we will write as7 x = reshaperc1 I. In other words, an r-by-c image can be regarded as a point in rc-dimensional Euclidean space and an ensemble of such images corresponds to a point cloud in this space.
Under the assumption that natural images occupy only a tiny subspace of all possible images Rrc, we can use PCA to find a low-dimensional subspace spanned by the valid images, and use the PCA basis to approximate images with the first few coefficients in the transformed representation.
Experiments with Compact Representation of Images
For this exercise, we will be using the classic dataset of face images, “AT&T Database of Faces” which you can download from Learning Central. It consists of facial images (approximately aligned) of different 40 subjects, with 10 images per subject (total 400 images). The images are in gray scale, of size 11292 pixels. The 40 subjects are arranged each into folders named s1, s2, …, s40; each folder contains 10 images of that subject: 1.pgm, …, 10.pgm. Carry out the following experiments with these images.
5MATLAB® also has a built-in implementation of PCA (see the pca function), but I find its usage less intuitive.
6Generalisation to colour images is trivial.
7 In MATLAB® you can use either the reshape command, or simply the (:) operator.
6
PART 1: Warm‐up exercises: eigenspace representation of images
1. Download the faces dataset from Learning Central.
2. Write a piece of MATLAB® code to read the images from the dataset and construct the (11292)-by-400 observation matrix A by vectorising each image and putting the vectors into columns of A.
3. Carry out PCA on data A obtaining basis V, eigenvalues j, and the mean μ. Ex- amine the results:
(a) Display8 the mean image μ. Here and in what follows, you will need to “un- vectorise” the image by reshape’ing the corresponding rc-by-1 vector back to the original image size r-by-c.
In this and all subsequent exercises, describe your results, observations, and answers to questions in the report. Analyse and discuss the results.
(b) The basis vectors of V can be regarded as basis images whose linear combi- nation is used to represent original images. Display9 the first several basis vectors of V as images (again, appropriately reshape’ing). What can you ob- serve?
(d) Repeat the above10 but for randomly generated “images”. Generate the same number of images, and of the same size as before, but with uniformly dis- tributed random pixel values. Display the resulting basis images and the cu- mulative eigenenergy in this case, and compare the results with the case of real face images. What can you observe?
4. WriteapieceofcodetoprojectanimagevectorxontothebasisV,andtounproject it back. Do not forget to take care of the mean μ. How would you project the entire matrix A at once?
5. To visualise how the data varies along each i-th basis vector vi (“modes of varia- tion”), systematically alter the i-th component of the parameters vectors p, keeping other components zero, unprojecting p and visualising the results as images. It is a good idea to do so between 3 : : : + 3 in, say, 11 increments. Repeat this for the first several basis vectors, and show the results. What can you observe?
6. On a scatter plot, display the projections of all images A onto the i-th vs. the j-th basis vectors. Plot this for several different pairs of vectors. (You may also plot
8You can use the MATLAB® built-in functions imshow or imagesc to display images, but I highly rec- ommend that you download the following function which is more powerful: https://uk.mathworks.com/ matlabcentral/fileexchange/16233-sc-powerful-image-rendering
9You may wish to write a convenience function to display a number of images side by side, as a gallery. 10Obviously, it makes sense to wrap the code into a reusable function.
(c) The eigenvalues j reflect the amount of variation (“eigenenergy”) in the data along the corresponding basis vectors v . Plot the cumulative eigenenergy
P
along the first k basis vectors
total eigenenergy. What can you observe from this plot? How many basis vectors are needed to represent 90% of variation in the data? 80%? 95%? What can be said about the energy compaction property of PCA?
j
kj=1 k, as a function of k, in percent of the
7
3D scatter plots, showing projections onto triplets of basis vectors.) Modify your code so that the points corresponding to each of the 40 subjects are plotted in a different colour and/or with a different marker. Discuss your observations.
PART 2: Experiments with the truncated basis
Let V~k denote a truncated basis with only the first k most important basis vectors: V~k =V(:;1:::k).
7. Write a function to project images onto a truncated basis V~k, and to unproject them back.
8. Choose an image from the dataset, obtain a compressed representation by pro- jecting it onto Vk and reconstruct it by unprojecting back. Display, side by side, the original image, the reconstructed image, and the difference between the two. Repeat this systematically for the varying number k = 1; 2; 3; : : : ; kmax of basis vec- tors in V~k. What can you say about the reconstruction quality vs. the number of basis vectors? In particular, examine the perceptual differences, and objectively measure the root mean square error or the mean absolute difference between the original and the reconstructed images, as a function of k. Discuss how this tech- nique may be relevant for image compression.
9. Compare how well images can be represented with just the first k coefficients using the PCA basis vs. using the DCT basis. To compute the DCT transform of images, you may wish to use the dct2 function in MATLAB®. Since the 2D DCT coefficients have no natural sequential ordering by energy, to make the comparison more fair, traverse the matrix of 2D DCT coefficients in a zig-zag fashion, from the coefficients corresponding to the low-frequency harmonics to the high frequency ones, as shown in Figure 2.
Figure 2: Zig-zag order for traversing 2D DCT coefficients.
PART 3: Applying quantisation
The above PCA transform can already be used for image compression by truncating the basis. As we demonstrated in the module, carefully done quantisation can be used to compress data further, often without a perceived loss in quality.
10. Write a piece of code to quantise and de-quantise PCA coefficients of projected images. You may use a basic uniform quantisation scheme, or something more refined.
8
11. Investigate on your own how quantisation of PCA coefficients can be used, pos- sibly in conjunction with truncating the basis, to achieve image compression (at least for the case of face images). Can you find the best compromise between the compression ratio and the image quality, both in terms of perceptual quality and some objective difference measures (MAD, RMS)? Make sure to carefully evaluate and justify your findings. Compare this approach to compression with JPEG, and discuss the relative advantages and disadvantages.
12. Bonus marks can be gained by suggesting and implementing your own improve- ments to the above compression scheme, and for additional investigation.
For example:
(a) What if you split the images into blocks, like in JPEG, and compress each block individually?
(b) What if you somehow combine the PCA and the DCT basis?
(c) Is there something to be gained if you compress the images at different reso- lutions (scales) and then combine the results?
(d) How well does the method work on other images, not from this training set, for example images of your face?
…and so on. There are endless possibilities here and you are encouraged to come up with your own ideas and extensions.
PART 4: Image retrieval
We can also use the PCA coefficients as features to search for images in a database, or, in this case, to recognise faces. (See section “Further reading” below.) For example, assume you have a database of police mug shots (face images and the corresponding subject IDs), and you want to identify a suspect from a novel image.
13. Select a particular image (it will play the role of a “suspect”). This will be the novel (unseen) image that we want to identify. Here is how it can be done:
(a) ComputethePCAbasisfromallotherimagesexceptthisone(allotherimages will constitute our training set). Of course, in a real scenario this can be computed in advance.
(b) Project the known images onto the PCA basis, obtaining a feature vector (de- scriptor, signature) for each image in the police database. In doing so, record, for each feature vector, the corresponding subject ID.
(c) Project the novel (“suspect”) image onto the same PCA basis also.
(d) You may want to visualise, in scatter plots, as done above, the feature vectors corresponding to different subjects in different colours, and also visualise on the same plot the feature vector corresponding to the suspect.
(e) Find, among the known feature vectors, the one nearest to the feature vector of the suspect, thus (hypothetically, at least) establishing the suspect’s sub- ject ID. (The assumption here is that different images of the same face will map to nearby points in the feature space.)
9
14. Repeat the above procedure systematically for each image in the database. That is, conduct a leave-one-out experiment, in each iteration using all other images except the current one to compute the PCA basis, and then, as above, identify to which subject an unseen image belongs. Record the number of true/false positives and true/false negatives, and hence compute the accuracy of this method.
15. Investigate, whether the accuracy of the above approach can be improved if in- stead of searching for just one feature vector, you find K nearest vectors, and then establish the subject ID by majority vote.
16. Theimagesinthedatasetcontainalotofvariation,especiallyduetogloballighting variation, different hair styles, background etc. Investigate whether your recogni- tion accuracy improves if you crop the images on the sides, leaving only the central part of the face. How would you pre-process the images to minimise the effect of the overall brightness or lighting variation?
17. Bonus marks can be obtained for proposing improvements to this method, and/or for exceptionally deep investigation. There are countless possibilities how the ac- curacy of this simple approach can be improved, and you are encouraged to come up with your own ideas.
Further reading
You may find the following resources helpful:
• Discussion of the relevant theory in Wikipedia:
– https://en.wikipedia.org/wiki/Principal_component_analysis – https://en.wikipedia.org/wiki/Covariance_matrix
– https://en.wikipedia.org/wiki/Singular_value_decomposition
• The use of PCA for face recognition/classification has been proposed in the classic 1991 paper by Turk and Pentland:
– M. Turk, A. Pentland. “Eigenfaces for recognition”. Journal of Cognitive Neuroscience. 3 (1): 71–86, 1991. Download it here: http://www.cs.ucsb.edu/~mturk/Papers/jcn.pdf
– See also the discussion of eigenfaces in Wikipedia and references therein: https://en.wikipedia.org/wiki/Eigenface
– B. A. Draper, W. S. Yambor, J. R. Beveridge. “Analyzing PCA-based face recognition al- gorithms: eigenvector selection and distance measures”. 2002. https://www.cs.colostate.edu/evalfacerec/papers/eemcvcsu.pdf
10
Appendix: example PCA implementation
function [V, L, mu] = cw_pca(X)
%CW_PCA Principal Component Analysis.
% AnexampleimplementationforCM3106coursework.
%
% Input:
% Observation matrix X. Each column of X is a datapoint (observation). %
% Output:
% [V, L, mu] = kspca(X) computes the orthonormal basis V, with basis
% vectors arranged by the corresponding eigenenergy.
% It also returns the eigenvalues L and the mean of the data MU.
[d, N] = size(X); % Compute the mean
mu = mean(X, 2);
% Subtract the mean from data
X = bsxfun(@minus, X, mu);
if (N >= d) % More samples than dimensions
C = X * X’; % Covariance matrix
[V, D, ~] = svd(C); % Singular value decomposition
else % Fewer samples than dimensions, use the trick from the remark C=X’*X;
[V, D, ~] = svd(C);
V=X*V;
denom = repmat(sqrt(diag(D)’), d, 1); denom(denom == 0) = 1; % Avoid division by zero V = V ./ denom; % Scale basis vectors
end
% Eigenvalues
L = diag(D)’ / N;
11