Advanced Algorithms COMP4121 2020 Topics for the Major Project
• Abbreviation BHK refers to the textbook “Foundations of Data Science” by Blum, Hopcroft and Kannan, available for free at https://www.cs.cornell. edu/jeh/book.pdf.
• Abbreviation MC refers to the book “Networked Life” by Mung Chiang.
• Abbreviation KT refers to the COMP3121 textbook Algorithms Design by
Kleinberg and Tardos.
Here are a few topics for those undecided about what to do as a Major Project for this class (50% of your class mark). You can choose several kinds of projects. Here are some possible choices.
(I) You can choose the topic yourselves, in consultation with me. That would be the best choice, choosing something which you are personally interested in, preferably in a field that you would like to make a career in. If you have lined up a job already, talk to your future manager and ask them for a project related to what you will be working on. I’ll keep your project report confidential if your future employer requests that. You can also choose to implement an app that you have an idea for, in any field.
(II) You can write a scholarly essay, explaining a topic on the level equal to my lectures or in more detail. Here are a few options.
(1) Local Search. See pages 661-700 in KT. This is an important technique for finding an approximate solution to problems which are too hard to solve exactly in a reasonable amount of time. For a high mark, include solutions of the exercises at the end of this chapter of KT.
1
(2) Random Walks and Markov Chains See pages 76-116 in BHK. This is an extremely important material of which the PageRank is probably the most important example. For a high mark include solutions to a few exercises at the end of that chapter of BHK.
(3) Random Graphs See pages 245-299 in BHK. For a high mark include solutions to a few exercises at the end of the chapter.
(4) Topic Models, Nonnegative Matrix Factorisation, Hidden Markov Models, and Graphical Models See pages 310-355 in BHK. These are extremely important topics for those interested in advanced Machine Learning. For a high mark include solutions to a few exercises at the end of this chapter of BHK.
(5) Compressed Sensing and Sparse Vectors. See pages 360-373 in BHK. A relatively new field important for dealing with massive data. For a high mark include solutions to a few exercises at the end of that chapter of BHK.
(6) Wavelets. See pages 385-402. An important method for signal represen- tation, used for example, in JPEG2000. An alternative to the standard sinc-based signal representation. For a high mark include solutions to a few exercises at the end of that chapter of BHK.
(7) How does Google sell ad spaces? See pages 25 – 44 in MC. As an alternative, choose any chapter from MC and extend it with more details by finding relevant sources using Google.
(8) You can start reading “A first course in Machine Learning” by Simon Rogers and Mark Girolami. If you have not taken a machine learning class already, you are making a huge mistake – this is where most of the future jobs will be, especially future good jobs. Then summarise the part you managed to cover during the next couple of weeks. But, by all means, for your own good, finish reading the whole book (after the trimester is over and you are done with exams). In my opinion, this is by far the best introduction to Machine Learning. It accomplishes something I would think impossible: it assumes essentially only high school mathematics and no statistics background, and yet, by introducing math, probability and statistics as needed, it manages to do an entirely rigorous intro to Ma- chine Learning. No black boxes whatsoever, goes fairly deep and is really enjoyable to read. You can get a Kindle copy on Amazon and it will be one of the best investments you have ever made!
2
(III) You can choose to implement yourselves a few algorithms for the same task and compare their performance Examples include:
(1) Implement several clustering algorithms and apply them to synthetic and real life data sets. You can find such real life data sets at https:// archive.ics.uci.edu/ml/datasets.php?format=&task=clu&att=&area= &numAtt=&numIns=&type=&sort=nameUp&view=table.
(2) Implement several regression algorithms (such as the Ridge Regres- sion or LASO) and apply them to synthetic and real life data sets, such as the ones that can be found at https://archive.ics.uci.edu/ml/ datasets.php?format=&task=reg&att=&area=&numAtt=&numIns=&type= &sort=nameUp&view=table.
(3) Implement several classification algorithms and apply them to syn- thetic and real life data sets, such as https://archive.ics.uci.edu/ml/ datasets.php?format=&task=cla&att=&area=&numAtt=&numIns=&type= &sort=nameUp&view=table
(4) Join Kaggle competitions by going to https://www.kaggle.com/competitions to join; you can start with the Titanic competition https://www.kaggle. com/c/titanic; see also the tutorial https://www.kaggle.com/alexisbcook/ titanic-tutorial and then write a report on what you have learned.
This website is accessible as a starter to students with little exposure to Machine Learning.
Marking: There is no strict requirement how many pages your report should have; as a rough guideline, it usually takes 25-30 pages to present a topic well. Your submission must be self contained, and someone who reads it should be able to learn at least as much as you would learn by studying one topic which I have covered in class and to the same depth level or higher. You should include the bibliography, referencing any sources you have used. If you cite something verbatim, state so and give a reference. The main criterion in marking will be how profitable your project is for the advancement of your own knowledge and skill. This includes being able to document what you have learned and what you have done. I do not want you to do something just to pass the course; it has to be of real use to you and to anyone who might read your submission. Feel free to come to my Zoom office hours on Wednesday afternoon 2-3 https://unsw.zoom.us/j/95024053049 and discus what you want to do. Please start working on your project straight away!
3