CISC 271, Winter 2018
Assignment #4
Due by 10:00PM on Tuesday, April 10, 2018
There are two questions for you to complete in this assignment. Read the following details and instructions carefully before you proceed to work on the problems.
Marking System:
- This assignment is worth 12% of your overall mark in the course.
- The quality of your Matlab code will be considered. Your code should be appropriately indented, sufficiently commented, and otherwise of appropriate quality.
- Your code must produce every result and figure that you provide in your report.
What to turn in:
- You will submit your answers, using the onQ system, as a single zip file named dxxxxxxxx.zip, where the x’s stand for your 8-digit Queen’s student number.
- Your submission ZIP file will contain:
  – Your report as a PDF file, named xxxxxxxx.pdf. All values and plots for which you wish to receive grades must be presented in this file.
  – An a4main.m Matlab script file that takes no arguments, returns no values, and requires no user input or action. Running this file should produce every value that is included in the report on the console, as well as every plot as a separate figure, and should produce no other values or figures.
  – As many helper Matlab functions as needed to complete the assignment, in logically named .m files.
  – The data files you used for running or testing your code. These will include all of the relevant data that were provided as part of the assignment statement.
Policies:
- You must complete these questions individually.
- Although you are allowed to discuss the problems with other students, you must write your own answers and Matlab code.
- The code must run on the CasLab computers; if it does not run, you may receive zero marks for the assignment.
- The lateness policy applies starting the next calendar day after the submission deadline (as specified in onQ), at a rate of 20% off the assignment value per calendar day.
Learning Outcomes:
• Use library functions for data analysis
• Evaluate results of library functions
• Identify strengths and weaknesses of library functions, as applied to simple data
Computational Instructions:
- Create a working directory and put the contents of the A4data.zip file into this directory. This directory must include all, and only, the files needed to reproduce the results in your report.
- The instructor has provided a function plotclass2d that plots classified 2D vectors. Use of this function is recommended, but you may provide your own code if you prefer.
- The instructor has provided a function plotclass3d that plots classified 3D vectors. Use of this function is recommended, but you may provide your own code if you prefer.
- The instructor has provided a function plotline that plots a 2D line into the current Matlab figure. Use of this function is recommended, but you may provide your own code if you prefer.
- The instructor has provided a function plotplane that plots a 3D plane into the current Matlab figure. Use of this function is recommended, but you may provide your own code if you prefer.
This function invokes external Matlab code that was written at INRIA, France. It is recommended that you not try to fully understand the rather complex external code.
- The function linseplearn is a “skeleton” function. You will need to modify this function so that it performs the Perceptron Algorithm, as described in the course notes. The instructor has set this up to avoid infinite loops and to handle “breaking” out of the loops when you think the computation has converged; you will need to work within this skeleton.
You may only modify the indicated section of the code.
- The function svm271 is a “wrapper” function. It calls a function in a Matlab toolbox that is installed on the Caslab computers.
You may not modify this function. You may not invoke SVM code in any way other than via this function. The purpose is to have you perform the maps for non-linear embedding yourself, so that you better understand their relation to kernel methods.
- You may use any of the Matlab toolboxes that are loaded and licensed on a Caslab computer. You may not use any other toolboxes, or any code that you have not personally written.
Problem #1: Linearly Separable Data (16/24 Points)
In data analysis, a common task is supervised binary classification. This question uses a special subset of Fisher’s Iris data. The instructor has extracted the lengths and widths of the flower petals, and excluded one data vector to make the problem easier to compute.
The data vectors are provided in the file zpetal.txt, and the file ypetal.txt contains the classification of the data vectors. Using the plotclass2d function that was provided by the instructor, you should be able to re-create Figure 1 as Plot 1 in your report.
[Scatter plot of the data vectors; horizontal axis ticks from 3 to 9, vertical axis ticks from 1 to 5.5]
Figure 1: Data vectors for Question 1. The Class +1 vectors are plotted as plus signs in blue; Class -1 vectors are plotted as open circles in red.
Your problem is to apply three methods of analysis to this data set. The methods, and what you are expected to report on, are:
(a) Use the Matlab function kmeans to perform unsupervised classification of the data vectors. This will mis-classify some vectors. You should display the results as Plot 2.
Can you count, and plot, the mis-classifications? If so: describe their characteristics; display the results as Plot 3; and report the numerical index of each mis-classified data vector. If you cannot find these, leave Plot 3 blank and state this in your report.
(b) Modify the “skeleton” function linseplearn, provided in the ZIP file, to perform the Perceptron Algorithm described in the course notes. This is only a few lines of simple code, so you should not need to code extensively for this part of the question.
Compute a separating hyperplane for the data using the Perceptron Algorithm. You should display the original data vectors, plus the separating hyperplane, as Plot 4. In your report, give the numerical values for the augmented vector of this hyperplane.
(c) Use the provided svm271 function to perform the computations for a support vector machine (SVM). As in part (b), compute a separating hyperplane for the data. You should display the original data vectors, plus the separating hyperplane, as Plot 5. In your report, give the numerical values for the augmented vector of this hyperplane.
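For part (a), one difficulty is that kmeans numbers its clusters arbitrarily, so cluster 1 need not correspond to Class +1. The following is a minimal sketch of one way to count mis-classifications under that ambiguity. It runs on synthetic stand-in data rather than zpetal.txt and ypetal.txt, and it assumes one data vector per row with labels in {+1, -1}; the variable names and the 'Replicates' setting are illustrative choices, not requirements of the assignment.

```matlab
% Sketch only: synthetic stand-in data, not the contents of zpetal.txt / ypetal.txt.
rng(271);                                   % fixed seed so the run is repeatable
Z = [randn(20, 2) + 4; randn(20, 2) - 4];   % 40 data vectors, one per row (assumed layout)
y = [ones(20, 1); -ones(20, 1)];            % known labels in {+1, -1}

idx = kmeans(Z, 2, 'Replicates', 5);        % cluster indices in {1, 2}

% k-means numbers its clusters arbitrarily, so try both cluster-to-class
% assignments and keep the one that agrees with the known labels more often.
predA = 3 - 2 * idx;                        % cluster 1 -> +1, cluster 2 -> -1
predB = -predA;                             % cluster 1 -> -1, cluster 2 -> +1
if sum(predA ~= y) <= sum(predB ~= y)
    pred = predA;
else
    pred = predB;
end

misIdx = find(pred ~= y);                   % row indices of mis-classified vectors
fprintf('%d mis-classified vectors\n', numel(misIdx));
disp(misIdx.');
```

The row indices in misIdx are the kind of "numerical index" that part (a) asks for; with the real data, they could also be used to select the vectors shown in Plot 3.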
Summarize your findings concisely. What are the relative strengths and weaknesses of these methods, as applied to this specific data set?
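For part (b), the sketch below shows a generic textbook form of the perceptron update on a small synthetic data set. It is not the linseplearn skeleton and may differ from the version in the course notes: the placement of the bias as the last entry of the augmented vector, the learning rate eta, and the iteration cap maxIter are all assumptions made for illustration.

```matlab
% Sketch only: generic perceptron on synthetic data; does not use linseplearn.
Z = [2 1; 1 2; 3 3; 2 4; -1 -2; -2 -1; -3 -3; -1 -4];   % one data vector per row
y = [1; 1; 1; 1; -1; -1; -1; -1];                       % labels in {+1, -1}

Zaug = [Z, ones(size(Z, 1), 1)];    % append 1 so the bias is the last weight (assumed convention)
w = zeros(3, 1);                    % augmented weight vector [w1; w2; b]
eta = 1;                            % learning rate (assumed value)
maxIter = 1000;                     % cap on passes, to rule out an infinite loop

for iter = 1:maxIter
    nWrong = 0;
    for i = 1:size(Zaug, 1)
        if y(i) * (Zaug(i, :) * w) <= 0          % vector i is on the wrong side (or on the plane)
            w = w + eta * y(i) * Zaug(i, :).';   % move the hyperplane toward vector i
            nWrong = nWrong + 1;
        end
    end
    if nWrong == 0                  % a full pass with no updates: converged
        break;
    end
end

disp(w.');                          % augmented vector of the separating hyperplane
```

Under this convention, the separating line is the set of points (x1, x2) with w(1)*x1 + w(2)*x2 + w(3) = 0.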
Problem #2: Non-Linearly Separable Data (8/24 Points)
This is also a problem in supervised binary classification, using the instructor’s data vectors.
The data vectors are provided in the file zxor.txt, and the file yxor.txt contains the classification of the data vectors. Using the plotclass2d function that was provided by the instructor, you should be able to re-create Figure 2 as Plot 6 in your report.
[Scatter plot of the data vectors; both axes have ticks from -2.5 to 2.5]
Figure 2: Data vectors for Question 2. The Class +1 vectors are plotted as plus signs in blue; Class -1 vectors are plotted as open circles in red.
These data are not linearly separable. Your problem for this question is:
(a) Find and compute an embedding map into a 3D vector space in which the data vectors become linearly separable. Using the plotclass3d function that is provided, or your own plotting code, display the mapped data vectors as Plot 7 in your report.
(b) Use the function svm271 to compute a separating hyperplane for the 3D data. Using the plotplane function that was provided by the instructor, or your own plotting code, display the mapped data vectors and a separating hyperplane as Plot 8 in your report. In your report, give the numerical values for the augmented vector of this hyperplane.
You should be able to re-create Figure 3, perhaps from a different plotting viewpoint, as Plot 8.
[3D scatter plot of the mapped data vectors; one axis has ticks from -3 to 3]
Figure 3: Data vectors for Question 2, mapped to 3D vectors. The Class +1 vectors are plotted as plus signs in blue; Class -1 vectors are plotted as open circles in red.
Summarize your mapping and your findings concisely. What are the relative strengths and weaknesses of the SVM method, as applied to this specific data set?
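To illustrate what an embedding map in part (a) can look like, here is a minimal sketch on synthetic XOR-like data. It assumes that Class +1 vectors have coordinates of the same sign and Class -1 vectors have coordinates of opposite sign; whether zxor.txt follows this pattern is an assumption, and the product map used here is only one candidate, not necessarily the intended one.

```matlab
% Sketch only: synthetic XOR-like data, not the contents of zxor.txt / yxor.txt.
Z = [1 1; 2 1.5; -1 -1; -1.5 -2; 1 -1; 2 -1.5; -1 1; -1.5 2];
y = [1; 1; 1; 1; -1; -1; -1; -1];       % +1 where the coordinates share a sign (assumed pattern)

% Candidate embedding (an assumption): keep both coordinates, add their product.
Z3 = [Z, Z(:, 1) .* Z(:, 2)];

% In the embedded space, the plane "third coordinate = 0" separates the classes
% of this synthetic set; as an augmented vector (bias last) it is [0; 0; 1; 0].
wAug = [0; 0; 1; 0];
scores = [Z3, ones(size(Z3, 1), 1)] * wAug;
disp(all(sign(scores) == y));           % logical check that every vector is on its own side
```

The same kind of check can be applied to any candidate map before passing the 3D vectors on for the hyperplane computation in part (b).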
Marking Guide
A student’s grade will be based on the report and on the code used to generate the report.
The distribution of points for the assignment grade is:
16/24 points: quality of the figures and descriptions, as described above in the statement of the assignment; clarity may be assessed, in part, by a table that tersely summarizes the plots and the results
8/24 points: quality of the code in the modified linseplearn.m file, and in the a4main.m file that was used to generate values and plots for the report
If a student does not exactly follow the instructions for the k-means or SVM computation, or uses code other than their own for the Perceptron Algorithm, both the report and the quality of the code will receive zero (0) points.