程序代写代做 kernel algorithm C CSE 404: Introduction to Machine Learning (Spring 2020)

CSE 404: Introduction to Machine Learning (Spring 2020)
Exam 3
Due: 11:59 PM, Wednesday, April 29, 2020
• This exam should be completed independently and discussions of any type are NOT allowed. Non-detailed questions are allowed on Piazza. If you’re not sure if your question is okay to ask, ask it privately.
• Submit a PDF which follows the homework submission policy rules to D2L. Your work must be in order, easy to read, and cropped correctly, in addition to following the other submission policy rules.
1. (20 points) Support Vector Machines. Use the following two data points to solve this problem.
x1 =(1,0)T,y1 =−1,andx2 =(3,0)T,y2 =1.
(a) Compute the optimal w and b in a Support Vector Machine by solving the primal for-
mulation given as follows:
min 1wTw w,b 2
subject to yi(wT xi + b) ≥ 1, ∀i.
(b) Compute the optimal α in the dual formulation of the Support Vector Machine.
(c) Compute the optimal w based on the optimal α obtained in part (b).
(d) Compare the results in (a) and (c).
2. (10 points) Support Vector Machines: Primal/Dual. This question is related to the primal and dual formulations of SVM.
(a) Describe how the number of variables in the primal and dual formulations are related to the number of samples n and the data dimension d.
(b) If you need to solve an SVM with RBF kernel (Gaussian kernel) on a dataset with n = 1,000,000 and d = 1000, which formulation (primal or dual) would you choose? Justify your answer briefly.
3. (20 points) Support Vector Machines: Soft Margin. Provide a qualitative answer. Assume that we are training an SVM with a quadratic kernel, that is, our kernel function is a polynomial kernel of degree 2. You are given the dataset presented in Figures 1 and 2. The parameter C will determine the location of the separating hyperplane. Please answer the following questions qualitatively. Give a short answer/justification for each and draw your solution in the appropriate figure.
(a) Where would the decision boundary be for very large values of C (i.e., C → +∞)? Draw the boundary on the figure and justify your answer briefly.
(b) For C ≈ 0, draw on the figure where you would expect the decision boundary to be. Justify your answer.
1

Figure 1: Problem 3A Figure 2: Problem 3B
4. (15 points) Loss Functions.
(a) Describe hinge loss and logistic regression loss mathematically.
(b) Describe the similarities and differences of these two loss functions.
(c) Based on these two loss functions, consider a point that is correctly classified and distant from the decision boundary. Why would SVM’s decision boundary be unaffected by this point, but the boundary learned by logistic regression be affected?
5. (15 points) Principal Component Analysis. Consider a data matrix X ∈ Rp×n in which each column corresponds to a p-dimensional data point and there are a total of n data points. Assume that X is already normalized/centered, namely the mean has been subtracted.
(a) Describe how the PCA transformation G ∈ Rp×d can be computed from X.
(b) List the algorithm to find the reconstruction error of a reconstructed-PCA image.
(c) What does the reconstruction error represent?
6. (10 points) Model Complexity. In machine learning, model complexity often refers to the number of features included in a predictive model, as well as whether the chosen model is linear, nonlinear, etc. It can also refer to the algorithmic learning complexity or computational complexity.
(a) How is model complexity related to sample size and regularization?
(b) What is generalization performance and how does model complexity relate to general- ization performance?
7. (10 points) Clustering & Neural Networks.
(a) In your own words, describe how the K-means clustering algorithm works. If you use
the algorithm from the slides, you must still explain each step in your own words.
(b) Compare and contrast a basic Feedforward Neural Network (FNN) and a basic Recurrent Neural Network (RNN).
2

8. Bonus Questions. Worth one point each.
(a) Name the property that states that the probability of a future state only depends on the
current state, not the sequence of states preceding it.
(b) Can you use reinforcement learning to train a neural network?
(c) What is your favorite color?
3