APS1070 Foundations of Data Analytics and Machine Learning Final Examination Fall 2019
Non-programmable & non-communicating calculators are allowed.
Time allotted: 120 minutes
Part I: Multiple Choice (Select only 1 answer per question)
1. Which of the following statements is true? [2]
a) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k folds and validated on 1 fold, for a total of k-1 times (splits).
b) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k folds and validated on 1 fold, for a total of k times (splits).
c) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k-1 folds and validated on 1 fold, for a total of k times (splits).
d) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k-1 folds and validated on 1 fold, for a total of k-1 times (splits).
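For reference, a minimal runnable sketch of the standard k-fold scheme (the dataset and fold count here are arbitrary): the model trains on k-1 folds and validates on the remaining fold, and this repeats k times so every fold serves as the validation set exactly once.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # 10 toy examples, 2 features
y = np.arange(10)

kf = KFold(n_splits=5)
for i, (train_idx, val_idx) in enumerate(kf.split(X)):
    # train on k-1 folds (train_idx), validate on 1 fold (val_idx)
    print(f"split {i}: train on {len(train_idx)} examples, "
          f"validate on {len(val_idx)}")
```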
2. When performing PCA, the goal is to accomplish which of the following? [2]
a) Maximize the variance of the primary components and maximize the residuals.
b) Minimize the variance of the primary components and minimize the residuals.
c) Maximize the variance of the primary components and minimize the residuals.
d) Minimize the variance of the primary components and maximize the residuals.
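For reference, a short NumPy sketch of the PCA trade-off on arbitrary made-up data: projecting onto the top eigenvectors of the covariance matrix maximizes the variance captured by the components and, equivalently, minimizes the reconstruction residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                 # center the data

cov = np.cov(Xc, rowvar=False)          # covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :1]             # top-1 component (max variance)

Z = Xc @ W                              # projected scores
X_hat = Z @ W.T                         # reconstruction from 1 component

explained = Z.var(axis=0, ddof=1).sum()       # variance captured
residual = ((Xc - X_hat) ** 2).sum()          # residual sum of squares
print(f"variance captured: {explained:.3f}, residual SSE: {residual:.3f}")
```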
Part II: Calculations/Proofs
3. The schematic (not reproduced in this copy) shows sets S1, S2, S3, S4, S5, S6 and S7. What sets, and in what order, would a greedy algorithm select to cover the universe (i.e., cover each point) if all sets are weighted equally? [2]
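Since the schematic is unavailable, here is a generic sketch of the greedy strategy on hypothetical sets (not the exam's S1-S7): at each step, pick the set that covers the most still-uncovered points.

```python
def greedy_set_cover(universe, sets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # equal weights: choose the set with the largest new coverage
        name, best = max(sets.items(), key=lambda kv: len(kv[1] & uncovered))
        if not best & uncovered:
            break  # remaining points cannot be covered
        chosen.append(name)
        uncovered -= best
    return chosen

# Hypothetical example (not the exam's sets):
sets = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6}, "S4": {1, 6}}
print(greedy_set_cover(range(1, 7), sets))   # prints ['S1', 'S3']
```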
4. Is the matrix […] invertible? [2]
5. Given that […], prove that […]. [2]
6. We have […]. What is the angle between the vectors? [2]
7. Calculate the Jacobian of the function […], which has components […]. [2]
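The specific matrices, vectors, and functions for questions 4-7 did not survive transcription; for reference, the general identities these questions rely on are:

```latex
% Invertibility, angle between vectors, and Jacobian (general forms only).
\[
A \text{ is invertible} \iff \det(A) \neq 0, \qquad
\cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}
                  {\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}, \qquad
J_{ij} = \frac{\partial f_i}{\partial x_j}.
\]
```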
8. Consider functions […] and […]. Calculate the gradient using the chain rule. [2]
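As a hypothetical worked example of the chain rule (the exam's actual functions are not reproduced; f(u) = u² and g(x) = sin x are assumed here purely for illustration):

```latex
\[
\frac{d}{dx}\, f(g(x)) = f'(g(x))\, g'(x)
                       = 2\sin(x)\cos(x)
                       = \sin(2x).
\]
```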
Part III: Text Answers
9. In general, is using a gradient descent algorithm a good choice for finding optimal hyperparameters? Why or why not? [2]
10. What is a learning rate? Explain what happens if it is set too high or too low (you may use a schematic/drawing if helpful). [2]
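A tiny numeric sketch of learning-rate behaviour on f(w) = w², with made-up illustrative rates: too low converges slowly, while too high overshoots the minimum and diverges.

```python
def descend(lr, w=1.0, steps=20):
    for _ in range(steps):
        w -= lr * 2 * w          # gradient step: w <- w - lr * f'(w)
    return w

for lr in (0.01, 0.5, 1.1):
    # 0.01: slow progress; 0.5: converges; 1.1: oscillates and diverges
    print(f"lr={lr}: w after 20 steps = {descend(lr):.4g}")
```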
Part IV: Algorithms
11. You have the following code: […]
How efficient is this code in terms of big-O notation (i.e., what is the order of the algorithm)? [hint: …] [2]
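The exam's code block did not survive transcription; as a generic illustration of this kind of analysis, a hypothetical nested-loop function that runs in O(n²):

```python
def count_pairs(items):
    count = 0
    for a in items:          # n iterations
        for b in items:      # n iterations each, so n * n total
            if a < b:
                count += 1
    return count             # overall cost grows as O(n^2)
```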
12. The function below (code not reproduced) takes an input parameter x, with examples as rows.
a) What is the purpose of this function, specifically? [2]
b) Describe why this type of process is important to gradient descent and K-Nearest Neighbor algorithms. [2]
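The exam's function body is not shown; as an assumption, one common function matching this description is feature standardization, sketched below (this may not be the exam's exact function). Rescaling each feature to zero mean and unit variance keeps any single feature from dominating gradient-descent steps or K-Nearest-Neighbor distance computations.

```python
import numpy as np

def standardize(x):
    # Assumed example only: x holds examples as rows, features as columns.
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    sigma[sigma == 0] = 1.0      # avoid division by zero for constant features
    return (x - mu) / sigma
```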
13. Consider a differentiable loss function $\ell(\mathbf{w}; \mathbf{x}, y)$ and a dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$. We want to optimize the average loss $\frac{1}{n}\sum_{i=1}^{n} \ell(\mathbf{w}; \mathbf{x}_i, y_i)$ with respect to the weights $\mathbf{w}$. Write pseudo-code implementing mini-batch gradient descent optimization using a decaying learning rate (i.e., a learning rate that is reduced by a factor $\gamma$ every epoch), with the following inputs: [4]
- Differentiable loss $\ell$ with gradient $\nabla_{\mathbf{w}} \ell$.
- Dataset $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$.
- Mini-batch size $m$, initial weights $\mathbf{w}_0$, and number of steps $T$.
- Initial learning rate $\eta_0$ and learning rate decay $\gamma$.
Note that your pseudo-code is just a representation of your algorithm written in plain English. Keep your pseudo-code simple and concise. Please use indentation and control structures. State any assumptions you make.
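One possible (unofficial) Python-flavoured sketch of such pseudo-code, assuming the gradient is supplied as a callable grad(w, x, y) and that weights support elementwise arithmetic:

```python
import random

def minibatch_gd(grad, data, m, w0, T, lr0, gamma):
    w, lr = w0, lr0
    n = len(data)
    steps_per_epoch = max(1, n // m)
    for t in range(T):
        batch = random.sample(data, m)                 # draw a mini-batch
        g = sum(grad(w, x, y) for x, y in batch) / m   # average gradient
        w = w - lr * g                                 # gradient step
        if (t + 1) % steps_per_epoch == 0:             # end of an epoch
            lr = lr * gamma                            # decay learning rate
    return w
```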