

APS1070 Foundations of Data Analytics and Machine Learning
Final Examination Fall 2019

Open book
Non-programmable & non-communicating calculators are allowed
Time allotted: 120 minutes

Part I: Multiple Choice (Select only 1 answer per question)

1. Which of the following statements is true? [2]

a) In k-fold cross-validation, training data is divided into k folds, then the model
is trained on k folds and validated on 1 fold, for a total of k-1 times (splits).

b) In k-fold cross-validation, training data is divided into k folds, then the model
is trained on k folds and validated on 1 fold, for a total of k times (splits).

c) In k-fold cross-validation, training data is divided into k folds, then the model
is trained on k-1 folds and validated on 1 fold, for a total of k times (splits).

d) In k-fold cross-validation, training data is divided into k folds, then the model
is trained on k-1 folds and validated on 1 fold, for a total of k-1 times (splits).
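For illustration, a minimal sketch of the k-fold procedure in Python (the data and model below are placeholders, not part of the exam): the model is fit on k-1 folds and scored on the single held-out fold, once per fold, for k splits in total.

# Sketch of k-fold cross-validation: each of the k splits trains on
# k-1 folds and validates on the single held-out fold.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 4)), rng.normal(size=100)   # placeholder data

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):                   # k = 5 iterations
    model = LinearRegression().fit(X[train_idx], y[train_idx])  # k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))          # 1 fold
print(np.mean(scores))                                   # mean validation score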

2. When performing PCA, the goal is to accomplish which of the following? [2]

a) Maximize the variance of the principal components and maximize the
residuals.

b) Minimize the variance of the principal components and minimize the
residuals.

c) Maximize the variance of the principal components and minimize the
residuals.

d) Minimize the variance of the principal components and maximize the
residuals.
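As an illustration of the trade-off behind this question (my own sketch on synthetic data, not part of the exam): projecting centred data onto the top principal directions maximizes the variance captured by the components, which for an orthonormal projection is equivalent to minimizing the squared reconstruction residuals.

# Sketch of the PCA trade-off on synthetic data: variance captured by the
# components rises exactly as the squared reconstruction residuals fall.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                  # centre the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
W = Vt[:2].T                             # top-2 principal directions

Z = Xc @ W                               # component scores (projections)
X_hat = Z @ W.T                          # reconstruction from 2 components

captured = Z.var(axis=0, ddof=0).sum()   # variance of the components
residual = ((Xc - X_hat) ** 2).mean()    # mean squared residual
print(captured, residual)                # more components: captured up, residual down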

Part II: Calculations/Proofs

3. The schematic to the right has sets S1, S2, S3,
S4, S5, S6 and S7. What sets, and in what order,
would a greedy algorithm select to cover the
universe (i.e., cover each point) if all sets are
weighted equally? [2]
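For reference, the unweighted greedy rule picks, at every step, the set that covers the most still-uncovered points. A sketch of that rule (the sets below are placeholders, since the schematic is not reproduced in this text):

# Unweighted greedy set cover: repeatedly pick the set that covers the
# most still-uncovered points. The sets below are placeholders, not the
# exam's schematic.
def greedy_set_cover(universe, sets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, members = max(sets.items(), key=lambda kv: len(kv[1] & uncovered))
        if not members & uncovered:
            break                        # remaining points are uncoverable
        chosen.append(name)
        uncovered -= members
    return chosen

sets = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6}, "S4": {1, 5}}
print(greedy_set_cover(range(1, 7), sets))   # ['S1', 'S3']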

4. Is the matrix invertible? [2]
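The standard test, stated for a generic square matrix A (the exam's matrix is not reproduced here): A is invertible exactly when det(A) ≠ 0, i.e., when its rows (equivalently, columns) are linearly independent.

# Generic invertibility check (placeholder matrix, not the exam's):
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 3.0]])
print(np.linalg.det(A))   # 2.0, nonzero, so A is invertible
print(np.linalg.inv(A))   # the inverse exists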

5. Given that , prove that . [2]

6. We have and . What is the angle between the vectors? [2]
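The relevant identity, stated for generic vectors a and b (the exam's specific vectors are those given in the question):

\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
\qquad\Longrightarrow\qquad
\theta = \arccos\!\left(\frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}\right)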

7. Calculate the Jacobian of the function , which has
components: [2]

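For reference, the Jacobian of a generic function f : R^n → R^m with components f_1, …, f_m stacks the partial derivatives row by row:

\mathbf{J} = \frac{\partial \mathbf{f}}{\partial \mathbf{x}}
= \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}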


8. Consider functions and , where:


Calculate the gradient using the chain rule. [2]
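The identity being tested, stated generically for a composition f(g(x)) (the symbols here are generic, not the exam's):

\nabla_{\mathbf{x}} f\big(\mathbf{g}(\mathbf{x})\big)
= \frac{\partial f}{\partial \mathbf{g}} \, \frac{\partial \mathbf{g}}{\partial \mathbf{x}}

That is, the gradient is the product of the gradient of the outer function (a row vector) with the Jacobian of the inner function.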

Part III: Text Answers

9. In general, is gradient descent a good choice for finding optimal
hyperparameters? Why or why not? [2]

10. What is a learning rate? Explain what happens if it is set too high or too low (you
may use a schematic/drawing if helpful). [2]
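A toy sketch of the two failure modes (my own example on f(w) = w², not part of the exam): a rate that is too low crawls toward the minimum, while one that is too high overshoots the minimum on every step and diverges.

# Toy gradient descent on f(w) = w**2 (gradient 2w) showing learning-rate
# failure modes. Illustrative only.
def descend(lr, w=1.0, steps=20):
    for _ in range(steps):
        w -= lr * 2 * w           # w <- w - lr * f'(w)
    return w

print(descend(0.01))   # too low: ~0.67, barely moved toward the minimum at 0
print(descend(0.4))    # reasonable: ~1e-14, converges
print(descend(1.1))    # too high: ~38, each step overshoots and diverges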

Part IV: Algorithms

11. You have the following code:

How efficient is this code in terms of big O notation (i.e., what is the
order of the algorithm)? [hint: ] [2]
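For illustration only (the exam's code block is not reproduced in this transcription), a hypothetical loop nest of the kind such hints usually target: the inner loop runs i times on outer iteration i, so the total work is a triangular sum, which is O(n²).

# Hypothetical loop nest (not the exam's code): the inner loop runs i times
# on outer iteration i, so total work is 0 + 1 + ... + (n-1) = n(n-1)/2,
# i.e. O(n^2).
def count_pairs(n):
    count = 0
    for i in range(n):
        for j in range(i):
            count += 1
    return count

print(count_pairs(10))   # 45 == 10 * 9 / 2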

12. The input parameter x is a matrix with features as columns
and examples as rows.

a) What is the purpose of this
function, specifically? [2]

b) Describe why this type of process is important to gradient descent and
K-Nearest Neighbor algorithms. [2]
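A sketch of a common function of this shape, assuming it performs z-score standardization (an assumption; the exam's actual function appears in the original figure): each feature column is rescaled to zero mean and unit variance, which keeps gradient descent well conditioned and stops large-scale features from dominating KNN's distance calculations.

# Assumed example of such a function: z-score standardization. Each feature
# (column) is rescaled to zero mean and unit variance, so no single feature
# dominates KNN distances or skews gradient-descent conditioning.
import numpy as np

def standardize(x):
    mu = x.mean(axis=0)       # per-feature mean (features are columns)
    sigma = x.std(axis=0)     # per-feature standard deviation
    return (x - mu) / sigma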

13. Consider a differentiable loss function ℓ(w; x, y) and a dataset
{(xᵢ, yᵢ)}, i = 1, …, n. We want to optimize the average loss with respect
to the weights w. Write pseudo-code implementing mini-batch gradient
descent optimization using a decaying learning rate (i.e., a learning rate
that is reduced by a factor γ every epoch), with the following inputs: [4]

Differentiable loss ℓ with gradient ∇ℓ.
Dataset {(xᵢ, yᵢ)}, i = 1, …, n.
Mini-batch size m, initial weights w₀ and number of steps T.
Initial learning rate η₀ and learning rate decay γ.

Note that your pseudo-code is just a representation of your algorithm written in
plain English. Keep your pseudo-code simple and concise. Please use indentation
and control structures. State any assumptions you make.
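One possible shape for such an answer, written as Python-flavoured pseudo-code (a sketch under the assumption that the learning rate is multiplied by the decay factor γ at the end of each epoch; grad_loss and all variable names are generic placeholders, not the exam's notation):

# Sketch of mini-batch gradient descent with a per-epoch decaying learning
# rate. grad_loss(w, batch) is assumed to return the average gradient of
# the loss over the batch.
import numpy as np

def minibatch_gd(grad_loss, data, m, w0, T, lr0, gamma):
    w, lr = w0, lr0
    n = len(data)
    steps_per_epoch = max(1, n // m)             # assume one epoch ~ n/m steps
    for t in range(T):
        idx = np.random.choice(n, size=m, replace=False)
        batch = [data[i] for i in idx]           # sample a mini-batch
        w = w - lr * grad_loss(w, batch)         # gradient step
        if (t + 1) % steps_per_epoch == 0:       # end of an epoch
            lr *= gamma                          # decay the learning rate
    return w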