APS1070 – Foundations of Data Analytics and Machine Learning
Final Examination, Fall 2019
Non-programmable and non-communicating calculators are allowed. Time allotted: 120 minutes
Part I: Multiple Choice (Select only 1 answer per question)
1. Which of the following statements is true? [2]
a) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k folds and validated on 1 fold, for a total of k-1 times (splits).
b) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k folds and validated on 1 fold, for a total of k times (splits).
c) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k-1 folds and validated on 1 fold, for a total of k times (splits).
d) In k-fold cross-validation, training data is divided into k folds, then the model is trained on k-1 folds and validated on 1 fold, for a total of k-1 times (splits).
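For context, a minimal sketch of the k-fold scheme these options describe, assuming scikit-learn and a hypothetical toy array X with k = 5:

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)    # 10 toy examples, 2 features
kf = KFold(n_splits=5)              # k = 5 folds

for split, (train_idx, val_idx) in enumerate(kf.split(X)):
    # each split trains on k-1 folds and validates on the held-out fold
    print(f"split {split}: {len(train_idx)} training rows, {len(val_idx)} validation rows")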
2. When performing PCA, the goal is to accomplish which of the following? [2]
a) Maximize the variance of the principal components and maximize the residuals.
b) Minimize the variance of the principal components and minimize the residuals.
c) Maximize the variance of the principal components and minimize the residuals.
d) Minimize the variance of the principal components and maximize the residuals.
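To illustrate the trade-off these options refer to, a short NumPy sketch (with hypothetical random data) projects onto the top two principal directions and reports both the variance captured and the reconstruction residual:

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)                      # centre the data

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
proj = Xc @ Vt[:k].T                          # coordinates on the top-k components
recon = proj @ Vt[:k]                         # map back into the original space

var_captured = (S[:k] ** 2).sum() / (S ** 2).sum()
residual = np.linalg.norm(Xc - recon) ** 2    # squared reconstruction error
print(var_captured, residual)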
Part II: Calculations/Proofs
3. The schematic to the right has sets S1, S2, S3, S4, S5, S6 and S7. What sets, and in what order, would a greedy algorithm select to cover the universe (i.e., cover each point) if all sets are weighted equally? [2]
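For reference, a minimal sketch of the greedy rule in question: repeatedly pick the set that covers the most still-uncovered points. The sets below are hypothetical stand-ins, since the schematic is not reproduced here:

def greedy_set_cover(universe, sets):
    """Repeatedly pick the set that covers the most uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, best = max(sets.items(), key=lambda kv: len(kv[1] & uncovered))
        if not best & uncovered:
            break                      # remaining points cannot be covered
        chosen.append(name)
        uncovered -= best
    return chosen

# hypothetical example sets, not the exam's schematic
sets = {"S1": {1, 2, 3}, "S2": {3, 4}, "S3": {4, 5, 6}, "S4": {1, 5}}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, sets))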
4. Is the matrix 𝐴 = ( 1 0 3 ) invertible? [2]
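As a generic illustration only (the 3×3 matrix below is hypothetical, not the exam's matrix), invertibility can be checked numerically through the determinant:

import numpy as np

A = np.array([[1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0],
              [0.0, 4.0, 1.0]])   # hypothetical example matrix
print(np.linalg.det(A))           # nonzero determinant, so A is invertible
print(np.linalg.inv(A))           # would raise LinAlgError for a singular matrix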
5. Given that $\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$, prove that $\binom{n}{r} = \binom{n}{n-r}$. [2]
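One possible route for the proof, sketched here only as a reminder of how the factorial form behaves when r is replaced by n − r:

\[
\binom{n}{n-r} = \frac{n!}{(n-r)!\,\bigl(n-(n-r)\bigr)!} = \frac{n!}{(n-r)!\,r!} = \binom{n}{r}.
\]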
6. We have 𝑥 = [1,1,2]𝑇 ∈ R3 and 𝑦 = [1,0,3]𝑇 ∈ R3. What is the angle between the two vectors? [2]
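A quick numerical check for these two vectors (NumPy used purely for illustration):

import numpy as np

x = np.array([1.0, 1.0, 2.0])
y = np.array([1.0, 0.0, 3.0])
cos_theta = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.degrees(np.arccos(cos_theta)))   # about 25 degrees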
7. Calculate the Jacobian 𝑱𝑭(𝑥1, 𝑥2, 𝑥3) of the function 𝑭 ∶ R3 → R4, which has components: [2]
$y_1 = x_1 + 2x_3^2$;  $y_2 = x_1 \sin x_2$;  $y_3 = x_2 \exp(x_3)$;  $y_4 = x_1 + x_2$
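Assuming the four components as reconstructed above, SymPy offers a quick symbolic cross-check of the 4×3 Jacobian:

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
F = sp.Matrix([x1 + 2*x3**2,
               x1*sp.sin(x2),
               x2*sp.exp(x3),
               x1 + x2])
print(F.jacobian([x1, x2, x3]))   # 4x3 matrix of partial derivatives dy_i/dx_j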
8. Consider functions 𝑓 ∶ R3 → R1 and 𝒙 ∶ R1 → R3, where:
$f(\boldsymbol{x}) = x_1 + 2x_2^2 + x_3$;  $\boldsymbol{x}(t) = [\,t,\ 3t,\ \ldots\,]^T$
Calculate the gradient $\frac{df}{dt}$ using the chain rule. [2]
Part III: Text Answers
9. In general, is using a gradient descent algorithm a good choice for finding optimal hyperparameters? Why or why not? [2]
10. What is a learning rate? Explain what happens if it is set too high or too low (you may use a schematic/drawing if helpful). [2]
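As a toy illustration of this question (the quadratic and the step sizes below are arbitrary choices, not from the exam), gradient descent on f(w) = w² shows the two failure modes:

def descend(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(descend(0.01))   # too low: w barely moves toward the minimum at 0
print(descend(0.4))    # moderate: w converges quickly
print(descend(1.5))    # too high: the updates overshoot and w blows up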
Part IV: Algorithms
11. You have the following code:

for i in range(n):
    for j in range(i ** 3):   # inner bound reconstructed from the hint below
        pass                  # loop body assumed to be constant-time work

How efficient is this code in terms of big-O notation (what is the order of the algorithm's runtime)? [hint: $\sum_{k=1}^{n} k^3 = \frac{n^2(n+1)^2}{4}$] [2]
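If the inner-loop bound above is reconstructed correctly, a throwaway harness like the following can sanity-check the asymptotic order by counting inner iterations for a few values of n:

def count_iterations(n):
    count = 0
    for i in range(n):
        for j in range(i ** 3):
            count += 1
    return count

for n in (5, 10, 20):
    # compare against candidate growth rates such as n**3 and n**4
    print(n, count_iterations(n))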
12. The input parameter x of “my_function” is training data with features as columns and examples as rows:

def my_function(x):
    return (x - x.mean(axis=0)) / x.std(axis=0)
a) What is the purpose of this function, specifically? [2]
b) Describe why this type of process is important to gradient descent and K-Nearest Neighbor algorithms. [2]
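Assuming the reconstruction of “my_function” shown in question 12, a quick check on a hypothetical random matrix illustrates its effect on each feature column:

import numpy as np

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(100, 3))
Z = my_function(X)
print(Z.mean(axis=0))   # column means are approximately 0 after the transform
print(Z.std(axis=0))    # column standard deviations are approximately 1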
13. Consider a differentiable loss function 𝑙(𝒘, 𝒙, 𝑦) and a dataset 𝐷. You’re asked to optimize the average loss $\frac{1}{N}\sum_{i=1}^{N} l(\boldsymbol{w}, \boldsymbol{x}^{(i)}, y^{(i)})$ with respect to 𝒘. Write pseudo-code implementing mini-batch gradient descent optimization using a decaying learning rate (i.e., a learning rate that is reduced by 𝛼𝐷 every epoch), with the following inputs: [4]
▪ Differentiable loss 𝑙(𝒘, 𝒙, 𝑦) with gradient ∇𝒘𝑙(𝒘, 𝒙, 𝑦).
▪ Dataset 𝐷 = {(𝒙(1), 𝑦(1)), …, (𝒙(𝑁), 𝑦(𝑁))}.
▪ Mini-batch size m, initial weights 𝒘0, and number of steps T.
▪ Initial learning rate 𝛼0 and learning rate decay 𝛼𝐷.
Note that your pseudo-code is just a representation of your algorithm written in plain English. Keep your pseudo-code simple and concise. Please use indentation and control structures. State any assumptions you make.
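For reference, a minimal Python sketch of one way such pseudo-code might be realized. It assumes grad_l, D, m, w0, T, alpha0 and alpha_D correspond to the inputs listed above, that the mini-batch gradient is the average of the per-example gradients, and that "reduced by 𝛼𝐷 every epoch" means subtracting 𝛼𝐷 after each full pass over the data:

import random

def minibatch_gd(grad_l, D, m, w0, T, alpha0, alpha_D):
    """Sketch of mini-batch gradient descent with a per-epoch learning-rate decay."""
    w = w0
    alpha = alpha0
    steps = 0
    while steps < T:
        random.shuffle(D)                     # start a new epoch with a fresh shuffle
        for start in range(0, len(D), m):     # walk over the mini-batches
            batch = D[start:start + m]
            # gradient of the average loss over this mini-batch
            g = sum(grad_l(w, x, y) for x, y in batch) / len(batch)
            w = w - alpha * g                 # take a gradient step
            steps += 1
            if steps >= T:
                break
        alpha = alpha - alpha_D               # decay the learning rate each epoch
    return w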