IV. Problems
Linear Regression (7 points)
(i) In lecture we derived a closed-form solution for linear regression:
$\tilde{\boldsymbol{w}} = (\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^T)^{-1}\tilde{\boldsymbol{X}}\boldsymbol{y}$, where $\tilde{\boldsymbol{X}} = [\tilde{\boldsymbol{x}}_1, \ldots, \tilde{\boldsymbol{x}}_N]$, $\tilde{\boldsymbol{x}}_n = [\boldsymbol{x}_n^T, 1]^T$, and $\boldsymbol{y} = [y_1, \ldots, y_N]^T$. A potential problem is that $\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^T$ may not be invertible.
Consider a different minimization problem to find $\tilde{\boldsymbol{w}}$. Again, let $E(\tilde{\boldsymbol{w}}) = \sum_n (\tilde{\boldsymbol{w}}^T \tilde{\boldsymbol{x}}_n - y_n)^2$. Instead of minimizing $E(\tilde{\boldsymbol{w}})$, we will minimize $E(\tilde{\boldsymbol{w}}) + \lambda \|\tilde{\boldsymbol{w}}\|_2^2 = E(\tilde{\boldsymbol{w}}) + \lambda \tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{w}}$, where $\lambda > 0$.
Show that the solution $\tilde{\boldsymbol{w}}$ which minimizes $E(\tilde{\boldsymbol{w}}) + \lambda \tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{w}}$ is $\tilde{\boldsymbol{w}} = (\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^T + \lambda \boldsymbol{I})^{-1}\tilde{\boldsymbol{X}}\boldsymbol{y}$, where $\boldsymbol{I}$ is a $(D+1) \times (D+1)$ identity matrix and $D$ is the dimensionality of $\boldsymbol{x}_n$. Write your derivation in no more than 8 lines.
In machine learning, this solution is called ridge regression. Note that $\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^T + \lambda\boldsymbol{I}$ is now always invertible: $\tilde{\boldsymbol{X}}\tilde{\boldsymbol{X}}^T$ is positive semi-definite, so adding $\lambda\boldsymbol{I}$ with $\lambda > 0$ makes the sum positive definite.
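As a quick numerical sanity check (not a substitute for the requested derivation), the following Python sketch builds a small random dataset and verifies that the ridge solution makes the gradient of $E(\tilde{\boldsymbol{w}}) + \lambda \tilde{\boldsymbol{w}}^T\tilde{\boldsymbol{w}}$ vanish; the dimensions and $\lambda$ value are arbitrary placeholders.

import numpy as np

rng = np.random.default_rng(0)
D, N, lam = 3, 10, 0.5                          # placeholder dimensionality, sample count, lambda
X = np.vstack([rng.normal(size=(D, N)),         # raw inputs, one column per instance
               np.ones((1, N))])                # appended constant-1 feature -> X tilde
y = rng.normal(size=N)

# Ridge solution: w = (X X^T + lambda I)^{-1} X y
w = np.linalg.solve(X @ X.T + lam * np.eye(D + 1), X @ y)

# Gradient of E(w) + lambda w^T w is 2 (X X^T + lambda I) w - 2 X y; it vanishes at the minimizer.
grad = 2 * (X @ X.T + lam * np.eye(D + 1)) @ w - 2 * X @ y
print(np.allclose(grad, 0))                     # expected output: True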
Regression, Gauss-Newton, and gradient descent: Part-1 (24 points)
Figure 1: (a) A dataset of one data instance; (b) the corresponding $f(x_i; w) - y_i$; (c) the corresponding $E(w) = (f(x_i; w) - y_i)^2$.
Figure 1 (a) shows a dataset of just one data instance: $(x_1, y_1) = (2, 2)$. Now we want to fit a nonlinear curve $f(x; w) = e^{wx}$ to the dataset, using the sum of squared errors (SSE): $E(w) = (f(x_i; w) - y_i)^2$. Figures 1 (b) and (c) show the curve $f(x_i; w) - y_i$ w.r.t. $w$ and the curve $E(w)$ w.r.t. $w$, respectively.
(ii) Gauss-Newton method (12 points)
Use the Gauss-Newton method introduced in the lectures to find the solution $w^*$ that minimizes $E(w)$. Note that the optimal solution is $0.347$. Since the Gauss-Newton method is an iterative method, the solution depends on the number of iterations.
1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = -1.0$ and perform three iterations of Gauss-Newton to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
Your answers must contain numerical values; for example, use 0.500 but not 1/2. Round your answers to three decimal places: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.
You should see that iterative methods are sensitive to the initialization.
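To check your hand calculations for the three questions above, here is a minimal Python sketch, assuming the standard single-residual Gauss-Newton step from least squares: with residual $r(w) = f(x_1; w) - y_1$, the update is $\hat{w}^{(t+1)} = \hat{w}^{(t)} - r(\hat{w}^{(t)}) / r'(\hat{w}^{(t)})$. Verify that this matches the lecture's formulation before relying on it.

import math

def gauss_newton(w, iters=3):
    """One-residual Gauss-Newton for E(w) = (e^{2w} - 2)^2, i.e., (x1, y1) = (2, 2)."""
    for _ in range(iters):
        r = math.exp(2 * w) - 2          # residual f(x1; w) - y1
        dr = 2 * math.exp(2 * w)         # dr/dw = x1 * e^{w x1}
        w = w - r / dr                   # Gauss-Newton step
    return w

for w0 in (1.5, 0.0, -1.0):
    print(w0, "->", round(gauss_newton(w0), 3))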
(iii) Gradient descent (12 points)
Use the gradient descent (GD) method introduced in the lectures to find the solution $w^*$ that minimizes $E(w)$. Since GD is an iterative method, the solution depends on the number of iterations.
For our problem where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:
$$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta \, \frac{dE}{dw}\big(\hat{w}^{(t)}\big)$$
where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.
1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.001$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
Your answers must contain numerical values; for example, use 0.500 but not 1/2. Round your answers to three decimal places: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.
You should see that iterative methods are sensitive to the initialization and to the step size.
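Similarly, a minimal GD sketch for this objective as a checking aid; by the chain rule, $dE/dw = 2(e^{2w} - 2) \cdot 2e^{2w}$. Running it for the three settings above illustrates how strongly the trajectory depends on $\hat{w}^{(0)}$ and $\eta$.

import math

def gradient_descent(w, eta, iters=3):
    """GD for E(w) = (e^{2w} - 2)^2 with step size eta."""
    for _ in range(iters):
        grad = 2 * (math.exp(2 * w) - 2) * 2 * math.exp(2 * w)  # chain rule
        w = w - eta * grad
    return w

for w0, eta in ((1.5, 0.1), (0.0, 0.1), (1.5, 0.001)):
    print(w0, eta, "->", round(gradient_descent(w0, eta), 3))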
Regression, Gauss-Newton, and gradient descent: Part-2 (44 points)
Figure 2: (a) A dataset of one data instance; (b) the corresponding $f(x_i; w) - y_i$; (c) the corresponding $E(w) = (f(x_i; w) - y_i)^2$.
Consider a simpler problem: fitting a line $f(x; w) = wx$ to the same data, again using the sum of squared errors (SSE): $E(w) = (f(x_i; w) - y_i)^2$. Figure 2 (a) shows the same data as Figure 1, but Figures 2 (b) and (c) show the curve of $f(x_i; w) - y_i$ w.r.t. $w$ and the curve of $E(w)$ w.r.t. $w$, respectively, using $f(x; w) = wx$.
(iv) Gauss-Newton method (10 points)
Let us again use the Gauss-Newton method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. Note that the optimal solution is $1.000$.
1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform just one iteration of Gauss-Newton to get $\hat{w}^{(1)}$. What is $\hat{w}^{(1)}$ (a numerical value)? [4 points]
Your answers must contain numerical values; for example, use 0.500 but not 1/2. Round your answers to three decimal places: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.
If you observe a difference from your Gauss-Newton results in "Regression, Gauss-Newton, and gradient descent: Part-1" above, describe the difference and explain the reason. Write "No difference" if you do not find any difference. [2 points]
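For checking, here is the same single-residual Gauss-Newton step applied to the linear model. Because the residual $r(w) = 2w - 2$ is linear in $w$, the local linearization is exact, so a single step reaches the minimizer from any initialization.

def gauss_newton_linear(w, iters=1):
    """Gauss-Newton for E(w) = (2w - 2)^2, i.e., f(x; w) = wx with (x1, y1) = (2, 2)."""
    for _ in range(iters):
        r = 2 * w - 2        # residual f(x1; w) - y1
        dr = 2.0             # dr/dw = x1, a constant: the model is linear in w
        w = w - r / dr
    return w

for w0 in (1.5, 0.0):
    print(w0, "->", round(gauss_newton_linear(w0), 3))   # both reach 1.0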
(v) Gradient descent (14 points)
Let us again use the gradient descent (GD) method to find the solution $w^*$ that minimizes $E(w)$ in Figure 2. For our problem where $w$ is one-dimensional, given the current guess $\hat{w}^{(t)}$, GD performs the following update:
$$\hat{w}^{(t+1)} \leftarrow \hat{w}^{(t)} - \eta \, \frac{dE}{dw}\big(\hat{w}^{(t)}\big)$$
where $\frac{dE}{dw}(\hat{w}^{(t)})$ is the derivative computed at $\hat{w}^{(t)}$ and $\eta$ is the step size or learning rate.
1. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
2. Begin with the initialization $\hat{w}^{(0)} = 0.0$ and perform three iterations of GD (with $\eta = 0.1$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
3. Begin with the initialization $\hat{w}^{(0)} = 1.5$ and perform three iterations of GD (with $\eta = 1.0$) to get $\hat{w}^{(3)}$. What is $\hat{w}^{(3)}$ (a numerical value)? [4 points]
Your answers must contain numerical values; for example, use 0.500 but not 1/2. Round your answers to three decimal places: for example, 1.333333 becomes 1.333 and -1.333333 becomes -1.333.
Can GD obtain the optimal solution in a few iterations? [2 points]
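A matching GD sketch for the linear case, again only as a checking aid; here $dE/dw = 2(2w - 2) \cdot 2 = 8w - 8$. Note what happens with $\eta = 1.0$: each step overshoots the minimizer, and the iterates oscillate with growing magnitude.

def gd_linear(w, eta, iters=3):
    """GD for E(w) = (2w - 2)^2 with step size eta."""
    for _ in range(iters):
        w = w - eta * (8 * w - 8)   # dE/dw = 8w - 8
    return w

for w0, eta in ((1.5, 0.1), (0.0, 0.1), (1.5, 1.0)):
    print(w0, eta, "->", round(gd_linear(w0, eta), 3))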
Naive Bayes and MLE (20 points)
Please write down the derivation of your answers, and highlight your answers clearly!
Calculation: Please round your results to two decimal places, unless stated otherwise. For example, 1.245 becomes 1.25 and -1.228 becomes -1.23.
Figure 3: A two-dimensional labeled dataset with 10 data instances.
Figure 3 gives a labeled dataset $\{(x_i, y_i)\}_{i=1}^{N}$ of $N = 10$ data instances. Each $x_i$ is two-dimensional: the first dimension $x_i[1]$ is binary (i.e., $\{0, 1\}$); the second dimension $x_i[2]$ is a real number. The label $y_i$ is binary (i.e., either class 0 or class 1).
Denote by $X$ the two-dimensional random variable of data instances and by $Y$ the binary random variable of class labels. You are to construct the Naive Bayes classifier to predict the class label $\hat{y}$ of a future data instance $X = x$:
$$\hat{y} = \arg\max_{c \in \{0,1\}} p(Y = c \mid X = x) = \arg\max_{c \in \{0,1\}} p(Y = c) \times \prod_{d} p(X[d] = x[d] \mid Y = c)$$
You will begin by estimating, from the labeled dataset, the parameters of $p(Y)$ and of
$$p(X[d] \mid Y = c), \quad \forall c \in \{0,1\}, \; d \in \{1,2\},$$
using Maximum Likelihood Estimation (MLE). Note that $p(Y)$ is a Bernoulli distribution; $p(X[1] \mid Y = c)$ is a Bernoulli distribution of $X[1]$; and $p(X[2] \mid Y = c)$ is a Gaussian distribution of $X[2]$.
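Since the actual values live in Figure 3 (not reproduced here), the sketch below uses hypothetical placeholder arrays x1, x2, y that you would replace with the 10 instances from the figure; it only illustrates the MLE formulas: empirical frequencies for the Bernoulli parameters, and the per-class sample mean and (biased) variance for the Gaussians.

import numpy as np

# Hypothetical placeholders -- substitute the actual instances from Figure 3.
x1 = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])                        # binary feature X[1]
x2 = np.array([2.1, 3.0, 0.5, 1.7, 2.8, 0.9, 3.2, 1.1, 2.5, 0.4])    # real feature X[2]
y  = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0])                        # binary labels

print("p(Y=1) =", round(y.mean(), 2))            # MLE of a Bernoulli parameter is the count fraction

for c in (0, 1):
    mask = (y == c)
    p_x1 = x1[mask].mean()                       # MLE of p(X[1]=1 | Y=c)
    mu, var = x2[mask].mean(), x2[mask].var()    # MLE Gaussian mean and biased variance (ddof=0)
    print(f"Y={c}: p(X[1]=1|Y={c})={p_x1:.2f}, mean={mu:.2f}, var={var:.2f}")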
(vi) Prior distributions [2 points]
What is $p(Y = 1)$? Write your derivation/calculation.
The answer is a real number.
(vii) Conditional distributions [4 points]
What is $p(X[1] = 1 \mid Y = 0)$? Write your derivation/calculation.
What is $p(X[1] = 1 \mid Y = 1)$? Write your derivation/calculation.
Each answer is a real number.
(viii) Conditional distributions [6 points]
What is $p(X[2] = 3.0 \mid Y = 0)$? First write down the parameters (i.e., mean and variance) of $p(X[2] \mid Y = 0)$ and $p(X[2] \mid Y = 1)$.
What is $p(X[2] = 3.0 \mid Y = 1)$? First write down the parameters (i.e., mean and variance) of $p(X[2] \mid Y = 0)$ and $p(X[2] \mid Y = 1)$.
Each answer is a real number.
(ix) Naive Bayes [8 points]
Given $x = [0, 3.0]^T$ (i.e., $x[1] = 0$ and $x[2] = 3.0$), what is the prediction $\hat{y}$? Write your derivation/calculation.
Given $x = [1, 0.5]^T$ (i.e., $x[1] = 1$ and $x[2] = 0.5$), what is the prediction $\hat{y}$? Write your derivation/calculation.
Each answer is either 0 or 1.
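To check your work in (ix), here is a minimal prediction sketch; the parameter values below are hypothetical placeholders, to be replaced by your own MLE estimates from (vi)-(viii).

import math

def gaussian_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x, priors, bern, gauss):
    """Naive Bayes: argmax over c of p(Y=c) * p(X[1]=x[1]|Y=c) * p(X[2]=x[2]|Y=c)."""
    scores = []
    for c in (0, 1):
        p1 = bern[c] if x[0] == 1 else 1 - bern[c]    # Bernoulli pmf for X[1]
        p2 = gaussian_pdf(x[1], *gauss[c])            # Gaussian density for X[2]
        scores.append(priors[c] * p1 * p2)
    return int(scores[1] > scores[0])

# Hypothetical parameters -- substitute your own estimates.
priors = {0: 0.5, 1: 0.5}                  # p(Y=c)
bern   = {0: 0.25, 1: 0.75}                # p(X[1]=1 | Y=c)
gauss  = {0: (1.0, 0.5), 1: (2.7, 0.8)}    # (mean, variance) of p(X[2] | Y=c)

print(predict((0, 3.0), priors, bern, gauss))
print(predict((1, 0.5), priors, bern, gauss))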
V. Carmen submission
Submit this portion of Homework #4 (Problem Set) into the drop box for the problem set portion. You will submit the programming portion into a different drop box.