代写 math python scala database graph CS 189 Introduction to Machine Learning

CS 189 Introduction to Machine Learning
Fall 2019 Jennifer Listgarten & Stella Yu HW 04
Due: October 10th, 2019
1 Getting Started
Read through this page carefully. You may typeset your homework in latex or submit neatly handwritten/scanned solutions. Please start each question on a new page. Deliverables:
1. Submit a PDF of your writeup, with an appendix for your code, to assignment on Grade- scope, “HW4 Write-Up”. If there are graphs, include those graphs in the correct sections. Do not simply reference your appendix.
2. If there is code, submit all code needed to reproduce your results, “HW4 Code”.
3. If there is a test set, submit your test set evaluation results, “HW4 Test Set”.
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 1

2 Gaussian Features
In this question we will be exploring augmenting our data using the Gaussian RBF. We might want
to do something like this so that we can model larger classes of functions by considering linear
combinations of basis functions f (x) = 􏰁K w φ (x)…, and we may want ”handcraft” our features k=1 k k
with functions φ(x) when we have intuition and insight about our data and its source.
(a)
First we will attempt to augment our data using polynomial features to see if this is appro- priate in the setting that our data comes from Gaussian distributions. Implement the function polynomial augmentation which takes as input a data matrix with only one feature and return a data matrix augmented with polynomial features of degree up to k (some values for k worth trying might be 2, 4, 6). Then, use the function linear regression which performs linear regression on a data matrix and plots the resulting curve along side the data points. What do you observe about the fitted curve? What happens as the degree of the polynomial features increases?
Now, we will use the Gaussian function (https://en.wikipedia.org/wiki/Gaussian function) to augment our data matrix. The idea is that our data comes from a linear combination of a set number of Gaussian functions, each with a center. Analyze the plot of data to find 3 centers that seem like they would be good centers and estimating three variances by looking at the plotted function against the data (hint: some of the Gaussians may be negatively weighted and be upside down). Then, implement the function rbf augmentation which takes as input a data matrix with only one feature and outputs a data matrix augmented with the values of the Gaussian functions centered at the chosen centers with the data as input. This function will take as input
(b)
and output
X=. .
121212  − | x1−c1| − |x1−c2| − |x1−c3| 

 x1
  . 
x  n
2σ2 2σ2 2σ2  xe1 e2 e3  1   ….  …. ….
 − 1 |xn−c1|2 − 1 |xn−c2|2 − 1 |xn−c3|2 xn e 2σ21 e 2σ2 e 2σ23 
3
where c1, c2, c3 are your centers and σ1, σ2, σ3 are the respective variances of the Gaussians. Probabilistic Principal Components Analysis (PPCA)
In this question, we explore a more general form of PCA, called PPCA. We start with a latent variable model
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 2

y = Wx + μ + z
for x ∼ N(0,I), noise z ∼ N(0,Σ), and constant μ,W. We will constrain ourselves to isotropic
noise i.e., Σ = σ2I.
(a) Characterize the distribution of y conditioned on x.
(b) Now consider i.i.d. x ∼ N(0, I). Now characterize the distribution of y. 4 Canonical Correlation Analysis
Assume that you have a database of images of the words typed in two different fonts. X, Y ∈ Rn×d corresponds to the dataset of font 1 and font 2 respectively. Think of the database X as being composed on n independent draws (samples) from a random variable X ∈ Rd, and similarly Y as n draws from a random variable Y ∈ Rd. Your goal is to use machine learning to build a text recognition of word images.
(a) Explain why you would want to consider using CCA in this problem.
(b) Assume that the data matrices X and Y include zero-mean features of the word images. Given two unit-length vectors u, v ∈ Rd , compute the correlation coefficient of the random variables x, y projected onto u, v, i.e., compute the correlation coefficient between uT x and vT y. Correlation coefficient between two scalar random variables P and Q is computed by:
ρ(P, Q) = cov(P, Q) σPσQ
(c) Assume that the features of matrix X are rescaled to have values between -1 and 1. How does this change the correlation coefficient?
5 Total Least Squares
In most of the models we have looked at so far, we’ve accounted for noise in the observed y measurement and adjusted accordingly. However, in the real world it could easily be that our feature matrix X of data is also corrupted or noisy. Total least squares is a way to account for this. Whereas previously we were minimizing the y distance from the data point to our predicted line because we had assumed the features were definitively accurate, now we are minimizing the entire distance from the data point to our predicted line. In this problem we will explore the mathematical intuition for the TLS formula. We will then apply the formula to adjusting the lighting of an image which contains noise in its feature matrix due to inaccurate assumptions we make about the image, such as the image being a perfect sphere.
Let X ∈ Rn×d and y ∈ Rn be the measurements. Recall that in the least squares problem, we want to solve for w in minw ||Xw − y||. We measure the error as the difference between Xw and y, which can be viewed as adding an error term εy such that the equation Xw = y + εy has a solution:
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 3

min||εy||2, subject to Xw = y + εy (1) εy ,w
Although this optimization formulation allows for errors in the measurements of y, it does not allow for errors in the feature matrix X that is measured from the data. In this problem, we will explore a method called total least squares that allows for both error in the matrix X and the vector y, represented by εX and εy, respectively. For convenience, we absorb the negative sign into εy and εX and define true measurements y and X like so:
ytrue =y+εy (2) Xtrue =X+εX (3)
Specifically, the total least squares problem is to find the solution for w in the following minimiza- tion problem:
􏰀􏰀 􏰀􏰀2
min 􏰀􏰀􏰀􏰀[ε ,ε ]􏰀􏰀􏰀􏰀 , subjectto(X+ε )w=y+ε (4)
εy,εX,w􏰀􏰀X y􏰀􏰀F X y
where the matrix [εX,εy] is the concatenation of the columns of εX with the column vector y. Recall the subscript F from discussion denotes the Frobenius norm of a matrix. Notice that the minimization is over w because it’s a free parameter, and it does not necessarily have to be in the objective function. Intuitively, this equation is finding the smallest perturbation to the matrix of data points X and the outputs y such that the linear model can be solved exactly. The constraint in the minimization problem can be rewritten as
[X+ε ,y+ε] =0 (5) X y −1
(a) LetthematrixX∈Rn×d andy∈Rn andnotethatεX ∈Rn×d andεy ∈Rn.Assumingthatn>d and rank(X + εX) = d, explain why rank([X + εX, y + εy]) = d.
(b) For the solution w to be unique, the matrix [X + εX, y + εy] must have exactly d linearly independent columns. Since this matrix has d+1 columns in total, it must be rank-deficient by 1. Recall that we can rewrite [X, y] as its SVD decomposition U ΣV ⊤ for orthonormal U, V and diagonal Σ. Eckart-Young-Mirsky Theorem tells us that the closest lower-rank matrix in the Frobenius norm is obtained by discarding the smallest singular values. Therefore, the matrix [X + εX, y + εy] that minimizes
􏰀􏰀􏰀􏰀[ε , ε ]􏰀􏰀􏰀􏰀2 = 􏰀􏰀􏰀􏰀[Xtrue, ytrue] − [X, y]􏰀􏰀􏰀􏰀2 􏰀􏰀􏰀􏰀 X y 􏰀􏰀􏰀􏰀 􏰀􏰀􏰀􏰀 􏰀􏰀􏰀􏰀
FF
is given by
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 4
 w

[X+εX,y+εy]=U  
… V 
0
Suppose we express the SVD of [X, y] in terms of submatrices and vectors:
   ⊤
[X,y] = u⊤ u yx
yy
 σ
d+1
v⊤ v yx yy

Uxx uxy Σd
 Vxx vxy
uxy ∈Rn−1 isthefirst(n−1)elementsofthe(d+1)-thcolumnofU,u⊤yx ∈Rd isthefirstd elements of the n-th row of U, uyy is the n-th element of the (d + 1)-th column of U, Uxx ∈ R(n−1)×d is the (n − 1) × d top left submatrix of U.
Similarly, vxy ∈ Rd is the first d elements of the (d + 1)-th column of V, v⊤yx ∈ Rd is the first d elements of the (d + 1)-th row of V, vyy is the (d + 1)-th element of the (d + 1)-th column of V, Vxx ∈ Rd×d is the d × d top left submatrix of V. σd+1 is the (d + 1)-th eigenvalue of [X, y]. Σd is the diagonal matrix of the d largest singular values of [X, y]
Using this information show that

Σd 
 
  0 ⊤
   ⊤ uxy vxy
[ε,ε]=− σ   X y uyy d+1vyy
(c) Using the result from the previous part and the fact that vyy is not 0 (see notes on Total 
w LeastSquares),findanonzerosolutionto[X+ε ,y+ε ] =0andthussolveforwin
Equation (5).
HINT: Looking at the last column of the product [X, y]V may or may not be useful for this problem, depending on how you solve it.
 w
(d) From the previous part, you can see that −1 is a right-singular vector of [X, y]. Show that (X⊤X − σ2d+1I)w = X⊤y (14)
(e) In this problem, we will use total least squares to approximately learn the lighting in a photo- graph, which we can then use to paste new objects into the image while still maintaining the realism of the image. You will be estimating the lighting coefficients for the interior of St. Pe- ter’s Basilica, and you will then use these coefficients to change the lighting of an image of a
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 5
X y −1

Figure 1: Tennis ball pasted on top of image of St. Peter’s Basilica without lighting adjustment (left) and with lighting adjustment (right)
Figure 2: Image of a spherical mirror inside of St. Peter’s Basilica
tennis ball so that it can be pasted into the image. In Figure 1, we show the result of pasting the tennis ball in the image without adjusting the lighting on the ball. The ball looks too bright for the scene and does not look like it would fit in with other objects in the image.
To convincingly add a tennis ball to an image, we need to need to apply the appropriate lighting from the environment onto the added ball. To start, we will represent environment lighting as a spherical function f(n) where n is a 3 dimensional unit vector (||n||2 = 1), and f outputs a 3 dimensional color vector, one component for red, green, and blue light intensities. Because f(n) is a spherical function, the input n must correspond to a point on a sphere. The function f(n) represents the total incoming light from the direction n in the scene. The lighting function of a spherical object f(n) can be approximated by the first 9 spherical harmonic basis functions.
The first 9 un-normalized spherical harmonic basis functions are given by:
L1 = 1 L2 = y L3 = x
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 6

L4 = z
L5 = xy
L6 = yz
L7 = 3z2 − 1 L8 = xz
L9 = x2 − y2
where n = [x, y, z]⊤. The lighting function can then be approximated as
9
f(n) ≈ 􏰂 γiLi(n) i=1

 — f(n1) —  L1(n1) L2(n1) … L9(n1)  — γ1 —   
  — f(n) — L(n) L(n) … L(n) — γ — 21222922   .=..  ... ...
 
 — f(nn) — 
where Li(n) is the ith basis function from the list above.
n×3
n×9
9×3
L1(nn) L2(nn) … L9(nn)
 — γ9 — 
The function of incoming light f(n) can be measured by photographing a spherical mirror placed in the scene of interest. In this case, we provide you with an image of the sphere as seen in Figure 2. In the code provided, there is a function extractNormals(img) that will extract the training pairs (ni, f(ni)) from the image. An example using this function is in the code.
Use the spherical harmonic basis functions to create a 9 dimensional feature vector for each sample. Use this to formulate an ordinary least squares problem and solve for the unknown coefficients γi. Report the estimated values for γi and include a visualization of the approximation using the provided code. The code provided will load the images, extracts the training data, relights the tennis ball with incorrect coefficients, and saves the results. Your task is to compute the basis functions and solve the least squares problems to provide the code with the correct coefficients. To run the starter code, you will need to use Python with numpy and scipy. Because the resulting data set is large, we reduce it in the code by taking every 50th entry in the data. This is done for you in the starter code, but you can try using the entire data set or reduce it by a different amount.
(f) When we extract from the data the direction n to compute (ni, f(ni)), we make some approx- imations about how the light is captured on the image. We also assume that the spherical mirror is a perfect sphere, but in reality, there will always be small imperfections. Thus, our measurement for n contains some error, which makes this an ideal problem to apply total least squares. Solve this problem with total least squares by allowing perturbations in the ma- trix of basis functions. Report the estimated values for γi and include a visualization of the approximation. The output image will be visibly wrong, and we’ll explore how to fix this problem in the next part. Your implementation may only use the SVD and the matrix inverse functions from the linear algebra library in numpy as well as np.linalg.solve.
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 7

(g) In the previous part, you should have noticed that the visualization is drastically different than the one generated using least squares. Recall that in total least squares we are minimizing ||[εX,εy]||2F. Intuitively, to minimize the Frobenius norm of components of both the inputs and outputs, the inputs and outputs should be on the same scale. However, this is not the case here. Color values in an image will typically be in [0, 255], but the original image had a much larger range. We compressed the range to a smaller scale using tone mapping, but the effect of the compression is that relatively bright areas of the image become less bright. As a compromise, we scaled the image colors down to a maximum color value of 384 instead of 255. Thus, the inputs here are all unit vectors, and the outputs are 3 dimensional vectors where each value is in [0, 384]. Propose a value by which to scale the outputs f(ni) such that the values of the inputs and outputs are roughly on the same scale. Solve this scaled total least squares problem, report the computed spherical harmonic coefficients and provide a rendering of the relit sphere.
HW 04, ©UCB CS 189, Fall 2019. All Rights Reserved. This may not be publicly shared without explicit permission. 8