
Homework 13
1. In this problem you will implement a neural network to solve a classification problem. To keep things simple, the data will consist of covariates x^(i) ∈ R^2 and a response y_i ∈ {0, 1} for i = 1, 2, . . . , N (notice y takes on only two possible values). The classification problem involves fitting the model y ∼ f(x) over functions f(x) that can be parameterized by our neural net, which is described below. The attached file nn.txt contains the samples. Each sample, corresponding to a row in the file, gives the three values (x_1^(i), x_2^(i), y_i).
Your neural network should consist of three layers, as discussed in class. The input layer should contain nodes X_1 and X_2, which will be the two coordinates of each sample x^(i) (i.e. X_1 = x_1^(i) and X_2 = x_2^(i)). The middle (hidden) layer should contain m nodes, Z_j for j = 1, 2, . . . , m, and the output layer consists of the nodes T_1 and T_2. Then the probability that the class of x^(i) is 1 is given by

    P(y = 1 | x^(i), α) = exp[T_1] / (exp[T_1] + exp[T_2]),   (1)
where α is the vector of parameters of the neural net and T_1, T_2 are computed using the neural net with input x^(i). To repeat what we mentioned in class, each Z_j is parameterized as follows:
    Z_j = σ(β_0^(j) + β^(j) · x),   (2)

where x = (X_1, X_2), β_0^(j) ∈ R, β^(j) ∈ R^2, and σ(w) = 1/(1 + exp(−w)). The node T_j is parameterized as follows:

    T_j = σ(γ_0^(j) + γ^(j) · z),   (3)

where z = (Z_1, Z_2, . . . , Z_m), γ_0^(j) ∈ R, γ^(j) ∈ R^m. Then α is the concatenation of all the parameters: β_0^(j), β^(j) for j = 1, 2, . . . , m and γ_0^(j), γ^(j) for j = 1, 2.
(a) Visualize the dataset by plotting it with different colors for the two classes of y.
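For instance, a minimal sketch in R, assuming nn.txt is whitespace-separated with columns (x1, x2, y) and no header (adjust the read.table call if your file differs):

    # read the samples; column 3 holds the class label y
    data <- read.table("nn.txt")
    # color each point by its class
    plot(data[, 1], data[, 2],
         col = ifelse(data[, 3] == 1, "red", "blue"),
         pch = 19, xlab = "x1", ylab = "x2")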

(b) What is the dimension of α in terms of m?
(c) Write a function NN(x, α, m) which takes a sample x ∈ R^2 and a choice for α and returns the neural net estimate of P(y = 1 | x, α). (Hint: It may be helpful to write functions such as get_beta(alpha, i), which given α and i returns β^(i), and get_beta_0(alpha, i), which given α and i returns β_0^(i). Using such functions will greatly simplify your code.)
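One possible sketch in R. The layout of α used below is an assumption of this sketch, not something the problem fixes: the m hidden-node blocks (β_0^(j), β^(j)) of length 3 each come first, followed by the two output-node blocks (γ_0^(j), γ^(j)) of length m + 1 each. Any layout works as long as the accessor functions agree with it.

    sigma <- function(w) 1 / (1 + exp(-w))

    # given alpha and j, return beta_0^(j) (a scalar)
    get_beta_0 <- function(alpha, j) alpha[3*(j - 1) + 1]

    # given alpha and j, return beta^(j) (a vector in R^2)
    get_beta <- function(alpha, j) alpha[(3*(j - 1) + 2):(3*(j - 1) + 3)]

    # given alpha, j, and m, return gamma_0^(j) (a scalar)
    get_gamma_0 <- function(alpha, j, m) alpha[3*m + (m + 1)*(j - 1) + 1]

    # given alpha, j, and m, return gamma^(j) (a vector in R^m)
    get_gamma <- function(alpha, j, m) {
      start <- 3*m + (m + 1)*(j - 1) + 2
      alpha[start:(start + m - 1)]
    }

    NN <- function(x, alpha, m) {
      # hidden layer: Z_j = sigma(beta_0^(j) + beta^(j) . x)
      z <- sapply(1:m, function(j)
        sigma(get_beta_0(alpha, j) + sum(get_beta(alpha, j) * x)))
      # output layer: T_j = sigma(gamma_0^(j) + gamma^(j) . z)
      t1 <- sigma(get_gamma_0(alpha, 1, m) + sum(get_gamma(alpha, 1, m) * z))
      t2 <- sigma(get_gamma_0(alpha, 2, m) + sum(get_gamma(alpha, 2, m) * z))
      # equation (1): softmax over the two output nodes
      exp(t1) / (exp(t1) + exp(t2))
    }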
(d) Explain why the log likelihood function log L(α) for the neural net is given by

    log L(α) = Σ_{i=1}^{N} [ y_i log( exp[T_1] / (exp[T_1] + exp[T_2]) )
                             + (1 − y_i) log( 1 − exp[T_1] / (exp[T_1] + exp[T_2]) ) ],   (4)

where T_1, T_2 are computed by the neural net with input x^(i). Write a function that computes log L(α) (you will need to pass the data to the function).
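A minimal sketch of such a function, assuming NN from (c) and that data holds one sample (x_1^(i), x_2^(i), y_i) per row:

    logL <- function(alpha, data, m) {
      total <- 0
      for (i in 1:nrow(data)) {
        # predicted probability that y = 1 for sample i
        p <- NN(c(data[i, 1], data[i, 2]), alpha, m)
        # add this sample's term from equation (4)
        total <- total + data[i, 3] * log(p) + (1 - data[i, 3]) * log(1 - p)
      }
      total
    }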
(e) Write a function that uses finite differences to compute the stochastic gradient of log L(α) based on a single sample or a small number of samples (you can pick whether to use one sample or a few).
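One way to sketch this in R is to perturb one coordinate of α at a time with a central difference, evaluated on a single randomly chosen sample; the step size h below is an arbitrary choice:

    stoch_grad <- function(alpha, data, m, h = 1e-4) {
      # pick one sample at random; the same idea works with a small batch
      i <- sample(nrow(data), 1)
      row <- data[i, , drop = FALSE]
      g <- numeric(length(alpha))
      for (k in seq_along(alpha)) {
        e <- numeric(length(alpha))
        e[k] <- h
        # central difference approximation to the k-th partial derivative
        g[k] <- (logL(alpha + e, row, m) - logL(alpha - e, row, m)) / (2 * h)
      }
      g
    }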
(f) Set m = 4 and train your neural net by maximizing log L(α) using stochastic steepest ascent.
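A sketch of the training loop, reusing logL and stoch_grad from above; the initialization, step size, and iteration count are arbitrary choices that you will likely need to tune:

    m <- 4
    set.seed(1)
    # dim(alpha) follows from part (b): 3 parameters per hidden node,
    # m + 1 per output node
    alpha <- runif(3*m + 2*(m + 1), -0.1, 0.1)
    step <- 0.1
    for (iter in 1:50000) {
      # ascend: step in the direction of the stochastic gradient
      alpha <- alpha + step * stoch_grad(alpha, data, m)
    }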
(g) Remember that a classifier in this case is a function F(x) : R^2 → {0, 1}, where x ∈ R^2. Once you choose α by computing the maximum likelihood in (f), choose a cut-off p ∈ [0, 1]. Set F by
    F(x) = 1  if P(y = 1 | x, α) > p,
           0  if P(y = 1 | x, α) ≤ p.   (5)
Try different values of p and, for each p, visualize your classifier. You can visualize your classifier in any way you like, but here is one way. You can generate random coordinates within [−2, 2], which is roughly where all the data points lie, using
    x1 <- 4*runif(10000) - 2
    x2 <- 4*runif(10000) - 2

Then plot each pair (x1[i], x2[i]) and vary the color of the point depending on whether the classifier predicts 1 or 0.
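For instance, continuing the sketch above with NN from (c), the trained α from (f), and a chosen cut-off p (all assumptions of this snippet):

    # classify each random point with the rule in equation (5)
    preds <- sapply(1:10000, function(i) NN(c(x1[i], x2[i]), alpha, m) > p)
    plot(x1, x2, col = ifelse(preds, "red", "blue"), pch = 20)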