
Statistical Models and Computing Methods, Problem Set 1
October 15, 2020. Due 10/29/2020.
Problem 1.
(1) Show that $X \sim N(0, 1)$ is the maximum entropy distribution such that $EX = 0$ and $EX^2 = 1$.
(2) Generalize the result in (1) to the maximum entropy distribution given the first $k$ moments, i.e., $EX^i = m_i$, $i = 1, \ldots, k$.
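A standard route for both parts (a sketch, not the only argument): maximize the differential entropy subject to the moment constraints via Lagrange multipliers, which forces an exponential-family form:
$$\max_{p}\; -\int p(x) \log p(x)\, dx \quad \text{s.t.} \quad \int p(x)\, dx = 1, \quad \int x^i p(x)\, dx = m_i, \; i = 1, \ldots, k,$$
whose stationarity condition gives $p(x) \propto \exp\big(\sum_{i=1}^k \lambda_i x^i\big)$; matching the multipliers $\lambda_i$ to the constraints then identifies the distribution (for part (1), $N(0,1)$).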
Problem 2.
Let $Y_1, \ldots, Y_n$ be a set of independent random variables with the following pdfs: $p(y_i \mid \theta_i) = \exp(y_i b(\theta_i) + c(\theta_i) + d(y_i))$, $i = 1, \ldots, n$.
Let $E(Y_i) = \mu_i(\theta_i)$ and $g(\mu_i) = x_i^T \beta$, where $g$ is the link function and $\beta \in \mathbb{R}^d$ is the vector of model parameters.
(1) Denote $g(\mu_i)$ as $\eta_i$, and let $s$ be the score function of $\beta$. Show that
$$s_j = \sum_{i=1}^{n} \frac{(y_i - \mu_i)\, x_{ij}}{\operatorname{Var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i}, \quad j = 1, \ldots, d.$$
(2) Let $I$ be the Fisher information matrix. Show that
$$I_{jk} = E(s_j s_k) = \sum_{i=1}^{n} \frac{x_{ij}\, x_{ik}}{\operatorname{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^{2}, \qquad \forall\, 1 \le j, k \le d.$$
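One possible route (a sketch of the chain-rule bookkeeping, using only the definitions above): differentiate the log-likelihood contribution $\ell_i = y_i b(\theta_i) + c(\theta_i) + d(y_i)$ through the chain $\theta_i \to \mu_i \to \eta_i \to \beta$:
$$\frac{\partial \ell_i}{\partial \beta_j} = \frac{\partial \ell_i}{\partial \theta_i}\, \frac{\partial \theta_i}{\partial \mu_i}\, \frac{\partial \mu_i}{\partial \eta_i}\, \frac{\partial \eta_i}{\partial \beta_j}, \qquad \frac{\partial \ell_i}{\partial \theta_i} = (y_i - \mu_i)\, b'(\theta_i), \qquad \frac{\partial \eta_i}{\partial \beta_j} = x_{ij}.$$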
Problem 3.
Use the following code to generate the covariate matrix X:
import numpy as np
np.random.seed(1234)
n = 100
X = np.random.normal(size=(n,2))
(1) Generate $n = 100$ observations $Y$ following the logistic regression model with true parameter $\beta_0 = (2, 1)$.
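A minimal sketch of one way to generate these observations (under the logistic model $P(Y_i = 1 \mid x_i) = 1/(1 + e^{-x_i^T \beta_0})$; the variable names are illustrative):

import numpy as np
np.random.seed(1234)
n = 100
X = np.random.normal(size=(n,2))
beta_0 = np.array([2., 1.])
p = 1. / (1. + np.exp(-X @ beta_0))  # P(Y_i = 1 | x_i) under the logistic model
Y = np.random.binomial(1, p)         # one Bernoulli(p_i) draw per observation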
(2) Find the MLE using the iteratively reweighted least squares (IRLS) algorithm.
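One possible IRLS implementation (a sketch continuing from the snippet above, with numpy, X, and Y in scope; the function name and defaults are illustrative, not a prescribed solution):

def irls(X, Y, n_iter=50, tol=1e-10):
    # Iteratively reweighted least squares (Newton's method) for logistic regression
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1. / (1. + np.exp(-X @ beta))
        W = p * (1. - p)                        # diagonal of the weight matrix
        # Newton/IRLS step: (X^T W X)^{-1} X^T (Y - p)
        step = np.linalg.solve((X * W[:, None]).T @ X, X.T @ (Y - p))
        beta = beta + step
        if np.linalg.norm(step) < tol:          # stop once the update is negligible
            break
    return beta

beta_hat = irls(X, Y)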
(3) Repeat (1) and (2) for 100 instances. Compare the MLEs with the asymptotic distribution $\hat{\beta} \sim N(\beta_0, I^{-1}(\beta_0))$. Present your result as a scatter plot of the MLEs overlaid with contours of the pdf of the asymptotic distribution.
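A sketch of one way to build the plot, holding X fixed and resampling Y for each instance (one reading of "100 instances"); it reuses `irls` and `p` from the sketches above and takes the logistic-regression Fisher information as $I(\beta_0) = X^T W X$ with $W = \operatorname{diag}(p_i(1-p_i))$:

import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mles = np.array([irls(X, np.random.binomial(1, p)) for _ in range(100)])

I_fisher = (X * (p * (1. - p))[:, None]).T @ X        # Fisher information at beta_0
rv = multivariate_normal(beta_0, np.linalg.inv(I_fisher))

g1, g2 = np.meshgrid(np.linspace(mles[:,0].min(), mles[:,0].max(), 200),
                     np.linspace(mles[:,1].min(), mles[:,1].max(), 200))
plt.contour(g1, g2, rv.pdf(np.dstack([g1, g2])))      # contours of N(beta_0, I^{-1})
plt.scatter(mles[:,0], mles[:,1], s=10)               # the 100 MLEs
plt.xlabel('beta_1'); plt.ylabel('beta_2')
plt.show()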

(4) Try the same for $n = 10000$. Does the asymptotic distribution provide a better fit to the MLEs? You can use the empirical covariance matrix of the MLEs for the comparison.
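For the covariance comparison, one option, reusing `mles` and `I_fisher` from the previous sketch:

emp_cov = np.cov(mles, rowvar=False)     # empirical covariance of the 100 MLEs
asy_cov = np.linalg.inv(I_fisher)        # asymptotic covariance I^{-1}(beta_0)
print(emp_cov)
print(asy_cov)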
Problem 4.
Consider the probit regression model
$$Y \mid X, \beta \sim \mathrm{Bernoulli}(p), \qquad p = \Phi(X\beta),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution. As in Problem 3, generate a large covariate matrix X with 100000 instances and 100 features, and a response Y with true parameter $\beta_0$:
import numpy as np
np.random.seed(1234)
n, d = 100000, 100
X = np.random.normal(size=(n,d))
beta_0 = np.random.normal(size=d)
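A sketch of generating Y under the probit model and of the gradient that the optimizers below can share (the clipping guard `eps` and the function name are illustrative):

from scipy.stats import norm

Y = np.random.binomial(1, norm.cdf(X @ beta_0))   # P(Y=1|x) = Phi(x^T beta_0)

def neg_loglik_grad(beta, X, Y, eps=1e-12):
    # Gradient of the negative probit log-likelihood
    eta = X @ beta
    Phi = np.clip(norm.cdf(eta), eps, 1. - eps)   # guard against division by zero
    w = norm.pdf(eta) * (Y - Phi) / (Phi * (1. - Phi))
    return -X.T @ w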
(1) Compare gradient descent and Nesterov's accelerated gradient descent.
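A possible skeleton for the comparison, reusing `neg_loglik_grad`, X, Y, and d from the sketch above (step sizes and iteration counts are illustrative and need tuning):

def gd(grad, beta, lr, n_iter=200):
    beta = beta.copy()
    for _ in range(n_iter):
        beta -= lr * grad(beta)                            # plain gradient step
    return beta

def nesterov(grad, beta, lr, momentum=0.9, n_iter=200):
    beta = beta.copy()
    v = np.zeros_like(beta)
    for _ in range(n_iter):
        v = momentum * v - lr * grad(beta + momentum * v)  # gradient at the look-ahead point
        beta += v
    return beta

full_grad = lambda b: neg_loglik_grad(b, X, Y) / n         # average over all observations
beta_gd  = gd(full_grad, np.zeros(d), lr=0.1)
beta_nag = nesterov(full_grad, np.zeros(d), lr=0.1)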
(2) Compare vanilla stochastic gradient descent with different adaptive stochastic gradient descent methods, including AdaGrad, RMSprop, and Adam. Use minibatch sizes 32, 64, and 128.
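A minimal minibatch loop with an Adam update as one example (hyperparameters are illustrative). Vanilla SGD keeps only `beta -= lr * g`; AdaGrad replaces the decayed second moment with a running sum `v += g * g`; RMSprop keeps the decayed `v` but drops the first-moment term:

def sgd_adam(X, Y, batch=64, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, n_epochs=5):
    n, d = X.shape
    beta = np.zeros(d)
    m, v, t = np.zeros(d), np.zeros(d), 0
    for _ in range(n_epochs):
        order = np.random.permutation(n)                   # shuffle each epoch
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            g = neg_loglik_grad(beta, X[idx], Y[idx]) / len(idx)  # minibatch gradient
            t += 1
            m = b1 * m + (1. - b1) * g                     # first-moment estimate
            v = b2 * v + (1. - b2) * g * g                 # second-moment estimate
            beta -= lr * (m / (1. - b1**t)) / (np.sqrt(v / (1. - b2**t)) + eps)
    return beta

for batch in (32, 64, 128):
    beta_hat = sgd_adam(X, Y, batch=batch)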
(3) Bonus question. Generate a random mask matrix M as follows and use it to sparsify the covariate matrix X.
np.random.seed(1234)
sparse_rate = 0.3
M = np.random.uniform(size=(n,d)) < sparse_rate
X[M] = 0.

Repeat your experiments in (2), and compare with the results for the full covariate matrix.