
Introduction to regression models
Bayesian Statistics (Statistics 4224/5224), Spring 2021
March 25, 2021

The multiple linear regression model
The multiple linear regression model with response (also called the dependent variable or outcome) yi and covariates (also called independent variables, predictors or inputs) xi1,…,xik is
yi|β, σ2 ∼ indep Normal(β1xi1 + · · · + βkxik, σ2) for i = 1,…,n.
We assume throughout that xi1 = 1 for all i so that β1 is the intercept (mean response if all other covariates are zero).
The coefficient βj, for j = 2, . . . , k, is the slope associated with the (j − 1)st covariate.
The model actually has k+1 parameters (k regression coefficients and variance σ2).
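For concreteness, here is a minimal sketch of simulating one data set from this model (Python/NumPy; the sizes and parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and true parameter values (invented for this sketch)
n, k = 100, 3
beta = np.array([5.0, 1.2, -0.7])   # beta_1 is the intercept
sigma2 = 2.0

# Covariates: x_{i1} = 1 for every i, the remaining columns are arbitrary here
x = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# y_i | beta, sigma^2 ~ indep Normal(beta_1 x_{i1} + ... + beta_k x_{ik}, sigma^2)
y = np.array([rng.normal(loc=x[i] @ beta, scale=np.sqrt(sigma2))
              for i in range(n)])
```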

Standardization
While not strictly required, we will, for the remainder of this discussion, assume that all k − 1 covariates (excluding the intercept term) have been standardized to have mean zero and variance one.
That is, if the original covariate j − 1, x̃ij, had sample mean x̄j and sample standard deviation sj, then we set xij = (x̃ij − x̄j)/sj.
After standardization, the slope βj is interpreted as the change in the mean response corresponding to an increase of one standard deviation unit (sj) in the original covariate.
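As a concrete sketch of this transformation (NumPy; the raw covariate matrix below is invented for illustration):

```python
import numpy as np

def standardize(X_raw):
    """Center each covariate at its sample mean and scale by its sample
    standard deviation: x_ij = (xtilde_ij - xbar_j) / s_j."""
    xbar = X_raw.mean(axis=0)
    s = X_raw.std(axis=0, ddof=1)   # sample standard deviation
    return (X_raw - xbar) / s

# Hypothetical raw covariates: n rows, k - 1 columns, no intercept column yet
rng = np.random.default_rng(2)
Xtilde = rng.gamma(shape=2.0, scale=3.0, size=(100, 6))

Z = standardize(Xtilde)
# Each column of Z now has sample mean 0 and sample variance 1
```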

Matrix notation
Let y = (y1, . . . , yn)T be the response vector of length n, and the design matrix X be the n × k matrix with column j having elements x1j,…,xnj (the first column is equal to the vector of ones for the intercept).
The linear regression model is then
y|β, σ2 ∼ Normaln(Xβ, σ2I)
where I is the n × n identity matrix, with ones on the diagonal (so all responses have variance σ2) and zeros off the diagonal (so the responses are uncorrelated).
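The vectorized statement of the model can be simulated in a single call. A minimal sketch (NumPy; X, β, and σ2 are illustrative stand-ins), equivalent to n independent univariate draws:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative design matrix and parameters (not from the example data)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([2.0, 1.0, -0.5])
sigma2 = 0.9

# y | beta, sigma^2 ~ Normal_n(X beta, sigma^2 I): one draw of the whole
# response vector; the diagonal covariance makes the entries independent
y = rng.multivariate_normal(mean=X @ beta, cov=sigma2 * np.eye(n))
```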

The usual least squares estimator in matrix notation is

β̂ = (XTX)−1XTy .

Note that the least squares estimator is unique only if XTX is full rank (i.e., k < n and none of X's columns are redundant).

Assuming X is full rank, the sampling distribution used to construct frequentist confidence intervals and p-values is

β̂|β, σ2 ∼ Normalk(β, σ2(XTX)−1) .

The standard noninformative prior

A convenient noninformative prior satisfies p(β|σ2) ∝ 1.

Assuming XTX has full rank, the posterior is

β|σ2, y ∼ Normalk(β̂, σ2(XTX)−1) .

The posterior mean is the least squares solution, and the posterior covariance matrix is the covariance matrix of the sampling distribution of the least squares estimator.

Therefore, the posterior credible intervals from this model will numerically match the confidence intervals from a least squares analysis with known error variance.

A convenient noninformative prior for the variance is p(σ2) ∝ (σ2)−1.

The marginal posterior distribution of σ2 can be seen to have a scaled inverse-χ2 form,

σ2|y ∼ Inv-χ2(n − k, s2) , where s2 = (y − Xβ̂)T(y − Xβ̂)/(n − k) .

Therefore, both the conditional posterior of the regression coefficients and the marginal posterior of the variance belong to known families of distributions, and can be sampled from directly, without need for MCMC.

Predictions

One use of linear regression is to make a prediction for a new set of covariates x̃ = (x̃1, . . . , x̃k). Given the model parameters, the distribution of the new response is

ỹ|β, σ2 ∼ Normal(β1x̃1 + · · · + βkx̃k, σ2) .

To properly account for parametric uncertainty, we should use the posterior predictive distribution, which averages over the uncertainty in β and σ2.

For each of the s = 1, . . . , S samples of the parameters, we simulate

ỹ(s) ∼ Normal(β1(s)x̃1 + · · · + βk(s)x̃k, σ2(s)) ,

and use the S draws ỹ(1), . . . , ỹ(S) to approximate the posterior predictive distribution.

Example

The file azdiabetes contains data on health-related variables of a population of n = 532 women. Here we will model the conditional distribution of glucose level (glu) as a linear combination of the other k − 1 = 6 variables, excluding the variable diabetes.

Courseworks → Files → Examples → Example14
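Putting the pieces above together, here is a minimal sketch of the direct Monte Carlo sampler described on these slides (Python/NumPy; the function name, simulated data, and x_new are invented for illustration and are not the Example14 code):

```python
import numpy as np

def lm_noninformative_draws(X, y, x_new, S=5000, seed=0):
    """Direct Monte Carlo from the posterior under p(beta, sigma^2) ∝ (sigma^2)^(-1).

    Returns S draws of beta, sigma^2, and of the posterior predictive
    response at the covariate vector x_new.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y          # least squares estimate
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - k)          # s^2 = (y - X beta_hat)'(y - X beta_hat)/(n - k)

    # sigma^2 | y ~ Inv-chi^2(n - k, s^2): draw as nu * s^2 / chi^2_nu
    sigma2 = (n - k) * s2 / rng.chisquare(df=n - k, size=S)

    # beta | sigma^2, y ~ Normal_k(beta_hat, sigma^2 (X'X)^{-1}),
    # sampled via the Cholesky factor of (X'X)^{-1}
    L = np.linalg.cholesky(XtX_inv)
    z = rng.standard_normal(size=(S, k))
    beta = beta_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)

    # ytilde^(s) ~ Normal(x_new' beta^(s), sigma^2(s)): predictive draws
    y_new = beta @ x_new + rng.normal(scale=np.sqrt(sigma2))
    return beta, sigma2, y_new

# Usage on simulated data (the azdiabetes analysis itself is in Example14)
rng = np.random.default_rng(1)
n, k = 200, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3, 0.8]) + rng.normal(scale=1.5, size=n)
beta_s, sigma2_s, ypred = lm_noninformative_draws(X, y, x_new=np.array([1.0, 0.0, 1.0, -1.0]))
print(beta_s.mean(axis=0), np.quantile(ypred, [0.025, 0.975]))
```

Because both the conditional posterior of β and the marginal posterior of σ2 are available in closed form, the S joint draws are independent; no Markov chain, burn-in, or convergence diagnostics are needed.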