Logistic Regression
Logistic Regression
π(x) = e^(α + β1X1 + β2X2 + … + βpXp) / (1 + e^(α + β1X1 + β2X2 + … + βpXp))
Logistic Regression
This is the logistic function – hence the name "logistic regression"
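The logistic function above can be sketched in a few lines of Python (the function name and the coefficient values below are mine, for illustration only):

```python
import math

def logistic(alpha, betas, xs):
    """P(event) from the logistic model: e^LC / (1 + e^LC)."""
    lc = alpha + sum(b * x for b, x in zip(betas, xs))
    return math.exp(lc) / (1 + math.exp(lc))

# The curve is bounded between 0 and 1 and equals 0.5 when the
# linear combination is 0 -- the familiar S ("sigmoid") shape.
p = logistic(0.0, [1.0], [0.0])
```

Because the output is always strictly between 0 and 1, it can be interpreted as a probability, which a straight-line model cannot guarantee.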
Logistic Regression – Model: Odds Are Multiplicative
• ln(odds) = α + β1X1 + β2X2 + … + βpXp
• Odds = e^(α + β1X1 + β2X2 + … + βpXp)
•      = (e^α)(e^(β1X1))(…)(e^(βpXp))
• Or: Odds = constant × exp(constant × X1) × … × exp(constant × Xp)
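The multiplicative structure is just e^(a+b) = e^a · e^b; a quick numeric check (coefficients are hypothetical, chosen only for illustration):

```python
import math

# Hypothetical coefficients and covariate values, for illustration only.
alpha, b1, b2 = -1.0, 0.5, 0.2
x1, x2 = 2.0, 3.0

odds_from_sum = math.exp(alpha + b1 * x1 + b2 * x2)
odds_from_product = math.exp(alpha) * math.exp(b1 * x1) * math.exp(b2 * x2)
# The two agree because e^(a+b) = e^a * e^b: odds are multiplicative.
```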
Logistic Regression – Model: Log(Odds) Are Additive
• ln(odds) = α + β1X1 + β2X2 + … + βpXp
• Or: ln(odds) = constant + (constant × X1) + … + (constant × Xp)
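Additivity on the log-odds scale means a one-unit increase in any X adds its coefficient to ln(odds); a minimal check with made-up values:

```python
import math

# Hypothetical intercept and slope, for illustration only.
alpha, b1 = -1.0, 0.5
x1 = 2.0

log_odds_before = alpha + b1 * x1
log_odds_after = alpha + b1 * (x1 + 1)
# A one-unit increase in X1 adds exactly b1 to ln(odds),
# which multiplies the odds themselves by e^b1.
```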
Logistic Regression – Model: Flexibility
• X (independent) variables can be continuous or categorical
• Interactions can be incorporated
• Coefficients are estimated by maximum likelihood
• Most computer programs implicitly use prior probabilities estimated from the sample
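A categorical X enters the linear predictor through 0/1 dummy coding, and an interaction is just a product term. A sketch with entirely hypothetical coefficients:

```python
import math

# Hypothetical model: continuous age, categorical sex (coded F = 1, M = 0),
# plus an age-by-sex interaction. All coefficients are invented.
alpha, b_age, b_sex, b_inter = -2.0, 0.03, 0.9, -0.01

def prob(age, sex):
    x_sex = 1 if sex == "F" else 0                      # dummy coding
    lc = alpha + b_age * age + b_sex * x_sex + b_inter * age * x_sex
    return math.exp(lc) / (1 + math.exp(lc))

p_f = prob(30, "F")
p_m = prob(30, "M")
```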
Logistic Regression – Model: Generalized Linear Model (GLM)
• Logistic regression is an example of a GLM
• Define Y = outcome = 1 (event) or 0 (not)
• E(Y | X's) = μ = P(Y = 1 | X's)
• Find a function g(μ), called the link function, such that:
g(μ) = a linear function of the X's
• This is called a generalized linear model
• Here we take g(μ) = ln(odds), the logit function
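The logit link and the logistic function are inverses of each other, which is what lets us move between the probability scale and the linear-predictor scale (function names below are mine):

```python
import math

def logit(mu):
    """Link function g(mu) = ln(odds) = ln(mu / (1 - mu))."""
    return math.log(mu / (1 - mu))

def inv_logit(eta):
    """Inverse link: the logistic function."""
    return math.exp(eta) / (1 + math.exp(eta))

mu = 0.3
eta = logit(mu)            # probability scale -> linear-predictor scale
back = inv_logit(eta)      # and back again
```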
Logistic Regression – Model: Estimation
• Model is: g(μ) = α + β1X1 + β2X2 + … + βpXp
• Need to estimate: α, β1, β2, …, βp
• Use an iterative process called iteratively reweighted least squares (IRLS):
1. Start with initial estimates of the parameters
2. Evaluate the score equations (derivatives of the log-likelihood set to 0)
3. Solve the score equations to get new parameter estimates
4. Repeat until convergence.
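For a single predictor, the four steps can be sketched as a Newton-Raphson loop (for logistic regression this is equivalent to IRLS). The toy data and function names are invented for illustration:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson fit of P(Y=1) = sigmoid(a + b*x)."""
    a, b = 0.0, 0.0                                    # 1. initial estimates
    for _ in range(iters):
        ps = [sigmoid(a + b * x) for x in xs]
        # 2. score equations: derivatives of the log-likelihood
        u0 = sum(y - p for y, p in zip(ys, ps))
        u1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # observed information, built from weights w = p(1 - p)
        ws = [p * (1 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # 3. solve the 2x2 system for the Newton step, update estimates
        a += (h11 * u0 - h01 * u1) / det
        b += (h00 * u1 - h01 * u0) / det               # 4. repeat
    return a, b

# Toy data: the event becomes more likely as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
a_hat, b_hat = fit_logistic(xs, ys)
```

At convergence the score equations are satisfied, so the fitted probabilities sum to the observed number of events.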
Logistic Regression – Model: Example – Depression Data Set, Adjusted Risk Ratio
• RR = P(Y=1 | X=1) / P(Y=1 | X=0)
• P(Y=1 | X) = e^LC / (1 + e^LC), where LC = A + B1X1 + …
• Example: Depression, X = sex = 1 if F, age = 30, income = 10 ($10K/year)
• Find the adjusted RR for F vs. M
• LC = -0.676 - 0.021(Age) - 0.037(Income) + 0.929(Sex)
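Plugging the slide's fitted coefficients into the formula gives the adjusted RR for F vs. M at age 30 and income $10K (helper function names are mine):

```python
import math

def p_event(lc):
    """P(Y=1) from the linear combination LC."""
    return math.exp(lc) / (1 + math.exp(lc))

# Linear combination from the slide:
# LC = -0.676 - 0.021(Age) - 0.037(Income) + 0.929(Sex)
def lc(age, income, sex):
    return -0.676 - 0.021 * age - 0.037 * income + 0.929 * sex

p_f = p_event(lc(30, 10, 1))   # female, age 30, income $10K
p_m = p_event(lc(30, 10, 0))   # male, same age and income
rr = p_f / p_m                 # adjusted risk ratio, F vs. M
```

Note the RR compares probabilities directly; it is not the same as the odds ratio e^0.929 that the sex coefficient gives.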
Steps in Multiple Regression
1. State the research hypothesis.
2. State the null hypothesis.
3. Gather the data.
4. Assess each variable separately first (obtain measures of central tendency and dispersion; frequency distributions; graphs); is the variable normally distributed?
5. Assess the relationship of each independent variable, one at a time, with the dependent variable (calculate the correlation coefficient; obtain a scatter plot); are the two variables linearly related?
6. Assess the relationships between all of the independent variables with each other (obtain a correlation coefficient matrix for all the independent variables); are the independent variables too highly correlated with one another?
7. Calculate the regression equation from the data.
8. Calculate and examine appropriate measures of association and tests of statistical significance for each coefficient and for the equation as a whole.
9. Accept or reject the null hypothesis.
10. Reject or accept the research hypothesis.
11. Explain the practical implications of the findings.
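Step 6 above (screening the independent variables for high inter-correlation) can be sketched with a hand-rolled Pearson correlation; the predictor data below are entirely made up:

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

# Hypothetical predictors: x2 is nearly a rescaling of x1; x3 is not.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]
x3 = [5, 1, 4, 2, 3]

r12 = pearson(x1, x2)   # near 1: a warning sign of multicollinearity
r13 = pearson(x1, x3)   # modest: less cause for concern
```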