Feedback on the Individual Assignment
Coherent logic from issues, variables, methods, and recommendations
Clear boundaries for each issue (e.g., reputation, employability, student selection, education quality/personalized program)
Justification of variables (e.g., using references in addition to the case)
Mitigation of ethical issues
Writing (consider using the Smarthinking service or another proofreading tool)
Min: 35.0
Max: 92.0
Median: 70.0
Lecture Outcomes
Lecture Outcomes
The learning outcomes from this week’s lecture are:
Explain the principles of logistic regression (probabilities, odds, log odds)
Discuss how logistic regression can be applied in a business context
Run a logistic regression using SAS VA
Interpret and discuss the outputs produced by a logistic regression model
Logistic Regression
Predicting Categorical Outcomes
Examples of binary outcomes, with possible explanatory variables, include:
Success vs failure of a medical treatment: dosage of medicine administered, patient's age, sex, weight, severity of condition, etc.
High vs low cholesterol level: sex, age, whether a person smokes or not, previous conditions, exercise, etc.
Vote for vs against a political party: age, gender, education level, region, ethnicity, geographical location, etc.
Customer churns vs stays with a company: usage pattern, complaints, social demographics, etc.
Why can’t we use Linear Regression?
Predicting binary/categorical outcomes belongs to classification problems.
Linear regression cannot represent the probability of an event occurring, or the likelihood that an object or person belongs to a specific group: a fitted straight line can produce predicted values below 0 or above 1, which cannot be interpreted as probabilities.
https://www.graphpad.com/guides/prism/8/curve-fitting/reg_simple_logistic_and_linear_difference.htm
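As a quick illustration outside the lecture's SAS VA workflow, here is a minimal Python sketch (scikit-learn, made-up data) showing that a straight line fitted to 0/1 outcomes predicts values outside [0, 1], while logistic regression always returns a probability:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up binary outcome (e.g., churned = 1, stayed = 0) with one predictor.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A straight line fitted to 0/1 data can predict values outside [0, 1]...
linear = LinearRegression().fit(X, y)
print(linear.predict(np.array([[0], [9]])))      # below 0 and above 1

# ...whereas logistic regression always returns a probability in [0, 1].
logistic = LogisticRegression().fit(X, y)
print(logistic.predict_proba(np.array([[0], [9]]))[:, 1])
```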
Principles of Logistic Regression
In logistic regression:
The model is expressed in terms of log odds (the logit)
We predict the probability of the DV occurring
We can use a threshold rule to classify the DV as either 1 (e.g., where the predicted probability is greater than 0.5) or 0 (where it is less than or equal to 0.5), as in the sketch below
The logit provides the basis for a linear relationship between the IVs and a binary DV
To see how this works we need to look at probabilities, odds, and logarithms.
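A minimal sketch of the threshold rule mentioned above (the 0.5 cut-off is just the common default; other thresholds can be chosen):

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Threshold rule: label 1 where p > threshold, else 0."""
    return (np.asarray(probabilities) > threshold).astype(int)

print(classify([0.12, 0.50, 0.51, 0.93]))   # -> [0 0 1 1]
```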
Probabilities
In binary logistic regression each case's Y variable (DV) has a probability between 0 and 1 that depends on the values of its X variables (IVs), such that:

P(Y = 1) + P(Y = 0) = 1

That is, the probabilities of the event occurring and not occurring must sum to one.
Let’s look at an example:
Assume that out of 15,000 students, 12,000 wish to go to university; the remaining 3,000 do not wish to attend university
The probability a student will choose to go to university is 12,000/15,000 = 0.80
The probability that a student will not go to university is 1 – 0.80 = 0.20 (which is equivalent to 3,000/15,000)
Odds
Probabilities can be restated as odds. The odds that a student will choose to go to university are 12,000/3,000 = 4:1. If we choose a student at random, it is 4 times more likely that this student wishes to go to university than not.
Odds are calculated as:

odds = p / (1 – p)

Thus, the odds of a student wishing to go to university are 0.8/(1 – 0.8) = 0.8/0.2 = 4.
Logarithms and Logit
Once the probability is converted to odds, we can use a logarithm to calculate the ‘logit’ or log odds. Hence log odds is the value we get when we take the log of the odds.
It is more appropriate to use log odds than probabilities because log odds do not have the floor of 0 and the ceiling of 1 that constrain probabilities.
By taking the log of the odds we get a scale that ranges from negative to positive infinity and is centred on 0.
This gives a linear relationship between the log odds of Y and the X predictor variables, allowing us to apply a form of regression model.
Logit
The logit is the natural logarithm of the odds, which in turn are derived from the probability:

logit(p) = ln(odds) = ln(p / (1 – p))

The logit can now be expressed as a familiar regression equation, with a linear relationship between the DV (the logit) and the IVs:

logit(p) = b0 + b1X1 + b2X2 + … + bkXk
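To make the chain from probability to odds to logit concrete, here is a small Python sketch reusing the university example from the earlier slide:

```python
import numpy as np

p = 12_000 / 15_000        # probability of choosing university = 0.80
odds = p / (1 - p)         # 0.80 / 0.20 = 4.0, i.e. odds of 4:1
logit = np.log(odds)       # natural log of the odds ~ 1.386

# Inverting the logit recovers the probability; this is how a model's
# linear predictor is converted back into a probability between 0 and 1.
p_back = 1 / (1 + np.exp(-logit))
print(round(odds, 1), round(float(logit), 3), round(float(p_back), 2))  # 4.0 1.386 0.8
```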
Logistic Function
FIGURE 7.3, Vidgen et al. 2019
Linear vs Logistic Regression
Image sourced from https://towardsdatascience.com/introduction-to-logistic-regression-66248243c148
Parameter Estimation
In multiple linear regression parameters are usually estimated using least squares
In logistic regression maximum likelihood estimation (MLE) is used to estimate parameters
MLE yields values for the unknown parameters that maximize the probability of obtaining the observed set of data
That is, the chosen parameter estimates will be those that, when the values of the predictor variables are entered into the model, produce predicted values of Y closest to the observed values
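As an illustration outside the lecture's SAS VA workflow (and with simulated data), the statsmodels package fits logistic regression by maximum likelihood and reports the maximized log-likelihood:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: one predictor x and a binary outcome y whose probability
# follows a logistic curve with intercept 0.5 and slope 1.2.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = (rng.random(200) < p).astype(int)

# statsmodels maximizes the likelihood (Newton's method by default) and
# reports the estimates together with the maximized log-likelihood.
model = sm.Logit(y, sm.add_constant(x)).fit()
print(model.params)   # estimates chosen to maximize the likelihood
print(model.llf)      # the maximized log-likelihood
```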
Sample Size
Some sample size recommendations state that:
The sample size should be greater than 400
If we want to train and then test our model we should have a sample size of at least 800
The minimum sample size would be 20 times the number of independent/explanatory variables
Each group in the DV needs at least 10 observations per estimated parameter (IV)
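For example, under these rules a model with five explanatory variables would need at least 20 × 5 = 100 cases overall, and at least 10 × 5 = 50 observations in each group of the DV.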
Model Evaluation
Interpreting Results
Model Assessment
Three forms of assessment are available:
Misclassification – assesses predictive accuracy of the model
Lift – assesses effectiveness of the model
Receiver-operating characteristic (ROC) – assesses classification accuracy
Misclassification
Misclassification examines the number of correct and incorrect predictions for two categories (known as positive and negative).
True positives (TP) = number of correctly predicted positive cases
True negatives (TN) = number of correctly predicted negative cases
False positives (FP) = number of incorrectly predicted positive cases
False negatives (FN) = number of incorrectly predicted negative cases
Measures Derived from the Confusion Matrix
Positive predictive value (precision) = TP/(TP + FP)
Negative predictive value = TN/(TN + FN)
True positive rate (sensitivity) = TP/(TP + FN)
True negative rate (specificity) = TN/(TN + FP)
False positive rate (fallout) = FP/(TN + FP) (or 1 – specificity)
Accuracy = (TP + TN)/total number of cases
F1 = 2TP/(2TP + FP + FN), or equivalently 2 × (precision × sensitivity)/(precision + sensitivity)
F1 is the harmonic mean of precision and sensitivity, balancing the two measures
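A small Python sketch that computes all of these measures from the four counts (the counts shown are hypothetical):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive the standard measures from the four confusion-matrix counts."""
    precision   = tp / (tp + fp)      # positive predictive value
    npv         = tn / (tn + fn)      # negative predictive value
    sensitivity = tp / (tp + fn)      # true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    fallout     = fp / (fp + tn)      # false positive rate = 1 - specificity
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    f1          = 2 * tp / (2 * tp + fp + fn)
    return dict(precision=precision, npv=npv, sensitivity=sensitivity,
                specificity=specificity, fallout=fallout,
                accuracy=accuracy, f1=f1)

# Hypothetical counts for illustration only:
print(confusion_metrics(tp=80, tn=90, fp=10, fn=20))
```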
Confusion Matrix
https://scaryscientist.blogspot.com/2016/03/confusion-matrix.html?view=classic
Confusion Matrix
TABLE 7.3, Vidgen et al. 2019
Lift
Receiver Operating Characteristic Curve
ROC graphs are:
Useful for organising classifiers and visualising their performance
Two-dimensional graphs in which the true positive rate is plotted on the Y-axis and the false positive rate is plotted on the X-axis
Depicting the relative trade-offs between benefits (true positives) and costs (false positives)
Informally, one point in ROC space is better than another if it is to the upper-left.
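As a sketch of how these points are produced (scikit-learn, made-up labels and scores; not part of the SAS VA workflow):

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and model-predicted probabilities.
y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3])

# Each point on the curve is (false positive rate, true positive rate) at
# one classification threshold; points nearer the upper-left are better.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr.round(2), tpr.round(2))))
print("AUC:", roc_auc_score(y_true, y_score))
```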
ROC
Residuals
Influence Plot
Parameter Estimates
Parameter Estimates
Sex (female)
Indicating that, with everything else held constant, the odds of survival for a female are more than 13 times those for a male.
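The "more than 13 times" figure is an odds ratio, obtained by exponentiating the estimated coefficient. A sketch with a hypothetical coefficient value (the actual estimate comes from the SAS VA output):

```python
import numpy as np

# Hypothetical coefficient for Sex (female); the actual estimate comes
# from the fitted model's output in the lecture.
b_female = 2.6
odds_ratio = np.exp(b_female)
print(round(float(odds_ratio), 1))   # ~13.5: female odds of survival vs male
```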
Fit Statistics
AIC: Akaike's Information Criterion
BIC: Bayesian Information Criterion
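Both criteria penalize model complexity: AIC = 2k − 2 ln(L) and BIC = k ln(n) − 2 ln(L), where L is the maximized likelihood, k the number of estimated parameters, and n the sample size; lower values indicate a better trade-off between fit and complexity. A minimal sketch with hypothetical values:

```python
import numpy as np

def aic(log_likelihood, k):
    """Akaike's Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L);
    penalizes extra parameters more heavily than AIC for larger n."""
    return k * np.log(n) - 2 * log_likelihood

# Hypothetical fitted model: log-likelihood of -350, 6 parameters, n = 800.
print(aic(-350, 6), bic(-350, 6, 800))   # 712 and ~740.1
```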
Model Prediction
Making Predictions
Let’s predict the probability of survival for the following cases:
Miss Maioni, a 16-year-old upper-class female passenger who paid 86.50 for her ticket, embarked at Southampton, and has no siblings or parents on board.
Mr Rouse, a 50-year-old lower-class male passenger who paid 8.05 for his ticket, embarked at Southampton, and also has no siblings or parents aboard.
We simply input these values into the model we wrote out on slide 29, as sketched below.
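A minimal sketch of this calculation; every coefficient value below is a hypothetical placeholder, since the actual estimates come from the fitted Titanic model in the lecture:

```python
import numpy as np

def predict_probability(coefficients, values):
    """Compute the logit as a linear combination, then invert it to get p."""
    logit = coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in values.items())
    return 1 / (1 + np.exp(-logit))

# Every coefficient below is a hypothetical placeholder; the actual
# estimates come from the fitted Titanic model in the lecture.
coefficients = {"intercept": 1.2, "age": -0.04, "sex_female": 2.6,
                "class_upper": 1.5, "fare": 0.003}
maioni = {"age": 16, "sex_female": 1, "class_upper": 1, "fare": 86.50}
rouse  = {"age": 50, "sex_female": 0, "class_upper": 0, "fare": 8.05}
print(round(float(predict_probability(coefficients, maioni)), 2))
print(round(float(predict_probability(coefficients, rouse)), 2))
```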
Making Predictions
TABLE 7.6, Vidgen et al. 2019: predicted survival probabilities for Maioni and Rouse
Summary
Summary of Logistic Regression
Logistic regression is a powerful technique for predicting categorical outcomes; however, the models are not deterministic.
For example, a model predicting customer churn does not tell us that a given customer will or will not churn; it gives us the probability that the customer will churn.
Thus, with logistic regression we think of outcomes as probabilities rather than the continuous values predicted by multiple linear regression.