
ISOM3360 Data Mining for Business Analytics, Session 9
Logistic Regression
Instructor: Department of ISOM Spring 2022


A Few Questions to Ask Yourself
What is linear regression? What business applications can it be used for?
What is simple vs. multiple linear regression?
What are considered as the right parameters/coefficients for linear regression?
How to interpret the coefficients of linear regression?
What is LASSO regression? What can it be used for?

Commonly Used Induction Algorithms
Logistic regression is a classification model!

The Term “Regression” in Data Mining
In data mining, the term “regression” simply means predicting a numeric attribute (e.g., house price, spending amount, …).
It is very different from the meaning of “regression” in statistics. Don’t mix them up.
Decision trees can be used to do regression -> regression trees.
Not every model with “regression” in name performs regression. Logistic regression is a classification model!

Classification Problems: Revisit
Churn in cellular services: Stay / Leave? Email: Spam / Not Spam?
Online Transactions: Fraudulent (Yes / No)?
Tumor: Malignant / Benign ?
0: “Negative Class” (e.g., benign tumor) 1: “Positive Class” (e.g., malignant tumor)

Linear Regression (Recap)
Regression task: predict house price from square feet.
Classification task: predict whether a tumor is malignant from tumor size.

Transformation

Logistic Function
Replace z with the linear regression function
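A minimal sketch of the logistic (sigmoid) function in Python: it maps any real-valued z to a probability in (0, 1), and replacing z with a linear function of x gives the logistic regression prediction.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, theta0, theta1):
    """Replace z with the linear regression function theta0 + theta1 * x."""
    return sigmoid(theta0 + theta1 * x)

# The sigmoid is 0.5 at z = 0 and approaches 0 / 1 in the tails.
print(sigmoid(0))    # 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```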

What does Logistic Regression Do?
Instead of directly predicting the target variable value Y=1 or Y=0, we predict the probability (likelihood) of Y=1, P(Y=1).
Given P(Y=1), the probability (likelihood) of Y=0 is 1 − P(Y=1).
The logistic regression model uses the predictor variables, which can be categorical or continuous, to predict the probability of the target variable.

Interpretation of Output
P(y = 1 | x; θ): “probability that y = 1, given x, parameterized by θ”.
Example: cancer diagnosis from tumor size. If x is the size of the tumor and the model outputs P(y = 1 | x; θ) = 0.7, tell the patient there is a 70% chance of the tumor being malignant.
Note that P(y = 0 | x; θ) + P(y = 1 | x; θ) = 1. Therefore P(y = 0 | x; θ) = 1 − P(y = 1 | x; θ).

Logistic Regression
Probability that y = 1, given features x, parameterized by θ:
P(y = 1 | x; θ) = 1 / (1 + e^−(θ₀ + θ₁x₁ + … + θₙxₙ))
A computer program finds the θ that best “fits” the training data (maximum likelihood estimation).
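As a sketch of what that fitting does under the hood, the toy implementation below runs gradient ascent on the log-likelihood for a single feature. The data, learning rate, and epoch count are illustrative assumptions; in practice you would call a library routine such as sklearn's LogisticRegression.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Find theta = (theta0, theta1) that maximizes the log-likelihood
    of labels ys given the single feature xs, via gradient ascent."""
    t0, t1 = 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(t0 + t1 * x)  # gradient of the log-likelihood
            g0 += err
            g1 += err * x
        t0 += lr * g0 / len(xs)
        t1 += lr * g1 / len(xs)
    return t0, t1

# Toy data: larger x makes y = 1 more likely.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]
t0, t1 = fit_logistic(xs, ys)
print(sigmoid(t0 + t1 * 5))  # high probability of y = 1 at x = 5
```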

An Example
A group of 20 students spend between 0 and 6 hours studying for an exam. Can we predict whether a student will pass the exam based on the hours spent studying?
0: failed; 1: passed
If a student studies for 2 hours, the estimated probability of passing the exam is 0.26; if a student studies for 4 hours, the estimated probability is 0.87.
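Plugging fitted coefficients into the logistic function reproduces the numbers above. The coefficient values below (θ₀ ≈ −4.08, θ₁ ≈ 1.50) are an assumption, taken from the well-known hours-vs-pass worked example; the slide only quotes the resulting probabilities.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed fitted coefficients for P(pass = 1 | hours), chosen to be
# consistent with the probabilities quoted on the slide.
theta0, theta1 = -4.0777, 1.5046

def p_pass(hours):
    return sigmoid(theta0 + theta1 * hours)

print(round(p_pass(2), 2))  # 0.26
print(round(p_pass(4), 2))  # 0.87
```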

A Detour on Odds
For a given observation, the odds indicate how much more likely the positive event is to occur than the negative event, e.g., P(Head)/P(Tail) for flipping a coin.
Defaulter: 1
Non-Defaulter: 0
What are the odds of having a Defaulter? P(Defaulter) / P(Non-Defaulter)
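A quick numeric sketch of the odds, assuming for illustration that 20% of the customers are defaulters:

```python
# Odds of the positive event = P(Y=1) / P(Y=0) = P(Y=1) / (1 - P(Y=1)).
p_default = 0.2                      # assumed share of defaulters
odds = p_default / (1 - p_default)
print(odds)  # ~0.25: a defaulter is 4x less likely than a non-defaulter

# Fair coin: P(Head) / P(Tail) = 1, i.e. "even odds".
print(0.5 / 0.5)  # 1.0
```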

Logistic Regression: Another Interpretation
Logistic regression assumes that the log odds of P(Y=1) is a linear combination of the coefficients 𝜽 and predictor attributes 𝒙:
log( P(Y=1) / (1 − P(Y=1)) ) = θ₀ + θ₁x₁ + … + θₙxₙ

Interpreting Coefficients
Recall: In Linear Regression, how can we interpret the coefficients?
What about coefficients in Logistic Regression?
One unit of change in 𝑥𝑖 is associated with a 𝜃𝑖 change in the log odds of P(y=1).

Interpreting Coefficients
One unit of change in 𝑥𝑖 is associated with 𝜃𝑖 change in the log odds of P(y=1).
A positive coefficient means that the predictor variable increases the probability of the target variable; a negative coefficient means the opposite.
A large regression coefficient means a strong impact on the probability of the target variable (note: feature normalization is required to compare magnitudes).
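The log-odds interpretation implies that exp(𝜃𝑖) acts as an odds ratio: a one-unit increase in 𝑥𝑖 multiplies the odds of y=1 by exp(𝜃𝑖), regardless of the starting value of 𝑥𝑖. A small check with assumed illustrative coefficients:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    return p / (1 - p)

theta0, theta1 = -1.0, 0.7  # assumed illustrative coefficients

# The odds ratio between x and x+1 equals exp(theta1) for any x,
# because the log odds is linear in x.
for x in [0.0, 2.0, 5.0]:
    ratio = odds(sigmoid(theta0 + theta1 * (x + 1))) / odds(sigmoid(theta0 + theta1 * x))
    print(round(ratio, 6), "vs exp(theta1) =", round(math.exp(theta1), 6))
```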

The Titanic Example
Which variable has the highest impact on a person’s survival?

L1 and L2 Regularization in Logistic Regression
In logistic regression, we can also use regularization to automatically control the model complexity.
- L1 regularization (penalize based on the absolute values of coefficients)
- L2 regularization (penalize based on the squared values of coefficients)
The logistic regression model in Python’s sklearn library supports both L1 and L2 regularization (check the ‘penalty’ parameter when you apply it).
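The two penalty terms themselves are simple to state; the sketch below computes them for an example coefficient vector. (sklearn applies them internally when you set `penalty='l1'` or `penalty='l2'`; note that L1 requires a compatible solver such as `'liblinear'` or `'saga'`.)

```python
theta = [0.5, -2.0, 0.0, 1.5]  # example coefficient vector

l1_penalty = sum(abs(t) for t in theta)  # L1: sum of absolute values
l2_penalty = sum(t * t for t in theta)   # L2: sum of squared values

print(l1_penalty)  # 4.0
print(l2_penalty)  # 6.5
```

Note how L1 leaves the zero coefficient untouched while penalizing the others linearly, which is why it tends to drive small coefficients exactly to zero.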

Multi-Class Classification
So far, we have only discussed binary classification, where the target variable has only two values.
Many real-world data mining tasks are multi-class classification.
- Email tagging: work, friends, family, hobby
- Illness diagnosis: not ill, cold, flu
- Weather: sunny, cloudy, rain, snow
- Image recognition: cat, dog, horse

Multi-Class Classification
Solution: For a k-class classification problem, train k binary classifiers.
One-vs-rest (also known as one-vs-all)

One-vs-Rest
Train a logistic regression classifier for each class i to predict the probability that y = i.
On a new example x, to make a prediction, pick the class i that maximizes the predicted probability.
What is your prediction? (e.g., two of the classifiers output 0.4 and 0.2)
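A minimal one-vs-rest prediction sketch, assuming three already-trained binary classifiers whose outputs on a new example are the (hypothetical) probabilities below:

```python
# Assumed outputs h_i(x) = P(y = i | x) from three binary
# "class i vs. rest" classifiers on a new example x.
class_probs = {"class_1": 0.4, "class_2": 0.2, "class_3": 0.7}

# One-vs-rest prediction: pick the class whose classifier is most confident.
prediction = max(class_probs, key=class_probs.get)
print(prediction)  # class_3
```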

Generalization Performance
Different modeling procedures may have different performance on the same data.
Different training sets may result in different generalization performance.
Different test sets may result in different estimates of the generalization performance.
If the training set size changes, you may also expect different generalization performance from the resultant model.

Learning Curve
A learning curve shows how the generalization performance changes with varying sample size!

Decision Trees vs. Logistic Regression (I)
What is more comprehensible to the stakeholders?
- Rules?
- Numeric functions?
How much data do you have?
- For smaller training-set sizes, logistic regression tends to yield better generalization accuracy than tree induction.
- With larger training sets, the flexibility of tree induction can be an advantage: trees can represent substantially nonlinear relationships between the features and the target (pruning is needed to reduce overfitting).

A Real Case: TelCo
TelCo, a major telecommunications firm, wants to investigate its problem with customer attrition, or “churn”. They want to build a model to predict the churning probability of customers.
This dataset contains 20,000 examples.

Learning Curve Comparison

Decision Trees vs. Logistic Regression (II)
What are the characteristics of the data?
- Trees are fairly robust to missing values, types of variables (numeric, categorical), irrelevant variables, etc.
- Trees do not perform well when there is a lot of noise in the data.
Do you need a good estimate of class probabilities?
- Logistic regression generates probabilities in a more sophisticated way.
Do you still remember how a tree generates class probability estimates?
