Recap from Week 7 - Logistic Regression
Logistic Regression Model (Probability, Odds, Logit)
Model Evaluation (Confusion Matrix, Fit, Lift, ROC)
Model Prediction (interpreting results for categorical variables, making predictions)
Decision Trees
Learning Objectives
The learning outcomes from this week’s lecture are:
Explain how a decision tree works for categorical and continuous outcome variables
Build a decision tree using SAS VA
Explain how a splitting algorithm works
Discuss how the performance of a decision tree is evaluated
Explain how decision tree parameters can be adjusted to improve performance
Introduction to Decision Trees
The benefit of using decision trees is that they are intuitive and can be understood by those who have little to no experience with machine learning and predictive analytics
A particular strength of decision trees is that the decisions entailed in the model are visible and explicit
Despite their apparent simplicity, decision trees are a powerful data science technique that can produce impressive results
Decision trees tend to work best with large datasets containing many variables, where they can reveal underlying data structures that might be more difficult to find using multiple regression and logistic regression models
Uses of Decision Trees
Classification and regression trees (CART) can be used in place of the following (see the sketch after this list):
Multiple regression with a continuous outcome
Logistic regression with a binary outcome
Multinomial regression with an outcome that has multiple unordered responses
Ordinal regression with an outcome that has multiple ordered responses
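For instance, a decision tree classifier can stand in for logistic regression on a binary outcome. A minimal sketch using scikit-learn (an assumption for illustration; the lecture itself uses SAS VA, and the dataset here is a stand-in):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A binary outcome: a classification tree in place of logistic regression
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # classification accuracy on held-out data
```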
How Decision Trees Work
Rather than creating an equation, decision trees provide a set of rules based on select variables from a dataset to make predictions
The tree-like structure and rule-based output utilise Boolean logic (true/false), making the model more straightforward to interpret
The logical approach of decision trees more closely resembles human decision-making than predictions based on equations
As a result, the model is intuitive and useful for communicating predictions and other results to non-technical and non-quantitative stakeholders
Titanic Dataset Example
FIGURE 8.1., Vidgen et al. 2019
In the example:
The first split is based on the sex of the passengers on board
Then, based on age being high or low (where the exact age cut-offs depend on the previous split), the model classifies whether a person is likely to survive or perish (see the sketch below)
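A hedged translation of those rules into code; the age cut-offs below are hypothetical stand-ins, since the fitted values from Figure 8.1 are not reproduced here:

```python
def predict_survival(sex: str, age: float) -> str:
    """Boolean rule set mirroring the shape of the Titanic tree in Figure 8.1.
    The age thresholds are hypothetical, not the values fitted in the lecture."""
    if sex == "female":
        return "survived" if age <= 40 else "perished"  # hypothetical cut-off
    return "survived" if age <= 10 else "perished"      # hypothetical cut-off

print(predict_survival("female", 29))  # survived
print(predict_survival("male", 35))    # perished
```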
Decision Trees in SAS VA
Split and Search
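The Split and Search slides appear as figures in the deck; as a sketch of the underlying idea, a greedy splitting algorithm evaluates every candidate split point on a variable and keeps the one with the highest information gain (equivalently, the lowest weighted entropy of the resulting groups). A minimal illustration in Python (all names and data are illustrative):

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_split(values, labels):
    """Exhaustively search thresholds on one numeric variable;
    return the threshold with the highest information gain."""
    parent = entropy(labels)
    best = (None, 0.0)
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must put observations on both sides
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if parent - weighted > best[1]:
            best = (threshold, parent - weighted)
    return best

ages = [4, 9, 21, 35, 40, 62]
survived = [1, 1, 0, 0, 1, 0]
print(best_split(ages, survived))  # (9, 0.459...): best split is age <= 9
```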
Information Entropy
FIGURE 8.6., Vidgen et al. 2019
Entropy formula:
$H(X) = -\sum_{i} p_i \log_2 p_i$
For example, if two events have an equal chance of occurring, with the probability of each equal to 0.5, the corresponding entropy is:
$H = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = 1 \text{ bit}$
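A minimal Python version of this formula (the function name is illustrative):

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2 p); zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # two equally likely events -> 1.0 bit
```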
Information Entropy
FIGURE 8.6., Vidgen et al. 2019
We will use the Titanic dataset below to calculate entropy for the target variable (Survived)
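The counts in the book's figure are not reproduced here; assuming the widely used Kaggle Titanic training sample (891 passengers, of whom 342 survived and 549 perished), the entropy of the target would be:

$H(\text{Survived}) = -\left(\tfrac{342}{891}\log_2\tfrac{342}{891} + \tfrac{549}{891}\log_2\tfrac{549}{891}\right) \approx 0.961 \text{ bits}$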
Information Gain Calculation
Using entropy we can calculate the information gain of each variable with regard to the target outcome. The best variable to split on is the one with the highest information gain.
First, let’s calculate the probability of survival for each gender group as follows:
FIGURE 8.3., Vidgen et al. 2019
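Assuming the same Kaggle sample (314 female passengers with 233 survivors; 577 male passengers with 109 survivors, which may differ from the counts in the figure):

$P(\text{survive} \mid \text{female}) = \tfrac{233}{314} \approx 0.742, \qquad P(\text{survive} \mid \text{male}) = \tfrac{109}{577} \approx 0.189$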
Information Gain Calculation
Next, let’s calculate the entropy for each gender group as follows:
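Continuing with the assumed Kaggle counts, each group's entropy follows from the same formula:

$H(\text{female}) = -(0.742\log_2 0.742 + 0.258\log_2 0.258) \approx 0.824 \text{ bits}$
$H(\text{male}) = -(0.189\log_2 0.189 + 0.811\log_2 0.811) \approx 0.699 \text{ bits}$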
Information Gain Calculation
Before we can calculate the information gain, we need to calculate the proportions of each gender group as follows:
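With the assumed counts, the proportions are simply each group's share of the 891 passengers:

$P(\text{female}) = \tfrac{314}{891} \approx 0.352, \qquad P(\text{male}) = \tfrac{577}{891} \approx 0.648$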
Information Gain Calculation
Lastly, information gain can be calculated as follows:
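Information gain is the target entropy minus the proportion-weighted entropy of the groups; with the assumed Kaggle counts:

$\text{Gain}(\text{Survived}, \text{Sex}) = 0.961 - (0.352 \times 0.824 + 0.648 \times 0.699) \approx 0.218 \text{ bits}$

A self-contained Python check of the whole calculation (counts as assumed above):

```python
import math

def entropy(pos, total):
    """Entropy (bits) of a binary outcome from the positive count and group size."""
    p = pos / total
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

parent = entropy(342, 891)  # ~0.961 bits for Survived
weighted = (314 / 891) * entropy(233, 314) + (577 / 891) * entropy(109, 577)
print(parent - weighted)    # information gain of Sex: ~0.218 bits
```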
Model Evaluation
Model Performance
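The performance figures are not reproduced in this deck; as with logistic regression in Week 7, a classification tree is typically evaluated with a confusion matrix and ROC/AUC on held-out data. A minimal sketch, assuming scikit-learn and a stand-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Confusion matrix on held-out data, plus AUC from predicted probabilities
print(confusion_matrix(y_test, tree.predict(X_test)))
print(roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1]))
```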
Growth Strategy
Basic Growth Strategy
Advanced Growth Strategy
Interactive Modelling
An important aspect of decision tree algorithms is determining the size of the tree: take care with the amount of detail the model captures, as there is always a risk of overfitting the data
Pruning allows the modeller to adjust the complexity of the model; it starts with the most complex tree and generates subtrees by systematically removing branches at each level until only the root node is left
The subtrees’ predictive ability is lower than that of the most complex tree, with the single root node being the least predictive
Lenient pruning leads to the most complex tree, while aggressive pruning produces the least complex tree that still maintains an adequate level of predictive ability
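In scikit-learn terms (an assumption; the lecture performs pruning interactively in SAS VA), cost-complexity pruning produces exactly this sequence of subtrees, from the full tree down to the lone root node:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ccp_alphas runs from 0 (the most complex tree) up to the value that prunes to the root
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test accuracy={pruned.score(X_test, y_test):.3f}")
```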
Model Comparison
Decision Trees and Continuous Targets
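The slide content is not reproduced here; the key point is that with a continuous target, the tree chooses splits that reduce variance (squared error) rather than entropy, and each leaf predicts the mean of its training observations. A minimal scikit-learn sketch with a stand-in dataset:

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Continuous outcome: splits minimise squared error instead of entropy
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)
print(reg.score(X_test, y_test))  # R-squared on held-out data
```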
Summary
Summary of Decision Trees
Decision trees are a powerful algorithm that is also quite intuitive, producing a set of rules similar to human decision-making
Decision trees can be used as an alternative to linear and logistic regression, and are particularly useful when the assumptions of a regression model are not met
Alternative approaches to modelling categorical and continuous targets include random forests (a direct extension of decision trees) as well as naïve Bayes models, artificial neural networks, and support vector machines
These models are more complex and require a deeper level of technical expertise; however, they can still be utilised by those without an extensive data science background by leveraging automated machine learning platforms