ISOM3360 Data Mining for Business Analytics, Session 4
Decision Trees (I)
Instructor: Department of ISOM Spring 2022
Recap: Data Understanding
Preliminary investigation of the data to better understand its specific characteristics
❖ Help in selecting appropriate data mining algorithms
Things to look at
❖ Class imbalance
❖ Dispersion of data attribute values
❖ Skewness, outliers, missing values
❖ Correlation analysis
Visualization tools are important
❖ Histograms, box plots
❖ Scatter plot
Recap: Data Preprocessing
Data transformation might be needed
❖ Handling missing values
❖ Handling categorical variables
❖ Feature transformation
e.g., log transformation
Normalization (back to this when clustering is discussed)
❖ Feature discretization
Which of the following normalization method(s) may transform the original variable to a negative value: min-max, z-score, both, or neither?
Answer: min-max maps values into [0, 1], so it never produces negative values; z-score can produce negative values (any value below the mean).
Do you apply normalization on the training or testing set?
Answer: fit the normalization on the training set (compute the training-set mean and SD, or min and max), then use those same training-set statistics to normalize the testing set. In the example, the testing-set record (age 25, income 20,000) is normalized with the mean and SD of the training set.
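A minimal sketch of this rule, assuming scikit-learn is available (the feature values below are made up for illustration): the scaler is fitted on the training set only, and the fitted statistics are reused on the testing set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler  # z-score; use MinMaxScaler for min-max

# Hypothetical numeric features (columns: age, income)
X_train = np.array([[25, 20000], [40, 55000], [33, 41000]], dtype=float)
X_test  = np.array([[25, 20000], [50, 70000]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # learn mean/SD from the training data only
X_test_scaled  = scaler.transform(X_test)        # reuse the training mean/SD on the test data

print(X_train_scaled)
print(X_test_scaled)   # note: z-scores can be negative (values below the training mean)
```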
What Induction Algorithm Shall We Use?
Commonly Used Induction Algorithms
Post-mortem analysis of a popular data mining competition
Why Decision Trees?
Decision trees are one of the most popular data mining tools.
❖ Classification trees (used when target variable is categorical)
❖ Regression trees (used when target variable is numeric)
They are easy to understand, implement and use, and computationally cheap.
Model comprehensibility is important for communication with non-DM-savvy stakeholders.
Classification Tree
Objective: predicting borrowers who will default on loan payments.
Classification Tree: Upside-Down
[Tree diagram, drawn upside-down with the root at the top; leaf nodes are labeled Class = Not Default (three leaves) and Class = Default (one leaf)]
Classification Tree: Divide and Conquer
“Recursive Partitioning”
❖ Each node represents one attribute
❖ Tests on nominal attribute: number of splits (branches) is number of possible values or 2 (one value vs. rest)
❖ Continuous attributes are discretized
[Tree diagram: internal test nodes with Yes / No branches; leaves labeled Class = Not Default or Class = Default]
Each leaf node provides:
❖ A class assignment (e.g., default / not default)
❖ Also a distribution over all possible classes (e.g., default with probability 0.25, not default with prob. 0.75)
How a Tree is Used for Classification ?
To determine the class of a new example, e.g., Mark: age 40, retired, balance 38K.
❖ The example is routed down the tree according to the values of its attributes.
❖ At each node, a test is applied to one attribute.
❖ When a leaf is reached, the example is assigned to a class—or alternatively to a distribution over the possible classes (e.g., default with probability 0.25, not default with prob. 0.75).
Assigning Probability Estimates
[Tree diagram learned from the training set: the entire population is split on Age (>= 45 vs < 45), and a branch is further split on Balance (>= 50k vs < 50k)]
Q: How would you assign estimates of class probability (e.g., probability of default/not default) based on such trees?
Exercise: Assume this is the classification tree you learned from the past defaulting data. A new guy, who is 45 and has a 20K balance, is applying for a credit card issued by your company. Can you predict whether this new guy is going to default? How confident are you about your prediction? Another girl is also applying for the same credit card, but the only information we have about her is that she has a 70K balance. Can you predict if she will default? How sure are you about that?
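A small sketch of how such probability estimates can be read off a leaf (the helper function and the leaf contents are illustrative, not from the slides): the estimate is simply the class frequency among the training examples that fall into that leaf.

```python
from collections import Counter

def leaf_probabilities(leaf_labels):
    """Class-probability estimates at a leaf = class frequencies of the
    training examples that ended up in that leaf."""
    counts = Counter(leaf_labels)
    n = len(leaf_labels)
    return {cls: cnt / n for cls, cnt in counts.items()}

# Hypothetical leaf containing 1 defaulter and 3 non-defaulters:
leaf = ["default", "not default", "not default", "not default"]
print(leaf_probabilities(leaf))   # {'default': 0.25, 'not default': 0.75}
```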
Classification Tree Learning
Objective: based on customer attributes, partition the customers into subgroups that are less impure – with respect to the class (i.e., such that in each group most instances belong to the same class)
Classification Tree Learning
Partitioning into "purer" groups
[Figure: toy examples described by two attributes, Body Color (orange / purple) and Limb Color (blue / yellow), each labeled Yes or No]
Classification Tree Learning
Partitioning into "purer" groups recursively
[Figure: the examples are first split by Body Color into an orange-body group and a purple-body group; each group is then split again by Limb Color (blue vs. yellow), producing increasingly pure subgroups of Yes / No labels]
Summary: Classification Tree Learning
A tree is constructed by recursively partitioning the examples.
With each partition, the examples are split into subgroups that are “increasingly pure”.
The Idea of Partitioning
Let’s play a game. I have someone in my mind, and your job is to guess this person. You can only ask yes/no question.
This person is an entrepreneur.
Some important questions without being answered yet:
❖ How to automatically choose which attribute to be used to split the data?
❖ When to stop the splitting?
How to Choose Which Attribute to Split Over?
Objectives
❖ For each splitting node, choose the attribute that best partitions the population into less impure groups.
❖ All else being equal, fewer nodes is better (more comprehensible, easier to use, reduces overfitting).
Impurity measures: many are available, but the most common one (from information theory) is entropy.
Entropy = - Σ_i p_i log2(p_i), where p_i is the proportion of class i in the data
For example: our initial population is composed of 16 cases
of class “Default” and 14 cases of class “Not default”
Entropy (entire population of examples) = -(16/30) log2(16/30) - (14/30) log2(14/30) ≈ 0.997 (close to the maximum of 1, i.e., high impurity)
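A short sketch of this entropy computation in Python (the function name is just for illustration):

```python
import math

def entropy(class_counts):
    """Entropy = -sum_i p_i * log2(p_i), with 0*log2(0) taken as 0."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Initial population: 16 "Default" and 14 "Not default" cases
print(round(entropy([16, 14]), 3))   # 0.997
```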
Entropy Exercise
A dataset is composed of 10 cases of class "Positive" and 10 cases of class "Negative". Entropy = ?
A dataset is composed of 0 cases of class "Positive" and 20 cases of class "Negative". Entropy = ?
Tip: 0 log2 0 = 0
[Figure: two-class entropy function plotted against the % of one class]
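A brief sketch that reproduces such a curve, assuming numpy and matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0.001, 0.999, 200)                 # % of one class
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # two-class entropy

plt.plot(p, H)
plt.xlabel("% of one class")
plt.ylabel("entropy")
plt.title("Two-class entropy function")
plt.show()
```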
Information Gain (based on Entropy)
The information gain is based on the reduction in entropy after a dataset is split on an attribute
Information Gain = entropy(parent) – [weighted average] entropy(children)
where weight of each child is given by the proportion of the examples in that child.
[Diagram: a parent node split on Balance into two child nodes, Balance < 50k and Balance >= 50k]
Information Gain Example
30 instances (16 Default, 14 Not Default), Entropy = 0.997
Split on Balance:
❖ Balance < 50k: 17 instances
❖ Balance >= 50k: 13 instances
[Figure: class distribution within each child node]
Note: after a split, the weighted average entropy of the children can only decrease or remain unchanged, so information gain is never negative.
(Weighted) Average Impurity of Children = 0.615
Information Gain = 0.997 – 0.615 = 0.382
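A sketch of the information gain computation (the per-child class counts below are reconstructed so that they reproduce the slide's 0.615 and 0.382 figures; the slide itself does not show them, so treat them as illustrative):

```python
import math

def entropy(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(parent_counts, children_counts):
    """IG = entropy(parent) - weighted average entropy(children)."""
    n = sum(parent_counts)
    weighted_children = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted_children

# Parent: 16 Default, 14 Not default (entropy ~0.997).
# Illustrative child class counts consistent with the slide (17 and 13 instances):
parent = [16, 14]
children = [[4, 13], [12, 1]]   # e.g., Balance < 50k and Balance >= 50k
print(round(information_gain(parent, children), 3))
# ~0.38; the slide's rounded figures give 0.997 - 0.615 = 0.382
```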
What If We Split Over "Age" First?
30 instances (Entropy = 0.997)
Split on Age: 15 instances in each child node
Impurity of each child = ? Exercise!
[Handwritten working: the two child entropies are 0.722 and 0.836, so the weighted average impurity = (15/30)(0.722) + (15/30)(0.836) = 0.779]
Information Gain = 0.997 – 0.779 = 0.218
Recall: the gain from first splitting on "Balance" = 0.382, so "Balance" is the better attribute to split on first.
Our Original Question
Now we are ready to answer the question we have been waiting for:
How to choose which attribute to split over?
Answer: at each node, choose the attribute that obtains maximum information gain!
Decision Tree Algorithm (Full Tree)
❑ Step 1: Calculate the information gain from splitting over each attribute using the dataset
❑ Step 2: Split the set into subsets using the attribute for which the information gain is maximum
❑ Step 3: Make a decision tree node containing that attribute, divide the dataset by its branches, and repeat the same process on every node.
❑ Step 4a: A node with entropy of 0, or that cannot be partitioned further, or that obtains no information gain from an additional split, is a leaf.
❑ Step 4b: Otherwise, the node needs further splitting.
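A compact ID3-style sketch of these four steps for categorical attributes (the toy records and labels below are invented for illustration; in practice one would typically use a library such as scikit-learn's DecisionTreeClassifier with criterion="entropy"):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain from splitting (rows, labels) on a categorical attribute."""
    n = len(labels)
    children = {}
    for row, y in zip(rows, labels):
        children.setdefault(row[attr], []).append(y)
    weighted = sum(len(ys) / n * entropy(ys) for ys in children.values())
    return entropy(labels) - weighted

def build_tree(rows, labels, attrs):
    # Step 4a: pure node or no attributes left -> leaf (majority class)
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: pick the attribute with maximum information gain
    gains = {a: info_gain(rows, labels, a) for a in attrs}
    best = max(gains, key=gains.get)
    if gains[best] <= 0:                       # Step 4a: no gain from splitting -> leaf
        return Counter(labels).most_common(1)[0][0]
    # Step 3: create a node for that attribute and recurse on each branch (Step 4b)
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[best][value] = build_tree(list(sub_rows), list(sub_labels),
                                       [a for a in attrs if a != best])
    return tree

# Tiny illustrative dataset (attribute values are made up)
rows = [{"age": ">=45", "balance": "<50k"},
        {"age": ">=45", "balance": ">=50k"},
        {"age": "<45",  "balance": "<50k"},
        {"age": "<45",  "balance": ">=50k"}]
labels = ["default", "default", "not default", "default"]
print(build_tree(rows, labels, ["age", "balance"]))
```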