留学生辅导 ID 376 489 541 693 782 976

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Fundamentals of Machine Learning for
Predictive Data Analytics
Chapter 4: Information-based Learning Sections 4.1, 4.2, 4.3

and Namee and Aoife D’Arcy

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Fundamentals
Decision Trees Shannon’s Entropy Model Information Gain
Standard Approach: The ID3 Algorithm
A Worked Example: Predicting Vegetation Distributions

Fundamentals Standard Approach: The ID3 Algorithm Summary
In this chapter we are going to introduce a machine learning algorithm that tries to build predictive models using only the most informative features.
In this context an informative feature is a descriptive feature whose values split the instances in the dataset into homogeneous sets with respect to the target feature value.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
(a) Brian (b) John (c) Aphra (d) Aoife Figure: Cards showing character faces and names for the
Guess-Who game
Yes Yes No No
No No Yes No
Yes No No No
Brian Aoife

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
(a) Brian (b) John (c) Aphra (d) Aoife Figure: Cards showing character faces and names for the
Guess-Who game
Which question would you ask first:
1 Is it a man?
2 Does the person wear glasses?

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Does the person wear glasses?
Does the person wear glasses?
Figure: The different question sequences that can follow in a game of Guess-Who beginning with the question Does the person wear
they have long hair?
they have long hair?
Is it a man?

Fundamentals Standard Approach: The ID3 Algorithm Summary
In both of the diagrams:
one path is 1 question long,
one path is 2 questions long,
and two paths are 3 questions long.
Consequently, if you ask Question (2) first the average number of questions you have to ask per game is:
1+2+3+3 =2.25 4

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Is it a man?
Does the person wear glasses?
Do they have long hair?

Figure: The different question sequences that can follow in a game of Guess-Who beginning with the question Is it a man?

Fundamentals Standard Approach: The ID3 Algorithm Summary
All the paths in this diagram are two questions long.
So, on average if you ask Question (1) first the average number of questions you have to ask per game is:
2+2+2+2=2 4

Fundamentals Standard Approach: The ID3 Algorithm Summary
On average getting an answer to Question (1) seems to give you more information than an answer to Question (2): less follow up questions.
This is not because of the literal message of the answers: YES or NO.
It is to do with how the answer to each questions splits the domain into different sized sets based on the value of the descriptive feature the question is asked about and the likelihood of each possible answer to the question.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
So the big idea here is to figure out which features are the most informative ones to ask questions about by considering the effects of the different answers to the questions, in terms of:
1 how the domain is split up after the answer is received,
2 and the likelihood of each of the answers.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Fundamentals

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
A decision tree consists of:
1 a root node (or starting node),
2 interior nodes
3 and leaf nodes (or terminating nodes).
Each of the non-leaf nodes (root and interior) in the tree specifies a test to be carried out on one of the query’s descriptive features.
Each of the leaf nodes specifies a predicted classification for the query.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
ID 376 489 541 693 782 976
SUSPICIOUS WORDS true true true false false false
UNKNOWN SENDER false true true true false false
IMAGES CLASS
true spam false spam false spam true ham false ham false ham
Table: An email spam prediction dataset.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
Contains Images
Suspicious Words
true false
Contains Images
Unknown Sender
Unknown Sender
true false
Suspicious Words
Suspicious Words
true false
Figure: (a) and (b) show two decision trees that are consistent with the instances in the spam dataset. (c) shows the path taken through the tree shown in (a) to make a prediction for the query instance: SUSPICIOUS WORDS = ’true’, UNKNOWN SENDER = ’true’, CONTAINS IMAGES = ’true’.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
Both of these trees will return identical predictions for all the examples in the dataset.
So, which tree should we use?

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
Apply the same approach as we used in the
Guess-Who game: prefer decision trees that use less tests (shallower trees).
This is an example of Occam’s Razor.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Decision Trees
How do we create shallow trees?
The tree that tests SUSPICIOUS WORDS at the root is very shallow because the SUSPICIOUS WORDS feature perfectly splits the data into pure groups of ’spam’ and ’ham’.
Descriptive features that split the dataset into pure sets with respect to the target feature provide information about the target feature.
So we can make shallow trees by testing the informative features early on in the tree.
All we need to do that is a computational metric of the purity of a set: entropy

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
’s entropy model defines a computational measure of the impurity of the elements of a set.
An easy way to understand the entropy of a set is to think in terms of the uncertainty associated with guessing the result if you were to make a random selection from the set.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
Entropy is related to the probability of a outcome. High probability → Low entropy
Low probability → High entropy
If we take the log of a probability and multiply it by -1 we get this mapping!

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
What is a log?
Remember the log of a to the base b is the number to which we must raise b to get a.
log2(0.5) = −1 because 2−1 = 0.5 log2(1) = 0 because 20 = 1
log2(8) = 3 because 23 = 8
log5(25) = 2 because 52 = 25 log5(32) = 2.153 because 52.153 = 32

Big Idea Fundamentals Shannon’s Entropy Model
Standard Approach: The ID3 Algorithm Summary
−10 −8 −6 −4 −2 0
− log2P(x)
0 2 4 6 8 10
0.0 0.2 0.4
0.4 0.6 0.8 1.0 P(x)
Figure: (a) A graph illustrating how the value of a binary log (the log to the base 2) of a probability changes across the range of probability values. (b) the impact of multiplying these values by −1.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
Shannon’s model of entropy is a weighted sum of the logs of the probabilities of each of the possible outcomes when we make a random selection from a set.
H(t)=−􏰈(P(t =i)×logs(P(t =i))) (1)

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
What is the entropy of a set of 52 different playing cards?
H(card) = =
−􏰈P(card =i)×log2(P(card =i))
− 􏰈 0.019 × log2(0.019) = − 􏰈 −0.1096 i=1 i=1
5.700 bits

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
What is the entropy of a set of 52 playing cards if we only distinguish between the cards based on their suit
{ê, T, , ♠}?

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
H(suit) = − 􏰈 P(suit = l) × log2(P(suit = l)) l ∈{ê,T, ,♠}
= −􏰁(P(ê)×log2(P(ê)))+(P(T)×log2(P(T)))
+ (P( ) × log2(P( ))) + (P(♠) × log2(P(♠))) 􏰂
= −􏰁 􏰃13/52 × log2(13/52)􏰄 + 􏰃13/52 × log2(13/52)􏰄
+ 􏰃13/52 × log2(13/52)􏰄 + 􏰃13/52 × log2(13/52)􏰄 􏰂
= −􏰁(0.25×−2)+(0.25×−2) +(0.25×−2)+(0.25×−2)􏰂

Big Idea Fundamentals Shannon’s Entropy Model
Standard Approach: The ID3 Algorithm Summary
(a) H(card) = 0.00
(b) H(card) = 0.81 (c) H(card) = 1.00
(d) H(card) = 1.50
(e) H(card) = 1.58 (f) H(card) = 3.58
Figure: The entropy of different sets of playing cards measured in bits.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Shannon’s Entropy Model
Table: The relationship between the entropy of a message and the set it was selected from.
Entropy of a Message
High Medium Medium Low
Properties of the Message Set
A large set of equally likely messages.
A large set of messages, some more likely than others. A small set of equally likely messages.
A small set of messages with one very likely message.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Suspicious Words
Contains Unknown Images
376 489 541
693 ham 782 ham 976 ham
true false
true false
Figure: How the instances in the spam dataset split when we partition using each of the different descriptive features from the spam dataset in Table 1 [15]
782 ham 976 ham
782 ham 976 ham

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Our intuition is that the ideal discriminatory feature will partition the data into pure subsets where all the instances in each subset have the same classification.
SUSPICIOUS WORDS perfect split.
UNKNOWN SENDER mixture but some information (when ’true’ most instances are ’spam’).
CONTAINS IMAGES no information.
One way to implement this idea is to use a metric called information gain.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Information Gain
The information gain of a descriptive feature can be understood as a measure of the reduction in the overall entropy of a prediction task by testing on that feature.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Computing information gain involves the following 3 equations: H(t,D)=− 􏰈 (P(t =l)×log2(P(t =l))) (2)
l∈levels(t)
rem(d,D)= 􏰈 |Dd=l| ×H(t,Dd=l) (3) |D| 􏰓 􏰒􏰑 􏰔
l∈levels(d) 􏰓 􏰒􏰑 􏰔
weighting partition Dd=l
IG (d, D) = H (t, D) − rem (d, D) (4)
entropy of

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
As an example we will calculate the information gain for each of the descriptive features in the spam email dataset.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Calculate the entropy for the target feature in the dataset.
H(t,D)=− 􏰈 (P(t =l)×log2(P(t =l))) l∈levels(t)
ID 376 489 541 693 782 976
SUSPICIOUS WORDS true true true false false false
UNKNOWN SENDER false true true true false false
IMAGES CLASS
true spam false spam false spam true ham false ham false ham

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
H(t,D)= − 􏰈 (P(t =l)×log2(P(t =l))) l ∈{’spam’,’ham’}
= − ((P(t = ’spam’) × log2(P(t = ’spam’)) + (P(t = ’ham’) × log2(P(t = ’ham’)))
= −􏰃􏰃3/6 × log2(3/6)�� + 􏰃3/6 × log2(3/6)􏰄􏰄 = 1 bit

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Calculate the remainder for the SUSPICIOUS WORDS feature in the dataset.
rem(d,D)= 􏰈 |Dd=l| ×H(t,Dd=l) |D| 􏰓 􏰒􏰑 􏰔
ID 376 489 541 693 782 976
SUSPICIOUS WORDS true true true false false false
UNKNOWN SENDER false true true true false false
IMAGES CLASS
true spam false spam false spam true ham false ham false ham
l∈levels(d) 􏰓 􏰒􏰑 􏰔
weighting partition Dd=l
entropy of

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
rem (WORDS, D)
􏰅 |DWORDS=T | 􏰆 􏰅 |DWORDS=F | 􏰆 |D| × H (t, DWORDS=T ) + |D| × H (t, DWORDS=F )
= 􏰃3/6 × 􏰃− 􏰃􏰃3/3 × log2(3/3)􏰄 + 􏰃0/3 × log2(0/3)􏰄􏰄􏰄􏰄
+ 􏰃3/6 × 􏰃− 􏰃􏰃0/3 × log2(0/3)􏰄 + 􏰃3/3 × log2(3/3)􏰄􏰄􏰄􏰄 = 0 bits
   
3/6 × − 􏰈 P(t = l) × log2(P(t = l)) l ∈{’spam’ ,’ham’ }
   
3/6 × − 􏰈 P(t =l)×log2(P(t =l)) l ∈{’spam’ ,’ham’ }

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Calculate the remainder for the UNKNOWN SENDER feature in the dataset.
rem(d,D)= 􏰈 |Dd=l| ×H(t,Dd=l) |D| 􏰓 􏰒􏰑 􏰔
ID 376 489 541 693 782 976
SUSPICIOUS WORDS true true true false false false
UNKNOWN SENDER false true true true false false
IMAGES CLASS
true spam false spam false spam true ham false ham false ham
l∈levels(d) 􏰓 􏰒􏰑 􏰔
weighting partition Dd=l
entropy of

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
rem (SENDER, D)
􏰅 |DSENDER=T | 􏰆 􏰅 |DSENDER=F | 􏰆 |D| × H (t, DSENDER=T ) + |D| × H (t, DSENDER=F )
= 􏰃3/6 × 􏰃− 􏰃􏰃2/3 × log2(2/3)􏰄 + 􏰃1/3 × log2(1/3)􏰄􏰄􏰄􏰄
+ 􏰃3/6 × 􏰃− 􏰃􏰃1/3 × log2(1/3)􏰄 + 􏰃2/3 × log2(2/3)􏰄􏰄􏰄􏰄 = 0.9183 bits
   
3/6 × − 􏰈 P(t = l) × log2(P(t = l)) l ∈{’spam’ ,’ham’ }
   
3/6 × − 􏰈 P(t =l)×log2(P(t =l)) l ∈{’spam’ ,’ham’ }

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Calculate the remainder for the CONTAINS IMAGES feature in the dataset.
rem(d,D)= 􏰈 |Dd=l| ×H(t,Dd=l) |D| 􏰓 􏰒􏰑 􏰔
ID 376 489 541 693 782 976
SUSPICIOUS WORDS true true true false false false
UNKNOWN SENDER false true true true false false
IMAGES CLASS
true spam false spam false spam true ham false ham false ham
l∈levels(d) 􏰓 􏰒􏰑 􏰔
weighting partition Dd=l
entropy of

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
rem (IMAGES, D)
􏰅 |DIMAGES=T | 􏰆 􏰅 |DIMAGES=F | 􏰆 |D| × H (t, DIMAGES=T ) + |D| × H (t, DIMAGES=F )
= 􏰃2/6 × 􏰃− 􏰃􏰃1/2 × log2(1/2)􏰄 + 􏰃1/2 × log2(1/2)􏰄􏰄􏰄􏰄
+ 􏰃4/6 × 􏰃− 􏰃􏰃2/4 × log2(2/4)􏰄 + 􏰃2/4 × log2(2/4)􏰄􏰄􏰄􏰄 = 1 bit
   
2/6 × − 􏰈 P(t = l) × log2(P(t = l)) l ∈{’spam’ ,’ham’ }
   
4/6 × − 􏰈 P(t =l)×log2(P(t =l)) l ∈{’spam’ ,’ham’ }

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
Calculate the information gain for the three descriptive feature in the dataset.
IG (d, D) = H (t, D) − rem (d, D)

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary Information Gain
IG (SUSPICIOUS WORDS, D) = H (CLASS, D) − rem (SUSPICIOUS WORDS, D) = 1 − 0 = 1 bit
IG (UNKNOWN SENDER, D) = H (CLASS, D) − rem (UNKNOWN SENDER, D) = 1 − 0.9183 = 0.0817 bits
IG (CONTAINS IMAGES, D) = H (CLASS, D) − rem (CONTAINS IMAGES, D) = 1 − 1 = 0 bits
The results of these calculations match our intuitions.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
Standard Approach: The ID3 Algorithm

Fundamentals Standard Approach: The ID3 Algorithm Summary
ID3 Algorithm (Iterative Dichotomizer 3)
Attempts to create the shallowest tree that is consistent with the data that it is given.
The ID3 algorithm builds the tree in a recursive, depth-first manner, beginning at the root node and working down to the leaf nodes.

Fundamentals Standard Approach: The ID3 Algorithm Summary
The algorithm begins by choosing the best descriptive feature to test (i.e., the best question to ask first) using information gain.
A root node is then added to the tree and labelled with the selected test feature.
The training dataset is then partitioned using the test. For each partition a branch is grown from the node.
The process is then repeated for each of these branches using the relevant partition of the training set in place of the full training set and with the selected test feature excluded from further testing.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary
The algorithm defines three situations where the recursion stops and a leaf node is constructed:
1 All of the instances in the dataset have the same classification (target feature value) then return a leaf node tree with that classification as its label.
2 The set of features left to test is empty then return a leaf node tree with the majority class of the dataset as its classification.
3 The dataset is empty return a leaf node tree with the majority class of the dataset at the parent node that made the recursive call.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
Table: The vegetation classification dataset.
ID STREAM 1 false 2 true
4 false 5 false 6 true 7 true
SLOPE ELEVATION steep high moderate low
VEGETATION chaparral riparian
steep medium riparian steep medium chaparral flat high conifer
steep highest conifer steep high chaparral

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
H (VEGETATION, D)
= − 􏰈 P(VEGETATION = l) × log2 (P(VEGETATION = l))
􏰉’chaparral’,􏰊 l ∈ ’riparian’,
= − 􏰃􏰃3/7 × log2(3/7)􏰄 + 􏰃2/7 × log2(2/7)􏰄 + 􏰃2/7 × log2(2/7)􏰄􏰄 = 1.5567 bits

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
Table: Partition sets (Part.), entropy, remainder (Rem.) and information gain (Info. Gain) by feature for the dataset in Table 3 [49].
Split By Feature
STREAM SLOPE
Level Part. ’true’ D1 ’false’ D2 ’flat’ D3 ’moderate’ D4 ’steep’ D5 ’low’ D6 ’medium’ D7 ’high’ D8 ’highest’ D9
Instances d2, d3, d6, d7
Partition Entropy 1.5
Rem. 1.2507
Info. Gain
0.3060 0.5774
d1, d4,d5 0.9183 d5 0 d2 0
d1,d3, d4 , d6 , d7 1.3710 d2 0
d3,d4 1.0 0.6793 0.8774 d1 , d5 , d7 0.9183

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
low highest medium high
ID Stream Slope Vegetation
2 true moderate riparian
ID Stream Slope Vegetation
6 true steep conifer
Vegetation
Vegetation
Figure: The decision tree after the data has been split using ELEVATION.

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
H (VEGETATION, D7)
= − 􏰈 P(VEGETATION = l) × log2 (P(VEGETATION = l))
􏰉’chaparral’,􏰊 l ∈ ’riparian’,
= − 􏰃􏰃1/2 × log2(1/2)􏰄 + 􏰃1/2 × log2(1/2)􏰄 + 􏰃0/2 × log2(0/2)􏰄􏰄 = 1.0 bits

Big Idea Fundamentals Standard Approach: The ID3 Algorithm Summary A Worked Example: Predicting Vegetation Distributions
Table: Partition sets (Part.), entropy, remainder (Rem.) and information gain (Info. Gain) by feature for the dataset D7 in Figure 9
Split By Feature
STREAM SLOPE
’true’ ’false’ ’flat’ ’moderate’ ’steep’
Partition Info. Part. Instances Entropy Rem. Gain
D10 d3 001.0 D11 d4 0
D13 0 1.0 0 D14 d3,d4 1.0

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts