Introduction to Information Systems
Naïve Bayes and Decision Tree
Deema Abdal Hafeth
CMP3036M/CMP9063M Data Science
2016 – 2017
Objectives
Naïve Bayes
Naïve Bayes and nominal attributes
Bayes’s Rule
Naïve Bayes and numeric attributes
Decision Tree
Information value (entropy)
Information Gain
From Decision Tree to Decision Rule
Appendix: Examples for R libraries
References
• James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. (Chapter 8)
• Witten, I. H., and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed.). Morgan Kaufmann. (Chapter 4)
Naïve Bayes
Weather dataset
Find the probability of playing outside or not for a new instance:
Weather dataset with nominal attributes
outlook temperature humidity windy play
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
New instance to classify:
outlook temperature humidity windy play
sunny cool high true ?
Bayes’s Rule

P(Y | X1, …, Xn) = P(X1, …, Xn | Y) P(Y) / P(X1, …, Xn)

where
• Y is the class
• X1, …, Xn are the values of the predictor variables

Posterior Probability ∝ Likelihood × Prior Probability

Here P(X1, …, Xn | Y) is the likelihood, P(Y) is the class prior probability, P(X1, …, Xn) is the predictor prior probability (the evidence), and P(Y | X1, …, Xn) is the posterior probability.

Expanding the evidence over the two classes:

P(Y = yes | X1, …, Xn) = P(X1, …, Xn | Y = yes) P(Y = yes) / [ P(X1, …, Xn | Y = yes) P(Y = yes) + P(X1, …, Xn | Y = no) P(Y = no) ]

P(Y = no | X1, …, Xn) = P(X1, …, Xn | Y = no) P(Y = no) / [ P(X1, …, Xn | Y = yes) P(Y = yes) + P(X1, …, Xn | Y = no) P(Y = no) ]
Bayes’s Rule

The predicted class is the one with the greater posterior probability. Naïve Bayes assumes the attributes are conditionally independent given the class, so for the new instance E = (sunny, cool, high, true) the class likelihoods are products of per-attribute conditional probabilities read off the table:

P(E | yes) P(yes) = P(sunny | yes) P(cool | yes) P(high | yes) P(true | yes) P(yes) = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 ≈ 0.0053

P(E | no) P(no) = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 ≈ 0.0206

P(no | E) = 0.0206 / (0.0053 + 0.0206) = 0.795

P(yes | E) = 0.0053 / (0.0053 + 0.0206) = 0.205
Since 0.795 > 0.205, the instance is classified as no:

outlook temperature humidity windy play
sunny cool high true no
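As a quick check, the same posterior can be computed in R with the e1071 package listed in the appendix. This is a minimal sketch; the data frame name weather and the string encodings are our own choices, not part of the slides:

library(e1071)

# The nominal weather dataset from the table above
weather <- data.frame(
  outlook     = c("sunny","sunny","overcast","rainy","rainy","rainy","overcast",
                  "sunny","sunny","rainy","sunny","overcast","overcast","rainy"),
  temperature = c("hot","hot","hot","mild","cool","cool","cool",
                  "mild","cool","mild","mild","mild","hot","mild"),
  humidity    = c("high","high","high","high","normal","normal","normal",
                  "high","normal","normal","normal","high","normal","high"),
  windy       = c("false","true","false","false","false","true","true",
                  "false","false","false","true","true","false","true"),
  play        = c("no","no","yes","yes","yes","no","yes",
                  "no","yes","yes","yes","yes","yes","no"),
  stringsAsFactors = TRUE
)

model <- naiveBayes(play ~ ., data = weather)   # estimates priors and conditional probabilities
new_instance <- data.frame(outlook = "sunny", temperature = "cool",
                           humidity = "high", windy = "true")
predict(model, new_instance, type = "raw")      # posteriors, approx. (no = 0.795, yes = 0.205)
predict(model, new_instance)                    # predicted class: no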
Weather dataset
Find the probability of playing outside or not for a new instance:
outlook temperature (°F) humidity (%) windy play
sunny 66 90 true ?
Weather dataset with numeric attributes
outlook temperature (°F) humidity (%) windy play
sunny 85 85 false no
sunny 80 90 true no
overcast 83 86 false yes
rainy 70 96 false yes
rainy 68 80 false yes
rainy 65 70 true no
overcast 64 65 true yes
sunny 72 95 false no
sunny 69 70 false yes
rainy 75 80 false yes
sunny 75 70 true yes
overcast 72 90 true yes
overcast 81 75 false yes
rainy 71 91 true no
For a numeric attribute, use the normal (Gaussian) probability density function:

f(x) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))

where
• μ is the mean of the attribute values within a class
• σ is the standard deviation of the attribute values within a class
The likelihoods multiply the nominal conditional probabilities with Gaussian densities for the numeric values:

P(E | yes) P(yes) = P(sunny | yes) × f(temperature = 66 | yes) × f(humidity = 90 | yes) × P(true | yes) × P(yes) ≈ 0.000036

and similarly P(E | no) P(no) ≈ 0.000108. Normalising:

P(yes | E) = 0.000036 / (0.000036 + 0.000108) = 0.25
P(no | E) = 0.000108 / (0.000036 + 0.000108) = 0.75

Since 0.75 > 0.25, the instance is classified as no:

outlook temperature (°F) humidity (%) windy play
sunny 66 90 true no
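The Gaussian step can be reproduced by hand in R with dnorm. A minimal sketch under our own naming (weather_num); note that sd() uses the sample standard deviation, so the exact likelihoods depend on how σ is estimated and may differ slightly from the slide's figures:

weather_num <- data.frame(
  outlook     = c("sunny","sunny","overcast","rainy","rainy","rainy","overcast",
                  "sunny","sunny","rainy","sunny","overcast","overcast","rainy"),
  temperature = c(85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71),
  humidity    = c(85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91),
  windy       = c("false","true","false","false","false","true","true",
                  "false","false","false","true","true","false","true"),
  play        = c("no","no","yes","yes","yes","no","yes",
                  "no","yes","yes","yes","yes","yes","no")
)

# Gaussian density of x under the per-class mean and standard deviation
gauss <- function(x, v) dnorm(x, mean = mean(v), sd = sd(v))

yes <- weather_num$play == "yes"
lik <- function(idx) {
  mean(weather_num$outlook[idx] == "sunny") *   # P(sunny | class)
    gauss(66, weather_num$temperature[idx]) *   # f(temperature = 66 | class)
    gauss(90, weather_num$humidity[idx]) *      # f(humidity = 90 | class)
    mean(weather_num$windy[idx] == "true") *    # P(windy = true | class)
    mean(idx)                                   # class prior
}
posts <- c(yes = lik(yes), no = lik(!yes))
posts / sum(posts)                              # normalised posteriors, approx. (0.2, 0.8)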
Decision Tree
Predict whether we play outside or not for a new instance:
outlook temperature humidity windy play
sunny hot high false no
sunny hot high true no
overcast hot high false yes
rainy mild high false yes
rainy cool normal false yes
rainy cool normal true no
overcast cool normal true yes
sunny mild high false no
sunny cool normal false yes
rainy mild normal false yes
sunny mild normal true yes
overcast mild high true yes
overcast hot normal false yes
rainy mild high true no
Weather dataset
New instance to classify:
outlook temperature humidity windy play
sunny cool high true ?
Construct tree

outlook
├─ sunny → humidity
│    ├─ normal → yes
│    └─ high → no
├─ overcast → yes
└─ rainy → windy
     ├─ false → yes
     └─ true → no
where
• Each internal node represents a predictor variable
• Each branch is a predictor variable value
• Each leaf node assigns a class label
Following the tree for the new instance, outlook = sunny leads to the humidity test, and humidity = high gives play = no:

outlook temperature humidity windy play
sunny cool high true no
Decision tree for the weather dataset:

outlook
├─ sunny → humidity
│    ├─ normal → yes
│    └─ high → no
├─ overcast → yes
└─ rainy → windy
     ├─ false → yes
     └─ true → no
How do we construct the decision tree?
Information value (entropy)

Entropy is a measure of the disorder of data and of the uncertainty in a random variable:

entropy(X) = − Σ_{i=1}^{N} p(x_i) log₂ p(x_i)

where the p(x_i) are fractions that add up to one.

For a two-class problem:

entropy = −p log₂ p − (1 − p) log₂(1 − p)

If p = 1 and (1 − p) = 0, then the entropy is 0.
If p = (1 − p) = 0.5, then the entropy is 1.

[Figure: the entropy function for a two-class problem, rising from 0 at p = 0 to its maximum of 1 at p = 0.5 and falling back to 0 at p = 1.]
Example

entropy(4, 0) = −(4/4) log₂(4/4) − (0/4) log₂(0/4) = 0
entropy(3, 3) = −(3/6) log₂(3/6) − (3/6) log₂(3/6) = 0.5 + 0.5 = 1

(The term 0 log₂ 0 is taken to be 0.)
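The same numbers can be reproduced with a small R function; entropy() is our own helper name, not from the slides:

# Entropy of a vector of class counts, e.g. c(yes = 9, no = 5)
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]             # convention: 0 * log2(0) = 0
  -sum(p * log2(p))
}
entropy(c(4, 0))            # 0
entropy(c(3, 3))            # 1
entropy(c(9, 5))            # 0.940, the entropy of play in the weather dataset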
Information Gain

Information gain is the reduction in entropy achieved by a split: the difference between the entropy of the target and the weighted sum of the entropies of the children (the conditional entropy after the split).

Information Gain = entropy(parent) − weighted sum of entropy(children)
Example: splitting play on outlook. The full dataset has 9 yes and 5 no, so

entropy(play) = entropy(9, 5) = 0.940

outlook splits the 14 instances into three branches:
sunny (2 yes, 3 no): entropy = 0.971
overcast (4 yes, 0 no): entropy = 0
rainy (3 yes, 2 no): entropy = 0.971

entropy(outlook) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693

Information Gain(play, outlook) = 0.940 − 0.693 = 0.247
The information gain for all the predictor variables (attributes) and play:

√ Information Gain (play, outlook) 0.247
Information Gain (play, windy) 0.048
Information Gain (play, humidity) 0.152
Information Gain (play, temperature) 0.029

outlook gives the largest gain, so it becomes the root of the tree.
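A sketch of the gain computation in R, reusing the entropy() helper above; each child is a (yes, no) count pair read off the weather table:

info_gain <- function(parent, children) {
  n <- sum(parent)
  weighted <- sum(sapply(children, function(ch) (sum(ch) / n) * entropy(ch)))
  entropy(parent) - weighted
}
# play split on outlook: sunny (2, 3), overcast (4, 0), rainy (3, 2)
info_gain(c(9, 5), list(c(2, 3), c(4, 0), c(3, 2)))   # 0.247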
Having fixed outlook at the root, repeat the calculation within each branch. For the outlook = sunny subset (entropy 0.971), the information gain of the remaining predictor variables (attributes) is:

√ Information Gain (outlook = sunny, humidity) 0.971
Information Gain (outlook = sunny, windy) 0.020
Information Gain (outlook = sunny, temperature) 0.571

humidity gives the largest gain and yields pure leaves, so it is the next split under sunny. The overcast branch is already pure, and the rainy branch is split on windy in the same way.
Decision tree for the weather dataset, with (yes, no) counts at the leaves:

outlook
├─ sunny → humidity
│    ├─ normal → yes (2, 0)
│    └─ high → no (0, 3)
├─ overcast → yes (4, 0)
└─ rainy → windy
     ├─ false → yes (3, 0)
     └─ true → no (0, 2)
From Decision Tree to Decision Rule
Rule1: If (outlook = sunny) and (humidity = normal) then play = yes
Rule2: If (outlook = sunny) and (humidity = high) then play = no
Rule3: If (outlook = overcast) then play = yes
Rule4: If (outlook = rainy) and (windy = false) then play = yes
Rule5: If (outlook = rainy) and (windy = true) then play = no
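The five rules translate directly into R; play_rule is a hypothetical helper name used here only for illustration:

play_rule <- function(outlook, humidity, windy) {
  if (outlook == "sunny")    return(if (humidity == "normal") "yes" else "no")
  if (outlook == "overcast") return("yes")
  if (outlook == "rainy")    return(if (windy == "false") "yes" else "no")
  NA                                    # unseen outlook value
}
play_rule("sunny", humidity = "high", windy = "true")   # "no", as in the worked example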
Summary
Naïve Bayes
Naïve Bayes and nominal attributes
Bayes’s Rule
Naïve Bayes and numeric attributes
Decision Tree
Information value (entropy)
Information Gain
From Decision Tree to Decision Rule
Thank You
dabdalhafeth@Lincoln.ac.uk
Appendix: Examples for R libraries
1) Naïve Bayes model
• e1071 https://cran.r-project.org/web/packages/e1071/e1071.pdf
• klaR https://cran.r-project.org/web/packages/klaR/klaR.pdf
2) Decision Tree model
• tree https://cran.r-project.org/web/packages/tree/tree.pdf
• rpart https://cran.r-project.org/web/packages/rpart/rpart.pdf
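A minimal sketch of fitting a decision tree with rpart, reusing the weather data frame from the Naïve Bayes sketch earlier. rpart splits on the Gini index by default, so parms = list(split = "information") selects the entropy-based criterion used in this lecture; the control settings are relaxed because rpart's defaults will not split a 14-row toy dataset:

library(rpart)

fit <- rpart(play ~ outlook + temperature + humidity + windy,
             data = weather, method = "class",
             parms = list(split = "information"),
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
print(fit)                                       # text form of the fitted tree
predict(fit,
        data.frame(outlook = "sunny", temperature = "cool",
                   humidity = "high", windy = "true"),
        type = "class")                          # predicted class: no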