

Naïve Bayes and Decision Tree

Deema Abdal Hafeth

CMP3036M/CMP9063M Data Science

2016 – 2017

Objectives

• Naïve Bayes
  – Naïve Bayes and nominal attributes
  – Bayes's Rule
  – Naïve Bayes and numeric attributes
• Decision Tree
  – Information value (entropy)
  – Information Gain
  – From Decision Tree to Decision Rule
• Appendix: Examples for R libraries

References

• James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer. (Chapter 8)

• Witten, I. H., and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann. (Chapter 4)

Naïve Bayes

Weather dataset with nominal attributes:

outlook   temperature  humidity  windy  play
sunny     hot          high      false  no
sunny     hot          high      true   no
overcast  hot          high      false  yes
rainy     mild         high      false  yes
rainy     cool         normal    false  yes
rainy     cool         normal    true   no
overcast  cool         normal    true   yes
sunny     mild         high      false  no
sunny     cool         normal    false  yes
rainy     mild         normal    false  yes
sunny     mild         normal    true   yes
overcast  mild         high      true   yes
overcast  hot          normal    false  yes
rainy     mild         high      true   no

Find the probability of playing outside or not when:

outlook  temperature  humidity  windy  play
sunny    cool         high      true   ?

Bayes’s Rule

𝑃 𝑌 𝑋1, … , 𝑋𝑛 =
𝑃 𝑋1, … , 𝑋𝑛 𝑌 𝑃(𝑌)

𝑃(𝑋1, … , 𝑋𝑛)

Where

• 𝑌 is the class
• 𝑋𝑛 is the particular combination of variable values

𝑃𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 ∝ 𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 × 𝑃𝑟𝑖𝑜𝑟 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦

Likelihood Class Prior Probability

Predictor Prior Probability Posterior Probability

𝑃 𝑌 = 𝑦𝑒𝑠 𝑋1, … , 𝑋𝑛 =
𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑦𝑒𝑠 𝑃(𝑌 = 𝑦𝑒𝑠)

𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑦𝑒𝑠 𝑃 𝑌 = 𝑦𝑒𝑠 + 𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑛𝑜 𝑃(𝑌 = 𝑛𝑜)

𝑃 𝑌 = 𝑛𝑜 𝑋1, … , 𝑋𝑛 =
𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑛𝑜 𝑃(𝑌 = 𝑛𝑜)

𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑦𝑒𝑠 𝑃 𝑌 = 𝑦𝑒𝑠 + 𝑃 𝑋1, … , 𝑋𝑛 𝑌 = 𝑛𝑜 𝑃(𝑌 = 𝑛𝑜)

Bayes's Rule

The classification depends on these probabilities, and the prediction is the class with the greater value. For the query instance E = (sunny, cool, high, true), the likelihood × prior products read off the table are

$$P(E \mid \text{yes})\, P(\text{yes}) = \tfrac{2}{9} \times \tfrac{3}{9} \times \tfrac{3}{9} \times \tfrac{3}{9} \times \tfrac{9}{14} = 0.0053$$

$$P(E \mid \text{no})\, P(\text{no}) = \tfrac{3}{5} \times \tfrac{1}{5} \times \tfrac{4}{5} \times \tfrac{3}{5} \times \tfrac{5}{14} = 0.0206$$

so

$$P(\text{no} \mid E) = \frac{0.0206}{0.0053 + 0.0206} = 0.795$$

$$P(\text{yes} \mid E) = \frac{0.0053}{0.0053 + 0.0206} = 0.205$$

outlook  temperature  humidity  windy  play
sunny    cool         high      true   no
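
The same prediction can be reproduced with the naiveBayes() function from the e1071 package listed in the appendix. A minimal sketch, assuming the nominal table above has been loaded as a data frame named weather with factor columns (the object names here are illustrative):

```r
library(e1071)

# Fit naive Bayes to the 14-row nominal weather data
model <- naiveBayes(play ~ ., data = weather)

# Classify the query instance and inspect the posterior probabilities
query <- data.frame(outlook = "sunny", temperature = "cool",
                    humidity = "high", windy = "true")
predict(model, query, type = "raw")   # posterior probabilities for no / yes
predict(model, query)                 # predicted class: no
```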

Weather dataset with numeric attributes:

outlook   temperature  humidity  windy  play
sunny     85           85        false  no
sunny     80           90        true   no
overcast  83           86        false  yes
rainy     70           96        false  yes
rainy     68           80        false  yes
rainy     65           70        true   no
overcast  64           65        true   yes
sunny     72           95        false  no
sunny     69           70        false  yes
rainy     75           80        false  yes
sunny     75           70        true   yes
overcast  72           90        true   yes
overcast  81           75        false  yes
rainy     71           91        true   no

Find the probability of playing outside or not when:

outlook  temperature  humidity  windy  play
sunny    66           90        true   ?

For a numeric attribute, use the probability density function of the normal distribution:

$$f(x) = \frac{1}{\sqrt{2\pi}\, \sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

Where

• $\mu$ is the mean
• $\sigma$ is the standard deviation

of the attribute's values within each class.
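
In R, this density is the built-in dnorm(). As a worked check: the nine temperature values for play = yes in the table above have mean 73 and standard deviation about 6.2, so the density at the query value 66 is roughly 0.034:

```r
# Gaussian density of temperature = 66 under the play = yes class.
# The nine temperatures are taken from the "yes" rows of the numeric table.
temp_yes <- c(83, 70, 68, 64, 69, 75, 75, 72, 81)
f66 <- dnorm(66, mean = mean(temp_yes), sd = sd(temp_yes))
f66   # approximately 0.034
```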

$$P(\text{yes} \mid E) = \frac{0.000036}{0.000036 + 0.000108} = 0.25$$

$$P(\text{no} \mid E) = \frac{0.000108}{0.000036 + 0.000108} = 0.75$$

outlook  temperature  humidity  windy  play
sunny    66           90        true   no
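
The whole numeric calculation can be scripted with dnorm(). A minimal sketch, assuming the class means and standard deviations are computed from the numeric table above; the exact posterior depends on how those figures are rounded (the slide reports 0.25 / 0.75):

```r
# Naive Bayes for the query (sunny, 66, 90, true) on the numeric data.
# Nominal attributes contribute counts; numeric ones use the Gaussian density.
p_yes <- 9/14;  p_no <- 5/14                                    # class priors
like_yes <- 2/9 * dnorm(66, 73.0, 6.2) * dnorm(90, 79.1, 10.2) * 3/9 * p_yes
like_no  <- 3/5 * dnorm(66, 74.6, 7.9) * dnorm(90, 86.2, 9.7)  * 3/5 * p_no
like_yes / (like_yes + like_no)                                 # P(yes | E)
like_no  / (like_yes + like_no)                                 # P(no  | E)
```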

Decision Tree

Using the weather dataset with nominal attributes shown earlier, predict whether to play outside or not when:


outlook  temperature  humidity  windy  play
sunny    cool         high      true   ?

Construct tree

outlook
├─ sunny → humidity
│   ├─ normal → yes
│   └─ high → no
├─ overcast → yes
└─ rainy → windy
    ├─ false → yes
    └─ true → no

Where

• Each internal node represents a predictor variable

• Each branch is a predictor variable value

• Each leaf node assigns a class

Following the branches outlook = sunny → humidity = high, the tree classifies the query as:

outlook  temperature  humidity  windy  play
sunny    cool         high      true   no

[Figure: decision tree for the weather dataset, identical to the tree drawn above.]

How do we construct the decision tree?

Information value (entropy)

Entropy is a measure of the disorder of data and of the uncertainty in a random variable:

$$\mathrm{entropy}(X) = -\sum_{i=1}^{N} p(x_i)\, \log_2 p(x_i)$$

where the $p(x_i)$ are fractions that sum to one.

For a two-class problem:

$$\mathrm{entropy} = -p \log_2 p - (1 - p) \log_2 (1 - p)$$

If p = 1 and (1 − p) = 0, the entropy is 0; if p = (1 − p) = 0.5, the entropy is 1.

[Figure: the entropy function for a two-class problem, plotted against p.]

Example

$$\mathrm{entropy}(4, 0) = -\tfrac{4}{4} \log_2 \tfrac{4}{4} - \tfrac{0}{4} \log_2 \tfrac{0}{4} = 0$$

$$\mathrm{entropy}(3, 3) = -\tfrac{3}{6} \log_2 \tfrac{3}{6} - \tfrac{3}{6} \log_2 \tfrac{3}{6} = 0.5 + 0.5 = 1$$

(By convention, $0 \times \log_2 0$ is taken to be 0.)

Information Gain

Information gain is the reduction in entropy achieved by a split: the difference between the entropy of the parent node and the weighted sum of the entropies of its children.

$$\text{Information Gain} = \mathrm{entropy}(\text{parent}) - \text{weighted sum of } \mathrm{entropy}(\text{children})$$

For the weather dataset, entropy(play) = entropy(9, 5) = 0.940. Splitting on outlook gives three children:

outlook
├─ sunny (2 yes, 3 no): entropy = 0.971
├─ overcast (4 yes, 0 no): entropy = 0
└─ rainy (3 yes, 2 no): entropy = 0.971

Weighted sum of the children's entropy: 5/14 × 0.971 + 4/14 × 0 + 5/14 × 0.971 = 0.693

Information Gain(play, outlook) = 0.940 − 0.693 = 0.247

The information gain for each predictor variable (attribute) with respect to play (√ marks the highest gain, chosen for the root split):

√ Information Gain (play, outlook)      0.247
  Information Gain (play, windy)        0.048
  Information Gain (play, humidity)     0.152
  Information Gain (play, temperature)  0.029
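
These calculations are easy to script. A minimal R sketch; the function names are illustrative, and weather is again assumed to be the nominal data frame from earlier:

```r
# Entropy of a vector of class labels, e.g. weather$play
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]                          # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

# Information gain of splitting `target` on `attribute`
info_gain <- function(data, attribute, target = "play") {
  parent   <- entropy(data[[target]])
  children <- split(data[[target]], data[[attribute]])
  weighted <- sum(sapply(children,
                         function(g) length(g) / nrow(data) * entropy(g)))
  parent - weighted
}

# info_gain(weather, "outlook")   # approximately 0.247
```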

Next, each impure branch is split in the same way. The information gain for the remaining predictor variables (attributes) within the outlook = sunny subset:

√ Information Gain (humidity)     0.971
  Information Gain (windy)        0.020
  Information Gain (temperature)  0.571

Humidity splits the sunny subset perfectly (normal → yes, high → no), so the sunny branch tests humidity. Treating the rainy branch the same way selects windy.

Decision tree for the weather dataset, with (yes, no) counts at each leaf:

outlook
├─ sunny → humidity
│   ├─ normal → yes (2, 0)
│   └─ high → no (0, 3)
├─ overcast → yes (4, 0)
└─ rainy → windy
    ├─ false → yes (3, 0)
    └─ true → no (0, 2)
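
For comparison, the rpart package from the appendix fits a similar tree. Two caveats: rpart grows binary splits (a three-valued attribute like outlook is divided into two groups rather than three branches), and its defaults would not split a 14-row dataset at all, so the control parameters below are relaxed purely for illustration; split = "information" selects the entropy criterion instead of rpart's default Gini index. A sketch, assuming the nominal data frame weather from earlier:

```r
library(rpart)

# Grow an unpruned classification tree on the 14-row weather data
fit <- rpart(play ~ outlook + temperature + humidity + windy,
             data = weather, method = "class",
             parms = list(split = "information"),
             control = rpart.control(minsplit = 2, minbucket = 1, cp = 0))
print(fit)

# Classify the query instance
query <- data.frame(outlook = "sunny", temperature = "cool",
                    humidity = "high", windy = "true")
predict(fit, query, type = "class")   # expected: no
```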

From Decision Tree to Decision Rule

Rule1: If (outlook = sunny) and (humidity = normal) then play = yes

Rule2: If (outlook = sunny) and (humidity = high) then play = no

Rule3: If (outlook = overcast) then play = yes

Rule4: If (outlook = rainy) and (windy = false) then play = yes

Rule5: If (outlook = rainy) and (windy = true) then play = no
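
Rules like these can also be read off a fitted rpart object. One option is rpart's path.rpart(), which prints the sequence of splits leading to each leaf; this sketch assumes the fit object from the previous example:

```r
# Node numbers of the leaves, taken from the tree's frame
leaves <- as.numeric(rownames(fit$frame))[fit$frame$var == "<leaf>"]
path.rpart(fit, nodes = leaves)   # one split path per leaf
```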


Summary

• Naïve Bayes
  – Naïve Bayes and nominal attributes
  – Bayes's Rule
  – Naïve Bayes and numeric attributes
• Decision Tree
  – Information value (entropy)
  – Information Gain
  – From Decision Tree to Decision Rule

Thank You

dabdalhafeth@Lincoln.ac.uk


Appendix: Examples for R libraries

1) Naïve Bayes model

• e1071 https://cran.r-project.org/web/packages/e1071/e1071.pdf

• klaR https://cran.r-project.org/web/packages/klaR/klaR.pdf

2) Decision Tree model

• tree https://cran.r-project.org/web/packages/tree/tree.pdf

• rpart https://cran.r-project.org/web/packages/rpart/rpart.pdf
