IT enabled Business Intelligence, CRM, Database Applications
Sep-18
Classification
Exact Bayes & Naïve Bayes
Prof. Vibs Abhishek
The Paul Merage School of Business
University of California, Irvine
BANA 273 Session 4
Agenda
Classification using Exact Bayes & Naïve Bayes
Reminders
Assignment 1 due on Canvas
Assignment 2 posted
Project proposal (1 para) due soon (check Canvas for all due dates)
Project guidelines posted to Canvas (Announcements page)
Big Picture View of Course Progress
Databases, Data Warehousing, SQL
RFM & Pivot Tables
Classification
Bayesian (Naïve Bayes)
Decision Tree (ID3)
Association Rules
Apriori
Clustering
K Means
A classic: Microsoft’s Paperclip
Exact Bayes
For each record to be classified:
Find all other records just like it (i.e. where all the predictor values are the same)
Determine what classes they all belong to and which class is most prevalent
Assign that class to the new record
Thomas Bayes
Predict class attribute “Play” using Exact Bayes
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
New records to classify:
Outlook   Temp. Humidity  Windy  Play
Sunny     Cool  High      True   ?
Sunny     Hot   High      False  ?
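The Exact Bayes steps can be sketched in a few lines of Python. This is only an illustration on the 14 weather records; the names `WEATHER` and `exact_bayes` are made up for the sketch, and ties between classes are broken arbitrarily (first class encountered).

```python
from collections import Counter

# The 14 training rows: (Outlook, Temp, Humidity, Windy, Play).
WEATHER = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]

def exact_bayes(predictors, rows=WEATHER):
    """Return the most prevalent class among exact matches, or None if no row matches."""
    matches = Counter(row[-1] for row in rows if row[:-1] == tuple(predictors))
    if not matches:
        return None
    return matches.most_common(1)[0][0]

print(exact_bayes(("Sunny", "Hot", "High", False)))
print(exact_bayes(("Sunny", "Cool", "High", True)))
```

The second record has no exact match in the training data, so Exact Bayes cannot classify it; this is the gap that Naïve Bayes is meant to fill.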
Notes
Bayesian classifier works best with categorical attributes
Unlikely to find exact matches for numerical variables
Numerical attributes must be binned and converted to categorical attributes
When the number of attributes is large (say 20), it becomes hard to find exact matches
Exact Bayes – Cutoff Probability Method
Establish a cutoff probability for the class of interest above which we consider that a record belongs to that class
Find all the training records just like the new record
Determine the probability that those records belong to the class of interest
If that probability is above the cutoff probability, assign the new record to the class of interest
Example – Exact Bayes
Sunny Overcast Rainy Total
Play=Yes 2 3 2 7
Play=No 3 9 4 16
Total 5 12 6 23
P(Play=Yes | outlook=sunny) = 40%
P(Play=Yes | outlook=overcast) = 25%
P(Play=Yes | outlook=rainy) = 33%
Conclusion: every probability is below 50%, so no matter what the outlook, predict Play = No
Cutoff probability method: Specify cutoff probability p
If Probability(Play=Yes | outlook = ?) > p then predict Play = Yes
Suppose p = 37%
Under what outlook would we forecast play = Yes?
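A small Python sketch of the cutoff probability method, using the counts from the Outlook table in this example. The names `COUNTS`, `p_yes` and `predict` are illustrative only.

```python
# Counts from the Outlook table: (Play=Yes, Play=No) for each outlook value.
COUNTS = {"Sunny": (2, 3), "Overcast": (3, 9), "Rainy": (2, 4)}

def p_yes(outlook):
    """P(Play=Yes | outlook), estimated as a proportion of the matching rows."""
    yes, no = COUNTS[outlook]
    return yes / (yes + no)

def predict(outlook, cutoff=0.5):
    """Predict Yes only when the estimated probability exceeds the cutoff."""
    return "Yes" if p_yes(outlook) > cutoff else "No"

for outlook in COUNTS:
    print(outlook, round(p_yes(outlook), 2),
          predict(outlook), predict(outlook, cutoff=0.37))
```

Lowering the cutoff from the default 0.5 to 0.37 changes the prediction only for outlook values whose estimated probability lies between the two thresholds.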
Classification
Using Naïve Bayes
Rules of probability: P(A1,…,Ap | B=1) = P(A1|B=1) * P(A2|B=1) * … * P(Ap|B=1)
This is correct only if the events A1,…,Ap are __________
Conditional Probability
Let’s start by assuming that they are, then:
P(Outlook=Sunny, Temp=High| Play=Yes) =
P(Outlook=Sunny| Play=Yes) * P(Temp=High| Play=Yes)
Apply Bayes’ Rule
B = the event “Play=Yes”
A = the event “Outlook = Sunny and Temp = High”
Meaning of conditional independence
We replace P(Outlook=Sunny, Temp=High | Yes) with
P(Outlook=Sunny | Yes) * P(Temp=High | Yes)
This means that we are assuming conditional independence between Outlook and Temp, given the class
If the conditional dependence is not extreme, the approximation works reasonably well
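One way to see what the assumption costs is to compare the joint conditional probability with the product of the individual ones. The sketch below does this on the 14 weather records. It uses Temp=Hot, on the assumption that this is the value meant by "Temp=High" (Temp takes the values Hot, Mild and Cool in the data); `ROWS` and `cond_prob` are illustrative names.

```python
from fractions import Fraction

# (Outlook, Temp, Play) for the 14 weather records.
ROWS = [
    ("Sunny", "Hot", "No"), ("Sunny", "Hot", "No"), ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Rainy", "Cool", "Yes"), ("Rainy", "Cool", "No"),
    ("Overcast", "Cool", "Yes"), ("Sunny", "Mild", "No"), ("Sunny", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"), ("Sunny", "Mild", "Yes"), ("Overcast", "Mild", "Yes"),
    ("Overcast", "Hot", "Yes"), ("Rainy", "Mild", "No"),
]

def cond_prob(event, play):
    """P(event | Play=play), estimated as a proportion of the rows in that class."""
    in_class = [r for r in ROWS if r[2] == play]
    return Fraction(sum(1 for r in in_class if event(r)), len(in_class))

# Joint probability estimated directly from the data.
joint = cond_prob(lambda r: r[0] == "Sunny" and r[1] == "Hot", "Yes")
# Product of the two single-attribute probabilities (the naive approximation).
product = (cond_prob(lambda r: r[0] == "Sunny", "Yes")
           * cond_prob(lambda r: r[1] == "Hot", "Yes"))
print("joint:", joint, "product:", product)
```

On a data set this small the two numbers differ noticeably; the product is an approximation, not an identity.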
Probabilities for weather data
Attribute    Value     Count(Yes)  Count(No)  P(value|Yes)  P(value|No)
Outlook      Sunny     2           3          2/9           3/5
Outlook      Overcast  4           0          4/9           0/5
Outlook      Rainy     3           2          3/9           2/5
Temperature  Hot       2           2          2/9           2/5
Temperature  Mild      4           2          4/9           2/5
Temperature  Cool      3           1          3/9           1/5
Humidity     High      3           4          3/9           4/5
Humidity     Normal    6           1          6/9           1/5
Windy        False     6           2          6/9           2/5
Windy        True      3           3          3/9           3/5
Play (class totals): Yes 9 (9/14), No 5 (5/14)
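The frequency and probability charts can be derived mechanically from the training rows. The following is a minimal sketch with Python's standard library; `WEATHER`, `freq` and `cond` are names chosen for the illustration.

```python
from collections import Counter
from fractions import Fraction

ATTRS = ("Outlook", "Temp", "Humidity", "Windy")
# The 14 training rows: (Outlook, Temp, Humidity, Windy, Play).
WEATHER = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]

# Class totals, e.g. class_counts["Yes"].
class_counts = Counter(row[-1] for row in WEATHER)
# freq[(attribute, value, play)] = number of rows with that combination.
freq = Counter((a, row[i], row[-1]) for row in WEATHER for i, a in enumerate(ATTRS))

def cond(attribute, value, play):
    """P(attribute=value | Play=play) as an exact fraction."""
    return Fraction(freq[(attribute, value, play)], class_counts[play])

print(freq[("Outlook", "Sunny", "Yes")], cond("Outlook", "Sunny", "Yes"))
print(freq[("Outlook", "Overcast", "No")], cond("Outlook", "Overcast", "No"))
```

Note that `Fraction` reduces automatically, so an entry such as 3/9 is printed as 1/3.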
Terminology
Frequency Chart also called contingency table (on previous slide)
Probability chart
Create the chart using Microsoft Excel – Pivot Table
How to open an ARFF file in Excel?
Launch Excel, Open File, choose Delimited, then select comma as the delimiter
Can also use SQL to compute entries in table.
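As a rough illustration of the SQL route, the sketch below loads the Outlook and Play columns of the weather data into an in-memory SQLite database from Python and counts each combination with GROUP BY. The table name `weather` and its column names are assumptions made for the example.

```python
import sqlite3

# (outlook, play) for the 14 weather records.
ROWS = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (outlook TEXT, play TEXT)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", ROWS)

# One row per (outlook, play) combination that occurs in the data.
query = """
    SELECT outlook, play, COUNT(*) AS n
    FROM weather
    GROUP BY outlook, play
    ORDER BY outlook, play
"""
chart = conn.execute(query).fetchall()
for outlook, play, n in chart:
    print(outlook, play, n)
```

Combinations that never occur (here Overcast with Play=No) do not appear in the GROUP BY result at all, so a zero count has to be filled in separately.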
Probabilities for weather data
A new day:
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Weather data example
Evidence E:
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Pr[yes | E] = Pr[Outlook=Sunny | yes] × Pr[Temperature=Cool | yes] × Pr[Humidity=High | yes] × Pr[Windy=True | yes] × Pr[yes] / Pr[E]
Weather data example
Evidence E:
Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?
Pr[no | E] = Pr[Outlook=Sunny | no] × Pr[Temperature=Cool | no] × Pr[Humidity=High | no] × Pr[Windy=True | no] × Pr[no] / Pr[E]
Normalize…
Pr[Yes ∣ E] + Pr [No ∣ E] = 1
Play can be either “Yes” or “No”
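The arithmetic for the new day E = (Outlook=Sunny, Temp=Cool, Humidity=High, Windy=True) can be checked with a short Python sketch. The fractions are read from the probability chart for the weather data; the variable names are illustrative.

```python
from fractions import Fraction as F

# Conditional probabilities for Sunny, Cool, High, True within each class.
likelihood_yes = F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9)
likelihood_no = F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5)
prior_yes, prior_no = F(9, 14), F(5, 14)

# Numerators of Bayes' rule; the shared denominator Pr[E] cancels on normalizing.
score_yes = likelihood_yes * prior_yes
score_no = likelihood_no * prior_no

p_yes = score_yes / (score_yes + score_no)
p_no = score_no / (score_yes + score_no)
print(round(float(score_yes), 4), round(float(score_no), 4))
print(round(float(p_yes), 3), round(float(p_no), 3))
```

Because Pr[E] is the same in both expressions, it never has to be computed: dividing each numerator by their sum gives the two posteriors directly.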
Example of Naïve Bayes Classifier
A: attributes
M: mammals
N: non-mammals
Example of Naïve Bayes Classifier
A: attributes
M: mammals
N: non-mammals
P(A|M)P(M) > P(A|N)P(N)
=> Mammals
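The comparison P(A|M)P(M) > P(A|N)P(N) can be reproduced from the 20-animal table with a few lines of Python. This is a sketch only; `ANIMALS` and `score` are names invented for the example, with "M" for mammals and "N" for non-mammals.

```python
from fractions import Fraction

# (give_birth, can_fly, live_in_water, have_legs, class) for the 20 animals.
ANIMALS = [
    ("yes", "no", "no", "yes", "M"),        # human
    ("no", "no", "no", "no", "N"),          # python
    ("no", "no", "yes", "no", "N"),         # salmon
    ("yes", "no", "yes", "no", "M"),        # whale
    ("no", "no", "sometimes", "yes", "N"),  # frog
    ("no", "no", "no", "yes", "N"),         # komodo
    ("yes", "yes", "no", "yes", "M"),       # bat
    ("no", "yes", "no", "yes", "N"),        # pigeon
    ("yes", "no", "no", "yes", "M"),        # cat
    ("yes", "no", "yes", "no", "N"),        # leopard shark
    ("no", "no", "sometimes", "yes", "N"),  # turtle
    ("no", "no", "sometimes", "yes", "N"),  # penguin
    ("yes", "no", "no", "yes", "M"),        # porcupine
    ("no", "no", "yes", "no", "N"),         # eel
    ("no", "no", "sometimes", "yes", "N"),  # salamander
    ("no", "no", "no", "yes", "N"),         # gila monster
    ("no", "no", "no", "yes", "M"),         # platypus
    ("no", "yes", "no", "yes", "N"),        # owl
    ("yes", "no", "yes", "no", "M"),        # dolphin
    ("no", "yes", "no", "yes", "N"),        # eagle
]

def score(record, cls):
    """P(A | cls) * P(cls) under the naive conditional independence assumption."""
    rows = [a for a in ANIMALS if a[-1] == cls]
    result = Fraction(len(rows), len(ANIMALS))  # prior P(cls)
    for i, value in enumerate(record):
        result *= Fraction(sum(1 for a in rows if a[i] == value), len(rows))
    return result

# Record to classify: gives birth, cannot fly, lives in water, has no legs.
new_animal = ("yes", "no", "yes", "no")
m, n = score(new_animal, "M"), score(new_animal, "N")
print(round(float(m), 4), round(float(n), 4), "mammals" if m > n else "non-mammals")
```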
Degenerate probabilities: Pr[Outlook=Overcast | No] = 0
Could be a “true” representation of the real world
In that case, one does not have to worry
But this is rare
More likely, the training data set is not big enough
Is it EVER possible to have “Outlook=Overcast” when “Play=No”?
If the answer is yes, a larger data set would have captured that fact
What does one do when the data set is not big enough?
We treat degeneracy seriously and try to remove it
Laplace approach
The “zero-frequency problem”
Why does degeneracy matter?
Suppose an attribute value never occurs with some class (hypothetically, “Humidity = High” for class “yes”)
The conditional probability will be zero: Pr[Humidity=High | yes] = 0
The posterior will then also be zero: Pr[yes | E] = 0
(No matter how likely the other values are!)
Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
Result: probabilities will never be zero!
(also: stabilizes probability estimates)
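A minimal sketch of the Laplace estimator in Python, applied to the Outlook counts for Play=No in the weather data (Sunny 3, Overcast 0, Rainy 2). The function name `laplace` is illustrative.

```python
from fractions import Fraction

def laplace(count, class_total, n_values):
    """Add 1 to the count; the class total grows by the number of attribute values."""
    return Fraction(count + 1, class_total + n_values)

# Outlook counts within class Play=No.
outlook_no = {"Sunny": 3, "Overcast": 0, "Rainy": 2}
total = sum(outlook_no.values())

smoothed = {value: laplace(count, total, len(outlook_no))
            for value, count in outlook_no.items()}
for value, p in smoothed.items():
    print(value, p)
```

The smoothed estimates still sum to 1, and the zero for Overcast is replaced by a small positive probability.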
Pretend that we add 3 rows of data containing only columns Outlook and Play:
All 3 rows have play=no
1 row with Outlook = Sunny, 2nd with Outlook = Overcast and 3rd with Outlook = Rainy. See resulting change in conditional probabilities below. This eliminates the degenerate probability:
© Prof. V Choudhary, September 18
Outlook and Play entries after adding the 3 rows (entries for the other attributes are unchanged):
Attribute  Value     Count(Yes)  Count(No)  P(value|Yes)  P(value|No)
Outlook    Sunny     2           4          2/9           4/8
Outlook    Overcast  4           1          4/9           1/8
Outlook    Rainy     3           3          3/9           3/8
Play (class totals): Yes 9 (9/17), No 8 (8/17)
Modified probability estimates
In some cases, the number of rows to be added may need to be different from 3. In a more general setting we add μ rows, spread evenly over the 3 outlook values.
Example: attribute Outlook for class Play=No (5 training rows)
Sunny: (3 + μ/3) / (5 + μ)
Overcast: (0 + μ/3) / (5 + μ)
Rainy: (2 + μ/3) / (5 + μ)
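The general version with μ added rows can be sketched as follows. The function name `m_estimate` is an assumption for the example; the counts 3, 0, 2 and the class total 5 are the Outlook counts for Play=No.

```python
from fractions import Fraction

def m_estimate(count, class_total, mu, n_values):
    """Spread mu pseudo-rows evenly over the n_values attribute values."""
    return (count + Fraction(mu, n_values)) / (class_total + mu)

# Outlook counts for Play=No: Sunny, Overcast, Rainy.
for mu in (3, 6):
    estimates = [m_estimate(c, 5, mu, 3) for c in (3, 0, 2)]
    print(mu, [str(e) for e in estimates])
```

With μ = 3 this gives 4/8, 1/8 and 3/8, the same values as adding one row per outlook value; a larger μ pulls the estimates further toward the uniform value 1/3.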
Testing for Independence OPTIONAL
(Information Theoretic Testing)
Let A and B be two random variables
Let D(A,B) = (H(A) + H(B) - H(A,B))/H(A,B), where H(·) is the entropy and H(A,B) is the joint entropy
If A and B are independent
H(A,B) = H(A) + H(B)
D(A,B) = 0; this is the minimum
If A and B are linearly related (perfectly correlated)
H(A,B) = H(A) = H(B)
D(A,B) = 1; this is the maximum
If D() value is close to zero, assume independence
No need for looking up of statistical tables
Easy to implement
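A short Python sketch of the D(A,B) measure, estimating the entropies from paired samples. The function names `entropy` and `dependence` are illustrative, and the sketch assumes H(A,B) > 0 (at least one of the variables actually varies).

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def dependence(a, b):
    """D(A,B) = (H(A) + H(B) - H(A,B)) / H(A,B), from paired samples a and b."""
    h_ab = entropy(list(zip(a, b)))  # joint entropy H(A,B)
    return (entropy(a) + entropy(b) - h_ab) / h_ab

a = [0, 0, 1, 1]
print(dependence(a, a))             # B is a copy of A
print(dependence(a, [0, 1, 0, 1]))  # every combination equally frequent
```

The first call sits at the maximum of the measure and the second at the minimum, matching the two extreme cases listed above.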
Piecing it all together
We want to estimate P(Y=1 | X1,…,Xp)
But we don’t have enough examples of each possible profile X1,…,Xp in the training set
If we had instead P(X1,…,Xp | Y=1), we could separate it into
P(X1|Y=1) · P(X2|Y=1) ··· P(Xp|Y=1)
True if we can assume (conditional) independence between X1,…,Xp within each class
Piecing it all together
P(X1,…,Xp): proportion of rows with that predictor combination in the training set
P(Y=1): proportion of Play=Yes in the training set
Use the cutoff to determine classification of this observation. Default: cutoff = 0.5 (classify to group that is most likely)
Advantages and Disadvantages
The good
Simple
Can handle a large number of predictors
Often achieves good classification accuracy
Fairly robust to violations of the independence assumption
The bad
Need to categorize continuous predictors
Predictors with “rare” categories -> zero probability (Use Laplace fix)
No insight about importance/role of each predictor
What is P(Play=Yes | Humidity=Normal),
and what would you predict for Play?
Humidity High Humidity Normal Total
Play=Yes 5 7 12
Play=No 7 12 19
Total 12 19 31
A: 5/12, Predict Play = Yes
B: 7/19, Predict Play = Yes
C: 5/12, Predict Play = No
D: 7/19, Predict Play = No
E: None of the above
Naive Bayes works better with categorical data because
A: It takes less time to compute probabilities for categorical data
B: It cannot compute the distance between different values for categorical data
C: It needs the predictor values to match to some rows to compute accurate conditional probabilities
D: Numeric data slows down the computation too much
E: None of the above
Data Preprocessing using Weka
Follow steps on the following page:
http://facweb.cs.depaul.edu/mobasher/classes/ect584/WEKA/preprocess.html
File conversion and opening text files in different applications
Excel, WordPad/TextEdit, Weka
CSV (text), XLSX (binary), ARFF (text)
Weka
Run Naïve Bayes Classifier on cleaned and binned version of 4bank-data.csv
RFM, Pivot Tables and London Jets Data
http://www.dbmarketing.com/articles/Art149.htm
London Jets Data in Excel format posted on Canvas for RFM analysis and Pivot tables.
Do RFM analysis on this data
Think about strategies that London Jets could use to revive their fortunes
Go to http://office.microsoft.com/en-us/
Search for “Pivot Table” and read up on creating and using them
Next Session
Testing and Validation
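Bayes' rule:
\[ P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \]
Applied to the weather example:
\[ P(\text{Play=Yes} \mid \text{Outlook=sunny}, \text{Temp=High}) = \frac{P(\text{Outlook=sunny}, \text{Temp=High} \mid \text{Play=Yes})\,P(\text{Play=Yes})}{P(\text{Outlook=sunny}, \text{Temp=High})} \]
\[ = \frac{P(\text{Outlook=sunny} \mid \text{Play=Yes})\,P(\text{Temp=High} \mid \text{Play=Yes})\,P(\text{Play=Yes})}{P(\text{Outlook=sunny}, \text{Temp=High})} \]
For the new day E = (Outlook=Sunny, Temp=Cool, Humidity=High, Windy=True):
\[ \Pr[\text{yes} \mid E] = \frac{\frac{2}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{3}{9} \times \frac{9}{14}}{\Pr[E]} = \frac{0.0053}{\Pr[E]} \]
\[ \Pr[\text{no} \mid E] = \frac{\frac{3}{5} \times \frac{1}{5} \times \frac{4}{5} \times \frac{3}{5} \times \frac{5}{14}}{\Pr[E]} = \frac{0.0206}{\Pr[E]} \]
Normalizing:
\[ \Pr[\text{Yes} \mid E] = \frac{0.0053}{\Pr[E]} \Big/ \left( \frac{0.0053}{\Pr[E]} + \frac{0.0206}{\Pr[E]} \right) = \frac{0.0053}{0.0053 + 0.0206} = 0.205 \]
\[ \frac{0.0053}{\Pr[E]} + \frac{0.0206}{\Pr[E]} = 1 \]
\[ \Pr[\text{No} \mid E] = \frac{0.0206}{0.0053 + 0.0206} = 0.795 \]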
Name           Give Birth  Can Fly  Live in Water  Have Legs  Class
human          yes         no       no             yes        mammals
python         no          no       no             no         non-mammals
salmon         no          no       yes            no         non-mammals
whale          yes         no       yes            no         mammals
frog           no          no       sometimes      yes        non-mammals
komodo         no          no       no             yes        non-mammals
bat            yes         yes      no             yes        mammals
pigeon         no          yes      no             yes        non-mammals
cat            yes         no       no             yes        mammals
leopard shark  yes         no       yes            no         non-mammals
turtle         no          no       sometimes      yes        non-mammals
penguin        no          no       sometimes      yes        non-mammals
porcupine      yes         no       no             yes        mammals
eel            no          no       yes            no         non-mammals
salamander     no          no       sometimes      yes        non-mammals
gila monster   no          no       no             yes        non-mammals
platypus       no          no       no             yes        mammals
owl            no          yes      no             yes        non-mammals
dolphin        yes         no       yes            no         mammals
eagle          no          yes      no             yes        non-mammals
Record to classify:
Give Birth  Can Fly  Live in Water  Have Legs  Class
yes         no       yes            no         ?
animals2
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no reptiles
salmon no yes no yes no fishes
whale yes no no yes no mammals
frog no yes no sometimes yes amphibians
komodo no yes no no yes reptiles
bat yes no yes no yes mammals
pigeon no yes yes no yes birds
cat yes no no no yes mammals
leopard shark yes no no yes no fishes
turtle no yes no sometimes yes reptiles
penguin no yes no sometimes yes birds
porcupine yes no no no yes mammals
eel no yes no yes no fishes
salamander no yes no sometimes yes amphibians
gila monster no yes no no yes reptiles
platypus no yes no no yes mammals
owl no yes yes no yes birds
dolphin yes no no yes no mammals
eagle no yes yes no yes birds
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no no yes mammals
python no yes no no no non-mammals
salmon no yes no yes no non-mammals
whale yes no no yes no mammals
frog no yes no sometimes yes non-mammals
komodo no yes no no yes non-mammals
bat yes no yes no yes mammals
pigeon no yes yes no yes non-mammals
cat yes no no no yes mammals
leopard shark yes no no yes no non-mammals
turtle no yes no sometimes yes non-mammals
penguin no yes no sometimes yes non-mammals
porcupine yes no no no yes mammals
eel no yes no yes no non-mammals
salamander no yes no sometimes yes non-mammals
gila monster no yes no no yes non-mammals
platypus no yes no no yes mammals
owl no yes yes no yes non-mammals
dolphin yes no no yes no mammals
eagle no yes yes no yes non-mammals
Name Give Birth Lay Eggs Can Fly Live in Water Have Legs Class
human yes no no yes no ?
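\[ P(A \mid M) = \frac{6}{7} \times \frac{6}{7} \times \frac{2}{7} \times \frac{2}{7} = 0.06 \]
\[ P(A \mid N) = \frac{1}{13} \times \frac{10}{13} \times \frac{3}{13} \times \frac{4}{13} = 0.0042 \]
\[ P(A \mid M)\,P(M) = 0.06 \times \frac{7}{20} = 0.021 \]
\[ P(A \mid N)\,P(N) = 0.0042 \times \frac{13}{20} = 0.0027 \]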
\[ P(Y=1 \mid X_1,\ldots,X_p) = \frac{P(X_1,\ldots,X_p \mid Y=1)\,P(Y=1)}{P(X_1,\ldots,X_p)} \approx \frac{P(X_1 \mid Y=1)\,P(X_2 \mid Y=1) \cdots P(X_p \mid Y=1)\,P(Y=1)}{P(X_1,\ldots,X_p)} \]