
Naïve Bayes
AIMA 12.6

CMPSC 442
Week 7, Meeting 21, Three Segments

Outline

● Deriving Naive Bayes
● Applying Naive Bayes using the Maximum a Posteriori Rule (see AIMA Ch. 20)
● Naive Bayes for Text Classification


Naïve Bayes
AIMA 12.6

CMPSC 442
Week 7, Meeting 21, Segment 1 of 3: Deriving Naïve Bayes

Example of Conditional Independence (from Mtg 20)

● Assume Toothache, Catch are conditionally independent, given Cavity

● Use conditional independence to compute the probability of Cavity


Deriving Naïve Bayes

1. Product rule applied to a joint probability distribution:
a. P(X1, X2, . . ., Xn, C) = P(X1, X2, . . ., Xn | C) P(C)

2. Conditional independence:
a. P(X1, X2, . . ., Xn | C) = P(X1 | C) P(X2 | C) . . . P(Xn | C)

3. Substitution (see the sketch below):
a. P(X1, X2, . . ., Xn, C) = P(X1 | C) P(X2 | C) . . . P(Xn | C) P(C)
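As a concrete sketch of the factorization in step 3, the following Python snippet computes a joint probability from a prior and per-effect conditionals. The numbers are illustrative (roughly the toothache/catch/cavity values used in AIMA), not taken from these slides.

```python
# Naive Bayes factorization (illustrative numbers, roughly the AIMA cavity example):
# P(X1, ..., Xn, C) = P(X1 | C) * ... * P(Xn | C) * P(C)

prior = {True: 0.2, False: 0.8}               # P(Cavity)
conditionals = {
    "Toothache": {True: 0.6, False: 0.1},     # P(Toothache=true | Cavity)
    "Catch":     {True: 0.9, False: 0.2},     # P(Catch=true | Cavity)
}

def joint(effects, cavity):
    """P(effects, Cavity=cavity) under the Naive Bayes factorization."""
    p = prior[cavity]
    for name, value in effects.items():
        p_true = conditionals[name][cavity]
        p *= p_true if value else (1.0 - p_true)
    return p

# P(Toothache=true, Catch=true, Cavity=true) = 0.6 * 0.9 * 0.2 = 0.108
print(joint({"Toothache": True, "Catch": True}, cavity=True))
```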


Generalization of Conditional Independence

● X and Y are conditionally independent given Z iff P(X, Y | Z) = P(X | Z) P(Y | Z)

● Any number of random variables can be conditionally independent given a single random variable distinct from the rest: the Cause


Derivation of Naïve Bayes

● Apply Bayes' rule and the conditional independence assumption to obtain the posterior over the cause:
P(Cause | e1, . . ., en) = α P(Cause) P(e1 | Cause) . . . P(en | Cause), where α normalizes over the values of Cause


Visualizing Naïve Bayes

● Effects e1, …, en assumed conditionally independent given the cause
● Total number of parameters is linear in n
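To make the "linear in n" claim concrete, here is a small sketch comparing parameter counts, under the simplifying assumption (not stated on the slide) that the cause and all effects are Boolean:

```python
# Parameter counts for a Boolean cause and n Boolean effects (illustrative assumption).
def full_joint_params(n):
    # One entry per assignment of (Cause, e1..en), minus one because entries sum to 1.
    return 2 ** (n + 1) - 1

def naive_bayes_params(n):
    # P(Cause=true) plus P(ei=true | Cause=c) for each effect ei and each cause value c.
    return 1 + 2 * n

for n in (2, 5, 10, 20):
    print(n, full_joint_params(n), naive_bayes_params(n))
```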


What Makes Naïve Bayes Naïve?

● Often used when the effects are not conditionally independent
● Can often work well in cases that violate the conditional independence assumption
○ Text classification
○ Why use a more complicated model that seems more correct if a simpler (more naïve) model performs as well or better?


Naïve Bayes
AIMA 12.6

CMPSC 442
Week 7, Meeting 21, Segment 2 of 3: Applying Naïve Bayes

Learning the Naïve Bayes Parameters

● Maximum Likelihood estimates
○ Use counts from a training set of data
○ P̂ stands for the probability estimate, e.g., P̂(Xi = x | C = c) = count(Xi = x, C = c) / count(C = c) (see the sketch below)
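A minimal sketch of these count-based MLE estimates; the tiny training set and attribute names below are hypothetical:

```python
from collections import Counter, defaultdict

# Hypothetical training set: (feature dict, class label) pairs.
data = [
    ({"Outlook": "Sunny", "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Sunny", "Wind": "Strong"}, "No"),
    ({"Outlook": "Rain",  "Wind": "Weak"},   "Yes"),
    ({"Outlook": "Rain",  "Wind": "Strong"}, "No"),
]

class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)   # (feature name, class) -> counts over values
for features, label in data:
    for name, value in features.items():
        feature_counts[(name, label)][value] += 1

def prior(label):
    """Maximum likelihood estimate of P(C = label)."""
    return class_counts[label] / len(data)

def likelihood(name, value, label):
    """Maximum likelihood estimate of P(feature = value | C = label)."""
    return feature_counts[(name, label)][value] / class_counts[label]

print(prior("Yes"))                           # 0.5
print(likelihood("Outlook", "Sunny", "Yes"))  # 0.5
```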


Example of Naïve Bayes used for Prediction

● NB can be used to “classify” even when there is no causal relationship

● Four random variables conditionally independent, given PlayTennis
○ Outlook (3 values)
○ Temperature (3 values)
○ Humidity (2 values)
○ Wind (2 values)

● Full Joint Probability: 3² × 2² = 36 cases
● Conditional independence gives 20 conditional probabilities (2 values of PlayTennis × (3 + 3 + 2 + 2))


Learning (Training) Phase: MLE Estimates


Using the Learned Parameters

● Given a new instance e = (Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong):

P(Play | e) ∝ P(Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong | Play) P(Play)

● Consult conditional probabilities for both cases (Play = Yes; Play = No)

● What does the model predict for P(Play|e)?
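A hedged sketch of the comparison: the conditional probability estimates below are the standard values from the classic 14-example PlayTennis dataset, used here as illustrative stand-ins for the table on the training slide.

```python
# Illustrative MLE estimates (classic PlayTennis data; treat as an assumption,
# not as the exact values from the missing training-phase table).
prior = {"Yes": 9 / 14, "No": 5 / 14}
cond = {
    ("Outlook", "Sunny"):    {"Yes": 2 / 9, "No": 3 / 5},
    ("Temperature", "Cool"): {"Yes": 3 / 9, "No": 1 / 5},
    ("Humidity", "High"):    {"Yes": 3 / 9, "No": 4 / 5},
    ("Wind", "Strong"):      {"Yes": 3 / 9, "No": 3 / 5},
}

instance = [("Outlook", "Sunny"), ("Temperature", "Cool"),
            ("Humidity", "High"), ("Wind", "Strong")]

# Unnormalized posterior for each class: P(Play) * product of P(feature | Play).
scores = {}
for label in prior:
    score = prior[label]
    for key in instance:
        score *= cond[key][label]
    scores[label] = score

total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(posteriors)                   # roughly {'Yes': 0.20, 'No': 0.80}
print(max(scores, key=scores.get))  # 'No'
```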


What Does the Evidence Predict?


Maximum a Posteriori Decision Rule (MAP)

● Approximately Bayesian
● Foundation for Naïve Bayes classifiers
● Find the most probable hypothesis hi, given the data d
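In symbols, the MAP rule picks the hypothesis with the largest posterior; by Bayes' rule the normalizing constant P(d) is common to all hypotheses and can be dropped (standard AIMA Ch. 20 formulation):

```latex
% MAP decision rule (\operatorname* requires amsmath).
% P(d) is the same for every h_i, so it drops out of the argmax.
h_{\mathrm{MAP}} \;=\; \operatorname*{argmax}_{h_i} P(h_i \mid d)
                 \;=\; \operatorname*{argmax}_{h_i} P(d \mid h_i)\, P(h_i)
```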


Naïve Bayes
AIMA 12.6

CMPSC 442
Week 7, Meeting 21, Segment 3 of 3: Naïve Bayes for Text Classification

Naïve Bayes for Text Classification

Classify sentences from journalism into Category = {news, sports, business, weather, entertainment}
● Compute the probability of Category conditioned on the words Wi
● MLE estimates of prior probabilities of each category, e.g., P(Category = weather) = 0.09 if 9% of news sentences pertain to weather
● MLE estimates of conditional probabilities of words, e.g., P(Wordi = true | Category = business) for all words Wi whose count exceeds a threshold
○ P(W = stock | business) is probably much higher than P(W = stock | weather)
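A minimal bag-of-words sketch of this setup; the toy corpus and categories are hypothetical, and it uses the add-one (Laplace) smoothing discussed below so that unseen words do not zero out a category:

```python
import math
from collections import Counter, defaultdict

# Hypothetical miniature training corpus: (sentence, category) pairs.
corpus = [
    ("stocks rallied as markets opened", "business"),
    ("heavy rain and strong wind expected", "weather"),
    ("the home team won the final game", "sports"),
    ("stock prices fell after the report", "business"),
]

category_counts = Counter(cat for _, cat in corpus)
word_counts = defaultdict(Counter)   # category -> word frequencies
for sentence, cat in corpus:
    word_counts[cat].update(sentence.split())

vocab = {w for counts in word_counts.values() for w in counts}

def log_posterior(sentence, cat, alpha=1.0):
    """Unnormalized log P(cat | sentence) with add-alpha (Laplace) smoothing."""
    logp = math.log(category_counts[cat] / len(corpus))      # log prior
    total = sum(word_counts[cat].values())
    for word in sentence.split():
        count = word_counts[cat][word]
        logp += math.log((count + alpha) / (total + alpha * len(vocab)))
    return logp

def classify(sentence):
    return max(category_counts, key=lambda cat: log_posterior(sentence, cat))

print(classify("stock markets"))   # likely 'business' on this toy corpus
```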


Performance of Naïve Bayes for Text Classification

● Words in running text are not independent, so application of NB for text classification is naïve

● Nevertheless, NB works well on many text classification tasks
○ The observed probability may be quite different from the NB estimate
○ The NB classification can be correct even if the estimated probabilities are inaccurate, as long as the relative ranking of the class values is correct


Zero Conditional Probability Problem

● What if we have seen no training cases where the patient had flu and muscle aches? Then the MLE gives P̂(muscle aches = true | flu) = 0

● Any product that uses this zero conditional probability is itself zero, so none of the other evidence can be considered
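A tiny numeric illustration of the problem (all probabilities below are made up):

```python
# Made-up conditional probabilities for a flu diagnosis; the zero estimate for
# muscle aches comes from never seeing that combination in the training data.
likelihoods_flu = {"fever": 0.9, "cough": 0.8, "muscle_aches": 0.0}

posterior_score = 0.4  # hypothetical prior P(flu)
for symptom, p in likelihoods_flu.items():
    posterior_score *= p

print(posterior_score)  # 0.0 -- the single zero wipes out all other evidence
```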


Smoothing

● Many words are relatively rare
○ A relatively infrequent word might not occur in one of the text categories
○ A word in the test data might not have been seen in the training data

● If any word is assigned a zero conditional probability for any category, then the product of conditional probabilities is zero

● Ensure that a conditional probability is assigned for every word in every class
○ Smooth all conditional probabilities: add a small number to all counts
○ Use a special UNKNOWN token during training and testing; all words not in the training data can be treated as instances of UNKNOWN


Laplace Smoothing

● Smoothing is necessary in many applications of NB or other probabilistic models, especially text classification
○ Avoid zero counts using one of many smoothing methods
○ Laplace: add mp to the numerator and m to the denominator of all parameter estimates, where p is a prior estimate for an unseen value (e.g., 1/t for t values of Xi) and m is a weight on the prior
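A sketch of the smoothed estimate just described, with hypothetical counts; setting m = t with p = 1/t recovers classic add-one smoothing:

```python
# Smoothed parameter estimate as described above:
#   P_hat(Xi = x | C = c) = (count(x, c) + m * p) / (count(c) + m)
# where p is a prior for an unseen value (e.g., 1/t for t values of Xi)
# and m is the weight given to that prior. The counts below are hypothetical.

def smoothed_estimate(count_x_and_c, count_c, t, m=1.0):
    """Laplace-style estimate of P(Xi = x | C = c) with uniform prior p = 1/t."""
    p = 1.0 / t
    return (count_x_and_c + m * p) / (count_c + m)

# A word never seen with a class no longer gets probability zero:
print(smoothed_estimate(0, 50, t=10000, m=1.0))    # small but nonzero
# With m = t and p = 1/t this becomes classic add-one smoothing:
print(smoothed_estimate(3, 50, t=10000, m=10000))  # (3 + 1) / (50 + 10000)
```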


Summary

● Naïve Bayes is a classification method for data represented using N random variables, where N-1 of them are conditionally independent given the Nth

● The MAP decision rule is used to predict the value of the conditioning variable, given a new observed example

● Although NB assumes conditional independence in cases where the random variables are not conditionally independent (e.g., text classification), it can still perform well

● For text classification, smoothing is critical
