DSME5110F: Statistical Analysis
Bayes’ Theorem
and Naïve Bayes Classifier
• Bayes’ Theorem
– The Monty Hall Problem
– Naïve Bayes Classifier
Bayes’ Theorem
• Bayes’ theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.
• Bayes analysis begins with initial or prior probability estimates for specific events of interest. If additional relevant information about these events becomes available, the prior probabilities can be updated by calculating revised probabilities, called posterior probabilities.
– For example, if the risk of developing health problems is known to increase with age, Bayes’ theorem allows the risk to an individual of a known age to be assessed more accurately than simply assuming that the individual is typical of the population as a whole.
Example 5.1
• You want to hire a student who is hard-working
• The prior probabilities that a student works hard or not are P(WH) = 0.80 and P(NWH) = 0.20;
• If a student works hard, the probability that he/she gets an A is 0.90;
• If a student does not work hard, the probability that he/she gets an A is 0.20;
• A candidate gets an A. What is the probability that this candidate is a hard-working student?
Example 5.1 (Continued)
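The continued slide applies Bayes' theorem to these numbers. As a quick check, the arithmetic can be sketched as follows (WH = works hard, NWH = does not work hard, names chosen here for readability):

```python
# Bayes' theorem for Example 5.1: P(hard-working | got an A)
p_wh = 0.80          # prior: student works hard
p_nwh = 0.20         # prior: student does not work hard
p_a_given_wh = 0.90  # P(A | works hard)
p_a_given_nwh = 0.20 # P(A | does not work hard)

# posterior = P(A|WH)P(WH) / [P(A|WH)P(WH) + P(A|NWH)P(NWH)]
numerator = p_a_given_wh * p_wh
denominator = numerator + p_a_given_nwh * p_nwh
p_wh_given_a = numerator / denominator
print(round(p_wh_given_a, 4))  # 0.72 / 0.76 = 0.9474
```

So an A raises the probability that the candidate is hard-working from 0.80 to about 0.95.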
Bayes’ Theorem: Formula
• Let A and B be two events. Then,
P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]
where B^c is the complement of B.
• More generally, let B1, B2, …, Bn be mutually exclusive and collectively exhaustive events; then
P(Bj|A) = P(A|Bj)P(Bj) / Σ_{i=1}^n P(A|Bi)P(Bi)
Bayes’ Theorem: Example 5.2
• In a certain state, 25 percent of all cars emit excessive amounts of pollutants. If the probability is 0.99 that a car emitting excessive amounts of pollutants will fail the state’s vehicular emission test, and the probability is 0.17 that a car not emitting excessive amounts of pollutants will nevertheless fail the test, what is the probability that a car which fails the test actually emits excessive amounts of pollutants?
• Solutions:
– Let E be the event that a car emits excessive amounts of pollutants
and 𝐸𝐸𝑐𝑐 is the complement of E. Then, 𝑃𝑃(𝐸𝐸) = 0.25 and 𝑃𝑃 𝐸𝐸𝑐𝑐 = 0.75.
– Let F be the event that a car fails the test. Then, P(F|E) = 0.99 and P(F|E^c) = 0.17.
P(E|F) = P(F|E)P(E) / [P(F|E)P(E) + P(F|E^c)P(E^c)] = 0.99(0.25) / [0.99(0.25) + 0.17(0.75)] = 0.66.
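The same calculation, sketched in a few lines, confirms the 0.66:

```python
# Example 5.2: P(excessive emissions | fails test)
p_e = 0.25            # P(E): car emits excessive pollutants
p_f_given_e = 0.99    # P(F | E): fails test given excessive emissions
p_f_given_ec = 0.17   # P(F | E^c): fails test given normal emissions

# total probability of failing the test
p_f = p_f_given_e * p_e + p_f_given_ec * (1 - p_e)   # 0.2475 + 0.1275 = 0.375
p_e_given_f = p_f_given_e * p_e / p_f
print(round(p_e_given_f, 2))  # 0.2475 / 0.375 = 0.66
```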
Bayes’ Theorem: Example 5.3
• A mail-order house employs three stock clerks, U, V, and W, who pull items from shelves and assemble them for subsequent verification and packaging. U makes a mistake in an order (gets a wrong item or the wrong quantity) one time in a hundred, V makes a mistake in an order five times in a hundred, and W makes a mistake in an order three times in a hundred. If U, V, and W fill, respectively, 30, 40, and 30 percent of all orders, what are the probabilities that
– a mistake will be made in an order;
– if a mistake is made in an order, the order was filled by U?
– if a mistake is made in an order, the order was filled by V?
• Solutions:
– Let U, V, and W be the event that an order is handled by U, V, and W, respectively. Then, P(U) = 0.3, P(V) = 0.4, and P(W) = 0.3.
– Let M be the event of making a mistake and M^c its complement. Then, P(M|U) = 0.01, P(M|V) = 0.05, and P(M|W) = 0.03.
– P(M) = P(M ∩ U) + P(M ∩ V) + P(M ∩ W) = P(M|U)P(U) + P(M|V)P(V) + P(M|W)P(W) = 0.01(0.3) + 0.05(0.4) + 0.03(0.3) = 0.032.
– P(U|M) = P(M|U)P(U)/P(M) = (0.01)(0.3)/(0.032) = 0.09375.
– P(V|M) = P(M|V)P(V)/P(M) = (0.05)(0.4) /(0.032) = 0.625.
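The three answers can be reproduced with a short sketch of the total-probability and Bayes steps:

```python
# Example 5.3: which clerk filled an order that contains a mistake?
priors = {"U": 0.30, "V": 0.40, "W": 0.30}   # share of orders each clerk fills
error = {"U": 0.01, "V": 0.05, "W": 0.03}    # P(mistake | clerk)

# total probability of a mistake: sum over clerks of P(M|clerk)P(clerk)
p_m = sum(error[c] * priors[c] for c in priors)          # 0.032

# posterior for each clerk given a mistake, by Bayes' theorem
posterior = {c: error[c] * priors[c] / p_m for c in priors}
print(round(p_m, 3), posterior["U"], posterior["V"])
# 0.032 0.09375 0.625
```

Note that although V is not the least accurate clerk, V fills the most orders, so most mistakes trace back to V.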
• Bayes’ Theorem
– The Monty Hall Problem
– Naïve Bayes Classifier
The Monty Hall Problem
• One of three boxes has a coin, and you need to guess which box it is
• Assume
– you choose Box 1;
– I can open one of the other two boxes (I cannot open Box 1);
– I open Box 2 and there is no coin;
– You have the chance to swap.
• Question: Do you want to stay with Box 1 or change to Box 3?
• Consider two scenarios:
– I do not know which box contains the coin
– I know which box contains the coin
The Monty Hall Problem
• Suppose you stay with Box 1 when you have the chance to swap.
• Define the following three events:
– A1 = {Box 1 contains the coin};
– A2 = {Box 2 contains the coin};
– A3 = {Box 3 contains the coin}.
• Define B = {I open Box 2 and there is no coin}.
• The prior probabilities: P(A1) = P(A2) = P(A3) = 1/3.
• Let’s calculate the conditional probability that you win by staying:
P(A1|B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3)]
Scenario 1
• I do NOT know which box contains the coin:
– P(B|A1) = 1/2;
– P(B|A2) = 0;
– P(B|A3) = 1/2;
• From Bayes’ Theorem
P(A1|B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2) + P(A3)P(B|A3)]
= (1/3 × 1/2) / (1/3 × 1/2 + 1/3 × 0 + 1/3 × 1/2) = 1/2
Scenario 2
• I know which box contains the coin:
– P(B|A1) = 1/2;
– P(B|A2) = 0;
– P(B|A3) = 1;
• From Bayes’ Theorem
P(A1|B) = (1/3 × 1/2) / (1/3 × 1/2 + 1/3 × 0 + 1/3 × 1) = 1/3
• Since staying wins with probability only 1/3 in this scenario, you should switch to Box 3.
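Scenario 2 (the host knows where the coin is) can also be checked by simulation. The sketch below estimates the unconditional win rates of staying versus switching, which should come out near 1/3 and 2/3:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def play(switch, trials=100_000):
    """Simulate the game where the host knowingly opens an empty box."""
    wins = 0
    for _ in range(trials):
        coin = random.randint(1, 3)   # box hiding the coin
        pick = 1                      # you always choose Box 1
        # Host opens an empty box other than your pick:
        # if the coin is in Box 1, either of Boxes 2/3 is opened at random;
        # otherwise the host is forced to open the one empty box.
        opened = next(b for b in (2, 3) if b != coin) if coin != 1 \
                 else random.choice([2, 3])
        if switch:
            pick = next(b for b in (1, 2, 3) if b not in (1, opened))
        wins += (pick == coin)
    return wins / trials

stay, swap = play(switch=False), play(switch=True)
print(round(stay, 2), round(swap, 2))  # roughly 0.33 and 0.67
```

Staying wins exactly when Box 1 holds the coin, which happens one time in three, in line with the Bayes calculation.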
• Bayes’ Theorem
– The Monty Hall Problem
– Naïve Bayes Classifier
Motivating Example
Review: Probability and Conditional Probability
The probability of an event is estimated from the observed data by the proportion of times the event occurs
– If it rained 30 out of 100 days with similar conditions as today, the probability of rain today can be estimated as 30/100=0.3
The conditional probability of one event (A) given that another event (B) happened:
P(A|B) = P(A ∩ B) / P(B)
– If B={rainy}, and A={an amber rain alert issued}, then 𝐴𝐴⋂𝐵𝐵={rainy and an amber rain alert issued}
– If P(B)=0.3, and P(𝐴𝐴⋂𝐵𝐵)=0.06, then P(A|B)=0.2
– That is, given it is rainy, the probability of issuing an amber alert is 0.2
Example 5.4: Predicting Fraudulent Financial Reports
• 10 customers of an accounting firm
• For each customer, we have information on whether it had prior legal trouble, whether it is a small or large company, and whether the financial report was found to be fraudulent or truthful
• Calculate the conditional probability of fraud, given each of the four possible combinations {y, small}, {y, large}, {n, small}, {n, large}
Conditional Probabilities
Practical Difficulty with the Direct Computation
• When the number of predictors gets large (even a modest number like 20), many of the records to be classified will be without exact matches.
– For example, even a sizable sample may not contain a single exact match for a new record describing a male Hispanic with high income from the US Midwest who voted in the last election, did not vote in the prior election, has three daughters and one son, and is divorced. (Eight variables)
– We will see another example in the R example later.
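The blow-up is easy to quantify: the number of distinct predictor combinations is the product of the number of levels of each variable. A quick count for an eight-variable example, with the per-variable level counts assumed purely for illustration:

```python
# Assumed number of categories for each of eight predictors
# (e.g. gender, ethnicity, income bracket, region, voted last,
#  voted prior, number of daughters, number of sons) -- illustrative only.
levels = [2, 5, 3, 4, 2, 2, 5, 5]

cells = 1
for k in levels:
    cells *= k   # combinations multiply
print(cells)     # 12000 distinct combinations a record could fall into
```

With thousands of cells and only a moderate sample, most cells are empty, so direct conditional probabilities cannot be estimated for many new records.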
Review: Bayes Formula
• Suppose there are in total m classes for the response variable Y, and the prior probability of each class is P(Y = k) = π_k.
• The Bayes rule, for a given observation x = (x1, x2, ⋯, xp):
P(Y = k | X = x) = P(Y = k) P(X = x | Y = k) / Σ_{l=1}^m P(Y = l) P(X = x | Y = l)
• The Naïve Bayes classifier follows the same Bayes idea but approximates P(X = x | Y = k) by assuming the predictors are conditionally independent given the class:
P(X = x | Y = k) ≈ ∏_{i=1}^p P(X_i = x_i | Y = k)
Naïve Bayes Formula
• Again, suppose there are in total m classes for the response variable Y, with prior probabilities P(Y = k) = π_k.
• The Naïve Bayes classifier computes (recall x = (x1, x2, ⋯, xp)):
P(Y = k | X = x) = P(Y = k) ∏_{i=1}^p P(X_i = x_i | Y = k) / Σ_{l=1}^m [P(Y = l) ∏_{i=1}^p P(X_i = x_i | Y = l)]
• The probabilities on the right-hand side are easy to estimate from the training data.
• Then, for a new observation x, the Naïve Bayes classifier assigns it to the class that gives the largest probability from the above equation.
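As a sketch of how the formula is applied, the following computes the naïve Bayes posterior for a hypothetical two-class problem with three predictors. Every probability estimate here is made up for illustration only:

```python
# Hypothetical class priors and per-predictor conditionals for the
# observed x; in practice these would be estimated from training data.
priors = {"fraud": 0.2, "truthful": 0.8}
cond = {"fraud":    [0.7, 0.6, 0.5],   # P(X_i = x_i | Y = fraud), assumed
        "truthful": [0.2, 0.4, 0.5]}   # P(X_i = x_i | Y = truthful), assumed

def nb_posterior(priors, cond):
    # numerator per class: prior times product of conditionals
    num = {}
    for c in priors:
        p = priors[c]
        for pi in cond[c]:
            p *= pi
        num[c] = p
    total = sum(num.values())          # denominator: sum over classes
    return {c: num[c] / total for c in num}

post = nb_posterior(priors, cond)
print({c: round(p, 3) for c, p in post.items()})
# fraud: 0.042/0.074, truthful: 0.032/0.074 -- predict "fraud"
```

The classifier picks whichever class has the larger posterior; only the numerators need comparing, since the denominator is shared.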
Example 5.4: Predicting Fraudulent Financial Reports
• Compare these naïve Bayes probabilities to the exact probabilities.
– Note how close these naïve Bayes probabilities are to the exact Bayes probabilities.
• It is often the case that the rank ordering of the probabilities is even closer to the exact Bayes method than the probabilities themselves.
• For classification purposes, it is the rank ordering that matters.
Naïve Bayes Classifier in R
• Use the naiveBayes() function in the “e1071” package
• Stage 1: Building the classifier:
– m <- naiveBayes(formula, data) or naiveBayes(train, class)
– formula is a model formula of the form class ~ predictors. Interactions are not allowed.
– data is a data frame (the training data)
– train is a data frame or a matrix containing training data
– class is a factor vector with the class for each row in the training data
• Stage 2: Making predictions:
– p <- predict(m, test, type = "raw")
– m is a model trained by the naiveBayes() function
– test is a data frame or matrix containing test data with the same features/predictors as the training data used to build the classifier
– type is either “class” or “raw” and specifies whether the predictions should be the most likely class value or the raw predicted probabilities
• The function will return a vector of predicted class values or raw predicted probabilities depending upon the value of the type parameter.
Advantages and Shortcomings of the Naïve Bayes Classifier
• Advantages:
– Simplicity
– Computational efficiency
– Effective (good performance)
• Disadvantages:
– Requires a large number of records to obtain good results
– Not helpful for rare predictors
– Estimated probabilities are less reliable than the predicted classes
Example 5.5: Delayed Airplanes
• A dataset:
– 2201 flights from Washington DC into NYC during January 2004 (www.transtats.bts.gov)
• The percent of delayed flights: 19.5%
– Response variable: Flight Status (delayed (0) or ontime (1))
– Predictors (other variables such as distance and arrival time are irrelevant and were deleted)
• Goal: predict whether a new flight will be delayed or not
Some Calculations
• Denote the following by a1:
P(Delayed) ∗ P(Carrier = DL | Delayed) ∗ P(Day_Week = 7 | Delayed)
∗ P(Dep_Time = 10 | Delayed) ∗ P(Dest = LGA | Delayed) ∗ P(Origin = DCA | Delayed)
= 0.2045455 ∗ 0.11851852 ∗ 0.17037037 ∗ 0.025925926 ∗ 0.4518519 ∗ 0.51851852
• Denote the following by a2:
P(Ontime) ∗ P(Carrier = DL | Ontime) ∗ P(Day_Week = 7 | Ontime)
∗ P(Dep_Time = 10 | Ontime) ∗ P(Dest = LGA | Ontime) ∗ P(Origin = DCA | Ontime)
= 0.7954545 ∗ 0.19333333 ∗ 0.10857143 ∗ 0.042857143 ∗ 0.5514286 ∗ 0.64761905
P(Delayed | Carrier = DL, Day_Week = 7, Dep_Time = 10, Dest = LGA, Origin = DCA) = a1/(a1 + a2)
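Plugging the slides' numbers in gives the final posterior (values copied from the slides; under the usual 0.5 cutoff, this flight would be classified as ontime):

```python
# Example 5.5: numerators of the naive Bayes posterior for each class,
# using the estimated probabilities reported on the slides.
a1 = 0.2045455 * 0.11851852 * 0.17037037 * 0.025925926 * 0.4518519 * 0.51851852
a2 = 0.7954545 * 0.19333333 * 0.10857143 * 0.042857143 * 0.5514286 * 0.64761905

p_delayed = a1 / (a1 + a2)   # posterior probability of a delay
print(round(p_delayed, 3))   # about 0.089
```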