AI1/AI&ML – Naïve Bayes
Aims of the Session
This session aims to help you:
§ Describe the fundamental concepts in probability theory
§ Explain Bayes’ Theorem and its application in ML
§ Apply Naïve Bayes to classification for categorical and numerical independent variables
§ Fundamental concepts in Probability Theory
§ Bayes’ Theorem
§ Naïve Bayes for Categorical Independent Variables
§ Naïve Bayes for Numerical Independent Variables
Fundamental Concepts in Probability Theory
§ Probabilistic model: a mathematical description of an uncertain situation. The two main elements of a probabilistic model are:
• The sample space Ω, which is the set of all possible outcomes
• The probability law, which assigns to a set A of possible outcomes (called an
event) a nonnegative number 𝑃(𝐴) (called the probability of 𝐴)
§ Every probabilistic model involves an underlying process, called the
experiment, that produces exactly one of several possible outcomes
§ A subset of the sample space Ω is called an event
Example: Toss of a Coin
§ Consider the following experiment: a single toss of a fair coin
• The sample space Ω: head (H) or tail (T)
• The probability law: P(H) = 0.5 (called the probability of H), P(T) = 0.5
§ Let us now consider the experiment consisting of 3 coin tosses. What is
the probability of having exactly 2 heads? What about exactly 1 head?
§ Repeat with the biased coin: P(H) = … (a quick way to check both answers is sketched below)
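To check these answers numerically, here is a minimal Python sketch (not part of the original slides) that enumerates all 2³ toss sequences; the biased-coin value P(H) = 0.7 is only an assumed example.

```python
from itertools import product

def prob_exactly_k_heads(k, n_tosses=3, p_head=0.5):
    """Sum the probabilities of all toss sequences that contain exactly k heads."""
    total = 0.0
    for outcome in product("HT", repeat=n_tosses):          # all 2**n_tosses sequences
        heads = outcome.count("H")
        if heads == k:
            # each sequence has probability p_head**heads * (1 - p_head)**tails
            total += p_head ** heads * (1 - p_head) ** (n_tosses - heads)
    return total

print(prob_exactly_k_heads(2))             # fair coin, exactly 2 heads: 3/8 = 0.375
print(prob_exactly_k_heads(1))             # fair coin, exactly 1 head: 3/8 = 0.375
print(prob_exactly_k_heads(2, p_head=0.7)) # assumed biased coin: 3 * 0.7**2 * 0.3 = 0.441
```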
Probability Axioms
§ Nonnegativity: P(A) ≥ 0, for every event A
§ Additivity: If A and B are two disjoint events, then the probability of
their union satisfies: P(A ∪ B) = P(A) + P(B)
§ Normalisation: The probability of the entire sample space is equal to 1, namely P(Ω) = 1
(Discrete) Random Variables
§ Given an experiment and the corresponding sample space, a random variable associates a particular number with each outcome
§ Mathematically, a random variable X is a real-valued function of the experimental outcome
Image: taken from Introduction to Probability (Fig. 2.1 (b))
Probability Mass Function (PMF)
§ The probability mass function (PMF) captures the probabilities of the values that a (discrete) random variable can take
§ Let us consider the previous example: P(X = 1) = 1/16
P(X = 2) = 3/16
P(X = 3) = 5/16
P(X = 4) = 7/16
Image: taken from Introduction to Probability (Fig. 2.2 (b))
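The PMF above (the maximum of two rolls of a fair 4-sided die) can be reproduced with a short sketch; this code is illustrative and not taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X = maximum of two rolls of a fair 4-sided die; each of the 16 ordered pairs has probability 1/16
counts = Counter(max(r1, r2) for r1, r2 in product(range(1, 5), repeat=2))
pmf = {x: Fraction(c, 16) for x, c in sorted(counts.items())}
print(pmf)   # {1: Fraction(1, 16), 2: Fraction(3, 16), 3: Fraction(5, 16), 4: Fraction(7, 16)}
```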
§ Random variables are usually indicated with uppercase letters, e.g., X or Temperature or Infection
§ The values are indicated with lowercase letters, e.g., X ∈ {true, false} or Infection ∈ {low, moderate, high}
§ Vectors are usually indicated with bold letters or a small arrow above the letter, e.g., 𝑿 or 𝑋⃗
§ The PMF is usually indicated by the symbol p_X(x)
Unconditional/Conditional Probability Distributions
§ An unconditional (or prior) probability distribution gives us the probabilities of all possible events without knowing anything else about the problem, e.g., the maximum value of two rolls of a 4-sided die
§ P(X) = {1/16, 3/16, 5/16, 7/16}
§ A conditional (or posterior) probability distribution gives us the probability of all possible events with some additional knowledge, e.g., the maximum value of two rolls of a 4-sided die knowing that the first roll is 3
§ P(X | X₁ = 3) = {0, 0, 3/4, 1/4}
Joint Probability Distributions
§ A joint probability distribution is the probability distribution associated to all combinations of the values of two or more random variables
§ This is indicated by commas, e.g., P(X, Y) or P(Toothache, Cavity)
§ We can calculate the joint probability distribution by using the product
rule as in the following:
P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
Mean, Variance and Standard Deviation
§ The mean (or expected value or expectation), also indicated by μ, of a random variable X with PMF p_X(x) represents the centre of gravity of the PMF:
E[X] = Σₓ x · p_X(x)
§ E.g., let us consider the random variable X, i.e., the roll of a 4-sided die. The mean is calculated as: E[X] = 1·1/4 + 2·1/4 + 3·1/4 + 4·1/4 = 2.5
§ The variance of a random variable X provides a measure of the dispersion around the mean:
var(X) = Σₓ (x − E[X])² · p_X(x)
§ The standard deviation is another measure of dispersion: σ_X = √var(X)
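A small sketch, assuming the 4-sided-die PMF from the example, that evaluates the mean, variance and standard deviation formulas directly:

```python
# PMF of a single roll of a fair 4-sided die
pmf = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

mean = sum(x * p for x, p in pmf.items())                 # E[X] = 2.5
var = sum((x - mean) ** 2 * p for x, p in pmf.items())    # var(X) = 1.25
std = var ** 0.5                                          # sigma_X ≈ 1.118
print(mean, var, std)
```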
Continuous Random Variables
§ A random variable X is called continuous if its probability law can be described in terms of a nonnegative function f_X. This function is called the probability density function (PDF) and is the equivalent of the PMF for discrete random variables:
P(X ∈ B) = ∫_B f_X(x) dx
§ Since we are dealing with continuous variables, there are an infinite number of values that 𝑋 can take
§ As for the discrete case, also for continuous random variables we can have unconditional, conditional and joint probability distributions
Example: Random Number Generator
§ As an example, let us consider a random number generator that returns a random value between 0 and 1: 𝑋 ∈ [0,1]
§ And let us model it with a Gaussian (or normal) distribution:
P(X = a | μ, σ²) = (1 / (σ√(2π))) · e^(−(a−μ)² / (2σ²)),
where μ is the mean and σ² is the variance. Also, recall that π ≈ 3.14159 and e ≈ 2.71828
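The density can be evaluated by transcribing the formula; in the sketch below the mean, variance and evaluation point are assumed example values.

```python
import math

def gaussian_pdf(a, mu, sigma2):
    """Density of a Gaussian with mean mu and variance sigma2, evaluated at a."""
    return (1.0 / math.sqrt(2 * math.pi * sigma2)) * math.exp(-((a - mu) ** 2) / (2 * sigma2))

# Assumed example values: mean 0.5, variance 0.01, evaluated at its mean
print(gaussian_pdf(0.5, mu=0.5, sigma2=0.01))   # ≈ 3.989, the peak of this density
```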
§ Fundamental concepts in Probability Theory
§ Bayes’ Theorem
§ Naïve Bayes for Categorical Independent Variables
§ Naïve Bayes for Numerical Independent Variables
Bayes’ Theorem
§ Recall the product rule for a joint probability distribution of independent variable(s) 𝑋 and dependent variable 𝑌:
P(X, Y) = P(X | Y) P(Y) = P(Y | X) P(X)
§ By taking the second and last term from the above equation and rearranging, we get:
P(X | Y) = P(Y | X) P(X) / P(Y)
§ The above equation is known as Bayes’ Theorem (also Bayes’ rule or Bayes’ law)
ML: Probabilistic Inference
§ Our ML task consists of computing the posterior probabilities of query propositions given some observed evidence: this is known as probabilistic inference
§ We use Bayes’ Theorem to make predictions about an underlying process given a knowledge base consisting of the data produced by this process
Equivalent Terminology
§ Input attribute, independent variable, input variable
§ Output attribute, dependent variable, output variable, label
(classification)
§ Predictive model, classifier (classification), or hypothesis (statistical learning)
§ Learning a model, training a model, building a model
§ Training examples, training data
§ Example, observation, data point, instance (more frequently used for test examples)
§ P(a, b) = P(a and b) = P(a ∧ b)
Learning Probabilities
§ Consider the training set
Sunny (X₁)   Windy (X₂)   Tennis (Y)
yes          yes          yes
yes          no           yes
yes          no           yes
no           yes          no
no           yes          no
no           no           no
§ Let us build the model for one independent variable, e.g., Windy (X₂)
Frequency Table (Windy)
               Tennis = yes   Tennis = no
Windy = yes         1              2
Windy = no          2              1
Learning Probabilities (continued)
P(Windy=yes | Tennis=yes) = 1/3
P(Windy=no | Tennis=yes) = 2/3
P(Windy=yes | Tennis=no) = 2/3
P(Windy=no | Tennis=no) = 1/3
P(Windy=yes) = 3/6 = 1/2
P(Windy=no) = 3/6 = 1/2
P(Tennis=yes) = 3/6 = 1/2
P(Tennis=no) = 3/6 = 1/2
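These counts and probabilities can be reproduced with a few lines of Python; the six (Windy, Tennis) pairs below are the reconstructed training examples used in this running example.

```python
from collections import Counter

# (Windy, Tennis) pairs for the six training examples used in this running example
data = [("yes", "yes"), ("no", "yes"), ("no", "yes"),
        ("yes", "no"), ("yes", "no"), ("no", "no")]

pair_counts = Counter(data)                              # frequency table: (windy, tennis) -> count
class_counts = Counter(tennis for _, tennis in data)     # class totals

for (w, t), n in sorted(pair_counts.items()):
    print(f"P(Windy={w} | Tennis={t}) = {n}/{class_counts[t]}")
for t, n in sorted(class_counts.items()):
    print(f"P(Tennis={t}) = {n}/{len(data)}")
```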
Applying Bayes’ Theorem
§ Let us consider output class c and input value(s) a. Bayes’ Theorem can
be rewritten as
P(c | a) = P(a | c) P(c) / P(a)
§ Now, given input value(s) a, we calculate the above for every class c: our prediction is the class with max_c P(c | a)
P(Tennis=yes | Windy=yes) = P(Windy=yes | Tennis=yes) P(Tennis=yes) / P(Windy=yes)
Applying Bayes’ Theorem (continued)
P(Tennis=yes | Windy=yes) = P(Windy=yes | Tennis=yes) P(Tennis=yes) / P(Windy=yes) = (1/3 · 3/6) / (3/6) = 0.33
P(Tennis=no | Windy=yes) = P(Windy=yes | Tennis=no) P(Tennis=no) / P(Windy=yes) = (2/3 · 3/6) / (3/6) = 0.67
max_c P(c | a) = max {0.33, 0.67} = 0.67, so the prediction is Tennis = no
Frequency Table (Windy)
               Tennis = yes   Tennis = no
Windy = yes         1              2
Windy = no          2              1
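A minimal sketch of this single-variable prediction, reusing the counts from the frequency table (the variable names are illustrative assumptions):

```python
# Counts from the frequency table: {tennis_class: {windy_value: count}}
counts = {"yes": {"yes": 1, "no": 2}, "no": {"yes": 2, "no": 1}}
n_total = 6

def posterior(windy_value):
    """P(Tennis = c | Windy = windy_value) for every class c, via Bayes' Theorem."""
    scores = {}
    for c, table in counts.items():
        n_class = sum(table.values())
        prior = n_class / n_total                      # P(Tennis = c)
        likelihood = table[windy_value] / n_class      # P(Windy = v | Tennis = c)
        scores[c] = prior * likelihood
    evidence = sum(scores.values())                    # P(Windy = v)
    return {c: s / evidence for c, s in scores.items()}

print(posterior("yes"))   # {'yes': 0.333..., 'no': 0.666...} -> predict Tennis = no
```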
Normalising Factor
P(Tennis=yes | Windy=yes) = P(Windy=yes | Tennis=yes) P(Tennis=yes) / P(Windy=yes) = (1/3 · 3/6) / (3/6) = 0.33
P(Tennis=no | Windy=yes) = P(Windy=yes | Tennis=no) P(Tennis=no) / P(Windy=yes) = (2/3 · 3/6) / (3/6) = 0.67
§ 1/P(Windy=yes) can be seen as a normalisation constant for the distribution: we can replace it with the constant parameter α = 1/β
§ β = Σ_{c∈𝒴} P(c) P(a | c)
More than 1 Independent Variable
P(c | a₁, …, aₙ) = P(a₁, …, aₙ | c) P(c) / P(a₁, …, aₙ) = α P(a₁, …, aₙ | c) P(c)
§ P represents the probability calculated based on the frequency tables
§ c represents a class
§ aᵢ represents the value of independent variable xᵢ, i ∈ {1, …, n}
§ 𝑛 is the number of independent variables
§ 𝛼 is the normalisation factor
Problems: Scaling and Missing Values
[Full joint distribution table over the Boolean variables Toothache, Cavity and Catch, taken from the book]
§ In this example (from the book), we have 3 Boolean variables
§ For a domain described by n Boolean variables, we would need an input table of size O(2ⁿ) and it would take O(2ⁿ) time to process the table
§ Also, it is reasonable to think that we will never see values for all possible
combinations of the variables
§ Naïve Bayes can be used to deal with these issues
§ Fundamental concepts in Probability Theory
§ Bayes’ Theorem
§ Naïve Bayes for Categorical Independent Variables
§ Naïve Bayes for Numerical Independent Variables
Recall: Issues with Bayes’ Theorem
§ For increasing numbers of independent variables, all possible combinations must be considered:
P(c | a₁, …, aₙ) = α P(c) P(a₁, …, aₙ | c)
§ For a domain described by n Boolean variables, we would need an input table of size O(2ⁿ) and it would take O(2ⁿ) time to process the table
Naïve Bayes: Conditional Independence
§ Assumption: each input variable is conditionally independent of any other input variables given the output
§ Independence: A is independent of B when the following equality holds (i.e., B does not alter the probability that A has occurred): P(A | B) = P(A)
§ Conditional independence: x₁ is conditionally independent of x₂ given y when the following equality holds:
P(x₁ | x₂, y) = P(x₁ | y)
Naïve Bayes
§ Under the conditional independence assumption, P(xᵢ | xⱼ, y) = P(xᵢ | y), the class-conditional joint probability factorises:
P(c | a₁, …, aₙ) = α P(c) P(a₁, …, aₙ | c)
P(c | a₁, …, aₙ) = α P(c) P(a₁ | c) P(a₂ | c) … P(aₙ | c)
P(c | a₁, …, aₙ) = α P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c)
where α = 1/β and β = Σ_{c∈𝒴} ( P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c) )
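A compact sketch of the whole categorical Naïve Bayes computation described above (no smoothing yet); the data layout and function names are assumptions made for illustration.

```python
from collections import defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and, per feature, how often each value occurs with each class."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: defaultdict(int))   # (feature_index, value) -> class -> count
    for row, c in zip(rows, labels):
        class_counts[c] += 1
        for i, value in enumerate(row):
            feature_counts[(i, value)][c] += 1
    return class_counts, feature_counts

def predict(row, class_counts, feature_counts):
    """Normalised posteriors alpha * P(c) * prod_i P(a_i | c), without smoothing."""
    n = sum(class_counts.values())
    scores = {}
    for c, nc in class_counts.items():
        score = nc / n                                        # P(c)
        for i, value in enumerate(row):
            score *= feature_counts[(i, value)][c] / nc       # P(a_i | c)
        scores[c] = score
    beta = sum(scores.values())
    return {c: s / beta for c, s in scores.items()} if beta else scores

# Six training examples: (Sunny, Windy) -> Tennis
rows = [("yes", "yes"), ("yes", "no"), ("yes", "no"),
        ("no", "yes"), ("no", "yes"), ("no", "no")]
labels = ["yes", "yes", "yes", "no", "no", "no"]
cc, fc = train_naive_bayes(rows, labels)
print(predict(("no", "no"), cc, fc))   # {'yes': 0.0, 'no': 1.0}, matching the worked example below
```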
Example: Naïve Bayes
§ Consider again the training set
Sunny (X₁)   Windy (X₂)   Tennis (Y) — the same six training examples as before
§ Because of conditional independence, we have a table for each variable:
Frequency Table (Windy)
               Tennis = yes   Tennis = no
Windy = yes         1              2
Windy = no          2              1
Frequency Table (Sunny)
               Tennis = yes   Tennis = no
Sunny = yes         3              0
Sunny = no          0              3
Example: Naïve Bayes (continued)
§ Let us determine the predicted class for the following instance: (Windy = no, Sunny = no, Y = ?)
§ P(c | a₁, …, aₙ) = α P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c)
§ P(¬T | ¬W, ¬S) = α P(¬T) P(¬W | ¬T) P(¬S | ¬T) = α · 3/6 · 1/3 · 3/3 = (1/6) α
§ P(T | ¬W, ¬S) = α P(T) P(¬W | T) P(¬S | T) = α · 3/6 · 2/3 · 0/3 = 0
Example: Naïve Bayes (continued)
§ P(¬T | ¬W, ¬S) = α P(¬T) P(¬W | ¬T) P(¬S | ¬T) = α · 3/6 · 1/3 · 3/3 = (1/6) α
§ P(T | ¬W, ¬S) = α P(T) P(¬W | T) P(¬S | T) = α · 3/6 · 2/3 · 0/3 = 0
§ α = 1/β = 1 / (3/6 · 2/3 · 0/3 + 3/6 · 1/3 · 3/3) = 1 / (1/6) = 6
§ P(¬T | ¬W, ¬S) = 1/6 · 6 = 1
§ P(T | ¬W, ¬S) = 0
§ Problem: in this example, there is no data where Tennis = yes with Sunny = no, so regardless of the value of Windy, we will get inaccurate predictions
Laplace Smoothing
§ To avoid this problem, we can use Laplace smoothing: we add 1 to every count in the frequency tables obtained from our training data
Frequency Table (Windy, after Laplace smoothing)
               Tennis = yes   Tennis = no
Windy = yes         2              3
Windy = no          3              2
Frequency Table (Sunny, after Laplace smoothing)
               Tennis = yes   Tennis = no
Sunny = yes         4              1
Sunny = no          1              4
§ Then we use the updated tables when calculating P(aᵢ | c), so we do not get zero probabilities
§ When we calculate 𝑃(𝑐), we use the original tables
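A one-function sketch of the smoothed estimate (the function name is an assumption): the +1 in the numerator and the +(number of values) in the denominator follow directly from adding 1 to every cell of the column for class c.

```python
def smoothed_conditional(count_value_and_class, count_class, n_values):
    """P(a_i = v | c) after adding 1 to every cell of the frequency table."""
    return (count_value_and_class + 1) / (count_class + n_values)

# P(Sunny = no | Tennis = yes): raw count 0 out of 3 examples, Sunny has 2 possible values
print(smoothed_conditional(0, 3, 2))   # 1/5 = 0.2 instead of 0
```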
Naïve Bayes Learning Algorithm
§ Create a frequency table for each independent variable, with one cell for each combination of variable value and class
§ Count the number of training examples of each class with each independent variable
§ Apply Laplace smoothing
Naïve Bayes Model
§ Consists of the frequency tables obtained from Bayes’ Theorem under the conditional independence assumption (with or without Laplace smoothing)
Naïve Bayes prediction for an instance (X=a, Y=?)
§ We use Bayes’ Theorem under the conditional independence assumption
§ Fundamental concepts in Probability Theory
§ Bayes’ Theorem
§ Naïve Bayes for Categorical Independent Variables
§ Naïve Bayes for Numerical Independent Variables
Naïve Bayes for Numerical Independent Variables
P(c | a₁, …, aₙ) = α P(c) P(a₁ | c) P(a₂ | c) … P(aₙ | c)
P(c | a₁, …, aₙ) = α P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c)
where α = 1/β and β = Σ_{c∈𝒴} ( P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c) )
§ We predict the class with max_c [ P(c | a₁, …, aₙ) ]
Naïve Bayes for Numerical Independent Variables
§ For categorical independent variables, we can compute the probability of an event through the probability mass function associated with the training data
Frequency Table (Windy)
               Tennis = yes   Tennis = no
Windy = yes         1              2
Windy = no          2              1
Naïve Bayes for Numerical Independent Variables
§ Instead, we assume that examples are drawn from a probability distribution. We can use a Gaussian distribution as we did before
§ Gaussian distribution with mean μ = 15 and variance σ² = 49:
P(X = a | μ, σ²) = (1 / (σ√(2π))) · e^(−(a−μ)² / (2σ²))
Also, recall that π ≈ 3.14159 and e ≈ 2.71828
Naïve Bayes for Numerical Independent Variables
§ Let us consider the training data below. We create the PDF for Tennis = yes and for Tennis = no
§ So, for Tennis = yes, we calculate the mean and variance:
§ μ = (1/n) Σᵢ xᵢ = (15 + 25 + 35) / 3 = 25
§ σ² = (1/(n−1)) Σᵢ (xᵢ − μ)² = (1/2) [ (15−25)² + (25−25)² + (35−25)² ] = 100
Sunny (X₁)   Temp. (X₂)   Tennis (Y)
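The mean and sample variance above can be checked with a couple of lines; only the three Temperature values of the Tennis = yes examples (15, 25, 35) are needed.

```python
temps_yes = [15, 25, 35]    # Temperature values of the Tennis = yes training examples

n = len(temps_yes)
mu = sum(temps_yes) / n                                    # 25.0
var = sum((x - mu) ** 2 for x in temps_yes) / (n - 1)      # 100.0 (sample variance, divisor n - 1)
print(mu, var)
```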
Naïve Bayes for Numerical Independent Variables
§ Gaussian distribution with mean μ = 25 and variance σ² = 100:
P(X = a | μ, σ²) = (1 / (σ√(2π))) · e^(−(a−μ)² / (2σ²))
Now, if we repeat for Tennis = no
§ μ = 11.67
§ σ² = 58.34
Sunny (X₁)   Temp. (X₂)   Tennis (Y)
§ Let us build the tables for the categorical variable (Sunny) and the numerical variable (Temperature):
Frequency Table (Sunny, after Laplace smoothing)
               Tennis = yes   Tennis = no
Sunny = yes         4              1
Sunny = no          1              4
Parameter Table (Temperature)
               Tennis = yes   Tennis = no
μ                   25            11.67
σ²                 100            58.34
§ Now, let us use Naïve Bayes to make a prediction based on the tables:
§ P(c | a₁, …, aₙ) = α P(c) ∏ᵢ₌₁ⁿ P(aᵢ | c)
§ We use the frequency table for the categorical independent variables
§ We use the parameter table for the numerical independent variables
§ Calculate P(Tennis=yes|Sunny=no,Temperature=20):
§ P(Tennis=yes | Sunny=no, Temp=20) = α P(Tennis=yes) P(Sunny=no | Tennis=yes) P(Temp=20 | Tennis=yes)
= α · 3/6 · 1/5 · 0.035 = 0.0035 α
§ Calculate P(Tennis=no|Sunny=no,Temperature=20):
§ P(Tennis=no | Sunny=no, Temp=20) = α P(Tennis=no) P(Sunny=no | Tennis=no) P(Temp=20 | Tennis=no)
= α · 3/6 · 4/5 · P(Temp=20 | Tennis=no)
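A sketch that finishes this calculation numerically, combining the smoothed frequency table for Sunny with the parameter table for Temperature; the final posterior values are computed here rather than quoted from the slides, so treat them as a check.

```python
import math

def gaussian_pdf(a, mu, sigma2):
    return (1.0 / math.sqrt(2 * math.pi * sigma2)) * math.exp(-((a - mu) ** 2) / (2 * sigma2))

prior = {"yes": 3 / 6, "no": 3 / 6}                        # P(Tennis = c)
p_sunny_no = {"yes": 1 / 5, "no": 4 / 5}                   # P(Sunny = no | c), Laplace-smoothed
temp_params = {"yes": (25, 100), "no": (11.67, 58.34)}     # (mu, sigma^2) per class

scores = {}
for c in ("yes", "no"):
    mu, sigma2 = temp_params[c]
    scores[c] = prior[c] * p_sunny_no[c] * gaussian_pdf(20, mu, sigma2)

alpha = 1 / sum(scores.values())
print({c: alpha * s for c, s in scores.items()})   # roughly {'yes': 0.23, 'no': 0.77} -> predict Tennis = no
```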