Fundamentals of Machine Learning for
Predictive Data Analytics
Chapter 6: Probability-based Learning Sections 6.4, 6.5
John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy
Continuous Features: Probability Density Functions
Continuous Features: Binning
Bayesian Networks
P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = 'none' | fr) = 0.1666             P(CH = 'none' | ¬fr) = 0
P(CH = 'paid' | fr) = 0.1666             P(CH = 'paid' | ¬fr) = 0.2857
P(CH = 'current' | fr) = 0.5             P(CH = 'current' | ¬fr) = 0.2857
P(CH = 'arrears' | fr) = 0.1666          P(CH = 'arrears' | ¬fr) = 0.4286
P(GC = 'none' | fr) = 0.8334             P(GC = 'none' | ¬fr) = 0.8571
P(GC = 'guarantor' | fr) = 0.1666        P(GC = 'guarantor' | ¬fr) = 0
P(GC = 'coapplicant' | fr) = 0           P(GC = 'coapplicant' | ¬fr) = 0.1429
P(ACC = 'own' | fr) = 0.6666             P(ACC = 'own' | ¬fr) = 0.7857
P(ACC = 'rent' | fr) = 0.3333            P(ACC = 'rent' | ¬fr) = 0.1429
P(ACC = 'free' | fr) = 0                 P(ACC = 'free' | ¬fr) = 0.0714
CREDIT HISTORY   GUARANTOR/COAPPLICANT   ACCOMMODATION   FRAUDULENT
paid             guarantor               free            ?
P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = paid | fr) = 0.1666               P(CH = paid | ¬fr) = 0.2857
P(GC = guarantor | fr) = 0.1666          P(GC = guarantor | ¬fr) = 0
P(ACC = free | fr) = 0                   P(ACC = free | ¬fr) = 0.0714

∏_{k=1}^{m} P(q[k] | fr) × P(fr) = 0.0
∏_{k=1}^{m} P(q[k] | ¬fr) × P(¬fr) = 0.0

Because P(ACC = free | fr) = 0 and P(GC = guarantor | ¬fr) = 0, both scores evaluate to zero, and the model cannot distinguish between the two target levels for this query.
The standard way to avoid this issue is to use smoothing.
Smoothing takes some of the probability mass from the events that hold most of the probability and redistributes it to the other events in the set.
There are several different ways to smooth probabilities; we will use Laplace smoothing.
Laplace Smoothing (conditional probabilities)
P(f = v | t) = (count(f = v | t) + k) / (count(f | t) + (k × |Domain(f)|))

Typical values of k are 1, 2, or 3.
Table: Smoothing the posterior probabilities for the GUARANTOR/COAPPLICANT feature conditioned on FRAUDULENT being False.

Raw Probabilities:
P(GC = none | ¬fr) = 0.8571
P(GC = guarantor | ¬fr) = 0
P(GC = coapplicant | ¬fr) = 0.1429

Smoothing Parameters:
k = 3
count(GC | ¬fr) = 14
count(GC = none | ¬fr) = 12
count(GC = guarantor | ¬fr) = 0
count(GC = coapplicant | ¬fr) = 2
|Domain(GC)| = 3

Smoothed Probabilities:
P(GC = none | ¬fr) = (12 + 3) / (14 + (3 × 3)) = 0.6522
P(GC = guarantor | ¬fr) = (0 + 3) / (14 + (3 × 3)) = 0.1304
P(GC = coapplicant | ¬fr) = (2 + 3) / (14 + (3 × 3)) = 0.2174
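The worked example above can be reproduced with a short Python sketch (our own illustration, not from the slides); the counts 12, 0, and 2 out of 14, the smoothing parameter k = 3, and |Domain(GC)| = 3 are taken directly from the table.

```python
# A minimal sketch of Laplace smoothing for the GUARANTOR/COAPPLICANT feature,
# conditioned on FRAUDULENT = False, using the counts from the table above.
def laplace_smooth(counts, k):
    """Return Laplace-smoothed conditional probabilities for one feature."""
    total = sum(counts.values())      # count(f | t)
    domain_size = len(counts)         # |Domain(f)|
    return {
        value: (count + k) / (total + k * domain_size)
        for value, count in counts.items()
    }

gc_counts_not_fraud = {"none": 12, "guarantor": 0, "coapplicant": 2}
print(laplace_smooth(gc_counts_not_fraud, k=3))
# {'none': 0.6521..., 'guarantor': 0.1304..., 'coapplicant': 0.2173...}
```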
Table: The Laplace smoothed (with k = 3) probabilities needed by a naive Bayes prediction model, calculated from the fraud detection dataset. Notation key: FR = FRAUDULENT, CH = CREDIT HISTORY, GC = GUARANTOR/COAPPLICANT, ACC = ACCOMMODATION, T = 'True', F = 'False'.

P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = none | fr) = 0.2222               P(CH = none | ¬fr) = 0.1154
P(CH = paid | fr) = 0.2222               P(CH = paid | ¬fr) = 0.2692
P(CH = current | fr) = 0.3333            P(CH = current | ¬fr) = 0.2692
P(CH = arrears | fr) = 0.2222            P(CH = arrears | ¬fr) = 0.3462
P(GC = none | fr) = 0.5333               P(GC = none | ¬fr) = 0.6522
P(GC = guarantor | fr) = 0.2667          P(GC = guarantor | ¬fr) = 0.1304
P(GC = coapplicant | fr) = 0.2           P(GC = coapplicant | ¬fr) = 0.2174
P(ACC = own | fr) = 0.4667               P(ACC = own | ¬fr) = 0.6087
P(ACC = rent | fr) = 0.3333              P(ACC = rent | ¬fr) = 0.2174
P(ACC = free | fr) = 0.2                 P(ACC = free | ¬fr) = 0.1739
CREDIT HISTORY   GUARANTOR/COAPPLICANT   ACCOMMODATION   FRAUDULENT
paid             guarantor               free            ?
P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = paid | fr) = 0.2222               P(CH = paid | ¬fr) = 0.2692
P(GC = guarantor | fr) = 0.2667          P(GC = guarantor | ¬fr) = 0.1304
P(ACC = free | fr) = 0.2                 P(ACC = free | ¬fr) = 0.1739

∏_{k=1}^{m} P(q[k] | fr) × P(fr) = 0.0036
∏_{k=1}^{m} P(q[k] | ¬fr) × P(¬fr) = 0.0043

Table: The relevant smoothed probabilities, from Table 2 [9], needed by the Naive Bayes prediction model in order to classify the query from the previous slide, and the calculation of the scores for each candidate classification. Because 0.0043 > 0.0036, the model returns the prediction FRAUDULENT = 'False' for this query.
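As a small sketch (our own illustration, reusing the smoothed values from the tables above), the two scores can be computed as a product of the class prior and the conditional probabilities of the query's feature values; the model returns the target level with the higher score.

```python
# Naive Bayes scoring for the query <CH=paid, GC=guarantor, ACC=free>,
# using the Laplace-smoothed probabilities from the tables above.
priors = {"fr": 0.3, "not_fr": 0.7}
likelihoods = {
    "fr":     {"CH=paid": 0.2222, "GC=guarantor": 0.2667, "ACC=free": 0.2},
    "not_fr": {"CH=paid": 0.2692, "GC=guarantor": 0.1304, "ACC=free": 0.1739},
}
query = ["CH=paid", "GC=guarantor", "ACC=free"]

scores = {}
for level in priors:
    score = priors[level]
    for q in query:
        score *= likelihoods[level][q]   # product of conditional probabilities
    scores[level] = score

print(scores)                        # {'fr': ~0.0036, 'not_fr': ~0.0043}
print(max(scores, key=scores.get))   # 'not_fr' -> predict FRAUDULENT = 'False'
```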
Continuous Features: Probability Density Functions
A probability density function (PDF) represents the probability distribution of a continuous feature using a mathematical function, such as the normal distribution.
N(x, μ, σ) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
Figure: The density curve of the normal distribution, with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, and μ + 3σ.
A PDF defines a density curve, and the shape of the curve is determined by:
- the statistical distribution that is used to define the PDF
- the values of the statistical distribution parameters
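For instance (an illustrative sketch, not part of the slides), the normal density can be evaluated directly with scipy; changing μ shifts the curve and changing σ widens or narrows it.

```python
# Evaluate the normal density N(x, mu, sigma) at a few points and observe how
# the parameters change the shape of the curve.
from scipy.stats import norm

for mu, sigma in [(0.0, 1.0), (0.0, 2.0), (3.0, 1.0)]:
    densities = [norm.pdf(x, loc=mu, scale=sigma) for x in (-1.0, 0.0, 1.0, 3.0)]
    print(mu, sigma, [round(d, 4) for d in densities])
```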
Table: Definitions of some standard probability distributions.

Normal: x ∈ R; parameters μ ∈ R, σ ∈ R>0
  N(x, μ, σ) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

Student-t: x ∈ R; parameters φ ∈ R, ρ ∈ R>0, κ ∈ R>0
  τ(x, φ, ρ, κ) = (Γ((κ + 1)/2) / (Γ(κ/2) × √(πκ) × ρ)) × (1 + z²/κ)^(−(κ + 1)/2), where z = (x − φ)/ρ

Exponential: x ∈ R; parameter λ ∈ R>0
  E(x, λ) = λ · e^(−λx) for x > 0, and 0 otherwise

Mixture of n Gaussians: x ∈ R; parameters {μ1, …, μn | μi ∈ R}, {σ1, …, σn | σi ∈ R>0}, {ω1, …, ωn | ωi ∈ R>0}, with Σ_{i=1}^{n} ωi = 1
  N(x, μ1, σ1, ω1, …, μn, σn, ωn) = Σ_{i=1}^{n} ωi · (1 / (σi√(2π))) · e^(−(x − μi)² / (2σi²))
Figure: Plots of some well-known probability distributions: (a) normal and Student-t; (b) exponential; (c) mixture of Gaussians.
Figure: Histograms of two unimodal datasets: (a) the distribution has light tails; (b) the distribution has fat tails.
Figure: Illustration of the robustness of the student-t distribution to outliers: (a) a density histogram of a unimodal dataset overlaid with the density curves of a normal and a student-t distribution that have been fitted to the data; (b) a density histogram of the same dataset with outliers added, overlaid with the density curves of a normal and a student-t distribution that have been fitted to the data. The student-t distribution is less affected by the introduction of outliers. (This figure is inspired by Figure 2.16 in (Bishop, 2006).)
Figure: Illustration of how a mixture of Gaussians model is composed of a number of normal distributions. The curve plotted using a solid line is the mixture of Gaussians density curve, created using an appropriately weighted summation of the three normal curves, plotted using dashed and dotted lines.
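As a sketch of this idea (our own illustration, with made-up component parameters since the slide does not give them), a mixture-of-Gaussians density is simply a weighted sum of normal densities whose weights sum to 1.

```python
# A mixture-of-Gaussians density: a weighted sum of normal densities.
# The component parameters below are illustrative only.
from scipy.stats import norm

components = [            # (weight, mu, sigma); the weights must sum to 1
    (0.5, 10.0, 2.0),
    (0.3, 20.0, 4.0),
    (0.2, 35.0, 5.0),
]

def mixture_pdf(x):
    return sum(w * norm.pdf(x, loc=mu, scale=sigma) for w, mu, sigma in components)

print(round(mixture_pdf(12.0), 4))
print(round(mixture_pdf(33.0), 4))
```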
A PDF is an abstraction over a density histogram, and consequently a PDF represents probabilities in terms of area under the curve.
To use a PDF to calculate a probability we need to think in terms of the area under an interval of the PDF curve.
We can calculate the area under a PDF either by looking it up in a probability table or by using integration to compute the area under the curve within the bounds of the interval.
Figure: (a) The area under a density curve between the limits x − ε/2 and x + ε/2; (b) the approximation of this area computed by PDF(x) × ε; and (c) the error in the approximation, which is equal to the difference between area A, the area under the curve omitted from the approximation, and area B, the area above the curve erroneously included in the approximation. Both of these areas get smaller as the width of the interval gets smaller, resulting in a smaller error in the approximation.
There is no hard and fast rule for deciding on the interval size; instead, this decision is made on a case-by-case basis and depends on the precision required in answering the question.
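The following sketch (our own illustration) contrasts the exact area under a normal PDF over a small interval, obtained by numerical integration, with the PDF(x) × ε approximation described above.

```python
# Probability of a small interval under a PDF: exact area vs PDF(x) * epsilon.
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 0.0, 1.0
x, eps = 0.5, 0.1                       # interval [x - eps/2, x + eps/2]

exact, _ = quad(lambda v: norm.pdf(v, mu, sigma), x - eps / 2, x + eps / 2)
approx = norm.pdf(x, mu, sigma) * eps   # the approximation used in the text

print(round(exact, 6), round(approx, 6))  # the two values agree closely for small eps
```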
To illustrate how PDFs can be used in Naive Bayes models, we will extend the loan application fraud detection dataset to include an ACCOUNT BALANCE feature.
Table: The dataset from the loan application fraud detection domain with a new continuous descriptive feature added: ACCOUNT BALANCE.
ID  CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  ACCOUNT BALANCE  FRAUD
1   current         none                   own            56.75            true
2   current         none                   own            1,800.11         false
3   current         none                   own            1,341.03         false
4   paid            guarantor              rent           749.50           true
5   arrears         none                   own            1,150.00         false
6   arrears         none                   own            928.30           true
7   current         none                   own            250.90           false
8   arrears         none                   own            806.15           false
9   current         none                   rent           1,209.02         false
10  none            none                   own            405.72           true
11  current         coapplicant            own            550.00           false
12  current         none                   free           223.89           true
13  current         none                   rent           103.23           true
14  paid            none                   own            758.22           false
15  arrears         none                   own            430.79           false
16  current         none                   own            675.11           false
17  arrears         coapplicant            rent           1,657.20         false
18  arrears         none                   free           1,405.18         false
19  arrears         none                   own            760.51           false
20  current         none                   own            985.41           false
We need to define two PDFs for the new ACCOUNT BALANCE (AB) feature, with each PDF conditioned on a different value in the domain of the target:
P(AB = X | fr) = PDF1(AB = X | fr)
P(AB = X | ¬fr) = PDF2(AB = X | ¬fr)
Note that these two PDFs do not have to be defined using the same statistical distribution.
Figure: Histograms, using a bin size of 250 units, and density curves for the ACCOUNT BALANCE feature: (a) the fraudulent instances overlaid with a fitted exponential distribution; (b) the non-fraudulent instances overlaid with a fitted normal distribution.
From the shape of these histograms it appears that
- the distribution of values taken by the ACCOUNT BALANCE feature in the set of instances where the target feature FRAUDULENT = 'True' follows an exponential distribution
- the distribution of values taken by the ACCOUNT BALANCE feature in the set of instances where the target feature FRAUDULENT = 'False' is similar to a normal distribution.
Once we have selected the distributions the next step is to fit the distributions to the data.
To fit the exponential distribution we simply compute the sample mean, x ̄, of the ACCOUNT BALANCE feature in the set of instances where FRAUDULENT=’True’ and set the λ parameter equal to one divided by x ̄.
To fit the normal distribution to the set of instances where FRAUDULENT=’False’ we simply compute the sample mean and sample standard deviation, s, for the ACCOUNT BALANCE feature for this set of instances and set the parameters of the normal distribution to these values.
Table: Partitioning the dataset based on the value of the target feature and fitting the parameters of a statistical distribution to model the ACCOUNT BALANCE feature in each partition.
FRAUD = true partition:
ID   ...   ACCOUNT BALANCE   FRAUD
1    ...   56.75             true
4    ...   749.50            true
6    ...   928.30            true
10   ...   405.72            true
12   ...   223.89            true
13   ...   103.23            true
Fitted exponential: x̄ = 411.22, λ = 1/x̄ = 0.0024

FRAUD = false partition:
ID   ...   ACCOUNT BALANCE   FRAUD
2    ...   1,800.11          false
3    ...   1,341.03          false
5    ...   1,150.00          false
7    ...   250.90            false
8    ...   806.15            false
9    ...   1,209.02          false
11   ...   550.00            false
14   ...   758.22            false
15   ...   430.79            false
16   ...   675.11            false
17   ...   1,657.20          false
18   ...   1,405.18          false
19   ...   760.51            false
20   ...   985.41            false
Fitted normal: x̄ = 984.26, s = 460.94
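The fitted parameter values in the table above can be reproduced with a short sketch (our own illustration); the account balances are taken from the dataset, split by the target value, and the sample (n − 1) standard deviation is assumed for s.

```python
# Fit an exponential PDF to the fraudulent ACCOUNT BALANCE values (lambda = 1/mean)
# and a normal PDF to the non-fraudulent values (mean and sample std. dev.).
import numpy as np

ab_fraud = [56.75, 749.50, 928.30, 405.72, 223.89, 103.23]
ab_not_fraud = [1800.11, 1341.03, 1150.00, 250.90, 806.15, 1209.02, 550.00,
                758.22, 430.79, 675.11, 1657.20, 1405.18, 760.51, 985.41]

lam = 1.0 / np.mean(ab_fraud)       # exponential rate, approx 0.0024
mu = np.mean(ab_not_fraud)          # approx 984.26
s = np.std(ab_not_fraud, ddof=1)    # sample std. dev., approx 460.94

print(round(lam, 4), round(mu, 2), round(s, 2))
```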
Table: The Laplace smoothed (with k = 3) probabilities needed by a naive Bayes prediction model calculated from the dataset in Table 5 [23], extended to include the conditional probabilities for the new ACCOUNT BALANCE feature, which are defined in terms of PDFs.
P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = none | fr) = 0.2222               P(CH = none | ¬fr) = 0.1154
P(CH = paid | fr) = 0.2222               P(CH = paid | ¬fr) = 0.2692
P(CH = current | fr) = 0.3333            P(CH = current | ¬fr) = 0.2692
P(CH = arrears | fr) = 0.2222            P(CH = arrears | ¬fr) = 0.3462
P(GC = none | fr) = 0.5333               P(GC = none | ¬fr) = 0.6522
P(GC = guarantor | fr) = 0.2667          P(GC = guarantor | ¬fr) = 0.1304
P(GC = coapplicant | fr) = 0.2           P(GC = coapplicant | ¬fr) = 0.2174
P(ACC = own | fr) = 0.4667               P(ACC = own | ¬fr) = 0.6087
P(ACC = rent | fr) = 0.3333              P(ACC = rent | ¬fr) = 0.2174
P(ACC = free | fr) = 0.2                 P(ACC = free | ¬fr) = 0.1739
P(AB = x | fr) ≈ E(x, λ = 0.0024)        P(AB = x | ¬fr) ≈ N(x, μ = 984.26, σ = 460.94)
Table: A query loan application from the fraud detection domain.
CREDIT HISTORY   GUARANTOR/COAPPLICANT   ACCOMMODATION   ACCOUNT BALANCE   FRAUDULENT
paid             guarantor               free            759.07            ?
Table: The probabilities, from Table 7 [29], needed by the naive Bayes prediction model to make a prediction for the query
⟨CH = ’paid’, GC = ’guarantor’, ACC = ’free’, AB = 759.07⟩ and the calculation of the scores for each candidate prediction.
P(fr) = 0.3                                    P(¬fr) = 0.7
P(CH = paid | fr) = 0.2222                     P(CH = paid | ¬fr) = 0.2692
P(GC = guarantor | fr) = 0.2667                P(GC = guarantor | ¬fr) = 0.1304
P(ACC = free | fr) = 0.2                       P(ACC = free | ¬fr) = 0.1739
P(AB = 759.07 | fr) ≈ E(759.07, λ = 0.0024)    P(AB = 759.07 | ¬fr) ≈ N(759.07, μ = 984.26, σ = 460.94)

∏_{k=1}^{m} P(q[k] | fr) × P(fr) = 0.0000014
∏_{k=1}^{m} P(q[k] | ¬fr) × P(¬fr) = 0.0000033
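A sketch of the score calculation for this query (our own illustration, reusing the smoothed probabilities and fitted PDF parameters from the tables above); note that scipy's exponential distribution takes a scale parameter equal to 1/λ.

```python
# Naive Bayes scoring with a continuous feature handled by a PDF.
from scipy.stats import expon, norm

ab = 759.07
# P(AB = 759.07 | fr) via the fitted exponential with lambda = 0.0024
p_ab_fr = expon.pdf(ab, scale=1 / 0.0024)
# P(AB = 759.07 | not fr) via the fitted normal with mu = 984.26, sigma = 460.94
p_ab_not_fr = norm.pdf(ab, loc=984.26, scale=460.94)

score_fr = 0.3 * 0.2222 * 0.2667 * 0.2 * p_ab_fr
score_not_fr = 0.7 * 0.2692 * 0.1304 * 0.1739 * p_ab_not_fr

print(score_fr, score_not_fr)   # approx 0.0000014 and 0.0000033
# The higher score belongs to the non-fraudulent level, so the model
# predicts FRAUDULENT = 'False' for this query.
```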
Continuous Features: Binning
In Section 3.6.2 we explained two of the best-known binning techniques: equal-width binning and equal-frequency binning.
We can use these techniques to convert continuous features into categorical features.
In general we recommend equal-frequency binning.
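As a sketch of equal-frequency binning (our own illustration, assuming bin boundaries are placed at the midpoints between adjacent sorted values; quantile-based implementations such as pandas qcut may place them slightly differently), the 20 LOAN AMOUNT values can be sorted and split into four bins of five instances each.

```python
# Equal-frequency binning of the LOAN AMOUNT values into 4 bins of 5 instances each.
loan_amounts = [900, 150000, 48000, 10000, 32000, 250000, 25000, 18500, 20000, 9500,
                16750, 9850, 95500, 65000, 500, 16000, 15450, 50000, 500, 35000]

n_bins = 4
values = sorted(loan_amounts)
bin_size = len(values) // n_bins

# Thresholds are midpoints between the last value of one bin and the first of the next.
thresholds = [
    (values[i * bin_size - 1] + values[i * bin_size]) / 2
    for i in range(1, n_bins)
]
print(thresholds)   # [9925.0, 19250.0, 49000.0]
```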
Table: The dataset from a loan application fraud detection domain with a second continuous descriptive feature added: LOAN AMOUNT
ID  CREDIT HISTORY  GUARANTOR/COAPPLICANT  ACCOMMODATION  ACCOUNT BALANCE  LOAN AMOUNT  FRAUD
1   current         none                   own            56.75            900          true
2   current         none                   own            1,800.11         150,000      false
3   current         none                   own            1,341.03         48,000       false
4   paid            guarantor              rent           749.50           10,000       true
5   arrears         none                   own            1,150.00         32,000       false
6   arrears         none                   own            928.30           250,000      true
7   current         none                   own            250.90           25,000       false
8   arrears         none                   own            806.15           18,500       false
9   current         none                   rent           1,209.02         20,000       false
10  none            none                   own            405.72           9,500        true
11  current         coapplicant            own            550.00           16,750       false
12  current         none                   free           223.89           9,850        true
13  current         none                   rent           103.23           95,500       true
14  paid            none                   own            758.22           65,000       false
15  arrears         none                   own            430.79           500          false
16  current         none                   own            675.11           16,000       false
17  arrears         coapplicant            rent           1,657.20         15,450       false
18  arrears         none                   free           1,405.18         50,000       false
19  arrears         none                   own            760.51           500          false
20  current         none                   own            985.41           35,000       false
Table: The LOAN AMOUNT continuous feature discretized into 4 equal-frequency bins.
ID  LOAN AMOUNT  BINNED LOAN AMOUNT  FRAUD
15  500          bin1                false
19  500          bin1                false
1   900          bin1                true
10  9,500        bin1                true
12  9,850        bin1                true
4   10,000       bin2                true
17  15,450       bin2                false
16  16,000       bin2                false
11  16,750       bin2                false
8   18,500       bin2                false
9   20,000       bin3                false
7   25,000       bin3                false
5   32,000       bin3                false
20  35,000       bin3                false
3   48,000       bin3                false
18  50,000       bin4                false
14  65,000       bin4                false
13  95,500       bin4                true
2   150,000      bin4                false
6   250,000      bin4                true
Once we have discretized the data, we need to record the raw continuous feature thresholds between the bins so that we can apply the same binning to the feature values in queries.
Table: The thresholds used to discretize the LOAN AMOUNT feature in queries.
Bin    Thresholds
Bin1   ≤ 9,925
Bin2   ≤ 19,250
Bin3   ≤ 49,000
Bin4   > 49,000
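To apply the binning to a query (our own sketch), the raw LOAN AMOUNT is simply compared against the recorded thresholds; for example, the later query value of 8,000 falls into bin1.

```python
# Map a raw LOAN AMOUNT value onto a bin using the recorded thresholds.
import bisect

thresholds = [9925, 19250, 49000]            # upper bounds of bin1, bin2, bin3
bins = ["bin1", "bin2", "bin3", "bin4"]

def bin_loan_amount(amount):
    return bins[bisect.bisect_left(thresholds, amount)]

print(bin_loan_amount(8000))     # bin1
print(bin_loan_amount(120000))   # bin4
```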
Table: The Laplace smoothed (with k = 3) probabilities needed by a naive Bayes prediction model calculated from the fraud detection dataset. Notation key: FR = FRAUD, CH = CREDIT HISTORY, AB = ACCOUNT BALANCE, GC = GUARANTOR/COAPPLICANT, ACC = ACCOMMODATION, BLA = BINNED LOAN AMOUNT.
P(fr) = 0.3                              P(¬fr) = 0.7
P(CH = none | fr) = 0.2222               P(CH = none | ¬fr) = 0.1154
P(CH = paid | fr) = 0.2222               P(CH = paid | ¬fr) = 0.2692
P(CH = current | fr) = 0.3333            P(CH = current | ¬fr) = 0.2692
P(CH = arrears | fr) = 0.2222            P(CH = arrears | ¬fr) = 0.3462
P(GC = none | fr) = 0.5333               P(GC = none | ¬fr) = 0.6522
P(GC = guarantor | fr) = 0.2667          P(GC = guarantor | ¬fr) = 0.1304
P(GC = coapplicant | fr) = 0.2           P(GC = coapplicant | ¬fr) = 0.2174
P(ACC = own | fr) = 0.4667               P(ACC = own | ¬fr) = 0.6087
P(ACC = rent | fr) = 0.3333              P(ACC = rent | ¬fr) = 0.2174
P(ACC = free | fr) = 0.2                 P(ACC = free | ¬fr) = 0.1739
P(AB = x | fr) ≈ E(x, λ = 0.0024)        P(AB = x | ¬fr) ≈ N(x, μ = 984.26, σ = 460.94)
P(BLA = bin1 | fr) = 0.3333              P(BLA = bin1 | ¬fr) = 0.1923
P(BLA = bin2 | fr) = 0.2222              P(BLA = bin2 | ¬fr) = 0.2692
P(BLA = bin3 | fr) = 0.1667              P(BLA = bin3 | ¬fr) = 0.3077
P(BLA = bin4 | fr) = 0.2778              P(BLA = bin4 | ¬fr) = 0.2308
Table: A query loan application from the fraud detection domain.
CREDIT HISTORY   GUARANTOR/COAPPLICANT   ACCOMMODATION   ACCOUNT BALANCE   LOAN AMOUNT   FRAUDULENT
paid             guarantor               free            759.07            8,000         ?
Table: The relevant smoothed probabilities, from Table 13 [37], needed by the naive Bayes model to make a prediction for the query
⟨CH = ’paid’, GC = ’guarantor’, ACC = ’free’, AB = 759.07, LA = 8 000⟩ and the calculation of the scores for each candidate prediction.
P(fr) = 0.3                                    P(¬fr) = 0.7
P(CH = paid | fr) = 0.2222                     P(CH = paid | ¬fr) = 0.2692
P(GC = guarantor | fr) = 0.2667                P(GC = guarantor | ¬fr) = 0.1304
P(ACC = free | fr) = 0.2                       P(ACC = free | ¬fr) = 0.1739
P(AB = 759.07 | fr) ≈ E(759.07, λ = 0.0024)    P(AB = 759.07 | ¬fr) ≈ N(759.07, μ = 984.26, σ = 460.94)
P(BLA = bin1 | fr) = 0.3333                    P(BLA = bin1 | ¬fr) = 0.1923

∏_{k=1}^{m} P(q[k] | fr) × P(fr) = 0.000000462
∏_{k=1}^{m} P(q[k] | ¬fr) × P(¬fr) = 0.000000633

The query's LOAN AMOUNT of 8,000 is below the 9,925 threshold and so maps to bin1. Because 0.000000633 > 0.000000462, the model returns the prediction FRAUDULENT = 'False' for this query.
Bayesian Networks
Bayesian networks use a graph-based representation to encode the structural relationships—such as direct influence and conditional independence—between subsets of features in a domain.
Consequently, a Bayesian network representation is generally more compact than a full joint distribution, yet is not forced to assert global conditional independence between all descriptive features.
A Bayesian network is a directed acyclic graph that is composed of three basic elements:
- nodes: one for each feature in the domain
- edges: directed links between nodes that encode the influence relationships between features
- conditional probability tables (CPTs): one for each node, giving the distribution of that node's feature conditioned on its parents in the graph
Figure: (a) A Bayesian network for a domain consisting of two binary features. The structure of the network states that the value of feature A directly influences the value of feature B. (b) A Bayesian network consisting of 4 binary features with a path containing …
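As a small illustration (not from the slides, and with made-up probability values), the two-node network in panel (a) can be represented by a prior table for A and a conditional probability table for B given A; any joint probability then factorises as P(A, B) = P(A) × P(B | A).

```python
# A tiny Bayesian network A -> B over two binary features, represented as
# a prior for A and a conditional probability table (CPT) for B given A.
# The probability values below are illustrative only.
p_a = {True: 0.4, False: 0.6}                       # P(A)
p_b_given_a = {True: {True: 0.9, False: 0.1},       # P(B | A = T)
               False: {True: 0.2, False: 0.8}}      # P(B | A = F)

def joint(a, b):
    """P(A = a, B = b) = P(A = a) * P(B = b | A = a)."""
    return p_a[a] * p_b_given_a[a][b]

# Marginal P(B = T) by summing the joint over the values of A.
p_b_true = sum(joint(a, True) for a in (True, False))
print(p_b_true)   # 0.4*0.9 + 0.6*0.2 = 0.48
```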