Bayesian learning
Strong magic in machine learning and engineering comes from the probabilistic approach, and a particularly successful branch of it has been the Bayesian approach to machine learning.

The fundamental concept is the conditional probability, i.e. "the probability of A when B is known",

$$P(A|B). \tag{1}$$

In the naming convention we are used to in machine learning, with input x and output y, it could be written as

$$P(y|x). \tag{2}$$

Bayes theorem

By Bayes' theorem, the a posteriori probability for event A when B is known is

$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}, \tag{3}$$
where

• P(A) = a priori probability of event A
• P(B) = a priori probability of event B
• P(A|B) = conditional probability for A when B is known
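
As a small illustration of equation (3) with made-up numbers (not from the lecture), note that P(B) can be obtained from the law of total probability:

In [ ]:
# Minimal sketch of Bayes' theorem; the numbers below are hypothetical.
P_A = 0.3                 # a priori probability of event A
P_B_given_A = 0.8         # conditional probability of B when A is known
P_B_given_notA = 0.1      # conditional probability of B when A does not hold
P_B = P_B_given_A * P_A + P_B_given_notA * (1 - P_A)   # law of total probability
P_A_given_B = P_B_given_A * P_A / P_B                   # Bayes theorem, eq. (3)
print(P_A_given_B)        # 0.24 / 0.31 ~ 0.77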
Estimation of the a priori probabilities
Oh, that's tricky: the a priori probabilities, such as P(A), typically have to come from population statistics or prior knowledge rather than from the measurements themselves.
Estimation of the a posteriori probabilities
Why is P(B|A) easier to estimate than P(A|B)?

Measured values can be converted to a histogram that tells how many samples fall into each "bin".
In [1]:
import matplotlib.pyplot as plt
import numpy as np
#
# 1. Generate and plot random points for training
np.random.seed(13) # to always get the same points
N_h = 1000
N_e = 200
x_h = np.random.normal(1.1,0.3,N_h)
x_e = np.random.normal(1.9,0.4,N_e)  # generated here but not plotted in this cell
plt.plot(x_h,np.zeros([N_h,1]),'co', label="hobit")
plt.title(f'{N_h} measured hobit heights')
plt.legend()
plt.xlabel('height [m]')
plt.axis([0.0,3.0,-1.1,+1.1])
plt.show()
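
The cell above also generates x_e, which is not used in the cells shown; presumably it holds the heights of a second class (say, "elf"). A minimal sketch, under that assumption, plotting both classes together:

In [ ]:
# Sketch: plot both measurement sets (assumes x_e is a second class, e.g. "elf")
plt.plot(x_h, np.zeros(N_h), 'co', label='hobit')
plt.plot(x_e, np.zeros(N_e), 'mo', label='elf')
plt.title(f'{N_h} hobit and {N_e} elf heights (elf label assumed)')
plt.legend()
plt.xlabel('height [m]')
plt.axis([0.0, 3.0, -1.1, +1.1])
plt.show()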
In [2]:
#
# 2. Histogram plot with 5 bins
foo = plt.hist(x_h, bins = 5, range = [0,2.5], color='cyan')
x_h_hist = foo[0]
x_h_hist_bins = foo[1]
print(x_h_hist)
print(x_h_hist_bins)
plt.xlabel('height [m]')
plt.ylabel('amount [num]')
plt.show()

[ 19. 360. 541. 78. 2.]
[0. 0.5 1. 1.5 2. 2.5]

However, the histogram represents the numbers of samples in each bin and not really probabilities. If we want to assign a probability to each bin, we need to divide the bin height by the total number of samples.
In [3]:
#
# 3. Re-plot the histogram with the bin counts divided by the total number of samples (1000)
foo = plt.hist(x_h_hist_bins[:-1], x_h_hist_bins, weights=x_h_hist/1000, color='cyan')
x_h_hist = foo[0]
x_h_hist_bins = foo[1]
print(x_h_hist)
print(x_h_hist_bins)
plt.xlabel('height [m]')
plt.ylabel('probability')
plt.show()

[0.019 0.36 0.541 0.078 0.002]
[0. 0.5 1. 1.5 2. 2.5]
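
The same probabilities can also be computed without plotting, for example with np.histogram (a minimal sketch of an equivalent computation):

In [ ]:
# Counts and bin edges without plotting, then normalize the counts to probabilities
counts, bin_edges = np.histogram(x_h, bins=5, range=[0, 2.5])
print(counts / N_h)   # same probabilities as printed above
print(bin_edges)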
The normalized histogram still does not represent a probability distribution, as its area is not 1.0. For that, we need to divide each value by the bin width (0.5 in this example). That produces a probability density for which

$$\int p(x|\mathrm{hobit})\,dx = \sum_i p(\Delta_i|\mathrm{hobit}) \cdot |\Delta_i| = 1.0, \tag{4}$$

where Δ_i denotes the i-th bin and |Δ_i| its width.
In [4]:
#
# 4. Histogram as a probability density (density=True divides the counts by N and by the bin width)
foo = plt.hist(x_h, bins = 5, range = [0,2.5], density=True, rwidth=0.1, color='cyan')
x_h_hist = foo[0]
x_h_hist_bins = foo[1]
print(x_h_hist)
print(x_h_hist_bins)
plt.xlabel('height [m]')
plt.ylabel('probability density')
plt.show()

[0.038 0.72 1.082 0.156 0.004]
[0. 0.5 1. 1.5 2. 2.5]
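
As a quick check of equation (4), the area under the density histogram is indeed 1.0 (a minimal sketch using the variables from the cell above):

In [ ]:
# Area of the density histogram = sum over bins of (density value * bin width)
print(np.sum(x_h_hist * np.diff(x_h_hist_bins)))   # -> 1.0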
Bayes optimal decision

Maximum a posteriori (MAP) decision/hypothesis, h_MAP, and maximum likelihood (ML) decision/hypothesis, h_ML.
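
As a reminder, their standard definitions are (stated here with the usual textbook notation of hypotheses h in a hypothesis space H and observed data D, which is assumed rather than taken from the original notes):

$$h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h)\,P(h)$$

$$h_{ML} = \arg\max_{h \in H} P(D|h)$$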
Example 4.3 Does patient have cancer or not?

A patient takes a lab test and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which the disease is actually present, and a correct negative result in only 97% of the cases in which the disease is not present. Furthermore, 0.008 of the entire population have this cancer.

P(cancer) = 0.008      P(¬cancer) = 0.992
P(+|cancer) = 0.98     P(−|cancer) = 0.02
P(+|¬cancer) = 0.03    P(−|¬cancer) = 0.97

h_MAP?
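
A quick numeric sketch of the MAP decision for this example (the numbers follow directly from the statement above):

In [ ]:
# Unnormalized posteriors P(+|h)P(h) for the two hypotheses
P_cancer, P_not_cancer = 0.008, 0.992
P_pos_given_cancer, P_pos_given_not_cancer = 0.98, 0.03
print(P_pos_given_cancer * P_cancer)          # 0.00784
print(P_pos_given_not_cancer * P_not_cancer)  # 0.02976
# The larger product gives h_MAP = "no cancer", despite the positive test result.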
References
C.M. Bishop (2006): Pattern Recognition and Machine Learning, Chapters 1-2.