Lecture 15. Bayesian classification
COMP90051 Statistical Machine Learning
Semester 2, 2019 Lecturer: Ben Rubinstein
Copyright: University of Melbourne

This lecture
• Bayesian ideas in discrete settings
∗ Beta-Binomial conjugacy
• Bayesian classification
∗ non-conjugacy necessitates approximation

How to apply Bayesian view to discrete data?
• First off, consider models which generate the input
∗ cf. discriminative models, which condition on the input
∗ i.e., p(y | x) vs p(x, y), Logistic Regression vs Naïve Bayes
• For simplicity, start with the most basic setting
∗ n coin tosses, of which k were heads
∗ only have x (sequence of outcomes), but no ‘classes’ y
• Methods apply to generative models over discrete data
∗ e.g., topic models, generative classifiers (Naïve Bayes, mixture of multinomials)

Discrete Conjugate prior: Beta-Binomial
• Conjugate priors also exist for discrete spaces
• Consider n coin tosses, of which k were heads
∗ let p(head) = q from a single toss (Bernoulli dist)
∗ Inference question: is the coin biased, i.e., is q ≈ 0.5?
• Several draws, use Binomial dist
∗ and its conjugate prior, Beta dist

Beta distribution
Sourced from https://en.wikipedia.org/wiki/Beta_distribution
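The Beta density itself can be sketched in a few lines of Python. This is an illustrative helper only, using the standard-library gamma function for the normaliser; the function names `beta_pdf` and `beta_mean` and the example parameter pairs are assumptions, not part of the lecture.

```python
import math

def beta_pdf(q, alpha, beta):
    """Density of Beta(alpha, beta) at q in (0, 1)."""
    # Normaliser B(alpha, beta) = Gamma(alpha) * Gamma(beta) / Gamma(alpha + beta)
    norm = math.gamma(alpha) * math.gamma(beta) / math.gamma(alpha + beta)
    return q ** (alpha - 1) * (1 - q) ** (beta - 1) / norm

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)

# A few example shapes (parameter values chosen arbitrarily for illustration)
for a, b in [(1, 1), (2, 2), (2, 5)]:
    print(a, b, beta_pdf(0.5, a, b), beta_mean(a, b))
```

With alpha = beta = 1 the density is flat on (0, 1), which is the uniform prior.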

Beta-Binomial conjugacy
Bayesian posterior
Sweet! We know the normaliser for Beta
trick: ignore constant factors (normaliser)
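The derivation these annotations refer to can be written out as follows. This is a sketch of the standard argument, assuming a Beta(α, β) prior on q and k heads observed in n tosses; the exact layout on the original slide may differ.

```latex
\begin{align*}
p(q \mid k, n) &\propto p(k \mid n, q)\, p(q) \\
&= \binom{n}{k} q^{k} (1-q)^{n-k} \cdot \frac{q^{\alpha-1} (1-q)^{\beta-1}}{B(\alpha, \beta)} \\
&\propto q^{k+\alpha-1} (1-q)^{n-k+\beta-1} \\
&\propto \mathrm{Beta}(q;\, k+\alpha,\, n-k+\beta)
\end{align*}
```

The binomial coefficient and B(α, β) do not depend on q, so they are the constant factors that can be ignored; the remaining expression has the form of a Beta density, whose normaliser is known.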

Uniqueness up to normalisation
• A trick we’ve used many times: when an unnormalised distribution is proportional to a recognised distribution, we say it must be that distribution
• If $f(\theta) \propto g(\theta)$ for $g$ a distribution, then $\frac{f(\theta)}{\int_\Theta f(\theta)\,d\theta} = g(\theta)$
• Proof: $f(\theta) \propto g(\theta)$ means that $f(\theta) = C \cdot g(\theta)$, so $\int_\Theta f(\theta)\,d\theta = C \int_\Theta g(\theta)\,d\theta = C$, and the result follows from LHS1/LHS2 = RHS1/RHS2
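The claim can be checked numerically. The sketch below takes the unnormalised Beta-Binomial posterior (likelihood times prior, constants dropped), normalises it with a simple midpoint rule, and compares the result with the Beta density. The values of k, n, alpha, beta, the evaluation point, and the helper names are arbitrary choices for illustration.

```python
import math

def unnormalised(q, k, n, alpha, beta):
    # f(q) = q^(k+alpha-1) * (1-q)^(n-k+beta-1): constants dropped
    return q ** (k + alpha - 1) * (1 - q) ** (n - k + beta - 1)

def beta_pdf(q, a, b):
    # g(q): the recognised distribution, with its known normaliser
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return q ** (a - 1) * (1 - q) ** (b - 1) / norm

def integrate(f, steps=20000):
    # Midpoint rule on (0, 1)
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

k, n, alpha, beta = 7, 10, 2, 2
C = integrate(lambda q: unnormalised(q, k, n, alpha, beta))

q0 = 0.6
normalised_f = unnormalised(q0, k, n, alpha, beta) / C
g = beta_pdf(q0, k + alpha, n - k + beta)
print(normalised_f, g)  # the two values should agree up to integration error
```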


Laplace’s Sunrise Problem
Every morning you observe the sun rising. Based solely on this fact, what’s the probability that the sun will rise tomorrow?
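Use Beta-Binomial, where q is Pr(sun rises in morning)
∗ posterior $p(q \mid k, n) = \mathrm{Beta}(q;\, k+\alpha,\, n-k+\beta)$
∗ n = k = observer’s age in days
∗ let $\alpha = \beta = 1$ (uniform prior)
• Under these assumptions
$p(q \mid k) = \mathrm{Beta}(q;\, k+1,\, 1)$, $E[q] = \frac{k+1}{k+2}$
∗ $k+1$ and $1$ are ’smoothed’ counts of days where sun rose / did not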

Sunrise Problem (cont.)
Consider a human life-span
Day (n, k)          k+α      n-k+β    E[q]
0                   1        1        0.5
1                   2        1        0.667
2                   3        1        0.75
...                 ...      ...      ...
365                 366      1        0.997
29200 (80 years)    29201    1        0.99997
Effect of prior diminishing with data, but never disappears completely.
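The posterior means for the sunrise problem follow directly from the Beta-Binomial update. A minimal sketch, assuming the uniform prior alpha = beta = 1 and taking an 80-year life-span as 80 * 365 days; the function name `sunrise_posterior` is hypothetical.

```python
def sunrise_posterior(n_days, alpha=1.0, beta=1.0):
    """Posterior Beta parameters and mean when the sun rose on all n_days mornings."""
    k = n_days                   # every observed morning was a sunrise
    a_post = k + alpha           # 'smoothed' count of days the sun rose
    b_post = n_days - k + beta   # 'smoothed' count of days it did not
    return a_post, b_post, a_post / (a_post + b_post)

for n in [0, 1, 2, 365, 80 * 365]:
    a, b, mean = sunrise_posterior(n)
    print(f"{n:>6} {a:>8.0f} {b:>3.0f} {mean:.5f}")
```

The mean approaches 1 as n grows but never reaches it, because the prior always contributes one pseudo-count for "sun did not rise".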

Suite of useful conjugate priors
likelihood     conjugate prior                                                 setting
Normal         Normal (for mean)                                               regression
Normal         Inverse Gamma (for variance) or Inverse Wishart (covariance)    regression
Binomial       Beta                                                            classification
Multinomial    Dirichlet                                                       classification
Poisson        Gamma                                                           counts

Bayesian Logistic Regression
Discriminative classifier, which conditions on inputs. How can we do Bayesian inference in this setting?

Now for Logistic Regression…
• Similar problems with parameter uncertainty compared to regression
∗ although predictive uncertainty in-built to model outputs
Murphy Fig 8.5 & 8.6 p257-8

No conjugacy
• Can we use conjugate prior? E.g.,
∗ Beta-Binomial for generative binary models
∗ Dirichlet-Multinomial for multiclass (similar formulation)
• Model is discriminative, with parameters defined using logistic sigmoid*
∗ need prior over w, not q
∗ no known conjugate prior (!), thus use a Gaussian prior
* Or softmax for multiclass; same problems arise and similar solution

Approximation
• No known solution for the normalising constant
• Resolve by approximation
Laplace approx.:
• assume posterior ≃ Normal about mode
• can compute normalisation constant, draw samples etc.
Murphy Fig 8.6 p258
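The Laplace approximation can be sketched for the simplest possible case. The code below is not the lecture's implementation: it assumes a single-weight model p(y=1 | x, w) = sigmoid(w * x) with a Normal(0, prior_var) prior, finds the posterior mode with Newton's method, and uses the curvature at the mode as the precision of the approximating Normal. The toy data, prior variance, and function names are all made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def laplace_logistic_1d(xs, ys, prior_var=1.0, iters=50):
    """Laplace approximation for a single-weight logistic regression.

    Assumed model: p(y=1 | x, w) = sigmoid(w * x), prior w ~ Normal(0, prior_var).
    Returns the posterior mode and the variance of the Normal approximation.
    """
    def curvature(w):
        # Second derivative of the unnormalised log posterior (always negative)
        return -sum(sigmoid(w * x) * (1 - sigmoid(w * x)) * x * x for x in xs) - 1.0 / prior_var

    w = 0.0
    for _ in range(iters):
        # Gradient of log likelihood plus log prior
        grad = sum((y - sigmoid(w * x)) * x for x, y in zip(xs, ys)) - w / prior_var
        step = grad / curvature(w)
        w -= step  # Newton update towards the mode
        if abs(step) < 1e-12:
            break
    return w, -1.0 / curvature(w)

# Toy data, not linearly separable (illustrative values only)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 1.5, -1.5]
ys = [0, 0, 1, 0, 1, 1, 1, 0]
w_map, var = laplace_logistic_1d(xs, ys)

# Drawing samples from the approximate posterior is now trivial
rng = random.Random(0)
samples = [rng.gauss(w_map, math.sqrt(var)) for _ in range(5)]
print(w_map, var, samples)
```

In more than one dimension the same idea applies with the gradient and Hessian of the log posterior in place of the scalar derivatives, and the inverse negative Hessian as the covariance.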

Summary
• Bayesian ideas in discrete settings
∗ Beta-Binomial conjugacy
• Bayesian classification
∗ non-conjugacy necessitates approximation
• Next time: probabilistic graphical models