Graphical Models
Bayesian Networks
Objectives
Describe Bayesian Networks
Illustrate key tasks in implementing Bayesian Networks
Why do we use graphical models?
|In machine learning, we are often concerned with joint distributions of many random variables.
|A graph may provide an intuitive way of representing or visualizing the relationships of the variables.
-Making it easier for domain experts to build a model
[Figure: example network with nodes Smoking history, Lung cancer, X-ray, Bronchitis, and Fatigue]
Graphical Models for Causal Relations
|Graphical models arise naturally from, often causal, independency relations of physical events.
[Figure: network over Smoking history, Lung cancer, X-ray, and Fatigue, with edges following the causal directions]
|Caveat: probabilistic relationship does not imply causality.
Bayesian Networks
|A BN is a directed acyclic graph (DAG), where
-Nodes (vertices) represent random variables.
-Directed edges represent direct dependence between nodes.
|Other names: Belief networks, Bayes nets, etc.
Conditional Independence
|E.g., given the chain X1 → X2 → X3, check the relationship between X3 and X1
-X3 is dependent on X2, and X2 is dependent on X1
-Thus X3 is dependent on X1
-But given X2, X3 is independent of X1
➔Conditional Independence: X3 ⊥ X1 | X2 (see the numerical check below)
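As a quick numerical check, the sketch below builds this chain with made-up conditional probability tables (all numbers are hypothetical) and verifies that P(X3 | X2, X1) no longer depends on X1 once X2 is given.

```python
# Hypothetical CPTs for the chain X1 -> X2 -> X3 (binary variables).
P_x1 = {0: 0.6, 1: 0.4}
P_x2_given_x1 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # P(X2 | X1)
P_x3_given_x2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # P(X3 | X2)

def joint(x1, x2, x3):
    """P(X1, X2, X3) factorised along the chain."""
    return P_x1[x1] * P_x2_given_x1[x1][x2] * P_x3_given_x2[x2][x3]

# P(X3=1 | X2=1, X1=x1) for each value of X1: both equal P(X3=1 | X2=1).
for x1 in (0, 1):
    num = joint(x1, 1, 1)
    den = sum(joint(x1, 1, x3) for x3 in (0, 1))
    print(f"P(X3=1 | X2=1, X1={x1}) = {num / den:.3f}")
print(f"P(X3=1 | X2=1)        = {P_x3_given_x2[1][1]:.3f}")
```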
BN for General Conditional Dependency
|A BN can be used to model a given set of conditional dependencies
-For example, using the chain rule of probability, we have P(A,B,C,D,E)=P(A)P(B|A)P(C|A,B)P(D|A,B,C)P(E|A,B,C,D)
|If we know that, given A, C does not depend on B, and so forth, we may have
P(A,B,C,D,E)=P(A)P(B|A)P(C|A)P(D|B)P(E|B,D)
➢We could represent joint distributions more compactly in BN → Efficient computation
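To make the saving concrete, here is a minimal sketch with hypothetical binary variables and made-up CPT values; it evaluates the joint through the factorisation above, which needs only 11 free parameters rather than the 31 of a full joint table over five binary variables.

```python
# Hypothetical CPTs following P(A,B,C,D,E) = P(A) P(B|A) P(C|A) P(D|B) P(E|B,D).
P_A = {1: 0.3, 0: 0.7}
P_B_given_A = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}      # keyed by A
P_C_given_A = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.3, 0: 0.7}}      # keyed by A
P_D_given_B = {1: {1: 0.5, 0: 0.5}, 0: {1: 0.2, 0: 0.8}}      # keyed by B
P_E_given_BD = {(1, 1): {1: 0.9, 0: 0.1}, (1, 0): {1: 0.6, 0: 0.4},
                (0, 1): {1: 0.4, 0: 0.6}, (0, 0): {1: 0.05, 0: 0.95}}

def joint(a, b, c, d, e):
    """Joint probability via the BN factorisation
    (11 free parameters instead of the 31 of a full joint table)."""
    return (P_A[a] * P_B_given_A[a][b] * P_C_given_A[a][c]
            * P_D_given_B[b][d] * P_E_given_BD[(b, d)][e])

print(joint(1, 1, 0, 1, 1))   # P(A=1, B=1, C=0, D=1, E=1)
```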
Inference in Bayesian Networks
|Given a model and some data (“evidence”), how to update our belief?
|What are the model parameters?
Inference in Bayesian Networks (cont’d)
|Given a model and some data (“evidence”), how to update our belief?
|E.g., for a patient with certain smoking history (non-smoker), whose X-ray result is positive, and who does not experience fatigue:
-What is the probability of having lung cancer?
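A minimal inference-by-enumeration sketch for a query of this kind, using a simplified fragment of the network (Smoking → Cancer, Cancer → X-ray, Cancer → Fatigue) with made-up probabilities; both the structure and the numbers are assumptions for illustration, not the lecture's actual model.

```python
# Simplified lung-cancer fragment with made-up probabilities (all binary).
P_smoker = {1: 0.3, 0: 0.7}
P_cancer_given_smoker = {1: {1: 0.10, 0: 0.90}, 0: {1: 0.01, 0: 0.99}}
P_xray_given_cancer = {1: {1: 0.90, 0: 0.10}, 0: {1: 0.05, 0: 0.95}}
P_fatigue_given_cancer = {1: {1: 0.70, 0: 0.30}, 0: {1: 0.20, 0: 0.80}}

def joint(smoker, cancer, xray, fatigue):
    return (P_smoker[smoker]
            * P_cancer_given_smoker[smoker][cancer]
            * P_xray_given_cancer[cancer][xray]
            * P_fatigue_given_cancer[cancer][fatigue])

# Evidence: non-smoker, positive X-ray, no fatigue.  Query: P(cancer | evidence).
evidence = dict(smoker=0, xray=1, fatigue=0)
num = joint(cancer=1, **evidence)
den = sum(joint(cancer=c, **evidence) for c in (0, 1))
print(f"P(cancer | evidence) = {num / den:.4f}")
```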
Inference in Bayesian Networks (cont’d)
|In a simple BN like this, we can compute the exact probabilities.
|In general, for a tree-structured BN, we may use belief propagation for the inference problem.
|For general structures, sometimes it is possible to generalize the above method (e.g., the junction tree algorithm). More often, we must resort to approximation methods
-E.g. Variational methods, Sampling (Monte Carlo) methods.
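For comparison, a sketch of a sampling-based (Monte Carlo) approximation of the same toy query via likelihood weighting; the CPT numbers are again made up and the helper name lw_estimate is hypothetical.

```python
import random

# Likelihood weighting on the toy lung-cancer fragment (made-up numbers):
# sample the non-evidence variable from its CPT, weight each sample by the
# probability of the fixed evidence (non-smoker, positive X-ray, no fatigue).
P_smoker = {1: 0.3, 0: 0.7}
P_cancer_given_smoker = {1: {1: 0.10, 0: 0.90}, 0: {1: 0.01, 0: 0.99}}
P_xray_given_cancer = {1: {1: 0.90, 0: 0.10}, 0: {1: 0.05, 0: 0.95}}
P_fatigue_given_cancer = {1: {1: 0.70, 0: 0.30}, 0: {1: 0.20, 0: 0.80}}

def lw_estimate(n_samples=100_000, seed=0):
    rng = random.Random(seed)
    weighted_cancer, total_weight = 0.0, 0.0
    for _ in range(n_samples):
        smoker = 0                              # evidence: non-smoker (clamped)
        w = P_smoker[smoker]
        cancer = 1 if rng.random() < P_cancer_given_smoker[smoker][1] else 0
        w *= P_xray_given_cancer[cancer][1]     # evidence: X-ray positive
        w *= P_fatigue_given_cancer[cancer][0]  # evidence: no fatigue
        weighted_cancer += w * cancer
        total_weight += w
    return weighted_cancer / total_weight

print(f"P(cancer | evidence) ~= {lw_estimate():.4f}")  # close to the exact value
```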
Learning in Bayesian Networks
|Learning parameters (probabilities) for a given BN (the graph is given).
-Estimate the (conditional) probabilities from past data.
|Learning both the structure and the parameters
-A more challenging task beyond the scope of this discussion.
Learning the Probabilities
|Basic ideas
– Use relative frequency for estimating probability.
– A prior distribution is typically assumed.
- The prior is then updated by the data into the posterior.
- Using the MLE principle
|The so-called “Expectation-Maximization (EM) Algorithm” is often used.
-Iteratively update our guess for the parameters; each step attempts to apply the MLE principle.
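A tiny illustration of the two estimation ideas on a single conditional probability, assuming hypothetical counts and a Beta prior (a common conjugate choice, not something specified in the slides): the MLE is the plain relative frequency, while the Bayesian estimate updates the prior into a posterior and reports its mean.

```python
# Estimating one conditional probability, e.g. P(Fatigue=1 | Cancer=1),
# from hypothetical counts.
n_cancer = 40          # cases with Cancer=1 in the data
n_fatigue = 28         # of those, cases that also have Fatigue=1

# MLE: plain relative frequency.
p_mle = n_fatigue / n_cancer

# Bayesian: Beta(a, b) prior updated by the data into a Beta posterior;
# report its mean (a "smoothed" relative frequency).
a, b = 2.0, 2.0        # assumed prior pseudo-counts
p_posterior_mean = (n_fatigue + a) / (n_cancer + a + b)

print(f"MLE = {p_mle:.3f}, posterior mean = {p_posterior_mean:.3f}")
```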
Graphical Models
Hidden Markov Formulation
Objectives
Introduce Hidden Markov Models
Illustrate HMM with intuitive examples
Hidden Markov Models
|Hidden Markov Models (HMMs) are a type of dynamic Bayesian Network
-Modeling a process indexed by time
|“Hidden”: the observations are due to some underlying (hidden) states not directly observable.
|“Markov”: the state transitions are governed by a Markov process.
Discrete Markov Process
|Consider a system which may be described at any time as being in one of a set of N distinct states, S1, …, SN.
| At time instances t=1,2,3,…, the system changes its state according to certain probabilities. The full description requires us to know P(st=Sj | st-1=Si, st-2=Sk, …, s1=Sm) for all t, i, k, …, m, where st stands for the state of the system at time t.
– For a first-order Markov chain, we need to consider only P(st=Sj | st-1=Si)
– Further assume the transition probabilities are "stationary" (the same for all t):
aij = P(st=Sj | st-1=Si), 1≤i,j≤N, for any t.
A Simple Example
|Assume one of the three states for each day: S1-rainy, S2-cloudy, S3-sunny
|Assume the transition probability matrix (rows: current state Si; columns: next state Sj)
A = {aij} =
  0.5 0.2 0.3
  0.3 0.3 0.4
  0.1 0.2 0.7
|Many questions we may ask, based on this model.
-E.g., given today is cloudy, what is the probability it remains cloudy for the next 5 days? (See the sketch below.)
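A minimal sketch of how such questions can be answered by multiplying transition probabilities along the assumed state sequence; the function name sequence_prob is hypothetical.

```python
import numpy as np

# Transition matrix from the slide: rows/cols ordered (rainy, cloudy, sunny).
A = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.3, 0.4],
              [0.1, 0.2, 0.7]])
states = {"rainy": 0, "cloudy": 1, "sunny": 2}

def sequence_prob(seq, start):
    """P(s_1, ..., s_k | s_0 = start) for a first-order Markov chain."""
    p, prev = 1.0, states[start]
    for name in seq:
        cur = states[name]
        p *= A[prev, cur]
        prev = cur
    return p

# Given today is cloudy, probability it stays cloudy for the next 5 days.
print(sequence_prob(["cloudy"] * 5, start="cloudy"))   # 0.3**5 = 0.00243
```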
Extending to “Hidden” States
|The previous example is an “observable” Markov model: the output of the system/process is the states of interest.
|Now assume that we can only measure the (average) temperature of a day
-Further assume this measurement is useful for predicting the weather states (rainy, cloudy, sunny).
-We can view the temperature values as being produced by the hidden states of interest, i.e., the weather.
A Simple HMM
A Specific Process from the Model
States:       rainy  cloudy  cloudy  sunny  cloudy
Observations: 70°    72°     75°     78°    77°
Specifying an HMM
|Θ: the set of hidden states.
|The state transition probabilities aij = P(st=Sj | st-1=Si), 1≤i,j≤N
– Let A={aij } be the transition probability matrix
|Ω: the set of outputs (observations).
Specifying an HMM (cont’d)
|The observation probabilities: P(ot|st ), where ot stands for the observation at time t, given the state st. This is also called the emission probability.
– For a discrete observation space, we can define the emission probability matrix B = {bjk}, where bjk = P(ot = vk | st = Sj) and vk is the kth symbol in Ω.
|The initial state distribution π = {πi}, πi=P(s1=Si)
– Sometimes we are given an initial state, i.e., P(s1=Si)=1 for certain i.
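Putting the pieces together, a sketch of the weather/temperature HMM as plain arrays, with the temperature discretised into three bands (cool/mild/warm) and the emission matrix B and initial distribution π made up for illustration; the sample helper generates a (state sequence, observation sequence) pair like the one shown earlier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden states Theta and a discretised observation alphabet Omega
# (the temperature bands are an assumption for illustration).
states = ["rainy", "cloudy", "sunny"]
obs_symbols = ["cool", "mild", "warm"]

A = np.array([[0.5, 0.2, 0.3],      # state transition probabilities a_ij
              [0.3, 0.3, 0.4],
              [0.1, 0.2, 0.7]])
B = np.array([[0.6, 0.3, 0.1],      # emission probabilities b_jk = P(o=v_k | s=S_j)
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
pi = np.array([0.3, 0.4, 0.3])      # initial state distribution

def sample(T):
    """Generate a (state sequence, observation sequence) pair of length T."""
    s = rng.choice(len(states), p=pi)
    path, obs = [], []
    for _ in range(T):
        path.append(states[s])
        obs.append(obs_symbols[rng.choice(len(obs_symbols), p=B[s])])
        s = rng.choice(len(states), p=A[s])
    return path, obs

print(sample(5))
```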
Basic Problems in HMM
|For a given HMM Λ={Θ,Ω,A,B,π}
-Problem 1: Given an observation (sequence) O={o1,o2, … ,ok}, what is the most likely state sequence S={s1,s2, … ,sk} that has produced O?
-Problem 2: How likely is an observation O (i.e., what is P(O)) ?
-Problem 3: How to estimate the model parameters (A,B,π)?
Graphical Models
Hidden Markov Models: Learning & Inference
Objectives
Implement HMM learning & inference algorithms
Basic Problems in HMM
|For a given HMM Λ={Θ,Ω,A,B,π}
-Problem 1: Given an observation (sequence) O={o1,o2, … ,ok}, what is the most likely state sequence S={s1,s2, … ,sk} that has produced O?
-Problem 2: How likely is an observation O (i.e., what is P(O)) ?
-Problem 3: How to estimate the model parameters (A,B,π)?
Problem 1: State Estimation
|Given an observation (sequence) O={o1,o2, … ,ok}, what is the most likely state sequence S={s1,s2, … ,sk} that has produced O?
|Formally, we need to solve $\arg\max_{S} P(S \mid O)$
|Or, equivalently,
$\arg\max_{S} \dfrac{P(S, O)}{P(O)} = \arg\max_{S} P(S, O)$
Problem 1: State Estimation (cont’d)
|For a given HMM, we may simplify P(S,O) as
$$
\begin{aligned}
P(S, O) &= P(O \mid S)\,P(S) \\
&= P(o_1 \ldots o_k \mid s_1 \ldots s_k) \prod_{j=1}^{k} P(s_j \mid s_1 \ldots s_{j-1}) \\
&\simeq P(o_1 \ldots o_k \mid s_1 \ldots s_k) \prod_{j=1}^{k} P(s_j \mid s_{j-1}) \\
&= \prod_{i=1}^{k} P(o_i \mid o_1 \ldots o_{i-1}, s_1 \ldots s_i) \prod_{j=1}^{k} P(s_j \mid s_{j-1}) \\
&\simeq \prod_{i=1}^{k} P(o_i \mid s_i) \prod_{j=1}^{k} P(s_j \mid s_{j-1}) = \prod_{i=1}^{k} P(o_i \mid s_i)\, P(s_i \mid s_{i-1})
\end{aligned}
$$
The ”Weather” Example
|Let’s expand the state space as a trellis, for the earlier example:
-t(.|.) is the transition probability and e(.|.) the emission probability.
-S1-rainy, S2-cloudy, S3-sunny
➔Goal: identify a path through the trellis for which the product of the t's and the e's is maximized.
[Figure: trellis with initial state S* on the left, a column of states S1, S2, S3 at each time step, and observations o1, o2, …, oi, …, ok along the bottom]
Viterbi Algorithm for Problem 1
|A dynamic programming solution. For each state in the trellis, we record:
1. $\delta_{S_i}(t)$: the probability of the most likely path that ends at state Si at time t while generating o1…ot.
2. $\psi_{S_i}(t)$: the predecessor state (at time t-1) on that most likely path, used to recover the full state sequence by backtracing.
Viterbi Algorithm (cont’d)
|Initialization:
$\delta_{S_i}(1) = t(S_i \mid s^{*})\, e(o_1 \mid S_i), \quad \forall S_i \in \Theta$
|Induction: for $2 \le t \le k$, do
$\delta_{S_i}(t) = \max_{S_j}\; t(S_i \mid S_j)\, e(o_t \mid S_i)\, \delta_{S_j}(t-1)$
$\psi_{S_i}(t) = \arg\max_{S_j}\; t(S_i \mid S_j)\, e(o_t \mid S_i)\, \delta_{S_j}(t-1)$
|Termination:
-The probability of the best state sequence: $\max_{S_i} \delta_{S_i}(k)$
-The best last state: $\hat{s}_k = \arg\max_{S_i} \delta_{S_i}(k)$
-Back-trace to get the other states: $\hat{s}_t = \psi_{\hat{s}_{t+1}}(t+1)$, for $t = k-1, \ldots, 1$.
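A compact sketch of this recursion (using an initial distribution π in place of the fixed initial state s*, and reusing the made-up weather HMM matrices from the earlier sketch); variable and function names are assumptions.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state sequence for a discrete-observation HMM.

    obs : list of observation indices o_1..o_k
    A[i, j] = P(s_t = S_j | s_{t-1} = S_i),  B[j, k] = P(o_t = v_k | s_t = S_j),
    pi[j] = P(s_1 = S_j).
    """
    k, n = len(obs), len(pi)
    delta = np.zeros((k, n))           # delta[t, i]: best path probability ending in S_i
    psi = np.zeros((k, n), dtype=int)  # psi[t, i]: best predecessor of S_i at time t

    delta[0] = pi * B[:, obs[0]]                         # initialisation
    for t in range(1, k):                                # induction
        scores = delta[t - 1][:, None] * A * B[:, obs[t]][None, :]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)

    path = [int(delta[-1].argmax())]                     # termination + backtrace
    for t in range(k - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], float(delta[-1].max())

# Example with the made-up weather HMM (indices: 0 rainy, 1 cloudy, 2 sunny).
A = np.array([[0.5, 0.2, 0.3], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]])
B = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
pi = np.array([0.3, 0.4, 0.3])
print(viterbi([0, 1, 2, 2], A, B, pi))   # observations: cool, mild, warm, warm
```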
Problem 2: Evaluate P(O)
|To evaluate P(O), we can do $P(O) = \sum_{S} P(S, O)$
|From the trellis, a solution can be found by summing the probabilities of all paths generating the given observation sequence.
|A dynamic programming solution: the forward algorithm or the backward algorithm.
The Forward Algorithm
|Define the forward probability $\alpha_{S_i}(t) = P(o_1 \ldots o_t,\, s_t = S_i)$: the total probability, summed over all state paths, of ending at state Si at time t while generating o1…ot.
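A sketch of the corresponding forward recursion: the same structure as Viterbi, but summing over predecessor states instead of maximizing, so the final sum over α gives P(O). The matrices reuse the made-up weather HMM from the earlier sketches.

```python
import numpy as np

def forward_prob(obs, A, B, pi):
    """P(O) for a discrete-observation HMM via the forward algorithm.

    alpha[i] at step t equals P(o_1..o_t, s_t = S_i): the same recursion as
    Viterbi but with a sum over predecessors instead of a max.
    """
    alpha = pi * B[:, obs[0]]                   # alpha at t = 1
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]           # induction step
    return float(alpha.sum())                   # P(O) = sum_i alpha_i(k)

A = np.array([[0.5, 0.2, 0.3], [0.3, 0.3, 0.4], [0.1, 0.2, 0.7]])
B = np.array([[0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]])
pi = np.array([0.3, 0.4, 0.3])
print(forward_prob([0, 1, 2, 2], A, B, pi))
```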
Problem 3: Parameter Learning
|Case 1: we have a set of labeled data - sequences in which the hidden states are also given.
-Use relative frequency for estimating the probabilities → the MLE solution (see the counting sketch at the end of this section):
$e(o_r \mid S_j) = \dfrac{\text{number of } (o_t = o_r,\ s_t = S_j)}{\text{number of } s_t = S_j}$
$t(S_i \mid S_j) = \dfrac{\text{number of } (s_t = S_i,\ s_{t-1} = S_j)}{\text{number of } s_{t-1} = S_j}$
|Case 2: we have only the observation sequence
-The Forward-Backward Algorithm (a.k.a. Baum-Welch): an EM approach.
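Returning to Case 1, a minimal counting sketch of the relative-frequency (MLE) estimates from labelled sequences; the training data here are hypothetical.

```python
from collections import Counter

# Hypothetical labelled training data: each day is a (state, observation) pair.
sequences = [
    [("rainy", "cool"), ("rainy", "cool"), ("cloudy", "mild"), ("sunny", "warm")],
    [("cloudy", "mild"), ("sunny", "warm"), ("sunny", "warm"), ("cloudy", "mild")],
]

trans, emit = Counter(), Counter()
state_count, prev_count = Counter(), Counter()
for seq in sequences:
    for t, (state, obs) in enumerate(seq):
        emit[(state, obs)] += 1          # counts for e(obs | state)
        state_count[state] += 1
        if t > 0:
            prev = seq[t - 1][0]
            trans[(prev, state)] += 1    # counts for t(to | from), keyed (from, to)
            prev_count[prev] += 1

# Relative-frequency (MLE) estimates.
t_hat = {k: v / prev_count[k[0]] for k, v in trans.items()}
e_hat = {k: v / state_count[k[0]] for k, v in emit.items()}
print(t_hat)
print(e_hat)
```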