
COMP9418: Advanced Topics in Statistical Machine Learning
Markov Networks
University of New South Wales


Introduction
§ This lecture discusses Markov networks
§ These are undirected graphical models
§ They are frequently used to model symmetrical dependencies, as in the case of pixels in an image
§ Like Bayesian networks, Markov networks are used to model variable independencies
§ However, these representations are not redundant
§ There exist sets of independencies that can be expressed in a Markov network but not in a Bayesian network and vice-versa
§ We will discuss the semantics of Markov networks
§ As well as some inference algorithms such as stochastic search and variable elimination

Introduction
§ Several processes, such as a sentence or an image, can be modelled as a series of states in a chain or grid
§ Each state can be influenced by the state of its neighbours
§ Such symmetry is modelled using undirected graphs called
Markov random fields (MRFs) or Markov networks (MN)
§ MNs were proposed to model ferromagnetic materials
§ In Physics, these models are known as Ising models
§ Each variable represents a dipole with two states + and –
§ The state of each dipole depends on an external field and its neighbours
[Figure: a chain X₁ X₂ X₃ X₄ and a 3×4 grid of variables X₁ … X₁₂]
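The Ising energy sketched above can be written down directly; a minimal 1-D version, where J (coupling strength) and h (external field) are assumed parameter names, not values from the lecture:

```python
# Minimal 1-D Ising chain energy (a sketch; J and h are assumed parameters).
# Each spin s_i is +1 or -1; aligned neighbouring dipoles, and dipoles aligned
# with the external field, lower the energy.
def ising_energy(spins, J=1.0, h=0.0):
    interaction = sum(spins[i] * spins[i + 1] for i in range(len(spins) - 1))
    field = sum(spins)
    return -J * interaction - h * field

# A fully aligned chain has lower energy than an alternating one.
aligned = ising_energy([1, 1, 1, 1])       # three aligned pairs
frustrated = ising_energy([1, -1, 1, -1])  # three disagreeing pairs
```

Lower energy corresponds to higher probability under the Gibbs distribution discussed later in the lecture.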

Introduction
§ In an MN, a variable is independent of all other variables given its neighbours
§ For instance, in this figure, X₁ ⊥ X₃, X₄ | X₂
§ Therefore, P(X₁ | X₂, X₃, X₄) = P(X₁ | X₂)
§ A common query is to find the instantiation of maximum probability
§ MAP or MPE query
§ The probability of each instantiation depends on an external
influence (prior) and the internal influence (likelihood)
§ MNs can be thought of as a series of rings on poles, where each ring is a variable, and the height of a ring corresponds to its state
[Figure: four poles with rings X₁ X₂ X₃ X₄]

Voting Example
§ Suppose that we are modeling voting preferences among four persons 𝐴, 𝐵, 𝐶, 𝐷
§ Let’s say that 𝐴 − 𝐵, 𝐵 − 𝐶, 𝐶 − 𝐷, and 𝐷 − 𝐴 are friends
§ Friends can influence each other
§ These influences can be naturally represented by an undirected graph
§ In this example, 𝐴 does not interact directly with 𝐶. The same occurs with 𝐵 and 𝐷
§ 𝐴 ⊥ 𝐶|𝐵, 𝐷 and 𝐵 ⊥ 𝐷|𝐴, 𝐶
§ We saw there is no Bayesian network that can represent only these independence assumptions (Lecture 4 – Slide 33)

Voting Example
§ Like Bayesian networks, Markov networks encode independence assumptions
§ Variables that are not independent must be in some factor
§ A factor is a generalization of a CPT: it does not need to store values in the range 0 − 1
§ In this example, we can factorise the joint distribution as
P(A,B,C,D) = (1/Z) φ₁(A,B) φ₂(B,C) φ₃(C,D) φ₄(D,A)
§ 𝑍 is a normalizing constant known as the partition function
§ Z = Σ_{A,B,C,D} P̃(A,B,C,D)
§ P̃(A,B,C,D) = φ₁(A,B) φ₂(B,C) φ₃(C,D) φ₄(D,A)
D A  φ₄(D,A)      A B  φ₁(A,B)
d a    100        a b     30
d ā      1        a b̄      5
d̄ a      1        ā b      1
d̄ ā    100        ā b̄     10

C D  φ₃(C,D)      B C  φ₂(B,C)
c d      1        b c    100
c d̄    100        b c̄      1
c̄ d    100        b̄ c      1
c̄ d̄      1        b̄ c̄    100
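The partition function and the most probable instantiation can be checked by brute-force enumeration of the 16 assignments; a minimal sketch, encoding the unbarred state as 1 and the barred state as 0:

```python
from itertools import product

# Factor tables from the voting example (1 = unbarred state, 0 = barred).
phi1 = {(1, 1): 30, (1, 0): 5, (0, 1): 1, (0, 0): 10}    # phi1(A,B)
phi2 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}  # phi2(B,C)
phi3 = {(1, 1): 1, (1, 0): 100, (0, 1): 100, (0, 0): 1}  # phi3(C,D)
phi4 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}  # phi4(D,A)

def p_tilde(a, b, c, d):
    """Unnormalized measure: the product of the four pairwise factors."""
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

# Partition function: sum over all 2^4 assignments.
Z = sum(p_tilde(*x) for x in product([0, 1], repeat=4))

# MPE assignment: the instantiation of maximum (unnormalized) probability.
mpe = max(product([0, 1], repeat=4), key=lambda x: p_tilde(*x))
```

With these tables the MPE assignment is (a, b̄, c̄, d), reflecting the conflicting influences of the factors around the cycle.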

Voting Example
§ We can view φ₁(A,B) as an interaction that pushes B's vote closer to that of A
§ The term φ₂(B,C) pushes B's vote closer to C's, but C pushes D's vote away (and vice-versa)
§ The most likely vote will require reconciling these conflicting influences
§ We simply indicate a level of coupling between dependent variables in the graph
§ This requires less prior knowledge than CPTs
§ It defines an energy landscape over the space of possible
assignments
§ We convert this energy to a probability via the normalization constant
(factor tables φ₁–φ₄ as before)

Voting Example
§ MPE assignment: the instantiation of maximum probability is (a, b̄, c̄, d), with unnormalized value 5,000,000 and normalized probability ≈ 0.69
[Table: all assignments with their unnormalized and normalized values]
(factor tables φ₁–φ₄ as before)

Voting Example
§ Although expensive, the joint probability can be used to answer probabilistic queries
§ Prior marginal queries, such as P(A,B); for example, P(a, b̄) ≈ 0.69
§ Probability of evidence, such as P(b̄)
§ Posterior marginal, such as P(b̄ | c) = 0.06
[Table: assignments with their unnormalized and normalized values]
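All three query types reduce to sums over the joint, so they can be verified by enumeration; a sketch (1 = unbarred state, 0 = barred):

```python
from itertools import product

# Voting-example factors (1 = unbarred state, 0 = barred).
phi1 = {(1, 1): 30, (1, 0): 5, (0, 1): 1, (0, 0): 10}
phi2 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}
phi3 = {(1, 1): 1, (1, 0): 100, (0, 1): 100, (0, 0): 1}
phi4 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}

def p_tilde(a, b, c, d):
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

states = list(product([0, 1], repeat=4))
Z = sum(p_tilde(*x) for x in states)

# Prior marginal P(A, B): sum out C and D, then normalize.
p_ab = {(a, b): sum(p_tilde(a, b, c, d) for c in (0, 1) for d in (0, 1)) / Z
        for a in (0, 1) for b in (0, 1)}

# Probability of evidence P(b-bar): sum all assignments with B = 0.
p_b0 = sum(p_tilde(*x) for x in states if x[1] == 0) / Z

# Posterior marginal P(b-bar | c) = P(b-bar, c) / P(c).
p_b0_c1 = sum(p_tilde(*x) for x in states if x[1] == 0 and x[2] == 1)
p_c1 = sum(p_tilde(*x) for x in states if x[2] == 1)
posterior = p_b0_c1 / p_c1
```

This recovers the values quoted on the slide: P(a, b̄) ≈ 0.69 and P(b̄ | c) ≈ 0.06.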

Voting Example: Bad News for Learning!
§ Suppose we had learned P(A,B) from data
§ By counting the occurrences of a and b
§ P(A,B) is not a direct replacement for φ₁(A,B)
(factor tables φ₁–φ₄ and the marginal P(A,B) as before)

Random Field
§ A random field 𝑿 is a set of random variables
§ It is common for each variable Xᵢ to be associated with a site
§ This idea comes from areas such as image processing in which each variable is associated with a pixel or region
§ We use a set 𝑺 to index a set of 𝑛 sites
§ The sites can be spatially regular, as in the case of a 2D image
§ Or irregular, if they do not present spatial regularity
§ The sites in 𝑺 are related to one another via a neighborhood system
§ A site is not a neighbour of itself: i ∉ Nᵢ
§ The neighbouring relationship is mutual: i ∈ Nᵢ′ iff i′ ∈ Nᵢ
X = {X₁, …, Xₙ}
[Figure: 3×4 grid of site variables X₁,₁ … X₃,₄]
Nᵢ is the set of sites neighbouring i; N = {Nᵢ | ∀i ∈ S}

Markov Networks
§ A random field 𝑿 is a Markov random field (or Markov network) on 𝑺 w.r.t. a neighbourhood system 𝑵 if and only if
§ P(X₁ = x₁, …, Xₙ = xₙ) > 0, ∀x (positivity)
§ P(Xᵢ | X_{S ∖ {i}}) = P(Xᵢ | X_{Nᵢ}) (Markovianity)
§ Graphically, Markov networks (MN) are undirected graphical models
§ 𝐺 = (𝑽, 𝑬), where 𝑽 consists of a set of random variables, and 𝑬 a set of undirected edges
§ A set of variables 𝑿 is independent of 𝒀 given 𝒁, if the variables in 𝒁 separate 𝑿 and 𝒀 in the graph
§ Therefore, if we remove the nodes in 𝒁 from the graph, there will be no paths between 𝑿 and 𝒀
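The node-removal reading of separation is easy to implement as a reachability check; a sketch on the four-node voting cycle (the helper name `separated` is ours, not from the lecture):

```python
from collections import deque

# Cut-set test for separation: X and Y are separated by Z iff removing the
# nodes in Z leaves no path between them in the undirected graph.
def separated(edges, X, Y, Z):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = set(X) - set(Z)
    seen, frontier = set(start), deque(start)
    while frontier:                       # BFS that never enters Z
        u = frontier.popleft()
        for w in adj.get(u, ()):
            if w not in seen and w not in Z:
                seen.add(w)
                frontier.append(w)
    return not (seen & set(Y))            # no node of Y reachable => separated

# The voting cycle A - B - C - D - A
edges = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A')]
```

On this graph, {B, D} separates A from C, but B alone does not (the path A - D - C remains).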

Markov Networks: Hammersley-Clifford
§ When the positivity condition is satisfied, the joint probability distribution is uniquely determined by the Gibbs distribution
§ This result is known as the Hammersley-Clifford theorem
§ Like in Bayesian networks, it allows us to factorise the full joint
distribution into smaller factors
§ Therefore, we can efficiently answer probabilistic queries
§ Using the example, we have the following factorisation for maximal cliques
§ P(A,B,C,D) = (1/Z) φ₁(A,B,D) φ₂(B,C,D)
§ In practice, we frequently use smaller cliques such as pairwise
§ P(A,B,C,D) = (1/Z) φ₁(A,B) φ₂(B,C) φ₃(C,D) φ₄(D,A) φ₅(D,B)
P(X) = (1/Z) ∏_{c ∈ cliques(G)} φ_c(X_c)

Z = Σ_x ∏_{c ∈ cliques(G)} φ_c(X_c)

Markov Networks: Positivity
§ This graph encodes the independencies
§ A ⊥ C | B, D and D ⊥ B | A, C
§ Let us verify if this joint distribution has the same independence assumptions
[Tables: P(A|B,D), P(C|B,D) and P(A,C|B,D) with entries 0, .5 and 1; a joint P(.) assigning 1/8 to each of eight assignments of A, B, C, D]

Markov Networks: Positivity
§ This graph encodes the independencies
§ A ⊥ C | B, D and D ⊥ B | A, C
§ Let us verify if this joint distribution has the same independence assumptions
[Tables: P(B|A,C), P(D|A,C) and P(B,D|A,C) with entries 0, .5 and 1; the same joint P(.)]

Markov Networks: Positivity
§ This graph encodes the independencies
§ A ⊥ C | B, D and D ⊥ B | A, C
§ Let us verify if this joint distribution has the same independence assumptions
§ P(ā, b, c, d̄) = φ₁(ā, b) φ₂(b, c) φ₃(c, d̄) φ₄(d̄, ā) = 0
§ P(ā, b, c, d) = φ₁(ā, b) φ₂(b, c) φ₃(c, d) φ₄(d, ā) = 1/8
§ P(ā, b̄, c̄, d̄) = φ₁(ā, b̄) φ₂(b̄, c̄) φ₃(c̄, d̄) φ₄(d̄, ā) = 1/8
§ P(a, b, c, d̄) = φ₁(a, b) φ₂(b, c) φ₃(c, d̄) φ₄(d̄, a) = 1/8
[Figure: the cycle A - B - C - D - A with factors φ₁(A,B), φ₂(B,C), φ₃(C,D), φ₄(D,A); joint P(.) with eight assignments of probability 1/8]

§ Different Gibbs distributions may induce the same undirected graph
§ φ₁(A,B,D) φ₂(B,C,D)
§ φ₁(A,B,D) φ₂(B,D) φ₃(B,C) φ₄(C,D)
§ φ₁(A,B) φ₂(A,D) φ₃(B,D) φ₄(B,C) φ₅(C,D)
§ Therefore, we cannot read the factorization from the graph
§ All these factorizations have the same independence assumptions
§ However, they do not have the same representational power
§ For example, for a fully connected graph, a maximal-clique factorization has O(dⁿ) parameters, but a pairwise factorization has only O(n²d²) parameters

Potentials
§ Clique factors can be:
§ Single-node factors: specify an affinity for a particular candidate
§ Pairwise factors: enforce affinities between friends
§ For example, φ_AB(a,b) = 100 if a = b, and φ_ABC(a,b,c) = 100 if a ⊕ b ⊕ c
§ The normalization Z makes the factors scale invariant!
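The scale-invariance remark can be checked numerically: multiplying any one factor by a constant rescales Z by the same amount and leaves the normalized distribution unchanged. A minimal sketch using the voting-example factors:

```python
from itertools import product

# Voting-example factors (1 = unbarred state, 0 = barred).
phi1 = {(1, 1): 30, (1, 0): 5, (0, 1): 1, (0, 0): 10}
phi2 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}
phi3 = {(1, 1): 1, (1, 0): 100, (0, 1): 100, (0, 0): 1}
phi4 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}

def distribution(f1):
    """Normalized Gibbs distribution, using f1 in place of phi1."""
    w = {x: f1[x[0], x[1]] * phi2[x[1], x[2]] * phi3[x[2], x[3]] * phi4[x[3], x[0]]
         for x in product([0, 1], repeat=4)}
    Z = sum(w.values())
    return {x: v / Z for x, v in w.items()}

p = distribution(phi1)
scaled = {k: 10 * v for k, v in phi1.items()}  # multiply one factor by 10
q = distribution(scaled)                       # same distribution after normalizing
```

Because Z absorbs the constant, p and q are identical, which is why factor values need not be probabilities.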
§ Higher-order factors: important to specify relationships among sets of variables

Factor Graphs
§ A factor graph is a graph containing two types of nodes
§ Random variables
§ Factors over sets of variables
[Figure: factor graphs over X₁, …, X₅ that induce the same undirected graph]
§ It allows us to derive the factorization without ambiguity
§ P(X₁,X₂,X₃,X₄,X₅) = (1/Z) P(X₁,X₂,X₃) P(X₂,X₃,X₄) P(X₃,X₅)
§ P(X₁,X₂,X₃,X₄,X₅) = (1/Z) P(X₁,X₂) P(X₁,X₃) P(X₂,X₃) P(X₂,X₄) P(X₃,X₄) P(X₃,X₅)
§ P(X₁,X₂,X₃,X₄,X₅) = (1/Z) P(X₁,X₂) P(X₁,X₃) P(X₂,X₃,X₄) P(X₃,X₅)

Energy Functions
§ The joint probability in a MN is frequently expressed in terms of energy functions
§ P(X) = (1/Z) exp(−E(X))
§ E(X) is the energy. Therefore, maximising P(X) is equivalent to minimising E(X)
§ The energy function can be written in terms of local functions ψ_c known as potentials
§ E(X) = Σ_{c ∈ cliques(G)} ψ_c(X_c), where ψ_c(X_c) = −log φ_c(X_c)
§ Historical note: these models come from statistical physics

Voting Example
(factor tables φ₁–φ₄ and the assignment table as before)

Voting Example
§ The same example expressed with potentials ψ = −log φ:

D A  ψ₄(D,A)      A B  ψ₁(A,B)
d a   −4.61       a b    −3.40
d ā    0          a b̄    −1.61
d̄ a    0          ā b     0
d̄ ā  −4.61       ā b̄    −2.30

C D  ψ₃(C,D)      B C  ψ₂(B,C)
c d    0          b c   −4.61
c d̄  −4.61       b c̄    0
c̄ d  −4.61       b̄ c    0
c̄ d̄    0          b̄ c̄  −4.61

Pairwise Markov Networks
§ Common subclass of Markov networks
§ All the factors are over single variables or pairs of variables
§ Node potentials: {ψ(Xᵢ) : i = 1, …, n}
§ Edge potentials: {ψ(Xᵢ, Xⱼ) : (Xᵢ, Xⱼ) ∈ H}
§ Application: noise removal from binary images
§ Noisy image of pixel values, Yᵢ,ⱼ
§ Noise-free image of pixel values, Xᵢ,ⱼ
§ Markov net with
§ φ(Xᵢ,ⱼ, Xᵢ′,ⱼ′) potentials representing correlations between neighbouring pixels
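The relation ψ_c = −log φ_c can be verified directly: exp(−E(x)) recovers the product of factors for every assignment, so the two forms of the Gibbs distribution agree. A sketch with the voting-example factors:

```python
import math
from itertools import product

# Voting-example factors (1 = unbarred state, 0 = barred).
phi1 = {(1, 1): 30, (1, 0): 5, (0, 1): 1, (0, 0): 10}
phi2 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}
phi3 = {(1, 1): 1, (1, 0): 100, (0, 1): 100, (0, 0): 1}
phi4 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}

# Potentials: psi = -log(phi), so lower energy means higher probability.
psi1 = {k: -math.log(v) for k, v in phi1.items()}
psi2 = {k: -math.log(v) for k, v in phi2.items()}
psi3 = {k: -math.log(v) for k, v in phi3.items()}
psi4 = {k: -math.log(v) for k, v in phi4.items()}

def energy(a, b, c, d):
    return psi1[a, b] + psi2[b, c] + psi3[c, d] + psi4[d, a]

def factor_product(a, b, c, d):
    return phi1[a, b] * phi2[b, c] * phi3[c, d] * phi4[d, a]

# Relative difference between exp(-E(x)) and the factor product, per assignment.
checks = [abs(math.exp(-energy(*x)) - factor_product(*x)) / factor_product(*x)
          for x in product([0, 1], repeat=4)]
```

The differences are only floating-point noise, confirming that maximising P(X) and minimising E(X) pick out the same assignment.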
[Figure: 3×4 grid of pixel variables X₁,₁ … X₃,₄ with observed pixels Yᵢ,ⱼ]
§ φ(Xᵢ,ⱼ, Yᵢ,ⱼ) potentials describing correlations between the same pixel in the noise-free and noisy images

Example: Image Smoothing
§ Many applications of Markov networks involve finding the MAP or MPE assignment
§ This is known as the MAP-MRF approach
§ Given the Gibbs distribution, it is equivalent to minimising the energy function
§ The number of possible assignments is very large
§ It increases exponentially with the number of variables in the network
§ For instance, for a binary image of 100 × 100 pixels, there are 2¹⁰⁰⁰⁰ possible assignments
§ Finding the assignment of minimal energy is usually posed as a stochastic search
§ Start with a random value for each variable in the network
§ Improve this configuration via local operations
§ Until a configuration of (local) minimum energy is found

Stochastic Search Algorithm
Input: Markov network N with variables X, energy function E
Output: an assignment s for X with minimum (local) energy

s ← initial assignment for every variable Xᵢ ∈ X
for i = 1 to I do                      # I is the maximum number of iterations
    s_prev ← s
    for each variable Xᵢ ∈ X do
        s′ ← s with an alternative value for variable Xᵢ
        if E(s′) < E(s) or random(E(s′) − E(s)) < T then
            s ← s′                     # T is the threshold for accepting a change to a higher-energy state
    if E(s_prev) − E(s) < ε then       # ε is a convergence threshold
        return s
return s

Example: Image Smoothing
§ This algorithm has three main variations
§ Iterative Conditional Modes (ICM): it always selects the assignment of minimum energy
§ Metropolis: with a fixed probability, p, it selects an assignment with higher energy
§ Simulated annealing (SA): with a variable probability, P(T), it selects an assignment with higher energy. T is a parameter known as temperature. The probability of selecting a value with higher energy is determined by the expression P(T) = e^(−δE/T), where δE is the energy difference.
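The ICM variant can be sketched for a tiny binary image. The energy below (a data term for disagreeing with the observation plus a smoothness term for disagreeing neighbours) is a common pairwise choice, and the weights h and beta are hypothetical values, not from the lecture:

```python
def icm_denoise(noisy, h=1.0, beta=2.0, sweeps=5):
    """Iterated Conditional Modes on a binary image (a sketch).

    Assumed energy: h * [x_ij != y_ij] per pixel (data term) plus
    beta * [x_ij != x_i'j'] per neighbouring pair (smoothness term).
    """
    rows, cols = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]  # start from the observed image

    def local_energy(i, j, v):
        # Energy contribution of setting pixel (i, j) to value v.
        e = h * (v != noisy[i][j])
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                e += beta * (v != x[ni][nj])
        return e

    for _ in range(sweeps):
        changed = False
        for i in range(rows):
            for j in range(cols):
                # ICM: always pick the locally minimum-energy value.
                best = min((0, 1), key=lambda v: local_energy(i, j, v))
                if best != x[i][j]:
                    x[i][j] = best
                    changed = True
        if not changed:  # converged to a (local) minimum
            break
    return x

# One flipped pixel in an otherwise uniform image is smoothed away.
noisy = [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
clean = icm_denoise(noisy)
```

Because ICM never accepts uphill moves, it can get stuck in local minima; Metropolis and simulated annealing trade determinism for a chance to escape them.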
The value of T is reduced with each iteration

Local Independence
§ In a Markov network the absence of an edge implies independence
§ Given an undirected graph G = (V, E)
§ If the edge X − Y ∉ E then X ⊥ Y | V ∖ {X, Y}
§ These are known as the pairwise Markov independencies of G
§ Another local property of independence is the Markov blanket
§ As in the case of Bayesian networks, the Markov blanket U of a variable X is the set of nodes such that X is independent of the rest of the graph when U is observed
§ In the undirected case the Markov blanket turns out to be simply the node's neighbourhood

Global Independence: Separation
§ A global interpretation of independence uses the idea of separation
§ Let X, Y, and Z be disjoint sets of nodes in a graph G. We say that X and Y are separated by Z, written sep_G(X, Z, Y), iff every path between a node in X and a node in Y is blocked by Z
§ A path is blocked by Z iff at least one valve on the path is closed given Z
§ Like Bayesian networks, but now there is no exception for convergent structures

Separation: Complexity
§ The definition of separation considers all paths connecting a node in X with a node in Y
§ In practice, this test is too inefficient
§ We can replace it by a cut-set test
§ Two sets X and Y of variables are separated by a set Z iff
§ There is no path from any node X ∈ X to any node Y ∈ Y after removing all nodes in Z
§ Z is a cut-set between two parts of the graph

Separation: Soundness and Completeness
§ Like d-separation, the separation test is sound
§ If P is a probability distribution induced by a Markov network, then sep_G(X, Z, Y) only if X ⊥ Y | Z
§ We can safely use the separation test to derive independence statements about the probability distributions induced by Markov networks
§ Like d-separation, the separation test is not complete
§ The lack of separation does not imply dependency
§ This is expected. As with d-separation, separation only looks at the graph structure
[Tables: a two-node counter-example with single-node factors φ₁(A), φ₂(B) and a pairwise factor φ₁,₂(A,B)]
Markov VS Bayesian Networks
Markov Nets
§ Factors are easy to change (no normalization), but difficult to elicit
§ Can be applied to problems with cycles or no natural directionality
§ Difficult to read the factorization from the graph, but we can use factor graphs
§ Z requires summing over all entries (NP-hard)
Bayes Nets
§ Factors are easy to elicit from people
§ Must have no cycles, and edges are directed
§ Graphs are easy to interpret, particularly the causal ones
§ Naturally normalized
§ Easy to generate synthetic data from (more about this later)

Markov VS Bayesian: Representation
§ Bayesian and Markov networks can be understood as languages to represent independencies
§ These languages can represent different sets of independencies
§ Therefore, these representations are not redundant
§ For example, there is no directed graph that is a perfect map for the top case: X₁ ⊥ X₂ | Y₁, Y₂ and Y₁ ⊥ Y₂ | X₁, X₂
§ Conversely, there is no undirected graph that is a perfect map for the bottom case
§ In several circumstances, we need to find a Markov network that is an I-MAP for a Bayesian network
§ This is achievable through moralisation
§ We connect the parents of unmarried child nodes
§ We lose the marginal independence of parents
[Figures: an undirected square over X₁, Y₁, X₂, Y₂, and a convergent structure over X₁, X₂, Y with its moralised graph]

Variable Elimination
§ Let us now consider if Variable Elimination (VE) works for Markov networks
§ The idea of VE is to anticipate the elimination of variables
§ Using the network example, suppose we want to compute P(A,B)
§ We start with the Gibbs distribution

P(A,B) = Σ_C Σ_D P(A,B,C,D)
       = Σ_C Σ_D (1/Z) φ₁(A,B) φ₂(B,C) φ₃(C,D) φ₄(D,A)
       ∝ Σ_C Σ_D φ₁(A,B) φ₂(B,C) φ₃(C,D) φ₄(D,A)

(factor tables φ₁–φ₄ as before)
= φ₁(A,B) Σ_C φ₂(B,C) Σ_D φ₃(C,D) φ₄(D,A)

Variable Elimination
§ Eliminating D first requires the product φ₃(C,D) φ₄(D,A):

C D A   φ₃(C,D)·φ₄(D,A)
c d a        100
c d ā          1
c d̄ a        100
c d̄ ā      10000
c̄ d a      10000
c̄ d ā        100
c̄ d̄ a          1
c̄ d̄ ā        100

(factor tables φ₁–φ₄ as before)
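The elimination order above (sum out D, then C, then multiply in φ₁ and normalize) can be sketched directly; it reproduces the brute-force marginal P(A, B):

```python
# Voting-example factors (1 = unbarred state, 0 = barred).
phi1 = {(1, 1): 30, (1, 0): 5, (0, 1): 1, (0, 0): 10}    # phi1(A,B)
phi2 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}  # phi2(B,C)
phi3 = {(1, 1): 1, (1, 0): 100, (0, 1): 100, (0, 0): 1}  # phi3(C,D)
phi4 = {(1, 1): 100, (1, 0): 1, (0, 1): 1, (0, 0): 100}  # phi4(D,A)

# Eliminate D: tau1(C, A) = sum_d phi3(C, d) * phi4(d, A)
tau1 = {(c, a): sum(phi3[c, d] * phi4[d, a] for d in (0, 1))
        for c in (0, 1) for a in (0, 1)}

# Eliminate C: tau2(B, A) = sum_c phi2(B, c) * tau1(c, A)
tau2 = {(b, a): sum(phi2[b, c] * tau1[c, a] for c in (0, 1))
        for b in (0, 1) for a in (0, 1)}

# Multiply in phi1(A, B) and normalize to obtain P(A, B).
unnorm = {(a, b): phi1[a, b] * tau2[b, a] for a in (0, 1) for b in (0, 1)}
Z = sum(unnorm.values())
p_ab = {k: v / Z for k, v in unnorm.items()}
```

Each intermediate factor has at most 2² entries, so VE never materialises the full 2⁴-entry joint, which is the point of anticipating the elimination.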