COMP9418: Advanced Topics in Statistical Machine Learning
Belief Propagation
Instructor: The University of New South Wales
Introduction
§ In this lecture, we discuss a class of approximate inference algorithms based on belief propagation
§ Belief propagation was introduced as an exact algorithm for networks with polytree structure
§ Later, it was applied to networks with arbitrary structure and produced high-quality approximations in certain cases
§ We introduce a generalization of the algorithm that offers a full spectrum of approximations
§ The belief propagation approximation at one end
§ Exact results at the other
Belief Propagation
§ Belief propagation is a message-passing algorithm
§ It was originally developed for exact inference in polytrees
§ A polytree is a network with only one undirected path between any two nodes
§ The exact algorithm is a variation of the jointree algorithm
§ It computes 𝑃(𝑋, 𝒆) for every variable 𝑋 in the polytree
§ We discuss the approximate algorithm later on
[Figure: example polytree]
Belief Propagation
§ Suppose we want to apply the jointree algorithm under evidence 𝐸 = 𝑡𝑟𝑢𝑒
§ In this case, we can create a “special” jointree that has the same structure as the polytree
§ A node 𝑖 in the jointree has cluster 𝑪_i = 𝑋𝑼, where 𝑼 are the parents of 𝑋
§ The jointree edge corresponding to network edge 𝑈 → 𝑋 has separator 𝑺_ij = 𝑈
§ Therefore
§ The jointree width equals the polytree treewidth
§ Each jointree message is over a single variable
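§ As a concrete illustration, here is a minimal Python sketch of this construction; the parent lists below are an assumed encoding of the running example polytree, not taken verbatim from the slides
parents = {"A": [], "B": ["A"], "C": [], "D": ["B", "C"], "E": ["D"], "F": ["D"]}
clusters = {x: set([x] + us) for x, us in parents.items()}            # C_i = X U (one cluster per family)
separators = {(u, x): {u} for x, us in parents.items() for u in us}   # S_ij = U (one separator per edge U -> X)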
Belief Propagation
§ Belief propagation is the jointree algorithm under these circumstances
§ Messages are notated differently based on the polytree
§ The message from node 𝑈 to its child 𝑋 is denoted 𝜋_X(𝑈)
(causal support)
§ The message from node 𝑌 to its parent 𝑋 is denoted 𝜆_Y(𝑋)
(diagnostic support)
§ The joint marginal for the family of variable 𝑋 with parents 𝑈_i and children 𝑌_j is given by
𝑃(𝑋𝑼) = 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X(𝑈_i) ∏_j 𝜆_{Y_j}(𝑋)
[Figure: the example polytree with messages 𝜋_D(𝐵), 𝜋_D(𝐶), 𝜆_E(𝐷), 𝜆_F(𝐷)]
Belief Propagation with Evidence
§ In the presence of evidence, belief propagation uses an evidence indicator 𝜆_𝒆(𝑋)
§ 𝜆𝒆(𝑥) = 1 if 𝑥 is consistent with the evidence 𝒆 and zero otherwise
§ We can rewrite the joint marginal for the family of variable 𝑋 with parents 𝑈_i, children 𝑌_j and evidence 𝒆 as
𝑃(𝑋𝑼, 𝒆) = 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X(𝑈_i) ∏_j 𝜆_{Y_j}(𝑋)
Belief Propagation with Evidence
§ Using this notation, diagnostic messages can be defined as
𝜆_X(𝑈_i) = ∑_{X, 𝑼\{U_i}} 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_{k≠i} 𝜋_X(𝑈_k) ∏_j 𝜆_{Y_j}(𝑋)
§ And causal messages as
𝜋_{Y_j}(𝑋) = ∑_𝑼 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X(𝑈_i) ∏_{k≠j} 𝜆_{Y_k}(𝑋)
§ A node can send a message to a neighbour only after it has received messages from all other neighbours
[Figure: node 𝑋 with potential 𝜙_X(𝑋, 𝑼), parents 𝑈_1, …, 𝑈_n and children 𝑌_1, …, 𝑌_m, with the incoming and outgoing 𝜋 and 𝜆 messages]
Belief Propagation with Evidence
§ When a node has a single neighbour, it can immediately send a message to that neighbour
§ This includes a leaf node 𝑋 with a single parent 𝑈
𝜆_X(𝑈) = ∑_X 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑈)
§ And a root node 𝑋 with a single child 𝑌
𝜋_Y(𝑋) = 𝜆_𝒆(𝑋) 𝜙_X(𝑋)
§ These are the base cases for belief propagation
§ These messages can be computed immediately as they do not depend on any other messages
[Figure: a leaf node 𝑋 with a single parent 𝑈, and a root node 𝑋 with a single child 𝑌]
§ Typically, messages are first propagated toward a root and then pushed away from the root
Belief Propagation: Example
𝑃(𝐵, 𝐶, 𝐷, 𝒆) = 𝜙_D(𝐷, 𝐵, 𝐶) 𝜋_D(𝐵) 𝜋_D(𝐶) 𝜆_E(𝐷) 𝜆_F(𝐷)
𝒆: {𝐸 = 𝑡𝑟𝑢𝑒}
[Figure: the example network with CPTs 𝜙_A(𝐴), 𝜙_B(𝐵, 𝐴), 𝜙_C(𝐶), 𝜙_D(𝐷, 𝐵, 𝐶), 𝜙_E(𝐸, 𝐷), 𝜙_F(𝐹, 𝐷), the messages 𝜋_D(𝐵), 𝜋_D(𝐶), 𝜆_E(𝐷), 𝜆_F(𝐷), and the resulting table 𝑃(𝐵, 𝐶, 𝐷, 𝒆)]
Belief Propagation: Example
§ We can use 𝑃(𝐵, 𝐶, 𝐷, 𝒆) to compute marginals for the variables 𝐵, 𝐶 and 𝐷. For instance, summing out 𝐵 and 𝐷 yields 𝑃(𝐶, 𝒆)
§ We can also compute the joint marginal for 𝐶 once we compute the message from 𝐷 to 𝐶
§ To compute conditional marginals, we simply normalize joint marginals
§ Another approach is to use a constant 𝜂 that normalizes the factor to sum to one
𝑃(𝑋𝑼 | 𝒆) = 𝜂 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X(𝑈_i) ∏_j 𝜆_{Y_j}(𝑋)
𝜆_X(𝑈_i) = 𝜂 ∑_{X, 𝑼\{U_i}} 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_{k≠i} 𝜋_X(𝑈_k) ∏_j 𝜆_{Y_j}(𝑋)
𝜋_{Y_j}(𝑋) = 𝜂 ∑_𝑼 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X(𝑈_i) ∏_{k≠j} 𝜆_{Y_k}(𝑋)
[Figure: the example network with messages 𝜋_D(𝐵), 𝜋_D(𝐶), 𝜆_E(𝐷), 𝜆_F(𝐷) and the resulting table 𝑃(𝐶, 𝒆)]
Belief Propagation in Connected Networks
§ Belief propagation was designed as an exact algorithm for polytrees
§ However, it was later applied to multiply connected networks, i.e., networks with loops
§ This application poses some difficulties
§ A message can be sent from 𝑋 to 𝑌 only when 𝑋 has received all messages from its other neighbours
§ The correctness of belief propagation depends on the underlying polytree structure
§ The results can be incorrect if applied to multiply connected networks
§ The algorithm is no longer always correct
§ But it can still provide high-quality approximations in many cases
§ In the figure, after node 𝐸 sends a message to 𝐶, no other message can be propagated
§ Each remaining message depends on other messages that are still waiting to be propagated
[Figure: a multiply connected network]
Iterative Belief Propagation (IBP)
§ Iterative or Loopy Belief Propagation assigns some initial value to each message in the network
§ Given these initial values, each node is ready to send a message to each of its neighbours
§ At each iteration 𝑡, every node 𝑋 sends a message to each of its neighbours using the messages received from its other neighbours at iteration 𝑡 − 1
§ The algorithm iterates until message convergence
§ The values of the messages at the current iteration are within some threshold of their values at the previous iteration
§ When IBP converges, the values of the messages at convergence are called a fixed point
§ IBP may have multiple fixed points on a given network
Message Schedule
§ For some networks, IBP can oscillate and never converge
§ The convergence rate can depend on the order in which the messages are propagated, known as the message schedule
§ Parallel schedule: the order of the messages does not affect the algorithm
§ Sequential schedule: messages are propagated as soon as they are computed
§ Sequential schedules are flexible in when and how quickly information is propagated
§ Although one schedule may converge and another may not, all schedules have the same fixed points
Parallel Iterative Belief Propagation
initialize all messages
while messages have not converged do
    for each node 𝑋 with parents 𝑼 do
        for each parent 𝑈_i do
            𝜆_X^t(𝑈_i) ← 𝜂 ∑_{X, 𝑼\{U_i}} 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_{k≠i} 𝜋_X^{t−1}(𝑈_k) ∏_j 𝜆_{Y_j}^{t−1}(𝑋)
        for each child 𝑌_j do
            𝜋_{Y_j}^t(𝑋) ← 𝜂 ∑_𝑼 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X^{t−1}(𝑈_i) ∏_{k≠j} 𝜆_{Y_k}^{t−1}(𝑋)
return 𝛽(𝑋𝑼) = 𝜂 𝜆_𝒆(𝑋) 𝜙_X(𝑋, 𝑼) ∏_i 𝜋_X^t(𝑈_i) ∏_j 𝜆_{Y_j}^t(𝑋) as the approximation of 𝑃(𝑋𝑼 | 𝒆), for each node 𝑋
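§ The pseudocode above can be turned into a short program. Below is a minimal sketch of parallel loopy belief propagation written on a factor-graph view of the network (each CPT and each evidence indicator becomes a factor), which corresponds to the 𝜋/𝜆 equations above. The helper names, the toy network and the evidence encoding are illustrative assumptions, not part of the lecture material
import numpy as np
from itertools import product

def loopy_bp(variables, factors, max_iters=100, tol=1e-6):
    # variables: dict  var name -> domain size
    # factors:   list of (tuple of var names, numpy table indexed in that order)
    msg_vf = {(v, i): np.ones(variables[v]) for i, (vs, _) in enumerate(factors) for v in vs}
    msg_fv = {(i, v): np.ones(variables[v]) for i, (vs, _) in enumerate(factors) for v in vs}
    for _ in range(max_iters):
        new_vf, new_fv = {}, {}
        # variable -> factor messages: product of messages from all *other* factors
        for (v, i) in msg_vf:
            m = np.ones(variables[v])
            for (j, u), incoming in msg_fv.items():
                if u == v and j != i:
                    m = m * incoming
            new_vf[(v, i)] = m / m.sum()
        # factor -> variable messages: sum out every other variable of the factor
        for (i, v) in msg_fv:
            vs, table = factors[i]
            m = np.zeros(variables[v])
            for assignment in product(*(range(variables[u]) for u in vs)):
                val = table[assignment]
                for u, a in zip(vs, assignment):
                    if u != v:
                        val = val * msg_vf[(u, i)][a]
                m[assignment[vs.index(v)]] += val
            new_fv[(i, v)] = m / m.sum()
        delta = max(np.abs(new_fv[k] - msg_fv[k]).max() for k in msg_fv)
        msg_vf, msg_fv = new_vf, new_fv   # parallel update: all messages replaced at once
        if delta < tol:
            break
    # approximate marginals: product of incoming factor -> variable messages
    beliefs = {}
    for v, size in variables.items():
        b = np.ones(size)
        for (i, u), incoming in msg_fv.items():
            if u == v:
                b = b * incoming
        beliefs[v] = b / b.sum()
    return beliefs

# Toy polytree A -> B with evidence on B, encoded as an extra indicator factor
variables = {"A": 2, "B": 2}
factors = [(("A",), np.array([0.6, 0.4])),                    # P(A)
           (("A", "B"), np.array([[0.9, 0.1], [0.2, 0.8]])),  # P(B | A)
           (("B",), np.array([1.0, 0.0]))]                    # lambda_e(B): first value observed
print(loopy_bp(variables, factors)["A"])                      # exact here, since the graph is a tree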
The Kullback-Leibler Divergence
§ The Kullback-Leibler divergence, known as the KL divergence, between two distributions 𝑃 and 𝑃′ conditioned on 𝒆 is defined as
KL(𝑃′(𝑿 | 𝒆), 𝑃(𝑿 | 𝒆)) ≝ ∑_𝒙 𝑃′(𝒙 | 𝒆) log [𝑃′(𝒙 | 𝒆) / 𝑃(𝒙 | 𝒆)]
§ KL(𝑃′(𝑿 | 𝒆), 𝑃(𝑿 | 𝒆)) is non-negative and equal to zero if and only if 𝑃′(𝑿 | 𝒆) and 𝑃(𝑿 | 𝒆) are equivalent
§ However, the KL divergence is not a true distance since it is not symmetric. In general, KL(𝑃′(𝑿 | 𝒆), 𝑃(𝑿 | 𝒆)) ≠ KL(𝑃(𝑿 | 𝒆), 𝑃′(𝑿 | 𝒆))
§ Since the sum is weighted by 𝑃′, we say we are weighting the KL divergence by the approximate distribution
§ This variation has some useful computational properties
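§ As a small numerical illustration of the asymmetry (the two distributions below are made up, not from the lecture)
import numpy as np
p_approx = np.array([0.2, 0.5, 0.3])   # P'(x | e)
p_true   = np.array([0.1, 0.6, 0.3])   # P(x | e)
print(np.sum(p_approx * np.log(p_approx / p_true)))   # KL(P', P), weighted by the approximation
print(np.sum(p_true * np.log(p_true / p_approx)))     # KL(P, P'), generally a different value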
Optimizing KL Divergence
§ Approximate inference can be posed as an optimization problem
§ The goal is to search for an approximate distribution 𝑃’ that minimizes KL divergence with 𝑃
§ We can assume a parametrized form for 𝑃’ and search for the best instance, i.e., the best set of parameters
§ The Iterative Belief Propagation algorithm presented before assumes that the approximate distribution 𝑃′(𝑿 | 𝒆) factors as
𝑃′(𝑿 | 𝒆) = ∏_{X𝑼} [ 𝑃′(𝑋𝑼 | 𝒆) / ∏_{U∈𝑼} 𝑃′(𝑈 | 𝒆) ]
§ 𝑋𝑼 ranges over the families of the network 𝑁
§ 𝑈 ranges over nodes that appear as parents in 𝑁
Optimizing KL Divergence
§ The approximate distribution 𝑃′(𝑿 | 𝒆) factors as
𝑃′(𝑿 | 𝒆) = ∏_{X𝑼} [ 𝑃′(𝑋𝑼 | 𝒆) / ∏_{U∈𝑼} 𝑃′(𝑈 | 𝒆) ]
§ 𝑋𝑼 ranges over the families of the network 𝑁
§ 𝑈 ranges over nodes that appear as parents in 𝑁
§ Some observations about this assumption
§ This choice of 𝑃’ 𝑿 𝒆 is expressive enough to describe distributions induced by polytree networks
§ If the network 𝑁 is a polytree then 𝑃 𝑿 𝒆 factors according to this equation (see figure for an example)
§ If 𝑁 is not a polytree, then we are trying to fit 𝑃(𝑿 | 𝒆) into an approximation 𝑃′(𝑿 | 𝒆) as if it were generated by a polytree
𝑃(𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹) = [𝑃(𝐴) 𝑃(𝐶) 𝑃(𝐵, 𝐴) 𝑃(𝐷, 𝐵, 𝐶) 𝑃(𝐸, 𝐷) 𝑃(𝐹, 𝐷)] / [𝑃(𝐴) 𝑃(𝐵) 𝑃(𝐶) 𝑃(𝐷) 𝑃(𝐷)]
Optimizing KL Divergence
§ There is a correspondence between IBP and the KL divergence: IBP fixed points are stationary points of the KL divergence
§ They may or may not be local minima
§ When IBP performs well, it often has fixed points that are minima of the KL divergence
§ Otherwise, we need to seek approximations 𝑃’ whose factorizations are more expressive than the polytree-based factorization
§ If we do not insist on marginals being over families and individual variables, we can have a more general form that covers every distribution
Generalized Belief Propagation
§ We saw in the previous lecture that a network distribution can be factorized according to the expression
𝑃′(𝑿 | 𝒆) = ∏_𝑪 𝑃′(𝑪 | 𝒆) / ∏_𝑺 𝑃′(𝑺 | 𝒆)
provided that
§ 𝑪 corresponds to the clusters of a jointree
§ 𝑺 corresponds to the separators
§ If we base our factorization on a jointree
§ Solving the previous optimization problem yields the same update equations as the jointree algorithm
§ Therefore, the factorization used by IBP and the factorization based on jointrees can be viewed as two extremes
§ One efficient but approximate
§ The other expensive but exact
Joingraphs
§ There is a spectrum of factorizations that fall in between these two extremes
§ This allows a trade-off between quality and efficiency
§ The notion of a joingraph is one way to obtain such a spectrum
§ Joingraphs are generalizations of jointrees
§ They can be used to obtain factorizations of the form 𝑃′(𝑿 | 𝒆) = ∏_𝑪 𝑃′(𝑪 | 𝒆) / ∏_𝑺 𝑃′(𝑺 | 𝒆)
§ They are used to formulate a message-passing algorithm like IBP, known as iterative joingraph propagation
Joingraphs
§ A joingraph 𝐺 for a network 𝑁 is a graph where nodes 𝑖 are labelled by clusters 𝑪_i, and edges 𝑖 − 𝑗 are labelled by separators 𝑺_ij. Moreover, 𝐺 satisfies the following properties
§ Clusters 𝑪_i and separators 𝑺_ij are sets of nodes from 𝑁
§ Each factor in 𝑁 must appear in some cluster 𝑪_i
§ If a variable 𝑋 appears in two clusters 𝑪_i and 𝑪_j, then there exists a unique path connecting 𝑖 and 𝑗 in the joingraph such that 𝑋 appears in every cluster and separator on that path
§ For every edge 𝑖 − 𝑗 in the joingraph, 𝑺_ij ⊆ 𝑪_i ∩ 𝑪_j
[Figure: example joingraph]
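§ These properties are mechanical enough to check in code. Below is a rough sketch of such a check; the function name, the data layout and the tree test used for the uniqueness property are my own assumptions, not from the lecture
def is_valid_joingraph(clusters, separators, factor_scopes):
    # clusters:      dict  node i -> set of variables C_i
    # separators:    dict  frozenset({i, j}) -> set of variables S_ij
    # factor_scopes: list of variable sets, one per network factor
    # 1. every factor must fit inside some cluster
    if not all(any(scope <= c for c in clusters.values()) for scope in factor_scopes):
        return False
    # 2. every separator must be contained in the intersection of its two clusters
    for edge, sep in separators.items():
        i, j = tuple(edge)
        if not sep <= clusters[i] & clusters[j]:
            return False
    # 3. for every variable X, the clusters and separators containing X must form
    #    a tree (connected and acyclic), so the connecting path is unique
    for x in set().union(*clusters.values()):
        nodes = [i for i, c in clusters.items() if x in c]
        edges = [e for e, s in separators.items() if x in s]
        seen, stack = set(), [nodes[0]]
        while stack:                          # simple DFS over the X-subgraph
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            stack.extend(m for e in edges if n in e for m in e if m != n)
        if set(nodes) != seen or len(edges) != len(nodes) - 1:
            return False                      # disconnected, or a cycle makes the path non-unique
    return True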
Jointrees and Joingraphs
§ We can think of a joingraph as a way of relaxing some constraints of jointrees
§ In a jointree, if two clusters 𝑪_i and 𝑪_j share a set of variables 𝑿, then every cluster and separator on the path connecting 𝑪_i and 𝑪_j must contain 𝑿
§ In a joingraph, we only require that each variable 𝑋 ∈ 𝑿 be contained in the clusters and separators of some path connecting 𝑪_i and 𝑪_j
§ We do not require separators 𝑺_ij to be precisely the intersection of 𝑪_i and 𝑪_j, as in the case of jointrees
[Figures: example jointree and example joingraph]
Valid Joingraph?
𝜙_1(𝐴, 𝐵, 𝐶), 𝜙_2(𝐵, 𝐶), 𝜙_3(𝐵, 𝐷), 𝜙_4(𝐷, 𝐸), 𝜙_5(𝐵, 𝐸), 𝜙_6(𝐵, 𝐷), 𝜙_7(𝐵, 𝐷, 𝐹)
[Figure: candidate joingraph with clusters including 1: A,B,C; 2: B,C,D; 4: B,E; 5: D,E]
Joingraph Factorization
§ A joingraph induces an approximate factorization
𝑃′(𝑿 | 𝒆) = ∏_i 𝑃′(𝑪_i | 𝒆) / ∏_{ij} 𝑃′(𝑺_ij | 𝒆)
§ When the joingraph corresponds to a jointree, the factorization is exact
[Figures: example joingraph and example jointree]
Dual Joingraph
§ A dual joingraph is a special joingraph whose factorization reduces to the one used by IBP
§ A dual joingraph 𝐺 for a network 𝑁 is obtained as follows
§ 𝐺 has the same undirected structure as 𝑁
§ For each family 𝑋𝑼 in 𝑁, the corresponding node 𝑖 in 𝐺 has the cluster 𝑪_i = 𝑋𝑼
§ For each edge 𝑈 → 𝑋 in 𝑁, the corresponding edge 𝑖 − 𝑗 in 𝐺 has separator 𝑺_ij = 𝑈
[Figures: a Bayesian network and its dual joingraph]
Factorization Comparison
§ Dual joingraph (approximate, same as IBP)
𝑃′(𝑿 | 𝒆) = [𝑃′(𝐴 | 𝒆) 𝑃′(𝐵 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐶 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐷 | 𝒆) 𝑃′(𝐶, 𝐷, 𝐸 | 𝒆)] / [𝑃′(𝐴 | 𝒆)² 𝑃′(𝐵 | 𝒆)² 𝑃′(𝐶 | 𝒆) 𝑃′(𝐷 | 𝒆)]
[Figure: the Bayesian network and its dual joingraph]
§ Jointree (exact)
𝑃′(𝑿 | 𝒆) = [𝑃′(𝐴, 𝐵, 𝐶 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐷 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐶, 𝐷 | 𝒆) 𝑃′(𝐶, 𝐷, 𝐸 | 𝒆)] / [𝑃′(𝐴, 𝐵, 𝐶 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐷 | 𝒆) 𝑃′(𝐶, 𝐷 | 𝒆)]
§ Joingraph (trades off complexity and quality)
𝑃′(𝑿 | 𝒆) = [𝑃′(𝐴, 𝐵, 𝐶 | 𝒆) 𝑃′(𝐴, 𝐵, 𝐷 | 𝒆) 𝑃′(𝐴, 𝐶, 𝐷 | 𝒆) 𝑃′(𝐶, 𝐷, 𝐸 | 𝒆)] / [𝑃′(𝐵 | 𝒆) 𝑃′(𝐴, 𝐶 | 𝒆) 𝑃′(𝐴, 𝐷 | 𝒆) 𝑃′(𝐶, 𝐷 | 𝒆)]
[Figures: the Bayesian network with its dual joingraph, a jointree, and a joingraph]
Iterative Joingraph Propagation
§ Suppose we have a network 𝑁 that induces a distribution 𝑃
§ And a corresponding joingraph that induces a factorization 𝑃’
§ Also, we want to compute cluster marginals 𝑃′(𝑪_i | 𝒆) and separator marginals 𝑃′(𝑺_ij | 𝒆) that minimize the KL divergence between 𝑃(𝑿 | 𝒆) and 𝑃′(𝑿 | 𝒆)
§ This optimization problem can be solved with a generalization of IBP called iterative joingraph propagation (IJGP)
Iterative Joingraph Propagation
§ The algorithm starts by assigning each network factor 𝜙 and evidence indicator 𝜆_𝒆 to some cluster 𝑪_i that contains the variables of 𝜙
§ All factors are associated with some cluster (no information loss)
§ No factor is present in more than one cluster (no overcounting of information)
§ It propagates messages with the equation
𝑀_ij = 𝜂 ∑_{𝑪_i \ 𝑺_ij} 𝜓_i ∏_{k≠j} 𝑀_ki
§ where 𝜓_i is the product of all CPTs and evidence indicators assigned to cluster 𝑪_i
§ 𝑀_ij is the message sent from cluster 𝑖 to cluster 𝑗
Parallel Iterative Joingraph Propagation
initialize all messages
while messages have not converged do
    for each joingraph edge 𝑖 − 𝑗 do
        𝑀_ij^t = 𝜂 ∑_{𝑪_i \ 𝑺_ij} 𝜓_i ∏_{k≠j} 𝑀_ki^{t−1}
        𝑀_ji^t = 𝜂 ∑_{𝑪_j \ 𝑺_ij} 𝜓_j ∏_{k≠i} 𝑀_kj^{t−1}
return 𝛽(𝑪_i) = 𝜂 𝜓_i ∏_k 𝑀_ki^t for each node 𝑖
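§ For concreteness, here is a rough Python sketch of the parallel IJGP loop above, storing cluster potentials and messages as dense tables; the function names, data layout and the random 4-cluster example at the end are illustrative assumptions, not part of the lecture
import numpy as np
from itertools import product

def ijgp(domains, clusters, edges, max_iters=50, tol=1e-6):
    # domains:  dict  var -> domain size
    # clusters: dict  i -> (tuple of vars, numpy table with one axis per var)
    # edges:    dict  (i, j) -> separator tuple of vars (store both directions)
    msg = {e: np.ones([domains[v] for v in sep]) for e, sep in edges.items()}
    def project(i, j):
        # M_ij = eta * sum_{C_i \ S_ij} psi_i * prod_{k != j} M_ki
        cvars, table = clusters[i]
        sep = edges[(i, j)]
        out = np.zeros([domains[v] for v in sep])
        for a in product(*(range(domains[v]) for v in cvars)):
            val = table[a]
            for (k, t), m in msg.items():
                if t == i and k != j:                       # incoming message M_ki
                    val = val * m[tuple(a[cvars.index(v)] for v in edges[(k, i)])]
            out[tuple(a[cvars.index(v)] for v in sep)] += val
        return out / out.sum()
    for _ in range(max_iters):
        new = {(i, j): project(i, j) for (i, j) in edges}   # parallel update, uses old messages
        delta = max(np.abs(new[e] - msg[e]).max() for e in edges)
        msg = new
        if delta < tol:
            break
    beliefs = {}                                            # beta(C_i) = eta * psi_i * prod_k M_ki
    for i, (cvars, table) in clusters.items():
        b = np.array(table, dtype=float)
        for a in product(*(range(domains[v]) for v in cvars)):
            for (k, t), m in msg.items():
                if t == i:
                    b[a] *= m[tuple(a[cvars.index(v)] for v in edges[(k, i)])]
        beliefs[i] = b / b.sum()
    return beliefs

# Illustrative 4-cluster loop, similar in shape to the Markov-network example below
doms = {"A": 2, "B": 2, "C": 2, "D": 2}
cl = {1: (("A", "B"), np.random.rand(2, 2)), 2: (("A", "D"), np.random.rand(2, 2)),
      3: (("B", "C"), np.random.rand(2, 2)), 4: (("C", "D"), np.random.rand(2, 2))}
ed = {}
for i, j, sep in [(1, 2, ("A",)), (1, 3, ("B",)), (2, 4, ("D",)), (3, 4, ("C",))]:
    ed[(i, j)] = sep
    ed[(j, i)] = sep
print(ijgp(doms, cl, ed)[1])      # approximate belief over cluster 1 = (A, B)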
Joingraph Example with Markov Nets
[Figure: a Markov network over 𝐴, 𝐵, 𝐶, 𝐷 with pairwise factors over (𝐴, 𝐵), (𝐴, 𝐷), (𝐵, 𝐶) and (𝐶, 𝐷), their tables, and a joingraph with clusters 1: A,B; 2: A,D; 3: B,C; 4: C,D]
Joingraph Example with Markov Nets
§ Each cluster is assigned the matching factor as its potential
𝜓_1(𝐴, 𝐵) = 𝜙_1(𝐴, 𝐵), 𝜓_2(𝐴, 𝐷) = 𝜙_2(𝐴, 𝐷), 𝜓_3(𝐵, 𝐶) = 𝜙_3(𝐵, 𝐶), 𝜓_4(𝐶, 𝐷) = 𝜙_4(𝐶, 𝐷)
[Figure: the factor tables and the joingraph, as on the previous slide]
Message Passing with Markov Nets
𝑀_{1,2}(𝐴) = ∑_B 𝜓_1(𝐴, 𝐵) 𝑀_{3,1}(𝐵)
𝑀_{2,1}(𝐴) = ∑_D 𝜓_2(𝐴, 𝐷) 𝑀_{4,2}(𝐷)
𝑀_{1,3}(𝐵) = ∑_A 𝜓_1(𝐴, 𝐵) 𝑀_{2,1}(𝐴)
𝑀_{3,1}(𝐵) = ∑_C 𝜓_3(𝐵, 𝐶) 𝑀_{4,3}(𝐶)
𝑀_{2,4}(𝐷) = ∑_A 𝜓_2(𝐴, 𝐷) 𝑀_{1,2}(𝐴)
𝑀_{4,2}(𝐷) = ∑_C 𝜓_4(𝐶, 𝐷) 𝑀_{3,4}(𝐶)
𝑀_{3,4}(𝐶) = ∑_B 𝜓_3(𝐵, 𝐶) 𝑀_{1,3}(𝐵)
𝑀_{4,3}(𝐶) = ∑_D 𝜓_4(𝐶, 𝐷) 𝑀_{2,4}(𝐷)
[Figure: the joingraph with one message in each direction on every edge]
Message Passing: Avoid Self-Beliefs
§ Notice that the message a cluster sends to a neighbour does not use the message it received from that same neighbour
§ For example, 𝑀_{2,4} = ∑_A 𝜓_2(𝐴, 𝐷) 𝑀_{1,2}(𝐴) only considers information received from cluster 1 (the message 𝑀_{1,2})
§ Therefore, it ignores the information received from cluster 4 (the message 𝑀_{4,2})
§ This helps to avoid reinforcing self-beliefs
Bethe Graph
§ A simple way to generate a valid joingraph
§ For each factor 𝜙_i, make a cluster 𝑪_i = Vars(𝜙_i)
§ For each variable 𝑋_j, create a singleton cluster {𝑋_j}
§ Add an edge (𝑪_i, 𝑋_j) whenever 𝑋_j ∈ 𝑪_i
§ The singleton clusters have no factor assigned to them, so their potential is 𝜓 = 1
𝜙_1(𝐴, 𝐵, 𝐶), 𝜙_2(𝐵, 𝐶), 𝜙_3(𝐵, 𝐷), 𝜙_4(𝐷, 𝐸), 𝜙_5(𝐵, 𝐸), 𝜙_6(𝐵, 𝐷), 𝜙_7(𝐵, 𝐷, 𝐹)
[Figure: the Bethe joingraph for these factors]
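§ A minimal Python sketch of this construction, using the factor list from this slide; the dictionary encoding and names are assumptions on my part
def bethe_joingraph(factors):
    # factors: dict  factor name -> tuple of variables in its scope
    clusters = {name: set(scope) for name, scope in factors.items()}
    edges = []                                    # (factor cluster, variable cluster, separator)
    for v in sorted({v for scope in factors.values() for v in scope}):
        clusters[v] = {v}                         # singleton cluster, potential psi = 1
        for name, scope in factors.items():
            if v in scope:
                edges.append((name, v, {v}))      # separator is the single variable
    return clusters, edges

clusters, edges = bethe_joingraph({
    "phi1": ("A", "B", "C"), "phi2": ("B", "C"), "phi3": ("B", "D"), "phi4": ("D", "E"),
    "phi5": ("B", "E"), "phi6": ("B", "D"), "phi7": ("B", "D", "F")})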
Belief Propagation Error
[Figure: Pyramid network — error of loopy belief propagation vs. likelihood sampling with 200 cases]
Murphy, K., Weiss, Y., & Jordan, M. I. (2013). Loopy belief propagation for approximate inference: An empirical study. arXiv preprint arXiv:1301.6725.
Belief Propagation Error
[Figure: toyQMR network — error of loopy belief propagation vs. likelihood sampling with 200 cases]
Murphy, K., Weiss, Y., & Jordan, M. I. (2013). Loopy belief propagation for approximate inference: An empirical study. arXiv preprint arXiv:1301.6725.
Belief Propagation Error
[Figure: full-scale ICU network — error of loopy belief propagation vs. likelihood sampling with 200 cases]
Murphy, K., Weiss, Y., & Jordan, M. I. (2013). Loopy belief propagation for approximate inference: An empirical study. arXiv preprint arXiv:1301.6725.
Belief Propagation Error
[Figure: QMR-DT network — posterior marginals for three nodes, with exact marginals (circles) and error bars]
Murphy, K., Weiss, Y., & Jordan, M. I. (2013). Loopy belief propagation for approximate inference: An empirical study. arXiv preprint arXiv:1301.6725.
Convergence and Message Schedule
§ In their analysis, Murphy, Weiss & Jordan used synchronous (parallel) message passing
§ However, convergence can be improved with asynchronous approaches
§ Some approaches for asynchronous message scheduling
§ Tree reparameterization (TRP): Choose a tree (spanning tree is a good
choice) and pass messages. The trees must cover all edges
§ Residual belief propagation (RBP): Pass messages between two clusters whose beliefs over separators disagree the most. Usually, organised with a priority queue
§ Smoothing messages
𝑀_ij = 𝜆 𝜂 ∑_{𝑪_i \ 𝑺_ij} 𝜓_i ∏_{k≠j} 𝑀_ki + (1 − 𝜆) 𝑀_ij^old
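§ In code, smoothing is a one-line change to the update loop; a hedged sketch reusing the project() helper and the msg and edges names from the IJGP sketch earlier (all of which are my own illustrative names)
lam = 0.5                                                       # lam = 1 recovers the plain IJGP update
new = {e: lam * project(*e) + (1 - lam) * msg[e] for e in edges}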
Joingraph Example with Markov Nets
[Figure: the Markov-network example with modified factor tables, and a plot of message values against iteration]
Convergence and Message Schedule
50 random grids of size 11 × 11 and C = 11 (left) and C = 13 (right)
Elidan, G., McGraw, I., & Koller, D. (2012). Residual belief propagation: Informed scheduling for asynchronous message passing. arXiv preprint arXiv:1206.6837.
Conclusion
§ Belief propagation extends the paradigm of message passing from exact inference on polytrees to approximate inference on arbitrary networks