留学生作业代写 COMP9418: Advanced Topics in Statistical Machine Learning

COMP9418: Advanced Topics in Statistical Machine Learning
Bayesian Networks 2
Instructor: University of Wales

Copyright By PowCoder代写 加微信 powcoder

Introduction
§ This lecture continues the study of Bayesian networks
§ We will see which types of queries can be answered by this probabilistic reasoning
§ There are four main types of queries and we will see how to use them in specific situations
§ We will discuss several use cases for Bayesian networks
§ We start with a problem description and we will design a model for the problem § Also, we will discuss what answers we can get with different types of queries

Probability of Evidence
§ Ones of the simplest queries is the probability of some variable instantiation, 𝒆, 𝑃(𝒆)
§ For instance, the probability a patient has a positive x-ray but no dyspnoea 𝑃(𝑋 = 𝑡𝑟𝑢𝑒, 𝐷 = 𝑓𝑎𝑙𝑠𝑒)
§ The variables 𝑬 = {𝑋, 𝐷} are evidence variables
§ 𝑃 𝒆 is known as probability of evidence query
§ There are other types of evidence beyond variable instantiation
§ For instance, the probability that a patient has either a positive x-ray or dyspnoea, 𝑃(𝑋 = 𝑡𝑟𝑢𝑒 ∨ 𝐷 = 𝑡𝑟𝑢𝑒)
§ Bayesian networks do not directly support queries with arbitrary pieces of evidence
§ But these probabilities can be computed indirectly

Probability of Evidence
§ We can add an auxiliary node 𝐸
§ Declare nodes 𝑋 and 𝐷 as parents of 𝐸
§ Adopt the following CPT for 𝐸
§ We can use the augmented network and compute 𝑃(𝐸 = 𝑦𝑒𝑠)
§ This technique is known as auxiliary-node method
§ It is practical only when the number of evidence variables is
§ However, this CPT has only 0 and 1 values, known as
deterministic CPT
§ We can use some techniques to represent deterministic CPT that do not grow exponentially
𝑋 𝐷 𝐸 𝑃(𝐸|𝑋,𝐷) 𝑥𝑑̅𝑒 1 𝑥𝑑𝑒 1 𝑥̅𝑑̅𝑒 1 𝑥̅𝑑𝑒 0

Prior and Posterior Marginals
§ Posterior-marginal queries are the most common ones
§ Let us first discuss the terms posterior and marginal
§ Given a joint probability distribution 𝑃(𝑥!, … , 𝑥”)
§ The marginal distribution 𝑃(𝑥!,…,𝑥”),𝑚 ≤ 𝑛 is defined as
§ The marginal distribution can be viewed as a projection of the joint distribution on a smaller set of variables
§ When the marginal distribution is computed given some evidence 𝒆
§ It is known as a posterior marginal
§ This contrasts with marginal distribution given no evidence, known as prior marginal
𝑃 𝑥!,…,𝑥”
𝑃 𝑥!,…,𝑥”|𝒆 =
> 𝑃(𝑥!,…,𝑥&) #!”#,…,#$
> 𝑃(𝑥!,…,𝑥&|𝒆) #!”#,…,#$

Prior and Posterior Marginals
§ This screen shows the marginals for each variable
§ For instance, the distribution of variable 𝐶, lung cancer is
𝐶 𝑃(𝐶) 𝑐 5.5% 𝑐̅ 94.5%

Prior and Posterior Marginals
§ This screen shows the marginals for each variable
§ For instance, the distribution of variable 𝐶, lung cancer is
𝐶 𝑃(𝐶) 𝑐 5.5% 𝑐̅ 94.5%
§ Suppose the patient has a positive x-ray but no dyspnoea 𝐵: 𝑋 = 𝑡𝑟𝑢𝑒, 𝐷 = 𝑓𝑎𝑙𝑠𝑒.
§ The posterior marginal for variable 𝐶 is
𝐶 𝑃(𝐶|𝒆) 𝑐 25.23% 𝑐̅ 74.77%

Most Probable Explanation
§ We now consider the most probable explanation (MPE) queries
§ The goal is to identify the most probable instantiation given some evidence
§ Given 𝑋!, … , 𝑋& variables and 𝒆 is the evidence, the goal is to identify the instantiation 𝑥!, … , 𝑥& for which the probability 𝑃(𝑥!, … , 𝑥&|𝒆) is maximal
§ Such instantiation is called the most probable explanation given evidence 𝒆
§ For example, the MPE for a patient with positive x-ray and dyspnoea
§ It is a person that made no visit to Asia, is a smoker, and has lung cancer and bronchitis but no tuberculosis

Most Probable Explanation
§ MPE cannot be obtained directly from the posterior marginals
§ That is, choosing each value 𝑥’ to maximize 𝑃(𝑥’|𝒆)
§ Consider the case in which the patient has a positive x-ray but no dyspnoea
§ We get an explanation the patient is not a smoker with probability of approximately 38.57%
§ However, if we choose for each variable the value with maximal probability, we get an explanation in which the patient is a smoker with probability of approximately 20.03%

Maximum a Posteriori Hypothesis
§ MPE is a special case of a more general class of queries
§ We may want to find the most probable instantiation for a subset of variables
§ Given 𝑴 ⊆ 𝑿 and some evidence 𝒆, our goal is to find the instantiation 𝒎 of variables 𝑴 for which 𝑃(𝒎|𝒆) is maximal
§ Such instantiation 𝒎 is known as maximum a posteriori hypothesis (MAP)
§ The variables 𝑴 are known as MAP variables
§ MPE is a special case of MAP when MAP variables include all
network variables
§ Such distinction exists because MPE is much easier to compute

Maximum a Posteriori Hypothesis
§ In this example we consider that
§ The patient has a positive x-ray and no dyspnoea
§ MAP variables are 𝑴 = {𝐴, 𝑆}
§ Which answer is 𝐴 = 𝑛𝑜, 𝑆 = 𝑦𝑒𝑠 with probability of approximately 50.74%
§ MPE is frequently used to approximate MAP
§ We say we are projecting the MPE on MAP variables

Maximum a Posteriori Hypothesis
§ In this example we consider that
§ The patient has a positive x-ray and no dyspnoea
§ MAP variables are 𝑴 = {𝐴, 𝑆}
§ Which answer is 𝐴 = 𝑛𝑜, 𝑆 = 𝑦𝑒𝑠 with probability of approximately 50.74%
§ MPE is frequently used to approximate MAP
§ We say we are projecting the MPE on MAP
§ However, it is just an approximation.
§ This figure shows the MPE answer for 𝑴 variables is 𝐴 = 𝑛𝑜, 𝑆 = 𝑛𝑜 with probability approximately 48.09%

Diagnosis Model from Expert I
§ Consider the following medical information § Which variables
§ Disease: flu, cold and tonsillitis
§ Symptoms: chilling, body ache and pain, sore
throat, and fever
§ True or false
§ Structure
§ From the problem statement
The flu is an acute disease characterised by fever, body aches and pains, and can be associated with chilling and a sore throat. The cold is a bodily disorder popularly associated with chilling and can cause a sore throat. Tonsillitis is inflammation of the tonsils that leads to a sore throat and can be associated with fever
Cold? Flu? Tonsillitis?
Body Ache?
Sore Throat?

Diagnosis Model from Expert I
§ Another modelling has one variable “Condition” with values normal, cold, flu, tonsillitis
§ This network structure is known as naïve Bayes
§ The naïve Bayes has the structure 𝐶 → A!,…,𝐶 → 𝐴”, where 𝐶 is the class and 𝐴!,…,A” are the attributes
§ The naïve Bayes structure
§ Has a single-fault assumption
§ Attributes are independent given a condition
§ All attributes are connected to the condition node
Cold? Flu?
Tonsillitis?
Body Ache?
Sore Throat?
Body Ache?
Sore Throat?

Diagnosis Model from Expert I
§ CPTs for the conditions
§ Must provide the belief in developing the condition by a person we have no knowledge of any symptom
§ CPTs for the symptoms
§ Must provide the belief in this symptom under all possible combinations of possible conditions
§ The probabilities for the CPTs can come
§ From medical statistics or subjective beliefs
§ Estimating from medical records of previous patients
§ The diagnosis problem
§ Symptoms represent known evidence
§ Compute the most probable combination of conditions given the evidence
§ MAP or MPE query
Tonsillitis?
Sorethroat?

Diagnosis Model from Expert II
§ Consider the problem of computing the probability of pregnancy given some tests
§ Variables
§ Query variable to represent pregnancy (𝑃)
§ Three evidence variables to represent test results: scanning (𝑆), blood (𝐵) and urine (𝑈)
§ One intermediary variable to represent progesterone level (𝐿)
§ Binary, depending on the variable
A few weeks after inseminating a cow, we have three possible tests to confirm pregnancy. The first is a scanning test that has a false positive of 1% and a false negative of 10%. The second is a blood test that detects progesterone with a false positive of 10% and a false negative of 30%. The third test is a urine test that also detects progesterone with a false positive of 10% and a false negative of 20%. The probability of a detectable progesterone level is 90% given pregnancy and 1% given no pregnancy. The probability that insemination will impregnate a cow is
§ Structure
§ Causal from the problem statement 87%

Diagnosis Model from Expert II
§ Some independencies
§ Blood and urine tests are independent, given the
progesterone level
§ The scanning test is independent of the blood and urine tests, given the status of pregnancy
§ Urine and blood tests are not independent given the status of pregnancy
§ Suppose we inseminate a cow and after a few
weeks the three tests come out negative
𝒆:𝑆=𝑓𝑎𝑙𝑠𝑒,𝐵=𝑓𝑎𝑙𝑠𝑒,𝑈=𝑓𝑎𝑙𝑠𝑒
§𝑃𝑃𝒆 ≈10.21%
§ It is relatively high given all three tests came out negative
Pregnant? (𝑃)
𝑃 𝜃& 𝑝 .87
𝑃 𝐿 𝜃+|) 𝑝 𝑙̅ .10
Progesterone Level (𝐿)
Scanning Test (𝑆)
Urine Test
Blood Test (𝐵)
𝑙̅ 𝑢V .20 𝑙̅ 𝑏V .30 𝑝 𝑠̅ .10
𝑃 𝑆 𝜃’|) 𝑙 𝑢 .10 𝑙 𝑏 .10 𝑝̅ 𝑠 .01

Sensitivity Analysis
§ Suppose the farmer is not happy
§ The three negative tests need to drop the probability of
pregnancy to less than 5%
§ We will need to replace the tests, but we need to know the new false positive and negative rates
§ Sensitivity analysis
§ Which network parameters do we have to change, and by how much, to ensure that the probability of pregnancy would be no more than 5% given three negative tests?
§ Solution is to change the scanning test by one with 4.63% false negative rate
§ Urine and blood test cannot help
§ The uncertainty of the progesterone level is such that even perfect urine and blood tests cannot achieve the desired confidence level

Network Granularity
§ The progesterone level is neither a query or an evidence variable
§ Why do we need to include it in the network?
§ Intermediate variables are a modelling convenience
§ It helps modelling urine and blood tests with pregnancy
§ However, the network allow us to compute the following
𝑃(𝐵 = 𝑓𝑎𝑙𝑠𝑒|𝑃 = 𝑡𝑟𝑢𝑒) = 36% 𝑃(𝐵 = 𝑡𝑟𝑢𝑒|𝑃 = 𝑓𝑎𝑙𝑠𝑒) = 10.6% 𝑃(𝑈 = 𝑓𝑎𝑙𝑠𝑒|𝑃 = 𝑡𝑟𝑢𝑒) = 27% 𝑃(𝑈 = 𝑡𝑟𝑢𝑒|𝑃 = 𝑡𝑟𝑢𝑒) = 10.7%
Pregnant? (𝑃)
Progesterone Level (𝐿)
Scanning Test (𝑆)
Urine Test (𝑈)
Blood Test (𝐵)
Pregnant? (𝑃)
Blood Test (𝐵)
Urine Test (𝑈)
Scanning Test (𝑆)

Network Granularity
§ Is this simpler network equivalent to the original one?
§ Simpler: negative blood and urine tests will count more in ruling out pregnancy (45.09% vs 52.96%)
§ We cannot remove intermediate variables without undesirable effects in certain cases
§ In general, an intermediate variable can be bypassed without affecting model accuracy if
§𝑃 𝒒,𝒆 =𝑃’(𝒒,𝒆)
§ For all instantiations 𝒒 of the query variables 𝑸
and 𝒆 of the evidence variables 𝑬
§ 𝑃 is induced by the original Bayesian network and
𝑃’ by the new network
Pregnant? (𝑃)
Progesterone Level (𝐿)
Scanning Test (𝑆)
Urine Test (𝑈)
Blood Test (𝐵)
Pregnant? (𝑃)
Blood Test (𝐵)
Urine Test (𝑈)
Scanning Test (𝑆)

Network Granularity
§ Suppose that 𝑋 is not a query or an evidence variable
§ 𝑋 can be bypassed if it has a single child 𝑌
§ In this case, the CPT for variable 𝑌 is
𝜃’(|𝒖𝒗 = > 𝜃(|#𝒗𝜃#|𝒖 #
§ 𝑼 are the parents of 𝑋 and 𝑽 the parents of 𝑌 other than 𝑋
§ In most case, we do not bypass intermediate variables
§ It tends to create larger CPTs even it does not affect model accuracy

Diagnosis Model from Design
§ This is another diagnosis problem
§ The model will be general to the point it can be
generated automatically for similar instances
§ Evidence variables
§ Primary inputs and outputs of the circuit: 𝐴, 𝐵 and
§ Query variables
§ One for each component: 𝑋, 𝑌, and 𝑍
§ Intermediate variables
§ Internal wires: 𝐶 and 𝐷
§ For larger circuits, it would be unfeasible to model the circuit without representing the internal states
Consider this digital circuit. Given some values for the circuit primarily inputs and outputs, our goal is to decide whether the circuit is behaving normally. If not, decide the most likely health states of its components

Diagnosis Model from Design
§ This structure is general
§ It applied to any system composed by function
§ Outputs determined by inputs and health state
§ Do not contain feedback loops
§ High or low for wire variables
§ Ok or faulty for health variables
§ Stuck-at-zero or stuck-at-one may be more specific

Diagnosis Model from Design
§CPTs 𝑋 𝜃-
ok .99 stuckat0 .005 stuckat1 .005
§ Health variables defined the probability of ok being faulty faulty
§ Component variables are deterministic § CPTs for primary inputs
low ok high
high faulty high low faulty high
high .5 low .5
𝐶 𝜃.|/,- high 0
high 1 high 0 high 0 high 1 high 1
high stuckat0 low stuckat0 high stuckat1 low stuckat1
high ok ok

Diagnosis Model from Design
§ Suppose we have the following test vector
§ 𝒆:𝐴 = h𝑖𝑔h,𝐵 = h𝑖𝑔h,𝐸 = 𝑙𝑜𝑤
§ We want to compute MAP query over
health variables
𝑋 𝑌 𝑍 MAP ok stuckat0 ok 49.4% ok ok stuckat0 49.4%
ok faulty ok
ok ok faulty
49.4% 49.4%

Diagnosis Model from Design
§ Suppose we have two test vectors instead of one
§ For instance, we want to solve the ambiguity of the MAE result
§ We apply two low inputs and get an abnormal low as output
§ We have evidence variables
§ 𝐴’, 𝐵’ and 𝐸’ for new input vector
§ 𝐶’ and 𝐷’ for internal wires
§ 𝑋’, 𝑌’ and 𝑍’ are necessary if we want to model intermittent faults
𝑋’ 𝐴’ 𝐵’ 𝑌’

Diagnosis Model from Design
§ Suppose we have the following test vectors
§ 𝒆:𝐴 = h𝑖𝑔h,𝐵 = h𝑖𝑔h,𝐸 = 𝑙𝑜𝑤 § 𝒆’: 𝐴 = 𝑙𝑜𝑤, 𝐵 = 𝑙𝑜𝑤, 𝐸 = 𝑙𝑜𝑤
𝑋 𝑌 𝑍 MAP ok ok faulty 97.53%
§ For intermittent faults, we need as additional CPT (persistency model)
𝑋 𝑋’ 𝜃-’|-
ok faulty .01 faulty ok .001 faulty faulty .999
𝑋’ 𝐴’ 𝐵’ 𝑌’
𝐶’ 𝑍’ 𝐸’
§ Dynamic Bayesian network (DBN)
§ Include multiple copies of the same variable
§ Different copies represent different states of the variable over time

Reliability Model from Design
§ To address this problem, we need an RBD interpretation
§ Each node represents a block
§ Block 𝐵 represents a subsystem that includes the component 𝐵 and the subsystems feeding into 𝐵
§ For 𝐵 to be available, at least one of the subsystems feeding into 𝐵 must be available
§ Our representation has a single leaf node
This figure depicts a reliability block diagram (RBD) of a computer system, indicating conditions under which the system is guaranteed to be functioning normally (available). At 1,000 days since initial operation, the reliability of different components are as follows: power supply is 99%, fan is 90%, processor is 96%, and the hard drive is 98%. What is the overall system reliability at 1,000 days since operation?
Processor 1
Power Supply
Hard Drive
Processor 2

Reliability Model from Design
§ Variables
§ Availability of each component:
𝐸, 𝐹!, 𝐹,, 𝑃!, 𝑃,, and 𝐷
§ Availability of the whole system: 𝑆
§ Intermediary variables for AND’s and OR’s
§ Root variables correspond to system
components
§ Intermediate variables 𝐴!, … , 𝐴- and 𝑆 are AND gates
§ Intermediate variables 𝑂!, … , 𝑂, are OR gates
Fan 1 ( 𝐹! )
CPU 1 ( 𝑃! )
Processor 1
Power Supply
Hard Drive
Processor 2
𝐴! 𝐴” PSU 𝑂
𝑂$ System (𝑆)
(𝐹”) CPU 2

Reliability Model from Design
§ Marginal for variable 𝑆 provides the system reliability (≈ 95.9%)
§ If we want to increase the system reliability to 96.5% by replacing one component
§ Increase reliability of the hard drive to ≈ 98.6%
§ Increase reliability of the power supply to ≈ 99.6% § Increase reliability of either fan to ≈ 96.2%
§ We found the system functioning abnormally at day 1,000. MAP
provides the most likely explanation
𝐸 𝐹! 𝐹, 𝑃!
avail avail avail avail
§ The next most likely explanations are
avail avail
avail avail
𝑃(. |𝑆 = un_avail)
𝑃(. |𝑆 = un_avail)
21.8% 17.8%
un_avail avail
un_avail avail avail avail

Dealing with Large CPTs
§ One major issue is dealing with large CPTs § If a binary variable has 𝑛 parents
§ We have 2″ independent parameters § This situation causes modelling and
computational problems
§ Modelling problems tend to appear first
§ 1,024 entries may be a small number for
storage/inference
§ But eliciting 1,024 probabilities to quantify the relationship between, say, headache and ten medical conditions is difficult
Number of parents (𝑛)
Number of parameters (27)
1,073,741,824

Micro Models: Noisy-or
§ The first approach is to develop a micro model
§ It specifies the relationship between the parents and
their common child
§ But with a number of parameters that is smaller then 2″
§ A well-known micro model is the noisy-or
§ In a causal interpretation, each cause 𝐶5 is capable of
establishing the effect 𝐸 on its own
§ Except under some unusual circumstances summarised
by the suppressor variable 𝑄5
§ Moreover, the leak variable 𝐿 is meant to represent all other causes of 𝐸
𝐶! 𝐶$ … 𝐶% 𝐸

Micro Models: Noisy-or
§ The noisy-or model can be specified with 𝑛 + 1 parameters
§ 𝜃.% = 𝑃(𝑄’ = active): probability that the suppressor of cause 𝐶’ is active
§ 𝜃/ = 𝑃(𝐿 = active): the probability that the leak variable is active
§ Let 𝐼9 be the indices of causes that are active in 𝛼
§ For instance, if 𝛼: 𝐶! = active, 𝐶, = active, 𝐶0 = passive, 𝐶- =
passive, 𝐶1 = active
§ Then 𝐼2 = {1,2,5}
𝑃(𝐸=passive|𝛼)= 1−𝜃/ {𝜃.% ‘∈4_2
𝐶! 𝐶$ … 𝐶% 𝐶

Micro Models: Noisy-or
§ Revisiting the medical diagnosis problem
§ Sore throat (𝑆) has three different causes Cold (𝐶), flu (𝐹) and
tonsillitis (𝑇)
§ Suppressor probability for 𝐶 is .15, 𝐹 is .01, and 𝑇 is .05 § Leak probability is .02
Sore Throat?
Tonsillitis?
1 − (1 − .02)(.15)(.01)(.05)
1 − (1 − .02)(.15)(.01)
1 − (1 − .02)(.15)(.05)

Other Representations
§ Many times we have some local structure, but it does not fit any existing micro model such as noisy-or
§ For instance, this CPT has a considerable amount of structure
§ But it does not correspond to the assumptions of a noisy-or model
§ For irregular structure, there are several nontabular representation
§ Not necessarily exponential in the number of parents

Decision Trees and Graphs
§ Decision tree is a popular representation
§ We start at the root and branch downward 0
depending on the value of the variable
§ It can have a size that is linear in the number of parents if there is enough structure
§ But it may also be exponential if such structure lacks in the CPT
𝑃(𝐸 = 1) = .0
𝑃(𝐸 = 1) = .9
𝑃(𝐸 = 1) = .3
𝑃(𝐸=1)=.6 𝑃(𝐸=1)=.8

If-then Rules
§ A CPT for variable 𝐸 can be represented using a set of if-then rules
If𝐶!=0∧𝐶,=1
If𝐶! =0∧𝐶, =0∧𝐶0 =1
If𝐶! =0∧𝐶, =0∧𝐶0 =0∧𝐶- =1 If𝐶! =0∧𝐶, =0∧𝐶0 =0∧𝐶- =0
then𝑃𝐸=1 =0 then𝑃𝐸=1 =.9 then𝑃 𝐸=1 =.3 then𝑃 𝐸=1 =.6 then𝑃 𝐸=

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com