COMP 424 – Artificial Intelligence Causal Graphical Models
Instructor: Jackie CK Cheung
Readings: Shalizi, 2019, Ch. 19, 20 – 20.3.1
https://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/
Today’s overview
• Review of Bayesian Network
• Causation vs. Correlation
• Causal Graphical Models
• do-operator
• Deconfounding with fully observed data
• Back-door Criterion
Bayesian networks
• Bayesian networks represent conditional independence relationships in a systematic way using a graphical model.
• Specify conditional independencies using graph structure.
• Graphical model = graph structure + parameters.
Bayesian networks – Basics
• Nodes are random variables
• Edges specify dependency between random variables
• E.g., B: bronchitis, C: cough, F: fever (binary random variables)
• Edges specify that B directly influences probability of C, F.
• This results in conditional probability distributions: P(C | B), P(F | B), and the prior P(B).
Semantics of network structure
• In Bayesian networks, joint probability distribution is the product of these conditional probability distributions
P(x_1, x_2, …, x_n) = ∏_i P(x_i | parents(X_i))
P(B, C, F) = P(C | B) P(F | B) P(B)
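• As a minimal sketch (the CPD values below are made up for illustration), this factorization can be written out directly:

```python
# Toy Bayes net B -> C, B -> F with invented CPD values:
# the joint is the product P(C | B) * P(F | B) * P(B).
P_B = {True: 0.10, False: 0.90}           # P(B): prior on bronchitis
P_C_given_B = {True: 0.80, False: 0.05}   # P(C = True | B)
P_F_given_B = {True: 0.30, False: 0.02}   # P(F = True | B)

def joint(b, c, f):
    """P(B=b, C=c, F=f) = P(C=c | B=b) * P(F=f | B=b) * P(B=b)."""
    p_c = P_C_given_B[b] if c else 1 - P_C_given_B[b]
    p_f = P_F_given_B[b] if f else 1 - P_F_given_B[b]
    return p_c * p_f * P_B[b]

# Sanity check: the joint sums to 1 over all 8 assignments.
total = sum(joint(b, c, f) for b in (True, False)
            for c in (True, False) for f in (True, False))
assert abs(total - 1.0) < 1e-9
```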
Queries in Bayes Nets
• Reminder: queries in Bayes Nets don’t have to correspond to the direction of the arrows!
• Remember the alarm domain? Variables: Tampering, Fire, Alarm, Smoke, Leaving, Report
• We can ask about P(Alarm|Fire) or P(Fire|Alarm) just as easily – same procedure to do calculations!
Causal Graphical Models
• This model happens to be causal – the direction of the arrows matches how we expect influences to happen in the world, e.g.,
• A fire could create smoke
• Smoke is unlikely to create fire
• Contrast with correlation
• Smoke and fire are correlated
• Topic of today’s discussion: examine these issues of causality!
Causation
• For our purposes: causation is when one event (the cause) contributes to another event (the effect)
• A difficult philosophical subject to define formally and precisely!
• If you turn on a light, did you cause the light to go on, or did the physical mechanism of the switch?
https://xkcd.com/552/
Counterfactual as a Test
• A diagnostic test to determine causation is to check if the effect would still happen if the proposed cause were false.
• Counterfactual reasoning – reasoning about what is not the case
• e.g., if there were no fire, then there would be no smoke,
so fire causes smoke
• Does wearing sunglasses cause a sunny day?
• If I were not wearing sunglasses (or I take them off), it would still be a sunny day, so wearing sunglasses does NOT cause a sunny day.
Spurious Correlations
• With enough data, you can find some wild correlations!
https://www.tylervigen.com/spurious-correlations
[Warning: subject matters involving death and suicide are mentioned]
Civil Engineers Love Cheese
Causation or Correlation?
• The 1918 flu pandemic was one of the deadliest pandemics in human history.
• Scientists noticed that a certain bacterium appears in flu patients, today called Haemophilus influenzae, which they thought caused the flu.
Source: Wikipedia
Opportunistic Pathogen
• In 1933, it was discovered that flu is caused by a virus, not by a bacterium like H. influenzae.
• Instead, H. influenzae is an opportunistic pathogen:
• Normally lives in healthy adults without causing symptoms
• Takes advantage of a person’s weakened immune system (e.g., because of the flu) to multiply and cause problems
• So, flu -> weakened immune system -> active H. influenzae -> state of health
• Understanding causal structure here could directly impact how we discover effective treatments and cures!
Causal Graphical Model
• Formalism for representing and reasoning about causal effects with the following assumptions:
1. Directed acyclic graph in which edges point between random variables from cause to effect
2. Causal Markov condition: conditional on its direct causes, a node is independent of every variable that is not one of its effects (i.e., not a descendant of that node)
3. Faithfulness: the joint distribution has all and only those independencies implied by the causal Markov condition (2.)
• 1. and 2. are similar to what we have seen for Bayesian networks in general, except in a causal setting
• 3. needed for technical reasons.
Kinds of Conditioning
Probabilistic conditioning
• What we’ve been doing so far
• With standard Bayes nets, we can ask queries which are observational in nature.
e.g., P(Sunny|Sunglasses)
• Probability that it is sunny given that sunglasses are being worn.
• Would expect that if Sunglasses = T, then probability of sunny weather would increase.
• But we now want to ask causal questions about interventions or treatments!
Causal Conditioning – do Operator
P(X_e | do(X_c = x_c))
• Asking about what would happen to X_e if we set variable X_c to value x_c
• Not the same thing as probabilistic conditioning! e.g. P(Sunny|do(Sunglasses = T))
• Probability that it is sunny if we make somebody wear sunglasses.
• Should be equal to the prior probability of P(Sunny)
• Should not equal P(Sunny|Sunglasses = T)
Example: Brushing and Heart Disease
Shalizi, 2019
do(Brushing)
By setting value of toothbrushing, we have altered graph structure
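A rough sketch of this “graph surgery”: intervening with do(X) deletes every edge into X, since X is now set by us rather than by its usual causes. The edge list below is an assumed, simplified version of the brushing example (HC = health consciousness), not necessarily the exact graph from Shalizi.

```python
# Hypothetical, simplified edge list for the brushing example.
edges = {
    ("HC", "Brushing"),
    ("HC", "Exercise"),
    ("Exercise", "HeartDisease"),
    ("Brushing", "HeartDisease"),
}

def do(graph_edges, node):
    """Mutilated graph for do(node): drop every edge pointing into `node`."""
    return {(u, v) for (u, v) in graph_edges if v != node}

# After do(Brushing), the edge HC -> Brushing is gone; the rest is untouched.
print(do(edges, "Brushing"))
```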
Probabilistic vs. Causal Conditioning
Probabilistic conditioning:
• Observational, factual
• Selecting the subpopulation in which the conditioned-on variables take the given values, and statistically estimating the query variables within it
Causal conditioning:
• Interventional, counterfactual
• Creating a new subpopulation to test the effect on the query variables if we change the conditioned variables
Causal Inference Problems
1. Given a causal graphical model, what are the effects that variables have on each other?
2. What is the causal structure of a problem, given relevant data?
We’ll focus on 1. in this lecture.
Identifying Causal Effects
• Goal is to estimate P(X_e | do(X_c = x_c)), given the causal graph structure
1. Run an experiment!
2. Extract the causal effect from observational data
Running an Experiment
• Literally implement P(X_e | do(X_c = x_c))
• What scientists spend their time doing
• Must be careful to avoid confounds
• e.g., assign subjects to brush their teeth or not, but you do it based on gender
• Now, your experiment gives you P(HeartDisease | do(Brushing), Gender), which is not what you want
• But it is not easy (maybe impossible) to identify all the possible confounds in your experiment
Randomized Control Trials
• Goal: Estimate P(X_e | do(X_c = x_c)) using experiments
• Problem: Confounds may cause us to estimate the wrong quantity P(X_e | do(X_c = x_c), Confound)
• Solution: Randomize over all other variables besides X_c, so that you can measure the difference between P(X_e | do(X_c = x)) and P(X_e | do(X_c = x′))
• Typical example: randomly assign experimental subjects into the different conditions
• No need to write down or identify all the confounds!
• This is why RCTs are the gold standard in medical research.
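A toy simulation of why randomization works (all probabilities are invented): here Gender confounds Brushing and HeartDisease, and Brushing has no causal effect at all. The observational comparison makes brushers look healthier anyway; the randomized version recovers the true null effect.

```python
import random
random.seed(0)
N = 200_000

def sample_observational():
    gender = random.random() < 0.5
    brush = random.random() < (0.8 if gender else 0.4)   # gender influences brushing
    hd = random.random() < (0.05 if gender else 0.15)    # gender influences heart disease
    return brush, hd                                      # brushing itself does nothing

def sample_rct():
    gender = random.random() < 0.5
    brush = random.random() < 0.5                         # randomized assignment
    hd = random.random() < (0.05 if gender else 0.15)     # same disease mechanism
    return brush, hd

def hd_rate(samples, brush_value):
    group = [hd for brush, hd in samples if brush == brush_value]
    return sum(group) / len(group)

obs = [sample_observational() for _ in range(N)]
rct = [sample_rct() for _ in range(N)]
print(hd_rate(obs, True), hd_rate(obs, False))   # ~0.083 vs ~0.125: spurious gap
print(hd_rate(rct, True), hd_rate(rct, False))   # both ~0.10: no causal effect
```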
Extracting Causal Effects from Observational Data
• Experiments are expensive! Can we use naturally occurring (i.e., observational) data to estimate causal effects and not just correlations?
• Sometimes, we can estimate P(X_e | do(X_c = x_c)) from observational data, but not always!
• First, let’s consider the identifiability of conditional distributions with respect to observations
• Does changing the conditional distribution change the observations?
Identifiability of Probabilistic Conditioning
• Consider two random variables 𝑋, and 𝑌 in a Bayes net where 𝑋 is a parent of 𝑌.
• If we change the conditional probability distribution, 𝑃(𝑌|𝑋), then the observations that we get of 𝑋 and 𝑌 would also change, by definition, since probabilistic conditioning is observational.
• So, we can estimate 𝑃(𝑌|𝑋) from samples of 𝑋 and 𝑌. Probabilistic conditioning is thus identifiable.
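A minimal sketch of that estimation (toy data; Y is represented as a boolean): probabilistic conditioning is just a frequency count over the subpopulation where X = x.

```python
def estimate_conditional(samples, x):
    """Estimate P(Y = True | X = x) from a list of (x, y) observations."""
    matching = [y for xi, y in samples if xi == x]
    return sum(matching) / len(matching)

toy_data = [(True, True), (True, False), (True, True), (False, False)]
print(estimate_conditional(toy_data, True))   # 2/3
```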
Identifiability of Causal Conditioning
• Consider two random variables 𝑋, and 𝑌 in a causal Bayes net where 𝑋 is a parent of 𝑌.
• If we change the interventional distribution, P(Y | do(X)), the observations that we get of X and Y might not change!
Smoke and Fire
• Suppose that fire causes smoke with a certain probability. But suppose that there is a malicious prankster in the background, who has access to a smoke generation machine, and sometimes sets fires.
Smoke and Fire
• For an observational dataset of fire and smoke:
• We can always estimate 𝑃(𝑆|𝐹), because that is by definition
observational
• We cannot identify 𝑃(𝑆|𝑑𝑜(𝐹)), because we don’t know if the smoke is generated by the fire, or because the prankster used their smoke machine!
• Need to also know the prankster’s behaviours and dynamics.
Example: Non-identifiability of Causal Conditioning
Suppose we observe in a dataset that whenever fire appears, smoke appears half the time
So, P(S | F) = 0.5
Maybe fires cause smoke only 10% of the time, and the remaining 40% of the smoke that co-occurs with fire in the dataset is caused by the prankster running a smoke machine or lighting a fire
P(S | F) = 0.5 still, but P(S | do(F)) = 0.1
Or maybe the prankster took the day off, and all the smoke is actually caused by fire.
P(S | F) = P(S | do(F)) = 0.5
No way to tell between these scenarios by observing only fire and smoke!
Fully Observed Data
• Identification strategy with fully observed data:
P(Y | do(X = x)) = Σ_t P(Y | X = x, T = t) P(T = t)
where T represents the parents of X
• Intuitively, the parents of X matter because we replaced the “naturally occurring” P(X | T) from observations and set P(X = x) = 1
• So, to recover the causal effect, we must adjust the probabilities estimated from observations by weighting each term by P(T = t) and summing over t.
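A small sketch of this adjustment formula; the conditional distributions are passed in as dictionaries (they would normally be estimated from the observational data), and the numbers below are hypothetical.

```python
def adjust(p_y_given_x_t, p_t, x):
    """P(Y | do(X = x)) = sum_t P(Y | X = x, T = t) * P(T = t).
    p_y_given_x_t[(x, t)] = P(Y = True | X = x, T = t);  p_t[t] = P(T = t)."""
    return sum(p_y_given_x_t[(x, t)] * p_t[t] for t in p_t)

# Hypothetical numbers with a single binary parent T of X:
p_t = {True: 0.3, False: 0.7}
p_y_given_x_t = {(True, True): 0.9, (True, False): 0.4,
                 (False, True): 0.5, (False, False): 0.1}
print(adjust(p_y_given_x_t, p_t, True))   # 0.9*0.3 + 0.4*0.7 = 0.55
```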
Prankster Example
Observational Pr(S|F)?
= 2/3 ≈ 0.667
Causal Pr(S | do(F))?
= Σ_p Pr(S | F, P = p) Pr(P = p)
= Pr(S | F, P) Pr(P) + Pr(S | F, ¬P) Pr(¬P)
= 1 × 0.2 + 0.5 × 0.8 = 0.6
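The same arithmetic, checked in code (the CPD values are the ones used in this example; P stands for the prankster being active):

```python
p_prankster = {True: 0.2, False: 0.8}          # Pr(P = p)
p_smoke_given_fire = {True: 1.0, False: 0.5}   # Pr(S = True | F = True, P = p)

p_s_do_f = sum(p_smoke_given_fire[p] * p_prankster[p] for p in (True, False))
print(p_s_do_f)   # ~0.6, matching the hand calculation above
```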
Toothbrushing Example
P(HD | do(Br)) = Σ_t P(HD | Br, HC = t) P(HC = t), where HC is health consciousness
The Case of Partial Observations
• If you have only partial observations, then it is not always possible to identify causal effects from observational data.
• The back-door criterion is one of the cases when this is possible.
• A back-door path between X and Y is a path connecting X and Y that starts with an arrow pointing into X; it could potentially create confounding by providing an alternate route for information to flow.
• Our adjustment strategy needs to account for all such paths!
Back-door Paths for P(HD|do(Br))
Shalizi, 2019
Back-Door Criterion
• Let 𝑆 be a set of variables which satisfy the back-door criterion:
1. S blocks every back-door path between X and Y
2. No node in S is a descendant of X
P(Y | do(X = x)) = Σ_s P(Y | X = x, S = s) P(S = s)
1. is needed to block alternate flows of information
2. is needed to prevent leakage of information through V-structures, e.g., in the case where a node in S is a descendant of both X and Y.
Exercise: Back-door Criterion
Suppose that we can’t accurately measure people’s health consciousness.
How can we still estimate P(HD | do(Br))?
Ans: Measure frequency of exercise (FE) and amount of fat/red meat in diet (D).
P(HD | do(Br)) = Σ_{s,t} P(HD | Br, FE = s, D = t) P(FE = s, D = t)
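A sketch of this two-variable back-door adjustment; every number below is a placeholder that would normally be estimated from the observational data.

```python
# P(FE = s, D = t): joint distribution of the adjustment set (placeholder values).
p_fe_d = {("low", "low"): 0.25, ("low", "high"): 0.25,
          ("high", "low"): 0.25, ("high", "high"): 0.25}

# P(HD = True | Br = True, FE = s, D = t): placeholder conditional estimates.
p_hd_given_br_fe_d = {("low", "low"): 0.20, ("low", "high"): 0.30,
                      ("high", "low"): 0.05, ("high", "high"): 0.10}

p_hd_do_br = sum(p_hd_given_br_fe_d[k] * p_fe_d[k] for k in p_fe_d)
print(p_hd_do_br)   # estimated P(HD | do(Br = True)) under these made-up numbers
```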
Summary
• Correlation does not equal causation!
• Understanding causation is critical to science and technology.
• Causal graphical models are a formal representational tool for representing causal structure
• Causal conditioning and the do operator
• You can estimate causal effects by running an experiment.
• Sometimes, you can even do so with observational data, but you need to make adjustments to account for potential confounding variables.
• Back-door criterion is one way to do this.