CSE 3521: Probability Refresher
[Many slides are adapted from previous CSE 5521 course at OSU.]
What is Probability?
“The probability the coin will land heads is 0.5”
Q: what does this mean?
Interpretations:
Frequentist (repeated trials)
If we flip the coin many times…
On average, we will see heads half of the time
Bayesian (degree of belief)
We believe there is an equal chance of heads/tails
Advantage: covers events that do not have long-term frequencies
Q: What is the probability the polar ice caps will melt by 2050?
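The frequentist interpretation can be illustrated with a quick simulation (a minimal Python sketch, not from the slides; the fair coin and the value 0.5 are the slide's example):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Frequentist reading of "P(heads) = 0.5": simulate many flips and
# check that the observed fraction of heads approaches 0.5.
flips = [random.random() < 0.5 for _ in range(100_000)]
frequency = sum(flips) / len(flips)
print(frequency)
```

With more flips, the running frequency converges to 0.5; no such experiment exists for one-off events like the polar ice caps, which is where the degree-of-belief reading is needed.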
Probability Basics (a less strict version)
Begin with a set Ω (i.e., the sample space)
e.g., the 6 possible rolls of a die
ω ∈ Ω is an outcome, a sample point (e.g., ω = 1), or atomic event
An event A is a subset of Ω
e.g., the atomic event {6}, i.e., the sample 6
A probability space or probability model is a sample space together with a probability function P for every event A ⊆ Ω, s.t.
0 ≤ P(A) ≤ 1
P(Ω) = 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
In the case where every sample can be an “atomic” event:
P(A) = Σ_{ω ∈ A} P(ω)
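These definitions can be checked mechanically for the die (a sketch in Python; the fair-die probabilities are an assumption for illustration):

```python
# A fair die as a probability model: a sample space, atomic
# probabilities, and event probabilities obtained by summing.
omega = {1, 2, 3, 4, 5, 6}           # sample space
p = {w: 1 / 6 for w in omega}        # P(w) for each atomic event

def prob(event):
    # P(A) = sum of P(w) over w in A, for A a subset of omega
    return sum(p[w] for w in event)

A, B = {1, 2, 3}, {3, 4}
print(prob(omega))                                   # P(Omega) = 1
print(prob(A | B), prob(A) + prob(B) - prob(A & B))  # inclusion-exclusion
```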
Random Variables
A random variable X is a function from Ω to some range (e.g., the reals or Booleans)
e.g., X(1) = True; X(3) = True; X(5) = True; X(2) = False; X(4) = False; X(6) = False
e.g., X(“head”) = 0; X(“tail”) = 1
e.g., X(“Temperature is 3.5 degrees”) = 3.5
P induces a probability measure for any random variable X:
P(X = x) = Σ_{ω: X(ω) = x} P(ω); X = x is an assignment
e.g., P(X = True) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
X = x can be viewed as an event
X = x and X = x′ with x ≠ x′ can be viewed as disjoint events
Denote random variables with capital letters
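The first example above (X is True exactly on odd rolls) can be written out directly, treating the random variable as a plain function on the sample space (a sketch; exact fractions avoid rounding):

```python
from fractions import Fraction

# A random variable as a function on the sample space: X(w) is True
# iff the die roll w is odd, matching the slide's example.
p = {w: Fraction(1, 6) for w in range(1, 7)}

def X(w):
    return w % 2 == 1

def prob_X(x):
    # P(X = x) = sum of P(w) over outcomes w with X(w) = x
    return sum(p[w] for w in p if X(w) == x)

print(prob_X(True))    # 1/6 + 1/6 + 1/6 = 1/2
```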
Proposition
Think of a proposition as the event (set of sample points) where the proposition is true
Given Boolean random variables A and B
event a = set of sample points where A(ω) = true
event ¬a = set of sample points where A(ω) = false
event a ∧ b = points where A(ω) = true and B(ω) = true
event a ∨ b = points where A(ω) = true or B(ω) = true
Often in AI applications, the “multi-dimensional” sample points/data instances are defined by the values of a set of random variables (of different sample spaces),
i.e., the sample space is the Cartesian product of the ranges of the variables
e.g., coin = (weight, height)
The definitions imply that certain logically related events must have related probabilities
e.g., P (a ∨ b) = P (a) + P (b) − P (a ∧ b)
Both events are in the same “sample” space.
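The set view of propositions makes these identities checkable by hand; a small sketch (the propositions a = “roll is even” and b = “roll is at most 3” are illustrative choices, not from the slides):

```python
# Propositions as events (sets of sample points) on one die roll.
omega = {1, 2, 3, 4, 5, 6}
p = {w: 1 / 6 for w in omega}
prob = lambda e: sum(p[w] for w in e)

a = {w for w in omega if w % 2 == 0}   # a = "roll is even"
b = {w for w in omega if w <= 3}       # b = "roll is at most 3"

not_a = omega - a          # event ¬a
a_and_b = a & b            # event a ∧ b
a_or_b = a | b             # event a ∨ b

# P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
print(prob(a_or_b), prob(a) + prob(b) - prob(a_and_b))
```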
Why should we use probability?
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Cavity = true is a proposition, also written cavity
Discrete random variables (finite or infinite)
e.g., Weather is one of sunny, rain, cloudy, snow
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded)
e.g., Temp = 21.6; also allow, e.g., Temp < 22.0.
Arbitrary Boolean combinations of basic propositions
Prior probability (before seeing evidence/data)
Prior or unconditional probabilities of propositions
e.g., P (Cavity = true) = 0.2 and P (Weather = sunny) = 0.72
correspond to belief prior to arrival of any (new) evidence
Probability distribution gives values for all possible assignments:
P(Weather = sunny/rain/cloudy/snow) = 0.72/0.1/0.08/0.1 (normalized, i.e., sums to 1)
Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables
i.e., every sample point/data instance
P(Weather, Cavity) = a 4 × 2 matrix of values, summing to 1:

Weather          sunny   rain   cloudy   snow
Cavity = true    0.144   0.02   0.016    0.02
Cavity = false   0.576   0.08   0.064    0.08
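The joint table above can be stored as a dictionary, and marginals read off by summing (a sketch; the eight numbers are exactly those in the table):

```python
# The P(Weather, Cavity) table from the slide, keyed by (weather, cavity).
joint = {
    ("sunny", True): 0.144, ("rain", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rain", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

total = sum(joint.values())   # the eight entries sum to 1

# Marginalizing out Cavity recovers the prior P(Weather = sunny) = 0.72
p_sunny = sum(v for (w, c), v in joint.items() if w == "sunny")
print(total, p_sunny)
```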
Probability for Continuous Variables
Express distribution as a parameterized function of value:
P(X = x) = U[18, 26](x) = uniform density between 18 and 26
Here P is a density (i.e., probability density function: PDF); it integrates to 1
P(X = 20.5) = 0.125 really means lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx) / dx = 0.125
The density can be larger than 1
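A quick numerical check of the uniform density on [18, 26] (a sketch; the midpoint Riemann sum is an illustrative choice):

```python
# Uniform density U[18, 26]: constant height 1/(26-18) = 0.125 inside
# the interval, 0 outside; P(X = 20.5) = 0.125 is a density value,
# not a probability of a point.
def pdf(x):
    return 0.125 if 18.0 <= x <= 26.0 else 0.0

# Midpoint Riemann sum over [17, 27]: the density integrates to ~1.
n = 100_000
dx = 10.0 / n
integral = sum(pdf(17.0 + (i + 0.5) * dx) * dx for i in range(n))
print(pdf(20.5), integral)
```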
Gaussian Density
P(x) = 1/√(2πσ²) · exp(−(x − μ)² / (2σ²))
Conditional Probability given evidence
Conditional or posterior probabilities
e.g., P (Cavity = true | Toothache = true) = 0.8
i.e., given that toothache is all I know, the probability of cavity is 0.8
Notation for conditional probabilities:
Denote Cavity = true with cavity, Cavity = false with ¬ cavity, and so on!
P(cavity | toothache)
P(cavity | toothache) + P(¬ cavity | toothache) = 1
P(cavity | ¬ toothache) + P(¬ cavity | ¬ toothache) = 1
“conditional”     Toothache = true   Toothache = false
Cavity = true     0.8                0.2
Cavity = false    0.2                0.8
Conditional Probability given evidence
If we know more, e.g., not-brush-teeth, the probability may change
P (cavity | toothache, not-brush-teeth) = 0.95
P (cavity | toothache, visit-a-dentist-last-week) = 0.2
New evidence may be irrelevant, allowing simplification
e.g., P (cavity | toothache, 49℃) = P (cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial
Inference by Enumeration
Start with the joint distribution of three Boolean variables: (1) Cavity, (2) Toothache, (3) Catch

             toothache            ¬toothache
           catch   ¬catch       catch   ¬catch
cavity     0.108   0.012        0.072   0.008
¬cavity    0.016   0.064        0.144   0.576

For any proposition φ, sum the atomic events where it is true: P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
P(toothache) = P(toothache, cavity, catch) + P(toothache, cavity, ¬catch) + P(toothache, ¬cavity, catch) + P(toothache, ¬cavity, ¬catch) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
We call this a “marginal distribution”
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(¬cavity | toothache) = P(¬cavity, toothache) / P(toothache) = (0.016 + 0.064) / 0.2 = 0.4
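Enumeration is mechanical enough to code directly (a sketch in Python; the eight probabilities are the table's values):

```python
# Inference by enumeration over the slide's joint distribution,
# keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    # Sum the atomic events where the proposition `holds` is true.
    return sum(p for event, p in joint.items() if holds(*event))

p_toothache = prob(lambda cav, tooth, cat: tooth)
p_cav_or_tooth = prob(lambda cav, tooth, cat: cav or tooth)
p_not_cav_given_tooth = (
    prob(lambda cav, tooth, cat: not cav and tooth) / p_toothache
)
print(p_toothache, p_cav_or_tooth, p_not_cav_given_tooth)
```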
Normalization
Start with joint distribution
Denominator can be viewed as a normalization constant
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [⟨0.108, 0.016⟩ + ⟨0.012, 0.064⟩] = α ⟨0.12, 0.08⟩ = ⟨0.6, 0.4⟩
General idea: compute distribution on query variable
by fixing evidence variables and summing over hidden variables
             toothache            ¬toothache
           catch   ¬catch       catch   ¬catch
cavity     0.108   0.012        0.072   0.008
¬cavity    0.016   0.064        0.144   0.576
P(cavity, ¬catch) = P(cavity, ¬catch, toothache) + P(cavity, ¬catch, ¬toothache) = 0.012 + 0.008 = 0.02
P(toothache | cavity, ¬catch) = P(toothache, cavity, ¬catch) / P(cavity, ¬catch) = 0.012 / 0.02 = 0.6
P(cavity, ¬catch | toothache) = P(cavity, ¬catch, toothache) / P(toothache) = 0.012 / 0.2 = 0.06

             toothache            ¬toothache
           catch   ¬catch       catch   ¬catch
cavity     0.108   0.012        0.072   0.008
¬cavity    0.016   0.064        0.144   0.576
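The general recipe, fix the evidence, sum over the hidden variables, then normalize, can be sketched as follows (the joint table is the one above):

```python
# Normalization: P(Cavity | toothache) = alpha * P(Cavity, toothache),
# with the joint keyed by (cavity, toothache, catch).
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

# Fix the evidence (toothache) and sum over the hidden variable (catch).
unnormalized = {
    cav: sum(p for (c, tooth, _), p in joint.items() if c == cav and tooth)
    for cav in (True, False)
}
alpha = 1.0 / sum(unnormalized.values())   # 1 / P(toothache)
posterior = {cav: alpha * u for cav, u in unnormalized.items()}
print(posterior)
```

Note that alpha is computed from the unnormalized values themselves, so P(toothache) never has to be evaluated separately.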
The product rule
P(Cavity, Toothache)= P(Cavity)P(Toothache|Cavity)
P(cavity, toothache)= P(cavity)P(toothache|cavity)
P(cavity, ¬ toothache)= P(cavity)P(¬ toothache|cavity)
And so on for all assignments
The chain rule
P(Cavity, Toothache, Catch)=
P(Cavity)P(Toothache , Catch |Cavity) = P(Cavity)P(Toothache|Cavity)P(Catch| Toothache, Cavity)
P(Toothache)P(Cavity,Catch|Toothache) = P(Toothache)P(Catch|Toothache)P(Cavity|Catch,Toothache)
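The chain rule can be verified numerically on the dental table by reconstructing each joint entry from its conditional factors (a sketch; the ordering cavity, toothache, catch is one of the two shown above):

```python
from itertools import product

# Check P(cav, tooth, cat) = P(cav) P(tooth | cav) P(cat | tooth, cav)
# for every assignment, using the slide's joint distribution.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    return sum(p for event, p in joint.items() if holds(*event))

max_err = 0.0
for cav, tooth, cat in product((True, False), repeat=3):
    p1 = prob(lambda c, t, k: c == cav)                      # P(cav)
    p2 = prob(lambda c, t, k: (c, t) == (cav, tooth)) / p1   # P(tooth|cav)
    p3 = joint[(cav, tooth, cat)] / (p1 * p2)                # P(cat|tooth,cav)
    max_err = max(max_err, abs(joint[(cav, tooth, cat)] - p1 * p2 * p3))
print(max_err)
```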
Independence (not always)
A and B are independent if and only if
P(A|B) = P(A) or P(B|A) = P(B) or P(A, B) = P(A)P(B)
Need to check all the assignments: e.g., P(A=a, B=b) = P(A=a)P(B=b) and so on
P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
Absolute independence is powerful but rare
Dentistry is a large field with hundreds of variables, none of which are independent.
What to do?
[Diagram: the joint over Cavity, Toothache, Catch decomposes into smaller factors using conditional independence.]
Conditional Independence (not always)
A and B are conditionally independent given C if and only if
P(A, B|C=c) = P(A|C=c) P(B|C=c) or P(A|B, C=c) = P(A|C=c) or P(B|A, C=c) = P(B|C=c)
Need to check all the assignments: e.g., A = a, B = b, C = c
If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache:
P (catch|toothache, cavity) = P (catch| ¬ toothache, cavity) = P (catch|cavity)
The same independence holds if I haven’t got a cavity:
P (catch|toothache, ¬cavity) = P (catch| ¬ toothache, ¬cavity) = P (catch|¬cavity)
             toothache            ¬toothache
           catch   ¬catch       catch   ¬catch
cavity     0.108   0.012        0.072   0.008
¬cavity    0.016   0.064        0.144   0.576
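Both equalities can be verified directly against the joint table (a sketch; note the table was in fact built so that they hold exactly):

```python
# Check conditional independence of Catch and Toothache given Cavity:
# P(catch | toothache, cavity) should equal P(catch | cavity), for
# both values of Cavity.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(holds):
    return sum(p for event, p in joint.items() if holds(*event))

results = {}
for cav in (True, False):
    # P(catch | toothache, Cavity = cav)
    lhs = (prob(lambda c, t, k: k and t and c == cav)
           / prob(lambda c, t, k: t and c == cav))
    # P(catch | Cavity = cav)
    rhs = (prob(lambda c, t, k: k and c == cav)
           / prob(lambda c, t, k: c == cav))
    results[cav] = (lhs, rhs)
print(results)   # both pairs agree: 0.9 given cavity, 0.2 given ¬cavity
```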
Conditional Independence
Catch is conditionally independent of Toothache given cavity (or ¬cavity):
P(Catch|Toothache, Cavity = true) = P(Catch|Cavity = true)
P(Catch|Toothache, Cavity = false) = P(Catch|Cavity = false)
Equivalent statements:
P(Toothache|Catch, Cavity) = P(Toothache|Cavity)
P(Toothache, Catch|Cavity) = P(Toothache|Cavity) P(Catch|Cavity)
P(Toothache, Catch, Cavity) needs to store 2³ − 1 = 7 independent numbers
If Toothache and Catch are conditionally independent given Cavity:
P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity) = P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
i.e., only 2 + 2 + 1 = 5 independent numbers
Conditional Independence
Write out full joint distribution using chain rule:
P(toothache, catch, cavity)
= P(toothache| catch, cavity)P(catch, cavity)
= P(toothache| catch, cavity)P(catch | cavity) P(cavity)
= P(toothache| cavity)P(catch | cavity) P(cavity)
In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
Bayes' Rule
Product rule: P(a, b) = P(a|b) P(b) = P(b|a) P(a)
⇒ Bayes’ rule: P(a|b) = P(b|a) P(a) / P(b)
or in distribution form
P(Y|X) = P(X|Y) P(Y) / P(X) = α P(X|Y) P(Y); α is the normalization factor (1 / P(X))
Useful for assessing diagnostic probability from causal probability:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
e.g., let M be meningitis, S be stiff neck: P(m|s) = P(s|m) P(m) / P(s)
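A worked diagnostic example using the dental numbers (a sketch; the three inputs are all read off the earlier joint table, so the answer should match the direct computation of P(cavity | toothache)):

```python
# Bayes' rule in the diagnostic direction:
# P(cavity | toothache) = P(toothache | cavity) P(cavity) / P(toothache)
p_cavity = 0.2
p_toothache = 0.2
p_toothache_given_cavity = 0.6      # = 0.12 / 0.2 from the joint table

p_cavity_given_toothache = p_toothache_given_cavity * p_cavity / p_toothache
print(p_cavity_given_toothache)     # ≈ 0.6, matching the table
```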
Bayes' Rule and Conditional Independence
P(cavity|toothache, catch)
= α P(toothache, catch| cavity)P( cavity)
= α P(toothache|cavity)P(catch|cavity)P(cavity)
This is an example of a naive Bayes model:
P(cause, effect₁, …, effect_n) = P(cause) Π_i P(effect_i | cause)
Total number of parameters is linear in n
[Diagram: naive Bayes model, Cavity as the cause with Toothache and Catch as effects; in general, a cause node with children effect₁, …, effect_n.]
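The naive Bayes computation for the dental example can be sketched as follows (the prior and per-effect likelihoods are read off the earlier joint table; the final normalization reuses the α trick from the Normalization slide):

```python
# Naive Bayes posterior P(Cavity | toothache, catch).
prior = {True: 0.2, False: 0.8}            # P(Cavity)
p_toothache = {True: 0.6, False: 0.1}      # P(toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}          # P(catch | Cavity)

# P(Cavity | toothache, catch)
#   proportional to P(Cavity) P(toothache|Cavity) P(catch|Cavity)
scores = {c: prior[c] * p_toothache[c] * p_catch[c] for c in (True, False)}
alpha = 1.0 / sum(scores.values())
posterior = {c: alpha * s for c, s in scores.items()}
print(posterior)   # cavity is highly likely given both symptoms
```

Because the table satisfies the conditional independence exactly, this posterior agrees with the one computed by full enumeration.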
Summary
Let X₁, …, X_N be N random variables, and x₁, …, x_N the assignments for them
Joint: P(x₁, …, x_N)
Marginal: P(x₁) = Σ_{x₂, …, x_N} P(x₁, x₂, …, x_N)
Conditional: P(x₁ | x₂) = P(x₁, x₂) / P(x₂)
Independence, conditional independence
Product and chain rules: P(x₁, …, x_N) = P(x₁) P(x₂ | x₁) ⋯ P(x_N | x₁, …, x_{N−1})
Bayes’ rule: P(x₁ | x₂) = P(x₂ | x₁) P(x₁) / P(x₂)