
CSE 3521: Probability Refresher
[Many slides are adapted from previous CSE 5521 course at OSU.]


What is Probability?
“The probability the coin will land heads is 0.5”
Q: what does this mean?
Interpretations:
Frequentist (Repeated trials)
If we flip the coin many times…
On average, we will see heads half of the time
Bayesian (Degree of belief)
We believe there is an equal chance of heads/tails
Advantage: applies to events that do not have long-term frequencies

Q: What is the probability the polar ice caps will melt by 2050?

Probability Basics (a less strict version)
Begin with a set Ω (i.e., the sample space)
e.g., the 6 possible rolls of a die
ω ∈ Ω is an outcome, a sample point (e.g., ω = 1), or atomic event
An event A is a subset of Ω
e.g., the atomic event {6}, i.e., the sample 6
A probability space or probability model is a sample space with a probability function P defined for every event A ⊆ Ω, s.t.
0 ≤ P(A) ≤ 1
P(Ω) = 1
P(A ∪ B) = P(A) + P(B) for disjoint events A and B
In the case where every sample can be an “atomic” event:
P(A) = Σ_{ω ∈ A} P(ω)
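A minimal sketch (mine, not from the slides) of these definitions in Python; the names `omega`, `P_atomic`, and `P` are illustrative only:

```python
# A probability space for one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}               # sample space
P_atomic = {w: 1 / 6 for w in omega}     # probability of each atomic event

def P(event):
    """P(A) = sum of P(w) over the atomic events w in A (A is a subset of omega)."""
    return sum(P_atomic[w] for w in event)

print(P({6}))        # atomic event {6} -> 0.1666...
print(P({2, 4, 6}))  # the event "roll is even" -> 0.5
print(P(omega))      # P(omega) = 1 (up to floating-point rounding)
```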

Random Variables
A random variable X is a function from Ω to some range (e.g., the reals or Booleans)
e.g., X(1) = True; X(3) = True; X(5) = True; X(2) = False; X(4) = False; X(6) = False
e.g., X(“head”) = 0; X(“tail”) = 1
e.g., X(“Temperature is 3.5 degrees”) = 3.5
P induces a probability measure for any random variable X:
P(X = x) = Σ_{ω: X(ω) = x} P(ω); X = x is an assignment
e.g., P(X = True) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2
X = x can be viewed as an event
X = x_i and X = x_j with x_i ≠ x_j can be viewed as disjoint events
Denote random variables with capital letters
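As an illustration of the induced measure (my sketch, with hypothetical names), the odd-die random variable above can be written as a function on Ω, with P(X = x) obtained by summing atomic events:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P_atomic = {w: Fraction(1, 6) for w in omega}

def X(w):
    """Random variable: maps each outcome to a Boolean (True if odd)."""
    return w % 2 == 1

def P_X(x):
    """P(X = x) = sum of P(w) over all w with X(w) = x."""
    return sum(P_atomic[w] for w in omega if X(w) == x)

print(P_X(True))   # P(1) + P(3) + P(5) = 1/2
print(P_X(False))  # P(2) + P(4) + P(6) = 1/2
```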

Proposition
Think of a proposition as the event (set of sample points) where the proposition is true
Given Boolean random variables A and B
event a = set of sample points where A(ω) = true
event ¬a = set of sample points where A(ω) = false
event a ∧ b = points where A(ω) = true and B(ω) = true
event a ∨ b = points where A(ω) = true or B(ω) = true
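A short sketch (my own) of propositions as sets of sample points for the die example, where ∧ and ∨ become set intersection and union:

```python
omega = {1, 2, 3, 4, 5, 6}

# Events for Boolean random variables A ("roll is odd") and B ("roll > 3").
a = {w for w in omega if w % 2 == 1}   # A(w) = true
b = {w for w in omega if w > 3}        # B(w) = true

not_a = omega - a   # ¬a: sample points where A(w) = false
a_and_b = a & b     # a ∧ b: {5}
a_or_b = a | b      # a ∨ b: {1, 3, 4, 5, 6}
print(a_and_b, a_or_b)
```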

Often in AI applications, the “multi-dimensional” sample points/data instances are defined by the values of a set of random variables (of different sample spaces),
i.e., the sample space is the Cartesian product of the ranges of the variables
e.g., coin = (weight, height)

The definitions imply that certain logically related events must have related probabilities
e.g., P(a ∨ b) = P(a) + P(b) − P(a ∧ b)

Both events are in the same “sample” space.
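Continuing the die sketch (mine, for illustration), the inclusion–exclusion identity can be checked numerically with exact fractions:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
P = lambda event: Fraction(len(event), 6)   # uniform die: P(A) = |A| / 6

a = {1, 3, 5}   # "roll is odd"
b = {4, 5, 6}   # "roll > 3"

lhs = P(a | b)                 # P(a ∨ b)
rhs = P(a) + P(b) - P(a & b)   # inclusion-exclusion
assert lhs == rhs == Fraction(5, 6)
print(lhs)
```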

Why should we use probability?
Propositional or Boolean random variables
e.g., Cavity (do I have a cavity?)
Cavity = true is a proposition, also written cavity
Discrete random variables (finite or infinite)
e.g., Weather is one of sunny, rain, cloudy, snow
Weather = rain is a proposition
Values must be exhaustive and mutually exclusive
Continuous random variables (bounded or unbounded)
e.g., Temp = 21.6; also allow, e.g., Temp < 22.0
Arbitrary Boolean combinations of basic propositions

Prior probability (before seeing evidence/data)
Prior or unconditional probabilities of propositions
e.g., P(Cavity = true) = 0.2 and P(Weather = sunny) = 0.72
correspond to belief prior to the arrival of any (new) evidence
A probability distribution gives values for all possible assignments:
P(Weather = sunny/rain/cloudy/snow) = 0.72/0.1/0.08/0.1 (normalized, i.e., sums to 1)
A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables
i.e., every sample point/data instance
P(Weather, Cavity) = a 4 × 2 matrix of values that sum to 1:

                 sunny   rain   cloudy   snow
Cavity = true    0.144   0.02   0.016    0.02
Cavity = false   0.576   0.08   0.064    0.08

Probability for Continuous Variables
Express the distribution as a parameterized function of value:
P(X = x) = U[18, 26](x) = uniform density between 18 and 26
Here P is a density (i.e., a probability density function, PDF); it integrates to 1
P(X = 20.5) = 0.125 really means lim_{dx→0} P(20.5 ≤ X ≤ 20.5 + dx) / dx = 0.125
The density can be larger than 1

Gaussian Density
P(x) = 1/√(2πσ²) · e^{−(x−μ)²/(2σ²)}

Conditional Probability given evidence
Conditional or posterior probabilities
e.g., P(Cavity = true | Toothache = true) = 0.8
i.e., given that toothache is all I know, the probability of cavity is 0.8
Notation for conditional probabilities: P(a | b) = P(a, b) / P(b), defined when P(b) > 0
Denote Cavity = true with cavity, Cavity = false with ¬cavity, and so on!
P(cavity | toothache) + P(¬cavity | toothache) = 1
P(cavity | ¬toothache) + P(¬cavity | ¬toothache) = 1

The “conditional” table P(Cavity | Toothache):

                 Toothache = true   Toothache = false
Cavity = true          0.8                 0.2
Cavity = false         0.2                 0.8

Conditional Probability given evidence
If we know more, e.g., not-brush-teeth, the probability may change:
P(cavity | toothache, not-brush-teeth) = 0.95
P(cavity | toothache, visit-a-dentist-last-week) = 0.2
New evidence may be irrelevant, allowing simplification:
e.g., P(cavity | toothache, 49℃) = P(cavity | toothache) = 0.8
This kind of inference, sanctioned by domain knowledge, is crucial

Inference by Enumeration
Start with the joint distribution of (1) cavity or not, (2) toothache or not, (3) catch or not:

                toothache           ¬toothache
             catch    ¬catch     catch    ¬catch
cavity       0.108    0.012      0.072    0.008
¬cavity      0.016    0.064      0.144    0.576

For any proposition φ, sum the atomic events where it is true
P(toothache) = P(toothache, cavity, catch) + P(toothache, cavity, ¬catch) + P(toothache, ¬cavity, catch) + P(toothache, ¬cavity, ¬catch)
= 0.108 + 0.012 + 0.016 + 0.064 = 0.2
We call it a “marginal distribution”
P(cavity ∨ toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28
P(¬cavity | toothache) = P(¬cavity, toothache) / P(toothache) = (0.016 + 0.064) / 0.2 = 0.4
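Below is a compact sketch of inference by enumeration over this joint table; the Python names (`joint`, `prob`) are mine, while the numbers come from the table above:

```python
# Joint distribution P(Cavity, Toothache, Catch) from the table above,
# keyed by (cavity, toothache, catch) truth values.
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Sum the atomic events where the proposition `pred` holds."""
    return sum(p for (cav, tooth, catch), p in joint.items()
               if pred(cav, tooth, catch))

print(prob(lambda c, t, k: t))        # P(toothache) = 0.2
print(prob(lambda c, t, k: c or t))   # P(cavity ∨ toothache) = 0.28
# P(¬cavity | toothache) = P(¬cavity, toothache) / P(toothache) = 0.4
print(prob(lambda c, t, k: (not c) and t) / prob(lambda c, t, k: t))
```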
Normalization
Start with the joint distribution above
The denominator can be viewed as a normalization constant α:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [(0.108, 0.016) + (0.012, 0.064)] = α (0.12, 0.08) = (0.6, 0.4), with α = 1/0.2 = 5
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables

More examples from the same joint table:
P(cavity, ¬catch) = P(cavity, ¬catch, toothache) + P(cavity, ¬catch, ¬toothache) = 0.012 + 0.008 = 0.02
P(toothache | cavity, ¬catch) = P(toothache, cavity, ¬catch) / P(cavity, ¬catch) = 0.012 / 0.02 = 0.6
P(cavity, ¬catch | toothache) = P(cavity, ¬catch, toothache) / P(toothache) = 0.012 / 0.2 = 0.06

The product rule
P(Cavity, Toothache) = P(Cavity) P(Toothache | Cavity)
P(cavity, toothache) = P(cavity) P(toothache | cavity)
P(cavity, ¬toothache) = P(cavity) P(¬toothache | cavity)
And so on for all assignments

The chain rule
P(Cavity, Toothache, Catch) = P(Cavity) P(Toothache, Catch | Cavity)
= P(Cavity) P(Toothache | Cavity) P(Catch | Toothache, Cavity)
= P(Toothache) P(Cavity, Catch | Toothache)
= P(Toothache) P(Catch | Toothache) P(Cavity | Catch, Toothache)

Independence (not always)
A and B are independent if and only if
P(A | B) = P(A), or P(B | A) = P(B), or P(A, B) = P(A) P(B)
Need to check all the assignments: e.g., P(A = a, B = b) = P(A = a) P(B = b), and so on
e.g., P(Toothache, Catch, Cavity, Weather) = P(Toothache, Catch, Cavity) P(Weather)
Absolute independence is powerful but rare
Dentistry is a large field with hundreds of variables, none of which are independent. What to do?
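As an illustrative check of absolute independence (my sketch, reusing `joint` and `prob` from the enumeration sketch above), Toothache and Cavity fail the test P(A, B) = P(A)P(B):

```python
# Reuses `joint` and `prob` from the inference-by-enumeration sketch.
p_toothache = prob(lambda c, t, k: t)        # P(toothache) = 0.2
p_cavity    = prob(lambda c, t, k: c)        # P(cavity) = 0.2
p_joint     = prob(lambda c, t, k: c and t)  # P(cavity, toothache) = 0.12
print(p_joint, p_toothache * p_cavity)       # 0.12 != 0.04 -> not independent
```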
[Diagram: the joint over Cavity, Toothache, Catch, and Weather decomposes into a joint over Cavity, Toothache, Catch and a separate distribution over Weather]

Conditional Independence (not always)
A and B are conditionally independent given C if and only if
P(A, B | C = c) = P(A | C = c) P(B | C = c), or
P(A | B, C = c) = P(A | C = c), or
P(B | A, C = c) = P(B | C = c)
Need to check all the assignments: e.g., A = a, B = b, C = c
If I have a cavity, the probability that the probe catches in it doesn’t depend on whether I have a toothache:
P(catch | toothache, cavity) = P(catch | ¬toothache, cavity) = P(catch | cavity)
The same independence holds if I haven’t got a cavity:
P(catch | toothache, ¬cavity) = P(catch | ¬toothache, ¬cavity) = P(catch | ¬cavity)

Conditional Independence
Catch is conditionally independent of Toothache given Cavity (or ¬cavity):
P(Catch | Toothache, Cavity = true) = P(Catch | Cavity = true)
P(Catch | Toothache, Cavity = false) = P(Catch | Cavity = false)
Equivalent statements:
P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
P(Toothache, Catch, Cavity) needs 2³ − 1 = 7 independent numbers to store
If Toothache and Catch are conditionally independent given Cavity:
P(Toothache, Catch, Cavity) = P(Toothache, Catch | Cavity) P(Cavity)
= P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)

Conditional Independence
Write out the full joint distribution using the chain rule:
P(toothache, catch, cavity)
= P(toothache | catch, cavity) P(catch, cavity)
= P(toothache | catch, cavity) P(catch | cavity) P(cavity)
= P(toothache | cavity) P(catch | cavity) P(cavity)
In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.

Bayes' Rule
Product rule: P(a, b) = P(a | b) P(b) = P(b | a) P(a)
⇒ Bayes’ rule: P(a | b) = P(b | a) P(a) / P(b)
or in distribution form:
P(Y | X) = P(X | Y) P(Y) / P(X) = α P(X | Y) P(Y); α is the normalization factor (one over the denominator)
Useful for assessing diagnostic probability from causal probability:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
e.g., let M be meningitis, S be stiff neck: P(m | s) = P(s | m) P(m) / P(s)

Bayes' Rule and Conditional Independence
P(cavity | toothache, catch) = α P(toothache, catch | cavity) P(cavity)
= α P(toothache | cavity) P(catch | cavity) P(cavity)
This is an example of a naive Bayes model:
P(cause, effect_1, …, effect_n) = P(cause) Π_i P(effect_i | cause)
The total number of parameters is linear in n
[Diagram: a “cause” node (cavity) with arrows to “effect” nodes effect_1 … effect_n (toothache, catch)]

Summary
Let X_1, …, X_N be N random variables and x_1, …, x_N assignments to them
Joint: P(X_1 = x_1, …, X_N = x_N) = P(x_1, …, x_N)
Marginal: P(x_1) = Σ_{x_2, …, x_N} P(x_1, x_2, …, x_N)
Conditional: P(x_1 | x_2) = P(x_1, x_2) / P(x_2)
Independence and conditional independence
Product and chain rules: P(x_1, …, x_N) = P(x_1) P(x_2 | x_1) ⋯ P(x_N | x_1, …, x_{N−1})
Bayes’ rule: P(x_1 | x_2) = P(x_2 | x_1) P(x_1) / P(x_2)
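Finally, a hedged sketch of the naive Bayes computation above (again reusing `joint` and `prob`; the function name is mine), with the conditionals estimated from the slide’s joint table:

```python
# Reuses `joint` and `prob` from the inference-by-enumeration sketch.
def naive_bayes_posterior(tooth, catch):
    """P(Cavity | Toothache=tooth, Catch=catch) via alpha * P(t|c) P(k|c) P(c)."""
    scores = {}
    for cav in (True, False):
        p_c = prob(lambda c, t, k: c == cav)                       # P(Cavity = cav)
        p_t = prob(lambda c, t, k: c == cav and t == tooth) / p_c  # P(tooth | cav)
        p_k = prob(lambda c, t, k: c == cav and k == catch) / p_c  # P(catch | cav)
        scores[cav] = p_t * p_k * p_c
    alpha = 1 / sum(scores.values())   # normalization factor
    return {cav: alpha * s for cav, s in scores.items()}

print(naive_bayes_posterior(True, True))  # P(cavity | toothache, catch) ≈ 0.871
```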