ISE 562, Dr. Smith
Bayes Methods: Decision Theory
ISE 562, Dr. Smith
Differences between discrete and continuous distributions
• For a continuous random variable, probability is not the height of the function but the area under the curve over an interval.
• Discrete: p(x) gives P(X = a) directly.
• Total area under the function =1
9/4/2022 2
• Continuous: f(x) gives P(a ≤ X ≤ b) = ∫_a^b f(x) dx.
• Require f(X) ≥ 0 for all X (but X itself can be < 0).
ISE 562, Dr. Smith
PDF vs. Cumulative Probability
PDFs and CDFs
PDF: P(a ≤ X ≤ b) = ∫_a^b f(X) dX
CDF: P(X ≤ x) = F(x) = ∫_{-∞}^x f(ξ) dξ
f(X) = dF(X)/dX
• P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a)
• P(X ≥ a) = 1 − P(X ≤ a)
• P(X ≤ a) = 1 − P(X ≥ a)
ISE 562, Dr. Smith
Empirical pdfs (from a histogram of f(x))
Example: f(X) = kX, 0 ≤ X ≤ 2. Find k:
∫_0^2 kX dX = k X²/2 |_0^2 = 2k
2k = 1, so k = 1/2
f(X) = (1/2)X, 0 ≤ X ≤ 2
ISE 562, Dr. Smith
pdfs: mean and variance
E[X] = ∫_0^2 X·(1/2)X dX = X³/6 |_0^2 = 4/3
E[X²] = ∫_0^2 X²·(1/2)X dX = X⁴/8 |_0^2 = 2
σ² = E[X²] − (E[X])² = 2 − (4/3)² = 2 − 16/9 = 2/9
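The integrals above can be checked numerically; a minimal sketch in plain Python (no external libraries), using a midpoint rule:

```python
# Numerical check of the pdf f(X) = X/2 on [0, 2]: normalization,
# mean E[X] = 4/3, and variance E[X^2] - E[X]^2 = 2/9.

def f(x):
    return x / 2.0          # pdf with k = 1/2

def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

area = integrate(f, 0, 2)                         # should be 1
mean = integrate(lambda x: x * f(x), 0, 2)        # should be 4/3
ex2  = integrate(lambda x: x * x * f(x), 0, 2)    # should be 2
var  = ex2 - mean ** 2                            # should be 2/9

print(area, mean, var)
```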
ISE 562, Dr. Smith
pdfs: Cumulative Distribution Function (CDF)
F(x) = ∫_0^x (1/2)ξ dξ = ξ²/4 |_0^x = x²/4, 0 ≤ x ≤ 2
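The CDF can be verified the same way; a quick sketch that integrates f(ξ) = ξ/2 up to x and compares with x²/4:

```python
# Check that integrating f(xi) = xi/2 from 0 to x reproduces F(x) = x^2/4.

def F(x, n=10_000):
    """Midpoint-rule approximation of the CDF at x."""
    h = x / n
    return sum((i + 0.5) * h / 2.0 * h for i in range(n))

for x in (0.5, 1.0, 1.5, 2.0):
    assert abs(F(x) - x * x / 4.0) < 1e-6
print("F(x) = x^2/4 confirmed on [0, 2]")
```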
ISE 562, Dr. Smith
pdfs: Expected value of a function, Y = Cost(X) = aX + b
E[Cost(X)] = ∫_0^2 Cost(X)·(1/2)X dX
= ∫_0^2 (aX + b)·(1/2)X dX = (1/2)∫_0^2 (aX² + bX) dX
= [aX³/6 + bX²/4]_0^2 = (4/3)a + b
ISE 562, Dr. Smith
Or use properties of expectation: Y = Cost(X) = aX + b
E[Cost(X)] = E[aX + b] = E[aX] + E[b]
= aE[X] + b
= (4/3)a + b
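Both routes (direct integration and linearity of expectation) can be checked numerically; a short sketch, with a and b chosen arbitrarily for illustration:

```python
# E[Cost(X)] for Cost(X) = aX + b under f(X) = X/2 on [0, 2]:
# direct integration and the linearity shortcut aE[X] + b must agree.

def e_cost_integral(a, b, n=100_000):
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (a * x + b) * (x / 2.0) * h   # Cost(x) * f(x) * dx
    return total

a, b = 3.0, 5.0                                 # arbitrary illustration values
direct = e_cost_integral(a, b)
shortcut = a * (4.0 / 3.0) + b                  # aE[X] + b with E[X] = 4/3
print(direct, shortcut)
```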
ISE 562, Dr. Smith
pdfs with more than one random variable
f(X, Y) is called the joint probability density of X and Y.
Marginal pdfs can be obtained by integrating one variable out of the function:
f(X) = ∫ f(X, Y) dY
f(Y) = ∫ f(X, Y) dX
ISE 562, Dr. Smith
Example: suppose we have a reservoir fed by 2 rivers. Let X = stream flow in river X and Y = flow in river Y. The joint pdf of flows is:
f(X, Y) = c(4000 − X)/4000, 0 ≤ X ≤ 4000, 0 ≤ Y ≤ 2000
with c = 2.5 × 10^(−7)
ISE 562, Dr. Smith
f(X) = ∫_0^2000 c(4000 − X)/4000 dY = (c/2)(4000 − X)
f(Y) = ∫_0^4000 c(4000 − X)/4000 dX = 2000c
E[X + Y] = ∫_0^2000 ∫_0^4000 (X + Y)·c(4000 − X)/4000 dX dY
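A numerical double integral confirms that c = 2.5 × 10^(−7) normalizes the joint pdf; a minimal sketch:

```python
# Reservoir example: f(X, Y) = c(4000 - X)/4000 on 0<=X<=4000, 0<=Y<=2000.
# Check that c = 2.5e-7 makes the joint pdf integrate to 1.

c = 2.5e-7

def joint(x, y):
    return c * (4000.0 - x) / 4000.0

def double_integral(g, n=400):
    """Midpoint-rule double integral over the rectangle of support."""
    hx, hy = 4000.0 / n, 2000.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        for j in range(n):
            y = (j + 0.5) * hy
            total += g(x, y) * hx * hy
    return total

area = double_integral(joint)            # should be 1
print(area)
```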
ISE 562, Dr. Smith
Conditional Probability
f(X | Y) = f(X, Y) / f(Y)
Bayes rule is derived from this conditional probability relationship, where Y represents the sample information (observations) and X is the random (decision) variable.
ISE 562, Dr. Smith
Bayes for Continuous Random Variables
i) Conditional probability: f(θ | y) = f(θ, y) / f(y)
ii) As in the discrete case: f(θ, y) = f(θ) f(y | θ)
iii) Integrating out θ: f(y) = ∫ f(θ, y) dθ = ∫ f(θ) f(y | θ) dθ
Substituting (ii) and (iii) in (i):
f(θ | y) = f(θ) f(y | θ) / ∫ f(θ) f(y | θ) dθ
ISE 562, Dr. Smith
f(θ | y) = f(θ) f(y | θ) / ∫ f(θ) f(y | θ) dθ
posterior pdf ∝ (prior pdf)(likelihood)
ISE 562, Dr. Smith
Sufficiency
• We denote an uncertain decision variable as θ and assume that sample information involving θ can be summarized by a sample statistic y.
• If y carries all the information from the sample relevant to the uncertainty about θ, then y is called a sufficient statistic.
• Example: for a Bernoulli process, the sample information can be summarized by n and r; the actual sequence of successes and failures adds no additional information about p beyond the sample proportion r/n.
ISE 562, Dr. Smith
Sufficiency: why do we care?
For Bayes rule this means that knowledge of n and r is sufficient to determine the likelihoods, so the posterior distribution of p given n and r is exactly the same as the posterior distribution of p given the entire sequence of observations.
Simply put, everything we need to know about the sample is contained in the likelihood function.
ISE 562, Dr. Smith
• Example: let θ = market share of a new product (0 ≤ θ ≤ 1); assume it is continuous with prior pdf:
f(θ) = 2(1 − θ), 0 ≤ θ ≤ 1
• We take a sample of 5 consumers; 1 buys the new brand while the other 4 purchase a different brand.
• Assume a binomial likelihood with success = “buys new product”, so
f(y | θ) = P(r = 1 | n = 5, θ) = [5!/(4!·1!)] θ¹(1 − θ)⁴ = 5θ(1 − θ)⁴
ISE 562, Dr. Smith
• Applying Bayes rule we substitute the prior and likelihood functions:
f(θ | y) = f(θ) f(y | θ) / ∫_0^1 f(θ) f(y | θ) dθ = 2(1 − θ)·5θ(1 − θ)⁴ / ∫_0^1 2(1 − θ)·5θ(1 − θ)⁴ dθ
= 10θ(1 − θ)⁵ / [10 ∫_0^1 θ(1 − θ)⁵ dθ] = θ(1 − θ)⁵ / ∫_0^1 θ(1 − θ)⁵ dθ
Since ∫_0^1 x^(m−1)(1 − x)^(n−1) dx = Γ(m)Γ(n)/Γ(m + n), where Γ(n) = (n − 1)!, the posterior pdf is
f(θ | y) = θ(1 − θ)⁵ / [Γ(2)Γ(6)/Γ(8)] = 42θ(1 − θ)⁵
Notice this is a function of θ. In the discrete Bayes case we had specific values of p for the states; here the state variable can take any value in [0, 1].
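The continuous Bayes update above can be approximated on a grid, which is how the posterior is often computed when no closed form exists. A sketch that checks the grid result against the closed form 42θ(1 − θ)⁵:

```python
# Grid approximation of Bayes rule for the market-share example:
# prior f(theta) = 2(1 - theta), likelihood 5*theta*(1 - theta)^4 (r=1, n=5).
# The normalized posterior should match 42*theta*(1 - theta)^5.

n = 20_000
h = 1.0 / n
thetas = [(i + 0.5) * h for i in range(n)]

unnorm = [2 * (1 - t) * 5 * t * (1 - t) ** 4 for t in thetas]
norm_const = sum(unnorm) * h            # approximates the denominator integral
posterior = [u / norm_const for u in unnorm]

# Compare with the closed form at a few grid points
for t, p in zip(thetas[::5_000], posterior[::5_000]):
    closed = 42 * t * (1 - t) ** 5
    assert abs(p - closed) < 1e-3
print("grid posterior matches 42*theta*(1-theta)^5")
```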
ISE 562, Dr. Smith
• As shown, the process can be computationally challenging.
• The integral may not be computable in closed form; numerical quadrature techniques may be needed.
• There is another way!
• Conjugate families of distributions simplify the process of combining priors and likelihoods.
ISE 562, Dr. Smith
Properties of conjugate families of pdfs
1. Tractability: easy to specify the posterior given the prior and likelihood function.
2. Richness: the prior should reflect the prior information (this is done with parameters that fit the distribution to the information).
3. Ease of interpretation: the prior should be interpretable in terms of previous sample results.
ISE 562, Dr. Smith
Two conjugate families studied here:
1. Sampling from a Bernoulli process, whose conjugate is the family of beta distributions.
2. Sampling from a normally distributed process with known variance, whose conjugate is the family of normal distributions.
ISE 562, Dr. Smith
Beta/binomial
“Conjugate” family refers to the relationship between the prior and the likelihood function.
1. Sampling from a Bernoulli process (the likelihood function) has as its conjugate the family of beta distributions:
f(p) = [(n − 1)!/((r − 1)!(n − r − 1)!)] p^(r−1)(1 − p)^(n−r−1), 0 ≤ p ≤ 1
(Note that the random variable p varies from zero to one for the beta pdf.)
ISE 562, Dr. Smith
Beta/binomial: if n, r are not integers we must use gamma functions:
f(p) = [Γ(n)/(Γ(r)Γ(n − r))] p^(r−1)(1 − p)^(n−r−1)
Γ(t) = ∫_0^∞ x^(t−1) e^(−x) dx, t > 0
Note: if t is an integer, Γ(t) = (t − 1)!
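The gamma-function form can be checked directly with the standard library's `math.gamma`; a small sketch of the identity Γ(t) = (t − 1)! and the beta pdf in the slide's (r, n) parameterization:

```python
import math

# For integer t, Gamma(t) = (t-1)!, so the factorial and gamma versions
# of the beta normalizing constant agree.

for t in range(1, 10):
    assert math.isclose(math.gamma(t), math.factorial(t - 1))

def beta_pdf(p, r, n):
    """Beta pdf parameterized by r and n as in the slides (0 < r < n)."""
    coef = math.gamma(n) / (math.gamma(r) * math.gamma(n - r))
    return coef * p ** (r - 1) * (1 - p) ** (n - r - 1)

# r=5, n=8: coefficient is Gamma(8)/(Gamma(5)*Gamma(3)) = 5040/(24*2) = 105
print(beta_pdf(0.5, 5, 8))
```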
ISE 562, Dr. Smith
Beta/binomial: mean and variance of the beta distribution:
E(p̂ | r, n) = r/n
V(p̂ | r, n) = r(n − r) / (n²(n + 1))
To calculate probabilities we use fractiles:
The f fractile of the pdf of a continuous random variable is the value x_f where P(X ≤ x_f) = f.
ISE 562, Dr. Smith
Beta/binomial
Suppose we have a beta pdf of p with r = 5 and n = 8, and we want P(p ≤ 0.562 | r = 5, n = 8) and P(0.265 ≤ p ≤ 0.562 | r = 5, n = 8).
[plot of the beta pdf with r = 5, n = 8]
ISE 562, Dr. Smith
Beta/binomial
Suppose we have a beta pdf of p with r = 5 and n = 8.
• Can use the rhs of the beta calculator (back of book):
P(p ≤ 0.562 | r = 5, n = 8) = 0.3402
• For P(0.265 ≤ p ≤ 0.562 | r = 5, n = 8) use:
• P(a ≤ x ≤ b) = P(x ≤ b) − P(x ≤ a)
• = P(p ≤ 0.562) − P(p ≤ 0.265)
• = 0.3402 − 0.0167 = 0.3235
• Can use the fractiles (on lhs) to enter a probability and then look up p.
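The beta-calculator values can be reproduced by integrating the beta pdf numerically; a sketch for r = 5, n = 8 (normalizing coefficient 7!/(4!·2!) = 105):

```python
# Check the beta-calculator values for r=5, n=8: P(p <= 0.562) and
# P(0.265 <= p <= 0.562), by integrating the beta pdf numerically.

def beta_pdf(p):
    return 105.0 * p ** 4 * (1 - p) ** 2   # r=5, n=8

def beta_cdf(x, n=100_000):
    """Midpoint-rule approximation of P(p <= x)."""
    h = x / n
    return sum(beta_pdf((i + 0.5) * h) for i in range(n)) * h

p_low  = beta_cdf(0.265)        # approx 0.0167
p_high = beta_cdf(0.562)        # approx 0.3402
print(p_high, p_high - p_low)   # interval probability approx 0.3235
```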
ISE 562, Dr. Smith
Notation Alert!
• Prior pdfs and likelihoods are denoted with single primes; e.g., p', f'(), E'[], σ', ...
• Posterior pdfs and parameters of posterior pdfs are denoted with double primes; e.g., p", f"(), E"[], σ", ...
ISE 562, Dr. Smith
Now the big advantage of conjugate pdfs...
ISE 562, Dr. Smith
Beta/binomial: sampling from a binomial process
– Assumptions: stationarity and independence
– Beta prior pdf of the form:
f'(p) = [(n'−1)!/((r'−1)!(n'−r'−1)!)] p^(r'−1)(1 − p)^(n'−r'−1)
– We draw a sample for the binomial likelihood function with r successes in n trials. Then the posterior pdf becomes:
f"(p) = [(n"−1)!/((r"−1)!(n"−r"−1)!)] p^(r"−1)(1 − p)^(n"−r"−1), 0 ≤ p ≤ 1
with n" = n' + n and r" = r' + r
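The conjugate update is just parameter addition; a minimal sketch using the market-share numbers (prior r' = 1, n' = 3; sample r = 1, n = 5):

```python
# Conjugate beta/binomial updating: the posterior stays in the beta family
# with r'' = r' + r and n'' = n' + n.

def update_beta(r_prior, n_prior, r_sample, n_sample):
    """Return (r'', n'') for a beta prior updated with binomial data."""
    return r_prior + r_sample, n_prior + n_sample

def beta_mean(r, n):
    return r / n

r_post, n_post = update_beta(1, 3, 1, 5)
print(r_post, n_post, beta_mean(r_post, n_post))   # mean moves from 1/3 to 1/4
```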
ISE 562, Dr. Smith
Beta/binomial
• Example: let θ = market share of a new product (0 ≤ θ ≤ 1); assume it is continuous with prior pdf:
f(θ) = 2(1 − θ), 0 ≤ θ ≤ 1
• We take a sample of 5 consumers; 1 buys the new brand while the other 4 purchase a different brand.
• Assume a binomial likelihood with success = “buys new product”, so
f(y | θ) = P(r = 1 | n = 5, θ) = [5!/(4!·1!)] θ¹(1 − θ)⁴ = 5θ(1 − θ)⁴
ISE 562, Dr. Smith
Beta/binomial
• Example: θ = market share of a new product (0 ≤ θ ≤ 1) with beta prior f(θ) = 2(1 − θ), 0 ≤ θ ≤ 1
E[θ] = ∫_0^1 θ·2(1 − θ) dθ = 1/3
• The equivalent beta mean is E[p] = r'/n' = 1/3, so r' = 1 and n' = 3.
ISE 562, Dr. Smith
Beta/binomial
• The sample results were r = 1 and n = 5, so the posterior pdf has parameters
• r" = r' + r = 1 + 1 = 2
• n" = n' + n = 3 + 5 = 8
• So the mean went from a prior value of 1/3 (0.33) to a posterior value of 2/8 = 1/4 (0.25): it shifted to the left.
[plots of the prior and posterior beta pdfs]
ISE 562, Dr. Smith
Beta/binomial
• Notice also the shift in variance:
• Prior variance = r'(n' − r')/(n'²(n' + 1)) = 1(3 − 1)/(9·4) = 1/18 ≈ 0.055
• Posterior variance = 2(8 − 2)/(64·9) = 12/576 = 1/48 ≈ 0.021
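The variance shift can be verified with the variance formula from a few slides back; a quick sketch:

```python
# Prior and posterior variances for the market-share example,
# using V(p | r, n) = r(n - r)/(n^2 (n + 1)).

def beta_var(r, n):
    return r * (n - r) / (n ** 2 * (n + 1))

prior_var = beta_var(1, 3)       # 1/18, approx 0.055
post_var  = beta_var(2, 8)       # 1/48, approx 0.021
print(prior_var, post_var)
```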
ISE 562, Dr. Smith
Beta/binomial
• The posterior mean always lies between the prior mean and the sample mean for the Bernoulli/beta family.
• The posterior variance is generally smaller than the prior variance due to the addition of sample information to prior knowledge.
ISE 562, Dr. Smith
Normal Conjugate
• “Conjugate” family refers to the relationship between the prior and the likelihood function.
• Sampling from a normal process (the likelihood function) has as its conjugate the family of normal distributions:
f(x) = (1/(σ√(2π))) e^(−(x − μ)²/(2σ²))
With z = (x − μ)/σ, f(x | μ, σ²) = f(z | 0, 1)/σ, where
f(z) = (1/√(2π)) e^(−z²/2)
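The standardization identity can be checked numerically; a small sketch with arbitrary illustration values for μ, σ, and x:

```python
import math

# Standardization: f(x | mu, sigma^2) equals f(z | 0, 1)/sigma with
# z = (x - mu)/sigma.

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma, x = 14.3, 3.7, 10.0        # arbitrary illustration values
z = (x - mu) / sigma
assert abs(normal_pdf(x, mu, sigma) - normal_pdf(z) / sigma) < 1e-12
print(normal_pdf(x, mu, sigma))
```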
ISE 562, Dr. Smith
Standard normal distribution
P(X ≤ a) = ∫_{-∞}^a f(X) dX
P(z ≤ (a − μ)/σ) = STDNORMALTABLE((a − μ)/σ)
ISE 562, Dr. Smith
[figure: standard normal curve centered at u = 0; the table gives the shaded area P(Z ≤ z) for Z = (a − u)/σ]
ISE 562, Dr. Smith
Depending on the form of the table, you can use symmetry properties of the normal to calculate various probabilities:
P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a)
P(X ≥ a) = 1 − P(X ≤ a)
P(X ≤ a) = 1 − P(X ≥ a)
ISE 562, Dr. Smith
Standard Normal Table
ISE 562, Dr. Smith
Two types of normal pdf problems:
1. Given X, u, σ², find P(X):
a) Transform the question into a probability statement
b) Translate X to Z and P(Z) using Z = (X − u)/σ
c) Look up P(Z) in the standard normal table
2. Given P(X), find X, u, or σ²:
a) Look up P(Z) in the body of the table to find Z
b) Solve Z = (X − u)/σ for the unknown quantity
ISE 562, Dr. Smith
Example: a carpet warehouse keeps 6000 yards of carpet in stock during a month. Monthly demand is normally distributed with mean 4500 yards and standard deviation 900 yards. What is the probability a customer order won’t be met? We want P(d ≥ 6000).
P(d ≥ 6000) = P(z ≥ (6000 − 4500)/900) = P(z ≥ 1.67) = 1 − P(z ≤ 1.67) = 1 − 0.9525 = 0.0475
(from the standard normal table)
Type 1, Find Probability
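The table lookup can be replaced by the standard normal CDF computed from `math.erf`; a sketch of the carpet-warehouse calculation:

```python
import math

# Carpet-warehouse example: demand ~ N(4500, 900^2); stock is 6000 yards.
# P(d >= 6000) via the standard normal CDF instead of a printed table.

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = (6000 - 4500) / 900                 # 1.67 after rounding
p_stockout = 1.0 - std_normal_cdf(z)
print(round(p_stockout, 4))             # close to the table answer 0.0475
```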
ISE 562, Dr. Smith
Example: the amount of coffee a filling machine puts into 4 oz. jars is normally distributed with std. dev. σ = 0.04 oz. If only 2% of the jars are to contain less than 4 oz., what should the average fill amount be?
• We want P(X ≤ 4) = 0.02.
• Find Z such that P(Z ≤ z) = 0.02.
• From the standard normal table, Z = −2.05.
• Solve −2.05 = (4 − u)/0.04 for u.
• u = 4.082 oz.
Type 2, Find unknown
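The inverse table lookup can also be done in code, by bisecting the monotone CDF; a sketch of the coffee-filling (Type 2) problem:

```python
import math

# Coffee-filling example without the table: find z with Phi(z) = 0.02 by
# bisection, then solve z = (4 - u)/0.04 for the unknown mean u.

def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_cdf(p, lo=-10.0, hi=10.0):
    """Bisection on the monotone standard normal CDF."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if std_normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

z = inv_cdf(0.02)                  # approx -2.05
u = 4.0 - z * 0.04                 # z = (4 - u)/sigma  =>  u = 4 - z*sigma
print(round(z, 2), round(u, 3))
```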
ISE 562, Dr. Smith
Back to Bayes...
Normal Conjugate
• μ and σ are summary measures of the normal pdf.
• For the binomial, r and n are summary measures of the sample information.
• For the normal, the sample mean and sample variance are summary measures for a sample from a normal population.
ISE 562, Dr. Smith
Normal Conjugate
• If random variables x1, ..., xn represent a random sample from a normal population with mean μ and variance σ², then the sample mean m is normally distributed with E[m | μ, σ²] = μ and V[m | μ, σ²] = σ²/n.
• Even if the population is not normally distributed, the Central Limit Theorem shows that as n → ∞, the distribution of (m − μ)/(σ/√n) converges to the normal pdf. This is true for any population with finite mean and variance.
ISE 562, Dr. Smith
• Be careful: check your data for normality.
ISE 562, Dr. Smith
Testing for Normality: the chi-squared (χ²) goodness-of-fit test
• Probability distribution unknown
• Sample n values from the unknown pdf
• Sort the sample into k class intervals (histogram) and let f_oi be the observed frequency (count) for interval i
• Then compute the theoretical frequency f_ti for each interval using the normal (or any) distribution
• If any interval contains < 5 theoretical observations, combine it with the next interval
• Calculate the total deviation of observed values from theoretical values and test it for significance
ISE 562, Dr. Smith
Testing for Normality
The chi-square statistic is:
χ² = Σ_{i=1}^k (f_oi − f_ti)² / f_ti
k = no. of intervals
p = no. of estimated parameters (2 for the normal)
The hypothesis test:
Ho: X is normally distributed
Ha: X is not normally distributed
Reject Ho if the data are not normal (if the differences are too large); i.e., reject normality if χ² > χ²_{α, k−p−1}
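The statistic is a one-line sum; a minimal sketch (the frequencies below are made-up illustration values, not the cell-phone data of the next example):

```python
# Chi-square goodness-of-fit statistic: sum over intervals of
# (observed - theoretical)^2 / theoretical.

def chi_square_stat(f_obs, f_theory):
    assert len(f_obs) == len(f_theory)
    return sum((fo - ft) ** 2 / ft for fo, ft in zip(f_obs, f_theory))

f_obs    = [12, 25, 38, 25]      # hypothetical observed counts
f_theory = [10, 28, 35, 27]      # hypothetical normal-model counts

stat = chi_square_stat(f_obs, f_theory)
# Compare with the critical value chi^2_{alpha, k-p-1}; with k=4 intervals
# and p=2 estimated parameters, df = 1 and chi^2_{0.05,1} = 3.841.
print(stat, stat > 3.841)
```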
ISE 562, Dr. Smith
Testing for Normality
Example: a cell phone company has frequency data for the length of calls (min) outside a roaming area. The mean and std. dev. are 14.3 and 3.7 minutes respectively. Are the data normally distributed? Use α = 0.05.
ISE 562, Dr. Smith
Testing for Normality
Example of one interval’s theoretical probability:
P(0 ≤ X ≤ 5) = P(z ≤ (5 − 14.3)/3.7) − P(z ≤ (0 − 14.3)/3.7) = P(z ≤ −2.51) − P(z ≤ −3.86) = (1 − 0.9940) − (1 − 0.9999) ≈ 0.006
[table columns: Length (min) | Freq (obs) | Theor. prob | Theor. freq (np) | (f_o − f_t)²/f_t]
ISE 562, Dr. Smith
Testing for Normality
Reject normality if χ² > χ²_{α, k−p−1}
χ²_{α, k−p−1} = χ²_{0.05, 4−2−1} = χ²_{0.05, 1} = 3.841
χ² = 112.2 > 3.841, so reject normality.
ISE 562, Dr. Smith
Normal Conjugate
Returning to the conjugate discussion...
Suppose the prior distribution on μ is normal with mean m' and variance σ'²:
f'(μ) = (1/(σ'√(2π))) e^(−(μ − m')²/(2σ'²))
We draw a sample of size n and observe a sample mean of m; the posterior density is then normal:
f"(μ | y) = (1/(σ"√(2π))) e^(−(μ − m")²/(2σ"²))
where y represents the sample results and the posterior parameters are computed from:
1/σ"² = 1/σ'² + n/σ²
m" = (m'/σ'² + nm/σ²) / (1/σ'² + n/σ²)
ISE 562, Dr. Smith
Normal Conjugate
Discrete prior example: a retailer is interested in mean weekly sales, μ, at one of their stores.
• Weekly sales x are distributed normally with unknown mean μ and known variance σ² = 90000.
• Only 5 potential values of μ are to be considered: μ = 1100, 1150, 1200, 1250, 1300.
• The prior distribution is estimated to be:
P(μ = 1100) = .15
P(μ = 1150) = .20
P(μ = 1200) = .30
P(μ = 1250) = .20
P(μ = 1300) = .15
ISE 562, Dr. Smith
Normal Conjugate
The retailer wants more information about the store, so takes a sample from past sales records, assuming weekly sales are independent.
A sample of n = 60 weeks gives sample mean m = 1240; now calculate the likelihoods, using σ/√n = 300/√60 = 38.73:
f(1240 | 1100, σ/√n = 38.73) = f((1240 − 1100)/38.73 | 0, 1)/38.73 = .0006/38.73
f(1240 | 1150, σ/√n = 38.73) = f((1240 − 1150)/38.73 | 0, 1)/38.73 = .0270/38.73
f(1240 | 1200, σ/√n = 38.73) = f((1240 − 1200)/38.73 | 0, 1)/38.73 = .2347/38.73
f(1240 | 1250, σ/√n = 38.73) = f((1240 − 1250)/38.73 | 0, 1)/38.73 = .3857/38.73
f(1240 | 1300, σ/√n = 38.73) = f((1240 − 1300)/38.73 | 0, 1)/38.73 = .1200/38.73
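The likelihoods and the resulting discrete posterior can be computed directly; a sketch (the computed densities agree with the slide's table lookups to about two significant figures, since the slide rounds z before entering the table):

```python
import math

# Discrete-prior retailer example: sample mean m = 1240, n = 60, sigma = 300,
# standard error 300/sqrt(60) = 38.73. Each likelihood is
# phi((1240 - mu)/se)/se; the constant 1/se cancels when normalizing,
# so it is dropped below.

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

se = 300 / math.sqrt(60)                          # 38.73
prior = {1100: .15, 1150: .20, 1200: .30, 1250: .20, 1300: .15}

like = {mu: phi((1240 - mu) / se) for mu in prior}
unnorm = {mu: prior[mu] * like[mu] for mu in prior}
total = sum(unnorm.values())
posterior = {mu: u / total for mu, u in unnorm.items()}

for mu in prior:
    print(mu, round(like[mu], 4), round(posterior[mu], 4))
```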
ISE 562, Dr. Smith
Normal Conjugate
Note that the 1/38.73 factor (the denominator) is the same for all the calculations (see page 35), so it can be ignored in the table; it is optional since it cancels out later.
[table columns: μ | Prior prob. | Likelihood | Prior prob. × likelihood | Posterior prob.]
ISE 562, Dr. Smith
Normal Conjugate
Continuous prior example: suppose the manager decides the prior on μ is normally distributed with mean m' = 1200 and σ' = 50.
Note that σ² = 90000 is the variance of weekly sales, while σ'² = 2500 is the variance of the prior pdf of μ, the average weekly sales.
Using the previous sample information of n = 60 weeks with mean m = 1240, we calculate the parameters of the posterior pdf:
ISE 562, Dr. Smith
Normal Conjugate
1/σ"² = 1/σ'² + n/σ² = 1/2500 + 60/90000 = 96/90000
m" = (m'/σ'² + nm/σ²) / (1/σ'² + n/σ²) = (1200/2500 + 60·1240/90000) / (1/2500 + 60/90000) = 1225
So the mean and variance of the posterior pdf are 1225 and 90000/96 = 937.5.
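The normal-normal update above is two lines of arithmetic; a sketch that reproduces the retailer numbers:

```python
# Normal-normal conjugate update for the retailer example:
# prior m' = 1200, sigma'^2 = 2500; data n = 60, m = 1240, sigma^2 = 90000.

def normal_update(m_prior, var_prior, m_sample, var_pop, n):
    """Return (m'', sigma''^2) for a normal prior and known-variance data."""
    precision = 1.0 / var_prior + n / var_pop          # 1/sigma''^2
    var_post = 1.0 / precision
    m_post = (m_prior / var_prior + n * m_sample / var_pop) / precision
    return m_post, var_post

m_post, var_post = normal_update(1200, 2500, 1240, 90000, 60)
print(m_post, var_post)     # 1225.0 and 937.5
```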
ISE 562, Dr. Smith
Normal Conjugate
• The original belief about the mean (prior) was m' = 1200 with prior standard deviation σ' = 50 (population standard deviation 300).
• A sample was taken with sample mean 1240 and standard error of the mean 300/√60 = 38.73.
• The posterior mean then moved from the prior value of 1200 to the posterior value of 1225.
• The posterior standard deviation of μ is sqrt(937.5) = 30.62, down from the prior value of 50 (less dispersion due to “learning” from the sample information).
ISE 562, Dr. Smith
What if there are no conjugates?
• What if the prior and likelihood are not conjugate?
• All is not lost. From Bayes’ time until digital computers arrived in the 1950s–1960s, computation of the posterior for an arbitrary prior and likelihood was not feasible.
• Today we have software that makes these calculations easy (e.g., Mathematica, Matlab, etc.).