
Coin flipping
Suppose there is a coin that may be biased – this coin has unknown probability θ of giving a “heads.” If we repeatedly flip this coin and observe the outcomes, how can we maintain our belief about θ?
Note that the coin-flipping problem can be seen as a simplification of the survey problem we discussed last time, where we assume that people always tell the truth, are sampled uniformly at random, and form their opinions independently (by flipping a coin!).
Before we select a prior for θ, we write down the likelihood. For a particular problem, it is almost always easier to derive an appropriate likelihood than it is to identify an appropriate prior distribution.
Suppose we flip the coin n times and observe x “heads.” Every statistician, regardless of philosophy, would agree that the probability of this observation, given the value of θ, comes from a binomial distribution:
\[
\Pr(x \mid n, \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}.
\]
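For concreteness, here is a minimal Python sketch of evaluating this likelihood numerically, once directly from the formula and once via scipy; the particular values of n, x, and θ below are illustrative only.

```python
from math import comb

from scipy.stats import binom

# Illustrative values only; any n, 0 <= x <= n, and 0 < theta < 1 work.
n, x, theta = 6, 5, 0.5

# Direct evaluation of the binomial pmf from the formula above ...
direct = comb(n, x) * theta**x * (1 - theta)**(n - x)

# ... and the same quantity via scipy, for comparison.
via_scipy = binom.pmf(x, n, theta)

print(direct, via_scipy)  # identical up to floating-point error
```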
Classical method
Before we continue with the Bayesian approach, we pause to discuss how a classical statistician would proceed with this problem. Recall that in the frequentist approach, the value θ can only be considered in terms of the frequency of success (“heads”) seen during an infinite number of trials. It is not valid in this framework to represent a “belief” about θ in terms of probability.
Rather, the frequentist approach to reasoning about θ is to construct an estimator for θ, which in theory can be any function of the observed data: θˆ(x, n). Estimators are then analyzed in terms of their behavior as the number of observations goes to infinity (for example, we might prove that θˆ → θ as n → ∞). The classical estimator in this case is the empirical frequency θˆ = x/n.
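As a quick numerical illustration of this asymptotic behavior, the following sketch simulates flips of a coin with a hypothetical true bias (0.7 is an arbitrary choice) and reports the empirical frequency as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7  # hypothetical "true" bias, unknown to the statistician

# The empirical frequency x / n should approach theta_true as n grows.
for n in [10, 100, 10_000, 1_000_000]:
    x = rng.binomial(n, theta_true)  # total number of heads in n flips
    print(f"n = {n:>9,d}   theta_hat = {x / n:.4f}")
```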
Bayesian method
An interesting thing to note about the frequentist approach is that it ignores all prior information, opting instead to only look at the observed data. To a Bayesian, every such problem is different and should be analyzed contextually given the known information.
With the likelihood decided, we must now choose a prior distribution p(θ). A convenient prior in this case is the beta distribution, which has two parameters α and β:
\[
p(\theta \mid \alpha, \beta) = B(\theta; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}.
\]
Here the normalizing constant B(α, β) is the beta function:
\[
B(\alpha, \beta) = \int_0^1 \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}\, d\theta.
\]
The support of the beta distribution is θ ∈ (0, 1), and by selecting various values of α and β, we can control its shape to represent a variety of different prior beliefs.
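To get a feel for this flexibility, the sketch below evaluates a few illustrative (α, β) pairs, numerically confirming that each density integrates to one and reporting the prior mean α/(α + β).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as beta_fn
from scipy.stats import beta as beta_dist

# A few illustrative (alpha, beta) pairs: uniform, skewed toward low theta,
# skewed toward high theta, and concentrated near theta = 1/2.
for a, b in [(1, 1), (3, 5), (5, 3), (10, 10)]:
    # The density theta^(a-1) (1-theta)^(b-1) / B(a, b) should integrate to 1.
    total, _ = quad(lambda t: t**(a - 1) * (1 - t)**(b - 1) / beta_fn(a, b), 0, 1)
    print(f"alpha={a}, beta={b}: integral={total:.4f}, mean={beta_dist.mean(a, b):.3f}")
```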
Given our observations D = (x, n), we can now compute the posterior distribution of θ:
\[
p(\theta \mid x, n, \alpha, \beta) = \frac{\Pr(x \mid n, \theta)\, p(\theta \mid \alpha, \beta)}{\int \Pr(x \mid n, \theta)\, p(\theta \mid \alpha, \beta)\, d\theta}.
\]

First we handle the normalization constant Pr(x | n, α, β):
\[
\int \Pr(x \mid n, \theta)\, p(\theta \mid \alpha, \beta)\, d\theta
= \binom{n}{x} \frac{1}{B(\alpha, \beta)} \int_0^1 \theta^{\alpha + x - 1} (1 - \theta)^{\beta + n - x - 1}\, d\theta
= \binom{n}{x} \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)}.
\]
Now we apply Bayes theorem:
\begin{align*}
p(\theta \mid x, n, \alpha, \beta)
&= \frac{\Pr(x \mid n, \theta)\, p(\theta \mid \alpha, \beta)}{\int \Pr(x \mid n, \theta)\, p(\theta \mid \alpha, \beta)\, d\theta} \\
&= \left[ \binom{n}{x} \frac{B(\alpha + x, \beta + n - x)}{B(\alpha, \beta)} \right]^{-1}
   \left[ \binom{n}{x} \theta^x (1 - \theta)^{n - x} \right]
   \left[ \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)} \right] \\
&= \frac{1}{B(\alpha + x, \beta + n - x)}\, \theta^{\alpha + x - 1} (1 - \theta)^{\beta + n - x - 1} \\
&= B(\theta; \alpha + x, \beta + n - x).
\end{align*}
The posterior is therefore another beta distribution with parameters (α + x, β + n − x); we have
added the number of successes to the first parameter and the number of failures to the second.
The rather convenient fact that the posterior remains a beta distribution is because the beta distribution satisfies a property known as conjugacy with the binomial likelihood. This fact also leads to a common interpretation of the parameters α and β: they serve as “pseudocounts,” or fake observations we pretend to have seen before seeing the data.
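The update is therefore trivial to implement. The sketch below (the helper name posterior_params is mine, not from the notes) performs the conjugate update for the running example and checks the closed-form posterior against a brute-force normalization of likelihood times prior on a grid.

```python
import numpy as np
from scipy.stats import beta, binom

def posterior_params(a, b, x, n):
    """Beta-binomial conjugate update: successes add to a, failures add to b."""
    return a + x, b + (n - x)

# Running example: prior Beta(3, 5), data x = 5 heads in n = 6 flips.
a0, b0, x, n = 3, 5, 5, 6
a1, b1 = posterior_params(a0, b0, x, n)
print((a1, b1))  # (8, 6)

# Sanity check of conjugacy: the closed-form Beta(a1, b1) posterior should match
# a brute-force normalization of likelihood * prior over a dense grid of theta.
theta = np.linspace(0.0005, 0.9995, 2000)
unnorm = binom.pmf(x, n, theta) * beta.pdf(theta, a0, b0)
numeric = unnorm / (unnorm.sum() * (theta[1] - theta[0]))  # crude Riemann normalization
print(np.max(np.abs(numeric - beta.pdf(theta, a1, b1))))   # small numerical error only
```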
Figure 1 shows the relevant functions for the coin flipping example for (α, β) = (3, 5) and (x, n) = (5, 6). Notice that the likelihood favors higher values of θ, whereas the prior had favored lower values of θ. The posterior, taking into account both sources of information, lies in between these extremes. Notice also that the posterior concentrates on a narrower range of plausible θ values than the prior; this is because we can draw more confident conclusions from having access to more information.

[Figure 1: An example of Bayesian updating for coin flipping, showing the prior, likelihood, and posterior as functions of θ ∈ (0, 1). Figure produced by plot_beta_example.m.]
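For readers without access to plot_beta_example.m, the following sketch produces the same kind of plot (it approximates that script rather than reproducing it); the likelihood is rescaled to integrate to one so that all three curves fit on common axes.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import beta, binom

a0, b0 = 3, 5  # prior parameters, as in Figure 1
x, n = 5, 6    # observed data, as in Figure 1

theta = np.linspace(0, 1, 501)
prior = beta.pdf(theta, a0, b0)
likelihood = binom.pmf(x, n, theta)
likelihood /= likelihood.sum() * (theta[1] - theta[0])  # rescale for plotting
posterior = beta.pdf(theta, a0 + x, b0 + n - x)

plt.plot(theta, prior, label="prior")
plt.plot(theta, likelihood, label="likelihood (rescaled)")
plt.plot(theta, posterior, label="posterior")
plt.xlabel("θ")
plt.legend()
plt.show()
```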
Hypothesis testing
We often wish to use our observed data to draw conclusions about the plausibility of various hypotheses. For example, we might wish to know whether the parameter θ is less than 1/2. The Bayesian method allows us to compute the probability of this hypothesis directly from the posterior distribution:
\[
\Pr(\theta < 1/2 \mid x, n, \alpha, \beta) = \int_0^{1/2} p(\theta \mid x, n, \alpha, \beta)\, d\theta.
\]
For the example in Figure 1, this probability is approximately 15%.

There is a sharp contrast between the simplicity of this approach and the frequentist method. The classical approach to hypothesis testing uses the likelihood as a way of generating fake datasets of the same size as the observations. The likelihood then serves as a so-called “null hypothesis” that allows us to generate hypothetical datasets under some condition. From these, we compute statistics, which, like estimators, can be any function of the hypothesized data. We then identify some critical set C for this statistic which contains some large portion (1 − α) of the values corresponding to the datasets generated by our null hypothesis. If the statistic computed from the observed data falls outside this set, we reject the null hypothesis with “confidence” α. Note that the “rejection” of the null hypothesis in classical hypothesis testing is purely a statement about the observed data (that it looks “unusual”), and not about the plausibility of alternative hypotheses!
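Computationally, the Bayesian answer above is a single call to the beta CDF; the sketch below (the function name is mine, not from the notes) computes this probability for whatever prior and data are supplied.

```python
from scipy.stats import beta

def prob_theta_below_half(a, b, x, n):
    """Posterior probability that theta < 1/2 under a Beta(a, b) prior,
    after observing x heads in n coin flips."""
    return beta.cdf(0.5, a + x, b + (n - x))

# Example usage with the running example's prior and data:
print(prob_theta_below_half(3, 5, x=5, n=6))
```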