Population estimation example

31 August 2018

We will illustrate rejection sampling using the problem of estimating the number of trials in a Binomial experiment with known success probability. This is the opposite of the problem usually encountered in Binomial experiments, where the number of trials is known but the success probability is not. It is a simplified version of a problem arising in population estimation; the simplification is that in practice the success probability usually needs to be estimated as well as the population size. Capture-recapture studies (known as dual-system estimation in official statistics) allow both the number of trials and the success probability (“capture probability” in this context) to be estimated. Here we assume that the capture probability is known. We let \(X\) denote the observed count, \(N\) the population size to be estimated, and \(\theta\) the known capture probability. Our model of the observed count is \[X \sim \mathrm{Binomial}(N,\theta) \] The likelihood for \(N\) is \[
\begin{aligned}
p(X|N,\theta) & =\binom{N}{X}\theta^{X}(1-\theta)^{(N-X)} \\
& \propto \binom{N}{X}(1-\theta)^{(N-X)}
\end{aligned}
\] Consequently the log-likelihood (up to an additive constant not involving \(N\)) is \[l(X|N,\theta) = \log\binom{N}{X} + (N-X)\log(1-\theta) \] The prior for \(N\) is a discrete uniform on the integers \(4000,\dots,6000\). The observed count is \(X=4350\).
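As a quick orientation before any sampling: since \(E(X) = N\theta\), a natural point estimate is \(\hat{N} = X/\theta = 4350/0.87 = 5000\), and this is the value used as the reference point (guess) inside the rejection sampler below. A minimal sketch in R (the variable names are only illustrative):

X <- 4350     ##observed count
theta <- 0.87 ##known capture probability
X/theta       ##point estimate of N; equals 5000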

library(MASS)
library(MCMCpack)

## Loading required package: coda

## ##
## ## Markov Chain Monte Carlo Package (MCMCpack)
## ## Copyright (C) 2003-2018 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
## ##
## ## Support provided by the U.S. National Science Foundation
## ## (Grants SES-0350646 and SES-0350613)
## ##

#wd <- "C:/Users/Patrick/Documents/Patrick"
#change to your folder or hash out this and the following line
#if you want to default to your own folder
#setwd(wd)

#set up the functions that are needed

##log pmf for N assuming a uniform prior - log-likelihood
logpN <- function(N,X,theta) {
  #logdens <- lfactorial(N) - (lfactorial(N-n) + lfactorial(n)) +
  #           (N-n)*log(p0)
  logdens <- lchoose(N,X) + (N-X)*log(1-theta)
  return(logdens)
}

Check what happens for values of \(N\) less than the observed count.

logpN(N=4000,X=4350,theta=0.87)

## [1] -Inf

##so the likelihood is
exp(logpN(N=4000,X=4350,theta=0.87))

## [1] 0

There is no support for population values less than the observed count, which makes sense. Now we set up the rejection sampler based on the uniform prior. Define a function to generate a single draw from the posterior.

acceptN <- function(X,theta,ntry,low=1,high=10000000) {
  # X is the observed count
  # theta is the probability of capture
  # ntry is the maximum number of attempts at generating N
  # low is the lower bound of the prior
  # high is the upper bound of the prior
  # calls the function logpN
  guess <- X/theta #MLE
  lower <- max((X+1),low)
  upper <- high
  max_logdens <- logpN(N=guess,X=X,theta=theta)
  accept <- 0
  attempt <- 1
  ##Define a vector of points on which the prior is defined
  testseq <- seq(from=lower,to=upper,by=1)
  while((accept < 1) & (attempt < (ntry + 1))) {
    Ntest <- sample(testseq,size=1)
    logpNtest <- logpN(N=Ntest,X=X,theta=theta)
    check <- logpNtest - max_logdens
    u <- runif(1,0,1)
    if (u <= exp(check)) {accept <- 1}
    attempt <- attempt + 1
  } #end while loop
  return(c(Ntest,attempt))
} #End function

Now all we need to do is set the desired posterior sample size and run acceptN() in a loop to get the required number of values of \(N\).

npos <- 2000
possamp <- vector(mode="numeric",length=npos)
nattempts <- vector(mode="numeric",length=npos)
theta <- 0.87
X <- 4350 ##observed count
for (i in 1:npos) {
  newN <- acceptN(X=4350,theta=0.87,ntry=5000,low=4000,high=6000)
  possamp[i] <- newN[1]
  nattempts[i] <- newN[2]
}

summary(possamp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4905    4981    5000    5000    5019    5108

summary(nattempts)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    2.00    9.00   18.00   24.71   34.00  203.00
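The mean of nattempts can be roughly sanity-checked: each proposal is accepted with probability exp(check), so the expected number of proposals per accepted draw is the reciprocal of the average acceptance probability over the proposal grid. A sketch, reusing logpN() and mirroring the grid that acceptN() builds internally (prop_grid and acc_prob are illustrative names, not part of the code above):

##grid used inside acceptN() when low=4000, high=6000: max(X+1,low), ..., high
prop_grid <- seq(from=4351,to=6000,by=1)
##acceptance probability of each candidate, relative to the density at guess = X/theta
acc_prob <- exp(logpN(N=prop_grid,X=4350,theta=0.87) - logpN(N=4350/0.87,X=4350,theta=0.87))
1/mean(acc_prob) ##expected proposals per accepted draw

This should be close to mean(nattempts); the match is only approximate because nattempts records one more than the number of proposals (the counter starts at 1) and because of Monte Carlo error.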
Plot the posterior histogram.

hist(possamp,main=expression(paste("posterior histogram for N given: X=4350,",theta,"=0.87")))

Compare with the true posterior percentiles.

##Define a vector of points on which the prior is defined
testseq <- seq(from=4350,to=6000,by=1)
##evaluate the log-posterior at each point
logpost <- logpN(N=testseq,X=4350,theta=0.87)
head(logpost)

## [1]  0.00000  6.33794 11.98296 17.22275 22.17509 26.90451

str(logpost)

##  num [1:1651] 0 6.34 11.98 17.22 22.18 ...

Get the cumulative distribution.

postdist <- exp(logpost)/(sum(exp(logpost)))
head(postdist)

## [1] 7.051283e-264 3.988417e-261 1.128243e-258 2.128206e-256 3.011517e-254
## [6] 3.409941e-252

sum(postdist)

## [1] 1

cumdist <- cumsum(postdist)
##set up indicators for the quantiles
q025 <- sum((cumdist <= 0.025)) ## number of points up to the 2.5th percentile
q50 <- sum((cumdist <= 0.5))
q975 <- sum((cumdist <= 0.975))
##we can now get quantiles by locating these points on the list of possible points --
##this is like inverting the cumulative distribution function
Nlow <- testseq[q025]
Nmedian <- testseq[q50]
Nhigh <- testseq[q975]
true_quant <- c(Nlow,Nmedian,Nhigh)
reject_quant <- quantile(possamp,probs=c(0.025,0.5,0.975))
cat("true-quantiles",true_quant,"\n",sep=" ")

## true-quantiles 4946 4999 5053

cat("quantiles estimated from rejection sampling,",reject_quant,"\n",sep=" ")

## quantiles estimated from rejection sampling, 4949 5000 5052

The approach to computing the “true” quantiles could be refined further to account for the discreteness.
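One possible refinement along those lines (a sketch; qdisc is a hypothetical helper, not part of the code above) is to define the \(p\)-quantile of the discrete posterior as the smallest \(N\) whose cumulative probability reaches \(p\), i.e. to invert cumdist directly:

##smallest N in testseq whose cumulative posterior probability reaches p
qdisc <- function(p) testseq[min(which(cumdist >= p))]
sapply(c(0.025,0.5,0.975), qdisc)

With this convention the central interval from the 2.5th to the 97.5th percentile is guaranteed to contain at least 95% of the posterior probability.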