编程代写 PROJECT 3051

STATISTICS IN MEDICINE
Statist. Med. 2009; 28:3049–3067
Published online 24 July 2009 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.3680
The BUGS project: Evolution, critique and future directions 1,∗,†, 1, 2 and 3

Copyright By PowCoder代写 加微信 powcoder

1Medical Research Council Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, U.K.
2School of Mathematics and Statistics, Mathematical Institute, North Haugh, St. Andrews, Y16 9SS, U.K. 3Department of Epidemiology and Public Health, Imperial College London, St. Mary’s Campus, Norfolk Place, London W2 1PG, U.K.
BUGS is a software package for Bayesian inference using Gibbs sampling. The software has been instru- mental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has enjoyed considerable success over its 20-year life span. Despite this, the software has a number of shortcomings and a principal aim of this paper is to provide a balanced critical appraisal, in particular highlighting how various ideas have led to unprecedented flexibility while at the same time producing negative side effects. We also present a historical overview of the BUGS project and some future perspectives. Copyright q 2009 & Sons, Ltd.
KEY WORDS: BUGS; WinBUGS; OpenBUGS; Bayesian modelling; graphical models
1. INTRODUCTION
BUGS [1] is a software package for performing Bayesian inference using Gibbs sampling [2, 3]. The BUGS project began at the Medical Research Council Biostatistics Unit in Cambridge in 1989. Since that time the software has become one of the most popular statistical modelling packages, with, at the time of writing, over 30 000 registered users of WinBUGS (the Microsoft Windows incarnation of the software) worldwide, and an active on-line community comprising over 8000 members. Typing ‘WinBUGS’ into Google generates over 100 000 hits; in Google Scholar the figure is in excess of 5000; and simply searching Statistics in Medicine on-line gives nearly 100 hits.
BUGS has been just one part of the tremendous growth in the application of Bayesian ideas over the last 20 years. Prior to the widespread introduction of simulation-based methods, Bayesian ideas
∗Correspondence to: , Medical Research Council Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB2 0SR, U.K.
Contract/grant sponsor: Medical Research Council; contract/grant number: U.1052.00.005
Received 31 March 2009 Copyright q 2009 & Sons, Ltd. Accepted 19 June 2009

3050 D. LUNN ET AL.
could only be implemented in circumstances in which solutions could be obtained in closed form in so-called conjugate analyses, or by ingenious but restricted application of numerical integra- tion methods. BUGS has therefore been instrumental in raising awareness of Bayesian modelling among both academic and commercial communities internationally, and has greatly facilitated the routine analysis of many complex statistical problems. Numerous applications of the software can be found in the literature, in a wide array of application areas, including disease mapping [4], pharmacometrics [5], ecology [6], health-economics [7], genetics [8], archaeology [9], psycho- metrics [10], coastal engineering [11], educational performance [12], behavioural studies [13], econometrics [14], automated music transcription [15], sports modelling [16], fisheries stock assessment [17], and actuarial science [18]. The software is also used widely for the teaching of Bayesian modelling ideas to students and researchers the world over, and several texts use BUGS extensively for illustrating the Bayesian approach [19–26]. The importance of the soft- ware has been acknowledged in ‘An International Review of U.K. Research in Mathematics’ (http://www.cms.ac.uk/irm/irm.pdf).
Despite this apparent success, we are well aware that BUGS has some significant shortcomings and it is our intention here to provide a balanced critical appraisal of the software in its current form(s). We will also present an overview of the way in which the software has evolved, and a discussion of potential future directions for the BUGS project. We recognize that not all parts of the paper will be of interest to all readers, and so try to provide appropriate guidance.
The structure of this paper is as follows. In Section 2 we explain the intimate and vital connection between BUGS and graphical modelling, while in Section 3 we present an overview of the BUGS language: these sections could be skipped by those familiar to ideas behind the program. In Section 4 we describe how the software has evolved, which goes into more technical detail regarding the ‘guts’ of the program and how the different versions were developed and funded. The technical material is presented in a separate subsection (Section 4.2) and can be skipped by those who are less interested in how BUGS’ capabilities have expanded over the years. Section 5 provides a critical appraisal and discusses features that should be irritatingly familiar to all those who have struggled with the program. A concluding discussion, including some future perspectives, is given in Section 6.
2. A BASIS IN GRAPHICAL MODELLING
The popularity of the software stems from its flexibility, which is due to the exploitation of graphical modelling theory. Consider a simple linear regression problem given by:
yi ∼N(􏰀i,􏰁2), 􏰀i =􏰂+􏰃xi, i=1,…,N
forresponsesyi andobservedvaluesxi ofasinglecovariate(i=1,…,N).Asweareworkingin a Bayesian setting we assign prior distributions to the unknown parameters, 􏰂, 􏰃 and 􏰁2 (or 􏰁, or log􏰁, say). For example,
􏰂∼N(m􏰂,v􏰂), 􏰃∼N(m􏰃,v􏰃), log􏰁∼U(a,b)
for fixed constants m􏰂, v􏰂, m􏰃, v􏰃, a and b. An alternative representation of this model is the directed acyclic graph (DAG; see [27], for example) shown in Figure 1, where each quantity in the model corresponds to a node and links between nodes show direct dependence. The graph is directed because each link is an arrow; it is acyclic because by following the arrows it is not possible to return to a node after leaving it.
Copyright q 2009 & Sons, Ltd. Statist. Med. 2009; 28:3049–3067 DOI: 10.1002/sim

THE BUGS PROJECT 3051
Figure 1. Directed acyclic graph (DAG) corresponding to linear regression example—see text for details.
The notation is defined as follows. Rectangular nodes denote known constants. Elliptical nodes represent either deterministic relationships (i.e. functions) or stochastic quantities, i.e. quantities that require a distributional assumption. Stochastic dependence and functional dependence are denoted by single-edged arrows and double-edged arrows, respectively. Repetitive structures, such as the ‘loop’ from i=1 to N, are represented by ‘plates’, which may be nested if the model is hierarchical.
A node v is said to be a parent of node w if an arrow emanating from v points to w; furthermore, w is then said to be a child of v. We are primarily interested in stochastic nodes, i.e. the unknown parameters and the data. When identifying probabilistic relationships between these, deterministic links are collapsed and constants are ignored. Thus, the terms parent and child are usually reserved for the appropriate stochastic quantities. In the above example, the stochastic parents of each yi are 􏰂, 􏰃 and log􏰁, whereas we can refer to 􏰀i and 􏰁2 as direct parents.
DAGs can be used to describe pictorially a very wide class of statistical models through describing the ‘local’ relationships between quantities. It is when these models become complicated that the benefits become obvious. DAGs communicate the essential structure of the model without recourse to a large set of equations. This is achieved by abstraction: the details of distributional assumptions and deterministic relationships are ‘hidden’. This is conceptually similar to object- oriented programming (OOP) (see [28], for example), which realization has been instrumental in maximizing the flexibility of BUGS, as we will discuss later.
The joint distribution of all nodes V in any DAG can be factorized as follows [29]:
p(V)= 􏰈 p(v|Pv) (1) v∈V
Copyright q 2009 & Sons, Ltd.
Statist. Med. 2009; 28:3049–3067 DOI: 10.1002/sim

3052 D. LUNN ET AL.
where Pv denotes the set of (stochastic) parents of node v. With regard to Bayesian inference, the data D and the unknowns 􏰄, say, together form a partition of V , and so the joint posterior density p(􏰄|D)∝p(V). For the purposes of Gibbs sampling we are interested in the full conditional distributions of each unknown stochastic node conditional on the values of all other nodes in the graph. For two arbitrary sets of nodes A and B, say, let A\B denote ‘all elements of A except those in B’. Then the full conditional for a specific node w, say, is denoted p(w|V\w). As {w,V\w} is a also a partition of V , we have that the full conditional is also proportional to the right-hand side of (1). However, p(w|V \w) is a distribution in w only, and so we can ignore any factors not
involving w, giving
p(w|V \w)∝ p(w|Pw)× 􏰈 p(v|Pv) v∈Cw
where Cw denotes the children of w. The beauty of the factorization (1) is thus two-fold. First, we can write down the joint posterior for any DAG simply by knowing the relationship between each node and its parents. Second, the full conditional distribution for any node (or set of nodes) is a local computation on the graph, involving only the node-parent dependencies for the node of interest itself and its children. Thus, one only ever needs to consider a small part of the model at any given time, without needing to take account of the bigger picture. The BUGS software exploits these facts first by providing a language (the BUGS language) for specifying arbitrary child–parent relationships, and by ‘inverting’ these to determine the set of nodes relevant to each full conditional calculation: these comprise the children, parents and ‘co-parents’ collectively known as the ‘Markov blanket’. The software also provides a mechanism whereby each node can communicate, to the ‘inference engine’, how it is related to its parents. With these facilities in place, Gibbs sampling on arbitrary graphs is conceptually straightforward.
3. THE BUGS LANGUAGE
As noted above, BUGS provides its own language for the textual specification of graphical models. The language is designed to mimic the mathematical specification of the model in terms of parent– child relationships. Stochastic relationships are denoted with ‘∼’ whereas logical/deterministic relationships are denoted with a ‘<-’. Repetitive structures are represented using ‘for-loops’, which may be nested if the model is hierarchical, say. The following code specifies the linear regression graph referred to in the previous section: model { for(iin1:N){ y[i] ̃ dnorm(mu[i], tau) mu[i] <- alpha + beta * x[i] } alpha ̃ dnorm(m.alpha, p.alpha) beta ̃ dnorm(m.beta, p.beta) log.sigma ̃ dunif(a, b) sigma <- exp(log.sigma) sigma.sq <- pow(sigma, 2) tau <- 1 / sigma.sq Copyright q 2009 & Sons, Ltd. Statist. Med. 2009; 28:3049–3067 DOI: 10.1002/sim THE BUGS PROJECT 3053 p.alpha <-1/v.alpha p.beta <- 1 / v.beta } The code should be reasonably self-explanatory but a few notes are required for clarification. First note that normal distributions (dnorm(.,.)) are parameterized in terms of precision (equal to 1/variance) as opposed to variance. This can be a source of confusion and reflects the fact that the software was originally designed to exploit mathematical conveniences (so-called conjugacy) where possible. Although this is no longer necessary and the need for an intuitive prior typically outweighs the benefit of any such conjugacy, the parameterization remains unchanged. In addition, note that the data, y[1:N], x[1:N], N, and the values of m.alpha, v.alpha, m.beta, v.beta, a and b in this case, are loaded into the system separately. Finally, note that the pow(.,.) function raises the first argument to the power of the second argument, and so the sigma.sq variable represents 􏰁2. The BUGS language is a declarative language. This means that it does not matter in which order the various statements are made, which makes it easier for the system to interpret the model specification. The downside of this, however, is that there is no scope for ‘logical’ expressions such as IF–THEN–ELSE statements. There are ways around this, for example, by making use of the step(.) function (equal to 1/0 depending on the positive/negative status of the argument) to switch between modelling terms, but this still represents a significant limitation of the language. Despite this the language has proved enormously successful and capable of expressing a vast array of model-types. To illustrate the flexibility afforded by the language, let us elaborate the model described above a little. First, suppose that the covariate x[] is subjected to measurement error. We could model this, for example, by adding the assumptionx[i]∼dnorm(mu.x[i],tau.x), where themu.x[i] terms are unknown and assigned an appropriate prior, e.g. mu.x[i]∼dnorm(m.x,p.x) with the mean m.x and precision p.x known, and tau.x, for the sake of simplicity, also known. Now suppose, for instance, that the responses y[] represent growth data on a given individual, collected at timesx[]. If we had such data on a number of individuals (indexed by j=1,...,K) as opposed to just one, then we might wish to assume different individuals to have distinct but similar (exchangeable) parameters via 􏰂j ∼N(m􏰂,v􏰂), 􏰃j ∼N(m􏰃,v􏰃), j=1,...,K, where now m􏰂, v􏰂, m􏰃 and v􏰃 are unknown and assigned appropriate priors, e.g. m􏰂 ∼N(0,1002), √v􏰂 ∼U(0,100), m􏰃 ∼N(0,1002), √v􏰃 ∼U(0,100) To implement this now hierarchical model we simply add lines to represent the priors for m􏰂, v􏰂, m􏰃 and v􏰃, e.g. m.aplha∼dnorm(0,0.0001), and nest any individual-specific statements within a loop over individuals (indexed by j): for(jin1:K){ for(iin1:N){ y[i,j] ̃ dnorm(mu[i,j], tau) mu[i,j] <- alpha[j] + beta[j] * x[i] } alpha[j] ̃ dnorm(m.alpha, p.alpha) beta[j] ̃ dnorm(m.beta, p.beta) } Copyright q 2009 & Sons, Ltd. Statist. Med. 2009; 28:3049–3067 DOI: 10.1002/sim 3054 D. LUNN ET AL. Suppose we wish to fit a non-linear growth curve, 􏰀 =􏰂 −􏰃 􏰅xi , say, as opposed to a straight line. Then we would rewrite the definition of mu[i,j] as follows: mu[i,j] <- alpha[j] - beta[j] * pow(gamma[j], x[i]) If the 􏰂js and 􏰃js are as before and we further suppose 􏰅j ∼U(0,1) for j=1,...,K, then we simply include the line gamma[j]∼dunif(0,1) within the j-loop (but outside the i-loop). Finally, to robustify the model against any outlying observations we might wish to assume the y[i,j]s arise from a Student-t as opposed to a normal distribution. Then we would modify the assumed relationship between each y[i,j] and its parents to y[i,j]∼dt(mu[i,j],tau,d), where the degrees-of-freedom parameter d is either assumed to be known (e.g. d <- 4) or assigned an appropriate prior, Poisson, say, d∼dpois(...). One of the reasons the BUGS language is so flexible is that it is open ended. It is quite straightforward to introduce new ‘vocabulary’, i.e. new distributions and functions, and indeed this can be done without any part of the existing software being modified or even recompiled. This is due to the object- and component-oriented [30] nature of the software, which is discussed throughout. 4. EVOLUTION OF BUGS 4.1. Inception and early years In the early 1980s one focus of work in artificial intelligence concerned expert systems, which were programs designed to mimic expertise in complex situations. A basic principle was a separation of the knowledge base that encapsulated expertise, from the inference engine that controlled the response to new evidence and queries: there was agreement that a natural computer implementation of a knowledge base was in a declarative rather than a procedural form, so that the program would express local relationships between entities which could then be manipulated using the inference engine. Where there was substantial disagreement and sometimes passionate argument was about how to deal with uncertainty. For example, extremely influential work from Stanford featured rule-based systems in which uncertainty was handled using a system of ‘certainty factors’ attached to rules and manipulated according to somewhat arbitrary principles—this was labeled as ‘ad hoc quantitative mumbo-jumbo’ by a prominent statistician (Smith, discussion of [31]). The challenge was to come up with a more rigorous process based on probability theory but that still retained the attractive aspect of local computation. The potential role of graphical models, also to become known as Bayesian networks, was argued both in the U.S. [32] and in the U.K. [33], and exact means of propagating uncertainty in tree structures [34] and general networks [35] were developed. Pearl [36] noted that simulation methods could also be used for approximate inference in directed graphical models, and Spiegelhalter and Lauritzen [37] showed that graphical models could also include parameters as nodes, hence facilitating general Bayesian inference. It became apparent that, by merging these ideas, the computations required for Bayesian inference were simply an iterative sequence of local calculations on the graph, and that each such calculation, viewed abstractly, was identical. OOP would therefore prove highly effective: not only does it provide a means of describing relationships between entities of different, but related, types, i.e. a virtual graphical Copyright q 2009 & Sons, Ltd. Statist. Med. 2009; 28:3049–3067 DOI: 10.1002/sim THE BUGS PROJECT 3055 model (VGM),‡ but it also allows abstract computation, vastly simplifying implementation (and maintenance) of the algorithm (see [38] for details). Putting these ideas together led to the concept of a general program that could handle arbitrary models and would use simple Gibbs sampling to simulate from nodes conditional on their neighbours in the graph. A start was made on the BUGS program in 1989 with the appointment of to the MRC Biostatistics Unit: it is rather extraordinary that at the same time the classic MCMC work of Gelfand and Smith [3] was being carried out 80 miles away in Nottingham but entirely independently and from a rather different starting point. The language chosen for implementation was Modula-2 [39]. Actually this is not an object- oriented language but it can be used to a similar effect. In addition it is modular (as one might expect from its name), which facilitates the hiding of detail, an important concept in designing complex architectures, for much the same reasons as it is effective in graphical modelling. An obvious alternative might have been C++; however, as points out ‘I am unable to program in C++. If I was clever enough to program in C++ I would have a very good job in say banking.’ A prototype on a PC was demonstrated at the 4th Bayesian Valencia meeting in April 1991 [40], and a Unix version at the Practical Bayesian Statistics meeting in Nottingham in 1992 [1]. Version 0.1 for Unix was released in February 1993 and further refinements followed until a stable Version 0.5 in 1995. Early versions were distributed on disks and a voluntary fee of £25 or $40 solicited. A large impetus was the INSERM workshop on MCMC methods in 1993 which had a follow-up workshop in Cambridge using BUGS and led to the MCMC in Practice book [41] (which, rather remarkably, is still selling well). Throughout this period interest in MCMC was steadily growing, and the portfolio of examples grew, including the standard lip-cancer spatial mapping example which was made to work on the day before the Basel–Amsterdam Bayesian Riverboat conference in 1993 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com