
Bayesian Reasoning and Machine Learning
© 2007, 2008, 2009, 2010

Notation List


A calligraphic symbol typically denotes a set of random variables

dom(x)            Domain of a variable
x = x             The variable x is in the state x
p(x = tr)         Probability of event/variable x being in the state true
p(x = fa)         Probability of event/variable x being in the state false
p(x, y)           Probability of x and y
p(x ∩ y)          Probability of x and y
p(x ∪ y)          Probability of x or y
p(x|y)            The probability of x conditioned on y
∫x f(x)           For continuous variables this is shorthand for ∫ f(x) dx; for discrete variables it means summation over the states of x, Σx f(x)
I[x = y]          Indicator: has value 1 if x = y, 0 otherwise
pa(x)             The parents of node x
ch(x)             The children of node x
ne(x)             Neighbours of node x
X ⊥ Y | Z         Variables X are independent of variables Y conditioned on variables Z
X ⊤ Y | Z         Variables X are dependent on variables Y conditioned on variables Z
dim x             For a discrete variable x, this denotes the number of states x can take
⟨f(x)⟩p(x)        The average of the function f(x) with respect to the distribution p(x)
δ(a, b)           Delta function: for discrete a, b this is the Kronecker delta δa,b, and for continuous a, b the Dirac delta function δ(a − b)
dim x             The dimension of the vector/matrix x
♯(x = s, y = t)   The number of times variable x is in state s and y in state t simultaneously
D                 Dataset
                  Data index
                  Number of dataset training points
                  The number of times variable x is in state y
                  Sample covariance matrix
                  The logistic sigmoid 1/(1 + exp(−x))
                  The (Gaussian) error function
                  The set of unique neighbouring edges on a graph
                  The m × m identity matrix
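As a small illustration of how some of this notation maps onto code, the following MATLAB fragment (a minimal sketch using invented values for p(x) and f(x); it is not part of the BRMLtoolbox) computes the average ⟨f(x)⟩p(x) of a function of a discrete variable, an indicator I[x = y] and the logistic sigmoid:

    % distribution over a discrete variable x with dom(x) = {1,2,3} (values invented)
    p = [0.2 0.5 0.3];
    f = [1 4 9];              % a function f(x) evaluated at each state of x
    avgf = sum(p.*f);         % <f(x)>_p(x) = sum over x of p(x) f(x)
    ind  = double(2==3);      % I[x=y]: 1 if the states agree, 0 otherwise
    sig  = 1/(1+exp(-2));     % logistic sigmoid 1/(1+exp(-x)) evaluated at x = 2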

Machine Learning
The last decade has seen considerable growth in interest in Artificial Intelligence and Machine Learning. In the broadest sense, these fields aim to ‘learn something useful’ about the environment within which the organism operates. How gathered information is processed leads to the development of algorithms – how to process high-dimensional data and deal with uncertainty. In the early stages of research in Machine Learning and related areas, similar techniques were discovered in relatively isolated research communities. Whilst not all techniques have a natural description in terms of probability theory, many do, and it is the framework of Graphical Models (a marriage between graph and probability theory) that has enabled the understanding and transference of ideas from statistical physics, statistics, machine learning and information theory. To this extent it is now reasonable to expect that machine learning researchers are familiar with the basics of statistical modelling techniques.
This book concentrates on the probabilistic aspects of information processing and machine learning. Certainly no claim is made that this approach is correct, or that it is the only useful one. Indeed, one might counter that it is unnecessary since “biological organisms don’t use probability theory”. Whether or not this is the case, it is undeniable that the framework of graphical models and probability has helped fuel the explosion of new algorithms and models in the machine learning community. One should also be clear that the Bayesian viewpoint is not the only way to describe machine learning and information processing. Bayesian and probabilistic techniques really come into their own in domains where uncertainty is a necessary consideration.
The structure of the book
One aim of part I of the book is to encourage Computer Science students into this area. A particular difficulty that many modern students face is a limited formal training in calculus and linear algebra, meaning that the minutiae of continuous and high-dimensional distributions can turn them away. In beginning with probability as a form of reasoning system, we hope to show the reader how ideas from logical inference and dynamic programming that they may be more familiar with have natural parallels in a probabilistic context. In particular, Computer Science students are used to viewing the algorithm as the core object of study. However, it is more common in machine learning to view the model as core, with its implementation being secondary. From this perspective, understanding how to translate a mathematical model into a piece of computer code is central.
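To give a flavour of what translating a model into code can mean in the simplest discrete setting, the sketch below (in MATLAB, with an invented joint table; it is not taken from the book’s toolbox) represents a joint distribution p(x, y) as a matrix and forms marginals and a conditional by summation and normalisation:

    pxy = [0.3 0.1; 0.2 0.4];        % joint p(x,y): rows index x, columns index y
    px  = sum(pxy,2);                % marginal p(x) = sum over y of p(x,y)
    py  = sum(pxy,1);                % marginal p(y) = sum over x of p(x,y)
    px_given_y1 = pxy(:,1)/py(1);    % conditional p(x|y=1) = p(x,y=1)/p(y=1)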
Part II introduces the statistical background needed to understand continuous distributions and how learning can be viewed from a probabilistic framework. Part III discusses machine learning topics. Certainly some readers will raise an eyebrow to see their favourite statistical topic listed under machine learning. A difference in viewpoint between statistics and machine learning lies in what kinds of systems we would ultimately like to construct (machines capable of ‘human/biological’ information processing tasks), rather than in some of the techniques. This section of the book is therefore what I feel would be useful for machine learners to know.
Part IV discusses dynamical models in which time is explicitly considered. In particular, such models are treated as forms of graphical model, which helps emphasise what the model is, rather than focusing on them as ‘filters’, as is more traditional in the engineering literature.
Part V contains a brief introduction to approximate inference techniques, including both stochastic (Monte Carlo) and deterministic (variational) techniques.
The references in the book are not generally intended as crediting authors with ideas, nor are they always to the most authoritative works. Rather, the references are largely to works which are at a level reasonably consistent with the book and which are readily available.
Whom this book is for
My primary aim was to write a book for final year undergraduates and graduates without significant experience in calculus and mathematics that gave an inroad into machine learning, much of which is currently phrased in terms of probabilities and multi-variate distributions. The aim was to encourage students that apparently unexciting statistical concepts are actually highly relevant for research into making intelligent systems that interact with humans in a natural manner. Such a research programme inevitably requires dealing with high-dimensional data, time-series, networks, logical reasoning, modelling and uncertainty.
Other books in this area
Whilst there are several excellent textbooks in this area, none currently meets the requirements that I personally need for teaching, namely one that contains demonstration code and gently introduces probability and statistics before leading on to more advanced topics in machine learning. This led me to build on my lecture material from courses given at Aston, Edinburgh, EPFL and UCL, and to expand the demonstration software considerably. The book is due for publication by Cambridge University Press in 2010.
The literature on machine learning is vast, as is the overlap with the relevant areas of statistics, engineering and other physical sciences. In this respect, it is difficult to isolate particular areas, and this book is an attempt to integrate parts of the machine learning and statistics literature. The book is written in an informal style at the expense of rigour and detailed proofs. As an introductory textbook, topics are naturally covered to a somewhat shallow level and the reader is referred to more specialised books for deeper treatments. Amongst my favourites are:
• Graphical models
– Graphical models by S. Lauritzen, Oxford University Press, 1996.
– Bayesian Networks and Decision Graphs by F. Jensen and T. D. Nielsen, Springer, 2007.
– Probabilistic Networks and Expert Systems by R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D. J. Spiegelhalter, Springer, 1999.
– Probabilistic Reasoning in Intelligent Systems by J. Pearl, Morgan Kaufmann, 1988.
– Graphical Models in Applied Multivariate Statistics by J. Whittaker, Wiley, 1990.
– Probabilistic Graphical Models: Principles and Techniques by D. Koller and N. Friedman, MIT Press, 2009.
• Machine Learning and Information Processing
– Information Theory, Inference and Learning Algorithms by D. J. C. MacKay, Cambridge University Press, 2003.
– Pattern Recognition and Machine Learning by C. M. Bishop, Springer, 2006.
– An Introduction to Support Vector Machines by N. Cristianini and J. Shawe-Taylor, Cambridge University Press, 2000.
– Gaussian Processes for Machine Learning by C. E. Rasmussen and C. K. I. Williams, MIT press, 2006.
How to use this book
Part I would be suitable for an introductory course on Graphical Models with a focus on inference. Part II contains enough material for a short lecture course on learning in probabilistic models. Part III is reasonably self-contained and would be suitable for a course on Machine Learning from a probabilistic perspective, particularly combined with the dynamical models material in part IV. Part V would be suitable for a short course on approximate inference.
Accompanying code
The MATLAB code is provided to help readers see how mathematical models translate into actual code. The code is not meant to be an industrial strength research tool, but rather a reasonably lightweight toolbox that enables the reader to play with concepts in graph theory, probability theory and machine learning. In an attempt to retain readability, no extensive error and/or exception handling has been included. At the moment the code contains basic routines for manipulating discrete variable distributions, along with a set of routines that are more concerned with continuous variable machine learning. One could in principle extend the ‘graphical models’ part of the code considerably to support continuous variables. Limited support for continuous variables is currently provided so that, for example, inference in the linear dynamical system may be written in terms of operations on Gaussian potentials. However, in general, potentials on continuous variables need to be manipulated with care and often specialised routines are required to ensure numerical stability.
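As a rough indication of the kind of elementary table operation involved (this is only a hypothetical sketch and does not reflect the actual BRMLtoolbox interface), two discrete potentials over the same variable can be multiplied elementwise and renormalised:

    phi1 = [2 1 1];              % unnormalised potential phi1(x) over three states of x
    phi2 = [1 3 2];              % a second potential phi2(x) over the same variable
    prodphi = phi1.*phi2;        % elementwise product phi1(x)*phi2(x)
    p = prodphi/sum(prodphi);    % normalise to obtain a distribution over x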
Acknowledgements
Many people have helped this book along the way, either in terms of reading, feedback, general insights, allowing me to present their work, or just plain motivation. Amongst these I would like to thank Massimiliano Pontil, Vladimir Kolmogorov, Yuri Boykov, Cemgil, Serafín Moral and many other colleagues. I would also like to thank the many students that have helped improve the material during lectures over the years. I’m particularly grateful to the author of the Lightspeed toolbox for allowing parts of it to be bundled with the BRMLtoolbox, and am similarly indebted to the author of the GraphLayout package.
A final thankyou to my family and friends.
The code along with an electronic version of the book is available from
http://www.cs.ucl.ac.uk/staff/D.Barber/brml
Instructors seeking solutions to the exercises can find information at the website, along with additional teaching material. The website also contains a feedback form and errata list.

Contents

I Inference in Probabilistic Models

1 Probabilistic Reasoning
  1.1 Probability Refresher
    1.1.1 Probability Tables
    1.1.2 Interpreting Conditional Probability
  1.2 Probabilistic Reasoning
  1.3 Prior, Likelihood and Posterior
    1.3.1 Two dice: what were the individual scores?
  1.4 Further worked examples
  1.5 Code
    1.5.1 Basic Probability code
    1.5.2 General utilities
    1.5.3 An example
  1.6 Notes
  1.7 Exercises

2 Basic Graph Concepts
  2.1 Graphs
    2.1.1 Spanning tree
  2.2 Numerically Encoding Graphs
    2.2.1 Edge list
    2.2.2 Adjacency matrix
    2.2.3 Clique matrix
  2.3 Code
    2.3.1 Utility routines
  2.4 Exercises

3 Belief Networks
  3.1 Probabilistic Inference in Structured Distributions
  3.2 Graphically Representing Distributions
    3.2.1 Constructing a simple Belief network: wet grass
    3.2.2 Uncertain evidence
  3.3 Belief Networks
    3.3.1 Conditional independence
    3.3.2 The impact of collisions
    3.3.3 d-Separation
    3.3.4 d-Connection and dependence
    3.3.5 Markov equivalence in belief networks
    3.3.6 Belief networks have limited expressibility
  3.4 Causality
    3.4.1 Simpson’s paradox
    3.4.2 Influence diagrams and the do-calculus
    3.4.3 Learning the direction of arrows
  3.5 Parameterising Belief Networks
  3.6 Further Reading
  3.7 Code
    3.7.1 Naive inference demo
    3.7.2 Conditional independence demo
    3.7.3 Utility routines
  3.8 Exercises

4 Graphical Models
  4.1 Graphical Models
  4.2 Markov Networks
    4.2.1 Markov properties
    4.2.2 Gibbs networks
    4.2.3 Markov random fields
    4.2.4 Conditional independence using Markov networks
    4.2.5 Lattice Models
  4.3 Chain Graphical Models
  4.4 Expressiveness of Graphical Models
  4.5 Factor Graphs
    4.5.1 Conditional independence in factor graphs
  4.6 Notes
  4.7 Code
  4.8 Exercises

5 Efficient Inference in Trees
  5.1 Marginal Inference
    5.1.1 Variable elimination in a Markov chain and message passing
    5.1.2 The sum-product algorithm on factor graphs
    5.1.3 Computing the marginal likelihood
    5.1.4 The problem with loops
  5.2 Other Forms of Inference
    5.2.1 Max-Product
    5.2.2 Finding the N most probable states
    5.2.3 Most probable path and shortest path
    5.2.4 Mixed inference
  5.3 Inference in Multiply-Connected Graphs
    5.3.1 Bucket elimination
    5.3.2 Loop-cut conditioning
  5.4 Message Passing for Continuous Distributions
  5.5 Notes
  5.6 Code
    5.6.1 Factor graph examples
    5.6.2 Most probable and shortest path
    5.6.3 Bucket elimination
    5.6.4 Message passing on Gaussians
  5.7 Exercises

6 The Junction Tree Algorithm
  6.1 Clustering Variables
    6.1.1 Reparameterisation
  6.2 Clique Graphs
    6.2.1 Absorption
    6.2.2 Absorption schedule on clique trees
  6.3 Junction Trees
    6.3.1 The running intersection property
  6.4 Constructing a Junction Tree for Singly-Connected Distributions
    6.4.1 Moralisation
    6.4.2 Forming the clique graph
    6.4.3 Forming a junction tree from a clique graph
    6.4.4 Assigning potentials to cliques
  6.5 Junction Trees for Multiply-Connected Distributions
    6.5.1 Triangulation algorithms
  6.6 The Junction Tree Algorithm
    6.6.1 Remarks on the JTA
    6.6.2 Computing the normalisation constant of a distribution
    6.6.3 The marginal likelihood
  6.7 Finding the Most Likely State
  6.8 Reabsorption: Converting a Junction Tree to a Directed Network
  6.9 The Need For Approximations
    6.9.1 Bounded width junction trees
  6.10 Code
    6.10.1 Utility routines
  6.11 Exercises

7 Making Decisions
  7.1 Expected Utility
