https://xkcd.com/2118/
Announcements
Course schedule updated.
● In-person lectures – PSYC G8
● This lecture: Gaussian Process 2; Week 10 Wed – guest lecture: ML for cyber security
● Tutorial in week 7 – assignment 1 walk-through during drop-in
● Tutorial in week 8 – enjoy quiz 2 :)
● Video assignment released, due in week 12.
● Assignment 2 out soon.
● Quiz 2 released, 9 multiple choice questions.
● Assignment 1 raw grades out, regrade-request open on Gradescope for 2 weeks.
Thank you for the anonymous survey input
Student A: “Math too hard!”
Student B: “Content is too easy. Does not feel challenging enough to be a 4000 level course. Make assessment more difficult, and material more in depth.”
Q: Skipped derivations / proofs?
Thanks. Let’s clarify some expectations for derivations / proofs.
Things that are important and appearing for the first time are generally covered; when a proof is skipped, it usually means we expect students to be able to follow the overall logic and the model-design arguments without the proof.
Q: Motivating the material?
Motivations tend to be mathematical in this course. We’re happy to help in a number of ways.
Q: Which parts are important?
Good question. We’ll try to clarify and re-emphasize this in the intro/outro slides. If it’s still unclear, do ask questions during class, during tutorials, or on Piazza.
Gaussian Processes
● Motivation
● Defining Gaussian Processes (GP)
● Kernel functions, sampling
● GP regression
● GP regression – predictive distribution
● Sampling algorithm and computational costs

Regression readings: Bishop, Chap 6.4 (6.4.1–6.4.3); GPML book (http://gaussianprocess.org/gpml/chapters/) Chap 1, 2.1, 2.2, 2.5
“In a geological inversion problem, properties such as temperature, conductivity, density, magnetic susceptibility and permeability are inferred from related observations such as gravity, magnetics and seismic reflexion. … In this paper we formulate geophysical inversion as a machine learning problem, and propose an approach based on Gaussian process regression that naturally provides both a predictive distribution over the inverted quantities and a principled method to fuse different types of observations. We apply our method to a real dataset from South Australia containing gravity and drill-hole data with the goal of characterizing rock densities for geothermal target exploration, and also to simulated validation data involving gravity, drill-hole and magnetic observations.”
Motivating scenarios
● SARCOS robotic arm
  ○ x: (7 joint positions, 7 joint velocities, 7 joint accelerations)
  ○ y: 7 joint torques
  ○ Learning: y ~ f(x)
  ○ Prediction: compute the torque needed to move the arm along a given trajectory
● Predict the yield of a chemical plant (y), given temperature, pressure, amount of catalyst, etc.
Hmm, we don’t know a functional form of y (in either input or kernel space), and we want a full uncertainty estimate in y.

Two ways to learn a function (regression) [GP book, intro]
● Restrict the class of functions we consider – rely on parameters of the function
● Give a prior probability to every possible function
  ○ but there are un-countably infinitely many input points (e.g. R, R², …) and un-countably infinitely many functions
  ○ “pretend” that it’s a very long vector
GPs: how to do this :)
Prediction function for regression
● Prediction function for new x
● Kernelized version
● What is the Bayesian version?

Bayesian linear regression (isotropic prior)
● Prediction function
● Is there a kernelized version of this? – can I have both kernels and a prior? (the standard forms are sketched below)
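For reference, a sketch of the forms being recapped here, written in Bishop-style notation; the symbols Φ (design matrix), λ (ridge coefficient), α and β (prior and noise precisions) are my labelling and may differ from the original slides:

\[
\begin{aligned}
\hat{y}(x) &= w^\top \phi(x), \quad w = (\Phi^\top \Phi + \lambda I)^{-1} \Phi^\top \mathbf{t} && \text{(regularized least squares)} \\
\hat{y}(x) &= \mathbf{k}(x)^\top (K + \lambda I)^{-1} \mathbf{t}, \quad K_{nm} = k(x_n, x_m) && \text{(kernelized / dual form)} \\
p(t \mid x, \mathbf{t}) &= \mathcal{N}\!\big(t \mid m_N^\top \phi(x),\ \sigma_N^2(x)\big) && \text{(Bayesian linear regression)} \\
m_N &= \beta S_N \Phi^\top \mathbf{t}, \quad S_N^{-1} = \alpha I + \beta \Phi^\top \Phi, \quad \sigma_N^2(x) = \beta^{-1} + \phi(x)^\top S_N \phi(x)
\end{aligned}
\]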
Motivation
In week 6, we talked about kernels in non-probabilistic settings. Can one use kernels in a probabilistic setting?

Starting again from linear regression
Claim: Y is Gaussian … only its mean and covariance matter.
A Gaussian Process is defined as a probability distribution over functions y(x) such that the set of values of y(x) evaluated at an arbitrary set of points x1, . . . , xN jointly have a Gaussian distribution.

More generally, a stochastic process y(x) is specified by giving the joint probability distribution for any finite set of values y(x1), . . . , y(xN) in a consistent manner.

● Gaussian processes are fully specified by their second-order statistics.
● Common choice: set the mean to zero, due to no prior knowledge of y(x).
● Input x in 2-D – a random field (also called Kriging in statistics).
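In symbols, with the common zero-mean choice (a minimal statement following Bishop 6.54–6.55; the notation is mine):

\[
\begin{aligned}
y(x) &\sim \mathcal{GP}\big(0,\ k(x, x')\big) \\
\mathbf{y} &= \big(y(x_1), \dots, y(x_N)\big)^\top \sim \mathcal{N}(\mathbf{0},\ K), \qquad K_{nm} = k(x_n, x_m)
\end{aligned}
\]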
Kernel functions in GP
● Implicitly defined by φ(x): k(x, x') = φ(x)ᵀφ(x')
● One can also explicitly define a kernel function directly
● The whole heap of kernel construction tricks applies … (an explicit example is sketched below)
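As an illustration only (not code from the slides), a squared-exponential kernel computed explicitly from inputs; the function name and default hyperparameters are my own choices:

```python
import numpy as np

def squared_exponential_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2)).

    X1: (N, D) array, X2: (M, D) array; returns the (N, M) Gram matrix.
    """
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return signal_var * np.exp(-0.5 * sq_dists / length_scale**2)
```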
[GP book, Fig 1.2(a)]
Q: How to sample from a GP? How are these figures produced?
[GP book, Sec A.2] http://gaussianprocess.org/gpml/chapters/RWA.pdf
Intuition: numpy.random.standard_normal draws samples from N(0, I); numpy.random.multivariate_normal turns such draws into samples from N(m, K) via a matrix square root of K.
Why Cholesky rather than eigen decomposition? Both are O(N³), but Cholesky has a smaller constant (≈ N³/3 flops) and exploits that K is symmetric positive definite.
https://math.stackexchange.com/questions/2840755/what-is-the-computation-time-of-lu-cholesky-and-qr-decomposition
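A minimal sketch of that sampling recipe, assuming the squared_exponential_kernel illustrated earlier; the jitter constant is an arbitrary small value added for numerical stability:

```python
import numpy as np

def sample_gp_prior(X, kernel, n_samples=3, jitter=1e-8, rng=None):
    """Draw functions from a zero-mean GP prior, evaluated at the rows of X.

    Uses K = L L^T (Cholesky): if u ~ N(0, I), then L @ u ~ N(0, K).
    """
    rng = np.random.default_rng() if rng is None else rng
    K = kernel(X, X)
    # Small jitter on the diagonal keeps the factorization numerically stable.
    L = np.linalg.cholesky(K + jitter * np.eye(len(X)))
    u = rng.standard_normal((len(X), n_samples))
    return L @ u  # each column is one sampled function evaluated at X

# e.g. produce figures in the style of GP book Fig 1.2(a)
X = np.linspace(-5, 5, 100)[:, None]
samples = sample_gp_prior(X, squared_exponential_kernel)
```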
Gaussian Processes
● Motivation
● Defining Gaussian Processes (GP)
● Kernel functions, sampling
● GP regression
● GP regression – predictive distribution
● Sampling algorithm and computational costs

Regression readings: Bishop, Chap 6.4 (6.4.1–6.4.3); GPML book (http://gaussianprocess.org/gpml/chapters/) Chap 1, 2.1, 2.2, 2.5
GP for regression
We covered representation and “training” (implicit) in GP. Let’s explicitly work out prediction.

Predictive distribution of tN+1 given t1:N
● The conditional mean and variance, expressed as a function of xN+1 (formulas below)
● Noise-less case: CN = KN
[GP book Chap 2.2]
[Bishop 6.4]
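For reference, the resulting formulas in Bishop’s notation (6.62, 6.66–6.67), where β is the noise precision:

\[
\begin{aligned}
C_N(x_n, x_m) &= k(x_n, x_m) + \beta^{-1} \delta_{nm}, \qquad
\mathbf{k} = \big(k(x_1, x_{N+1}), \dots, k(x_N, x_{N+1})\big)^\top \\
m(x_{N+1}) &= \mathbf{k}^\top C_N^{-1} \mathbf{t} \\
\sigma^2(x_{N+1}) &= c - \mathbf{k}^\top C_N^{-1} \mathbf{k}, \qquad c = k(x_{N+1}, x_{N+1}) + \beta^{-1}
\end{aligned}
\]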
Implementing GP regression [GP book Chap 2.2, A.4]
Cholesky decomposition: A = LLᵀ, with L lower-triangular.
To solve Ax = b, i.e. compute A⁻¹b for square A (x = A\b):
● First solve Ly = b: y = L\b
● Then solve Lᵀx = y: x = Lᵀ\y = Lᵀ\(L\b)
(a code sketch follows)
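A sketch of this recipe in NumPy/SciPy, in the spirit of GP book Algorithm 2.1; it assumes the hypothetical squared-exponential kernel above (any kernel function works) and a Gaussian noise variance noise_var:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gp_predict(X_train, t_train, X_test, kernel, noise_var=0.1):
    """Predictive mean and variance of GP regression via Cholesky solves."""
    N = len(X_train)
    C = kernel(X_train, X_train) + noise_var * np.eye(N)   # C_N = K_N + beta^{-1} I
    L = cholesky(C, lower=True)                            # C_N = L L^T, O(N^3) once

    # alpha = C_N^{-1} t, computed as L^T \ (L \ t) -- two triangular solves
    alpha = solve_triangular(L.T, solve_triangular(L, t_train, lower=True), lower=False)

    K_star = kernel(X_train, X_test)                        # (N, N_test)
    mean = K_star.T @ alpha                                 # k_*^T C_N^{-1} t

    V = solve_triangular(L, K_star, lower=True)             # V = L \ k_*, O(N^2) per test point
    var = np.diag(kernel(X_test, X_test)) + noise_var - np.sum(V**2, axis=0)
    return mean, var
```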
Bottleneck: GP regression vs Bayesian linear regression
● GP regression: compute C_N⁻¹ – O(N³) at training time, O(N²) for each test point
● Bayesian linear regression (with M basis functions): compute S_N – O(M³) at training time, O(M²) for each test point
Example application
● SARCOS robotic arm
  ○ x: (7 joint positions, 7 joint velocities, 7 joint accelerations)
  ○ y: 7 joint torques
  ○ Learning: y ~ f(x)
  ○ Prediction: compute the torque needed to move the arm along a given trajectory
● Model
  ○ Squared exponential covariance function (see the form below)
  ○ Separate length scales for each input dim, plus σ_f², σ_n²
  ○ Optimise marginal likelihood on a subset of data*
● Data: ~49K input-output pairs (44,484 for training, 4,449 for testing)
● Evaluation metrics
  ○ SMSE – Standardized mean square error
  ○ MSLL – Mean standardised log loss
*speed up with approximations, see Table 8.1
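The covariance referred to here is presumably the ARD form of the squared exponential (in the style of GP book eq. 2.31), with a separate length scale ℓ_d per input dimension plus signal and noise variances:

\[
k(x, x') = \sigma_f^2 \exp\!\Big(-\tfrac{1}{2} \sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{\ell_d^2}\Big) + \sigma_n^2\, \delta_{x x'}
\]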
Learning hyperparameters [GP book Chap 2.5]
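The quantity being optimised is the log marginal likelihood (Bishop 6.69), maximised with respect to the kernel hyperparameters θ, typically by gradient-based methods:

\[
\ln p(\mathbf{t} \mid \theta) = -\tfrac{1}{2} \ln |C_N| - \tfrac{1}{2} \mathbf{t}^\top C_N^{-1} \mathbf{t} - \tfrac{N}{2} \ln(2\pi)
\]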
Gaussian Processes
● Motivation
● Defining Gaussian Processes (GP)
● Kernel functions, sampling
● GP regression
● GP regression – predictive distribution
● Sampling algorithm and computational costs

Regression readings: Bishop, Chap 6.4 (6.4.1–6.4.3); GPML book (http://gaussianprocess.org/gpml/chapters/) Chap 1, 2.1, 2.2, 2.5