Introduction
Data and Knowledge
What have we done?
• How to quantify beliefs
• How to integrate beliefs with data
• How to reach updated beliefs
What have we not done?
• How to extract and formulate domain-specific knowledge mathematically
• How to iteratively improve our models
• How to interpret our results and make decisions
• How to perform statistical inference when we have intractable computations
Approximate Posterior Inference
• When we have non-conjugate models the marginal likelihood cannot be computed in closed form
• We have to approximate the computation
• Deterministic approximations (sketched below)
• Stochastic approximations (sketched below)
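A minimal sketch of both flavours on a non-conjugate model, in Python; the 1-D logistic-regression setup, data, prior, proposal width and iteration counts are all assumptions made purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)                                                      # assumed inputs
y = (rng.uniform(size=50) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)   # assumed labels

def log_joint(theta):
    # log p(y, theta) = log-likelihood + log-prior, i.e. the unnormalised posterior
    logits = theta * x
    log_lik = np.sum(y * logits - np.log1p(np.exp(logits)))   # Bernoulli likelihood
    return log_lik - 0.5 * theta ** 2                          # N(0, 1) prior on theta

# Stochastic approximation: random-walk Metropolis-Hastings sampling
samples, theta = [], 0.0
for _ in range(5000):
    prop = theta + 0.3 * rng.normal()                          # symmetric proposal
    if np.log(rng.uniform()) < log_joint(prop) - log_joint(theta):
        theta = prop
    samples.append(theta)

# Deterministic approximation: Laplace, a Gaussian centred at the posterior mode
grid = np.linspace(-3.0, 5.0, 2001)
mode = grid[np.argmax([log_joint(t) for t in grid])]           # crude mode search
h = 1e-4
curv = (log_joint(mode + h) - 2 * log_joint(mode) + log_joint(mode - h)) / h ** 2
print("MCMC mean: %.3f   Laplace: N(%.3f, %.3f)" % (np.mean(samples[1000:]), mode, -1.0 / curv))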
Point Estimates
• Bayesian Inference
  p(θ | D) = p(D | θ) p(θ) / p(D)
• Maximum Likelihood (ML)
  θ̂ = argmax_θ p(D | θ)
• Maximum-a-Posteriori (MAP)
  θ̂ = argmax_θ p(D | θ) p(θ)
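A minimal numerical sketch of the three estimates on an assumed coin-flip example; the data and the Beta(2, 2) prior are made up purely for illustration:

import numpy as np
from scipy import stats

data = np.array([1, 1, 1, 0, 1, 1, 0, 1])     # assumed D: 6 heads, 2 tails
a, b = 2.0, 2.0                               # assumed Beta(a, b) prior on theta
n, k = len(data), int(data.sum())

theta_ml = k / n                              # argmax_theta p(D | theta)
theta_map = (k + a - 1) / (n + a + b - 2)     # argmax_theta p(D | theta) p(theta)
posterior = stats.beta(a + k, b + n - k)      # the full posterior p(theta | D)

print("ML: %.3f   MAP: %.3f   posterior mean: %.3f"
      % (theta_ml, theta_map, posterior.mean()))

The point estimates are single numbers, while the Bayesian answer is a whole distribution over θ.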
Model Selection
p(θ | y) = p(y | θ) p(θ) / ∫ p(y | θ) p(θ) dθ
Likelihood: how much evidence is there in the data for a specific hypothesis?
Prior: what are my beliefs about the different hypotheses?
Posterior: what is my updated belief after having seen the data?
Evidence: what is my belief about the data?
The Computation: the Evidence
p(y) = ∫ p(y | θ) p(θ) dθ
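For the coin-flip example above the evidence integral can be computed directly by numerical quadrature; a minimal sketch, with the counts and prior again assumed only for illustration:

from scipy import stats, integrate

n, k = 8, 6                     # assumed data: 8 flips, 6 heads
a, b = 2.0, 2.0                 # assumed Beta(a, b) prior on theta

def integrand(theta):
    likelihood = theta ** k * (1.0 - theta) ** (n - k)   # p(y | theta)
    prior = stats.beta(a, b).pdf(theta)                  # p(theta)
    return likelihood * prior

evidence, _ = integrate.quad(integrand, 0.0, 1.0)        # p(y) = ∫ p(y | theta) p(theta) dtheta
print("evidence p(y) =", evidence)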
Regression Model
Which Parametrisation
• Should I use a line, polynomial, quadratic basis function? • How many basis functions should I use?
• Likelihood won’t help me
• How do we proceed?
Regression Models
Linear model:
p(yi | xi, w) = N(w0 + w1 · xi, β⁻¹)
Basis-function model:
p(yi | xi, w) = N(Σj wj φj(xi), β⁻¹)
Evidence:
p(Y) = ∫ p(Y | W) p(W) dW
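For this model, with a Gaussian prior on the weights and Gaussian noise, the evidence integral has a closed form, so it can be used to compare parametrisations directly. The sketch below uses the standard closed-form log evidence for Bayesian linear regression; the data, the prior precision α and the noise precision β are assumptions chosen only for illustration:

import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 * x - 1.2 * x ** 2 + 0.05 * rng.normal(size=x.size)   # assumed quadratic data

alpha, beta = 1.0, 1.0 / 0.05 ** 2     # prior precision and noise precision (assumed known)

def log_evidence(order):
    Phi = np.vander(x, order + 1, increasing=True)       # polynomial basis functions
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi           # posterior precision of the weights
    m = beta * np.linalg.solve(A, Phi.T @ y)             # posterior mean of the weights
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta) - E
            - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * N * np.log(2 * np.pi))

for order in range(5):
    print("polynomial order", order, "log evidence %.1f" % log_evidence(order))

With data generated from a quadratic, the evidence should favour the quadratic model: more flexible polynomials fit the data at least as well, but are penalised for spreading their prior probability over many possible datasets. This is exactly the question the likelihood alone cannot answer.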
Probabilities are a zero-sum game
Model Selection¹
¹ PhD Thesis (MacKay, 1991)
Occam's Razor
Definition (Occam's Razor)
"All things being equal, the simplest solution tends to be the best one" – William of Ockham
What is Simple?²
² https://www.imdb.com/title/tt8132700/
MacKay, 1991
Hypothesis Spaces
Composite Functions
f(x) = fL ∘ fL−1 ∘ ··· ∘ f0(x)
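A minimal Python sketch of this structure; the particular layer functions below are arbitrary choices made only for illustration:

from functools import reduce
import numpy as np

def compose(*fs):
    # compose(f2, f1, f0)(x) == f2(f1(f0(x)))
    return reduce(lambda f, g: lambda x: f(g(x)), fs)

layers = [
    lambda x: np.tanh(x),          # f0: a simple non-linearity
    lambda x: 2.0 * x - 1.0,       # f1: an affine map
    lambda x: np.maximum(x, 0.0),  # f2: a rectifier
]
f = compose(*reversed(layers))     # f = f2 ∘ f1 ∘ f0

print(f(np.array([-1.0, 0.0, 1.0])))

Each layer on its own is simple; the composite f is already a small neural network.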
What Do Compositions Do?
Im(f)[X] = {f(x) | x ∈ X}
Kern(f)[X] = {(x, x′) | f(x) = f(x′), (x, x′) ∈ X × X}
What Do Compositions Do?
Kern(f1) ⊆ Kern(fk−1 ∘ ··· ∘ f2 ∘ f1) ⊆ Kern(fk ∘ fk−1 ∘ ··· ∘ f2 ∘ f1)
Im(fk ∘ fk−1 ∘ ··· ∘ f2 ∘ f1) ⊆ Im(fk ∘ fk−1 ∘ ··· ∘ f2) ⊆ ··· ⊆ Im(fk)
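These inclusions are easy to verify numerically on a finite set; a small sketch with two made-up functions:

def image(f, X):
    return {f(x) for x in X}

def kern(f, X):
    return {(x, xp) for x in X for xp in X if f(x) == f(xp)}

X = range(10)
f1 = lambda x: x // 2            # merges neighbouring points
f2 = lambda x: min(x, 3)         # clips large values
g = lambda x: f2(f1(x))          # g = f2 ∘ f1

print(kern(f1, X) <= kern(g, X))            # True: Kern(f1) ⊆ Kern(f2 ∘ f1)
print(image(g, X) <= image(f2, range(5)))   # True: Im(f2 ∘ f1) ⊆ Im(f2)

Composing can only merge inputs further (the kernel grows) and can only throw outputs away (the image shrinks).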
Why would you ever want this?
y1 = {x1,x2,x3,x4} y2 = {x5,x6,x7,x8,x9} y3 = {x10} ···
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
Why would you ever want this?
y1 = {x1,x2,x5,x6,x7} y2 = {x3,x4} y3 = {x8,x9,x10} ···
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
Why would you ever want this?
y1 = {x1,x2,x3,x5,x8,x10} y2 = {x4,x9} y3 = {x6,x7} ···
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
Composite Functions
• Simple functions in composition give rise to complicated composite behaviour
• Small changes in early compositions give rise to large changes in composite behaviour
• Over-parametrisation gives rise to symmetries (a tiny sketch follows below)
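A tiny sketch of such a symmetry, using an assumed two-factor linear model: rescaling one factor and inverse-rescaling the other leaves the composite function, and therefore the training objective, unchanged.

a, b, c = 3.0, -2.0, 10.0
f  = lambda x: a * (b * x)                 # original parametrisation
fc = lambda x: (a / c) * ((b * c) * x)     # rescaled parameters, same function

print(f(1.7), fc(1.7))                     # identical outputs for every input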
Why would you ever want this?
y1 = {x1,x2,x3,x5,x8,x10}  y2 = {x4,x9}  y3 = {x6,x7}  ···
x1 x2 x3 x4? x5 x6 x7 x8 x9 x10?
Why Compositional Functions?
• Compositional functions cannot do more
• Compositional functions introduce symmetries in the objective
• These symmetries turn out to be excellent for optimisation
• We are not really sure why
Is this useful?
“A theory that explains everything, explains nothing” – Karl Popper, The Logic of Scientific Discovery
Data and Knowledge
• Machine Learning might look like it is changing very quickly
• it is not
• Remember to attribute the advances to the right thing
• we are mainly using statistical methods that are decades if not centuries old
• however, we have access to vast amounts of data
• it turns out that with very large volumes of data a lot of problems are a lot easier than we thought
What to do next?
• Define a project; do not do ML in isolation; find something that you want to solve that has data
• Understand the problem in depth
• Do not decide on methods, do not start writing code, but extract as much knowledge as you have
• Formulate the knowledge mathematically
• What models exist that can use this data?