SML lecture (starting soon)
https://xkcd.com/605/
On the topic of extrapolation and train-test mismatch, see https://www.youtube.com/watch?v=es6p6NuxOnY and http://ciml.info/dl/v0_99/ciml-v0_99-ch08.pdf
Plan for Today
● ML 101: Polynomial curve fitting: model, loss/error function, over-fitting, regularisation
● Model selection
● Probabilities: sum rule, product rule, Gaussians – 1D, maximum likelihood estimates (MLE), bias-variance → and how this helps curve-fitting
● Bernoulli, Binomial, Exponential family distributions – will be in assignment
● Gaussians (multidimensional), various matrix identities, geometric intuitions
● Review: probabilities, derivatives and finding stationary points, eigenvalues and eigenvectors
About the book
The machine sees:
Our guess: M-th order polynomials
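For reference, the model in standard notation (as in Bishop Ch. 1): a polynomial of order M, linear in the coefficients w:

```latex
y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j
```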
Test error and learning curves
Training set: 10 points
Separate test set of 100 points
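A minimal sketch of this experiment, assuming the classic noisy sin(2πx) toy data (the slides' exact data is not reproduced here); numpy's polyfit stands in for the least-squares fit:

```python
# Fit M-th order polynomials to a small training set and compare
# RMS error on the training set vs. a larger held-out test set.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, t

x_train, t_train = make_data(10)   # training set: 10 points
x_test, t_test = make_data(100)    # separate test set: 100 points

def rms_error(w, x, t):
    # np.polyval takes coefficients ordered from highest to lowest degree
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

for M in [0, 1, 3, 9]:
    w = np.polyfit(x_train, t_train, deg=M)  # least-squares fit
    print(f"M={M}: train RMS = {rms_error(w, x_train, t_train):.3f}, "
          f"test RMS = {rms_error(w, x_test, t_test):.3f}")
```

For large M the training error drops towards zero while the test error grows: the over-fitting picture the learning curves illustrate.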
Remedy 1: more data
Remedy 2: regularisation
Minimize regularised error function
(more in Bayesian regression next week)
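The regularised sum-of-squares error being minimized, where λ governs the relative importance of the penalty (the weight-decay/ridge form):

```latex
\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2
```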
Model selection (an empirical view)
Minimizing square error / maximizing data likelihood can be a poor indication of performance on new data (generalisation) – cause: overfitting.
In the curve-fitting example: the order of the polynomial controls the number of free parameters in the model and thereby governs the model complexity.
Training set – used to fit model parameters
Validation set – used to select the model (complexity)
Testing set – used to estimate how well the model generalises
Question: how reliable are the estimates of validation and generalisation performance?
[source: MML book]
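As a sketch of the procedure above (toy data, split sizes, and candidate orders are assumptions of this writeup, not the slides):

```python
# Empirical model selection with a three-way split:
# fit on the training set, choose the polynomial order on the
# validation set, report generalisation error on the test set.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

x_tr, x_val, x_te = x[:100], x[100:150], x[150:]
t_tr, t_val, t_te = t[:100], t[100:150], t[150:]

def rms_error(w, x, t):
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

fits = {M: np.polyfit(x_tr, t_tr, deg=M) for M in range(10)}
best_M = min(fits, key=lambda M: rms_error(fits[M], x_val, t_val))
print(f"selected M = {best_M}, "
      f"test RMS = {rms_error(fits[best_M], x_te, t_te):.3f}")
```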
● Polynomial curve fitting: model, loss/error function, over-fitting, regularisation
● Model selection
● Probabilities: sum rule, product rule, Gaussians – 1D, maximum likelihood estimates (MLE), bias-variance → and how this helps curve-fitting
● Bernoulli, Binomial, Exponential family distributions
● Gaussians (multidimensional), various matrix identities, geometric intuitions
● Review: probabilities, derivatives and finding stationary points, eigenvalues and eigenvectors
Bayes Theorem
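The two basic rules and the theorem they imply, in symbols:

```latex
p(X) = \sum_{Y} p(X, Y) \qquad \text{(sum rule)}
p(X, Y) = p(Y \mid X)\, p(X) \qquad \text{(product rule)}
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)} \qquad \text{(Bayes' theorem)}
```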
Continuous variables
Bayes' theorem, restated (Sec. …)
Expectations, variance, covariance
For review: what is the expectation taken over? The probability distribution p is often left implicit.
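The definitions under review, for a continuous random variable x ~ p(x):

```latex
\mathbb{E}[x] = \int x\, p(x)\, \mathrm{d}x, \qquad \operatorname{var}[x] = \mathbb{E}\!\left[ (x - \mathbb{E}[x])^2 \right], \qquad \operatorname{cov}[x, y] = \mathbb{E}\!\left[ (x - \mathbb{E}[x])(y - \mathbb{E}[y]) \right]
```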
Question: for a random variable x ~ p(x), do E[x] and var[x] always exist?
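A quick numerical probe of this question (this example is an addition of this writeup, not from the slides): for the standard Cauchy distribution neither E[x] nor var[x] exists, and the running sample mean never settles down:

```python
# The standard Cauchy distribution has no finite mean or variance,
# so the running sample mean keeps jumping regardless of sample size.
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_cauchy(1_000_000)
for n in (10, 1_000, 100_000, 1_000_000):
    print(f"mean of first {n:>9,} samples: {samples[:n].mean():10.3f}")
```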
The Gaussian Distribution
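Its density, parameterised by mean μ and variance σ²:

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}
```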
Maximum likelihood for univariate Gaussian
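Maximising the log-likelihood of N i.i.d. observations x₁, …, x_N gives the standard estimates:

```latex
\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)
\mu_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{\text{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\text{ML}})^2
```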
Maximum likelihood and bias
In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called an unbiased estimator. "Bias" is an objective property of an estimator.
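The relevant example here: for the univariate Gaussian, the ML estimate of the mean is unbiased, but the ML estimate of the variance systematically underestimates the true variance:

```latex
\mathbb{E}[\mu_{\text{ML}}] = \mu, \qquad \mathbb{E}[\sigma^2_{\text{ML}}] = \frac{N-1}{N}\, \sigma^2
```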
“Bias” is not necessarily bad!
Q: does high bias/variance mean that the model is overfitted, or vice versa?
Bringing it together: curve fitting by maximum likelihood – estimate β
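Assuming Gaussian observation noise with precision β (inverse variance), the likelihood of a target t and the resulting ML estimate of β are:

```latex
p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\!\left( t \mid y(x, \mathbf{w}),\, \beta^{-1} \right), \qquad \frac{1}{\beta_{\text{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}_{\text{ML}}) - t_n \right\}^2
```

Maximising this likelihood in w recovers exactly the sum-of-squares fit from earlier.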
Curve-fitting: predictive distribution (will cover next week in Bayesian linear regression)
● ML 101: Polynomial curve fitting: model, loss/error function, over-fitting, regularisation
● Probabilities: sum rule, product rule, Gaussians – 1D, MLE, bias-variance → and how this helps curve-fitting
● Bernoulli, Binomial
● Gaussians (multidimensional), various matrix identities, geometric intuitions
● Exponential family
● Review: vectors, probabilities, derivatives and finding stationary points, eigenvalues and eigenvectors
Bernoulli to Binomial for increasingly large N
Gaussians again: why?
n coin tosses with probability p
CLT – central limit theorem
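A sketch of the Binomial → Gaussian picture for n coin tosses with heads probability p (scipy and the chosen p, n values are assumptions of this writeup):

```python
# Compare the Bin(n, p) pmf against the CLT Gaussian N(np, np(1-p))
# evaluated at the integers: the gap shrinks as n grows.
import numpy as np
from scipy import stats

p = 0.3
for n in (10, 100, 1000):
    k = np.arange(n + 1)
    binom = stats.binom.pmf(k, n, p)
    gauss = stats.norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    gap = np.max(np.abs(binom - gauss))
    print(f"n={n:5d}: max |Binomial pmf - Gaussian pdf| = {gap:.4f}")
```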
Gaussians – multidimensional
Eigendecomposition of the covariance matrix
Mahalanobis distance
Contours of general 2-D Gaussians – rotated ellipses
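In formulas, for the D-dimensional case: the density, the squared Mahalanobis distance Δ², and the eigendecomposition of the covariance matrix:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}
\Delta^2 = (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), \qquad \boldsymbol{\Sigma} = \sum_{i=1}^{D} \lambda_i\, \mathbf{u}_i \mathbf{u}_i^{\top}
```

Surfaces of constant Δ² are ellipsoids aligned with the eigenvectors uᵢ, with axis lengths set by the eigenvalues λᵢ: hence the rotated ellipses.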
The Exponential family
Beyond Gaussians: what is a class of 'nice' distributions for statistical machine learning?
● More expressive
● “Easy” to estimate
● Normalisation
● MLE and sufficient stats
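The family in question, in Bishop's notation, where g(η) ensures normalisation and u(x) are the sufficient statistics:

```latex
p(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x})\, g(\boldsymbol{\eta}) \exp\left\{ \boldsymbol{\eta}^{\top} \mathbf{u}(\mathbf{x}) \right\}
```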
Exponential family: a note – some treatments set h(x) = 1 and write g(η) = exp(−ψ(η)), where ψ(η) is the log-partition function.
Assignment 1
About these lecture notes:
● They are designed to be a visual aid, but not reading material (you have the book for that).
● They are generally focused on derivations + plots, and less on the "story" of the model.
● I do not aim to produce new equations nor new plots (they don't necessarily help you learn about the data/plots in the book 🙂).
● Reasoning about ML models on toy data is a core skill of a good ML engineer.
● Designing appropriate toy data is a core research skill in ML.
● Polynomial curve fitting: model, loss/error function, over-fitting, regularisation
● Model selection
● Probabilities: sum rule, product rule, Gaussians – 1D, MLE, bias-variance → and how this helps curve-fitting
● Gaussians (multidimensional), various matrix identities, geometric intuitions
● Bernoulli, Binomial, Exponential family distributions
● Review: probabilities, derivatives and finding stationary points, eigenvalues and eigenvectors