https://xkcd.com/388/
Announcements
In person lecture: Wed 16 March, PHYS T (will try simulcast in Teams, wish us luck!)
https://studentvip.com.au/anu/main/maps/142757
Quiz 1 next week, due Thu (releasing by Mon)
Linear models for Classification
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- Laplace approximation (Bayesian logistic regression)
Origin of the logistic function, logistic regression and how it connects to perceptron
Image from Wikipedia
Representing class labels
- Binary: meeting in person/remote, pass/fail, benign/malignant, should be given credit (Y/N), …
- Multiclass: Iris flowers, object classes in photos, natural language, …
- Other representations abound, e.g. structured (time series, sequences – PRML Chapter 13, graphs), Sec 4.1.5 (an output representation that connects the Fisher discriminant to least squares), Sec 4.1.7 perceptrons, SVMs
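Several later results (least squares on targets, the generative MLE) are written in terms of 1-of-K target vectors; here is a minimal sketch of that encoding, assuming numpy (the function name is illustrative, not from the slides).

```python
import numpy as np

def one_of_k(labels, num_classes):
    """Encode integer class labels as 1-of-K (one-hot) target vectors:
    class k maps to a vector with 1 in position k and 0 elsewhere."""
    T = np.zeros((len(labels), num_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

print(one_of_k([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```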
A primer on decision theory
Inference: compute the posterior p(t|x), e.g. from a set of training data.
Decision theory: take a specific action based on our understanding of the values t is likely to take.
(end of page 38) “We shall see that the decision stage is generally very simple, even trivial, once we have solved the inference problem.”
— many counter-examples! e.g. multiple medical tests, test + quarantine for covid
— frontiers of theoretical and practical ML, e.g. active/online learning, ML and economics
Minimise error
What are the potential problems with minimising classification error?
Example scenario: medical diagnosis
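For two classes, minimising classification error means minimising (PRML Eq. 1.78):

```latex
p(\text{mistake}) = p(x \in \mathcal{R}_1, \mathcal{C}_2) + p(x \in \mathcal{R}_2, \mathcal{C}_1)
= \int_{\mathcal{R}_1} p(x, \mathcal{C}_2)\,\mathrm{d}x + \int_{\mathcal{R}_2} p(x, \mathcal{C}_1)\,\mathrm{d}x ,
```

which is minimised by assigning each $x$ to the class with the larger posterior $p(\mathcal{C}_k \mid x)$. In the medical example this treats a missed cancer and a false alarm as equally costly, which is exactly the problem the next slide addresses.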
Minimise loss
Decision: $\arg\min_j \sum_k L_{kj}\, p(\mathcal{C}_k \mid x)$
Brodersen, K. H.; Ong, C. S.; Stephan, K. E.; Buhmann, J. M., 2010. The balanced accuracy and its posterior distribution. International Conference on Pattern Recognition.
(many other metrics also address this problem)
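As a concrete illustration of why plain accuracy misleads on imbalanced classes, here is a minimal sketch of balanced accuracy, the mean of per-class recalls as in the Brodersen et al. paper above (the code and example data are illustrative; the paper's posterior distribution of the metric is not shown).

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: (TPR + TNR) / 2 for two classes.
    Unlike plain accuracy, a majority-class guesser scores only 0.5."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return np.mean(recalls)

# Imbalanced data: 90 negatives, 10 positives; classifier always predicts 0
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)
print(np.mean(y_true == y_pred))          # 0.9  (accuracy looks great)
print(balanced_accuracy(y_true, y_pred))  # 0.5  (chance level)
```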
Choosing a metric, having a reject option
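A minimal sketch of the reject option (PRML Sec 1.5.3): classify only when the largest posterior clears a threshold $\theta$, otherwise defer, e.g. to a human expert. The function name and threshold value are illustrative.

```python
import numpy as np

def decide_with_reject(posteriors, theta=0.8):
    """Assign the argmax class when the largest posterior exceeds theta;
    otherwise return -1 to signal 'reject' (defer the decision)."""
    posteriors = np.asarray(posteriors)
    k = np.argmax(posteriors)
    return int(k) if posteriors[k] >= theta else -1

print(decide_with_reject([0.95, 0.05]))  # 0  -> confident, classify
print(decide_with_reject([0.55, 0.45]))  # -1 -> too uncertain, reject
```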
A modular view of (supervised) ML
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Discriminant function
Linear discriminant functions: $y(x) = w^{\top}x + w_0$
- Decision boundary $y(x) = 0$ is a hyperplane of $D-1$ dimensions (in a $D$-dimensional input space).
- $w$ is orthogonal to any vector lying in the decision surface.
- Contour lines of $y(x)$ are hyperplanes too.
- $y(x)/\lVert w \rVert$ is the signed distance of point $x$ from the hyperplane (see next page).
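The derivation referred to above: write $x$ as its orthogonal projection onto the decision surface plus a multiple of $w$,

```latex
x = x_{\perp} + r\,\frac{w}{\lVert w \rVert}
\quad\Rightarrow\quad
y(x) = w^{\top}x + w_0 = \underbrace{w^{\top}x_{\perp} + w_0}_{=\,0} + r\,\lVert w \rVert
\quad\Rightarrow\quad
r = \frac{y(x)}{\lVert w \rVert}.
```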
Discriminant for two classes: assign $x$ to $\mathcal{C}_1$ if $y(x) \ge 0$, and to $\mathcal{C}_2$ otherwise.
Discriminant function for $K \ge 2$
Use a series of functions $y_k(x) = w_k^{\top}x + w_{k0}$, $k = 1, \ldots, K$; assign $x$ to the class $\mathcal{C}_k$ with the largest $y_k(x)$.
Claim: the decision regions of such a discriminant are always singly connected and convex.
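A one-line proof sketch (PRML Sec 4.1.2): take $x_A, x_B \in \mathcal{R}_k$ and any $\hat{x} = \lambda x_A + (1-\lambda)x_B$ with $0 \le \lambda \le 1$; by linearity of the $y_k$,

```latex
y_k(\hat{x}) = \lambda\, y_k(x_A) + (1-\lambda)\, y_k(x_B)
             > \lambda\, y_j(x_A) + (1-\lambda)\, y_j(x_B) = y_j(\hat{x})
\qquad \text{for all } j \ne k,
```

so $\hat{x} \in \mathcal{R}_k$: the region is convex, and hence singly connected.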
Can we use least squares?
Mismatch between model requirement and model specification: least squares corresponds to maximum likelihood under a Gaussian noise model, but binary target vectors are far from Gaussian; the fit is also very sensitive to outliers.
The (historically significant) perceptron [Rosenblatt 1962]
Stochastic gradient descent
“The perceptron convergence theorem states that if there exists an exact solution (in other words, if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps … however, until convergence is achieved, we will not be able to distinguish between a nonseparable problem and one that is simply slow to converge.”
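A minimal sketch of the perceptron update just described, assuming numpy, targets in $\{-1,+1\}$ and a bias column already appended to the features; consistent with PRML Sec 4.1.7, though the function itself is illustrative.

```python
import numpy as np

def perceptron(Phi, t, eta=1.0, max_epochs=100):
    """Perceptron learning via SGD on the perceptron criterion
    E_P(w) = -sum over misclassified n of w^T phi_n t_n.
    Phi: (N, M) design matrix (include a constant column for the bias).
    t:   (N,) targets in {-1, +1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:   # misclassified (or on the boundary)
                w += eta * phi_n * t_n   # SGD step: w <- w + eta * phi_n * t_n
                mistakes += 1
        if mistakes == 0:                # a full error-free pass: converged
            break
    return w
```

On separable data the loop eventually makes an error-free pass and stops; on non-separable data it simply runs out of epochs, which is the practical face of the caveat in the quote above.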
Another look at the modular view of supervised ML
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Bayes' theorem, again
$$p(\mathcal{C}_1 \mid x) = \frac{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1)}{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1) + p(x \mid \mathcal{C}_2)\,p(\mathcal{C}_2)} = \sigma(a),$$
where $\sigma(a) = 1/(1 + e^{-a})$ is the logistic sigmoid function and $a = \ln\frac{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1)}{p(x \mid \mathcal{C}_2)\,p(\mathcal{C}_2)}$, a.k.a. the log odds.
Probabilistic Generative Models – Multiclass
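For $K > 2$ the same rearrangement of Bayes' theorem gives the normalised exponential, a.k.a. softmax (PRML Eqs 4.62–4.63):

```latex
p(\mathcal{C}_k \mid x)
= \frac{p(x \mid \mathcal{C}_k)\,p(\mathcal{C}_k)}{\sum_j p(x \mid \mathcal{C}_j)\,p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)},
\qquad
a_k = \ln\big(p(x \mid \mathcal{C}_k)\,p(\mathcal{C}_k)\big).
```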
Class-conditional distributions being Gaussian (with a shared covariance matrix)
Maximum likelihood estimates
The maximum is at $\pi = N_1/N$, the fraction of training points in class $\mathcal{C}_1$.
MLE solution for means and covariance: $\mu_k = \frac{1}{N_k}\sum_{n \in \mathcal{C}_k} x_n$, and shared $\Sigma = \frac{N_1}{N}S_1 + \frac{N_2}{N}S_2$, where $S_k$ is the within-class covariance of class $k$.
$O(D^2)$ parameters!
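A minimal sketch of these MLE formulas for two classes, assuming numpy and labels in $\{0,1\}$; the function names are illustrative.

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """MLE for two Gaussian class-conditionals with a shared covariance
    (PRML Sec 4.2.2): priors N_k/N, class means, Sigma = sum_k (N_k/N) S_k."""
    pi = np.array([np.mean(y == k) for k in (0, 1)])
    mu = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
    Sigma = sum(pi[k] * np.cov(X[y == k].T, bias=True) for k in (0, 1))
    return pi, mu, Sigma

def posterior_class1(X, pi, mu, Sigma):
    """p(C_1 | x) = sigma(w^T x + w0), with w, w0 as in PRML Eqs 4.66-4.67."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu[1] - mu[0])
    w0 = (-0.5 * mu[1] @ Sinv @ mu[1] + 0.5 * mu[0] @ Sinv @ mu[0]
          + np.log(pi[1] / pi[0]))
    return 1.0 / (1.0 + np.exp(-(X @ w + w0)))
```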
Left as reading:
- Discrete features – naive Bayes (Sec 4.2.3)
- Exponential family (Sec 4.2.4)
Naive Bayes classifier: the $a_k(x)$ are again linear functions of the input $x$.
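For binary features $x_i \in \{0,1\}$, the naive Bayes (conditional independence) assumption factorises the class-conditional (PRML Eqs 4.81–4.82):

```latex
p(x \mid \mathcal{C}_k) = \prod_{i=1}^{D} \mu_{ki}^{\,x_i}\,(1-\mu_{ki})^{\,1-x_i}
\;\Rightarrow\;
a_k(x) = \sum_{i=1}^{D}\big(x_i \ln \mu_{ki} + (1-x_i)\ln(1-\mu_{ki})\big) + \ln p(\mathcal{C}_k),
```

linear in $x$, with $O(D)$ parameters per class instead of $O(D^2)$.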
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here (Fisher discriminant)
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Probabilistic Discriminative Models
Fixed basis functions
Original input space vs feature space $\phi(x)$
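A minimal sketch of one such fixed basis, Gaussian bumps as in PRML Fig 4.12; assuming numpy, with the centres and width chosen by the user (nothing here is prescribed by the slides).

```python
import numpy as np

def gaussian_features(X, centres, s=1.0):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 s^2)) for each fixed centre c_j.
    A boundary that is linear in phi-space can be nonlinear in x-space."""
    X, centres = np.atleast_2d(X), np.atleast_2d(centres)
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * s ** 2))
```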
Logistic regression
The name came from statistics: it's a model for classification rather than regression!
Compare to the generative model of one Gaussian per class with shared covariance:
the same expression for the posterior estimates, but a linear number of parameters rather than quadratic.
Maximum likelihood for logistic regression
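The negative log-likelihood is the cross-entropy error, with gradient $\nabla E(w) = \sum_n (y_n - t_n)\phi_n$ (PRML Eq 4.91); there is no closed-form solution. PRML optimises it with IRLS (Newton–Raphson); the sketch below uses plain gradient descent instead, assuming numpy and targets in $\{0,1\}$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(Phi, t, eta=0.5, n_iters=2000):
    """ML for logistic regression by gradient descent on the cross-entropy
    error E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)],
    whose gradient is Phi^T (y - t)  (PRML Eq 4.91)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        w -= eta * Phi.T @ (y - t) / len(t)  # averaged gradient step
    return w
```

Note that on linearly separable data maximum likelihood drives $\lVert w \rVert \to \infty$ and the sigmoid towards a step function (severe overfitting, PRML Sec 4.3.2), which feeds into the critique on the next slide.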
A critique of logistic regression
PRML Fig 10.13
Courtesy of Aditya (Google Research)
Summary of Linear Models for Classification
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression) – later in the semester
Origin of the logistic function, logistic regression and how it connects to perceptron
Laplace approximation
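In brief (details later in the semester, PRML Sec 4.4): approximate a distribution by a Gaussian centred at a mode $z_0$ of $p(z)$, with precision equal to the negative Hessian of the log density there:

```latex
q(z) = \mathcal{N}\!\big(z \mid z_0,\; A^{-1}\big),
\qquad
A = -\,\nabla\nabla \ln p(z)\,\big|_{z = z_0}.
```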
Other linear discriminants (e.g. Fisher)