https://xkcd.com/388/
Announcements
In person lecture: Wed 16 March, PHYS T (will try simulcast in Teams, wish us luck!)
https://studentvip.com.au/anu/main/maps/142757
Quiz 1 next week, due Thu (releasing by Mon)
Linear models for Classification
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- Laplace approximation (Bayesian logistic regression)
Origin of the logistic function, logistic regression and how it connects to perceptron
Image from Wikipedia
Representing class labels
- Binary: meeting in person/remote, pass/fail, benign/malignant, should be given credit (Y/N), …
- Multiclass: Iris flowers, object classes in photos, natural language, …
- Other representations abound, e.g. structured (time series, sequences – PRML Chapter 13, graphs), Sec 4.1.5 (an output representation that connects the Fisher discriminant to least squares), Sec 4.1.7 perceptrons, SVMs
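Several later results (least squares on targets, the generative MLE) are written in terms of 1-of-K target vectors; here is a minimal sketch of that encoding, assuming numpy (the function name is illustrative, not from the slides).

```python
import numpy as np

def one_of_k(labels, num_classes):
    """Encode integer class labels as 1-of-K (one-hot) target vectors:
    class k maps to a vector with 1 in position k and 0 elsewhere."""
    T = np.zeros((len(labels), num_classes))
    T[np.arange(len(labels)), labels] = 1.0
    return T

print(one_of_k([0, 2, 1], 3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```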
A primer on decision theory
Inference: compute the posterior p(t|x), e.g. from a set of training data.
Decision theory: take a specific action based on our understanding of the values t is likely to take.
(end of page 38) “We shall see that the decision stage is generally very simple, even trivial, once we have solved the inference problem.”
— many counter-examples! e.g. multiple medical tests, test + quarantine for covid
— frontiers of theoretical and practical ML, e.g. active/online learning, ML and economics
Minimise error
What are the potential problems with minimising classification error?
Example scenario: medical diagnosis
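For two classes, minimising classification error means minimising (PRML Eq. 1.78):

```latex
p(\text{mistake}) = p(x \in \mathcal{R}_1, \mathcal{C}_2) + p(x \in \mathcal{R}_2, \mathcal{C}_1)
= \int_{\mathcal{R}_1} p(x, \mathcal{C}_2)\,\mathrm{d}x + \int_{\mathcal{R}_2} p(x, \mathcal{C}_1)\,\mathrm{d}x ,
```

which is minimised by assigning each $x$ to the class with the larger posterior $p(\mathcal{C}_k \mid x)$. In the medical example this treats a missed cancer and a false alarm as equally costly, which is exactly the problem the next slide addresses.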
Minimise loss
Decision: $\arg\min_j \sum_k L_{kj}\, p(\mathcal{C}_k \mid x)$
Brodersen, K. H.; Ong, C. S.; Stephan, K. E.; Buhmann, J. M., 2010. The balanced accuracy and its posterior distribution. International Conference on Pattern Recognition.
(many other metrics also address this problem)
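As a concrete illustration of why plain accuracy misleads on imbalanced classes, here is a minimal sketch of balanced accuracy, the mean of per-class recalls as in the Brodersen et al. paper above (the code and example data are illustrative; the paper's posterior distribution of the metric is not shown).

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls: (TPR + TNR) / 2 for two classes.
    Unlike plain accuracy, a majority-class guesser scores only 0.5."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return np.mean(recalls)

# Imbalanced data: 90 negatives, 10 positives; classifier always predicts 0
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)
print(np.mean(y_true == y_pred))          # 0.9  (accuracy looks great)
print(balanced_accuracy(y_true, y_pred))  # 0.5  (chance level)
```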
Choosing a metric, having a reject option
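A minimal sketch of the reject option (PRML Sec 1.5.3): classify only when the largest posterior clears a threshold $\theta$, otherwise defer, e.g. to a human expert. The function name and threshold value are illustrative.

```python
import numpy as np

def decide_with_reject(posteriors, theta=0.8):
    """Assign the argmax class when the largest posterior exceeds theta;
    otherwise return -1 to signal 'reject' (defer the decision)."""
    posteriors = np.asarray(posteriors)
    k = np.argmax(posteriors)
    return int(k) if posteriors[k] >= theta else -1

print(decide_with_reject([0.95, 0.05]))  # 0  -> confident, classify
print(decide_with_reject([0.55, 0.45]))  # -1 -> too uncertain, reject
```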
A modular view of (supervised) ML
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Discriminant function
Linear discriminant functions: $y(x) = w^{\top}x + w_0$
- Decision boundary $y(x) = 0$ is a hyperplane of $D-1$ dimensions (in a $D$-dimensional input space).
- $w$ is orthogonal to any vector lying in the decision surface.
- Contour lines of $y(x)$ are hyperplanes too.
- $y(x)/\lVert w \rVert$ is the signed distance of point $x$ from the hyperplane (see next page).
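The derivation referred to above: write $x$ as its orthogonal projection onto the decision surface plus a multiple of $w$,

```latex
x = x_{\perp} + r\,\frac{w}{\lVert w \rVert}
\quad\Rightarrow\quad
y(x) = w^{\top}x + w_0 = \underbrace{w^{\top}x_{\perp} + w_0}_{=\,0} + r\,\lVert w \rVert
\quad\Rightarrow\quad
r = \frac{y(x)}{\lVert w \rVert}.
```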
Discriminant for two classes: assign $x$ to $\mathcal{C}_1$ if $y(x) \ge 0$, and to $\mathcal{C}_2$ otherwise.
Discriminant function for $K \ge 2$
Use a series of functions $y_k(x) = w_k^{\top}x + w_{k0}$, $k = 1, \ldots, K$; assign $x$ to the class $\mathcal{C}_k$ with the largest $y_k(x)$.
Claim: the decision regions of such a discriminant are always singly connected and convex.
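A one-line proof sketch (PRML Sec 4.1.2): take $x_A, x_B \in \mathcal{R}_k$ and any $\hat{x} = \lambda x_A + (1-\lambda)x_B$ with $0 \le \lambda \le 1$; by linearity of the $y_k$,

```latex
y_k(\hat{x}) = \lambda\, y_k(x_A) + (1-\lambda)\, y_k(x_B)
             > \lambda\, y_j(x_A) + (1-\lambda)\, y_j(x_B) = y_j(\hat{x})
\qquad \text{for all } j \ne k,
```

so $\hat{x} \in \mathcal{R}_k$: the region is convex, and hence singly connected.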
Can we use least squares?
Mismatch between model requirement and model specification: least squares corresponds to maximum likelihood under a Gaussian noise model, but binary target vectors are far from Gaussian; the fit is also very sensitive to outliers.
The (historically significant) perceptron [Rosenblatt 1962]
Stochastic gradient descent
“The perceptron convergence theorem states that if there exists an exact solution (in other words, if the training data set is linearly separable), then the perceptron learning algorithm is guaranteed to find an exact solution in a finite number of steps … however, until convergence is achieved, we will not be able to distinguish between a nonseparable problem and one that is simply slow to converge.”
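A minimal sketch of the perceptron update just described, assuming numpy, targets in $\{-1,+1\}$ and a bias column already appended to the features; consistent with PRML Sec 4.1.7, though the function itself is illustrative.

```python
import numpy as np

def perceptron(Phi, t, eta=1.0, max_epochs=100):
    """Perceptron learning via SGD on the perceptron criterion
    E_P(w) = -sum over misclassified n of w^T phi_n t_n.
    Phi: (N, M) design matrix (include a constant column for the bias).
    t:   (N,) targets in {-1, +1}."""
    w = np.zeros(Phi.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for phi_n, t_n in zip(Phi, t):
            if t_n * (w @ phi_n) <= 0:   # misclassified (or on the boundary)
                w += eta * phi_n * t_n   # SGD step: w <- w + eta * phi_n * t_n
                mistakes += 1
        if mistakes == 0:                # a full error-free pass: converged
            break
    return w
```

On separable data the loop eventually makes an error-free pass and stops; on non-separable data it simply runs out of epochs, which is the practical face of the caveat in the quote above.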
Another look at the modular view of supervised ML
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Bayes' theorem, again
$$p(\mathcal{C}_1 \mid x) = \frac{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1)}{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1) + p(x \mid \mathcal{C}_2)\,p(\mathcal{C}_2)} = \sigma(a),$$
where $\sigma(a) = 1/(1 + e^{-a})$ is the logistic sigmoid function and $a = \ln\frac{p(x \mid \mathcal{C}_1)\,p(\mathcal{C}_1)}{p(x \mid \mathcal{C}_2)\,p(\mathcal{C}_2)}$, a.k.a. the log odds.
Probabilistic Generative Models – Multiclass
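For $K > 2$ the same rearrangement of Bayes' theorem gives the normalised exponential, a.k.a. softmax (PRML Eqs 4.62–4.63):

```latex
p(\mathcal{C}_k \mid x)
= \frac{p(x \mid \mathcal{C}_k)\,p(\mathcal{C}_k)}{\sum_j p(x \mid \mathcal{C}_j)\,p(\mathcal{C}_j)}
= \frac{\exp(a_k)}{\sum_j \exp(a_j)},
\qquad
a_k = \ln\big(p(x \mid \mathcal{C}_k)\,p(\mathcal{C}_k)\big).
```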
Class-conditional distributions being Gaussian (with a shared covariance matrix)
Maximum likelihood estimates
The maximum is at $\pi = N_1/N$, the fraction of training points in class $\mathcal{C}_1$.
MLE solution for means and covariance: $\mu_k = \frac{1}{N_k}\sum_{n \in \mathcal{C}_k} x_n$, and shared $\Sigma = \frac{N_1}{N}S_1 + \frac{N_2}{N}S_2$, where $S_k$ is the within-class covariance of class $k$.
$O(D^2)$ parameters!
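A minimal sketch of these MLE formulas for two classes, assuming numpy and labels in $\{0,1\}$; the function names are illustrative.

```python
import numpy as np

def fit_gaussian_generative(X, y):
    """MLE for two Gaussian class-conditionals with a shared covariance
    (PRML Sec 4.2.2): priors N_k/N, class means, Sigma = sum_k (N_k/N) S_k."""
    pi = np.array([np.mean(y == k) for k in (0, 1)])
    mu = np.array([X[y == k].mean(axis=0) for k in (0, 1)])
    Sigma = sum(pi[k] * np.cov(X[y == k].T, bias=True) for k in (0, 1))
    return pi, mu, Sigma

def posterior_class1(X, pi, mu, Sigma):
    """p(C_1 | x) = sigma(w^T x + w0), with w, w0 as in PRML Eqs 4.66-4.67."""
    Sinv = np.linalg.inv(Sigma)
    w = Sinv @ (mu[1] - mu[0])
    w0 = (-0.5 * mu[1] @ Sinv @ mu[1] + 0.5 * mu[0] @ Sinv @ mu[0]
          + np.log(pi[1] / pi[0]))
    return 1.0 / (1.0 + np.exp(-(X @ w + w0)))
```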
Left as reading:
- Discrete features – naive Bayes (Sec 4.2.3)
- Exponential family (Sec 4.2.4)
Naive Bayes classifier: the $a_k(x)$ are again linear functions of the input $x$.
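For binary features $x_i \in \{0,1\}$, the naive Bayes (conditional independence) assumption factorises the class-conditional (PRML Eqs 4.81–4.82):

```latex
p(x \mid \mathcal{C}_k) = \prod_{i=1}^{D} \mu_{ki}^{\,x_i}\,(1-\mu_{ki})^{\,1-x_i}
\;\Rightarrow\;
a_k(x) = \sum_{i=1}^{D}\big(x_i \ln \mu_{ki} + (1-x_i)\ln(1-\mu_{ki})\big) + \ln p(\mathcal{C}_k),
```

linear in $x$, with $O(D)$ parameters per class instead of $O(D^2)$.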
Classification: what we covered
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here (Fisher discriminant)
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression)
Probabilistic Discriminative Models
Fixed basis functions
Original input space vs feature space $\phi(x)$
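A minimal sketch of one such fixed basis, Gaussian bumps as in PRML Fig 4.12; assuming numpy, with the centres and width chosen by the user (nothing here is prescribed by the slides).

```python
import numpy as np

def gaussian_features(X, centres, s=1.0):
    """phi_j(x) = exp(-||x - c_j||^2 / (2 s^2)) for each fixed centre c_j.
    A boundary that is linear in phi-space can be nonlinear in x-space."""
    X, centres = np.atleast_2d(X), np.atleast_2d(centres)
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * s ** 2))
```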
Logistic regression
The name came from statistics: it's a model for classification rather than regression!
Compare to the generative model of one Gaussian per class with shared covariance:
the same expression for the posterior estimates, but a linear number of parameters rather than quadratic.
Maximum likelihood for logistic regression
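The negative log-likelihood is the cross-entropy error, with gradient $\nabla E(w) = \sum_n (y_n - t_n)\phi_n$ (PRML Eq 4.91); there is no closed-form solution. PRML optimises it with IRLS (Newton–Raphson); the sketch below uses plain gradient descent instead, assuming numpy and targets in $\{0,1\}$.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_logistic(Phi, t, eta=0.5, n_iters=2000):
    """ML for logistic regression by gradient descent on the cross-entropy
    error E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)],
    whose gradient is Phi^T (y - t)  (PRML Eq 4.91)."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        y = sigmoid(Phi @ w)
        w -= eta * Phi.T @ (y - t) / len(t)  # averaged gradient step
    return w
```

Note that on linearly separable data maximum likelihood drives $\lVert w \rVert \to \infty$ and the sigmoid towards a step function (severe overfitting, PRML Sec 4.3.2), which feeds into the critique on the next slide.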
A critique of logistic regression
PRML Fig 10.13
Courtesy of Aditya (Google Research)
Summary of Linear Models for Classification
- A primer on decision theory (Sec 1.5)
- Discriminant functions – why least squares doesn't work here
- The perceptron algorithm
- Probabilistic generative models – origin of the logistic function
- Probabilistic discriminative models – logistic regression
- (Laplace approximation, Bayesian logistic regression) – later in the semester
Origin of the logistic function, logistic regression and how it connects to perceptron
Laplace approximation
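In brief (details later in the semester, PRML Sec 4.4): approximate a distribution by a Gaussian centred at a mode $z_0$ of $p(z)$, with precision equal to the negative Hessian of the log density there:

```latex
q(z) = \mathcal{N}\!\big(z \mid z_0,\; A^{-1}\big),
\qquad
A = -\,\nabla\nabla \ln p(z)\,\big|_{z = z_0}.
```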
Other linear discriminants (e.g. Fisher)