LECTURE 5 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE

MSIN0097
Individual coursework

MSIN0097
The Individual Coursework assignment has been extended by one week to Friday 5th March 2021 at 10:00 am.

USING OTHER PEOPLE’S CODE

Source: Wojciech Zaremba (@woj_zaremba), February 4, 2021, pic.twitter.com/4q4IbLgEB8

MACHINE LEARNING JARGON
— Model
— Interpolating / Extrapolating
— Data Bias
— Noise / Outliers
— Learning algorithm
— Inference algorithm
— Supervised learning
— Unsupervised learning
— Classification
— Regression
— Clustering
— Decomposition
— Parameters
— Optimisation
— Training data
— Testing data
— Error metric
— Linear model
— Parametric model
— Model variance
— Model bias
— Model generalization
— Overfitting
— Goodness-of-fit
— Hyper-parameters
— Failure modes
— Confusion matrix
— True Positive
— False Negative
— Partition
— Margin
— Data density
— Hidden parameter
— High dimensional space
— Low dimensional space
— Separable data
— Manifold / Decision surface
— Hyper cube / volume / plane

A – B – C – D ALGORITHMIC APPROACHES
Supervised
A. Classification
B. Regression
Unsupervised
C. Clustering
D. Decomposition
Hidden variables
Density estimation
Manifolds
Subspaces

QUESTIONS
— How would I know whether my data would benefit from a transformation to a higher- or lower-dimensional space?

CURSE OF DIMENSIONALITY
https://www.nature.com/articles/s41592-018-0019-x
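One symptom of the curse of dimensionality is that distances concentrate: the nearest and farthest neighbours of a point end up almost equally far away. A minimal sketch, assuming points drawn uniformly from the unit hypercube (an illustrative choice, not taken from the slides); `distance_contrast` is a hypothetical helper name.

```python
import numpy as np

def distance_contrast(n_points, n_dims, seed=0):
    """Relative gap between the farthest and nearest point from one query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))   # uniform in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

# The contrast shrinks as the number of dimensions grows.
contrasts = {d: distance_contrast(500, d) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"{d:>5} dims: contrast = {c:.2f}")
```

When the contrast is small, distance-based methods (nearest neighbours, K-means) have little signal to work with, which is one reason to consider a lower-dimensional representation.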

QUESTIONS
— Would I always have to visualize the data in 2D or 3D to see whether it can be separated more easily? (That would seem to defeat the purpose of moving to a higher-dimensional space, which cannot be visualized.)

SUMMARY STATISTICS
Anscombe’s quartet
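Anscombe's quartet is four small data sets with nearly identical summary statistics that look completely different when plotted. A short sketch that computes the means and correlations; the values are the published quartet, typed in by hand here rather than loaded from a library.

```python
import numpy as np

# Anscombe's quartet: data sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

summary = {}
for name, (x, y) in quartet.items():
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Mean of x, mean of y, and Pearson correlation between x and y.
    summary[name] = (x.mean(), y.mean(), np.corrcoef(x, y)[0, 1])
    print(f"{name:>3}: mean x = {summary[name][0]:.2f}, "
          f"mean y = {summary[name][1]:.2f}, corr = {summary[name][2]:.3f}")
```

The four rows print almost the same numbers, so the differences only show up in a plot.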

SUMMARY STATISTICS
https://seaborn.pydata.org/examples/scatterplot_matrix.html

QUESTIONS?
— Do I have to go all the way through modelling (e.g. classification), evaluate a metric such as the Gini coefficient, and then go back and compare the Gini scores obtained after adding extra dimensions?

QUESTIONS?
— Am I right that in some cases it is better to go up a dimension and in other cases it is better to go down a dimension?

MULTIPLE MODELS

MSIN0097
K-means

K-MEANS LLOYD–FORGY ALGORITHM
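A minimal NumPy sketch of Lloyd's algorithm with Forgy initialisation (random data points as the starting centroids). The function name `lloyd_kmeans`, the toy data and the rule for empty clusters are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Forgy initialisation: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (assumption: an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels

# Toy data: two well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in ([0, 0], [5, 5])])
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids)
```

The result depends on the starting centroids, so in practice the algorithm is usually run several times and the lowest-inertia solution is kept.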

K-MEANS
— Advantages
— Disadvantages

ELLIPSOIDAL DISTRIBUTED DATA

MSIN0097
Gaussian mixtures

PARTITIONAL

MIXTURE OF GAUSSIANS (1D)

HIDDEN (LATENT) VARIABLES

MIXTURE OF GAUSSIANS (2D)

GRAPHICAL MODELS GAUSSIAN MIXTURES

PLATE NOTATION
— parameters (squares, solid circles, bullets)
— random variables (circles)
— conditional dependencies (solid arrows)

FAMILIES OF MODELS
Gaussian mixture
T-distribution mixture
Factor Analysis

TWO STEP – EM ALGORITHM

EM ALGORITHM

EXPECTATION MAXIMIZATION

MIXTURE OF GAUSSIANS AS MARGINALIZATION

E-STEP

M-STEP

EM ALGORITHM

EXPECTATION MAXIMIZATION

MANIPULATING THE LOWER BOUND

LOCAL MAXIMA
Repeatedly fitting a mixture of Gaussians model from different starting points gives different models, because each fit converges to a different local maximum of the likelihood.
The log likelihoods are (a) 98.76, (b) 96.97 and (c) 94.35, indicating that (a) is the best fit.
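The restart idea can be sketched with scikit-learn: fit the same model from several random starting points and keep the fit with the highest log likelihood. The toy data and the number of restarts are assumptions for illustration; the log likelihoods printed here will not match the figures quoted on the slide.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: three overlapping 2D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 1.0, size=(150, 2)),
    rng.normal([4, 0], 1.0, size=(150, 2)),
    rng.normal([2, 3], 1.0, size=(150, 2)),
])

fits = []
for seed in range(5):
    # init_params="random" makes the starting point depend strongly on the seed.
    gmm = GaussianMixture(n_components=3, init_params="random",
                          n_init=1, random_state=seed).fit(X)
    total_ll = gmm.score(X) * len(X)   # score() is the mean log likelihood per sample
    fits.append((total_ll, seed, gmm))
    print(f"seed {seed}: log likelihood = {total_ll:.2f}")

best_ll, best_seed, best_gmm = max(fits, key=lambda t: t[0])
print(f"best fit: seed {best_seed}")
```

Setting `n_init` to a value greater than 1 asks `GaussianMixture` to do these restarts internally and keep the best one.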

COVARIANCE COMPONENTS
a) Full covariances.
b) Diagonal covariances.
c) Identical diagonal covariances.
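scikit-learn exposes similar constraints through the `covariance_type` parameter of `GaussianMixture`. A hedged sketch: `"full"` and `"diag"` correspond to (a) and (b); there is no exact option for (c), the nearest being `"tied"` (one full covariance shared by all components) and `"spherical"` (a single variance per component). The toy data are an assumption.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: two correlated 2D Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.multivariate_normal([0, 0], [[2.0, 1.2], [1.2, 1.0]], size=200),
    rng.multivariate_normal([5, 5], [[1.0, -0.6], [-0.6, 2.0]], size=200),
])

shapes = {}
for cov_type in ("full", "diag", "tied", "spherical"):
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type,
                          random_state=0).fit(X)
    # The shape of covariances_ shows how many free parameters each option has.
    shapes[cov_type] = gmm.covariances_.shape
    print(f"{cov_type:>9}: covariances_ shape = {shapes[cov_type]}, "
          f"mean log likelihood = {gmm.score(X):.3f}")
```

More constrained covariances have fewer parameters to estimate, which helps with little data but cannot capture correlated (tilted) clusters.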

LEARNING GMM PSEUDO CODE
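A minimal sketch of EM for a one-dimensional Gaussian mixture, written in NumPy. This is an illustration under stated assumptions, not the pseudo code from the slide: the names `em_gmm_1d` and `normal_pdf`, the quantile-based initialisation and the fixed iteration count are all choices made here.

```python
import numpy as np

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k, n_iter=100):
    # Initialisation (assumption): equal weights, means at spread-out quantiles,
    # every variance set to the overall variance of the data.
    weights = np.full(k, 1.0 / k)
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        joint = weights * normal_pdf(x[:, None], means, variances)  # shape (n, k)
        log_likelihoods.append(np.log(joint.sum(axis=1)).sum())
        resp = joint / joint.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances, log_likelihoods

# Toy data: 30% from N(-2, 0.5^2) and 70% from N(3, 1^2).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
weights, means, variances, log_likelihoods = em_gmm_1d(x, k=2)
print("weights:", weights, "means:", means, "variances:", variances)
```

The recorded log likelihood should never decrease from one iteration to the next, which is a useful check when debugging an EM implementation.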

ANOMALY DETECTION

BIC AND AIC
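BIC and AIC both trade goodness-of-fit against the number of parameters, and lower values are better. A sketch of using them to choose the number of mixture components with scikit-learn; the toy data and the candidate range are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: three well-separated 2D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 1.0, size=(200, 2)),
    rng.normal([6, 0], 1.0, size=(200, 2)),
    rng.normal([3, 6], 1.0, size=(200, 2)),
])

candidates = range(1, 7)
models = [GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
          for k in candidates]
bic = [m.bic(X) for m in models]
aic = [m.aic(X) for m in models]

for k, b, a in zip(candidates, bic, aic):
    print(f"k = {k}: BIC = {b:.1f}, AIC = {a:.1f}")

# Pick the number of components with the lowest BIC.
best_k_bic = candidates[int(np.argmin(bic))]
print("components chosen by BIC:", best_k_bic)
```

BIC penalises extra parameters more heavily than AIC, so when the two disagree BIC tends to pick the simpler model.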

GAUSSIAN MIXTURES

BAYESIAN GMMS

CONCENTRATION PRIORS
The more data we have, the less the priors matter. In fact, to plot diagrams with such large differences between priors, you must use very strong priors and little data.
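The effect of the concentration prior can be explored with scikit-learn's `BayesianGaussianMixture`. A sketch, assuming a small toy data set and two deliberately extreme values of `weight_concentration_prior`; the number of components that stay active will depend on the data and the seed.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Little data, so that the prior has a visible effect.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0, 0], 1.0, size=(30, 2)),
    rng.normal([5, 5], 1.0, size=(30, 2)),
])

weights_by_prior = {}
for prior in (0.01, 10000):
    bgm = BayesianGaussianMixture(n_components=10,
                                  weight_concentration_prior=prior,
                                  max_iter=500, random_state=0).fit(X)
    weights_by_prior[prior] = bgm.weights_
    # Count the components that keep a non-negligible weight.
    active = int(np.sum(bgm.weights_ > 0.01))
    print(f"prior = {prior}: {active} active components, "
          f"weights = {np.round(bgm.weights_, 2)}")
```

A low concentration prior pushes most component weights towards zero, while a high one encourages the model to spread weight over more components.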

TWO MOONS DATA

PROBLEMS WITH MULTI-VARIATE NORMAL DENSITY

MSIN0097
Types of models

GENERATIVE VS DISCRIMINATIVE

CLASSIFICATION (DISCRIMINATIVE)
LOGISTIC REGRESSION REVISITED
MODEL CONTINGENCY OF THE WORLD ON DATA
World state: Linear model Bernoulli distribution
Probability / Decision surface

CLASSIFICATION (GENERATIVE)
GAUSSIAN MIXTURE
MODEL CONTINGENCY OF DATA ON THE WORLD

WHAT SORT OF MODEL SHOULD WE USE?

WHAT SORT OF MODEL SHOULD WE USE? TL;DR NO DEFINITIVE ANSWER
— Inference is generally simpler with discriminative models, which model the probability of the world state given the data directly.
— Generative models calculate this probability via Bayes’ rule, and sometimes this requires a computationally expensive algorithm.
— Generative models might waste modelling power. The data are generally of much higher dimension than the world state, and modelling them is costly. Moreover, many aspects of the data may not influence the state.
— Generative models can build in knowledge of how the data were generated. Using discriminative approaches, it is harder to exploit this knowledge: essentially we have to re-learn these phenomena from the data.
— Sometimes parts of the training or test data vector x may be missing. Here, generative models are preferred.
— It is harder to impose prior knowledge in a principled way in discriminative models.
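The two routes to the same posterior can be sketched side by side. The discriminative route fits logistic regression to model the probability of the class given the data directly. The generative route fits a density for the data given each class and applies Bayes' rule. As a simplifying assumption, each class is modelled with a single Gaussian rather than a full Gaussian mixture, and accuracy is measured on the toy training data only.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.linear_model import LogisticRegression

# Toy data: two 2D Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (200, 2)),
               rng.normal([3, 3], 1.0, (200, 2))])
y = np.repeat([0, 1], 200)

# Discriminative: model P(class | x) directly.
clf = LogisticRegression().fit(X, y)
p_disc = clf.predict_proba(X)[:, 1]

# Generative: model P(x | class) and P(class), then apply Bayes' rule.
priors = np.array([np.mean(y == c) for c in (0, 1)])
likelihoods = np.column_stack([
    multivariate_normal(X[y == c].mean(axis=0),
                        np.cov(X[y == c], rowvar=False)).pdf(X)
    for c in (0, 1)
])
joint = likelihoods * priors                 # P(x | class) * P(class)
p_gen = joint[:, 1] / joint.sum(axis=1)      # P(class = 1 | x)

acc_disc = np.mean((p_disc > 0.5) == y)
acc_gen = np.mean((p_gen > 0.5) == y)
print(f"discriminative accuracy: {acc_disc:.3f}, generative accuracy: {acc_gen:.3f}")
```

The generative model also gives a density over x, which is what makes it usable when parts of x are missing or for anomaly detection, at the cost of modelling more than the decision requires.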

SUMMARY OF APPROACHES

MSIN0097
Best practice…

BEST PRACTICE…

BEST PRACTICE…
Source: https://www.marekrei.com/blog/ml-and-nlp-publications-in-2019/
Percentage of papers mentioning GitHub (indicating that the code is made available):
ACL 70%, EMNLP 69%, NAACL 68%, ICLR 56%, NeurIPS 46%, ICML 45%, AAAI 31%.
It seems the NLP papers are releasing their code much more freely.

PAPERS WITH CODE
https://paperswithcode.com/

PERCEPTIONS OF PROBABILITY

DEPLOYMENT

@SOCIAL
@chipro @random_forests @zacharylipton @yudapearl @svpino @jackclarkSF

TEACHING TEAM
Dr Alastair Moore Senior Teaching Fellow
a.p.moore@ucl.ac.uk
@latticecut
Kamil Tylinski Teaching Assistant
kamil.tylinski.16@ucl.ac.uk
Jiangbo Shangguan Teaching Assistant
j.shangguan.17@ucl.ac.uk
Individual Coursework workshop
to Thursday 11th Feb 2021 at 12:00 am
