LECTURE 1 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE
INTRODUCTION TO AI
Why do they call it intelligence?
MACHINE LEARNING
Data + model → prediction
MACHINE LEARNING DATA DRIVEN AI
Assume there is enough data to find statistical associations to solve specific tasks
Data + model → prediction
Define how well the model solves the task and adapt the parameters to maximize performance
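The idea above can be sketched in a few lines: define a performance measure (here, mean squared error on toy data) and adapt the parameters to improve it. A minimal illustration only; the data, learning rate, and variable names are invented for this example.

```python
import numpy as np

# Toy data generated by a known rule y = 3x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + 0.5 + rng.normal(0, 0.1, 100)

# Model: y_hat = w*x + b. Performance measure: mean squared error.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    y_hat = w * x + b
    grad_w = 2 * np.mean((y_hat - y) * x)  # gradient of MSE w.r.t. w
    grad_b = 2 * np.mean(y_hat - y)        # gradient of MSE w.r.t. b
    w -= lr * grad_w                       # adapt parameters to reduce error
    b -= lr * grad_b

# w and b should end up close to the true values 3.0 and 0.5
```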
LEARNING A FUNCTION
x → y
x → f(x) → y
LEARNING A FUNCTION
x → y
x → f(x) → y
x: measured data (features)
y: inferred/predicted/estimated value
True initial value x (world state) → measured data x′ → f(x′) = y′ → y: true target value (world state)
f: learned/fitted function, fitted from n observations
input x → f(x) → y output
MACHINE LEARNING DATA DRIVEN AI
Source: https://twitter.com/Kpaxs/status/1163058544402411520
MACHINE LEARNING DATA DRIVEN AI
x → x′ → f(x′) = y′ → y
{xᵢ, yᵢ}: labelled training data used to fit f
Source: https://twitter.com/Kpaxs/status/1163058544402411520
INTRODUCTION TO AI
Learning the rules
MATURITY OF ML APPROACHES
“Classical” Machine learning
“Modern” Machine learning
Source: hazyresearch.github.io/snorkel/blog/snorkel_programming_training_data.html
PARADIGMS IN ML
Source: https://twitter.com/IntuitMachine/status/1200796873495318528/photo/1
TASKS IN MACHINE LEARNING
MACHINE LEARNING BRANCHES
– Supervised: we know what the right answer is
– Unsupervised: we don’t know what the right answer is, but we can recognize a good answer if we find it
– Reinforcement: we have a way to measure how good our current best answer is, and a method to improve it
Source: Introduction to Reinforcement Learning, David Silver
BUILDING BLOCKS OF ML
A – B – C – D
A TAXONOMY OF PROBLEMS
A. Classification B. Regression
C. Clustering D. Decomposition
A – B – C – D ALGORITHMIC APPROACHES
A. Classification
– Support vector machines
– Neural networks
– Random forests
– Maximum entropy classifiers
– …
B. Regression
– Logistic regression
– Support vector regression
– SGD regressor
– …
C. Clustering
– K-means
– KD trees
– Spectral clustering
– Density estimation
– …
D. Decomposition
– PCA
– LDA
– t-SNE
– UMAP
– VAE
– …
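The four problem classes above can each be exercised with one off-the-shelf estimator. A minimal sketch assuming scikit-learn is available; the toy data and variable names are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # A. Classification
from sklearn.linear_model import LinearRegression    # B. Regression
from sklearn.cluster import KMeans                   # C. Clustering
from sklearn.decomposition import PCA                # D. Decomposition

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y_class = (X[:, 0] > 0).astype(int)               # categorical target
y_real = X @ np.array([1.0, 2.0, 0.0, -1.0])      # real-valued target

clf = RandomForestClassifier(random_state=0).fit(X, y_class)   # supervised
reg = LinearRegression().fit(X, y_real)                        # supervised
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)    # unsupervised
pca = PCA(n_components=2).fit(X)                               # unsupervised

# clf and reg use labels; km and pca use only X itself.
```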
A – B – C – D ALGORITHMIC APPROACHES
Supervised:
A. Classification – support vector machines, neural networks, random forests, maximum entropy classifiers, …
B. Regression – logistic regression, support vector regression, SGD regressor, …
Unsupervised:
C. Clustering – K-means, KD trees, spectral clustering, density estimation, …
D. Decomposition – PCA, LDA, t-SNE, UMAP, VAE, …
A – B – C – D ALGORITHMIC APPROACHES
Supervised – we know what the right answer is:
A. Classification
B. Regression
Unsupervised – we don’t know what the right answer is, but we can recognize a good answer if we find it:
C. Clustering
D. Decomposition
Reinforcement learning – we have a way to measure how good our current best answer is, and a method to improve it
MACHINE LEARNING
B. Regression
B. REGRESSION REAL VALUED VARIABLE
LINEAR REGRESSION
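Linear regression has a closed-form least-squares solution. A small sketch on invented data, solving for the slope and intercept of y = ax + b directly with numpy.

```python
import numpy as np

# Toy data from a known line y = 2x + 1 with Gaussian noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 50)

# Design matrix [x, 1]: each row is one observation.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: minimise ||A [a, b] - y||^2.
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

# a and b should recover approximately 2.0 and 1.0
```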
REGRESSION BY MODELING PROBABILITIES
B. REGRESSION REAL VALUED VARIABLE
MULTIPLE DIMENSIONS
DEVELOPING MORE COMPLEX ALGORITHMS
MACHINE LEARNING
A. Classification
A. CLASSIFICATION CATEGORICAL VARIABLE
LOGISTIC REGRESSION
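Logistic regression turns a real-valued linear score into a probability via the sigmoid, then thresholds it into a category. A minimal sketch assuming scikit-learn; the 1-D toy data is invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: class 1 when x (plus a little noise) is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = (X[:, 0] + rng.normal(0, 0.3, 200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# Probability of class 1 well to the right of the decision boundary.
proba = clf.predict_proba([[2.0]])[0, 1]
```

Note that despite the name, this is a classification method: the regression happens on the log-odds, and the output is a categorical decision.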
DEVELOPING MORE COMPLEX ALGORITHMS
CONFUSION MATRIX BINARY FORCED CHOICE
A. CLASSIFICATION CATEGORICAL VARIABLE
[Figure: confusion matrices (predicted vs actual) for Model 1 and Model 2]
CLASSIFICATION MNIST DATASET
CONFUSION MATRIX
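For a binary forced choice, the confusion matrix counts how predicted labels line up with actual labels. A small sketch with invented labels; note that in scikit-learn's convention rows are actual classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted labels for eight cases.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)

# For binary labels ordered [0, 1], ravel() gives the four cells:
# true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = cm.ravel()
```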
MACHINE LEARNING
C. Clustering
CLASSIFICATION VS CLUSTERING CATEGORICAL VARIABLE
CLASSIFICATION VS CLUSTERING
C. CLUSTERING
C. CLUSTERING 1. AGGLOMERATIVE
Dendrogram
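Agglomerative clustering builds the dendrogram bottom-up: it repeatedly merges the two closest groups, and the recorded merge history is exactly what a dendrogram draws. A sketch using scipy on invented, well-separated toy blobs.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight blobs: 10 points near (0, 0) and 10 points near (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (10, 2)),
               rng.normal(3, 0.2, (10, 2))])

# Ward linkage: the bottom-up merge tree a dendrogram visualises.
Z = linkage(X, method="ward")

# Cutting the tree at 2 clusters recovers the two blobs.
labels = fcluster(Z, t=2, criterion="maxclust")
```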
C. CLUSTERING 2. DIVISIVE
C. CLUSTERING 3. PARTITIONAL
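Partitional clustering assigns every point to exactly one of k clusters in a single flat partition, rather than a hierarchy. K-means, the standard example, alternates between assigning points to the nearest centroid and recomputing centroids. A sketch on invented two-blob data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two blobs centred at (-3, -3) and (3, 3).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 0.3, (20, 2)),
               rng.normal(3, 0.3, (20, 2))])

# k-means partitions the data into k = 2 disjoint clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
```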
EXPECTATION MAXIMISATION
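Expectation maximisation alternates soft cluster assignments (E-step) with re-estimating cluster parameters (M-step). scikit-learn's GaussianMixture runs EM under the hood; a sketch on invented 1-D data drawn from two Gaussians.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# 1-D mixture: 300 points near 0 and 300 points near 10.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 300),
                    rng.normal(10, 1, 300)]).reshape(-1, 1)

# EM iterates E-steps (responsibilities) and M-steps (means, variances).
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
means = sorted(gm.means_.ravel())

# The fitted component means should land near the true centres 0 and 10.
```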
MACHINE LEARNING
D. Decomposition
D. DECOMPOSITION 1. PROJECTION METHODS
Dimensionality reduction
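PCA is the canonical projection method: it projects the data onto the directions of greatest variance. A sketch on invented 3-D data that actually lies (up to small noise) on a 2-D plane, so two components capture nearly all the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# 2-D latent points embedded linearly into 3-D, plus tiny noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])          # 3x2 embedding matrix
X = latent @ W.T + rng.normal(0, 0.01, (200, 3))

# Project back down to the 2 directions of greatest variance.
pca = PCA(n_components=2).fit(X)
explained = pca.explained_variance_ratio_.sum()
```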
D. DECOMPOSITION 2. KERNEL METHODS
D. DECOMPOSITION 3. MANIFOLD LEARNING
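Unlike a single global linear projection, manifold learners seek a low-dimensional layout that preserves local neighbourhoods. A sketch with t-SNE on invented two-cluster data; the perplexity value here is just an illustrative choice.

```python
import numpy as np
from sklearn.manifold import TSNE

# Two well-separated groups of 5-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (30, 5)),
               rng.normal(5, 0.1, (30, 5))])

# t-SNE embeds the 5-D points into 2-D, preserving local structure.
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
```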
A – B – C- D ALGORITHMIC APPROACHES
A. ClAssification B. Regression
C. Clustering D. Decomposition
TAXONOMY
A. Classification – B. Regression – C. Clustering – D. Decomposition
A – B – C – D ALGORITHMIC APPROACHES
A. Classification
B. Regression
Source: Computer Vision: Learning, Models and Inference
DISCRIMINATIVE VS GENERATIVE A SIMPLE EXAMPLE
PARAMETRIC VS NON-PARAMETRIC
— With data gathered from uncontrolled observations on complex systems involving unknown [physical, chemical, biological, social, economic] mechanisms, the a priori assumption that nature would generate the data through a parametric model selected by the statistician can result in questionable conclusions that cannot be substantiated by appeal to goodness-of-fit tests and residual analysis.
— Usually, simple parametric models imposed on data generated by complex systems, for example, medical data, financial data, result in a loss of accuracy and information as compared to algorithmic models.
Source: Statistical Science 2001, Vol. 16, No. 3, 199–231 Statistical Modeling: The Two Cultures Leo Breiman
REGULARIZATION IMPOSING ADDITIONAL CONSTRAINTS
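Regularization imposes an additional constraint on the fit, here an L2 penalty on the weights (ridge regression), which shrinks coefficients and stabilises an otherwise ill-posed problem. A sketch on invented data with two nearly collinear columns, where unregularized least squares produces wildly large coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly identical columns: a classic ill-posed design.
rng = np.random.default_rng(0)
x1 = rng.normal(size=(30, 1))
X = np.hstack([x1, x1 + rng.normal(0, 1e-6, (30, 1))])
y = X[:, 0] + rng.normal(0, 0.1, 30)

# Unconstrained OLS exploits the tiny difference between the columns,
# producing huge opposite-signed coefficients.
ols = LinearRegression().fit(X, y)

# Ridge adds the penalty alpha * ||w||^2, keeping coefficients small.
ridge = Ridge(alpha=1.0).fit(X, y)
```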
ASSESSING GOODNESS OF FIT
ML PIPELINES
Source: https://epistasislab.github.io/tpot/
ML PIPELINES
FEATURE SELECTION AND AUTOMATION
Source: https://epistasislab.github.io/tpot/
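A pipeline chains preprocessing, feature extraction, and a final estimator into one fit/predict object; tools like TPOT automate the search over such pipelines. A hand-built sketch assuming scikit-learn, using the bundled iris dataset; the step names are arbitrary labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scale features, reduce to 2 components, then classify: one object.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("reduce", PCA(n_components=2)),
    ("clf", LogisticRegression(max_iter=200)),
]).fit(X, y)

acc = pipe.score(X, y)
```

Because the whole chain is a single estimator, cross-validation and hyperparameter search apply to all stages at once, which is what makes automated pipeline search feasible.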
HOMEWORK
Hands-on Machine Learning
Chapter 2: End-to-End Machine Learning Project
Try reading the chapter from start to finish. We will work through the problem in class, but please come prepared to discuss the case study.
It is easier to understand the different stages of a ML project if you follow one from start to finish.
END TO END
TESTING AND VALIDATION
— Generalization of data
— Generalization of feature representation
— Generalization of the ML model
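The standard check on generalization is to hold out a test set, so the score measures performance on unseen data rather than memorisation of the training set. A sketch on invented data, where a fully grown decision tree memorises the training set perfectly but scores lower on the held-out split.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy data with a simple linear-ish decision rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Hold out 30% of the data for an honest estimate.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
train_acc = tree.score(X_tr, y_tr)  # a deep tree memorises: 1.0
test_acc = tree.score(X_te, y_te)   # the generalization estimate
```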
TOY VS REAL DATA
— Toy data is useful for exploring behaviour of algorithms
— Demonstrating the advantages and disadvantages of an algorithm
— However, best not to use just toy datasets
— Use real datasets
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
BOOKS
THINKING ABOUT BUSINESS
WORKING WITH DATA
DESIGNING PREDICTIVE MODELS
PYTHON PROGRAMMING
A – B – C – D
A TAXONOMY OF PROBLEMS
A. Classification B. Regression
Week 2 – Classification and Regression
Week 3 – Trees and Ensembles
C. Clustering D. Decomposition
Week 4 – Kernel spaces and Decomposition
Week 5 – Clustering