LECTURE 3 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE
MSIN0097
Individual coursework
INDIVIDUAL COURSEWORK
MSIN0097
Group coursework
COURSEWORK / INDUSTRY REPORT
MACHINE LEARNING JARGON
— Model
— Interpolating / Extrapolating
— Data Bias
— Noise / Outliers
— Learning algorithm
— Inference algorithm
— Supervised learning
— Unsupervised learning
— Classification
— Regression
— Clustering
— Decomposition
— Parameters
— Optimisation
— Training data
— Testing data
— Error metric
— Linear model
— Parametric model
— Model variance
— Model bias
— Model generalization
— Overfitting
— Goodness-of-fit
— Hyper-parameters
— Failure modes
— Confusion matrix
— True Positive
— False Negative
— Partition
— Data density
— Hidden parameter
— High dimensional space
— Low dimensional space
— Separable data
— Manifold / Decision surface
— Hypercube / hypervolume / hyperplane
MSIN0097
Homework…
LEARNING RATES
Hands-On ML: Chapter 4, Figure 4.8, page 116 (1st edition)
MSIN0097
Reviewing notebooks
CODE
ALTERING CODE
DIFFS
LEARNING RATES
Data points (x, y): (0, 0), (1, 0), (8, −2), (10, 3)
θ₀ = c (intercept)
θ₁ = m (slope)
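A quick way to see what the learning rate does is to fit this line to the four points above with batch gradient descent. A minimal sketch (the points come from the slide; the eta values and iteration count are illustrative assumptions):

import numpy as np

# (x, y) pairs from the slide: (0, 0), (1, 0), (8, -2), (10, 3)
x = np.array([0.0, 1.0, 8.0, 10.0])
y = np.array([0.0, 0.0, -2.0, 3.0])

def batch_gd(x, y, eta, n_iters=1000):
    """Fit y = theta1 * x + theta0 by batch gradient descent on MSE."""
    theta0, theta1 = 0.0, 0.0   # theta0 = c (intercept), theta1 = m (slope)
    n = len(x)
    for _ in range(n_iters):
        residuals = theta0 + theta1 * x - y
        theta0 -= eta * (2.0 / n) * residuals.sum()
        theta1 -= eta * (2.0 / n) * (residuals * x).sum()
    return theta0, theta1

# Too small: crawls towards the optimum; moderate: converges;
# too large: every step overshoots and the parameters blow up (cf. Figure 4.8).
for eta in (0.001, 0.02, 0.05):
    print(f"eta={eta}: (theta0, theta1) = {batch_gd(x, y, eta)}")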
LEARNING CURVES
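A learning curve plots training and validation scores as the training set grows: a persistent gap suggests overfitting, while two low plateaus suggest underfitting. A minimal sketch with scikit-learn (the dataset and estimator here are illustrative assumptions, not from the lecture):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=42)

# Cross-validated accuracy at five increasing training-set sizes.
sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=3), X, y,
    cv=5, train_sizes=np.linspace(0.1, 1.0, 5))

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train={tr:.3f}  val={va:.3f}")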
DECISION TREES
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
TREE ALGORITHM
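The tree algorithm greedily chooses the split that best purifies the classes, then recurses on each partition until a stopping rule such as max_depth is hit. A small sketch of fitting and inspecting a tree (the Iris data is an assumed stand-in, not the lecture's dataset):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=42)
tree.fit(iris.data, iris.target)

# Print the learned if/else splits chosen greedily by CART.
print(export_text(tree, feature_names=list(iris.feature_names)))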
MODEL VARIANCE
MODEL GENERALIZATION
OVERFITTING
FAILURE MODES
CLASS WEIGHTING
QUESTIONS
— What is the benefit of out-of-bag evaluation?
— If a Decision Tree is overfitting the training set, is it a good idea to try decreasing max_depth?
— If a Decision Tree is underfitting the training set, is it a good idea to try scaling the input features?
— What problems might we have if we try to grow a tree with high class imbalance?
MSIN0097
Breakout 1
CLASS WEIGHTING
— Domain expertise
– determined by talking to subject-matter experts.
— Tuning
– determined by a hyperparameter search such as a grid search.
— Heuristic
– specified using a general best practice (see the sketch after the applications list below).
Applications
• Fraud Detection
• Claim Prediction
• Churn Prediction
• Spam Detection
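A sketch of how the heuristic and tuning approaches look in scikit-learn; the toy data, grid values, and scoring choice are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced problem: roughly 95% negatives, 5% positives.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=42)

# Heuristic: 'balanced' weights each class inversely to its frequency.
heuristic_clf = DecisionTreeClassifier(class_weight="balanced").fit(X, y)

# Tuning: treat the class weight itself as a hyperparameter.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"class_weight": [None, "balanced", {0: 1, 1: 5}, {0: 1, 1: 20}]},
    scoring="f1", cv=5)
grid.fit(X, y)
print(grid.best_params_)

# Domain expertise would instead fix the dict directly, e.g. {0: 1, 1: 20}
# if an expert says a missed fraud case costs about 20x a false alarm.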
MSIN0097
Breakout 2
COMMON APPROACHES
— Performance Metrics
– F-measure
– G-mean
— Data Sampling
– SMOTE (Synthetic Minority Oversampling Technique)
– ENN (Edited Nearest Neighbours)
— Cost-Sensitive Algorithms
– Decision Trees
— Post-Processing
– Threshold Moving
– Calibration
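Of these, threshold moving is often the cheapest to try: keep the trained model fixed and shift the probability cut-off away from the default 0.5. A rough sketch (the toy data, model, and thresholds are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Score the minority class at several cut-offs instead of the default 0.5.
proba = LogisticRegression().fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
for threshold in (0.5, 0.3, 0.1):
    preds = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: F1={f1_score(y_te, preds):.3f}")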
SYNTHETIC DATA SAMPLING
Source: https://github.com/minoue-xx/Oversampling-Imbalanced-Data
• SMOTE (Chawla, N. V. et al. 2002)
• Borderline SMOTE (Han, H. et al. 2005)
• ADASYN (He, H. et al. 2008)
• Safe-level SMOTE (Bunkhumpornpat, C. et al. 2009)
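A minimal SMOTE sketch using the imbalanced-learn package (an assumption here: it is installed via pip install imbalanced-learn; the toy data is illustrative):

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=42)
print("before:", Counter(y))

# SMOTE synthesises new minority points by interpolating between a
# minority example and one of its nearest minority-class neighbours.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("after: ", Counter(y_res))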
MSIN0097
Ensembles
KEY CONCEPTS
If you have trained five different models on the exact same training data, and they all achieve 95% precision, is there any chance that you can combine these models to get better results?
If so, how? If not, why?
MULTIPLE MODELS
VOTING
Leipzig–Dresden Railway Company in 1852
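One answer to the key-concepts question above: yes, combine them with a voting ensemble, which tends to beat its best member as long as the models make reasonably uncorrelated errors. A hard-voting sketch (the models and data are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# Hard voting: each model casts one vote; the majority class wins.
voting = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(random_state=42))],
    voting="hard")
voting.fit(X_tr, y_tr)
print(voting.score(X_te, y_te))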
MACHINE LEARNING SYSTEMS
Why is ML hard?
WHY IS ML HARD?
https://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
DEBUGGING
# Pseudocode from the linked post (endCase and transform are placeholders):
def recursion(input):
    if input is endCase:
        return transform(input)
    else:
        return recursion(transform(input))
WHY IS ML HARD? ALGORITHM, IMPLEMENTATION, DATA, MODEL
WHY IS ML HARD?
ML AS HACKING
TWITTER
@chipro @random_forests @zacharylipton @yudapearl
HAVE YOUR SAY!
“The questionnaires are very short and will take less than a minute for them to complete.”
VOTING IN ACTION!
TEACHING TEAM
Dr Alastair Moore Senior Teaching Fellow
a.p.moore@ucl.ac.uk
@latticecut
Kamil Tylinski Teaching Assistant
kamil.tylinski.16@ucl.ac.uk
Jiangbo Shangguan Teaching Assistant
j.shangguan.17@ucl.ac.uk