WEEK 2 TERM 2:
MSIN0097
Predictive Analytics Lecture 2
A P MOORE
PREDICTIVE ANALYTICS
Review
MACHINE LEARNING JARGON
— Model
— Interpolating / Extrapolating
— Data Bias
— Noise / Outliers
— Learning algorithm
— Inference algorithm
— Supervised learning
— Unsupervised learning
— Classification
— Regression
— Clustering
— Decomposition
— Parameters
— Optimisation
— Training data
— Testing data
— Error metric
— Linear model
— Parametric model
— Model variance
— Model bias
— Model generalization
— Overfitting
— Goodness-of-fit
— Hyper-parameters
— Failure modes
— Confusion matrix
— True Positive
— False Negative
— Data density
— Partition
— Hidden parameter
— High dimensional space
— Low dimensional space
— Separable data
— Manifold / Decision surface
— Hyper cube / volume / plane
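机器学习行话 (MACHINE LEARNING JARGON)
— 模型 (Model)
— 内插 / 外推 (Interpolating / Extrapolating)
— 数据偏差 (Data bias)
— 噪声 / 离群值 (Noise / Outliers)
— 学习算法 (Learning algorithm)
— 推断算法 (Inference algorithm)
— 监督学习 (Supervised learning)
— 无监督学习 (Unsupervised learning)
— 分类 (Classification)
— 回归 (Regression)
— 聚类 (Clustering)
— 分解 (Decomposition)
— 参数 (Parameters)
— 优化 (Optimisation)
— 训练数据 (Training data)
— 测试数据 (Testing data)
— 误差指标 (Error metric)
— 线性模型 (Linear model)
— 参数模型 (Parametric model)
— 模型方差 (Model variance)
— 模型偏差 (Model bias)
— 模型泛化 (Model generalization)
— 过拟合 (Overfitting)
— 拟合优度 (Goodness-of-fit)
— 超参数 (Hyper-parameters)
— 失败模式 (Failure modes)
— 混淆矩阵 (Confusion matrix)
— 真正例 (True Positive)
— 假反例 (False Negative)
— 数据密度 (Data density)
— 划分 (Partition)
— 隐藏参数 (Hidden parameter)
— 高维空间 (High dimensional space)
— 低维空间 (Low dimensional space)
— 可分数据 (Separable data)
— 流形 / 决策面 (Manifold / Decision surface)
— 超立方体 / 超体积 / 超平面 (Hyper cube / volume / plane)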
OPTIMISATION / 优化 YŌUHUÀ
SCALING
ERROR METRIC / 误差指标 WÙCHĀ ZHǏBIĀO
HYPER-PARAMETERS / 超参数 CHĀO CĀNSHÙ
PARAMETRIC MODEL / 参数模型 CĀNSHÙ MÓXÍNG
MODEL BIAS / 模型偏差 MÓXÍNG PIĀNCHĀ
MODEL VARIANCE / 模型方差 MÓXÍNG FĀNGCHĀ
MODEL GENERALIZATION / 模型泛化 MÓXÍNG FÀN HUÀ
OVERFITTING / 过拟合 GUÒ NǏ HÉ
FAILURE MODES / 失败模式 SHĪBÀI MÓSHÌ
COMMON CLASSIFICATION METRICS
— Accuracy
— Precision (P)
— Recall (R)
— F1 score (F1)
— Area under the ROC (Receiver Operating Characteristic) curve, or simply AUC
— Log loss
— Precision at k (P@k)
— Average precision at k (AP@k)
— Mean average precision at k (MAP@k)
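Several of the metrics listed above can be computed directly from counts or ranked lists. The following is a minimal sketch in plain Python, not a reference implementation: the function names are hypothetical, P@k is assumed to divide by k, and AP@k is assumed to normalise by min(k, number of relevant items). Other conventions exist, so check the definition your library uses.

```python
def classification_scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def precision_at_k(actual, predicted, k):
    """Share of the top-k predictions that are relevant (assumes division by k)."""
    if k <= 0:
        return 0.0
    relevant = set(actual)
    hits = sum(1 for item in predicted[:k] if item in relevant)
    return hits / k


def average_precision_at_k(actual, predicted, k):
    """AP@k: mean of P@i over the ranks i <= k that hold a relevant item.

    Assumes `predicted` is a ranked list without duplicates.
    """
    relevant = set(actual)
    if not relevant or k <= 0:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(predicted[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / min(k, len(relevant))


def mean_average_precision_at_k(actual_lists, predicted_lists, k):
    """MAP@k: AP@k averaged over several queries or users."""
    pairs = list(zip(actual_lists, predicted_lists))
    if not pairs:
        return 0.0
    return sum(average_precision_at_k(a, p, k) for a, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy counts and rankings, invented purely for illustration.
    print(classification_scores(tp=8, fp=2, fn=4, tn=6))
    print(precision_at_k(["a", "b", "c"], ["a", "x", "b", "y"], k=3))
```

Accuracy alone can mislead on imbalanced classes, which is one reason precision, recall and F1 appear alongside it in the list.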
COMMON REGRESSION METRICS
— Mean absolute error (MAE)
— Mean squared error (MSE)
— Root mean squared error (RMSE)
— Root mean squared logarithmic error (RMSLE)
— Mean percentage error (MPE)
— Mean absolute percentage error (MAPE)
— R² (coefficient of determination)
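The regression metrics above differ mainly in how they aggregate the residuals. The sketch below computes them in plain Python under stated assumptions: the function name is hypothetical, percentage errors use (true - predicted) / true and so require non-zero targets, and RMSLE uses log(1 + x) and so requires values greater than -1. The sign convention for MPE varies between sources.

```python
import math


def regression_metrics(y_true, y_pred):
    """Common regression metrics for two equal-length lists of numbers."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    residuals = [t - p for t, p in pairs]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    # RMSLE compares values on a log scale; log1p(x) = log(1 + x).
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in pairs) / n
    )
    # Percentage errors are undefined when a true value is zero.
    mpe = 100.0 * sum((t - p) / t for t, p in pairs) / n
    mape = 100.0 * sum(abs((t - p) / t) for t, p in pairs) / n

    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined for constant targets; NaN marks that case here.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "RMSLE": rmsle,
            "MPE": mpe, "MAPE": mape, "R2": r2}


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    for name, value in regression_metrics([2, 4, 6, 8], [3, 4, 5, 10]).items():
        print(f"{name}: {value:.4f}")
```

MSE and RMSE penalise large errors more heavily than MAE, so the choice between them depends on how costly outlying predictions are for the task.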
PREDICTIVE ANALYTICS
Measuring performance
CONFUSION MATRIX / 混淆矩阵 HÙNXIÁO JǓZHÈN
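As an illustration of how a binary confusion matrix is filled in, the sketch below tallies the four cells from paired true and predicted labels. The function name and the choice of 1 as the positive class are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN and TN for a binary problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: correct if the truth is also positive.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted negative: a miss if the truth was positive.
            counts["FN" if truth == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    # Toy labels, invented purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    c = confusion_counts(y_true, y_pred)
    print("            pred +   pred -")
    print(f"actual +    {c['TP']:>6}   {c['FN']:>6}")
    print(f"actual -    {c['FP']:>6}   {c['TN']:>6}")
```

A false negative is a positive case the model missed; a false positive is a negative case it flagged. Which of the two is more costly depends on the business problem.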
PRACTICAL TOOLS: ML CANVAS
A. CLASSIFICATION: CATEGORICAL VARIABLE
PREDICTIVE ANALYTICS
Logistic Regression
LOGISTIC REGRESSION (CLASSIFICATION !!!!!)
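Despite its name, logistic regression is used as a classifier: a linear score is passed through the sigmoid function to give a probability, and a threshold turns that probability into a class. The sketch below shows the prediction step and the log loss only, not the fitting step; the weights, bias and 0.5 threshold are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """P(y = 1 | x) for a linear score w . x + b."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def predict_class(weights, bias, x, threshold=0.5):
    """Class 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of binary labels under the probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical parameters; in practice they are learned from training data.
    weights, bias = [1.5, -2.0], 0.25
    for x in ([2.0, 0.5], [0.0, 1.0]):
        print(x, round(predict_proba(weights, bias, x), 3),
              predict_class(weights, bias, x))
```

With a threshold of 0.5 the decision boundary is the set of points where w . x + b = 0, which is a straight line in two dimensions. Raising or lowering the threshold shifts that boundary and trades precision against recall.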
DECISION BOUNDARIES
PREDICTIVE ANALYTICS
Problem 1
~15 mins group work, ~15 mins discussion
REVIEW
— Select good metrics for classification tasks
— How to pick the appropriate precision/recall trade-off
— How to compare classifiers
— Different classification systems for a variety of tasks
— What business problems can you think of that are classification tasks?
— Can you think of some business problems that are multilabel and multioutput?
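The Machine Learning Canvas (v0.4)
Designed for: Designed by: Date: Iteration:
Decisions
How are predictions used to make decisions that provide the proposed value to the end-user?
ML task
Input, output to predict, type of problem.
Value Propositions
What are we trying to do for the end-user(s) of the predictive system? What objectives are we serving?
Data Sources
Which raw data sources can we use (internal and external)?
Collecting Data
How do we get new data to learn from (inputs and outputs)?
Making Predictions
When do we make predictions on new inputs? How long do we have to featurize a new input and make a prediction?
Offline Evaluation
Methods and metrics to evaluate the system before deployment.
Features
Input representations extracted from raw data sources.
Building Models
When do we create/update models with new training data? How long do we have to featurize training inputs and create a model?
Live Evaluation and Monitoring
Methods and metrics to evaluate the system after deployment, and to quantify value creation.
machinelearningcanvas.com by Louis Dorard, Ph.D. Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.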
TEXT CATEGORIZATION
FILM GENRE CLASSIFICATION
https://towardsdatascience.com/journey-to-the-center-of-multi-label-classification-384c40229bff
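Film genre classification is a typical multi-label task, because one film can belong to several genres at once. A common way to represent such targets is a multi-hot vector per example. The sketch below is illustrative only: the genre names, function names and the use of Hamming loss as the error metric are assumptions, not taken from the linked article.

```python
def multi_hot(label_sets, classes):
    """Encode each set of labels as a 0/1 vector over a fixed class order."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    if total == 0:
        return 0.0
    wrong = sum(
        1
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
        if t != p
    )
    return wrong / total


if __name__ == "__main__":
    # Hypothetical genres and films, invented purely for illustration.
    classes = ["comedy", "drama", "romance"]
    truth = multi_hot([{"comedy", "romance"}, {"drama"}], classes)
    guess = multi_hot([{"comedy"}, {"drama"}], classes)
    print(truth, guess, hamming_loss(truth, guess))
```

In a multiclass problem each example gets exactly one label, so every row would contain a single 1. In a multi-label problem any number of entries in a row may be 1.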
MULTI-OUTPUT LEARNING
https://arxiv.org/pdf/1901.00248.pdf
OUTPUT STRUCTURES
https://arxiv.org/pdf/1901.00248.pdf
PREDICTIVE ANALYTICS
Decision boundaries
LEARNING CURVES
DECISION BOUNDARIES
Decision Boundaries Animations by Ryan Holbrook is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at https://github.com/ryanholbrook/decision-boundaries-animations.
IRIS DECISION TREE
DECISION TREES
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
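A decision tree builds its decision boundary from a sequence of threshold tests on single features. The sketch below shows one such step: choosing the threshold on one feature that gives the purest two-way split. It assumes a CART-style Gini impurity criterion, which the slides do not specify, and the function names and measurements are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger for a more mixed one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split value <= threshold.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, inf) if the feature has fewer than two distinct values.
    """
    best = (None, float("inf"))
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    # Toy measurements and class names, invented purely for illustration.
    lengths = [1.0, 1.5, 4.0, 5.0]
    species = ["a", "a", "b", "b"]
    print(gini(species))
    print(best_threshold(lengths, species))
```

A full tree repeats this search over every feature, then recurses into each side of the chosen split until a stopping rule is met. Letting the tree grow too deep is a common route to overfitting.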
PREDICTIVE ANALYTICS
Individual Assessment