LECTURE 3 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE
DEALING WITH DIFFICULT PROBLEMS
LEARNING SYSTEMS
LOSS SURFACES
DEALING WITH DIFFICULT PROBLEMS
1. Improving bad solutions
– Start with a bad solution (weak learner) and improve it
– Build up a better solution by thinking about how partial solutions can support/correct each other's mistakes
2. Make the problem simpler
– Divide and concur
– Problem decomposition
3. Building much better solutions
– Deep models
HOW DO WE FIND THE RIGHT ANSWER?
CORRECT?
END-TO-END ML
— Discover
— Explore
— Visualize
— Clean
— Sample
— Impute
— Encode
— Transform
– Scale
— Modeling
– Overfitting
– Optimization
– Model selection
– Regularization
– Generalization
— Documentation
— Presentation
— Launch
— Monitor
— Maintain
END-TO-END ML
— Discover
— Explore
— Visualize
— Clean
— Sample
— Impute
— Encode
— Transform
— Modeling
– Voting
– Bagging
– Boosting
– Pasting
– Stacking
— Documentation
— Presentation
— Launch
— Monitor
— Maintain
— Voting
– Majority voting
— Bagging and Pasting
– Out-of-bag evaluation
— Boosting
– Adaptive Boosting (AdaBoost)
– Gradient Boosting
– XGBoost
— Stacking
MULTIPLE MODELS
ENSEMBLES
IMPROVING BAD SOLUTIONS
— Voting
— Bagging (bootstrap aggregating) and Pasting (sampling without replacement)
— Boosting
— Stacking
MULTIPLE MODELS
MAJORITY VOTING
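A minimal sketch of hard (majority) voting, and of why it can beat a single weak learner. It assumes binary labels, an odd number of classifiers, and independent errors, which models trained on the same data rarely satisfy. The function names are illustrative and not taken from the lecture.

```python
from collections import Counter
from math import comb

def majority_vote(predictions):
    """Return the most common label among the base classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]

def majority_accuracy(n_classifiers, p_correct):
    """Probability that a strict majority of n classifiers is right.

    Assumes binary labels, odd n (no ties), and independent errors.
    """
    k_min = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * p_correct ** k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )

vote = majority_vote(["spam", "ham", "spam"])
ensemble_of_3 = majority_accuracy(3, 0.6)
ensemble_of_101 = majority_accuracy(101, 0.6)
print(vote, round(ensemble_of_3, 3), round(ensemble_of_101, 3))
```

Under these assumptions, three classifiers that are each right 60% of the time give a majority that is right 64.8% of the time, and the figure keeps rising as more independent voters are added.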
ENSEMBLES
Partitioning data
BAGGING
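A minimal sketch of how the training subsets differ between bagging (sampling with replacement) and pasting (sampling without replacement), and of where out-of-bag instances come from. The helper `draw_sample`, the seed, and the sample sizes are assumptions made for illustration.

```python
import random

def draw_sample(n_instances, sample_size, replace, rng):
    """Draw training indices with replacement (bagging) or without (pasting)."""
    if replace:
        return [rng.randrange(n_instances) for _ in range(sample_size)]
    return rng.sample(range(n_instances), sample_size)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 1000

bagging_idx = draw_sample(n, n, replace=True, rng=rng)
pasting_idx = draw_sample(n, n // 2, replace=False, rng=rng)

# Instances that a bagged predictor never saw are "out-of-bag" and can be
# used to evaluate that predictor without a separate validation set.
oob_idx = set(range(n)) - set(bagging_idx)
oob_fraction = len(oob_idx) / n
print(f"out-of-bag fraction: {oob_fraction:.2f}")
```

When the bootstrap sample is as large as the training set, roughly 37% of instances are expected to be out-of-bag for each predictor.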
RANDOM FORESTS
DECISION BOUNDARIES
RANDOM FORESTS
FEATURE IMPORTANCE
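A short sketch of reading feature importances from a random forest, assuming scikit-learn is available. The iris dataset and the hyperparameters are stand-ins chosen for illustration, not the lecture's own example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(iris.data, iris.target)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
ranked = sorted(
    zip(iris.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:20s} {score:.3f}")
```

These scores measure how much each feature reduces impurity on average across the trees, so they are relative to this model and this dataset.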
ENSEMBLES
Sequential data
SEQUENTIAL MODELS
ENSEMBLES
Boosting data
BOOSTING
ADABOOST
DECISION BOUNDARIES CONSECUTIVE PREDICTORS
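A minimal sketch of one AdaBoost round, showing how misclassified instances gain weight before the next predictor is trained. It assumes the common formulation with predictor weight alpha = learning_rate * log((1 - r) / r), where r is the weighted error rate; other presentations include a factor of one half. The function name and toy labels are illustrative.

```python
import math

def adaboost_round(weights, y_true, y_pred, learning_rate=1.0):
    """One AdaBoost round: return (alpha, new normalized instance weights).

    Assumes the weighted error rate lies strictly between 0 and 1.
    """
    missed = [yt != yp for yt, yp in zip(y_true, y_pred)]
    error = sum(w for w, m in zip(weights, missed) if m) / sum(weights)
    alpha = learning_rate * math.log((1 - error) / error)
    # Boost only the misclassified instances, then renormalize.
    boosted = [w * math.exp(alpha) if m else w for w, m in zip(weights, missed)]
    total = sum(boosted)
    return alpha, [w / total for w in boosted]

y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]  # the weak learner misses the last instance
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

With a learning rate of 1, the misclassified instance ends up carrying half of the total weight, which is what pushes the next predictor's decision boundary toward the hard cases.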
VIOLA & JONES 2004
ENSEMBLES
Gradient boosting
GRADIENT BOOSTING FITTING RESIDUAL ERRORS
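A minimal sketch of gradient boosting for squared error, where each new predictor is fitted to the residual errors of the ensemble so far. It uses hand-written depth-1 regression trees (stumps) on one-dimensional toy data. The helper names, the data, and the settings are assumptions made for illustration.

```python
def fit_stump(xs, targets):
    """Fit a depth-1 regression tree on 1-D inputs by minimizing squared error."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

    return predict

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

xs = [i / 10 for i in range(-10, 11)]
ys = [x ** 2 for x in xs]  # toy quadratic target

model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [model(x) for x in xs])
print(f"training MSE: {baseline_mse:.4f} -> {boosted_mse:.4f}")
```

The learning rate shrinks each stump's contribution; smaller values usually need more stages but tend to generalize better.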
GRADIENT BOOSTING
GBRT ENSEMBLES
OVERFITTING
VALIDATION ERROR
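A minimal sketch of early stopping: choosing how many boosting stages to keep by watching the validation error. It assumes one validation error has already been measured per stage. The error values and the `patience` parameter are hypothetical.

```python
def early_stopping_stage(val_errors, patience=3):
    """Return the 1-based stage with the lowest validation error seen before
    the error fails to improve for `patience` consecutive stages."""
    best_error = float("inf")
    best_stage = 0
    for stage, error in enumerate(val_errors, start=1):
        if error < best_error:
            best_error, best_stage = error, stage
        elif stage - best_stage >= patience:
            break  # validation error has stopped improving
    return best_stage

# Hypothetical validation MSE after each added tree: it falls, then rises
# again as the ensemble starts to overfit.
val_errors = [0.90, 0.55, 0.40, 0.33, 0.31, 0.32, 0.34, 0.37, 0.41]
n_trees = early_stopping_stage(val_errors)
print(n_trees)
```

Here the scan keeps five trees, the point where the hypothetical validation error is lowest.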
STACKING
BLENDING
SUB-PROBLEMS
STACKING
BLENDED OR META LEARNER
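A short sketch of stacking with a meta learner (blender), assuming scikit-learn is available. The dataset, the base models, and the hyperparameters are illustrative choices, not the lecture's own setup.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("forest", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # the blender / meta learner
    cv=5,  # the blender is trained on out-of-fold base predictions
)
stack.fit(X_train, y_train)

test_accuracy = stack.score(X_test, y_test)
print(f"stacked test accuracy: {test_accuracy:.3f}")
```

Instead of a fixed rule such as majority voting, the blender learns how to combine the base predictions. Training it on out-of-fold predictions keeps it from simply trusting whichever base model overfits the most.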
DEEP MODELS
ANNS TO GANS
BEST PRACTICE?
https://twitter.com/chipro/status/1354223368099278849 @chipro