LECTURE 3 TERM 2:
MSIN0097
Predictive Analytics
A P MOORE
END-TO-END ML
— Discover — Explore — Visualize
— Clean
— Sample — Impute — Encode — Transform
– Scale
— Modeling
– Overfitting
– Optimization
– Model Selection – Regularization
– Generalization
— Documentation — Presentation
— Launch — Monitor — Maintain
DECISION BOUNDARY SOFTMAX REGRESSION
DECISION TREES
Classification Trees
DECISION TREE (IRIS DATA)
DECISION TREE BOUNDARIES
COST FUNCTIONS
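Gini impurity of node $i$, where $p_{i,k}$ is the fraction of class-$k$ instances in the node:

$$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^{2}$$

CART cost function for classification, for a split on feature $k$ at threshold $t_k$:

$$J(k, t_k) = \frac{m_{\text{left}}}{m}\, G_{\text{left}} + \frac{m_{\text{right}}}{m}\, G_{\text{right}}$$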
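As a rough illustration of the cost function on this slide, the sketch below computes the Gini impurity of a node (one minus the sum of squared class fractions) and the weighted impurity of the two children produced by a single threshold split. The function names `gini` and `split_cost` and the toy data are assumptions made for this example; they are not from the lecture.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class fractions."""
    m = len(labels)
    if m == 0:
        return 0.0
    return 1.0 - sum((count / m) ** 2 for count in Counter(labels).values())


def split_cost(values, labels, threshold):
    """Weighted child impurity for splitting one feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    m = len(labels)
    return len(left) / m * gini(left) + len(right) / m * gini(right)


# Toy data, made up for illustration: one feature, two classes.
feature = [0.2, 0.3, 0.4, 1.3, 1.5, 1.8]
classes = ["a", "a", "a", "b", "b", "b"]

pure_cost = split_cost(feature, classes, threshold=0.8)   # children are pure
mixed_cost = split_cost(feature, classes, threshold=0.3)  # right child is mixed
print(pure_cost, mixed_cost)
```

A tree-growing algorithm of this kind would evaluate the cost for many candidate (feature, threshold) pairs and keep the pair with the lowest value; here the split at 0.8 is preferred because both children are pure.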
REGULARIZATION
RIDGE (REGULARIZED) REGRESSION
REGULARIZATION
• k – features
• t_k – thresholds
• min_samples_split
• min_samples_leaf
• max_leaf_nodes
• max_features
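The hyperparameter names listed above match the constructor arguments of scikit-learn's `DecisionTreeClassifier`, so a minimal sketch of how they restrict tree growth might look like the following. The dataset (`make_moons`) and the specific values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy two-class data, used here only as an example.
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# No restrictions: the tree keeps splitting until its leaves are pure.
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Illustrative settings; good values depend on the data.
regularized = DecisionTreeClassifier(
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_leaf_nodes=8,      # cap on the total number of leaves
    max_features=2,        # features considered per split (all 2 here)
    random_state=0,
).fit(X, y)

print(unrestricted.get_n_leaves(), regularized.get_n_leaves())
```

Raising the `min_*` values or lowering the `max_*` values constrains the tree more strongly, which is the sense in which these settings act as regularization.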
DECISION TREES
Regression Trees
REGRESSION TREES
TREE REGRESSIONS
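CART cost function for regression, for a split on feature $k$ at threshold $t_k$:

$$J(k, t_k) = \frac{m_{\text{left}}}{m}\, \mathrm{MSE}_{\text{left}} + \frac{m_{\text{right}}}{m}\, \mathrm{MSE}_{\text{right}}$$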
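The regression cost can be sketched in the same way as the classification one: each child node predicts the mean of its targets, and the split cost is the sample-weighted average of the children's mean squared errors. The helper names (`mse`, `regression_split_cost`, `best_threshold`) and the toy data are assumptions for this example, and scanning midpoints between sorted feature values is one common way to pick candidate thresholds, not necessarily the one used in the lecture.

```python
def mse(targets):
    """MSE of a node that predicts the mean of its targets."""
    m = len(targets)
    if m == 0:
        return 0.0
    mean = sum(targets) / m
    return sum((y - mean) ** 2 for y in targets) / m


def regression_split_cost(values, targets, threshold):
    """Sample-weighted MSE of the two children of a threshold split."""
    left = [y for x, y in zip(values, targets) if x <= threshold]
    right = [y for x, y in zip(values, targets) if x > threshold]
    m = len(targets)
    return len(left) / m * mse(left) + len(right) / m * mse(right)


def best_threshold(values, targets):
    """Scan midpoints between consecutive distinct feature values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: regression_split_cost(values, targets, t))


# Toy data, made up for illustration: a step from 1.0 to 5.0.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 1.0, 5.0, 5.0]

t = best_threshold(x, y)
print(t, regression_split_cost(x, y, t))
```

On this step-shaped data the lowest-cost threshold falls between the two plateaus, where both children have zero error.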
REGULARIZING A TREE REGRESSOR
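A minimal sketch of regularizing a tree regressor, assuming scikit-learn's `DecisionTreeRegressor` and a synthetic noisy quadratic dataset (both chosen here for illustration; the slide does not specify them). The setting `min_samples_leaf=10` is likewise an arbitrary example value.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: a noisy quadratic in one feature (an assumption for this demo).
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = 4.0 * (X[:, 0] - 0.5) ** 2 + rng.normal(0.0, 0.1, size=200)

# Without restrictions the tree can grow until it fits the training noise.
unrestricted = DecisionTreeRegressor(random_state=42).fit(X, y)

# Requiring at least 10 samples per leaf forces each prediction to be an
# average over several points, which smooths the fitted function.
regularized = DecisionTreeRegressor(min_samples_leaf=10, random_state=42).fit(X, y)

print(unrestricted.get_n_leaves(), regularized.get_n_leaves())
print(unrestricted.score(X, y), regularized.score(X, y))
```

The unrestricted tree scores higher on the training set, but that is a symptom of fitting noise; a held-out set would be needed to compare how well the two models generalize.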
INSTABILITY
SENSITIVITY TO TRAINING SET
Source: http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
QUESTIONS
— If a Decision Tree is overfitting the training set, is it a good idea to try decreasing max_depth?
— If a Decision Tree is underfitting the training set, is it a good idea to try scaling the input features?