Lead Research Scientist, Financial Risk Quantitative Research, SS&C Algorithmics; Adjunct Professor, University of Toronto
MIE1624H – Introduction to Data Science and Analytics Lecture 7 – Machine Learning
University of Toronto March 1, 2022
Machine learning
Machine learning gives computers the ability to learn without being explicitly programmed
■ Supervised learning: decision trees, ensembles (bagging, boosting, random forests), k-NN, linear regression, Naive Bayes, neural networks, logistic regression, SVM
❑ Classification
❑ Regression (prediction)
■ Unsupervised learning: k-means, c-means, hierarchical clustering, DBSCAN
❑ Clustering
❑ Dimensionality reduction (PCA, LDA, factor analysis, t-SNE)
❑ Association rules (market basket analysis)
■ Reinforcement learning
❑ Dynamic programming
■ Neural nets: deep learning, multilayer perceptron, recurrent neural network (RNN), convolutional neural network (CNN), generative adversarial network (GAN)
[Figure: map of machine learning topics to course lectures (#5, #7, #10, #11). Source: Intro to Machine Learning]
Supervised Machine Learning – Decision Trees
Classification
▪ Classification is a supervised learning technique, which maps data into predefined classes or groups
▪ The training set contains a set of records, where one attribute indicates the class
▪ The modeling objective is to assign a class label to each record, using the other attributes as predictors of the class
▪ Data is divided into test / train, where “train” is used to build the model and “test” is used to validate the accuracy of classification
▪ Typical techniques: Decision Trees, Neural Networks
Classification: creating the model
[Figure: classification algorithms learn a trained classifier from the training data. Works with both interval and categorical variables. Example rule learned: purchased lipstick if Gender = Female and Age >= 15.]
Classification: applying rules
[Figure: the trained classifier is applied to score new records. Example rule: IF Gender = Female AND Age >= 15 THEN Purchase lipstick = YES.]
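A minimal sketch of this train / test / score workflow, assuming scikit-learn and pandas; the column names and the tiny dataset below are hypothetical placeholders mirroring the lipstick example:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy training data mirroring the lipstick example (hypothetical values)
df = pd.DataFrame({
    "Gender":   ["F", "F", "M", "F", "M", "F"],
    "Age":      [16,  14,  20,  30,  12,  15],
    "Lipstick": ["YES", "NO", "NO", "YES", "NO", "YES"],
})
X = pd.get_dummies(df[["Gender", "Age"]])   # encode the categorical feature
y = df["Lipstick"]

# Split into train / test, build the model on train, validate on test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
print("scored labels:", clf.predict(X_test))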
Decision (classification) trees
▪ A tree can be “learned” by splitting the source set into subsets based on an attribute value test
▪ Tree partitions samples into mutually exclusive groups by selecting the best splitting attribute, one group for each terminal node
▪ The process is repeated recursively for each derived subset, until a stopping criterion is reached
➢ Works with both interval and categorical variables
➢ No need to normalize the data
➢ Intuitive if-then rules are easy to extract and apply
➢ Best applied to binary outcomes
▪ Decision trees can be used to support multiple modeling objectives:
o Customer segmentation
o Investment / portfolio decisions
o Issuing a credit card or loan
o Medical patient / disease classification
Decision (classification) trees
[Figure: example tree splitting "All Candidates" at a decision node (e.g., Age >= 15 years) into leaf nodes.]
Understanding decision trees
▪ Decision trees are built using recursive partitioning to classify the data
▪ The algorithm chooses the most predictive feature to split the data on
▪ "Predictiveness" is based on the decrease in entropy (gain in information), i.e., the decrease in "impurity"
▪ A split that leaves the classes as mixed as before provides no information gain!
Understanding decision trees
▪ The root node partitions the data using the feature that provides the most information gain
▪ Information gain tells us how important a given attribute of the feature vectors is: Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), the drop in impurity from splitting set S on attribute A
▪ Entropy is a common measure of target class impurity (i ranges over the target classes, p_i is the proportion of elements in class i): Entropy = − Σ_i p_i · log2(p_i)
▪ Gini Index is another measure of impurity: Gini = 1 − Σ_i p_i^2
Gini impurity is computationally faster because it does not require logarithms; in practice, which of the two measures is used rarely makes much of a difference.
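A short sketch computing both impurity measures with the formulas above; `node` is a hypothetical list of target labels at a tree node:

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

node = [1, 1, 1, 0, 0, 0, 0, 0]        # 3 elements of class 1, 5 of class 0
print(entropy(node))                    # ~0.954 bits
print(gini(node))                       # ~0.469
print(entropy([0, 0, 0, 0]))            # 0.0 -- a pure node has no impurity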
Source: Supervised Machine Learning
Characteristics of decision trees
Strengths:
▪ Easy to interpret
▪ Can handle numeric or categorical features
▪ Can handle missing data
▪ Uses only the most important features
▪ Can be used on very large or small data
Weaknesses:
▪ Easy to overfit or underfit the model
▪ Cannot model interactions between features
▪ Large trees can be difficult to interpret
A tree stops growing at a node when…
▪ the node is pure or nearly pure
▪ there are no remaining variables on which to further subset the data
▪ the tree has grown to a preselected size limit
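These stopping rules map to pre-pruning hyperparameters in scikit-learn's DecisionTreeClassifier; a minimal sketch with illustrative values:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,                  # preselected size limit on tree depth
    min_samples_split=20,         # stop splitting nodes with too few records
    min_impurity_decrease=0.01,   # stop when a split gains too little purity
    criterion="entropy",          # impurity measure: "entropy" or "gini"
)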
Bias-variance tradeoff
error = bias² + variance (+ irreducible error)
■ The bias-variance tradeoff is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:
❑ Bias is error from erroneous assumptions in the learning algorithm; high bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting)
❑ Variance is error from sensitivity to small fluctuations in the training set; high variance can cause overfitting, i.e., modeling the random noise in the training data rather than the intended outputs
■ Ensemble tree methods:
❑ Gradient Boosting (GBoost) is based on weak learners (high bias, low variance). In terms of decision trees, weak learners are shallow trees, sometimes even as small as decision stumps (trees with two leaves). Boosting reduces error mainly by reducing bias.
❑ Random Forest uses fully grown decision trees (low bias, high variance). It tackles the error reduction task by reducing variance. The trees are made uncorrelated to maximize the decrease in variance, but the algorithm cannot reduce bias (which is slightly higher than the bias of an individual tree in the forest).
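A sketch contrasting the two ensemble strategies with scikit-learn; the built-in breast-cancer dataset and the hyperparameter values are illustrative choices, not part of the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Boosting: many shallow (high-bias, low-variance) trees added sequentially to reduce bias
gb = GradientBoostingClassifier(n_estimators=200, max_depth=2)

# Random forest: many deep (low-bias, high-variance) trees averaged to reduce variance
rf = RandomForestClassifier(n_estimators=200, max_depth=None)

for name, model in [("gradient boosting", gb), ("random forest", rf)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())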
Supervised Machine Learning – Neural Networks
Artificial neural networks (ANNs)
▪ Based loosely on computer models of how brains work
▪ Model is an assembly of inter-connected neurons (nodes) and weighted links
▪ Each neuron applies a nonlinear function to its inputs to produce an output
▪ The output node sums up its input values according to the weights of its links
▪ Used for classification, pattern recognition, speech recognition
▪ “Black box” model – no explanatory power, very hard to interpret the results
▪ ANNs with more than one hidden layer are called Deep Learning networks
Training ANN means learning the weights of the neurons
[Figure: example network with an input layer (x1, x2, …), a hidden layer of neurons, and an output layer producing y]
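A minimal sketch of a one-hidden-layer network ("multilayer perceptron") with scikit-learn; the toy dataset and layer size are illustrative assumptions. Training it means learning the weights on the links between neurons:

from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(10,),   # one hidden layer, 10 neurons
                    activation="logistic",      # nonlinear function per neuron
                    max_iter=2000, random_state=0)
net.fit(X, y)
print("learned weight matrices:", [w.shape for w in net.coefs_])
print("training accuracy:", net.score(X, y))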
Applications of Deep Learning – Computer Vision
▪ Image classification
▪ Object detection
▪ Type: Feedforward Neural Network
▪ Type: Convolutional Neural Network
Source: CognitiveClass.ai
Applications of Deep Learning – Natural Language Processing and Time Series Analysis
▪ Speech recognition
▪ Automatic machine translation
▪ Time series prediction
▪ Type: Recurrent Neural Network
https://blog.statsbot.co/time-series-prediction-using- recurrent-neural-networks-lstms-807fa6ca7f
Source: CognitiveClass.ai
Applications in Computer Vision
■ Application: automatically classifying postal codes
Source: CognitiveClass.ai
Target variables in MNIST dataset – multi-class classification
𝑦 = 0, 𝑦 = 1, 𝑦 = 2, ⋯
Source: CognitiveClass.ai
image with 28 x 28 pixels and 4 greyscale intensities (0=white, 3=black)
Source: CognitiveClass.ai
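A sketch of loading MNIST and inspecting the multi-class targets (digits 0–9), assuming TensorFlow/Keras is installed; the dataset download is automatic:

from tensorflow.keras.datasets import mnist
import numpy as np

(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)        # (60000, 28, 28) greyscale images
print(np.unique(y_train))   # target classes: [0 1 2 ... 9]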
Features and targets: 3-class example with 2 features (colors indicate the class)
1. (y1 = 0, x1 = [4, 2])
2. (y2 = 1, x2 = [1, 0])
3. (y3 = 2, x3 = [0, 1])
Source: CognitiveClass.ai
Linear classifier
Source: CognitiveClass.ai
Linear classifier
▪ The equation of a line in one dimension is given by: z = wx + b
▪ This generalizes in D dimensions to: z = wᵀx + b
▪ Let's see what happens for different values of x
Source: CognitiveClass.ai
Linear classifier
• Consider the following data set
• If we can separate the data with a line, we can use that line to classify the sample
𝑧 = 𝑤x + 𝑏 = 1x − 1
Source: CognitiveClass.ai
Linear classifier
■ If x is on the right side of the line, we get a positive number
𝑧 = 1x − 1
𝑧 = 1(3) − 1 = 2
Source: CognitiveClass.ai
Linear classifier
■ If x is on the left side of the line, we get a negative number
𝑧 = 1x − 1
𝑧 = 1(−2) − 1 = −3
Source: CognitiveClass.ai
Linear classifiers
• If we use the line to calculate the class of a point, it always returns a positive or negative number, such as 2, −3, and so on.
• But we need class 0 and class 1. How can we convert these numbers into 0 and 1?
• ŷ = 1, if z ≥ 0
• ŷ = 0, if z < 0
Source: CognitiveClass.ai
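A sketch of this decision rule in Python: the line z = wx + b gives a signed score, and a threshold turns it into class 0 or class 1 (w and b taken from the slide example):

w, b = 1.0, -1.0                     # the line z = 1*x - 1 from the slides

def predict(x):
    z = w * x + b
    return 1 if z >= 0 else 0        # y_hat = 1 if z >= 0, else 0

print(predict(3))    # z = 2  -> class 1
print(predict(-2))   # z = -3 -> class 0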
Linear classifier
parameters: 𝑤, 𝑏
Source: CognitiveClass.ai
Linear classifier: Threshold Function
Source: CognitiveClass.ai
Linear classifier: Logistic Regression
The logistic (sigmoid) function: σ(z) = 1 / (1 + e^(−z))
Source: CognitiveClass.ai
σ(−4) ≈ 0,  σ(−1) ≈ 0.27,  σ(1) ≈ 0.73,  σ(4) ≈ 1
Linear classifier: Logistic Regression
ŷ = 1 if σ(z) ≥ 0.5;  ŷ = 0 if σ(z) < 0.5
Source: CognitiveClass.ai
Linear classifier: Logistic Regression
▪ If samples are near the line, we get values close to 0.5
▪ If they are far from the line, we get values close to 0 or 1
▪ Example points: x3 = −3.5, x1 = −0.25
Source: CognitiveClass.ai
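A sketch illustrating this: the sigmoid squashes z = wx + b into (0, 1), giving values near 0.5 close to the boundary and near 0 or 1 far from it (w, b from the slide example; the sample points are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 1.0, -1.0                              # same line, z = x - 1
for x in [-3.5, -0.25, 1.0, 5.0]:             # far left, near, on, far right of the boundary
    z = w * x + b
    print(f"x={x:6}: z={z:6}  sigma(z)={sigmoid(z):.3f}")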
Linear classifiers vs. neural networks – example
• It is helpful to view the label y as a decision function of x
Source: CognitiveClass.ai
Linear classifiers vs. neural networks – example
Colors are used to indicates the class 1. y1 =0,x1 =−3
2. y2 =0,x2 =−2
3. y3 =1,x3=−1
4. y4 = 1, x4= 0 5. y5 = 1, x5= 1 6. y6 = 0, x6 = 2 7. y7 = 0, x7 = 3
This dataset is not linearly separable
Source: CognitiveClass.ai
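A sketch (assuming scikit-learn) of fitting a plain linear classifier to these seven points; because the class-1 points sit between the class-0 points, no single threshold on x separates them and accuracy stays well below 1.0:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[-3], [-2], [-1], [0], [1], [2], [3]])
y = np.array([  0,    0,    1,   1,   1,   0,   0])

lin = LogisticRegression().fit(X, y)
print("linear classifier accuracy:", lin.score(X, y))   # stuck well below 1.0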
Linear classifiers vs. neural networks
Source: CognitiveClass.ai
Neural networks
Source: CognitiveClass.ai
Source: CognitiveClass.ai
An artificial neuron applies a linear function to its input and then a nonlinear activation. For the two hidden neurons in this example:
z1 = x + 5
z2 = x − 5
a1 = σ(x + 5)
a2 = σ(x − 5)
where σ is the sigmoid activation function.
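A NumPy sketch of these two hidden neurons feeding an output neuron; the output weights (1, −1) and bias (0) are illustrative assumptions, not from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tiny_network(x):
    a1 = sigmoid(x + 5)                     # hidden neuron 1: z1 = x + 5
    a2 = sigmoid(x - 5)                     # hidden neuron 2: z2 = x - 5
    return sigmoid(1 * a1 - 1 * a2 + 0)     # output neuron combines the activations

for x in [-10, -5, 0, 5, 10]:
    print(x, round(tiny_network(x), 3))
# The output forms a "bump": higher for x between the two shifts, lower outside --
# a shape that no single linear function of x can produce.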
Neural networks and tensors
[Figure: a deeper network; the weights and biases are the parameters, organized by hidden layers]
Source: CognitiveClass.ai
TensorFlow is used in large scale production and deployment
The versatility of the Pythonic PyTorch framework allows researchers to test out ideas with almost zero friction
The Battle: TensorFlow vs. PyTorch by
Source: CognitiveClass.ai
TF-Keras TensorFlow
Source: CognitiveClass.ai
Keras may be easier to get into and experiment with standard layers, in a plug & play spirit
PyTorch is a lower-level environment for experimentation, giving the user more freedom to write custom layers and look under the hood of numerical optimization tasks.
Keras or PyTorch as your first deep learning framework: by and Rafał Jakubanis
Source: CognitiveClass.ai
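A minimal Keras sketch illustrating the "plug & play" layer style mentioned above; the input shape and layer sizes are illustrative assumptions (e.g., 28 x 28 greyscale digits as in MNIST):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential([
    Flatten(input_shape=(28, 28)),        # flatten a 28x28 greyscale image into a vector
    Dense(128, activation="relu"),        # hidden layer
    Dense(10, activation="softmax"),      # one output per digit class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, epochs=5)   # would train on e.g. the MNIST data loaded earlier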
Deep Learning on EdX
https://www.edx.org/professional-certificate/ibm-deep-learning
Source: CognitiveClass.ai