
Lead Research Scientist, Financial Risk Quantitative Research, SS&C Algorithmics Adjunct Professor, University of Toronto
MIE1624H – Introduction to Data Science and Analytics Lecture 10 – Advanced Machine Learning
University of Toronto March 22, 2022


Machine learning
Machine learning gives computers the ability to learn without being explicitly programmed
■ Supervised learning: decision trees, ensembles (bagging, boosting, random forests), k-NN, linear regression, Naive Bayes, neural networks, logistic regression, SVM
❑ Classification
❑ Regression (prediction)
■ Unsupervised learning: k-means, c-means, hierarchical clustering, DBSCAN
❑ Clustering
❑ Dimensionality reduction (PCA, LDA, factor analysis, t-SNE)
❑ Association rules (market basket analysis)
■ Reinforcement learning
❑ Dynamic programming
■ Neural nets: deep learning, multilayer perceptron, recurrent neural network (RNN), convolutional neural network (CNN), generative adversarial network (GAN)

Machine learning
[Figure: taxonomy of machine learning methods, annotated with the lectures covering each branch (Lectures #5, #7, #10, #11). Source: Intro to Machine Learning]

 Unsupervised Machine Learning – Clustering

Cluster analysis (segmentation)
▪ Unsupervised learning algorithm
o Unlabeled data and no “target” variable
▪ Frequently used for segmentation (to identify natural groupings of customers)
o Market segmentation, customer segmentation
▪ Most cluster analysis methods involve the use of a distance measure to calculate the closeness between pairs of items
o Data points in one cluster are more similar to one another
o Data points in separate clusters are less similar to one another
[Figure: scatter plot of data points grouped into Cluster #1, Cluster #2, and Cluster #3]
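The distance-measure idea above can be sketched in a few lines: assign each point to the nearest of two cluster centres using Euclidean distance. This is a minimal illustration with made-up data, not an example from the lecture; only NumPy is assumed.

```python
import numpy as np

# Toy data (illustrative, not from the lecture): four points and two
# candidate cluster centres.
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centres = np.array([[1.0, 1.5], [8.5, 9.0]])

# Pairwise Euclidean distances, shape (n_points, n_centres)
dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)

# Each point joins the cluster whose centre is closest
labels = dists.argmin(axis=1)
print(labels)  # first two points -> cluster 0, last two -> cluster 1
```

The same "closest centre" rule is the assignment step inside k-means, covered next.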

K-means clustering
[Figure series illustrating the k-means algorithm step by step. Source: Unsupervised Machine Learning]
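A minimal k-means sketch with scikit-learn on synthetic data (not the lecture's dataset): the familiar loop — pick K centroids, assign each point to the nearest centroid, recompute centroids as cluster means, repeat until convergence — is what `KMeans.fit` runs internally.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated synthetic blobs
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.5, (50, 2)),   # cluster around (0, 0)
    rng.normal((5, 5), 0.5, (50, 2)),   # cluster around (5, 5)
    rng.normal((0, 5), 0.5, (50, 2)),   # cluster around (0, 5)
])

# K must be supplied up front — a key limitation of k-means
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # approximately the three blob centres
print(km.inertia_)           # within-cluster sum of squared distances
```

`inertia_` is the quantity k-means minimizes; plotting it against K is the usual "elbow" heuristic for choosing the number of clusters.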

Clustering: LinkedIn

Cluster analysis – K-means clustering
Source: , Cluster Analysis

Cluster analysis – Fuzzy C-means clustering (FCM)
[Figure series illustrating fuzzy c-means clustering. Source: Cluster Analysis]
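A minimal fuzzy c-means sketch in plain NumPy (an illustrative implementation, not the lecture's code). Unlike k-means, each point receives a degree of membership in every cluster; the fuzzifier m > 1 controls how soft the assignments are.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)       # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)            # avoid division by zero
        # Standard FCM update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centres

# Two tight toy groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
U, centres = fuzzy_c_means(X, c=2)
print(U.round(3))   # soft memberships; each row sums to 1
```

Taking `U.argmax(axis=1)` recovers hard labels, but the soft memberships themselves are the point of FCM — points near a boundary get split membership.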

Cluster analysis – Hierarchical clustering
[Figure series illustrating hierarchical clustering. Source: Cluster Analysis]
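An agglomerative hierarchical clustering sketch with SciPy on toy data (not the lecture's example): each point starts as its own cluster, the two closest clusters are merged repeatedly, and the resulting tree can be cut at any level to yield a flat clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight pairs plus one isolated point
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8], [9.0, 0.0]])

Z = linkage(X, method="ward")                    # the full merge tree
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
print(labels)
```

`scipy.cluster.hierarchy.dendrogram(Z)` would draw the tree itself; cutting at different heights trades off cluster count against cluster tightness without re-running the algorithm.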

AI for course curriculum design – clustering of skills

Deep Learning and Computer Vision (neural nets, deep learning, AI, Python, TensorFlow)
Data Management
(databases, data structures, SQL, noSQL, web-scraping, APIs, intro to Big Data)
Statistical Analysis for Business (statistical modeling, hypothesis testing, SPSS)
Distributed Computing, Big Data Analytics (distributed computing, Cloud, Hadoop, Spark)
Creative Thinking, Design Thinking

Cluster analysis – DBSCAN
[Figure series illustrating DBSCAN density-based clustering. Source: Cluster Analysis]
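A DBSCAN sketch with scikit-learn on synthetic data (not the lecture's example): clusters are dense regions, points in sparse regions are labelled -1 (noise), and no number of clusters is required — only a neighbourhood radius (`eps`) and a density threshold (`min_samples`).

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal((0, 0), 0.3, (40, 2)),   # dense blob
    rng.normal((5, 5), 0.3, (40, 2)),   # dense blob
    [[20.0, 20.0]],                     # an isolated outlier
])

# A point is "core" if it has >= min_samples neighbours within eps
db = DBSCAN(eps=1.0, min_samples=5).fit(X)
print(set(db.labels_))   # two clusters plus -1 for the noise point
```

This also illustrates the comparison slide's point: DBSCAN found both blobs and flagged the outlier without being told K, which k-means cannot do.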

Main clustering algorithms
■ Partition based (K-means):
❑ Medium and large sized databases (relatively efficient)
❑ Produces sphere-like clusters
❑ Needs number of clusters (K)
■ Partition based (FCM):
❑ Produces fuzzy clusters
❑ Long computational time
■ Hierarchical based (agglomerative):
❑ Produces trees of clusters
■ Density based (DBSCAN):
❑ Produces arbitrary shaped clusters
❑ Good when dealing with spatial clusters (maps)

Cluster analysis – comparison
Source: , Cluster Analysis

Applications of clustering
■ Retail / Marketing:
❑ Identifying buying patterns of customers
❑ Finding associations among customers' demographic characteristics
❑ Recommending a new book to a customer by identifying clusters of books or clusters of customer preferences
■ Education:
❑ Education professionals may want to know the likes and dislikes of their students; by creating and understanding the different groups, they can package and market the various courses accordingly
■ Banking:
❑ Clustering normal transactions to find patterns of fraudulent credit card use
❑ Identifying clusters of customers, e.g., loyal customers
❑ Determining credit card spending by customer groups

Applications of clustering
■ Insurance:
❑ Fraud detection in claims analysis
❑ Assessing insurance risk of customers
■ Publishing / Media:
❑ Automatically categorizing news based on content
❑ Recommending similar news articles
❑ Tagging news
❑ Automatic fact checking
■ Medicine:
❑ Characterizing patient behaviour based on similar characteristics
❑ Identifying successful medical therapies for different illnesses
■ Biology:
❑ Clustering genetic markers to identify family ties

Other Machine Learning Algorithms

Machine learning
Source: Rahul “Artificial Intelligence Demystified”, http://www.analyticsvidhya.com/blog/2016/12/artificial-intelligence-demystified/


Association rules – unsupervised machine learning
▪ Frequently called market basket analysis; an unsupervised learning algorithm (no target variable)
▪ Detects associations (affinities) between variables (items or events)
▪ Example: if a customer purchased bread and bananas, there is an 80% probability that they will purchase milk during the same trip
▪ Multiple applications:
o Cross-sell and up-sell
o Targeted promotions
o Product bundling
o Store planograms
o Assortment optimization
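The bread-and-bananas example above rests on two quantities: support (how often an itemset appears) and confidence (how often the consequent appears given the antecedent). A minimal sketch in plain Python with toy transactions (the figures here are illustrative, not the lecture's 80% example):

```python
# Toy transaction data (each basket is a set of items)
transactions = [
    {"bread", "bananas", "milk"},
    {"bread", "bananas", "milk"},
    {"bread", "bananas", "milk"},
    {"bread", "bananas", "eggs"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)

conf = confidence({"bread", "bananas"}, {"milk"})
print(conf)   # 3 of the 4 {bread, bananas} baskets also contain milk -> 0.75
```

Algorithms such as Apriori simply search for all rules whose support and confidence exceed chosen thresholds, pruning itemsets that are already too rare.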

Ensemble Learning

[Figure series: a basic ensemble for regression. Multiple models (e.g., linear fits such as 1.6 + 0.79·x) are built on the training data; for a new input, each model makes a prediction and the ensemble averages them, e.g., mean(3.18, 3.23, 3.57) and mean(5.55, 5.17, 4). Source: Ensemble Learning]
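The basic regression ensemble above can be sketched with scikit-learn: fit a few different models on the same training data, then average their predictions for a new input. The data below is synthetic, generated from the slide's 1.6 + 0.79·x line plus noise; the model choices are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic training data around the slide's line y = 1.6 + 0.79*x
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, (100, 1))
y = 1.6 + 0.79 * X[:, 0] + rng.normal(0, 0.1, 100)

# Several different models built on the same training data
models = [
    LinearRegression().fit(X, y),
    DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y),
    DecisionTreeRegressor(max_depth=5, random_state=1).fit(X, y),
]

# Average the predictions for a new point
x_new = np.array([[2.0]])
preds = [m.predict(x_new)[0] for m in models]
ensemble = float(np.mean(preds))
print(preds, ensemble)   # individual predictions and their mean, near 3.18
```

Averaging reduces the variance of the individual models' errors — the same mechanism behind the ox-weighing anecdote that follows.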

In 1907, 787 villagers tried to guess the weight of an ox.
None of them guessed it correctly, but the average guess (542.9 kg) was very close to the actual weight of the ox (543.4 kg).
, The Wisdom of Crowds Source: , Ensemble Learning

A similar approach can be applied for classification. Due to random weight initialization and data partitioning, trained models (e.g., 1.6 + 0.79·x vs. 1.94 + 0.64·x) usually turn out to be slightly different.
[Figure series: a basic ensemble for classification. Multiple models are built on the training data; their predictions are combined by majority voting, e.g., Dog – 2, Cat – 1, so Dog wins. Source: Ensemble Learning]
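Majority voting for classification, as in the Dog-vs-Cat figure, reduces to counting votes. A minimal sketch (toy labels, standard library only):

```python
from collections import Counter

# Each model in the ensemble casts one vote for a class label
votes = ["Dog", "Dog", "Cat"]   # Dog – 2, Cat – 1

# The most common label wins
winner, count = Counter(votes).most_common(1)[0]
print(f"{winner} wins with {count} of {len(votes)} votes")
```

scikit-learn packages this pattern as `VotingClassifier` (hard voting counts labels; soft voting averages predicted probabilities instead).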

Ensemble Learning – Random Forest

Random Forest
What is Random Forest?
▪ A supervised learning algorithm
▪ Builds predictive models for both classification and regression
▪ Works as a large collection of uncorrelated decision trees
Applications:

Random Forest
Advantages:
▪ Can be used for both regression and classification tasks
▪ With enough trees, the classifier won't overfit the model
▪ Wide diversity leads to a better model
Disadvantages:
▪ A large number of trees may slow down the algorithm, making it ineffective for real-time predictions

Random Forest
How it works – Stage 1
▪ Randomly selecting “m” features out of a total of “M” features (m < M)
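A random forest sketch with scikit-learn on synthetic data (not the lecture's example): each tree is trained on a bootstrap sample and, at every split, considers only a random subset of m features out of M (`max_features`), which decorrelates the trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification problem with M = 10 features
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# max_features="sqrt" picks m ≈ sqrt(M) features at each split
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=0).fit(X_tr, y_tr)
print(rf.score(X_te, y_te))   # held-out accuracy
```

The feature subsampling is exactly the "m out of M" step above; combined with bootstrap sampling, it keeps the trees uncorrelated so their averaged votes generalize well.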