MFIN 290 Application of Machine Learning in Finance: Lecture 6
Yujie He
7/31/2021
Agenda
Recap of last lecture (Unsupervised Learning and Neural Networks)
Brief Introduction to Deep Learning
Lab: Auto-encoder for Fraud Detection Cont’d
Last Lecture
Unsupervised Learning
Dimension Reduction
Overview of different approach families, PCA, SVD
Clustering
Common methods
Evaluation
Real-world example use case
Neural Network
Activation function
Loss function
Back-propagation
Gradient descent
Regularization
Brief Introduction to Deep Learning
What is Deep Learning?
Deep learning is a subfield of machine learning
Most machine learning methods work well because of human-designed representations and input features
For example: hand-designed features for finding named entities such as locations or organization names (Finkel et al., 2010)
Machine learning then becomes just optimizing weights to best make a final prediction
Feature learning example: an LSTM-based language model (figure: LSTM cell activations)
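To make the figure's idea concrete, here is a minimal sketch of an LSTM language model, assuming TensorFlow/Keras; the vocabulary size and layer dimensions are hypothetical placeholders, not values from the lecture.

```python
import tensorflow as tf

# Minimal LSTM language model sketch (TensorFlow/Keras assumed).
# vocab_size and layer dimensions are hypothetical placeholders.
vocab_size = 10_000
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=128),
    # The LSTM hidden states are the learned features/representations.
    tf.keras.layers.LSTM(256, return_sequences=True),
    # Predict a distribution over the next token at each position.
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```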
Deep Learning (DNN) vs. Neural Network
Neural networks with MANY layers!
Vanishing/exploding gradients
Remedies: gradient clipping, better activation functions, better optimizers (see the sketch after this list)
Structural changes: residual/highway networks
A variety of model architectures
Convolutional neural network (CNN)
Transformer (multi-head attention)
Generative Adversarial Network (GAN)
Key differentiator is representation learning
Needs large amounts of training data
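As a sketch of the gradient remedies and residual connections listed above (TensorFlow/Keras assumed; the clipnorm threshold and layer sizes are illustrative choices, not values from the lecture):

```python
import tensorflow as tf

# Gradient clipping: cap each update's gradient norm so gradients cannot explode.
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# A residual (skip) connection: gradients flow through the identity path,
# which mitigates vanishing gradients in very deep stacks.
inputs = tf.keras.Input(shape=(64,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)  # F(x)
outputs = tf.keras.layers.Add()([inputs, h])              # x + F(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=optimizer, loss="mse")
```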
Why do we explore deep learning
Manually designed features are often over-specified, task-specific, incomplete, and take much effort to design and validate
Learned features are easy to adapt and fast to learn
Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Deep learning can learn unsupervised (from raw text) and supervised (with specific labels like positive/negative)
Transfer learning
Low resource languages/tasks
Zero-shot/few-shot learning
Cross-lingual applications
Why do we explore deep learning
In ~2010, deep learning techniques started outperforming other machine learning techniques. Why this decade?
Large amounts of training data favor deep learning
Faster machines and multicore CPUs/GPUs favor deep learning
New/larger models, algorithms, ideas
Better, more flexible learning of intermediate representations
Effective end-to-end joint training system
Improved model structure to encode more information (e.g. transformers)
Larger models with higher capacity (e.g. 17B parameters)
Better regularization (e.g. dropout, batch norm) and optimization methods (e.g. Adam); see the sketch after this list
Improved performance (initially in speech and vision, then NLP)
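A minimal sketch of the regularization and optimization bullets (dropout, batch norm, Adam), assuming TensorFlow/Keras and a generic binary classifier; the input width, layer sizes, and dropout rate are placeholders:

```python
import tensorflow as tf

# Generic classifier illustrating dropout, batch normalization, and Adam.
# Input width, layer sizes, and dropout rate are hypothetical placeholders.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.BatchNormalization(),  # normalize activations per mini-batch
    tf.keras.layers.Dropout(0.5),          # randomly zero 50% of units during training
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="binary_crossentropy", metrics=["accuracy"])
```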
Best Practices
It often does not hurt to normalize data; scaling has a big impact on, e.g., KNN with Euclidean distance (see the sketch after this list)
PCA
SVD
KNN
LASSO/Ridge regression (the penalty depends on coefficient magnitude)
Feature importance interpretation (e.g. regression model)
Tree-based models, by contrast, are not sensitive to the magnitude of variables
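A quick scikit-learn sketch of the scaling point above, on a synthetic dataset (the blown-up feature scale is contrived for illustration): standardization matters for distance-based KNN, while tree-based models are unaffected by feature magnitude.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X[:, 0] *= 1_000  # one huge-scale feature dominates Euclidean distance

knn_raw = KNeighborsClassifier()
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier())
tree = RandomForestClassifier(random_state=0)

print(cross_val_score(knn_raw, X, y).mean())     # typically hurt by the scale mismatch
print(cross_val_score(knn_scaled, X, y).mean())  # scaling typically recovers accuracy
print(cross_val_score(tree, X, y).mean())        # trees are insensitive to the rescaling
```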
Build a benchmark model
Find a good metric that aligns with the business goal
Lab: Auto-encoder for Fraud Detection
Colab
https://colab.research.google.com/drive/1q_AuysUon2QB8V55MdzXoXHqPpqZ55WR?usp=sharing
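For reference alongside the notebook, a minimal sketch of the lab's idea (TensorFlow/Keras assumed): train an auto-encoder to reconstruct normal transactions, then flag records with high reconstruction error as potential fraud. The feature count, layer sizes, and threshold are hypothetical placeholders; the Colab notebook above is the authoritative version.

```python
import numpy as np
import tensorflow as tf

n_features = 30  # hypothetical placeholder for the transaction feature count

# Encoder compresses a transaction; decoder reconstructs it.
inputs = tf.keras.Input(shape=(n_features,))
encoded = tf.keras.layers.Dense(14, activation="relu")(inputs)
bottleneck = tf.keras.layers.Dense(7, activation="relu")(encoded)
decoded = tf.keras.layers.Dense(14, activation="relu")(bottleneck)
outputs = tf.keras.layers.Dense(n_features, activation="linear")(decoded)
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Train on (mostly) normal transactions only; random data stands in here.
X_normal = np.random.randn(1000, n_features).astype("float32")
autoencoder.fit(X_normal, X_normal, epochs=5, batch_size=64, verbose=0)

# Threshold on reconstruction error estimated from the normal data.
train_err = np.mean((autoencoder.predict(X_normal) - X_normal) ** 2, axis=1)
threshold = np.percentile(train_err, 95)  # illustrative cutoff

# Flag new records whose reconstruction error exceeds the threshold.
X_new = np.random.randn(10, n_features).astype("float32")
new_err = np.mean((autoencoder.predict(X_new) - X_new) ** 2, axis=1)
is_fraud_candidate = new_err > threshold
```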
Next Step
Today is the deadline for submitting the group member list and project selection
Lecture 7: Prof. Edward Sheng will cover time series, Kalman filters, and state-space models