Announcements
Reminder: pset5 self-grading form and pset6 out today, due 11/19 (1 week)
Unsupervised Learning III: Anomaly Detection
Machine Learning
Anomaly detection
• What is anomaly detection?
• Methods:
– Density estimation
– Detection by reconstruction
– One-class SVM
What is an anomaly?
Anomaly Detection is
• An unsupervised learning problem (data unlabeled)
• About the identification of new or unknown data or signal that a machine learning system is not aware of during training
Example 1
[figure: example data points labeled “Normal” and “Novel”]
So what seems to be the problem?
It’s a 2-Class problem. “Normal” vs. “Novel”
The Problem is
That “All positive examples are alike, but each negative example is negative in its own way.”
One-class recognition
• Suppose we want to build a classifier that recognizes anomalous activities in an airport
• How can we collect training data?
– We can easily assemble videos of normal airport activities (walking, checking in, etc.) as positive examples.
• What about negative examples?
– The negative examples are… all other activities!
• So the negative examples come from an unknown number of negative classes.
Importance of Anomaly Detection
Ozone Depletion History
• In 1985 three researchers (Farman, Gardiner and Shanklin) were puzzled by data gathered by the British Antarctic Survey showing that ozone levels for Antarctica had dropped 10% below normal levels
• Why did the Nimbus 7 satellite, which had instruments aboard for recording ozone levels, not record similarly low ozone concentrations?
• The ozone concentrations recorded by the satellite were so low they were being treated as outliers by a computer program and discarded!
Sources: http://exploringdata.cqu.edu.au/ozone.html http://www.epa.gov/ozone/science/hole/size.html
Real World Anomalies
• Credit Card Fraud
– An abnormally high purchase made on a credit card
• Cyber Intrusions
– A web server involved in ftp traffic
Fraud Detection
• Fraud detection refers to detection of criminal activities occurring in commercial organizations
– Malicious users might be the actual customers of the organization or might be posing as a customer (also known as identity theft).
• Types of fraud
– Credit card fraud
– Insurance claim fraud
– Mobile / cell phone fraud
– Insider trading
• Challenges
– Fast and accurate real-time detection
– Misclassification cost is very high
Healthcare Informatics
• Detect anomalous patient records
– Indicate disease outbreaks, instrumentation errors, etc.
• Key Challenges
– Only normal labels available
– Misclassification cost is very high
– Data can be complex: spatio-temporal
[figure: article on outbreaks from 2006 to today preventable by vaccinations]
Industrial Damage Detection
• Industrial damage detection refers to detection of different faults and failures in complex industrial systems, structural damages, intrusions in electronic security systems, suspicious events in video surveillance, abnormal energy consumption, etc.
– Example: Aircraft Safety
• Anomalous Aircraft (Engine) / Fleet Usage
• Anomalies in engine combustion data
• Total aircraft health and usage management
• Key Challenges
– Data is extremely large, noisy, and unlabeled
– Most applications exhibit temporal behavior
– Detecting anomalous events typically requires immediate intervention
Image Processing
• Detecting outliers in an image monitored over time
• Detecting anomalous regions within an image
• Used in
– mammography image analysis
– video surveillance
– satellite image analysis
• Key Challenges
– Detecting collective anomalies
– Data sets are very large
[figure: image with a region marked “Anomaly”]
Video Surveillance
https://arxiv.org/pdf/1606.08455.pdf
Density Estimation Method
Anomaly Detection
Anomaly detection example
Aircraft engine features:
x1 = heat generated
x2 = vibration intensity
Dataset: {x^(1), x^(2), …, x^(m)}
New engine: x_test
[figure: scatter plot of the dataset in the (heat, vibration) plane, with x_test marked]
Density estimation
Dataset: {x^(1), …, x^(m)}
Is x_test anomalous?
Model p(x) from data; flag x_test as anomalous if p(x_test) < ε
[figure: density contours over the (heat, vibration) plane]
Anomaly detection example
Fraud detection:
x^(i) = features of user i’s activities
Model p(x) from data.
Identify unusual users by checking which have p(x) < ε
Manufacturing
Monitoring computers in a data center:
x^(i) = features of machine i
x1 = memory use, x2 = number of disk accesses/sec,
x3 = CPU load, x4 = CPU load/network traffic, …
Example density estimation method: Multivariate Gaussian (Normal) distribution
p(x; μ, Σ) = (1 / ((2π)^(n/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))
Parameters: μ ∈ ℝⁿ, Σ ∈ ℝⁿˣⁿ
Parameter fitting: Given training set {x^(1), …, x^(m)}
μ = (1/m) ∑ᵢ x^(i),  Σ = (1/m) ∑ᵢ (x^(i) − μ)(x^(i) − μ)ᵀ
Anomaly detection with the multivariate Gaussian
1. Fit model p(x) by setting μ and Σ as above
2. Given a new example x, compute p(x)
Flag an anomaly if p(x) < ε
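The two-step recipe above can be sketched in a few lines of NumPy. The toy data, the feature values, and the threshold ε below are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a multivariate Gaussian: mu = sample mean, Sigma = MLE covariance."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / X.shape[0]
    return mu, Sigma

def gaussian_density(X, mu, Sigma):
    """Evaluate p(x; mu, Sigma) for each row of X."""
    n = mu.shape[0]
    Xc = X - mu
    inv = np.linalg.inv(Sigma)
    norm = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * np.einsum('ij,jk,ik->i', Xc, inv, Xc))

# Toy data: heat (x1) and vibration (x2) for normal engines.
rng = np.random.default_rng(0)
X_train = rng.normal([50.0, 3.0], [2.0, 0.3], size=(500, 2))

mu, Sigma = fit_gaussian(X_train)
eps = 1e-4  # threshold; in practice chosen on a cross validation set

x_new = np.array([[70.0, 5.0]])  # far from the training cloud
print(gaussian_density(x_new, mu, Sigma) < eps)  # → [ True]: flagged as anomalous
```

A point near the mean of the training data gets high density and is not flagged; the far-away engine gets density near zero and is.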
Evaluation
Anomaly Detection
Evaluating an anomaly detection model
When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.
Assume we have some labeled data, of anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).
Training set: x^(1), …, x^(m) (assume normal/not anomalous examples)
Cross validation set: (x_cv^(1), y_cv^(1)), …, (x_cv^(m_cv), y_cv^(m_cv))
Test set: (x_test^(1), y_test^(1)), …, (x_test^(m_test), y_test^(m_test))
Aircraft engines motivating example
10000 good (normal) engines
20 flawed engines (anomalous)
Training set: 6000 good engines
CV: 2000 good engines (y = 0), 10 anomalous (y = 1)
Test: 2000 good engines (y = 0), 10 anomalous (y = 1)
Algorithm evaluation
Fit model p(x) on training set {x^(1), …, x^(m)}
On a cross validation/test example x, predict
y = 1 if p(x) < ε (anomaly)
y = 0 if p(x) ≥ ε (normal)
Possible evaluation metrics:
– True positive, false positive, false negative, true negative
– Precision/Recall
– F1-score
Can also use cross validation set to choose parameter ε
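One simple way to pick ε, sketched below under the assumption that we already have densities p(x) for each CV example: scan candidate thresholds and keep the one maximizing F1. The tiny p_cv/y_cv arrays are made-up illustration data:

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Scan thresholds; return the epsilon maximizing F1 on the CV set.
    p_cv: densities p(x) for CV examples; y_cv: 1 = anomalous, 0 = normal."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        pred = (p_cv < eps).astype(int)  # predict anomaly when p(x) < eps
        tp = np.sum((pred == 1) & (y_cv == 1))
        fp = np.sum((pred == 1) & (y_cv == 0))
        fn = np.sum((pred == 0) & (y_cv == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Tiny synthetic CV set: the two anomalies get low density under the model.
p_cv = np.array([0.9, 0.8, 0.7, 0.85, 0.01, 0.02])
y_cv = np.array([0,   0,   0,   0,    1,    1])
eps, f1 = select_epsilon(p_cv, y_cv)
print(f1)  # → 1.0: any threshold between 0.02 and 0.7 separates them perfectly
```

F1 is used rather than accuracy because the classes are heavily skewed: predicting y = 0 everywhere already gets high accuracy.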
Precision/Recall/F1
Anomaly detection
• Very small number of positive examples (y = 1)
• Large number of negative (y = 0) examples
• Many different “types” of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we’ve seen so far.
vs.
Supervised learning
• Large number of positive and negative examples.
• Enough positive examples for algorithm to get a sense of what positive examples are like; future positive examples likely to be similar to ones in training set.
Anomaly detection
• Fraud detection
• Manufacturing (e.g. aircraft engines)
• Monitoring machines in a data center
vs.
Supervised learning
• Email spam classification
• Weather prediction (sunny/rainy/etc.)
• Cancer classification
Anomaly Detection by Reconstruction
Online Detection of Unusual Events in Videos via Dynamic Sparse Coding
Bin Zhao, Li Fei-Fei, Eric Xing
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, CO, USA, June 2011
Goal: Detect Unusual Events in Videos
• Example unusual event: entering subway via exit
• Videos are described as spatio-temporal features
Dictionary-based Anomaly Detection
• Learn a dictionary of bases corresponding to usual events:
– a usual event should be reconstructible from a small number of such bases, and
– the reconstruction weights should change smoothly over space/time across actions in such events.
– An unusual event either is not reconstructible from the dictionary of usual events with small error, or
– needs a large number of bases, in a temporally/spatially non-smooth fashion.
• Must: Learn a good dictionary of bases representing usual events
• Must: Update the dictionary online to adapt to changing content of the video
Algorithm: Look for High Reconstruction Error
Algorithm
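The core idea, reconstruction error against a dictionary of “usual” bases, can be sketched with scikit-learn. This is a simplified static version, not the paper’s online dynamic sparse coding, and the synthetic data below is purely illustrative:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
# "Usual" events: feature vectors lying near a low-dimensional subspace.
basis = rng.normal(size=(3, 20))
X_usual = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 20))

# Learn a dictionary of bases for usual events; sparse-code with OMP.
dico = MiniBatchDictionaryLearning(n_components=10, transform_algorithm='omp',
                                   transform_n_nonzero_coefs=3, random_state=0)
dico.fit(X_usual)

def recon_error(x):
    """Sparse-code x against the learned dictionary; return reconstruction error."""
    code = dico.transform(x.reshape(1, -1))
    return np.linalg.norm(x - code @ dico.components_)

usual = X_usual[0]
unusual = 5.0 * rng.normal(size=20)  # does not fit the learned bases
print(recon_error(usual) < recon_error(unusual))  # usual events reconstruct well
```

Events whose reconstruction error (with few bases) exceeds a threshold are flagged as unusual; the paper additionally enforces temporal/spatial smoothness of the codes and updates the dictionary online.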
Results on YouTube Videos
One-Class SVM
Anomaly Detection
Support Vector Method for Novelty Detection
Bernhard Schölkopf, Robert Williamson, Alex Smola, John Shawe-Taylor, John Platt
Problem Formulation
• Suppose we are given a training sample drawn from an underlying distribution P
• We want to estimate a “simple” subset S ⊆ X such that for a test point x drawn from the distribution P,
Pr(x ∉ S) ≤ ν, for a chosen ν ∈ (0,1]
• We approach the problem by trying to estimate a function f which is positive on S and negative on the complement
One-Class SVM
• Given a feature space F (via kernel k)
• Define the class to be
C_w = {x | f_w(x) ≥ ρ}
• (w, ρ) are respectively a weight vector and an offset parameterizing a hyperplane in F
“Hey, Just a second”
If we use hyperplanes and offsets, doesn’t that mean we separate the “positive” sample? But separate it from what?
From the Origin
OCSVM
[figure: hyperplane ⟨w, Φ(x)⟩ = ρ separating the mapped training data from the origin, with margin ρ/‖w‖]
Example
sklearn.svm.OneClassSVM
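A minimal usage sketch of sklearn.svm.OneClassSVM; the data and the particular nu/gamma values here are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(200, 2))  # "normal" data only, no labels

# nu upper-bounds the fraction of training errors (and lower-bounds the
# fraction of support vectors); gamma sets the RBF kernel width.
clf = OneClassSVM(kernel='rbf', nu=0.05, gamma=0.5).fit(X_train)

X_test = np.array([[0.1, -0.2],   # near the training cloud
                   [6.0, 6.0]])   # far from it
print(clf.predict(X_test))        # → [ 1 -1]: +1 = normal, -1 = novel
```

Note that training uses only the unlabeled “normal” sample; the boundary is drawn against the origin in feature space, which leads directly to the shortcomings on the next slide.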
OneClassSVM – Shortcomings
• Implicitly assumes that the “negative” data lies around the origin.
• Completely ignores “negative” data even when some such data exists.
Next Class
Unsupervised Learning IV: Generative Adversarial Networks (GANs)
Implicit generative models; adversarial methods; Generative Adversarial Nets (GANs)