留学生辅导 SDM08)

Lecture 20: Anomaly Detection
Introduction to Machine Learning Semester 1, 2022
C 2022 The University of Melbourne Acknowledgement:

Copyright By PowCoder代写 加微信 powcoder

Lecture Outline
• AnomalyDetection • Definition
• Importance • Structure
• AnomalyDetectionAlgorithms • Statistical
• Proximity-based • Density-based
• Clustering-based

What are Outliers/Anomalies?
• Anomaly: A data object that deviates significantly from the normal objects as if it were generated by a different mechanism
• Ex.: Unusual credit card purchase, sports: , …
• Anomalies are different from noise
• Anomalies are interesting:
• They violate the mechanism that generates the normal data
• translate to significant (often critical) real life entities
• Cyber intrusions
• Credit card fraud
Noise is random error or variance in a measured variable Noise should be removed before anomaly detection

Importance of Anomaly Detection?
Ozone Depletion History
• In 1985 three researchers (Farman, Gardinar and Shanklin) were puzzled by data gathered by the British Antarctic Survey showing that ozone levels for Antarctica had dropped 10% below normal levels
• Why did the Nimbus 7 satellite, which had instruments aboard for recording ozone levels, not record similarly low ozone concentrations?
• The ozone concentrations recorded by the satellite were so low they were being treated as noise by a computer program and discarded!

Variants of Anomaly Detection Problem
• VariantsofAnomaly/OutlierDetectionProblems
• Given a database D, find all the data points x ∈ D having the top-n largest anomaly scores f(x)
• Given a database D, containing mostly normal (but unlabeled) data points, and a test point x, compute the anomaly score of x with respect to D
Given a database D, find all the data points x ∈ D with anomaly scores greater than some threshold t

Structure of Anomalies
• Global/Point anomalies
• Contextual/Conditional anomalies • Collective anomalies

Global/Point anomalies
• GlobalAnomaly(orpoint)
– Object is Og if it significantly deviates from the rest of the data set
– Ex.Intrusiondetectionincomputernetworks
– Issue:Findanappropriatemeasurementofdeviation

Contextual anomalies
• Contextual Anomaly (or conditional)
• Object is Oc if it deviates significantly based on a selected
• Attributes of data objects should be divided into two groups
• Contextual attributes: defines the context, e.g., time & location
• Behavioral attributes: characteristics of the object, used in anomaly evaluation, e.g., temperature
• Can be viewed as a generalization of local anomalies—whose density significantly deviates from its local area
• Issue: How to define or formulate meaningful context?
* Song, et al, “Conditional Anomaly Detection”, IEEE Transactions on Data and Knowledge Engineering, 2006.

Example of Contextual Anomalies
Ex. 10o C in Paris: Is this an anomaly?

Collective anomalies
Collective Anomalies
• A subset of data objects that collectively deviate significantly from the whole data set, even if the individual data objects may not be anomalies
E.g., intrusion detection:
When a number of computers keep sending denial-of-service • Detection of collective anomalies
packages to each other
Consider not only behavior of individual objects, but also that of
groups of objects
Requires background knowledge about the relationship among
data objects, such as a distance or similarity measure on objects.

Example of Collective anomalies
• Requires a relationship among data instances
• The individual instances within a collective anomaly are not anomalous by themselves
anomalous subsequence
Sequential data Spatial data Graph data

Anomaly detection paradigms: supervised, semi-supervised, and unsupervised

Supervised Anomaly Detection
Supervised anomaly detection
• Labels available for both normal data and anomalies
• Samples examined by domain experts used for training & testing • Challenges
Require both labels from both normal and anomaly class Imbalanced classes, i.e., anomalies are rare: Boost the
anomaly class and make up some artificial anomalies
Cannot detect unknown and emerging anomalies
Catch as many outliers as possible, i.e., recall is more important than accuracy (i.e., not mislabeling normal objects as outliers)

Semi-supervised Anomaly Detection
Semi-Supervised anomaly detection
Labels available only for normal data
Model normal objects & report those not matching the model as
Challenges
Require labels from normal class
Possible high false alarm rate – previously unseen (yet legitimate)
data records may be recognized as anomalies

Unsupervised Anomaly Detection I
Unsupervised anomaly detection
• Assume the normal objects are somewhat “clustered” into multiple
groups, each having some distinct features
• An outlier is expected to be far away from any groups of normal
General steps
Build a profile of “normal” behavior
summary statistics for overall population model of multivariate data distribution
• Use the “normal” profile to detect anomalies
anomalies are observations whose characteristics
differ significantly from the normal profile

Unsupervised Anomaly Detection II
Unsupervised anomaly detection Challenges
• Normal objects may not share any strong patterns, but the
collective outliers may share high similarity in a small area
• Ex. In some intrusion or virus detection, normal activities are diverse
• Unsupervised methods may have a high false positive rate but still miss many real outliers.
Many clustering methods can be adapted for unsupervised methods
• Find clusters, then outliers: not belonging to any cluster
• Problem 1: Hard to distinguish noise from outliers
• Problem 2: Costly since first clustering: but far less outliers than normal objects

Unsupervised anomaly detection: Approaches
• Statistical (or: model-based)
• Assume that normal data follow some statistical model
• Proximity-based
• An object is an outlier if the nearest neighbors of the object are far away
• Density-based
• Outliers are objects in regions of low density
• Clustering-based
Normal data belong to large and dense clusters

Statistical Anomaly detection

Statistical anomaly detection
Anomalies are objects that are fit poorly by a statistical model. • Idea: learn a model fitting the given data set, and then identify the
objects in low probability regions of the model as anomalies
• Assumption:normaldataisgeneratedbyaparametricdistribution
with parameter θ
– Theprobabilitydensityfunctionoftheparametricdistributionf(x,θ)
gives the probability that object x is generated by the distribution – Thesmallerthisvalue,themorelikelyxisanoutlier
Challenges of Statistical testing:
– highly depends on whether the assumption of statistical model holds in the real data

Visualizing the data
Graphical Approaches
• Boxplot (1-D), Scatter plot (2-D), Spin plot (3-D)
• Limitations: Time consuming, Subjective
Image: https://en.wikipedia.org/wiki/Box_plot#/media/File:Boxplot_with_outlier.png

Univariate data — General Approach
Avg. temp.: x={24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4} • Use the maximum likelihood method to estimate μ and σ
• Fortheabovedataxwithn=10:
• Decide on a confidence limits, e.g.,
• Then 24 is an outlier since:
(24 – 28.61) /1.51 = – 3.04 < –3 Image: https://en.wikipedia.org/wiki/Standard_deviation#/media/File:Standard_deviation_diagram.svg Multivariate Data • Multivariate Gaussian distribution – OutlierdefinedbyMahalanobisdistance – Grubb’stestonthedistances Mahalanobis Mahalanobis Distance Mahalanobis Distance • Mahalanobis Distance • S is the covariance matrix: • For 2-dimensional data: Likelihood approach • Assume the dataset D contains samples from a mixture of two probability distributions: • M (majority distribution) • A (anomalous distribution) • Generalapproach: • Initially, assume all the data points belong to M • Let Lt(D) be the log likelihood of D at time t • For each point xt that belongs to M, move it to A Let Lt+1 (D) be the new log likelihood. Compute the difference, Δ = Lt(D) – Lt+1 (D) If Δ > c (some threshold), then xt is declared as an anomaly and moved permanently from M to A

Likelihood approach
Data distribution, D = (1 –λ) M +λA
• M is a probability distribution estimated from data • A is initially assumed to be uniform distribution
• Likelihood at time t:
L(D) P(x)(1λ)t
 |A| P(x)λt
 i1xM xA
LL(D)M log(1λ)logP (x)A logλlogP (x)
xM xA it it
ttMitAi tt

Statistical Anomaly detection
• Statistical tests are well-understood and well-validated.
• Quantitative measure of degree to which object is an outlier.
• Data may be hard to model parametrically.
• multiple modes
• variable density
• In high dimensions, data may be insufficient to estimate true distribution.

Proximity-based Anomaly detection

Proximity-based Anomaly detection
Anomalies are objects far away from other objects.
• An object is an anomaly if the nearest neighbors of the object are far away, i.e., the proximity of the object significantly deviates from the proximity of most of the other objects in the same data set
• Common approach:
– Outlier score is distance to kth nearest neighbor. – Scoresensitivetochoiceofk.

Proximity-based anomaly detection

Proximity-based anomaly detection

Proximity-based outlier detection
– Easiertodefineaproximitymeasureforadatasetthandetermineits statistical distribution.
– Quantitativemeasureofdegreetowhichobjectisanoutlier.
– Dealsnaturallywithmultiplemodes.
– O(n2)complexity.
– Scoresensitivetochoiceofk.
– Doesnotworkwellifdatahaswidelyvariabledensity.

Density-based Anomaly detection

Density-based outlier detection
Outliers are objects in regions of low density. • Outlierscoreistheinverseofthedensity
around a point
• Scoresusuallybasedonproximities. • Examplescores:
• #pointswithinafixedradiusd
• Reciprocalofaveragedistancetok
nearest neighbors:
density(x,k)1 distance(x,y)1 k yN(x,k) 
Tend to work poorly if data has variable density.
Image: https://en.wikipedia.org/wiki/Local_outlier_factor#/media/File:Reachability-distance.svg

Density-based outlier detection
Relative density outlier score
Local Outlier Factor (LOF)
Reciprocal of average distance to k nearest neighbors, relative to that of those k neighbors.
relative density(x, k )  density(x, k )
1 density(y,k) k yN(x,k)
Image: https://en.wikipedia.org/wiki/File:LOF-idea.svg

Density-based outlier detection
In the NN approach, o2 is not
considered as outlier, while LOF approach find both o1
and o2 as outliers!

Density-based outlier detection
• Quantitative measure of degree to which object is an outlier.
• O(n2) complexity
• Must choose parameters
• k for nearest neighbor
• d for distance threshold
Can work well even if data has variable density.

Cluster-based Anomaly Detection

Cluster-based outlier detection
Outliers are objects that do not belong strongly to any cluster.
Approaches:
– Assess degree to which object belongs to any cluster.
– Eliminate object(s) to improve objective function.
– Discard small clusters far from other clusters
– Outliers may affect initial formation of clusters.

Cluster-based outlier detection
Assess degree to which object belongs to any cluster.
For prototype-based clustering (e.g. k-means), use distance to cluster centers.
To deal with variable density clusters, use relative distance:
distance(x,centroidC ) medianx’C distance(x’,centroidC )
Similar concepts for density-based or connectivity-based clusters.

Cluster-based outlier detection
distance of points from nearest centroid

Cluster-based outlier detection
relative distance of points from nearest centroid

Cluster-based outlier detection
Eliminate object(s) to improve objective function.
1) Form initial set of clusters.
2) Remove the object which most improves objective function.
3) Repeat step 2) until …
Discard small clusters far from other clusters.
• Need to define thresholds for “small” and “far”.

Cluster-based outlier detection
• Some clustering techniques have O(n) complexity.
• Extends concept of outlier from single objects to groups of objects.
• Requires thresholds for minimum size and distance.
• Sensitive to number of clusters chosen.
• Hard to associate outlier score with objects.
• Outliers may affect initial formation of clusters.

• Types of outliers
• Supervised, semi-supervised, or unsupervised
• Statistical, proximity-based, clustering-based approaches
• Ethics in Machine Learing
• Friday: Defining and measuring (un)fairness in ML
Next Wednesday: Guest lecture (live)

References
• Tan et al (2006) Introduction to Data Mining. Section 4.3, pp 150-171. (Chapter 10)
• V. Chandola, A. Banerjee, and V. Kumar, (2009). Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3), 1-58.
• A. Banerjee, et al (2008). Tutorial session on anomaly detection. The SIAM Data Mining Conference (SDM08)

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com