14_Anomaly Detection | Carpe's Blog

Anomaly Detection

Density estimation

Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Is $x_{test}$ anomalous?

Application

Fraud detection
Manufacturing
Monitoring computers in a data center.

Gaussian Distribution

aka Normal Distribution
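Say $x \in \mathbb{R}$. If $x$ is Gaussian distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$.
$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$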

Algorithm

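Choose features $x_j$ that you think might be indicative of anomalous examples.
Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$: $\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}$, $\sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} (x_j^{(i)} - \mu_j)^2$
Given new example $x$, compute $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$. Flag an anomaly if $p(x) < \epsilon$.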
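The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `fit_gaussian` and `density`, the synthetic training data, and the threshold value `epsilon` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian(X):
    """Estimate per-feature mean and variance (divides by m, not m - 1)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # np.var uses ddof=0 by default, i.e. 1/m
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) as a product of independent univariate Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * expo, axis=1)

# Synthetic "normal" training data with two features (illustrative only).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))
mu, sigma2 = fit_gaussian(X_train)

# One typical point and one point far from the training data.
x_new = np.array([[5.1, 9.8], [12.0, 25.0]])
p = density(x_new, mu, sigma2)

epsilon = 1e-4  # hypothetical threshold; in practice chosen on a CV set
is_anomaly = p < epsilon
```

For many features the product can underflow, so summing log-densities and comparing against $\log \epsilon$ is a common alternative.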

Algorithm evaluation

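Fit model $p(x)$ on training set $\{x^{(1)}, \dots, x^{(m)}\}$
On a cross validation/test example $x$, predict $y = 1$ (anomaly) if $p(x) < \epsilon$ and $y = 0$ (normal) if $p(x) \geq \epsilon$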

Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall
- $F_1$-score
Can also use the cross validation set to choose the threshold parameter $\epsilon$
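One way to choose the threshold on the cross validation set is to scan candidate values and keep the one with the best $F_1$-score. The sketch below assumes the densities `p_cv` have already been computed by a fitted model. The helper names, the number of scan steps, and the toy arrays are made up for illustration.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true positives, false positives and false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, n_steps=1000):
    """Scan thresholds between min and max density; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_steps):
        y_pred = (p_cv < eps).astype(int)  # y = 1 means "anomaly"
        f1 = f1_score(y_cv, y_pred)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy cross validation set: densities and labels (illustrative values).
p_cv = np.array([0.30, 0.25, 0.28, 0.001, 0.22, 0.002, 0.31])
y_cv = np.array([0, 0, 0, 1, 0, 1, 0])
eps, f1 = select_epsilon(p_cv, y_cv)
```

Accuracy is a poor metric here because the classes are heavily skewed: predicting $y = 0$ for everything would already score very high.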

Choosing features

For non-Gaussian features, apply a transform (for example $\log(x)$ or $\sqrt{x}$) so that the histogram looks more Gaussian.
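As a rough illustration of such a transform, the sketch below draws a right-skewed feature and checks that taking the logarithm brings its skewness close to zero. The `skewness` helper and the synthetic log-normal data are assumptions for the example. On real data you would inspect the histogram and try several transforms.

```python
import numpy as np

def skewness(x):
    """Sample skewness: close to 0 for a symmetric, Gaussian-like feature."""
    centered = x - x.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

# Synthetic right-skewed feature (log-normal), for illustration only.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

x_log = np.log(x)  # candidate transform; requires strictly positive values

skew_before = skewness(x)
skew_after = skewness(x_log)
```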
Error analysis for anomaly detection
Want $p(x)$ large for normal examples
and $p(x)$ small for anomalous examples.

Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.

Note: if the anomaly detection algorithm can't distinguish anomalous from non-anomalous examples, it helps to come up with more features that separate them.

Anomaly Detection vs. Supervised Learning

Anomaly Detection

Very small number of positive ($y=1$) examples (0-20 is common).
Large number of negative ($y=0$) examples.
Many different "types" of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like;
future anomalies may look nothing like any of the anomalous examples we've seen so far.

Supervised learning

Large number of positive and negative examples.
Enough positive examples for the algorithm to get a sense of what positive examples are like;
future positive examples are likely to be similar to the ones in the training set.

Multivariate Gaussian Distribution

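$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ separately; model $p(x)$ all in one go.
Parameters: $\mu \in \mathbb{R}^n$, $\Sigma \in \mathbb{R}^{n \times n}$ (covariance matrix)
$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$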
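A minimal NumPy sketch of fitting and evaluating a multivariate Gaussian is shown below. The function names, the synthetic correlated data, and the two query points are assumptions for illustration. The example shows that a point which breaks the correlation between features gets a far lower density than a point that follows it, even though both are equally far from the mean in each coordinate.

```python
import numpy as np

def fit_multivariate(X):
    """Estimate the mean vector and covariance matrix (divides by m)."""
    m = X.shape[0]
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / m
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    n = mu.shape[0]
    centered = X - mu
    # Solve Sigma * z = (x - mu) rather than forming the inverse explicitly.
    solved = np.linalg.solve(Sigma, centered.T).T
    mahalanobis_sq = np.sum(centered * solved, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * mahalanobis_sq) / norm

# Synthetic data with strongly correlated features (illustrative only).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal([0.0, 0.0], true_cov, size=2000)
mu, Sigma = fit_multivariate(X_train)

# First point follows the correlation, second point violates it.
points = np.array([[1.5, 1.5], [1.5, -1.5]])
p = multivariate_density(points, mu, Sigma)
```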

Sigma and contour

Original model vs. Multivariate Version

Original Model

Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values.
Computationally cheaper (alternatively, scales better to large $n$, e.g. n=10000, 100000)
OK even if $m$ (training set size) is small

Multivariate Version

Automatically captures correlations between features
Computationally more expensive
Must have $m > n$, or else $\Sigma$ is non-invertible.
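The invertibility condition can be checked numerically. The sketch below, using made-up random data and a hypothetical `sample_covariance` helper, compares the rank of the estimated covariance matrix when there are fewer examples than features with the rank when there are many more.

```python
import numpy as np

def sample_covariance(X):
    """Covariance estimate with the 1/m convention."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered / X.shape[0]

rng = np.random.default_rng(0)
n = 5  # number of features

Sigma_small = sample_covariance(rng.normal(size=(3, n)))    # m = 3, m <= n
Sigma_large = sample_covariance(rng.normal(size=(200, n)))  # m = 200, m > n

# A rank below n means the matrix is singular and has no inverse.
rank_small = np.linalg.matrix_rank(Sigma_small)
rank_large = np.linalg.matrix_rank(Sigma_large)
```

Redundant features (for example, duplicated or linearly dependent columns) also make the covariance matrix singular, even when there are plenty of examples.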