14_Anomaly Detection
Carpe Tu Black Whistle

Anomaly Detection

Density estimation

Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ (assumed normal). Is $x_{\text{test}}$ anomalous?

Model $p(x)$ from the data. Flag $x_{\text{test}}$ as an anomaly if $p(x_{\text{test}}) < \varepsilon$, and as normal if $p(x_{\text{test}}) \geq \varepsilon$.

Applications

  1. Fraud detection

  2. Manufacturing

  3. Monitoring computers in a data center


Gaussian Distribution

aka Normal Distribution
Say $x \in \mathbb{R}$. If $x$ is Gaussian-distributed with mean $\mu$ and variance $\sigma^2$, write $x \sim \mathcal{N}(\mu, \sigma^2)$, with density

$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Algorithm

  1. Choose features $x_j$ that you think might be indicative of anomalous examples.

  2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$:

     $$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$$

  3. Given a new example $x$, compute

     $$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$$

     Flag an anomaly if $p(x) < \varepsilon$.
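The three steps above can be sketched in NumPy. The data, feature values, and the threshold $\varepsilon$ here are hypothetical; in practice $\varepsilon$ is chosen on a cross-validation set (next section).

```python
import numpy as np

def fit_gaussian_params(X):
    """Step 2: fit per-feature mean and variance (maximum-likelihood, 1/m)."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # ddof=0 by default, i.e. the 1/m estimate
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 3: per-example density, product over features of univariate Gaussians."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    exps = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * exps, axis=1)

# Hypothetical training data: two features, assumed-normal examples.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(500, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-4  # hypothetical threshold
x_new = np.array([[5.1, 9.8], [0.0, 25.0]])  # second row is far from the training data
is_anomaly = p(x_new, mu, sigma2) < epsilon
```

Note that $p(x)$ is a product of $n$ small numbers; for large $n$ it is more stable to sum log-densities and compare against $\log \varepsilon$.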

Algorithm evaluation

Fit model $p(x)$ on the training set $\{x^{(1)}, \dots, x^{(m)}\}$.
On a cross-validation/test example $x$, predict

$$y = \begin{cases} 1 & \text{if } p(x) < \varepsilon \ \text{(anomaly)} \\ 0 & \text{if } p(x) \geq \varepsilon \ \text{(normal)} \end{cases}$$

Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall
- $F_1$-score

Can also use the cross-validation set to choose the parameter $\varepsilon$.
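Choosing $\varepsilon$ can be done by scanning candidate values and keeping the one with the best $F_1$ on the labeled cross-validation set. A sketch with made-up CV densities (the `p_cv`/`y_cv` values are hypothetical):

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true positives, false positives, false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def select_threshold(p_cv, y_cv):
    """Scan candidate epsilons; keep the one with the best F1 on the CV set."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        f1 = f1_score(y_cv, (p_cv < eps).astype(int))
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Hypothetical CV set: anomalies (y=1) have much lower density p(x).
p_cv = np.array([0.30, 0.25, 0.28, 0.31, 0.001, 0.002])
y_cv = np.array([0, 0, 0, 0, 1, 1])
eps, f1 = select_threshold(p_cv, y_cv)
```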

Choosing features

For non-Gaussian features, apply some transform (e.g. $\log(x)$, $\log(x + c)$, $\sqrt{x}$, $x^{1/3}$) to make the histogram look more Gaussian.
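A quick way to check such a transform numerically, with a hypothetical right-skewed feature: compare skewness before and after a $\log(x + 1)$ transform (the constant $c = 1$ is an arbitrary choice for illustration).

```python
import numpy as np

def skewness(v):
    """Sample skewness: third standardized moment (0 for a symmetric Gaussian)."""
    c = v - v.mean()
    return np.mean(c ** 3) / np.std(v) ** 3

# Hypothetical feature: exponentially distributed, so heavily right-skewed.
rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=5000)

x_log = np.log(x + 1.0)  # log(x + c) with c = 1

# skewness(x_log) is much closer to 0 than skewness(x):
# the transformed histogram looks more Gaussian.
```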
Error analysis for anomaly detection

Want $p(x)$ large for normal examples and $p(x)$ small for anomalous examples.

Most common problem: $p(x)$ is comparably large for both normal and anomalous examples, so a known anomaly slips through undetected.

Note: if the anomaly detection algorithm can't distinguish anomalous from non-anomalous examples, it is useful to look at the mis-flagged examples and come up with more features that separate them.

Anomaly Detection vs. Supervised Learning

Anomaly Detection

  • Very small number of positive ($y=1$) examples (0–20 is common).
  • Large number of negative ($y=0$) examples.
  • Many different “types” of anomalies; hard for any algorithm to learn from positive examples what the anomalies look like.
  • Future anomalies may look nothing like any of the anomalous examples we’ve seen so far.

Supervised learning

  • Large number of positive and negative examples.
  • Enough positive examples for the algorithm to get a sense of what positive examples are like.
  • Future positive examples are likely to be similar to ones in the training set.

Multivariate Gaussian Distribution

$x \in \mathbb{R}^n$. Don't model $p(x_1), p(x_2), \dots$ separately; model $p(x)$ in one go, with parameters $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)$$


Sigma and contour

(Figures omitted: contour plots of $p(x; \mu, \Sigma)$ for different settings of $\Sigma$ — the diagonal entries stretch the contours along each axis, and the off-diagonal entries tilt them to capture correlations.)
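The multivariate density can be evaluated directly from the formula above. A minimal sketch, with a hypothetical $\Sigma$ whose off-diagonal entries encode a positive correlation between $x_1$ and $x_2$:

```python
import numpy as np

def multivariate_gaussian(X, mu, Sigma):
    """p(x; mu, Sigma) for each row of X, per the multivariate Gaussian density."""
    n = mu.shape[0]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    # (x - mu)^T Sigma^{-1} (x - mu), computed row-wise
    quad = np.sum(diff @ inv * diff, axis=1)
    norm = (2.0 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])  # positive correlation between x1 and x2

# A point lying along the correlation direction scores much higher
# than one lying against it, even at the same distance from mu.
p_along = multivariate_gaussian(np.array([[1.0, 1.0]]), mu, Sigma)[0]
p_against = multivariate_gaussian(np.array([[1.0, -1.0]]), mu, Sigma)[0]
```

This is exactly the effect the contour figures illustrate: the original per-feature model cannot tell these two points apart, while the multivariate model can.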

Original model vs. Multivariate Version

Original Model

  • Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values.
  • Computationally cheaper (alternatively: scales better to large $n$, e.g. $n = 10{,}000$ to $100{,}000$).
  • OK even if $m$ (training set size) is small.

Multivariate Version

  • Automatically captures correlations between features.
  • Computationally more expensive.
  • Must have $m > n$, or else $\Sigma$ is non-invertible.
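The last point is easy to see numerically: with $m \leq n$, the maximum-likelihood covariance estimate has rank at most $m - 1$, so it cannot be inverted. A small sketch with made-up dimensions:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 10  # fewer examples than features
X = rng.normal(size=(m, n))

# Maximum-likelihood covariance estimate (1/m, centered).
diff = X - X.mean(axis=0)
Sigma = diff.T @ diff / m

# Rank is at most m - 1 < n, so Sigma is singular (non-invertible).
rank = np.linalg.matrix_rank(Sigma)
```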