Anomaly Detection
Density estimation
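Dataset: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
Is $x_{test}$ anomalous? Model $p(x)$ from the data, then flag $x_{test}$ as an anomaly if $p(x_{test}) < \varepsilon$ and treat it as normal if $p(x_{test}) \ge \varepsilon$.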
Application
Fraud detection
Manufacturing
Monitoring computers in a data center.
Gaussian Distribution
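Also known as the Normal distribution.
Say $x \in \mathbb{R}$. If $x$ is Gaussian-distributed with mean $\mu$ and variance $\sigma^2$, we write $x \sim \mathcal{N}(\mu, \sigma^2)$, with density
$$p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$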
Algorithm
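Choose features $x_j$ that you think might be indicative of anomalous examples.
Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$:
$$\mu_j = \frac{1}{m} \sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m} \sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$$
Given a new example $x$, compute $p(x)$:
$$p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{(x_j-\mu_j)^2}{2\sigma_j^2}\right)$$
Flag an anomaly if $p(x) < \varepsilon$.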
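The steps above can be sketched in NumPy. This is a minimal illustration, not a reference implementation: the function names, the synthetic training data, and the threshold `epsilon = 1e-4` are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian_params(X):
    """Estimate per-feature mean and variance from an (m, n) training matrix."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # divides by m, the maximum-likelihood estimate
    return mu, sigma2

def density(X, mu, sigma2):
    """p(x) as a product of independent univariate Gaussian densities."""
    X = np.atleast_2d(X)
    per_feature = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return per_feature.prod(axis=1)

# Synthetic "normal" data: two features with different means and spreads (assumed).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, sigma2 = fit_gaussian_params(X_train)

epsilon = 1e-4  # hypothetical threshold; in practice chosen on a cross validation set
x_typical = np.array([5.0, 10.0])   # near the training mean
x_unusual = np.array([12.0, 25.0])  # far from the training data

is_anomaly_typical = density(x_typical, mu, sigma2)[0] < epsilon
is_anomaly_unusual = density(x_unusual, mu, sigma2)[0] < epsilon
```

For many features the product can underflow, so summing log-densities is a common alternative to multiplying densities.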
Algorithm evaluation
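Fit model $p(x)$ on the training set $\{x^{(1)}, \dots, x^{(m)}\}$.
On a cross validation/test example $x$, predict $y = 1$ (anomaly) if $p(x) < \varepsilon$ and $y = 0$ (normal) if $p(x) \ge \varepsilon$.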
Possible evaluation metrics:
- True positive, false positive, false negative, true negative
- Precision/Recall
- $F_1$-score
Can also use the cross validation set to choose the parameter $\varepsilon$.
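One way to pick the threshold is to scan candidate values and keep the one with the best F1 score on a labeled cross validation set. The sketch below assumes the densities `p_cv` have already been computed; the helper names, the grid of 1000 candidates, and the toy numbers are illustrative assumptions.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """F1 from true positives, false positives and false negatives."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_epsilon(p_cv, y_cv, num_steps=1000):
    """Scan thresholds between min and max density; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), num_steps):
        y_pred = (p_cv < eps).astype(int)  # low density => predict anomaly
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Toy cross validation set: densities and labels (1 = anomaly), made up for illustration.
p_cv = np.array([0.30, 0.25, 0.28, 0.002, 0.31, 0.001, 0.27])
y_cv = np.array([0, 0, 0, 1, 0, 1, 0])
best_eps, best_f1 = select_epsilon(p_cv, y_cv)
```

Accuracy is a poor guide here because the classes are heavily skewed, which is why the scan optimizes F1 instead.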
Choosing features
For non-Gaussian features, apply a transform such as $\log(x)$, $\log(x + c)$, or $x^{1/2}$ so that the histogram looks more Gaussian.
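As a small illustration of such a transform, the sketch below draws a heavily right-skewed feature and checks that taking its log removes most of the skew. The lognormal data and the `skewness` helper are assumptions chosen for the example; real features usually need some trial and error, for instance `log(x + c)` when zeros are present.

```python
import numpy as np

def skewness(v):
    """Sample skewness: third central moment over the cubed standard deviation."""
    centered = v - v.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed, strictly positive

x_log = np.log(x)  # candidate transform; inspect a histogram to confirm

skew_before = skewness(x)
skew_after = skewness(x_log)
```

A skewness near zero is only a rough indicator; plotting the histogram before and after the transform remains the more direct check.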
Error analysis for anomaly detection
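Want $p(x)$ large for normal examples $x$ and $p(x)$ small for anomalous examples $x$.
Most common problem: $p(x)$ is comparable (say, both large) for normal and anomalous examples.
Note: if the anomaly detection algorithm can't distinguish anomalous from non-anomalous examples, it helps to look at the misclassified examples and come up with new features that separate them.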
Anomaly Detection vs. Supervised Learning
Anomaly Detection
- Very small number of positive examples ($y=1$); 0-20 is common.
- Large number of negative ($y=0$) examples.
- Many different "types" of anomalies. Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples we've seen so far.
Supervised learning
- Large number of positive and negative examples.
- Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to ones in the training set.
Multivariate Gaussian Distribution
Sigma and contour
Original model vs. Multivariate Version
Original Model
- Manually create features to capture anomalies where $x_1, x_2$ take unusual combinations of values.
- Computationally cheaper (alternatively, scales better to large $n$, e.g. $n$ = 10,000 or 100,000).
- OK even if $m$ (training set size) is small.
Multivariate Version
- Automatically captures correlations between features
- Computationally more expensive
- Must have $m > n$, or else $\Sigma$ is non-invertible.
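The difference between the two models can be seen on correlated data. The sketch below fits a full covariance matrix and compares it with a diagonal one, which corresponds to the original per-feature model. The function names, the synthetic covariance, and the two test points are assumptions made for illustration.

```python
import numpy as np

def fit_multivariate(X):
    """Estimate the mean vector and full covariance matrix from an (m, n) matrix."""
    mu = X.mean(axis=0)
    centered = X - mu
    Sigma = centered.T @ centered / X.shape[0]
    return mu, Sigma

def multivariate_density(X, mu, Sigma):
    """Multivariate Gaussian density evaluated at each row of X."""
    X = np.atleast_2d(X)
    n = mu.shape[0]
    centered = X - mu
    # Solve Sigma z = (x - mu) rather than forming the inverse explicitly.
    solved = np.linalg.solve(Sigma, centered.T).T
    mahalanobis = np.sum(centered * solved, axis=1)
    norm_const = np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))
    return np.exp(-0.5 * mahalanobis) / norm_const

# Two strongly correlated features (assumed synthetic data).
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X_train = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)
mu, Sigma = fit_multivariate(X_train)

x_along = np.array([1.5, 1.5])     # follows the correlation
x_against = np.array([1.5, -1.5])  # unusual combination of values

# Full covariance: the unusual combination gets a much lower density.
p_along = multivariate_density(x_along, mu, Sigma)[0]
p_against = multivariate_density(x_against, mu, Sigma)[0]

# Diagonal covariance (the original model): both points look about equally likely.
Sigma_diag = np.diag(np.diag(Sigma))
p_indep_along = multivariate_density(x_along, mu, Sigma_diag)[0]
p_indep_against = multivariate_density(x_against, mu, Sigma_diag)[0]
```

With the original model, catching the second point would require a hand-built feature such as a ratio of the two inputs.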
- Post title: 14_Anomaly Detection
- Create time: 2022-02-22 14:17:49
- Post link: Machine-Learning/14-anomaly-detection/
- Copyright notice: All articles in this blog are licensed under BY-NC-SA unless stating additionally.