The Anomaly Detection algorithm falls under the clustering category.

It is used to find data in the system that does not match the existing model of the data. Such anomalies are inconsistent with the rest of the data and can distort reporting and analysis.

Anomalous data is identified by applying the K-means clustering algorithm to the data set: the points farthest from the center of their cluster are flagged as anomalies.
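The idea can be sketched in plain Python, assuming 2-D points, Euclidean distance, and a fixed number of points to flag. This is not PAL's implementation: the helpers `nearest`, `kmeans` and `kmeans_anomalies`, the `n_anomalies` cutoff and the sample data are all hypothetical, chosen only to show "cluster first, then flag whatever lies farthest from its own centroid".

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # empty cluster: keep the old centroid
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]  # labels for the final centroids
    return centroids, labels


def kmeans_anomalies(points, k, n_anomalies):
    """Indices of the n_anomalies points farthest from their own cluster center."""
    centroids, labels = kmeans(points, k)
    distance = [math.dist(p, centroids[label]) for p, label in zip(points, labels)]
    ranked = sorted(range(len(points)), key=lambda i: distance[i], reverse=True)
    return ranked[:n_anomalies]


data = [(1, 1), (1.2, 0.8), (0.9, 1.1), (8, 8), (8.1, 7.9), (7.9, 8.2), (4.5, 4.5)]
print(kmeans_anomalies(data, k=2, n_anomalies=1))  # [6], the isolated point (4.5, 4.5)
```

The score is a raw Euclidean distance, so every direction around a centroid counts equally; that property matters for the discussion in the comment below.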

To learn more about K-means, see PAL Algorithms simplified – K means.

Hasan Rafiq

Hi Arun,

Nice article, but I have a different opinion on anomaly detection. The Python implementations of anomaly detection, and other major systems worldwide, use a Gaussian mixture model (a probabilistic model).

SAP provides the “ANOMALY” function, but it runs K-means with a distance function. Now consider data whose shape is more like a wide, flat ellipse (large width, small height). PAL’s ANOMALY function will fail in this scenario, because distance from the centroid treats every direction equally and ignores the shape of the cluster.

Example figure: the true anomaly (red) is closer to the centroid X, but the yellow point will be detected instead because it is farther from X.
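A small worked example with assumed numbers (illustrative only, not read off the figure) shows the effect. Put the centroid X at the origin and let the normal data spread with standard deviation $\sigma_1 = 10$ along the horizontal axis and $\sigma_2 = 1$ along the vertical axis. Take a yellow point $\mathbf{y} = (15, 0)$ on the long axis and a red point $\mathbf{r} = (0, 4)$ just above the centroid. Euclidean distance from X ranks yellow as the more extreme point:

$$
\lVert \mathbf{y} - X \rVert = 15, \qquad \lVert \mathbf{r} - X \rVert = 4
$$

A Gaussian model with covariance $\Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2)$ instead uses the squared Mahalanobis distance, which rescales each axis by its spread (the last form holds because X is at the origin):

$$
d_M^2(\mathbf{p}) = (\mathbf{p} - X)^\top \Sigma^{-1} (\mathbf{p} - X) = \frac{p_1^2}{\sigma_1^2} + \frac{p_2^2}{\sigma_2^2}
$$

$$
d_M^2(\mathbf{y}) = \frac{15^2}{10^2} + 0 = 2.25, \qquad d_M^2(\mathbf{r}) = 0 + \frac{4^2}{1^2} = 16
$$

Since the Gaussian log-density equals $-\tfrac{1}{2} d_M^2$ plus a constant, the red point receives a much lower density and is the one a probabilistic model would flag.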

In this case, non-anomalous points can lie far from the centroid while still sitting in a high-probability region, whereas an anomalous point can appear very close to the centroid yet fall outside the high-probability range. Hence, for anomaly detection I would prefer the GMM (Gaussian mixture model) function over the ANOMALY function.
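Density-based scoring can be sketched with the simplest possible mixture, a single Gaussian component, whose maximum-likelihood fit is just the sample mean and covariance; a full GMM would fit several components with the EM algorithm. This pure-Python sketch is an illustration under those assumptions, not PAL's GMM function: `fit_gaussian`, `log_density` and the data are hypothetical. Every point is scored by its log-density, and the lowest-scoring point is treated as the anomaly.

```python
import math


def fit_gaussian(points):
    """Maximum-likelihood mean and 2x2 covariance of 2-D points.

    For a mixture with a single component, this is the fit EM converges to.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), (sxx, sxy, syy)


def log_density(point, mean, cov):
    """Log of the bivariate normal density N(point | mean, cov)."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * maha2 - math.log(2 * math.pi) - 0.5 * math.log(det)


# Illustrative data (assumed): a long, flat cluster along the x-axis, plus one
# point just above its center that plays the role of the red true anomaly.
normal = [(x, 0.5 * (-1) ** x) for x in range(-20, 21)]
red = (0, 4)
data = normal + [red]

mean, cov = fit_gaussian(data)
scores = [log_density(p, mean, cov) for p in data]
lowest_density = min(range(len(data)), key=lambda i: scores[i])
farthest = max(range(len(data)), key=lambda i: math.dist(data[i], mean))
print(data[lowest_density])  # (0, 4): the red point has the lowest density
print(data[farthest])        # a point at one end of the ellipse, not the red point
```

The two rankings disagree exactly as described above: the farthest point by Euclidean distance is an ordinary point at the tip of the ellipse, while the density score singles out the point just off the long axis.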

We would like someone from the PA team to shed a little more light on using the “ANOMALY” function in this scenario. Orla Cullen

Thanks,

Hasan Rafiq