Technical Articles

# Data Cleansing:Outlier Correction Methods in IBP-Demand

Any form of statistical analysis goes around a life cycle that starts from Data Collection and goes a way forward towards partitioning, cleansing, visualizing, analyzing, performing hypothesis testing. Throughout this life cycle it is considered that data scientists spend most of their time in cleansing the data. It is a well-known fact that performing a downstream analysis with an Incorrect data often leads to In-correct results. In this Blog I have explained the Outlier correction Methodology which is one form of Data cleansing technique used in IBP-Demand Planning.

IBP-2011 Update

In IBP there are three ways to perform data cleansing

1. Outlier Detection and Correction
2. Substitute Missing Values
3. Promotion Sales Lift Elimination

Out of the three methods, I will cover the Outlier Detection and Correction part which is one of the Pre-Processing steps to be done before performing downstream analysis.

An “outlier” is a value in the historical data that lies outside the accepted range of values, which is also called the tolerance lane (Beyond Upper and Lower Boundary). There can be a variety of reasons for an Outlier to exist in the data set that includes data entry error, sudden spike due to socio-economic factors etc.

Techniques to Handle Outliers in IBP

1. Outlier Detection Method (Ordinary/ Trend & Seasonality)
2. Outlier Correction Method (Ordinary/ Trend & Seasonality)

I.Outlier Detection Methods: (Ordinary)

Outlier Detection Method helps to identify the potential outliers from the huge dataset which if unidentified can have a drastic change on the forecasted data. Below are the two methods to Detect Outliers in IBP.

• Interquartile Range Test (IQR)
• Variance Test

Interquartile Range Test (IQR)

The system checks whether the data points are within the interquartile range, which is the difference between the third quartile and the first quartile of the data. The values that are not within this range are identified as outliers.

Normally the best way of interpreting the Interquartile range is to use Box Plot technique. In this method the data is segregated into Four quartiles (Q1, Q2, Q3, Q4) based on the value that exist in the dataset. Then interquartile range (IQR) is obtained by taking the difference between the third and first quartiles. (IQR = Q3 – Q1).

To detect the outliers using the interquartile range method, the system calculates a lower and an upper bound using the first (Q1) and the third (Q3) quartile:

Lower bound = Q1 – Multiplier × IQR

Upper bound = Q3 + Multiplier × IQR

Multiplier is often defined within 1.5-3.0 as per the business requirement.

IBP Configuration:

APP-Manage Forecast Models -> Pre-Processing Steps -> Outlier correction • Variance Test

Variance is often interpreted as the spread of values in the dataset, which means how much a data point is spread out from the mean.

The system checks whether the data points from the dataset (Historical data) deviate from the mean by more than the standard deviation multiplied by a constant. The value that deviates larger from the mean is identified as outliers.

To detect the outliers using the variance test method, the system calculates a lower and an upper bound using the mean and the standard deviation (SD) of the historical data:

Lower bound = Mean – Multiplier × SD

Upper bound = Mean + Multiplier × SD The values that fall outside of this tolerance lane are considered as outliers. The multiplier influences the sensitivity of the outlier detection; using lower values for the multiplier will tend to detect more historical values as outliers, so again choosing the correct multiplier plays a major role based upon the business requirement.

 Multiplier: A decimal number by which the system should multiply the range of accepted values, thus including additional values in the range or excluding a set of values from it. The most commonly used multipliers are 1.5 and 3. Outlier Correction Methods(Ordinary)

With the above-discussed method potential outliers can be identified from the data set (Historical values) now it is necessary to correct those identified outliers with anyone of the below methods as per the business requirement.

IBP Configuration:

APP-Manage Forecast Models -> Pre-Processing Steps -> Outlier correction

There are six methods for correcting Outliers in IBP

1. Correction with Mean Excluding Outliers
2. Correction with Mean
3. Correction with Median Excluding Outliers
4. Correction with Median
5. Correction with Tolerance Excluding Outliers
6. Correction with Tolerance

Correction with Mean Excluding Outliers

The system replaces outliers with the average of all key figure values calculated for the historical periods and does not take the outliers into account for the calculation. This is the default outlier correction method.

Correction with Mean

The system replaces outliers with the average of all key figure values calculated for the historical periods.

Correction with Median Excluding Outliers

The system replaces outliers with the middle value from all key figure values calculated for the historical periods and does not take the outliers into account for the calculation.

Correction with Median

The system replaces outliers with the middle value from all key figure values calculated for the historical periods. The outliers are excluded from the calculation.

Correction with Tolerance Excluding Outliers

The system recalculates the tolerance interval without taking the outliers into consideration and changes the outliers so that they are at the limits of the new tolerance interval.

Correction with Tolerance

The system recalculates the tolerance interval and changes the outliers so that they are at the limits of the new tolerance interval.

No Correction

This is not an Outlier correction method, but simply an outlier detection is used for information purposes only.

Scenario 1: IQR Method Detection and Correction with Tolerance

Input KF-Actuals Quantity

Historical period: 11 weeks

Outlier Detection Calculation with IQR Method:  Output generated with outlier value for Actuals Qty Adj KF Scenario 2: Variance Method Detection and Correction with Tolerance

Input KF – Actual Quantity

Output KF – Actual Quantity Adjusted

Historical period: 11 weeks  Run Statistical Forecast with 11 week Historical Method II.Outlier Correction Methods (Trend & Seasonality)

1. No Correction
2. Adjustment to Seasonality and Trend
3. Adjustment to Seasonality and Trend with Tolerance

1.No Correction This method is used only for information purposes. No outliers will be corrected if we select this method. Outliers are corrected considering the Seasonality and Trend pattern using the moving average calculated for the data within the smoothing window. Using this method, the outliers are corrected considering the seasonality and Trend pattern generated by the Time series properties. After executing the statistical forecasting algorithm corrected outliers will be updated in the chosen Key figure and change point of the ‘Manage forecast automation profile’ app.

3.Adjustement to Seasonality and Trend with Tolerance

Outliers are calculated using the Tolerance range method. But the only difference here is a system also considers Trend and Seasonality within the selected data before performing the outlier correction. Smoothing widow

It is a subset size used in the outlier correction algorithm in order to determine the number of periods to which it has to determine the moving average in the data set.

Note: If Trend and Seasonality are found in the time series, the algorithm picks the smoothing window from the  Forecast Automation profile. If the only Trend is found in the Time series then the smoothing window is considered from the smoothing window period from the `Manage Forecast Model’ app.

Recommended smoothing window based on the period. If the period is month then the range is 6-12 months. If it is weeks the range is between 4-12 weeks. If it is days then the range is between 7-30days.

Pre-requisites

• Time series analysis needs to run in advance to detect seasonal and trend patterns
• Time series analysis should run on the level of statistical forecast including outlier detection
• To use this method, it is mandatory to enable the `Consider Timer series properties’ check box
• Smoothing factor Must be set with a value greater than `3’ based on the Period

Conclusion

In this blog, I have explained only the Outlier correction method ‘correction with Tolerance’ and covered new Outlier correction methods for Trend and Seasonality Introduced in IBP-2011 Version. Other methods of Outlier Correction are all self-explanatory and hence I am not covering it as a part of this blog. So with this, I would like to conclude that any forecasting model if it needs to obtain a better accuracy it requires a defined pre-processing step which should cleanse the complete dataset.

/             You must be Logged on to comment or reply to a post.
• Good summary of the Outlier correction methods in IBP Demand………

However, these methods are based on a simple mean and use range-based measures around the central tendency to detect outliers.

Since the detection is not based on thresholds (or Tolerance Lane as SAP has often called it) that are calculated with reference to the conditional mean, typically this will overcleanse the data.

Although these methods sound fancy, caution is advised…….

• Thanks for clear explanation.