Any form of statistical analysis goes around a lifecycle that starts from Data Collection and goes a way forward towards partitioning, cleansing, visualizing, analyzing, performing hypothesis testing. Throughout this lifecycle it is considered that data scientists spend most of their time in cleansing the data. It is a well-known fact that performing a downstream analysis with an Incorrect data often leads to In-correct results. In this Blog I have explained the Outlier correction Methodology which is one form of Data cleansing technique used in IBP-Demand Planning(as on 1905).

In IBP there are three ways to perform data cleansing

- Outlier Detection and Correction
- Substitute Missing Values
- Promotion Sales Lift Elimination

Out of the three methods I will cover the Outlier Detection and Correction part which is one of the Pre-Processing step to be done before performing downstream analysis.

An “outlier” is a value in the historical data that lies outside the accepted range of values, which is also called the tolerance lane (Beyond Upper and Lower Boundary). There can be a variety of reasons for an Outlier to exist in the data set that includes data entry error, sudden spike due to socio economic factors etc.

**Techniques to Handle Outliers in IBP**

- Outlier Detection Method
- Outlier Correction Method

**Outlier Detection Methods:**

Outlier Detection Method helps to identify the potential outliers from the huge dataset which if unidentified can have a drastic change in the forecasted data. Below are the two methods to Detect Outliers in IBP.

- Interquartile Range Test(IQR)
- Variance Test

**Interquartile Range Test (IQR)**

The system checks whether the data points are within the interquartile range, which is the difference between the third quartile and the first quartile of the data. The values that are not within this range are identified as outliers.

Normally the best way of interpreting the Interquartile range is to use Box Plot technique.

In this method the data is segregated into Four quartiles (Q1, Q2, Q3, Q4) based on the value that exist in the dataset. Then interquartile range (IQR) is obtained by taking the difference between the third and first quartiles. (IQR = Q3 – Q1).

To detect the outliers using the interquartile range method, the system calculates a lower and an upper bound using the first (Q1) and the third (Q3) quartile:

Lower bound = Q1 – Multiplier × IQR

Upper bound = Q3 + Multiplier × IQR

Multiplier is often defined within 1.5-3.0 as per the business requirement.

__IBP Configuration:__

APP-Manage Forecast Models -> Pre-Processing Steps -> Outlier correction

**Variance Test**

Variance is often interpreted as the spread of values in the dataset, which means how much a data point is spread out from the mean.

The system checks whether the data points from the dataset (Historical data) deviate from the mean by more than the standard deviation multiplied by a constant. The value that deviates larger from the mean is identified as outliers.

To detect the outliers using the variance test method, the system calculates a lower and an upper bound using the mean and the standard deviation (SD) of the historical data:

Lower bound = Mean – Multiplier × SD

Upper bound = Mean + Multiplier × SD

The values that fall outside of this tolerance lane are considered as outliers. The multiplier influences the sensitivity of the outlier detection; using lower values for the multiplier will tend to detect more historical values as outliers, so again choosing the correct multiplier plays a major role based upon the business requirement.

Multiplier: A decimal number by which the system should multiply the range of accepted values, thus including additional values in the range or excluding a set of values from it. The most commonly used multipliers are 1.5 and 3. |

__Outlier Correction Methods__

With the above discussed method a potential outliers can be identified from the data set (Historical values) now it is necessary to correct those identified outliers with anyone of the below methods as per the business requirement.

__IBP Configuration:__

APP-Manage Forecast Models -> Pre-Processing Steps -> Outlier correction

**There are six methods for correcting Outliers in IBP**

- Correction with Mean Excluding Outliers
- Correction with Mean
- Correction with Median Excluding Outliers
- Correction with Median
- Correction with Tolerance Excluding Outliers
**Correction with Tolerance**

**Correction with Mean Excluding Outliers**

The system replaces outliers with the average of all key figure values calculated for the historical periods and does not take the outliers into account for the calculation. This is the default outlier correction method.

**Correction with Mean**

The system replaces outliers with the average of all key figure values calculated for the historical periods.

**Correction with Median Excluding Outliers**

The system replaces outliers with the middle value from all key figure values calculated for the historical periods and does not take the outliers into account for the calculation.

**Correction with Median**

The system replaces outliers with the middle value from all key figure values calculated for the historical periods. The outliers are excluded from the calculation.

**Correction with Tolerance Excluding Outliers**

The system recalculates the tolerance interval without taking the outliers into consideration and changes the outliers so that they are at the limits of the new tolerance interval.

**Correction with Tolerance**

The system recalculates the tolerance interval and changes the outliers so that they are at the limits of the new tolerance interval.

**No Correction**

This is not an Outlier correction method, but simply an outlier detection is used for information purposes only.

__Scenario 1: IQR Method Detection and Correction with Tolerance__

Input KF-Actuals Quantity

Output KF- Actual Quantity Adjusted

Historical period: 11 weeks

__Outlier Detection Calculation with IQR Method:__

Output generated with outlier value for Actuals Qty Adj KF

__Scenario 2: Variance Method Detection and Correction with Tolerance__

Input KF – Actual Quantity

Output KF – Actual Quantity Adjusted

Historical period: 11 weeks

Run Statistical Forecast with 11 week Historical Method

**Conclusion**

In this Blog I have explained only Outlier correction method ‘correction with Tolerance’.Other methods of Outlier Correction are all self-explanatory and hence I am not covering it as a part of this blog. So with this I would like to conclude that any forecasting model if it needs to obtain a better accuracy it requires a defined pre-processing steps which should be cleanse the complete dataset.

Please post any of your queries in comment section.