Predictive Analysis using SQL Functions of HANA
An online retailer wants to improve it’s shipping/delivery process. Currently, there are deviations noticed between the planned and the actual date of delivery. Causing customer dis-satisfaction.
Currently the order book has 360 pending orders. Retailer wants to know the following:
- What pecent of orders will get delivered within 3 days of the planned delivery date?
- Within how many days of planned date will 95% of orders be delivered?
Note: A basic knowledge of statistics and normal distribution is required.
Retailer stores the historic delivery data in the “DELIVERYDATA” table in HANA.
Table “DELIVERYDATA” has the following columns:
OrderNo, CustomerID, ProductID,PlanDelDate,ActDelDate
where PlanDelDate –> Planned Delivery Date
ActDelDate –> Actual Delivery Date
Step 1: Assuming that actual delivery dates follow a normal distribution. And delay is the difference between the Planned and Actual delivery date.
Step 2: Need to find the mean delay and standard deviation of delay for the historical delivery data. Here is the SQL written in HANA:
select avg((days_between(plandeldate,actdeldate))) as mean,
stddev((days_between(plandeldate,actdeldate))) as standard_deviation
Mean came as 0.08 and standard deviation came as 3.5466. This is directly from the SQL above on HANA.
Step 3: Now the first problem. Need to find the percentage of delivery where the lag will be + or – 3 days. Used the following calculation on paper ( Manual ):
For “+3” days, the corresponding deviation comes as : ( 3 – 0.08 ) / 3.5466 = 0.8233. Looked into the standard deviation table manually, and the probability percentage comes as 29.39%. Please scroll to the end of this document to get the link for the standard deviation table.
In a similar way, for “-3″ day”, the probability percentage comes as 30.5%.
Combined together it comes = 29.39 + 30.5 = 59.89%
Company has 360 pending orders. 59.89% of 360 is ~ 216
So, 216 of these 360 orders are expected to be delivered within 3 days of the planned delivery date.
Step 4: Next problem. Need to know the delay range, by which 95% of the 360 orders will be delivered.
For normal distribution, twice the standard deviation (2 sigma) accounts for 95% of the distribution.
Mean delay was 0.08 days. As some orders reached late and some early. And standard deviation was 3.5467.
+ 2 sigma comes as : 2 x 3.5467 = 7.0934. Add it to the mean of 0.08. So the total comes as 7.1734. For practical purposes, lets take 7.
Similary for – 2 sigma : 2 x 3.5467 – 0.08 = 7.01. Roughly 7 days.
So 95% of the orders would reach within + or – 7 days of the planned delivery date.
Note: The attachment has the csv file to load data to the HANA “DELIVERYDATA” table.
Also one can find the normal standard distribution table in the below link: