During data load we come across of one issue very often which is related to “data load runtime”. This runtime is related to process chain execution time. Being a part of Production Support Team our responsibility is to estimate the data load completion time. Maximum times we rely on excel-based tool which provides the runtime. But reliability on the estimation is less significant.
Based on my past experience, I am offering a technique known as “Predictive Modeling” which can be used to predict the future runtime and thus we can plan resource allocation. Let us first understand what we mean by “Predictive Modeling” –
“Predictive modeling is the process by which a model is created or chosen to try to best predict the probability of an outcome”
We are going to create a model which is based on Predictive Analysis and later we will use this model to predict the runtime. To explain this process here I have considered two input parameters – Number of Records, Runtime for cube “ZXXC_01”. As per Predictive Analysis, first we need to identify independent (X) and dependent variables (Y). In our case, Number of Records is independent and Runtime is dependent variable. Thus in model using values of X, we can predict the value for Y. Let say, we have following data as per record (this data I took it from BP2 system)
We need a model who can predict the runtime in advance. However issue here is to predict the runtime we should have no of records in hand. But we don’t have such details, now question comes what can be done here. So to get the “No of Records” for advance period we have to use Forecast Techniques.
This complete exercise covers four major steps –
Step 1:Identify relation between variables X and Y – To understand whether variables are related to each other, we will use coefficient of correlation/covariance here.
Step 2:Once the relationship is found, then create Regression Model – This model will predict future runtime.
St Step 3:Use Forecast techniques to get future Runtime – We will use “Exponential Smoothing” technique here to get the values
4. Step 4:Now apply Regression Model on forecasted data to get the future runtime.
Step 1 – Identify the relationship between variables X and Y
After applying formula for correlation coefficient “r”, we get the value as 0.61. This value is close to +1, and thus these two variables are co-related.
Step 2 – Create Regression Model
We have all studied linear regression equation in our schools as
“r” we have already calculated in step 1.
So after applying standard deviation for X and Y and correlation coefficient we get following linear equation
Step 3 – Use Exponential Smoothing to get forecasted data for “No of Records”
After applying Exponential Smoothing (with alpha = 0.7), We get the Forecast Data for No of Records.
Comparison of Actual and Forecast Data using graph –
Step 4 – Apply Regression Model to forecasted data to get the runtime
After applying regression model to forecasted data we have got in previous step we get the future runtime.
Comparison of Actual and Predicted Runtime using graph –
Re Remark – It might possible that we may arrive with more than one indpenedent variable in that case we can use multiple regression. It is good to have more than one independent variable. For this case I have simply used Linear Regession with one IV,