Analytical Business Rules with HANA and R – Forecasting Time Series
This blog series is about analytical business rules. I will focus on advanced techniques using R in BRFplus rule systems. Together with PAL, R is one of the tools that enhance HANA with techniques from predictive analytics.
Operational vs. Analytical Decision Management
Operational Decision Management is related to digitalization using business rules. With business rules we automate processes by making automated decisions based on calculations. Usually the rule systems have a simple structure: they are algorithms specified by business process experts.
Analytical decision management has a different flavor since it uses methods from statistics and data mining to detect business rules. In many cases this is done by statisticians or data analysts, and a typical example is the calculation of risks, for example of a financial product.
Sometimes data analysts doing analytical business rules management find rules that have a simple structure, and sometimes the rules contain difficult calculations that originate from analytical models. From now on I call those rules analytical rules.
How to find Analytical Rules?
Let’s look at a hypothetical and much simplified example. A revenue analyst wants to analyze the revenue of a certain business unit. This process should be automated and a rule system should warn a business analyst if the revenue gets critical. I will present two (simplified) answers which show you what I mean.
As I wrote above, usually a specialist or sometimes a specialized organizational unit will define the rules. Those specialists will usually look at the data first, since visualization is very important for a data analyst or data scientist. SAP offers many tools for this task: SAP Lumira is perfect for it, but you can also develop your own analytical apps using frameworks like APF for exploratory data analysis. HANA experts can also use HDB studio, and people with R skills can use the Outside-In approach described here. So R offers a workbench that data scientists can use for everything that can’t be done with a dashboard. This includes special charts for visualization, which are described in this blog: https://www.r-bloggers.com/interactive-visualizations-with-r-a-minireview/ but also statistical tests.
The following diagram shows a time series in R imported into a data frame. If you look at the picture you will see that it is hard to see any pattern. You may argue that this is not a realistic example, since you would expect a visible trend (perhaps linear), seasonality (regular and predictable changes which recur in a certain time interval) and a cyclic component (rises and falls without a fixed period).
We will come back to the question of the trend later. First we start with a very simple analytical rule: we want to find outliers.
Outlier Detection – Checking of Thresholds
Outliers are observations that you don’t expect. Usually those are very big or small values which can occur by chance. In our context we mean something different and ask how many revenues are smaller than a certain threshold, say less than or equal to 290,000. In the time series above you find 9 of those values, which can easily be checked using the following DB Lookup expression in BRFplus.
This is a very simple analytical rule: we ask how many times the value is under a certain limit. I call this an analytical rule since I don’t check the properties of one business object; I check a huge number of values. With a BRFplus DB Lookup working on transparent tables or CDS views you can check a huge number of line items efficiently, especially on the HANA database.
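To prototype the check before modelling it in BRFplus, the same count can be sketched in R. The revenue values below are made up for illustration and are not the series from the diagram; the threshold is the one from the text, read as 290,000. The DB Lookup itself is not reproduced here.

```r
# Hypothetical weekly revenues (illustration only)
revenue <- c(301500, 289900, 310200, 275000, 295400, 290000, 330100, 284300)

# Threshold from the text, read as 290,000
threshold <- 290000

# Count how many values are less than or equal to the threshold
n_below <- sum(revenue <= threshold)
print(n_below)
```

In a rule system the resulting count would then be compared against a limit that the business analyst defines.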
Finding Trends and Forecasts
In the following I will present a simplified example of a more advanced analytical rule that uses prediction. If you look at the example time series above, a natural question is whether we can detect a trend and even make predictions. A typical question is whether there will be outliers in the near future.
The example above is typical in that in most cases it is hard to detect a trend. When a statistician analyzes a time series, he will usually try to decompose it to detect seasonal and cyclic behavior. Without those components the time series consists of a smooth component (trend) and random fluctuations (“white noise”). A well-known technique is smoothing, which removes irregularities to provide a clearer view of the true underlying pattern in the series. This can be done very easily using R in combination with HANA, which I described in a previous blog: http://scn.sap.com/community/hana-in-memory/blog/2016/07/31/hana-and-r-inside-out-and-outside-in In the following I show a very simplified example of a prediction. In contrast to the last blog, I am now using the Inside-Out scenario, where R is called from SQLScript procedures.
Suppose we have selected the values of the above time series using SQL. Then we can create a time series object in R:
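A minimal sketch of this step, assuming the selected values arrive in a data frame named revenue with a numeric column REVENUE. Both names and the sample values are assumptions made for illustration.

```r
# Hypothetical stand-in for the values selected via SQL (illustration only)
revenue <- data.frame(REVENUE = c(301500, 289900, 310200, 275000, 295400, 290000))

# Build a time series object; the observations are treated as equally spaced weeks
revenueseries <- ts(revenue$REVENUE)

print(revenueseries)
```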
R provides all state-of-the-art algorithms for smoothing, for example Holt-Winters smoothing:
revenueseriessmooth <- HoltWinters(revenueseries, beta=FALSE, gamma=FALSE)
In this example the smoothing parameter is estimated automatically. With beta=FALSE and gamma=FALSE the call reduces to simple exponential smoothing, so only alpha is estimated. The smoothing parameters are the most important input of exponential smoothing methods: all these methods weight the latest values in the time series most heavily, while older values have less influence. The smoothing parameters define exactly how much influence older values have.
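How the smoothing parameter controls the influence of older values can be seen by writing simple exponential smoothing out by hand. This is a sketch, not the internals of HoltWinters(): the function names, the start level and the sample values are assumptions made for illustration.

```r
# Simple exponential smoothing written out by hand.
# alpha is the smoothing parameter; the start level is taken as the first value
# (an assumption made for this sketch).
ses_level <- function(x, alpha) {
  level <- x[1]
  for (value in x[-1]) {
    level <- alpha * value + (1 - alpha) * level
  }
  level
}

# Weight that the final level puts on the observation k steps back
# (this holds for every observation except the start value)
ses_weights <- function(alpha, k) alpha * (1 - alpha)^k

x <- c(10, 12, 11, 13, 12)   # made-up values
print(ses_level(x, alpha = 0.5))
print(ses_weights(alpha = 0.5, k = 0:3))
```

A large alpha lets the weights decay quickly, so the smoothed level follows the latest values closely. A small alpha spreads the weight over a longer history.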
If you plot the smoothed series in R, for example with plot(revenueseriessmooth), you can see the result in the following diagram.
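When smoothing is performed you can make a prediction using the forecast package in R. Calling the generic forecast() function dispatches to the method for Holt-Winters objects and also works with package versions that do not export forecast.HoltWinters:
revenueseriesforecasts <- forecast(revenueseriessmooth, h=8)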
Here R calculated a prediction for eight weeks. The result can be printed out with the following command:
In the picture you can see the forecast values with their 80% (middle blue) and 95% (light blue) prediction intervals.
The prediction intervals can be used in predictive analytical rules. If, for example, the lower bounds of the 95% prediction interval fall below a certain value a number of times, this can lead to a clearing case in which an official in charge is alerted, for example by a business workflow.
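Such a rule can be sketched in a few lines of R. The sketch below uses the predict() method from base R instead of the forecast package, so it runs without additional packages. The series is simulated, and the threshold and the number of tolerated hits are hypothetical values chosen for illustration.

```r
# Made-up weekly revenue series (illustration only)
set.seed(42)
revenueseries <- ts(round(300000 + rnorm(60, sd = 15000)))

# Simple exponential smoothing, as in the text
revenueseriessmooth <- HoltWinters(revenueseries, beta = FALSE, gamma = FALSE)

# Eight-week forecast with a 95% prediction interval (base R predict method)
pred <- predict(revenueseriessmooth, n.ahead = 8,
                prediction.interval = TRUE, level = 0.95)
lower95 <- as.numeric(pred[, "lwr"])

# Hypothetical rule: alert if the lower bound is under the threshold
# in at least max_hits of the forecast weeks
threshold <- 290000
max_hits  <- 3
alert <- sum(lower95 < threshold) >= max_hits
print(alert)
```

In the scenario described here the alert flag, or the lower bounds themselves, would be handed back to the rule system, which decides whether to start the workflow.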
It is very easy to implement this feature in BRFplus with the R Inside-Out scenario. Here a SQLScript procedure uses R. The following procedure takes the time series as input, performs smoothing, makes a forecast and returns the lower bounds of the 95% prediction intervals. You will easily see that I reused the code snippets above.
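DROP PROCEDURE "MY_SCHEMA"."REVENUE_FORECAST";
CREATE PROCEDURE "MY_SCHEMA"."REVENUE_FORECAST"
(IN revenue "MY_SCHEMA"."T_SERIES", OUT result "MY_SCHEMA"."T_SERIES")
LANGUAGE RLANG READS SQL DATA AS
BEGIN
  library(forecast)
  # Assumption: T_SERIES has a single numeric column REVENUE
  revenueseries <- ts(revenue$REVENUE)
  # Simple exponential smoothing with automatically estimated alpha
  revenueseriessmooth <- HoltWinters(revenueseries, beta=FALSE, gamma=FALSE)
  revenueseriesforecasts <- forecast(revenueseriessmooth, h=8)
  # Lower bounds of the 95% prediction interval, one row per forecast week
  result <- as.data.frame(matrix(revenueseriesforecasts$lower[,"95%"]))
  colnames(result) <- c("REVENUE")
END;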
This procedure can be called from a SQLScript procedure in HANA:
/********* Begin Procedure Script ************/
BEGIN
  lt_threshold_low = SELECT REVENUE FROM "SAPDAT"."ZWEEKLYREVENUE"
    WHERE PARTNER = :IV_PARTNER AND CLIENT = :IV_CLIENT
    ORDER BY FISCYEAR ASC, WEEK ASC;
  -- Output table parameters are passed without a leading colon
  CALL "MY_SCHEMA"."REVENUE_FORECAST"(:lt_threshold_low, OUTPUT_TABLE);
END;
This SQLScript procedure can be called from an AMDP or with a Database Procedure Proxy, and both artifacts can then be called from BRFplus. At the moment you can’t use R in AMDP, which makes the software logistics a little bit painful.
Some Remarks about Forecast Automation
Usually forecasting is very difficult. There is no single algorithm that provides reasonable results for all problem instances. Usually a data scientist has to visualize the data, perform a time series decomposition, try out different smoothing parameters and finally use statistical checks to analyze the output of the algorithms. At least this is what I learned at university in the 1990s. This is still true, but in recent years the science has made progress. If you want to learn about this I recommend the book “Forecasting with Exponential Smoothing – The State Space Approach” written by Rob Hyndman and others: http://exponentialsmoothing.net/ The R forecast package used above was written by Rob Hyndman, too. So let me describe some highlights of the book:
- Many methods of exponential smoothing are special cases of the so-called state space approach.
- There are stochastic models for the state space approach, so you can calculate the above-mentioned prediction intervals for a forecast.
- With so-called information criteria you can test the predictive ability of each model.
- You can use those techniques to automate forecasting: just fit every one of the above-mentioned models, compare them using the information criteria, and then choose the best model for the calculation.
- In the literature you find many studies showing that combinations of different models perform very well.
Also, many researchers believe that with the above-mentioned approach machines usually perform better at making predictions.
I recommend testing the quality of predictions. With HANA this can be done in real time: just calculate predictions of past values and compare them with the actual values. This is of course a huge computational effort, but one that can be handled easily using In-Memory technology. Those computations should be reviewed by statisticians, who will use them to optimize the prediction algorithms.
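The comparison of predictions with actual values can be sketched as a simple holdout test in base R. The series is simulated, the horizon of eight weeks is taken from the example above, and the naive benchmark (repeating the last known value) is an assumption added here to have something to compare against.

```r
# Made-up weekly revenue series (illustration only)
set.seed(1)
revenue <- round(300000 + cumsum(rnorm(80, sd = 3000)) + rnorm(80, sd = 8000))

h <- 8                                   # forecast horizon in weeks
n <- length(revenue)
train  <- ts(revenue[1:(n - h)])         # history known at forecast time
actual <- revenue[(n - h + 1):n]         # held-back values to compare against

# Candidate 1: simple exponential smoothing; candidate 2: naive benchmark
ses_fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)
forecasts <- list(
  ses   = as.numeric(predict(ses_fit, n.ahead = h)),
  naive = rep(revenue[n - h], h)
)

# Mean absolute error of each forecast against the held-back values
mae <- sapply(forecasts, function(f) mean(abs(f - actual)))

best <- names(which.min(mae))
print(mae)
print(best)
```

Repeating this for many cut-off points and many series is where the computational effort comes from. The error measure and the candidate models are choices that a statistician would adapt to the data.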
I discussed two simple examples of analytical business rules. For a simple forecast scenario I used the HANA/R integration, since the R libraries provide all state-of-the-art algorithms for predictive analytics. Those can be called from ABAP and thus from BRFplus to enhance business rules with complex calculations. I consider this a giant step forward to intelligent decision making.