# Analytical Business Rules with HANA and R – Forecasting Time Series

This blog series is about analytical business rules. I will focus on advanced techniques using R in BRFplus rule systems. Together with PAL, R is one of the tools that extend HANA with techniques from predictive analytics.

**Operational vs. Analytical Decision Management**

Operational decision management is about digitalization using business rules. With business rules we automate processes by making automated decisions based on calculations. Usually the rule systems have a simple structure: they are algorithms specified by business process experts.

Analytical decision management has a different flavor since it uses methods from statistics and data mining to detect business rules. In many cases this is done by statisticians or data analysts, and a typical example is the calculation of risks, for example of a financial product.

Sometimes data analysts doing analytical business rules management find rules with a simple structure, and sometimes the rules contain difficult calculations that originate from analytical models. From now on I call those rules analytical rules.

**How to find Analytical Rules?**

Let’s look at a hypothetical and much simplified example. A revenue analyst wants to analyze the revenue of a certain business unit. This process should be automated and a rule system should warn a business analyst if the revenue gets critical. I will present two (simplified) answers which show you what I mean.

As I wrote above, usually a specialist or sometimes a specialized organizational unit will define the rules. Those specialists will usually look at the data first, since visualization is very important for a data analyst or data scientist. SAP offers many tools for this task: SAP Lumira is a good fit, but you can also develop your own analytical apps for exploratory data analysis using frameworks like APF. HANA experts can also use HANA studio, and people with R skills can use the Outside-In approach described here. So R offers a workbench that data scientists can use for everything that can’t be done with a dashboard. This includes statistical tests as well as special charts for visualization, which are described in this blog: https://www.r-bloggers.com/interactive-visualizations-with-r-a-minireview/

The following diagram shows a time series in R that was imported into a data frame. If you look at the picture you will find it hard to see any pattern. You may argue that this is not a realistic example, since you would expect a visible trend (perhaps linear), seasonality (regular and predictable changes which recur in a certain time interval) and a cyclic component (rises and falls without a fixed period).

We will come back to the question of the trend later. First we start with a very simple analytical rule: we want to find outliers.

**Outlier Detection – Checking of Thresholds**

Outliers are observations that you don’t expect. Usually those are very large or very small values which can occur by chance. In our context we mean something different and ask how many revenues are below a certain threshold, say less than or equal to 290,000. In the time series above you find 9 such values, which can easily be checked using the following DB Lookup expression in BRFplus.

This is a very simple analytical rule: we ask how many times the value is at or below a certain limit. I call this an analytical rule since I don’t check the properties of one business object; I check a huge number of values. With a BRFplus DB Lookup working on transparent tables or CDS views you can check a huge number of line items efficiently, especially on the HANA database.
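For readers who want to cross-check such a threshold rule outside BRFplus, the same count can be sketched in a few lines of R. The vector below is made-up sample data, not the series from the diagram, and `count_below` is a hypothetical helper name chosen for this illustration.

```r
# Hypothetical weekly revenues; NOT the data shown in the diagram above.
revenues <- c(312000, 288500, 305400, 290000, 297300, 279900, 301200)

# Count how many observations are at or below a threshold.
count_below <- function(x, threshold) {
  sum(x <= threshold, na.rm = TRUE)
}

n_critical <- count_below(revenues, 290000)
print(n_critical)
```

In a productive rule system the count would come from the database rather than from an R vector; the sketch only makes the logic of the rule explicit.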

**Finding Trends and Forecasts**

In the following I will present a simplified example of a more advanced analytical rule that uses prediction. If you look at the time series above, a natural question is whether we can detect a trend and even make predictions. A typical question is whether there will be outliers in the near future.

The example above illustrates that in most cases it is hard to detect a trend. When statisticians analyze a time series, they usually try to decompose it to detect seasonal and cyclic behavior. Without those components the time series consists of a smooth component (trend) and random fluctuations (“white noise”). A well-known technique is smoothing, which removes irregularities to provide a clearer view of the true underlying pattern in the series. This can be done very easily using R in combination with HANA, which I described in a previous blog: http://scn.sap.com/community/hana-in-memory/blog/2016/07/31/hana-and-r-inside-out-and-outside-in

In the following I show a very simplified example of a prediction. In contrast to the last blog, I am now using the Inside-Out scenario, where R is called from SQLScript procedures.
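As a side note, the decomposition step mentioned above can be tried with base R's `decompose()`. The following is only a sketch on simulated data: the trend, the yearly season of 52 weeks and the noise level are assumptions made up for the illustration and have nothing to do with the revenue series of this blog.

```r
# Simulated weekly data for illustration only: trend + yearly season + noise.
set.seed(3)
weeks <- 1:156                                   # three years of weekly values
x <- 300000 + 200 * weeks + 10000 * sin(2 * pi * weeks / 52) +
  rnorm(156, sd = 3000)

series <- ts(x, frequency = 52)                  # 52 observations per cycle
parts  <- decompose(series)                      # classical additive decomposition

# The three components add up to the original series wherever the
# moving-average trend is defined (it is NA at both ends).
recomposed <- parts$trend + parts$seasonal + parts$random
print(head(parts$seasonal, 4))
```

Note that `decompose()` needs a series with a known period and at least two full periods of data, which is one reason why it does not help much with a short series that shows no visible pattern.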

Suppose we have selected the values of the above time series using SQL. Then we can create a time series object in R:

`revenueseries <- ts(revenues)`

R provides all state-of-the-art algorithms for smoothing, for example Holt-Winters smoothing:

`revenueseriessmooth <- HoltWinters(revenueseries, beta=FALSE, gamma=FALSE)`

In this example the smoothing parameter is estimated automatically. Because `beta` and `gamma` are switched off, the call performs simple exponential smoothing with a single parameter alpha. The smoothing parameters are the most important settings in exponential smoothing methods: all of these methods put the most weight on the latest values in the time series, while older values have less influence. The smoothing parameters define exactly how much influence the older values have.

With the `plot(revenueseriessmooth)` command in R you can see the result in the following diagram.
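To make the role of the smoothing parameter more tangible, here is a hand-written sketch of simple exponential smoothing. The function name `simple_exp_smooth` and the choice to start the level at the first observation are assumptions for this illustration; it is not the internal code of `HoltWinters`.

```r
# Simple exponential smoothing written out by hand (illustrative sketch).
# alpha close to 1: recent values dominate; alpha close to 0: long memory.
simple_exp_smooth <- function(x, alpha) {
  level <- numeric(length(x))
  level[1] <- x[1]                      # assumption: start at the first value
  for (t in seq_along(x)[-1]) {
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level
}

x <- c(10, 20, 30)
smoothed <- simple_exp_smooth(x, alpha = 0.5)
print(smoothed)

# Weight of an observation that lies k steps in the past: alpha * (1 - alpha)^k
weights <- 0.5 * (1 - 0.5)^(0:3)
print(weights)
```

The weights shrink geometrically, which is exactly the "older values have less influence" behavior described above.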

Once smoothing has been performed you can make a prediction using the forecast package in R:

`library("forecast")`

`revenueseriesforecasts <- forecast(revenueseriessmooth, h=8)`

Here R calculated a prediction for eight weeks. The result can be plotted with the following command:

`plot(revenueseriesforecasts)`

In the picture you can see the forecast values together with the 80% (middle blue) and 95% (light blue) prediction intervals.

The prediction intervals can be used in predictive analytical rules. If, for example, the lower bounds of the 95% prediction interval fall below a certain value a number of times, this can lead to a clearing case in which the person in charge is alerted, for example by a business workflow.

This feature is very easy to implement in BRFplus with the R Inside-Out scenario, in which a SQLScript procedure uses R. The following procedure takes the time series as input, performs smoothing, makes a forecast and returns the lower bounds of the 95% prediction intervals. You will easily see that I reused the code snippets above.
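Such a rule could look roughly like the following sketch. It uses base R's `predict()` on the Holt-Winters fit instead of the forecast package so that it runs without extra libraries. The data are simulated, and the names `threshold` and `max_allowed` as well as their values are hypothetical.

```r
# Made-up weekly revenues for illustration only.
set.seed(42)
revenues <- 300000 + rnorm(60, mean = 0, sd = 8000)

revenueseries <- ts(revenues)
fit <- HoltWinters(revenueseries, beta = FALSE, gamma = FALSE)

# Base R alternative to the forecast package: predict() with intervals.
# The result has the columns "fit", "upr" and "lwr".
pred <- predict(fit, n.ahead = 8, prediction.interval = TRUE, level = 0.95)

threshold   <- 290000   # hypothetical revenue limit
max_allowed <- 2        # hypothetical tolerated number of violations

# Rule: alert if the lower 95% bound falls below the limit too often.
violations  <- sum(pred[, "lwr"] < threshold)
raise_alert <- violations > max_allowed
print(violations)
print(raise_alert)
```

Whether the rule should look at the lower bound, the point forecast or both is a business decision; the sketch only shows how the interval can be turned into a yes/no result.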

`DROP PROCEDURE "MY_SCHEMA"."REVENUE_FORECAST";`

`CREATE PROCEDURE "MY_SCHEMA"."REVENUE_FORECAST"`

` (IN revenue "MY_SCHEMA"."T_SERIES", OUT result "MY_SCHEMA"."T_SERIES")`

`LANGUAGE RLANG`

`READS SQL DATA AS`

`BEGIN`

`library("forecast")`

`# the input table arrives as a data frame; its column REVENUE holds the series`

`revenueseries <- ts(revenue$REVENUE)`

`revenueseriessmooth <- HoltWinters(revenueseries, beta=FALSE, gamma=FALSE)`

`revenueseriesforecasts <- forecast(revenueseriessmooth, h=8)`

`# return the lower bounds of the 95% prediction interval`

`result <- as.data.frame(matrix(revenueseriesforecasts$lower[,"95%"]))`

`colnames(result) <- c("REVENUE")`

`END;`

This procedure can be called from a SQLScript procedure in HANA:

`/********* Begin Procedure Script ************/`

`BEGIN`

`lt_threshold_low = SELECT REVENUE FROM "SAPDAT"."ZWEEKLYREVENUE"`

` WHERE PARTNER = :IV_PARTNER and CLIENT = :IV_CLIENT`

` ORDER BY FISCYEAR ASC, WEEK ASC;`

`CALL "MY_SCHEMA"."REVENUE_FORECAST"(:lt_threshold_low, :OUTPUT_TABLE);`

`END`

This SQLScript procedure can be called from an AMDP or with a Database Procedure Proxy, and both artifacts can be called from BRFplus. At the moment you can’t use R in AMDP, which makes the software logistics a little bit painful.

**Some Remarks about Forecast Automation**

Forecasting is usually very difficult. There is no single algorithm that provides reasonable results for all problem instances. Usually a data scientist has to visualize the data, perform a time series decomposition, try out different smoothing parameters and finally use statistical checks to analyze the output of the algorithms. At least this is what I learned at university in the 1990s. This is still true, but the field has made progress in recent years. If you want to learn about this I recommend the book “Forecasting with Exponential Smoothing: The State Space Approach” written by Rob Hyndman and others: http://exponentialsmoothing.net/ The R forecast package used above was written by Rob Hyndman, too. So let me describe some highlights of the book:

- Many methods of exponential smoothing are special cases of the so-called state space approach.
- There are stochastic models for the state space approach, so that you can calculate the above-mentioned prediction intervals for a forecast.
- With so-called information criteria you can assess the predictive ability of each model.
- You can use those techniques to automate forecasting: fit each of the above-mentioned models, use the information criteria for model selection, and then choose the best model for the calculation.
- In the literature you find many studies showing that combinations of different models perform very well.

Also, many researchers believe that with the above-mentioned approach machines usually perform better at making predictions.
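The idea of automated model selection with an information criterion can be sketched in base R. Please note the swap: the state space exponential smoothing models from the book need the forecast package, so this sketch uses ARIMA models from the stats package as a stand-in. ARIMA(0,1,1) is the ARIMA counterpart of simple exponential smoothing. The data are simulated and the candidate list is an arbitrary choice for the illustration.

```r
# Simulated series for illustration only: a random walk observed with noise.
set.seed(1)
y <- ts(cumsum(rnorm(120)) + rnorm(120))

# Candidate models. All use one difference, so their AIC values are comparable.
candidates <- list(
  "ARIMA(0,1,0)" = c(0, 1, 0),
  "ARIMA(0,1,1)" = c(0, 1, 1),   # forecasts like simple exponential smoothing
  "ARIMA(1,1,0)" = c(1, 1, 0)
)

# Fit every candidate and collect its AIC (smaller is better).
aic_values <- sapply(candidates, function(ord) {
  AIC(arima(y, order = ord, method = "ML"))
})

best_model <- names(which.min(aic_values))
print(round(aic_values, 1))
print(best_model)
```

The same loop-and-compare pattern is what an automated forecasting procedure does on a larger family of models.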

I recommend testing the quality of predictions. With HANA this can be done in real time: just calculate predictions of past values and compare them with the actual values. This is of course a huge computational effort, but one that In-Memory technology handles easily. The results should be reviewed by statisticians, who can use them to optimize the prediction algorithms.
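A minimal sketch of such a quality check is a rolling-origin backtest: refit the model on the first k values, predict value k + 1 and compare it with what really happened. The data are simulated again, and `backtest_one_step` and `min_train` are hypothetical names for this illustration.

```r
# Made-up data for illustration only.
set.seed(7)
revenues <- 300000 + rnorm(80, mean = 0, sd = 8000)

# Rolling-origin backtest: refit on the first k values, predict value k + 1.
backtest_one_step <- function(x, min_train = 30) {
  origins <- min_train:(length(x) - 1)
  sapply(origins, function(k) {
    fit <- HoltWinters(ts(x[1:k]), beta = FALSE, gamma = FALSE)
    as.numeric(predict(fit, n.ahead = 1)) - x[k + 1]   # forecast minus actual
  })
}

errors <- backtest_one_step(revenues)
mae <- mean(abs(errors))   # mean absolute error of the one-step forecasts
print(length(errors))
print(mae)
```

An error measure like this can be tracked over time, so that a statistician sees when the chosen model stops fitting the data.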

**Summary**

I discussed two simple examples of analytical business rules. For a simple forecast scenario I used the HANA/R integration since the R libraries provide all state-of-the-art algorithms for predictive analytics. Those can be called from ABAP and thus from BRFplus to enhance business rules with complex calculations. I consider this a giant step forward to intelligent decision making.

Really good write-up (as per usual), Tobias!

The only remark I have to make is that for such rather common forecast techniques, SAP HANA provides highly optimised built-in algorithms, e.g. Forecast Smoothing (Holt linear and Holt-Winters algorithms can be used here as well).

The huge benefit of the PAL algorithms is that the data doesn't need to be transported between SAP HANA and RSERVE and that they are optimised for multi-core processing.

You are absolutely right: PAL implements the standard best practice algorithms for time series like ARIMA and single, double and triple exponential smoothing. If I interpret this right, AIC and BIC are available, too, as well as autocovariance for seasonality tests: http://help.sap.com/hana/SAP_HANA_Predictive_Analysis_Library_PAL_en.pdf

So it should be possible to implement the same (and even better) smoothing methods in the example above using PAL.

Nevertheless I think R has some advantages, too:

I really would like to know whether a statistician with PAL expert knowledge can do the same in PAL compared to state-of-the-art forecast packages in R.

Best Regards,

Tobias

> I really would like to know whether a statistician with PAL expert knowledge can do the same in PAL compared to state-of-the-art forecast packages in R.

Hmm... that's something I cannot answer.

I suppose R generally trumps nearly everything else when it comes to available explanations about what's happening in any function, not least due to the open source concept.

A major aspect for HANA though is that the focus is less on the explorative/experimental work of a statistician/data scientist/number wizard but on making important core techniques available to the business application platform.

My comment was really aimed at the point that for many scenarios where a common solution approach is known (e.g. using a certain algorithm to solve a problem), PAL provides integrated services that can provide high performance and an integrated workflow.

Open source is another reason why SAP needs PAL: the OSS R implementation, together with many libraries, is under GPL 2.0, which is a "viral" license. So SAP and other software vendors can't make it part of the product. IMHO it would nevertheless be possible for SAP, from the licensing point of view, to ship RLANG procedures as long as the R server and libraries are part of a customer implementation. Is that true?

Hi Tobias

there are a couple of non-open source R-versions available (e.g. https://www.microsoft.com/en-us/cloud-platform/r-server), so I think there would've been a way to come up with a fully integrated approach. However, the main value of R is really in the vast community content, so that's something nobody 'can buy'.

So, I don't really agree with the notion that the licensing is the main factor for building the PAL libraries. If you look at the standard R distribution, you'll find that it is not that well suited to work on large data sets. For that, you have to use additional libraries (e.g. parallel or the cluster libs for multi-node processing). This would be a massive bottleneck for SAP HANA setups. Going and implementing a multi-threaded R - like the MS version - means that there is software that has been written externally, needs to be supported and maintained and that is still not integrated with HANA.

In contrast, the PAL library is fully built, supported, maintained and integrated with SAP HANA. It can leverage all the neat infrastructure for data processing in SAP HANA, runs multi-core and multi-host, can be considered by the SAP HANA optimizers, etc.

Most importantly: it does indeed cover the majority of functions required for the scenarios that most SAP HANA users (admittedly most of them use SAP business software) need. And looking at the number of supported functions over the last 12 SPS you'll find that the PA library is continuously being extended.

For this 'buy-or-build' decision, I believe there had been good reasons to go 'build'.

For your second point: yes, SAP as well as anybody else can create R code and also libraries and distribute them with a license of their choice. And we do exactly that already. One example here would be the SAP Predictive Maintenance and Service on-premise product which implements R models.

Hi Lars,

Your answer was extremely important. I guess IT decision makers will ask only two questions: "GPL? Are you kidding?" and "Are there any reference implementations?" You answered those questions, and this is extremely important information for the community.

Best Regards,

Tobias