
Last year, SAP announced the launch of a new solution in its predictive analytics portfolio: SAP Predictive Analysis 1.0. It replaced the classical offering of SAP BO Predictive Workbench (a BO front end to IBM's SPSS statistical/predictive engine), but it was also part of a bigger strategy of expanding SAP's offering in the advanced analytics space, especially around Big Data analytics. With PA, plus HANA and its native predictive library (the Predictive Analysis Library, or PAL), which enables the execution of predictive algorithms in-database (i.e. procedures run in the DB layer and return just the result set, instead of the whole dataset being exported so the algorithms can run in the application layer), SAP became a serious contender in the Big Data predictive analytics space. So much so that Forrester recognized the strength of SAP's strategy and positioned it alongside SAS and IBM in the "leaders' wave" of its most recent Big Data Predictive Analytics report.

Figure 1 – Forrester Wave: Big Data Predictive Analytics Solutions Q1’13

Of course SAP still has a very small market share in the predictive analytics space, especially compared to the likes of SAS and IBM, but the strength of SAP’s offering, centered on HANA’s real-time, in-memory and big data capabilities, was enough to position SAP well in the analysts’ eyes.

However, in practical terms, SAP’s actual portfolio was very limited. Yes, HANA brings a lot of modern and groundbreaking technologies to the game that weren’t available before, but in terms of actual functionality (i.e. the analytical models that can be implemented), it was still far behind its main competitors. Solutions like SAS (Stat and Enterprise Miner) and SPSS (Statistics and Modeler) have been in the market for over 45 years, making them even older than SAP’s R/2. That means lots and lots of experience and, most importantly, content, which translates into a vast number of algorithms and industry-specific pre-built analytic models. In the meantime, SAP’s initial offering consisted of a couple of dozen algorithms in PAL, which still had to be consumed through procedures written in HANA (i.e. not end-user friendly, unlike SAS or SPSS), with just a handful of these algorithms supported in PA (which was intended to be the end-user/analyst-friendly tool). As a result, perhaps 90% (that is of course a guess) of the analytical models companies usually use (or want to use) couldn’t be implemented with HANA and PA alone.

And of course SAP knew that. As part of that strategy, SAP has announced since day one that R integration would be a huge part of its predictive analytics roadmap. R is an open-source statistical/predictive analytical environment used extensively throughout the world, said to include more than 3,500 distinct algorithms across its standard set of libraries. It first gained traction in the academic world (much like SPSS and SAS did 20 or 30 years ago) and has recently seen considerable and steady growth in adoption by the corporate world. The Rexer Analytics 2011 Data Miner Survey presents R as the most used statistical solution in the world: 47% of the companies that participated in the survey claimed to use R, and that number is likely to be even higher when Rexer announces its 2013 survey results next September.

Figure 2 – Rexer Analytics 2011 Data Miner Survey results

(notice SAP didn’t appear as a player in 2011 – that will probably change in 2013… 😉 )

But even with R, SAP’s offering was still weaker. HANA’s R integration, while powerful, still demanded a very specific set of development skills in order to deliver actual analytical models to business users (as demonstrated in this excellent blog by Blag). Big corporations with their own ranks of statistical analysts and R developers wouldn’t have a problem with that, but that isn’t the case for the vast majority of companies seeking statistical insights. Most of them rely on a few people with deep functional and business expertise but usually little or no technical knowledge.

In this context, PA’s R integration would then play a major role in enabling these users to consume HANA and/or R algorithms through a user-friendly, graphical tool (in the vein of SAS and SPSS Modeler). R integration has been there since the first release of PA; the latest releases even install R for you, without requiring any previous knowledge from the user. However, the number of R functions (algorithms) supported in PA is still very limited, again reducing the applicability of PA in real-life scenarios… 🙁

That was until before the latest release. 😀

With yesterday’s launch of PA 1.0 SP11 in GA, SAP brings for the first time the possibility of consuming custom R functionality (i.e. algorithms that aren’t built into standard PA) without having to resort to developing HANA SQLScript/R procedures. You can reuse your existing R scripts (just adapting them to a function model, as demonstrated below) and graphically create analytical models with the most complex algorithms you can imagine. Even better, you can run these models either with a local instance of R (the so-called “standalone R” scenario), usually suited for smaller datasets and prototyping, or with a served instance of R connected to a HANA appliance (the so-called “HANA R” scenario, described in Blag’s blog mentioned above), which enables a very performant execution model for big data use cases of these custom models. For the first time, the full potential of the 3,500+ R algorithms is really there to be used by PA users.

And it is indeed very simple. Let’s take, for example, the R script from Blag’s blog.

All you have to do is adapt it a little bit to encapsulate the input and output parameters in an R function syntax.

I’ve done that with Blag’s code (and I’ve also modified it slightly to produce a more meaningful output).

Here is what the modified code looks like:

predict_tickets <- function(tickets_year) {
     # Input columns arrive as character/factor; convert to integers
     period <- as.integer(tickets_year$PERIOD)
     tickets <- as.integer(tickets_year$TICKETS)
     # Build the periods for the following year (e.g. 201201 -> 201301)
     var_year <- as.integer(substr(period[1], 1, 4)) + 1
     new_period <- gsub("^\\d{4}", var_year, period)
     # The newdata column must be named after the lm predictor ("period")
     next_year <- data.frame(period = as.integer(new_period))
     # Fit a simple linear regression and forecast next year's tickets
     prt.lm <- lm(tickets ~ period)
     pred <- round(predict(prt.lm, next_year, interval = "none"))
     result <- data.frame(PERIOD = as.character(new_period), TICKETS = 0, PRED_TICKETS = pred)
     tickets_year$PRED_TICKETS <- 0
     output <- rbind(tickets_year, result)
     # Plot actuals as bars and predictions as a red line
     mp <- barplot(output$TICKETS)
     axis(1, at = mp, labels = c(output$PERIOD))
     lines(c(0, 0, output$PRED_TICKETS), col = "red")
     return(list(out = output))
}
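Before registering the function in PA, it is easy to smoke-test it from a plain R session. The snippet below inlines a plot-free copy of the function so it runs on its own; the monthly ticket counts are made up purely for illustration:

```r
# Plot-free copy of the component function, for a standalone smoke test
predict_tickets <- function(tickets_year) {
     period <- as.integer(tickets_year$PERIOD)
     tickets <- as.integer(tickets_year$TICKETS)
     var_year <- as.integer(substr(period[1], 1, 4)) + 1
     new_period <- gsub("^\\d{4}", var_year, period)
     # newdata column named after the lm predictor, so predict() can use it
     next_year <- data.frame(period = as.integer(new_period))
     prt.lm <- lm(tickets ~ period)
     pred <- round(predict(prt.lm, next_year, interval = "none"))
     result <- data.frame(PERIOD = as.character(new_period),
                          TICKETS = 0, PRED_TICKETS = pred)
     tickets_year$PRED_TICKETS <- 0
     output <- rbind(tickets_year, result)
     return(list(out = output))
}

# Hypothetical monthly ticket counts for 2012 (illustration only)
tickets_year <- data.frame(PERIOD  = sprintf("2012%02d", 1:12),
                           TICKETS = c(120, 132, 141, 150, 158, 170,
                                       181, 190, 204, 211, 220, 235),
                           stringsAsFactors = FALSE)
res <- predict_tickets(tickets_year)

# 12 actual rows plus 12 forecast rows for 2013:
nrow(res$out)    # 24
tail(res$out, 2) # periods 201311/201312 with PRED_TICKETS filled in
```

The returned list is what PA consumes: the data frame under `out` flows into the next step of the model.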

Here are the steps to create a custom R component in PA 1.0 SP11:

  1. Acquire the data set from any source (in my case, I tested with two documents: a HANA Online document on top of a HANA table exactly as in Blag’s blog, and a CSV document with the table content exported to a .csv file).
  2. In the “Designer” tab of the “Predict” view, click on “Add New Component” -> “R Component”.
  3. Follow the R component creation wizard. First, in the “General” screen, give it a name and description. Click on Next.
  4. In the “Script” screen, load or paste your R script (remember to adapt it to the function syntax; hovering the mouse over the ℹ symbol shows a sample code), fill in the required parameters (most of them are selectable in the available dropdown boxes) and select the desired options (I’ll comment a little on these options below). Click on Next.
  5. In the “Settings” screen, you define the output columns that will be available to the next step of your analytical model. You can reuse the input columns and just add the new ones, or redefine all columns from scratch. If your R function has additional input parameters beyond the input data frame (which comes from the previous step in the model), you’ll be able to define them here (they’ll be editable parameters when you instantiate the component in the model). Click on Finish.
  6. That’s it. Your component will appear in the list of available algorithms and you can now instantiate it in your model (just double-click it or drag it to the model).
  7. Once you run your model, you’ll be able to see the output in the Grid tab of the Results view, as you would for a regular PA model.
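In other words, the contract the wizard expects on the R side is simple: a function whose first argument is the input data frame handed over by the previous model step, whose remaining arguments surface as the editable parameters from step 5, and whose return value is a named list carrying the output data frame. A minimal skeleton (all names here are my own, purely illustrative):

```r
# Minimal skeleton of a PA custom R component (illustrative names only)
my_component <- function(input_df, factor = 2) {
     # input_df: data frame from the previous step in the PA model
     # factor:   surfaced as an editable parameter on the component
     # Derived columns must match those declared on the "Settings" screen
     input_df$SCALED <- input_df$VALUE * factor
     # Return the output data frame wrapped in a named list
     list(out = input_df)
}

# Quick local check with a toy data frame:
toy <- data.frame(VALUE = c(1, 2, 3))
my_component(toy, factor = 10)$out$SCALED   # 10 20 30
```

Anything more elaborate (extra outputs, a fitted model object, plots) follows the same pattern, as the predict_tickets example above shows.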

Some comments on the options shown in the “Script” screen of the custom R component wizard above. The “Show Visualization” option makes any graph plotted in the R script appear in the “Charts” tab of the “Results” view. This is very handy when you want to plot graph types not yet supported by PA/Lumira. In my tests, however, the “Show Visualization” option would only appear for custom R components created on top of offline documents, i.e. using the local standalone R engine; it didn’t appear when creating custom R components on top of a HANA Online document, so apparently PA can’t transfer plots from the HANA R server yet. The sample R script I posted above includes some plotting commands (barplot, lines, axis), which I used to draw simple bar and line charts in order to test R graphs in PA. This worked nicely for me, as shown below.

Figure 9 – R graph shown in PA

But if you need a graph type that is natively supported by PA/Lumira’s visualization capabilities (and bar and line charts are), you’re of course much better served using those, since they look much nicer. 🙂

Figure 10 – PA/Lumira native visualization

Finally, even though the custom R component is not the only new functionality in PA 1.0 SP11, it’s definitely one of the most exciting, if not the most, due to the applicability it brings to the SAP HANA/PA/R solution architecture. This blog gives a broader view of the other new functionality in PA 1.0 SP11. Another feature I’m very excited about is the possibility to export HANA PAL-based models as HANA SQLScript procedures. This enables business analysts to not just prototype but actually deliver a complete analytical model to the IT team in an easily consumable manner (i.e. database procedures that can be consumed by other applications or job scheduling tools). Unfortunately, this feature apparently isn’t available for models that include custom R components. I hope to see that capability very soon, because then SAP will have closed the gap to a complete and comprehensive predictive model delivery life cycle without requiring full-blown developers: develop and test models in PA -> promote the chosen models to HANA procedures -> consume them in corporate-wide analytical scenarios (for example, through a BO 4.x front end on top of HANA, or even in BW with Virtual InfoProviders based on HANA views built on top of these procedures).

All in all, for me as a consultant, this is the first version of SAP Predictive Analysis that can really be considered for a productive deployment of a statistical/predictive project. Furthermore, once you consider that PA was launched less than 7 months ago, you notice how fast SAP has been moving with these game-changing solutions, how much PA has evolved in those 7 months, and how powerful it will be within a one- or two-year timeframe. It is definitely a player in the predictive analytics market and, together with the performance and robustness of HANA and the flexibility and completeness of R, a serious contender to SAS and SPSS.

EDIT:

Vishwa, product manager for SAP Predictive Analysis, has released a helpful FAQ on custom R components in PA, as well as a detailed step-by-step guide on how to create your own custom R component. Very good, Vishwa!


19 Comments


  1. Scott Leaver

    Thoughtful perspective on Predictive Analytics as a whole, and nice overview on the new SP11 features. The ability to add your own custom algorithm adds a lot of great flexibility which will extend choices for customers.

    There’s also the ability to persist the models as SQL procedure back into HANA, which can then be called/kicked off as/when needed by a variety of different applications, custom or off-the-shelf. 

    1. Henrique Pinto Post author

      Hi Scott,

      thank you for the comment!

      That’s what I mentioned in the second to last paragraph, I think. It’s just that I called them SQLScript procedures.

      Best regards,

      Henrique.

  2. Vishwanath Belur

    Great blog Henrique.

    On your comment about PA not showing the option to visualize HANA R script output: PA reads the visualizations from R using the R console (remember, if you have multiple visualizations in your script, only the last one will be seen). In the case of HANA R, PA’s interface is HANA and not R; hence, it is not possible to get the visualizations from the HANA R server to show on the PA console.

    Thanks,

    Vishwa

    1. Henrique Pinto Post author

      Hi Vishwa,

      thank you very much for the explanation. It’s an honor to have you and Scott (product and solution management) commenting here.

      I understand why PA can’t read visualizations from HANA R, but it would still be a very nice feature if, in the future, Rserve allowed visualizations to be passed back to HANA and HANA could somehow transmit them back to PA (though of course PA would then need a proprietary interface to HANA instead of working through regular JDBC/ODBC SQL clients).

      Cheers,

      Henrique.

  3. Elzbieta Derecka

    Hi,

    I would like to add a new custom algorithm to SAP PA 10.0.11 – an R package (for example Benchmarking). In R, as a first step I install the package. What should I do in PA to use this package? The same steps as in R?

    BR,

    Ela

    1. Henrique Pinto Post author

      Hi Elzbieta,

      have you read the whole blog?

      If you have a working R script, all you have to do is encapsulate it in a function syntax, define input and output parameters and return your output parameter in a list, like I have done above.

      Of course, I didn’t use the best syntax, didn’t have a variable receiving the model etc.

      If you want more details, please go through the following documents:

      Creating and Using Custom R components with SAP Predictive Analysis

      Hands-On Tutorial for creating Custom R Components in SAP Predictive Analysis

      Best,

      Henrique.

  4. waldemar roberti

    It is a really good blog, Henrique, and it is very good to know that SAP is moving into this predictive analytics world in a serious way, with support for R.

    Best regards!

    Waldemar

    1. Henrique Pinto Post author

      Thanks, Waldemar!

      Yes SAP is investing a lot in Advanced Analytics, not just with PA but with HANA itself. It’s definitely one area of interest where SAP wants to differentiate itself, even planning to integrate predictive processes in the Business Suite applications.

  5. Esteban Burbano de Lara

    Commenting on Forrester’s assumption that SAP has fewer than 100 productive users of PA, I’m pretty sure that’s way underestimated. A leading Ecuadorian public agency placed an order for this product 5 days prior to its General Availability (Nov 27, 2012), and a month later a large apparel retailer placed another one. I’m sure that if Ecuador now (June 2013) has two live productive PA environments, the entire world must easily outnumber 100.

    1. Henrique Pinto Post author

      Hi Esteban,

      That was their view when the report was released (early Q1’13), so yes, it’s probably much more than that by now.

      Cheers,

      Henrique.

    1. Vishwanath Belur

      Hi Lino,

      I use the following script for ARIMA.

      ——————–

      ## Parameters: x is a data frame, column is the column on which the
      ## forecast needs to be done, periodicity is the number of time periods
      ## per year in the dataset, and fcast_per is the number of periods to forecast

      arimaFunction <- function(x, column, periodicity, fcast_per) {
        fit <- arima(x[,column], order = c(0,1,1),
                     seasonal = list(order = c(0,1,1), period = periodicity))
        predicted <- predict(fit, n.ahead = fcast_per)
        ## Plot the actual values
        plot(x[,column], type = "o", col = "blue")
        output <- cbind(x, predicted)
        return(list(out = output, modelarima = fit))
      }

      ——————–

      Hope you will find it useful. We are looking at publishing some of these scripts on some common place.

      -Vishwa

