Predictive on SAP HANA: Alphabet Soup or Peanut Butter & Jelly?
Let’s face it – “predictive analytics” can be a bit of a complicated topic once you get into it. The “why” and the “what” are pretty easy to get your head around, but the “how” is where the rubber hits the road. Before we even get to the complexity of which algorithms to use and how to configure them, there’s a higher-level consideration to deal with first – which technologies should you use, and how do they fit together?
In my last article: Predictive Smackdown: Automated Algorithms vs The Data Scientist, I discussed where our Automated Analytics and Expert Analytics fit into the bigger picture, so in this entry, let’s turn our attention to SAP HANA and the predictive options available there.
Predictive Alphabet Soup?
Sometimes the options on SAP HANA look like “Predictive Alphabet Soup” because with R, PAL, and APL, it is not just the letters in the acronyms that are important but also what order they are arranged in. Unfortunately, some customers who are new to the topic see these three technologies as disjoint and confusing – when do you use R and when do you use PAL? What are the differences between PAL and APL?
I have even heard from a few customers that they would like to wait until these technologies “merge” into one (hint: it’s like saying you want to wait until an apple and an orange become one fruit). A better way to look at this is to understand the pros and cons of each and how they work together. Let’s take a look at each one of them individually and then what that means overall.
R – The De Facto Predictive Language
It’s nearly impossible to read anything about predictive analysis and not hear about the open source language “R”. What is R? It is a language used by statisticians and data scientists to analyze data sets with complex mathematical algorithms. There are well over 5,800 “packages” (and growing) that implement statistical techniques, data manipulation, graphing, reporting, and more.
R is extremely popular because it is freely available, easy to extend, and there are lots of resources (and people) to learn from. But R is a statistical language made for the mathematically inclined and therefore isn’t something you just pick up a book on and learn in a few hours unless you have some background already.
SAP Predictive Analytics 2.x uses R and provides a graphical modelling environment on top to make the creation and comparison of predictive models much easier than by invoking R on the command line. You can even add your own custom R components so there’s virtually no limit to the types of modelling you can do.
SAP HANA and R
If you have SAP HANA, you can deploy an R server as a sidecar to run predictive algorithms on your data. This opens up whole new possibilities because the full breadth of R’s capabilities can be unleashed on your data in HANA. However, because the R server is an external system, this type of deployment requires extracting data from HANA to feed the R server, which crunches the numbers and returns the results to HANA. In addition to the obvious I/O bottlenecks involved in bringing data to an external system, you lose the parallel processing that SAP HANA is legendary for.
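To get a feel for the I/O cost of the sidecar round trip, here is a back-of-the-envelope sketch in Python. It is only an illustration: the function names, the row counts, the row sizes, and the bandwidth figure are all made-up assumptions, not measurements from any HANA or R system, and the estimate ignores serialization overhead and the modelling time itself.

```python
def transfer_seconds(rows: int, bytes_per_row: int, bandwidth_mb_per_s: float) -> float:
    """Rough time to move `rows` rows across a link of the given bandwidth.

    Hypothetical helper: ignores latency, compression, and protocol overhead.
    """
    total_bytes = rows * bytes_per_row
    return total_bytes / (bandwidth_mb_per_s * 1_000_000)


def sidecar_round_trip_seconds(
    rows: int,
    input_bytes_per_row: int,
    result_bytes_per_row: int,
    bandwidth_mb_per_s: float,
) -> float:
    """Time spent only on moving data out to an external server and back."""
    outbound = transfer_seconds(rows, input_bytes_per_row, bandwidth_mb_per_s)
    inbound = transfer_seconds(rows, result_bytes_per_row, bandwidth_mb_per_s)
    return outbound + inbound


if __name__ == "__main__":
    # Assumed scenario: 100 million rows, 200 bytes each going out,
    # 16 bytes of score per row coming back, over a 100 MB/s link.
    seconds = sidecar_round_trip_seconds(100_000_000, 200, 16, 100.0)
    print(f"Data movement alone: about {seconds:.0f} seconds")
```

Under those assumed numbers the data movement alone takes a few minutes before any algorithm has run, which is the overhead an in-database library avoids.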
The SAP Predictive Analytics client tool can use R locally, but can also be used for scenarios where you want to leverage an external R server that is connected to SAP HANA.
SAP Predictive Analysis Library (PAL)
The SAP PAL is a native C++ implementation on HANA of the most commonly used predictive algorithms in data science. The goal of this library is to enable up to 80% of the common predictive scenarios that you would normally use an external R server for. Note the goal is 80% of the use cases, not 80% of the algorithms – you can imagine that with over 5,000 R algorithms in the world, only a tiny fraction of them are used very frequently.
By using SAP PAL you can leverage all the in-memory goodness and near-linear parallelism performance that SAP HANA offers to perform training, scoring, categorization, and more without your data leaving the server. So what’s the problem?
Well, if you need an algorithm that is not in the SAP PAL, you may still need to deploy an external R server. Additionally, many data scientists develop their own R algorithms – something within their skill set whereas developing those same algorithms in C++ to be deployed natively on HANA may not be.
How do you use the SAP PAL? You can call it directly in SQLScript, but fortunately SAP Predictive Analytics not only supports R, it also supports SAP PAL – and even a combination of the two. Of course, the combination only makes sense when you are using an external R server connected to SAP HANA, as PAL itself is native to HANA.
But there’s also another thing to consider – what if you aren’t a data scientist?
SAP Automated Predictive Library (APL)
The APL is a native C++ implementation on HANA of SAP’s patented automated machine learning technologies that make Automated Analytics so cool. Instead of rehashing the benefits of Automated Analytics here, please take a look at my previous blog entry that details it more fully: Predictive Smackdown: Automated Algorithms vs The Data Scientist
You could perform automated analytics with HANA before the creation of the APL, but you would have needed to deploy a sidecar predictive server to run the automated machine learning algorithms. The APL was introduced at the beginning of 2015 to bring all of that “automagic” goodness to HANA, and just like the PAL, the APL does not need to extract data from the HANA system to do its predictive magic.
You can find out more about the APL here: What is the SAP Automated Predictive Library (APL) for SAP HANA?
Predictive Peanut Butter and Jelly (or Chocolate & Peanut Butter)
A more interesting analogy to the Gestalt Principle (the whole is greater than the sum of its parts) is the peanut butter and jelly sandwich (or chocolate and peanut butter cups if you prefer). Peanut butter is rich and creamy, but adding the sweetness and tartness of jelly somehow creates a magical combination that is better than either topping by itself. “PB&J” is one of the best inventions in the world.
The predictive options are pretty much like peanut butter and jelly – you can use R by itself, you could use SAP PAL by itself, or you could go the automated route with SAP Automated Predictive Library. Each has its own purpose, but being able to use one or more of these together based on your needs is where things get very interesting:
- Need the flexibility of custom R algorithms but use SAP HANA?
  - No problem, deploy R because HANA can be connected to it.
- Want the speed of HANA but still need R’s flexibility?
  - Deploy both R and PAL together and do as much in PAL as you can.
- Want some intelligent auto-clustering algorithms but still have some hardcore data science requirements?
  - Simple! – deploy PAL and APL.
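If you like to see that kind of mix-and-match logic written down, the questions above can be sketched as a tiny Python helper. This is purely illustrative: the `Needs` class, its field names, and the `recommend` function are hypothetical names invented for this example, not part of any SAP product or API, and real deployment decisions also depend on licensing and landscape.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Needs:
    """Hypothetical checklist of what a team is asking for."""
    custom_r_algorithms: bool = False   # flexibility of hand-written R
    in_database_modelling: bool = False # expert modelling at HANA speed
    automated_modelling: bool = False   # automatic model creation


def recommend(needs: Needs) -> List[str]:
    """Map the checklist onto the three technologies discussed here."""
    stack: List[str] = []
    if needs.custom_r_algorithms:
        stack.append("R")    # external R server connected to HANA
    if needs.in_database_modelling:
        stack.append("PAL")  # native library, data stays in HANA
    if needs.automated_modelling:
        stack.append("APL")  # native automated library
    return stack


if __name__ == "__main__":
    print(recommend(Needs(custom_r_algorithms=True)))
    print(recommend(Needs(custom_r_algorithms=True, in_database_modelling=True)))
    print(recommend(Needs(in_database_modelling=True, automated_modelling=True)))
```

The point of the sketch is that the options are additive rather than exclusive: each need switches on one technology, and the result is whatever combination you end up with.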
Know Your Predictive OPTIONS
A prerequisite to “knowing what you are doing” is understanding what is available and what you need. Personas and use cases are usually good hints:
SAP PAL and R:
- Data Scientists and Mathematicians creating models themselves (typically by hand).
SAP APL:
- Business and Data Analysts as well as Data Scientists wanting automatic model creation.
One reason some of our customers get confused is that they think “all predictive is the same” and assume that if their HANA system has predictive capabilities, they also have the Automated Predictive Library (APL). However, the APL is part of the “Predictive Option for SAP HANA” license, so make sure you know whether you are already licensed for the APL or need to get it. A future article will go into this option in more detail.
You must resist the urge to try to rearrange the letters and assume that you can replace the APL with PAL or vice versa. Hopefully this article shows you how each predictive technology on SAP HANA has its place and they are not interchangeable.
Of course, SAP Predictive Analytics 2.x operates with all of these – R, PAL, and APL.
Most customers realize that the ROI of their existing investment in SAP HANA can be greatly enhanced by enabling users of all types to benefit from automated predictive analytics, and they adopt the Predictive Option for SAP HANA. Whether you like your peanut butter with jelly or chocolate, you have to admit, it tastes pretty damn good. 🙂
Thanks for your very interesting article. Could you explain what the limitations of R are compared to PAL and APL?
Great Blogs Ashish,
For my clients, I am designing and deploying the 'SAP PAL and R' option.
- The starting point is R/R Studio on Linux, connected to the HANA server.
- Once that is stabilized and the ROI is confirmed from both the modeller and business user perspectives, I will propose managing everything in HANA, i.e. 'Deploy both R and PAL together'.
Is any guideline or installation scenario available for the 'Deploy both R and PAL together' option?