Working with R integration in HANA 2.0 SPS02
R Integration for SAP HANA has been around for a number of years. In fact, we published a set of video tutorials to the HANA Academy R Integration for SAP HANA playlist back in 2013!
The capability simply “worked” so there wasn’t really any need to re-visit the subject – well not until now!
With HANA 2.0 SPS 02 the way in which R servers are configured has evolved – making it a good opportunity to revise the HANA Academy content on this topic.
With SPS 02, R servers can be configured as remote sources, which means an administrator can authorize a given user to access one or more specific R servers. Previously, all registered R servers were available to all R users.
-- authorize user for R script
GRANT CREATE R SCRIPT TO RUSER;

-- create a remote source (need to customize server and port values)
CREATE REMOTE SOURCE "Rserve" ADAPTER "rserve" CONFIGURATION 'server=0.0.0.0;port=30020';

-- assign remote source to user
ALTER USER RUSER SET PARAMETER RSERVE REMOTE SOURCES = 'Rserve';
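To double-check the setup, something like the following could be run afterwards. Treat it as a sketch only: it assumes the standard system views SYS.REMOTE_SOURCES and SYS.USER_PARAMETERS are readable by the user running it, and it reuses the RUSER and Rserve names from the example above.

```sql
-- Sketch: confirm the remote source exists and which adapter it uses
SELECT REMOTE_SOURCE_NAME, ADAPTER_NAME
  FROM SYS.REMOTE_SOURCES
 WHERE REMOTE_SOURCE_NAME = 'Rserve';

-- Sketch: list the parameters set for the user, which should include
-- the remote source assignment made with ALTER USER
SELECT USER_NAME, PARAMETER, VALUE
  FROM SYS.USER_PARAMETERS
 WHERE USER_NAME = 'RUSER';
```

If the first query returns no rows, the remote source was not created; if the second shows no R-related parameter, the user has not been assigned a remote source yet.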
If you’ve worked with the SAP HANA External Machine Learning library (aka EML) then you may already be familiar with the use of remote sources – as the EML takes a similar approach when registering TensorFlow servers.
You’ll find a hands-on video tutorial that covers getting started and creating remote sources here.
Nothing has really changed in how to use R integration. With SQL Script, you create and run a stored procedure with embedded R script that’s transferred to and executed on the remote R server on your behalf. Any input/output tables are automagically packaged up and sent to/received from the R server as well. There’s a tutorial showing a simple k-means clustering example here.
-- create stored procedure with R script
CREATE PROCEDURE "R_CLUSTER" (IN data "T_DATA", IN params "T_PARAMS", OUT results "T_RESULTS")
LANGUAGE RLANG AS
BEGIN
  library(cluster)
  clusters <- kmeans(data[c('LIFESPEND','NEWSPEND','INCOME','LOYALTY')], params[params$NAME=='CLUSTERS',]$VALUE)
  results <- cbind(data[c('ID')], CLUSTER=clusters$cluster)
END;
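The procedure relies on T_DATA, T_PARAMS and T_RESULTS, which are not defined in this post. As a rough sketch, they could be declared as table types before the procedure is created, and the procedure could then be called from the SQL console. The column data types, the CUSTOMERS and PARAMS table names and the cluster count of 3 are all assumptions made for illustration; only the column names come from the R script above.

```sql
-- Assumed table types; column names are taken from the R script,
-- data types are guesses
CREATE TYPE "T_DATA" AS TABLE (
  "ID" INTEGER, "LIFESPEND" DOUBLE, "NEWSPEND" DOUBLE,
  "INCOME" DOUBLE, "LOYALTY" DOUBLE);
CREATE TYPE "T_PARAMS" AS TABLE ("NAME" VARCHAR(60), "VALUE" INTEGER);
CREATE TYPE "T_RESULTS" AS TABLE ("ID" INTEGER, "CLUSTER" INTEGER);

-- Hypothetical parameter table: ask k-means for 3 clusters
CREATE COLUMN TABLE "PARAMS" ("NAME" VARCHAR(60), "VALUE" INTEGER);
INSERT INTO "PARAMS" VALUES ('CLUSTERS', 3);

-- Hypothetical call; "CUSTOMERS" is assumed to be an existing table
-- with the same columns as "T_DATA". The ? returns the result set.
CALL "R_CLUSTER" ("CUSTOMERS", "PARAMS", ?);
```

The result set should contain one row per input ID together with the cluster number that kmeans assigned to it.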
One thing we’ve never shown before was how to utilize R integration in the context of a web-based application based on microservices. So now that we have XS Advanced and Web IDE for HANA, we’ve created a tutorial for that using the same k-means clustering example.
An important aspect to remember with XSA is that it’s necessary to (indirectly) authorize the HDI container object owner to create and run R script. This is easily done by authorizing CREATE R SCRIPT on _SYS_DI_OO_DEFAULTS as follows:
-- XSA : authorize HDI container owner for R script and also authorize technical users
GRANT CREATE R SCRIPT TO "_SYS_DI_OO_DEFAULTS" WITH ADMIN OPTION;
Don’t forget to specify WITH ADMIN OPTION, otherwise the authorization won’t be propagated to the object owner technical user of your generated HDI container.
Also, you need to ensure the HDI container object owner technical user is authorized to access the remote source – something like this:
-- XSA : assign remote source to HDI container owner
ALTER USER RAPP_HDI_CONTAINER_1#OO SET PARAMETER RSERVE REMOTE SOURCES = 'Rserve';
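Because the object owner user name is generated, it can help to look it up first and then confirm the grant. The queries below are a sketch under two assumptions: that object owner users follow the #OO suffix pattern seen in the example above, and that the system views SYS.USERS and SYS.GRANTED_PRIVILEGES are readable by the user running them.

```sql
-- Sketch: list candidate HDI container object owner users
-- (assumes their names end in #OO, as in the example above)
SELECT USER_NAME
  FROM SYS.USERS
 WHERE USER_NAME LIKE '%#OO';

-- Sketch: check the grant on _SYS_DI_OO_DEFAULTS; IS_GRANTABLE should
-- indicate whether WITH ADMIN OPTION was specified
SELECT GRANTEE, PRIVILEGE, IS_GRANTABLE
  FROM SYS.GRANTED_PRIVILEGES
 WHERE GRANTEE = '_SYS_DI_OO_DEFAULTS'
   AND PRIVILEGE = 'CREATE R SCRIPT';
```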
There’s already a tutorial showing how to set up R and Rserve on SUSE Linux – based on the section in the R Integration reference guide. However, it’s a tad complex as it involves compiling R – great for productive scenarios.
To make it easier to get R and Rserve running for test and development purposes (to work alongside SAP HANA, express edition for example), we’ve published a new video tutorial showing how to set up R and Rserve on CentOS in just a few minutes. In our example we use Google Cloud Platform as there’s currently a free trial – but you could do the same in any landscape.
There are still one or two areas we haven’t had time to cover yet – such as how to set up authentication between HANA and R, which is crucial for productive scenarios. Hopefully we’ll get around to this soon. Let me know in the comments below if this is something you’d like to see. UPDATE 10Oct17: a video tutorial covering authentication is now available.
Should you be attending SAP TechEd please drop by our SAP HANA Academy event on the Monday. A full day of lecture/demo and hands-on sessions where we’ll be covering predictive and other hot HANA topics – and unlike YouTube you can interrupt us to ask questions! Attendance is free and we’ll help you get started with SAP HANA, express edition on the Google Cloud Platform using the free trial – so you can keep your work afterwards.
If you’re interested in learning about what’s new with HANA 2.0 SPS 02 in general, check out the following blog.
Have fun with R and HANA!
The SAP HANA Academy provides free online video tutorials for the developers, consultants, partners and customers of SAP HANA.
Topics range from practical how-to instructions on administration, data loading and modeling, and integration with other SAP solutions, to more conceptual projects to help build out new solutions using mobile applications or predictive analysis.
For the full library, see SAP HANA Academy Library – by the SAP HANA Academy
For the full list of blogs, see Blog Posts – by the SAP HANA Academy