We have a Chinese version of this blog.
1 Introduction to the R language
R is a GNU project based on the S language, and it can be regarded as an implementation of S. It was first developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is mainly used for statistical computing, graphics, and data mining.
Since SAP HANA SP5, SAP HANA has been considerably enhanced by integrating its in-memory computing with the statistical functions of R. This lets you use R as a procedure language and call R functions. Data exchange between SAP HANA and R is very efficient because both use column-oriented storage. The communication process between SAP HANA and R is shown below:
To execute R code in SAP HANA, the R code is written as a procedure with the language RLANG and is executed by an external Rserve instance. To support the R operator, the calculation engine inside SAP HANA has been extended: for the given input objects, the operator performs some computation and outputs a result table. Unlike native operators, the R operator is processed by an R function. When the calculation engine recognizes an R operator, the R client sends a request to Rserve together with the parameters the program needs. Rserve then executes the code, and the resulting data frame is sent back to the calculation engine.
Currently, there are some restrictions:
1. In RLANG, input parameters can only be table types, so to pass a scalar value you need to wrap it in a table.
2. Variable names in an RLANG procedure cannot contain uppercase letters.
3. An RLANG procedure must return at least one result, in the form of a data frame.
2 Installing R
Installation on Windows is very simple: download the corresponding installation package, double-click the setup program, and click through the steps. The following focuses on installation on Linux. Before installing, make sure the following software packages are present:
xorg-x11-devel: for the support of X window
gcc-fortran: build environment
readline-devel: when using R as a standalone program
libgfortran46: SLES 11 SP2
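Then download the R source package (R-2.15.0 has been tested), decompress it, and build it from the source directory. A typical sequence is "./configure --enable-R-shlib", followed by "make" and "make install"; the --enable-R-shlib option builds R as a shared library, which Rserve needs.
If the installation succeeds, running the R command in a shell starts R's interactive interpreter, as shown in the following figure.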
3 Integrating R with SAP HANA
(1) Install Rserve
Rserve is a TCP/IP-based server for R. After starting R, execute install.packages("Rserve"); the program prompts you to select a mirror and then downloads and installs the package. Alternatively, you can download Rserve.tar.gz and execute install.packages("/path/to/your/Rserve.tar.gz", repos=NULL), which installs Rserve from the local file.
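After the installation, edit "/etc/Rserve.conf" and add the required settings. Typical entries are "maxinbuf 10000000", "maxsendbuf 0", and "remote enable", each on its own line; the last one allows connections from other hosts, such as the SAP HANA server.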
Then launch Rserve:
/usr/local/lib64/R/bin/Rserve --RS-port 30120 --no-save --RS-encoding utf8
(2) Configure SAP HANA
Start SAP HANA Studio, choose Manage View, select the Configuration tab, navigate to indexserver.ini -> calcEngine, and add the following parameters:
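The same settings can also be applied from a SQL console instead of the Studio dialog. The statement below is a minimal sketch, assuming the calcEngine parameters cer_rserve_addresses, cer_timeout, and cer_rserve_maxsendsize that the R integration normally uses. The address 192.0.2.10 is a placeholder for your Rserve host, the port matches the Rserve launch command above, and the other values are only illustrative.

```sql
-- Sketch: point the calculation engine at the external Rserve.
-- 192.0.2.10 is a placeholder address; replace it with your Rserve host.
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'SYSTEM')
    SET ('calcEngine', 'cer_rserve_addresses') = '192.0.2.10:30120',  -- host:port of Rserve
        ('calcEngine', 'cer_timeout') = '300',                        -- connection timeout in seconds (illustrative)
        ('calcEngine', 'cer_rserve_maxsendsize') = '0'                -- max result size in KB, 0 = no limit (illustrative)
    WITH RECONFIGURE;
```

WITH RECONFIGURE applies the change immediately, so no restart should be needed for the new values to take effect.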
(3) A simple test demo
The following demo computes the square of each value in a column of primes:
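A minimal sketch of such a demo is given here. The table names PRIME and PRIME_SQR and the procedure name MY_SQUARE are illustrative assumptions, and the tables are used directly as the parameter types. Note that the procedure body between BEGIN and END is R code, so comments inside it use R's # syntax.

```sql
-- Input table with a few primes (names are illustrative)
CREATE COLUMN TABLE PRIME (NUMBER INTEGER);
INSERT INTO PRIME VALUES (2);
INSERT INTO PRIME VALUES (3);
INSERT INTO PRIME VALUES (5);
INSERT INTO PRIME VALUES (7);

-- Output table; its structure also serves as the result type
CREATE COLUMN TABLE PRIME_SQR (NUMBER INTEGER);

CREATE PROCEDURE MY_SQUARE (IN input1 PRIME, OUT result PRIME_SQR)
LANGUAGE RLANG AS
BEGIN
    # input1 arrives as an R data frame; square its single column
    result <- as.data.frame(input1$NUMBER^2);
    # the column name must match the output table type
    names(result) <- c("NUMBER");
END;

-- Run the procedure and write the result into PRIME_SQR
CALL MY_SQUARE(PRIME, PRIME_SQR) WITH OVERVIEW;
SELECT * FROM PRIME_SQR;
```

The parameter names input1 and result are all lowercase, in line with the restriction on variable names, and the procedure returns its result as a data frame.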
Executing this procedure gives a result like this:
RLANG procedures currently support only table-type parameters, but you often need to pass in a scalar argument. When this happens, you can use "select from dummy" to generate a temporary table.
CREATE PROCEDURE WAPPER_WEIBOSOHU(IN keyword NVARCHAR, IN crawltime INTEGER, OUT result WEIBOSOHU_TYPE)
LANGUAGE SQLSCRIPT AS
BEGIN
    inputinfo = SELECT :keyword AS "keyword", :crawltime AS "crawltime" FROM DUMMY; -- wrap the scalars in a one-row table
    CALL fetch_weibosohu(:inputinfo, :result);
END;
In the example above, the outer WAPPER_WEIBOSOHU procedure is a SQLScript procedure and fetch_weibosohu is an RLANG procedure. We use "select from dummy" to generate a temporary table and pass it to the inner procedure.
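On the R side, the wrapped scalars arrive as the single row of a data frame. The sketch below shows how an inner RLANG procedure might unpack them. It is not the actual fetch_weibosohu procedure: the names SCALAR_IN_T, ECHO_OUT_T, and echo_scalars are hypothetical, as are the column lengths, and the procedure simply echoes its inputs back.

```sql
-- Hypothetical table types for the one-row input and the result
CREATE TYPE SCALAR_IN_T AS TABLE ("keyword" NVARCHAR(100), "crawltime" INTEGER);
CREATE TYPE ECHO_OUT_T AS TABLE ("keyword" NVARCHAR(100), "crawltime" INTEGER);

CREATE PROCEDURE echo_scalars (IN inputinfo SCALAR_IN_T, OUT result ECHO_OUT_T)
LANGUAGE RLANG AS
BEGIN
    # the wrapped scalars are the first (and only) row of the data frame
    kw <- as.character(inputinfo$keyword[1]);
    ct <- as.integer(inputinfo$crawltime[1]);
    # a data frame must be returned, even for a single row
    result <- data.frame(keyword = kw, crawltime = ct, stringsAsFactors = FALSE);
END;
```

A SQLScript wrapper would call it the same way as in the example above, building inputinfo with a select from DUMMY and passing it in as the table parameter.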
[Note: The test case for this article is based on SAP HANA SPS07 revision 70.00]