When SAP HANA met R – First kiss
If you follow my blogs (I hope you do 😡 ) then you know I really love the R programming language but I also love SAP HANA 😘 and in the past I have dealt with integration between those two:
Sanitizing data in SAP HANA with R
But…those integrations were not done using the SAP way…which means, they are not supported or endorsed by SAP…
Gladly…as of today, there’s an official SAP way to do it! 😎
First, we need to read and follow all the instructions detailed in Get your own SAP HANA DB server on Amazon Web Services by the most awesome Juergen Schmerder. (Believe! It took me less than 10 minutes to get my SAP HANA running on my laptop…really…a piece of cake).
With our SAP HANA instance up and running, we can dedicate ourselves to the funny part…the #R part 😉
Go to your AWS Management Console and under Amazon EC2, launch a new instance…
You have to choose SUSE Linux Enterprise with 32 bit. I tried with 64 bit and it wasn’t funny…didn’t work and I lost a lot of time…32 bit for the win!
For the installation, you can follow this link SAP HANA Database Development Guide – Integration with R programming language, but at least in my case, I need to deal with a lot of difficulties, that gladly I’m going to write down in this blog, so you don’t have to deal with them 😆
First, we need a compiler as we’re going to compile #R from it’s source.
Get a compiler… |
---|
sudo zypper install gcc gcc-c++ gcc-fortran |
Then we need to get and extract the #R source code.
Get source code, extract and compile |
---|
wget http://cran.r-project.org/src/base/R-2/R-2.13.0.tar.gz tar zxf R-2.13.0.tar.gz && cd R-2.13.0 ./configure –enable-R-shlib –with-readline=no –with-x=no make clean make make install |
This step really takes a long time…so you better go doing something more productive in the meantime…
When #R is finally installed, we need to download and install the Rserve package.
Download Rserve |
---|
wget http://www.rforge.net/Rserve/snapshot/Rserve_0.6-5.tar.gz |
Now, we have to log into R and do the installation…
Install and test Rserve |
---|
R install.packages(“/PATH_TO_FILE/Rserve.tar.gz”, repos = NULL) library(“Rserve”) #To test the installation. If there’s no output, then it’s working fine q() |
If you get an error regarding a personal library…just say “y”.
Once Rserve is install, we need to create a config file.
Rserv.conf |
---|
vi /etc/Rserv.conf maxinbuf 10000000 Maxsendbuf 0 remote enable #Press ESC key :w #Press ESC key :q! |
Now, we have to create a user that will run the Rserve so we can connect to it from SAP HANA.
Create new user with password |
---|
useradd -m login_name passwd login_name |
For some reason Amazon doesn’t provide the password for the root user…but we might need it eventually…so just do this…after all, if your user and you’re paying for it…
Change root password |
---|
sudo passwd root #Assign a password |
Great, we’re are now ready to start our server! (You need to be logged as the new user that we create in a previous step).
Start the Rserve server |
---|
R CMD Rserve –RS-port 6311 –no-save –RS-encoding “utf8” |
Now…we’re ready to move to move to our SAP HANA server and keep configuring 😏
Configure SAP HANA |
---|
Right click on your system node at the navigator tab Select Administration Select on the right hand side the Configuration tab Select the indexserver.ini Select the calcengine #Add the following parameters… cer_timeout – 300 cer_rserve_addresses – Our R Amazon server:6311 cer_rserve_maxsendsize – 0 |
One more thing, and we’re ready to roll…go to your AWS Management Console, EC2 and then choose Security Groups. Our R server is going to be assign to “Quicklaunch-1”. Just select it and go to Inbound. And add the port “6311”.
That’s all folks…we’re officially ready to go!
On SAP HANA, create a table and call it TICKETS_BY_YEAR with the following structure:
Open a SQL Editor and copy the following code…
Fill up TICKETS_BY_YEAR |
---|
insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110101’,4195); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110201’,4245); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110301’,4971); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110401’,4469); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110501’,4257); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110601’,4973); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110701’,4470); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110801’,4981); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20110901’,4530); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20111001’,4167); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20111101’,4059); insert into “SYSTEM”.”TICKETS_BY_YEAR” values(‘20111201’,1483); |
This table is supposed to hold the tickets sales for a given company, during each month of the year 2011. What we want to do here is to determine or predict how are going to our sales on 2012. We’re going to use #R for that matter.
Create the following script and call it “Predict_Tickets”. This script will have two Stored Procedures, called Prediction_Model and Get_Tickets.
Predict_Tickets |
---|
CREATE TYPE T_PREDICTION_TICKETS AS TABLE ( PERIOD VARCHAR(8), TICKETS INTEGER ); DROP PROCEDURE Prediction_Model; DROP PROCEDURE Get_Tickets; CREATE PROCEDURE Prediction_Model(IN tickets_year TICKETS_BY_YEAR,OUT result T_PREDICTION_TICKETS) LANGUAGE RLANG AS BEGIN period=as.integer(tickets_year$PERIOD) tickets=as.integer(tickets_year$TICKETS) var_year=as.integer(substr(period[1],1,4)) var_year=var_year+1 new_period=gsub(“^\\d{4}”,var_year,period) next_year=data.frame(year=new_period) prt.lm<-lm(tickets ~ period) pred=round(predict(prt.lm,next_year,interval=”none”)) result<-data.frame(PERIOD=new_period,TICKETS=pred) END; CREATE PROCEDURE Get_Tickets() LANGUAGE SQLSCRIPT AS BEGIN Tickets = SELECT * FROM TICKETS_BY_YEAR; CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS); INSERT INTO “TICKETS_BY_YEAR” SELECT * FROM :T_PREDICTION_TICKETS; END; CALL Get_Tickets(); SELECT * FROM “TICKETS_BY_YEAR”; |
As you can see, our first Stored Procedure called Prediction_Model, we’re using RLANG as the script language…meaning that we’re going to embed R code that will go from our SAP HANA to our R Serve and back with the modified information.
Prediction_Model is calling the Stored Procedure Get_Tickets, which is doing a select from the table TICKETS_BY_YEAR and then calling Prediction_Model to finally insert the data back into SAP HANA.
At the end of our script, we call Get_Tickets and do a select to TICKETS_BY_YEAR to verify that our script worked.
Success! Our SAP HANA and R integration work like a charm! We never left SAP HANA Studio, but our code went to the R Server and back to bring us the modified information…all in just 829 milliseconds…really fast considering that both servers are in the cloud…
That’s all for now…I will come back with more information on SAP HANA and R as soon as I can…there’s still a lot to discover and test 😎
Hi, Alvaro
I've read all of your R-related blogs. All of them are very enlightening and detailed. ➕
Alvaro,
Can you please tell me, whats the logic did you use for determining or predicting the sales for 2012.
Sorry, am asking this question because i couldn't able to understand the R-script.
regards,
Tilak
Tilak:
It's simple and R takes care of everything...
First, I do a linear model:
prt.lm<-lm(tickets ~ period)
And then the prediction:
pred=round(predict(prt.lm,next_year,interval="none"))
Here I'm passing the linear model, along with the new months for the prediction and round the results to avoid decimals...
http://stat.ethz.ch/R-manual/R-patched/library/stats/html/predict.lm.html
Greetings,
Blag.
thanks for your knowledge sharing, i also did it again follow your guide...all steps are ok, excluded some linux compiler part.. thanks:)
Hi Alvaro,
great article! Thank you.
Could you please update the 3 links:
HANA meets R
R meets HANA
Sanitizing data in SAP HANA with R
They don't seem to work,
Bernd
Bernard:
Thanks for your comments and for pointing me out to the broken links...I have fixed them...however, for the last one I need to link it to my personal blog, as it seems to be stranded in the sands of SCN 🙁
Greetings,
Blag.
Hi Alvaro,
My Rserve server is listening on port 6311 on an aws instance but Hana doesn't seem to be able to connect to it.
- I added the cer_* parameters under system. There are also a default and tenant column but I do not know how to set parmeters for them.
- I was not sure how to assign to "Quicklaunch-1". So I created a new security group called "Quicklaunch-1" on the server hosting Rserve and added the port 6311. To be sure I added the same rule to the "default" security group.
I also checked that my R instance was readily accessible using its elastic ip.
Any hints?
Looking forward to overcoming this last difficulty before using R and Hana in combination!
François
Francois:
Have you started the RServe Server? R CMD Rserve --RS-port 6311 --no-save --RS-encoding "utf8"
About the default and tenant columns...forget about them...they're not needed...
The security group should be fine...as long as it linked to the server...
When you log into the R Server you should see that the RServe Server is up and running...
Greetings,
Blag.
Alvaro,
Thanks for your early reply.
Yes, I did use the command : R CMD Rserve --RS-port 6311 --no-save --RS-encoding "utf8"
According to netstat Rserve was up and running, listening on port 6311,
I used this port for defining the security group on the R instance on the one hand and for setting cer_rserve_addresses - <ip elatisc address of R instance>:6311 on the Hana instance.
Is it necessary to create a "Quicklaunch-1" security group. How can we be sure that our instance is assigned to this group? By default there seem to be three: a "deafult" and two "emr-xxx".
Thanks for any suggestions you may have,
François
PS And also a newby question about Hana : I wondered why the test tables were created in the SYSTEM schema. I have not yet looked into how to create a new schema.
Francois:
There's no need to create "Quicklaunch-1"...that's just the name I kept for my instance...you just need to modify the security group assigned to your R instance and make sure that the port 6311 is opened.
What's is the script your using to connect to R? (The one on the blog I assume...)
Well...my assumption (as I didn't create it) is that we used SYSTEM to have something more generic...the SAP HANA instance is an image so we needed to put the tables somewhere...and after all, we're managers of our own SAP HANA Servers 😉
Greetings,
Blag.
Thanks Alvaro,
It works! 🙂
The problem had indeed to do with the security groups. To be sure I temporarily set the port 6311 for all of them (defaulyt may suffice ; I will try).
I should learn more about security groups (how to assign an instance to a security ; knowing which security group is in force when several have been define, etc.)
To be sure that my R server was accessible I used pyRserve, a nice Rserve client.
Again thanks for your post which was very helpful,
Francois
Security groups automatically created by Amazon for my instance :
default group
Master group for Elastic MapReduce
Slave group for Elastic MapReduce
Are
Francois:
Awesome to know that you make it work! 😀 Happy SAP HANA-R programming 😉
Yeah...Security Groups are not very obvious...I'm not totally sure about them myself LOL
Greetings,
Blag.
Thanks Alvaro , it finally worked,
my vi commands where
vi /etc/Rserv.conf
# press i for insert
maxinbuf 10000000
Maxsendbuf 0
remote enable
#Press ESC key
: x
#Press enter
Why does the Linux -/ R Server accept the request without authentication (since I don't put in any user / password)?
Is this done through PUTTY pageant and the corresponding key? Probably not since they are local...
Sorin!
So glad that you make it work 🙂
Regarding the R Server...as you're using RServe as server, it will accept anonymus request as you're not accessing the OS...and when you access the OS you're passing a Pair Key, so no password is needed 🙂
Greetings,
Blag.
Hi Blag,
Is there a way to access Analytical views/Calculation views from R. Or is the access limited to tables only.
Also if i have a table with data:
Doc,Customer,Amount
10001,Cust01,10
10002,Cust01,20
10003,Cust03,30
and use the barplot function with cust & amount, I see three bars: Cust01-10, Cust01-20, Cust03-30.
I was expecting Cust01-30, Cust03-30.
Data does not get aggregated automatically in R.
Have you faced similar issues?
Thanks & Regards,
Rahul Rajagopalan Nair
Rahul:
R is not actually accessing any table...we're using a Store Procedure to pass the information to it...in this two lines...
Tickets = SELECT * FROM TICKETS_BY_YEAR;
CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);
Fist we do the select and the we pass it to the R procedure...
To read from a Calc or Analytical view...you should do it like this...replacing the fist line for this one...
Tickets = CE_CALC_VIEW("_SYS_BIC"."blag/CV_TICKETS",["PERIOD","TICKETS"];
Were "blag" is the name of the package and "CV_TICKETS" is the name of the Calculation View...so in the end you will have...
Tickets = CE_CALC_VIEW("_SYS_BIC"."blag/CV_TICKETS",["PERIOD","TICKETS"];
CALL Prediction_Model(:Tickets,T_PREDICTION_TICKETS);
Regarding your second question...barplot is a graphical function...it will do any aggregate by itself...and BTW...it's not going to generate any graphic is you call it from SAP HANA, as the R server is a headless one...meaning...no graphics output...
In pure R...you should do it like this...
doc<-c(10001,10002,10003)
customer<-c("Cust01","Cust01","Cust03")
amount<-c(10,20,30)
tab<-data.frame(doc,customer,amount)
tab_sum<-aggregate(amount~customer,data=tab,FUN=sum)
barplot(tab_sum$amount,names.arg=tab_sum$customer)
Using the "aggregate" function to summarize the data you need.
Greetings,
Blag.
Hi Blag,
Thanks for the response. Let me repost this in your previous blog wherein you had initially accessed HANA tables in R.
BTW your solution for using aggregate worked. thanks!.
Best Regards,
Rahul R Nair
Glad to hear 🙂 And don't worry, will check you question on the other post...
Greetings,
Blag.
Hi Blag,
I have heard about RHANA package. I tried to search it on CRAN & SMP. But not able to find it. Do you know if it has been released, if yes then where can i find it?
Thanks & Regards,
Rahul Rajagopalan Nair
Hi Rahul:
Well...if I'm not wrong...and I might be 😛 RHANA was developed internally and was never published outside SAP, I think it might have been discontinued...also, it was replaced by using the option detailed in my blog...SAP HANA with an R Server running RServe.
Greetings,
Blag.
As always, thank you for your quick response Blag.
I came across this document http://datatable.r-forge.r-project.org/randsaphana-dkom.pdf which stated that RHANA was a preferred option over RODBC. Too bad it didnt work. I totally understand that it would have taken a lot of effort to get a HANA specific R package ready and publish it on CRAN. Appreciate the team's effort.
Yes, we are looking forward to test Rserve package. Waiting for the SPS5 upgrade on AWS.
Thanks & Regards,
Rahul Rajagopalan Nair
Hi Blag,
First of all thanks for your post.... its really wonderful. And beauty is it worked.... 🙂
Now, going step further I have a question. For some forecasting models, we need to use forecast packages in R... its only after installing right forecast package and then loading it during R program execution we can make use of certain R functions....
A use case that I am trying involves such forecasting technique... with R I could make it work however, when I try to execute the same code I get errors ....
Here are some insights into whats happening:
1. If I dont load the forecast package using library("forecast")
-- System issues error saying forecast functions I am using are not available.
2. If I use library("") - Just to isolate what problem might be...
-- System issues error saying Invalid package
3. If I use correct package library("forecast")
--
" ....(at pos 88): GenericFailure exception: column store error: search table error: [34084] Receive error: get result error.;An internal error in R client occured "
Do you have any idea what might be wrong? or you know any elegant way of installing packages in R via SQL scripts?
Thanks,
AO.
Akshay:
Well...if you have installed the library on your R server...then it should work...and actually when you got an error on the SAP HANA Studio side, you can switch to your R server and see what's the error message in the log.
I have no tried the "forecast" library yet...but will do it tomorrow when I get back to Montreal...most surely together will be able to pinpoint any error and fix it 😉
Greetings,
Blag.
Hello Blag,
Great to see your reply... 🙂
Actually here is what I did,
1. In R, I mostly get the required packages via CRAN mirror servers using neat UI and then load them in scripts using command - library("package_name")
- This is, of course not possible when doing it via SQLScript
2. Packages that I have put are via ftp servers - this is definitely wrong way as I think it wont install all necessary dependent objects like it will when installation is called within script itself
3. Also, maybe versions of package might be issue as well....?
Lets try to knock this off..... thanks again.
AO.
Akshay:
Well...what I did for my R Server, was to log into R and run this simple command...
install.packages("library_name")
Which is the same as using RStudio installation package option...that way...you guarantee that everything is going to get installed correctly...
Greetings,
Blag.
Blag,
Thanks for the reply. With install.packages("forecast") - we get the error that forecast package is not available.
Typical message is:
Selection: 80
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
package âforecastâ is not available (for R version 2.13.0)
>
The SUSE Linux has the R version i.e. 2.13.0
The code that worked for me on my local R environment is on R version 2.15.2
So I decided to go ahead with latest R on SUSE Linux on AWS. Looks like best way to appraoch is keeping both versions same, that way atleast debugging will be easier.
Will install 2.15.2 on AWS and try again.
Let me know you have other thoughts. Thanks!
AO.
Hello Blag,
So, made some progress on this topic:
1. First, I got the same version as the one I am working on my local machine. So, even AWS has the same R version.
2. On AWS R server, I wrote R code and saw the output. Its coming up correct.
3. Now, here is the tricky part: When I execute code via SQL script, I get internal server error. This error happens only when I try to loop back the result the way you have in your example and update the table in HANA.
For example,
I get two vectors in R, one vector for time buckets and other for forecasts (similar to Period and Tickets column ) in your example.
Now, I bind them using cbind, so I get one matrix. Finally, I apply data.frame(matrix) to get the resultant data frame that has two columns, one for time periods and other for forecast.
If I put something like print(dataframe) in R code of SQL script, I see output on Rserve. But, if I remove print command and expect to have feedback loop so that I can insert into table. I get internal R error..... so, maybe I am doing something wrong such as having wrong type or assigning data frame correctly etc.
Any idea on what might go wrong if I intend to receive dataframe from R in SQL script? Thanks!
AO.
Akshay:
Ok...let's try this...as you can see in my example, I'm doing this...
result<-data.frame(PERIOD=new_period,TICKETS=pred)
So I'm actually assigning the names of the column from the table into the data.frame...maybe you're only doing this...
apply data.frame(matrix)
When you should do something like this...
a<-c(1,2,3)
b<-c(4,5,6)
ab<-cbind(a,b)
abc<-data.frame(COLUMN1=ab[a],COLUMN2=ab[b])
This way, your data.frame will have the table's column names...
Greetings,
Blag.
Perfect Blag..... Its going through.... 🙂
Now, I am all set to try some advanced use-cases in this area. Will buzz here just in case I get lost 😉
Thanks!
AO.
Hello Blag
First of all I have to compliment you on your blogs. They are all really interesting and easy to understand!
In your example, you are using R for calculations, but I want to do data mining with it. For this, R offers many different libraries. But my problem is: How can I return the knowledge which I have mined back into HANA? As you already mentioned, the R server is headless without graphics output. The only method I came up with is to output the results into files, but these do not look very nice and cannot be used for future data minings. Do you know of any example for data mining with HANA? Maybe even without R?
Ronja:
Well...if you read my blog http://scn.sap.com/community/developer-center/hana/blog/2012/06/25/when-sap-hana-met-r--bring-home-your-graphics you will see that there's an option...create the graphics using the Cairo Library and send them by email...also, you can save the information on SAP HANA and the exported to a CSV file and explore it on SAP Visual Intelligence like I say on my blog http://scn.sap.com/community/developer-center/hana/blog/2013/02/18/when-sap-hana-met-r--whats-new
I'm not aware of any other options...
Greetings,
Blag.
Hey Blag,
thank you for your fast reply! I actually did look through the "Content" of your profile but apparently I didn’t get this far 🙂
The first link is actually what I was looking for… Now I’ll only have to figure out how to email 🙂 Thank you so much!!
How to sent the email is actually on the blog 😉
command<-"uuencode Tickets.png Tickets.png | mail -s 'Tickets December Report' atejada@gmail.com"
system(command,intern=TRUE)
It's part of the code...basically, you use the uuencode command to send the email from the Linux box 🙂
Greetings,
Blag.
Hi Alvaro,
i find your blog really good 😉
I am abou to install a Linux Host for R but i dont't know exactly the recommended values of the Linux Host
We are going to install Linux Host in a VM and at the moment we have low capacity.
I need to know th following parameters for the Linux Host:
1. how much memory do i need for R and RServer (min) ?
2. how much disk?
3. how many Processors ?
Thank you in advance
Regards,
Vjola
Vjola:
To be honest...I don't know...R can be installed on any machine...of course, the amount of memory and processors will affect the performance but will not prevent you from running it...in terms of space...the installation takes from 75 to 100 Mb...so I don't think that's a real issue...
While you install R and Rserve on a server with nothing else installed on it...you should be fine...
Greetings,
Blag.
Hi Alvaro,
I've tried your code, however when I try to define the procedure Get_Tickets(), it says
* 257: sql syntax error: line 1 col 87 (at pos 87) SQLSTATE: HY000
hdbsql HDB=> > * 7: feature not supported: Parameterized input table parameter is not allowed: line 1 col 23 (at pos 22) SQLSTATE: HY000
hdbsql HDB=> > * 1309: invalid use of table variable: line 1 col 45 (at pos 44) SQLSTATE: HY000
hdbsql HDB=> > END;
* 257: sql syntax error: incorrect syntax near "END": line 1 col 1 (at pos 1) SQLSTATE: HY000
Can you see what is wrong?
Cheers,
-- Micha
Hello Micha:
Well...this blog is almost from a year ago...so maybe something has changed in the newest SAP HANA Studio revision...will try to find some time to try to reproduce it....
Or maybe it's because you don't have the R Server well configured or Rserve is not running correctly...have you follow all the steps? SAP HANA and R should installed on different machines...
Greetings,
Blag.
Ok, I figured it out. Entering your code into Hana Studio works, entering it into hdbsql gives this error. Note that the error is not in the R procedure but in the SQLScript. Looks like a bug in dbsql to me...
-- Micha
Micha:
I haven't use hdbsql...but my assumption is that the Studio holds the connection information to the R server...while the hdbsql doesn't...there might be some way to make it work...but that might take some time to investigate...
Greetings,
Blag.
Hi Blag,
did you ever got it to work with SLES 64bit?
Or at least found out the root cause of the errors you were getting with 64bit?
Thanks,
Henrique.
Henrique:
Not really...once I had it working on 32bit...I just forgot about the 64bit...will try to find some time next time to do some testing... 😉
Greetings,
Blag.
Hi Blag,
with your and Juergen's invaluable help, I was able to install Rserve 1.7-0 and R 2.15.3 with success. 🙂 At least the "hello world" script above executed successfully. 😀
Now I'll make some more tests with Predictive Analysis calling HANA R procedures.
Thank you very much!
Cheers,
Henrique.
Great blog - thanks1
Hallo Alvaro,
I have installed a trial 60 Day version of Linus SUSE 11 SP2, but the gcc-fortran is not in the Server installed.
Where can I find a installation of FORTRAN
Regards,
Vjola
In a fresh installation, developer compilers are never installed...you need to use this...
sudo zypper install gcc gcc-c++ gcc-fortran
Greetings,
Blag.
When I ran the following SQL Statements on SAP HANA Studio, I got the error message: Could not execute 'CALL R_test("xytest") WITH OVERVIEW' in 42 ms 418 µs.
SAP DBTech JDBC: [2048]: column store error: search table error: [34081] Connection error: Unable to connect with Rserve.;None of the Rserve can be connected: <host>:<port>:crypted: Authentication with Rserve failed;
although the host (Elastic IP) and port number (in Security Group) are correct, R and Rserve are running when checked under PuTTy. If the last syntax "CALL R_test("xytest") WITH OVERVIEW;" is not included, no error message.
I am not sure how to "Create new user with password" and "change root password" (http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r--first-kiss,
* in
useradd -m login_name
passwd login_name
what is the login_name?
* in
sudo passwd root
#Assign a password
what is the purpose to assign a password to root?
* How can I make my SAP HANA Studio to recognize these passwords?)
Your suggestions on how to solve the issue is appreciated!
DROP TABLE "xytest";
CREATE COLUMN TABLE "xytest"("xx" INTEGER, "yy" INTEGER);
DROP PROCEDURE R_test;
CREATE PROCEDURE R_test(OUT xytest "xytest")
LANGUAGE RLANG AS
BEGIN
xytest = as.data.frame(cbind(xx=11:20,yy=21:30));
source("/tmp/R/test.r");
END;
CALL R_test("xytest") WITH OVERVIEW;
I don't think you can the R procedure "WITH OVERVIEW"...but you should be getting a different error message...
Are you under a firewall that might be preventing SAP HANA to contact the R Server? Also...you cannot use the root user to call the RServe process...you need a new user...
useradd -m login_name
passwd login_name
what is the login_name?
login_name is simply a user name defined by you...it can be...
useradd -m beiwei
sudo passwd root
#Assign a password
what is the purpose to assign a password to root?
The purpose is to be able to do some configurations that might need root authorization...let's say you log in with your R user...without the root password there aren't so many things you can change...
* How can I make my SAP HANA Studio to recognize these passwords?)
Well...SAP HANA Studio got nothing do to with this...you start your R Server and then SAP HANA Studio access it...as simple as that...
In your code...why are you calling the source code of test.r? I don't really see the point...also as you can see in my example...you need to insert the data into the table as simply assigning it doesn't populate it...
Greetings,
Blag.
Developer Experience.
Hi Alvaro!
First of all I want to thank you for sharing your knowledge with us.
I would like to ask you if you think that the Rserver would run also on a "simple" SUSE server.
Thanks in advance for your answer.
Kind regards,
Vincenzo
Vincenzo:
Well...of course I haven't tried it...but I don't see why not...as far as is a server...it should be fine 🙂 SAP HANA will simply call it and maybe it will be a little bit slow...but should work...
Greetings,
Blag.
Developer Empowerment and Culture.
Thanks a lot!
We will try with both the 64bit and the 32bit.
Great! Please let me know the outcome 🙂
Greetings,
Blag.
Developer Empowerment and Culture.
Dear Alvaro,
We tried with an OpenSUSE server 64bit and the integration worked!
Thanks again for your post!
Vincenzo:
Awesome! Glad to know that my post was helpful 🙂
Greetings,
Blag.
Development Culture.
Hi Blag, sorry to get back after such a long time with a (hopefully simple) question: Since R is keeping all of its data inside RAM, all analyses are limited by the RAM available on the Linux machine, aren't they? Or is it possible to send chunks of data from SAP HANA in order to overcome RAM limitations on the Linux machine?
Ronja:
R doesn't keep the data in RAM...SAP HANA does 😉 SAP HANA sends the data to R and since you're using a server to run R, then you don't need to worry to much about RAM...I haven't tried with a really heavy load yet...but all my tests were successful and R responded very well as expected 🙂
Greetings,
Blag.
Development Culture.
Hey Blag, R is also keeping all of its data in RAM (except if you are using special packages like ff). I am doing regression analyses which need a lot of data, but I guess I will have to use workarounds
Ronja:
Yes...you're correct...my mistake 🙁 I need to stop assuming that only SAP HANA work in-memory 🙂 For your problem...maybe some Big Data packages or adding Hadoop to the equation might work...
Greetings,
Blag.
Development Culture.
Hi Blag,
great blog, i like how you describe every step very detailed.
I recently installed R myself and therefore I noticed something important about your Rserve configuration file (Rserv.conf):
What this actually means is described here: Rserve - Binary R server - RForge.net
"The maxinbuf specifies (in kilobytes) the maximal allowable size of the input buffer, that is the maximal size of data transported from the client to the server."
So for your server, it is about 1 GB. You probably chose this value, because it's mentioned in the SAP HANA R Integration Guide. But there is an explanation below in this document:
"The value 10000000 is merely an example. We recommend that you set the value of maxinbuf to (physical memory size, in bytes) / 2048. For example if you installed R on a host with 256 GB of physical memory you should set maxinbuf to 134217728."
I thought, it might help 🙂
Greetings,
Steve
Steve:
That's right 🙂 I didn't even bother to change the maxinbuf value...I of course read the HANA R Integration Guide and leave the value like that...thanks a lot for taking the time to explain this...really appreciated 😉
Greetings,
Blag.
Development Culture.
Great article,
I heard somewhere that R can be triggered several times parallel from HANA.
How I can do this. Must I change my Procedure somehow for this?
Matthias:
I have never done that...so I don't know...but I guess the idea would be to actually make R call parallel processes...and that can be achieve by using the "Parallel" package https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf
Greetings,
Blag.
Development Culture.
Hi Matthias
the connection to RSERVE is triggered once per R-POP (plan operator) in a calc scenario.
You can think of one HANA R procedure as a single POP in this case.
So, depending on your overall scenario, these POPs may be called in parallel.
Each of these connections to RSERVE will spawn a separate independent R session.
All this happens automatically and no special coding is required here.
For the parallel execution within R there are literally tons of approaches, the parallel package mentioned by Blag is one of them.
This blog is getting cross referenced so much it should serve as a central repository for R+HANA discussions! 🙂
Has anyone tried a flavour of Linux other than SUSE for hosting R/RServ? Even if it isn't supported, I am wondering if something like Ubuntu might still work well enough for some experimentation, or whether this just would not work?
Regards
Ray
Hello Ray:
Thanks for your kind words 🙂
To be honest...I haven't try it 🙁 I guess it should work fine...as long as it's a server...let's see if anyone else has done that already...
Greetings,
Blag.
Development Culture.
Hi Ray,
I can confirm it works on Ubuntu 14.04 LTS.
Regards, Michael
Hi Blag,
I am trying to connect HANA server with Revolution R-serve (windows based). I am getting error -
Could not execute 'call "GET_TICKETS"()' in 20.415 seconds .
SAP DBTech JDBC: [2048]: column store error: [2048] "GET_TICKETS": line 5 col 1 (at pos 104): [2048] (range 3): column store error: search table error: [34081] Connection error: Unable to connect with Rserve.;None of the Rserve can be connected: 172.21.99.48:7400: The connection closed unexpected;
This procedure works fine when I am connected to a different R-serve, so the issue is related to R-HANA integration. I am trying to find root cause and below are my findings -
1) Revolution R and R-serve are installed on a separate Windows Based server. The guide says "This guide assumes an R installation on a Linux system, preferably SLES Linux; no other R hosting environments are currently supported."
Are we sure that windows based R environment is not compatible for HANA-R integration?
Is this mentioned in PAM for HANA? If yes, can you guide me to the link.
2) I don't think firewall or port opening or security group is an issue. I am able to Telnet. Can you guide me on making an entry for the host name at remote server level as mentioned @ Not able to connect to RServe from Hana Studio
3) I am checking the user running R-serve process on the server. My hunch is it is currently being run using default user. I will try to run through a different user. But how does it make a difference?
Looking forward to your inputs. Thanks.
Sameer
Hello,
Do you know what interface is allowing the communication R and SAP HANA DB ? is it JDBC or RFC or something else ?
Best regards,
Imane
Hello Blag,
Ibran Here,
Thanks for this blog its great, it worked for me i can connect R from Hana express edition.
Our issue is when we call r with small amount of data then there is no problem we were getting response every thing and when we call r with large amount of data then from r we are not getting any response and r is becoming non responsive after 10 min. Should we do any configuration in R server?
Please help me in this,
Thank You.