A Chinese version of this blog is also available.

 

Starting with this article, I will look into how SAP HANA is combined with R. After the SAP D-code meeting, I realized that many related applications use R, especially for analysis and prediction, so I want to study the details in depth. These articles assume some experience with R, ideally including using R in SAP HANA. You can find related documentation at https://help.sap.com/hana/SAP_HANA_R_Integration_Guide_en.pdf, and you can also read another document of mine at http://scn.sap.com/community/chinese/hana/blog/2014/02/14/r%E8%AF%AD%E8%A8%80%E5%8C%85%E5%AE%89%E8%A3%85%E5%B9%B6%E5%AE%9E%E7%8E%B0%E4%B8%8Ehana%E7%9A%84%E6%95%B4%E5%90%88

These articles will show the bi-directional data flow between SAP HANA and R. With this knowledge you can design more efficient procedures on SAP HANA with R, and it can also help you track down the cause of problems. You can check the R logs, and you can even integrate R with your own applications over TCP/IP, provided they support TCP/IP.

(1)       R’s embedded execution environment

SAP HANA can integrate with R because R supports an embedded execution environment. This means that, once the required libraries are installed, you can embed R programs in C programs.


Under R’s installation directory (such as /usr/local/lib64/R), there are header files that provide function prototypes and a dynamic-link library file, libR.so. With these files, a C program can run R code. For example:


#include <stdio.h>
#include "Rembedded.h"   /* Rf_initEmbeddedR, Rf_endEmbeddedR */
#include "Rdefines.h"    /* SEXP, NEW_INTEGER, INTEGER_DATA, ... */

int main(void) {
    /* Arguments passed to the embedded R session */
    char *argv[] = {
        "REmbeddedPostgres", "--gui=none", "--silent"
    };
    int argc = sizeof(argv) / sizeof(argv[0]);
    SEXP e, fun, arg;
    int i;

    /* Start the embedded R interpreter */
    Rf_initEmbeddedR(argc, argv);

    /* Look up the R function print() */
    fun = Rf_findFun(Rf_install("print"), R_GlobalEnv);
    PROTECT(fun);

    /* Build the integer vector 1..10 */
    arg = NEW_INTEGER(10);
    PROTECT(arg);
    for (i = 0; i < GET_LENGTH(arg); i++)
        INTEGER_DATA(arg)[i] = i + 1;

    /* Build the call print(arg) */
    e = allocVector(LANGSXP, 2);
    PROTECT(e);
    SETCAR(e, fun);
    SETCAR(CDR(e), arg);

    /* Evaluate the call to the R function. Ignore the return value. */
    eval(e, R_GlobalEnv);
    UNPROTECT(3);

    /* Shut down the embedded R interpreter */
    Rf_endEmbeddedR(0);
    return 0;
}

This code mainly uses functions and macros defined in the R core. First, it initializes an embedded execution environment by calling Rf_initEmbeddedR(argc, argv). SEXP is a pointer type that points to R’s internal data structures (see R’s source code under R-2.15.0/src/main). Then it defines an integer vector arg with the values 1 to 10. Finally, it executes the function print() on that vector with eval(e, R_GlobalEnv).

We can compile this code with the command:

gcc embed.c -I/usr/local/lib64/R/include -L/usr/local/lib64/R/lib -lR

            -I: the path to the header files;

            -L: the path to the dynamic-link library;

            -lR: links the program against libR.so

[Screenshot: output of running the compiled program]

As we can see, the output looks the same as it would in R.

Because of this, we can build an R server as a TCP/IP server on top of the embedded R execution environment. It accepts requests from TCP/IP clients, executes the R program, and returns the result to the client. This is the primary motivation for developing Rserve.

All of the above is the basis for combining SAP HANA with R.

(2)     Introduction for Rserve

Rserve first appeared in October 2003; the newest version is Rserve 1.7-3, published in 2013. Its author is Simon Urbanek (http://simon.urbanek.info/), who does research at AT&T Labs. The Rserve server and clients can be downloaded from http://www.rforge.net/Rserve/, which also has more details about Rserve.

The server is implemented in C. It accepts requests and data from a client and returns the results to the client after the computation. Clients are provided for C++, Java and PHP. Note that the C++ interface offers only basic functionality; in the author’s own words, “This C++ interface is experimental and does not come in form of a library”. It supports only some basic data structures, such as lists, vectors and doubles. Other types you need to implement yourself: “Look at the sources to see how to implement other types if necessary”.

SAP HANA’s R client is also implemented in C++, but it is more complicated than the original C++ interface. In theory, you can implement a client in any language that supports TCP/IP.



(3)      Message-oriented communication protocol: QAP1

Rserve uses QAP1 (quad attributes protocol v1) to communicate with clients. Under QAP1, the client sends a message first, containing a specific command and any related data, and then waits for the response message from the server. The response message contains the response code and the result data. Each message consists of a 16-byte header followed by a data portion. The structure of the header is as follows:


             Offset    Type     Meaning
             [0]       (int)    request command or response code
             [4]       (int)    length of the message data (bits 0 to 31)
             [8]       (int)    offset of the data part
             [12]      (int)    length of the message data (bits 32 to 63)
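The header layout above can be sketched in C as follows. This is only a minimal illustration, not code taken from Rserve: the struct qap1_header and the functions qap1_header_pack and qap1_header_unpack are hypothetical names, and the sketch assumes that the integers are transferred in little-endian byte order, as the Rserve documentation describes.

```c
#include <stdint.h>

#define QAP1_HEADER_SIZE 16

/* Hypothetical in-memory view of the header; field names are illustrative. */
typedef struct {
    uint32_t cmd;          /* offset 0: request command or response code */
    uint64_t length;       /* offsets 4 and 12: low and high 32 bits */
    uint32_t data_offset;  /* offset 8: offset of the data part */
} qap1_header;

/* Store a 32-bit value in little-endian byte order (assumed wire order). */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Load a 32-bit little-endian value. */
static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write the 16 header bytes into out (must hold QAP1_HEADER_SIZE bytes). */
void qap1_header_pack(const qap1_header *h, unsigned char *out)
{
    put_u32_le(out, h->cmd);
    put_u32_le(out + 4, (uint32_t)(h->length & 0xffffffffu));
    put_u32_le(out + 8, h->data_offset);
    put_u32_le(out + 12, (uint32_t)(h->length >> 32));
}

/* Read the 16 header bytes from in back into the struct. */
void qap1_header_unpack(const unsigned char *in, qap1_header *h)
{
    h->cmd = get_u32_le(in);
    h->length = (uint64_t)get_u32_le(in + 4) |
                ((uint64_t)get_u32_le(in + 12) << 32);
    h->data_offset = get_u32_le(in + 8);
}
```

Writing the bytes one by one, instead of copying the struct to the socket, keeps the sketch independent of the host's byte order and struct padding.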


The data portion of the message may contain additional parameters of type DT_INT, DT_STRING or other types. See Rsrv.h for details.

Here are some of the commands that Rserve supports:
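To make the parameter encoding more concrete, here is a rough sketch of how a DT_INT and a DT_STRING parameter could be written into the data portion. It rests on several assumptions that should be checked against Rsrv.h: each parameter starts with a 4-byte header (one type byte followed by a 24-bit little-endian length), DT_INT has the value 1, DT_STRING has the value 4, and strings are zero-terminated and padded to a multiple of 4 bytes. The function names encode_dt_int and encode_dt_string are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Type codes assumed from Rsrv.h; verify them against your copy. */
enum { DT_INT = 1, DT_STRING = 4 };

/* Assumed parameter header: type in byte 0, 24-bit length in bytes 1..3. */
static void put_par_header(unsigned char *p, int type, uint32_t len)
{
    p[0] = (unsigned char)type;
    p[1] = (unsigned char)(len & 0xff);
    p[2] = (unsigned char)((len >> 8) & 0xff);
    p[3] = (unsigned char)((len >> 16) & 0xff);
}

/* Encode one DT_INT parameter; returns bytes written, or 0 if it does not fit. */
size_t encode_dt_int(unsigned char *out, size_t cap, int32_t value)
{
    uint32_t v = (uint32_t)value;
    if (cap < 8)
        return 0;
    put_par_header(out, DT_INT, 4);
    out[4] = (unsigned char)(v & 0xff);
    out[5] = (unsigned char)((v >> 8) & 0xff);
    out[6] = (unsigned char)((v >> 16) & 0xff);
    out[7] = (unsigned char)((v >> 24) & 0xff);
    return 8;
}

/* Encode one DT_STRING parameter; returns bytes written, or 0 if it does not fit. */
size_t encode_dt_string(unsigned char *out, size_t cap, const char *s)
{
    size_t len = strlen(s) + 1;               /* include the terminating zero */
    size_t padded = (len + 3) & ~(size_t)3;   /* round up to a multiple of 4 */
    if (padded > 0xffffff || cap < 4 + padded)
        return 0;
    put_par_header(out, DT_STRING, (uint32_t)padded);
    memset(out + 4, 0, padded);               /* zero the padding bytes */
    memcpy(out + 4, s, len);
    return 4 + padded;
}
```

For example, under these assumptions the string "1+1" takes 8 bytes in total: 4 bytes of parameter header and 4 bytes for the characters plus the terminating zero.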

command              parameters            | response data

CMD_login            DT_STRING             | -
CMD_voidEval         DT_STRING             | -
CMD_eval             DT_STRING or DT_SEXP  | DT_SEXP
CMD_shutdown         [DT_STRING]           | -
CMD_openFile         DT_STRING             | -
CMD_createFile       DT_STRING             | -
CMD_closeFile        -                     | -
CMD_readFile         [DT_INT]              | DT_BYTESTREAM
CMD_writeFile        DT_BYTESTREAM         | -
CMD_removeFile       DT_STRING             | -
CMD_setSEXP          DT_STRING, DT_SEXP    | -
CMD_assignSEXP       DT_STRING, DT_SEXP    | -
CMD_setBufferSize    DT_INT                | -
CMD_setEncoding      DT_STRING             | - (since 0.5-3)

since 0.6:

CMD_ctrlEval         DT_STRING             | -
CMD_ctrlSource       DT_STRING             | -
CMD_ctrlShutdown     -                     | -

since 1.7:

CMD_switch           DT_STRING             | -
CMD_keyReq           DT_STRING             | DT_BYTESTREAM
CMD_secLogin         DT_BYTESTREAM         | -
CMD_OCcall           DT_SEXP               | DT_SEXP

The most commonly used command is CMD_eval. It receives a piece of R code, parses it, executes it, and sends the result back in the response message.
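As an illustration, a client could assemble a complete CMD_eval request like this: a 16-byte header followed by one DT_STRING parameter holding the R code. This is only a sketch under assumptions that should be verified against Rsrv.h: CMD_eval is 0x003, DT_STRING is 4, a successful response carries the code RESP_OK (0x10001) in the low 24 bits of the first header field, and all integers are little-endian. The names build_eval_request and response_is_ok are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values assumed from Rsrv.h; verify them against your copy. */
#define CMD_eval   0x003
#define DT_STRING  4
#define RESP_OK    0x10001

/* Store a 32-bit value in little-endian byte order (assumed wire order). */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Build header + DT_STRING parameter for r_code.
   Returns the total message size, or 0 if the buffer is too small. */
size_t build_eval_request(unsigned char *out, size_t cap, const char *r_code)
{
    size_t slen = strlen(r_code) + 1;          /* include the terminating zero */
    size_t padded = (slen + 3) & ~(size_t)3;   /* round up to a multiple of 4 */
    size_t body = 4 + padded;                  /* parameter header + string */
    size_t total = 16 + body;

    if (padded > 0xffffff || total > cap)
        return 0;
    memset(out, 0, total);                     /* offsets 8 and 12 stay zero */
    put_u32_le(out, CMD_eval);                 /* offset 0: command */
    put_u32_le(out + 4, (uint32_t)body);       /* offset 4: data length */
    out[16] = DT_STRING;                       /* parameter type */
    out[17] = (unsigned char)(padded & 0xff);  /* 24-bit parameter length */
    out[18] = (unsigned char)((padded >> 8) & 0xff);
    out[19] = (unsigned char)((padded >> 16) & 0xff);
    memcpy(out + 20, r_code, slen);
    return total;
}

/* Check whether a 16-byte response header reports success. */
int response_is_ok(const unsigned char *header)
{
    uint32_t cmd = (uint32_t)header[0] | ((uint32_t)header[1] << 8) |
                   ((uint32_t)header[2] << 16);
    return cmd == RESP_OK;
}
```

A client would send the bytes produced by build_eval_request over the TCP connection, read the 16-byte response header, and then read the DT_SEXP result from the data portion if response_is_ok returns non-zero.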

In principle, SAP HANA could run an embedded R program directly, which would be simpler and more efficient. However, this is not done because of the copyright issues of open source software.

That’s it for now. In the following blogs I will introduce the operating mechanism of Rserve and how SAP HANA communicates with it. Knowing these principles in detail should help you write better R procedures.


