Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
Former Member
0 Kudos

在上一篇文章中,笔者介绍了SAP HANA与R语言整合的技术基础(http://scn.sap.com/community/chinese/hana/blog/2014/04/16/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3sap-ha...), 包括嵌入式R代码的执行,Rserve,及Rserve与客户端通信的基于消息的协议QAP1.


在本篇文章中,笔者将通过对日志的分析来跟踪SAP HANA与R的通信过程。


(一) 开启动rclient的DEBUG跟踪


   开启rclient的DEBUG跟踪,这样能方便跟踪SAP HANA与R之间的通信。

   打开SAP HANA Studio,选择Trace Configuration标签。


然后点击Database Trace最右边的编辑按钮,进入编辑对话框。

在rclient后面的System Trace Level 下面选择DEBUG


(二)查看Trace日志信息

     这里仍然以http://scn.sap.com/community/chinese/hana/blog/2014/02/14/r%E8%AF%AD%E8%A8%80%E5%8C%85%E5%AE%89%E8%A...这篇日志中介绍的求质数的平方的R procedure为例来说明。

执行这个procedure之后,在SAP HANA Studio的管理视图中切换到Diagnosis Files标签,选择indexserver_(主机名).trc这个文件并打开。

能看到类似下面这样的日志信息,下面将简单的介绍一下它们的含义。


[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547096 i RClient          RClient.cpp(00789) : RClient::execute: start.

//指示R代码开始执行

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547145 d RClient          RClient.cpp(00802) : RClientExecutionType is 0.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547173 d RClient          Rconnection.cpp(00099) : Rconnection::connect()

//执行Rconnection,连接到Rserve服务器端

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.552481 i RClient          Rconnection.cpp(00550) : get result of name 'rservepid'.

//接下来的这些日志是为了获取Rserve的pid, 因为目前的设计上, SAP HANA与Rserve之间的通信不支持标量直接传输,如果要传输标量,也需要把它封装到一个dataframe里面,至于为什么这么设计,笔者现在也不太清楚。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591262 d RClient          Rconnection.cpp(00450) : EVAL: Parse a data frame

//解析一个data frame,包含pid

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591308 d RClient          Rexp.cpp(00596) : header type of back column is: 32

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591321 d RClient          Rexp.cpp(00525) : enter getRowNum

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591326 d RClient          Rconnection.h(00547) : in the getLength() and length is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591331 d RClient          Rexp.cpp(00530) : this column type is 32and row number is 1.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591337 d RClient          Rexp.cpp(00532) : rowNum of df is 1.

//因为一个pid是一个整数,所以封装到一个一行一列的data frame中。 具体的获取方式rservepid <- data.frame(A=Sys.getpid()) , 然后这个结果被回传给RClient

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591343 d RClient          Rconnection.h(00547) : in the getLength() and length is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591348 d RClient          Rconnection.cpp(00559) : itsRServePid: 3905

//获取到的pid是3905

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591378 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591393 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591431 d RClient          Rmessage.cpp(00252) : data size is not large.

//下面开始传输数据,从SAP HANA到Rserve

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672147 i RClient          RClient.cpp(00838) : RClient::transfer data to R: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672165 d RClient          RClient.cpp(00870) : input 1 will transfer with RDataframe!

//通过data frame进行传输

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672173 d RClient          RClient.cpp(00871) : numRows: 8, numColumns: 1.

//对于笔者当前PRIME表,只有1列,8行,所以numColumns=1,numRows=8,

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672188 d RClient          RClient.cpp(00966) : before constructed the RDataframeOnTCPIP.18

//传输的方式, 并行的读取内部表,然后拼装成data frame

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672210 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672215 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

//并行读取internal table

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672227 d RClient          RClient.cpp(01057) : parallel reading itab

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672435 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672467 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672472 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672476 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672481 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672485 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672489 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672493 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672592 i RClient          RClient.cpp(01102) : Prepare dataframe sending to R takes: 0ms

//data frame拼装好,准备发送给Rserve, 这个过程可以看出还是非常快的。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672608 d RClient          RClient.cpp(01126) : tmpNumberOfInput is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672617 i RClient          RClient.cpp(01130) : assign input1 as 'input1'.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672628 d RClient          Rmessage.cpp(00252) : data size is not large.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711281 d RClient          RClient.cpp(01145) : RClient: assign finished, itab deleted

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711292 i RClient          RClient.cpp(01146) : Sending a dataframe to R takes: 39ms

//把data frame发送到R

//这个时候,在嵌入式R的服务端,data frame的赋值已经完成。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711300 d RClient          RClient.cpp(01322) : numITab = 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711306 i RClient          RClient.cpp(01325) : RClient::transfer data to R: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711310 i RClient          RClient.cpp(01326) : whole data transfer to R takes 39 ms

//数据传输时间 39ms

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711316 d RClient          RClient.cpp(01363) : RClient::execute: R script is executed. RClientExecutionType is 0.

//R代码

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711324 d RClient          RClient.cpp(01417) : Before R script Execution. Script: """

  result<-as.data.frame(input1$NUMBER^2);

  names(result)<-c("NUMBER");

""".

//执行开始

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711331 i RClient          RserveConnection.cpp(00352) : R-script execution: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836391 i RClient          RserveConnection.cpp(00383) : R-script execution: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836403 i RClient          RserveConnection.cpp(00384) : R-script execution takes 125 ms.

//R代码执行时间125ms.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836411 d RClient          RClient.cpp(01429) : After R script Execution.

//执行完成,结果通过TCPIP 传输

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836416 d RClient          RClient.cpp(01453) : RClient::execute is done and is TCPIP solution

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836424 d RClient          RClient.cpp(01465) : RClient::execute: data is read from SHM. RClientExecutionType is 0.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836432 i RClient          RClient.cpp(01468) : RClient::transfer data from R: start.

//这是结果的内存区域id

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836449 d RClient          RClient.cpp(01501) : Result shmId: 18446744073709551614.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836459 d RClient          RClient.cpp(01513) : Using TCPIP to construct result.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836464 d RClient          RserveConnection.cpp(00455) : RserveConnection::getResultForTCPIP.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836468 i RClient          RserveConnection.cpp(00457) : get result of name 'result'.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875237 d RClient          Rconnection.cpp(00450) : EVAL: Parse a data frame

//R代码的执行结果是一个data frame,把这个data frame转换成一个table

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875287 d RClient          Rexp.cpp(00596) : header type of back column is: 33

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875306 d RClient          Rexp.cpp(00525) : enter getRowNum

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875315 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875323 d RClient          Rexp.cpp(00530) : this column type is 33and row number is 8.

//结果是一个单列,8行的结果表

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875329 d RClient          Rexp.cpp(00532) : rowNum of df is 8.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875565 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875598 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875604 d RClient          RClient.cpp(00604) : value of double is 4

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875635 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875640 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875644 d RClient          RClient.cpp(00604) : value of double is 9

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875654 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875658 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875662 d RClient          RClient.cpp(00604) : value of double is 25

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875669 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875674 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875677 d RClient          RClient.cpp(00604) : value of double is 49

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875684 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875689 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875692 d RClient          RClient.cpp(00604) : value of double is 121

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875700 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875704 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875708 d RClient          RClient.cpp(00604) : value of double is 169

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875715 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875719 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875723 d RClient          RClient.cpp(00604) : value of double is 361

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875730 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875734 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875738 d RClient          RClient.cpp(00604) : value of double is 529


//求质数平方的结果

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875892 i RClient          RClient.cpp(01924) : RClient::transfer data from R: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875903 i RClient          RClient.cpp(01925) : whole datatransfer from R takes 39 ms.

//结果从R serve传回SAP HANA

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875910 d RClient          Rconnection.cpp(00335) : Rconnection::disconnect()

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875950 i RClient          RClient.cpp(01995) : RClient::execute: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875956 i RClient          RClient.cpp(01996) : whole R-Op takes 329 ms.

// 执行完毕,断开连接,总共耗时329ms.

     通过对这个执行过程的分析,读者已经清楚SAP HANA与R整合的内部原理,从中可以看出,为了充分发挥整合的性能,在可能的情况下,应该尽量的减少SAP HANA与R之间的数据传输开销。

[:本文的测试案例所使用的SAP HANA版本为SAP HANA SPS7 Revision 70.00]


想获取更多SAP HANA学习资料或有任何疑问,请关注新浪微博@HANAGeek

!我们欢迎你的加入!


转载本文章请注明作者和出处

http://scn.sap.com/community/chinese/hana/blog/2014/06/09/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3sap-ha...

请勿用于任何商业用途。

1 Comment