Skip to Content

在上一篇文章中,笔者介绍了SAP HANA与R语言整合的技术基础(http://scn.sap.com/community/chinese/hana/blog/2014/04/16/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3sap-hana%E4%B8%8Er%E6%95%B…), 包括嵌入式R代码的执行,Rserve,及Rserve与客户端通信的基于消息的协议QAP1.


在本篇文章中,笔者将通过对日志的分析来跟踪SAP HANA与R的通信过程。


(一) 开启动rclient的DEBUG跟踪


   开启rclient的DEBUG跟踪,这样能方便跟踪SAP HANA与R之间的通信。

   打开SAP HANA Studio,选择Trace Configuration标签。


/wp-content/uploads/2014/06/trace_470442.png

然后点击Database Trace最右边的编辑按钮,进入编辑对话框。

/wp-content/uploads/2014/06/debug_470443.png

在rclient后面的System Trace Level 下面选择DEBUG


(二)查看Trace日志信息

     这里仍然以http://scn.sap.com/community/chinese/hana/blog/2014/02/14/r%E8%AF%AD%E8%A8%80%E5%8C%85%E5%AE%89%E8%A3%85%E5%B9%B6%E5%AE%…这篇日志中介绍的求质数的平方的R procedure为例来说明。

/wp-content/uploads/2014/06/4_470458.png

执行这个procedure之后,在SAP HANA Studio的管理视图中切换到Diagnosis Files标签,选择indexserver_(主机名).trc这个文件并打开。

能看到类似下面这样的日志信息,下面将简单的介绍一下它们的含义。


[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547096 i RClient          RClient.cpp(00789) : RClient::execute: start.

//指示R代码开始执行

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547145 d RClient          RClient.cpp(00802) : RClientExecutionType is 0.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547173 d RClient          Rconnection.cpp(00099) : Rconnection::connect()

//执行Rconnection,连接到Rserve服务器端

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.552481 i RClient          Rconnection.cpp(00550) : get result of name ‘rservepid’.

//接下来的这些日志是为了获取Rserve的pid, 因为目前的设计上, SAP HANA与Rserve之间的通信不支持标量直接传输,如果要传输标量,也需要把它封装到一个dataframe里面,至于为什么这么设计,笔者现在也不太清楚。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591262 d RClient          Rconnection.cpp(00450) : EVAL: Parse a data frame

//解析一个data frame,包含pid

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591308 d RClient          Rexp.cpp(00596) : header type of back column is: 32

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591321 d RClient          Rexp.cpp(00525) : enter getRowNum

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591326 d RClient          Rconnection.h(00547) : in the getLength() and length is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591331 d RClient          Rexp.cpp(00530) : this column type is 32and row number is 1.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591337 d RClient          Rexp.cpp(00532) : rowNum of df is 1.

//因为一个pid是一个整数,所以封装到一个一行一列的data frame中。 具体的获取方式rservepid <- data.frame(A=Sys.getpid()) , 然后这个结果被回传给RClient

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591343 d RClient          Rconnection.h(00547) : in the getLength() and length is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591348 d RClient          Rconnection.cpp(00559) : itsRServePid: 3905

//获取到的pid是3905

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591378 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591393 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591431 d RClient          Rmessage.cpp(00252) : data size is not large.

//下面开始传输数据,从SAP HANA到Rserve

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672147 i RClient          RClient.cpp(00838) : RClient::transfer data to R: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672165 d RClient          RClient.cpp(00870) : input 1 will transfer with RDataframe!

//通过data frame进行传输

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672173 d RClient          RClient.cpp(00871) : numRows: 8, numColumns: 1.

//对于笔者当前PRIME表,只有1列,8行,所以numColumns=1,numRows=8,

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672188 d RClient          RClient.cpp(00966) : before constructed the RDataframeOnTCPIP.18

//传输的方式, 并行的读取内部表,然后拼装成data frame

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672210 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672215 d RClient          Rconnection.h(00547) : in the getLength() and length is 2

//并行读取internal table

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672227 d RClient          RClient.cpp(01057) : parallel reading itab

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672435 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672467 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672472 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672476 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672481 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672485 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672489 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672493 d RClient          Rconnection.h(00547) : in the getLength() and length is 8

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672592 i RClient          RClient.cpp(01102) : Prepare dataframe sending to R takes: 0ms

//data frame拼装好,准备发送给Rserve, 这个过程可以看出还是非常快的。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672608 d RClient          RClient.cpp(01126) : tmpNumberOfInput is 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672617 i RClient          RClient.cpp(01130) : assign input1 as ‘input1’.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672628 d RClient          Rmessage.cpp(00252) : data size is not large.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711281 d RClient          RClient.cpp(01145) : RClient: assign finished, itab deleted

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711292 i RClient          RClient.cpp(01146) : Sending a dataframe to R takes: 39ms

//把data frame发送到R

//这个时候,在嵌入式R的服务端,data frame的赋值已经完成。

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711300 d RClient          RClient.cpp(01322) : numITab = 1

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711306 i RClient          RClient.cpp(01325) : RClient::transfer data to R: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711310 i RClient          RClient.cpp(01326) : whole data transfer to R takes 39 ms

//数据传输时间 39ms

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711316 d RClient          RClient.cpp(01363) : RClient::execute: R script is executed. RClientExecutionType is 0.

//R代码

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711324 d RClient          RClient.cpp(01417) : Before R script Execution. Script: “””

  result<-as.data.frame(input1$NUMBER^2);

  names(result)<-c(“NUMBER”);

“””.

//执行开始

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711331 i RClient          RserveConnection.cpp(00352) : R-script execution: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836391 i RClient          RserveConnection.cpp(00383) : R-script execution: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836403 i RClient          RserveConnection.cpp(00384) : R-script execution takes 125 ms.

//R代码执行时间125ms.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836411 d RClient          RClient.cpp(01429) : After R script Execution.

//执行完成,结果通过TCPIP 传输

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836416 d RClient          RClient.cpp(01453) : RClient::execute is done and is TCPIP solution

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836424 d RClient          RClient.cpp(01465) : RClient::execute: data is read from SHM. RClientExecutionType is 0.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836432 i RClient          RClient.cpp(01468) : RClient::transfer data from R: start.

//这是结果的内存区域id

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836449 d RClient          RClient.cpp(01501) : Result shmId: 18446744073709551614.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836459 d RClient          RClient.cpp(01513) : Using TCPIP to construct result.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836464 d RClient          RserveConnection.cpp(00455) : RserveConnection::getResultForTCPIP.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836468 i RClient          RserveConnection.cpp(00457) : get result of name ‘result’.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875237 d RClient          Rconnection.cpp(00450) : EVAL: Parse a data frame

//R代码的执行结果是一个data frame,把这个data frame转换成一个table

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875287 d RClient          Rexp.cpp(00596) : header type of back column is: 33

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875306 d RClient          Rexp.cpp(00525) : enter getRowNum

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875315 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875323 d RClient          Rexp.cpp(00530) : this column type is 33and row number is 8.

//结果是一个单列,8行的结果表

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875329 d RClient          Rexp.cpp(00532) : rowNum of df is 8.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875565 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875598 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875604 d RClient          RClient.cpp(00604) : value of double is 4

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875635 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875640 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875644 d RClient          RClient.cpp(00604) : value of double is 9

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875654 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875658 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875662 d RClient          RClient.cpp(00604) : value of double is 25

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875669 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875674 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875677 d RClient          RClient.cpp(00604) : value of double is 49

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875684 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875689 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875692 d RClient          RClient.cpp(00604) : value of double is 121

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875700 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875704 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875708 d RClient          RClient.cpp(00604) : value of double is 169

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875715 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875719 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875723 d RClient          RClient.cpp(00604) : value of double is 361

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875730 d RClient          Rconnection.h(00436) : in the getLength() and length is 8

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875734 d RClient          RClient.cpp(00603) : class is double.

[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875738 d RClient          RClient.cpp(00604) : value of double is 529

//求质数平方的结果

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875892 i RClient          RClient.cpp(01924) : RClient::transfer data from R: start.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875903 i RClient          RClient.cpp(01925) : whole datatransfer from R takes 39 ms.

//结果从R serve传回SAP HANA

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875910 d RClient          Rconnection.cpp(00335) : Rconnection::disconnect()

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875950 i RClient          RClient.cpp(01995) : RClient::execute: stop.

[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875956 i RClient          RClient.cpp(01996) : whole R-Op takes 329 ms.

// 执行完毕,断开连接,总共耗时329ms.

     通过对这个执行过程的分析,读者已经清楚SAP HANA与R整合的内部原理,从中可以看出,为了充分发挥整合的性能,在可能的情况下,应该尽量的减少SAP HANA与R之间的数据传输开销。

[:本文的测试案例所使用的SAP HANA版本为SAP HANA SPS7 Revision 70.00]


想获取更多SAP HANA学习资料或有任何疑问,请关注新浪微博@HANAGeek

!我们欢迎你的加入!


转载本文章请注明作者和出处

http://scn.sap.com/community/chinese/hana/blog/2014/06/09/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3sap-hana%E4%B8%8Er%E6%95%B…

请勿用于任何商业用途。

To report this post you need to login first.

1 Comment

You must be Logged on to comment or reply to a post.

Leave a Reply