在上一篇文章中,笔者介绍了SAP HANA与R语言整合的技术基础(http://scn.sap.com/community/chinese/hana/blog/2014/04/16/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3sap-ha...), 包括嵌入式R代码的执行,Rserve,及Rserve与客户端通信的基于消息的协议QAP1.
在本篇文章中,笔者将通过对日志的分析来跟踪SAP HANA与R的通信过程。
(一) 开启动rclient的DEBUG跟踪
开启rclient的DEBUG跟踪,这样能方便跟踪SAP HANA与R之间的通信。
打开SAP HANA Studio,选择Trace Configuration标签。
然后点击Database Trace最右边的编辑按钮,进入编辑对话框。
在rclient后面的System Trace Level 下面选择DEBUG
(二)查看Trace日志信息
这里仍然以http://scn.sap.com/community/chinese/hana/blog/2014/02/14/r%E8%AF%AD%E8%A8%80%E5%8C%85%E5%AE%89%E8%A...这篇日志中介绍的求质数的平方的R procedure为例来说明。
执行这个procedure之后,在SAP HANA Studio的管理视图中切换到Diagnosis Files标签,选择indexserver_(主机名).trc这个文件并打开。
能看到类似下面这样的日志信息,下面将简单的介绍一下它们的含义。
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547096 i RClient RClient.cpp(00789) : RClient::execute: start.
//指示R代码开始执行
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547145 d RClient RClient.cpp(00802) : RClientExecutionType is 0.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.547173 d RClient Rconnection.cpp(00099) : Rconnection::connect()
//执行Rconnection,连接到Rserve服务器端
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.552481 i RClient Rconnection.cpp(00550) : get result of name 'rservepid'.
//接下来的这些日志是为了获取Rserve的pid, 因为目前的设计上, SAP HANA与Rserve之间的通信不支持标量直接传输,如果要传输标量,也需要把它封装到一个dataframe里面,至于为什么这么设计,笔者现在也不太清楚。
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591262 d RClient Rconnection.cpp(00450) : EVAL: Parse a data frame
//解析一个data frame,包含pid
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591308 d RClient Rexp.cpp(00596) : header type of back column is: 32
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591321 d RClient Rexp.cpp(00525) : enter getRowNum
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591326 d RClient Rconnection.h(00547) : in the getLength() and length is 1
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591331 d RClient Rexp.cpp(00530) : this column type is 32and row number is 1.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591337 d RClient Rexp.cpp(00532) : rowNum of df is 1.
//因为一个pid是一个整数,所以封装到一个一行一列的data frame中。 具体的获取方式rservepid <- data.frame(A=Sys.getpid()) , 然后这个结果被回传给RClient
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591343 d RClient Rconnection.h(00547) : in the getLength() and length is 1
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591348 d RClient Rconnection.cpp(00559) : itsRServePid: 3905
//获取到的pid是3905
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591378 d RClient Rconnection.h(00547) : in the getLength() and length is 2
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591393 d RClient Rconnection.h(00547) : in the getLength() and length is 2
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.591431 d RClient Rmessage.cpp(00252) : data size is not large.
//下面开始传输数据,从SAP HANA到Rserve
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672147 i RClient RClient.cpp(00838) : RClient::transfer data to R: start.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672165 d RClient RClient.cpp(00870) : input 1 will transfer with RDataframe!
//通过data frame进行传输
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672173 d RClient RClient.cpp(00871) : numRows: 8, numColumns: 1.
//对于笔者当前PRIME表,只有1列,8行,所以numColumns=1,numRows=8,
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672188 d RClient RClient.cpp(00966) : before constructed the RDataframeOnTCPIP.18
//传输的方式, 并行的读取内部表,然后拼装成data frame
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672210 d RClient Rconnection.h(00547) : in the getLength() and length is 2
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672215 d RClient Rconnection.h(00547) : in the getLength() and length is 2
//并行读取internal table
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672227 d RClient RClient.cpp(01057) : parallel reading itab
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672435 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672467 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672472 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672476 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672481 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672485 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672489 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[1884]{-1}[36/15884785] 2014-06-09 14:35:43.672493 d RClient Rconnection.h(00547) : in the getLength() and length is 8
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672592 i RClient RClient.cpp(01102) : Prepare dataframe sending to R takes: 0ms
//data frame拼装好,准备发送给Rserve, 这个过程可以看出还是非常快的。
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672608 d RClient RClient.cpp(01126) : tmpNumberOfInput is 1
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672617 i RClient RClient.cpp(01130) : assign input1 as 'input1'.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.672628 d RClient Rmessage.cpp(00252) : data size is not large.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711281 d RClient RClient.cpp(01145) : RClient: assign finished, itab deleted
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711292 i RClient RClient.cpp(01146) : Sending a dataframe to R takes: 39ms
//把data frame发送到R
//这个时候,在嵌入式R的服务端,data frame的赋值已经完成。
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711300 d RClient RClient.cpp(01322) : numITab = 1
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711306 i RClient RClient.cpp(01325) : RClient::transfer data to R: stop.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711310 i RClient RClient.cpp(01326) : whole data transfer to R takes 39 ms
//数据传输时间 39ms
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711316 d RClient RClient.cpp(01363) : RClient::execute: R script is executed. RClientExecutionType is 0.
//R代码
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711324 d RClient RClient.cpp(01417) : Before R script Execution. Script: """
result<-as.data.frame(input1$NUMBER^2);
names(result)<-c("NUMBER");
""".
//执行开始
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.711331 i RClient RserveConnection.cpp(00352) : R-script execution: start.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836391 i RClient RserveConnection.cpp(00383) : R-script execution: stop.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836403 i RClient RserveConnection.cpp(00384) : R-script execution takes 125 ms.
//R代码执行时间125ms.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836411 d RClient RClient.cpp(01429) : After R script Execution.
//执行完成,结果通过TCPIP 传输
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836416 d RClient RClient.cpp(01453) : RClient::execute is done and is TCPIP solution
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836424 d RClient RClient.cpp(01465) : RClient::execute: data is read from SHM. RClientExecutionType is 0.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836432 i RClient RClient.cpp(01468) : RClient::transfer data from R: start.
//这是结果的内存区域id
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836449 d RClient RClient.cpp(01501) : Result shmId: 18446744073709551614.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836459 d RClient RClient.cpp(01513) : Using TCPIP to construct result.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836464 d RClient RserveConnection.cpp(00455) : RserveConnection::getResultForTCPIP.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.836468 i RClient RserveConnection.cpp(00457) : get result of name 'result'.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875237 d RClient Rconnection.cpp(00450) : EVAL: Parse a data frame
//R代码的执行结果是一个data frame,把这个data frame转换成一个table
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875287 d RClient Rexp.cpp(00596) : header type of back column is: 33
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875306 d RClient Rexp.cpp(00525) : enter getRowNum
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875315 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875323 d RClient Rexp.cpp(00530) : this column type is 33and row number is 8.
//结果是一个单列,8行的结果表
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875329 d RClient Rexp.cpp(00532) : rowNum of df is 8.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875565 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875598 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875604 d RClient RClient.cpp(00604) : value of double is 4
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875635 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875640 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875644 d RClient RClient.cpp(00604) : value of double is 9
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875654 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875658 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875662 d RClient RClient.cpp(00604) : value of double is 25
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875669 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875674 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875677 d RClient RClient.cpp(00604) : value of double is 49
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875684 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875689 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875692 d RClient RClient.cpp(00604) : value of double is 121
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875700 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875704 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875708 d RClient RClient.cpp(00604) : value of double is 169
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875715 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875719 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875723 d RClient RClient.cpp(00604) : value of double is 361
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875730 d RClient Rconnection.h(00436) : in the getLength() and length is 8
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875734 d RClient RClient.cpp(00603) : class is double.
[7063]{-1}[36/15884785] 2014-06-09 14:35:43.875738 d RClient RClient.cpp(00604) : value of double is 529
//求质数平方的结果
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875892 i RClient RClient.cpp(01924) : RClient::transfer data from R: start.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875903 i RClient RClient.cpp(01925) : whole datatransfer from R takes 39 ms.
//结果从R serve传回SAP HANA
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875910 d RClient Rconnection.cpp(00335) : Rconnection::disconnect()
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875950 i RClient RClient.cpp(01995) : RClient::execute: stop.
[6925]{-1}[-1/-1] 2014-06-09 14:35:43.875956 i RClient RClient.cpp(01996) : whole R-Op takes 329 ms.
// 执行完毕,断开连接,总共耗时329ms.
通过对这个执行过程的分析,读者已经清楚SAP HANA与R整合的内部原理,从中可以看出,为了充分发挥整合的性能,在可能的情况下,应该尽量的减少SAP HANA与R之间的数据传输开销。
[注:本文的测试案例所使用的SAP HANA版本为SAP HANA SPS7 Revision 70.00]
想获取更多SAP HANA学习资料或有任何疑问,请关注新浪微博@HANAGeek
!我们欢迎你的加入!
转载本文章请注明作者和出处
请勿用于任何商业用途。
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
34 | |
25 | |
12 | |
7 | |
7 | |
6 | |
6 | |
6 | |
5 | |
4 |