Custom R Component creation was one of the features included in PA 1.0.11. It lets users create components from their own R scripts, and it was highly appreciated because it adds a lot of value to the integration workflows of PA with R. This post will not cover the "how to" of the feature; for details, see http://scn.sap.com/docs/DOC-42862 and http://scn.sap.com/docs/DOC-42739

The generated custom components can be used directly in PA analysis workflows, just like any other PA component. Internally, though, the flow of such an integrated component is not simple. PA has an underlying framework that does much of the work of setting up the execution environment for the component. The major jobs done by the framework are:

1. Fetching the input data from the parent component.

2. Passing the output data from the R environment to the next components of the analysis.

Then comes the component itself, which is responsible for instantiating the R environment, preparing the scripts, and executing them. The beauty of this feature is that all of this complexity is hidden from the user.

The problem is that the component doesn't control the input data to the R script, since it is fetched through the framework. In addition, creating a component that needs input from two tables/datasets gets a little tricky. The solution is to let the component itself select and load any additional input it needs, independently.

The obvious objection is: how can a component get two inputs when PA doesn't allow it yet? (I am being a little cautious here, as PA might include this feature sooner or later.) The counter is that this requirement can arise even without multiple parent components in the analysis. A classic example is a KNN component, which takes a training dataset and scores an entirely different dataset.

I will suggest a simple solution for creating a component with multiple input tables. One of the inputs is read from the parent as usual; in the samples below, that is the scoring input. The other input (the training dataset) is loaded into the R execution environment separately, using any of R's data-loading constructs.

A sample script that reads the second input from a database table would look like this:

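knnfunction <- function(dataFrame, dsn, user, pass, tablename) {
  library(RODBC)
  library(class)

  # Load the training data (the second input) from the database table
  conn <- odbcConnect(dsn, uid = user, pwd = pass)
  on.exit(odbcClose(conn))  # release the connection when the function exits
  sqlstatement <- paste("SELECT * FROM", tablename)
  traindata <- sqlQuery(conn, sqlstatement, believeNRows = FALSE)

  # Class labels for the training rows; this hard-coded vector assumes
  # 75 training rows in three groups of 25
  cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

  # Score the data from the parent (dataFrame) against the training data
  knnout <- knn(traindata, dataFrame, cl, k = 3, prob = FALSE)
  return(list(out = knnout))
}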

This script can be used as is in the custom R Component wizard to create a new component. Here knnfunction is the name of the function, and the parameters of the component are the data frame (the input from the parent), the DSN name with its username and password, and finally the table from which to read the training data. These become the user-given properties of the component once the component is created.
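
Because the table name arrives as a user-given property and is pasted straight into the SQL text, it may be worth checking it before the query runs. The helper below is only a sketch: build_select is a hypothetical name (it is not part of RODBC or PA), and the pattern it accepts (a plain identifier, optionally prefixed with a schema) is an assumption that may be too strict or too loose for a particular database.

```r
# Hypothetical helper: build the SELECT statement only for plain
# table names of the form NAME or SCHEMA.NAME (assumed naming rule).
build_select <- function(tablename) {
  ok <- is.character(tablename) && length(tablename) == 1 &&
    grepl("^[A-Za-z_][A-Za-z0-9_]*(\\.[A-Za-z_][A-Za-z0-9_]*)?$", tablename)
  if (!ok) stop("unexpected table name: ", tablename)
  paste("SELECT * FROM", tablename)
}

# Example with a made-up table name
sqlstatement <- build_select("SCHEMA1.TRAIN")
```

Inside knnfunction, the result would simply take the place of the paste() call.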

A similar sample script with a CSV file as the second input would look like this:

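knnfunction <- function(input, filepath) {
  library(class)

  # Load the training data (the second input) from the CSV file
  traindata <- read.csv(filepath)

  # Class labels for the training rows; this hard-coded vector assumes
  # 75 training rows in three groups of 25
  cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))

  # Score the data from the parent (input) against the training data
  knnout <- knn(traindata, input, cl, k = 3, prob = FALSE)
  return(list(out = knnout))
}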

Here the input parameter is the data frame from the parent and filepath is the location of the CSV file.
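
Before wrapping the CSV variant in a component, it can be tried in a plain R session. The sketch below restates the function so that it runs on its own and uses the built-in iris data as a stand-in for real inputs; the row split, the temporary file, and the mapping of "s", "c", "v" to the three species are assumptions made only for this illustration.

```r
library(class)

# The CSV variant from the post, restated so this block is self-contained
knnfunction <- function(input, filepath) {
  traindata <- read.csv(filepath)
  cl <- factor(c(rep("s", 25), rep("c", 25), rep("v", 25)))
  knnout <- knn(traindata, input, cl, k = 3, prob = FALSE)
  return(list(out = knnout))
}

# Assumption: 25 rows per species for training, the other 25 for scoring
train_rows <- c(1:25, 51:75, 101:125)
test_rows  <- c(26:50, 76:100, 126:150)

# Write the four numeric training columns to a temporary CSV file
trainfile <- tempfile(fileext = ".csv")
write.csv(iris[train_rows, 1:4], trainfile, row.names = FALSE)

# The remaining rows play the role of the data frame from the parent
result <- knnfunction(iris[test_rows, 1:4], trainfile)
table(result$out)  # predicted class counts
```

The training rows are written in the order setosa, versicolor, virginica so that they line up with the hard-coded label vector.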

Similar methods can be used to read any number of additional inputs, from any source supported by R.
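
As one variation on the same idea, the class labels can come from the additional input too, instead of being hard-coded. The sketch below assumes the training file carries its labels in a named column; knnlabelled, labelcolumn, and k are hypothetical names introduced here, and iris again stands in for real data.

```r
library(class)

# Hypothetical variant: labelcolumn names the class column in the training file
knnlabelled <- function(input, filepath, labelcolumn, k = 3) {
  traindata <- read.csv(filepath, stringsAsFactors = FALSE)
  if (!labelcolumn %in% names(traindata)) {
    stop("label column not found: ", labelcolumn)
  }
  cl <- factor(traindata[[labelcolumn]])
  features <- setdiff(names(traindata), labelcolumn)
  # Use the same feature columns, in the same order, on both sides
  knnout <- knn(traindata[, features, drop = FALSE],
                input[, features, drop = FALSE], cl, k = k)
  return(list(out = knnout))
}

# Assumption: every other iris row is training data, the rest is scored
train_rows <- seq(1, 150, by = 2)
trainfile <- tempfile(fileext = ".csv")
write.csv(iris[train_rows, ], trainfile, row.names = FALSE)
result <- knnlabelled(iris[-train_rows, 1:4], trainfile, "Species")
```

With this shape the training set no longer has to contain exactly 75 rows in a fixed order, at the cost of one more property for the user to fill in.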

1 Comment

  1. Henry Banks

    great stuff, can’t believe this slipped my attention on the first pass!

    this deserves a *bump* because a multi-input component that can enrich  PA with a second source is  ‘golden’!

    me too, I hope this feature comes soon in the standard.

    regards,

    H
