
Using SAP systems with a REST Web Service in SAP Data Intelligence

In this blog post I will share my experiences developing a fully connected RESTful web service.

A common request I receive is how to expose SAP data as an API. With SAP there are always multiple ways to achieve things. Here I have used SAP Data Intelligence and the OpenAPI operator, connected with live data and processed by an open source engine.

Similar techniques can be applied to the wide variety of sources that SAP Data Intelligence supports; we could be combining data from S/4HANA, SAP BW, SAP ECC, cloud storage, databases or data lakes.

Use Case

A number of customers have asked me how best to provide and process SAP data in an interactive way.

They wanted to connect to SAP data, set parameters and enrich the output with some intelligence. The output could then be consumed via a web app or a native (iOS or Android) mobile app.

Just to be clear, if the requirement is to expose a single source with no further processing, then SAP Data Intelligence would not be needed; SAP Gateway could be more suitable.

This scenario required data to be retrieved from the SAP system and processed by another operator, such as R, Python, Node.js, Go, or other Docker-based operators (for example C++). The results would then be exposed as an API.

Initially I sketched out Figure 1 to represent this concept.

The next step was to colour in the boxes, to understand what I could implement.


SAP Data Intelligence Implementation

Initially I came up with a pipeline that reflected Figure 2. You may think that my work was done here; actually, it was just beginning.

If you look closely at the arrows in the above diagram, you can see they only flow one way, from left to right, meaning the data only moves in that direction. The pipeline would therefore not receive input from the OpenAPI Servlow, so any user parameters would be ignored. Incoming requests would only reach the OpenAPI Servlow, and the response would be static.

Previously I had read Jens Rannacher’s great blog post on Building a RESTful Service with SAP Data Hub Pipelines, which I re-read to help get me started.

In the example pipeline referenced there, OpenAPI Greeter Demo, we can see a loop where the request is fed back into the server to create the response.

This example pipeline can be executed immediately; Jens describes how to call it with Postman and curl.

It’s a good start, but it clearly does not reflect Figure 1: there are no external systems and no processing with Python or R. To realise Figure 1, I mapped its boxes to concrete components:

  • SAP System = SAP HANA
  • Processing = R Client
  • OpenAPI Server = OpenAPI Servlow
  • Client = Postman
  • Controller = JavaScript

I took this pipeline and added Wiretaps to all ports so that I could understand exactly what is being passed where. I deliberately numbered the Wiretaps in my expected sequence; this makes tracing more intuitive. I added the SAP System (SAP HANA Client) to provide an interactive data source. My idea was that the HANA Client would receive a SQL statement dynamically generated by the Controller.

After inspecting the Wiretaps, I learnt how important the port type message is. A message allows separation of data (body) from metadata (header). Notice above how all output ports and connections are pink, meaning they are all messages. This allows the message header information to transfer between operators; the OpenAPI operator requires this header information.
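In the JavaScript operator, this header handling boils down to copying the incoming attributes to the response while filtering out the `openapi.header.*` entries that would interfere with downstream operators. A minimal standalone sketch (plain JavaScript, outside the Data Intelligence runtime; the attribute names follow the header dump shown below):

```javascript
// Sketch: copy message attributes to a response, dropping incoming
// "openapi.header.*" entries except the ones we explicitly want to keep
// (here "openapi.header.x-request-key", as in the controller code later on).
function copyAttributes(inattributes) {
    var out = {};
    for (var key in inattributes) {
        if (key.indexOf("openapi.header") < 0 || key === "openapi.header.x-request-key") {
            out[key] = inattributes[key];
        }
    }
    return out;
}
```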

The message header shows the following:

{"message.commit.token":"bced6483-2c2a-45bf-b351-7d1f2b94f89f",
"message.request.id":"bced6483-2c2a-45bf-b351-7d1f2b94f89f",
"openapi.graph.id":"bee2ab56e9a64d46904e238a264916bc",
"openapi.header.content-type":"application/json",
"openapi.host":"100.64.177.110:8090",
"openapi.method":"POST",
"openapi.remote_addr":"123.123.123.123",
"openapi.request_uri":"/dev/v1/hana",
"openapi.scheme":"https",
"openapi.subgraph.id":"default"}

The OpenAPI Servlow uses the “message.request.id” to uniquely identify each request.

Simple routing of requests is performed by the JavaScript Controller.  I have kept the /ping request for convenience of testing.

The code below receives a JSON input {"Rows":"2"} and in turn returns that number of rows from the source system (SAP HANA).

$.setPortCallback("input",onInput);
$.setPortCallback("hanain",hanaInput);

// ping count 
var count = 0;

function isByteArray(data) {
    return (typeof data === 'object' && Array.isArray(data) 
        && data.length > 0 && typeof data[0] === 'number')
}

function hanaInput(ctx, m) {
    // directly send the message from HANA as Response
    sendResponse(m, m, null);
}

function sendResponse(s, m, e) {
    if ($.output == null) {
        // invoke the callback directly
        $.sendResponse(s, m, e);
    } else {
        // let the subsequent operator decide what to do
        if (e !== null) {
            m.Attributes["message.response.error"] = e;
        }
        $.output(m);
    }
}

function onInput(ctx,s) {
    var msg = {};
    var inbody = s.Body;
    var inattributes = s.Attributes;
    
    // convert the body into string if it is bytes
    if (isByteArray(inbody)) {
        inbody = String.fromCharCode.apply(null, inbody);
    }

    // response message
    msg.Attributes = {};
    for (var key in inattributes) {
        // only copy the headers that won't interfere with the receiving operators
        if (key.indexOf("openapi.header") < 0 || key === "openapi.header.x-request-key") {
             msg.Attributes[key] = inattributes[key];
        }
    }

    // get the request path
    var reqpath = inattributes["openapi.request_uri"];
    // set header content-type
    msg.Attributes["openapi.header.content-type"] = "application/json";
    
    switch (reqpath) {
        case "/dev/v1/ping":
            msg.Body = {"pong": count++};
            sendResponse(s, msg, null);
            break;
        case "/dev/v1/hana":
            var data = JSON.parse(inbody);
            // check Rows has been received
            if ((data["Rows"]) === undefined) {
                sendResponse(s, msg, Error("Invalid Input, please specify Rows"));
            }
            // check if the value Rows is not a number
            else if ((isNaN(data["Rows"]) === true))  {
                sendResponse(s, msg, Error("Invalid Number "+ data["Rows"].toString()));
            } 
            else {
                // build SQL statement
                var sqlstatement = "SELECT TOP " + data["Rows"].toString() + " * FROM HDB.XRATE_GBP_EUR";
                msg.Body = sqlstatement;
                // send message to sql port for HANA
                $.sql(msg);
            }
            break;
        default:
            sendResponse(s, msg, Error("Unexpected operation at " + reqpath));
            break;
    }
}
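With the operator plumbing ($.sendResponse, $.sql) stripped out, the routing logic above can be exercised as a plain function, which is handy for unit testing before deploying the graph. A sketch under that assumption, reusing the same paths and table name:

```javascript
// Standalone sketch of the controller's routing (hypothetical helper,
// outside the Data Intelligence runtime): returns either a response body,
// a SQL statement to forward to HANA, or an error string.
function route(reqpath, inbody) {
    switch (reqpath) {
        case "/dev/v1/ping":
            return { body: { pong: 0 } };
        case "/dev/v1/hana":
            var data = JSON.parse(inbody);
            if (data["Rows"] === undefined) {
                return { error: "Invalid Input, please specify Rows" };
            }
            if (isNaN(data["Rows"])) {
                return { error: "Invalid Number " + data["Rows"].toString() };
            }
            return { sql: "SELECT TOP " + data["Rows"].toString() +
                          " * FROM HDB.XRATE_GBP_EUR" };
        default:
            return { error: "Unexpected operation at " + reqpath };
    }
}
```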

I tested the pipeline shown in Figure 5 with Postman to verify it behaves as expected.
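A web or mobile app consuming this service would issue the same POST request programmatically. A sketch in JavaScript, with a hypothetical helper and a placeholder host (buildRequest only assembles the request; the actual fetch call is shown commented out):

```javascript
// Sketch (hypothetical helper, placeholder host): assemble the POST request
// a browser or Node.js client would send to the /dev/v1/hana endpoint.
function buildRequest(host, rows) {
    return {
        url: "https://" + host + "/app/pipeline-modeler/openapi/service/dev/v1/hana",
        options: {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
                "X-Requested-With": "XMLHttpRequest"
            },
            body: JSON.stringify({ Rows: String(rows) })
        }
    };
}

// var req = buildRequest("<SAP-Data-Intelligence-Cluster.com>", 3);
// fetch(req.url, req.options).then(r => r.json()).then(console.log);
```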

The next challenge was to pass this data through R. Actually this is easy, but retaining the message headers was more challenging. The R operator needs to be configured to receive a message as input and send a message as output. To understand how R works in Data Intelligence I added a number of print(paste()) statements to my R code. The output of these statements can be found in the DI debug trace to verify what is happening.

Reviewing the R Client Operator, I learnt how the native Data Intelligence Go message is interpreted by R. The R representation is a list with the following fields: Body, Attributes and Encoding.

msg2df <- function(msg) {
    print(paste("msg2df Body Names ", names(msg$Body)))
    print(paste("msg2df Body Dollar ", msg$Body))
    body = msg$Body
    df=data.frame(t(sapply(body,c)))
    print(paste("msg2df Body DataFrame", df))
    return (df)
}

msgheader <- function(msg) {
    print(paste("msg2hed Attributes Names ", names(msg$Attributes)))
    print(paste("msg2hed Attributes Dollar ", msg$Attributes))
    header = list(msg$Attributes)
}

df2msg <- function(df, header) {
    attributes = header[[1]]
    print(paste("df2msg Attributes ", attributes))
    print(paste("df2msg Attributes Names ", names(attributes)))
    dfaslist <- setNames(split(df, seq(nrow(df))), rownames(df))
    msg <- list(Body=list(dfaslist), Attributes=attributes, Encoding="bar")
}

onInput <- function(msg) {
    df <- msg2df(msg)
    header <- msgheader(msg)
    outMsg <- df2msg(df,header)
    print(paste("Output Print ", outMsg))
    list(outData=outMsg)
}

api$setPortCallback(c("inData"), c("outData"), "onInput")

With the Wiretaps removed, the pipeline looks a lot simpler, but you lose the ability to verify what the internals are doing.

Performance Benchmarking using Command Line

After multiple tests with Postman, I wanted to validate the performance and scalability of the pipeline using other methods.

I tried three command-line utilities, and all worked as expected.

  • vctl
  • curl
  • ab

Gianluca published a great blog post on vctl.

vctl is the Data Intelligence System Management command-line client. Its http utility is officially deprecated, but it still works.

 vctl util http /app/pipeline-modeler/openapi/service/dev/v1/hana \
    -H Content-Type=application/json \
    -d '{"Rows":2}' \
    -X POST


To use curl, we can export the required command from Postman using its Code export feature.

curl -X POST \
  https://<SAP-Data-Intelligence-Cluster.com>/app/pipeline-modeler/openapi/service/dev/v1/hana \
  -H 'Authorization: Basic ZGVmYXVsdFx3ZGYwMTpXZWxjb21lMDE=' \
  -H 'Content-Type: application/json' \
  -H 'X-Requested-With: XMLHttpRequest' \
  -H 'cache-control: no-cache' \
  -d '{"Rows":"3"}'

Apache Bench (ab) has multiple options, including the number of requests and the concurrency. It also provides statistics and a summary of response times. Googling for ab and performance was the most difficult part.

ab -n 20 -c4 -s 10 -v3 \
 -H 'Authorization: Basic ZGVmYXVsdFx3ZGYwMTpXZWxjb21lMDE=' \
 -H 'X-Requested-With: XMLHttpRequest' \
 -T 'application/json' \
 -H 'cache-control: no-cache' \
 -p rows.json \
 https://<SAP-Data-Intelligence-Cluster.com>/app/pipeline-modeler/openapi/service/dev/v1/hana

Asking for verbose output (-v) displays the complete information including the headers, one of which, Server-Timing, gives us the vflow (pipeline execution) time.

Server-Timing →vflow;dur=90.69
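When aggregating results across many ab runs, it can be handy to pull that vflow duration out of the header programmatically. A small sketch (hypothetical helper, assuming the header format shown above):

```javascript
// Sketch: extract the vflow duration (milliseconds) from a Server-Timing
// header value such as "vflow;dur=90.69"; returns null if absent.
function vflowDuration(serverTiming) {
    var m = /vflow;dur=([0-9.]+)/.exec(serverTiming || "");
    return m ? parseFloat(m[1]) : null;
}
```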


Conclusion

In this blog post, I described how I processed SAP data with open source R and exposed it via a RESTful web service.

Connected data is now a common requirement coming from web applications and mobile apps. These data-driven apps often require data surfaced via APIs, and SAP Data Intelligence provides a relatively easy way to implement such APIs.
