
Zen and the art of SAP Data Intelligence. Episode 2: use a DI connection in your new operator

Among the many qualities of SAP Data Intelligence (DI), the two I personally love the most are flexibility and interoperability.  In this blog post, we will make use of both traits to create a new custom operator in the DI Modeler and add a special configuration parameter to select one of the connections defined in the DI Connection Manager.

The DI Connection Manager application offers a very friendly way to onboard and manage all those external systems that constitute your data constellation.  The user can choose from an ever-growing list of connection types, fill in the required parameters and credentials, and voilà: the system is connected.

Many of the standard operators in the DI Modeler are already meant to work with some of the connected systems, depending on the connection type. A good example is the Read File operator (com.sap.storage.read) that can be configured to fetch files from one of the compatible storage systems like S3, HDFS, GCS, and so on…  All the user needs to do is to click on the connection drop down menu and select one of the connected systems from the list.

 

Quite easy, isn’t it?  The question now is: how do I achieve the same with a new custom operator?  This post is the answer to this very question.

Prerequisites

  • The steps of this post have been performed with Data Hub 2.7.1 and Data Intelligence 1911.0.22, but should be valid for any Data Hub ≥ 2.5 and any Data Intelligence 19** version.
  • Recommended browser: Chrome.
  • A basic knowledge of the DI Connection Manager application is required.
  • A basic knowledge of the DI Modeler application is required.
  • Step 6 requires a basic knowledge of the Python programming language.

 

Step 1: create a new Python operator

For the sake of this example let’s assume that our new python operator contains a fictional Psychohistory algorithm that needs to read and write data from and to an S3 bucket where the needed Foundation’s data are stored.  The user of the operator should be able to easily select the preferred S3 bucket from the list of S3 connections available in the DI Connection Manager.

To add a new custom operator, click on the Operators tab on the left side of the DI Modeler GUI, click on the “+” symbol, and then fill in the corresponding Create Operator window.  As you can see from the image below, this example uses a Python3 Operator as a Base Operator, but what we show in this blog post is valid regardless of the choice of the base operator.

 

Step 2: open the Config Schema editor to modify the configuration parameters of the new operator

Every DI Modeler operator has five macro-properties that can be edited: Ports, Tags, Configuration, Script, and Documentation.  To achieve our goal, we need to modify the operator’s Configuration parameters.  Click on the Configuration tab of the operator edit window and click on the pencil symbol to edit the Config Schema that defines the operator configuration parameters.

 

Step 3: fast-track! Just copy and paste a JSON snippet into your new operator configuration schema

This step is a shortcut for those in a hurry who want to achieve the goal without knowing all the details: just click the button on the right side to activate the JSON editor of the Config Schema as shown in the image below and replace the whole properties field with the json snippet that I placed for you right after the image.

 

"properties": {
  "S3_connection": {
    "title": "S3 Connection", 
    "required": [], 
    "type": "object", 
    "properties": {
      "connectionID": {
        "sap_vflow_valuehelp": {
          "url": "/app/datahub-app-connection/connections?connectionTypes=S3", 
          "displayStyle": "autocomplete", 
          "valuepath": "id"
        }, 
        "description": "Name of the connection in the Connection Manager", 
        "title": "Connection ID", 
        "sap_vflow_constraints": {
          "ui_visibility": [
            {
              "name": "configurationType", 
              "value": "Configuration Manager"
            }
          ]
        }, 
        "format": "com.sap.dh.connection.id", 
        "type": "string"
      }, 
      "connectionProperties": {
        "title": "Connection Properties", 
        "description": "Manual entry of the connection properties", 
        "$ref": "http://sap.com/vflow/com.sap.dh.connections.s3.schema.json",
        "sap_vflow_constraints": {
          "ui_visibility": [
            {
              "name": "configurationType",
              "value": "Manual"
            }
          ]
        }
      }, 
      "configurationType": {
        "enum": [
          "Configuration Manager", 
          "Manual"
        ], 
        "type": "string", 
        "title": "Configuration Type"
      }
    }, 
    "description": "Connection to an S3 system"
  }, 
  "codelanguage": {
    "type": "string"
  }, 
  "script": {
    "type": "string"
  }
}
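
Before pasting a snippet like this into the JSON editor, it can be worth checking that it is syntactically valid JSON. The following is only a minimal sketch in plain Python, run outside of DI: the function name parse_properties_fragment is hypothetical, and the fragment used here is a shortened stand-in for the full snippet above. Since the snippet is a fragment and not a complete JSON document, the sketch wraps it in braces before parsing.

```python
import json


def parse_properties_fragment(fragment: str) -> dict:
    """Parse a "properties" fragment and return the properties object."""
    # The snippet is not a complete JSON document, so wrap it in braces first.
    # json.loads raises json.JSONDecodeError if the text has a syntax error.
    document = json.loads("{" + fragment + "}")
    return document["properties"]


# Shortened, hypothetical stand-in for the full snippet.
fragment = """
"properties": {
  "S3_connection": {
    "type": "object",
    "properties": {
      "configurationType": {
        "type": "string",
        "enum": ["Configuration Manager", "Manual"]
      }
    }
  },
  "script": {"type": "string"}
}
"""

properties = parse_properties_fragment(fragment)
print(sorted(properties))  # ['S3_connection', 'script']
```

A missing comma or an unbalanced brace shows up here as a json.JSONDecodeError with a line and column number, which is usually easier to read than an error in the editor.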

 

Now you can jump directly to Step 5, unless you want to know all the details…

Step 4: the full story.  Edit the Config Schema step-by-step.

To begin with, we want to add a new parameter called S3_connection. It should be a generic object that contains three more parameters: the configurationType, the connectionID and the connectionProperties.  The configurationType should offer only two possible string values: either “Manual” or “Configuration Manager”.  The intention of the configurationType is clear: the user should be able to either pick one of the pre-configured connections or provide all the connection parameters manually.

Depending on the value of the configurationType, only one of the two remaining parameters should be visible.  If the user opts for the option “Configuration Manager”, then the connectionID should offer the list of the available connections.  Under the hood, this list is filled on the fly by querying the DI Connection Manager backend.  Alternatively, the user can opt for a manual configuration and fill in all the required parameters in the connectionProperties structure.  The content of such a structure depends on the type of the connection: e.g. for S3 we have an endpoint, a region, a bucket, an access key, a secret key, and so on…
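
The conditional visibility can be pictured with a small toy model in Python. This is only a sketch of the idea and not the way the DI Modeler evaluates the schema internally: the function visible_parameters is hypothetical, and it assumes that a parameter without rules is always shown and that a parameter with rules is shown when at least one rule matches. The rule layout mirrors the ui_visibility entries of the Step 3 snippet.

```python
def visible_parameters(schema_properties: dict, current_values: dict) -> list:
    """Return the names of the parameters that the toy model would display."""
    visible = []
    for name, definition in schema_properties.items():
        constraints = definition.get("sap_vflow_constraints", {})
        rules = constraints.get("ui_visibility", [])
        # No rules: always visible. Otherwise at least one rule has to match.
        if not rules or any(
            current_values.get(rule["name"]) == rule["value"] for rule in rules
        ):
            visible.append(name)
    return visible


# Reduced version of the content of the S3_connection object.
s3_connection = {
    "configurationType": {
        "type": "string",
        "enum": ["Configuration Manager", "Manual"],
    },
    "connectionID": {
        "type": "string",
        "sap_vflow_constraints": {
            "ui_visibility": [
                {"name": "configurationType", "value": "Configuration Manager"}
            ]
        },
    },
    "connectionProperties": {
        "sap_vflow_constraints": {
            "ui_visibility": [{"name": "configurationType", "value": "Manual"}]
        },
    },
}

print(visible_parameters(s3_connection, {"configurationType": "Manual"}))
# ['configurationType', 'connectionProperties']
```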

Does it sound confusing?  Don’t worry, it was the same for me the very first time!  But it is fairly straightforward if we do it together step-by-step using the Config Schema editor GUI.  First of all: click on the “+” symbol to add a new property.

 

Then, call the new property S3_connection and fill all the other fields as in the image below.  As already mentioned, the Data Type should be equal to Object.

 

Now drill down to the S3_connection content by clicking on the corresponding arrow symbol.

 

The editor window is now showing the empty content of the S3_connection object.  Add the first property called configurationType by clicking the “+” symbol.

 

Fill all the fields as in the image below and do not forget to add the two possible string values: “Configuration Manager” and “Manual”.

 

Let’s add the second property called connectionID.  This is a string parameter whose list of possible values comes from the DI Connection Manager backend exposed at the following URL:

/app/datahub-app-connection/connections?connectionTypes=S3

 

 

Note that instead of fetching all the possible connections from the service we filter for connectionTypes = S3 because these are the only ones we need.  For a complete list of available connection types (34 at the time of writing, and counting…) just open the DI Connection Manager application and click on the “Connection Types” tab.
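
If you need the same value help for a different connection type, only the query parameter changes. The following sketch builds the URL and the corresponding value help entry in Python. The helper names are hypothetical, and any type name other than S3 (HDFS is used here purely as an illustration) is an assumption that should be checked against the “Connection Types” tab.

```python
from urllib.parse import urlencode

# Path of the Connection Manager service, as used in this post.
CONNECTIONS_PATH = "/app/datahub-app-connection/connections"


def valuehelp_url(connection_type: str) -> str:
    """Build the URL that lists only the connections of the given type."""
    return CONNECTIONS_PATH + "?" + urlencode({"connectionTypes": connection_type})


def valuehelp_entry(connection_type: str) -> dict:
    """Build a value help entry shaped like the one in the Step 3 snippet."""
    return {
        "url": valuehelp_url(connection_type),
        "displayStyle": "autocomplete",
        "valuepath": "id",
    }


print(valuehelp_url("S3"))
# /app/datahub-app-connection/connections?connectionTypes=S3
print(valuehelp_entry("HDFS")["url"])  # "HDFS" is an assumed type name
```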

 

To complete the definition of the connectionID parameter, do not forget to make its visibility Conditional with the condition configurationType=Configuration Manager.

 

The last parameter we add to the S3_connection object is called connectionProperties.  Its type is one of the Custom Types, namely com.sap.dh.connections.s3.  The visibility is again conditional with the condition configurationType=Manual.
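
A typo in one of the two conditions is easy to make and hard to spot, because the affected parameter simply never shows up. As a hedged sketch, the following plain Python check (the function check_visibility_rules is hypothetical and runs outside of DI) verifies that every condition refers to an existing sibling property and to one of its allowed values. The example input contains a deliberate typo.

```python
def check_visibility_rules(properties: dict) -> list:
    """Return a list of readable problems found in the ui_visibility rules."""
    problems = []
    for name, definition in properties.items():
        constraints = definition.get("sap_vflow_constraints", {})
        for rule in constraints.get("ui_visibility", []):
            target = properties.get(rule["name"])
            if target is None:
                problems.append(
                    f"{name}: condition refers to unknown property '{rule['name']}'"
                )
            elif "enum" in target and rule["value"] not in target["enum"]:
                problems.append(
                    f"{name}: '{rule['value']}' is not an allowed value of '{rule['name']}'"
                )
    return problems


def conditional(value: str) -> dict:
    """Shorthand for a visibility condition on configurationType."""
    return {"ui_visibility": [{"name": "configurationType", "value": value}]}


s3_connection = {
    "configurationType": {
        "type": "string",
        "enum": ["Configuration Manager", "Manual"],
    },
    # Deliberate typo ("Manger") to show what the check reports.
    "connectionID": {
        "type": "string",
        "sap_vflow_constraints": conditional("Configuration Manger"),
    },
    "connectionProperties": {"sap_vflow_constraints": conditional("Manual")},
}

problems = check_visibility_rules(s3_connection)
for problem in problems:
    print(problem)
```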

 

Step 5: the new connection configuration parameter is now ready to be used

If everything works well, after clicking OK to close the Config Schema editor you should see the new S3_connection parameter in the list.

 

You can check the result by dropping the new operator into an empty graph and trying to edit the new S3 Connection parameter.

Step 6: how to use the configuration properties in the python script

The proper configuration of the Config Schema is only half of the story.  We need to be able to use the new connection parameters in the operator code, in this case a python script.

The number of variables we can use depends on the specific connection, or more precisely on the custom type we chose.  At the end of Step 4, we selected the com.sap.dh.connections.s3 type because we were interested in connecting to an S3 storage system.  To discover the definition of this specific type, we can select the Type side bar and search for s3.  Click on the corresponding item to view the list of the available Properties.

 

Regardless of how the user decides to configure the connection properties, i.e. in both the Configuration Manager and the Manual case, the resulting variable is a structure containing all the key-value pairs, the keys being exactly the names of the properties as displayed in the image above.

Now we have all the information to navigate the api.config.S3_connection python dictionary in our custom operator:

# The api object is made available to the script by the Python3 base operator.
# Both configuration types end up in the same "connectionProperties" structure.
S3_endpoint  = api.config.S3_connection["connectionProperties"]["endpoint"]
S3_protocol  = api.config.S3_connection["connectionProperties"]["protocol"]
S3_region    = api.config.S3_connection["connectionProperties"]["region"]
S3_accessKey = api.config.S3_connection["connectionProperties"]["accessKey"]
S3_secretKey = api.config.S3_connection["connectionProperties"]["secretKey"]
S3_rootPath  = api.config.S3_connection["connectionProperties"]["rootPath"]

# Demo only: echo every property to the output port whenever data arrives.
# Keep in mind that this also sends the access and secret keys in clear text.
def onInput(data):
    api.send("output", "S3_endpoint = "  + str(S3_endpoint))
    api.send("output", "S3_protocol = "  + str(S3_protocol))
    api.send("output", "S3_region = "    + str(S3_region))
    api.send("output", "S3_accessKey = " + str(S3_accessKey))
    api.send("output", "S3_secretKey = " + str(S3_secretKey))
    api.send("output", "S3_rootPath = "  + str(S3_rootPath))

# Register onInput as the callback of the port named "input".
api.set_port_callback("input", onInput)
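
Outside of a quick demo, you probably do not want the keys to travel through the graph in clear text. The following is a minimal sketch of a more defensive variant, under a few assumptions: the helper describe_connection and the list of property names are hypothetical, the connection values are invented placeholders, and a plain dictionary stands in for api.config.S3_connection so that the sketch can run outside of DI.

```python
# Properties read in this post, and the ones that should never be echoed.
PROPERTY_NAMES = ["endpoint", "protocol", "region", "accessKey", "secretKey", "rootPath"]
SENSITIVE_KEYS = {"accessKey", "secretKey"}


def describe_connection(connection: dict) -> list:
    """Return one printable line per property, masking the sensitive ones."""
    props = connection.get("connectionProperties", {})
    lines = []
    for key in PROPERTY_NAMES:
        value = props.get(key)  # None if the property is not set
        if key in SENSITIVE_KEYS and value:
            shown = "***"
        else:
            shown = str(value)
        lines.append(f"S3_{key} = {shown}")
    return lines


# Stand-in for api.config.S3_connection, filled with placeholder values.
fake_connection = {
    "configurationType": "Manual",
    "connectionProperties": {
        "endpoint": "s3.example.com",
        "protocol": "HTTPS",
        "region": "eu-central-1",
        "accessKey": "EXAMPLE-ACCESS-KEY",
        "secretKey": "not-a-real-secret",
    },
}

for line in describe_connection(fake_connection):
    print(line)
```

Inside the operator, the same helper could be fed with api.config.S3_connection and each line passed to api.send("output", line) from the onInput callback.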

 

 

Epilog

At the beginning of this post I stated that flexibility and interoperability are the two most useful features of Data Intelligence.  I want to go further now, and declare that flexibility and interoperability are probably among the top five most wanted features for any enterprise-ready big-data analytics software (the other three? easy: scalability, security, and stability).

The SAP Data Intelligence applications are built to be flexible and interoperable (and of course scalable, and secure, and stable…).  In this blog post, I showed you how to customize your DI Modeler by adding new content based on another application, the DI Connection Manager.  The whole process is a bit cumbersome and could be (and will be!) greatly improved.

Fortunately, the JSON editor interface always allows us to simply cut and paste the right snippet and get to the final result in a heartbeat.  All you need to do is search for a good blog post and get the right snippet.  In other words: as long as there are bloggers, there is hope.

Thank you for reading!

 


For the philomaths

Further information about the topics treated in this blog post can be found in the following references:
