
Introduction

 

Earlier this year there were a couple of blog posts on how you can extend SAP Data Hub: Developing a Custom Pipeline Operator from a Base Operator, and Develop a Custom Pipeline Operator from a Dockerfile. SAP Data Hub is highly extensible, and you can take the information in these blogs a step further by building a Solution for SAP Data Hub. Creating a Solution makes your custom components for SAP Data Hub more consumable and accelerates the use of your custom pipelines and operators in other deployments. Solutions will play a big part in the SAP Data Hub ecosystem.

What is an SAP Data Hub Solution or VSolution?

 

For the developer persona, SAP Data Hub content is made up of graphs (or pipelines), operators and Dockerfiles. When you navigate to the System Manager and the Modeler within SAP Data Hub, you will see Graphs, Operators and Dockerfiles. A graph is made up of interconnected operators that execute an arbitrary task. Each operator runs on top of a Dockerfile definition. Subengines have not been fully exposed in SAP Data Hub yet, but they can be attached to Dockerfiles (perhaps a topic for a future blog). Graphs can also be interconnected, since it is possible to invoke a graph from another graph. Dockerfiles and operators can include or reference other files, as you will see later on.

The System Manager provides export features. Once you have created the components above, you can export them as a Solution. You can export them individually (not very useful) or as a working set. As a working set, an SAP Data Hub Solution can contain one or more graphs. Those graphs use one or more operators, and each operator runs on a Dockerfile. An operator (as shown above with Operator 1) can be used in multiple graphs, because operators and graphs are loosely coupled. Carefully consider these dependencies when exporting the objects in a Solution. Graphs will also use system operators, and these do not need to be included in a Solution. You may also see references to VSolutions. These are the same thing as SAP Data Hub Solutions.
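The dependency reasoning above can be made concrete with a small sketch. It works out which custom artifacts belong in a Solution, given the graphs you want to ship.

The data model is hypothetical: plain dictionaries stand in for what you see in the System Manager.

- Each graph lists the operator names it uses.
- Each custom operator lists the tags it requires.
- Each custom Dockerfile lists the tags it provides.

System operators are skipped. A Dockerfile is pulled in when its tags cover an operator's required tags, mirroring the tag-based assignment used later in this tutorial.

```python
from typing import Dict, Iterable, List, Set


def artifacts_to_export(
    selected_graphs: Iterable[str],
    graph_operators: Dict[str, List[str]],
    operator_tags: Dict[str, Set[str]],
    dockerfile_tags: Dict[str, Set[str]],
    system_operators: Set[str],
) -> Dict[str, Set[str]]:
    """Collect the graphs, custom operators and custom Dockerfiles a Solution needs.

    All inputs are hypothetical stand-ins for what the System Manager shows;
    pass only custom Dockerfiles in `dockerfile_tags`.
    """
    graphs = set(selected_graphs)

    # Operators are loosely coupled to graphs: gather every custom operator
    # referenced by any selected graph, skipping system operators.
    operators: Set[str] = set()
    for graph in graphs:
        operators.update(
            op for op in graph_operators.get(graph, []) if op not in system_operators
        )

    # A Dockerfile is needed when its tags cover an operator's required tags.
    dockerfiles: Set[str] = set()
    for op in operators:
        required = operator_tags.get(op, set())
        for name, provided in dockerfile_tags.items():
            if required and required <= provided:
                dockerfiles.add(name)

    return {"graphs": graphs, "operators": operators, "dockerfiles": dockerfiles}


if __name__ == "__main__":
    # Hypothetical names loosely based on the example built in this tutorial.
    result = artifacts_to_export(
        selected_graphs=["myexample.PyHANA"],
        graph_operators={
            "myexample.PyHANA": ["myexample.PyHANA", "wiretap", "javascript", "terminator"]
        },
        operator_tags={"myexample.PyHANA": {"hdbcli", "python36"}},
        dockerfile_tags={"myexample/PyHANA": {"hdbcli", "python36"}},
        system_operators={"wiretap", "javascript", "terminator"},
    )
    print(result)
```

An operator shared by several graphs appears only once in the result. That is one reason to export a working set rather than individual objects.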

What you will be building in this tutorial

Not every situation requires a custom Dockerfile like the one described in the second blog referenced above. This blog, however, walks you through creating a Solution that includes a custom Dockerfile, a custom operator and an example graph that uses the custom operator. It is good practice to include a graph that uses your custom operators to show people how the operators can be used. Packaging your work as a Solution lets SAP Data Hub administrators quickly incorporate your custom code into their SAP Data Hub landscape.

The Solution that you will create provides a basis for deploying a HANA Python Client operator that can be used with the Vora Pre-Ingestor or WriteFile operators. A HANA Client operator already exists, but this operator lets a developer produce batched messages whose format can be further customized. The operator expects an input string that contains SQL. Its output port sends messages containing the query results, which can be sent to WriteFile or Vora Ingestor nodes. The complete output port signals when all of the data has been sent and is typically connected to a Graph Terminator node.

The batchsize parameter specifies the maximum number of records sent in each output message. The host, port, user and password parameters hold the connection details for the SAP HANA database you provide. It is possible to use connections established by the connection manager, but for simplicity this blog uses these four parameters.
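The batching behaviour described above can be sketched in plain Python, independent of SAP Data Hub. This minimal example assumes the query results arrive as a sequence of row tuples, as a DB-API `fetchall()` call returns them. The function name `to_csv_batches` is hypothetical.

The function formats each row as comma-separated values, turns NULLs into empty fields, and groups at most `batch_size` rows into each message body. The final batch may be smaller.

```python
from typing import Iterable, Iterator, List, Sequence


def to_csv_batches(rows: Iterable[Sequence[object]], batch_size: int) -> Iterator[str]:
    """Yield message bodies holding at most `batch_size` comma-separated rows.

    NULL values (None) become empty fields. Values are not quoted, so a value
    that itself contains a comma or newline would break the format.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    lines: List[str] = []
    for row in rows:
        lines.append(",".join("" if col is None else str(col) for col in row) + "\n")
        if len(lines) == batch_size:
            yield "".join(lines)
            lines = []
    if lines:  # whatever is left becomes a final, smaller batch
        yield "".join(lines)


if __name__ == "__main__":
    rows = [(1, "SYS", "TABLE"), (2, "SYS", None), (3, "SYSTEM", "VIEW")]
    for body in to_csv_batches(rows, batch_size=2):
        print(repr(body))
```

If your data can contain commas, Python's standard `csv` module (a `csv.writer` over an `io.StringIO`) handles quoting. The simple join keeps this sketch close to the format the operator below produces.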

Note: There is an alternative development approach for Python operators, which is described in the SAP online documentation.

Prerequisites:

SAP Data Hub Trial V2.3

SAP HANA Client (for Linux)

SAP HANA

Note: The blog here discusses how to acquire the SAP HANA Python libraries. You may also want to look in the installation directory under /hana/shared/HDB/hdbclient/ for the hdbcli-n.n.nn.tar.gz file.

 

Create a Dockerfile

  1. From the SAP Data Hub Launchpad, navigate to the SAP Data Hub Modeler.
  2. Open the “Repository” tab. Right-click the “Docker Files” section and select “Create Folder”. Name the folder “myexample”.
  3. Right-click the new folder named “myexample” and select “Create Docker file”. Call the Docker file “PyHANA”.
  4. In the Dockerfile editor, specify the following Docker file code:
    # Use an official Python 3.6 image as the parent image
    FROM python:3.6.4-slim-stretch
    # Create a directory on the container named /tmp/SAP_HANA_CLIENT
    RUN mkdir /tmp/SAP_HANA_CLIENT
    # Copy the hdbcli tar file (imported into this Dockerfile's folder) into that directory
    COPY hdbcli-2.3.106.tar.gz /tmp/SAP_HANA_CLIENT
    # Install the SAP HANA Python client (hdbcli) into the container's Python environment
    RUN pip install /tmp/SAP_HANA_CLIENT/hdbcli-2.3.106.tar.gz

    Note: your hdbcli file may have a different version, so you may need to adjust the file name in the COPY and RUN instructions. For documentation on Dockerfiles, see Docker.com.

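  5. Click the Tags button on the top right, highlighted below, and specify the tags for the Docker file. The custom operator created later is assigned to this image through the tags “hdbcli” and “python36”, so make sure both are included. Press the Save button.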
  6. Navigate to the SAP Data Hub Launchpad (main screen).
  7. Navigate to the System Manager.
  8. Navigate to Files.
  9. Navigate to the “vflow->dockerfiles->myexample->PyHANA” folder. 
  10. As shown above, open the Import drop-down menu and select Import File. In the SAP HANA Client full install (Linux binaries), you will find the hdbcli-2.n.nn.tar.gz file. Navigate to this file location and import the file into your new PyHANA dockerfile folder. It should look like the following once you are done.
    You may need to change the file name in the Dockerfile above to reflect the proper version of the file. The Dockerfile copies the HANA Python client from this folder into your custom Docker image. It then installs the driver into the image's Python environment from the tar.gz file.
  11. Navigate back to the SAP Data Hub Modeler browser tab.
  12. Open the “Repository” tab. Navigate to “Docker Files->myexample->PyHANA” and double-click PyHANA to open it. On the top right, click the Build button. Make sure the Docker file builds successfully, because the build populates the tags that the custom operator needs in order to use this custom Docker image.
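The Dockerfile hard-codes the hdbcli version in two places, so a small helper can keep them in sync. This is a convenience sketch under stated assumptions, not part of SAP Data Hub. It provides two hypothetical functions:

- `newest_hdbcli` picks the highest-versioned `hdbcli-<version>.tar.gz` name from a list of file names, for example the contents of the hdbclient directory mentioned in the prerequisites.
- `render_dockerfile` produces the four instructions from step 4, without comments, for that file.

The `HDBCLIENT_DIR` environment variable is likewise an assumption.

```python
import os
import re
from typing import Iterable

# Matches names such as hdbcli-2.3.106.tar.gz and captures the version part.
HDBCLI_PATTERN = re.compile(r"hdbcli-(\d+(?:\.\d+)*)\.tar\.gz")


def newest_hdbcli(filenames: Iterable[str]) -> str:
    """Return the hdbcli tarball name with the highest numeric version."""
    candidates = []
    for name in filenames:
        match = HDBCLI_PATTERN.fullmatch(name)
        if match:
            # Compare versions numerically, so 2.10.1 sorts after 2.3.106.
            version = tuple(int(part) for part in match.group(1).split("."))
            candidates.append((version, name))
    if not candidates:
        raise ValueError("no hdbcli-<version>.tar.gz file found")
    return max(candidates)[1]


def render_dockerfile(tarball: str, target_dir: str = "/tmp/SAP_HANA_CLIENT") -> str:
    """Render the PyHANA Dockerfile instructions for the given hdbcli tarball name."""
    return "\n".join(
        [
            "FROM python:3.6.4-slim-stretch",
            f"RUN mkdir {target_dir}",
            f"COPY {tarball} {target_dir}",
            f"RUN pip install {target_dir}/{tarball}",
        ]
    ) + "\n"


if __name__ == "__main__":
    # HDBCLIENT_DIR is a hypothetical variable pointing at a local copy of hdbclient.
    client_dir = os.environ.get("HDBCLIENT_DIR", ".")
    try:
        print(render_dockerfile(newest_hdbcli(os.listdir(client_dir))))
    except ValueError as err:
        print(err)
```

The output can be pasted into the Dockerfile editor. Remember that the file it names must also be imported into the PyHANA folder, as in step 10.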

Create a Custom Operator based on the new Dockerfile

  1. Navigate to “Operators”, right-click “Operators” and select “New folder”.
  2. Enter the name “myexample”.
  3. Navigate to the new myexample folder, right-click it and select “Create Operator”.

  4. Specify the name PyHANA and the display name “Python HANA Client”, and select Python3Operator as the base operator. Press the OK button.
  5. In the Input Ports section, press the plus symbol circled in red below.  Specify the name “input” with a type of “string” as shown below.
  6. In the Output Ports section, press the plus symbol circled in red below.  Specify the port name “output” with a type of “message”.  Again, press the plus symbol and add the port named “complete” as type “string” as shown below.
  7. In the Tags section, press the plus symbol circled in red below. Specify the tag name “hdbcli”, which should appear in your drop-down list. Press the plus symbol again and add the tag named “python36” as shown below. If you do not see hdbcli in the list, refresh your browser and verify that your Docker image built properly.
  8. In the “Operator Configuration” section, press the “Auto Propose” button. Based on the tags you just specified, this assigns the custom operator to the custom Dockerfile you built.
    You should now see the Operator Configuration as shown below.
  9. In the Parameters section, press the plus symbol circled in red below.  Specify the parameter name “batchsize” with a type of “number”.  Repeat this process to add the parameters host, port, user, password with the types string, number, string, string respectively as shown below.  Specify default values for batchsize, host, port, user, password as “50”, “localhost”, “30015”, “system” and “changeme” respectively.  You will want to keep these default values generic as your Solution will be used in a variety of environments.
  10. In the Script section, paste the following code:
    from hdbcli import dbapi
    import time
    
    # Get configuration parameters; cast explicitly so the batch-size comparison
    # and the connect call always receive the expected types
    batch_size = int(api.config.batchsize)
    addr = str(api.config.host)
    prt = int(api.config.port)
    usr = str(api.config.user)
    passwd = str(api.config.password)
    
    # Connect to SAP HANA once, when the operator starts
    conn = dbapi.connect(address=addr, port=prt, user=usr, password=passwd)
    api.logger.info("connection open")
    
    def shutdown():
        # Close the HANA connection when the graph stops
        conn.close()
        api.logger.info("closing connection")
    
    def send_batch(body, rows_sent, rows_in_batch):
        # Attach the commit token and batch size as message attributes
        attributes = {"message.commit.token": "commit-" + str(rows_sent),
                      "demo.batch_size": str(rows_in_batch)}
        api.send("output", api.Message(body, attributes))
    
    def on_input(data):
        api.logger.info("SQL received: " + data)
        cursor = conn.cursor()
        try:
            cursor.execute(data)
            results = cursor.fetchall()
        finally:
            cursor.close()
        lines = []
        for i, row in enumerate(results, start=1):
            # One comma-separated line per row; NULL values become empty fields
            lines.append(",".join("" if col is None else str(col) for col in row) + "\n")
            # Send a message when the batch is full or the last row has been read
            if len(lines) == batch_size or i == len(results):
                send_batch("".join(lines), i, len(lines))
                lines = []
        api.logger.info("query complete")
        # Wait before signalling completion, presumably so downstream operators
        # can process the last message before the Graph Terminator stops the graph
        time.sleep(10)
        api.send("complete", "Done")
    
    api.set_port_callback("input", on_input)
    api.add_shutdown_handler(shutdown)
  11. Press the “Edit Documentation” button at the top middle of the browser. For information on the Markdown language, see GitHub.
    1. Paste the following markup in the editor and press the Save button.
      HANA Python Client
      ============================================
      
      This operator accepts a SQL query and produces a batched set of messages that can be used with the Vora Ingestor or WriteFile operators. A HANA Client operator already exists, but this operator lets a developer produce batched messages whose format can be further customized in the Python script.
      
      Input Ports
      -----------
      
      * **input** (type string): The operator expects an input string that contains SQL.
      
      Output Ports
      ------------
      
      * **output** (type message): Sends the results of the query received on the input port as messages, which can be sent to the WriteFile or Vora Pre-Ingestor operators.
      * **complete** (type string): Signals when all of the data has been sent. This port is typically connected to a Graph Terminator node.
      
      Configuration Parameters
      ------------------------
      
      **batchsize** (type number): The maximum number of records to send in each output message.
      
      **host** (type string): TCP/IP host name for SAP HANA.
      
      **port** (type number): TCP/IP port number for SAP HANA.
      
      **user** (type string): User name to access SAP HANA.
      
      **password** (type string): Password for the HANA user.
      
  12. Press the Save button to save the custom operator.

    Configure the icon for your Operator (Optional)

  13. Navigate back to the System Manager and go to the location of the operator.json file for the new operator. It may be under “vflow->operator->myexample”. In this case it is under “vflow->subengines->com->sap->python36->operator->myexample->PyHANA”. Select the “PyHANA” folder.
  14. Click on “Import File” on the top middle.  Navigate to a “.png” file that you wish to use as an icon.  Below is an example of a hana.png that was uploaded.
  15. Navigate back to the “Modeler” browser tab. You should still have the Operator editor open.  If not, open the Python HANA Client operator that you just created.
  16. If you hover over the puzzle-piece icon you will see that you can click on it and change the icon to use one of the icons in a large list of system icons. 
  17. Alternatively, you can use the icon that you just uploaded into the System Manager. To use your icon, select the JSON tab on the right and change the text "icon": "puzzle-piece" to "iconsrc": "hana.png". Toggle between JSON and Form to ensure that the previous settings are still in place. If the settings are lost (even the script), toggle between the views again and make sure the Form view has all of the settings. You should also see the icon on your operator.
  18. Press the Save button.
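The JSON change in step 17 is a simple key swap. The Modeler's JSON tab remains the place to make it; the sketch below only illustrates the edit. The hypothetical function uses only the standard `json` module. It replaces the `"icon"` entry of an operator definition with an `"iconsrc"` entry and leaves every other key untouched. The sample definition is made up and far shorter than a real operator.json.

```python
import json


def use_custom_icon(operator_json: str, icon_file: str) -> str:
    """Replace a built-in "icon" entry with an "iconsrc" entry for an uploaded file."""
    definition = json.loads(operator_json)
    definition.pop("icon", None)  # drop the built-in icon name, e.g. "puzzle-piece"
    definition["iconsrc"] = icon_file  # file uploaded next to operator.json
    return json.dumps(definition, indent=4)


if __name__ == "__main__":
    sample = '{"description": "Python HANA Client", "icon": "puzzle-piece"}'
    print(use_custom_icon(sample, "hana.png"))
```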

Create your example Graph using the custom operator

  1. Select the “Graphs” tab on the top right and press the plus symbol circled in red as shown below to create a new example graph that uses the new operator.
  2. Type “Python Hana” in the search bar.  Drag and drop the operator on the new untitled graph canvas.
  3. Type “Wiretap” in the search bar.  Drag and drop the operator on the new untitled graph canvas.
  4. Type “Javascript” in the search bar.  Drag and drop the first javascript operator on the new untitled graph canvas.
  5. Type “Graph Terminator” in the search bar.  Drag and drop the operator on the new untitled graph canvas.
  6. Wire the Javascript node to the Python HANA Client node. Wire the Python HANA Client output port to the Wiretap node and the complete port to the Graph Terminator node as shown below.
  7. Open the script editor for the Javascript node.
  8. Paste the following code into the Editor.
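    // Emit the SQL statement whenever something arrives on the input port
    $.setPortCallback("input", onInput);
    
    function onInput(ctx) {
        // Send the query text on the output port, wired to the Python HANA Client
        $.output("select * from sys.objects");
    }
    // Also run onInput once as a generator when the graph starts
    $.addGenerator(onInput);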

  9. Press the “Save As” button at the top center of the browser.
  10. Provide the name “myexample.PyHANA” and press the OK button to save the Graph.

    At this point you could skip to the Export the Solution section, since you have generic code to export for your Solution. However, it is a good idea to test that the Solution works. Change the Python HANA Client connection parameters to match your environment, save the graph again and run it. You do want to ship working code, after all. Press the Run button (circled in red below). While the graph is running, open the Wiretap UI (circled in red below) to see the output of the custom operator. The output should look something like this. Once you have validated that it works, change the database connection configuration back to generic values and save the graph again.
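Checking the connection values outside SAP Data Hub before running the graph can save a round trip. This sketch assumes the `hdbcli` package is installed locally. It uses the same `dbapi.connect(address=..., port=..., user=..., password=...)` call as the operator script. `SELECT * FROM DUMMY` queries SAP HANA's one-row DUMMY table. The `connect` argument is a hypothetical hook that lets the check be exercised without a HANA system.

```python
from typing import Callable, Optional


def check_hana_connection(
    host: str,
    port: int,
    user: str,
    password: str,
    connect: Optional[Callable] = None,
) -> bool:
    """Open a connection, run a trivial query and report whether a row came back."""
    if connect is None:
        from hdbcli import dbapi  # requires the SAP HANA Python client locally

        connect = dbapi.connect
    conn = connect(address=str(host), port=int(port), user=str(user), password=str(password))
    try:
        cursor = conn.cursor()
        try:
            cursor.execute("SELECT * FROM DUMMY")
            return cursor.fetchone() is not None
        finally:
            cursor.close()
    finally:
        conn.close()
```

For example, calling `check_hana_connection("myhost", 30015, "myuser", "mypassword")` with your own values should return True when they are correct. Connection failures surface as exceptions raised by the driver rather than as False.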

Export the Solution

 

  1. Navigate back to the browser tab for the System Manager. Expand the “vflow” folder and its dockerfiles, operators and graphs folders. You should see the “myexample->PyHANA” artifacts in each of these folders.

    If you expand the operators folder and do not see anything under “myexample”, the operator may be under the subengines folder as shown below. You want to find the “myexample->PyHANA” folder that has the operator.json and configSchema.json files.
  2. Using CTRL-click, select “myexample->PyHANA” for the dockerfile, operator, subengines and graph. Once they are selected, open the Export drop-down at the top middle of the browser and select “Export select files/folders as solution”.
  3. Ensure that you have the Graph, Dockerfile, Operator and if necessary Subengines->Operator included with your Solution.  Press “Export Solution”.  A “tar.gz” file should be downloaded to your system.  Open it and it should have a directory “vrep\vsolution” containing 4 folders for the operators, graphs and dockerfile.
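The export can be checked without unpacking it by hand, using Python's standard `tarfile` module. This sketch assumes nothing about the exact folder layout beyond the artifact path `myexample/PyHANA`. The hypothetical `find_artifacts` function reports, for every file stored under that path, the parent folder it sits in. That lets you confirm that the dockerfiles, operators (or subengines) and graphs folders are all represented. The script name in the usage comment is also an assumption.

```python
import tarfile
from typing import BinaryIO, Dict, List, Union


def find_artifacts(archive: Union[str, BinaryIO], artifact: str = "myexample/PyHANA") -> Dict[str, List[str]]:
    """Map each parent folder containing `artifact` to the files stored beneath it."""
    if hasattr(archive, "read"):
        tar = tarfile.open(fileobj=archive, mode="r:gz")
    else:
        tar = tarfile.open(archive, mode="r:gz")
    marker = "/" + artifact.strip("/") + "/"
    found: Dict[str, List[str]] = {}
    with tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Normalise names such as "./vrep/..." to "vrep/..."
            path = "/".join(p for p in member.name.split("/") if p not in ("", "."))
            padded = "/" + path
            index = padded.find(marker)
            if index == -1:
                continue
            parent = padded[1:index]
            found.setdefault(parent, []).append(padded[index + len(marker):])
    return found


if __name__ == "__main__":
    import sys

    # Usage: python find_artifacts.py <exported-solution>.tar.gz
    if len(sys.argv) > 1:
        for parent, files in sorted(find_artifacts(sys.argv[1]).items()):
            print(parent, files)
```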

Test the Solution

  1. Now that you have exported the Solution, you will want to test importing it. To truly test the Solution, delete the dockerfile, subengine, graph and operator from SAP Data Hub. Find each of the four folders named “myexample” within the System Manager and delete it by right-clicking and selecting “Delete”.
  2. Refresh the browser. Verify that there are no folders named “myexample”.
  3. On the top of the browser select the “Import” drop-down button and select “Import Solution”.
  4. Browse to the location of your exported Solution and select the tar.gz file that you just exported.
  5. Switch to the Modeler browser tab. Open the myexample.PyHANA graph and make the same changes for your database environment that you made to the Python HANA Client operator when you tested the graph earlier. Save the changes.
  6. Run the imported myexample.PyHANA graph.
    Once the graph runs to completion, pat yourself on the back! Good job.

Other useful links related to this topic

SAP HANA 2.0 SP02 New Features

Developing a Custom Pipeline Operator from a Base Operator

Develop a Custom Pipeline Operator from a Dockerfile

Setting up HANA Express with Python Machine Learning

Docker Documentation
