lochner_louw · 08-17-2020

Just a side note: you will need to log a ticket to activate custom applications on SAP Data Intelligence Cloud. I recently tried and ran into a bunch of errors when trying to activate one. Please log a ticket under CA-DI-OPS or CA-DI.

Introduction

One thing that has always fascinated me is configuring and customising things to fit a requirement: from customising my IRC clients with scripts when I was a kid, to writing my own Linux distribution in high school, and of course writing custom add-ons for games. After university, I walked into a consulting job as an ABAP developer and got exposed to the crazy world of the Z-prefix, where you could customise SAP systems running the ABAP runtime.

SAP continues the tradition of extensibility in some form or another, for example in one of my favourite products to play around with, SAP Data Intelligence. SAP Data Intelligence is a product that allows users to connect, discover, enrich and orchestrate data into actionable insights. To do that, SAP Data Intelligence is containerised with Docker, and the containers are in turn managed by Kubernetes. Users of SAP Data Intelligence can configure Kubernetes and Docker to tailor their Data Intelligence setup to their needs.

My colleagues, who are data scientists, asked to use TensorBoard (software that provides visualisation and tooling for machine learning experimentation) in SAP Data Intelligence. I felt inspired to capture the required configuration in this blog post. So, take this as the result of personal experience and not as a “best practice” guide.

Prerequisites

Access to an SAP Data Intelligence instance.
- SAP Data Intelligence (cloud edition) Version: 2006.1.8 was used in this post

You will need to have admin permissions on the SAP DI tenant to install the application.

A basic understanding of containerisation concepts, Kubernetes and Docker.

Overview

In this post, I will go through the steps of extending SAP Data Intelligence by adding TensorBoard as an application in the tenant. I will also go through the components that are required to develop and deploy the application.

While writing, I noticed that the content was a bit too lengthy to pack into a single post. For that reason, I split it into two separate posts to show what I did for my colleagues:

Part 1: Configuring a Custom Solution for TensorBoard in SAP DI (This post) and

Part 2: Packaging, deploying and running solution containing the custom application.

Components

Typically, in SAP Data Intelligence you can extend functionality by adding or modifying Dockerfiles, graphs, operators, data types and configuration types. If you are the administrator of the tenant, you can also add your own applications via custom solutions.

What is a Solution?

In SAP Data Intelligence, a Solution is a deployable package that includes all artefacts belonging to its applications (graphs, operators, application definitions, and static files). A package can contain multiple applications that are used together to form the Solution. Think of a Solution as a complete unit that solves a specific business requirement end to end.

An example of this could be a solution that logs car license plates from the main entry gates to a car park. Such a Solution could use applications for capturing the images from cameras, preparing the image files for character recognition, using OCR to get the characters from the images, validating the OCR results and logging them into a database. Each of those steps is performed by a separate application. To make the Solution work, all applications (with the correct versions) need to be deployed together. Exactly this is what the “Solution” does in SAP Data Intelligence.

A Solution is supplied as a ZIP file which contains a manifest.json. This manifest file describes the properties of the Solution: its name, its version, and the components it depends on. The manifest.json file I used looked similar to this:

{
  "name": "tensorboard-app",
  "version": "0.0.2",
  "format": "2",
  "dependencies": []
}

If a Solution depends on other components, the dependencies entry looks like this.

NOTE: Every dependent component is listed in curly brackets {} and each has a name and a version definition. If multiple components are required those are listed with commas between the {}:

{
  ...
  "dependencies": [
    { "name": "other-solution", "version": ">1.2.3" },
    { "name": "yet-another-solution", "version": ">=2.0.0" }
  ]
}
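
If you would rather generate the manifest than type it by hand, a minimal Python sketch could look like the following. The build_manifest helper is hypothetical and not part of any SAP tooling; it only reproduces the fields shown above, and the dependency names are the placeholder names from the example.

```python
import json


def build_manifest(name, version, dependencies=None):
    """Return a Solution manifest as a plain dict (hypothetical helper).

    `dependencies` is a list of (name, version_constraint) pairs.
    """
    return {
        "name": name,
        "version": version,
        "format": "2",  # value taken from the manifest shown in this post
        "dependencies": [
            {"name": dep_name, "version": constraint}
            for dep_name, constraint in (dependencies or [])
        ],
    }


manifest = build_manifest(
    "tensorboard-app",
    "0.0.2",
    dependencies=[
        ("other-solution", ">1.2.3"),
        ("yet-another-solution", ">=2.0.0"),
    ],
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Calling build_manifest without dependencies yields the empty list used in my own manifest.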

All files belonging to an SAP Data Intelligence Custom Solution are saved within a single folder. The structure of this solution folder typically looks as follows:

my_custom_solution
├── content
│   ├── files
│   │   └── vflow
│   │       ├── dockerfiles
│   │       ├── graphs
│   │       ├── operators
│   │       └── ...
│   └── vsystem
│       ├── apps
│       │   └── tensorboard-app.json
│       ├── icons
│       │   ├── TensorBoard.png
│       │   └── ...
│       └── ...
└── manifest.json

NOTE: The artefacts I used for this blog post are manifest.json, tensorboard-app.json and TensorBoard.png.
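
To get started quickly, the folder skeleton can be created with a few lines of Python. This is only a sketch: the scaffold_solution helper is hypothetical, the folder names are copied from the tree shown here, and the two files are created as empty placeholders that still need real content.

```python
from pathlib import Path
import tempfile

# Folders relative to the solution root, copied from the tree in this post.
SOLUTION_DIRS = [
    "content/files/vflow/dockerfiles",
    "content/files/vflow/graphs",
    "content/files/vflow/operators",
    "content/vsystem/apps",
    "content/vsystem/icons",
]


def scaffold_solution(root):
    """Create the folder skeleton and empty placeholder files under `root`."""
    root = Path(root)
    for rel in SOLUTION_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    # Empty placeholders; replace them with the real manifest and descriptor.
    (root / "manifest.json").touch()
    (root / "content/vsystem/apps/tensorboard-app.json").touch()
    return root


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    solution = scaffold_solution(Path(tmp) / "my_custom_solution")
    created = sorted(p.relative_to(solution).as_posix() for p in solution.rglob("*"))

print("\n".join(created))
```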

The Application Descriptor

Each application within a solution is defined by an application descriptor file. This .json file is located in the vsystem/apps folder. When SAP DI deploys the solution, it also deploys all of its applications. The application descriptor files provide the specific details on how each application must be deployed.

It is good practice to give the application descriptor file a name similar to the name of your application. In my example the name is TensorBoard and the descriptor is tensorboard-app.json.

{
  "name": "TensorBoard",
  "type": "kubernetes",
  "apiVersion": "v2",
  "version": "0.0.2",
  "icon": "/vsystem/icons/TensorBoard.png",
  "mounts": { "vhome": true },
  "body": { ... }
}

name: The name that is displayed in the Launchpad once it is deployed on the system.

version: The version of our application.

apiVersion: The version of the system API. This is not related to your application version.

type: The type of application. This tells the system what to look for in the body of the definition. For now, the only supported type is kubernetes.

visible (optional): A Boolean parameter that defaults to true, which means the application is shown in the SAP DI launchpad along with the other tiles. If set to false, the application is hidden.

icon (optional): A reference to an SVG file (in my test it worked with PNG too) containing an icon for the application. If the reference is an absolute path, it will be interpreted as an external URL. If relative, it will be interpreted as a repository file with ‘/’ as the base folder of the solution.

mounts (optional): This parameter defines which volumes should be mounted for the application. You can specify vhome for the user workspace and vrep for the repository. Both vhome and vrep are Boolean parameters.
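
Before packaging, it can help to sanity-check the top-level fields of a descriptor. The sketch below is my own illustration, not an official schema: check_descriptor is a hypothetical helper, and treating name, version, apiVersion and type as required is an assumption based on them not being marked optional above. The body is not inspected here.

```python
# Assumption: fields not marked "(optional)" in this post are required.
REQUIRED = {"name": str, "version": str, "apiVersion": str, "type": str}
OPTIONAL = {"visible": bool, "icon": str, "mounts": dict}


def check_descriptor(descriptor):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in descriptor:
            problems.append(f"missing required field: {key}")
        elif not isinstance(descriptor[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    for key, expected in OPTIONAL.items():
        if key in descriptor and not isinstance(descriptor[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    if descriptor.get("type") not in (None, "kubernetes"):
        problems.append("type: only 'kubernetes' is described in this post")
    mounts = descriptor.get("mounts", {})
    if isinstance(mounts, dict):
        for mount, enabled in mounts.items():
            if mount not in ("vhome", "vrep"):
                problems.append(f"unknown mount: {mount}")
            elif not isinstance(enabled, bool):
                problems.append(f"mount {mount} should be true or false")
    return problems


descriptor = {
    "name": "TensorBoard",
    "type": "kubernetes",
    "apiVersion": "v2",
    "version": "0.0.2",
    "icon": "/vsystem/icons/TensorBoard.png",
    "mounts": {"vhome": True},
}
print(check_descriptor(descriptor) or "no problems found")
```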

Since the type of the application is specified as kubernetes, you will notice that the body looks almost exactly like a standard Kubernetes deployment.

The body definition looks like this:

"body": {
  "kubernetes": {
    "apiVersion": "v2",
    "service": { ... },
    "deployment": { ... }
  }
}

For an example of standard Kubernetes Service creation, see the Kubernetes documentation on Services.

The service definition looks like this:

"service": {
  "spec": {
    "ports": [
      {
        "port": 6006
      }
    ]
  }
}

The Kubernetes Service YAML equivalent would be:

apiVersion: v1
kind: Service
metadata:
  name: tensorboard-app
  labels:
    app: tensorboard-app
spec:
  selector:
    app: tensorboard-app
  ports:
    - port: 6006

The .spec.ports[*].port field specifies that the service is exposed on port 6006 by the system. When a request is received on this port, the system forwards it to the application.

For an example of standard Kubernetes Deployment creation, and for more information on containers in Kubernetes, see the Kubernetes documentation.

The deployment definition looks like this:

"deployment": {
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "tensorboard-app",
            "image": "tensorflow/tensorflow:2.3.0",
            "command": [
              "/usr/local/bin/tensorboard"
            ],
            "args": [
              "--logdir",
              "/vhome/tf",
              "--host",
              "0.0.0.0"
            ],

The top part of the container definition specifies the image to pull at deployment. In our example, we use the publicly accessible image tensorflow/tensorflow with tag 2.3.0 (you can specify latest if you always want the latest image to be pulled). command and args are used to execute a specific application in the container, similar to ENTRYPOINT and CMD in Docker.

You will notice that /vhome/tf is specified as the log directory that should be read by TensorBoard. /vhome is the User Workspace that is mounted for the session user (SAP Data Intelligence User). This means there will be a separate instance of TensorBoard per user with its own workspace.

            "resources": {
              "requests": {
                "cpu": "0.1",
                "memory": "1Gi"
              },
              "limits": {
                "cpu": "0.5",
                "memory": "5Gi"
              }
            },

Optionally, you can also allocate and limit the resources, such as memory and CPU, that the container may use by defining Kubernetes resources.requests and resources.limits.

            "securityContext": {
              "runAsUser": 1000
            },

The security context of a container tells Kubernetes which user the application runs as. Specifying a non-root user here ensures that the application does not run as the root user (which would make the application insecure by giving it too many system privileges).

Please see the Kubernetes documentation for more information on securityContext. runAsUser functions the same as USER in Docker.

            "ports": [
              {
                "containerPort": 6006
              }
            ]
          }
        ]
      }
    }
  }
}

The ports are described in the Kubernetes documentation:

List of ports to expose from the container. Exposing a port here gives the system additional information about the network connections a container uses, but is primarily informational. Not specifying a port here DOES NOT prevent that port from being exposed. Any port which is listening on the default "0.0.0.0" address inside a container will be accessible from the network. Cannot be updated.

The basic function is the same as EXPOSE in Docker.

When deploying the solution in the system, the system will parse the Kubernetes section and match all the Kubernetes labels for service and deployment. After parsing the file, it will instantiate a Pod with the definition specified in the file.
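
Since the service and the deployment have to line up, a small consistency check can catch typos before deployment. The sketch below is a heuristic of my own and not how SAP DI validates a descriptor: check_kubernetes_body is a hypothetical helper, and it assumes that the service port should also appear as a containerPort (the descriptor defines no separate targetPort) and that runAsUser should be set to a non-root user.

```python
def check_kubernetes_body(kubernetes):
    """Return a list of problems found in the body's kubernetes section."""
    problems = []
    service_ports = {p["port"] for p in kubernetes["service"]["spec"]["ports"]}
    containers = kubernetes["deployment"]["spec"]["template"]["spec"]["containers"]

    container_ports = set()
    for container in containers:
        name = container.get("name", "<unnamed>")
        for port in container.get("ports", []):
            container_ports.add(port["containerPort"])
        # User ID 0 is root; a missing runAsUser falls back to the image default.
        run_as = container.get("securityContext", {}).get("runAsUser")
        if run_as is None:
            problems.append(f"{name}: runAsUser is not set")
        elif run_as == 0:
            problems.append(f"{name}: runAsUser 0 means root")

    # Heuristic: every service port should be declared by some container.
    for port in sorted(service_ports - container_ports):
        problems.append(f"service port {port} is not declared by any container")
    return problems


# Condensed version of the kubernetes section used in this post.
kubernetes = {
    "service": {"spec": {"ports": [{"port": 6006}]}},
    "deployment": {"spec": {"template": {"spec": {"containers": [
        {
            "name": "tensorboard-app",
            "securityContext": {"runAsUser": 1000},
            "ports": [{"containerPort": 6006}],
        }
    ]}}}},
}
problems = check_kubernetes_body(kubernetes)
print(problems or "no problems found")
```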

The final application descriptor file

Finally, the full Application Descriptor file tensorboard-app.json looks like this:

{
  "name": "TensorBoard",
  "type": "kubernetes",
  "apiVersion": "v2",
  "version": "0.0.2",
  "icon": "/vsystem/icons/TensorBoard.png",
  "mounts": { "vhome": true },
  "body": {
    "kubernetes": {
      "apiVersion": "v2",
      "service": {
        "spec": {
          "ports": [
            {
              "port": 6006
            }
          ]
        }
      },
      "deployment": {
        "spec": {
          "template": {
            "spec": {
              "containers": [
                {
                  "name": "tensorboard-app",
                  "image": "tensorflow/tensorflow:2.3.0",
                  "command": [
                    "/usr/local/bin/tensorboard"
                  ],
                  "args": [
                    "--logdir",
                    "/vhome/tf",
                    "--host",
                    "0.0.0.0"
                  ],
                  "resources": {
                    "requests": {
                      "cpu": "0.1",
                      "memory": "1Gi"
                    },
                    "limits": {
                      "cpu": "0.5",
                      "memory": "5Gi"
                    }
                  },
                  "securityContext": {
                    "runAsUser": 1000
                  },
                  "ports": [
                    {
                      "containerPort": 6006
                    }
                  ]
                }
              ]
            }
          }
        }
      }
    }
  }
}
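
The resources.requests and resources.limits values in the descriptor use Kubernetes quantity notation, and the requests should not exceed the limits. As a rough illustration, the Python sketch below converts the quantities and compares them. The helper names are hypothetical, and the parser deliberately handles only a subset of the notation (plain numbers, the m suffix for CPU, and a few memory suffixes).

```python
from decimal import Decimal

# Subset of the binary and decimal suffixes used for memory quantities.
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as "0.5" or "500m" to cores."""
    if quantity.endswith("m"):  # millicores
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as "1Gi" or "512M" to bytes."""
    for suffix, factor in _MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


def requests_within_limits(resources):
    """Return True if both CPU and memory requests are within their limits."""
    requests, limits = resources["requests"], resources["limits"]
    return (
        parse_cpu(requests["cpu"]) <= parse_cpu(limits["cpu"])
        and parse_memory(requests["memory"]) <= parse_memory(limits["memory"])
    )


# Values taken from the descriptor in this post.
resources = {
    "requests": {"cpu": "0.1", "memory": "1Gi"},
    "limits": {"cpu": "0.5", "memory": "5Gi"},
}
print(requests_within_limits(resources))
```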

Closing and further reading

Developing the application descriptor file is the important first step in extending SAP Data Intelligence with a custom solution. Getting the solution deployed, running, and available to users requires a few more steps, which I show in the second part of this blog post (link to the second part).

I hope you enjoyed this blog post and that it gives you a general understanding of components that are part of a custom solution in SAP Data Intelligence. Let me know if you have questions or feedback in the comments section.

If you want more information about everything SAP Data Intelligence related I recommend the following: