How To Use Docker With Python For The SAP Pipeline

carolineoakes · 09-18-2022

Docker allows you to use Python and R in containerized environments. You get access to several Python3 and R operators out of the box. You will, of course, have to bring in a few things yourself, such as the base libraries for Python and R. Luckily, the Modeler gives you a predefined runtime environment into which you can download the relevant libraries. Here, we'll look at a simple use case for Docker: making a library available to a Python3 operator through tagging. Hopefully, by the end, you'll better understand the SAP Data Intelligence Modeler application.

Overhead

In this use case, we'll be using the pandas library. Pandas is an open-source data analysis library that you can install into a Docker image. First, create a Dockerfile in the Modeler that starts FROM $com.sap.sles.base. This base image comes with Python 3.6 (python36). We can then use the package manager pip to install pandas on top of it. Once we've done all that, we can save and build the Dockerfile. You should see a confirmation in the top right corner of the screen saying, "Build status: completed [Date]."

Integrating Python3 and the Docker Image

We'll have to link our Docker image to the Python3 operator so that the script running in the operator has access to pandas. By default, the Python operator has no inputs and outputs. We'll start by defining the operator's group. To do this, we can go through a series of steps:

Open Modeler.

Create a blank graph or pipeline. You may also open an existing graph or pipeline, but to keep it simple, we're dealing with a new pipeline.

Drag the Python3 Operator into the blank space.

Right-click the operator. From the context menu that pops up, select "Group."

A group box should appear, surrounding the operator. Select the group box (not the operator, but the box around it) and open the Configuration panel. Under "Tags," add "pandalib" so that the group gets access to the image containing pandas.

Now that you've added the pandalib tag, the Modeler will match the group to the Docker image that provides it. There are a few things to note in this scenario. The pandas library must already be available in that image; the initial setup (in the Overhead section) explains how to include it. If you're planning to use R instead of Python, the workflow and setup are virtually the same. In that case, you'll need to map the corresponding tag to the R operator instead.
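To make the tagging step more concrete, here is a rough sketch of what a script inside the tagged Python3 operator might look like. It is only an illustration under a few assumptions: the port names "input" and "output" are hypothetical (the operator has no ports by default, so you would have to add them yourself), the helper summarize_csv is made up for this example, and the api object with its set_port_callback and send calls reflects our understanding of what the Modeler provides to the script at runtime.

```python
import io

import pandas as pd


def summarize_csv(csv_text):
    """Parse CSV text with pandas and return the mean of each numeric column as CSV text."""
    frame = pd.read_csv(io.StringIO(csv_text))
    means = frame.mean(numeric_only=True)
    return means.to_csv(header=False)


def on_input(data):
    # "output" is a hypothetical port name; it must exist on the operator.
    api.send("output", summarize_csv(data))


try:
    # Assumption: inside the Modeler, `api` is supplied to the script at runtime.
    # "input" is a hypothetical port name added to the operator by hand.
    api.set_port_callback("input", on_input)
except NameError:
    # Outside the Modeler there is no `api`, so run a small local demo instead.
    print(summarize_csv("a,b\n1,2\n3,4\n"))
```

If the group is not tagged, or the image behind the tag lacks pandas, the import at the top of the script is where we would expect things to fail.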

Tools like SAP Data Intelligence are useful for cloud analytics, and R and Python extend what these tools can do. Docker provides an environment that significantly speeds up the development of analytics tools by making it easy to include libraries like pandas. In a visual graph, it's also a lot easier to follow program flow than by sifting through code and manually tracing function calls. The tagging system ensures that each operator gets access to the libraries it needs. Loading those libraries into the image before tagging the operators is crucial for seamless operation. Without the libraries, errors will arise, and the operator may produce no output at all.
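Since a missing library is the most likely failure here, a small check at the start of a script can make the problem obvious. The sketch below uses only the Python standard library; the helper names missing_libraries and require_libraries are made up for this example, and it is meant for top-level package names such as pandas.

```python
import importlib.util


def missing_libraries(names):
    """Return the top-level package names from `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def require_libraries(names):
    """Raise ImportError listing every requested library that is not installed."""
    missing = missing_libraries(names)
    if missing:
        raise ImportError("Missing libraries: " + ", ".join(missing))


# Report without raising, so the snippet runs in any environment.
print("Missing:", missing_libraries(["pandas"]))
```

Calling require_libraries(["pandas"]) before the rest of the script would stop with a clear message if the tagged image does not actually contain pandas.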