Some Notes on Docker File Creation on SAP Data Intelligence

Introduction

As a data scientist or data engineer you might not have much hands-on experience with Docker. At least that is where I started. I knew about the appealing concept of containerising applications, but when developing pipelines or operators on SAP Data Intelligence I was always happy to have an existing Docker image that I could use. Over time, requests came in that forced me to leave my comfort zone and learn more about Docker. Eventually I realised that working with Docker directly is not as hard as expected, and that the learning curve is short and steep rather than painstakingly long.

In this blog I give a short introduction to Docker from an SAP Data Intelligence angle. I then show first how to add Python packages with pip and second what needs to be done if another package manager is required. Finally I delve into the challenge of adding more elaborate installation tasks to a Dockerfile. For the sake of your nerves and fingernails, these should be developed and tested interactively before building an image on an SAP Data Intelligence instance.

Docker on SAP Data Intelligence

In general you can use any docker image to run on DI. You only have to ensure that it is correctly tagged so that the pipeline scheduler can select the appropriate docker container that provides the libraries required by the operators.

You might run into the challenge of using operators whose tags are not all provided by any single existing Docker image, e.g. ‘flowagent’ and ‘python36’. Then you either

  • group parts of the pipeline for running them in different docker containers with the caveat of the data volume restriction or
  • enhance one of the images with the necessary packages
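The tag matching described above can be pictured with a small shell sketch. It only illustrates the idea that an image qualifies when it carries every tag the operators ask for. The function name `image_satisfies` and the tag sets are hypothetical, tag versions are ignored, and the actual selection logic of the pipeline scheduler is not shown in this blog.

```shell
#!/bin/sh
# image_satisfies IMAGE_TAGS REQUIRED_TAGS
# Both arguments are space-separated tag lists. Returns 0 when every
# required tag also appears in the image's tag list, 1 otherwise.
image_satisfies() {
    image_tags=" $1 "
    for tag in $2; do
        case "$image_tags" in
            *" $tag "*) ;;      # tag present, keep checking
            *) return 1 ;;      # a required tag is missing
        esac
    done
    return 0
}

# Hypothetical tag sets, for illustration only.
sles_image="default sles python27 python36 tornado"
operators="flowagent python36"

if image_satisfies "$sles_image" "$operators"; then
    echo "image can host both operators"
else
    echo "no match: group the operators or enhance an image"
fi
```

With these example tags the check fails because ‘flowagent’ is missing, which is exactly the situation that leads to one of the two options listed above.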

Side comment: messages between operators are either exchanged within the same docker container or across docker containers using NATS.

Enhancing Existing Docker Images with pip

SAP has an enterprise support agreement with SUSE and uses SLES as the basis for most of the operators. If, for example, you would like to add Python packages such as ‘pandas’, you can select the base image with the reference character ‘$’

FROM $com.sap.sles.base

or directly pull the image from the repository with the reference character ‘§’

FROM §/com.sap.datahub.linuxx86_64/sles:15.0-sap-003

The latter might lack enhancements that are added by the Dockerfile of com.sap.sles.base. With that method you can also inherit from non-standard images that have been built and pushed to the local Docker registry from outside of SAP Data Hub / SAP Data Intelligence (on premise). This is often required when only trusted images that have been hardened according to company policy may be used. The syntax is as follows: FROM §/<image-name-in-repo>:<version>.

Finally your new Dockerfile might look like:

FROM $com.sap.sles.base
RUN python3.6 -m pip --no-cache-dir install 'pandas'
RUN python3.6 -m pip --no-cache-dir install 'scikit-learn'
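If several packages are needed, they can also be installed by one RUN instruction, which yields one image layer instead of one per package. The following shell sketch prints such a Dockerfile for a given base image and package list. The helper name `make_pip_dockerfile` is hypothetical, and the single-RUN layout is my variation, not something SAP prescribes.

```shell
#!/bin/sh
# make_pip_dockerfile BASE_IMAGE PACKAGE...
# Prints a Dockerfile that inherits from BASE_IMAGE and installs all
# packages with a single RUN instruction.
make_pip_dockerfile() {
    base=$1
    shift
    printf 'FROM %s\n' "$base"
    printf 'RUN python3.6 -m pip --no-cache-dir install'
    for pkg in "$@"; do
        printf " '%s'" "$pkg"
    done
    printf '\n'
}

# Single quotes keep the shell from expanding the leading '$'.
make_pip_dockerfile '$com.sap.sles.base' pandas scikit-learn
```

The same helper works with the ‘§’ form, for example `make_pip_dockerfile '§/<image-name-in-repo>:<version>' pandas`, provided that image ships python3.6 and pip.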

It is very important that you tag the new Docker image not only with the newly added packages but also with the tags of the base image. There is currently (SAP DI 2.6) no inheritance of tags in place. In our particular case the tag list would look like this:

  • default
  • sles
  • python27
  • python36
  • tornado – 5.0.2
  • pandas
  • scikit-learn

Enhancing Existing Docker Images with Other Package Managers

Enhancing the SAP-provided and maintained images has its limitations because you can only use ‘pip’ for installing Python packages. If you need another package manager, such as ‘apt-get’ on Ubuntu or ‘zypper’ on SUSE, you have to fall back to openly available images.

Fortunately there is already an image that contains the basic packages and can be enhanced as you like. It can be found in the Modeler ->repository/dockerfiles folder with the path:

$com.sap.opensuse.golang.zypper

and the definition:

FROM opensuse/leap:15.0
ARG GOPATH=/gopath
ARG GOROOT=/goroot

ENV GOROOT=${GOROOT}
ENV GOPATH=${GOPATH}
ENV PATH=${GOROOT}/bin:${GOPATH}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

RUN zypper --non-interactive update && \
    # Install tar, gzip, python, python3, pip, pip3, gcc and libgthread
    zypper --non-interactive install --no-recommends --force-resolution \
    tar \
    gzip \
    python=2.7.14 \
    python2-pip \
    python3 \
    python3-pip \
    gcc=7 \
    gcc-c++=7 \
    libgthread-2_0-0=2.54.3 && \
    # Install tornado
    python2 -m pip --no-cache install tornado==5.0.2 && \
    python3 -m pip --no-cache install tornado==5.0.2

COPY sapgolang.tar.gz /tmp/sapgolang.tar.gz

RUN mkdir -p $GOROOT && \
    tar -xzf /tmp/sapgolang.tar.gz --strip-components=1 -C ${GOROOT}

and the tags

  •  opensuse
  • python27
  • python36
  • tornado – 5.0.2
  • sapgolang – 1.12.1-bin
  • zypper

This base image enables you to run the package manager “zypper” for installing further packages to the image e.g.:

RUN zypper --non-interactive in gcc-fortran

Interactively Creating Dockerfiles

If you need to build more complex Dockerfiles than just adding a couple of simple packages with pip and zypper, then you are strongly advised to do so locally first before adding lines to the Dockerfile on an SAP Data Intelligence instance, unless you are an exceptional OS admin and Docker guru. If you belong to the more ordinary kind of developing data scientist or data engineer, a fast trial-and-error approach might be more appropriate. This means you first need to install Docker locally and maybe read about the limited number of commands you are going to use in Dockerfiles. In my opinion a Dockerfile is just an installation batch script that processes the commands it lists. In the vastness of the internet you are going to find plenty of good introductory pages on Docker.

In the following I take up a request from a customer in the meteorology business who needed special libraries to write operators in Python. My first attempt was simply to add the necessary lines to my favourite Docker image ($com.sap.sles.base)

RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh
RUN zypper install eccodes

and fell flat on my face. The succinct error message just told me that the build had failed.

So I started my search for enlightenment locally with the base image opensuse/leap:15.0 and the basic extension of the Dockerfile ‘$com.sap.opensuse.golang.zypper’.

Preparation

I created a directory that contains the Dockerfile ‘$com.sap.opensuse.golang.zypper.Dockerfile’ and ‘sapgolang.tar.gz’ because the latter is needed as well.

Then I opened a terminal, went to the above folder and started a build process with

docker build --tag eccodes .

and after some time I got a list of my images with the command

$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
eccodes         latest   44b88839c5b3   44 seconds ago   661MB
opensuse/leap   15.0     7b6c420ec38e   9 days ago       104MB
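Two details of the build command are easy to miss: `docker build` needs a build context (here the current directory), and a Dockerfile that is not literally named `Dockerfile` has to be passed with `--file`. A minimal sketch of a wrapper, assuming the file really is stored under the name given in the Preparation step; the function name `build_image` is hypothetical:

```shell
#!/bin/sh
# build_image TAG DOCKERFILE [CONTEXT_DIR]
# Wraps `docker build` and names the Dockerfile explicitly.
# The build context defaults to the current directory.
build_image() {
    tag=$1
    dockerfile=$2
    context=${3:-.}
    docker build --tag "$tag" --file "$dockerfile" "$context"
}
```

It could be called as `build_image eccodes '$com.sap.opensuse.golang.zypper.Dockerfile'` from the folder that also holds sapgolang.tar.gz. The single quotes stop the shell from treating `$com` as a variable.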

With

docker images --all

I could see that the build was a layered process in which many intermediate images had been produced.

$ docker images --all
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
eccodes         latest   44b88839c5b3   3 minutes ago    661MB
<none>          <none>   4dbeed3246d5   3 minutes ago    533MB
<none>          <none>   ad60d5a15d70   3 minutes ago    530MB
<none>          <none>   773c4c187f90   3 minutes ago    526MB
<none>          <none>   4744b754f3a7   3 minutes ago    517MB
<none>          <none>   fbfaf8d1d6c0   11 minutes ago   104MB
<none>          <none>   55a13db79639   11 minutes ago   104MB
<none>          <none>   7cd45134c515   11 minutes ago   104MB
<none>          <none>   ab159e9ee696   11 minutes ago   104MB
<none>          <none>   c7eeb77d5357   11 minutes ago   104MB
opensuse/leap   15.0     7b6c420ec38e   9 days ago       104MB

If the previously mentioned lines for adding the repository and installing the eccodes package are appended, the image build is much faster, because Docker reuses the cached layers of the unchanged instructions, but it ultimately fails as well.

But now that I had the image locally, I could run the Docker container interactively with a shell and test all commands step by step.

Step by Step Installation of a New Docker Image

For the step-by-step installation I first needed to run the container interactively

$ docker run -it eccodes bash (or $ docker run -it eccodes sh)

With this I am in the container and can enter the commands needed for the new Docker image.
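Besides the interactive session, a single command can also be tried in a throwaway container, which is closer to the situation during `docker build` because no terminal is attached. The helper below is a sketch; the name `try_in_image` is hypothetical, while `docker run --rm` and `sh -c` are standard.

```shell
#!/bin/sh
# try_in_image IMAGE COMMAND
# Runs COMMAND with `sh -c` in a fresh container of IMAGE and removes the
# container afterwards (--rm). The function's exit status is that of
# `docker run`, which passes on the status of the command in the container.
try_in_image() {
    image=$1
    shift
    docker run --rm "$image" sh -c "$*"
}
```

For example, `try_in_image eccodes 'zypper --non-interactive refresh'; echo "exit status: $?"`. Every call starts again from the image as it was built, so steps that depend on each other have to be chained with `&&` inside one command string.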

1. Command

9b07363dfa92:/ # zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo

-> No issue

2. Command

9b07363dfa92:/ # zypper refresh
Retrieving repository 'SStepke's Home Project (openSUSE_Leap_15.0)' metadata ---------------------------------------------------------------[\]

New repository or package signing key received:

Repository: SStepke's Home Project (openSUSE_Leap_15.0)
Key Name: home:SStepke OBS Project <home:SStepke@build.opensuse.org>
Key Fingerprint: 02C16E40 E54FD96B 57CBFA85 B1A9061F 7E4A4A2F
Key Created: Tue Nov 6 15:33:51 2018
Key Expires: Thu Jan 14 15:33:51 2021
Rpm Name: gpg-pubkey-7e4a4a2f-5be1b45f
Do you want to reject the key, trust temporarily, or trust always? [r/t/a/?] (r):

This is an interactive prompt whose default (reject) did not help at all. After some internet research I found the answer: adding the option --gpg-auto-import-keys.

3. Command

9b07363dfa92:/ # zypper --non-interactive install eccodes 

ran once the option --non-interactive had been added.

Summary

Here we go. Now I had all the commands tested, and the Dockerfile built without complaints once the following 3 lines were added

RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh --gpg-auto-import-keys
RUN zypper --non-interactive install eccodes 
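For the local trial-and-error loop, the three lines can also be appended to the Dockerfile as one RUN instruction, so that they form a single layer. The sketch below does this from the shell. It is a variation on the tested lines, not a replacement: I place `--gpg-auto-import-keys` in front of the subcommand, where zypper documents it as a global option, and the `zypper clean --all` step and the function name `append_eccodes_layer` are my additions. Keep in mind that auto-importing trusts whatever signing key the repository presents.

```shell
#!/bin/sh
# append_eccodes_layer DOCKERFILE
# Appends one RUN instruction that registers the repository, refreshes it
# with automatic key import, installs eccodes and clears the zypper cache.
append_eccodes_layer() {
    cat >> "$1" <<'EOF'

RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo && \
    zypper --non-interactive --gpg-auto-import-keys refresh && \
    zypper --non-interactive install eccodes && \
    zypper clean --all
EOF
}
```

After calling it on the local Dockerfile, the image is rebuilt with the same `docker build` command as before. I have not verified this combined form on an SAP Data Intelligence instance.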

Conclusion

With these learnings I am prepared to tackle many of the challenges that come up when enhancing Dockerfiles with the pip and zypper package managers. Now I do not shy away when asked for more sophisticated tasks like adding binaries, setting system variables etc.

Reference

SAP DI Help – Create Docker

SAP DI Help – Docker Inheritance
