Some Notes on Docker File Creation on SAP Data Intelligence

Introduction

As a data scientist or data engineer you might not be very familiar with Docker hands-on. At least that was how I started. I knew about the appealing concept of containerising applications, but when developing pipelines or operators on SAP Data Intelligence I was always happy to find an existing Docker image that I could use. Over time, requests came in that pushed me out of my comfort zone and made me learn more about Docker. Eventually I realised that working with Docker directly is not as hard as expected, and that the learning curve is short and steep rather than painstakingly long.

In this blog I give a short introduction to Docker from an SAP Data Intelligence angle. This is followed first by how to add Python packages with pip, and second by what needs to be done if another package manager is required. Finally I delve into the challenge of adding more elaborate installation tasks to a Dockerfile. For the sake of your nerves and fingernails, these should be developed and tested interactively before you build an image on an SAP Data Intelligence instance.

Docker on SAP Data Intelligence

In general you can run any Docker image on DI. You only have to ensure that it is tagged correctly, so that the pipeline scheduler can select the appropriate Docker container that provides the libraries required by the operators.

You might run into the situation where the operators of a pipeline carry tags that no single existing Docker image satisfies, e.g. ‘flowagent’ and ‘python36’. Then you either

  • group parts of the pipeline so that they run in different Docker containers, with the caveat of the data volume restriction, or
  • enhance one of the images with the necessary packages

For performance reasons you might prefer running a pipeline in one container rather than spreading it across multiple ones.

Enhancing Existing Docker Images with pip

SAP has an enterprise support agreement with SUSE and uses SLES as the basis for most of the operators. If you would like to add Python packages such as ‘pandas’, for example, you can select the base image with the reference character ‘$’:

FROM $com.sap.sles.base

or pull the image directly from the repository with the reference character ‘§’:

FROM §/com.sap.datahub.linuxx86_64/sles:15.0-sap-007

The latter might miss some enhancements that are added to the Dockerfile of com.sap.sles.base. With this method you can also inherit from non-standard images that have been built and pushed to the local Docker registry from outside of SAP Data Hub / SAP Data Intelligence (on premise). This is often required when only trusted images that have been hardened according to company policy may be used. The syntax is: FROM §/<image-name-in-repo>:<version>.
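To make the ‘§’ route a bit more concrete, here is a minimal shell sketch of publishing a company-hardened image to the local registry and deriving the matching FROM line. The registry host, image name and version are hypothetical placeholders that you have to replace with the values of your installation. By default the function only prints the docker commands (DRY_RUN=1) instead of executing them.

```shell
#!/bin/sh
# Minimal sketch: publish a hardened image to the local Docker registry used
# by SAP DI and print the matching FROM line for the DI Dockerfile.
# REGISTRY and the image name/version passed in are hypothetical placeholders.
REGISTRY="${REGISTRY:-registry.example.local:5000}"

publish_image() {
  name="$1"      # image name in the repository, e.g. mycompany/sles-hardened
  version="$2"   # image version/tag, e.g. 1.0
  target="${REGISTRY}/${name}:${version}"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: only show what would be executed
    echo "docker tag ${name}:${version} ${target}"
    echo "docker push ${target}"
  else
    docker tag "${name}:${version}" "${target}" || return 1
    docker push "${target}" || return 1
  fi

  # Assumption: '§' stands for the registry, so only name and version remain
  printf 'FROM §/%s:%s\n' "${name}" "${version}"
}
```

For example, `publish_image mycompany/sles-hardened 1.0` prints the two docker commands followed by `FROM §/mycompany/sles-hardened:1.0`. Set DRY_RUN=0 to actually tag and push.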

With SAP Data Intelligence 3.0 you are required to run containers as a non-‘root’ user. That means you have to add a group and a user in each Dockerfile:

RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

In addition I recommend setting some environment variables accordingly, in particular adding the user’s ‘.local/bin’ directory to PATH in case binaries are installed there as well.
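To see why ${HOME}/.local/bin is the right directory, you can ask Python itself where pip --user installs. The sketch below assumes python3 on a Linux machine (macOS framework builds use a different location) and deliberately ignores a PYTHONUSERBASE override; the function name user_bin_dir is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: ask Python where `pip install --user` puts files for a
# given HOME. Assumes python3 on Linux; PYTHONUSERBASE is ignored on purpose.
user_bin_dir() {
  home_dir="$1"
  # `python3 -m site --user-base` prints the per-user base directory
  # (~/.local on Linux). It exits non-zero when user site-packages are
  # disabled (e.g. inside a virtualenv) but still prints the path, hence `|| true`.
  base=$(env -u PYTHONUSERBASE HOME="$home_dir" python3 -m site --user-base || true)
  # Console scripts installed with --user land in <user base>/bin
  echo "${base}/bin"
}

# Example: user_bin_dir /home/cmddata   prints /home/cmddata/.local/bin on Linux
```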

Finally, your new Dockerfile might look like this:

FROM $com.sap.sles.base

RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

RUN python3.6 -m pip --no-cache-dir install 'pandas' --user
RUN python3.6 -m pip --no-cache-dir install 'scikit-learn' --user

Do not forget to add the option ‘--user’ to the pip command to ensure that the package is installed with user permissions only.
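Because a forgotten --user only shows up as a failed build on DI, a quick local check can help. The following is a minimal sketch of a hypothetical helper that lists the pip install lines of a Dockerfile lacking --user. It works line by line, so an install command split over several lines with backslashes is not checked as a whole.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAP DI): list the pip install lines of a
# Dockerfile that lack the --user option. Line-based check only.
missing_pip_user() {
  dockerfile="$1"
  # -n prints line numbers; the second grep drops lines that already use --user.
  # Exit status 0 means offending lines were found, 1 means all is fine.
  grep -n 'pip.*install' "$dockerfile" | grep -v -e '--user'
}

# Usage: if missing_pip_user Dockerfile; then echo "add --user"; fi
```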

It is very important that you tag the new Docker image not only with the newly added packages but also with the tags of the base image. There is currently (SAP DI 2.6) no inheritance process in place. In our particular case the tags would be:

  • default
  • sles
  • python36
  • tornado – 5.0.2
  • pandas
  • scikit-learn
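In the Modeler these tags are, as far as I know, maintained in a Tags.json file next to the Dockerfile in the repository/dockerfiles folder. The sketch below writes such a file for the list above. Treat the layout (each tag name mapped to a version string, empty when unversioned) as an assumption to verify against the SAP DI Help page on creating Docker files; the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: write a Tags.json for the pandas/scikit-learn image.
# Assumed layout: {"<tag>": "<version or empty string>"}.
write_tags_json() {
  target="${1:-Tags.json}"
  # Base image tags are repeated because there is no tag inheritance (DI 2.6)
  cat > "$target" <<'EOF'
{
  "default": "",
  "sles": "",
  "python36": "",
  "tornado": "5.0.2",
  "pandas": "",
  "scikit-learn": ""
}
EOF
  # Fail early if the file is not valid JSON
  python3 -m json.tool "$target" > /dev/null
}

# Usage: write_tags_json Tags.json
```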

Enhancing Existing Docker Images with Other Package Managers

Enhancing the images provided and maintained by SAP has its limitations, because you can only use ‘pip’ for installing Python packages. If other package managers such as ‘apt-get’ from Ubuntu or ‘zypper’ from SUSE are necessary, you have to fall back to openly available images.

Fortunately there is already an image that contains the basic packages and can be enhanced as you like. It can be found in the Modeler in the repository/dockerfiles folder with the path:

$com.sap.opensuse.golang.zypper

and the definition:

FROM $com.sap.sles.base

# zypper and the GOROOT setup below need root privileges
USER root

ARG GOPATH=/gopath
ARG GOROOT=/goroot

ENV GOROOT=${GOROOT}
ENV GOPATH=${GOPATH}
ENV PATH=${GOROOT}/bin:${GOPATH}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# Install tar, gzip, python3, pip3, gcc and libgthread
RUN zypper --non-interactive update && \
    zypper --non-interactive install --no-recommends --force-resolution \
        tar \
        gzip \
        python3 \
        python3-pip \
        gcc=7 \
        gcc-c++=7 \
        libgthread-2_0-0=2.54.3

COPY sapgolang.tar.gz /tmp/sapgolang.tar.gz

RUN mkdir -p $GOROOT && \
    tar -xzf /tmp/sapgolang.tar.gz --strip-components=1 -C ${GOROOT}

# Switch to the non-root user only after the root-level installation steps
RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
USER 1972:1972
WORKDIR "/home/cmddata"
ENV HOME=/home/cmddata
ENV PATH="${PATH}:${HOME}/.local/bin"

# Install tornado into the user's ~/.local
RUN python3 -m pip --no-cache-dir install tornado==5.0.2 --user

and the tags:

  • opensuse
  • python36
  • tornado – 5.0.2
  • sapgolang – 1.12.1-bin
  • zypper
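The COPY and tar lines of that definition unpack the Go distribution into GOROOT. The sketch below reproduces this step outside Docker so you can see what --strip-components=1 does: it drops the single top-level folder of the archive, so its contents land directly in GOROOT. The archive layout (one top-level folder containing bin/) is an assumption, since I cannot tell the actual content of sapgolang.tar.gz; the function name is hypothetical.

```shell
#!/bin/sh
# Minimal sketch of the COPY/RUN tar step: unpack an archive whose files sit
# below one top-level folder directly into GOROOT.
# Assumes GNU tar or bsdtar (both support --strip-components).
unpack_into_goroot() {
  archive="$1"   # e.g. /tmp/sapgolang.tar.gz
  goroot="$2"    # e.g. /goroot
  mkdir -p "$goroot" || return 1
  # --strip-components=1 removes the leading folder (e.g. go/) from each path
  tar -xzf "$archive" --strip-components=1 -C "$goroot"
}
```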

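This base image enables you to use the package manager ‘zypper’ to install further packages into the image. Since zypper needs root privileges and the definition switches to the non-root user 1972, switch to root for the installation and back to user 1972 afterwards, e.g.:

USER root
RUN zypper --non-interactive install gcc-fortran
USER 1972:1972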

Interactively Creating Dockerfiles

If you need to build Dockerfiles that are more complex than adding a couple of simple packages with pip and zypper, you are strongly advised to develop them locally first before adding lines to a Dockerfile on an SAP Data Intelligence instance, unless you are an exceptional OS admin and Docker guru. If you belong to the more ordinary kind of developing data scientist or data engineer, the fast trial-and-error approach is more appropriate. This means you first need to install Docker locally and maybe read up on the limited number of commands you are going to use in Dockerfiles. In my opinion a Dockerfile is just an installation batch script that processes the outlined commands one after the other. In the vastness of the internet you will find plenty of good introductory pages on Docker.

In the following I take up a request from a customer in the meteorology business who needed special libraries for writing operators in Python. My first attempt was simply to add the necessary lines to my favourite Docker image ($com.sap.sles.base):

RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper refresh
RUN zypper install eccodes

and I fell flat on my face. The succinct error message just told me that the build had failed.

So I started my search for enlightenment locally, using the base image *opensuse/leap:15.0* and the basic extension of the Dockerfile ‘$com.sap.opensuse.golang.zypper’.

Preparation

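I created a directory containing the Dockerfile ‘$com.sap.opensuse.golang.zypper.Dockerfile’ and ‘sapgolang.tar.gz’, because the latter is needed as well. Note that docker build looks for a file named ‘Dockerfile’ in the build context by default, so either save the definition under that name or pass the file explicitly with the -f option.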
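Since docker build expects the file name ‘Dockerfile’ unless -f is given, a small helper that assembles the build folder can save a few surprises. This is a hypothetical sketch: the function name and the folder name eccodes-di are mine. Keep in mind that the DI-specific ‘$’ reference in FROM does not resolve outside SAP DI, which is why a public base image such as opensuse/leap:15.0 is used in the local copy.

```shell
#!/bin/sh
# Hypothetical helper: assemble a local Docker build context.
# docker build looks for a file called "Dockerfile" unless -f is given,
# so the DI definition is copied under that name.
prepare_build_dir() {
  build_dir="$1"       # e.g. eccodes-di
  dockerfile_src="$2"  # e.g. '$com.sap.opensuse.golang.zypper.Dockerfile'
  tarball="$3"         # e.g. sapgolang.tar.gz
  for f in "$dockerfile_src" "$tarball"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done
  mkdir -p "$build_dir" || return 1
  cp "$dockerfile_src" "$build_dir/Dockerfile" || return 1
  cp "$tarball" "$build_dir/" || return 1
  echo "now run: cd $build_dir && docker build --tag eccodes ."
}
```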

Then I opened a terminal, changed to this folder and started a build with

docker build --tag eccodes .

and after some time I got a list of my images with the command

$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
eccodes         latest   44b88839c5b3   44 seconds ago   661MB
opensuse/leap   15.0     7b6c420ec38e   9 days ago       104MB

with

docker images --all

I could see that the build was a stacked process in which a lot of intermediate images (layers) had been produced.

$ docker images --all
REPOSITORY      TAG      IMAGE ID       CREATED          SIZE
eccodes         latest   44b88839c5b3   3 minutes ago    661MB
<none>          <none>   4dbeed3246d5   3 minutes ago    533MB
<none>          <none>   ad60d5a15d70   3 minutes ago    530MB
<none>          <none>   773c4c187f90   3 minutes ago    526MB
<none>          <none>   4744b754f3a7   3 minutes ago    517MB
<none>          <none>   fbfaf8d1d6c0   11 minutes ago   104MB
<none>          <none>   55a13db79639   11 minutes ago   104MB
<none>          <none>   7cd45134c515   11 minutes ago   104MB
<none>          <none>   ab159e9ee696   11 minutes ago   104MB
<none>          <none>   c7eeb77d5357   11 minutes ago   104MB
opensuse/leap   15.0     7b6c420ec38e   9 days ago       104MB

If the aforementioned lines for adding the repository and installing the eccodes package are added, the image build is much faster, because Docker reuses the cached layers of the earlier steps, but it finally fails as well.

But now, having the image locally, I could run the container interactively with a shell and test all commands step by step.

Step-by-Step Installation of a New Docker Image

For the step-by-step installation I first needed to run the container interactively:

docker run -it eccodes bash

(or docker run -it eccodes sh)

With this I am inside the container and can enter the commands needed for the new Docker image one by one.
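To replay the commands of an existing Dockerfile one at a time in this interactive shell, it helps to have them as plain shell lines. The following is a minimal awk-based sketch of a hypothetical helper, not a Docker feature: it skips comment lines, joins backslash continuations and prints the body of every RUN instruction. It only handles the shell form of RUN, not the JSON (exec) form or heredocs.

```shell
#!/bin/sh
# Hypothetical helper: print the body of every RUN instruction of a
# Dockerfile as one shell line, ready to be pasted into the container shell.
# Limitations: shell form only (no JSON exec form, no heredocs).
extract_run_cmds() {
  awk '
    # comment lines are dropped, also inside continuations
    /^[ \t]*#/ { next }
    {
      line = $0
      if (cont) {
        # continuation: drop indentation and append to the buffer
        sub(/^[ \t]+/, "", line)
        buf = buf " " line
      } else {
        buf = line
      }
      # a trailing backslash means the instruction continues on the next line
      if (sub(/[ \t]*\\[ \t]*$/, "", buf)) { cont = 1; next }
      cont = 0
      if (buf ~ /^[ \t]*RUN[ \t]/) {
        sub(/^[ \t]*RUN[ \t]+/, "", buf)
        print buf
      }
    }
  ' "$1"
}

# Usage: extract_run_cmds Dockerfile
```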

1. Command

9b07363dfa92:/ # zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo

→ No issue

2. Command

9b07363dfa92:/ # zypper refresh
Retrieving repository 'SStepke's Home Project (openSUSE_Leap_15.0)' metadata ---------------------------------------------------------------[\]

New repository or package signing key received:

Repository: SStepke's Home Project (openSUSE_Leap_15.0)
Key Name: home:SStepke OBS Project <home:SStepke@build.opensuse.org>
Key Fingerprint: 02C16E40 E54FD96B 57CBFA85 B1A9061F 7E4A4A2F
Key Created: Tue Nov 6 15:33:51 2018
Key Expires: Thu Jan 14 15:33:51 2021
Rpm Name: gpg-pubkey-7e4a4a2f-5be1b45f

Do you want to reject the key, trust temporarily, or trust always? [r/t/a/?] (r):

This is an interactive prompt, and its default answer (r, reject) does not help at all. After some internet research I found the solution: adding the option --gpg-auto-import-keys.

3. Command

9b07363dfa92:/ # zypper --non-interactive install eccodes 

ran successfully once the option --non-interactive had been added.

Summary

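Here we go. Now I had tested all the commands, and the Dockerfile built without complaints once the following three lines were added:

RUN zypper addrepo https://download.opensuse.org/repositories/home:SStepke/openSUSE_Leap_15.0/home:SStepke.repo
RUN zypper --gpg-auto-import-keys refresh
RUN zypper --non-interactive install eccodes

If earlier lines of the Dockerfile have already switched to the non-root user 1972, put USER root before these lines and USER 1972:1972 after them, because zypper needs root privileges.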

Conclusion

With these learnings I am prepared to tackle many of the challenges that come up when enhancing Dockerfiles with the pip and zypper package managers. Now I no longer shy away from requests for more sophisticated tasks such as adding binaries or setting system variables.


10 Comments
  • Hi Thorsten,

    great job, very helpful!

    After following your example, I found that two commands might need a slight modification.

    1. $ docker build --tag eccodes
      This command might be missing the argument “.” (docker build . --tag eccodes) when you execute it within the eccodes-di folder.
    2. eccodes-di d051079$ docker run -it eccodes bash (or eccodes-di d051079$ docker run -it eccodes sh)
      “eccodes-di d051079$” might have got into this command line unintentionally.

    Thank you so much and keep on with the outstanding blogs!

    Best, Lijin

  • Hi Thorsten,

    When I create the Docker base using:

    FROM $com.sap.sles.base
    RUN python3.6 -m pip --no-cache-dir install 'pandas'
    RUN python3.6 -m pip --no-cache-dir install 'scikit-learn'

    I get

    The command '/bin/sh -c python3.6 -m pip --no-cache-dir install 'pandas'' returned a non-zero code: 1

    Any idea what the problem could be? Using DI 3.0.19

    Thanks

    -ravi

  • Found the answer… with the new security changes designed to prevent running containers as root, adding --user works:

    FROM $com.sap.sles.base
    RUN python3.6 -m pip --no-cache-dir install 'pandas' --user
    RUN python3.6 -m pip --no-cache-dir install 'scikit-learn' --user

  • Hi

    I am having problems with a non-zero return code.

    I created the Dockerfile correctly with this code:

    FROM $com.sap.sles.base

    RUN groupadd -g 1972 cmddata && useradd -g 1972 -u 1972 -m cmddata
    USER 1972:1972
    WORKDIR "/home/cmddata"
    ENV HOME=/home/cmddata
    ENV PATH="${PATH}:${HOME}/.local/bin"

    RUN python3.6 -m pip --no-cache-dir install 'tensorflow' --user
    RUN python3.6 -m pip --no-cache-dir install 'numpy' --user

    But when I add the tags to the operator and deploy, I get the non-zero error:

    Timestamp,Level,Message,Application,Topic,ID,Function
    [object Object],ERROR,"Error building docker image: The command '/bin/sh -c python3.6 -m pip --no-cache-dir install 'tensorflow' --user' returned a non-zero code: 1",vflow,container,216246,buildImageCoreDocker
    [object Object],ERROR,"Docker output:
    Step 1/6 : FROM registrycaltdc212638478.azurecr.io/vora/vflow-node-d2d4352a0cfc540f9be2ae9685643998bb11e126:com.sap.sles.base
    ---> 18b16da089bc
    Step 2/6 : RUN python3.6 -m pip --no-cache-dir install 'tensorflow' --user
    ---> Running in 85e69bdb8b64
    Collecting tensorflow
    Downloading https://files.pythonhosted.org/packages/de/f0/96fb2e0412ae9692dbf400e5b04432885f677ad6241c088ccc5fe7724d69/tensorflow-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (109.2MB)
    Collecting grpcio>=1.8.6 (from tensorflow)
    Downloading https://files.pythonhosted.org/packages/f1/23/62d3e82fa4c505f3195315c8a774b2e656b556d174329aa98edb829e48bc/grpcio-1.29.0.tar.gz (19.6MB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-install-cqmenk9p/grpcio/setup.py", line 191, in <module>
            if check_linker_need_libatomic():
          File "/tmp/pip-install-cqmenk9p/grpcio/setup.py", line 152, in check_linker_need_libatomic
            stderr=PIPE)
          File "/usr/local/lib/python3.6/subprocess.py", line 709, in __init__
            restore_signals, start_new_session)
          File "/usr/local/lib/python3.6/subprocess.py", line 1344, in _execute_child
            raise child_exception_type(errno_num, err_msg, err_filename)
        FileNotFoundError: [Errno 2] No such file or directory: 'cc': 'cc'

    Do you have any idea?

    I changed the commands to these:

    RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
    USER 1972:1972
    WORKDIR /home/vflow
    ENV HOME=/home/vflow

    But I get the same result.

    I am working with SAP Data Intelligence on SAP CAL.

    Thanks,

    Best,Sergio

  • Hi Thorsten,

    great info.
    Just a quick note: the com.sap.sles.base docker image already includes the vflow user handling logic at the end of its definition, so if you’re referring to it, you don’t need to add it again.
    You can just add the pip commands after the reference, e.g.:

    FROM $com.sap.sles.base
    
    RUN python3 -m pip install --upgrade numpy pandas sklearn --user
    RUN python3 -m pip install --upgrade hdbcli hana-ml --user
  • Hi Thorsten, Thorsten Hapke

    We are trying to install the fbprophet lib from Conda in a Dockerfile. Here is the script:

    FROM §/com.sap.datahub.linuxx86_64/sles:15.0-sap-020
    RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
    USER 1972:1972
    WORKDIR /home/vflow
    ENV HOME=/home/vflow

    RUN conda install -c conda-forge/label/cf201901 'fbprophet' --user

    but the Docker image failed to build. We added the default tags to the file. Can you please review the script and tell me what is wrong with it? We are using DI 3.0.

    Regards,

    Arindom Saha

  • Hi, Arindom,

    have you tested the Dockerfile by creating an image on your computer? E.g. you could use as a base image:

    FROM opensuse/leap:15.1

    I suppose some additional libs are needed when installing fbprophet. I faintly recall that I also had some installation issues when I used this package.

    Or conda is not available and you have to use pip instead.

    Best,

    Thorsten