Using SAP Data Hub without internet access

It is rare, but sometimes you have to work in an environment with no connectivity to the internet, not even through a secure proxy.

It is entirely possible, but it takes some additional effort. Here are a few tips to make your life easier if you find yourself in this situation. The installation happens in three steps:

  • First, tell the installer to download all required images into a folder
  • Then transfer everything onto the secure network
  • Finally, run the installer with a parameter pointing to the folder with the 41 GB of images
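
Since roughly 41 GB of images has to cross into the secure network, it can help to confirm that nothing was corrupted on the way. The following is a minimal sketch and not part of the SAP installer: it records a SHA-256 checksum per file before the transfer and reports mismatches afterwards. The function names and the folder layout are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large image archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file (path relative to folder) to its SHA-256 checksum."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(folder: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the manifest entries that are missing or differ in folder."""
    current = build_manifest(folder)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

Run `build_manifest` on the download folder before the transfer, carry the result along (for example as JSON), and call `changed_files` on the copied folder inside the secure network before starting the installer.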

You’ll need additional docker images:

  • tiller
  • registry (for insecure docker registry)
  • A software binary repository: nexus, artifactory, or any alternative

And you’ll also need additional programs:

Post Installation

After the installation completes successfully, it’s just the beginning!

Running a sample pipeline will fail if the insecure docker registry uses HTTPS. The solution is simply to import its certificate with the “Connection Management” application.

After testing the demo pipelines, it’s time to build your own. To do so, you’ll probably need external libraries for Python, Java or Node that aren’t available offline. That’s where the software binary repository comes into play. For instance, to write a custom Python operator that connects to a SOAP web service, we need a package called zeep, with this dependency tree:

In order to use zeep in this offline environment, we need to transfer 15 Python packages! Other custom operators will require even more packages, so it is important to have a tool that solves this problem. We will point pip at a local repository manager that provides the required libraries in the correct versions.
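
To get a feel for how such a dependency list grows, here is a hedged sketch that walks the declared requirements of a package on a machine where it is already installed (for example, the internet-connected workstation). It relies on the standard library's `importlib.metadata` (Python 3.8+); the helper names are made up for this example, and environment markers other than extras are ignored, so the result may list slightly more packages than pip would actually fetch.

```python
import re
from importlib import metadata
from typing import Callable, List, Set

_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")
_EXTRA = re.compile(r"\bextra\s*==")


def requirement_name(requirement: str) -> str:
    """Extract the normalized project name from a requirement string."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def _installed_requires(name: str) -> List[str]:
    """Requirements declared by an installed package (empty if unknown)."""
    try:
        return metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []


def transitive_dependencies(
    package: str,
    requires: Callable[[str], List[str]] = _installed_requires,
) -> Set[str]:
    """Collect the package and everything it depends on, directly or not."""
    seen: Set[str] = set()
    stack = [requirement_name(package)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for requirement in requires(current):
            if _EXTRA.search(requirement):
                continue  # optional extras are not installed by default
            stack.append(requirement_name(requirement))
    return seen
```

Calling `sorted(transitive_dependencies("zeep"))` on a machine with zeep installed gives a quick inventory to compare against what ends up in the offline repository.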

Prepare the offline python repository

We installed the Nexus docker image inside OpenShift.

Then, in the HTML administration UI, we did some setup to:

  • create a Python hosted repository
  • grant the browse and upload roles on that repository to the anonymous user

Download packages

To download all required packages, use the pip command:
pip3 download --only-binary=:all: --python-version 36 --platform manylinux1_x86_64 -d . zeep

Looking in indexes: https://pypi.python.org/simple/
Collecting zeep
[...]
Collecting lxml>=3.1.0 (from zeep)
Downloading https://files.pythonhosted.org/packages/ec/be/5ab8abdd8663c0386ec2dd595a5bc0e23330a0549b8a91e32f38c20845b6/lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl (5.8MB)
|████████████████████████████████| 5.8MB 589kB/s Saved ./lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl
[...]
Successfully downloaded zeep six defusedxml requests-toolbelt pytz isodate appdirs requests lxml attrs cached-property urllib3 certifi chardet idna
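
Before transferring anything, it can be worth double-checking that every downloaded wheel really targets the Python version and platform of the Data Hub image. The sketch below parses wheel filenames according to the standard wheel naming convention and applies a deliberately simplified compatibility check; the default tag sets are assumptions matching the `--python-version 36 --platform manylinux1_x86_64` options above, and the ABI tag is parsed but not checked. pip's real resolution logic is more thorough.

```python
from typing import FrozenSet, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


def is_compatible(
    tags: WheelTags,
    python_tags: FrozenSet[str] = frozenset({"cp36", "py36", "py3"}),
    platform_tags: FrozenSet[str] = frozenset({"manylinux1_x86_64", "any"}),
) -> bool:
    """Simplified check: some python tag and some platform tag must match."""
    python_ok = any(tag in python_tags for tag in tags.python_tag.split("."))
    platform_ok = any(tag in platform_tags for tag in tags.platform_tag.split("."))
    return python_ok and platform_ok
```

For example, `is_compatible(parse_wheel_filename("lxml-4.4.1-cp36-cp36m-manylinux1_x86_64.whl"))` accepts the lxml wheel from the log above, while a wheel built for another interpreter or operating system would be flagged.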

Then tar the wheel files and transfer them into the secure environment.
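
If you prefer to script the bundling step, a minimal sketch with the standard library's `tarfile` module could look like this. The function name and paths are placeholders; the archive is written without compression because wheels are already zip archives.

```python
import tarfile
from pathlib import Path
from typing import List


def bundle_wheels(wheel_dir: Path, archive: Path) -> List[str]:
    """Pack every .whl file in wheel_dir into a flat tar archive."""
    wheels = sorted(wheel_dir.glob("*.whl"))
    with tarfile.open(archive, "w") as tar:
        for wheel in wheels:
            # Store only the file name so the archive unpacks into one folder.
            tar.add(wheel, arcname=wheel.name)
    return [wheel.name for wheel in wheels]
```

The returned list of names can double as a packing slip to check against once the archive is unpacked on the other side.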

Load packages in the offline python repository

To upload the wheels into the repository, you need a tool called twine; it’s the opposite of pip. It should be included with the python rpm for your distribution.

twine upload --repository-url <your repo url> *.whl
Enter your username: admin
Enter your password:
Uploading distributions to <your repo url>
Uploading appdirs-1.4.3-py2.py3-none-any.whl
100%|█████████████████████████████████████████████████████████| 24.1k/24.1k [00:00<00:00, 382kB/s]
Uploading attrs-19.1.0-py2.py3-none-any.whl
100%|█████████████████████████████████████████████████████████|
[…]
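
The interactive prompt is fine for a one-off upload, but for repeated uploads a small wrapper can help. This is a hedged sketch: it assumes a reasonably recent twine that supports the `--non-interactive` option and reads credentials from the `TWINE_USERNAME` and `TWINE_PASSWORD` environment variables. The function names are invented for this example.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional, Sequence, Tuple


def build_twine_upload(
    repository_url: str,
    wheel_paths: Sequence[str],
    username: str,
    password: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build the twine command line and environment without running it."""
    if not wheel_paths:
        raise ValueError("no wheel files to upload")
    command = [
        "twine", "upload", "--non-interactive",
        "--repository-url", repository_url,
        *wheel_paths,
    ]
    # Pass credentials through the environment, not the command line.
    env = dict(os.environ if base_env is None else base_env)
    env["TWINE_USERNAME"] = username
    env["TWINE_PASSWORD"] = password
    return command, env


def upload_wheels(repository_url: str, wheel_paths: Sequence[str],
                  username: str, password: str) -> None:
    """Run twine and raise CalledProcessError if the upload fails."""
    command, env = build_twine_upload(repository_url, wheel_paths, username, password)
    subprocess.run(command, env=env, check=True)
```

Keeping the command construction separate from its execution makes it easy to print and review the command before anything is sent to the repository.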

Use those libraries in a custom operator

We build a new docker image that includes one or more additional libraries and will be used by custom operators.

It would be cumbersome to build one docker image for every external package, so we could build one image for all the small packages and one for each big package like tensorflow (100 MB).

You can follow this tutorial by Jens Rannacher to create a Dockerfile, and set the content as:

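FROM $com.sap.opensuse.python36

# Point pip at the offline repository; replace your_repo_url and your_repo_host with your own values
RUN python3 -m pip config --global set global.index nexus \
 && python3 -m pip config --global set global.index-url your_repo_url \
 && python3 -m pip config --global set global.trusted-host your_repo_host
# Install the library from the offline repository
RUN python3 -m pip install zeep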
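
pip stores these settings in an INI-style configuration file with a `[global]` section. As an illustration only, the sketch below renders equivalent file content with the standard library's `configparser`; the function name is hypothetical, and the values are the same placeholders as in the Dockerfile.

```python
import configparser
import io


def render_pip_conf(index: str, index_url: str, trusted_host: str) -> str:
    """Render pip configuration content equivalent to the three settings."""
    # Interpolation is disabled so that '%' characters in URLs are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {
        "index": index,
        "index-url": index_url,
        "trusted-host": trusted_host,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(render_pip_conf("nexus", "your_repo_url", "your_repo_host"))
```

Seeing the settings as a file makes it easier to review them, or to copy the same configuration to a developer workstation inside the secure network.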

And off you go!

Then we make a graph, add a custom python3 operator, place it in a group and add a tag unique to the previous docker image. (For all practical questions, including tagging, I reach out to our champion Henrique Pinto.)

And in that operator, the Python library zeep is available for use!
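
As a rough idea of what the operator code might then contain, here is a hedged sketch of calling a SOAP operation through zeep. The WSDL URL and the operation name are hypothetical placeholders, and the wiring to the operator's ports is left out because it depends on your graph.

```python
from typing import Any


def make_client(wsdl_url: str) -> Any:
    """Create a zeep client from a WSDL location."""
    # Imported lazily so this module still loads where zeep is not installed.
    from zeep import Client

    return Client(wsdl=wsdl_url)


def call_operation(client: Any, operation: str, **params: Any) -> Any:
    """Invoke a named SOAP operation on the client's default service."""
    return getattr(client.service, operation)(**params)


# Hypothetical usage inside the operator script:
# client = make_client("http://soap.example.internal/service?wsdl")
# result = call_operation(client, "GetStatus", orderId="42")
```

Keeping the client creation separate from the call makes it possible to reuse one client across messages instead of fetching the WSDL every time.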

 

3 Comments
  • Now I know what you are doing on your bank holiday..

    Nevertheless, really interesting, and most likely really useful content for the community.

    • I think the product should have a tighter integration to git and software like nexus to create a robust framework that leverages the best in open source libraries.

  • Now I might start to charge a fee! 🙂

    Awesome blog and great idea on how to leverage Nexus for pip.
    Another possibility would be to consider it for supporting a local repo for zypper on openSUSE.