DI Basics: Building a Custom Dockerfile
Prerequisites:
- A functional Data Intelligence cluster:
  - DI cloud (all versions)
  - DI on-premise (3.x)
- Access to the Data Intelligence Modeler.
Code-based operators, such as the Python3 operator, sometimes need libraries that are not included in the operator’s default image. Importing such a library results in an error unless the library is provided in a custom image, which can be built from a custom Dockerfile.
In this example we’ll include the TensorFlow library as a test. I’ve created a simple graph for this, in which a Python3 operator sends its output to a Wiretap operator.
Within the Python3 operator I have some basic code that imports tensorflow and sends its version to the Wiretap.
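The original screenshot of the script is not available, so here is a minimal sketch of what such a script might look like. It assumes the operator has an output port named `output` connected to the Wiretap (the port name is an assumption) and that the runtime injects an `api` object whose `send` function takes a port name and a message. The helper name `report_version` is purely illustrative. The import error is deliberately not caught, so a missing library still fails the operator.

```python
import importlib


def report_version(send, module_name="tensorflow", port="output"):
    """Import module_name and send its version string to the given port."""
    # Raises ImportError if the library is not installed in the image.
    module = importlib.import_module(module_name)
    version = getattr(module, "__version__", "unknown")
    send(port, f"{module_name} version: {version}")
    return version


# Inside the Python3 operator, `api` is provided by the runtime, so the
# call would be:
#     report_version(api.send)
```

Passing `send` as a parameter keeps the helper testable outside the Modeler, where `api` does not exist.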
Running this graph as-is with the default Python3 operator gives the following error because the tensorflow module is missing:
Error while executing Python Operator's user provided script: No module named 'tensorflow'
Creating a new Dockerfile:
To overcome this, we will create a simple Dockerfile that includes the tensorflow library and add it to a group. In part 2 we will include it as part of a custom operator.
We will only cover this as a quick example. For more detail on Dockerfile commands and usage please see Docker’s Getting Started guide.
- In the Modeler application, navigate to the ‘Repositories’ tab
- Right-click on the ‘dockerfiles’ folder and select ‘Create Docker File’.
- Give the Dockerfile a name, then navigate to the newly created folder of that name within the ‘dockerfiles’ folder. Double-click the ‘Dockerfile’ file to open it for editing. In this example my Dockerfile is named ‘tensorflow_test’.
- I am just adding two lines to this sample Dockerfile:
  - FROM: sets the base image that the new image inherits from. Since this is for a Python operator, I am using the SUSE (SLES) base image included with DI, as it already includes Python3 and pip.
  - RUN: specifies a command to run at build time. In this case I am using pip to install a specific version of the tensorflow library.
- Next we will add tags to the new Dockerfile. Setting proper tags is extremely important, as this is how we later match this image with the corresponding group or custom operator. Click on the settings button in the upper-right and click the plus sign to add new tags. The tags I am using are ‘sles’, ‘python36’, ‘tornado’ and ‘tensorflow_test’.
- The ‘sles’ tag is included because we are using the SLES base image. The ‘python36’ and ‘tornado’ tags tell the engine that this is a Python image. I also recommend adding a custom tag to differentiate this image. In this case I’ve added ‘tensorflow_test’, which is specific to this image.
- Once finished, use the buttons in the upper-right to save the Dockerfile and build it into an image. When the build completes successfully, a green dot is shown next to the Dockerfile.
- If the build fails with a red dot, the pipeline-modeler logs can be checked for the source of the issue.
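Since the Dockerfile screenshot is missing, the two lines described in the steps above can be sketched with a small Python helper that renders the Dockerfile text. Everything specific here is an assumption rather than the author's exact file: the base image reference `$com.sap.sles.base`, the `python3 -m pip install --user` form of the pip command, and the helper name `render_dockerfile`. The version is left as a placeholder because the original version number is not given.

```python
def render_dockerfile(package, version, base_image="$com.sap.sles.base"):
    """Return the text of a two-line Dockerfile: a FROM line and a RUN line."""
    return (
        # FROM: base image to inherit from (assumed name of the DI SLES image).
        f"FROM {base_image}\n"
        # RUN: executed at build time; pins the library to one version.
        f"RUN python3 -m pip install --user {package}=={version}\n"
    )


# "X.Y.Z" is a placeholder; replace it with the version you need.
print(render_dockerfile("tensorflow", "X.Y.Z"))
```

Pinning the version keeps rebuilds of the image reproducible, and makes it easy to confirm later that the graph is really running against this image.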
Using the Dockerfile in a group:
Now we can use the new image in a group that holds the default Python3 operator. In my test graph, I right-click on the Python3 operator and choose ‘Group’.
- Once the Python3 operator is in its own group, open the group configuration to add new tags.
- Make sure the tags exactly match the tags defined in the Dockerfile.
- Save and run the graph. Once the graph reaches ‘Running’ status, you should be able to open the Wiretap operator and view the TensorFlow version, which should match the version we installed in the Dockerfile.
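Because a single mistyped tag is enough to stop the group from picking up the image, it can help to compare the two tag sets before running the graph. The sketch below is only an illustration of that check: the function name `tag_mismatches` is hypothetical, and representing tags as name-to-version dictionaries with empty version strings is an assumption, not something read from DI itself.

```python
def tag_mismatches(group_tags, image_tags):
    """Return the sorted names of tags that differ between the two tag sets."""
    all_names = group_tags.keys() | image_tags.keys()
    return sorted(
        name for name in all_names
        if group_tags.get(name) != image_tags.get(name)
    )


# Tag names from this example; the empty version strings are assumed.
dockerfile_tags = {"sles": "", "python36": "", "tornado": "", "tensorflow_test": ""}
group_tags = {"sles": "", "python36": "", "tornado": "", "tensorflow_test": ""}

print(tag_mismatches(group_tags, dockerfile_tags))  # prints []
```

An empty list means the group and the Dockerfile agree; any name in the list is a tag that is missing on one side or has a different version.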