Understanding containers (part 06): communication ...

Vitaliy-R · ‎03-27-2020

In my recent post Quickly load Covid-19 data with hana_ml and see with DBeaver I used the Jupyter notebook to connect to my SAP HANA database instance in the public cloud via its IP address. But what if I want to connect from a Jupyter notebook running in one container to the OrientDB server running in another container?

Some additional setup

If you've been following the previous parts of this Understanding Containers series, you should have these two containers on your computer: myjupyter01 and myorientdb01.

To get something meaningful we need some data in the OrientDB database and a Python client in the Jupyter environment to connect to that database.

`pyorient` driver

We need an OrientDB driver for Python installed in my Jupyter container from https://github.com/orientechnologies/pyorient.

docker exec myjupyter01 pip install git+https://github.com/orientechnologies/pyorient

[At the time of writing this post, the proper version of the driver is not available on PyPI, so we install it from source with the pip install git+https://... command above.]

Ok, the driver has been installed in the Jupyter container.

`OpenBeer` database

We need some data, and -- because it is already Friday afternoon here -- what better data to look at than the OpenBeer dataset?

If you haven't played with it yet, either by following some of the tutorials or by importing it as one of the sample databases in OrientDB Studio, I prepared this one-liner to import it into the DB server running in the myorientdb01 container. (I ran it on macOS, and it might not work the same way on a Windows host.)

BEERDIR=/orientdb/databases/OpenBeer sh -c 'docker exec myorientdb01 sh -c "mkdir -p $BEERDIR && cd $BEERDIR && wget -O- http://orientdb.com/public-databases/OpenBeer.zip | jar xv && ls $BEERDIR | wc -l"'

This command is certainly overengineered 🙂 as the same can be done in a less confusing way. But I did want to play with it a bit, and that's the result. Some notes:

The database folder has to be in /orientdb/databases/,

I wanted to chain a few commands to be executed in the container in one line, and the way I did it is via docker exec myorientdb01 sh -c "CMD1 && CMD2",

I wanted to pass the environment variable $BEERDIR to the commands, and the only way I found is by wrapping the docker command into another sh -c 'CMD0',

The downloadable sample database is a zip file, and -- because there is no unzip available in the container -- we use the available jar x as a trick to extract the archive that wget streams via stdin.
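The notes above boil down to building one sh -c script and handing it to docker exec. As a minimal sketch, and only as an alternative to the nested sh -c wrapper, the same command can be assembled as an argument list in Python, where the directory is interpolated before Docker ever sees it. The function name build_import_command is hypothetical; the path, URL and container name are the ones used in this post. Nothing is executed here, the list is only built and printed.

```python
import shlex


def build_import_command(container, beer_dir,
                         url="http://orientdb.com/public-databases/OpenBeer.zip"):
    """Return the argv list for the `docker exec` call (nothing is run here)."""
    quoted_dir = shlex.quote(beer_dir)
    # The same four steps as in the one-liner, chained with &&.
    script = " && ".join([
        f"mkdir -p {quoted_dir}",
        f"cd {quoted_dir}",
        f"wget -O- {shlex.quote(url)} | jar xv",
        f"ls {quoted_dir} | wc -l",
    ])
    return ["docker", "exec", container, "sh", "-c", script]


argv = build_import_command("myorientdb01", "/orientdb/databases/OpenBeer")
print(shlex.join(argv))
```

On a host with Docker available, the list could be passed to subprocess.run(argv, check=True); that step is left out so the sketch stays runnable anywhere.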

Once executed successfully...

...you should see this database in the OrientDB Studio.

admin / admin are the default user and password for this sample database.

Once logged in, you should be able to run a query like the one below. It finds from the graph the categories of beers brewed in the United States (at the time the dataset was created ;-)), ordered by the number of beers linked to each category.

select CatName, count(*) as CatCount 

from 

  (select in(HasBrewery).out(HasCategory).cat_name as CatName 

   from Brewery 

   where country='United States' 

   unwind CatName) 

 where CatName is not null 

 group by CatName 

 order by CatCount desc

That was just the preparation for what we really want to practice today: two containers (or rather applications inside of them) talking to each other.

Containers talking to each other...

Basically, we want to get the result of the same query as above, but in Jupyter (running in one container) connecting to OrientDB (running in another container).

...using IP

Docker has a few networks preconfigured as can be seen using docker network ls.

By default, each container is attached to Docker's bridge network and gets an IP address in that network. We can find these IP addresses by scrolling through the output of docker network inspect bridge. Or we can use the --format option with a Go template to output only what we need.

I dunno about you, but it is the first time for me using Go templates. It took me a while to get this syntax working in the command below. I'll be grateful if you can share some improvements in comments!

docker network inspect bridge --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}; {{end}}'

So, in my environment, the IP addresses are:

172.17.0.2 for myorientdb01,

172.17.0.3 for myjupyter01.
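If Go templates feel awkward, the default JSON output of docker network inspect can also be parsed in Python. The sketch below assumes the usual shape of that output: a list of networks, each with a Containers map whose entries carry Name and IPv4Address. The sample string is a trimmed, illustrative stand-in with made-up container IDs; the names and addresses are the ones from my environment. The helper name container_ips is hypothetical.

```python
import json

# Trimmed, illustrative sample of `docker network inspect bridge` output.
# The container IDs are made up; names and addresses come from this post.
SAMPLE = """
[{"Name": "bridge",
  "Containers": {
    "aaa111": {"Name": "myorientdb01", "IPv4Address": "172.17.0.2/16"},
    "bbb222": {"Name": "myjupyter01", "IPv4Address": "172.17.0.3/16"}
  }}]
"""


def container_ips(inspect_json):
    """Map container name -> IPv4 address, dropping the /prefix length."""
    result = {}
    for network in json.loads(inspect_json):
        for info in (network.get("Containers") or {}).values():
            result[info["Name"]] = info["IPv4Address"].split("/")[0]
    return result


ips = container_ips(SAMPLE)
for name, ip in sorted(ips.items()):
    print(name, ip)
```

Note that Docker reports the address together with the prefix length (for example 172.17.0.2/16), so the sketch strips everything after the slash before the address is used in a client.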

Let's open a new notebook OrientDB_OpenBeersDB in Jupyter and try!

We import pyorient and then use it to open the OpenBeer database with the driver's client. The client connects to the server on 172.17.0.2, port 2424. OrientDB uses this port for clients that support the Network Binary Protocol, like pyorient.

Please note that we use the port from inside the container, not the one to which it might be mapped on the host running Docker!

Next, we just run a query and output its result.

import pyorient

# Connect to the OrientDB server over the binary protocol and open the database

client = pyorient.OrientDB("172.17.0.2", 2424)

odbclusters=client.db_open("OpenBeer", "admin", "admin")

# Run the same query as in OrientDB Studio

categories = client.query(

    "select CatName, count(*) as CatCount from (select in(HasBrewery).out(HasCategory).cat_name as CatName from Brewery where country='United States' unwind CatName) where CatName is not null group by CatName order by CatCount desc"

)

# Each returned record exposes its fields as a dict in oRecordData

for category in categories:

    print(category.oRecordData)
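When the connection fails, it helps to know whether the port is reachable at all before suspecting the driver or the credentials. Here is a minimal sketch of such a check using only the standard library; the helper name port_is_open is hypothetical. To stay runnable outside of Docker, the demo probes a throwaway listener on localhost instead of the OrientDB server.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-contained demo: listen on a free local port, then probe it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
demo_port = server.getsockname()[1]
open_result = port_is_open("127.0.0.1", demo_port)
server.close()
print(open_result)
```

In the notebook, the equivalent call would be port_is_open("172.17.0.2", 2424), using the container-side port, not a port mapped on the host.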

...using a hostname

It is not good practice to communicate using IP addresses, because they might change. To solve this and switch to hostnames, we should use Docker's user-defined networks.

Let's create our own mynet01 network...

docker network create mynet01

docker network ls

...and connect our two containers to this network.

docker network connect mynet01 myorientdb01

docker network connect mynet01 myjupyter01

docker network inspect mynet01 --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}; {{end}}'

Note that in mynet01 our containers have IP addresses in the range 172.18.0.0/16, while in the default bridge network their IP addresses are in the range 172.17.0.0/16.
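A container connected to both networks has one address in each, so it can be handy to tell which network a given address belongs to. The sketch below uses Python's ipaddress module with the two ranges from my environment; Docker may assign different subnets on your machine. The address 172.18.0.2 is an assumed example, not a value taken from my output, and the helper name network_of is hypothetical.

```python
import ipaddress

# Subnets as observed in this post's environment; yours may differ.
DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")
MYNET01 = ipaddress.ip_network("172.18.0.0/16")


def network_of(address):
    """Name the network an address belongs to; accepts 'a.b.c.d' or 'a.b.c.d/16'."""
    ip = ipaddress.ip_address(address.split("/")[0])
    if ip in DEFAULT_BRIDGE:
        return "bridge"
    if ip in MYNET01:
        return "mynet01"
    return "unknown"


for candidate in ("172.17.0.2", "172.18.0.2/16"):  # second one is hypothetical
    print(candidate, network_of(candidate))
```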

What is cool is that in a user-defined network, container names also work as hostnames! Using example 3 from Understanding containers (part 03): one-shot containers, we can use ping to check how it works.

docker exec myorientdb01 ping myjupyter01

docker run -t --rm --net container:myorientdb01 nicolaka/netshoot ping myjupyter01

Let's try it from the Jupyter notebook now, replacing the IP address with the hostname.

client = pyorient.OrientDB("myorientdb01", 2424)

Everything works fine.
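The same name resolution that ping relies on can be checked from the notebook itself. This is a small sketch with the standard library; the helper name resolve is hypothetical. Run inside myjupyter01, resolve("myorientdb01") would be expected to return that container's address in mynet01, but the demo below only resolves a literal loopback address so that it runs anywhere.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses a hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # the name could not be resolved
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(resolve("127.0.0.1"))
```

An empty list for a container name is a hint that the two containers do not share a user-defined network.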

...using an alias

But using the container name as a hostname might still not be the best solution, for example if another container with a different name ends up running our OrientDB server and serving the data.

In that case, it is better to connect a container to a user-defined network using an alias. So, let's disconnect our container myorientdb01 from the mynet01 network and reconnect it using --alias...

docker network disconnect mynet01 myorientdb01

docker network connect --alias myorientdb mynet01 myorientdb01

...and update the code in the Jupyter notebook.

client = pyorient.OrientDB("myorientdb", 2424)
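To go one step further, the notebook does not need to hardcode even the alias. A minimal sketch, assuming two hypothetical environment variables ORIENTDB_HOST and ORIENTDB_PORT (they are not defined by OrientDB, pyorient or Docker), with the alias and port from this post as defaults:

```python
import os


def orientdb_endpoint(environ=os.environ):
    """Read host and port from the environment, falling back to this post's values."""
    host = environ.get("ORIENTDB_HOST", "myorientdb")   # hypothetical variable name
    port = int(environ.get("ORIENTDB_PORT", "2424"))    # hypothetical variable name
    return host, port


host, port = orientdb_endpoint({})  # empty mapping -> defaults
print(host, port)
```

The client line then becomes pyorient.OrientDB(*orientdb_endpoint()), and a different server can be selected without editing the notebook.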

We used quite a few new concepts today: not just Docker's networks, but also executing multiple commands in a container using sh -c "...&&..." and using Go templates to format command output.

Just like you, I am thirsty for more. But it is Friday evening, so let me quench my other thirst with ... another beer. Repeating after Ray Bradbury "Beer's intellectual..."

Cheers!
-Vitaliy (aka @Sygyzmundovych)