Witalij Rudnicki

Track ISS with SAP Data Intelligence and SAP HANA: Ingest and transform data

The purpose of this post is purely educational.

Using SAP Data Intelligence (fka SAP Data Hub) and SAP HANA for what we will achieve is overkill. But learning how to build data pipelines while having fun and learning something about space (which we geeks all love) should be reason enough.

This selfie in NASA astronaut’s suit is as far into space as I have been…

To the point. This example is used as a demo in session INT105 – Build Data Pipelines with SAP Data Intelligence at SAP TechEd 2020. You are more than welcome to join that session and watch the demo, but it is not a prerequisite for following this post.

The scenario

We want to collect data about the locations on the Earth’s globe that are directly beneath the International Space Station (ISS). This would allow us to plot these locations on a map.

We will use an API endpoint https://isstracker.spaceflight.esa.int/tledata.txt provided by European Space Agency as the source of the ISS location. It returns a TLE (Two-line Element) record “encoding a list of orbital elements of an Earth-orbiting object for a given point in time, [called] the epoch.” (source: Wikipedia)

And further, according to Wikipedia: “The format was originally intended for punch cards, encoding a set of elements on two standard 80-column cards.”
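
To get a feel for that fixed-column layout, here is a minimal sketch that pulls the epoch and two orbital elements out of a record using only the standard library. The sample record and the helper name parse_tle are my own illustration (not part of the pipeline), and the column positions follow the usual TLE layout as I understand it.

```python
from datetime import datetime, timedelta, timezone

# Sample ISS record; the values are for illustration only.
SAMPLE_TLE = (
    "ISS (ZARYA)\n"
    "1 25544U 98067A   08264.51782528 -.00002182  00000-0 -11606-4 0  2927\n"
    "2 25544  51.6416 247.4627 0006703 130.5360 325.0288 15.72125391563537\n"
)

def parse_tle(text):
    """Pull a few fields out of a three-line record (hypothetical helper)."""
    name, line1, line2 = text.strip().splitlines()
    # Epoch: two-digit year in columns 19-20, fractional day of year in 21-32.
    year = int(line1[18:20])
    year += 2000 if year < 57 else 1900
    day_of_year = float(line1[20:32])
    # Day 1.0 is midnight UTC on January 1st, hence the "- 1".
    epoch = datetime(year, 1, 1, tzinfo=timezone.utc) + timedelta(days=day_of_year - 1)
    return {
        "name": name.strip(),
        "epoch": epoch,
        "inclination_deg": float(line2[8:16]),   # columns 9-16 of line 2
        "revs_per_day": float(line2[52:63]),     # columns 53-63 of line 2
    }

record = parse_tle(SAMPLE_TLE)
print(record["name"], record["epoch"].isoformat(), record["inclination_deg"])
```

In the pipeline itself we will not parse these columns by hand; the Skyfield package takes care of that.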

We will need to transform this TLE data into an Earth-based location for the current time, and for that we will use Python’s Skyfield package: https://pypi.org/project/skyfield/.

SAP HANA db will be our destination to store the data for further analysis and visualization.

The setup

I’d had an instance of SAP HANA, express edition, (ver. 2.0.45 at the time) already deployed at one of the cloud providers. To have both systems co-located I deployed an instance of SAP Data Intelligence trial (ver. 3.0.0 at the time) to the same cloud provider in the same region.

Please note that the deployment options and the choice of an IaaS vendor are not relevant in this scenario, as long as the instance of SAP Data Intelligence can communicate with the instance of SAP HANA and you can access data in the SAP HANA db from an external client.

For the sake of simplicity I will use SYSTEM users in both systems. Needless to say, it is a very bad practice. Never do this in Production!

Prerequisites

I would recommend you go through the following tutorials, if not yet familiar with SAP Data Intelligence at all:

For the steps that follow I assume you have some familiarity with the basics of SAP Data Intelligence now. I will not detail every small task or an icon to click below.

Initial pipeline in SAP Data Intelligence Modeler

Create a new graph in the Modeler application.

Switch to the JSON view of the graph and paste the following code.

{
    "description":"ISS locations",
    "processes":
    {
        "python3operator1":{"component":"com.sap.system.python3Operator","metadata":{"label":"Python3 Operator","extensible":true,"config":{"script":"def on_tle(tle_data):\n    api.send(\"location\", tle_data)\n\napi.set_port_callback(\"tle\", on_tle)"},"additionalinports":[{"name":"tle","type":"string"}],"additionaloutports":[{"name":"location","type":"message"},{"name":"debug","type":"string"}]}},
        "httpclient1":{"component":"com.sap.http.client2","metadata":{"label":"HTTP Client","config":{"pollingEnabled":true,"postConnection":{},"getConnection":{"connectionProperties":{"host":"isstracker.spaceflight.esa.int","port":443,"protocol":"HTTPS","authenticationType":"NoAuth"},"path":"tledata.txt","configurationType":"Manual"},"getPeriodInMs":10000}}},
        "wiretap1":{"component":"com.sap.util.wiretap","metadata":{"label":"Wiretap","ui":"dynpath","config":{}}},
        "tostringconverter1":{"component":"com.sap.util.toStringConverter","metadata":{"label":"ToString Converter","config":{}}},
        "wiretap3":{"component":"com.sap.util.wiretap","metadata":{"label":"Wiretap","ui":"dynpath","config":{}}}
    },
    "connections":[{"src":{"port":"out","process":"httpclient1"},"tgt":{"port":"in","process":"wiretap1"}},{"src":{"port":"out","process":"wiretap1"},"tgt":{"port":"ininterface","process":"tostringconverter1"}},{"src":{"port":"outstring","process":"tostringconverter1"},"tgt":{"port":"tle","process":"python3operator1"}},{"src":{"port":"location","process":"python3operator1"},"tgt":{"port":"in","process":"wiretap3"}}],"inports":{},"outports":{},
    "groups":[],
    "properties":{}
}

This will import the initial graph.

Now switch back to the Diagram view and click the Auto-Layout icon.

In the imported pipeline check the configuration of the HTTP Client operator:

  • It is already set to poll every 10 seconds…
  • …from https://isstracker.spaceflight.esa.int/tledata.txt using the GET method.
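
If you want to see outside of SAP Data Intelligence what the HTTP Client operator does on each poll, a rough equivalent can be sketched with Python's standard library. The names fetch_tle and fake_opener are hypothetical, and the fake opener returns placeholder text so that the sketch runs without network access; whether the endpoint is reachable from your machine is an assumption you would need to check.

```python
import io
import urllib.request

TLE_URL = "https://isstracker.spaceflight.esa.int/tledata.txt"

def fetch_tle(url=TLE_URL, opener=urllib.request.urlopen):
    """GET the TLE text once; `opener` is injectable so this can be tried offline."""
    with opener(url) as response:
        return response.read().decode("utf-8")

def fake_opener(url):
    # Stand-in for the network call, returning placeholder text, not real data.
    return io.BytesIO(b"NAME\nline one\nline two\n")

lines = fetch_tle(opener=fake_opener).strip().splitlines()
print(len(lines))
```

Calling fetch_tle() without the opener argument would hit the real endpoint; wrapping that call in a loop with time.sleep(10) would mimic the 10-second polling configured in the operator.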

Ingest data from the API endpoint

Save the graph with the name community.sample.iss and a category Community Samples.

Once the graph is saved (and only then) you can execute it.

It might take a bit longer the first time while SAP Data Intelligence builds containers to run the graph. Once you see the status change to Running, right-click the first Wiretap to open its UI.

You should see new TLE records polled from the source API endpoint every 10 seconds.

Transform data with Python operator

If you look into the UI of the second Wiretap, then you should see exactly the same records there. That’s because in this initial graph the Python3 operator simply sends to the output port location exactly the same information it receives on the input port tle.

def on_tle(tle_data):
    api.send("location", tle_data)

api.set_port_callback("tle", on_tle)
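
The api object is injected by the Python3 operator at run-time, so the script cannot be run as-is on a laptop. One way to try the callback logic locally is a small stand-in object. The StubApi class below is purely hypothetical: it mimics only the two calls used in this script and is not the real operator API.

```python
class StubApi:
    """Hypothetical stand-in for the `api` object injected by the Python3 operator."""

    def __init__(self):
        self.callbacks = {}
        self.sent = []

    def set_port_callback(self, port, callback):
        # Remember which function handles data arriving on which input port.
        self.callbacks[port] = callback

    def send(self, port, data):
        # Record what the script sends instead of passing it to a next operator.
        self.sent.append((port, data))

    def feed(self, port, data):
        # Test-only helper: deliver data to the callback registered for a port.
        self.callbacks[port](data)

api = StubApi()

# --- operator script under test (the pass-through version) ---
def on_tle(tle_data):
    api.send("location", tle_data)

api.set_port_callback("tle", on_tle)
# --- end of operator script ---

api.feed("tle", "ISS (ZARYA)\n1 ...\n2 ...")
print(api.sent)
```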

Let’s modify the code to transform the TLE record into a record that contains a timestamp plus a latitude and a longitude of the ISS location projected on the Earth surface.

Stop the running graph, and open the script of the Python3 Operator. Replace the existing code with the following.

# Import dependencies
from skyfield.api import load, EarthSatellite
import datetime

# Decoding input and formatting output
def on_tle(tle_data):

    # The record has three lines: the satellite name, then the two TLE lines
    l1, l2, l3 = tle_data.strip().splitlines()
    ts = load.timescale()
    satellite = EarthSatellite(l2, l3, l1, ts)

    # Current time in UTC, as a Python datetime and as a Skyfield time
    pytime_now = datetime.datetime.now(datetime.timezone.utc)
    ts_now = ts.from_datetime(pytime_now)

    # Point on the Earth's surface directly beneath the satellite
    geocentric = satellite.at(ts_now)
    subpoint = geocentric.subpoint()

    location = [
        {
            "TSTMP" : pytime_now.strftime('%Y-%m-%d %H:%M:%S.%f'),
            "LAT"   : subpoint.latitude.degrees,
            "LON"   : subpoint.longitude.degrees,
            "ALT"   : int(subpoint.elevation.m)
        }
    ]

    # api.send("debug", str(satellite))
    api.send(
        "location",
        api.Message(location, {"Satelite": str(l1)})
    )

# Callback(s)
api.set_port_callback("tle", on_tle)

A few additional comments about the code:

  1. Although TLE literally means “two lines” we need to split it into three lines, as the first one contains the human-readable name of the Earth satellite.
  2. The TLE record is not updated continuously. It contains the orbital elements at a specific time, called the epoch, and the actual position can be calculated with good precision for +/- 24 hours around it. Here we calculate for the current timestamp only (again, this is an exercise in building graphs).
  3. The location is an array, not just a dictionary here, because this is what we will need later for an SAP HANA operator.
  4. TSTMP is formatted to be compatible with SAP HANA timestamp format.
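
To make points 3 and 4 concrete, here is a small sketch of just the record-building part, with no Skyfield involved. The helper name build_location and the input values are made up for illustration; only the keys and the timestamp format string come from the script.

```python
import datetime
import json

def build_location(moment, lat_deg, lon_deg, alt_m):
    """Shape one position as a single-element list of dicts (hypothetical helper)."""
    return [{
        "TSTMP": moment.strftime('%Y-%m-%d %H:%M:%S.%f'),
        "LAT": lat_deg,
        "LON": lon_deg,
        "ALT": int(alt_m),  # whole meters, fraction truncated
    }]

# Made-up sample values, not a real ISS position.
moment = datetime.datetime(2020, 12, 1, 10, 30, 15, 123456,
                           tzinfo=datetime.timezone.utc)
location = build_location(moment, 12.5, -45.25, 419876.9)
print(json.dumps(location))
```

Note that %f always produces six digits of microseconds, so every TSTMP value has the same width.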

Save the graph.

But it is too early to run it yet. If you try to run this graph, it will fail with the error:

Graph failure: operator.com.sap.system.python3Operator:python3operator1: Error while executing Python Operator's user provided script: No module named 'skyfield' [line 2]
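
This is an ordinary Python import failure surfacing through the operator. As a side note, a quick way to check which packages can be imported in any given Python environment is sketched below; the helper name missing_packages is hypothetical, and this is just one possible approach.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# "json" ships with Python; "skyfield" is found only if it has been installed.
print(missing_packages(["json", "skyfield"]))
```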

Define a container image

The Python package skyfield is missing in the run-time container environment. We need to build a container image that includes the required package and tell the Python operator to use it at run-time.

Go to Repository and right-click on dockerfiles. Choose Create Docker File.

Call it samples.skyfield

… and include the following code.

FROM $com.sap.sles.base
RUN pip3 install --user skyfield

Please note the use of the --user flag in pip: access to the root user is not possible for security reasons.

In the Configuration panel add the tag skyfield.

Save it and refresh the Repository panel. You should see two files — Dockerfile and Tags.json — created under dockerfiles.samples.skyfield folder.

Build and use a container image

Go back to the graph’s diagram and add the Python operator to a group.

Do not be confused that a group contains only one operator. A group allows defining additional configurations.

In the configuration of the group add the same tag as assigned to the Dockerfile, i.e. skyfield.

Run the graph. It will take a few minutes longer to start because it needs to build a new container image based on the Dockerfile we have defined.

Open the UI of the second Wiretap to see transformed messages, when the graph is running.

In the next episode we will add operators to the graph to persist this transformed data in SAP HANA database.


Stay healthy ❤️ everyone,
-Vitaliy (aka @Sygyzmundovych)
