Skip to Content
Technical Articles

Multi-model in hana_ml 2.6 for Python: Geospatial visualization

In the previous post, we looked at the exploration of spatial data using HANA dataframes. But even more than regular data types geospatial results require visualization to be properly understood by a human.

Let’s go…

…and I am going to create a new notebook 03 Spatial Viz.ipynb in JupyterLab.

Then import usual modules and connect to SAP HANA instance (SAP HANA Cloud trial instance in my case).

import pandas as pd
from hana_ml import dataframe as dfh
hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"
hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")

cchc=dfh.ConnectionContext(port=hana_cloud_port,
                                         address=hana_cloud_host,
                                         user='HANAML',
                                         password='Super$ecr3t!',
                                         encrypt=True
                                        )
cchc.connection.isconnected()

Nothing new so far, but here comes…

GeoPandas

GeoPandas extends the datatypes used by Pandas to include geometric types and allow spatial operations on them. And as such GeoPandas provides a high-level interface to the matplotlib library for plotting spatial data.

Let’s install and load it. Please note I included descartes supporting plotting in GeoPandas.

!pip install geopandas descartes
import geopandas

Let’s create a HANA DataFrame dfh_countries from the existing SAP HANA table TM_WORLD_BOARDERSthat we loaded in the previous post. You can see the table() method in hana_ml has been extended to support geo_cols property too.

dfh_countries = cchc.table("TM_WORLD_BORDERS", geo_cols={"SHAPE":"4326"})

Previously we would create Pandas dataframe from it, but now let’s create GeoPandas dataframe. I will use dfg_ prefixes for that.

dfg_countries = geopandas.GeoDataFrame(dfh_countries.collect(), 
                                       geometry='SHAPE', crs="EPSG:4326"
                                      )

Please note, that here we need to tell GeoPandas what is the geometry column and what Coordinate Reference System, or crs, is used. It matches SAP HANA’s SRID that is based on a dataset maintained by EPSG as well.

You can use the following statements to check what GeoPandas types are used.

dfg_countries.dtypes
print(type(dfg_countries))
print(type(dfg_countries.LON))
print(type(dfg_countries.SHAPE))

Visualizing countries dataframe

The beauty of using GeoPandas for plotting is that takes only a single method plot() to do it!

dfg_countries.plot()

Cool! But… a little bit too tiny, isn’t it? That’s because the default size is set to 6×4 in inches, as we can find by running:

import matplotlib.pyplot as plt

print(plt.rcParams.get('figure.figsize'))

You can specify the size when calling the plot (and please note the use of boundary here, as we will need it later as well)…

dfg_countries.boundary.plot(figsize=(26, 12))

…or set the size for all plots.

plt.rcParams["figure.figsize"] = [26, 12]

GeoPandas plots are very convenient as well to quickly create Choropleth maps. Just specify the column and its values will be used to set the color. Below are examples for using the categorical variable “UN Subregion” and the numeric variable “Population in 2005”.

dfg_countries.plot(column="SUBREGION", cmap='flag')

dfg_countries.plot(column="POP2005", cmap='rainbow', legend=True)

Please note:

  • the use of two different colormaps cmap to get the best use of colors to convey information. You can play with different colormaps available in Matplotlib.
  • the use of the legend in the second example.

Visualizing airports dataframe

Let’s move to data from the airports’ table PORTS in SAP HANA.

dfh_ports = cchc.table("PORTS", geo_cols={"POINT_LON_LAT_GEO":"4326"}).select('CODE','POINT_LON_LAT_GEO')
geopandas.GeoDataFrame(
    dfh_ports.collect(), 
    geometry='POINT_LON_LAT_GEO', crs="EPSG:4326"
).plot()

So, these are all the airports accordingly to the dataset we originally loaded. They fit into land masses, so our imagination can add the world map just by looking at this picture.

But what if I select airports from one country (which is Poland in my case)?

dfh_ports_pl = (dfh_ports
                .alias('P')
                .join(dfh_countries.filter("NAME='Poland'").alias('C'),
                      condition='P.POINT_LON_LAT_GEO.ST_CoveredBy(C.SHAPE)=1',
                      select=["P.*"]
                     )
               )
dfg_ports_pl = geopandas.GeoDataFrame(dfh_ports_pl.collect(), 
                                      geometry='POINT_LON_LAT_GEO', crs="EPSG:4326"
                                     )
dfg_ports_pl.plot()

Well, not that clear even for me now 🙂So, some additional details would be helpful. Maybe airport names? For that we need to create a new variable fig_port_pl representing the plot.

fig_port_pl=dfg_ports_pl.plot()

dfg_ports_pl.apply(
    lambda port: fig_port_pl.annotate(
        port.CODE, 
        xy=port.POINT_LON_LAT_GEO.coords[0], 
        xytext=(-10, 10), textcoords="offset points"
    ), axis=1
)

Now better, and I can even clearly spot my nearest airport by its code.

But still, some more context, like a country’s shape from TM_WORLD_BORDERS, would help. Don’t you think so?

dfg_poland = geopandas.GeoDataFrame(dfh_countries.filter("ISO2='PL'").collect(),
                                    geometry='SHAPE', crs="EPSG:4326"
                                   )
fig_port_pl=dfg_ports_pl.plot(color='w', edgecolor='r', markersize=300)

dfg_ports_pl.apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)

dfg_poland.plot(ax=fig_port_pl, alpha=0.6, edgecolor='k', zorder=0)

Please note:

  • the use of color, edgecolor and markersize to make airport locations more visually prominent,
  • the use of the alpha parameter to set the transparency of the country shape (we will need it later), and
  • most importantly: the use of zorder to add a country shape as a layer below airports.

Map background

So far, we’ve been plotting spatial data using only data we have in the SAP HANA database. And even though we can have the complete OpenStreetMap in SAP HANA db instance (as POC’ed by our engineering team), for visualization it might be convenient to use map services already available online. For that purpose, we will use contextily module.

!pip install contextily
import contextily as ctx

And now we can add a map background (you have a choice of multiple map providers using the source parameter in the ctx.add_basemap() method. I am using standard OpenStreetMap here:

fig_port_pl=dfg_ports_pl.to_crs(epsg=3857).plot(alpha=1, color='w', edgecolor='r', markersize=300, zorder=2)

dfg_ports_pl.to_crs(epsg=3857).apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)

dfg_poland.to_crs(epsg=3857).plot(ax=fig_port_pl, alpha=0.4, edgecolor='k', zorder=1)
fig_port_pl.set_axis_off()

ctx.add_basemap(fig_port_pl, source=ctx.providers.OpenStreetMap.Mapnik)

I am happy with the result, but before we move on, I’d like to make sure you don’t miss the to_csr() method we had to apply to convert our datasets to Pseudo-Mercator coordinate spatial reference 3857. This is the spatial reference that was first introduced in Google Maps, and then adopted by most of the map services, incl. OpenStreetMap.

Do you know that Google tried to get EPSG to register this CSR using the number 900913? Because, you know, GOOGLE?

SAP HANA’s spatial clustering

SAP HANA gives you several ways to run in-database clustering of spatial data. One of the recent additions to these methods is the Voronoi cells. As the documentation says: “Such a Voronoi cell consists of all locations of the plane that are closer to its generating point than to any other input point.

Let’s do this for our airports. This time we know that we need to transform spatial coordinates to SRID 3857 before we can display them properly with the map’s background. And we do this already in the SAP HANA Cloud, which provides this SRS by default!

country_iso2='PL'

sql_3857='''
SELECT 
	p."POINT_LON_LAT_GEO".ST_Transform(3857) AS "PORTGEO",
	ST_VoronoiCell(p."POINT_LON_LAT_GEO".ST_Srid(0), 30).ST_Srid(4326).ST_Intersection(wb."SHAPE").ST_Transform(3857) OVER () AS "CELL"
FROM "PORTS" p
JOIN "TM_WORLD_BORDERS" wb
ON p."POINT_LON_LAT_GEO".ST_COveredBy(wb."SHAPE")=1
WHERE wb."ISO2" ='{}'
'''.format(country_iso2)
dfp_ports_pl_voronoi = cchc.sql(
    sql_3857, geo_cols=['CELL','PORTGEO'], 
    srid=3857
).collect()

dfg_ports_pl_voronoi = geopandas.GeoDataFrame(
    dfp_ports_pl_voronoi, 
    geometry='CELL', crs="EPSG:3857"
)

And now let’s add these Voronoi cells to the plot, remembering to order layers properly using the zorder parameter.

fig_port_pl=dfg_ports_pl.to_crs(epsg=3857).plot(alpha=1, color='w', edgecolor='r', markersize=300, zorder=3)

dfg_ports_pl.to_crs(epsg=3857).apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)

dfg_poland.to_crs(epsg=3857).plot(ax=fig_port_pl, alpha=0.4, edgecolor='k', zorder=1)
fig_port_pl.set_axis_off()

ctx.add_basemap(fig_port_pl, source=ctx.providers.OpenStreetMap.Mapnik)

dfg_ports_pl_voronoi.boundary.plot(ax=fig_port_pl, edgecolor='r', 
                                   zorder=2
                                  )

Good job, isn’t it? We used three HANA DataFrames and four layers in this GeoPandas plot!

And the complete notebook can be found at: https://github.com/SAP-samples/hana-ml-samples/blob/main/Python-API/usecase-examples/multimodel-analysis-airroutes/03%20Spatial%20Viz.ipynb.

But…

…there are more ways to visualize geospatial data in Python

like Folium, Geoplot, or Cartopy. Have you used any of them and would like to share your experience with others? Please add in the comments.


In the next episode, we will move to what’s new in hana_ml 2.6 processing connected data using SAP HANA’s graph workspaces!

Stay healthy ❤️
-Vitaliy (aka @Sygyzmundovych)

Be the first to leave a comment
You must be Logged on to comment or reply to a post.