Technical Articles
Multi-model in hana_ml 2.6 for Python (part 04): Geospatial visualization
In the previous post, we looked at the exploration of spatial data using HANA dataframes. But even more than regular data types geospatial results require visualization to be properly understood by a human.
Let’s go…
…and I am going to create a new notebook 03 Spatial Viz.ipynb
in JupyterLab.
Then import usual modules and connect to SAP HANA instance (SAP HANA Cloud trial instance in my case).
import pandas as pd
from hana_ml import dataframe as dfh
hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"
hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")
cchc=dfh.ConnectionContext(port=hana_cloud_port,
address=hana_cloud_host,
user='HANAML',
password='Super$ecr3t!',
encrypt=True
)
cchc.connection.isconnected()
Nothing new so far, but here comes…
GeoPandas
GeoPandas extends the datatypes used by Pandas to include geometric types and allow spatial operations on them. And as such GeoPandas provides a high-level interface to the matplotlib
library for plotting spatial data.
Let’s install and load it. Please note I included descartes
supporting plotting in GeoPandas.
!pip install geopandas descartes
import geopandas
Let’s create a HANA DataFrame dfh_countries
from the existing SAP HANA table TM_WORLD_BOARDERS
that we loaded in the previous post. You can see the table()
method in hana_ml
has been extended to support geo_cols
property too.
dfh_countries = cchc.table("TM_WORLD_BORDERS", geo_cols={"SHAPE":"4326"})
Previously we would create Pandas dataframe from it, but now let’s create GeoPandas dataframe. I will use dfg_
prefixes for that.
dfg_countries = geopandas.GeoDataFrame(dfh_countries.collect(),
geometry='SHAPE', crs="EPSG:4326"
)
Please note, that here we need to tell GeoPandas what is the geometry column and what Coordinate Reference System, or crs
, is used. It matches SAP HANA’s SRID that is based on a dataset maintained by EPSG as well.
You can use the following statements to check what GeoPandas types are used.
dfg_countries.dtypes
print(type(dfg_countries))
print(type(dfg_countries.LON))
print(type(dfg_countries.SHAPE))
Visualizing countries dataframe
The beauty of using GeoPandas for plotting is that takes only a single method plot()
to do it!
dfg_countries.plot()
Cool! But… a little bit too tiny, isn’t it? That’s because the default size is set to 6×4 in inches, as we can find by running:
import matplotlib.pyplot as plt
print(plt.rcParams.get('figure.figsize'))
You can specify the size when calling the plot (and please note the use of boundary
here, as we will need it later as well)…
dfg_countries.boundary.plot(figsize=(26, 12))
…or set the size for all plots.
plt.rcParams["figure.figsize"] = [26, 12]
GeoPandas plots are very convenient as well to quickly create Choropleth maps. Just specify the column and its values will be used to set the color. Below are examples for using the categorical variable “UN Subregion” and the numeric variable “Population in 2005”.
dfg_countries.plot(column="SUBREGION", cmap='flag')
dfg_countries.plot(column="POP2005", cmap='rainbow', legend=True)
Please note:
- the use of two different colormaps
cmap
to get the best use of colors to convey information. You can play with different colormaps available in Matplotlib. - the use of the
legend
in the second example.
Visualizing airports dataframe
Let’s move to data from the airports’ table PORTS
in SAP HANA.
dfh_ports = cchc.table("PORTS", geo_cols={"POINT_LON_LAT_GEO":"4326"}).select('CODE','POINT_LON_LAT_GEO')
geopandas.GeoDataFrame(
dfh_ports.collect(),
geometry='POINT_LON_LAT_GEO', crs="EPSG:4326"
).plot()
So, these are all the airports accordingly to the dataset we originally loaded. They fit into land masses, so our imagination can add the world map just by looking at this picture.
But what if I select airports from one country (which is Poland in my case)?
dfh_ports_pl = (dfh_ports
.alias('P')
.join(dfh_countries.filter("NAME='Poland'").alias('C'),
condition='P.POINT_LON_LAT_GEO.ST_CoveredBy(C.SHAPE)=1',
select=["P.*"]
)
)
dfg_ports_pl = geopandas.GeoDataFrame(dfh_ports_pl.collect(),
geometry='POINT_LON_LAT_GEO', crs="EPSG:4326"
)
dfg_ports_pl.plot()
Well, not that clear even for me now ?So, some additional details would be helpful. Maybe airport names? For that we need to create a new variable fig_port_pl
representing the plot.
fig_port_pl=dfg_ports_pl.plot()
dfg_ports_pl.apply(
lambda port: fig_port_pl.annotate(
port.CODE,
xy=port.POINT_LON_LAT_GEO.coords[0],
xytext=(-10, 10), textcoords="offset points"
), axis=1
)
Now better, and I can even clearly spot my nearest airport by its code.
But still, some more context, like a country’s shape from TM_WORLD_BORDERS
, would help. Don’t you think so?
dfg_poland = geopandas.GeoDataFrame(dfh_countries.filter("ISO2='PL'").collect(),
geometry='SHAPE', crs="EPSG:4326"
)
fig_port_pl=dfg_ports_pl.plot(color='w', edgecolor='r', markersize=300)
dfg_ports_pl.apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)
dfg_poland.plot(ax=fig_port_pl, alpha=0.6, edgecolor='k', zorder=0)
Please note:
- the use of
color
,edgecolor
andmarkersize
to make airport locations more visually prominent, - the use of the
alpha
parameter to set the transparency of the country shape (we will need it later), and - most importantly: the use of
zorder
to add a country shape as a layer below airports.
Map background
So far, we’ve been plotting spatial data using only data we have in the SAP HANA database. And even though we can have the complete OpenStreetMap in SAP HANA db instance (as POC’ed by our engineering team), for visualization it might be convenient to use map services already available online. For that purpose, we will use contextily
module.
!pip install contextily
import contextily as ctx
And now we can add a map background (you have a choice of multiple map providers using the source
parameter in the ctx.add_basemap()
method. I am using standard OpenStreetMap here:
fig_port_pl=dfg_ports_pl.to_crs(epsg=3857).plot(alpha=1, color='w', edgecolor='r', markersize=300, zorder=2)
dfg_ports_pl.to_crs(epsg=3857).apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)
dfg_poland.to_crs(epsg=3857).plot(ax=fig_port_pl, alpha=0.4, edgecolor='k', zorder=1)
fig_port_pl.set_axis_off()
ctx.add_basemap(fig_port_pl, source=ctx.providers.OpenStreetMap.Mapnik)
I am happy with the result, but before we move on, I’d like to make sure you don’t miss the to_csr()
method we had to apply to convert our datasets to Pseudo-Mercator coordinate spatial reference 3857
. This is the spatial reference that was first introduced in Google Maps, and then adopted by most of the map services, incl. OpenStreetMap.
Do you know that Google tried to get EPSG to register this CSR using the number
900913
? Because, you know,
SAP HANA’s spatial clustering
SAP HANA gives you several ways to run in-database clustering of spatial data. One of the recent additions to these methods is the Voronoi cells. As the documentation says: “Such a Voronoi cell consists of all locations of the plane that are closer to its generating point than to any other input point.”
Let’s do this for our airports. This time we know that we need to transform spatial coordinates to SRID 3857 before we can display them properly with the map’s background. And we do this already in the SAP HANA Cloud, which provides this SRS by default!
country_iso2='PL'
sql_3857='''
SELECT
p."POINT_LON_LAT_GEO".ST_Transform(3857) AS "PORTGEO",
ST_VoronoiCell(p."POINT_LON_LAT_GEO".ST_Srid(0), 30).ST_Srid(4326).ST_Intersection(wb."SHAPE").ST_Transform(3857) OVER () AS "CELL"
FROM "PORTS" p
JOIN "TM_WORLD_BORDERS" wb
ON p."POINT_LON_LAT_GEO".ST_COveredBy(wb."SHAPE")=1
WHERE wb."ISO2" ='{}'
'''.format(country_iso2)
dfp_ports_pl_voronoi = cchc.sql(
sql_3857, geo_cols=['CELL','PORTGEO'],
srid=3857
).collect()
dfg_ports_pl_voronoi = geopandas.GeoDataFrame(
dfp_ports_pl_voronoi,
geometry='CELL', crs="EPSG:3857"
)
And now let’s add these Voronoi cells to the plot, remembering to order layers properly using the zorder
parameter.
fig_port_pl=dfg_ports_pl.to_crs(epsg=3857).plot(alpha=1, color='w', edgecolor='r', markersize=300, zorder=3)
dfg_ports_pl.to_crs(epsg=3857).apply(lambda port: fig_port_pl.annotate(port.CODE, xy=port.POINT_LON_LAT_GEO.coords[0], xytext=(-10, 15), textcoords="offset points"), axis=1)
dfg_poland.to_crs(epsg=3857).plot(ax=fig_port_pl, alpha=0.4, edgecolor='k', zorder=1)
fig_port_pl.set_axis_off()
ctx.add_basemap(fig_port_pl, source=ctx.providers.OpenStreetMap.Mapnik)
dfg_ports_pl_voronoi.boundary.plot(ax=fig_port_pl, edgecolor='r',
zorder=2
)
Good job, isn’t it? We used three HANA DataFrames and four layers in this GeoPandas plot!
And the complete notebook can be found at: https://github.com/SAP-samples/hana-ml-samples/blob/main/Python-API/usecase-examples/multimodel-analysis-airroutes/03%20Spatial%20Viz.ipynb.
But…
…there are more ways to visualize geospatial data in Python
like Folium, Geoplot, or Cartopy. Have you used any of them and would like to share your experience with others? Please add in the comments.
In the next episode, we will move to what’s new in hana_ml
2.6 processing connected data using SAP HANA’s graph workspaces!
Stay healthy ❤️
-Vitaliy (aka @Sygyzmundovych)