The “SAP HANA Python Client API for Machine Learning Algorithms” (the Python client, or simply HANA-ML for short) has been around for a while. It exposes SAP HANA’s embedded machine learning capabilities to data scientists. Release 2.6, published on Oct 16, 2020, adds enhancements that leverage more of HANA’s multi-model capabilities: spatial and graph.
A core concept of the Python client is the dataframe. It provides a set of methods to analyze and manipulate data in SAP HANA without bringing the data to the client. Since version 2.6, HANA-ML handles geospatial data seamlessly when creating SAP HANA dataframes from Pandas or even GeoPandas dataframes. Alternatively, a dedicated function now supports loading shapefiles into SAP HANA.
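The idea of manipulating data without moving it can be pictured as a dataframe object that only composes SQL until a result is actually requested. The following is a toy sketch in plain Python, not the HANA-ML implementation: the class `LazyFrame`, the table name `ROADS`, and the column names are hypothetical and exist only for illustration.

```python
class LazyFrame:
    """Toy stand-in for a server-side dataframe: it only holds a SQL string."""

    def __init__(self, select_statement):
        self.select_statement = select_statement

    def filter(self, condition):
        # Wrap the current statement in a subquery; nothing is executed here.
        return LazyFrame(
            f"SELECT * FROM ({self.select_statement}) AS T WHERE {condition}"
        )

    def select(self, *columns):
        # Project columns, again by wrapping rather than fetching rows.
        cols = ", ".join(columns)
        return LazyFrame(f"SELECT {cols} FROM ({self.select_statement}) AS T")


# Hypothetical table and columns, assumed for this sketch only.
roads = LazyFrame('SELECT * FROM "ROADS"')
long_roads = roads.filter('"LENGTH" > 100').select('"ID"', '"LENGTH"')
print(long_roads.select_statement)
```

Each method returns a new object with a larger SQL statement, so the database would do the work only when the final statement is run. No rows reach the client until then.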
Graphs are powerful models for describing many complex real-world systems, such as social, supply, and transportation networks. In SAP HANA, a “Graph Workspace” is backed by two flat data structures: one for the vertices and one for the edges of the network. These data structures may describe people and their relationships, or junctions and road segments.
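To make the "two flat data structures" idea concrete, here is a minimal sketch in plain Python that turns a vertex table and an edge table into an adjacency list. It does not use SAP HANA or HANA-ML; the column names (`ID`, `SOURCE`, `TARGET`, `LENGTH`) and the sample rows are assumptions chosen for illustration.

```python
# Vertex table: one row per junction (hypothetical sample data).
vertices = [{"ID": "A"}, {"ID": "B"}, {"ID": "C"}, {"ID": "D"}]

# Edge table: one row per road segment, referencing vertex keys.
edges = [
    {"ID": 1, "SOURCE": "A", "TARGET": "B", "LENGTH": 4.0},
    {"ID": 2, "SOURCE": "A", "TARGET": "C", "LENGTH": 1.0},
    {"ID": 3, "SOURCE": "C", "TARGET": "B", "LENGTH": 2.0},
    {"ID": 4, "SOURCE": "B", "TARGET": "D", "LENGTH": 5.0},
]


def build_adjacency(vertices, edges):
    """Combine the two flat tables into vertex -> [(neighbor, length), ...]."""
    adjacency = {v["ID"]: [] for v in vertices}
    for e in edges:
        # Every edge must point at vertices that exist in the vertex table.
        if e["SOURCE"] not in adjacency or e["TARGET"] not in adjacency:
            raise ValueError(f"edge {e['ID']} references an unknown vertex")
        adjacency[e["SOURCE"]].append((e["TARGET"], e["LENGTH"]))
    return adjacency


adjacency = build_adjacency(vertices, edges)
print(adjacency["A"])
```

The check inside the loop mirrors the referential link between the two tables: an edge is only meaningful if both of its endpoints appear as vertices.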
Using the Python API, you can now create a graph directly from two dataframes, or by simply pointing to an existing Graph Workspace in SAP HANA. Once a graph object is created, you work with it much as you do with the machine learning algorithms from PAL/APL: both return Pandas dataframes to the user. The latest version of the Python client exposes graph functions to query vertices and edges, to get n-hop neighbors and neighbor-induced subgraphs, and to calculate shortest paths.
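For readers unfamiliar with these graph operations, the sketch below shows what "n-hop neighbors" and "shortest path" compute on a small directed, weighted graph. It is a plain-Python illustration using breadth-first search and Dijkstra's algorithm, not the HANA-ML graph API and not necessarily how SAP HANA evaluates these queries. The function names and the sample graph are hypothetical.

```python
import heapq
from collections import deque

# Directed, weighted toy graph: vertex -> list of (neighbor, edge weight).
graph = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 5.0)],
    "C": [("B", 2.0)],
    "D": [],
}


def n_hop_neighbors(graph, start, max_hops):
    """Vertices reachable from start via 1..max_hops outgoing edges (BFS)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor, _ in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return {v for v in hops if v != start}


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total weight, vertex list) or None."""
    heap = [(0.0, source, [source])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node]:
            if neighbor not in visited:
                heapq.heappush(heap, (cost + weight, neighbor, path + [neighbor]))
    return None  # target is not reachable from source


print(n_hop_neighbors(graph, "A", 1))
print(shortest_path(graph, "A", "D"))
```

In this example the direct edge from A to B is not part of the cheapest route to D, because the detour through C has a lower total weight. Weighted shortest-path search exists to capture exactly this kind of case.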
If you are a data scientist working with Python, make sure to explore the sample Jupyter Notebook on GitHub. It demonstrates how to load data from OpenStreetMap into HANA and how to use the Python client API to calculate shortest paths on the London street network.