Author: Witalij Rudnicki

Multi-model in hana_ml 2.6 for Python (part 05): Graphs

In the previous posts, we explored and visualized spatial data in HANA DataFrames. Now let’s move to another kind of multi-model data supported since hana_ml 2.6: connected data, or graphs.

If you are not familiar with graphs in SAP HANA yet, then:

Let’s move on…

…and I am going to create a new notebook, 04 Graph.ipynb, in JupyterLab.

Then I import the usual modules and connect to an SAP HANA instance (an SAP HANA Cloud trial instance in my case).

import pandas as pd
from hana_ml import dataframe as dfh

# Endpoint of the SAP HANA Cloud instance in the form host:port
hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"
hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")

# Open an encrypted connection (SAP HANA Cloud accepts only TLS connections)
cchc=dfh.ConnectionContext(address=hana_cloud_host,
                           port=int(hana_cloud_port),
                           user='HANAML',
                           password='Super$ecr3t!',
                           encrypt=True
                          )
cchc.connection.isconnected()

# HANA DataFrames on the existing tables; the geo column uses SRID 4326
dfh_ports=cchc.table("PORTS", geo_cols={"POINT_LON_LAT_GEO":"4326"})
dfh_routes=cchc.table("ROUTES")

Nothing new so far, but here come…

SAP HANA Graphs in Python

Starting with version 2.6, the hana_ml package includes a new module, hana_ml.graph.hana_graph.

Let’s import the module…

import hana_ml.graph.hana_graph

…and create a new SAP HANA graph workspace AIRROUTES_DFH (represented by the Python variable hgws_airroutes) from the existing tables represented by HANA DataFrames, using the create_hana_graph_from_vertex_and_edge_frames() method. We need to provide the same parameters that the CREATE GRAPH WORKSPACE SQL statement would require: the vertex table and its key column, plus the edge table with its key, source, and target columns.

hgws_airroutes = (
    hana_ml.graph.hana_graph
    .create_hana_graph_from_vertex_and_edge_frames(
        connection_context=cchc, 
        workspace_name='AIRROUTES_DFH',
        
        vertices_hdf=dfh_ports,
        vertex_key_column="ID", 
        
        edges_hdf=dfh_routes, 
        edge_key_column="ID",
        edge_source_column="FROM", edge_target_column="TO"
    )
)
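For comparison, here is the shape of the SQL statement that takes exactly these inputs. The helper below is a hypothetical illustration (graph_workspace_sql is not part of hana_ml): it only composes the statement text and executes nothing. The table names are illustrative, since hana_ml points the workspace at views it creates on top of the tables, as the next step shows.

```python
def graph_workspace_sql(workspace_name, vertex_table, vertex_key_column,
                        edge_table, edge_key_column,
                        edge_source_column, edge_target_column):
    """Compose a CREATE GRAPH WORKSPACE statement (hypothetical helper)."""

    def q(identifier):
        # Double-quote identifiers: FROM and TO are SQL reserved words
        return '"{}"'.format(identifier)

    return (
        "CREATE GRAPH WORKSPACE {ws}\n"
        "  EDGE TABLE {et}\n"
        "    SOURCE COLUMN {src}\n"
        "    TARGET COLUMN {tgt}\n"
        "    KEY COLUMN {ek}\n"
        "  VERTEX TABLE {vt}\n"
        "    KEY COLUMN {vk}"
    ).format(ws=q(workspace_name),
             et=q(edge_table), src=q(edge_source_column),
             tgt=q(edge_target_column), ek=q(edge_key_column),
             vt=q(vertex_table), vk=q(vertex_key_column))


# Same parameters as the create_hana_graph_from_vertex_and_edge_frames() call above
print(graph_workspace_sql("AIRROUTES_DFH",
                          vertex_table="PORTS", vertex_key_column="ID",
                          edge_table="ROUTES", edge_key_column="ID",
                          edge_source_column="FROM", edge_target_column="TO"))
```

Quoting every identifier is a defensive choice here: it keeps column names like FROM and TO valid even though they collide with SQL keywords.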

Let’s check what DB objects are used to provide data for vertices and edges in this new workspace.

print("SQL for Vertices: {}\nSQL for Edges: {}"
      .format(hgws_airroutes.vertices_hdf.select_statement,
              hgws_airroutes.edges_hdf.select_statement))

As you can see, the SQL views PORTS_VIEW and ROUTES_VIEW were created on the database side on top of the existing column tables PORTS and ROUTES.

Graph exploration

Vertices…

As you’ve seen above, we have access to HANA DataFrames representing the vertices and edges of the graph. For example, if I want to find my local Wrocław airport among the nodes:

hgws_airroutes.vertices_hdf.filter("CODE='WRO'").collect()

But if I know the key of a node (313 in this example), then I can call the vertices() method, which returns a pandas DataFrame directly.

hgws_airroutes.vertices(vertex_key=313)

Or I can go nerdy and combine the two above, over-engineering the solution just for the sake of the exercise.

hgws_airroutes.vertices(
    vertex_key=(hgws_airroutes
                .vertices_hdf.filter("CODE='WRO'").select('ID')
                .collect().values[0][0])
)
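If you need many such code-to-key lookups, a less nerdy option is to collect the ID and CODE columns once and build a dictionary locally. The sketch below uses hypothetical rows standing in for the result of something like hgws_airroutes.vertices_hdf.select('ID', 'CODE').collect(); only the WRO key 313 comes from this post, and the other keys are illustrative.

```python
# Hypothetical (ID, CODE) rows standing in for a collected vertex table;
# only WRO=313 is taken from this post, the other keys are illustrative
vertex_rows = [(313, "WRO"), (217, "KEF"), (146, "TLV")]

# Build the CODE -> ID index once; each later lookup is a dict access
key_by_code = {code: key for key, code in vertex_rows}


def vertex_key(code):
    """Return the vertex key for an airport code (raises KeyError if unknown)."""
    return key_by_code[code]


print(vertex_key("WRO"))  # prints 313
```

The returned key can then be passed to vertices(vertex_key=...) without another filter query.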

…and Edges

Let’s check edges() for the same node…

hgws_airroutes.edges(vertex_key=313).head(3)

…which returns all outgoing connections. To see the incoming ones, we need to add the direction='INCOMING' parameter.

hgws_airroutes.edges(vertex_key=313, direction='INCOMING').head(3)
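To make the direction semantics concrete, here is a local toy model in plain Python. edges_of() and the rows are hypothetical and only mimic the filtering: OUTGOING matches the vertex in the FROM column, INCOMING matches it in the TO column. The ANY option is an assumption of this sketch, not something shown in this post.

```python
# Hypothetical edge rows shaped like ROUTES: (ID, FROM, TO)
routes = [
    (1, 313, 217),
    (2, 217, 313),
    (3, 313, 146),
    (4, 146, 217),
]


def edges_of(vertex_key, direction="OUTGOING"):
    """Toy stand-in for edges(): select edges touching vertex_key by direction."""
    if direction == "OUTGOING":
        return [e for e in routes if e[1] == vertex_key]  # vertex is FROM
    if direction == "INCOMING":
        return [e for e in routes if e[2] == vertex_key]  # vertex is TO
    if direction == "ANY":
        return [e for e in routes if vertex_key in (e[1], e[2])]
    raise ValueError("direction must be 'OUTGOING', 'INCOMING' or 'ANY'")


print([e[0] for e in edges_of(313)])              # [1, 3]
print([e[0] for e in edges_of(313, "INCOMING")])  # [2]
```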

To get information about the source node of the first incoming connection on the list, you can use the source() method.

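# Key of the first incoming route to WRO, then the vertex that route starts from
first_incoming_key = hgws_airroutes.edges(
    vertex_key=313,
    direction='INCOMING').head(1).ID.values[0]
hgws_airroutes.source(edge_key=first_incoming_key)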
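Conceptually, source() is a two-step lookup: find the edge row by its key, then fetch the vertex stored in its source column. A hypothetical toy version (source_of() and the data are illustrative only, not the hana_ml implementation):

```python
# Hypothetical vertex and edge tables, each keyed by ID
vertices = {313: {"CODE": "WRO"}, 217: {"CODE": "KEF"}}
routes = {2: {"FROM": 217, "TO": 313}}


def source_of(edge_key):
    """Toy stand-in for source(): return the row of the vertex the edge starts from."""
    from_key = routes[edge_key]["FROM"]  # step 1: edge key -> source vertex key
    return {"ID": from_key, **vertices[from_key]}  # step 2: vertex key -> vertex row


print(source_of(2))  # {'ID': 217, 'CODE': 'KEF'}
```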

Neighbors

While the edges() method gives us a way to traverse paths, it is more convenient to use the neighbors() method to discover nodes within N degrees of separation.

By default, it assumes a depth of 1, i.e. only immediate neighbors.

dfp_nei_wro=hgws_airroutes.neighbors(start_vertex=313)
print("Number of immediate neighbors: {}".format(len(dfp_nei_wro.index)))
display(dfp_nei_wro)
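Finding "all nodes within N hops" is essentially a breadth-first search. The sketch below is a hypothetical local equivalent on a toy adjacency dict: it follows outgoing edges only and keeps vertices whose shortest hop distance lies between min_depth and max_depth. The function name and max_depth are assumptions of this sketch; min_depth mirrors the parameter used in the next step.

```python
from collections import deque


def neighbors_within(adjacency, start, min_depth=1, max_depth=1):
    """Vertices whose shortest hop distance from start lies in [min_depth, max_depth].

    Only outgoing edges are followed (adjacency maps a vertex to its targets).
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if depth[vertex] >= max_depth:
            continue  # never expand past max_depth
        for nxt in adjacency.get(vertex, ()):
            if nxt not in depth:  # first visit in BFS = shortest distance
                depth[nxt] = depth[vertex] + 1
                queue.append(nxt)
    return {v for v, d in depth.items() if min_depth <= d <= max_depth}


# Hypothetical directed adjacency: WRO (313) flies to 217 and 146
adjacency = {313: [217, 146], 217: [313, 5], 146: [217], 5: []}

print(sorted(neighbors_within(adjacency, 313)))               # [146, 217]
print(sorted(neighbors_within(adjacency, 313, min_depth=0)))  # [146, 217, 313]
print(sorted(neighbors_within(adjacency, 313, max_depth=2)))  # [5, 146, 217]
```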

The result can be extended to include edges as well. Note the use of min_depth=0 to include the starting node, too.

dfp_nei_wro=hgws_airroutes.neighbors(
    start_vertex=313, 
    min_depth=0, 
    include_edges=True
).edges()

display(dfp_nei_wro)

Why are there 1443 connections? Because the algorithm also returned all connections between all nodes in the neighborhood, not only those starting at Wrocław.

If we want to count only connections from Wrocław airport, then…

len(dfp_nei_wro[dfp_nei_wro['FROM']==313].ID)

…returns 53 connections, as expected.
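The gap between 1443 and 53 is the difference between the induced subgraph of the neighborhood (every edge whose two endpoints both belong to it) and the edges that actually start at WRO. A toy illustration with hypothetical edges; induced_edges() is an illustrative helper:

```python
def induced_edges(edge_list, vertex_set):
    """Keep only the edges whose source and target are both in vertex_set."""
    return [(s, t) for s, t in edge_list if s in vertex_set and t in vertex_set]


# Hypothetical directed edges (FROM, TO)
edge_list = [(313, 217), (313, 146), (217, 146), (146, 217), (217, 5)]
# Start vertex plus its immediate neighbors, as returned with min_depth=0
neighborhood = {313, 217, 146}

subgraph = induced_edges(edge_list, neighborhood)
from_start = [e for e in subgraph if e[0] == 313]
print(len(subgraph), len(from_start))  # 4 2
```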

Visualizing graphs

Just as with geospatial data, connected data is much easier to understand when it is visualized.

For that, we will use the NetworkX Python package.

!pip install networkx
import matplotlib.pyplot as plt
import networkx as nx
nx.__version__
plt.rcParams["figure.figsize"] = [26, 12]

Now let’s use NetworkX to display the network of airports connected to Wrocław.

nx_graph_wro = nx.from_pandas_edgelist(
    dfp_nei_wro, 
    source="FROM", target="TO"
)
nx.draw_networkx(nx_graph_wro)
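Keep in mind that nx.from_pandas_edgelist() builds an undirected nx.Graph unless you pass create_using=nx.DiGraph, so a route and its return route collapse into a single line. The plain-Python sketch below mimics that behavior with hypothetical routes by treating each edge as an unordered pair:

```python
# Hypothetical directed routes (FROM, TO), including a return route
routes = [(313, 217), (217, 313), (313, 146)]

# An undirected graph keeps one edge per unordered pair of endpoints
undirected_edges = {frozenset(route) for route in routes}

print(len(routes), len(undirected_edges))  # 3 2
```

For a quick overview picture the undirected view is usually fine; create_using=nx.DiGraph is the option to reach for when arrow direction matters.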

By default, the graph gets a somewhat random shape, which you can notice by re-running the cell with nx.draw_networkx(nx_graph_wro) a few times.
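The randomness comes from the spring layout that draw_networkx() computes when no pos is given. A minimal sketch, assuming a reasonably recent NetworkX (older releases lack the seed argument of spring_layout()) and a toy edge list with hypothetical keys: compute the positions once with a fixed seed and pass them via pos= to get the same picture on every run.

```python
import networkx as nx

# Toy hub-and-spoke graph with hypothetical node keys
toy_graph = nx.Graph([(313, 217), (313, 146), (313, 5), (217, 146)])

# A fixed seed makes the force-directed layout reproducible
pos_a = nx.spring_layout(toy_graph, seed=42)
pos_b = nx.spring_layout(toy_graph, seed=42)

# Each position is a NumPy array, so compare element-wise
same = all((pos_a[n] == pos_b[n]).all() for n in toy_graph.nodes)
print(same)  # True

# In the notebook, something like:
# nx.draw_networkx(nx_graph_wro, pos=nx.spring_layout(nx_graph_wro, seed=42))
```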

As mentioned, connections between all nodes selected by the neighborhood algorithm are included. Let’s reduce them to the connections from the source WRO airport only.

nx_graph_wro = nx.from_pandas_edgelist(
    dfp_nei_wro.query("FROM==313"), 
    source="FROM", 
    target="TO"
)
nx.draw_networkx(nx_graph_wro)

As you can see, with only the routes from WRO left, the default layout now draws the graph in a “hub-and-spoke” shape.

We can pick some other layout, e.g. “shell”:

pos_geo=nx.shell_layout(nx_graph_wro)
nx.draw_networkx(nx_graph_wro, pos=pos_geo)
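For a single hub, another option is to put the hub in the middle and spread its destinations evenly on a circle. The hand-rolled star_layout() below is a hypothetical helper that returns a dict in the {node: (x, y)} form accepted by the pos argument. In recent NetworkX versions, shell_layout() gives a similar picture when called with nlist=[[313], other_nodes], because a single-node first shell is placed at the center.

```python
import math


def star_layout(hub, spokes, radius=1.0):
    """Place hub at the origin and the spoke nodes evenly on a circle around it."""
    pos = {hub: (0.0, 0.0)}
    for i, node in enumerate(spokes):
        angle = 2 * math.pi * i / len(spokes)
        pos[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return pos


# Hypothetical node keys, with WRO (313) as the hub
positions = star_layout(313, [217, 146, 5, 8])
print(positions[313])  # (0.0, 0.0)

# In the notebook, something like:
# nx.draw_networkx(nx_graph_wro,
#                  pos=star_layout(313, [n for n in nx_graph_wro if n != 313]))
```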

But in our case, each node has geospatial coordinates, which can be used as the X and Y positions of the rendered nodes.

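# Check the (LON, LAT) of the WRO node (run in a separate cell to display it)
hgws_airroutes.vertices(vertex_key=313)[['LON','LAT']].values[0]

# Overwrite each node's shell-layout position with its (LON, LAT) coordinates
for x in pos_geo.keys():
    pos_geo[x]=hgws_airroutes.vertices(vertex_key=x)[['LON','LAT']].values[0]
nx.draw_networkx(nx_graph_wro, pos=pos_geo)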
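The loop above calls vertices() once per node, which means one round trip to the database per airport. For a larger neighborhood, a possible alternative is to fetch ID, LON and LAT in a single query (for example something like hgws_airroutes.vertices_hdf.select('ID', 'LON', 'LAT').collect()) and build the positions locally. The sketch assumes rows shaped like such a result; geo_positions() is a hypothetical helper, and the coordinates are approximate and for illustration only.

```python
# Hypothetical (ID, LON, LAT) rows, e.g. from one collect() over the vertex table;
# coordinates are approximate and only illustrative
rows = [(313, 16.9, 51.1), (217, -22.6, 64.0), (146, 34.9, 32.0)]


def geo_positions(rows, nodes):
    """Map each graph node to (longitude, latitude), i.e. x = LON and y = LAT."""
    coords = {key: (lon, lat) for key, lon, lat in rows}
    return {node: coords[node] for node in nodes}


pos_geo_sketch = geo_positions(rows, nodes=[313, 217, 146])
print(pos_geo_sketch[217])  # (-22.6, 64.0)

# In the notebook, something like:
# nx.draw_networkx(nx_graph_wro, pos=geo_positions(rows, nx_graph_wro.nodes))
```

Using longitude and latitude directly as x and y gives a plain equirectangular picture, which is enough to recognize where the airports are.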

Now I can even visually guesstimate that node 217 is Reykjavik’s Keflavik International Airport and 146 is Tel Aviv’s Ben Gurion International Airport.


But in the next (and last) episode, we will not need to guess, because we will combine graph and geospatial data, so stay tuned!

Stay healthy ❤️
-Vitaliy (aka @Sygyzmundovych)
