In the previous posts, we explored and visualized spatial data in HANA DataFrames. Now let’s move to another kind of multi-model data supported since `hana_ml` 2.6: connected data, or graphs.

If you are not familiar with graphs in SAP HANA yet, then:

- get a general overview from Markus Fath, who presented this topic during Devtoberfest: https://www.youtube.com/watch?v=_JnKtv66E-w&list=PL6RpkC85SLQA8za7iX9FRzewU7Vs022dl&index=8
- check SAP HANA Graph SQL tutorials: https://developers.sap.com/group.hana-aa-graph-overview.html

# Let’s move on…

…and I am going to create a new notebook `04 Graph.ipynb` in JupyterLab.

Then import the usual modules and connect to an SAP HANA instance (an SAP HANA Cloud trial instance in my case).

```
import pandas as pd
from hana_ml import dataframe as dfh
```

`hana_cloud_endpoint="<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443"`

```
hana_cloud_host, hana_cloud_port = hana_cloud_endpoint.split(":")
cchc = dfh.ConnectionContext(
    port=hana_cloud_port,
    address=hana_cloud_host,
    user='HANAML',
    password='Super$ecr3t!',
    encrypt=True
)
```
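As a side note, here is a minimal sketch of how the endpoint could be split a bit more defensively, and how the password could be kept out of the notebook. The helper `split_endpoint()` and the environment variable name `HANA_PASSWORD` are my own assumptions for illustration; they are not part of `hana_ml`.

```python
import os


def split_endpoint(endpoint):
    """Split 'host:port' into (host, port), returning the port as an int."""
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"Expected 'host:port', got {endpoint!r}")
    return host, int(port)


# HANA_PASSWORD is a hypothetical variable name; use whatever your setup provides.
hana_password = os.environ.get("HANA_PASSWORD", "")

host, port = split_endpoint("<uuid>.hana.trial-eu10.hanacloud.ondemand.com:443")
print(host, port)
```

The resulting `host`, `port` and `hana_password` values could then be passed to `ConnectionContext()` instead of the literals used above.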

`cchc.connection.isconnected()`

```
dfh_ports=cchc.table("PORTS", geo_cols={"POINT_LON_LAT_GEO":"4326"})
dfh_routes=cchc.table("ROUTES")
```

Nothing new so far, but here come…

## SAP HANA Graphs in Python

Starting from version 2.6, the `hana_ml` package includes the new module `hana_ml.graph.hana_graph`.

Let’s import the module…

`import hana_ml.graph.hana_graph`

…and create a new SAP HANA graph workspace `AIRROUTES_DFH`, represented by the Python variable `hgws_airroutes`, from existing tables represented by HANA DataFrames using the `create_hana_graph_from_vertex_and_edge_frames()` method. We need to provide the same parameters as we would in the equivalent SQL statement.

```
hgws_airroutes = (
    hana_ml.graph.hana_graph
    .create_hana_graph_from_vertex_and_edge_frames(
        connection_context=cchc,
        workspace_name='AIRROUTES_DFH',
        vertices_hdf=dfh_ports,
        vertex_key_column="ID",
        edges_hdf=dfh_routes,
        edge_key_column="ID",
        edge_source_column="FROM",
        edge_target_column="TO"
    )
)
```
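For comparison, here is a rough sketch of how the same parameters map onto a `CREATE GRAPH WORKSPACE` statement. It only builds the SQL text and does not execute anything. The workspace name `AIRROUTES_SQL` and the helper functions are hypothetical, and I assume the workspace is defined directly on the `PORTS` and `ROUTES` tables.

```python
def quote_identifier(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_graph_workspace_sql(workspace, vertex_table, vertex_key,
                               edge_table, edge_key, edge_source, edge_target):
    """Build the text of a CREATE GRAPH WORKSPACE statement (not executed)."""
    q = quote_identifier
    return (
        f"CREATE GRAPH WORKSPACE {q(workspace)}\n"
        f"  EDGE TABLE {q(edge_table)}\n"
        f"    SOURCE COLUMN {q(edge_source)}\n"
        f"    TARGET COLUMN {q(edge_target)}\n"
        f"    KEY COLUMN {q(edge_key)}\n"
        f"  VERTEX TABLE {q(vertex_table)}\n"
        f"    KEY COLUMN {q(vertex_key)}"
    )


sql = create_graph_workspace_sql(
    workspace="AIRROUTES_SQL",   # hypothetical name, to avoid clashing with AIRROUTES_DFH
    vertex_table="PORTS", vertex_key="ID",
    edge_table="ROUTES", edge_key="ID",
    edge_source="FROM", edge_target="TO",
)
print(sql)
```

Quoting the identifiers matters here because `FROM` and `TO` are reserved words in SQL.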

Let’s check what DB objects are used to provide data for vertices and edges in this new workspace.

```
print("SQL for Vertices: {}\nSQL for Edges: {}"
      .format(hgws_airroutes.vertices_hdf.select_statement,
              hgws_airroutes.edges_hdf.select_statement))
```

As you can see, SQL views `PORTS_VIEW` and `ROUTES_VIEW` were created on the database side on top of the existing column tables `PORTS` and `ROUTES`.

## Graph exploration

### Vertices…

As you’ve seen above, we have access to the HANA DataFrames representing the vertices and edges of the graph. E.g. if I want to find my local Wrocław airport among the nodes:

`hgws_airroutes.vertices_hdf.filter("CODE='WRO'").collect()`

But if I know the key of a node, i.e. `313` in this example, then I can call the `vertices()` method, which returns a Pandas dataframe right away.

`hgws_airroutes.vertices(vertex_key=313)`

Or I can go nerdy and combine the two above to over-engineer the solution just for the sake of the exercise.

```
hgws_airroutes.vertices(
    vertex_key=(hgws_airroutes
                .vertices_hdf.filter("CODE='WRO'").select('ID')
                .collect().values[0][0])
)
```

### …and Edges

Let’s check `edges()` for the same node…

`hgws_airroutes.edges(vertex_key=313).head(3)`

…that returns all outgoing connections. To see incoming ones we need to add the `direction='INCOMING'` parameter to it.

`hgws_airroutes.edges(vertex_key=313, direction='INCOMING').head(3)`

And to get information about the source node of the first connection on the list you can use the `source()` method.

```
hgws_airroutes.source(
    edge_key=hgws_airroutes.edges(
        vertex_key=313,
        direction='INCOMING'
    ).head(1).ID.values[0]
)
```
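If the direction logic feels abstract, here is a tiny in-memory sketch of the same idea without any database. The edge IDs and routes are made up for illustration, and `edges_of()` and `source_of()` are hypothetical helpers that only mimic what the calls above do conceptually.

```python
# Toy edge list with made-up IDs; only the FROM/TO column names follow the ROUTES table.
edges = [
    {"ID": 1, "FROM": 313, "TO": 217},
    {"ID": 2, "FROM": 313, "TO": 146},
    {"ID": 3, "FROM": 217, "TO": 313},
    {"ID": 4, "FROM": 146, "TO": 217},
]


def edges_of(vertex_key, direction="OUTGOING"):
    """Return edges leaving (OUTGOING) or entering (INCOMING) the given vertex."""
    column = "FROM" if direction == "OUTGOING" else "TO"
    return [edge for edge in edges if edge[column] == vertex_key]


def source_of(edge_key):
    """Return the key of the vertex where the given edge starts."""
    return next(edge["FROM"] for edge in edges if edge["ID"] == edge_key)


outgoing = edges_of(313)
incoming = edges_of(313, direction="INCOMING")
print(len(outgoing), len(incoming), source_of(incoming[0]["ID"]))
```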

### Neighbors

While the `edges()` method gives us a way to traverse paths, it is more convenient to use the `neighbors()` method to discover nodes within N degrees of separation.

By default it assumes a depth of 1, so it returns only immediate neighbors.

```
dfp_nei_wro=hgws_airroutes.neighbors(start_vertex=313)
print("Number of immediate neighbors: {}".format(len(dfp_nei_wro.index)))
display(dfp_nei_wro)
```
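To make “N degrees of separation” more concrete, here is a minimal breadth-first search sketch in plain Python. It is my own illustration of the concept on a made-up toy network, not the algorithm SAP HANA runs internally, and the `min_depth`/`max_depth` handling is an assumption about the semantics.

```python
from collections import deque


def neighbors(adjacency, start, min_depth=1, max_depth=1):
    """Return nodes whose hop distance from start lies in [min_depth, max_depth]."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the requested depth
        for nxt in adjacency.get(node, ()):
            if nxt not in depth_of:
                depth_of[nxt] = depth_of[node] + 1
                queue.append(nxt)
    return sorted(n for n, d in depth_of.items() if min_depth <= d <= max_depth)


# Made-up directed routes, keyed by airport code.
toy_routes = {"WRO": ["FRA", "WAW"], "FRA": ["JFK", "WRO"], "WAW": ["FRA"], "JFK": []}

print(neighbors(toy_routes, "WRO"))
print(neighbors(toy_routes, "WRO", min_depth=0))
print(neighbors(toy_routes, "WRO", max_depth=2))
```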

This method can be extended to include edges in the result. Please note the use of `min_depth=0` to include the starting node too.

```
dfp_nei_wro = hgws_airroutes.neighbors(
    start_vertex=313,
    min_depth=0,
    include_edges=True
).edges()
display(dfp_nei_wro)
```

Why are there 1443 connections? Because the algorithm also returned every connection between any two nodes in the neighborhood, not just those starting at the source node.

If we want to count only connections from Wrocław airport, then…

`len(dfp_nei_wro[dfp_nei_wro['FROM']==313].ID)`

…returns 53 connections, as expected.
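The difference between the two counts can be reproduced on a toy example. This is a sketch with made-up routes: the neighborhood keeps every edge whose both endpoints belong to it, while only a few of those edges actually start at the source node.

```python
# Made-up directed routes between airport codes.
toy_edges = [
    ("WRO", "FRA"),
    ("WRO", "WAW"),
    ("FRA", "WAW"),
    ("WAW", "FRA"),
    ("FRA", "JFK"),  # JFK is outside the neighborhood, so this edge is dropped
]

# The start node plus its immediate neighbors.
neighborhood = {"WRO", "FRA", "WAW"}

# All edges with both endpoints inside the neighborhood.
induced = [(s, t) for s, t in toy_edges if s in neighborhood and t in neighborhood]

# Only the edges that leave the start node.
from_start = [(s, t) for s, t in induced if s == "WRO"]

print(len(induced), len(from_start))
```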

## Visualizing graphs

Just as with geospatial data, connected data is much easier to understand when it is visualized.

For that we will use Python’s NetworkX package.

`!pip install networkx`

```
import matplotlib.pyplot as plt
import networkx as nx
nx.__version__
```

`plt.rcParams["figure.figsize"] = [26, 12]`

Now let’s use NetworkX to display the network of the Wrocław airport and its connected neighbors.

```
nx_graph_wro = nx.from_pandas_edgelist(
    dfp_nei_wro,
    source="FROM", target="TO"
)
```

`nx.draw_networkx(nx_graph_wro)`

By default this graph has a somewhat random shape, which you can notice after re-running the same cell `nx.draw_networkx(nx_graph_wro)` a few times.
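If you want the same picture on every run, one option is to compute the positions yourself with a fixed seed and pass them to the drawing function. This is a small sketch on a made-up star graph, assuming NetworkX (with NumPy) is installed; the seed value is arbitrary.

```python
import networkx as nx

# A made-up star-shaped graph standing in for the airport network.
toy_graph = nx.star_graph(5)

# The same seed should give the same spring layout on every run.
pos_first = nx.spring_layout(toy_graph, seed=42)
pos_second = nx.spring_layout(toy_graph, seed=42)

same = all((pos_first[n] == pos_second[n]).all() for n in toy_graph)
print(same)
```

The resulting dictionary can then be passed on as `nx.draw_networkx(toy_graph, pos=pos_first)`.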

As I have mentioned, the connections between all nodes selected by the neighborhood algorithm are included. Let’s reduce them to only the connections from the source `WRO` airport.

```
nx_graph_wro = nx.from_pandas_edgelist(
    dfp_nei_wro.query("FROM==313"),
    source="FROM",
    target="TO"
)
```

`nx.draw_networkx(nx_graph_wro)`

As you can see, the graph now has a “hub-and-spoke” shape.

We can pick some other layout, e.g. “shell”:

```
pos_geo=nx.shell_layout(nx_graph_wro)
nx.draw_networkx(nx_graph_wro, pos=pos_geo)
```

But in our case each node has geospatial coordinates that can be used as the `X` and `Y` coordinates of the nodes rendered for display.

`hgws_airroutes.vertices(vertex_key=313)[['LON','LAT']].values[0]`

```
# Replace each layout position with the node's (longitude, latitude) pair
for x in pos_geo.keys():
    pos_geo[x] = hgws_airroutes.vertices(vertex_key=x)[['LON', 'LAT']].values[0]
```

`nx.draw_networkx(nx_graph_wro, pos=pos_geo)`
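The loop above calls `vertices()` once per node. As a sketch of an alternative, the positions could be built in one pass from rows that were fetched once. The rows below are hand-typed stand-ins with rough, illustrative coordinates; in the notebook they would come from the vertices data instead.

```python
# Stand-in rows; the coordinates are approximate and for illustration only.
airport_rows = [
    {"ID": 313, "LON": 16.89, "LAT": 51.10},
    {"ID": 217, "LON": -22.61, "LAT": 63.99},
    {"ID": 146, "LON": 34.89, "LAT": 32.01},
]

# One lookup table for all airports: key -> (x, y) = (longitude, latitude).
coords_by_id = {row["ID"]: (row["LON"], row["LAT"]) for row in airport_rows}

# Keep only the nodes that are present in the graph being drawn.
graph_nodes = [313, 217, 146]
pos_geo = {node: coords_by_id[node] for node in graph_nodes}
print(pos_geo)
```

Longitude goes first because it plays the role of the horizontal `X` axis.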

Now, I can even visually guesstimate that node `217` is Reykjavik’s Keflavik International Airport or that `146` is Tel Aviv’s Ben Gurion International Airport.

But in the next (and last) episode we will not need to guess, because we will combine graph and geospatial data, so stay tuned!

Stay healthy ❤️

-Vitaliy (aka @Sygyzmundovych)