Graph algorithms in SAP HANA: on SFLIGHT and at TechEd
In the previous blog SAP HANA Graph visualizes SFLIGHT data we loaded SFLIGHT data into SAP HANA instance and created the graph workspace
"SFLIGHT"."CONNECTIONS_C". Then using built-in Graph Visualizer in Database Explorer we could visualize the graph.
One more thing that the Graph Visualizer allows you to do is to understand and explore your graph by running most common graph algorithms: Neighborhood, Strongly Connected, Shortest Path. One more option is a pattern search using OpenCypher language, but we will leave it for the next blog.
So, let’s check these three algorithms. They are all available under
“A directed graph is said to be strongly connected if every vertex is reachable from every other vertex.” (source: help)
There is a subgraph where 14 airports are strongly connected, meaning we can leave from any airport in this set and arrive to any other airport from the same set.
THF is connected with FRA, but only with the inbound connection. And MUC is not connected to any other airport. Obviously, it is the same in reality, but now we are exploring the dataset loaded from SFLIGHT.
“…the neighboring vertices within the given radius (depth) from the given start vertices.” (source: help)
Let’s check what are direct flights from Singapore accordingly to this dataset.
The Shortest Path
“…the shortest path from the provided start vertex to the provided target vertex – also known as single-source single-target shortest path” (source: help)
Let’s check the shortest path from Tokyo to New York JFK using flight time as a weight attribute.
Attending SAP TechEd this year…
…and would like to learn more?
For SAP TechEd I prepared hands-on exercises to explore these and other aspects of SAP HANA graph processing based on some real life data. You can join CodeJam, mini-edition in the Developer Garage on the show floor.
You can join as well another CodeJam on the show floor that I prepared on processing of geospatial data in SAP HANA using SQL.
- Las Vegas: https://sessioncatalog.sapevents.com/go/agendabuilder.speakers/?l=191&speaker_id=46888
- Barcelona: https://sessioncatalog.sapevents.com/go/agendabuilder.speakers/?l=192&speaker_id=46888
- Bangalore: https://sessioncatalog.sapevents.com/go/agendabuilder.speakers/?l=193&speaker_id=46888 (to be scheduled soon)
See you there, and – remember – do not use SFLIGHT sample data to schedule your flights!! 😉
-Vitaliy, aka @Sygyzmundovych
PS, i would remove TYO from sample data as it is a joint 'fake' name for two Tokyo airports: HND (Haneda) and NRT (Narita). TYO is not a real airport 🙂
This makes a lot more sense after your preview at the mini-code-jam session, especially with the small data set of vertices and edges. Can you please clarify your comments about Support Pack 3 features supporting this technology?
Thank you so much for this two part blog series. I have been looking for some nice blogs on Graph Processing and found this one (there are really very few on SCN 😉 ). Having said that, I observed one thing. Creation of the Graph work space itself isn't a challenging thing., I see that the real task is in building the Vertices and Edges. I am wondering if I were to build these Vertices and Edges based on various SAP tables ( say Sales order info, customer master and material master info...), wouldn't it take longer run time or should we safely say that with HANA, we wouldn't have that problem anymore ? Would be nice to know your perspective. Cheers !!!
when you create nodes and edges from multiple tables, you basically have two options: copy the data into a single nodes/single edges table, or use a (UNION) view to expose the data to the graph engine. If you are invoking the built-in algorithms from the Database Explorer (like Witalij shows above), the two options (data copy vs. view) may result in differences in runtimes.
However, internally HANA Graph creates and maintains a kind of index structure - an adjacency list. The built-in algorithms and functions, like shortest path and so on, are also exposed in our "GraphScript" programming model. See SAP HANA Graph Reference.
GraphScript procedures leverage the graph's adjacency list and as a consequence, the underlying "persistence" (data copy vs. view) does not impact the runtime of GraphScript procedures and algorithms.
Hope that helps. Regards, Markus
Thank you Markus. Greatly appreciate your patience in explaining the in-depth details.
Hi Markus Fath and Witalij Rudnicki - Unfortunately HANA Cloud does not support the Cipher Pattern matching alogirthm and Strongly connected anymore. Could I ask if this is a restriction (or) temporary hiccup on HANA Cloud (I have deployed a free trial on SAP BTP).
right and wrong 😉 HANA Cloud does support Cypher and in general provides many more graph features and better performance compared to HANA on-prem. But the HANA Database Explorer front-end still needs some adaptations to some technical changes that came with HANA Cloud. So please be patient, the gaps in the DBX graph viewer will be closed soon.
Markus Fath - Big Thanks !