Find Your Path – With SAP HANA Graph
SAP HANA provides numerous advanced multi-model analytical capabilities. One such seamlessly integrated core functionality is SAP HANA Graph. A series of step-by-step tutorials explain in detail on how to get started with graph analytics on SAP HANA. In this article, we discuss different shortest path problems, walk you through a simple SAP HANA Graph program implementation, and evaluate the performance on SAP HANA Cloud.
SAP HANA, The Business Data Platform for All Applications (Image Source)
Graph analytics leverages graph structures to comprehend and assess relationships in a network. A recent market report suggested that the global graph analytics market size is expected to grow to USD ~2.5 billion by 2024. Another report listed it as one of the key topics in the 2020 Data Science Dictionary. With the advent of Big Data and IoT, need for an easy on-boarding of new data, uncovering insights, and understanding the relationships within the data objects is an ever-increasing requirement. From Google’s famous Page Rank algorithm and mobile navigation systems to LinkedIn’s recommendation engine, one can find numerous applications of graph algorithms and graph analytics in daily life.
One of the most fundamental graph problems is to find shortest paths. Algorithms to find shortest paths are applied on a graph, which is a collection of nodes and edges. An edge can be undirected, directed, and can carry some characteristics such as weight. If you consider an organization chart, you can think of a node as an employee, a directed edge will be an employee reporting to his/her manager, and an undirected edge will be the friendship relation of the employee with his/her colleagues.
What is the Shortest Path problem?
The Shortest Path problem is about finding a path between two points with the minimum cost. Common cost factors are travel distance and time. You must have used Google or Apple Maps at some point, which use efficient algorithms to tackle the shortest path problem to guide you through the fastest and cost-effective route to your destination. Another application is analyzing the fastest news or information spread in a social network such as Facebook or Twitter.
Types of Shortest Path Problems:
- Shortest Path One-to-One: This is the problem of finding the shortest path between two points. Example (i) represents such a path (shown in green) from start node A to target node D, i.e. path A→B→D.
- K-Shortest Path: This is the problem of finding more than one shortest path between two points. If K = 1, then it is same as the above variant. You can think of this problem as finding multiple fastest routes to your home and also look for a route which includes an ice-cream shop. Example (ii) shows the solution of K-Shortest Paths problem with K = 3 from point A to point D, namely paths A→B→D, A→C→D, and A→C→B→D.
- Shortest Path One-to-All: This problem aims at finding shortest paths to all points from the start point. Example (iii) shows the shortest paths from start node A to every other node in the graph. (i) One-To-One (ii) K-Paths (iii) One-To-All
The simplest shortest path calculation is hop-based, where a shortest path is the minimum number of edges needed to reach the target. More advanced variant is when you introduce costs to the paths, and hence the shortest path is the path with minimum cost to the target. So having covered some basics, we are now ready to jump to the performance investigation. Let us begin by introducing the data set we have used for the calculations.
Pokec is the most popular social networking platform in Slovakia. The dataset comprises of ~1.6 million people considered as nodes, which have ~30million relationships between them considered as edges of the graph. An article, as part of ArangoDb’s open-source performance benchmark series compares the performance of different algorithms using Pokec dataset. For calculations in this post, we also decided to use the Pokec social network data. The data is available in two flat files containing nodes and edges respectively, which can be easily imported as SAP HANA Tables.
For the calculations, we provisioned an SAP HANA Cloud system with following specifications:
- Memory: 45GB
- CPU: 3 vCPUs
- Storage: 160 GB
In SAP HANA, we use a database procedure to execute a shortest path function.
CREATE OR REPLACE PROCEDURE "POKEC"."GS_SP_LENGTH"( IN "SOURCE" BIGINT, IN "TARGET" BIGINT, IN i_dir NVARCHAR(10), OUT o_len BIGINT) LANGUAGE GRAPH READS SQL DATA AS BEGIN GRAPH g = Graph("POKEC","GRAPH"); VERTEX v_s = Vertex(:g, :SOURCE); VERTEX v_t = Vertex(:g, :TARGET); WeightedPath<BIGINT> p = Shortest_Path(:g, :v_s, :v_t, :i_dir); o_len = LENGTH(:p); END;
We start by defining an SAP HANA GraphScript procedure, which is similar to an SAP HANA SQLScript procedure but needs to have GRAPH as a mandatory language identifier. Inside the BEGIN-END block, we have access to the high-level interface for graph analytics, and most of the analytics is performed here. Graph g is the graph object created by the imported vertex and edge tables of the data set.
So, let us focus on the Shortest Path function call. A simple function call takes a graph (g), source vertex (v_s), target vertex (v_t) and direction (i_dir) as parameters. And the result of the calculation is stored as a path p, which is basically a combination of nodes and edges.
WeightedPath<BIGINT> p = Shortest_Path(:g, :v_s, :v_t, :i_dir);
And that is all you need to run the simplest Shortest Path call! If you ever tried to write your own shortest path implementation using any of the conventional algorithms like Dijkstra or Bellman-Ford, you would know that it is quite a hassle to take care of multiple loops, conditional statements, etc. We take care of it all on our side and provide you with a simple function call which achieves one of the best performances.
More functional features
SAP HANA Graph provides you additional built-in graph algorithms such as K-Shortest Paths and Shortest Path One-To-All to solve different shortest path problems. You also have the flexibility to use some additional features within a shortest path call, like declaring a weight function or choosing a direction parameter for traversal. With a weight function, you can assign weight to an edge, and also apply bounds on both hop distance and weighted distance, in order to filter for paths that do not satisfy the conditions. One prominent use case of such feature is supply chain optimization, where the usual objective is to plan routes with the minimum fuel consumption and using a weight function, you can penalize routes with heavy traffic.
Let us call this procedure. We randomly choose two points and provide them as input parameters for source and target, followed by direction, and the last placeholder is for the output of the procedure.
CALL POKEC.GS_SP_LENGTH(8414, 743803, 'OUTGOING', ?); CALL POKEC.GS_SP_LENGTH(8414, 743803, 'INCOMING', ?); CALL POKEC.GS_SP_LENGTH(8414, 743803, 'ANY', ?);
We notice that it takes merely 1ms, 2ms, and 2ms respectively, to find a path between two points for different direction parameters in a network of 30 million edges!
1000 Sequential calls
Next, let us see how the function performs for different hop distances. For this, we choose 1000 random pairs of source and target vertices and store them in a table. Afterwards, we use an SAP HANA SQLScript procedure to loop over these table entries and call the shortest path procedure sequentially. You can check out the complete code on our github page.
1000 Parallel Calls
Lastly, let us look at an even faster way to run 1000 calls. SAP HANA SQLScript provides an operator called MAP_MERGE which replaces sequential-for loops with parallelization. Executing such a call only took ~180ms, i.e. 0.18ms per query if we compare bluntly. This performance is unparalleled.
In conclusion, in this blog post we understood the Shortest Path problem, walked over a simple SAP HANA Graph Shortest Path implementation and measured its performance on SAP HANA Cloud. For more detailed information on various graph algorithms that can be performed with a single line of code, check out the SAP HANA Cloud Graph Documentation.
If we have aroused your interest in SAP HANA Graph, also check out the Devtoberfest video series or our github repository to learn about other graph examples. If you try out our examples, we would like to hear your thoughts in the comments section below. We will keep adding more examples to our github repository and add a comment here, so please click the ‘Follow’ button to be updated. In case you have questions about SAP HANA Graph or SAP HANA in general, you can ask them at SAP Community page for SAP HANA.
Do you know if its possible to integrate this idea with the Transportation module to define the best route to a customer and to define the freight conditions?
technically, it is for sure possible to leverage HANA's graph features in context of a transportation solution. You can either use HANA natively on top of the transportation tables and expose a read-only custom UI to calculate and analyze routes. For a more tight integration into an ABAP application, you can use ABAP managed database procedures (AMDP) to call graph procedures from an ABAP program. In both cases, it is a technical integration on the database level.