SAP HANA Database as a Graph Store – Introduction
The SAP HANA database is continuously evolving, adding functionality to meet the varied needs of its users. As a database, HANA supports more than primitive data types and the defined set of operations on them. In the world of connected data, defining relationships within a data set is one of the most important aspects. Relational database management systems (RDBMSs) are one established choice for storing information such as financial records, manufacturing and logistics information, personnel data, and data for other applications. SAP HANA is at its core a columnar store optimized for relational records, which meets these needs, but that is not all. It is now also possible to model relationships between the records in a deployment as a graph store, without having to use an external store for the same purpose.
In this series of discussions, we will learn about the SAP HANA Graph Database by:
Starting with SPS12, HANA can be used as a graph database. What do we mean by a ‘graph database’ here? Let us take a quick look at what it is, and then proceed to the computational capabilities HANA provides to achieve it.
In this connected world there are no isolated pieces of information, only rich, connected domains all around us. A graph database embraces relationships as a core aspect of its data model in order to store, process, and query connections efficiently. Conventional data storage in a database computes relationships expensively at query time; a graph database, on the other hand, stores connections as first-class citizens, readily available for any “join-like” navigation operation. Accessing these already-persisted connections is an efficient, constant-time operation that allows us to traverse millions of connections per second per core. Independent of the total size of the data set, graph databases excel at managing highly connected data and complex queries. Armed only with a pattern and a set of starting points, a graph database explores the neighborhood around those starting points, collecting and aggregating information from millions of nodes and relationships, while leaving the billions outside the search perimeter untouched.
Thus, instead of writing queries that are highly recursive or that span multiple tables (both of which increase the time taken to return results in a relational structure), we turn to a different design built for quickly traversing relationships between entities: the graph database.
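To illustrate the difference between computing relationships at query time and storing them for direct navigation, here is a small Python sketch over a hypothetical edge list. It is a conceptual analogy only, not a description of how SAP HANA stores graphs internally: the join-style lookup scans every row on each call, the precomputed adjacency map answers with a dictionary access, and the traversal visits only the neighborhood of its starting vertex.

```python
from collections import defaultdict

# Hypothetical edge table: (source, target) rows, as a relational store might hold them.
SAMPLE_EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

def neighbors_by_scan(edges, vertex):
    """Join-style lookup: scans every row of the edge table on each call."""
    return [t for (s, t) in edges if s == vertex]

def build_adjacency(edges):
    """Store each vertex's outgoing connections once, so that later
    lookups are a dictionary access instead of a table scan."""
    adjacency = defaultdict(list)
    for s, t in edges:
        adjacency[s].append(t)
    return adjacency

def reachable(adjacency, start):
    """Depth-first traversal that touches only the start vertex's neighborhood."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adjacency.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

adj = build_adjacency(SAMPLE_EDGES)
print(neighbors_by_scan(SAMPLE_EDGES, "A"))  # ['B', 'C']
print(adj["A"])                               # ['B', 'C']
print(sorted(reachable(adj, "B")))            # ['B', 'D', 'E']
```

Both lookups return the same neighbors; the difference is only in how much work each call does, which is what grows with the size of the data set.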
Having understood the basic nature of a graph database, let us now look at the capabilities HANA provides to realize one.
SAP HANA Graph is an integral part of SAP HANA core functionality. It expands the SAP HANA platform with native support for graph processing and allows us to execute typical graph operations on the data stored in an SAP HANA system.
In SAP HANA, a graph is a set of vertices and a set of edges. Each edge connects two vertices; one vertex is denoted as the source and the other as the target. Edges are always directed and there can be two or more edges connecting the same two vertices. Vertices and edges can have an arbitrary number of attributes. A vertex attribute consists of a name that is associated with a data type and a value. Edge attributes consist of the same information.
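The model just described (directed edges, parallel edges allowed, arbitrary attributes on both vertices and edges) can be sketched as a small Python data structure. The class and method names below are hypothetical and only mirror the description; in HANA itself the vertices and edges live in database tables, as the rest of this series shows. Giving every edge its own key is what lets two or more edges connect the same pair of vertices.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Vertex:
    key: str
    attributes: dict[str, Any] = field(default_factory=dict)  # name -> typed value

@dataclass
class Edge:
    key: str      # each edge has its own key, so parallel edges stay distinct
    source: str   # key of the source vertex
    target: str   # key of the target vertex
    attributes: dict[str, Any] = field(default_factory=dict)

@dataclass
class Graph:
    vertices: dict[str, Vertex] = field(default_factory=dict)
    edges: dict[str, Edge] = field(default_factory=dict)

    def add_vertex(self, key, **attributes):
        self.vertices[key] = Vertex(key, attributes)

    def add_edge(self, key, source, target, **attributes):
        # Edges are directed: source and target are not interchangeable.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be existing vertices")
        self.edges[key] = Edge(key, source, target, attributes)

    def edges_between(self, source, target):
        """All edges going from source to target (there may be several)."""
        return [e for e in self.edges.values()
                if e.source == source and e.target == target]

g = Graph()
g.add_vertex("p1", name="Alice", age=34)
g.add_vertex("p2", name="Bob", age=29)
g.add_edge("e1", "p1", "p2", relation="knows")
g.add_edge("e2", "p1", "p2", relation="works_with")  # parallel edge, same direction
print(len(g.edges_between("p1", "p2")))  # 2
print(len(g.edges_between("p2", "p1")))  # 0, because edges are directed
```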
SAP HANA provides several graph algorithms that work on data defined in a graph structure and can be applied according to the user's requirements.
Let us understand each of these algorithms using an example.
Banks and insurance companies lose billions of dollars every year to fraud. Traditional methods of fraud detection play an important role in minimizing these losses. However, increasingly sophisticated fraudsters have developed a variety of ways to elude discovery, both by working together and by using various other means of constructing false identities. Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high level of accuracy, and are capable of stopping advanced fraud scenarios in real time. Understanding the connections between data, and deriving meaning from these links, does not necessarily mean gathering new data. Significant insights can be drawn from one's existing data simply by reframing the problem and looking at it in a new way: as a graph.
Insurance Fraud: Insurance fraud attracts sophisticated criminal rings that are often very effective at circumventing fraud detection measures. Once again, graph databases can be a powerful tool in combating collusive fraud. In a typical hard fraud scenario, rings of fraudsters work together to stage fake accidents and claim soft tissue injuries. These fake accidents never really happen. Such rings normally include a number of roles:
1. Providers: Collusions typically involve participation from professionals in several categories:
a. Doctors, who diagnose false injuries
b. Lawyers, who file fraudulent claims, and
c. Body shops, which misrepresent damage to cars
2. Participants: These are the people involved in the (false) accident, and normally include:
a. Drivers
b. Passengers
c. Pedestrians
d. Witnesses
Fraudsters often create and manage rings by “recycling” participants so as to stage many accidents.
Thus, one accident may have a particular person play the role of the driver, while in another accident the same person may be a passenger or a pedestrian, and in yet another a witness. Clever use of roles can generate a large number of costly fake accidents, even with a small number of participants, as shown below:
The traditional approach to discovering such a ring requires joining a number of tables in a complex schema, such as Accidents, Vehicles, Owners, Drivers, Passengers, Pedestrians, Witnesses, and Providers, and joining them together multiple times (once per potential role) in order to uncover the full picture. Because such operations are so complex and costly, particularly for very large data sets, this crucial form of analysis is often overlooked.
Graph databases are well suited to this task, because finding the fraud rings becomes a simple matter of walking the graph.
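As a minimal sketch of what "walking the graph" can mean here, assume hypothetical claim records of the form (person, role, accident). Each record becomes an edge between a person vertex and an accident vertex; a breadth-first search then collects every group of people and accidents that are connected, and any group spanning at least two accidents is flagged as a possible ring. This is plain Python rather than SAP HANA's graph engine, it ignores edge direction during the search, and it assumes person and accident identifiers never collide.

```python
from collections import defaultdict, deque

# Hypothetical claim records: (person, role, accident).
CLAIMS = [
    ("P1", "driver", "A1"), ("P2", "passenger", "A1"), ("P3", "witness", "A1"),
    ("P1", "witness", "A2"), ("P2", "driver", "A2"), ("P3", "passenger", "A2"),
    ("P2", "pedestrian", "A3"), ("P3", "driver", "A3"), ("P1", "passenger", "A3"),
    ("P7", "driver", "A9"), ("P8", "passenger", "A9"),
]

def build_graph(claims):
    """Undirected adjacency: each role links a person to an accident."""
    adj = defaultdict(set)
    for person, _role, accident in claims:
        adj[person].add(accident)
        adj[accident].add(person)
    return adj

def connected_groups(adj):
    """Breadth-first search from each unvisited vertex; each search
    collects one connected group of people and accidents."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            group.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        groups.append(group)
    return groups

def suspicious_rings(claims, min_accidents=2):
    """Return (people, accidents) for groups spanning several accidents."""
    accidents = {a for _, _, a in claims}
    rings = []
    for group in connected_groups(build_graph(claims)):
        group_accidents = group & accidents
        if len(group_accidents) >= min_accidents:
            rings.append((sorted(group - accidents), sorted(group_accidents)))
    return rings

print(suspicious_rings(CLAIMS))
# [(['P1', 'P2', 'P3'], ['A1', 'A2', 'A3'])]
```

Only P1, P2 and P3 are flagged: they reappear in different roles across three accidents, whereas P7 and P8 are linked to a single accident. The threshold of two accidents is an arbitrary choice for illustration; a real detection rule would need tuning against actual claim data.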
The figure below shows how the insurance-fraud ring scenario can be modeled as a graph data structure.
In this insurance-fraud scenario there are multiple vertices/nodes (the people involved in the act, the cars used, and the events for which insurance was claimed) and edges/relationships (the role played by each of these vertices).
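Before the actual tables are defined, it may help to see how such a scenario could be laid out as rows. The sketch below assumes hypothetical column names (KEY, TYPE, SOURCE, TARGET, ROLE) chosen for illustration only: people, cars and claims are vertices, and each directed edge carries the role played as an attribute. A small helper then reports the people who appear in more than one role, which is the "recycling" pattern described earlier.

```python
from collections import defaultdict

# Hypothetical vertex rows: people (P*), cars (C*) and insurance claims (E*).
CLAIM_VERTICES = [
    {"KEY": k, "TYPE": t}
    for k, t in [("P1", "person"), ("P2", "person"), ("P3", "person"), ("P9", "person"),
                 ("C1", "car"), ("C2", "car"), ("E1", "claim"), ("E2", "claim")]
]

# Hypothetical edge rows: each directed edge stores the role as an attribute.
CLAIM_EDGES = [
    {"KEY": 1, "SOURCE": "P1", "TARGET": "C1", "ROLE": "driver"},
    {"KEY": 2, "SOURCE": "P2", "TARGET": "C1", "ROLE": "passenger"},
    {"KEY": 3, "SOURCE": "P3", "TARGET": "E1", "ROLE": "witness"},
    {"KEY": 4, "SOURCE": "C1", "TARGET": "E1", "ROLE": "involved_in"},
    {"KEY": 5, "SOURCE": "P2", "TARGET": "C2", "ROLE": "driver"},
    {"KEY": 6, "SOURCE": "P3", "TARGET": "C2", "ROLE": "passenger"},
    {"KEY": 7, "SOURCE": "P1", "TARGET": "E2", "ROLE": "witness"},
    {"KEY": 8, "SOURCE": "P9", "TARGET": "E2", "ROLE": "witness"},
    {"KEY": 9, "SOURCE": "C2", "TARGET": "E2", "ROLE": "involved_in"},
]

def validate(vertices, edges):
    """Reject edges whose source or target is not a known vertex key."""
    keys = {v["KEY"] for v in vertices}
    for e in edges:
        if e["SOURCE"] not in keys or e["TARGET"] not in keys:
            raise ValueError(f"edge {e['KEY']} references an unknown vertex")

def people_with_multiple_roles(vertices, edges):
    """Collect the distinct roles on each person's outgoing edges and
    keep only the people who played more than one role."""
    people = {v["KEY"] for v in vertices if v["TYPE"] == "person"}
    roles = defaultdict(set)
    for e in edges:
        if e["SOURCE"] in people:
            roles[e["SOURCE"]].add(e["ROLE"])
    return {p: sorted(r) for p, r in roles.items() if len(r) > 1}

validate(CLAIM_VERTICES, CLAIM_EDGES)
print(people_with_multiple_roles(CLAIM_VERTICES, CLAIM_EDGES))
# {'P1': ['driver', 'witness'], 'P2': ['driver', 'passenger'], 'P3': ['passenger', 'witness']}
```

P9, who appears only once as a witness, is not reported; the three people who switch roles between the two claims are.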
In the next discussion, mentioned below, we will create the graph database tables for these vertices and edges, along with the other graph objects needed to uncover the fraud ring: