SAP HANA Vora (v 1.2) Introduction – Part I
SAP HANA Vora is an SAP solution for “In-Memory processing and analysis of Big Data” on top of the Hadoop platform. Here is a short introduction to the concepts of SAP HANA Vora (version 1.2), the architecture of its services, and how SAP HANA Vora works with Big Data on Hadoop.
In computing hardware, Moore's law predicted that the number of transistors packed into a dense integrated circuit would double approximately every two years. One unanticipated result of this rapid growth in computing power is a huge upsurge in the amount of data that people and their smart devices (such as those on the Internet of Things) generate every day. Even with the increased computing power that comes with it, this tremendous growth in data still far exceeds the speed at which users can analyze it.
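As a quick worked example of the doubling rule above (often called Moore's law), let N_0 be the transistor count in some starting year and t the number of years elapsed; both symbols are introduced here only for illustration. With a two-year doubling period:

```latex
N(t) = N_0 \cdot 2^{t/2}, \qquad
N(10) = N_0 \cdot 2^{10/2} = 32\,N_0
```

In other words, a single decade of doubling every two years yields roughly a 32-fold increase.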
Data is usually considered Big Data when it has the three well-known V's: Volume, Velocity, and Variety (of structured and unstructured data). Hadoop is one Big Data platform that provides a less expensive option for storing and analyzing data at this scale. Hadoop distributes data across many cheaper commodity machines instead of the usual high-end servers, breaking with the long-held assumption that enterprise data must live on highly reliable, expensive servers. Performance is not compromised, because processing is distributed across multiple nodes working in parallel, and nodes can easily be added to increase performance as needed. This scale-out design, built on many nodes, is the high-level architecture of the Hadoop Big Data ecosystem.
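The idea of splitting data across nodes and processing the pieces in parallel can be sketched in plain Python. This is not Hadoop code: the function names are hypothetical, fixed-size list slices stand in for HDFS blocks, and worker threads stand in for commodity nodes. It only illustrates the pattern of processing each block locally and then merging the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Split a dataset into fixed-size blocks (loosely like HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def process_block(block):
    """Work done locally on one 'node': count words in its own block only."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def distributed_word_count(records, block_size=2, num_nodes=3):
    """Process all blocks in parallel, then merge the partial counts."""
    blocks = split_into_blocks(records, block_size)
    # Each worker thread stands in for one commodity node in the cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partial_counts = list(pool.map(process_block, blocks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # adds counts rather than overwriting them
    return total


if __name__ == "__main__":
    data = [
        "big data on hadoop",
        "hadoop runs on commodity hardware",
        "spark processes data in memory",
        "vora extends spark",
    ]
    print(distributed_word_count(data).most_common(3))
```

Adding nodes in this model simply means raising num_nodes. The merge step stays the same, which is why scale-out systems favor work that can be split into independent pieces.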
In 2015, SAP introduced a new solution for analyzing Big Data called SAP HANA Vora. SAP HANA Vora has an in-memory data-processing engine that can be integrated into the Hadoop Big Data ecosystem and the Apache Spark execution framework. Apache Spark is a general-purpose in-memory data-processing engine that is compatible with data distributed on Hadoop.
The SAP HANA Vora engine is designed for use with large distributed file systems that handle Big Data. It boosts performance by processing data in memory, and it provides online analytical processing (OLAP)-style capabilities for multi-dimensional analysis, including hierarchical reporting. It also enables tighter integration and faster consumption of Big Data from Hadoop environments together with other solutions, such as SAP HANA. Although Hadoop is an open-source project from Apache, commercial Hadoop distributions are available from many vendors. At the time of writing, SAP HANA Vora is supported on these commercial distributions:
- Hortonworks Data Platform (HDP)
- Cloudera Enterprise (CDH)
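The hierarchical reporting mentioned above boils down to rolling measures up a parent-child hierarchy. The following plain-Python sketch shows only that concept; it is not SAP HANA Vora's hierarchy syntax or API, and the rollup function, the region hierarchy, and the revenue figures are hypothetical examples.

```python
from collections import defaultdict


def rollup(parent_of, leaf_values):
    """Aggregate leaf measures up a parent-child hierarchy.

    parent_of maps each node to its parent (None for the root); the
    hierarchy is assumed to be a tree with no cycles.
    leaf_values maps leaf nodes to a measure such as revenue.
    Returns the subtotal for every node that has data beneath it.
    """
    totals = defaultdict(int)
    for node, value in leaf_values.items():
        # Walk from the leaf up to the root, adding the value at each level.
        current = node
        while current is not None:
            totals[current] += value
            current = parent_of[current]
    return dict(totals)


if __name__ == "__main__":
    parent_of = {
        "World": None,
        "EMEA": "World",
        "APJ": "World",
        "Germany": "EMEA",
        "France": "EMEA",
        "Japan": "APJ",
    }
    revenue = {"Germany": 120, "France": 80, "Japan": 50}
    print(rollup(parent_of, revenue))
```

A report can then show any level of the tree (country, region, or world) from the same result, which is the essence of drilling up and down in hierarchical analysis.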
SAP HANA Vora plugs into the general-purpose in-memory data-processing engine Apache Spark, taking advantage of the Spark execution framework on top of Hadoop to analyze Big Data interactively. SAP HANA Vora can function in two major scenarios:
- By itself on top of Hadoop (no need for SAP HANA)
- Together with SAP HANA, to bring SAP HANA data into Hadoop, Hadoop data into SAP HANA, or both (bidirectional integration)
In the business scenario used in this article, Hadoop needs to federate its Big Data with the enterprise data in SAP HANA. Here, SAP HANA Vora can consume both the Big Data in Hadoop (using the Apache Spark execution framework) and the enterprise data in SAP HANA, providing a single platform for merging the data for combined analysis. This lets data scientists and developers quickly analyze their datasets in Hadoop by combining them with the enterprise data stored in the SAP HANA database.
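To make "merging the data for combined analysis" concrete, here is a minimal, engine-agnostic sketch in plain Python. It is not the SAP HANA Vora or Apache Spark API: the Customer type, the combine function, and the sample records are hypothetical. A list of (customer_id, page_views) tuples stands in for click-stream data in Hadoop, and a list of Customer objects stands in for master data in SAP HANA. The sketch shows the core of a federated query: join the two sources on a shared key, then aggregate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Stands in for an enterprise master-data record held in SAP HANA."""
    customer_id: int
    name: str
    region: str


def combine(clicks, customers):
    """Inner hash join of Big Data events with master data, then sum per region."""
    # Index the smaller (enterprise) side by its key.
    by_id = {c.customer_id: c for c in customers}
    totals = {}
    # Stream the larger (Big Data) side and look up each key.
    for customer_id, page_views in clicks:
        customer = by_id.get(customer_id)
        if customer is None:
            continue  # inner join: drop events with no matching master record
        totals[customer.region] = totals.get(customer.region, 0) + page_views
    return totals


if __name__ == "__main__":
    clicks = [(1, 10), (2, 5), (1, 3), (3, 7)]  # hypothetical Hadoop-side events
    customers = [Customer(1, "Acme", "EMEA"), Customer(2, "Globex", "APJ")]
    print(combine(clicks, customers))
```

Indexing the small side and streaming the large side is similar in spirit to the broadcast hash joins that Spark can use when one table is much smaller than the other; a real engine would also distribute the large side across nodes.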
The picture below shows the main components of a Hadoop environment with SAP HANA Vora installed.
For this scenario, before SAP HANA Support Package Stack (SPS) 10, SAP HANA connected to Big Data through Open Database Connectivity (ODBC) connections for Smart Data Access (SDA). Starting with SPS 10, SAP HANA could consume Big Data using the Apache Spark Controller to connect to the Hadoop platform. With the SAP HANA SPS 11 release, SAP HANA Vora, released as version 1.0, became another option. This version still uses the Apache Spark Controller (Spark SQL adapter) to connect to the Hadoop platform. However, the connection now goes to the SAP HANA Vora services running in the Hadoop environment, instead of depending on Apache Spark and the Hive Metastore as it did with SAP HANA SPS 10. With this in place, data is available for bidirectional consumption, from either Hadoop or SAP HANA, in a federated environment of SAP HANA and the Hadoop platform.
I will cover the SAP HANA Vora architecture in the next post, Part II.