SAP HANA VORA & Hadoop
As a data architect at one of the big utility companies in Australia, I was wondering whether my company should be considering VORA. We could certainly ask SAP to give us a presentation, but I decided to do a bit of digging myself first. I would like to share my thoughts and observations, in the hope that they will help you or your organization in some shape or form.
As VORA is a new product and information about it is not yet widely available, there are quite a few questions for which answers are hard to find. I have tried to collate this information in one place, from an analyst, architect, and BI manager perspective.
“Let’s start with the basics”
What does VORA mean?
VORA comes from the Latin root of “VORAcious”, or in other words “big”. According to comments from an SAP spokesman, the product was given this name because it can consume large amounts of data.
What does it do?
VORA is an in-memory query engine that plugs into the Apache Hadoop framework to provide interactive analysis. It uses the Spark SQL library and the HANA compute engine.
How does it do it?
HANA VORA is a combination of Hadoop/YARN (resource allocation), Spark (in-memory query engine), and HANA's push-down query delegation capabilities. VORA handles OLAP analysis and hierarchical queries very well because it layers a few enhancements on top of Spark SQL. VORA can run on a standalone basis on one of the Hadoop nodes, but it can also integrate with classic HANA. Classic HANA integration will of course incur infrastructure cost, but Hadoop integration should cost next to nothing in terms of infrastructure.
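To give a feel for what a hierarchical query computes, here is a minimal sketch in plain Python. It is not VORA's API or syntax. The region hierarchy, the readings, and the `roll_up` helper are all hypothetical and only illustrate the idea of rolling leaf-level values up a parent-child hierarchy.

```python
from collections import defaultdict

# Hypothetical parent-child hierarchy: node -> parent (None marks the root).
PARENT = {
    "Australia": None,
    "Victoria": "Australia",
    "New South Wales": "Australia",
    "Melbourne": "Victoria",
    "Geelong": "Victoria",
    "Sydney": "New South Wales",
}

# Hypothetical fact rows: (leaf node, consumption in MWh).
READINGS = [("Melbourne", 120), ("Geelong", 30), ("Sydney", 200), ("Melbourne", 50)]


def roll_up(parent, readings):
    """Add each reading to its own node and to every ancestor of that node."""
    totals = defaultdict(int)
    for node, value in readings:
        current = node
        while current is not None:
            totals[current] += value
            current = parent[current]
    return dict(totals)


totals = roll_up(PARENT, READINGS)
print(totals["Victoria"])   # 200
print(totals["Australia"])  # 400
```

An engine with hierarchy support lets you ask for this kind of roll-up in the query itself, instead of hand-writing the traversal as above.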
“We’re taking the lessons learned with what we’ve done with HANA, the real-time, interactive experiences which you can do in the enterprise cloud and applying this to Hadoop,” Tsai said. “But it’s not just making Hadoop interactive …. a lot of people are working on that; but how you also provide those real-time, interactive experiences and that business semantic understanding in Hadoop, and I think that’s the biggest thing that SAP has put in.”
What are the specific features of VORA vs Apache Spark?
VORA is an extension to the Hadoop platform and includes the following features in its first version:
- Accelerated In-Memory processing
- Compiled Queries
- Support for Scala, Python and Java
- HANA and Hadoop mash-ups
- Support for HDFS, Parquet and ORC
- NUMA awareness
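To make the “HANA and Hadoop mash-ups” item more concrete, the sketch below joins a small business table with a table of raw events in a single SQL query. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere. It is not VORA or Spark SQL, and the table names, column names, and data are hypothetical.

```python
import sqlite3

# In-memory database standing in for two separate systems (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for business data that would live in an enterprise system.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    -- Stand-in for raw event data that would live in Hadoop.
    CREATE TABLE meter_events (customer_id INTEGER, kwh REAL);
    INSERT INTO customers VALUES (1, 'Residential'), (2, 'Commercial'), (3, 'Residential');
    INSERT INTO meter_events VALUES (1, 10.0), (1, 15.0), (2, 100.0), (3, 5.0), (2, 50.0);
""")

# The mash-up: one query that combines both sources and aggregates per segment.
MASHUP_SQL = """
    SELECT c.segment, SUM(e.kwh) AS total_kwh
    FROM customers AS c
    JOIN meter_events AS e ON e.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
"""
rows = conn.execute(MASHUP_SQL).fetchall()
print(rows)  # [('Commercial', 150.0), ('Residential', 30.0)]
```

The point of the sketch is only the shape of the query: business context and raw events are combined in one statement, without first copying one data set into the other system.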
Is VORA based on SAP HANA?
No. VORA is a completely new code base, but it was built by the same group as the HANA engineering team, so many concepts and ideas have been borrowed from SAP HANA, as the feature list shows. VORA and SAP HANA can exist separately.
Who will benefit by using SAP HANA VORA?
SAP HANA VORA will deliver the most value to people in the following positions:
- Business analysts can perform root cause analysis using interactive queries across both business and Hadoop data to better understand business context.
- Data scientists can discover patterns by trying new modelling techniques with a combination of business and Hadoop data, all without duplicating data copies within data lakes.
- Software developers can deploy a query engine within applications that can span enterprise and Hadoop systems using familiar programming tools.
What type of licenses are there and how much will it cost (just the application)?
What are the challenges which SAP is trying to address using VORA?
Currently, the batch-process-based tools in the Hadoop landscape do not provide a fast drill-down mechanism to slice and dice the data. VORA will complement the stack of tools that Hadoop-enabled enterprises already have.
When is SAP releasing VORA to the market?
SAP VORA will be released on 18th September 2015. In line with SAP's roadmap and strategic direction, it will be available in the cloud first. I expect all license types to be available from 18th September; if there is a delay, it should affect only the on-premise version.
Integration to Hadoop
As you can guess from the screenshot below, SAP HANA VORA will be available as a configurable tool within the Hadoop landscape. The question that now arises concerns the enterprise Hadoop distributions, e.g. Hortonworks and Cloudera: when are they going to accept VORA and release it into their landscapes?
Steve Lucas, president of SAP’s Platform Products Group, mentioned in a conversation with “Fortune” that VORA is meant to augment and speed up queries on unstructured data, not to displace Apache Spark.
What are the high-level differences between SAP HANA, VORA and Apache Spark?
In my view, SAP VORA will be a good addition for companies that are already on SAP platforms. Such companies can integrate their transactional, data lake, and other data sources into VORA and create mash-up queries for deep-dive and interactive analysis. For others, I recommend exploring the other tools available in the big data space, although they can certainly also consider buying VORA, which is a commercial product offered separately from HANA.
If you have any questions, feel free to reach out to me.
& SAP Product guide