SAP HANA VORA & Hadoop
As a “Data Architect” at one of the big utility companies in Australia, I was wondering whether we, as a company, should be considering VORA or not. Certainly we could get SAP to give us a short presentation, but I thought I would do a bit of digging myself. I would like to share my thoughts and observations with you, and I hope they will assist you or your organization in some shape or form.
As VORA is a new product and information about it is not yet widely available in the market, there are quite a few questions for which finding an answer is tricky. I have tried to consolidate this information in one place from an analyst, architect and BI manager perspective.
“Let’s start with the basics”
What does VORA mean?
VORA comes from the same Latin root as “VORAcious” (vorare, to devour), in other words a big appetite. According to comments from an SAP spokesman, it was given this name because VORA can consume large amounts of data.
What does it do?
VORA is an in-memory query engine that plugs into the Apache Hadoop framework to provide interactive analysis. It uses the Spark SQL library and the HANA compute engine.
How does it do it?
HANA VORA is a combination of Hadoop/YARN (resource allocation), Spark (in-memory query engine) and HANA push-down query delegation capabilities. VORA handles OLAP analysis and hierarchical queries very well because it layers a few enhancements on top of Spark SQL. VORA can run on a standalone basis on one of the Hadoop nodes, but it can also integrate with classic HANA. Classic HANA integration will of course incur infrastructure cost, but Hadoop integration should cost next to nothing in terms of infrastructure.
“We’re taking the lessons learned with what we’ve done with HANA, the real-time, interactive experiences which you can do in the enterprise cloud and applying this to Hadoop,” Tsai said. “But it’s not just making Hadoop interactive …. a lot of people are working on that; but how you also provide those real-time, interactive experiences and that business semantic understanding in Hadoop, and I think that’s the biggest thing that SAP has put in.”
What are the specific features of VORA vs Apache Spark?
VORA is an extension to the Hadoop platform and includes the following features in its first version:
- Accelerated In-Memory processing
- Compiled Queries
- Support for Scala, Python and Java
- HANA and Hadoop mash-ups
- Support for HDFS, Parquet and ORC
- NUMA awareness
Is VORA based on SAP HANA?
No, VORA is a completely new code base, but the engineering team is the same group as the HANA engineering team, so many concepts and ideas have been borrowed from SAP HANA, as you can see from the feature list. VORA and SAP HANA can exist separately.
Who will benefit by using SAP HANA VORA?
SAP HANA VORA will deliver the most value to people in the following positions:
Business analysts can perform root cause analysis using interactive queries across both business and Hadoop data to better understand business context.
Data scientists can discover patterns by trying new modelling techniques with a combination of business and Hadoop data, all without duplicating data copies within data lakes.
Software developers can deploy a query engine within applications that can span enterprise and Hadoop systems using familiar programming tools.
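To make the idea of a query that spans business and Hadoop data a little more concrete, here is a minimal sketch in Python. It does not use VORA, Spark or HANA at all: an in-memory SQLite database stands in for the query engine, and the table names (customers, meter_events), columns and values are all made up for illustration. It only shows the general shape of a mash-up query, not how VORA itself is programmed.

```python
import sqlite3

# Assumption: SQLite is only a stand-in engine; VORA is not involved here.
conn = sqlite3.connect(":memory:")

# Hypothetical "business" table, the kind of data that would live in an enterprise system.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT)")
# Hypothetical "Hadoop" table, the kind of raw event data that would live in a data lake.
conn.execute("CREATE TABLE meter_events (customer_id INTEGER, kwh REAL)")

conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "residential"), (2, "business"), (3, "residential")],
)
conn.executemany(
    "INSERT INTO meter_events VALUES (?, ?)",
    [(1, 2.0), (1, 3.0), (2, 10.0), (3, 1.5)],
)

# The mash-up: join business context onto event data and aggregate per segment.
rows = conn.execute(
    """
    SELECT c.segment, SUM(m.kwh) AS total_kwh
    FROM customers AS c
    JOIN meter_events AS m ON m.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
    """
).fetchall()

for segment, total_kwh in rows:
    print(segment, total_kwh)
```

The point of the sketch is that the analyst writes one query and does not have to copy the event data into the business system first.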
What type of licenses are there and how much will it cost (just the application)?
What are the challenges which SAP is trying to address using VORA?
Currently, the “batch process” based tools in the Hadoop landscape do not provide a fast drill-down mechanism to slice and dice the data. VORA will complement the stack of tools that Hadoop-enabled enterprises already have.
When is SAP releasing VORA to the market?
SAP VORA will be released on 18th September 2015. As per the SAP roadmap and strategic direction, it will be available in the cloud first. I am expecting all types of licenses to be available from 18th September, but if there is a delay, it could only be to the on-premise version.
Integration to Hadoop
As you can guess from the screenshot below, SAP HANA VORA will be available as a configurable tool within the Hadoop landscape. The question that now arises concerns the enterprise Hadoop distributions, e.g. HORTONWORKS and CLOUDERA: when are they going to accept and release this into their landscape?
Steve Lucas, president of SAP’s Platform Products Group, mentioned in his conversation with “Fortune” that VORA is meant to augment and speed up queries on unstructured data, not to displace Apache Spark.
What are the high level differences between SAP-HANA, VORA and Apache SPARK?
In my view, SAP VORA will be a good addition for companies that are already on SAP platforms. Such companies can integrate their transactional, lake and other data sources into one VORA and create mash-up queries for deep-dive and interactive analysis. For others, I recommend exploring the options for a tool within the big data space, or they can certainly consider buying VORA, which is a commercial product offered separately from HANA.
If you have any questions, feel free to reach out to me.
SAP HANA VORA & HADOOP | Amandeep Modgil | LinkedIn
& SAP Product guide
This is the first time I am hearing of 'Vora'. A Google search returned only this link: "SAP HANA Vora | Hadoop In-Memory Query | SAP". Is there a more detailed document or technical overview to learn more about SAP VORA?
I know it's a bit tough without much information available online (SCN, Google, etc.).
What I know is that the VORA platform is at an early stage and still needs a bit of effort to be fully integrated into HANA Studio. For example, at present you need to write queries in "Zeppelin" (a notebook tool from the Hadoop tool set) to use VORA, which in turn uses the SPARK framework. All I can say at this stage is that it's better to wait until mid next year, or Q2 next year, and see how it evolves as a product. It might not require Zeppelin at all.
Apologies again that I could not help with the documentation.
Maybe someone from SAP itself will be able to help.
I will for sure post to this blog if I come across anything.
I would say the documentation is extremely lacking. Right now I'm just trying to figure out what was actually added to Spark, as all of the HANA Academy videos show "Vora" being used via the Spark APIs. Specifically, I am wondering what a "Vora" table is. Are they temporary? Are they just data frames registered as tables? Where are they stored? Is there any relationship to the HiveContext for this?
You are right, Jared. I looked at the presentation on this on the TechEd 2015 site, and again there's not a lot of information on what exactly Vora does.
Waiting for the replay to be made available for this session "ST110:SAP HANA Vora -- Why We Developed It and Where We Plan to Go with It"
What a fancy name. I'm looking forward to seeing what is brought to the public and what the roadmap shows.
No need to wonder, as there are 49 YouTube videos trying to explain it for you: SAP HANA Vora - YouTube. Both Amazon and IBM (and probably others) also have their own products wrapped around Spark.
In the document http://go.sap.com/docs/download/2015/08/aebfe277-3d7c-0010-82c7-eda71af511fa.pdf it is mentioned that SAP HANA Vora Developer Edition is available for free on SAP HANA Cloud Platform.
I want to build some use cases on HANA Vora. Could you please tell me where I can find HANA Vora on SAP HANA Cloud Platform?
At the end of the blog, Balaji mentions a link where you can register.
Thank you for sharing SAP-HANA-Vora posting....
Can you share any thoughts on how VORA is integrated with the SAP BI landscape?
I don't think that the code base is the same for both applications/landscapes. The use case for VORA is quite different from that of the BI toolsets (different audiences). You will be able to use HANA and HADOOP through VORA, but at this stage you won't be able to use all of the BI tools via just one tool such as VORA. On the other hand, you can connect to HADOOP, HANA, BW, Excel, etc. using a combination of BI tools (e.g. Universe, Webi).
Hope this helps.
My apologies, everyone, that I was not able to reply for so long. There was some issue with my SCN account. I think I have too many of them, I suppose 🙂.
Most of your questions have been answered by others. For those whose questions have been left unanswered, I will try to address them soon.
Can anybody highlight the comparison between SDA and VORA? How is VORA more beneficial than using SDA?
May I know the reason for this comparison? These are two totally different entities altogether.
Let me start by explaining what SDA and VORA are:
SDA: Smart Data Access. Using this, HANA can connect to many databases such as Sybase ASE, IQ, Oracle, HADOOP, etc.
VORA: An in-memory query processing engine with a few OLAP and hierarchy handling capabilities similar to HANA (a lot more to come in the future), which sits on top of HADOOP and SPARK.
So, from this explanation, what can be concluded is:
SDA is used to connect other databases to HANA, and the SDA feature always comes in conjunction with HANA.
VORA currently can work only with HADOOP and it doesn't require HANA.
Let's walk through three use cases to understand it better.
The customer has some data in a database such as ORACLE, IQ or ASE, and also has some data in HANA for data analysis. Now there is a requirement to do ad hoc analysis on the data that resides outside HANA together with the HANA data. In this case, the other database can be connected to HANA via SDA, the data can be virtualized in HANA, and the analysis can be done there. Here only SDA can be used.
The customer has some data in HADOOP and also has some data in HANA for data analysis. Now there is a requirement to do data analysis on the HADOOP data along with the HANA data. The customer has two options:
• Connect HANA to HADOOP via VORA
• Connect HANA to HADOOP via SDA
So here the difference between SDA connectivity and VORA connectivity is this: when connecting via SDA, data is extracted from HADOOP and processed in HANA. When connecting via VORA, the HADOOP data is processed in the VORA engine and only the processed output is handed to HANA. So connecting via VORA should be faster than connecting via SDA.
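The difference described above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: plain Python lists stand in for HADOOP data, no SAP or Spark API is used, and the function names (extract_then_process, process_then_ship) and the sample readings are hypothetical. It simply counts how many rows would have to travel to HANA in each approach.

```python
from collections import defaultdict

# Hypothetical readings stored "in HADOOP": (region, kWh). Values are made up.
HADOOP_ROWS = [
    ("north", 10.0),
    ("south", 4.0),
    ("north", 6.5),
    ("east", 3.0),
    ("south", 1.0),
]


def aggregate(rows):
    """Sum kWh per region."""
    totals = defaultdict(float)
    for region, kwh in rows:
        totals[region] += kwh
    return dict(totals)


def extract_then_process(rows):
    """SDA-style path as described in the reply: move every row, then aggregate on the HANA side."""
    shipped = list(rows)  # all raw rows cross the network
    return aggregate(shipped), len(shipped)


def process_then_ship(rows):
    """VORA-style path as described in the reply: aggregate next to the data, move only the result."""
    result = aggregate(rows)  # the work happens where the data lives
    return result, len(result)  # one row per region crosses the network


sda_result, sda_rows_moved = extract_then_process(HADOOP_ROWS)
vora_result, vora_rows_moved = process_then_ship(HADOOP_ROWS)

print("same answer:", sda_result == vora_result)
print("rows moved (extract first):", sda_rows_moved)
print("rows moved (process first):", vora_rows_moved)
```

Both paths give the same totals; what changes is how much data is moved, and with real data volumes that gap is where the expected speed-up would come from.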
The customer doesn’t have HANA in its landscape and its data resides in HADOOP. To do faster data analysis and OLAP-style processing on a large data set in HADOOP, the customer can use VORA. As AMANDEEP explains very clearly in the document, it is a component that sits on top of SPARK & HADOOP. I think SAP LUMIRA already has direct connectivity to VORA, which means that with LUMIRA the customer can analyze and visualize HADOOP data using VORA.
And as shown in AMANDEEP's screenshot, VORA will be a configurable tool in HADOOP like the other tools in HADOOP.
Hope this clears your doubt…
Thanks & Regards
This is nicely written and explained. If you can turn this into a blog or document, it would help more people understand Vora's placement on the big data map.
Thanks for your compliment. I will document it when time permits 🙂
Should this statement:
VORA currently can work only with HADOOP and it doesn't require HANA.
read as follows:
VORA currently can work only with HADOOP and SPARK and it doesn't require HANA.
very good questions though. thanks, greg
I am a little confused after reading the second statement. Can Vora run on Hadoop without Spark?
@Gregory Misiorek: Thanks for correcting my statement.
@Benedict Venmani: Yes, VORA currently requires SPARK; it cannot run directly on HADOOP as of today.
Thanks & Regards
Thanks Dinesh. That was a descriptive answer, but I am still confused about the use case.
1. Neither VORA nor SDA pulls the data physically from HADOOP into HANA; it is just virtualization. So why should we spend additional money on VORA? SDA also has a Spark adapter for rapid processing, so I can process the data in HANA itself after extracting it from HADOOP, because the data has to be extracted in a structured way anyway.
2. What exactly do we mean by hierarchy handling capabilities? Can you please clarify in more detail?
How does SAP HANA VORA organize massive volumes of unstructured data into data hierarchies? Is it through programming?
3. Considering the scenario where we do not have HANA in place but only Hadoop, is there any other tool that gives nearly the same benefits as VORA? In other words, why should customers who only have HADOOP datasets and no SAP go for an SAP product like VORA?
It's tough to convince you 😉 .. Just kidding. I will give it another try.
Find my answers below:
1) Personally, I agree with you. I would not buy a VORA license just to connect HANA to HADOOP and extract data. As you said, I would just use SDA with the SPARK adapter.
But as a customer, if I started out in Case 3 (HADOOP only, no HANA), had already purchased VORA, and later moved into case two (HADOOP plus HANA), then I would prefer VORA connectivity over SDA.
2) Let's assume the customer is very used to BW (OLAP) style query analysis, for example handling hierarchical data, unit or currency conversion, etc. VORA will provide a design or coding interface to do this easily.
3) Why should non-SAP customers go for VORA? Good question. This product is relatively new to the market, so I really don't have a convincing answer for you where I can highlight the features. But I can say this product is evolving, and I hope it will gain good features in the future that make unstructured data processing easy and at the same time provide good performance.
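Coming back to point 2), here is a rough idea of what "handling hierarchical data" means in practice, as a minimal Python sketch. It is not VORA syntax and does not use any SAP or Spark API; the hierarchy (cities under states under a country), the sales figures and the function name rollup are all hypothetical. It shows the kind of roll-up that an OLAP-style hierarchy query performs: every value booked at a leaf is also counted for each of its ancestors.

```python
# Hypothetical hierarchy stored as child -> parent (None marks the root).
PARENT = {
    "Sydney": "NSW",
    "Newcastle": "NSW",
    "Melbourne": "VIC",
    "NSW": "Australia",
    "VIC": "Australia",
    "Australia": None,
}

# Hypothetical leaf-level facts, e.g. sales booked per city.
SALES = {"Sydney": 120, "Newcastle": 30, "Melbourne": 90}


def rollup(parent, facts):
    """Add each leaf value to the leaf itself and to every ancestor above it."""
    totals = {node: 0 for node in parent}
    for leaf, value in facts.items():
        node = leaf
        while node is not None:  # walk up until we pass the root
            totals[node] += value
            node = parent[node]
    return totals


totals = rollup(PARENT, SALES)
for node in ("Australia", "NSW", "VIC", "Sydney"):
    print(node, totals[node])
```

A BW user gets this kind of aggregation from the hierarchy definition without writing the loop; the expectation stated above is that VORA offers something comparable on HADOOP data.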
Thanks & Regards
Excellent comparison; it clarified my doubts about HANA SDA versus the HANA Vora engine.
I have a few questions.
HANA uses the SAP HANA Spark controller as a connector to communicate with the Vora engine that sits on Apache Spark. This way, HANA shares data with, and consumes data from, Hadoop. What is the functionality of the Apache Spark controller, and how does it differ from the SAP HANA Spark controller?
I'm looking for industry-specific use cases for HANA + Hadoop. Could you please share some?
Thanks & Regards,
You can refer to this document for multiple industry scenarios from a business perspective.
Key scenarios for Hadoop that we are working on at the moment; the template below is from the SAP guideline.
Thanks for the information. I learned something from this SAP HANA Vora blog.