What Is VORA and How It Helps to Bridge the Gap Between Enterprise Data and Big Data
I have written this blog to address beginners' doubts about VORA and to give an overall picture of its benefits.
Before getting into VORA itself, let's first try to understand what Enterprise Data, Big Data, HADOOP and SPARK are.
What is Enterprise Data – Data that comes from day-to-day business transactions, e.g. sales orders, purchase orders, etc.
What is Big Data – Data that comes from information-sensing mobile devices, aerial remote sensing, software logs, cameras, microphones, radio-frequency identification (RFID) readers, wireless sensor networks, social media and archived data.
Generally, Enterprise Data is stored on expensive hardware, while Big Data is stored on distributed, less expensive commodity hardware.
Until about 5 years ago, for any company, Enterprise Data was "Must-have Data" and Big Data was "Good-to-have Data". But current studies from Gartner and Harvard show that Big Data is now also becoming part of the "Must-have Data", because analytics on Big Data reveals insights that help a business to grow and give it an edge over its competitors.
As soon as Big Data becomes part of the "Must-have Data", we face two major issues:
- Big Data is stored in a less expensive, distributed environment, and running complex analytical queries in such an environment will not yield good query performance.
- Reports that demand a combination of Enterprise Data and Big Data are very challenging, because the two kinds of data reside in different landscapes.
VORA really helps to address these two problems and bridges the gap between Enterprise data and Big Data.
What is VORA – To understand VORA, we first have to understand HADOOP and SPARK.
HADOOP – It is open source software for distributed computing. When you want to store a huge volume of data in a distributed landscape, HADOOP basically does the following:
- HADOOP helps you to create a distributed landscape by combining multiple systems in your landscape.
- HADOOP helps to distribute the data and the processing load across multiple systems (load distribution).
- HADOOP supports High Availability by providing an automatic failover feature (i.e. if any one node goes down, a backup node takes over automatically).
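The three points above can be illustrated with a small toy sketch in plain Python. It is not Hadoop code and does not use any Hadoop API; the node names, the block size and the round-robin placement are assumptions made up for illustration only. It splits the data into blocks, stores each block on two "nodes", processes the blocks separately and still produces the same answer when one node is marked as failed.

```python
from collections import Counter


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks, similar to how a distributed file system cuts a file."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_blocks(blocks, nodes, replicas=2):
    """Store every block on `replicas` different nodes (simple round-robin, assumed for this toy)."""
    placement = {node: {} for node in nodes}
    for block_id, block in enumerate(blocks):
        for r in range(replicas):
            node = nodes[(block_id + r) % len(nodes)]
            placement[node][block_id] = block
    return placement


def count_words(placement, n_blocks, failed=()):
    """Process each block on the first live node holding a copy, then merge the partial results."""
    total = Counter()
    for block_id in range(n_blocks):
        holders = [n for n, blks in placement.items() if block_id in blks and n not in failed]
        if not holders:
            raise RuntimeError(f"block {block_id} is unavailable")
        block = placement[holders[0]][block_id]
        partial = Counter(word for line in block for word in line.split())  # work done per block
        total.update(partial)  # combine the partial counts
    return total


# Hypothetical sample data and node names.
records = ["sales order", "purchase order", "sensor log", "sales log"]
nodes = ["node1", "node2", "node3"]

blocks = split_into_blocks(records, block_size=2)
placement = place_blocks(blocks, nodes)

healthy = count_words(placement, len(blocks))
degraded = count_words(placement, len(blocks), failed={"node1"})  # simulate one node going down
print(healthy == degraded)
```

Because every block has a second copy on another node, losing `node1` does not change the result, which is the idea behind the failover point in the list.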
HADOOP works just one layer above the operating system and does distributed computing using the Hadoop Distributed File System (HDFS). Hence HADOOP handles data in terms of files. In most cases it is not easy to process data when it is stored in an unstructured file format.
So we need some software to structure the data. In our traditional systems we always structure the data files using software like MySQL, ORACLE, DB2, etc. Similarly, to structure the HDFS files we need some software. A few examples of such software are:
- Apache Spark is a fast and general engine for large-scale data processing. Spark combines SQL, streaming, and complex analytics.
- SAP HANA Vora™ is an in-memory query engine that plugs into the Apache Spark execution framework to provide enriched interactive analytics on Hadoop.
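To make the idea of "structuring files so they can be queried with SQL" more concrete, here is a minimal sketch. It deliberately uses Python's built-in `sqlite3` module as a stand-in SQL engine; it is not Spark and not Vora, and the file content, table name and column names are invented for the example.

```python
import csv
import io
import sqlite3

# Raw, file-style data as it might sit in a file system (made-up sample content).
raw_file = io.StringIO(
    "device_id,reading,unit\n"
    "d1,21.5,C\n"
    "d2,19.0,C\n"
    "d1,22.5,C\n"
)

# Step 1: declare a structure (schema) for the file content.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (device_id TEXT, reading REAL, unit TEXT)")

# Step 2: load the raw lines into that structure.
rows = [(r["device_id"], float(r["reading"]), r["unit"]) for r in csv.DictReader(raw_file)]
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?, ?)", rows)

# Step 3: the data can now be processed with plain SQL.
avg_by_device = conn.execute(
    "SELECT device_id, AVG(reading) FROM sensor_readings "
    "GROUP BY device_id ORDER BY device_id"
).fetchall()
print(avg_by_device)
```

The point is only the workflow: once a schema is placed on top of the raw file, questions can be asked in SQL instead of by parsing lines by hand.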
So just to summarize:
1st – A couple of hardware machines with a homogeneous OS are combined using software called HADOOP to perform distributed computing in a better way.
2nd – To structure the HDFS files and to process the data via SQL, we need software that provides an SQL engine, called SPARK.
3rd – To process the data in memory and to have an interactive interface to model and process the data, we need software called VORA.
SAP HANA Vora can work as a stand-alone solution or in concert with the SAP HANA platform to extend enterprise-grade analytics to Hadoop clusters, as depicted in Figure 1.
VORA 1.2 comes with a modelling tool through which the user can perform the following activities:
- Data Browser – allows you to view the tables, views, dimensions and cubes available in the Vora engine. It also allows you to preview the data, download it as a CSV file, filter the columns and refresh them.
- SQL Editor – allows you to run queries on the Vora engine using Vora SQL. It also shows compilation warnings, errors and outputs, as well as the result of the query when you run a SELECT.
- Modeler – can be used to create SQL views, dimensions or cubes.
So let's see the answers to our two problem statements:
1. Big Data is stored in a less expensive, distributed environment, and running complex analytical queries in such an environment will not yield good query performance.
a. While Hadoop can store and access vast amounts of detailed data at lower cost, it is not as well suited to the fast, drill-down nature of today's business questions. SAP HANA Vora is an in-memory processing engine that runs on a Hadoop cluster and is tightly integrated with Spark. It is designed for handling big data. SAP HANA Vora makes OLAP-style capabilities available on Hadoop and provides deeper integration with SAP HANA, enabling high-performance analytics.
b. VORA provides a user interface with which users can easily model their HADOOP data and run analytics on it.
2. Reports that demand a combination of Enterprise Data and Big Data are very challenging, because the two kinds of data reside in different landscapes.
a. Enterprise Data is stored in HANA and Big Data is stored in HADOOP, so if a user needs a report that combines both, the HADOOP data can be virtualized (via Smart Data Access, SDA), joined with the HANA data and then reported on. The VORA connector makes this process easy.
b. Also, if a user wants to archive older data from HANA, move it to HADOOP and combine it with other HADOOP data, that can be achieved seamlessly using the HANA Data Warehousing Foundation tool DLM (Data Lifecycle Manager), which uses the VORA connector. In this case the archived data can be queried from the HANA side or from the HADOOP side, as the user wishes.
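The idea of joining data that lives in two different stores through a single query can be sketched as follows. This is only an analogy: it uses Python's built-in `sqlite3` module with an attached second database to play the role of the remote store. It does not use HANA, SDA or the VORA connector, and the table names, schema name and sample values are all hypothetical.

```python
import sqlite3

# The main database plays the role of the enterprise store.
conn = sqlite3.connect(":memory:")
# A second, separate database plays the role of the big data store.
conn.execute("ATTACH DATABASE ':memory:' AS bigdata")

# Enterprise side: business transactions (made-up sample rows).
conn.execute("CREATE TABLE sales_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_orders VALUES (?, ?, ?)",
    [(1, "ACME", 100.0), (2, "Globex", 250.0)],
)

# Big data side: high-volume event data (made-up sample rows).
conn.execute("CREATE TABLE bigdata.web_clicks (customer TEXT, clicks INTEGER)")
conn.executemany(
    "INSERT INTO bigdata.web_clicks VALUES (?, ?)",
    [("ACME", 40), ("ACME", 2), ("Globex", 7)],
)

# One query joins both stores, which is what a combined report needs.
report = conn.execute(
    """
    SELECT s.customer, s.amount, SUM(c.clicks) AS total_clicks
    FROM sales_orders AS s
    JOIN bigdata.web_clicks AS c ON c.customer = s.customer
    GROUP BY s.customer, s.amount
    ORDER BY s.customer
    """
).fetchall()
print(report)
```

The report author writes one SQL statement and does not need to copy the big data table into the enterprise store first, which is the convenience that virtualization is meant to provide.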
I hope this blog helps you to understand the basics of VORA.