Why SAP Vora for Big Data?
As the former SAP CIO used to say, “HANA is growing up so fast.” Well, HANA has certainly matured over the last five years, and with age the adolescent has new demands.
Everyone who knows two cents about HANA knows it’s quite expensive, in terms of both licensing and hardware. In the BW world, this becomes a problem because of explosive growth in data, primarily due to IoT. So SAP provides the option to store warm data in a columnar, disk-based store (Dynamic Tiering), which is managed directly by HANA. This is far cheaper than HANA in-memory and thus improves the price-to-memory ratio of the solution.
But why stop there? As customers’ appetite for storing and processing data grows, SAP has to offer a way to leverage Big Data / Hadoop as a cold store. A popular strategy is to use SDA (Smart Data Access) to access Hadoop via Hive or Spark. What most people are not aware of is that this is not a good way of utilizing Hadoop. With SDA, the query is simply sent to Hadoop and the data is returned to HANA for processing. However, the entire premise of HANA is to send the code to where the data is, so SDA is not the right approach for Big Data. What is needed is for HANA to be able to inject its query into the Hadoop nodes and leverage Hadoop’s processing power, for example to do transformations on very large data sets that cannot be loaded into HANA memory anyway.
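To make the "send the code to where the data is" idea concrete, here is a toy sketch in plain Python. It is not how SDA or Vora is implemented; the node layout, the region/amount rows, and both function names are made up for illustration. It only compares how many rows have to cross the network when raw data is pulled to a central engine versus when each node aggregates locally and ships back partial results.

```python
from collections import defaultdict

# Hypothetical cold data: three "Hadoop nodes", each holding 1000 (region, amount) rows.
REGIONS = ["EU", "US", "APAC"]
NODES = [
    [(REGIONS[i % 3], i % 7 + node_id) for i in range(1000)]
    for node_id in range(3)
]


def pull_then_aggregate(nodes):
    """Pull style: ship every raw row to the central engine, then aggregate there."""
    shipped = [row for node in nodes for row in node]
    totals = defaultdict(int)
    for region, amount in shipped:
        totals[region] += amount
    return dict(totals), len(shipped)


def push_down_aggregate(nodes):
    """Pushdown style: each node aggregates its own rows and ships only partial sums."""
    partials = []
    for node in nodes:
        local = defaultdict(int)
        for region, amount in node:
            local[region] += amount
        partials.extend(local.items())  # one row per region per node
    totals = defaultdict(int)
    for region, amount in partials:
        totals[region] += amount
    return dict(totals), len(partials)


if __name__ == "__main__":
    pulled, pulled_rows = pull_then_aggregate(NODES)
    pushed, pushed_rows = push_down_aggregate(NODES)
    print("same result:", pulled == pushed)
    print("rows moved when pulling raw data:", pulled_rows)
    print("rows moved with pushdown:", pushed_rows)
```

Both approaches produce the same totals, but the pull style moves every raw row while the pushdown style moves one partial sum per region per node. With real cold-store volumes, that gap is the whole argument.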
This is the problem that Vora addresses. Vora is a layer that sits on top of Spark in Hadoop, and its role is simply to allow HANA to leverage Hadoop for processing-intensive work. There are other advantages as well, such as support for hierarchies and currencies in line with HANA.
The downside of Vora is that it triples the sizing requirement for the Hadoop cluster and adds significant SAP licensing costs. This damages the case for Big Data with SAP for the time being. But Vora is still new, and as time passes we will see more refinements and perhaps a more feasible licensing strategy from SAP.