HANA + Hadoop in the Cloud: The role of the HANA Enterprise Cloud in SAP’s Big Data strategy
There has been a noticeable increase in interest in Big Data at SAP (Big Data bus, etc) in recent months.
What is still unclear, however, is the relationship between SAP’s Cloud and its Big Data strategies.
In a recent Big Data-related blog, Vijay Vijayasankar makes a passing reference to how these two strategies relate to one another:
And we will make it easy to use – easy to administer, easy to consume, easy to extend and so on. You choose the deployment model that is right for you – keep it inhouse, or move it to Hana Enterprise Cloud. [SOURCE]
I asked Vijay on Twitter for confirmation that the HANA Enterprise Cloud is one deployment option for such Big Data solutions and he responded:
very specifically, platform for sap big data solutions – that explicitly have Hana
With this as a starting point, I started looking for other background material. Swen Conrad – SAP HANA Marketing – also references this scenario:
Running mission critical applications such as SAP Business Suite, SAP Business Warehouse and several big data applications delivered as a managed cloud service. These services help customers assess, migrate and run rich applications with cloud simplicity [SOURCE]
In theory, it appeared that there was some relationship but I needed to look at some concrete scenarios.
With this in mind, I examined some of the existing SAP Big Data solutions – one of which is Demand Signal Management (DSiM):
SAP Demand Signal Management, which is powered by SAP HANA, contains a consistent, centralized In-Memory database that stores large volumes of data such as internal master and transactional data (e.g. shipments), external POS data and market research data, etc. This is combined with a framework that ensures the integration, cleansing and harmonization of the data during upload in order to achieve high qualitative results and a common data base for reporting and further processing of the data. [SOURCE]
Trying to stump Vijay, I asked about the viability of this product on HEC. His response:
absolutely viable for HEC – it sits on top of BWoH which HEC supports
Satisfied with this answer, I moved on to other Big Data topics.
HANA + Hadoop
These activities included a blog about a series of reseller agreements from SAP regarding Hadoop. Part of the press release announcing those agreements concerned a set of new Big Data applications from SAP.
SAP Demand Signal Management is the first in a series of big data-enabled applications SAP intends to release before the end of 2013, including the SAP Fraud Management analytic application as well as the SAP Customer Engagement Intelligence solution, which includes the SAP Audience Discovery and Targeting, SAP Customer Value Intelligence, SAP Social Contact Intelligence and SAP Account Intelligence analytic applications. By deploying big data-enabled applications, enterprises can get to repeatable, measurable results much faster by infusing insights directly into day-to-day operations
What was interesting about these new applications, however, wasn’t mentioned in the official press release:
SAP has started rolling out shrink-wrapped applications designed to run on the combination of Hadoop and HANA. The first application, called SAP Demand Signal Management, is designed to help manufacturers capture and analyze large volumes of “downstream” demand signals, including retail point of sale (POS) data, consumer sentiment data, and market research data.
SAP has plans to deliver two additional shrink-wrapped Hadoop-HANA apps before the end of 2013, including the SAP Fraud Management analytic application and SAP Customer Engagement Intelligence solution. [SOURCE]
These were applications from SAP that would be based on both HANA and Hadoop. This intention made the reseller agreements much more understandable.
HANA + Hadoop in the HANA Enterprise Cloud
I remembered Vijay’s previous tweet about DSiM and I started wondering about the use of Hadoop in such HEC-based Big Data applications.
Swen Conrad had referred to this possibility in another context.
Another use case example, according to Conrad, will be a hybrid (on-premise/HANA cloud) environment for real-time analytics and big data projects.
“It will probably be too costly to perform big data projects entirely in the cloud with SAP HANA, as our pricing will be based on the amount of data you have in-memory. But, we can integrate HANA from the cloud with Hadoop, so companies can combine the two in a hybrid architecture,” he told IDN. In this approach, a company would collect and filter its data using Hadoop and once it has identified the meaningful datasets, put those into HANA. “In that way, HANA can provide you with real-time data results, rather than waiting for hours and hours, at a very reasonable cost.” [SOURCE]
From my understanding, Swen’s portrayal was that Hadoop would exist somewhere else (OnPremise?) rather than in HEC.
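The hybrid pattern Conrad describes (filter in Hadoop, then load only the meaningful subset into HANA) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `InMemoryStore` and `filter_meaningful` are hypothetical stand-ins, not SAP or Hadoop APIs, and the sample records and the `min_units` threshold are invented.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the HANA side; not a real SAP API."""
    rows: list = field(default_factory=list)

    def load(self, records):
        # Only the pre-filtered subset lands in (costly) memory.
        self.rows.extend(records)
        return len(self.rows)


def filter_meaningful(raw_records, min_units):
    """Hypothetical stand-in for the Hadoop-side filtering step."""
    return [r for r in raw_records if r["units"] >= min_units]


# Invented point-of-sale records, standing in for raw data kept in Hadoop.
raw_pos_data = [
    {"store": "A", "sku": "1001", "units": 0},
    {"store": "A", "sku": "1002", "units": 12},
    {"store": "B", "sku": "1001", "units": 7},
    {"store": "B", "sku": "1003", "units": 1},
]

store = InMemoryStore()
meaningful = filter_meaningful(raw_pos_data, min_units=5)
loaded = store.load(meaningful)
```

The point of the sketch is only the division of labour: the full dataset stays on the cheaper tier, and the in-memory tier, which drives the pricing Conrad mentions, holds just the filtered rows.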
I recalled another case of an SAP customer – the Globe and Mail newspaper – using Hadoop in a cloud-based scenario that was based on HANA One on AWS.
I was curious as to whether this solution – although based on another cloud provider – would work on HEC and started bugging Vijay again.
Boom – Hadoop + HANA running on the HANA Enterprise Cloud.
I let that news settle for a few minutes and started to think about the repercussions of this design pattern. I made a quick drawing to depict the potential impact of this functionality.
- Although Vijay made his Hadoop-related comment referring to the possibility of the Globe and Mail solution running on HEC, it is possible to imagine one of the newly announced HANA+Hadoop Big Data applications such as DSiM running on HEC.
- As the example of the Globe and Mail demonstrates, cloud-based HANA + Hadoop Big Data applications are technically possible. The presence of such applications, however, in the HEC is a different story inasmuch as they must be understood in the context of the HEC as a Managed Service with its associated characteristics (SLAs, maintenance / support, etc).
- Although Hadoop + HANA Big Data solutions on HEC might work with customers still using an OnPremise Business Suite on HANA (it might also work for OnPremise Business Suites not running on HANA), they would be a perfect fit for those customers with a HEC-based Business Suite on HANA.
- The usual manner for HANA apps to access Hadoop is via Smart Data Access (SDA), so the Big Data applications reach Hadoop through HANA rather than accessing it directly.
- HANA is a prerequisite for an application to run on the HANA Enterprise Cloud. Thus, Big Data applications which are only based on Hadoop probably will not be hosted in the HEC.
- There are a variety of other HEC certified partners. The ability of such partners to provide HANA + Hadoop Big Data applications may be limited in that most partners won’t have reseller agreements with various Hadoop distributions. This distinction may provide the SAP-hosted HEC with certain competitive advantages in the Big Data marketplace.
- Another scenario for these “H2” Big Data applications on the HEC would be where Hadoop is hosted external to HEC. This might lead to performance issues inasmuch as data – perhaps large amounts of data – would have to be transferred between the two hosting environments.
- My assumption is that initially most of these Big Data applications hosted in SAP’s HEC will originate from SAP. I could imagine some sort of certification program for non-SAP Big Data solutions – from customers or partners – would be required before SAP takes responsibility for them as part of the Managed Service. In their own certified HEC environments, partners might wish to host their own Big Data apps.
- Which Hadoop distribution will be used for these applications? SAP has reseller agreements with Intel and Hortonworks. I assume that there will be a single Hadoop distribution for all Hadoop-based Big Data applications in HEC inasmuch as this would greatly simplify administration and support of those applications.
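To put a rough number on the concern above about Hadoop being hosted external to HEC, here is a back-of-the-envelope Python sketch of how long a bulk transfer between two hosting environments might take. The data volume and link speed are invented for illustration, and the calculation assumes decimal units and a fully utilised link with no protocol overhead, so real transfers would likely be slower.

```python
def transfer_hours(data_gb: float, bandwidth_mbps: float) -> float:
    """Estimate hours needed to move data_gb gigabytes over a link of
    bandwidth_mbps megabits per second (decimal units, no overhead)."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    megabits = data_gb * 1000 * 8  # GB -> MB -> megabits
    seconds = megabits / bandwidth_mbps
    return seconds / 3600


# Hypothetical example: 1 TB over a dedicated 1 Gbit/s link.
hours = transfer_hours(data_gb=1000, bandwidth_mbps=1000)
```

Under these assumptions, moving one terabyte takes a little over two hours, which suggests why co-locating Hadoop and HANA in the same hosting environment would matter for large datasets.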