COIL Enabling Big Data Projects
Some time has passed since I last wrote. I owe an update on the Monsta project, along with my sincere appreciation to those who commented and expressed interest in it, but I’m going to hold off once more in order to bring a new topic into focus: big data.
Since early last year, COIL has participated in numerous discussions about how we can help enable projects and POCs that attest to the value delivered by a Hana/Hadoop deployment. The first discussions COIL took part in came just ahead of the surge of big data articles that has surfaced over the past several months.
COIL eventually teamed up with solution marketing at the start of 2012 to look for a means to enable a big data initiative that lets SAP work with a variety of internal teams, established partners and newcomers from its ecosystem. For more insight, a Big Data press release published at SapphireNow shares some of what is going on in this space. We’ve yet to talk to a hardware partner that is not interested in collaboration, as most if not all are marketing Hadoop-ready platforms and actively working with the various Hadoop distributors. Of the many hardware vendors (servers, appliances and storage), virtualization software and OS providers, Hadoop distributors, systems integrators and ISVs we’ve talked to, nearly all express interest in working with SAP in its co-innovation lab.
It has been our thought for some time that COIL could orchestrate “big data” project work that brings together multiple hardware partners (some of whom are already COIL sponsors or project members) and other participants from the SAP ecosystem. The output from such projects would not only help SAP refine its own Hana/Hadoop strategy as necessary, but would also support each participating firm’s own strategy in the same way. The collaboration and tacit knowledge exchanged in these projects enrich the knowledge required to deploy and use Hana/Hadoop, and can produce spillover effects that often lead to the discovery of unknown opportunities or to ways of mitigating or avoiding risk.
Data no longer just sits in an archive. There is a growing interest in, and need for, tapping every data store (public, private, structured, unstructured, semi-structured) in a way that will expand and deepen a firm’s core knowledge. With the desire for real-time predictive analytics, this means being able to use data from the moment it is collected while simultaneously managing the increasing velocity and variability of each data set.
SAP can now leverage Hadoop’s distributed file system and MapReduce framework for pre-processing a high volume of raw data in a variety of formats. SAP EIM has the capability to perform text data analytics by pushing down the entity extraction operations into Hadoop. For those not so familiar with Hadoop, it uses a MapReduce programming interface built around two functions. The Map function is applied to every record of a data set, with the work spread in parallel across the machines of a cluster, and emits intermediate key/value pairs. Those pairs are grouped by key and passed to the Reduce function, which combines the values for each key into the result set. Hadoop uses MapReduce alongside the Hadoop Distributed File System (HDFS), which is how the data gets stored.
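For readers who prefer to see the idea in code, here is a minimal sketch of the map, group and reduce steps described above. It runs in a single Python process and does not use any Hadoop API; the function names (map_words, shuffle, reduce_counts, run_job) and the sample records are made up for illustration, and a simple word count stands in for heavier work such as entity extraction.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(record: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the intermediate values by key (Hadoop does this between map and reduce)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence, in memory, on one machine."""
    pairs = (pair for record in records for pair in map_words(record))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical customer comments, standing in for raw text stored in HDFS.
    sample = ["great product great price", "poor service", "great service"]
    print(run_job(sample))
```

On a real cluster the map calls would run on many machines at once, each close to its own slice of the data in HDFS, and only the grouped intermediate pairs would travel over the network to the reducers.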
There is no question that Hadoop is hyped in the market these days, but the interesting thing is that it appears to scale well over commodity hardware. Firms are therefore exploring which big data sources can be tapped and how, and, where a source proves valuable, how it becomes a system deployed into an on-premise production environment. All of this is still new, with many open questions about the impact on change management and security, and about balancing system performance with operational costs. Hadoop is not without issues. To begin, a batch-oriented system is not likely to meet the needs of every big data challenge. MapReduce receives plenty of attention today, but it too has known limitations that may need to be grappled with for a while in trying to make things work. Given that Hadoop and Hana both benefit from applications developed on top of them, a collaborative effort among SAP and partners to explore a range of things, from the creation of new applications over Hana and Hadoop to optimizing the architecture and infrastructure needed for large production deployments, is the right approach. Dan Woods has looked at this in even more depth:
From a COIL perspective, enabling SAP and multiple partners to quickly and efficiently engage in such project work allows a variety of important questions to be examined through different use cases: systems management, scalability, high availability, security and even green IT considerations. Any or all of these can become key to ensuring successful deployments, and yet not everything can be explored from within the Hadoop community alone or the Hana community alone. COIL projects can create touch points that let project teams draw knowledge from both communities, strengthening the collaboration and co-innovation efforts that yield successful project results.
It arguably makes good sense for SAP to develop a strategy that shows how Hana fits into a big data landscape. It is also useful to have a viable means of exploring which technologies, configurations and reference architectures are most valuable for optimizing how Hana can take as input a variety of large data sets that require lightning-fast computation and must be ready to be fully visualized and analyzed. To date, there have been two projects in which SAP BI and Analytics experts collaborated with internal teams, Cloudera and IBM to establish the architecture blueprint for an SAP Hana/Hadoop solution that serves a variety of use case scenarios bringing structured and unstructured data together.
In the projects running up to SapphireNow this year, two demos were created, and some new ones are still in progress at COIL working with Hitachi and Cognilytics. The two exhibited at SapphireNow 2012 were:
1.) Retail POS data showing HANA + Hadoop complementary solutions
2.) Sentiment analysis in HANA + Hadoop, leveraging DS 4.1 text analytics
The project team comprised members from Solution Marketing, Ecosystem and Channels, EIM, SAP prototyping, Cloudera, IBM and COIL. Each participant contributed hardware, software and subject matter expertise as needed to build out a landscape capable of addressing useful business analysis and decision support.
The project demo was designed to show how Hana and Hadoop allow real-time data exploration, using SAP Hana and SAP BOE to make sense of a data set of over a billion sales records. It demonstrates how you can gain insights beyond what an indicator such as sales revenue can tell you about how well a firm’s products are selling and where customers are most interested. The demo goes a step further by trying to assess which products could potentially sell better. By assessing data that represents customers who look at items of interest but don’t buy, a firm can concentrate its selling on a customer segment that has high potential to purchase. Combining unstructured and/or low-value data from Hadoop, such as web logs and customer sentiment text, with structured product data in Hana for real-time correlation and analysis is a pairing of technologies that is technically and economically feasible to implement, and it brings new value to how a firm can draw benefit from a variety of large data sets.
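To make the “look but don’t buy” idea concrete, here is a small sketch of that correlation. It is not the demo’s implementation, which this post does not describe in detail; it works on in-memory Python data, and the (customer_id, product_id) layout, the function names and the sample values are all assumptions made for illustration. In the demo landscape, the view events would come from web logs in Hadoop and the purchases from sales records in Hana.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record layout: (customer_id, product_id).
Event = Tuple[str, str]


def viewed_not_bought(views: Iterable[Event], sales: Iterable[Event]) -> Counter:
    """Count, per product, the distinct customers who viewed it but never bought it."""
    purchased = set(sales)
    # Set difference drops repeat views and any view that ended in a purchase.
    prospects = set(views) - purchased
    return Counter(product for _customer, product in prospects)


def top_prospect_products(
    views: Iterable[Event], sales: Iterable[Event], n: int = 3
) -> List[Tuple[str, int]]:
    """Return up to n products with the most unconverted viewers, highest first."""
    counts = viewed_not_bought(views, sales)
    # Sort by count descending, then product name, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


if __name__ == "__main__":
    # Made-up sample data: view events from web logs, purchases from sales records.
    web_log_views = [
        ("c1", "tv"), ("c2", "tv"), ("c1", "tv"),
        ("c4", "tv"), ("c3", "phone"), ("c2", "phone"),
    ]
    sales_records = [("c1", "tv"), ("c3", "phone")]
    print(top_prospect_products(web_log_views, sales_records))
```

The products at the top of that list are the ones with the largest pool of interested customers who have not yet purchased, which is the segment the demo suggests concentrating on.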
COIL looks forward to seeing SAP and more participants from its ecosystem of customers and partners pursue big data projects. I will save it for a later post, but COIL is also helping to enable the SAP Startup Focus Program, where we now have a number of exciting startups working in COIL to build Hana into various new solutions they are bringing to market. Some of these new firms are without question also tackling big data problems, so when we talk about the potential for spillover effects as more participants pursue big data projects in COIL, we are not kidding. Since SapphireNow, we have already become aware of discussions and proposals for new projects that will bring some of these startups toward working with both Hana and Hadoop. As the project work continues, we will share results along the way. For those interested in big data and active in some fashion in how Hana and Hadoop factor into it, we would welcome your comments, input and even your interest in becoming active in this important initiative.