The Quest to Understand the Use of MongoDB in the SAP PaaS
Following the recent TechEd in Las Vegas, I was trying to get a feel for how the attendees had responded to the event. I was reading various blogs and articles when suddenly I saw a tweet from analyst Dennis Moore that made me spill my coffee.
Considering the importance of HANA at the TechEd, I found the tweet intriguing for two reasons: 1) I found no similar press release from SAP, and 2) as a keen observer of SAP’s OnDemand strategy, I hadn’t heard about this development. Based on these two factors, I decided to do a little digging and find out what was really going on.
My intention in this blog is to shed more light on this subject and attempt to provide a context for this decision.
Searching for Clues: the role of MongoDB in the PaaS
I started out by looking more closely at the associated press release and found this sentence:
MongoDB was selected for the enterprise content management (ECM) section of the platform, as its flexibility and scalability will enable SAP to scale its content management service on its PaaS offering to meet customers’ requirements while managing data from different applications.
OK – the use of MongoDB appeared to be restricted to one particular service (enterprise content management (ECM)) of the platform and not the entire platform as such. Don’t forget that SAP’s on-demand platform strategy has two major components: one for ‘core’ on-demand applications or extensions to Business ByDesign, and one for lightweight, collaborative, purpose-specific ‘edge’ applications. The use of MongoDB is restricted to the ‘edge’ component of the platform. If you look at the architecture of this component, you will see that there is a multitude of services (functional and platform-based) available to developers for creating applications on the platform. ECM is just one service amongst many. Indeed, ECM isn’t even described on the following explanatory slide from SAP development.
[NOTE: In this slide, InMemory Computing Engine refers to SAP HANA]
Thus, some of the headlines that surfaced in the analyst community (for example, “MongoDB Sits at the Heart of SAP’s Platform-as-a-Service Product”) are a great exaggeration.
After I had discovered the relative role of MongoDB in the platform, I was interested to find more details on the use case that motivated the use of MongoDB in the ECM service.
Further research made it apparent that the CMIS protocol was being implemented in the service. CMIS is an OASIS specification for improving interoperability between Enterprise Content Management systems. SAP is also actively involved in the CMIS specification. Upon further digging, I found an Apache JIRA item for the OpenCMIS component in Apache Chemistry that appeared to show SAP’s use of the component. A list of client APIs for OpenCMIS on the Apache Chemistry web site also contains the following reference – “CMIS connectivity for SAP Applications on future SAP NetWeaver release”. So I assume that Apache Chemistry is the foundation for this particular PaaS service. Based on the Chemistry architecture, MongoDB might be used as a content repository.
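To make the idea of a document store acting as a content repository more concrete, here is a minimal sketch in Python. It is purely illustrative and rests on several assumptions: the `InMemoryContentRepository` class, its method names, and the `app:` custom properties are hypothetical, and a plain dictionary stands in for the database. It is neither SAP’s implementation nor the MongoDB or OpenCMIS API. Only the `cmis:` property names are taken from the CMIS specification. The point it tries to show is that content coming from different applications can carry different metadata without a fixed schema.

```python
import uuid


class InMemoryContentRepository:
    """Hypothetical stand-in for a document store behind a CMIS service."""

    def __init__(self):
        # Maps object id -> schemaless property dictionary.
        self._objects = {}

    def create_document(self, name, mime_type, content, extra_properties=None):
        """Store content plus CMIS-style properties and return the new object id."""
        object_id = str(uuid.uuid4())
        document = {
            "cmis:objectId": object_id,
            "cmis:baseTypeId": "cmis:document",
            "cmis:objectTypeId": "cmis:document",
            "cmis:name": name,
            "cmis:contentStreamMimeType": mime_type,
            "cmis:contentStreamLength": len(content),
            "content": content,
        }
        # Application-specific metadata is simply merged in; no schema change needed.
        document.update(extra_properties or {})
        self._objects[object_id] = document
        return object_id

    def get_object(self, object_id):
        """Return the stored property dictionary for one object."""
        return self._objects[object_id]

    def find(self, criteria):
        """Return all objects whose properties match every key/value in criteria."""
        return [
            doc
            for doc in self._objects.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


# Two applications store content with different custom metadata (assumed names).
repo = InMemoryContentRepository()
invoice_id = repo.create_document(
    "invoice-001.pdf", "application/pdf", b"%PDF-1.4 sample", {"app:customer": "ACME"}
)
photo_id = repo.create_document(
    "site.jpg", "image/jpeg", b"\xff\xd8", {"app:project": "P-42"}
)
matches = repo.find({"app:customer": "ACME"})
```

In a real deployment the dictionary would be replaced by the actual repository back end, and access would go through the CMIS bindings rather than direct method calls.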
Before I continue, I’d like to commend the teams working on this ECM service for 1) using standards (CMIS) instead of designing something proprietary, 2) using open source software (Apache Chemistry) instead of developing something proprietary and 3) actively participating in open source communities. All three activities demonstrate that SAP’s promise to be active in the OSS community, which was presented so prominently two years ago at various TechEds, was more than just marketing hype.
The HANA Connection
In an interview with Co-CEO Bill McDermott that appeared yesterday in USA Today, the first global business trend that he mentioned was ‘big data’:
Obviously huge is this idea of big data. Data in the world is doubling every 18 months. The problem isn’t in having enough information. Everyone has too much information. How do you make sense of it all so that you can make intelligent decisions?
I started considering the evolution of HANA – and indeed SAP’s InMemory Technology in general – in the last year. Originally, HANA was primarily focused on structured data (for example as used in real-time analytics). More recently, HANA has evolved into a tool that can also handle unstructured data.
Unstructured data (content in videos, documents, etc.) has of late increased dramatically in importance. As ZDNet blogger Oliver Marks elegantly portrays in a recent blog, the critical nature of unstructured data was one of the main motivations for HP’s recent decision to buy Autonomy.
Evidence of HANA’s evolution can be seen in a current job posting from SAP which describes functionality in this area:
The SAP HANA Database can also retrieve and classify unstructured data from text documents in SAP applications and will also be a mission-critical part of the core infrastructure of SAP BusinessByDesign.
For those who want to know more about the central position of this functionality in the HANA architecture, I recommend this video from Hasso Plattner entitled “Text Retrieval and Exploration” in which Hasso describes some interesting use cases for this technology.
Furthermore, for business environments added value lies in combining search in unstructured data with analytics of structured data. Cancer databases in healthcare are an example where the combination of structured and unstructured data creates new value by being able to map (structured) patient data in the hospital database onto (unstructured) reports from screening, operations, and pathology on a common characteristic, e.g., cancer size and type, to learn from treatments in similar cases.
The demo from Medtronic which was part of Hasso’s keynote at SAPPHIRE NOW in Orlando earlier this year also provides a great example of this functionality.
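The combination of structured and unstructured data described in the cancer database example can be sketched in a few lines of Python. This is only an illustration of the idea, not of how HANA does it: simple regular expressions stand in for a real text analysis engine, and the records, field names, and report texts are all invented for the example.

```python
import re

# Structured data: rows as they might come from a hospital database (invented).
patients = [
    {"patient_id": 1, "age": 61, "treatment": "surgery"},
    {"patient_id": 2, "age": 47, "treatment": "chemotherapy"},
]

# Unstructured data: free-text reports (invented).
reports = [
    {"patient_id": 1, "text": "Pathology: tumor size 23 mm, type adenocarcinoma."},
    {"patient_id": 2, "text": "Screening shows tumor size 8 mm, type carcinoid."},
]

# Regular expressions are a crude stand-in for real text analysis.
SIZE_PATTERN = re.compile(r"tumor size (\d+) mm")
TYPE_PATTERN = re.compile(r"type (\w+)")


def extract_characteristics(text):
    """Pull the common characteristics (size and type) out of a free-text report."""
    size = SIZE_PATTERN.search(text)
    kind = TYPE_PATTERN.search(text)
    return {
        "tumor_size_mm": int(size.group(1)) if size else None,
        "tumor_type": kind.group(1) if kind else None,
    }


def combine(patients, reports):
    """Map each report onto its patient record via the shared patient id."""
    by_id = {patient["patient_id"]: patient for patient in patients}
    combined = []
    for report in reports:
        patient = by_id.get(report["patient_id"])
        if patient is None:
            continue  # report without a matching structured record
        combined.append({**patient, **extract_characteristics(report["text"])})
    return combined


combined = combine(patients, reports)

# An analytic question that needs both sources: treatments for larger tumors.
large = [
    row
    for row in combined
    if row["tumor_size_mm"] is not None and row["tumor_size_mm"] >= 20
]
```

Neither data source alone could answer the final question: the treatment lives in the structured record, while the tumor size only appears in the report text.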
The fundamental design decision to use MongoDB in the ECM component of the ‘edge’ platform had obviously been made some time ago, before HANA gained its capabilities for unstructured data. Judged by the circumstances at the time, choosing MongoDB was a sound decision.
However, given the strategic importance of HANA to SAP and the significant investment in development that has been made over the last couple of years, I would expect that the PaaS teams will be keen to exploit the entire scope of HANA’s capabilities as well. As HANA matures, other vendors whose technology is currently inside SAP’s products (for example, Oracle as the database most used in the Business Suite) may find themselves replaced. From what I can surmise, MongoDB will be substituted in the same fashion. When this exchange takes place will largely depend on the progress of HANA in the marketplace – in the form of customer acceptance – and the ability of HANA to adequately meet the technical and business requirements currently met by MongoDB.
Note: As a participant in an SAP Customer Engagement Initiative (CEI) concerning OnDemand Platforms, I am bound by an NDA regarding my activities which take place within the boundaries of the CEI – especially regarding technical details of the platform. This blog is based on materials that I found publicly available on the Internet.