I am a member of SAP’s Industry Standards and Open Source team. Recently, I was approached by a national standards body to share some details about SAP’s Big Data architecture, so that they can analyze the emerging trends and thereby gather requirements for standardization. I have put together a short overview of the key principles and components of SAP Big Data architecture, which I am also sharing here. All viewpoints in this blog are my own and do not necessarily represent those of SAP. Nevertheless, I would like to thank my colleagues Yuvaraj Athur Raghuvir, David Burdett, Mark Crawford, Rainer Brendle and Steve Winkler for their input.
The data enterprises care about has been growing exponentially in recent years. In addition to traditional transactional data, businesses are finding it advantageous to accumulate many other kinds of data, such as weblogs, sensor data and social media, generally referred to as Big Data, and to leverage it to enrich the context of their business applications and gain new insights.
SAP Big Data architecture provides businesses with a platform to handle the high volume, variety and velocity of Big Data, and to rapidly extract and utilize business value from Big Data in the context of business applications and analytics.
SAP HANA Platform for Big Data helps relax the traditional constraints on creating business applications.
Traditional disk-based systems suffer performance hits not just from the need to move data to and from disk, but also from the inherent limitations of disk-based architectures, such as the need to create and maintain indices and materialized aggregates of data. SAP HANA Platform largely eliminates these drawbacks of traditional architectures and enables businesses to innovate freely and in real time.
Business applications are typically modeled to capture the rules and activities of real world business processes. SAP HANA Platform for Big Data makes a whole new set of data available for consumption and relaxes the dependency on static rules in creating business applications. Applications can now leverage real time Big Data insights for decision making to vastly enhance and expand their functionality.
Traditionally, exploration and analysis of data are based on knowledge of the structures and formats of the data. With Big Data, it is often necessary to first investigate the data: what it contains, whether it is relevant to the business, how it relates to other data, and how it can be processed. SAP HANA Platform for Big Data supports tools and services for Data Scientists to conduct such research on Big Data and enable its exploitation in the context of business applications.
SAP Big Data architecture provides a platform for business applications with features such as the ones referenced above. The key principles of SAP Big Data architecture include:
- An architecture that puts in-memory technology at its core and maximizes computational efficiency by bringing the compute and data layers together.
- Support for a variety of data processing engines (such as transactional, analytical, graph, and spatial) operating directly on the same data set in memory.
- Interoperability and integration with best-of-breed technologies such as Hadoop / Hive at the data layer, including data storage and low-level processing.
- The ability to leverage these technologies to transform existing business applications and build entirely new ones that were previously not practical.
- Comprehensive native support for predictive analytics, as well as interoperability with popular libraries such as R, enabling roles such as Data Scientists to uncover and predict new business opportunities.
In a nutshell, the focus of SAP Big Data architecture is not limited to handling large amounts of data but rather it is about enabling enterprises to identify and realize the business value of Big Data in real time.
Description of Key Components:
SAP Big Data architecture enables an end-to-end platform and includes support for ingestion, storage, processing and consumption of Big Data.
The ingestion of data includes the acquisition of structured, semi-structured and unstructured data from a variety of sources, including traditional back-end systems, sensors, social media, and event streams. Managing data quality through stewardship, governance, etc., and maintaining a dependable metadata store are key aspects of the data ingestion phase.
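To make the ingestion idea concrete, here is a minimal Python sketch of normalizing heterogeneous payloads into a common record with lineage metadata. The source names, field names and `ingest` function are invented for illustration; this is not SAP code or an SAP API.

```python
import json
from datetime import datetime, timezone

def ingest(source, payload):
    """Normalize a raw payload from any source into a common record,
    attaching metadata needed for later governance and lineage tracking."""
    record = {
        "source": source,  # e.g. "erp", "sensor", "twitter" (illustrative)
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw": payload,
    }
    # Structured payloads (dicts) can be indexed directly; unstructured
    # text is kept as-is for later parsing by downstream engines.
    record["structured"] = isinstance(payload, dict)
    return record

batch = [
    ingest("erp", {"order_id": 42, "amount": 99.5}),       # structured
    ingest("sensor", {"device": "T-100", "temp_c": 21.7}), # semi-structured
    ingest("twitter", "Loving the new product line!"),     # unstructured
]
print(json.dumps(batch[0], indent=2))
```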
SAP Big Data architecture brings together transactional and analytical processing that directly operate on the same copy of the enterprise data that is held entirely in memory. This architecture helps by eliminating the latency between transactional and analytical applications, as there is no longer a need to copy transactional data into separate systems for analytical purposes.
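The idea of serving transactional and analytical workloads from a single in-memory copy of the data can be sketched in a few lines of Python. This is a toy model of the concept, not the HANA implementation; the function and table names are invented.

```python
# One in-memory "table" serves both transactional writes and analytical
# reads -- no separate copy or ETL step between OLTP and OLAP.
orders = []  # single authoritative data set held in memory

def record_sale(customer, amount):
    """Transactional path: append a new row."""
    orders.append({"customer": customer, "amount": amount})

def revenue_by_customer():
    """Analytical path: aggregate directly over the same rows,
    with no latency between a write and its visibility to analytics."""
    totals = {}
    for row in orders:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals

record_sale("acme", 100.0)
record_sale("acme", 50.0)
record_sale("globex", 75.0)
print(revenue_by_customer())  # {'acme': 150.0, 'globex': 75.0}
```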
With in-memory computing, important application functionalities such as planning and simulation can be executed in real time. SAP Big Data architecture includes a dedicated engine for Planning & Simulation as a first class component, making it possible to iterate through various simulation and planning cycles in real time.
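What "iterating through planning cycles in real time" means can be illustrated with a trivial what-if projection in Python. The scenario and numbers are invented; the point is that when the data set is in memory, each re-run of a scenario is effectively instantaneous, so a planner can explore many assumptions interactively.

```python
def project_revenue(base, growth_rate, periods):
    """Project revenue forward under a what-if growth assumption."""
    out = []
    value = base
    for _ in range(periods):
        value *= (1 + growth_rate)
        out.append(round(value, 2))
    return out

# Re-running the simulation for several growth scenarios is cheap,
# so planning becomes an interactive loop rather than a batch job.
for rate in (0.02, 0.05, 0.10):
    print(rate, project_revenue(1000.0, rate, 4))
```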
SAP Big Data architecture includes a Graph engine. The elements of Big Data are typically loosely structured. With the constant addition of new types of data, the structure and relationship between the data is constantly evolving. In such environments, it is not efficient to impose an artificial structure (e.g. relational) on the data. Modeling the data as graphs of complex and evolving interrelationships provides the needed flexibility in capturing dynamic, multi-faceted data.
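The flexibility of graph modeling over a fixed relational schema can be seen in a small Python sketch: edges carry their own relation labels, so a brand-new relationship type needs no schema change. This is a conceptual illustration, not the graph engine's API.

```python
from collections import defaultdict

# Edges are stored as (relation, target) pairs, so new relationship
# types can be added at any time without altering a schema.
graph = defaultdict(list)

def relate(subject, relation, obj):
    graph[subject].append((relation, obj))

relate("alice", "purchased", "laptop")
relate("alice", "follows", "bob")
relate("bob", "reviewed", "laptop")
# A relation type that did not exist when the model was first designed:
relate("laptop", "mentioned_in", "tweet-123")

def neighbors(node, relation=None):
    """All nodes connected to `node`, optionally filtered by relation."""
    return [o for r, o in graph[node] if relation is None or r == relation]

print(neighbors("alice"))             # ['laptop', 'bob']
print(neighbors("alice", "follows"))  # ['bob']
```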
An ever increasing number of business applications are becoming ‘location aware’. For example, sending only the most relevant promotions to the mobile device of a user walking into a retail store has a much higher chance of capturing the user’s attention and generating a sale. Recognizing this trend, SAP Big Data architecture includes a spatial data processing engine to support location aware business applications. For similar reasons, inherent capabilities for text, media, and social data processing are also included.
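The retail-promotion example amounts to a proximity query, which a spatial engine would answer natively. A minimal Python sketch of the underlying computation (great-circle distance plus a radius filter, with illustrative store coordinates):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

stores = {  # illustrative coordinates
    "downtown": (40.7580, -73.9855),
    "airport": (40.6413, -73.7781),
}

def nearby_stores(user_lat, user_lon, radius_km=2.0):
    """Stores close enough to the user to trigger a targeted promotion."""
    return [name for name, (lat, lon) in stores.items()
            if haversine_km(user_lat, user_lon, lat, lon) <= radius_km]

# A user near the downtown store triggers only that store's promotion.
print(nearby_stores(40.7589, -73.9851))  # ['downtown']
```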
SAP Big Data architecture supports complex event processing throughout the entire stack. Event streams (e.g. sensor data, updates from capital markets) are not just an additional source of Big Data; they also require sophisticated on-the-fly handling of events, such as transformation (e.g. ETL) and analytics.
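A common on-the-fly pattern in event processing is a continuous query over a sliding window. Here is a tiny Python sketch of that pattern (invented class and thresholds, not a CEP product API): an alert fires when the average of the most recent sensor readings crosses a threshold.

```python
from collections import deque

class SlidingWindowAlert:
    """Fire an alert when the average of the last `size` readings
    exceeds `threshold` -- a minimal continuous-query pattern."""
    def __init__(self, size, threshold):
        self.window = deque(maxlen=size)  # old readings drop off automatically
        self.threshold = threshold

    def on_event(self, value):
        self.window.append(value)
        avg = sum(self.window) / len(self.window)
        return avg > self.threshold  # True means an alert fires

detector = SlidingWindowAlert(size=3, threshold=80.0)
stream = [70, 75, 85, 90, 95]  # e.g. temperature readings arriving in order
alerts = [detector.on_event(v) for v in stream]
print(alerts)  # [False, False, False, True, True]
```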
SAP Big Data architecture is also extensible to enable customers to utilize low cost data storage and low level data processing solutions such as Hadoop. By extending the SAP HANA platform for Big Data with Hadoop, customers can bring the benefits of real time processing to data in Hadoop systems. Scenarios for extracting high value data from Hadoop as well as federating the data processing of data in SAP HANA with Hadoop / Hive computational engines into a single SQL query are fully supported.
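The federation idea can be sketched conceptually in Python: hot data in memory, cold data in a cheaper store standing in for Hadoop/Hive, and one logical query spanning both so the caller never sees where a row physically resides. The store names and query function are invented for illustration and do not reflect how the HANA/Hive integration is actually implemented.

```python
# Hot data lives in memory; cold, high-volume data sits in a cheaper
# store (standing in for Hadoop / Hive). One query federates both.
hot_store = [  # recent orders, held in memory
    {"region": "EU", "amount": 120.0},
    {"region": "US", "amount": 80.0},
]
cold_store = [  # historical orders, in low-cost storage
    {"region": "EU", "amount": 300.0},
    {"region": "US", "amount": 150.0},
]

def total_by_region():
    """One logical aggregate whose execution spans both stores;
    the caller is unaware of where each row physically resides."""
    totals = {}
    for row in hot_store + cold_store:  # union of the federated sources
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals

print(total_by_region())  # {'EU': 420.0, 'US': 230.0}
```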
SAP Big Data architecture enables applying common concerns such as modeling, lifecycle management, landscape management, security, etc., across the various native and integrated components of the platform.
In summary, SAP Big Data architecture leverages the SAP HANA platform to create an environment that enables enterprises to take full advantage of Big Data in ways that will fundamentally transform their business.