This article looks at SAP Cloud Platform Big Data Services, a solution from the software company SAP that lets customers work with big data in Hadoop through a cloud subscription.
Big data in the business ecosystem
SAP solutions are used by major international companies across many industries, including metallurgy, oil and gas, and other conservative sectors of the economy. The software giant develops and deploys modern IT solutions for them. These enterprises are now investing more and more in cutting-edge technologies such as the Internet of Things (IoT), machine learning, and big data processing.
One motivation for investing in big data processing is to put the data to work in new ways. For instance, metallurgical companies are looking for new sources of income or ways to cut expenses under the current economic and geopolitical circumstances. Big data can help them generate new ideas, because it reveals both global industry trends and the details of their own business processes.
There are plenty of services, both open-source and commercial, for storing and working with big data. Hadoop, together with its ecosystem of components, is the most popular one. The features that make it the leading solution in its niche include:
- Affordable data storage
- A vast variety of additional open-source components for processing data, such as Hive, Spark, etc.
- Lots of experts who are proficient in working with Hadoop
The popularity of free open-source solutions is understandable. Nevertheless, when Hadoop is deployed for industrial use, the free open-source components are rarely used in their pure, original form. Instead, commercial Hadoop-based distributions have been gaining traction among businesses. Hortonworks and Cloudera are two well-known providers of these products, but there are many more. Their job is to deliver reliable software and to make sure that all the components work together seamlessly. Other services take a different approach and let their clients work with big data in the cloud on a subscription basis.
At some point, many companies face a tough choice between the on-premise and cloud approaches to working with big data. Most IT teams prefer the former because of concerns about the reliability of the cloud.
It’s not hard to deploy Hadoop on local servers at the initial stage of working with big data, which mostly involves routine hypothesis testing and other checks. Things get more complicated once you move on to commercial use of the solution, where it has to meet specific requirements: an SLA uptime level of 99.9%, guaranteed reliability when storing massive amounts of data, and compliance with predefined KPIs.
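To make the 99.9% figure concrete, the short calculation below converts an uptime target into the downtime it leaves room for. It is a minimal sketch: the function name and the 30-day month are illustrative assumptions, not part of any SAP or Hadoop tooling.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Return the maximum downtime (in hours) that still meets the uptime target."""
    return period_hours * (1 - uptime_percent / 100)


HOURS_PER_YEAR = 365 * 24   # 8760, ignoring leap years
HOURS_PER_MONTH = 30 * 24   # 720, assuming a 30-day month

yearly_hours = allowed_downtime_hours(99.9, HOURS_PER_YEAR)
monthly_minutes = allowed_downtime_hours(99.9, HOURS_PER_MONTH) * 60

print(f"{yearly_hours:.2f} hours per year")        # 8.76 hours per year
print(f"{monthly_minutes:.1f} minutes per month")  # 43.2 minutes per month
```

In other words, a 99.9% target leaves less than nine hours of total downtime per year, including planned maintenance.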
If you choose to deploy Hadoop in production on-premise, the following tasks will be on your to-do list:
- Hire skilled IT specialists
- Buy appropriate hardware
- Buy the necessary software distributions, then install and tune them
- Launch the solution in production
- Invest in regular maintenance (personnel salaries, hardware maintenance, etc.)
This preliminary phase takes quite a bit of time, which is why businesses often find it hard to decide which approach, on-premise or cloud, suits them better.
Bain & Company, a reputable management consulting firm, touched upon the Netflix case in one of its reports. In 2016, the media services provider reported that it had to run thousands of nodes under immense load in order to process big data. In particular, it was processing about 350 billion user-generated events and petabytes of data related to its services every single day. On-premise servers alone can hardly sustain that kind of load unless the company keeps building new data centers.
SAP has joined the cloud boom, too. In 2016, the company acquired Altiscale, a leading provider of Big Data-as-a-Service. The resulting product is SAP Cloud Platform Big Data Services. SAP customers can use it under a cloud subscription model, and it is embedded in SAP’s general cloud infrastructure.
So, what is SAP Cloud Platform Big Data Services, SAP’s cloud-based Hadoop service?
SAP Cloud Platform Big Data Services is a toolkit for working with big data based on the SaaS (Software-as-a-Service) model. It includes the following three main components:
Apache Hadoop cluster
The cluster is built on a Hadoop distribution that complies with the ODPi specification. This means that applications and scripts written for other ODPi-compliant platforms can also run in SAP Big Data Services.
The cluster, in turn, comprises three types of nodes: a control node (NameNode), a maintenance node (Secondary NameNode), and data nodes (DataNodes). The initial set-up also includes YARN, Hadoop’s cluster resource management layer, whose ResourceManager schedules jobs across the cluster.
The Secondary NameNode also hosts additional services, such as Oozie and the Hive Metastore. When a customer subscribes to the solution, they get a separate cluster with all the necessary resources. These resources are metered by storage space and by the number of machine-hours. The cluster’s resources can be scaled up flexibly, either for periods of critical computation or on a permanent basis.
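The machine-hour metric can be illustrated with a short calculation. The sketch below is only an assumption about how such metering could be added up: the `UsageInterval` type, the node counts, and the durations are hypothetical and do not come from SAP’s billing rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageInterval:
    nodes: int    # number of machines active during the interval (hypothetical)
    hours: float  # length of the interval in hours (hypothetical)


def machine_hours(intervals):
    """Sum nodes x hours over all usage intervals."""
    return sum(interval.nodes * interval.hours for interval in intervals)


# Hypothetical month: a 10-node baseline for 720 hours,
# plus 20 extra nodes for a 48-hour burst of critical computation.
baseline = UsageInterval(nodes=10, hours=720)
burst = UsageInterval(nodes=20, hours=48)

total = machine_hours([baseline, burst])
print(total)  # 8160
```

Under these assumed numbers, the two-day burst adds 960 machine-hours to the 7,200 consumed by the baseline, which is the kind of temporary scale-up the subscription model is meant to absorb.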
Workbench, the all-in-one access point
For the sake of security, direct access to the Hadoop cluster is restricted to the operating personnel and the Workbench. The customer can only access the Workbench, which bundles a local Hadoop installation with Spark, Hive, Oozie, Pig, and other components required for data science and engineering, including SAP Predictive Analytics and SAP Lumira.
The customer can use the Workbench to launch scripts, examine data with business intelligence tools, and perform other tasks. The Workbench communicates with the Hadoop cluster over a high-bandwidth connection.
Big Data Services dashboard
The dashboard is the customer’s self-service console. It is used to generate access keys for Big Data Services, view cluster usage statistics, and perform other routine tasks the customer may come across.
The Big Data Services solution is connected to the outside world through a jump host server. All network communication takes place within a private IP address space, that is, inside the virtual private cloud and the virtual private network. SSH is the default way of accessing Big Data Services, and alternative options are available upon request. The solution also supports Kerberos authentication, which lets clients benefit from single sign-on (SSO).
SAP Cloud Platform Big Data Services can interact with other SAP services and with on-premise solutions. The following capabilities support this integration:
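As an illustration of SSH access through a jump host, the sketch below builds an OpenSSH command that uses the standard `-J` (ProxyJump) option. The user name and both host names are hypothetical placeholders, and routing the Workbench connection this way is an assumption rather than documented SAP behavior.

```python
import shlex


def ssh_via_jump_host(user, jump_host, target_host, port=22):
    """Build an OpenSSH command that reaches target_host through jump_host."""
    return [
        "ssh",
        "-J", f"{user}@{jump_host}:{port}",  # ProxyJump: hop through the jump host first
        f"{user}@{target_host}",             # final destination inside the private network
    ]


# Hypothetical user and hosts, for illustration only.
cmd = ssh_via_jump_host("alice", "jump.example.com", "workbench.internal")
print(shlex.join(cmd))
```

Building the command as a list of arguments rather than a single string makes it safe to pass to `subprocess.run` without shell quoting issues.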
- Gathering and processing sensor data with Kafka Streams
- Extracting data from relational databases by means of Kafka Connectors or SAP Data Services
- Interacting with SAP systems built on the SAP HANA platform through Smart Data Access and Smart Data Integration
- Interacting with on-premise Hadoop at the Hadoop Distributed File System (HDFS) layer
All of these integration channels allow high-speed data exchange with the customer’s source systems.
What makes SAP Cloud Platform Big Data Services stand out from other cloud-based Hadoop solutions?
The fundamental difference is that the SAP toolkit can be embedded into business processes with little friction, because its architecture is designed to interoperate with SAP’s other systems and services. This is the key advantage for businesses seeking to monetize big data. If data scientists are the only ones who see the analysis results of a Hadoop project, they still have to persuade business users to act on the new ideas, and nothing guarantees that the hypotheses will ever be implemented. SAP Cloud Platform Big Data Services can be integrated directly with an organization’s internal IT systems, which is one of the critical steps towards turning analysis into a working business process.
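The kind of processing typically applied to sensor data, grouping readings into fixed time windows and aggregating them, can be sketched in a few lines. The example below is plain Python, not Kafka Streams code (Kafka Streams is a Java library), and the sensor names and readings are made up for illustration.

```python
from collections import defaultdict


def tumbling_window_average(readings, window_seconds):
    """Average (sensor_id, timestamp, value) readings per sensor and fixed window."""
    sums = defaultdict(lambda: [0.0, 0])  # (sensor_id, window_start) -> [sum, count]
    for sensor_id, timestamp, value in readings:
        # Align the timestamp to the start of its non-overlapping window.
        window_start = timestamp - timestamp % window_seconds
        bucket = sums[(sensor_id, window_start)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical readings: (sensor id, seconds since start, temperature).
readings = [
    ("furnace-1", 0, 1500.0),
    ("furnace-1", 30, 1510.0),
    ("furnace-1", 65, 1490.0),
    ("furnace-2", 10, 900.0),
]

averages = tumbling_window_average(readings, window_seconds=60)
for (sensor_id, window_start), average in sorted(averages.items()):
    print(sensor_id, window_start, average)
```

A streaming framework performs the same grouping continuously as events arrive, rather than over a finished list as this sketch does.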