Enterprise Data Landscapes Are Growing Increasingly Complex
Here’s the reality. Enterprises are adopting a multi-cloud approach as part of their digital transformation strategy. Multi-cloud gives organizations the flexibility and agility to choose best-of-breed solutions that meet the diverse business and technology requirements of various business units. However, it also resurfaces the age-old problem of integrating and extracting value from data stores that are distributed geographically and across a growing number of on-premises and cloud applications. Data silos, which have always limited organizations’ ability to extract value from all their diverse data, become even more isolated.
Imagine yourself working in data science or analytics, spending most of your project time collecting and preparing data for analysis. In addition to being skilled at applying machine learning and AI, you need Big Data skills, especially ETL, to process that data. ETL isn’t easy, though, given the breadth of Big Data technologies for storing data, such as RDBMS and NoSQL stores, and distributed data processing frameworks like Hadoop and Spark. Clearly, organizations need to reduce the pain of integrating data silos for data scientists and analysts without imposing an impractical enterprise-wide decree on how data is collected and stored.
Big Data Meets Cloud Computing
Organizations are using cloud Big Data platforms to integrate their diverse landscapes of on-premises and cloud applications. They are building data lakes on cloud Big Data platforms to store and process structured and unstructured data from all their data sources. The goal of building a data lake is to provide a centralized and unified data source for the organization-wide Big Data analytics needs of all its business users.
In general, organizations favor cloud Big Data platforms for building a data lake for the following reasons:
- Easy onboarding of new big data sources like IoT, social, and mobile
- Flexible consumption-based pricing model that leads to efficient allocation of capital
- Future-proof investment in a scalable data platform – start small and grow storage or compute over time
- Easy and speedy transition from proof-of-concept to production
- Scale compute and storage independently
SAP Cloud Platform Big Data Services
SAP Cloud Platform offers Big Data as a service for organizations of all sizes to build data lakes and run data science and analytics using their favorite tools. It’s a fully managed end-to-end offering – from setup to ongoing operations – making it an ideal Big Data cloud solution for ingesting, refining, and storing terabytes to petabytes of structured and unstructured data.
The out-of-box integration with SAP HANA allows customers to augment their data warehouse on SAP HANA with a data lake on SAP Cloud Platform Big Data Services. For instance, SAP Cloud Platform Big Data Services acts as a data refinery for new big data sources like IoT, where data is cleansed and transformed before being surfaced in SAP HANA – via data virtualization (smart data access, SDA) or data movement (smart data integration, SDI) – for low-latency analytics. In other scenarios, SAP Cloud Platform Big Data Services is used to offload ETL jobs and to store older, less frequently used data, like historical sales orders, using data lifecycle management policies.
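For illustration, the data refinery step can be sketched in plain Python. The record layout and field names here are invented for this example; a real implementation would run as a Spark or Hadoop job on the service before the compact results are surfaced in SAP HANA:

```python
from statistics import mean

# Hypothetical raw IoT readings landing in the data lake; the schema is
# illustrative only, not an SAP data model.
raw_readings = [
    {"sensor": "s1", "temp_c": 21.4},
    {"sensor": "s1", "temp_c": None},   # corrupt reading to be dropped
    {"sensor": "s2", "temp_c": 19.8},
    {"sensor": "s2", "temp_c": 20.2},
]

def refine(readings):
    """Cleanse raw readings and aggregate per sensor: the 'refinery'
    work done in the lake, so only compact aggregates travel onward."""
    clean = [r for r in readings if r["temp_c"] is not None]
    by_sensor = {}
    for r in clean:
        by_sensor.setdefault(r["sensor"], []).append(r["temp_c"])
    return {s: round(mean(vals), 2) for s, vals in by_sensor.items()}

print(refine(raw_readings))  # {'s1': 21.4, 's2': 20.0}
```

The point of the pattern is that heavy cleansing happens where the raw data lives, and only the refined result moves toward the warehouse.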
The following are some of the data flows an organization can implement on an integrated data warehouse and data lake landscape:
- On-board Big Data Sources – Collect, store, and process new big data sources like IoT in a data lake. Cleanse, refine, and combine raw data with enterprise data in the data lake for data science and analytics.
- Offload Data Warehouse – Run ETL jobs to analyze, enrich, and aggregate data in a data lake for better integration of disparate data sources and to reduce data movement and latency. Expose aggregates from data lake to data warehouse using data virtualization or replication for enterprise reporting and analytics.
- Multi-Temperature Data Management – Move less frequently used data from data warehouse to data lake using data lifecycle management policies for regulatory compliance and historical data analysis.
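The multi-temperature flow above can be sketched as a simple policy in plain Python. The order schema and retention window are invented for illustration; a real policy would run against warehouse tables:

```python
from datetime import date, timedelta

# Illustrative sales orders in the (hot) warehouse tier.
warehouse = [
    {"order_id": 1, "created": date(2015, 3, 1)},
    {"order_id": 2, "created": date(2017, 6, 15)},
]
data_lake = []  # the (cold) tier

def apply_lifecycle_policy(warehouse, data_lake, today, max_age_days=365):
    """Move orders older than the retention window from the warehouse
    to the data lake, keeping both tiers available for analysis."""
    cutoff = today - timedelta(days=max_age_days)
    stay, move = [], []
    for row in warehouse:
        (move if row["created"] < cutoff else stay).append(row)
    data_lake.extend(move)
    return stay, data_lake

stay, lake = apply_lifecycle_policy(warehouse, data_lake, date(2017, 12, 1))
print(len(stay), len(lake))  # order 1 moves to the lake, order 2 stays
```

Historical data remains queryable in the lake for compliance and trend analysis, while the warehouse stays lean for low-latency reporting.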
Build and Manage the Information Supply Chain Across the Enterprise Data Landscape
The data lake is a viable approach to eliminating data silos and unlocking value from data by consolidating all enterprise data into a single repository. However, a data lake comes with the challenge of keeping it consistent and free of corrupted data, so that it does not become a data swamp.
Keeping the data lake consistent over time requires visibility into, and collaboration around, how data is accessed, harmonized, and processed by various users for diverse use cases. For the data lake to become a trusted single source of truth, users need to maintain a catalog of all data management processes across the enterprise landscape. Such a catalog helps organizations bring transparency, governance, and collaboration to data management across a diverse data landscape.
SAP offers a data management solution, SAP Data Hub, for organizations to orchestrate and catalog data flows for various integration scenarios across a distributed landscape. SAP Data Hub provides a graphical interface to build, orchestrate, and monitor data pipelines with push-down distributed processing for enterprise visibility and governance across cloud and on-premises data stores. It provides a bird’s-eye view of the enterprise landscape and allows users to view metadata, profile data, and manage access rights.
SAP Data Hub Simplifies Data Lake Implementation on SAP Cloud Platform Big Data Services
SAP Data Hub now integrates with SAP Cloud Platform Big Data Services, in addition to SAP HANA. This out-of-box integration simplifies data lake implementation and maintenance for organizations of all sizes, irrespective of their maturity level in unlocking value from Big Data. SAP Data Hub simplifies the integration challenges of processing structured, semi-structured, and unstructured data by abstracting the underlying data stores and processing engines. For instance, users can easily browse metadata, preview data, and run Hadoop or Spark jobs on SAP Cloud Platform Big Data Services. With SAP Data Hub, users can schedule and monitor data pipelines to read, process, and store data on SAP Cloud Platform Big Data Services. These pipelines can be consumed as services by various applications for data science and analytics.
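A minimal sketch of the schedule-and-monitor idea, with a stand-in pipeline function (this is not the actual SAP Data Hub API, just an illustration of the pattern):

```python
import time

def run_pipeline():
    """Stand-in for one pipeline run; a real pipeline would read,
    process, and store data on SAP Cloud Platform Big Data Services."""
    return {"status": "completed", "rows_processed": 1000}

def schedule(pipeline, runs, interval_s=0.0):
    """Execute the pipeline on a fixed interval and collect run
    statuses, so operators can monitor outcomes over time."""
    history = []
    for _ in range(runs):
        history.append(pipeline())
        time.sleep(interval_s)
    return history

print(schedule(run_pipeline, runs=3))
```

In SAP Data Hub the scheduling and monitoring are handled by the platform itself; the sketch only shows the shape of what is being managed.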
The following are some of the data flows that can be modeled by organizations using SAP Data Hub to build and manage the data lake on SAP Cloud Platform Big Data Services:
- Distributed ETL: Use SAP Data Hub to build extensible and scalable ETL pipelines that orchestrate data flows across a variety of data sources for consolidating all forms of enterprise data in a data lake on SAP Big Data Services.
- Data APIs: Use SAP Data Hub to build data flows for stream and batch ingestion and processing using pre-built operators like join, filter, machine learning or custom code to refine source data into business insights. Expose data pipelines from SAP Data Hub as APIs to build and scale data-driven applications for various business users.
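As an illustration of composing pre-built operators, a join-then-filter data flow can be sketched in plain Python. The operator names and data are hypothetical, not SAP Data Hub’s actual operator API:

```python
# Hypothetical operators in the spirit of pre-built join/filter operators.
def filter_op(rows, predicate):
    """Keep only rows matching the predicate."""
    return [r for r in rows if predicate(r)]

def join_op(left, right, key):
    """Inner-join two lists of row dicts on a shared key."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def alert_pipeline(readings, sensors):
    """Join readings with sensor metadata, then keep over-threshold rows.
    A function here; in practice such a pipeline would be published as
    an API for consuming applications."""
    joined = join_op(readings, sensors, key="sensor")
    return filter_op(joined, lambda r: r["temp_c"] > r["threshold"])

sensors = [{"sensor": "s1", "threshold": 20.0},
           {"sensor": "s2", "threshold": 25.0}]
readings = [{"sensor": "s1", "temp_c": 22.5},
            {"sensor": "s2", "temp_c": 21.0}]
print(alert_pipeline(readings, sensors))  # only s1 exceeds its threshold
```

Exposing such a pipeline behind a stable API is what lets many applications reuse one governed data flow instead of each re-implementing the logic.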
Enterprise Data Fabric
SAP is making it easy for organizations to embrace a multi-cloud approach as part of their digital transformation strategy. Data management platforms like SAP HANA, SAP Data Hub, and SAP Cloud Platform Big Data Services allow enterprises of all sizes to overcome data silos and unlock value from all their data through integration, governance, and distributed processing. These integrated platforms from SAP enable organizations to build and manage their enterprise data fabric, bringing much-needed visibility and collaboration on the path to becoming a data-driven Intelligent Enterprise.
Explore, Learn and Get Started with SAP Cloud Platform Big Data Services – a fully-managed Big Data cloud solution to store and process all forms of enterprise data for data science and analytics. Use SAP Cloud Platform Big Data Services alongside SAP HANA and SAP Data Hub as a scalable data management platform to build your enterprise data fabric and unlock value from your data.