10 reasons to choose SAP Datasphere as the foundation for your business data fabric architecture
SAP Datasphere is the evolution of SAP's data warehouse solution in the cloud, combining the capabilities of SAP Data Warehouse Cloud (agile enterprise data warehousing and self-service data modelling) with those of SAP Data Intelligence Cloud (data orchestration, lineage, and data quality, among others).
Many of you are probably wondering what a business data fabric is, so I am including a simple yet concise definition here:
A business data fabric goes beyond a traditional data fabric approach. While it still simplifies complex data landscapes and delivers meaningful data to every data consumer – it takes the benefits and value further by keeping the business logic and application context from data intact (in essence, it maintains the data’s DNA). As such, a business data fabric eliminates the need to recreate all the business context lost from extracting data – giving business stakeholders and data consumers the ability to accelerate their decision-making with trust and confidence, knowing they always have the complete picture of their data regardless of where it is stored or how it was designed. Source
1. Preserving Business Context
In typical data-related projects, an average of 80%* of the time is spent trying to recreate context that was lost during data replication, especially when getting data from the database layer. In this inefficient process, the data (especially metadata, hierarchies, and other semantic information) is separated from its context, losing crucial business understanding of the data.
*Source: SAP Datasphere Webinar.
All this extra and unnecessary effort represents a cost, in time and money, that is not visible at first but will undoubtedly impact the organization later on. This is the so-called “hidden data tax”.
With SAP Datasphere, you can leverage over 6,000 CDS views available out of the box in the SAP S/4HANA system. Core Data Services (CDS) allows data extraction via semantically meaningful and stable Virtual Data Model (VDM) artifacts rather than database artifacts. In other words, SAP Datasphere preserves the business context of the SAP data for later integration, modelling, and analytics data preparation activities for business consumption.
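To make this concrete, here is a toy Python sketch (not SAP code; the field name, annotations, and values are invented) contrasting a raw extracted column with the kind of semantically annotated artifact a VDM-style view preserves:

```python
# Toy illustration (not SAP's implementation): the same field exposed as a raw
# database column versus a CDS-view-style entity that keeps its annotations.
from dataclasses import dataclass

# Raw extraction: just a column name and a value -- the business meaning is gone.
raw_row = {"NETWR": 1250.0}

@dataclass
class AnnotatedField:
    """A field as a semantically rich VDM-style artifact would expose it."""
    name: str
    value: float
    label: str       # human-readable business meaning
    unit: str        # e.g. the currency the amount is stated in
    hierarchy: str   # where the field sits in the business model

order_amount = AnnotatedField(
    name="NETWR",
    value=1250.0,
    label="Net Order Value",
    unit="EUR",
    hierarchy="Sales > Order > Item",
)

def describe(f: AnnotatedField) -> str:
    # With the context intact, no one has to reverse-engineer what NETWR means.
    return f"{f.label} ({f.name}) = {f.value} {f.unit}, in {f.hierarchy}"

print(describe(order_amount))
```

Recreating the label, unit, and hierarchy from the raw row alone is exactly the “hidden data tax” work that preserved business context avoids.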
2. Federated Data Access & Integration Flexibility
One common approach for working with and combining data is to replicate it by extracting it out of the SAP ecosystem. What we’ve seen with many organizations is that this approach creates multiple versions of the truth, and therefore inaccurate insights, delays, compliance risks from untrustworthy data, and a heavy reliance on IT, with extra effort needed to get the data right again.
With SAP Datasphere, you can leverage both Data Federation and Data Replication. Data Federation avoids extracting the data from SAP S/4HANA, and even if you follow a data replication approach with Datasphere, the data still lives in the SAP ecosystem, so governance and data sharing remain consistent across multiple teams within your organization, empowering IT and business users to collaborate better through democratized data access.
SAP Datasphere addresses different data ingestion approaches for your integration requirements.
- Data Federation:
- Remote Table Federation: First, you can federate data, leaving the data in the source system and accessing it remotely in real time. In other words, no upfront data movement is done, and this is supported across SAP and non-SAP sources, including hyperscaler sources.
- Model Transfer: When working with BW/4HANA, you can also leverage Datasphere for easier and faster data access for business users and data analysts, getting the “best of both worlds” in a hybrid landscape. This is thanks to the Model Transfer approach, where only the rich BW/4HANA metadata is imported into Datasphere while the data is accessed virtually. I’ll talk more about this in Reason #6.
- Data Replication:
- Remote Table Replication: If needed, you can also replicate remote data entirely or selectively, either in real time or by taking and scheduling snapshots of, for example, CDS views from S/4HANA. CDS views have a built-in delta extraction mechanism for data replication thanks to their Change Data Capture engine.
- View Persistence: Another approach is to persist views, where you materialize a view of your data model, storing its output results in a stable persistence for optimized performance if needed. This also allows you to schedule regular updates to keep the data fresh.
- Replication Flows: This new capability replicates multiple entities (tables, CDS views, etc.) in a simpler, non-technical way, especially when no complex data transformation is needed. New targets are coming soon on the roadmap.
- Data Flows: Thanks to the capabilities being added from SAP Data Intelligence Cloud, SAP Datasphere offers ETL capabilities for processing batch loads and complex data transformations. You can combine structured and semi-structured data while defining ETL processes. It also allows advanced transformations leveraging Python 3 and its analytical libraries, such as pandas.
- API integration: SAP Datasphere also permits other solutions, such as SAP Data Services, SAP Open Connectors, Precog, etc., to bring data into the system using SQL interfaces and the Open SQL schema.
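As a small illustration of the Data Flows point above, here is a hedged sketch of the kind of pandas transformation you might write in a Python script step; the surrounding operator plumbing is SAP-specific and omitted, and the column names are made up:

```python
# Sketch of a batch transformation in the style a Data Flow script step allows
# (illustrative only; column names and data are invented).
import pandas as pd

def transform(data: pd.DataFrame) -> pd.DataFrame:
    """Cleanse and enrich a batch of sales records before loading the target."""
    out = data.copy()
    out["ORDER_DATE"] = pd.to_datetime(out["ORDER_DATE"])  # normalize types
    out = out[out["NET_VALUE"] > 0]                        # drop invalid rows
    out["YEAR"] = out["ORDER_DATE"].dt.year                # derive a column
    return out

batch = pd.DataFrame({
    "ORDER_DATE": ["2023-01-15", "2023-02-03", "2023-02-20"],
    "NET_VALUE": [100.0, -5.0, 250.0],
})
result = transform(batch)
print(result[["NET_VALUE", "YEAR"]])
```

The same pattern extends to joins, pivots, and other pandas operations when a graphical transformation is not expressive enough.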
For more information about the supported data sources and connectors, please check here.
3. Reduced effort on Security Configuration
When setting up a new system, security configurations for row-level data access tend to require a lot of time and effort. Naturally, being able to simply reuse existing security concepts would bring a lot of value in time and resource savings.
As SAP Datasphere supports both Data Federation and Replication, there are two main ways of leveraging existing data access configurations.
- Accessing the data in the source system in a federated way, where a technical user (with a well-defined set of authorizations) is used to access the data from a given source, just as is done today.
- A second option is to leverage existing authorizations. For organizations currently working with SAP BW 7.5 (standalone, or BW embedded in ERP) or SAP BW/4HANA as their on-premise data warehouse, the analysis authorizations defined in those systems can be imported into SAP Datasphere. This significantly reduces the effort of implementing row-level security and allows business continuity.
The way this works, at a high level, is that you run a report on the source system that generates the RSDWC_RSEC_DAC permissions table, which SAP Datasphere connects to in order to generate data access controls. Check this documentation, and the blog post by Heiko Schneider, for more info.
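Conceptually (this is a toy sketch, not SAP's actual mechanism), a row-level data access control boils down to filtering query results by the criterion values a user is authorized for, much like entries in a permissions table:

```python
# Conceptual sketch of row-level security (not SAP's implementation):
# a permissions table maps each user to the criterion values they may see,
# and a data access control filters query results accordingly.
permissions = [  # analogous in spirit to rows in a permissions table
    {"user": "alice", "country": "DE"},
    {"user": "alice", "country": "AT"},
    {"user": "bob",   "country": "US"},
]

sales = [
    {"country": "DE", "net_value": 100.0},
    {"country": "US", "net_value": 250.0},
    {"country": "AT", "net_value": 80.0},
]

def apply_dac(rows, user):
    """Return only the rows whose criterion value the user is authorized for."""
    allowed = {p["country"] for p in permissions if p["user"] == user}
    return [r for r in rows if r["country"] in allowed]

print(apply_dac(sales, "alice"))  # only the DE and AT rows
```

Importing existing analysis authorizations means this mapping does not have to be rebuilt by hand for every user and criterion.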
4. Democratized data access for self-service Data and Business Modeling
With the ever-growing data landscape, the need to provide more data access to business data analysts has increased, and with it the need for practical and powerful self-service tools for combining data, creating data models, and accessing data faster.
SAP Datasphere provides built-in editors for business and data modelling, with drag-and-drop capabilities and a low-code/no-code approach. Starting with the graphical editor, you can allow not only IT users but also business data users to manipulate tables and views to create analytical models for business consumption, including scripting capabilities for data transformation.
SAP Datasphere also provides integrated SQL data warehousing tools for professional developers following a pro-code approach. You have the option to leverage existing SQL tooling and skill sets for data modelling with views, tables, procedures, etc., via the Open SQL schema. In addition, it provides SAP HANA Cloud modeling capabilities for creating calculation views, flowgraphs, etc., reusing HANA-based modeling in Datasphere.
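To illustrate the pro-code style, here is a minimal sketch of defining a reusable SQL view over a table. SQLite stands in for the Open SQL schema purely for illustration; in Datasphere you would run equivalent DDL against SAP HANA Cloud with your own SQL tooling, and the table and column names here are invented:

```python
# Sketch of SQL-first modelling: a view that shapes raw records into a
# business-ready aggregate. SQLite is only a local stand-in for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales_orders (order_id INTEGER, region TEXT, net_value REAL);
    INSERT INTO sales_orders VALUES (1, 'EMEA', 100.0), (2, 'APJ', 250.0),
                                    (3, 'EMEA', 80.0);
    -- A reusable view modelling aggregated data for consumption:
    CREATE VIEW revenue_by_region AS
        SELECT region, SUM(net_value) AS revenue
        FROM sales_orders GROUP BY region;
""")
rows = con.execute(
    "SELECT region, revenue FROM revenue_by_region ORDER BY region"
).fetchall()
print(rows)
```

The same view-on-table pattern is what graphical and SQL modelers in Datasphere both produce, just with different tooling.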
5. Improved Data Governance with an enterprise-wide Data Catalog
When it comes to easier access to available data artifacts for business users, the Data Catalog comes into play.
SAP Datasphere provides an enterprise-wide data catalog. It helps business and IT users crawl, profile, classify, organize, and browse data (among other activities) for better discovery and data exploration. The catalog holds metadata, data lineage, a glossary, classifications, and so on. It therefore provides answers to common business questions such as: Where is the data coming from? What does this data mean in business terms? What data is available to continue delivering relevant business insights?
Using the catalog and the enabled Business Data Fabric, you can put in place a better data governance strategy, across multiple systems, for your organization.
More is coming with the integration enabled by the open data platform partnership with Collibra. I’ll cover more of this topic in Reason #10. Check the roadmap here.
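As a toy illustration of the lineage question (“Where is the data coming from?”), a catalog can be thought of as a graph of artifacts and their upstream sources; the artifact names below are invented:

```python
# Toy sketch of catalog lineage: each artifact records its direct upstream
# sources, and answering "where does this come from?" is a graph walk.
lineage = {  # artifact -> direct upstream sources (names are made up)
    "SalesDashboard": ["SalesAnalyticModel"],
    "SalesAnalyticModel": ["SalesOrdersView"],
    "SalesOrdersView": ["S4HANA.I_SalesOrder"],
}

def upstream(artifact):
    """Walk the lineage graph and collect every upstream source, in order."""
    sources = []
    for src in lineage.get(artifact, []):
        sources.append(src)
        sources.extend(upstream(src))
    return sources

print(upstream("SalesDashboard"))
```

A real catalog layers glossary terms, classifications, and profiling results on top of this same graph so both IT and business users can trust what they find.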
6. Leverage existing SAP BW investments and simplify your landscape
Many SAP customers have used, and still use, SAP BW as their data warehouse to model SAP and non-SAP data on premise. SAP is aware of the time and resources many organizations have invested, and knows as well the need to adopt cloud solutions like SAP Datasphere for faster innovation and scalability.
SAP Datasphere provides three main approaches to get the most out of SAP BW systems. I list them here:
- Move to the Cloud – BW Bridge: The SAP BW Bridge helps you simplify your data landscape while accelerating the transition from SAP BW to SAP Datasphere. It allows you to transfer your historical data and BW objects (such as InfoObjects, DataStore Objects, and Composite Providers), enabling the rich feature set of extractors and ABAP code for access to legacy SAP on-premise systems from SAP Datasphere, with the goal of ultimately decommissioning your on-premise BW system. This is the recommended option not only to simplify your data landscape but also to reduce TCO. Read more about the conversion paths and migration options to SAP Datasphere in this blog post by Deniz Osoy.
- Hybrid – BW/4HANA Model Transfer: If you are not ready to decommission your SAP BW yet and you are working with BW/4HANA, you can follow a hybrid approach with Datasphere thanks to the BW/4HANA Model Transfer. The Model Transfer connection automatically recreates the metadata structure (from the business layer) inside the Datasphere system, mirroring what you can see in BW/4HANA and giving you wider access to the BW objects. On top of that, you can leverage the remote tables in BW/4HANA, giving your non-BW users the ability to create new data models in Datasphere with an improved user experience.
- Hybrid – ABAP and HANA connections: If you are working with a BW system on HANA and are not ready to decommission it yet, you can also federate and/or replicate data for consumption scenarios in SAP Datasphere. This is thanks to the Operational Data Provisioning framework (using an ABAP connection) or by connecting to the external HANA views via an SAP HANA connection.
7. Reduced storage costs and improved performance
Oftentimes, when thinking about managing high data volumes, we think about data lake systems to reduce storage costs, but there is a fine line between lower storage costs and sacrificed performance. A common approach is to opt for typical file-based data lake systems, which on some occasions do meet business demands; but when it comes to optimized performance for querying business data, these file-based data lake systems tend not to be the best option.
A simple example is querying distinct counts: how many distinct sales have occurred across all of your stores? In a month, or in a year? This is very challenging in typical file-based data lake systems because it requires cross-referencing files to produce a result, and the more files you have, the more challenging it becomes.
The story is different with SAP Datasphere. Why? Because besides being a data warehouse built on top of the in-memory SAP HANA Cloud database, it also provides a relational, column-oriented data lake engine with SAP HANA Cloud Data Lake technology. This engine delivers high-performance analysis of petabyte volumes of relational data for cold data storage and is tightly integrated with the in-memory layer of HANA. Therefore, SAP Datasphere allows you to store high volumes of data at lower cost while providing optimized performance for querying both hot and cold data. Check this blog post by Stefan Morio to get more familiar with this embedded technology.
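The distinct-count example can be sketched in a few lines: with data scattered across files, each file only knows its local values, so an exact global count forces a merge across every file, which is precisely the cross-referencing cost a single columnar store avoids (toy data, not a benchmark):

```python
# Why exact distinct counts are hard on file-based lakes: duplicates span
# files, so per-file counts overcount and an exact answer needs a global
# merge across all files. (Invented sale IDs, purely illustrative.)
files = [  # each list simulates one file/partition of sale IDs
    ["S1", "S2", "S3"],
    ["S2", "S4"],
    ["S3", "S4", "S5"],
]

# Summing per-file distinct counts overcounts duplicates across files:
per_file_total = sum(len(set(f)) for f in files)

# The exact answer requires union-ing every file's values first:
exact = len(set().union(*files))

print(per_file_total, exact)  # 8 vs 5
```

With thousands of files, that union step is exactly the expensive reconciliation work a query engine over a single relational column store does not have to repeat per query.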
8. Business Content for SAP applications and Industries
Understanding that earlier access to business data is becoming ever more crucial to organizations, SAP Datasphere, along with SAP Analytics Cloud, provides over 250 content packages built by SAP industry experts or the SAP partner ecosystem, providing accelerated time-to-value.
Business Content includes prebuilt dashboards and predefined data models, tables, and views with built-in business understanding to connect and centralize your data sources, which also helps preserve business context beyond S/4HANA data. It is available for a great number of lines of business and industries, and it is delivered in a customizable way, allowing organizations to meet their unique business needs without having to start from scratch.
For more information, check here.
9. Machine Learning and Artificial Intelligence
Embedded Machine Learning and Advanced Analytics
When it comes to getting more insights from contextual business data, ML/AI comes in handy. That’s why SAP Datasphere leverages the embedded Machine Learning and Advanced Analytics capabilities of SAP HANA Cloud. You can apply data science without data extraction in SAP Datasphere. These capabilities include the ability to work with the native Python and R machine learning clients to trigger calculations in SAP Datasphere, and/or to remotely use the SAP HANA Cloud machine learning, spatial, text, and graph functions from Python or R. You can train your model and store the ML prediction results in an HDI container or Open SQL schema.
For more information, check here.
On top of SAP Datasphere’s out-of-the-box ML/AI capabilities, it also offers FedML: libraries provided by SAP that let hyperscaler technologies like Google Vertex AI, Databricks, AWS SageMaker, Azure ML, and others take virtual data from Datasphere, apply AI predictive modeling, and return the outcomes for further analytics and business consumption.
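The FedML pattern can be sketched conceptually: read business data from Datasphere without copying it elsewhere, train a predictive model on the hyperscaler side, and hand the results back for analytics. In the sketch below, a simple least-squares trend line is a local stand-in for Vertex AI / SageMaker / Azure ML, and all of the data is invented:

```python
# Conceptual stand-in for the FedML pattern (not the FedML library API):
# 1) data arrives from a Datasphere view, 2) a model is trained on the
# hyperscaler side, 3) predictions flow back for business consumption.
from statistics import mean

# Step 1: data as it might arrive from a revenue view (month, revenue).
history = [(1, 100.0), (2, 120.0), (3, 140.0), (4, 160.0)]

# Step 2: "train" -- ordinary least squares for a straight-line trend,
# standing in for a hyperscaler ML service.
xs, ys = zip(*history)
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in history)
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Step 3: score the next period and return the outcome for analytics.
prediction = {"month": 5, "predicted_revenue": intercept + slope * 5}
print(prediction)
```

The value of the pattern is in steps 1 and 3: the training data stays virtual in Datasphere, and the predictive output lands back next to the actuals for reporting.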
More is to come with SAP Analytics Cloud and the Just Ask capability. Just Ask is a Natural Language Query technology that will allow users to simply ask business-analytics-related questions from their SAP Analytics Cloud home page, enabling them to query acquired data models (in the controlled release planned for Q4 2023). General Availability is planned for a subsequent release.
The Just Ask technology comes from SAP’s acquisition of Askdata, which was announced in July 2022 (source). By that time, Askdata had over 8 years of experience in AI and ML analytics, particularly search-driven analytics, and the technology will come out of the box with SAP Analytics Cloud.
Stay tuned to the roadmap as it constantly provides updates on additional capabilities.
10. Openness and integration with open data partners
You are probably wondering how to go beyond SAP Datasphere, especially when working with non-SAP data.
Well, SAP has created partnerships with open data platforms for AI, data lakehouse, and catalog capabilities to simplify data landscapes and expand the AI possibilities.
- Collibra: The intention of the partnership with Collibra is to also gain visibility into data assets for non-SAP data by building a complete data catalog with lineage across the entire data landscape, and therefore better data governance.
- Confluent: Through this partnership, organizations will be able to manage their data in motion with real-time event streaming.
- Databricks: Organizations performing data engineering tasks with Databricks will benefit from the bi-directional integration between the Databricks Lakehouse and SAP Datasphere, where data can be shared without extraction while preserving the business semantics.
- DataRobot: Data scientists, regardless of their skill level, can use DataRobot to create ML and AI models, leveraging the integration with SAP Datasphere to access data with business meaning. After being built and trained in DataRobot, models can be exported for consumption by SAP AI Core, so other applications can benefit from them, such as custom apps built on BTP, Process Automation, Analytics & Planning, etc.
- Google Cloud: This partnership delivers on the promise of unifying data landscapes while amplifying the AI spectrum. Non-SAP data from BigQuery can be federated and/or replicated into SAP Datasphere to, again, combine SAP with non-SAP data. In addition, Google’s Vertex AI can be leveraged with SAP’s FedML technology to take virtual data from Datasphere, expand predictive requirements with Google’s technology, and bring the output results back to Datasphere, giving you both actual and predictive data.
Messer, Gases for Life, is a great case showing how Datasphere has helped an organization deliver greater benefits.
In the past, they tried to provide access to SAP and non-SAP data for their business functions via dedicated databases, but this only proliferated silos and multiple versions of the truth, and increased integration overhead and rework. Analytics was being done on extracted data in a third-party solution, IT lost the ability to fully govern the data, trust in the data was decreasing, and business context was being lost from the SAP data.
Now, they are simplifying their data landscape: with one data foundation (SAP Datasphere) integrating 12 data sources, minimizing data extraction thanks to data federation (Reason #2: Federated Data Access & Integration Flexibility) and maintaining the business context of the SAP data (Reason #1: Preserving Business Context).
In addition, hundreds of self-service users are leveraging the no-code approach for accessing and analyzing trusted data. For example, 100% of the data modeling tasks have been achievable using self-service tools (Reason #4: Democratized data access for self-service Data and Business Modeling). They also have the opportunity to leverage decades of SAP BW investments, now in the cloud thanks to the SAP BW Bridge (Reason #6: Leverage existing SAP BW investments and simplify your landscape).
The simplified data and analytics landscape is more cost-efficient and easier to maintain, while better serving the needs of the business.
SAP Datasphere is the evolution of data warehousing, bringing together many powerful capabilities to cover ever-growing technical and business data requirements.
A key word that I personally often use to describe SAP Datasphere is Flexibility, because it gives you the flexibility you need to cover your unique requirements: whether you get data in a federated or replicated way (with its various replication options), whether you follow a self-service, no-code approach for data modeling or model data with pro-code tooling, whether you create your own models or leverage the prebuilt content and start from there… I am using OR in these comparisons, but of course it can also be AND; the possibilities are immense. And all of this while still preserving the business context of the data, and therefore your Business Data Fabric.
Big thank you and shoutout to Tony Cheung for being a great mentor on my data journey.
Learn more about SAP Datasphere
Find out how to unleash the power of your business data with SAP’s free learning content on SAP Datasphere Learning and openSAP. Check out even more role-based learning resources and opportunities to get certified, all in one place, on the SAP Learning site.