Wolfgang_Epting
Product and Topic Expert


Summary: In my first blog, More than just a hype: Data Mesh as a new approach to increase agility in value creation from data, I explained the four principles of Data Mesh at a high level and mapped some helpful capabilities of the SAP Unified Data & Analytics portfolio to them. Throughout this series, I look at individual SAP products and explain in more detail which of their technical capabilities can support the introduction of a Data Mesh. Having covered SAP Data Warehouse Cloud, SAP HANA Cloud, and SAP Master Data Governance in the previous blogs, this time I will talk about SAP Data Intelligence Cloud and later continue with SAP Analytics Cloud.

To fully understand this blog, you should at least have read my first blog or a comparable publication, so that you are familiar with the four principles of Data Mesh and understand why centralized, monolithic data architectures suffer from some inherent problems. Mentions of the Data Mesh principles are set in bold.

SAP Data Intelligence Cloud is a comprehensive data management solution. As the data orchestration layer of SAP Business Technology Platform, it transforms distributed data sprawl into vital data insights, supporting innovation and business growth.

Dumb pipes and smart endpoints:

In the mindset of Data Mesh, pipelines are a rather unwelcome vehicle. Set up in the traditional ETL (Extract-Transform-Load) style, they are fragile: any change in an upstream system causes bugs that have to be painstakingly found and fixed. Central administration is also seen as counterproductive, as it inhibits the agility of domain teams. Pipelines should be considered part of the data products and rather dumb, meaning they contain as little logic as possible and do not extend beyond the boundary of a data product. In contrast, the filters of the downstream consumers should be rather smart, so that data pipelines are reduced to pure transmission instruments.
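To make this distinction concrete, here is a minimal, generic sketch in Python. It is purely illustrative and not tied to any SAP API; all names and fields are assumptions for the example. The pipe merely transmits records unchanged, while the consuming endpoint applies its own selection logic:

```python
from typing import Callable, Iterable, Iterator

# "Dumb pipe": transmits records unchanged, no business logic,
# so upstream schema changes do not break transformation code here.
def dumb_pipe(source: Iterable[dict]) -> Iterator[dict]:
    for record in source:
        yield record  # pure transmission, no filtering or enrichment

# "Smart endpoint": the consumer decides what it needs and applies
# its own selection logic at the boundary of its data product.
def smart_endpoint(records: Iterable[dict],
                   predicate: Callable[[dict], bool]) -> list[dict]:
    return [r for r in records if predicate(r)]

# Example: a sales domain consumer keeps only EMEA revenue records.
source = [{"region": "EMEA", "revenue": 120}, {"region": "APJ", "revenue": 80}]
emea_only = smart_endpoint(dumb_pipe(source), lambda r: r["region"] == "EMEA")
print(emea_only)  # [{'region': 'EMEA', 'revenue': 120}]
```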

On the other hand, the need for cleansing, preparation, aggregation, and sharing of data remains, regardless of whether a central or a Data Mesh approach is pursued. In addition, following the principle of "valuable on its own", data products should have inherent value, which can be further increased by combining them with other data products or by appending value-adding elements such as AI models. Data Mesh posits, and this is the fundamental shift, that pipelines are managed within the domains rather than as central artifacts owned by the infrastructure IT team, and are seen exclusively as part of data products.

This leads to the question of what capabilities a platform on which such pipelines are developed must offer. It must be flexible enough to support simple data transfer on the one hand and the seamless integration of complex AI models on the other. In addition, it must abstract from technical details for which skills are not available in the domain teams, so that simple, intuitive setup and operation become possible. Not to be forgotten is the possibility of continuously changing and redeploying pipelines. Results must be available in various formats, e.g. as APIs, so that downstream consumers, as described above, can apply smart selection criteria at the endpoints for further processing.
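As an illustration of smart selection criteria at the endpoints, the following hypothetical sketch exposes a data product through a small REST API in which the consumer, not the pipeline, decides on the filter. FastAPI is used purely as an example framework; the endpoint path and field names are assumptions:

```python
from fastapi import FastAPI

app = FastAPI()

# Illustrative in-memory "data product"; in practice this would be
# served from the domain's own storage (tables, files, events).
ORDERS = [
    {"order_id": 1, "region": "EMEA", "status": "open"},
    {"order_id": 2, "region": "APJ", "status": "closed"},
]

# The endpoint is "smart": consumers pass their own selection
# criteria as query parameters instead of relying on pipeline logic.
@app.get("/data-products/orders")
def read_orders(region: str | None = None, status: str | None = None):
    result = ORDERS
    if region is not None:
        result = [o for o in result if o["region"] == region]
    if status is not None:
        result = [o for o in result if o["status"] == status]
    return result

# Run with: uvicorn this_module:app --reload
# A consumer then calls e.g. GET /data-products/orders?region=EMEA
```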

The SAP Data Intelligence Cloud Modeler uses a flow-based programming paradigm to create data processing pipelines, also known as graphs. SAP Data Intelligence supports the creation of flexible structured/unstructured/streaming, batch/(near) real-time, OLTP, and OLAP pipelines with a clear end-to-end understanding of data lineage and usage across the connected landscapes. Jupyter Notebooks can be seamlessly integrated into the Pipeline Modeler application, which makes it possible to use them directly for the operationalization of training and production pipelines, while your data scientists keep using the tools they know. REST APIs provide powerful ways to unlock Data Mesh relevant scenarios by automating the creation, modification, and execution of a pipeline. Parametrized pipelines can be configured and executed according to external factors from remote systems. Continuous integration and continuous delivery/deployment can be achieved by developing, packaging, and testing artifacts using your company's enterprise GitHub repository.
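As a sketch of the REST-based automation mentioned above, the following Python snippet triggers a parametrized pipeline run from a remote system. The endpoint path, graph name, parameter names, and authentication shown here are assumptions for illustration only; the exact API depends on your tenant and release, so please consult the SAP Data Intelligence API documentation:

```python
import requests

# Hypothetical sketch: triggering a parametrized pipeline (graph) in
# SAP Data Intelligence Cloud via REST. Host, path, graph name, and
# parameter names below are assumptions for illustration.
BASE_URL = "https://my-di-tenant.example.com"
ENDPOINT = "/app/pipeline-modeler/service/v1/runtime/graphs"

payload = {
    "src": "com.example.sales_ingest",    # fully qualified graph name (assumed)
    "configurationSubstitutions": {       # external factors parametrizing the run
        "SOURCE_SYSTEM": "CRM_EU",
        "LOAD_DATE": "2022-06-30",
    },
}

response = requests.post(
    BASE_URL + ENDPOINT,
    json=payload,
    auth=("tenant\\user", "password"),    # replace with your auth mechanism
    headers={"X-Requested-With": "XMLHttpRequest"},
)
response.raise_for_status()
print("Pipeline run started:", response.json())
```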

 

SAP Data Intelligence Cloud Modeler:

Basic pipeline

Serve Data Design Properties:

A data product serves domain-oriented data to a diverse set of analytical consumers with diverse profiles, including the need for longitudinal data and a consistent view of multiple domains at a point in time. The characteristics multimodal, immutable, bitemporal, and read-only access are integral to the working of Data Mesh. While serving diverse consumers natively, a data product must share the same domain semantics in different syntaxes, e.g. as columnar files, relational database tables, events, or other formats, without compromising the experience of the data consumer. Immutability means that once a piece of data is processed and made available to the data users, it cannot be deleted or updated. Bitemporal data modeling makes it possible to serve data as immutable entities and enables temporal analysis and time travel, i.e., looking at past trends and predicting future possibilities, which is essential for Data Mesh.
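To illustrate bitemporality and immutability, here is a minimal, generic Python sketch (not an SAP API): every fact carries both a valid time and a transaction time, and corrections are appended rather than updated, so past states remain queryable as time travel. All names are assumptions for the example:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)          # frozen: records are immutable once created
class PriceFact:
    product: str
    price: float
    valid_from: date             # valid time: when the fact holds in the real world
    recorded_at: date            # transaction time: when we learned about it

# Append-only store: corrections are new rows, never updates or deletes.
facts = [
    PriceFact("P-100", 9.99, valid_from=date(2022, 1, 1), recorded_at=date(2022, 1, 1)),
    # Late correction recorded in March, retroactively valid from January:
    PriceFact("P-100", 8.99, valid_from=date(2022, 1, 1), recorded_at=date(2022, 3, 15)),
]

def price_as_known_on(product: str, as_of: date) -> float:
    """Time travel: reconstruct what we believed on a given day."""
    known = [f for f in facts if f.product == product and f.recorded_at <= as_of]
    return max(known, key=lambda f: (f.valid_from, f.recorded_at)).price

print(price_as_known_on("P-100", date(2022, 2, 1)))  # 9.99 (correction not yet known)
print(price_as_known_on("P-100", date(2022, 4, 1)))  # 8.99 (after the correction)
```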

SAP Data Intelligence Cloud offers comprehensive possibilities to enable the described data design properties. Data products can be provided in all possible formats, whereby immutability and bitemporality are ensured by the type of provision, e.g. streaming, event processing, change data capture, and the settings of the receiving medium.
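The following generic sketch (not specific SAP Data Intelligence operators; file and table names are assumed) shows the idea of multimodal serving: one data product keeps the same domain semantics while being delivered in several syntaxes:

```python
import json
import sqlite3
import pandas as pd

# One data product, one domain semantic ...
orders = pd.DataFrame([
    {"order_id": 1, "region": "EMEA", "revenue": 120.0},
    {"order_id": 2, "region": "APJ", "revenue": 80.0},
])

# ... served in several syntaxes without changing its meaning:

# 1. Columnar file for analytical consumers (requires pyarrow or fastparquet).
orders.to_parquet("orders.parquet", index=False)

# 2. Relational table for SQL consumers.
with sqlite3.connect("orders.db") as conn:
    orders.to_sql("orders", conn, if_exists="replace", index=False)

# 3. Event stream for reactive consumers (here just serialized messages).
events = [json.dumps(row) for row in orders.to_dict(orient="records")]
print(events[0])  # {"order_id": 1, "region": "EMEA", "revenue": 120.0}
```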

SAP Data Intelligence Cloud - turn data chaos into data products:

Connect and process data from various sources to enable Data Mesh Design Properties

Discover, Understand, Trust, and Explore:

Data Mesh defines discoverability, understandability, trustworthiness, and explorability as some of the intrinsic characteristics of a data product. What makes the approach unique is that data must be discovered, understood, and trusted in a decentralized mesh of interconnected and autonomous data products, without creating centralized bottlenecks.
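One common way to make a data product discoverable and understandable without a central bottleneck is a self-describing metadata record that the owning domain publishes to the catalog. The following sketch is generic and hypothetical; it does not show the SAP Data Intelligence catalog format, and all field names are assumptions:

```python
# Hypothetical, self-describing data product descriptor that the owning
# domain publishes so consumers can discover, understand, and trust the
# product. Field names are illustrative only.
data_product_descriptor = {
    "name": "sales.orders",
    "domain": "sales",                        # domain ownership, not central IT
    "owner": "sales-data-team@example.com",
    "description": "Confirmed customer orders, bitemporal, append-only.",
    "output_ports": [                         # multimodal serving
        {"syntax": "parquet", "location": "s3://sales/orders/"},
        {"syntax": "relational", "location": "hana://SALES.ORDERS"},
    ],
    "quality": {"completeness": 0.998, "freshness_sla_minutes": 15},  # trust
    "schema": {"order_id": "int", "region": "string", "revenue": "decimal"},
}
```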

SAP's statement of direction describes a comprehensive Data Catalog as an organized inventory of company-wide metadata and data assets that enables business and technical users to unlock the full potential of their enterprise data. It helps to rapidly discover, understand, and trust data and to instantly generate impactful business insights and actions.

Such a catalog is predestined to provide the Discover, Understand, Trust, and Explore functionalities described above for data products. The SAP Catalog offers technical users the possibility to further process data products, and business users the possibility to consume them. The SAP Data Intelligence Cloud Catalog has always been and will remain an open catalog, as can be read in this blog.

SAP Data Catalog - Take, Curate, Deliver:

SAP Data Catalog Innovation

The author would like to thank frederikmichael.hopt for the collaboration on this topic and his contributions to this article.

Further Information:

If you have any further questions about SAP Business Technology Platform or specific elements of our platform, leave a question in SAP Community Q&A or visit our SAP Community topic page.

