Author: Peter Baumann

SAP Datasphere – Q&A and Partnerships

On 8 March 2023, SAP launched SAP Datasphere and, on the same day, renamed the existing SAP Data Warehouse Cloud tenants to the new name.

SAP Datasphere is meant to deliver a Business Data Fabric that offers seamless access to your data, regardless of where it is stored.

In the session, a lot of questions were asked and answered by different SAP employees (thank you for this dialog!) about what this means and what the impact on other SAP solutions is. In Part 1 below, I have compiled the most important questions I saw in the session. After that, I write down my first thoughts and considerations about the new strategic partnerships.

Part 1 – Q&A:

See also the external SAP Datasphere FAQ, which has just been updated.

—– General Questions —–

Q: Will the Datasphere eventually replace DWC?

A: SAP Datasphere is the evolution of SAP Data Warehouse Cloud.

Q: What does "evolution" mean when saying that SAP Datasphere is the evolution of SAP DWC? Will DWC be renamed/rebranded or will they co-exist?

A: It is a rebranding. SAP DWC will become SAP Datasphere.

Q: What is the difference/advantage of SAP Datasphere over SAP Data Warehouse Cloud?

A: SAP Datasphere is the evolution of SAP Data Warehouse Cloud. It includes many new features, including a global data catalog, deep partnerships, and an advanced analytic model.

—– SAP Data Intelligence —–

Q: Will you provide a clear roadmap for SAP Data Intelligence today?

A: SAP Data Intelligence Cloud will continue as a supported product with ongoing innovation and investment.

Q: Is SAP Data Intelligence embedded within Datasphere?

A: SAP Data Intelligence Cloud will continue as its own product, but many of its capabilities will be included in SAP Datasphere as well. The underlying engines that handle data movement are the same, and in that sense, yes, it is embedded. However, there are still differences between the two solutions. For example, SAP Data Intelligence Cloud runs on fully dedicated infrastructure per customer, whereas SAP Datasphere is a true multi-tenant, shared-infrastructure solution.

Q: Will all features of SAP Data Intelligence be included in SAP Datasphere? Or what would be a reason to have both products?

A: SAP Data Intelligence Cloud continues as its own solution. We intend to have SAP Datasphere and SAP Data Intelligence Cloud co-exist until SAP Datasphere supports all SAP Data Intelligence Cloud customer use cases. The plan is that SAP Datasphere will eventually be able to cover all the major capabilities, target systems, and use cases that SAP Data Intelligence Cloud provides. We plan to also provide tools to facilitate the technical transition.

—– Integration Aspects —–

Q: Is SAP Datasphere purely federated, or does it still require persisted data in the cloud data model when combining different data sources for analytics?

A: SAP Datasphere follows a federation-first approach. This means you leave the data where it resides, build your models, and decide later whether you want to replicate full tables or create view persistencies to account for source system workload, data egress, and performance towards the end user.
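
The difference between federating and persisting can be illustrated with a small, generic sketch. It does not use any SAP Datasphere API: an in-memory SQLite database stands in for the remote source, and the names `sales_source`, `v_sales` and `v_sales_persisted` are purely hypothetical. The view reads the source at query time, while the persisted copy is a snapshot that only changes when it is refreshed.

```python
import sqlite3

# In-memory SQLite stands in for a remote source system (assumption for illustration).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_source (region TEXT, amount INTEGER)")
con.executemany(
    "INSERT INTO sales_source VALUES (?, ?)",
    [("EMEA", 100), ("APJ", 50)],
)

# "Federated" access: the view stores no data and reads the source at query time.
con.execute(
    "CREATE VIEW v_sales AS "
    "SELECT region, SUM(amount) AS total FROM sales_source GROUP BY region"
)

# "Persisted" access: the current view result is materialized as a table.
con.execute("CREATE TABLE v_sales_persisted AS SELECT * FROM v_sales")

# The source changes after the snapshot was taken.
con.execute("INSERT INTO sales_source VALUES ('EMEA', 25)")


def total(relation, region):
    """Return the total for one region from the given view or table."""
    row = con.execute(
        f"SELECT total FROM {relation} WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


live_total = total("v_sales", "EMEA")                # sees the newly inserted row
snapshot_total = total("v_sales_persisted", "EMEA")  # still the value at snapshot time
print(live_total, snapshot_total)
```

The trade-off mirrors the answer above: the view always hits the source (workload and egress on every query), while the persisted table is cheaper to query but can be stale until it is refreshed.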

Q: What type of mechanism does Datasphere use for real-time replication?

A: SAP Datasphere will support a cloud-native, trigger-based real-time replication mechanism that allows large data sets to be replicated efficiently. You will find this functionality as “Replication Flow” in SAP Datasphere.

Q: How are you planning to enable seamless integration with on-premise solutions? Will DP Agent be there for the long run?

A: For SAP S/4HANA and ERP systems, we will make use of DMIS (part of SAP Landscape Transformation) and CDS views to integrate data in near real time with initial and delta replication. For other on-premise solutions, we will initially use DP Agent, but over time our intent is to enable seamless integration without requiring an on-premise agent.

Q: Is Datasphere replication able to move on-premise data to on-premise targets without a roundtrip to the cloud?

A: Our main focus with SAP Datasphere is to replicate data into the cloud and distribute it further. We are planning for hybrid scenarios where the workload could be executed, for example, on premise while being orchestrated in the cloud, to avoid the roundtrip.

—– BW/4HANA & SAC —–

Q: Where does SAP BW/4HANA fit into SAP Datasphere?

A: SAP BW/4HANA objects and other objects can be imported via SAP Datasphere, BW bridge, which lets users access their data and models via a workspace within SAP Datasphere.

Q: How will these products interact with SAP Analytics Cloud?

A: SAP Datasphere is tightly integrated with SAP Analytics Cloud to support analytics and planning use cases. We intend to further strengthen the integration with the release of the Analytic Model in SAP Datasphere. The Analytic Model offers a multi-dimensional modeling experience and comes with powerful new features, such as calculated and restricted measures, exception aggregations and the pruning of attributes and measures.

Part 2 – Strategic Partnerships:

SAP announced four new strategic partnerships to support the idea of a Business Data Fabric and to better support the integration of non-SAP data in a unified usage context.

Databricks

Databricks was founded by the creators of Apache Spark and has built a complete ecosystem around it, delivering the Data Lakehouse based on a multi-cloud strategy similar to SAP's. In simple terms, a Data Lakehouse brings together modern file formats (Apache Parquet (Databricks), Apache ORC, Apache Avro) and open table formats (Delta Lake (Databricks), Apache Iceberg, Apache Hudi) with a powerful, distributed query engine to process the data (Photon for Databricks).

The advantage of a Data Lakehouse is that you have all kinds of data (structured, semi-structured, unstructured) in one tier and serve all your data roles, such as data engineer, data analyst, data scientist, BI modeler and so on, from this tier.

Databricks coined the term “Data Lakehouse” and is the top partner in this area, even if others provide Data Lakehouse technologies, too.

See also: SAP Datasphere & Partnerships – Databricks

 

Collibra

If you look at market research from Forrester or BARC on Data Catalog, Data Intelligence or Metadata Management, Collibra is typically one of the top three solutions in this area (typically together with Alation and Informatica or IBM).

SAP has some history with metadata management and has already delivered it, e.g. with SAP Information Steward, SAP PowerDesigner and also in other data and analytics solutions. SAP offers Data Catalog functionality within SAP Data Intelligence and has been building up more and more capabilities within SAP Data Warehouse Cloud.

So in a world where data assets are distributed all over the company and an overview and understanding of our data is getting more and more important, a data catalog is clearly recommended and will become a cornerstone of the data culture within data-driven companies.

Collibra has recently expanded its platform with data quality management and data observability capabilities, together with a partner ecosystem.

See also: SAP Datasphere & Partnerships – Collibra

 

Confluent

Confluent, similar to Databricks, is a company built on another important open source software for data management: Apache Kafka. If you have streaming data in your company, you can hardly avoid taking a look at Kafka. Confluent delivers Kafka from the cloud as a service with an optimized ecosystem.

For data-driven companies, the speed of collecting and processing data in near real time is getting more and more important. If you search the SAP Community, you will find that Kafka is a regular topic here, too.

See also: SAP Datasphere & Partnerships – Confluent

 

DataRobot

DataRobot is a pioneer of Automated Machine Learning and delivers a broad AI platform today. In the Forrester Wave Q3/2022, DataRobot is seen as a Leader for AI/ML Platforms, whereas Databricks holds the position of a Strong Performer.

This was maybe the most surprising partnership, as I have seen SAP making good progress in expanding its AI capabilities based on HANA, Augmented Analytics (e.g. via APL or SAP Analytics Cloud predictive features), or the relatively new offering SAP AI Core. But here we also see that these tools and services are mostly used together with SAP solutions. So to expand to non-SAP data and use cases, this could be the right way.

More about the current state of this partnership can be read in this statement.

See also: SAP Datasphere and Partnerships – DataRobot by Farooq Azam

Conclusion

If I look into my SAP Datasphere tenant today (formerly SAP Data Warehouse Cloud), I only see the announced new features (Analytic Model, Catalog, Replication Flow). More will come on the partner side as well as on the SAP side. Even if integration is already possible today, more is expected, as the comments on the blog “Unified Analytics with SAP Datasphere & Databricks Lakehouse Platform- Data Federation Scenarios” show.

I remember the announcement of SAP Data Hub (now SAP Data Intelligence), where SAP also announced openness to other vendors, partnerships and a very similar vision (just from my memory). In a world where end-user companies are less and less bound to single vendors because they need to make the best of their data, openness and partnerships are essential. Transferring capabilities from SAP Data Intelligence into SAP Datasphere will be the right decision, as SAP DI will bring in further capabilities essential for a real data fabric.

SAP has chosen to start with top vendors in the market, which seems to be the right approach. I would be happy if this works out well and also opens the minds of internal SAP-only advocates in the area of data and analytics. SAP is still at the top when it comes to creating and handling business data, but the process side is different from the data side. SAP has a big footprint in many companies, but it is typically not the only player. Data warehouses have been there to solve this in the past. In today's hyper-fast hybrid world, approaches have to evolve, and a Business Data Fabric, done right, shows the right way.

I am happy to hear what you think and whether you already have some experience with how the solutions from these partnerships play together with SAP.

 

 


      10 Comments
      Werner Dähn

      Thanks, very insightful.

      There is a contradiction in the messaging of Datasphere. You want to get rid of an agent in the onPrem system, yet you will be able to execute a dataflow within the confines of the onPrem system if source and target are onPrem, without moving all data to the cloud and back. But what executes the dataflow if there is no software in the onPrem system?

      As I architected SDI and DPAgent I fully understand the desire to remove any onPrem installation requirement. The same question was raised during the inception of SDI as well. The reasons in short: security, performance, flexibility in processing, locality of processing, extensibility.

      In more detail, these were the arguments that led to the DPAgent's existence:

      1. With a software component in the onPrem system you only need a single port opened from the cloud to onPrem. Sort of the same reason why you install a Cloud Connector. So my guess is, instead of DPAgent you will require Cloud Connector in future. As most customers have Cloud Connector installed for other reasons, that is a fair point.
      2. Let's say your onPrem sources are Oracle, SQL Server and a Windows file server (Samba network share). Then Cloud Connector must poke three holes into the corporate firewall, one of which could potentially allow access to all files.
      3. Cloud Connector is currently limited to very few sources. Sure it can be extended to more ports and protocols. But why was it limited to just RFC and Hana SQL in the past and did not open all ports? For a good reason.
      4. I don't believe that even Cloud Connector will allow reading files from onPrem. This limits the number of potential sources to just the typical database sources. Note that advanced database scenarios require access to database transaction log files, so you would lose some CDC functionality forever.
      5. Even if no DPAgent is to be installed at the customer, you won't incorporate the functionality into the DWC core. It will always be a microservice of some sort.
      6. Imagine you want to join data from two onPrem databases. With an agent in the customer environment, the agent can execute the two SQL statements, join the data and return the result. Without an agent, all data has to be moved to DWC and joined there. DPAgent cannot do that today, but it was envisioned.
      7. Data integration is not an 80:20 discipline. If the customer wants to move data from A to B or use transformation C and the data integration tool does not support that 100%, then the tool cannot be used for that use case. Period. So it is important that you can access all systems and have options for all others. In SDI that is the Adapter SDK.
      8. Again for security, there are some settings you want to make in the corporate IT center and not via the cloud. You do not install Cloud Connector and then go to the cloud to restrict its access; you do that during the Cloud Connector installation. The same goes for data access: you define which databases are accessible, which file shares are visible, and so on.

      Hence my advice would be to make the DPAgent a tiny microservice again. Currently, if you install the Databricks connection, the DPAgent contains all adapters, a full SQLite installation, a full DataServices installation and lots of programs. It could be a tiny container image with a 20MB DPAgent installation instead of a 1GB binary to be installed.

      Note: Contrary to common belief, a single DPAgent can serve multiple sources, multiple DWC instances and multiple tenants. The DPAgent is just a bridge without any connection information (except for point 8 above).
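
Point 6 of the comment above can be made concrete with a toy sketch. It is purely illustrative and does not reflect how DPAgent or SAP Datasphere actually work (the comment itself says DPAgent cannot do this today). The tables `orders` and `customers` and the function `join_at_agent` are hypothetical, and the comparison deliberately ignores optimizations such as filter pushdown. It only counts how many rows would have to leave the on-premise network in each case.

```python
# Two hypothetical on-premise tables, represented as lists of dicts.
orders = [{"order_id": i, "customer_id": i % 1000} for i in range(10_000)]
customers = [
    {"customer_id": c, "country": "DE" if c % 2 == 0 else "US"}
    for c in range(1000)
]


def join_at_agent(orders, customers, country):
    """Join and filter next to the sources; only the result leaves the network."""
    wanted = {c["customer_id"] for c in customers if c["country"] == country}
    return [o for o in orders if o["customer_id"] in wanted]


# With a local agent: only the joined and filtered result is transferred.
rows_sent_with_agent = len(join_at_agent(orders, customers, "DE"))

# Without an agent: both full tables are transferred and joined in the cloud.
rows_sent_without_agent = len(orders) + len(customers)

print(rows_sent_with_agent, rows_sent_without_agent)
```

The more selective the join, the larger the gap between the two numbers, which is the locality-of-processing argument made in the comment.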

      Peter Baumann
      Blog Post Author

      Thank you Werner Dähn for your extensive comment. From my point of view it would be worth a blog on its own.

      To differentiate what used to be DWC from what shall now be a (Business) Data Fabric, I expect SAP will work on many aspects, including the bridge between on-premise and cloud and connectivity in general. I think every discussion here also raises product management's awareness that a Data Fabric sets a high standard which SAP has to deliver on, rather than letting it be just a new name with some new features that were already on the roadmap.

       

      sarthak srivastava

      Excellent blog.

      Is SAP trying to promote Datasphere for data modelling in the future?

      It has a lot of features in common with HANA Cloud modelling, and HANA Cloud modelling is obviously more advanced as of now.

      So where does HANA Cloud fit in terms of data modelling compared to Datasphere?

      How can clients who are looking specifically for a data modelling tool choose between HANA Cloud and Datasphere?

      Thanks & Regards,

      Sarthak Srivastava

      Peter Baumann
      Blog Post Author

      Hi Sarthak Srivastava !

      Thank you for your comment! As SAP HANA Cloud is the technical basis of the former SAP Data Warehouse Cloud - now SAP Datasphere - I expect you will still have access to the HANA Cloud underneath in the future, but more modelling possibilities, such as the Analytic Model, will come to the SAP Datasphere tier.

       

      Jyoti Sankar Sahu

      Hi Peter,

      Very nice blog. We are facing an issue with SAP Datasphere. Could you please help with a suggestion?

      I have posted that in the below link.

      https://answers.sap.com/questions/13866805/odp-datasource-not-available-in-datasphere.html

      Regards,

      Jyoti.

      Benedict Venmani Felix

      Hi Peter Baumann ,

      I am able to understand the integration with 3 of the data partners. But how does SAP Datasphere work with Collibra for data cataloging? Do you have additional information or links on this?

      Will Collibra be an add-on subscription service to Datasphere?

       

      -Benedict

      Peter Baumann
      Blog Post Author

      Hi Benedict Venmani Felix !

      From my understanding, the next step is to provide SAP metadata to Collibra and to enhance this step by step towards a bi-directional exchange. Collibra is a strong solution for metadata management/data catalog today. Getting metadata out of SAP is not so easy and is today mainly done via the Silwood Safyr integration at Collibra. I expect more announcements, but currently there is not much detail about the planned steps.

      Building up a data catalog is not so easy, especially for a broad range of sources. To deliver a data fabric, it is necessary to include a data catalog solution. The SAP Datasphere catalog is a good beginning, but it has a long way to go before it reaches, if ever, a maturity similar to Collibra's.

      Best regards,

      Peter

      Kalyan Abburi

      Hi All,

       

      Datasphere is in the cloud and consumes data from S/4HANA on-premise. If we move the existing S/4HANA system to the cloud, will the same extraction still work for Datasphere, or are separate connections required? Do we lose any connections to CDS views or replications? What impact can we expect?

       

      Best Regards,

      Kalyan Abburi

      Peter Baumann
      Blog Post Author

      Hi Kalyan Abburi !

      From my experience there are differences whether you just lift your S/4HANA to cloud (IaaS) or if you migrate to S/4HANA Cloud Service.

      But I would recommend asking this question in the Q&A area of SAP Community.

       

      BR

      Peter

      . Partner

      Werner Dähn you have said:

      Note: Contrary to common belief, a single DPAgent can serve multiple sources, multiple DWC instances and multiple tenants. The DPAgent is just a bridge without any connection information

      Can you please tell us how to connect one DPAgent to SAC, DWC and on-prem HANA in parallel? It looks like the parameter hana.server has to be used for both on-prem and SAC, but how can this work?

      There is only 1 config file:  dpagentconfig.ini

      # Used for On-Prem HANA
      hana.onCloud=false
      hana.port=443
      hana.server=xx.onprem.local.com

      # Used for SAC
      hana.onCloud=true
      hana.port=443
      hana.server=sdi-connector-sac-saceu10.cfapps.eu10.hana.ondemand.com

      # Used for DWC Connect
      jdbc.enabled=true
      jdbc.encrypt=true
      jdbc.failover.hosts=
      jdbc.host=xxx.hana.prod-eu10.hanacloud.ondemand.com

      Thank you very much
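
The concern in the question above can be reproduced with a small sketch. It assumes the file is read as a flat list of key=value pairs in which a later assignment overrides an earlier one; how DPAgent actually parses dpagentconfig.ini is not confirmed here, and `parse_flat_properties` is a hypothetical helper written only for this illustration. Under that assumption, the second `hana.server` entry silently replaces the first, so a single flat file could only point at one HANA target.

```python
def parse_flat_properties(text):
    """Parse key=value lines into a dict; a later key overrides an earlier one."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Excerpt modeled on the question above (host names are the placeholders used there).
config_text = """
# Used for On-Prem HANA
hana.onCloud=false
hana.port=443
hana.server=xx.onprem.local.com

# Used for SAC
hana.onCloud=true
hana.port=443
hana.server=sdi-connector-sac-saceu10.cfapps.eu10.hana.ondemand.com
"""

settings = parse_flat_properties(config_text)
print(settings["hana.server"])   # only the last assignment survives
print(settings["hana.onCloud"])
```

If the real parser behaves differently (for example, by rejecting duplicate keys), the outcome would differ, but the two blocks would still compete for the same parameter names.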