Now that our customer has gone live with SAP Data Intelligence, I thought it was a good time to sum up the impressions and experiences we had with this (*spoiler*) IMHO impressive tool. Not all functionality will be discussed, so I suggest the interested reader have a look at the latest open.sap.com training about SAP DI (“SAP Data Intelligence for Enterprise AI”, https://open.sap.com/courses/di1)
Back in 2016 I started to think about how data originating in ERP systems, and reflected in the SAP-based data warehouse solutions of many of our customers, could be brought together with the data in data lakes, which were steadily gaining importance as data pools within larger corporations. This included working with SAP Vora and teaching the HA500 training, which “forced” me to get a better understanding of Hadoop.
Why was this integration of interest? On the one hand, the significant growth of data could be handled by the more affordable way of managing it within this infrastructure. On the other, it was easier for data scientists and their tools to analyse this data with advanced data introspection and interpretation methods (data mining, AI, etc.). The results could be of major relevance back in the ERP systems, for example as enhanced master data for materials, suppliers or customers, or as part of more intelligent transactions, following the idea of the “Intelligent Enterprise” as proposed by SAP.
Back then we designed the following high-level integration chart and ran our first projects, mostly PoCs, to make these data interchanges possible. Connections were point-to-point, and it was still necessary to have a good technical understanding of each participating peer.
But shortly after, SAP announced SAP Data Hub, which promised to cover many of the problems we were observing, as well as those appearing on the horizon once you scale the picture up to larger companies with dozens or hundreds of potentially interesting systems to be integrated into a company-wide network of data pools, for which one day there may be the need to create unified data sets for further knowledge acquisition and optimization.
Technology-neutral data accessibility
SAP Data Hub, now SAP Data Intelligence, introduced unified interfaces for accessing very heterogeneous data sources without having to know the technical details of the software behind these data pools. Once connected, a data transformation developer can access SAP-based data without knowing about BW queries, Hive tools and requirements, or the specifics of the underlying RDBMS. This was especially attractive. A list of currently supported connection types can be found here. It covers 37 connection types, among them
- SAP-oriented ones (BW, CDS, ABAP & SAP HANA),
- Cloud-oriented ones from providers like AWS, Azure, GCP or Alibaba Cloud,
- Hadoop-oriented ones like Kafka and HDFS, but also specific RDBMSs like Oracle, MS SQL or MySQL,
- More generic ones like OData and others.
Please check out SAP Note 2693555 – SAP Data Hub 2 and SAP Data Intelligence 3: Supported Remote Systems and Data Sources for further details (Valid user required).
There is also an integration with SAP Analytics Cloud available, which is becoming of interest in the environments we work in (link to blog).
This technology-neutral approach from a source-system perspective is realized with the SAP DI Metadata Explorer application and is described in the “Self-Service Data Preparation User Guide” (link).
Advanced data transformations between heterogeneous data pools
In addition, SAP Data Hub offered the possibility to create data pipelines between the participating systems, allowing a wealth of data transformations, including advanced data treatment, the application of neural networks, and format changes.
Back in 2019, at SAP Inside Track BCN, we presented a data pipeline that classified images (i.e. unstructured data that can’t be analysed by e.g. SQL) residing in an HDFS cluster and forwarded the classification results to an SAP HANA database after scoring the images against a TensorFlow service. Many more scenarios of transforming unstructured data to make it “analysable” are possible and add real value.
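To make the shape of such a pipeline more tangible, here is a minimal plain-Python sketch of its logic. The HDFS reader, the TensorFlow scoring service and the SAP HANA writer are replaced by stand-in functions; all names are illustrative and not the actual SAP DI operator API.

```python
# Illustrative sketch of the image-classification pipeline logic.
# In SAP DI each step would be a graph operator; here plain functions
# stand in for the HDFS reader, the TensorFlow scoring service and
# the SAP HANA writer.

def read_images(paths):
    """Stand-in for an HDFS file reader: yields (path, raw bytes)."""
    for path in paths:
        yield path, b"<raw image bytes>"

def score_image(raw_bytes):
    """Stand-in for a call to a TensorFlow scoring service.
    Returns a (label, confidence) tuple."""
    return ("cat", 0.97)

def to_hana_row(path, label, confidence):
    """Stand-in for an SAP HANA insert: builds one result row."""
    return {"FILE": path, "CLASS": label, "CONFIDENCE": confidence}

def run_pipeline(paths):
    """Wire the three stages together, as the pipeline graph would."""
    rows = []
    for path, raw in read_images(paths):
        label, conf = score_image(raw)
        rows.append(to_hana_row(path, label, conf))
    return rows

rows = run_pipeline(["/data/img_001.png", "/data/img_002.png"])
print(rows[0])  # the first classification result that would land in HANA
```

In the real pipeline, each of these stand-ins is a separate operator connected through ports in the Pipeline Modeler, so the scoring step can be swapped or scaled independently of the readers and writers.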
A number of pipelines and operators are already shipped with SAP Data Intelligence; for a full list, please go to the “Repository Objects Reference” of the SAP DI help. The generated PDF file is 694 pages long.
“Bring your own language”
One of the aspects I personally like most is the possibility to use many different languages within your developments. Currently there are operator containers for:
- JavaScript
- Python
- Go
- R
This means that a data scientist/analyst with sound knowledge of any of the above languages can create their logic on data coming from a source with which they are likely less familiar (e.g. an SAP ERP system based on ABAP), by using the Metadata Explorer described above and the Pipeline Modeler to apply their logic to it. All this happens within the scope of activities allowed by the SAP Data Intelligence security concept.
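Script operators in the Pipeline Modeler follow a port-callback model: the script registers a function for an input port and emits results on an output port. The self-contained Python sketch below illustrates that model with a mock runtime written for this post; the `MockOperatorRuntime` class is a stand-in, not the actual SAP DI operator API.

```python
# Mock runtime illustrating the port-callback style of script operators:
# register a callback for an input port, emit results on an output port.

class MockOperatorRuntime:
    """Stand-in for the engine that delivers data to operator ports."""

    def __init__(self):
        self._callbacks = {}
        self.emitted = []   # records everything sent to output ports

    def set_port_callback(self, port, fn):
        """Register fn to be called when data arrives on `port`."""
        self._callbacks[port] = fn

    def send(self, port, data):
        """Emit `data` on an output port."""
        self.emitted.append((port, data))

    def deliver(self, port, data):
        """Simulate the pipeline engine pushing data into an input port."""
        self._callbacks[port](data)

api = MockOperatorRuntime()

def on_input(msg):
    # The analyst's logic: normalize incoming material names to uppercase.
    api.send("output", msg.upper())

api.set_port_callback("input", on_input)

# Simulate one record flowing through the operator:
api.deliver("input", "raw material name")
print(api.emitted)  # [('output', 'RAW MATERIAL NAME')]
```

The appeal of this model is that the analyst only writes `on_input`; where the data comes from and where it goes is configured in the graph, not in the code.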
This is exactly what we did to make the Image-to-HANA pipeline mentioned above work.
One place for data access governance and control
As mentioned in the previous point, SAP DI offers governance and control for connected source systems: who can access which data and who can use which type of operation. This is an important plus if you want to create significant data integration networks, which with a point-to-point approach can quickly become uncontrollable.
SAP DI relies heavily on Kubernetes; there is no SAP DI without it. This means that the skill set required to run it is very different from “classical” ABAP AS-oriented architectures that have dominated the SAP world for decades. The Kubernetes architecture uses containers (in this case Docker containers) to run its applications, making them much more efficient than separate virtual machines. It is necessary to have people on board who know the tools for container orchestration, since SAP Data Intelligence communicates directly with Kubernetes when managing and orchestrating its environment, and the Kubernetes version that SAP supports may differ depending on the platform.
Using Kubernetes gives SAP DI a lot of flexibility in terms of deployment and the ability to breathe with resource demand. For “ALM” aspects we used a Git repository and a Jenkins pipeline for continuous integration. All of this is very different from what classical SAP technology teams are used to, so consider this when putting SAP DI on your organizational map.
The project we participated in used SAP Data Intelligence on-premise, though deployed at a cloud service provider (IaaS).
Logical umbrella for your data pools
I personally see SAP Data Intelligence as a logical umbrella to unify all the different and heterogeneous data pools that already exist and that will show up. From a purely conceptual perspective I can interpret it as an OS for data integration and for including advanced analytics in this unification of data. That is likely where additional insights and saving potentials will be discovered. “Non-medal-winning” topics like security are covered, and scalability is given by using Kubernetes.
I do not want to say that everything is perfect yet. But having in mind the reach of the functional scope of SAP Data Intelligence, my impression is that conceptually it is an enterprise solution, and looking at what is already in place I can also say that “Rome was not built in a day”. SAP DI may have competitors I don’t know much about, but one lesson from my 20 years of working in the SAP universe is: integrating with SAP is usually best done by SAP.
During our project we saw new functionality arriving via upgrades, with improvements in connectivity and integration. In the first phase of going to production, SAP BW was integrated with a cloud-based storage in order to make ERP data available to a central data pool managed there. But, as shown above, the capability to integrate more complex data transformation functionality, on-premise or as cloud services, and to do so in a governed way, leads me to the conclusion that much more is possible, and that integration, data transformation and analysis in large, technologically and logically heterogeneous environments can be realized very well with SAP DI. Both periodic and real-time data pipelines can be created. And all this leaves aside a whole area of SAP Data Intelligence functionality I didn’t cover here: everything related to governed AI scenarios.
For further details about this topic, please have another look at the openSAP training mentioned at the beginning of this blog.
And look: SAP DI introduced a new system type in the SAP Maintenance Planner. SAP DI is still the only product I see when selecting this system type, but just as I don’t yet see the end of the functionality SAP DI can cover, I also expect more products to come based on K8s (there is, for example, an SAP HANA express edition based on containers, link).
Thanks to my colleagues Jorge and Antonio for their valuable contribution.
Thanks for reading up to the end.
Stay safe and Merry Christmas.
Additional links of interest:
- About SAP DI 3.1: https://blogs.sap.com/2020/12/02/sap-data-intelligence-whats-new-in-3.1/
- Anything of Witalij Rudnicki on this topic. One link here: https://blogs.sap.com/2019/11/21/understanding-containers-with-docker-and-sap/
- Home page of SAP DI in developers.sap.com: https://developers.sap.com/topics/data-intelligence.html
- Teched 2020: If you have access, please check out sessions like
- DAT203 – Integrating S/4HANA into SAP Data Intelligence – Overview and use cases,
- DAT204 – SAP Data Intelligence: Data Integration with Enterprise Applications,
- DAT206 – about “out-of-the-box operators in SAP Data Intelligence to seamlessly integrate with SAP business software”
- The above-mentioned sessions were available on demand at the time of publishing this blog.