SAP Data Intelligence 3.0 is now available. This version can be installed on-premise on certified setups, deployed on supported hyperscalers, consumed as a service, or run in a certified private cloud partner environment.
You may have noticed that we are no longer talking about SAP Data Hub. We changed the name to SAP Data Intelligence to unify the product name across the cloud service and the on-premise product. Feel free to check out this blog from Marc Hartz, which describes this in more detail: https://blogs.sap.com/2020/03/20/sap-data-intelligence-next-evolution-of-sap-data-hub/
In this blog I want to share and describe the new functions and features of SAP Data Intelligence for the March release.
This chapter gives you only a quick preview of the main developments in each topic area. All details are described in the following sections for each individual topic area.
Connectivity & Integration
This topic area mainly focuses on all kinds of connection and integration capabilities, which are used across the product, for example in the Metadata Explorer or on operator level in the Pipeline Modeler.
Support of AWS Redshift
We introduced a new connection type for AWS Redshift that can be used in the Metadata Explorer for Browse, Fact Sheet, and Data Preview. In addition, new operators for reading data from AWS Redshift and running SQL statements are available within the Pipeline Modeler.
Enhancement of Azure Data Lake Storage Gen2
A new connection type for Azure Data Lake Storage Gen2 is available. It includes metadata extraction, profiling, rule validation, and use as a dataset in data preparation.
Support of SFTP
This enhancement allows connections to an SSH File Transfer Protocol (SFTP) server. This includes reading and writing files in the Pipeline Modeler via the Read File and Write File operators, as well as removing files via the Remove File operator.
Enhancement of SAP HANA Client
When running SQL statements, a new option is available that allows you to specify whether statements are grouped into one or multiple database transactions.
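To illustrate the difference the two transaction modes make, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SAP HANA (the actual option names and the SAP HANA client API are not shown; this only demonstrates the grouping behavior):

```python
import sqlite3

def run_grouped(conn, statements):
    """All statements in a single transaction: either all succeed or none."""
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        conn.commit()
    except sqlite3.Error:
        conn.rollback()

def run_per_statement(conn, statements):
    """One transaction per statement: earlier successes survive a later failure."""
    for sql, params in statements:
        try:
            conn.execute(sql, params)
            conn.commit()
        except sqlite3.Error:
            conn.rollback()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.commit()
stmts = [("INSERT INTO sales VALUES (?, ?)", (1, 10.0)),
         ("INSERT INTO sales VALUES (?, ?)", (1, 20.0))]  # duplicate key fails

run_grouped(conn, stmts)        # rollback undoes both inserts
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 0

run_per_statement(conn, stmts)  # the first insert is kept
print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])  # 1
```

Grouping into one transaction gives all-or-nothing semantics; one transaction per statement lets successful statements commit even if a later one fails.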
Enhancement of Google PubSub Operator
A new configuration option within the existing operator allows you to choose between asynchronous and synchronous pull from Google PubSub.
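The difference between the two pull modes can be sketched with Python's standard library (a conceptual stand-in, not the Google Pub/Sub client API): synchronous pull blocks the caller until a message arrives, while asynchronous pull delivers messages to a callback in the background.

```python
import queue
import threading
import time

messages = queue.Queue()
for i in range(3):
    messages.put(f"event-{i}")

def pull_sync(q, timeout=1.0):
    """Synchronous pull: block until one message arrives (or the timeout expires)."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None

def pull_async(q, callback, stop):
    """Asynchronous pull: a background thread feeds messages to a callback."""
    def worker():
        while not stop.is_set():
            try:
                callback(q.get(timeout=0.1))
            except queue.Empty:
                pass
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t

first = pull_sync(messages)
print(first)  # event-0

received = []
stop = threading.Event()
t = pull_async(messages, received.append, stop)
time.sleep(0.5)
stop.set()
t.join()
print(received)  # ['event-1', 'event-2']
```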
Metadata & Governance
In this topic area you find all features dealing with discovering metadata, working with it, and data preparation functionality. You will sometimes find similar information about newly supported systems in more than one topic area. This is intentional: readers who only look into one area should not miss information, and each area may add details specific to it.
New Business Glossary
This release introduces a business glossary to define terms and associate them with related Metadata Explorer objects. You can search for terms, view their relationships, and create categories to group related terms. You can find a screenshot below.
- To have a central and shared repository for business terms and definitions that promotes a common, consistent understanding of business terms within your organization.
- To support templates for creating new terms and grouping terms into categories.
- To allow searching for terms and categories.
- The business glossary consists of three main areas: the term template, the categories, and the terms.
- Allows editing, updating, and deleting terms and categories.
- Supports searching for categories only, for terms only, or for a combination of both.
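As a rough mental model of these capabilities, here is a hypothetical in-memory sketch in Python (class and field names are illustrative, not the product's API):

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    name: str
    definition: str
    category: str
    related_objects: list = field(default_factory=list)  # e.g. datasets or rules

class Glossary:
    """Toy model of the three areas: term templates, categories, and terms."""

    def __init__(self):
        self.terms = {}
        self.categories = set()

    def add_term(self, term):
        self.categories.add(term.category)
        self.terms[term.name] = term

    def search(self, text, scope="both"):
        """scope is 'terms', 'categories', or 'both'."""
        text = text.lower()
        hits = []
        if scope in ("terms", "both"):
            hits += [t.name for t in self.terms.values() if text in t.name.lower()]
        if scope in ("categories", "both"):
            hits += [c for c in self.categories if text in c.lower()]
        return hits

g = Glossary()
g.add_term(Term("Net Revenue", "Revenue after discounts.", "Finance", ["SALES_VIEW"]))
g.add_term(Term("Revenue Rule", "Validation rule for revenue figures.", "Data Quality"))
print(g.search("revenue"))                 # both terms match
print(g.search("finance", "categories"))   # ['Finance']
```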
Support for Azure Data Lake Storage Gen2
Support of Azure Data Lake Storage Gen2 for the Browse, Fact Sheet, Preview, Profile, Publish, Rules, Data Prep source, and Upload capabilities in the Metadata Explorer.
Support of Additional Databases and Data Warehouses
Metadata Explorer functionalities including Browse, Fact Sheet, Preview, Profile, Publish, Rules, and Data Prep source are now supported for IBM DB2, Oracle MySQL, Oracle, MS SQL Server, and MS Azure Cloud SQL. In addition, the Browse, Fact Sheet, and Preview functionalities are supported for Amazon Redshift.
Enhancement of Data Preparation functionality
New operations are supported for join/merge, including inner and left outer joins. Join conditions can be complex and may contain the comparators (not) equals, greater than (or equal to), less than (or equal to), and (not) between. In addition, there are new options for Append/Union with column mapping suggestions (across different data types).
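Conceptually, a join with a comparator condition other than equality behaves as sketched below (plain Python, purely illustrative; the column and function names are made up):

```python
orders = [{"id": 1, "amount": 250}, {"id": 2, "amount": 90}, {"id": 3, "amount": 40}]
tiers  = [{"tier": "gold", "min": 200}, {"tier": "silver", "min": 50}]

def join(left, right, condition, how="inner"):
    """Inner or left outer join driven by an arbitrary condition function."""
    out = []
    for l in left:
        matches = [r for r in right if condition(l, r)]
        if matches:
            out += [{**l, **r} for r in matches]
        elif how == "left":
            out.append({**l, "tier": None, "min": None})  # unmatched row kept
    return out

# "greater than or equal to" as the join condition instead of plain equality
cond = lambda o, t: o["amount"] >= t["min"]

inner = join(orders, tiers, cond)              # order 3 (amount 40) drops out
left  = join(orders, tiers, cond, how="left")  # order 3 kept with NULL columns
print(len(inner), len(left))  # 3 4
```

The inner join drops rows with no match, while the left outer join keeps them with empty right-hand columns.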
This topic area covers new operators and enhancements of existing operators, as well as improvements and new functionality of the Pipeline Modeler and the development of pipelines.
Graphical operator for data transformation
New structured data operators can be used to perform data transformations such as projection, filter, column operations, and joins. The possible source and target objects are CSV, Parquet, and ORC files stored in any supported (cloud) storage, as well as any supported database.
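As a rough illustration of what a projection-plus-filter transformation does, here is a minimal sketch over CSV data using Python's standard library (the function and column names are made up, not the product's):

```python
import csv
import io

raw = io.StringIO(
    "id,name,price,stock\n"
    "1,bolt,0.10,500\n"
    "2,nut,0.05,0\n"
    "3,washer,0.02,300\n"
)

def transform(reader, columns, predicate):
    """Apply a row filter, then project each row onto the given columns."""
    for row in reader:
        if predicate(row):
            yield {c: row[c] for c in columns}

rows = list(transform(
    csv.DictReader(raw),
    columns=["id", "name"],                   # projection
    predicate=lambda r: int(r["stock"]) > 0,  # filter
))
print(rows)  # [{'id': '1', 'name': 'bolt'}, {'id': '3', 'name': 'washer'}]
```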
A new operator extends the functions of the database table consumer operators and allows capturing delta changes for different databases. The changes can be applied either to a table in a target database or to files in a cloud storage.
Integration with HANA ML
Two new operators, HANA ML Training and HANA ML Inference, allow users to leverage SAP HANA in-database machine learning libraries, such as the SAP HANA Predictive Analytics Library (PAL) and the SAP HANA Automated Predictive Library (APL), within a pipeline. Below you find a screenshot of a pipeline with the new HANA ML Training operator.
New step-by-step procedure to create a pipeline easily
Re-usable building blocks of pipelines whose basic unit is an operator. These snippets represent a group of operators and connections that have a logical meaning. The step-by-step process helps to define snippets and to generate graphs from them. The screenshot below shows the step-by-step procedure to build a pipeline.
Automated resource clean up
A new feature allows defining time- and number-based thresholds for the garbage collection of resources of completed pipelines.
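A minimal sketch of how time- and number-based thresholds could interact, in plain Python (the field names and thresholds are illustrative, not product configuration):

```python
import time

def clean_up(completed, max_age_seconds, max_count, now=None):
    """Drop pipelines older than the age threshold, then cap the total count."""
    now = time.time() if now is None else now
    kept = [p for p in completed if now - p["finished_at"] <= max_age_seconds]
    kept.sort(key=lambda p: p["finished_at"], reverse=True)  # newest first
    return kept[:max_count]

now = 1_000_000
completed = [
    {"name": "daily-load", "finished_at": now - 90_000},  # older than one day
    {"name": "hourly-sync", "finished_at": now - 3_600},
    {"name": "adhoc-run", "finished_at": now - 60},
]
kept = clean_up(completed, max_age_seconds=86_400, max_count=1, now=now)
print([p["name"] for p in kept])  # ['adhoc-run']
```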
Pre-validation of pipelines
During the save procedure, automated basic validations of a pipeline are executed, including checks of the used connections and the operator configurations, to support pipeline development.
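Conceptually, such a save-time pre-validation could look like the following sketch (the graph structure and check names are hypothetical, purely to illustrate the idea):

```python
def validate_pipeline(graph, known_connections):
    """Collect basic problems: missing required configs, unknown connections."""
    errors = []
    for op in graph["operators"]:
        config = op.get("config", {})
        for key in op.get("required", []):
            if key not in config:
                errors.append(f"{op['name']}: missing config '{key}'")
        conn = config.get("connection")
        if conn is not None and conn not in known_connections:
            errors.append(f"{op['name']}: unknown connection '{conn}'")
    return errors

graph = {"operators": [
    {"name": "Read File",
     "required": ["connection", "path"],   # 'path' is not configured
     "config": {"connection": "S3_DEV"}},  # 'S3_DEV' does not exist
]}
errors = validate_pipeline(graph, known_connections={"HANA_PROD"})
print(errors)
```

Running the checks at save time surfaces both problems before the pipeline is ever executed.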
Enhanced Pipeline resource management
A new feature allows configuring different resource limits for the CPU and memory consumption of pipeline workloads.
Machine Learning & Data Science
This topic area covers all functions that support ML-related activities within the product. With the release of SAP Data Intelligence 3.0, all ML-related functions and features are also available for on-premise customers.
TensorFlow 2.0 / PyTorch support
Deep learning library support has been extended for both the ML Inference Service (TensorFlow 2.0) and the ML Training Service (TensorFlow 2.0 and PyTorch).
Improvements of the ML Scenario Manager
It is no longer necessary to create scenario versions before running executions or deployments from within a scenario. Troubleshooting in the ML Scenario Manager and the AutoML application has also been enhanced with more detailed error messages.
The Metric Explorer is a UI component of the tracking service and is embedded in the ML Scenario Manager application. The screenshots below illustrate the Metric Explorer.
- To have a UI for listing run collections and runs
- To view metrics and parameters captured as part of both the machine learning experimentation phase and the training phase
- To visualize and compare run metrics
- To download the visualization canvas as a PDF
- Helps in listing, organizing, viewing, and sorting runs and run collections from one or more machine learning scenarios
- Allows a deep dive into the details of a single run
- Supports comparison of metrics and parameters across various runs using charts for better visualization
- Ability to export the charting canvas as a PDF to share the experiment comparison
GA deployment for Inference Service (BYOM)
GPU support for inference on ML models for your on-premise installation.
This topic area includes all services that are provided by the system, such as administration, user management, or system management.
Enhanced System Monitoring
Detailed monitoring of system resources on user, application, and pipeline level for optimizing operations and system utilization. The screenshot below shows the monitoring in the system.
Improved Resource Utilization of Launchpad Applications
Enhanced resource management including sharing of application resources of launchpad applications between multiple users.
Introduction of Guest Profiles for Identity Providers
New option to assign guest (member) roles to explicitly grant access to users from Identity Providers.
Deployment & Delivery
Within this focus area, all functions and features dealing with the setup process, installation, or deployment are described.
A new installation/upgrade procedure is available for DI 3.0 on-premise. Please check the official documentation on help.sap.com: Installation DI 3.0
Hyperscaler Support (SAP Data Intelligence, cloud edition / planned begin of April)
SAP Data Intelligence can be deployed in the AWS APJ (Tokyo) and Microsoft Azure EU (Amsterdam) regions.
SAP Cloud Connector (SAP Data Intelligence, cloud edition / planned begin of April)
Connections from SAP Data Intelligence to on-premise data sources can be established via the Cloud Connector as an additional option besides VPN.
These are the new functions, features, and enhancements for the SAP Data Intelligence March release.
I hope you like this new format and have already identified some areas you would like to try.
Thank you & Best Regards,
Tobias and the SAP Data Intelligence PM team