SAP Data Intelligence – What’s New in DI:2007
SAP Data Intelligence, cloud edition DI:2007 is now available.
Within this blog post, we combine updates on the latest enhancements for DI:2005/2006/2007. We want to share and describe the new functions and features of SAP Data Intelligence for the August 29th release.
This section gives only a quick preview of the main developments in each topic area. The details are described in the following sections for each individual topic area.
Connectivity & Integration
This topic area focuses mainly on the connection and integration capabilities that are used across the product, for example in the Metadata Explorer or at operator level in the Pipeline Modeler.
Connectivity to SAP HANA Cloud
SAP HANA Cloud is now supported in the Metadata Explorer, Connection Management, and the Pipeline Modeler (including the SAP HANA, Connectivity, Flowagent, and Structured Data operators), which allows you to consume and persist data in SAP HANA Cloud and the data lake.
Support SAP BW connection through cloud connector
You can now connect to on-premise SAP Business Warehouse systems (SAP BW, SAP BW on HANA, and SAP BW/4HANA) that run behind a firewall by using the SAP Cloud Platform, Cloud Connector.
Data Preview on Consumer Operators
New Structured File Consumer and Structured Table Consumer operators now provide an option to preview the data of a chosen file or table within the Pipeline Modeler:
Metadata & Governance
In this topic area you will find all features that deal with discovering metadata and working with it, as well as data preparation functionality. Some information about newly supported systems is repeated here. This ensures that readers who look at only one topic area do not miss anything, and this section may also add details that are specific to the topic area.
Import metapedia terms from SAP Information Steward into the business glossary of SAP Data Intelligence
Increase your return on investment in SAP Information Steward by accessing metapedia terms from SAP Data Intelligence. Support a federated solution approach for SAP Information Steward and SAP Data Intelligence.
Import rules from SAP Information Steward into SAP Data Intelligence
Regain data management investment made in SAP Information Steward to add value to SAP Data Intelligence for product cost savings and efficiencies.
Automate the extraction of lineage information from pipeline and data preparation
Increase efficiency by automating the extraction of lineage and its publication to the catalog. Enrich the catalog with lineage information for the datasets processed by pipelines and data preparation.
Link Metadata Explorer artifacts with business glossary
Associate Metadata Explorer artifacts, including fact sheets, rules, and lineage, with the business glossary, which is the common vocabulary used by the business.
Support UNION ALL in data preparation
Data preparation now allows you to merge multiple datasets with the UNION ALL option.
Support aggregation of values in a column in data preparation
This provides feature parity with agile data preparation and gives business users and business analysts more functionality to perform data preparation in SAP Data Intelligence.
Support multiple glossaries to match SAP Information Steward
Support businesses that need more than one glossary. A glossary is a collection of categories, and each category is a collection of business terms. This provides feature parity with SAP Information Steward and the flexibility to define business glossaries across lines of business.
Support right outer join in data preparation
Join a preparation with another preparation (no self-join) while retaining all of the records of the right preparation even if unmatched, such as with a right outer join.
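To make the right outer join semantics concrete, here is a minimal sketch in plain Python. It is not how SAP Data Intelligence implements the join; the function name `right_outer_join` and the sample data are assumptions made for illustration only.

```python
def right_outer_join(left, right, key):
    """Join two lists of dicts on `key`, keeping every row of `right`."""
    # Index the left rows by join key so each lookup is a dictionary access.
    left_by_key = {}
    for row in left:
        left_by_key.setdefault(row[key], []).append(row)
    # Columns that only the left side contributes; filled with None when unmatched.
    left_columns = {col for row in left for col in row if col != key}

    joined = []
    for right_row in right:
        matches = left_by_key.get(right_row[key])
        if matches:
            for left_row in matches:
                joined.append({**left_row, **right_row})
        else:
            # Unmatched right row is retained, with empty left columns.
            joined.append({**dict.fromkeys(left_columns), **right_row})
    return joined


# Hypothetical sample data: orders on the left, customers on the right.
orders = [{"customer_id": 1, "order": "A-100"}]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]

for row in right_outer_join(orders, customers, "customer_id"):
    print(row)
```

The customer without an order still appears in the result, with `None` in the `order` column.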
Support union to remove duplicates in data preparation
Union a preparation with another preparation (no self-union) while removing duplicate records, such as with distinct union.
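The difference between this distinct union and the UNION ALL option mentioned earlier can be illustrated with a short plain-Python sketch. The function names and sample rows are hypothetical and only model the semantics, not the product's implementation.

```python
def union_all(left, right):
    """Keep every row from both inputs, duplicates included."""
    return left + right


def union_distinct(left, right):
    """Keep each distinct row once, preserving first-seen order."""
    return list(dict.fromkeys(left + right))


# Each dataset is modelled as a list of row tuples (hypothetical sample data).
customers_eu = [("C1", "Berlin"), ("C2", "Paris")]
customers_us = [("C2", "Paris"), ("C3", "Austin")]

print(union_all(customers_eu, customers_us))       # 4 rows, ("C2", "Paris") twice
print(union_distinct(customers_eu, customers_us))  # 3 rows, duplicates removed
```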
Data preview of JSON, PDF and Image files
Fact sheets now support previewing JSON, PDF, and image files.
The screen below shows a preview of a JSON file:
This topic area covers new operators and enhancements of existing operators, as well as improvements and new functionality in the Pipeline Modeler and in pipeline development.
Enhanced Graph Snippet functionality
Graph snippets have been enhanced to support more concepts of design-time graph configuration and to simplify the creation process, including:
- Support for editing existing graph snippets
- Support for group configuration in a snippet, e.g. group multiplicity
- Support for using an SVG image file as an icon for a graph snippet
- Support for adding an additional description next to parameters
- Resetting all configuration parameters
- Definition of shared parameters that can be used by multiple operator configurations of the same graph snippet
Now users can change an operator and save it as a new version. Multiple versions of the same operator can exist and statuses for versions can be identified as active or deprecated. This lets operator owners release new versions to users, while keeping the option of using the deprecated operators.
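The idea of several coexisting operator versions, each marked active or deprecated, can be pictured with a small data-structure sketch. Everything here (the `Operator` class, its methods, and the operator name) is a hypothetical illustration and not part of any SAP Data Intelligence API.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """An operator with several coexisting versions (illustrative only)."""
    name: str
    versions: dict = field(default_factory=dict)  # version -> "active" | "deprecated"

    def add_version(self, version):
        if version in self.versions:
            raise ValueError(f"version {version} already exists")
        self.versions[version] = "active"

    def deprecate(self, version):
        # A deprecated version stays in the dictionary, so it remains usable.
        self.versions[version] = "deprecated"

    def with_status(self, status):
        return [v for v, s in self.versions.items() if s == status]


op = Operator("my.custom.operator")  # hypothetical operator name
op.add_version("1.0.0")
op.add_version("2.0.0")
op.deprecate("1.0.0")

print(op.with_status("active"))      # versions offered as current
print(op.with_status("deprecated"))  # older versions that remain available
```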
Run Pipeline in Debug Mode
Users can now run a pipeline in Debug Mode to see the messages exchanged between operators. In Debug Mode, the pipeline runtime view shows a tracepoint for each edge in the pipeline graph. From a tracepoint you can open a Wiretap (message viewer) to see the traffic on that edge.
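Conceptually, a tracepoint on an edge records each message on its way from one operator to the next without changing it. The sketch below shows that pattern in plain Python. The `tap` helper and the two-operator pipeline are assumptions for illustration and do not reflect the Pipeline Modeler's internals.

```python
def tap(edge_name, deliver, trace):
    """Wrap an edge so every message is recorded before it is delivered."""
    def tapped(message):
        trace.append((edge_name, message))  # what a message viewer would display
        return deliver(message)
    return tapped


# Hypothetical two-operator pipeline: uppercase -> collect.
trace = []
collected = []
to_collector = tap("uppercase->collect", collected.append, trace)


def uppercase(message):
    """First operator: transform the message and send it along the edge."""
    to_collector(message.upper())


for msg in ["a", "b"]:
    uppercase(msg)

print(trace)
print(collected)
```

The downstream operator receives exactly the same messages as it would without the tap, which is why this kind of tracing is suitable for debugging.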
Import/Export files from Data Pipelines Modeler
Users can now import and export files directly from the Data Pipelines Modeler repository browser. It is also possible to export solutions directly.
Usability Improvements in Machine Learning applications
Several usability improvements in existing Machine Learning applications have been implemented:
- Show run tags in Metrics Explorer (see screenshots below)
- Show run name in Metrics Explorer
- Improve Error Message for Duplicate Scenarios in Machine Learning Scenario Manager
Tracking SDK: Fetch Runs under Run Collection
Model training usually involves multiple runs, which are grouped under specific run collections. It is now possible to fetch all run objects that are grouped under a specific run collection.
Multi-Model Serving Capabilities
It is now possible to deploy and run multiple models in one model server (on a single node). As a result, end users can save inference costs, and GPU resources can be shared between models.
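The general pattern behind multi-model serving is one server process that keeps several models in memory and routes each request to the right one. The following is a minimal sketch of that pattern only. The `ModelServer` class and the toy models are hypothetical and are not the SAP Data Intelligence model server.

```python
class ModelServer:
    """One process that hosts several models and routes requests by name."""

    def __init__(self):
        self._models = {}

    def deploy(self, name, predict_fn):
        # All models live in the same process and share its resources.
        self._models[name] = predict_fn

    def predict(self, name, payload):
        if name not in self._models:
            raise KeyError(f"no model deployed under {name!r}")
        return self._models[name](payload)


server = ModelServer()
server.deploy("doubler", lambda x: 2 * x)  # stand-ins for real trained models
server.deploy("negator", lambda x: -x)

print(server.predict("doubler", 21))
print(server.predict("negator", 5))
```

Because both models share one server, only one set of node resources has to be provisioned, which is where the cost saving comes from.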
Content Templates for Content Delivery
Content template packages are collections of DI resources (pipelines and notebooks) that can be imported into a DI tenant using standard import capabilities in SAP Data Intelligence. Content template packages can be used to speed up implementation for ML scenarios.
SAP HANA Python Client API for Machine Learning Algorithms: support for Time Series algorithms
- HANA ML operators now offer integration with SAP HANA’s PAL (Predictive Analytics Library) and APL (Automated Predictive Library) Time Series analysis tasks for selected algorithms. In addition, requests to the HANA ML inference operator can now include inference parameters when applicable.
- The new HANA ML Forecast operator enables the use of Time Series algorithms from SAP HANA’s PAL (Predictive Analytics Library) and APL (Automated Predictive Library) in a combined fit and predict step, without persisting trained models.
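To illustrate what a combined fit and predict step without a persisted model means, here is a small plain-Python sketch. It deliberately uses a simple least-squares linear trend as a stand-in, not a PAL or APL algorithm, and the function name `fit_and_forecast` is an assumption made for this example.

```python
def fit_and_forecast(series, horizon):
    """Fit a least-squares linear trend to `series` and forecast `horizon` steps.

    Only the forecast values are returned; the fitted slope and intercept are
    discarded, so no trained model is kept.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Forecast the time steps that directly follow the observed series.
    return [intercept + slope * (n + step) for step in range(horizon)]


print(fit_and_forecast([1.0, 2.0, 3.0, 4.0], horizon=2))
```

A separate training operator would instead return the fitted parameters so that they can be stored and reused for later inference requests.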
This topic area includes all services that are provided by the system – like administration, user management or system management.
User Resource Quotas
Users can now be granted resource quotas for pipelines and application usage to allow for fair resource distribution among the users. The following resources can be limited for users and groups using the policy framework:
- CPU consumption
- Memory consumption
- Number of Kubernetes Pods
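A quota check over these three resources can be sketched as follows. This is only an illustration of the idea. The `Resources` type, the `within_quota` function, and the numbers are hypothetical and do not represent the policy framework's actual format or behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int
    pods: int


def within_quota(quota, in_use, requested):
    """Return True if starting `requested` keeps the user inside `quota`."""
    return (
        in_use.cpu_millicores + requested.cpu_millicores <= quota.cpu_millicores
        and in_use.memory_mib + requested.memory_mib <= quota.memory_mib
        and in_use.pods + requested.pods <= quota.pods
    )


# Hypothetical quota and current usage for one user.
quota = Resources(cpu_millicores=4000, memory_mib=8192, pods=10)
in_use = Resources(cpu_millicores=3000, memory_mib=2048, pods=4)

small = Resources(cpu_millicores=500, memory_mib=1024, pods=1)
large = Resources(cpu_millicores=2000, memory_mib=1024, pods=1)

print(within_quota(quota, in_use, small))  # fits in the remaining CPU budget
print(within_quota(quota, in_use, large))  # would exceed the CPU limit
```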
Application Start Policies
New resources have been added to the policy framework to permit access to the individual SAP Data Intelligence applications (Connection Manager, Pipeline Modeler, Metadata Explorer, etc.). This allows administrators to create new user roles with specific permissions to use applications.
IdP Support for System Command-line Client (vctl)
Users can now authenticate to vsystem with the System Command-line Client (vctl) using their credentials from an external Identity Provider (IdP).
Improved Resource Efficiency of System Applications
The Launchpad and System Management applications now need fewer system resources when used by several users, because the underlying instances are shared among all tenant users.
Simplified Application Lifecycle
Applications in SAP Data Intelligence can now be re-started, if needed, instead of stopping and starting them on demand. It is also ensured that only a single logical instance is running per tenant or user (depending on the type of the application).
Deployment & Delivery
This focus area describes all functions and features that deal with setup, installation, or deployment.
Deployment of SAP Vora is now optional when creating a new SAP Data Intelligence cluster
Lower TCO: Minimum cluster size is now 2 dynamic nodes (down from 3).
These are the new functions, features, and enhancements in the SAP Data Intelligence, cloud edition DI:2007 release.
We hope you like them and, by reading the above descriptions, have already identified some areas you would like to try out.
Thank you & Best Regards,
Eduardo and the SAP Data Intelligence PM team
Thanks for the blog. Could you please let me know how I can connect my on-premise Hadoop system to SAP Data Intelligence?