What's new in SAP Data Hub 2.4

Former Member · ‎01-18-2019

We're excited to share the newest release, SAP Data Hub 2.4, with you. Although this is an incremental release following the major SAP Data Hub 2.3 release, it is more than just corrections and bug fixes. It brings several new features and key enhancements that provide greater flexibility and more protection for our customers. You will be pleased to learn about them!

Here's a quick breakdown of the new features introduced in SAP Data Hub 2.4:

Extending native connectivity to support more databases and applications

Today, SAP Data Hub already provides a broad spectrum of connectivity to big data and enterprise sources. Because integration remains a building block of digital transformation, one of our top priorities is to keep growing native connectivity to more enterprise applications.

In this release, we added direct integration with several structured data sources, including MS SQL Server, MySQL, IBM DB2, and Google BigQuery. Once a connection is established, SAP Data Hub automatically crawls the metadata for the connected sources. You can then browse, view, profile, catalog, and share the data directly within the Metadata Explorer. In addition, more than 350 predefined operators and data pipelines are ready to use for broader scenarios.

Enabling Data Lineage for disparate & distributed data sources

We introduced the SAP Data Hub Metadata Explorer in the previous release. Our goal is to provide a central location where all data professionals can gain insights into diverse datasets in today's distributed landscape.

In this release, we are increasing our investment in metadata governance by offering end-to-end support for data lineage at the schema level. You can use the new lineage analysis feature to view a graphical representation of the sources, transformations, and dependencies of a dataset. Lineage information can be extracted from computed datasets such as SQL views, and from other types of computations including stored procedures, BW transformations, datastores, and the Data Hub pipeline modeler.

The Data Lineage feature further helps you gain visibility into your data assets and greatly simplifies root cause analysis. You will have a clear understanding of where the data originated, how it may have changed, and which areas might be consuming it.
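To make the idea of schema-level lineage concrete, here is a minimal Python sketch of how origins and consumers can be derived from a dependency graph. It is not the Metadata Explorer's implementation or API; the dataset names and the `upstream` and `downstream` helpers are hypothetical and exist only for illustration.

```python
# Hypothetical lineage edges: each dataset maps to the datasets it is
# derived from. All names here are invented for illustration.
lineage = {
    "SALES_REPORT_VIEW": ["SALES_CLEANED"],
    "SALES_CLEANED": ["SALES_RAW", "CUSTOMER_MASTER"],
    "SALES_RAW": [],
    "CUSTOMER_MASTER": [],
}


def upstream(dataset, graph):
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for source in graph.get(current, []):
            if source not in seen:
                seen.add(source)
                stack.append(source)
    return seen


def downstream(dataset, graph):
    """Return every dataset that consumes `dataset`, directly or indirectly."""
    return {name for name in graph if dataset in upstream(name, graph)}


# Origins of the report view (useful for root cause analysis).
origins = upstream("SALES_REPORT_VIEW", lineage)
# Consumers of the raw table (useful for impact analysis).
consumers = downstream("SALES_RAW", lineage)
print(sorted(origins))
print(sorted(consumers))
```

Walking the graph toward the sources answers "where did this data come from?", while walking it the other way answers "what is affected if this dataset changes?".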

The initial support focuses on SAP connections, including Business Warehouse, HANA, Vora, and data pipelines. We plan to extend this functionality to all supported sources and to provide complete audit trails for business security and compliance in a future release.

Providing a new Anonymization operator to protect individuals' privacy

With GDPR in force since 2018, we know that meeting this regulation is still on everyone's mind. Previously, you could use the data mask operator to mask out all or a portion of the data that contains sensitive information. Now, you can use the new anonymization operator to further protect each individual's identity by grouping similar records into categories. As a result, you can discover statistically valid insights from your data without risking re-identification of individuals.

Below is a sample graph that shows how the anonymization operator is used within a pipeline to hide the individual identities within a group of records.

Suppose you would like to create a healthcare report on patient data to find the most common reasons for an emergency room visit. Because the input data contains sensitive information about your patients, such as name, address, and age, you need to protect their privacy.

The anonymization operator uses various techniques such as masking, generalization, and global suppression to reduce the granularity of the data representation. In this example, the patient name is completely hidden, the zip code is partially masked, the age is generalized, and the patient data is categorized into three groups.

Depending on the user configuration, the anonymization operator suppresses output by removing records that do not pass the threshold. In this example, Row Id #4 will be removed from the final output if the user defines the count threshold as > 2.
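The techniques described above can be sketched in a few lines of Python. This is only an illustration of masking, generalization, and threshold-based suppression, not the operator's actual implementation or configuration. The records, field names, helper functions, and the `min_count` parameter are all hypothetical, and the sketch assumes a group is kept when it contains at least `min_count` records.

```python
from collections import Counter

# Hypothetical patient records, invented for illustration only.
records = [
    {"name": "A. Jones", "zip": "94301", "age": 34, "reason": "fracture"},
    {"name": "B. Smith", "zip": "94305", "age": 38, "reason": "chest pain"},
    {"name": "C. Brown", "zip": "10012", "age": 61, "reason": "chest pain"},
    {"name": "D. White", "zip": "60614", "age": 25, "reason": "allergy"},
    {"name": "E. Green", "zip": "10027", "age": 67, "reason": "fracture"},
]


def mask_zip(zip_code, keep=3):
    """Partially mask a zip code, keeping only the first `keep` characters."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def generalize_age(age, width=10):
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(rows, min_count=2):
    """Hide names, coarsen zip and age, and drop records in small groups."""
    generalized = [
        {
            "name": "*",  # name is completely hidden
            "zip": mask_zip(row["zip"]),
            "age": generalize_age(row["age"]),
            "reason": row["reason"],
        }
        for row in rows
    ]
    # Records that share the same masked zip and age range form one group.
    sizes = Counter((g["zip"], g["age"]) for g in generalized)
    # Suppress records whose group is smaller than the threshold.
    return [g for g in generalized if sizes[(g["zip"], g["age"])] >= min_count]


result = anonymize(records, min_count=2)
for row in result:
    print(row)
```

With this made-up input, the five records fall into three groups, and the single record that is alone in its group is suppressed because it would be too easy to re-identify.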

SAP Data Hub has a clear mission: to be best in class at SAP integration, with applied artificial intelligence on data, while remaining open to other technologies. In 2019, we will focus on meeting our customers' needs by rolling out more capabilities for tighter integration with business applications (on-premise and cloud), broader machine learning support, and further enhancements to metadata governance. So stay tuned for the improvements and features coming this spring.