SAP Data Intelligence Metadata Explorer – Part 1
Facing data governance issues while implementing data-driven innovations across your enterprise?
Here are the latest innovations in data governance that SAP Data Intelligence has in store for us.
This blog series focuses on the detailed features of SAP Data Intelligence with respect to the Metadata Explorer, and offers a broader comparison of the metadata management, data quality and data profiling tools available in the technology space. This blog is intended for people who want to explore Data Intelligence features and how they relate to data governance.
It also dives into the intricacies of how to automate data profiling and data quality processes.
*First, let us look at why an organization needs to adopt a data governance solution and which data issues it addresses.
In an organization, data is distributed across multiple systems. What if your data is inconsistent between systems and doesn’t roll up well? How would you address this issue? Take a minute and try to think of such scenarios…
In most of the scenarios you came up with, the data was probably simple. In practice, data problems don’t come in dribs and drabs: they are numerous, complicated and large in volume. Generally, when such issues are reported, IT doesn’t have enough context to fix the data, while business users believe it is an IT issue. This is where the organization ends up in no-man’s land.
This leads to problems in a variety of ways. For example, your reports don’t roll up well, which in turn causes data reconciliation to fail.
Organizations use a framework called data governance to tackle this no-man’s land: a framework that supports the overall management of the availability, integrity, usability, quality and security of the data.
These data governance benefits can be achieved effectively through the SAP Data Intelligence Metadata Explorer. The Metadata Explorer helps you track data between sources, catalog data sets, publish data sets, profile the data, manage data quality and check the sensitivity of the data.
Below is a brief set of Metadata Explorer features that I came across during my research:
- Get the metadata of a data set
- Preview the data set
- Profile the data to see its distribution
- Publish data sets so that users can access and search for them
- Check the distribution of profiled data using visualizations such as charts
- Tag registered and published data sets, and create tag hierarchies
- Create validation rules and rulebooks, and bind them to data sets
- View data lineage on registered data and on transformed (prepared) files
- Create scorecards to visualize the quality score of data sets, categories and rulebooks
- Visualize rule results by creating dashboards on data sets
- Monitor publishing, profiling, data preparation and rulebook runs
- Maintain a business glossary for defining business terms and relationships
The Metadata Explorer is useful to different users in a data governance framework in different ways. Below is the list of roles that are part of a data governance program.
| Role in Data Management | Role Description | Data Intelligence In-Store Features |
|---|---|---|
| Data Owner | Responsible for the business requirements for data and for the data quality of one or more data assets. Has the authority to make changes and to accept or reject data quality rules. | Data Intelligence, integrated with the UI5 Workflow service, helps data owners accept or reject data quality rules. |
| Business Data Steward | Responsible for assessing data quality reports and profiling reports, for data cataloging, and for maintaining the business glossary. | Data Intelligence has a rich set of features for cataloging data sets as well as the columns in them. The Business Glossary helps business stewards maintain the dictionary, relationships and synonyms of data sets, organized by data domain. |
| Technical Data Steward/Data Insight Analyst | Responsible for data standardization, profiling source systems and the data lineage of data sets. Rights: import files and create the project structure; remove files and delete the project structure; create, edit and delete profile tasks and rule tasks; create, edit and delete rules and rule bindings; create, edit and delete data preparation tasks; create, edit and delete data quality scorecards. | Data Preparation in the Metadata Explorer gives data stewards data standardization capabilities on source data through Manage Preparation. Each data set has options to run profiling tasks and apply preparation tasks, and for each data set you can get the lineage and see the data flow between systems. |
| Data Insight User | Can only view connections, fact sheets, sample data, profile reports, prepared data, rules, rule bindings, sample data that failed rules, and scorecard results. | According to the SAP Data Intelligence feature roadmap, this can be achieved using RBAC (role-based access control) and policies defined for user access. |
| Data Insight Scorecard Manager | Has all the rights of a technical data steward, plus the right to edit data quality rules, rule bindings and scorecards. | Data Quality in the Metadata Explorer lets the scorecard manager create and edit rules, rule bindings and scorecards. |
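The division of rights in the table can be pictured as a role-to-permission mapping. The Python sketch below is purely illustrative: the permission strings, the `ROLES` mapping and the `is_allowed` helper are hypothetical and are not SAP Data Intelligence policy names or APIs. The Data Owner's rights are an assumption, and the Scorecard Manager is left out because its extra rights overlap with those the table lists for the Technical Data Steward.

```python
from __future__ import annotations

# Hypothetical permission names, for illustration only.
# Read-only rights listed for the Data Insight User role.
VIEW_RIGHTS = frozenset({
    "view:connections", "view:fact_sheet", "view:sample_data",
    "view:profile_report", "view:prepared_data", "view:rules",
    "view:rule_bindings", "view:failed_rule_samples", "view:scorecards",
})

# Create/edit/delete rights listed for the Technical Data Steward.
MANAGE_RIGHTS = frozenset({
    "manage:project_structure", "manage:profile_tasks", "manage:rule_tasks",
    "manage:rules", "manage:rule_bindings", "manage:preparation_tasks",
    "manage:scorecards",
})

ROLES: dict[str, frozenset[str]] = {
    "data_insight_user": VIEW_RIGHTS,
    "technical_data_steward": VIEW_RIGHTS | MANAGE_RIGHTS,
    # Assumption: the Data Owner can view rules and accept/reject them.
    "data_owner": frozenset({"view:rules", "approve:rules"}),
}


def is_allowed(roles: list[str], permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLES.get(role, frozenset()) for role in roles)
```

A user holding several roles gets the union of their rights, which is the usual RBAC behavior; unknown roles grant nothing.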
*Now let’s get familiar with the usage and basic features of the SAP Data Intelligence Metadata Explorer.
SAP Data Intelligence Metadata Explorer Features
The landing page of the Data Intelligence Metadata Explorer contains tiles that help users navigate to the desired area.
In the Browse Connections tab you can see all the connections created in the Connection Management tile. You can view the source data, the metadata of a data set and its profiling report, run lineage analysis on supported connections, and run preparation tasks.
You can also view the capabilities supported by each connection available in Data Intelligence. In this blog we work with an AWS S3 bucket (a connection created for this demo) and the functionalities it supports.
With the current version of Data Intelligence, lineage extraction is not supported for AWS S3 buckets.
The fact sheet gives you detailed information about the metadata of the data set: its columns, data types, tags, unique keys and descriptions. It also provides details of the data set itself, such as the connection ID, type of data set, data set size, last modified date, last published date and more.
It shows trends in row count and size, and provides charts for a better view of the data spread and the metadata.
The Data Preview tab shows sample data from the data source. By default the number of records is set to 100, and it can be increased up to 1,000 rows.
Publishing the data: once a data set is registered, it can be published. Publishing makes the data set easily accessible to the team and allows team members to search for it in the search bar, add comments and tag it. Tagging data sets enables more sophisticated searches.
All published data sets can be seen on the “Browse the Catalog” screen, and the publishing progress can be monitored from the Monitoring tile.
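To illustrate why tags and tag hierarchies (mentioned in the feature list above) make catalog search more precise, here is a minimal Python sketch assuming a simple in-memory catalog: searching on a parent tag also returns data sets tagged with any of its child tags. The `Catalog` class and all tag and data set names are hypothetical; this is not how the Metadata Explorer stores its catalog.

```python
from __future__ import annotations

from collections import defaultdict


class Catalog:
    """Hypothetical in-memory catalog of published data sets and their tags."""

    def __init__(self) -> None:
        self.children: defaultdict[str, set[str]] = defaultdict(set)  # tag -> child tags
        self.tagged: defaultdict[str, set[str]] = defaultdict(set)    # tag -> data sets

    def add_child_tag(self, parent: str, child: str) -> None:
        """Place `child` below `parent` in the tag hierarchy."""
        self.children[parent].add(child)

    def tag_dataset(self, dataset: str, tag: str) -> None:
        """Attach a tag to a published data set."""
        self.tagged[tag].add(dataset)

    def _with_descendants(self, tag: str) -> set[str]:
        """Return the tag plus every tag below it in the hierarchy."""
        seen: set[str] = set()
        stack = [tag]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.children.get(current, ()))
        return seen

    def search(self, tag: str) -> set[str]:
        """Find data sets tagged with `tag` or with any of its descendant tags."""
        result: set[str] = set()
        for t in self._with_descendants(tag):
            result |= self.tagged.get(t, set())
        return result
```

For example, after `add_child_tag("Finance", "Invoices")`, a data set tagged only with "Invoices" is still found by `search("Finance")`.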
Data profiling is the process of examining data and producing a detailed statistical report on it. Based on this report, different users can act on the data accordingly: it helps users gain better insight into the data, supports cleansing actions such as running diagnostics, and enables close examination of the data you have.
SAP Data Intelligence has a built-in data profiling feature that provides additional metadata for the data set. Attributes such as minimum/maximum, average length, null values, blank values, distinct values and more are appended to the fact sheet of the data set.
All profiled data sets can be seen from the “View Profiled Datasets” tile, and the profiling runs can be monitored from the Monitoring tab of Data Intelligence.
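As a rough illustration of what these profiling attributes mean, the following Python sketch computes them for a single column of string values. The `profile_column` function is hypothetical, and its definitions (None as a null value, an empty or whitespace-only string as a blank value, plain string ordering for min/max) are assumptions that may differ from how SAP Data Intelligence computes its fact sheet statistics.

```python
from __future__ import annotations


def profile_column(values: list[str | None]) -> dict[str, object]:
    """Compute basic profiling attributes for one column of string values.

    Assumptions: None is a null value, an empty or whitespace-only string
    is a blank value, and min/max use plain string ordering.
    """
    non_null = [v for v in values if v is not None]
    non_blank = [v for v in non_null if v.strip()]
    return {
        "count": len(values),
        "null_values": len(values) - len(non_null),
        "blank_values": len(non_null) - len(non_blank),
        "distinct_values": len(set(non_blank)),
        "min": min(non_blank, default=None),
        "max": max(non_blank, default=None),
        "average_length": (
            sum(len(v) for v in non_blank) / len(non_blank) if non_blank else 0.0
        ),
    }
```

For the column `["DE", "FR", None, "", "DE", "ITA"]`, this reports one null value, one blank value, three distinct values and an average length of 2.25.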
Data preparation is used to manipulate raw data before data processing and data analysis. It covers collecting, cleansing, enriching, standardizing and imputing the data, and consolidating it into one accurate data set.
Data preparation in SAP Data Intelligence lets you perform the kinds of transformations needed on raw data before analysis. Its functionalities include in-column transformations, custom logic on existing columns, data enrichment, duplicating data, editing the recipe and running the preparation task.
All preparation runs can be seen in the “View Preparation” tile and monitored from the Monitoring tile.
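A few of these transformations (standardization, imputation of missing values, and consolidation by removing duplicate rows) can be sketched in plain Python over rows represented as dictionaries. The function names and the row format are hypothetical; in SAP Data Intelligence these steps are performed interactively in the preparation screen, not through code like this.

```python
from __future__ import annotations


def standardize(rows: list[dict], column: str, mapping: dict[str, str]) -> list[dict]:
    """Trim and upper-case a text column, then map known variants to one value."""
    out = []
    for row in rows:
        new = dict(row)  # copy so the raw rows stay unchanged
        value = new.get(column)
        if isinstance(value, str):
            cleaned = value.strip().upper()
            new[column] = mapping.get(cleaned, cleaned)
        out.append(new)
    return out


def impute(rows: list[dict], column: str, default: object) -> list[dict]:
    """Replace missing (None) values in a column with a default value."""
    return [
        {**row, column: default} if row.get(column) is None else dict(row)
        for row in rows
    ]


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Remove exact duplicate rows, keeping the first occurrence.

    Assumes all values are hashable.
    """
    seen: set[tuple] = set()
    out = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out
```

Running the three functions in sequence turns `" germany "`, `"DE"` and `"de"` into the single value `"DE"`, fills missing countries with a placeholder, and then drops rows that have become identical.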
When you execute a preparation task, you are asked to store the prepared file in one of the supported target containers. Data Intelligence is tightly integrated with Agile Data Preparation, and it also has natively available operators that help automate the process in a more sophisticated way.
The recipe records all the transformations and manipulations applied to the data during data preparation. You can revert changes by deleting recipe items.
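The recipe idea can be sketched as an ordered list of named steps that is replayed over the untouched raw data, so that deleting a recipe item reverts its change. This is a minimal sketch under that assumption; the `Recipe` class is hypothetical and does not describe how SAP Data Intelligence stores recipes internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Rows = list[dict]


@dataclass
class Recipe:
    """Hypothetical recipe: an ordered list of named transformation steps.

    The raw data is never modified; the prepared result is recomputed
    by replaying the remaining steps in order.
    """
    steps: list[tuple[str, Callable[[Rows], Rows]]] = field(default_factory=list)

    def add(self, name: str, func: Callable[[Rows], Rows]) -> None:
        """Record a transformation as a new recipe item."""
        self.steps.append((name, func))

    def remove(self, name: str) -> None:
        """Revert a change by deleting its recipe item."""
        self.steps = [(n, f) for n, f in self.steps if n != name]

    def apply(self, rows: Rows) -> Rows:
        """Replay all recipe items, in order, on the raw rows."""
        for _, func in self.steps:
            rows = func(rows)
        return rows
```

Because later steps are replayed on the output of earlier ones, deleting a step in the middle of a recipe can change what the following steps produce.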
Data quality and validation rules are a special form of business rules defined on the data. They let you check whether the data meets the required quality standards, check the sensitivity of the data and check its completeness.
SAP Data Intelligence can import rules that are already defined and used in SAP Information Steward. Importing all approved rules from SAP Information Steward into SAP Data Intelligence is a single-step process.
You can also define a rule and test it before binding it to a data set.
A rulebook lets you import rules that were already defined and bind them to a particular data set. When you run the rules, it reports the percentage of passed records and a sample of the records that don’t satisfy each rule.
Rulebook runs can be monitored from the Monitoring tile to check their progress.
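The rule workflow described above (define a rule, test it, bind it to a data set, then read the pass percentage and a sample of failing records) can be sketched in Python as follows. `Rule`, `run_rule`, `run_rulebook` and the example rules are hypothetical illustrations, not the rule syntax or API of SAP Data Intelligence or SAP Information Steward.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A validation rule: a name plus a predicate evaluated on one record."""
    name: str
    check: Callable[[dict], bool]


def run_rule(rule: Rule, records: list[dict], sample_size: int = 5) -> dict:
    """Evaluate a bound rule and report the pass rate and failed samples."""
    failed = [r for r in records if not rule.check(r)]
    total = len(records)
    passed_pct = 100.0 * (total - len(failed)) / total if total else 100.0
    return {"rule": rule.name, "passed_pct": passed_pct,
            "failed_sample": failed[:sample_size]}


def run_rulebook(rules: list[Rule], records: list[dict]) -> list[dict]:
    """Run every rule bound in a rulebook against the same data set."""
    return [run_rule(rule, records) for rule in rules]


# Hypothetical example rules.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
email_is_valid = Rule(
    "email_is_valid", lambda r: EMAIL.fullmatch(r.get("email") or "") is not None
)
id_not_null = Rule("id_not_null", lambda r: r.get("id") is not None)
```

Testing a rule before binding amounts to calling `email_is_valid.check(...)` on a few hand-made records; binding it then means passing the real data set to `run_rule` or `run_rulebook`.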
The Rule Dashboard lets you create interactive dashboards on rule categories, rulebooks and data sets. You can create a dashboard for a single data set, rule category or rulebook, compare multiple data sets, rule categories or rulebooks, and capture their trends over time.
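A dashboard trend boils down to computing a score per run and comparing consecutive runs. The sketch below assumes each rule result carries a `passed_pct` value and that all rules are weighted equally; the result format, the run labels and the equal weighting are assumptions, and SAP Data Intelligence scorecards may weight or aggregate rules differently.

```python
from __future__ import annotations


def score(results: list[dict]) -> float:
    """Equal-weighted mean of the rules' pass percentages for one run."""
    if not results:
        return 0.0
    return sum(r["passed_pct"] for r in results) / len(results)


def score_trend(runs: dict[str, list[dict]]) -> list[tuple[str, float, float | None]]:
    """Return (run label, score, change vs. previous run), ordered by label.

    Labels are assumed to sort chronologically (e.g. ISO dates).
    """
    rows: list[tuple[str, float, float | None]] = []
    previous: float | None = None
    for label in sorted(runs):
        current = score(runs[label])
        delta = None if previous is None else current - previous
        rows.append((label, current, delta))
        previous = current
    return rows
```

Comparing several data sets works the same way: compute `score(...)` for each data set's latest results and place them side by side.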
A business glossary is business metadata that provides semantic context to data. It helps the organization define terms and the relationships between objects. The SAP Data Intelligence Business Glossary is a library where you can define business terms and associate them with the business. It also lets you create relationships between terms, data sets and columns, giving the data meaningful relevance in the organizational context.
Categories in the Business Glossary let you group logically related terms into a single unit. You can also add keywords and synonyms to a business term, which help define the term’s metadata.
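The structure of a glossary with categories, synonyms, keywords, term relationships and links to data set columns can be sketched as a small Python data model. `Term` and `Glossary` are hypothetical names; this only illustrates the concepts and is not the Business Glossary's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """A business term with its category, synonyms, keywords and links."""
    name: str
    definition: str
    category: str
    synonyms: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)
    related_terms: set[str] = field(default_factory=set)
    linked_columns: set[tuple[str, str]] = field(default_factory=set)  # (data set, column)


class Glossary:
    """Hypothetical glossary: a library of terms keyed by name."""

    def __init__(self) -> None:
        self.terms: dict[str, Term] = {}

    def add(self, term: Term) -> None:
        self.terms[term.name] = term

    def relate(self, a: str, b: str) -> None:
        """Record a symmetric relationship between two terms."""
        self.terms[a].related_terms.add(b)
        self.terms[b].related_terms.add(a)

    def lookup(self, word: str) -> list[Term]:
        """Find terms by name, synonym or keyword (case-insensitive)."""
        w = word.lower()
        return [
            t for t in self.terms.values()
            if w == t.name.lower() or w in {s.lower() for s in t.synonyms | t.keywords}
        ]

    def by_category(self, category: str) -> list[str]:
        """List the names of all terms grouped under a category."""
        return sorted(t.name for t in self.terms.values() if t.category == category)
```

Linking a term to `(data set, column)` pairs is what lets a search for a business word such as a synonym lead to the physical columns that hold that data.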
Conclusion
The Metadata Explorer has a rich set of features that an organization can adopt for its data governance program. With the Metadata Explorer, organizations can address the 5 W’s and 1 H of data: who, what, when, where, why and how. It not only helps ensure the security and compliance of the data but also delivers valuable insights into it.
I hope this blog gave you an overview of the key features of the Metadata Explorer in Data Intelligence, with an end-to-end view from data management to orchestration for big data analytics.
Please post any queries and concerns in the comment section below, and feel free to share any findings I missed in this blog.
Happy Learning 🙂
Thank you!