SAP Information Steward and SAP Data Services and Data Quality are indeed inseparable and complementary solutions. In this blog post, we are going to cover both an internal and external view as to why. We will explore use cases, features and architecture that make these two solutions the very best of friends.
Typical Use Case Scenarios
Typical use case scenarios where Information Steward and Data Services work together to provide a solution include ETL/Data Warehousing, Data Quality Management, Enterprise Master Data Management, IT/System Maintenance, Business Intelligence/Reporting and Data Migration. The table below contains some examples of how the products fulfill use case requirements.
|Use Case||SAP Information Steward||SAP Data Services|
|ETL / Data Warehousing||Analyze source and target data to help with mappings and transformations||Create the data flows for extraction, consolidation/transformation and load|
|Data Quality Management||Initial insight into data content to understand cleansing requirements||Perform cleansing and matching, in batch and real-time|
|Enterprise Master Data Management||Initial insight and continuous monitoring of master data quality||Cleanse, consolidate and load master data repository|
|IT / System Maintenance||Understand quality, impact and lineage of data – Where is data used upstream and downstream?||Movement of data for system upgrades|
|Business Intelligence / Reporting||Understand quality, impact and lineage of data – Is data fit for use?||Populate business warehouse for reporting|
|Data Migration||Identify different data representations and quality levels across systems||Migrate data into new system, merge acquired data|
Let’s focus on two use case scenarios in particular: data warehousing and data migration. For data warehousing, Information Steward supports you in analyzing your source data to understand what content is available at the source as well as the quality of that data. Profiling results such as word, value and pattern distributions help you understand the need for mapping tables, or perhaps for standardization of the data during the ETL process. In addition, advanced profiling can help you identify referential integrity problems. For example, Information Steward could highlight the fact that the ORDER_DETAIL table contains part IDs that do not exist in the PARTS table.
With a data migration project, say one that arose as part of an acquisition, Information Steward helps you gain familiarity with the newly acquired source system through data profiling, helping you to understand:
- Is the content in the newly acquired source system similar in format, structure or type to that of your corporate system(s)?
- Again, is there a need for mapping tables or data standardization to be used as a part of the data migration process?
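To make the profiling checks above a little more concrete, here is a minimal Python sketch of a pattern distribution and of the ORDER_DETAIL/PARTS referential integrity check. It is not how Information Steward implements profiling; the function names, the pattern notation (A for letters, 9 for digits) and the sample values are all assumptions made for illustration.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Map each ASCII letter to 'A' and each digit to '9'; keep other characters."""
    pattern = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", pattern)


def pattern_distribution(values):
    """Count how often each character pattern occurs in a column."""
    return Counter(value_pattern(v) for v in values)


def orphaned_keys(child_keys, parent_keys):
    """Return child key values that have no matching parent key."""
    return sorted(set(child_keys) - set(parent_keys))


# Hypothetical sample data, not taken from any real system.
phone_numbers = ["555-1234", "555-9876", "5551234", "555-ABCD"]
order_detail_part_ids = ["P100", "P200", "P300", "P999"]
parts_part_ids = ["P100", "P200", "P300", "P400"]

# Several distinct patterns in one column hint at a need for standardization.
print(pattern_distribution(phone_numbers))

# Part IDs used in ORDER_DETAIL that do not exist in PARTS.
print(orphaned_keys(order_detail_part_ids, parts_part_ids))  # ['P999']
```

The same two checks are just as useful when sizing up a newly acquired system in a migration: a column with many competing patterns is a candidate for a mapping table or standardization step.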
You can also perform a data assessment by running the new source system’s data against the data standards and quality rules you have already established within Information Steward. If cleansing needs to occur on the source system due to poor quality, the Data Quality Advisor and Cleansing Package Builder can help you quickly and easily develop the needed cleansing and matching rules. If duplicate customer or product records are found across systems, those records (or a portion of those records) can be manually reviewed with Information Steward’s Match Review feature.
In both use case scenarios, Data Services provides the broad connectivity to databases, applications, legacy systems, and file formats that is needed to support your requirements for data extraction and loading. Then, based on the results of the data profiling and assessment, Data Services can be used to transform and standardize the data from multiple sources to meet a common data warehouse or system schema. Data Services can additionally be used to cleanse the newly acquired data to meet the quality standards your organization has in place. De-duplication can be performed to eliminate redundancy when bringing together multiple sources of similar data. And, in the case of the data warehouse, Data Services provides the means to capture changes in order to perform delta loads on a regular basis.
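As a rough illustration of what a delta load has to work out, the sketch below compares the previously loaded snapshot with the current source and classifies keys as inserts, updates or deletes. This is a generic snapshot-comparison technique written in plain Python, not a description of how Data Services captures changes; the function names and sample rows are hypothetical.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Hash a row's values in a stable column order."""
    payload = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compute_delta(previous: dict, current: dict):
    """Compare two snapshots keyed by primary key.

    Returns (inserts, updates, deletes) as sorted lists of keys.
    """
    inserts = sorted(k for k in current if k not in previous)
    deletes = sorted(k for k in previous if k not in current)
    updates = sorted(
        k for k in current
        if k in previous and row_hash(current[k]) != row_hash(previous[k])
    )
    return inserts, updates, deletes


# Hypothetical snapshots keyed by customer ID.
previous_load = {
    1: {"name": "Acme", "city": "Boston"},
    2: {"name": "Globex", "city": "Chicago"},
    3: {"name": "Initech", "city": "Austin"},
}
current_source = {
    1: {"name": "Acme", "city": "Boston"},
    2: {"name": "Globex", "city": "Denver"},
    4: {"name": "Umbrella", "city": "Seattle"},
}

inserts, updates, deletes = compute_delta(previous_load, current_source)
print(inserts, updates, deletes)  # [4] [2] [3]
```

Only the rows behind these three lists need to move on each run, which is the point of a delta load compared with a full reload.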
When we focus specifically on Data Quality, Data Services and Information Steward are complementary solutions. Below is what we like to call the “Data Quality Wheel”.
To start the process, you are assessing the data to identify issues and determine overall health. And, on the back end, monitoring is in place to keep an eye on the ongoing health of the data. This is where SAP Information Steward provides your solution. Information Steward provides the platform for the business-oriented user to gain the necessary insight and visibility into the trustworthiness of their data, allowing them to understand the root cause of poor data quality as well as recognize errors, inconsistencies and omissions across data sources.
The next step in the process takes action with SAP Data Services Data Quality capabilities to guarantee clean and accurate data by automatically cleansing your data based on reference data and data cleansing rules, enhancing your data with additional attributes and information, finding duplicates and relationships in the data, and merging your duplicate or related records into one consolidated, best record. Data Services enables the technical user with broad data access and the ability to transform, improve, and enrich that data. This process can occur in a batch mode or as a real-time point of entry quality check against business requirements.
And, there is some overlap. Information Steward additionally gives that business-oriented user the tools to help with improvement efforts – with intuitive interfaces for developing data quality rules as well as cleansing rules that work within a Data Services ETL data flow to improve enterprise data quality. Information Steward also supports those business users in manually reviewing the results of the match and consolidation process, to spot check and validate duplicates that are flagged in Data Services as low confidence matches.
Let’s look specifically at some of the product features that support the concept of sharing, the type of sharing that we would expect with best friends.
Sharing Validation Rules
Validation or quality rules defined in Information Steward to assess and monitor the quality of your information assets can additionally be published to Data Services and included in a batch or real-time data flow, so that the same quality or consistency checks are performed during various ETL activities.
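The idea of defining a rule once and using it in two places can be sketched in a few lines of Python. Here the same hypothetical rule list drives both a monitoring-style quality score and an ETL-style pass/fail split. The `Rule` class, the two rules and the sample rows are assumptions for illustration only; they do not reflect the actual rule syntax or publishing mechanism of either product.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules, defined once and shared by both functions below.
RULES = [
    Rule(
        "postal_code_is_5_digits",
        lambda r: re.fullmatch(r"\d{5}", r.get("postal_code") or "") is not None,
    ),
    Rule("email_present", lambda r: bool(r.get("email"))),
]


def quality_score(rows, rules):
    """Monitoring view: percentage of rows that pass every rule."""
    if not rows:
        return 100.0
    passed = sum(all(rule.check(row) for rule in rules) for row in rows)
    return 100.0 * passed / len(rows)


def split_rows(rows, rules):
    """ETL view: route rows to pass/fail outputs, recording failed rule names."""
    passed, failed = [], []
    for row in rows:
        broken = [rule.name for rule in rules if not rule.check(row)]
        if broken:
            failed.append((row, broken))
        else:
            passed.append(row)
    return passed, failed


rows = [
    {"postal_code": "02110", "email": "a@example.com"},
    {"postal_code": "2110", "email": "b@example.com"},
    {"postal_code": "60601", "email": ""},
    {"postal_code": "73301", "email": "d@example.com"},
]

print(quality_score(rows, RULES))  # 50.0
good, bad = split_rows(rows, RULES)
print(len(good), [names for _, names in bad])
```

Because both functions read the same `RULES` list, a change to a rule shows up in the score and in the data flow at the same time, which is the benefit of sharing rules rather than re-implementing them.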
Sharing Cleansing Rules
Information Steward’s Cleansing Package Builder empowers data stewards and data analysts to develop custom data cleansing solutions for any data domain using an intuitive, drag-and-drop interface. Cleansing Package Builder allows users to create parsing and standardization rules according to their business needs and visualize the impact of these rules on their data (as rules are being developed and as changes are being made). Once the data analyst has developed the custom data cleansing solution, the Cleansing Package is published to Data Services for use with the Data Cleanse transform to parse and standardize incoming data into discrete data components to meet the defined business and target system requirements.
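To give a feel for what "parse and standardize into discrete data components" means, here is a toy Python sketch that classifies the tokens of a product description and maps each variant to a standard form. The lookup table, attribute names and sample input are hypothetical, and a real cleansing package is far richer than a single dictionary.

```python
# Hypothetical classification table: token variant -> (attribute, standard form).
CLASSIFICATION = {
    "glove": ("product_type", "Glove"),
    "gloves": ("product_type", "Glove"),
    "latex": ("material", "Latex"),
    "ltx": ("material", "Latex"),
    "lg": ("size", "Large"),
    "large": ("size", "Large"),
    "blu": ("color", "Blue"),
    "blue": ("color", "Blue"),
}


def parse_and_standardize(text):
    """Split free text into tokens and assign each known token to an attribute.

    Returns (parsed, unparsed): a dict of attribute -> standard form, and the
    list of tokens that no rule recognized.
    """
    parsed, unparsed = {}, []
    for token in text.lower().split():
        match = CLASSIFICATION.get(token)
        if match is None:
            unparsed.append(token)
        else:
            attribute, standard_form = match
            # Keep the first value found for each attribute.
            parsed.setdefault(attribute, standard_form)
    return parsed, unparsed


parsed, unparsed = parse_and_standardize("GLOVES LTX LG BLU 100ct")
print(parsed)
print(unparsed)  # ['100ct']
```

The leftover tokens are the interesting part when developing rules: they show what the current rule set does not yet cover.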
Information Steward’s Data Quality Advisor guides data stewards to rapidly develop a solution to measure and improve the quality of their information assets. This is done with built-in intelligence to analyze and assess the data and recommend a cleansing solution, with the simplicity to allow a data steward to further review and tune the rules to get even better results. When satisfied with the rules and results, the data steward can publish the data cleansing configuration to Data Services, allowing the IT developer to use (or consume) that solution within the context of a larger production data set and ETL data flow.
Reviewing Match Results
Information Steward’s Match Review is a critical step within the overall “matching” process. While the Match transform in Data Services provides a very comprehensive set of mechanisms to automatically match and group duplicate records, matching still remains a combination of art and science. While you can get close to accurate matching using the Data Quality Advisor (in Information Steward) and the Match transform (in Data Services), there may be results (a gray area) that would benefit from additional, manual review. Information Steward provides a business user-centric interface to review suspect or low confidence match groups that consist of duplicate or potentially duplicate records. Based on their domain expertise, users can confirm the results of the automated matching process or make changes, such as identifying non-matching records. In addition, business users can review the match group and pick and choose fields from different records within it to fine-tune the pre-configured, consolidated best record. The review results are available in the staging area. You can configure whether the results should be made available at the completion of the review task or incrementally as each match group is processed. The downstream job or process can read the results from the staging repository and integrate them into the target system.
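The "gray area" idea can be sketched as a simple routing step: match groups with a high score are accepted automatically, groups in the middle go to manual review, and the reviewer's field picks are then used to assemble the best record. The thresholds, score scale, field names and function names below are assumptions for illustration, not settings or APIs of the Match transform or Match Review.

```python
def route_match_groups(groups, auto_threshold=0.90, review_threshold=0.70):
    """Split match groups into auto-accept, manual review and no-match buckets."""
    routed = {"auto_accept": [], "manual_review": [], "no_match": []}
    for group in groups:
        score = group["score"]
        if score >= auto_threshold:
            routed["auto_accept"].append(group["id"])
        elif score >= review_threshold:
            routed["manual_review"].append(group["id"])
        else:
            routed["no_match"].append(group["id"])
    return routed


def build_best_record(records, field_choices):
    """Assemble a consolidated record from reviewer-chosen source records.

    field_choices maps a field name to the index of the record to take it from.
    """
    return {field: records[index][field] for field, index in field_choices.items()}


# Hypothetical match groups with confidence scores between 0 and 1.
groups = [
    {"id": "G1", "score": 0.97},
    {"id": "G2", "score": 0.82},
    {"id": "G3", "score": 0.55},
]
print(route_match_groups(groups))

# A reviewer keeps the fuller name from record 1 and the phone from record 0.
records = [
    {"name": "Jon Smith", "phone": "555-1234"},
    {"name": "Jonathan Smith", "phone": ""},
]
print(build_best_record(records, {"name": 1, "phone": 0}))
```

Where the two thresholds sit is a business decision: widening the review band catches more borderline cases at the cost of more manual work.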
Discovering Data Services Metadata
Information Steward’s Metadata Management capabilities discover and consolidate metadata from various sources into a central metadata repository. This allows users to manage metadata from various data sources, data integration technologies, and Business Intelligence systems, to understand where the data used in applications comes from and how it is transformed, and to assess the impact of change from the source to the target, reports or application. SAP Data Services objects show up under the Data Integration category of Information Steward’s Metadata Management. Information Steward has a native Metadata Integrator that can discover a vast array of metadata objects, including projects, jobs, work flows, data flows, data stores (source and target information), custom functions, and table and column instances, as well as understand the relationships between these metadata objects and upstream and downstream systems.
Performing data lineage analysis in Information Steward, we can see how data from the source has been extracted, transformed and loaded into the target using Data Services. In this example, you can drill in to determine how LOS, or length of stay, was calculated and what source fields ultimately make up the patient name.
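Conceptually, lineage and impact analysis are two traversals of the same dependency graph: one walks upstream from a target column to its sources, the other walks downstream from a source to everything that depends on it. The Python sketch below shows both on a tiny, hypothetical graph. The table and column names are invented for illustration and do not come from the example discussed above or from any Information Steward repository.

```python
def source_columns(lineage, column):
    """Return the set of ultimate source columns feeding `column`.

    `lineage` maps each derived column to the columns it is computed from;
    a column with no entry is treated as a source column.
    """
    sources, seen, stack = set(), set(), [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = lineage.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            sources.add(current)
    return sources


def impacted_columns(lineage, column):
    """Return every downstream column that depends on `column`."""
    downstream = {}
    for target, parents in lineage.items():
        for parent in parents:
            downstream.setdefault(parent, []).append(target)
    impacted, stack = set(), [column]
    while stack:
        for child in downstream.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


# Hypothetical column-level lineage: target -> columns it is derived from.
lineage = {
    "DW.PATIENT_STAY.LOS": ["STG.STAY.ADMIT_DATE", "STG.STAY.DISCHARGE_DATE"],
    "STG.STAY.ADMIT_DATE": ["SRC.ADMISSIONS.ADMIT_DT"],
    "STG.STAY.DISCHARGE_DATE": ["SRC.ADMISSIONS.DISCH_DT"],
    "DW.PATIENT_STAY.PATIENT_NAME": ["SRC.PATIENT.FIRST_NAME", "SRC.PATIENT.LAST_NAME"],
}

print(sorted(source_columns(lineage, "DW.PATIENT_STAY.LOS")))
print(sorted(impacted_columns(lineage, "SRC.ADMISSIONS.ADMIT_DT")))
```

The first call answers "where does this value come from?" and the second answers "what breaks if this source column changes?".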
An Architectural Perspective
Architecturally, Information Steward and Data Services are inseparable in that Information Steward relies on Data Services. They also have a lot in common: both leverage Information Platform Services (IPS). Information Steward and Data Services both rely on CMS services for centralized user and group management, security, administrative housekeeping, RFC Server hosting and services for integrating with other SAP BusinessObjects software (e.g. BI Launch Pad). A dedicated EIM Adaptive Processing Server is shared between Data Services and Information Steward. Services deployed to the EIM Adaptive Processing Server include:
- RFC Server – Used for BW loading and reading via Open Hub
- The View Data and Metadata Browsing Services – Provide connectivity to browse metadata (show tables and columns) and view data
- Administrator Service – Used for cleaning up the log files and history based on log retention period
In terms of being inseparable, the Data Services Job Server is required for the Information Steward Job Server to work. With this need comes benefit, as Information Steward is able to leverage the great capabilities that Data Services has to offer. For example, Information Steward scales by leveraging Data Services’ ability to distribute workload across servers as well as across CPUs. In addition, Information Steward leverages Data Services for direct access to a broad range of source connectivity, including direct access to SAP sources like SAP ECC. Information Steward leverages Data Services as its core engine to not only access data but also to execute profiling and validation rules against that data. With that being said, Data Services and Information Steward’s source connectivity capabilities closely mirror each other, where it makes sense and where there are no technical limitations in doing so.
So, what do you say? SAP Information Steward and Data Services: inseparable, complementary, best friends…? How about, they complete each other? In any event, what a match!