Data Quality Cockpit using SAP Information Steward and SAP Data Services
There has been a lot of advancement in the area of information management in the last couple of years. Companies have invested millions of dollars and implemented many tools and processes to improve the way they manage their information assets. With such strong demand, vendors have expanded their portfolios to cater to these business requirements. Some vendors have developed solutions themselves, and some have acquired specialists with proven tools to address market demand quickly. SAP is no different: it has continued to innovate while also pursuing acquisitions to provide tools that meet customers’ demands. Two such solutions from SAP are SAP Information Steward and SAP Data Services. These two solutions, along with SAP Master Data Management (MDM), are primarily aimed at solving the puzzle of master data quality.
Business Data Quality Problem
Since SAP introduced its master data solution, many customers have adopted the tool and tried to solve the problem of bad and redundant data. It is a known fact that master data and its problems are more often spoken about than dealt with. The tangible benefits of good-quality master data are not easily quantifiable and hence receive less focus from the business. In the recent past, customers have implemented MDM solutions hoping that they would solve their master data quality issues. It is often believed that MDM solves all master data problems and that implementing MDM is sufficient to manage data quality. While MDM has solved the problem to some extent in the area of managing and governing master data, it has not solved it completely in the area of data quality. The reason is that tools meant for managing master data are not very good at cleansing and enriching the data.
Some common complaints are:
- Poor governance and stewardship are reasons for bad quality of data
- Rules insufficient to identify bad and duplicate data
- No holistic view of data quality in the productive database
- Data in productive database has gone bad over a period of time
- Periodic data quality checks on productive database not possible due to tool limitations
- Mergers and Acquisitions lead to unclean data due to poor stewardship
Data Quality Cockpit
MDM as a tool is meant to consolidate, centralize and harmonize master data between various connected systems. But it lacks in-depth data cleansing, transformation, enrichment and ongoing data quality assessment capabilities. The crux of the problem lies in enabling the business to control, monitor and maintain the quality of data on a continual basis while leveraging the existing master data setup.
The answer to this problem statement is a combination of SAP Information Steward and SAP Data Services, which can be introduced into the client’s existing landscape with minimal disruption to established business processes. These two tools can complement the master data solution or work independently to provide a comprehensive solution for data quality management. SAP Information Steward provides the right tools to understand where the fundamental problems lie, so that the solution can be focused there. Together with the ETL capabilities of SAP Data Services, this provides the complete solution. The two tools form the key elements of the data quality cockpit for information management.
What can SAP Information Steward do?
SAP Information Steward (SAP IS) has key capabilities that enable the business to assess, analyze and quantify the business problem. Using SAP IS for financial impact analysis could be the very first step toward solving the data quality (DQ) problem, and the DQ scorecard fulfills this to a very large extent.
Further, SAP IS has capabilities that help in fixing the bad data. Basic and advanced profiling, a business-user-friendly rule-building mechanism, metadata management, impact analysis, data lineage and the cleansing package builder are some key features that can be used to analyze data cleansing and standardization requirements. The resulting rules can then be exported to SAP Data Services for reuse and actual cleansing.
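To make the idea of business rules feeding a scorecard concrete, here is a minimal Python sketch. It is not SAP Information Steward's rule syntax or API; the rule names, field names and sample records are all hypothetical, and the "score" is simply the percentage of records that pass each rule.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]

# Hypothetical validation rules; a real rule set would come from the business.
RULES: Dict[str, Rule] = {
    "name_not_blank": lambda r: bool((r.get("name") or "").strip()),
    "country_is_iso2": lambda r: re.fullmatch(r"[A-Z]{2}", r.get("country") or "") is not None,
    "postal_code_present": lambda r: bool((r.get("postal_code") or "").strip()),
}


def score_rules(records: List[dict]) -> Dict[str, float]:
    """Return, per rule, the percentage of records that pass it."""
    total = len(records)
    if total == 0:
        return {name: 0.0 for name in RULES}
    return {
        name: round(100.0 * sum(1 for r in records if rule(r)) / total, 1)
        for name, rule in RULES.items()
    }


# Illustrative sample data, not taken from any real system.
records = [
    {"name": "Acme Corp", "country": "US", "postal_code": "10001"},
    {"name": "  ", "country": "usa", "postal_code": None},
    {"name": "Globex", "country": "DE", "postal_code": ""},
    {"name": "Initech", "country": "GB", "postal_code": "EC1A 1BB"},
]

scores = score_rules(records)
for rule_name, pct in scores.items():
    print(f"{rule_name}: {pct}% of records pass")
```

Keeping each rule as a small named predicate mirrors the idea of rules that are defined once and reused elsewhere, although the actual export mechanism between the two SAP tools is not shown here.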
What can SAP Data Services do?
SAP Data Services (SAP DS) can complement SAP IS by providing the required support for the actual cleansing, transformation, standardization and de-duplication of data. SAP DS can be used either independently or in conjunction with SAP IS. Rules can be imported from SAP IS, and additional rules can be set up within SAP DS. SAP DS also has its own set of transforms, directories (such as address and company name), validations and matching rules. The data flow functionality allows for step-by-step cleansing and enrichment of data. Data fed once into the data flow can be taken through various steps in which it is cleansed, standardized, transformed and enriched to improve its quality. This clean data can then be passed to the matching transform to identify the unique and survivor records among multiple matches.
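The cleanse, standardize, match and survivor steps described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: exact matching on a normalized key stands in for the matching transform, the suffix list and the "most populated record wins" survivor rule are hypothetical, and none of this reflects how SAP Data Services implements its transforms.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh"}


def standardize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces and drop legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def match_key(record: dict) -> tuple:
    """Build the key on which two records are treated as duplicates."""
    postal = record["postal_code"].replace(" ", "").upper()
    return (standardize_name(record["name"]), postal)


def pick_survivor(group: list) -> dict:
    """Keep the record with the most populated fields; ties go to the earliest."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))


def deduplicate(records: list) -> list:
    """Group records by match key and return one survivor per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [pick_survivor(group) for group in groups.values()]


# Illustrative sample data.
records = [
    {"id": 1, "name": "Acme Corp.", "postal_code": "10001", "phone": ""},
    {"id": 2, "name": "ACME Corporation", "postal_code": "10001", "phone": "555-0100"},
    {"id": 3, "name": "Globex GmbH", "postal_code": "80331", "phone": ""},
]

survivors = deduplicate(records)
print([r["id"] for r in survivors])
```

In this sample the two Acme records share a match key, and the one with a phone number survives because it has more populated fields. A production setup would typically use fuzzy matching and richer survivorship rules.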
Using SAP IS + SAP DS – A typical business application scenario
In a typical business scenario, data resides in the ERP or master data applications. The SAP DQ tools can readily integrate with source or destination systems via database connections, application connections or file-type connections for data exchange.
Once data is extracted to SAP IS and formatted, the business can run various kinds of profiling to assess the data quality. The next step is to understand how much a single instance of bad data costs the business, then use SAP IS to perform a financial impact analysis and build a business case for addressing the data quality problem.
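As a rough illustration of what column profiling reports, the following Python sketch counts blanks, distinct values and value patterns for one column. It is an assumption-laden stand-in rather than the profiling engine of SAP IS; the function names, the pattern notation (9 for digits, A for letters) and the sample rows are hypothetical.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Map every digit to 9 and every letter to A, keeping other characters."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(rows: list, column: str) -> dict:
    """Return basic profiling measures for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    filled = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "blank": len(values) - len(filled),
        "distinct": len(set(filled)),
        "patterns": Counter(value_pattern(v) for v in filled).most_common(),
    }


# Illustrative sample data.
rows = [
    {"postal_code": "10001"},
    {"postal_code": "94105"},
    {"postal_code": "EC1A 1BB"},
    {"postal_code": ""},
    {"postal_code": "10001"},
]

profile = profile_column(rows, "postal_code")
print(profile)
```

A pattern distribution like this quickly shows which formats dominate a column and which values are outliers worth a validation rule.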
SAP IS and SAP DS together can then be used to solve the data quality problem: SAP IS provides the governing and analytical abilities, and SAP DS does the actual cleansing and enrichment. “Data is only as good as the underlying rules that govern the data.” Building a cleansing package with inputs from the business on the validations and rules governing the data provides the platform for ensuring that data entering the system is good. Additionally, industry-standard transforms for addresses, company names, standardization and so on can help further cleanse and enrich the data. Matching transforms can then be used on the cleansed and enriched data to identify duplicates and retain only unique copies of each record in the system.
The cleansing package should be regularly updated with new or revised rules and validations so that it does not become outdated, as data can transform and mature with time. More complex rules could be built to ensure that bad or redundant data is filtered out and only a golden copy of the record is stored in the productive instance. The recently introduced Data Quality Advisor in SAP IS uses statistical analysis and content types to guide data stewards in rapidly developing cleansing and matching rules to improve the quality of their data assets.
There is also the flexibility of building the cleansing package in SAP IS and transporting it to SAP DS for reuse. The rules can be packaged as services so that they can be used in other processes that allow for data entry into the system. Thus data quality can be governed not just once but on a continual basis – at the point of creation of data, import of data, periodic extraction and review of data from the productive instance.
What is the ROI for business?
To understand the ROI of investing in the tools and additional processes, the impact analysis feature of SAP IS could be leveraged. Identifying the key attributes that define the data, determining the cost of each bad attribute and its effect on the record, analyzing the impact of bad data on the business and extrapolating it to the full data set gives a sense of the magnitude of the effect bad data can have on the overall business. Translating this into potential savings, presented in a form the business can understand, provides answers to questions around ROI.
To make good use of the DQ cockpit, a proper data quality or governance organization is required. Data stewards and owners need to engage continuously with business users to understand the changing needs in data and its validation, and to build new rules or keep existing ones updated. As data matures and expires with time, the same rules may not always hold good, so the rules and validations that govern the data need to change and be refined continuously as the business demands. Analysts can run daily, weekly or other periodic jobs to generate reports that help business or information stewards understand the state of data quality and take continuous measures to keep data clean and reliable.
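The extrapolation described above comes down to simple arithmetic, sketched below in Python. This is not the calculation SAP IS performs internally. The function names and every figure (sample size, cost per bad record, share of bad records fixed, investment) are hypothetical placeholders chosen only to show the steps.

```python
def estimate_impact(sample_size: int, failed_in_sample: int,
                    total_records: int, cost_per_bad_record: float) -> dict:
    """Extrapolate the failure rate of a profiled sample to the full data set."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    estimated_bad = round(failed_in_sample * total_records / sample_size)
    return {
        "failure_rate": failed_in_sample / sample_size,
        "estimated_bad_records": estimated_bad,
        "estimated_cost": estimated_bad * cost_per_bad_record,
    }


def simple_roi(estimated_cost: float, share_fixed: float, investment: float) -> float:
    """ROI = (savings - investment) / investment, where savings is the avoided cost."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    savings = estimated_cost * share_fixed
    return (savings - investment) / investment


# Hypothetical inputs: 80 of 1,000 sampled records fail, out of 250,000 records
# in total, and each bad record is assumed to cost the business 12.50.
impact = estimate_impact(sample_size=1000, failed_in_sample=80,
                         total_records=250_000, cost_per_bad_record=12.5)

# Assume the programme fixes 60% of bad records for an investment of 100,000.
roi = simple_roi(impact["estimated_cost"], share_fixed=0.6, investment=100_000)
print(impact, roi)
```

The useful part for a business case is that each input is an explicit assumption that stakeholders can challenge and adjust.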
Below are some key benefits of the tools:
- An easy plug-in solution that can extract in-process data at a particular stage of a business process, cleanse or enrich it, and put it back
- Rules that can be easily configured by data admins/stewards for data validation and reuse in IS and DS interchangeably
- Usage of external services for enriching data like the address directory services for different countries
- Periodic health check and dashboard view to ensure data standards and quality are maintained at appropriate levels
- Identifying duplicates, determining the survivor and maintaining a history of merged records
- Financial impact analysis and calculating ROI
SAP Information Steward and SAP Data Services are very important tools for stewards, analysts, information governance experts and business users to conduct regular health checks on the quality of data and take timely corrective actions. They give a good understanding of where data quality stands and where to fix it for maximum benefit. It is imperative that data quality management be a continuous exercise rather than a one-time one. Continuous improvement in the quality of data, enabled by SAP IS and SAP DS, provides the key foundational elements that enable governance and improve trust in the data infrastructure of the organization.