Data Profiling With SAP Business Objects Data Services
Data profiling started off as a technology and methodology for IT use. But data profiling is emerging as an important tool for business users to gain full value from data assets. When given the right tools and practices for data profiling, business users should quickly identify inconsistencies and problems for data, before it is used for reporting and intelligence purposes.
Data profiling is an important preliminary step to data modeling. It’s also used in data quality improvement programs and master data management initiatives to help “ensure the consistency of key non-transactional reference data” used across the enterprise
In the long run, data profiling can be used both tactically and strategically. Tactically, it can serve as an integral part of data improvement programs. Strategically, it can help managers determine the appropriateness of different data source systems under consideration for deployment in a particular project.
Data Profiling is implemented over BODS Designer.
Prior actions to be performed before profiling over data.
Step1: Create a Profiler Repository using Repository Manager.
Step2: Assign the Profile Repository to a Job Server using Server Manager.
Step3: Configure the Profiler Repository in BODS Designer.
Step4: Configure the Profiler Repository in BODS Management Console.
Profiler repository stores following information:
1 – Profiler tasks, that are created when a profiling request is submitted from Designer, from the Management Console you can monitor the progress and execution of these task
2 – Profiling result (data), when you profile columns of tables, the summary and detail information will be stored in profiler repo, along with the sample data for each profiling attribute (this from where the profiler results are displayed in Designer).
There are 2 types of Data Profiling:
1. Normal & Detail Profiling.
2. Relationship Data Profiling.
1. Normal& Detail Profiling using BODS:
This will let you understand what are the unique values present in the column, min / max values, how many null values / blanks, their percentage, max /min / avg string length etc.
Login to BODS Designer ->
1) Select any table or file formats from Object Library for Profiling, Right click & you find the below tab.
2) Here we have an Option “Submit Column Profile Request”, select the option.
3) On selecting option, below tab opens.
4) Here we have all the columns available in the file or table used for profiling.
5) We can select all the columns or specific columns on which profiling have to be performed.
6) If we want DETAILED Profiling, click on the checkbox available for respective columns.
7) Once done, click SUBMIT button.
8) On submission, profiler server now starts the process of profiling the data. The status of the profiling of specific file or table can be views from Management console or we get a status tab for each task completed immediately after above process.
We can change the Statistics of Profiling from Management console.
9) Once Profiling completed, we can go to designer & click view data on file or table.
10) Here we find a tab (Profile Tab) which shows the latest profiling information performed on that table. We can drill down on the parameter values to check the values in detail.
11) Similarly, this result can be checked from database Profiler repository tables.
2.Relationship Profiling using BODS:
Lets you know, what percentage / no: of records of a table A is present in table B.
In the both these, there are basic / detailed profiling. For the latter, some attributes will be more
- Relationship Profiling shows the percentage of Non Matching values in columns of two sources.
- The sources can be tables, flat files, or a combination of a table and a flat file.
- Relationship profiling always require 2 tables or flat files to perform the profiling at any particular operation.
Login to BODS Designer ->
- Select 1 table or file formats from Object Library for Profiling, Right click & you find the below tab.
- Here we have an Option “Submit Relationship Profile Request with”, select the option.
3. Upon selecting the Option, you have to just select other table or file format to pop a below window, where you can perform the Profiling requirements.4. Here we can provide the join functionality on respective columns from 1st table to 2nd table.
- We can manually make the Join relation or if the source/target table consists of Primary or Foreign Keys then, we can just click the Button “Propose Relation”, which helps to join automatically for respective columns.
5. We can save all the columns data or specific Key mentioned columns data, which is mainly useful when we check the Profiled Data.
6. Once this is done, click “SUBMIT” button.
7. Below is the Window, which shows the status of Profiling, weather it is a Relationship or Column based Profiling.
8. Once Profiling completed, we can go to designer & click view data on file or table.
9. Here we find a tab (Relationship profile Tab) which shows the latest profiled information performed on table 1 over table 2. We can drill down on the parameter values to check the values in detail.
10. Here, 66.67% of records signifies that, they are not present in table – P2 but, available in
Table – P1.
11. And , 80.00% of records signifies that, they are not present in table – P1 but, available in
Table – P2.
12) We can even drill down the data that is Non-Matched upon clicking on the percentage values.
Benefits of Data Profiling:
The benefits of data profiling is to improve data quality, shorten the implementation cycle of major projects, and improve understanding of data for the users. Discovering business knowledge embedded in data itself is one of the significant benefits derived from data profiling. Data profiling is one of the most effective technologies for improving data accuracy in corporate databases. Although data profiling is effective, then do remember to find a suitable balance and do not slip in to “analysis paralysis”.