Redundancy Profiling in SAP BOIS - Information Steward
Redundancy profiling is one of the six profiling types available in SAP Information Steward. It is used to measure the amount of repeated data, in other words the degree of data overlap between two data sets.
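To make the idea of overlap concrete, here is a minimal Python sketch that splits the distinct values of two columns into the three regions of a Venn diagram. It is only an illustration of the concept, not how Information Steward computes its results; the function name `overlap_summary` and the sample values are assumptions made up for this example.

```python
def overlap_summary(left_values, right_values):
    """Split the distinct values of two columns into the three Venn regions."""
    left, right = set(left_values), set(right_values)
    both = left & right
    return {
        "only_left": sorted(left - right),
        "both": sorted(both),
        "only_right": sorted(right - left),
        # Share of each column's distinct values that also occur in the other.
        "left_overlap_pct": 100.0 * len(both) / len(left) if left else 0.0,
        "right_overlap_pct": 100.0 * len(both) / len(right) if right else 0.0,
    }


# Hypothetical sample data, not taken from the tables used in this post.
countries = ["Germany", "India", "France", "India"]
regions = ["India", "France", "Brazil"]

summary = overlap_summary(countries, regions)
print(summary["only_left"], summary["both"], summary["only_right"])
```

The sketch works on distinct values, so the repeated "India" in the first list counts only once.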
The output of this profiling is represented by means of Venn diagrams:
An overall view of the Information Steward window after redundancy profiling:
For a deeper insight, we can double-click the green tick marks, which opens the respective result windows.
The following snapshots give an overview of redundancy profiling:
1) For redundancy profiling we have to select two tables for comparison; otherwise it will throw an error. Select any two relevant tables from the available connections (we can also create our own), then choose “Profile -> Redundancy”:
2) Upon clicking the “Redundancy” option, the following window will pop up, where we choose the columns for comparison:
We can select multiple columns for comparison, and the results will be displayed accordingly.
1) CASE 1: Selecting a single column for comparison:
Click Save and Run Now. The profiling will start running in the background, with a small watch symbol shown beside the tables chosen for comparison. Once the watch symbol disappears, the results are displayed on the right-hand side with tick marks, as shown in the very first image at the beginning of the document. We have to double-click each tick mark (one at a time) to view the redundancy profiling results.
If this window does not appear, check whether the “Advanced” tab is selected. Go to “Profile Results -> View -> Advanced” as shown below:
The redundancy results:
a) With respect to AC_COUNTRIES:
b) With respect to AC_REGIONS:
Here I have highlighted a few areas to point out how the results display varies between the two cases. The results look similar to the concept of a “Left Outer Join” and a “Right Outer Join”. In the first case, the view shows the matched values (based on AC_COUNTRIES and AC_REGIONS) available in the AC_COUNTRIES column, and only 15 records from LANDX (the column chosen during redundancy profiling) are displayed.
However, when we select the profiling results of AC_REGIONS, it displays the matching records from both tables and the relevant columns, as well as the records from the current table, which may be repeated or distinct (as highlighted).
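The difference between the two views can be sketched in Python by looking at the overlap from the side of one table at a time and counting rows, duplicates included. This is only an analogy for what the result windows show, not the tool's own logic; `table_view` is a hypothetical helper, and the values below are invented rather than taken from AC_COUNTRIES or AC_REGIONS.

```python
from collections import Counter


def table_view(current_rows, other_rows):
    """Describe the overlap from the point of view of `current_rows`.

    Rows are counted with duplicates, so the two views can differ
    even though the set of shared values is the same.
    """
    other_values = set(other_rows)
    counts = Counter(current_rows)
    matched = {v: n for v, n in counts.items() if v in other_values}
    unmatched = {v: n for v, n in counts.items() if v not in other_values}
    return {
        "matched_rows": sum(matched.values()),
        "unmatched_rows": sum(unmatched.values()),
        "matched_values": sorted(matched),
        "unmatched_values": sorted(unmatched),
    }


# Hypothetical values: the first list has one row per country,
# the second repeats a country once per region.
ac_countries = ["Germany", "India", "France"]
ac_regions = ["India", "India", "France", "Brazil"]

countries_view = table_view(ac_countries, ac_regions)
regions_view = table_view(ac_regions, ac_countries)
print(countries_view["matched_rows"], regions_view["matched_rows"])
```

Both views share the same matched values, but the second one reports more matched rows because a value is repeated in that table, much like the two sides of an outer join.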
2) CASE 2: Selecting multiple columns, when the data across the selected columns does not match:
The database in this case is very large and holds a lot of data, so I am sharing only a part of the columns to be matched:
Entering the multiple columns in the Redundancy Profiling window:
As we can see in the above case, the columns have no matching data. Hence the result of profiling them is a set of disjoint circles in the Venn diagram. We can view the profiling results below:
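Apart from the screenshots, the disjoint outcome can be illustrated with a small Python sketch. It assumes that each selected column pair is compared on its own, which is my reading of the results rather than something confirmed here. The helper `profile_pairs`, the tables, and the column names are all hypothetical.

```python
def profile_pairs(left_table, right_table, column_pairs):
    """Return the shared distinct values for each (left column, right column) pair."""
    results = {}
    for left_col, right_col in column_pairs:
        left = set(left_table[left_col])
        right = set(right_table[right_col])
        results[(left_col, right_col)] = left & right
    return results


# Hypothetical tables, each stored as a mapping of column name to values.
orders = {
    "ORDER_ID": [1001, 1002, 1003],
    "STATUS": ["OPEN", "CLOSED", "OPEN"],
}
customers = {
    "CUSTOMER_ID": [1, 2, 3],
    "SEGMENT": ["RETAIL", "WHOLESALE", "RETAIL"],
}
pairs = [("ORDER_ID", "CUSTOMER_ID"), ("STATUS", "SEGMENT")]

shared = profile_pairs(orders, customers, pairs)
# Every pair has an empty intersection, so each Venn diagram
# would be drawn as two circles that do not touch.
all_disjoint = all(len(values) == 0 for values in shared.values())
print(all_disjoint)
```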
Some important links for reference:
See you next time. Till then, happy learning!