Data Profiling - Column Profiling in Information Steward
Hi All
This document walks through the types of profiling offered by the “Information Steward Tool”, with the main focus on “Column profiling”.
The following types of “Data Profiling” are available in SAP-BOIS (they can be viewed in the “Profile” tab after selecting a table):-
1) Column
2) Addresses
3) Dependency
4) Redundancy
5) Uniqueness
6) Content Type
1) Column Profiling:- This type of profiling helps us analyze the overall distribution of values in the data fields. During column profiling, the software identifies outlier values, which are column values that differ from the majority of the values in a column. The software then presents suggested rules that you can accept and bind to the column, without having to create each rule from scratch and bind it to the column manually.
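To make the idea of a column profile more concrete, here is a minimal Python sketch of the kind of figures such a profile summarizes (count, nulls, distinct values, min, max, median) plus a simple frequency-based outlier check. This is only an illustration and not how Information Steward works internally: the function names `profile_column` and `rare_values`, the 5% threshold and the sample data are all assumptions made for this example.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column; None stands for a null value (illustrative only)."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    ordered = sorted(non_null)
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        "min": ordered[0] if ordered else None,
        "max": ordered[-1] if ordered else None,
        # Lower median of the sorted non-null values (an assumption).
        "median": ordered[(len(ordered) - 1) // 2] if ordered else None,
    }


def rare_values(values, max_share=0.05):
    """Return values whose share of the non-null rows is at most max_share."""
    non_null = [v for v in values if v is not None]
    total = len(non_null)
    if total == 0:
        return []
    counts = Counter(non_null)
    return sorted(v for v, c in counts.items() if c / total <= max_share)


# Hypothetical country column: mostly ISO codes, one spelled-out name, two nulls.
country = ["DE"] * 12 + ["US"] * 10 + ["IN"] * 9 + ["Germany"] + [None] * 2

print(profile_column(country))
print(rare_values(country))
```

In this sample, "Germany" is the only value that falls below the threshold, which is the kind of value a rule such as "country must be a two-letter code" would catch.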
Steps for Performing Column Profiling:-
1) Select the relevant tables using the check boxes, then select Profile -> Columns as shown:-
2) Enter a name for the profile. By default, the “Simple profile” check box is checked; selecting the other check boxes is optional. You can either tick the check box of the table, which selects all its fields at once, or expand the table and select specific fields:-
Click the “Save and Run Now” option at the bottom; this runs the profile in the back end.
3) Details of the Profiling Attributes:-
Click the Settings icon in the right-hand corner to view the details of the profile attributes. The attributes to be displayed can be customized here:-
4) View the details (results) of the various Attributes of the profile:-
Click the “+” symbol next to the relevant table (the one to which profiling was applied) to expand it. Then double-click the required value, and the corresponding results are displayed below:-
5) Meaning of Specific symbols in the Profiling:-
6) The Generated Rule-Types:-
The highlighted icon stands for “Rules”.
Rule_Types:-
Suggested Rules from the “Column Type Profiling”:-
Some important facts to keep in mind for string data (with null values or leading blanks):-
1) Data should be ordered alphabetically to deduce the correct median value.
2) For string columns, values with a leading space are treated as the min value, provided there are no null values in that column. If there are null values, the nulls are treated as the min value instead.
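The two points above can be tried out with a small Python sketch. It assumes plain character-by-character string comparison (where a space sorts before letters) and places nulls first, to mirror the behaviour described above; the tool's actual collation may differ. The helper names `sort_key` and `string_min_median` and the city values are made up for this example.

```python
def sort_key(value):
    """Order nulls (None) before every string, then strings alphabetically."""
    return (0, "") if value is None else (1, value)


def string_min_median(values):
    """Return (min, median) of a string column after alphabetical ordering."""
    ordered = sorted(values, key=sort_key)
    # Lower middle element of the ordered list is taken as the median.
    return ordered[0], ordered[(len(ordered) - 1) // 2]


# No nulls: the value with a leading space comes out as the minimum.
cities = ["Delhi", " Pune", "Agra", "Mumbai", "Chennai"]
print(string_min_median(cities))

# With a null: the null takes the minimum position instead.
cities_with_null = ["Delhi", " Pune", None, "Mumbai", "Chennai"]
print(string_min_median(cities_with_null))
```

Trimming leading blanks before profiling would let "Agra" come out as the minimum in the first list, which is usually what is expected from the data.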
Some useful links for a general overview of data profiling:-
http://hpi.de/fileadmin/user_upload/fachgebiete/naumann/folien/SS13/DPDC/DPDC_02_ProfilingIntro.pdf
Overview on the Tabs in SAP Information Steward
For further reference:-
http://help.sap.com/businessobject/product_guides/sboIS42/en/is_42_user_en.pdf
http://help.sap.com/businessobject/product_guides/sboIS42/en/is_42_admin_en.pdf
That’s all I have for now. Till then, happy learning!