Information Steward Failed Data Reporting
In this blog I’m going to focus on how one can analyze the failed data more easily and how to get around some of the limitations.
Within Information Steward you would create rules that test your data for uniqueness, completeness, accuracy, etc.
However, what happens when the data fails and does not pass the rule?
If you have done some work with Information Steward you would already know that you can view the failed data and trends associated.
But by default the failed data displayed is limited to 100 records.
Now this is fine if you your data that has failed is less than 100 records, or if you are using it to just monitor the quality of the data.
But in other circumstances you will want to be able to
- export the failed data and correct the incorrect data in the source system, especially if the source system is ERP
- view all the incorrect data
- create reports on this data
Now there are two things one can do to get more failed data
- Increase the sample size, limited to a maximum of 500 records.
- Save the failed data to a seperate database
Bear in mind increasing how much failed data must be saved as mentioned in point 1 will increase the size of your CMC database.
So lets look at how we can do this.
Increase the sample size
To increase the sample size you can increase the size when calculating the score. By default it is always 100 records. You will have to change this each time when calculating the score.
Alternatively, you can set the default to a higher number in CMC so that when calculating the score it automatically shows a new default number that is higher. Log into CMC, navigate to Applications.Then configure Information Steward.
Then change the default sample size
Unfortunately anything above 500 records will give you an error. The maximum allowed is 500 records.
You will need to restart relevant servers. But now your default sample size should be larger. You will have more failed data but limited to the maximum of 500 records.
Save the failed data to a seperate database
Create a connection to a database where you would like to store the failed data, but ensure connection type is “For data that failed rules”.
Now when you run your rules you will have the connection name available to save the failed data.
Next time you calculate your score and select to save the failed data, the data will be stored in the database you selected. It will create the table and populate the table with failed data. From there you can build universes and reports off of the table. This will allow you to view all the failed data and make additional reports.
Hope the above helps 🙂
Follow me on twitter @louisdegouveia for more info on EIM.
I have set up a separate database for failed data as described here. However, I must ask for some advice on interpreting the data.
It is my understanding that every record that is in the failed data database failed some rule. However, how does one see which rule the record failed? I can't seem to locate the column for this in any of the tables that were exported.
For all the failed data 2 tables get created *_FR and *_FD (Failed Record and Failed Data).
You can use simple join on IS_GEN_ROWDID and RUN_ID to get the details of which rule has failed and on which column.
Good one. I am looking to send failed data into separate files. I dont see anything that says this feature is supported?
Also, while sending failed data to a database, say SAP HANA - I don't see an option (in CMC failed data connection properties) to override the schema to create the *_FD and *_FR tables in? I'd like to send the failed data to an Audit/Data Quality schema, not the connection user default schema.
Please let me know if it's possible "to send failed data into separate files".