In a data warehouse system landscape, there could be multiple source systems feeding data to the warehouse. There could be a mix of databases (like Oracle, Sybase, MS SQL Server etc.), web services and flat files (like xls, txt, csv etc.).
Here we would be discussing on error data capture and analysis during file loading which is an integral part to any ETL process. Files could be used for data migration, delta loads or both. Processing files without errors or tackling the errors when they arise is a tedious task.
The scope of this whitepaper is an enhancement using SAP Data Services to capture and process error data during flat file upload.
Although SAP DS provides mechanism to capture the error record details to a file for analysis, a utility to format and analyze this error data is unavailable in the tool. Also, the generated error file is not easily readable and this result in greater effort to scrutinize the error record and to locate the bad data within the source file. With minor adjustments, the proposed generic solution can be extended to any flat file format (csv, txt, tab delimited file etc) and would save abundant time on issue resolution when there are more number of files to be processed.
Following business benefits are expected
- Easy identification of error data
- Achieve quick turn around on issue resolution
- Faster data migration and reconciliation process
The reusable component would perform the following:
- Extract and process the tool generated error information
- Load the error details to a database table for further analysis. The table would contain the error information, name of the source file containing the error, error record, error generated date, DS job name etc.
- Can be easily integrated to any flat file loading SAP DS job with minimal changes
Create Dataflow in SAP DS to load csv file
If error records are generated, trigger data flow to process the error data.
Transform error file with appropriate delimiter which can be processed by SAP DS
Process the file from DS and query the database table to see the records for analysis
This process helps in easy identification of error data and also in faster resolution.