Data cleaning forms to be one of the important steps in “Data Preparation”. SAP BODS is one of the replication options recommended by SAP for loading data into SAP HANA.
In this document, I will be sharing certain validation checks required along with comments on how to handle errors. I have listed down below some most common checks expected while loading data into HANA.
1) Allowed character checks
a) Allowed character for CHAR fields
b) Allowed character for Numeric fields
c) Allowed character for Date fields
2) File existence check
3) Mandatory field check
4) Referential Integrity check
We have one transform named Validation Transform in BODS, which we can use for to validate the data based on validation ( even multiple validations possible ) rules.Now let us see our first validation rule and see how can we use Validation Transform and collect the error records.
.1) Allowed character check:
a) Allowed character for CHAR fields:
Only the characters A-Z,a-z and 0-9 are to be allowed.
I have the following data in the source
As we can see EMPLOYEE NAME contains a value “T@ngudu”. Where *@ is not a allowed character. So while loading this into HANA we need to ensure that this particular record is failed and needs to be handled by the support team.
To handle this, have configured the validation transform as shown below:
Now when i run this job, All the records which *PASSED *the validation rule will be loaded to the actual table and the records which FAILED will be collected in the FAIL table as shown below.( In our case the record with EmployeeName as “T@ngudu” is collected )
b) Allowed character for Numeric fields:
Only the characters 0-9 are allowed.
For this validation rule, everything remains the same as mentioned above but since here even A-Z or a-z should not be allowed leads to the formula change as mentioned below:
match_regex(Query.EMPID ,literal('[0-9]*'), NULL ) = 1
c) Allowed character for Date fields:
To check and ensure that the date column is coming in the required format. For example, the expected format in the target table is ‘YYYY-MM-DD’. If the date is coming in any other format or even if unwanted
Example the source contains a field where the date column has wrong data as shown below:
The following custom condition has to be entered for validating the date as shown below:
On executing the job as shown below, the field containing wrong data format is collected into Failed records table.
2) File existence check
You will encounter some situations like you need to check for a particular file before the loading starts. And let us take in our case that we are appending date to the file name to distinguish it from the previous files. Please find the below the steps to for check if the particular file exists or not:
1) Create a variable i.e $FileName
2) Write the below code in the Script editor before the data flow
$FileName= 'G: \TableName' || to_char( sysdate( ), 'YYYYMMDD' ) || '.txt'; Print ( $FileName ); While ( file_exists ( $FileName ) = 0 ) Begin print( 'Waiting for File' ); Sleep(1000); End print ( 'File Received Successfully' ); print ( 'File Processing Started' );
3) Mandatory field check:
We will have cases where we expected the field to be present (mandatory) . Like in case of the primary key of the table.In such cases to ensure that field is not empty we can use IS NOT NULL condition in Validation Transform as shown below:
4) Referential Integrity check:
If we want to load data into transaction table only if the corresponding master data entry is present in the master table ( something which is handled with SID’s in SAP BW ). We can use Exists in Table as shown below:
These are some of the common data validations you come across while trying to clean the data. (These rules depend again on the scenario you are working on).
I tried to finish the error handling in the same document but the blogger is not supporting hence keeping the error handling in the below mentioned document.
Thanks for your time. Hope this document helps you 🙂
Krishna Tangudu 🙂