Skip to Content

Remove and report invalid characters from input data

Scenario:

Identify and remove all the characters in the input data apart from a defined, valid set of characters. The data fields identified with the invalid characters need to be cleansed by replacing with blank spaces. Also, the invalid data fields are to be loaded to an error report file along with the key to identify the erroneous record in the input data (primary key) and the respective column name. The error file needs to be loaded in such a way that, for every invalid column there needs to be a record in the file pointing the same.

Implementation:

The functionality is achieved using a custom function. A query transform is used exclusively for the function implementation.
Input parameters: Key to identify the input record, Column Name of the data to be cleansed, data to be cleansed
Output parameters: Cleansed data.

Process:

1. Loop through the column data
2. Check to see if characters in the input data apart from a defined, valid set of characters
3. If  found in the valid character set,
    3.1 Return the data
4. Else
    4.1 Insert the record into the error file
    4.2 Cleanse the data by replacing with blank space
    4.3 Return the cleansed data
5. Loop back and continue the process until all characters have been cleansed.

Code:

# CF_DATA_CLEANSE

#
# Error file Details
# ———————–
# $$SP_ERR_FILE_PATH : Error File Path in substitution variable
# ERR_FILE.TXT               : Error File Name

#

# Input parameters
# ————————
# $PV_KeyValue   : Primary key
# $PV_FieldName : Field Name
# $PV_FieldVal     : Field Value

#

# Local Variables
# ———————-
# $LV_FieldVal_Trim  : Trimmed input col val
# $LI_StrLen              : Length of the trimmed input col val
# $LI_Cntr                 : Char position identifier of the input col to do iteration
# $LI_Chr                  : Current char being checked
# $LI_Chr_Ascii         : Ascii value of current char

# Initialize and define variables
# —————————————–
$LV_FieldVal_Trim   = RTRIM_BLANKS( $PV_FieldVal );
$LI_StrLen               = LENGTH( LV_FieldVal_Trim );
$PV_FieldName       = RTRIM_BLANKS($PV_FieldName);

$LI_Cntr             = 1;
$LI_Chr              = ”;
$LI_Chr_Ascii     = 0;

# Loop through the char set of the input column to check for invalid characters
# —————————————————————————–

while ( $LI_Cntr < $LI_StrLen +1)
begin

       # Iterate through every character of the input field and find its ascii value
       # —————————————————————————
       $LI_Chr = SUBSTR(LV_FieldVal_Trim,$LI_Cntr,1);
       $LI_Chr_Ascii = ASCII( $LI_Chr );

       # If the char is in valid char list do nothing and check the next char
       # ——————————————————————–
       if ($LI_Chr_Ascii >= 32 and $LI_Chr_Ascii <=126 )
       begin
              #Do Nothing
       end

       # If the char is not in valid char list cleanse the data and make and entry in the error  
       # file with the error field details
       # ————————————————————————————–

       Else
       begin
              # Replace invalid chars with blanks
              # ———————————
              LV_FieldVal_Trim = REPLACE_SUBSTR(LV_FieldVal_Trim,$LI_Chr,’ ‘);

              # Insert Error record to the error file
              # the error file should be cleared at the begin of the job
              # ——————————————————–
              EXEC(‘cmd’,’ECHO KEY VALUE:’||RPAD($PV_KeyValue,10,’ ‘)||’ FIELD NAME:’||RPAD($PV_FieldName,30,’ ‘)||’ ERR:INVALID CHR VAL:’||RPAD($PV_FieldVal,60,’ ‘)||’ >> \[$$SP_ERR_FILE_PATH]ERR_FILE.TXT’);
       end

       $LI_Cntr = $LI_Cntr + 1;
end

# Return processed field val
# ————————–
Return LV_FieldVal_Trim;

Thanks & Reg

Feddy George

To report this post you need to login first.

Be the first to leave a comment

You must be Logged on to comment or reply to a post.

Leave a Reply