Skip to Content
Technical Articles

How to extract data from files with max date and time in a day

Introduction:

Hi this scenario will help how to extract multiple flat files with max date and time in a day

EX: Different countries’ files are available in our file path. We need to extract latest file from each country.

STEP1:

The input files for each country are like below

STEP2:

Create the flat file format to identify all the files which are present in the given file path. This file format includes all the files with extension – *.txt

STEP3:

After creating the flat file. Open flat-file and enable the below option include file name column -YES

Which gives all the file names available in the file path provided in the file format.

STEP4:

Our job flow is like below.

STEP5:

First Dataflow: DF_EXTRACT_DIFF_FILES

This Data flow is developed to load all the latest files from each country based on the file names provided in the file path.

QRY_DISTINCT: Retrieve the only distinct records from the source table. Enable distinct option in query transformation

QRY_SPLIT_DI_FILENAME:  Split the file name which is the output of include file name option.

 EX file name: INDIA_20191106_1401.

Filename includes country name, date and time with “_” as a separator. To find out the latest file of each country we need identify the latest time stamp which is available in the file names.

For this we need to split file names into below three columns data:

  • NAME -mapping country name from file name by using expression

word_ext (QRY_DISTINCT.DI_FILENAME,1,’_’)

  • DATE – mapping Date column from file name by using expression

word_ext (QRY_DISTINCT.DI_FILENAME,2,’_’)

  • TIME – mapping Timestamp from file name by using expression

word_ext (QRY_DISTINCT.DI_FILENAME,3,’_’)

Below is the screen shot for splitting the file names into three columns as mentioned above

STEP6:

QRY_MAX: Identifying the maximum time stamp for each countries files by using group by clause with aggregate function MAX().

 

STEP7:

QRY_CONCATE_FILE :  After fetching maximum time_stamp data we need to concatenate all the three column which were split into column ‘FILE_NAME’ to get entire latest filename to process and also we created another column SNO mapped to gen_row_num() which will be used to load the files sequentially.

STEP8:

TGT_LATEST_DI_FILE: output of the entire flow: all countries latest file names with sequence number

STEP9:

SCR_COUNT: script contains below code to get total count of latest file names. Which is assigned to global variable : $G_COUNT from target table: TGT_LATEST_DI_FILE and also we create one more global variable: $G_LOOP assigned to value 1 which will be used to start the sequence

$G_COUNT = sql(‘Data_store_name’,’SELECT count(*) FROM TGT_LATEST_DI_FILE’);

print(‘SOURCE COUNT’ ||$G_COUNT);

$G_LOOP = 1;

STEP10:

WHILE_LOOP

The below While loop is used to process the latest file of each country until all the latest files data get loaded of each country so for this we are using the while condition $G_LOOP<=$G_COUNT(1<=Total files count)

STEP11:

SCR_DIS_FILE_NAME : script is used to get the latest file name based on the sequence number from the table which contains all the latest file details. The output file name is assigned to global variable $G_FILE_NAME which is further used in the file format file name option to process the file.

$G_FILE_NAME = SQL(‘Data_store_name’,’SELECT FILE_NAME FROM TGT_LATEST_DI_FILE

WHERE SNO = {$G_LOOP}’);

PRINT(‘SOURCE FILE IS ‘||$G_FILE_NAME);

STEP12:

DF_FILE_DATA:

The below mapping is developed to load the latest file data to the target table: TGT_MAX_FILES  which was assigned to $G_FILE_NAME In the above script

SCR_LOOP: this script is used to increase the sequence number (+1) for every file load

This is used for adding the loop conditions when loop condition satisfy

$G_LOOP = $G_LOOP+1;

print( ‘loop :’|| $G_LOOP);

 

conclusion :

This scenario will help to extract the latest file of each country in a day while we are running the job multiple times in a day, it will help to fetch the latest file based on the max date and time which are available in the file name.

 

Be the first to leave a comment
You must be Logged on to comment or reply to a post.