
Business Requirement:

I have a requirement where source files (Excel workbooks whose names carry a date) are placed in a global folder, and we have to extract the data from the workbooks in sequence. The workbooks arrive at irregular intervals: for example, the first file may be placed today, the next one after 4 days, the next after another 3 days, and so on.

The following points have to be considered during development:

1) There will be no job scheduling [only manual execution]

2) The data extraction should be done in sequential manner.

3) We have to make sure that each Excel workbook is extracted only once.

4) Once the data has been extracted from a file, the same file should not be extracted again by the next job run.

5) All the excel workbooks will be placed in the same folder and will not be removed from the folder after execution.

6) If the job fails, then on re-execution the appropriate unprocessed file should be processed first, followed by the subsequent files.

Design


I have used 2 workflows (WF), 2 dataflows (DF), 2 script files (FileNameExtract in the 1st WF and EndScript in the 2nd WF) and a few global variables.

First Workflow/Dataflow: Collects all the filenames (prefixed with a date) from the global folder and stores them in the FIRSTTABLEDATA table.

Second Workflow/Dataflow: Extracts the data from the input source file. The correct file name to extract comes from the 1st WF [the file-name identification logic is implemented in the FileNameExtract script file].

Below are the control tables used to track the files.

Tables.png

Table2.png
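The screenshots above show the two control tables. As a rough sketch only (column names are taken from the scripts later in this post; the data types and lengths are my assumptions, Oracle-style since the scripts use DUAL and SYSDATE), they could be created as:

```sql
-- Staging table filled by the first dataflow with every filename found in the folder
CREATE TABLE FIRSTTABLEDATA (
    DI_FILENAME VARCHAR2(255)
);

-- Control table that tracks the processing state of each file
CREATE TABLE FileInterTable (
    FILENAME VARCHAR2(255),
    STATUS   VARCHAR2(20),  -- 'INPROCESS' or 'DONE'
    RFLAG    VARCHAR2(1),   -- 'Y' once the file has been extracted
    RUPDATE  VARCHAR2(1),   -- 'Y' when the file was seen again in a later run
    RDATE    VARCHAR2(20)   -- date the record was created
);
```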

Below are the workflows used.

WorkFlows.png

FileNamesGet_WF

FilenameGetWF.png

First Dataflow   FileNamesExtract_DF

This dataflow collects all the file names from the global (source) folder. After dragging and dropping the source file into the designer work area, enable the "Include file name column" option on the input source and give the column a name; that column holds the source file name, so every file name gets captured in the FIRSTTABLEDATA table.
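To verify the capture step after running this dataflow, a simple query (a sketch; table and column names are the ones used in the scripts in this post) lists what was collected:

```sql
-- Sanity check: one row per source file found in the global folder,
-- in the order the FileNameExtract script will process them
SELECT DISTINCT di_filename
FROM FIRSTTABLEDATA
ORDER BY di_filename;
```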

1stDF.png

Filenamecolumn.png

Second Workflow DataExtract_WF

Second Workflow DataExtract_WF.png


Second Data Flow  ExtractingData_DF

The file name assigned in the 1st WF is used as the input file name for the source in the DF below, and the data is inserted into the target table. Checking for redundant file names, assigning the right file name, and sequencing the file names are all implemented in the FileNameExtract script file.

Once this DF has executed successfully, the STATUS of the file in FileInterTable is set to "DONE". This is done in the EndScript file.

Second Data Flow  ExtractingData_DF.png

Please find the script files below.

FileNameExtract

#################################################################################################
# Get the distinct file count from FIRSTTABLEDATA
#################################################################################################

$GMassDataDistinctCount = sql('MDataStore', 'SELECT COUNT(DISTINCT t1.di_filename) FROM FIRSTTABLEDATA t1');

print('IN FILENAME EXTRACT SCRIPT');
print($GMassDataDistinctCount);

while ($GMassDataDistinctCount > 0)
begin
    print('IN WHILE');

    # Get the next (smallest) filename from FIRSTTABLEDATA
    $GJobName = sql('MDataStore', 'select distinct t1.di_filename from FIRSTTABLEDATA t1 where t1.di_filename = (select min(t2.di_filename) from FIRSTTABLEDATA t2)');

    $GJUST = to_char(sysdate(), 'dd-mon-yy');

    #################################################################################################
    # If the filename already exists in FileInterTable, flag it via the RUPDATE column;
    # if it does not exist, insert a new record with status INPROCESS.
    #################################################################################################

    sql('MDataStore', 'MERGE INTO FileInterTable USING dual ON ("FILENAME" = {$GJobName}) WHEN MATCHED THEN UPDATE SET "RUPDATE" = \'Y\' WHEN NOT MATCHED THEN INSERT ("FILENAME", "STATUS", "RFLAG", "RUPDATE", "RDATE") VALUES ({$GJobName}, \'INPROCESS\', \'N\', \'N\', {$GJUST})');

    sql('MDataStore', 'delete from FIRSTTABLEDATA where di_filename = {$GJobName}');

    $GMassDataDistinctCount = $GMassDataDistinctCount - 1;

    print('LAST WHILE');
    print($GMassDataDistinctCount);
end

# Pick the oldest unprocessed file: STATUS = 'INPROCESS' and RFLAG = 'N'
$GJUST = 'N';
$GJobName = sql('MDataStore', 'select distinct t1.FILENAME from FileInterTable t1 where t1.FILENAME = (select min(t2.FILENAME) from FileInterTable t2 where STATUS = \'INPROCESS\' and RFLAG = {$GJUST})');

print($GJobName);

# Assign the file name; it will be used in the Excel metadata in the second workflow/dataflow.
$GToBeExecutedFileName = $GJobName;

print($GToBeExecutedFileName);
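One edge case worth guarding against: when every file in FileInterTable is already DONE, the subquery above returns NULL and the second dataflow would run with an empty filename. A possible guard at the end of the script (my sketch, not part of the original design):

```
# Optional guard (assumption, not in the original script): stop the job
# cleanly when there is no unprocessed file left.
if ($GToBeExecutedFileName IS NULL)
begin
    print('No unprocessed file found - nothing to extract.');
    raise_exception('No pending files; stopping before the extract dataflow.');
end
```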

///////////////////////////////////////////////////////////////////////////////////////////////////

EndScript

print('In EndScript: successfully executed the job');
print($GToBeExecutedFileName);

###########################################################################################################
# Update the control table after successfully extracting the data from the file
###########################################################################################################

sql('MDataStore', 'update FileInterTable set STATUS = \'DONE\', RFLAG = \'Y\' where FILENAME = {$GToBeExecutedFileName}');

print($GToBeExecutedFileName);
print('Record updated in EndScript successfully');
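To confirm requirement 6 (a re-run picks up the oldest unprocessed file first), the control table can be inspected after each run; a sketch:

```sql
-- Files still waiting: these are picked up first on the next manual run,
-- smallest filename first
SELECT FILENAME, STATUS, RFLAG
FROM FileInterTable
WHERE STATUS = 'INPROCESS' AND RFLAG = 'N'
ORDER BY FILENAME;
```

Note that MIN(FILENAME) returns the chronologically oldest file only if the date prefix in the filename sorts lexicographically (e.g. yyyymmdd); a dd-mon-yy style prefix would not sort in date order.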
