Serializing File Transfers with the SFTP Adapter in CPI Using General Splitter and Poll Enrich
This blog describes a scenario in which SAP generates several extract files daily. These files must be transferred to an external SFTP location to feed an IBP (non-SAP) product. Because of limitations in how the product loads the extract files, we needed to generate a zero-byte extract.done file to signal that all files have been completely transferred to the target location.
If you have used the SFTP sender and receiver adapters with middleware (PI or CPI) before, you will know that they cannot guarantee the order of file transfers, for three reasons:
- Processing is asynchronous.
- The message created for each file is unrelated to the others.
- No event is raised once a transfer completes.

Even if the source system writes the zero-byte file last, there is no guarantee it will be transferred last. This holds whether the file is produced by a series of SAP background jobs or by the same source system that generates the other feed files.
As the Integration Architect responsible for a CPI solution to this requirement, I used a configuration file together with General Splitters and Poll Enrich. The configuration file lists which files can be transferred in parallel and which must be serialized (i.e., the extract.done file).
Here is my configuration file structure. <file> nodes under the same <parallel> node can be processed in parallel. <file> nodes under a later <parallel> node are processed only after all files of the previous <parallel> node have been transferred. I am showing only two files here as an example, but the approach works for any number of files.
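Since the structure itself is not reproduced in the text, the following Python sketch (not CPI code) shows a plausible config.xml. It also reads that file into ordered groups the same way the two splitters described below do. The root element name `<config>` is a hypothetical assumption. The `<parallel>` and `<file>` nodes and the file names come from this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical config.xml: the <config> root name is an assumption;
# the <parallel>/<file> nodes and file names are taken from the post.
CONFIG_XML = """<config>
  <parallel>
    <file>KIN_Parts.txt</file>
    <file>KIN_Part_UOM.txt</file>
  </parallel>
  <parallel>
    <file>extract.done</file>
  </parallel>
</config>"""


def load_groups(xml_text: str) -> list[list[str]]:
    """Return one list of file names per <parallel> node, in document order."""
    root = ET.fromstring(xml_text)
    # ".//parallel" and ".//file" are ElementTree's equivalents of //parallel and //file
    return [
        [(f.text or "").strip() for f in group.findall(".//file")]
        for group in root.findall(".//parallel")
    ]


print(load_groups(CONFIG_XML))
# [['KIN_Parts.txt', 'KIN_Part_UOM.txt'], ['extract.done']]
```

Each inner list corresponds to one split message of the first splitter. Each name within it corresponds to one split message of the second splitter.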
The process is initiated by an SFTP sender adapter that reads the configuration file on a schedule. In its processing settings, the Read Lock Strategy is set to Done File Expected. This ensures that the CPI process starts only after SAP has finished placing all the files for the IBP feed.
I use a General Splitter to first split the message by <parallel> node with the XPath expression ‘//parallel’.
A second General Splitter then splits each message produced by the first splitter using the XPath expression ‘//file’. Parallel processing is enabled on this splitter so that the files within a group are transferred in parallel.
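To make the ordering guarantee concrete, here is a minimal Python simulation of the two splitters. It is not CPI code.
- The outer loop stands in for the first splitter, which processes <parallel> groups one at a time.
- A thread pool stands in for the second splitter with parallel processing enabled.
- `transfer` is a hypothetical placeholder for the local process call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def transfer(file_name: str, log: list[str]) -> None:
    """Hypothetical stand-in for the local process call (Poll Enrich + SFTP receiver)."""
    log.append(f"start {file_name}")
    time.sleep(0.01)  # simulate transfer time
    log.append(f"end {file_name}")


def run(groups: list[list[str]]) -> list[str]:
    log: list[str] = []
    # Outer loop = first splitter (//parallel): groups are processed sequentially.
    for group in groups:
        if not group:
            continue  # nothing to transfer in this group
        # Inner pool = second splitter (//file) with parallel processing enabled.
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # list() waits for every transfer in the group and re-raises any error,
            # so the next group cannot start before this one has finished.
            list(pool.map(lambda name: transfer(name, log), group))
    return log


print(run([["KIN_Parts.txt", "KIN_Part_UOM.txt"], ["extract.done"]])[-1])
# end extract.done
```

The two large files may start and finish in any order relative to each other. extract.done always starts after both have ended, which is the behaviour the runtime timestamps later in this post confirm.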
I use a Content Modifier to extract the file name from the config file into a property. The SFTP sender adapter of the Poll Enrich step in the local process call uses this property later.
Here is what my local process call looks like:
Poll Enrich is configured in Replace mode because each file must be transferred without modification.
In the SFTP adapter configuration, I use the filename property captured in the main process flow to read the specific file from the list in config.xml. The read lock strategy here is ‘None’. A lock is unnecessary because the main process is started only via a done file in the sender SFTP adapter, which already guarantees that all files are completely available in the source location.
Next, a Router checks the standard property SAP_PollEnrichMessageFound to catch any file listed in config.xml that is not available in the source location. If this property is ‘false’, the file is not written, so no empty file is created in the target.
I use a Groovy script to set the body to null for the extract.done file before placing it at the target location. Because this file serves as the trigger for my main process, it is not available to be read by Poll Enrich. If I didn't explicitly clear the body, the config.xml content would be propagated into the extract.done file created at the target location.
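The local process call logic can be sketched as a small Python simulation. It is not the author's Groovy script or CPI's API. The sketch makes two assumptions:
- `Message` is a hypothetical stand-in for a CPI message.
- The Router sends extract.done down its own branch, selected by file name. This detail is assumed, because the post does not show the routing conditions.

When a file is not found, the sketch leaves the body unchanged. That matches the observation above that config.xml content would otherwise end up in extract.done.

```python
from dataclasses import dataclass, field

DONE_FILE = "extract.done"


@dataclass
class Message:
    """Hypothetical stand-in for a CPI message: a body plus exchange properties."""
    body: bytes
    properties: dict[str, str] = field(default_factory=dict)


def poll_enrich(msg: Message, source: dict[str, bytes]) -> Message:
    """Replace mode: if the file exists, its content replaces the current body."""
    name = msg.properties["filename"]
    found = name in source
    if found:
        msg.body = source[name]
    # In CPI this property is set by Poll Enrich itself; it is simulated here.
    msg.properties["SAP_PollEnrichMessageFound"] = "true" if found else "false"
    return msg


def route(msg: Message, target: dict[str, bytes]) -> None:
    """Router + Groovy step + SFTP receiver, simulated as writes into `target`."""
    name = msg.properties["filename"]
    if name == DONE_FILE:
        # Assumed branch: clear the body (the Groovy step) so that the
        # config.xml content is not written into extract.done.
        target[name] = b""
    elif msg.properties["SAP_PollEnrichMessageFound"] == "true":
        target[name] = msg.body
    # Otherwise the file is missing in the source: nothing is written.


source = {"KIN_Parts.txt": b"parts", "KIN_Part_UOM.txt": b"uom"}
target: dict[str, bytes] = {}
for name in ["KIN_Parts.txt", "KIN_Part_UOM.txt", "missing.txt", DONE_FILE]:
    route(poll_enrich(Message(b"<config/>", {"filename": name}), source), target)
print(sorted(target))
# ['KIN_Part_UOM.txt', 'KIN_Parts.txt', 'extract.done']
```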
The receiver SFTP adapter configuration is straightforward. The only point to note is that the Camel file properties (e.g., CamelFileName) are not used in the channel. They still contain the file name picked up by the SFTP sender in the main process (i.e., config.xml).
Let’s look at the runtime execution to see how the message is split and processed.
The first splitter created two split messages for the path //parallel.
The second splitter then created two split messages for the first split message of the first splitter.
The second splitter also created one split message for the second split message of the first splitter.
As the timestamps show, the larger files are written before the zero-byte file is transferred to the target location.
The message above took about 48 seconds to process. It transferred KIN_Parts.txt (~100 MB) and KIN_Part_UOM.txt (~70 MB) in parallel, and only then the extract.done file.
Hope this helps! Please share your feedback and comments.
Thanks for your blog.
We have a similar requirement. More than 100 files of types *.csv, *.tab and *.success (each file name is unique) need to be sent in sequence. I tried the approach from this blog using * in the file name, but it picks up only one file of each type. We need to send all 100+ .csv files first, then the .tab files, then the .success files, and finally extract.done.
Could you please help if you have any ideas?
Are your file names dynamic? If not, specify the individual file names in the config.xml file and try again.
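Following the reply above, one way to avoid maintaining 100+ names by hand is to generate config.xml from a file listing before the flow runs. This Python sketch is an illustration under assumptions, not part of the original solution:
- It builds one <parallel> group per wildcard pattern in the required order (.csv, then .tab, then .success), then a final group for extract.done.
- The `<config>` root name and the `build_config` helper are hypothetical.
- How the listing is obtained and where the script runs are left open.

```python
import fnmatch
import xml.etree.ElementTree as ET


def build_config(file_names: list[str], patterns: list[str],
                 done_file: str = "extract.done") -> str:
    """One <parallel> group per pattern (in order), then one group for the done file."""
    root = ET.Element("config")  # hypothetical root element name
    for pattern in patterns:
        # fnmatchcase: case-sensitive matching, independent of the OS
        matches = sorted(n for n in file_names if fnmatch.fnmatchcase(n, pattern))
        if not matches:
            continue  # skip patterns with no files rather than emit an empty group
        group = ET.SubElement(root, "parallel")
        for name in matches:
            ET.SubElement(group, "file").text = name
    # The done file goes in its own, final group so it is serialized last.
    ET.SubElement(ET.SubElement(root, "parallel"), "file").text = done_file
    return ET.tostring(root, encoding="unicode")


print(build_config(["b.csv", "a.csv", "x.tab", "y.success"], ["*.csv", "*.tab", "*.success"]))
```

Because each pattern becomes its own <parallel> group, all .csv files can still be transferred in parallel with each other. The .tab group starts only after every .csv file is done.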