Java Mapping: Unzip file and create additional attachments
Background
I recently had a requirement that dealt with incoming Zip files. Each Zip file contains a few files, and each of them is used differently in subsequent processing, i.e. fileA is processed by interface A, fileB by interface B. To achieve this, it is important to preserve the filename of each file in the Zip file. Additionally, a copy of the original Zip file needs to be saved in the target directory too.
This requirement could not be achieved by using the standard module PayloadZipBean because:
- The main payload is replaced with the contents of the first file in the Zip file
- The filename of the first file that replaces the main payload is no longer available
In this article, I will share the custom Java mapping that was developed to fulfill this requirement. The core logic is to:
- Unzip the main payload and create an additional attachment for each file in the Zip file
- Retain filename of each file
Source code
Below is the portion of the code that handles the unzipping and attachment creation.
ZipInputStream is used to read the compressed data. The logic loops over each ZipEntry to retrieve the content of the corresponding file. The entry's filename is then used to create an additional attachment for the message.
// Unzip input file
ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(content));
ZipEntry ze = null;
// Loop through all entries in the zip file
while ((ze = zis.getNextEntry()) != null) {
    byte[] zipContent = getInputStreamBytes(zis);
    // Create attachment
    String name = ze.getName();
    String ctype = "text/plain;name=\"" + name + "\"";
    if (outAtt != null) {
        Attachment att = outAtt.create(name, ctype, zipContent);
        outAtt.setAttachment(att);
    }
    zis.closeEntry();
}
zis.close();
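The snippet above relies on the PI mapping API (outAtt, Attachment) and on a getInputStreamBytes helper that is not shown here. As a rough, JDK-only illustration of the same loop, the sketch below collects each entry's bytes in a map keyed by filename instead of creating attachments. The class name UnzipSketch and the readAll helper are hypothetical stand-ins for illustration, not code taken from the repository.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class UnzipSketch {

    // Hypothetical stand-in for getInputStreamBytes: reads the stream until
    // it reports end of data. For a ZipInputStream that is the end of the
    // current entry, not the end of the whole archive.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }

    // Returns one map entry per file in the Zip, keyed by the entry name,
    // in the order the entries appear in the archive.
    static Map<String, byte[]> unzip(byte[] content) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(content))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                if (!ze.isDirectory()) {
                    files.put(ze.getName(), readAll(zis));
                }
                zis.closeEntry();
            }
        }
        return files;
    }
}
```

In the actual mapping, each map entry would correspond to one outAtt.create(...) call as shown in the snippet above.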
The full source code can be found in the following public repository on GitHub:
GitHub repository for UnzipAndAttach
Additionally, the JAR file is also available for direct download from GitHub:
com.equalize.xpi.esr.mapping.jar
Testing
Below are some screenshots of a test run using the Java mapping.
Firstly, we have a Zip file in the source folder. This Zip file contains two files within it.
From the message log, we can see that the original payload (logged with MessageLoggerBean) does not contain any attachments.
After the mapping step (AM) is performed, there are now two additional attachments in the message.
The message is then delivered to an SFTP receiver channel (with Store Attachments checked). The channel processes all three files – the main payload and two attachments.
Finally, we can see the three files in the target folder. The name of the first file is retained as is, while the two attachments have filenames that are a concatenation of the main payload filename and the attachment name, i.e. <main_filename>_<attachment_name>.
Reference
For further reference on dealing with Zip files, the following article covers the reverse flow – an incoming payload with multiple attachments is compressed into a single outgoing Zip payload
What should be the source and target structure in this case?
Hi Lalit
As this is a Java mapping, the source and target structures do not matter because no XML parsing is involved. I normally just create structures with a single field of type xsd:string for Java mappings.
Rgds
Eng Swee
Hello, Eng Swee.
First of all, thanks for this post!
I'm trying to use your class UnzipAndAttach as the only step (Java Class Mapping) on the Response in the Operation Mapping for the Extract v1 SAE File (Concur API), but I don't know if I implemented it correctly (I didn't use a dummy message mapping as you suggested)... and now I'm getting this error on the response message mapping:
StreamTransformationException triggered by application mapping program com/equalize/xpi/esr/mapping/java/UnzipAndAttach; Exception: invalid stored block lengths
I was just wondering: does this class work in the way that I've implemented it for this API? Or should I do something different for this purpose?
Thanks in advance!
Hi Eng,
Thanks for this nice blog on ZIP content ...
Since it has been a while I was wondering whether you had any 'experience' in the meanwhile on the sizes of ZIP files ...
What about processing ZIP files of 100 MB and more ? Can PI / PO handle this properly ? I always understood that files of 2MB and more are already considered 'large files' by SAP ( cf. tuning guides, etc ) ...
I am currently looking at a scenario where potentially very large ZIP files need to be deconstructed, manipulated and reconstructed again ... I am very doubtful this can be handled by PI with mappings which need to read the contents as a whole ( like yours ) ... but then again I don't see it working properly 'out of the box' in any other way ...
Steven
Hi Steven
I didn't get around to handling such large ZIP files in PI.
I think the guides haven't been updated for a while and these days the limit is actually higher than 2MB. I just checked below on our production system and it processed an 11MB XML file for a complex interface that involves multi-step mappings (even with an XSLT step!).
However, this is a very subjective matter and it really depends on the sizing and hardware of the system, and the only way to find out the limits is to perform stretch tests on the system.
That said, 100MB does seem pretty large for a ZIP file and it would likely be much larger once decompressed. The challenge would be the manipulation/mapping part, to avoid out-of-memory errors when loading/parsing the file contents.
Any chance for the contents of the ZIP file to be broken into smaller logical units that can be processed independently?
Regards
Eng Swee
Hi Eng,
Thanks for your 'fast response' ...
Yeah I understand that the sizes are probably a bit outdated and depend more on the actual sizing of the hardware involved ... nevertheless - like you agreed - very large ZIP files are still 'cumbersome' to manage and require special attention in terms of mapping etc ... Even a well tuned system might be impacted by processing these ... I have seen huge garbage collection processes kicking off which basically caused the system to 'halt' doing anything else ... it was not 'dead' but it took a while before it started processing anything again ...
For now I think we will resort to OS commands to handle the ZIPPING process itself so we can deal with the individual content - which we require to do anyways as we will have to calculate digest values for each one - and use a BPM or some ABAP hacking to get everything back into place in a folder which we will ZIP back into 1 file ...
Kind Regards,
Steven
Hi Steven
Some further thoughts on this.
i) Have you tested if the performance impact is caused by the unzipping process or the processing of the multiple individual content? If not, maybe you can try just unzipping without further processing using either a Java mapping or the standard PayloadZipBean.
ii) Using BPM might not be a good idea. ccBPM is very performance sensitive due to the persistence of multiple copies of the payload, and although NW BPM is better, it is still not good to load large payloads into the BPM context (thus the reason for the Claim Check enhancements)
Regards
Eng Swee
Thanks for your extra thoughts Eng ...
Yeah I agree on the BPM ... Been using them since 7.1 and indeed lots of copies of payloads etc ... The 'problem' is that we need to collect a huge amount of individual PDF documents ( potentially being ZIPPED to facilitate the 'delivery') ... calculate digest values for all those PDF's, use that information to make a kind of 'index' file including filenames, digests, etc ... make a separate hash file of that index file and then combine all that into a ZIP file which then needs to be SFTP'ed ... You can see that is quite a challenge to get that into 'standard' PI/PO ...
Thanks for the link to the pattern ... very interesting !
Steven
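For the digest part of the scenario described above, the whole entry does not have to be held in memory: each entry can be streamed through a MessageDigest in fixed-size chunks. The following is only a minimal JDK sketch under assumptions. SHA-256 is an assumed algorithm (the thread does not say which digest is required), and the class name ZipDigestSketch and its methods are hypothetical. It does not address the PI/PO message handling itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipDigestSketch {

    // Computes a hex digest per Zip entry while reading the archive as a
    // stream, so only one 8 KB buffer is held in memory at a time.
    static Map<String, String> digestEntries(InputStream zipStream)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> digests = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                if (ze.isDirectory()) {
                    continue; // getNextEntry() closes the current entry itself
                }
                // Assumption: SHA-256; swap in whichever algorithm is required
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                int len;
                while ((len = zis.read(buffer)) != -1) {
                    md.update(buffer, 0, len);
                }
                digests.put(ze.getName(), toHex(md.digest()));
            }
        }
        return digests;
    }

    // Formats the digest bytes as lowercase hexadecimal
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

The resulting name-to-digest map could then feed the 'index' file mentioned above. Whether this helps in practice still depends on how the payload reaches the code, since a mapping that receives the payload as a byte array has already loaded it in full.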
I did a simple test using PayloadZipBean to unzip the following 50+MB Zip file.
http://archive.apache.org/dist/poi/release/src/poi-src-3.10.1-20140818.zip
It managed to unzip 8000+ attachments in about 30 seconds, and this is on a development server.
Thanks Eng ! So that resulted in 1 message in PI with 8000 payload attachments right ?
Kr,
Steven
Yes, it'll be 1 message with 8000+ attachments.
Gosh, yours sounds like a "problem" with a capital P!!! 😯
If you want, I can assist you by providing an adapter module that unzips the file into multiple child messages, by combining the logic here with the logic in the following module
AttachmentSplitterBean - Split attachments into child messages
Let me know if that's something that might be helpful.
However, there is still the challenge of combining them back together. BPM or otherwise, how would you determine when all the individual content have been fully processed in order to perform the final zipping?
Hi Eng,
Thanks for the feedback and your offer on the java adapter module... much appreciated but not yet required ... We will try to achieve this without custom java adapter modules ( not really what my customer wants ).
Yeah it is a complex flow ... We would probably send messages in batches using a 'threshold' ( on BPM or custom coding ) of a max number of messages per batch and/or wait for X minutes ...
It requires a lot of 'creativity' to get this one done but we will manage ... 🙂
Ciao,
Steven
No prob. All the best! 😉
Hi Eng Swee,
I have a requirement that I posted on SCN, but I did not get an exact solution.
Can you help with this: https://answers.sap.com/questions/374708/how-to-extract-xaz-files-in-sap-pi.html ?
Thanks in advance.
Best, Shiva
Hi Eng Swee Yeoh,
I have a requirement to unzip files in the message mapping using a Java mapping and deposit the unzipped files in the target folder via the NFS transport protocol in the receiver communication channel. I've tried the code below. The files were unzipped, but their contents were merged together into one file, and the filename was taken from the last file in the Zip.
I've searched around but have not been able to find a solution. I'm not sure if this is a limitation of SAP PO when using File System (NFS) as the transport protocol in the receiver communication channel. I hope you can help me with this. Thanks.
~~~~~~~~~~~~~~~~~~~~~Code~~~~~~~~~~~~~~~~~~~~~
package company.com.messagemapping;

import com.sap.aii.mapping.api.*;
import java.io.File;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class MM_SanctionendPartyList_DataProvider_to_GTS_Java extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        byte b[] = new byte[8192];
        try {
            java.io.InputStream inputstream = transformationInput.getInputPayload().getInputStream();
            java.io.OutputStream outputstream = transformationOutput.getOutputPayload().getOutputStream();
            //Unzip and write required file to output.
            ZipInputStream zis = new ZipInputStream(inputstream);
            ZipEntry ze = zis.getNextEntry();
            while (ze != null) {
                java.util.Map mapParameters = transformationInput.getInputHeader().getAll();
                mapParameters.put(DynamicConfigurationKey.create("http://sap.com/xi/XI/Dynamic", StreamTransformationConstants.DYNAMIC_CONFIGURATION), "");
                DynamicConfiguration conf = (DynamicConfiguration) mapParameters.get(StreamTransformationConstants.DYNAMIC_CONFIGURATION);
                //Set target File Name and Folder.
                conf.put(DynamicConfigurationKey.create("http://sap.com/xi/XI/System/File", "FileName"), ze.getName());
                conf.put(DynamicConfigurationKey.create("http://sap.com/xi/XI/System/File", "Directory"), "/DEV/WRICEF/Inbound/SanctionedPartyList/");
                int len = 0;
                while ((len = zis.read(b, 0, b.length)) != -1) {
                    outputstream.write(b, 0, len);
                }
                outputstream.flush();
                zis.closeEntry();
                ze = zis.getNextEntry();
            }
            zis.closeEntry();
            zis.close();
        } catch (Exception exception) {
            getTrace().addDebugMessage(exception.getMessage());
            throw new StreamTransformationException(exception.toString());
        }
    }
}
Regards,
Shawn