
Background

I had a requirement recently which dealt with incoming Zip files. Basically, the Zip file contains a few files, each of which is used differently in subsequent processing, i.e. fileA is processed by interface A, fileB by interface B. To achieve this, it is important to preserve the filename of each file in the Zip file. Additionally, a copy of the original Zip file needs to be saved in the target directory too.

This requirement could not be achieved by using the standard module PayloadZipBean because:

  • The main payload is replaced with the contents of the first file in the Zip file
  • The filename of the first file that replaces the main payload is no longer available

In this article, I will share the custom Java mapping that was developed to fulfill this requirement. The core logic is to:

  • Unzip main payload and create additional attachments for each file in Zip file
  • Retain filename of each file
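Outside the PI mapping API, the core of these two steps can be sketched with the JDK's own java.util.zip classes (the class and method names below are illustrative, not taken from the actual mapping):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipExploder {
    // Unzips the payload and returns each entry's content keyed by its
    // original filename, preserving the order of entries in the Zip file
    public static Map<String, byte[]> explode(byte[] zipBytes) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                byte[] buf = new byte[8192];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zis.read(buf)) != -1) {
                    baos.write(buf, 0, n);
                }
                files.put(ze.getName(), baos.toByteArray()); // filename retained
                zis.closeEntry();
            }
        }
        return files;
    }
}
```

In the actual mapping, each map entry would then become one attachment on the output message, named after the zip entry.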

Source code

Below is a snippet of the code that handles the unzipping and attachment creation.

ZipInputStream is used to read the compressed data. The logic loops through the ZipEntry items to retrieve the content of each file. Each entry's filename is then used to create an additional attachment on the message.


    // Unzip the input file
    ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(content));
    ZipEntry ze = null;
    // Loop through all entries in the zip file
    while ((ze = zis.getNextEntry()) != null) {
        byte[] zipContent = getInputStreamBytes(zis);
        // Create an attachment named after the zip entry
        String name = ze.getName();
        String ctype = "text/plain;name=\"" + name + "\"";
        if (outAtt != null) {
            Attachment att = outAtt.create(name, ctype, zipContent);
            outAtt.setAttachment(att);
        }
        zis.closeEntry();
    }
    zis.close();
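The helper getInputStreamBytes(...) is defined in the full source on GitHub; a minimal version along these lines would read the current zip entry fully into a byte array (this sketch is my reconstruction, not the repository's exact code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamUtil {
    // Reads the stream until EOF into a byte array. When the stream is a
    // ZipInputStream, read() returns -1 at the end of the *current entry*,
    // so this yields exactly one file's content per call.
    public static byte[] getInputStreamBytes(InputStream in) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            baos.write(buf, 0, n);
        }
        return baos.toByteArray();
    }
}
```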

The full source code can be found in the following public repository on GitHub:

GitHub repository for UnzipAndAttach

Additionally, the JAR file is also available for direct download from GitHub:

com.equalize.xpi.esr.mapping.jar

Testing

Below are some screenshots of a test run using the Java mapping.

Firstly, we have a Zip file in the source folder. This Zip file contains two files within it.

/wp-content/uploads/2015/01/infile_622834.png

From the message log, we can see that the original payload (logged with MessageLoggerBean) does not contain any attachments.

/wp-content/uploads/2015/01/ori_622838.png

After the mapping step (AM) is performed, there are now two additional attachments in the message.

/wp-content/uploads/2015/01/mapped_622839.png

The message is then delivered to an SFTP receiver channel (with Store Attachments checked). The channel processes all three files – the main payload and the two attachments.

/wp-content/uploads/2015/01/log_622840.png

Finally, we can see the three files in the target folder. The name of the first file is retained as is, while the two attachments have filenames that are a concatenation of the main payload filename and the attachment name, i.e. <main_filename>_<attachment_name>.

/wp-content/uploads/2015/01/output_622841.png

Reference

For further reference on dealing with Zip files, the following article covers the reverse flow – an incoming payload with multiple attachments is compressed into a single outgoing Zip payload:

Attachments zipping during Mapping


12 Comments


    1. Eng Swee Yeoh Post author

      Hi Lalit

      As this is a Java mapping, the source and target structures do not matter because no XML parsing is involved. I normally just create structures with a single field of type xsd:string for Java mappings.

      Rgds

      Eng Swee

  1. Steven De Saeger

    Hi Eng,

    Thanks for this nice blog on ZIP content …

    Since it has been a while I was wondering whether you had any ‘experience’ in the meanwhile on the sizes of ZIP files …

    What about processing ZIP files of 100 MB and more?  Can PI / PO handle this properly?  I always understood that files of 2MB and more are already considered ‘large files’ by SAP ( cf. tuning guides, etc ) …

    I am currently looking at a scenario where potentially very large ZIP files need to be deconstructed, manipulated and reconstructed again … I am very doubtful this can be handled by PI with mappings that need to read the whole contents at once ( like yours ) … but then again I don’t see it working properly ‘out of the box’ in any other way …

    Steven

    1. Eng Swee Yeoh Post author

      Hi Steven

      I didn’t get around to handling such large ZIP files in PI.

      I think the guides haven’t been updated for a while and these days the limit is actually higher than 2MB. I just checked below on our production system and it processed an 11MB XML file for a complex interface that involves multi-step mappings (even with an XSLT step!).

      /wp-content/uploads/2016/03/size_898259.png

      However, this is a very subjective matter and it really depends on the sizing and hardware of the system, and the only way to find out the limits is to perform stretch tests on the system.

      That said, 100MB does seem pretty large for a ZIP file and it would likely be much larger once decompressed. The challenge would be the manipulation/mapping part to avoid out-of-memory errors when loading/parsing the file contents.
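One way to keep the memory footprint flat during the unzipping step itself (a sketch of the general technique, not code from the blog's mapping) is to stream each entry to disk through a small fixed buffer instead of materialising it as a byte array:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class StreamingUnzip {
    // Extracts every entry to targetDir using one reusable 8KB buffer,
    // so memory use does not grow with the size of the zip contents
    public static void extractTo(InputStream zipStream, Path targetDir) throws IOException {
        Path base = targetDir.normalize();
        try (ZipInputStream zis = new ZipInputStream(zipStream)) {
            byte[] buf = new byte[8192];
            ZipEntry ze;
            while ((ze = zis.getNextEntry()) != null) {
                Path out = base.resolve(ze.getName()).normalize();
                if (!out.startsWith(base)) { // guard against "zip slip" entry names
                    throw new IOException("Illegal entry name: " + ze.getName());
                }
                if (ze.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    try (OutputStream os = Files.newOutputStream(out)) {
                        int n;
                        while ((n = zis.read(buf)) != -1) {
                            os.write(buf, 0, n); // copy one chunk at a time
                        }
                    }
                }
                zis.closeEntry();
            }
        }
    }
}
```

The manipulation step would then work file by file on disk, which trades I/O for heap space.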

      Any chance for the contents of the ZIP file to be broken into smaller logical units that can be processed independently?

      Regards

      Eng Swee

      1. Steven De Saeger

        Hi Eng,

        Thanks for your ‘fast response’ …

        Yeah I understand that the sizes are probably a bit outdated and depend more on the actual sizing of the hardware involved … nevertheless – like you agreed – very large ZIP files are still ‘cumbersome’ to manage and require special attention in terms of mapping etc … Even a well tuned system might take a hit processing these … I have seen huge garbage collection processes kicking off which basically caused the system to ‘halt’ doing anything else … it was not ‘dead’ but it took a while before it started processing anything again …

        For now I think we will resort to OS commands to handle the ZIPPING process itself so we can deal with the individual content – which we need to do anyway as we will have to calculate digest values for each one – and use a BPM or some ABAP hacking to get everything back into place in a folder which we will ZIP back into 1 file …

        Kind Regards,

        Steven

        1. Eng Swee Yeoh Post author

          Hi Steven

          Some further thoughts on this.

          i) Have you tested whether the performance impact is caused by the unzipping process or by the processing of the multiple individual files? If not, maybe you can try just unzipping without further processing, using either a Java mapping or the standard PayloadZipBean.

          ii) Using BPM might not be a good idea. ccBPM is very performance sensitive due to the persistence of multiple copies of the payload, and although NW BPM is better, it is still not good to load large payloads into the BPM context (thus the reason for the Claim Check enhancements).

          Regards

          Eng Swee

          1. Steven De Saeger

            Thanks for your extra thoughts Eng …

            Yeah I agree on the BPM … Been using them since 7.1 and indeed lots of copies of payloads etc … The ‘problem’ is that we need to collect a huge amount of individual PDF documents ( potentially being ZIPPED to facilitate the ‘delivery’ ) … calculate digest values for all those PDFs, use that information to make a kind of ‘index’ file including filenames, digests, etc … make a separate hash file of that index file and then combine all that into a ZIP file which then needs to be SFTP’ed … You can see that is quite a challenge to get into ‘standard’ PI/PO …

            Thanks for the link to the pattern … very interesting !

            Steven

            1. Eng Swee Yeoh Post author

              Yes, it’ll be 1 message with 8000+ attachments.

              Gosh, yours sounds like a “problem” with a capital P!!! 😯

              If you want, I can assist you by providing an adapter module that unzips the file into multiple child messages, by combining the logic here with the logic in the following module

              AttachmentSplitterBean – Split attachments into child messages

              Let me know if that’s something that might be helpful.

              However, there is still the challenge of combining them back together. BPM or otherwise, how would you determine when all the individual content has been fully processed in order to perform the final zipping?

              1. Steven De Saeger

                Hi Eng,

                Thanks for the feedback and your offer on the java adapter module…  much appreciated but not yet required … We will try to achieve this without custom java adapter modules ( not really what my customer wants ).

                Yeah it is a complex flow … We would probably send messages in batches using a ‘threshold’ ( in BPM or custom coding ) of a max number of messages per batch and/or waiting for X minutes … 

                It requires a lot of ‘creativity’ to get this one done but we will manage … 🙂

                Ciao,

                Steven

