
Handling text files in Groovy scripts in CPI (SAP Cloud Platform Integration)

Introduction:

Handling huge text files (either CSV or fixed-length) is a challenge in CPI (SAP Cloud Platform Integration).

Before converting them to the XML required for mapping, we usually read them in Groovy scripts and often manipulate the data as well. Most often this is done by converting the whole payload to a String, which is very memory intensive.

In this blog post, I will show alternative ways to handle them: not only how to read large files, but also how to manipulate their content.

I hope you enjoy reading it.

Main Section:

In CPI (SAP Cloud Platform Integration), we sometimes come across scenarios where we need to process an input CSV file or another character-delimited text file.

These files are often huge compared to the data we receive in XML or JSON format.

This data, which can be delimited by ",", tab, or "|", or be of fixed length, adds complexity: it first has to be read, sorted, and converted to XML (for mapping to a target structure) before it can finally be processed. Sometimes we also have to check the number of fields beforehand to decide whether a line in the file is worth processing at all, so that unnecessary data does not flow further.
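To make the fixed-length case and the "worth processing" check concrete, here is a minimal sketch in plain Java (a Groovy script can call the same code or port it almost line for line). The column widths are a hypothetical layout chosen to mirror the sample record shown next; a real interface would define its own layout.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: split a fixed-length record into fields using a (hypothetical) column layout,
// rejecting lines that are too short to contain every field.
class FixedLengthRecord {

    /** Returns the trimmed fields, or null if the line is shorter than the layout requires. */
    static List<String> slice(String line, int[] widths) {
        int required = 0;
        for (int w : widths) {
            required += w;
        }
        if (line.length() < required) {
            return null; // not worth processing: the record is incomplete
        }
        List<String> fields = new ArrayList<>();
        int pos = 0;
        for (int w : widths) {
            fields.add(line.substring(pos, pos + w).trim());
            pos += w;
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical layout: type(1) id(5) flag(2) indicator(1) date(8)
        int[] layout = {1, 5, 2, 1, 8};
        System.out.println(slice("A12234NOC20190711", layout)); // [A, 12234, NO, C, 20190711]
        System.out.println(slice("A12234NO", layout));          // null
    }
}
```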

For example, the file input.csv:

A,12234,NO,C,20190711,……

A,26579,NO,D,20190701,…….

……………………………………………..

……………………………………………..

Say we have to process all lines of the above file where the fourth field has its flag set to ‘D’ (the debit indicator).

So, in the above example, after reading the file we should keep only the lines that have ‘D’ as the fourth field; line 1 (fourth field ‘C’) should therefore not be processed further.
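The filtering rule above can be sketched as follows, in plain Java that a Groovy script could call. It assumes the input arrives as a java.io.Reader (in CPI, for example via message.getBody(Reader), which is an assumption about your script) and that fields are comma-separated without quoted commas. The two sample lines are shortened to their first five fields. Lines with fewer than four fields are skipped, which doubles as the field-count check mentioned earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Sketch: keep only the lines whose fourth comma-separated field is "D" (debit),
// reading and writing one line at a time instead of holding the whole file as a String.
class DebitLineFilter {

    static int filter(Reader in, Writer out) throws IOException {
        int kept = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
            // Skip lines with too few fields to carry the indicator
            if (fields.length >= 4 && fields[3].equals("D")) {
                out.write(line);
                out.write('\n');
                kept++;
            }
        }
        out.flush();
        return kept;
    }

    public static void main(String[] args) throws IOException {
        String input = "A,12234,NO,C,20190711\nA,26579,NO,D,20190701\n";
        StringWriter out = new StringWriter();
        int kept = filter(new StringReader(input), out);
        System.out.print(out);                      // A,26579,NO,D,20190701
        System.out.println(kept + " line(s) kept"); // 1 line(s) kept
    }
}
```

String.split is only safe for simple delimited data; a CSV with quoted fields that contain commas would need a proper CSV parser.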

 

Below we will see how to handle text and CSV files, especially huge ones, and how to process each line without converting the payload to a String, which consumes much more memory.

*.   Reading large files:

 

We normally start our scripts by converting the input payload to a String object.

String content = message.getBody(String) // this line is used in most scripts

 

For large files, however, the above line converts the whole payload to a String and holds it in memory, which is not good practice. Furthermore, because Java Strings are immutable, every change (for example a replace) creates a new String object, so memory use grows further. This can end in a java.lang.OutOfMemoryError.

The better way is to handle them as a stream. Two classes can handle stream data:

a.   java.io.Reader -> handles data as a character (text) stream.

b.   java.io.InputStream -> handles data as a raw byte (binary) stream.

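Depending on the level of control you need over the data, or on the business requirement, you can use either of them. In most cases the Reader class is easier to use, because we get the data as text characters (Java chars are UTF-16) rather than as raw bytes, which are in whatever encoding the file was written in (for example UTF-8).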
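To illustrate the difference, this sketch reads the same UTF-8 payload once as an InputStream (counting bytes) and once through an InputStreamReader (counting chars). The charset passed to InputStreamReader is an assumption and must match the file's real encoding; the sample text "Ä,€" is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Sketch: the same payload read as raw bytes (InputStream) and as characters (Reader).
class BytesVersusChars {

    static int countBytes(InputStream in) throws IOException {
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    static int countChars(InputStream in) throws IOException {
        // The Reader decodes bytes into chars; the charset must match the file's encoding
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        int count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "\u00C4,\u20AC".getBytes(StandardCharsets.UTF_8); // "Ä,€"
        System.out.println(countBytes(new ByteArrayInputStream(payload))); // 6
        System.out.println(countChars(new ByteArrayInputStream(payload))); // 3
    }
}
```

"Ä" takes two bytes in UTF-8 and "€" takes three, so six bytes decode to three characters.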

Reading data in a CPI Groovy script via java.io.Reader:
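A minimal sketch of reading a character stream line by line, in plain Java. The helper name forEachLine is hypothetical; in a CPI script the Reader would typically be the message body (assumed here to be available via message.getBody(Reader)).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

// Sketch: process a character stream one line at a time; only the current line is held in memory.
class LineByLineReader {

    static long forEachLine(Reader source, Consumer<String> action) throws IOException {
        long lines = 0;
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                action.accept(line);
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Reader body = new StringReader("A,12234,NO,C,20190711\nA,26579,NO,D,20190701\n");
        long count = forEachLine(body, line -> System.out.println("Read: " + line));
        System.out.println(count + " lines"); // 2 lines
    }
}
```

In Groovy, the GDK's Reader.eachLine { line -> ... } gives the same one-line-at-a-time behavior with less code.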

 

Reading data at field (word) level, for each line:
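Once a line is in hand, splitting it into fields is where the delimiter matters. This sketch, with a hypothetical helper fields(line, delimiter), shows one pitfall: Java's String.split() (which Groovy also uses for "text".split(",")) treats its argument as a regular expression, so a literal "|" must be quoted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: split one line into fields for a given literal delimiter (",", "\t" or "|").
class FieldSplitter {

    static List<String> fields(String line, String delimiter) {
        // split() takes a regex: an unquoted "|" matches the empty string and would
        // split between every character. Pattern.quote makes the delimiter literal.
        // The limit -1 keeps trailing empty fields such as the last one in "a|b|".
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        List<String> f = fields("A|26579|NO|D|20190701|", "|");
        System.out.println(f.size()); // 6
        for (int i = 0; i < f.size(); i++) {
            System.out.println("field " + (i + 1) + " = '" + f.get(i) + "'");
        }
    }
}
```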

*.    Not a good way to do a replace on data in CPI Groovy:

The String way of doing it –
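For comparison, the String approach looks roughly like this sketch (plain Java; replacing "," with "|" is a hypothetical example). It is short, but it needs the entire file in memory as one String and then builds another full-size String for the result.

```java
// Sketch of the String approach: the whole payload is in memory, then copied again by replace().
class StringReplace {

    static String replaceAll(String content, String target, String replacement) {
        // When target occurs, replace() builds a new full-size String,
        // so the original and the result are both in memory at once
        return content.replace(target, replacement);
    }

    public static void main(String[] args) {
        // In a CPI script this String would come from message.getBody(String)
        String content = "A,12234,NO,C,20190711\nA,26579,NO,D,20190701\n";
        System.out.print(replaceAll(content, ",", "|"));
    }
}
```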

The better approach: doing the replace while reading the data as a stream:
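A sketch of the streaming alternative, in plain Java with a hypothetical replace(in, out, target, replacement) helper. It assumes the text being replaced never spans a line break, and it normalizes line endings to "\n" because readLine() strips the original terminators.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Sketch: do the replace while streaming, so at most one line is held in memory at a time.
class StreamingReplace {

    static void replace(Reader in, Writer out, String target, String replacement) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            // Only the current line is copied by replace(), never the whole payload
            writer.write(line.replace(target, replacement));
            writer.write('\n'); // readLine() drops the terminator, so write one back
        }
        writer.flush(); // push buffered output through to the underlying Writer
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        replace(new StringReader("A,12234,NO,C,20190711\nA,26579,NO,D,20190701\n"), out, ",", "|");
        System.out.print(out);
    }
}
```

The demo collects the output in a StringWriter only to make the result visible; in an iFlow, where the output goes is a separate design choice.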

*.    Reading the payload as a java.io.InputStream object:
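When you do not need characters at all, the payload can stay as bytes. This sketch counts records by counting line-feed bytes in 8 KB chunks. It assumes an ASCII-compatible encoding such as UTF-8, where the byte 0x0A only ever represents '\n' (this does not hold for UTF-16). The method name is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: read a payload as raw bytes in fixed-size chunks and count line feeds,
// without decoding it into characters (assumes an ASCII-compatible encoding).
class ByteChunkLineCounter {

    static long countLineFeeds(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long count = 0;
        int read;
        // Loop until end of stream (-1); each call fills at most buffer.length bytes
        while ((read = in.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == '\n') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "A,12234,NO,C,20190711\nA,26579,NO,D,20190701\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(countLineFeeds(new ByteArrayInputStream(payload))); // 2
    }
}
```

The loop runs until read() returns -1 rather than sizing a buffer from InputStream.available(), which according to its Javadoc only estimates how many bytes can be read without blocking and is not the total length of the stream.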

 

Conclusion:

This blog post is written to ease the pain of developers: while building iFlows, we often come across cases where we need to handle large text files in CSV or other delimited formats, which requires reading the entire file and sometimes working on the data of each line by parsing the text.

In all those cases, this blog post can help you quickly build the Groovy scripts that CPI (SAP Cloud Platform Integration) iFlows need to handle these types of data.

It speeds up such developments by providing an approach and reusable code for achieving the outcome.

I look forward to your input and suggestions.

6 Comments
  • Hi Subhojit,

    I am reading a zip file through a Groovy script. It works fine as long as the zip file is smaller than 4 MB. If the zip file is larger than 4 MB, it gives the error below.

    java.lang.Exception: java.util.zip.ZipException: Open failed for ‘filename’ with fd 962 returning message ‘zip file is empty’ and errno 0 after trying cache.@ line 43 in script1.groovy

    When I tried to print body.available() in the log, it shows 0 for files larger than 4 MB.

    I used the message.getBodySize() method instead of body.available(), but it's still not working.

    The maximum zip file size that we expect in real time would be more than 70MB.

     

    Below is the program that I use to read through the zip file.

    def Message processData(Message message) {
        def body = message.getBody()
        messageLog = messageLogFactory.getMessageLog(message)
        zipFileContent = new byte[body.available()]
        body.read(zipFileContent, 0, body.available())
        File zFile = new File('filename')
        FileOutputStream fileOuputStream = new FileOutputStream(zFile)
        fileOuputStream.write(zipFileContent);
        def pdfMap = [:]
        def manifestCSV
        
        
    	ZipFile zipFile = new ZipFile(zFile);
    
    	Enumeration<? extends ZipEntry> entries = zipFile.entries();
    
    	while(entries.hasMoreElements()){
    		ZipEntry entry = entries.nextElement();
    		InputStream stream = zipFile.getInputStream(entry);
    		if(entry.name.endsWith(".PDF")) {
    		    messageLog.addAttachmentAsString('Name...',entry.getName(),'text')
    		  
    		} else if (entry.name.endsWith(".CSV")) {
    			byte[] buffer = new byte[2048]
    			int len;
    			def content = new StringBuilder()
    			while ((len = stream.read(buffer)) >= 0) {
    				content.append(new String(buffer, 0, len));
    			}
    			
    			manifestCSV = content.toString()
                    
                            message.setProperty('manifestCSV',manifestCSV)
                            message.setBody(manifestCSV)			
    		}
    	}	
        return message 
    }

     

    Can you please guide me where I am wrong?

     

    Regards,

    Anand…

    • Hi, can you try writing your code like below? (Remember, you have to adapt the code below to your needs, but the overall concept remains the same.)

      ======================================================================

       

       

      import com.sap.gateway.ip.core.customdev.util.Message
      import java.nio.charset.StandardCharsets
      import java.util.zip.ZipEntry
      import java.util.zip.ZipInputStream

      def Message processData(Message message) {
          def messageLog = messageLogFactory.getMessageLog(message)

          InputStream is = message.getBody(InputStream.class)
          ByteArrayOutputStream out = new ByteArrayOutputStream()
          int n
          boolean canRead = false
          def myData = ''
          // Skip any bytes before the first 'P' (byte 80), i.e. the start of the "PK" zip signature
          while ((n = is.read()) > -1) {
              if (n == 80 && !canRead) {
                  canRead = true
              }
              if (!canRead) {
                  continue
              }
              out.write(n)
          }
          // def totalstring = out.toString("UTF-8")
          InputStream is2 = new ByteArrayInputStream(out.toByteArray())
          ZipInputStream zipStream = new ZipInputStream(is2)
          ZipEntry entry = zipStream.getNextEntry()
          byte[] buf = new byte[1024]
          while (entry != null) {
              if (entry.getName().contains("PDF")) {
                  // Copy the current entry's bytes into memory
                  ByteArrayOutputStream baos = new ByteArrayOutputStream()
                  int m
                  while ((m = zipStream.read(buf, 0, 1024)) > -1) {
                      baos.write(buf, 0, m)
                  }
                  // Decode as UTF-8 text and drop a '"UTF-8"' line (specific to this data)
                  myData = new String(baos.toByteArray(), StandardCharsets.UTF_8).replace('"UTF-8"\n', '')
                  message.setBody(myData)
              }
              zipStream.closeEntry()
              entry = zipStream.getNextEntry()
          }

          messageLog.setStringProperty("Logging#5", "Printing Input Payload As Attachment")
          messageLog.addAttachmentAsString("#ZIP CONTENT- payment_gl(PDF)", myData, "text/plain")
          message.setBody(myData)

          return message
      }

       

       

      =======================================================================

       

      If it still gives the same error, please try opening a ticket with the CPI team.

      • Hi mate,

        Thanks for your reply.

        My bad, I forgot to mention that I need to encode the PDF content. I actually had another version of the code that was able to read through zip files of more than 10 MB, but I could not encode the content: it gave an error like “Stream close”. That’s why I changed the code to read it into the FileOutputStream.

        When I added the encoding part to your code, it gave the same “Stream close” error. As I am inserting this PDF into SuccessFactors, I need to Base64-encode it. Please see the actual code.

        FYI, your code is able to read through all the files inside the zip file. If you could help me do the encoding with your code, that would be great.

        ZipEntry entry = entries.nextElement();
        		InputStream stream = zipFile.getInputStream(entry);
        		if(entry.name.endsWith(".PDF")) {
        		    messageLog.addAttachmentAsString('Name...',entry.getName(),'text')
        		    def jsonStr = createAttachmentXML(entry.getName(),stream.bytes.encodeBase64().toString())
         			messageLog.addAttachmentAsString('JSON...',jsonStr,'text')
         			pdfMap.put(entry.getName(),jsonStr)
        		}