Technology Blogs by SAP
subhojit_saha2
Introduction:

Handling huge text files (either CSV or fixed-length) is a challenge in CPI (SAP Cloud Platform Integration).

Before converting them to the XML required for mapping, we mostly read and manipulate them in Groovy scripts. Most often this is done by converting them to a String, which is very memory-intensive.

In this blog post, I will show alternative ways to handle them: not only how to read large files, but also how to manipulate them.

I hope you enjoy reading it.

Main Section:

In CPI, we sometimes come across scenarios where we need to process an input CSV or another character-delimited text file.

Most often, these files are huge compared to data we receive in XML or JSON format.

This data, which can be “,”, tab or “|” delimited or of fixed length, adds complexity, as it first has to be read, sorted and converted to XML (for mapping to a target structure) before it can finally be processed. Sometimes we also have to check the number of fields in each line beforehand, to decide whether the line is worth processing at all and to stop unnecessary data from flowing further.

For example, file input.csv:

A,12234,NO,C,20190711,……

A,26579,NO,D,20190701,…….

……………………………………………..

……………………………………………..

Say we have to process only those lines of the above file where the fourth field, a flag, is set to ‘D’ (the debit indicator).

So, in the above example, after reading the file we should keep only the lines that have ‘D’ as the fourth field; hence line 1 above (flag ‘C’) should not be processed further.
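A minimal sketch of this filter, assuming comma-separated lines and a hypothetical helper named `filterDebitLines` (it is not the original post's code). It is written in plain Java so it runs standalone; the same calls should also work in a CPI Groovy script, where the Reader could come from `message.getBody(java.io.Reader)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class DebitLineFilter {

    // Copies to 'out' only the lines whose fourth comma-separated field is "D"
    // and returns how many lines were kept. Lines are read one at a time, so
    // this method never holds the whole file in memory.
    static int filterDebitLines(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int kept = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // limit -1 keeps trailing empty fields, so field positions stay stable
            String[] fields = line.split(",", -1);
            if (fields.length >= 4 && "D".equals(fields[3])) {
                out.write(line);
                out.write("\n");
                kept++;
            }
        }
        out.flush();
        return kept;
    }
}
```

Only the current line is in memory; where the Writer sends its output (a file, a byte buffer, and so on) depends on your iFlow.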

 

Below, we will see how to handle text and CSV files, especially huge ones, and how to process each of their lines without converting the payload to a String, which consumes much more memory.

*. Reading large files:

 

We normally start our scripts by converting the input payload to a String object.

String content = message.getBody(String) // this line is used in most scripts

 

But for large files, the above line converts the whole payload to a String and holds it in memory, which is not good practice at all. Furthermore, because Java Strings are immutable, every change (a replace, for example) creates a new String object, which takes even more space. This can also lead to a java.lang.OutOfMemoryError.

The better way is to handle them as a stream. There are two classes that can handle stream data:

a. java.io.Reader -> handles data as a character (text) stream

b. java.io.InputStream -> handles data as a raw (binary) byte stream

Depending on the level of control you need over the data, or on the business requirement, you can use either of them. The Reader class is usually easier to use, as it delivers decoded text characters (Java chars, which are UTF-16 code units) rather than the raw bytes of the payload (for example, UTF-8-encoded bytes).
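To make the Reader/InputStream distinction concrete, here is a sketch that wraps a byte stream in an InputStreamReader. The charset is an assumption (UTF-8 here); use whatever encoding the sender actually produces. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class StreamDecoding {

    // Bridges a byte stream to a character stream. The charset says how the
    // incoming bytes are encoded; the Reader then delivers Java chars (UTF-16).
    static Reader utf8Reader(InputStream bytes) {
        return new BufferedReader(new InputStreamReader(bytes, StandardCharsets.UTF_8));
    }

    // Counts the decoded characters (UTF-16 code units) chunk by chunk,
    // without ever building a String of the whole payload.
    static long countChars(Reader reader) throws IOException {
        char[] buffer = new char[8192];
        long total = 0;
        int n;
        while ((n = reader.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }
}
```

For the word "Käse", the stream carries 5 bytes in UTF-8 but the Reader delivers 4 characters, which is exactly the bytes-versus-characters difference described above.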

Reading Data in CPI groovy script via java.io.Reader:
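A sketch of line-by-line reading, assuming the Reader is the message body (in a CPI script, `message.getBody(java.io.Reader)`). Counting non-blank lines is only a placeholder for your own per-line logic, and `countNonBlankLines` is a hypothetical name. In Groovy, `reader.eachLine { line -> ... }` wraps the same loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

class LineByLineReader {

    // Reads the body one line at a time; only the current line is in memory.
    // Counting non-blank lines stands in for whatever per-line logic you need.
    static int countNonBlankLines(Reader body) throws IOException {
        BufferedReader reader = new BufferedReader(body);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) { // handles \n and \r\n endings
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```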



 

Reading Data at each field or word level, for each line:
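For field-level work, each line can be split as it is read. This sketch assumes the delimiter and the expected number of fields are supplied by the caller (hypothetical parameters of a hypothetical `countValidLines` helper) and skips lines with the wrong number of fields, which is the kind of up-front check mentioned earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.regex.Pattern;

class FieldLevelReader {

    // Splits each line into fields as it is read and counts the lines that have
    // exactly 'expectedFields' fields; all other lines are skipped.
    static int countValidLines(Reader body, String delimiter, int expectedFields) throws IOException {
        BufferedReader reader = new BufferedReader(body);
        // Pattern.quote treats the delimiter literally, so "|" is not regex alternation
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        int valid = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = splitter.split(line, -1); // -1 keeps trailing empty fields
            if (fields.length != expectedFields) {
                continue; // wrong field count: not worth processing
            }
            // fields[0] .. fields[expectedFields - 1] are available here for per-field work
            valid++;
        }
        return valid;
    }
}
```

For pipe-delimited lines such as `10000001|Module 1|Learning modules`, it would be called with `"|"` and `3`.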



*. Not a good way to do a replace on data in CPI Groovy:

The String way of doing it –
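For comparison, the String approach boils down to something like the following sketch (the comma-to-pipe replacement in the usage is just an illustrative assumption, and the helper name is hypothetical). In a CPI script, `content` would be the result of `message.getBody(String)`.

```java
class StringReplace {

    // Whole-payload approach: 'content' already holds the entire file as one
    // String, and when the target occurs, replace() builds a second full-size
    // String, so both copies are in memory until the original is released.
    static String replaceInWholePayload(String content, String target, String replacement) {
        return content.replace(target, replacement);
    }
}
```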



The better approach is to do the replace while reading the data as a stream:
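A sketch of the streaming alternative, assuming line-oriented data and a hypothetical helper `replaceWhileStreaming`: each line is replaced and written out before the next one is read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

class StreamingReplace {

    // Applies the replacement line by line and writes each result straight to
    // 'out', so only one line (plus I/O buffers) is in memory at any time.
    static void replaceWhileStreaming(Reader in, Writer out, String target, String replacement)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            out.write(line.replace(target, replacement));
            out.write("\n"); // readLine() strips line terminators; write one back
        }
        out.flush();
    }
}
```

Two caveats of this design: a search string that spans a line break is never matched, and line endings are normalised to "\n". Also, if the Writer is a StringWriter, the whole result ends up in memory again, so where you write to matters.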



*. Reading the payload as a java.io.InputStream object:
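When you need byte-level control, the body can be read in fixed-size chunks (in a CPI script, for example via `message.getBody(java.io.InputStream)`). This sketch, with the hypothetical name `countLineFeeds`, counts line feeds without decoding any text. Reading until `read()` returns -1 also avoids relying on `available()`, which only estimates how many bytes can be read without blocking and does not report the total size.

```java
import java.io.IOException;
import java.io.InputStream;

class RawStreamReader {

    // Reads the body as raw bytes in 8 KB chunks and counts line feeds (0x0A).
    // No decoding happens; this is safe for UTF-8 and other ASCII-compatible
    // encodings, because 0x0A never occurs inside a multi-byte UTF-8 sequence.
    static long countLineFeeds(InputStream body) throws IOException {
        byte[] buffer = new byte[8192];
        long lineFeeds = 0;
        int n;
        while ((n = body.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == 0x0A) { // numeric literal, so it also behaves correctly in Groovy
                    lineFeeds++;
                }
            }
        }
        return lineFeeds;
    }
}
```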



 

Conclusion:

This blog post is written to ease the pain of developers: while building iFlows, we often come across cases where we need to handle large text files in CSV or another delimited format, which requires reading the entire file and sometimes parsing the data of each line.

In all those cases, this blog post can help you quickly build the Groovy scripts needed in CPI (SAP Cloud Platform Integration) iFlows to handle these types of data.

It speeds up such developments by providing an approach and reusable code for achieving the outcome.

I look forward to your inputs and suggestions.
11 Comments
ArindamMitra
Great one!! Keep Blogging Subhojit 🙂
subhojit_saha2
Thanks Arindam.
ananda_paul
Hi Subhojit,

I am reading a zip file through a Groovy script. It works fine as long as the zip file is smaller than 4MB. If the zip file is larger than 4MB, it gives the error below.
java.lang.Exception: java.util.zip.ZipException: Open failed for 'filename' with fd 962 returning message 'zip file is empty' and errno 0 after trying cache.@ line 43 in script1.groovy

When I tried to print the body.available() in the log, it shows 0 for files more than 4MB.

I used the message.getBodySize() method instead of body.available(), but it is still not working.

The maximum zip file size that we expect in real time would be more than 70MB.

 

Below is the program that I use to read through the zip file.
def Message processData(Message message) {
def body = message.getBody()
messageLog = messageLogFactory.getMessageLog(message)
zipFileContent = new byte[body.available()]
body.read(zipFileContent, 0, body.available())
File zFile = new File('filename')
FileOutputStream fileOuputStream = new FileOutputStream(zFile)
fileOuputStream.write(zipFileContent);
def pdfMap = [:]
def manifestCSV


ZipFile zipFile = new ZipFile(zFile);

Enumeration<? extends ZipEntry> entries = zipFile.entries();

while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
InputStream stream = zipFile.getInputStream(entry);
if(entry.name.endsWith(".PDF")) {
messageLog.addAttachmentAsString('Name...',entry.getName(),'text')

} else if (entry.name.endsWith(".CSV")) {
byte[] buffer = new byte[2048]
int len;
def content = new StringBuilder()
while ((len = stream.read(buffer)) >= 0) {
content.append(new String(buffer, 0, len));
}

manifestCSV = content.toString()

message.setProperty('manifestCSV',manifestCSV)
message.setBody(manifestCSV)
}
}
return message
}

 

Can you please guide me where I am wrong?

 

Regards,

Anand...
subhojit_saha2
Hi, can you try to write your code like below? (Remember, you have to adapt the code below to your needs, but the overall concept remains the same.)

======================================================================

 

 

def messageLog = messageLogFactory.getMessageLog(message);

InputStream is = message.getBody(InputStream.class);
ByteArrayOutputStream out=new ByteArrayOutputStream();
int n;
boolean canRead = false;
def myData =''
while ((n = is.read()) > -1){
if (n==80 && !canRead)
{
canRead = true;
}
if (!canRead){
continue;
}
out.write(n);
}
// def totalstring = out.toString("UTF-8");
InputStream is2 = new ByteArrayInputStream(out.toByteArray());
ZipInputStream zipStream = new ZipInputStream(is2);
ZipEntry entry=zipStream.getNextEntry();
byte[] buf=new byte[1024];
while (entry != null) {
if (entry.getName().contains("PDF")) {
ByteArrayOutputStream baos=new ByteArrayOutputStream();
int m;
while ((m=zipStream.read(buf,0,1024)) > -1) {
baos.write(buf,0,m);
}
myData = new String(baos.toByteArray(),StandardCharsets.UTF_8).replace("\"UTF-8\"\n","")
message.setBody(new String(baos.toByteArray(),StandardCharsets.UTF_8).replace("\"UTF-8\"\n",""));
}
zipStream.closeEntry();
entry=zipStream.getNextEntry();
}

messageLog.setStringProperty("Logging#5", "Printing Input Payload As Attachment")
messageLog.addAttachmentAsString("#ZIP CONTENT- payment_gl(PDF)", myData, "text/plain");
message.setBody(myData)

return message;

 

 

=======================================================================

 

If it still gives the same error, please try opening a ticket with the CPI team.
ananda_paul
Hi mate,

Thanks for your reply.

My bad, I forgot to mention that I need to encode the PDF content. Actually, I had another set of code that is able to read through a zip file of more than 10MB, but I could not encode the content; it gave an error like "Stream close". That's why I changed the code to write it to a FileOutputStream.

When I added the encoding part to your code, it gives the same "Stream close" error. As I am inserting this PDF into SuccessFactors, I need to Base64-encode it. Please see the actual code.

FYI, your code is able to read through all the files inside the zip file. If you could help with doing the encoding in your code, that would be great.
ZipEntry entry = entries.nextElement();
InputStream stream = zipFile.getInputStream(entry);
if(entry.name.endsWith(".PDF")) {
messageLog.addAttachmentAsString('Name...',entry.getName(),'text')
def jsonStr = createAttachmentXML(entry.getName(),stream.bytes.encodeBase64().toString())
messageLog.addAttachmentAsString('JSON...',jsonStr,'text')
pdfMap.put(entry.getName(),jsonStr)
}
ananda_paul
Hi S,

I am able to do the encoding.
Base64.getEncoder().encodeToString(baos.toByteArray())

Thanks

Anand...
Hi Subhojit,

Great blog!

I have a requirement to read 3 very large csv files and simply combine them and send to receiver.
While I can use the input stream to read the files and use the memory space efficiently, the combining part using aggregation will need for the file to be converted to xml, since aggregation in CPI works only with xml.

And since I will be doing an aggregation, this large data will still be stored in the data store. Isn't it?

Do you have any ideas/work around to manage that?

Thanks,

Shubham

 
Hi Subhojit,

 

We have a flat file where headers have special characters. We want to use replace only for header line. How do we achieve that using groovy?

 

Thanks,

Hemant
johnny999
Thanks for your blog. I tried your method and it doesn't work.

My interface is extracting an email attachment via the sender mail adapter. I can see the attachment does get extracted by the mail adapter and saved into the body via the trace.

However, when I use the Groovy script to read the attachment via getBody, nothing gets read. I verified this with body.length(), which is 0.

Here is the simple code for me to get the body:

String body = message.getBody(java.lang.String)

body.length() is 0.

I used your array method and the array has size() 0.

Is that a bug in CPI, since getBody is a CPI-specific method?

Thanks Jonathan.

 
shivaprasad1
Hi Subhojit,

I tried implementing your logic (using java.io.Reader instead of String).

But I am facing an issue.

Below is my input payload

10000001|Module 1|Learning modules

10000002|Module 1|Learning modules

10000003|Module 1|Learning modules

10000001|Module 3|Learning modules

10000001|Module 2|Learning modules

10000002|Module 2|Learning modules

10000003|Module 2|Learning modules

10000003|Module 3|Learning modules

10000002|Module 3|Learning modules


Below is the script :
import com.sap.gateway.ip.core.customdev.util.Message;
import java.util.HashMap;
import java.io.*;
import org.codehaus.groovy.runtime.IOGroovyMethods;
def Message processData(Message message) {
//Body
def lines = message.getBody( java.io.Reader)
lines.eachLine{
prinltn "hi"
}
return message;
}

What could be the possible solution for the below error?

resham_naaz
Hi hemant0301knack, I have a similar requirement. Did you figure out how to do it?