Handling Newline and Comma in data field of a CSV file – using an UDF
Usually Newline will be endSeparator for records in CSV. But sometimes CSV reports sent from cloud systems (Ariba, SuccessFactors) can contain Newline and Comma in data fields. This happens, when end user enters some description with ‘Enter’ or ‘,’ in it. Comma in data can be handle in FCC using enclosureSign parameter. But Newline in data cannot be handled using FCC or Standard modules.
Here an example CSV file.
Solution:-
Configure FCC or Standard module.
Map using an UDF, to split each line to corresponding field.
Testing
UDF code: – ‘Execution Type’ : ‘All Values of a Context’.
public void udf_FlatToXML(String[] eachLines, ResultList Records, ResultList Name, ResultList ID, ResultList Description, ResultList Location, Container container) throws StreamTransformationException {
//Read input content into String.
String input = "";
for (String eachLine : eachLines) {
input = input + eachLine + "\n";
}
input = input.replaceFirst(".*?\n", ""); //Remove header line. FYI, You can comment this line, if there is no header in input.
//Remove \n and , when they are present in data.
int len = input.length(); StringBuilder out = new StringBuilder();
boolean withInQuotes = false; char c;
for (int i = 0; i < len; i++) {
c = input.charAt(i);
if (c == '"') { withInQuotes = !withInQuotes; }
if (withInQuotes && (c == '\n' || c == ',')) { c = ' '; } //When \n or , are present with in quotes, replace them with space.
out.append(c);
}
//Poupulate target elements.
eachLines = out.toString().split("\n"); //Get each line.
for (String eachLine : eachLines) {
eachLine = eachLine.replace("\"", ""); //Remove all ".
eachLine = eachLine + ",,,,--"; //Add ,,,,-- (4 target fields). This is to avoid ArrayIndexOutOfBoundsException.
String[] value = eachLine.split(","); //Get each value.
Records.addValue("");
Name.addValue(value[0]); Name.addContextChange();
ID.addValue(value[1]); ID.addContextChange();
Description.addValue(value[2]); Description.addContextChange();
Location.addValue(value[3]); Location.addContextChange();
}
}
UDF
good one Radhu 🙂
Raghu
I think this would only work if all the fields in the CSV are encapsulated in double quotes. Not all CSV files would have encapsulation though. Any thoughts how to handle it for those cases?
Rgds
Eng Swee
Eng,
Yes, above custom solution is for "sophisticated CSV implementations" where newLine is present in data. I have seen such data in Ariba CSV reports.
According to Wiki https://en.wikipedia.org/wiki/Comma-separated_values "Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this)."
If CSV file does not have enclosureSign (" or other character), above solution will not work. And in that case, solution will be very difficult.
Hi Raghu,
Its great gesture, blogging on a query is very much appreciated.
Nice n easy way to handle the issue..
Keep it up and thanks a lot for blog... 🙂
Regards
Siri
Thanks for Information .
Hello Raghu and All,
Is there good method else to write the java code in ESR of PI(Function Library). You know that the tips of methods will not display when you write the java code in Function Libray.
Thank you for your support in advance.
Thank you & best regards,
Hubery
Hubery,
No. If UDF has huge number of input/output parameters, it is difficult to see entire method definition on screen in Function library. This what I do, I copy the method definition (ctrl+a ctrl+c) and paste it in notepad++ or Java IDE, to see entire method definition.
Hello Raghu,
Thank you for your reply. Generally, I finished the java code in Eclipse, and then copy to the Function Library I created. Because sometimes, I forget the method name with Object. The development platform Eclipse will give me a tips when create a new object.
Many thanks and Best regards,
Hubery
If there are multiple lines like header, details and trailer what field seperate i should use, as uts getting me mapping general error.