Skip to Content

Handling Newline and Comma in data field of a CSV file – using an UDF

Usually Newline will be endSeparator for records in CSV. But sometimes CSV reports sent from cloud systems (Ariba, SuccessFactors) can contain Newline and Comma in data fields. This happens, when end user enters some description with ‘Enter’ or ‘,’ in it. Comma in data can be handle in FCC using enclosureSign parameter. But Newline in data cannot be handled using FCC or Standard modules.

Here an example CSV file.

1Input.PNG

Solution:-

Configure FCC or Standard module.

2FCC.PNG

Map using an UDF, to split each line to corresponding field.

3Mapping.PNG

Testing

4Testing.PNG

UDF code: – ‘Execution Type’ : ‘All Values of a Context’.


public void udf_FlatToXML(String[] eachLines, ResultList Records, ResultList Name, ResultList ID, ResultList Description, ResultList Location, Container container) throws StreamTransformationException {
    //Read input content into String.
    String input = "";
    for (String eachLine : eachLines) {
        input = input + eachLine + "\n";
    }
    input = input.replaceFirst(".*?\n", ""); //Remove header line. FYI, You can comment this line, if there is no header in input.
    //Remove \n and , when they are present in data.
    int len = input.length();        StringBuilder out = new StringBuilder();
    boolean withInQuotes = false;    char c;
    for (int i = 0; i < len; i++) {
        c = input.charAt(i);
        if (c == '"') { withInQuotes = !withInQuotes; }
        if (withInQuotes && (c == '\n' || c == ',')) { c = ' '; } //When \n or , are present with in quotes, replace them with space.
        out.append(c);
    }
    //Poupulate target elements.
    eachLines = out.toString().split("\n"); //Get each line.
    for (String eachLine : eachLines) {
        eachLine = eachLine.replace("\"", ""); //Remove all ".
        eachLine = eachLine + ",,,,--"; //Add ,,,,-- (4 target fields). This is to avoid ArrayIndexOutOfBoundsException.
        String[] value = eachLine.split(","); //Get each value.
        Records.addValue("");
        Name.addValue(value[0]);            Name.addContextChange();
        ID.addValue(value[1]);              ID.addContextChange();
        Description.addValue(value[2]);     Description.addContextChange();
        Location.addValue(value[3]);        Location.addContextChange();
    }
}

UDF

5UDF.PNG

8 Comments
You must be Logged on to comment or reply to a post.
  • Raghu

    I think this would only work if all the fields in the CSV are encapsulated in double quotes. Not all CSV files would have encapsulation though. Any thoughts how to handle it for those cases?

    Rgds

    Eng Swee

    • Eng,

      Yes, above custom solution is for “sophisticated CSV implementations” where newLine is present in data. I have seen such data in Ariba CSV reports.

      According to Wiki https://en.wikipedia.org/wiki/Comma-separated_values “Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this).”

      If CSV file does not have enclosureSign (” or other character), above solution will not work. And in that case, solution will be very difficult.

      • Hi Raghu,

        Its great gesture, blogging on a query is very much appreciated.

        Nice n easy way to handle the issue..

        Keep it up and thanks a lot for blog… 🙂

        Regards

        Siri

  • Hello Raghu and All,

    Is there good method else to write the java code in ESR of PI(Function Library). You know that the tips of methods will not display when you write the java code in Function Libray.

    Thank you for your support in advance.

    Thank you & best regards,

    Hubery

    • Hubery,

      No. If UDF has huge number of input/output parameters, it is difficult to see entire method definition on screen in Function library. This what I do, I copy the method definition (ctrl+a ctrl+c) and paste it in notepad++ or Java IDE, to see entire method definition.

      • Hello Raghu,

        Thank you for your reply. Generally, I finished the java code in Eclipse, and then copy to the Function Library I created. Because sometimes, I forget the method name with Object. The development platform Eclipse will give me a tips when create a new object.

        Many thanks and Best regards,

        Hubery