Skip to Content
Author's profile photo Raghu Vamseedhar Reddy KadipiReddy

Handling Newline and Comma in data field of a CSV file – using an UDF

Usually Newline will be endSeparator for records in CSV. But sometimes CSV reports sent from cloud systems (Ariba, SuccessFactors) can contain Newline and Comma in data fields. This happens, when end user enters some description with ‘Enter’ or ‘,’ in it. Comma in data can be handle in FCC using enclosureSign parameter. But Newline in data cannot be handled using FCC or Standard modules.

Here an example CSV file.

1Input.PNG

Solution:-

Configure FCC or Standard module.

2FCC.PNG

Map using an UDF, to split each line to corresponding field.

3Mapping.PNG

Testing

4Testing.PNG

UDF code: – ‘Execution Type’ : ‘All Values of a Context’.


public void udf_FlatToXML(String[] eachLines, ResultList Records, ResultList Name, ResultList ID, ResultList Description, ResultList Location, Container container) throws StreamTransformationException {
    //Read input content into String.
    String input = "";
    for (String eachLine : eachLines) {
        input = input + eachLine + "\n";
    }
    input = input.replaceFirst(".*?\n", ""); //Remove header line. FYI, You can comment this line, if there is no header in input.
    //Remove \n and , when they are present in data.
    int len = input.length();        StringBuilder out = new StringBuilder();
    boolean withInQuotes = false;    char c;
    for (int i = 0; i < len; i++) {
        c = input.charAt(i);
        if (c == '"') { withInQuotes = !withInQuotes; }
        if (withInQuotes && (c == '\n' || c == ',')) { c = ' '; } //When \n or , are present with in quotes, replace them with space.
        out.append(c);
    }
    //Poupulate target elements.
    eachLines = out.toString().split("\n"); //Get each line.
    for (String eachLine : eachLines) {
        eachLine = eachLine.replace("\"", ""); //Remove all ".
        eachLine = eachLine + ",,,,--"; //Add ,,,,-- (4 target fields). This is to avoid ArrayIndexOutOfBoundsException.
        String[] value = eachLine.split(","); //Get each value.
        Records.addValue("");
        Name.addValue(value[0]);            Name.addContextChange();
        ID.addValue(value[1]);              ID.addContextChange();
        Description.addValue(value[2]);     Description.addContextChange();
        Location.addValue(value[3]);        Location.addContextChange();
    }
}

UDF

5UDF.PNG

Assigned tags

      8 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Praveen Tirumareddy
      Praveen Tirumareddy

      good one Radhu 🙂

      Author's profile photo Eng Swee Yeoh
      Eng Swee Yeoh

      Raghu

      I think this would only work if all the fields in the CSV are encapsulated in double quotes. Not all CSV files would have encapsulation though. Any thoughts how to handle it for those cases?

      Rgds

      Eng Swee

      Author's profile photo Raghu Vamseedhar Reddy KadipiReddy
      Raghu Vamseedhar Reddy KadipiReddy
      Blog Post Author

      Eng,

      Yes, above custom solution is for "sophisticated CSV implementations" where newLine is present in data. I have seen such data in Ariba CSV reports.

      According to Wiki https://en.wikipedia.org/wiki/Comma-separated_values "Fields with embedded line breaks must be quoted (however, many CSV implementations simply do not support this)."

      If CSV file does not have enclosureSign (" or other character), above solution will not work. And in that case, solution will be very difficult.

      Author's profile photo Former Member
      Former Member

      Hi Raghu,

      Its great gesture, blogging on a query is very much appreciated.

      Nice n easy way to handle the issue..

      Keep it up and thanks a lot for blog... 🙂

      Regards

      Siri

      Author's profile photo Former Member
      Former Member

      Thanks for Information .

      Author's profile photo Hubery Yao
      Hubery Yao

      Hello Raghu and All,

      Is there good method else to write the java code in ESR of PI(Function Library). You know that the tips of methods will not display when you write the java code in Function Libray.

      Thank you for your support in advance.

      Thank you & best regards,

      Hubery

      Author's profile photo Raghu Vamseedhar Reddy KadipiReddy
      Raghu Vamseedhar Reddy KadipiReddy
      Blog Post Author

      Hubery,

      No. If UDF has huge number of input/output parameters, it is difficult to see entire method definition on screen in Function library. This what I do, I copy the method definition (ctrl+a ctrl+c) and paste it in notepad++ or Java IDE, to see entire method definition.

      Author's profile photo Hubery Yao
      Hubery Yao

      Hello Raghu,

      Thank you for your reply. Generally, I finished the java code in Eclipse, and then copy to the Function Library I created. Because sometimes, I forget the method name with Object. The development platform Eclipse will give me a tips when create a new object.

      Many thanks and Best regards,

      Hubery