Many times I see queries from users stating that although they have configured the file encoding as UTF-8 in the receiver channel, the file in the receiver folder appears with ANSI encoding. What is happening?

Well, my answer is: PI is correct, and it is placing the file with the encoding specified in the receiver channel. It's a known issue with Windows-based text-file viewers (like Notepad, Notepad++, etc.).

Let us try out an example: an XML-to-flat-file scenario with the encoding set to UTF-8, and let's see how Notepad interprets the encoding 😉.

Source: XML File

Target: flat file with File Encoding set to “UTF-8”


Scenario 1: Data is sent across without any BOM characters in it. (What a BOM is, I will explain below; be patient 😉.)

  1. The receiver channel processes the source XML into a text file with UTF-8 encoding without any issues:


  2. But when the received flat file is opened using Notepad++, you notice that it reports the encoding as ANSI.


Scenario 2: The same file sent across with a BOM character in it. BOM characters cannot be represented using ANSI.

1. The same source XML file with a BOM character introduced in it:


2. The receiver channel writes the file with UTF-8 encoding the same way it did in Scenario 1.


3. But notice that Notepad++ now interprets the same file as UTF-8 BOM instead of ANSI as it did earlier. Why this inconsistent behavior from the Notepad++ application?


Why and how is this happening?

  1. The problem is in the apps that open the text file, like Notepad, Notepad++, etc. These WRONGLY interpret the files as ANSI; hence we assume that the file is encoded in ANSI instead of UTF-8, and thus think the PI channel encoding is not working properly 😉.


It all depends on the algorithm that applications like Notepad use to open a text file and guess its encoding.

What is a BOM: a Byte Order Mark is a marker at the start of a text file that tells the application which encoding to use when opening the file.

For more information on BOM ref:

These BOM characters are not usually sent across in the flat file; almost all the time, the flat file does not have any BOM in it as data. Even the Unicode standard does not recommend sending BOM characters in UTF-8 text files.

If the BOM is not sent (as data), then there is no information on which encoding to use to open the text file.

Because of this, Notepad or Notepad++ does not know which encoding to use when opening the file.

Default behavior of Notepad and other apps: if the file looks like valid UTF-8, open it with UTF-8 encoding; otherwise, open it with 8-bit ANSI encoding.
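That heuristic can be sketched in a few lines of Python (an illustrative guess at how such editors decide, not Notepad's actual code):

```python
# Illustrative sketch of the editor heuristic described above: a BOM wins,
# pure 7-bit content looks like ANSI, valid multi-byte UTF-8 beats ANSI.
def guess_encoding(raw: bytes) -> str:
    if raw.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 BOM"        # explicit BOM: no guessing needed
    if all(b < 0x80 for b in raw):
        return "ANSI"             # 7-bit only: ANSI and UTF-8 are byte-identical
    try:
        raw.decode("utf-8")
        return "UTF-8"            # bytes above 0x7F that form valid UTF-8
    except UnicodeDecodeError:
        return "ANSI"             # not valid UTF-8: fall back to 8-bit ANSI

print(guess_encoding(b"plain text"))               # ANSI
print(guess_encoding("héllo".encode("utf-8")))     # UTF-8
print(guess_encoding(b"\xef\xbb\xbf" + b"plain"))  # UTF-8 BOM
```

This is why the same channel configuration can show "ANSI" for one payload and "UTF-8" for another: the verdict is driven entirely by the bytes in the file.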

Query: Sometimes the file sent to the receiver folder shows as UTF-8 and sometimes as ANSI when opened with Notepad. Why is that?

A. Well, the answer is: check the payload content. It will contain characters outside the plain 7-bit range (byte values 1 to 127) that form valid UTF-8 sequences, and hence the application opens the file as UTF-8.

If all the content of the file can be represented using ANSI encoding, then the application opens the file with ANSI encoding by default. It depends on the file content and the internal algorithm of the application that opens the file.

But rest assured, the file placed in the receiver folder by PI is in the encoding format specified in the receiver channel, as shown in the adapter engine log.

If there is a situation where the business is adamant that they want the file encoding displayed as UTF-8 when they open the file, then pass the BOM character with the data (header information) using a dummy field to the receiver.


(0xEF, 0xBB, 0xBF) → this BOM sequence indicates that the encoding is UTF-8.

There are different BOM representations for each encoding (UTF-8, UTF-16, UTF-32, etc.); ref. the wiki link above.

The BOM tells the application which encoding to use when opening the file. Also inform the receiving business to ignore/handle this dummy field at their end while processing.
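A minimal sketch of this workaround, writing the three BOM bytes ahead of the payload (the file name and payload values are made up for illustration):

```python
import codecs
import os
import tempfile

payload = "Header1;Header2\r\nvalue1;value2\r\n"   # made-up flat-file content
path = os.path.join(tempfile.gettempdir(), "target_with_bom.txt")

with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8)           # the three BOM bytes: 0xEF 0xBB 0xBF
    f.write(payload.encode("utf-8"))

# Editors that see these leading bytes will report the file as "UTF-8 BOM".
with open(path, "rb") as f:
    print(f.read(3).hex())             # efbbbf
```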

Note: the above-mentioned problem occurs only when we deal with Windows-based systems; there is not much we can do about it apart from controlling the CRLF 🙂.

Thoughts and suggestions appreciated.

Senthilprakash Selvaraj SAP PI Flextronics




    1. Soumya Nalajarla

      Hi Senthil,

      Nice blog. I have a very similar requirement but am not able to get through. Our requirement is like below:

      An interface program in SAP sends data to PI using RFC. Both SAP and PI are Unicode systems. In PI, we are generating a CSV file using content conversion. Our requirement is to get the target file in UTF-8 format. Initially there was no BOM character usage, so when we have European characters, they are not identified correctly.

      So I have added the BOM character at the file beginning as suggested above. This still doesn’t work. When I open the file in Notepad++, it says the encoding is UTF-8, but the European characters are not shown as expected and instead appear as other hex values.

      Let me know your thoughts.



      1. senthilprakash selvaraj Post author

        Hi Soumya,

        Ideally, sending a BOM is not a good idea, but in worst cases like yours, we do this to make the file encoding explicit.

        OK, for your issue: can you give me the exact European character we are talking about (hex)?

        1. First, are you able to manually save the file in UTF-8 format and, when opened, still see the character without any issue on your laptop?

        2. If point 1 is YES, then try introducing the BOM to the file, save it, then open it again and see how the character is represented.

        do let me know.

        Also, what OS is PI built on, and to which OS file system are these files being placed (Unix or Windows)?


        1. Soumya Nalajarla

          hi Senthil,

          Thanks for the response. Here are the details:

          1. The character in question is ú. Its UTF-8 value is “C3 BA”.

          2. When I look at the incoming payload XML file, it is in UTF-8 format (I saved the XML into a text file and opened it in a hex editor). The above character has the hex value “C3 BA”.

          3. When I look at the payload XML file generated after mapping, it is in UTF-8 format. The above character has the hex value “C3 BA”.

          4. When I open the generated CSV file, it is in ASCII format. The above character has the hex value “FA”.

          5. Even if I add the BOM character, the file is opened as UTF-8, but the hex value is still “FA”, so obviously it is not identified as “ú”.

          It looks like UTF-8 is converted into ASCII during content conversion.

          PI is on Windows NT and the files are placed on an NT system as well. We are on PI 7.11 SP 10. Let me know if you need more details. Appreciate the help.
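The hex values above point at the cause: 0xC3 0xBA is “ú” in UTF-8, while a single 0xFA byte is “ú” in Latin-1 (the usual Windows “ANSI” code page). A quick sketch to verify the byte values:

```python
# Compare the two encodings of "ú" mentioned in the comment above.
ch = "ú"
print(ch.encode("utf-8").hex())    # c3ba -> what the XML payload carries
print(ch.encode("latin-1").hex())  # fa   -> what ended up in the CSV
# So the content conversion step wrote the file in Latin-1/ANSI; prepending
# a BOM only mislabels those Latin-1 bytes as UTF-8, it does not convert them.
```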



  1. Jay M

    Hi Senthil,

      Thank you for the blog. I have an R/3-to-FTP scenario. I have an issue with Latin special characters, and the vendor wants UTF-8 format with a BOM to process the file successfully. I added those 3 BOM characters in the mapping, but somehow when I look at the output file, some other characters also appear. Could you please let me know from where you copied those characters and included them in the mapping?

    It’s the same issue Soumya discussed above. We were able to resolve it by just adding UTF-8 as the charset, but the issue is back again after the support pack upgrade.

    Appreciate your help in this regard.



    1. Satish Babu

      Hi Senthil,

      Hi Senthil, I have the same requirement as above; in our case they want the below encoding format at the receiver side.

      Could you please help me? It’s very urgent, and we are using PI 7.0.

      –> Text file at receiver side

      –> Unicode encoding

      –> UTF-16 LE

      –> BOM (Byte Order Mark) at the start of the file (0xFF, 0xFE)

      –> Tab delimited

      –> Rows delimited by “\r\n”

      Can we generate the above file using PI 7.0?

      Please advise me.
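For what it's worth, the requested byte layout looks like this (a Python sketch with made-up field values; whether PI 7.0's file adapter can emit the BOM itself, or needs it prepended via the mapping as described in the blog, depends on your adapter configuration):

```python
import codecs
import os
import tempfile

# Made-up tab-delimited rows, terminated with CRLF as requested.
rows = [["Field1", "Field2"], ["value1", "value2"]]
body = "\r\n".join("\t".join(r) for r in rows) + "\r\n"

path = os.path.join(tempfile.gettempdir(), "utf16le_target.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE)        # 0xFF 0xFE -> UTF-16 little-endian
    f.write(body.encode("utf-16-le"))

with open(path, "rb") as f:
    print(f.read(2).hex())              # fffe
```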



  2. Satish Babu

    Hi All,

    Can anyone help me with the below requirement, please?

    We are using PI 7.0.

    –> Text file at receiver side

    –> Unicode encoding

    –> UTF-16 LE

    –> BOM (Byte Order Mark) at the start of the file (0xFF, 0xFE)

    –> Tab delimited

    –> Rows delimited by “\r\n”



  3. senthilprakash selvaraj Post author

    An update on my blog above.

    I have been noticing unique behavior on Unix-based file systems (specifically HP-UX).

    We found that we need to set the default encoding of the Unix file system (by default it is set to ANSI on HP-UX boxes).

    We were seeing that all the flat files placed as UTF-8 by PI on the file system were read as ANSI flat files by the application picking up the data from that location (in this case, the SAP MDM application).

    We asked the admin to change the default encoding of the Unix file system from ANSI to UTF-8 at the OS level, and the problem was solved immediately.

    So if anyone does not want to send a BOM as payload from PI, ask the receiver file system team to change the default encoding to UTF-8 at the OS file system level (if that is the standard used across your organisation, that is).
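The default encoding the receiving application sees can be checked from any process on that host; for example (the reported value depends on the LANG/LC_CTYPE locale settings the admin configures):

```python
import locale
import sys

# The encoding a typical process on this host assumes for text files;
# on a box configured for UTF-8 this reports "UTF-8" (case may vary).
print(locale.getpreferredencoding(False))
print(sys.getfilesystemencoding())
```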

