I often see queries from users saying that although they have configured the file encoding as UTF-8 in the receiver channel, the file in the receiver folder shows up with ANSI encoding. What is happening?
Well, my answer is that PI is very much correct: it places the file in the encoding specified in the receiver channel. It is a known issue with how Windows-based text editors (Notepad, Notepad++, etc.) work out the encoding of a file they open.
Let us try out an example: an XML to flat file scenario with the encoding set to UTF-8, and see how Notepad++ interprets the encoding 😉.
Source: XML File
Target: flat file with File Encoding set as “UTF-8”
Scenario 1: The data is sent across without any BOM characters in it. (What a BOM is will be explained below, be patient 😉.)
1. The receiver channel converts the source XML into a text file with UTF-8 encoding without any issues.
2. But when the received flat file is opened in Notepad++, you notice that it reports the encoding as ANSI.
Scenario 2: The same file is sent across with a BOM character in it (ï»¿). BOM characters cannot be represented using ANSI.
1. The same source XML file is used, with a BOM character introduced in it.
2. The receiver channel writes the file with UTF-8 encoding in the same way as it did in Scenario 1.
3. But if you look closely, Notepad++ now interprets the file as UTF-8 BOM instead of ANSI as it did earlier. Why this inconsistent behavior from Notepad++?
Why and how is this happening?
The problem lies in the applications that open the text file, such as Notepad and Notepad++. They WRONGLY report the file as ANSI, so we assume the file is encoded in ANSI instead of UTF-8 and conclude that the PI channel encoding is not working properly 😉.
It all depends on the algorithm that applications like Notepad use to open a text file and interpret its encoding.
What is BOM: The Byte Order Mark is a short byte sequence at the very start of a text file that tells the application which encoding to use when opening the file.
For more information on BOM ref: http://en.wikipedia.org/wiki/Byte_order_mark
These BOM characters are not usually sent across in the flat file, and almost all the time the flat file does not have any BOM in it as data. Even the Unicode standard does not recommend a BOM for UTF-8 text files.
If no BOM is sent (as data), then the file carries no information about which encoding to use to open it.
Because of this, Notepad or Notepad++ does not know which encoding to use while opening the file and has to guess.
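To see why the bytes alone give no hint, here is a small Python sketch. The sample payloads are made up for illustration, and `cp1252` stands in for "ANSI" on the assumption of a Western European Windows system.

```python
# Hypothetical payloads; cp1252 is assumed to be the "ANSI" code page here.
ascii_text = "ORDER;4711;OPEN"
accented_text = "ORDER;4711;Müller"

# ASCII-only content: both encodings produce exactly the same bytes,
# so nothing in the file says "I am UTF-8".
same_for_ascii = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")

# Non-ASCII content: "ü" takes two bytes in UTF-8 but one byte in cp1252.
utf8_bytes = accented_text.encode("utf-8")
ansi_bytes = accented_text.encode("cp1252")

print(same_for_ascii)                     # True
print(utf8_bytes == ansi_bytes)           # False
print(len(utf8_bytes), len(ansi_bytes))   # 18 17
```

So for a purely ASCII payload, a UTF-8 file and an ANSI file are byte-for-byte the same file.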
Default behavior of Notepad and similar apps: if the file looks like valid UTF-8, open it with UTF-8 encoding; otherwise open it with 8-bit ANSI encoding.
Query: Sometimes the file sent to the receiver folder shows as UTF-8 and sometimes as ANSI when opened with Notepad. Why is that?
A. Well, the answer is to check the payload content. When the file shows as UTF-8, it contains characters outside the 7-bit ASCII range (values 0 to 127). UTF-8 stores those as multi-byte sequences, and hence the application opens the file with UTF-8 encoding.
If all the content of the file fits in plain ASCII, the bytes are identical in UTF-8 and ANSI, and the application opens the file with ANSI encoding by default. It depends on the file content and on the internal algorithm of the application that opens the file.
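The behavior described above can be imitated in a few lines of Python. This is only a simplified sketch of the idea, not the actual algorithm of Notepad or Notepad++; the function name `guess_label` and the returned labels are my own assumptions.

```python
import codecs

def guess_label(data: bytes) -> str:
    """Return the encoding label an editor might show (simplified assumption)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-BOM"      # explicit signature, no guessing needed
    if all(b < 0x80 for b in data):
        return "ANSI"           # pure ASCII: same bytes in UTF-8 and ANSI
    try:
        data.decode("utf-8")    # strict decode fails on malformed sequences
        return "UTF-8"
    except UnicodeDecodeError:
        return "ANSI"

print(guess_label(b"ORDER;4711;OPEN"))             # ANSI
print(guess_label("Müller".encode("utf-8")))       # UTF-8
print(guess_label("Müller".encode("cp1252")))      # ANSI
print(guess_label(codecs.BOM_UTF8 + b"ORDER"))     # UTF-8-BOM
```

The first call is the case that confuses people: the bytes are perfectly valid UTF-8, yet the label says ANSI.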
But rest assured, the file placed in the receiver folder by PI is in the encoding specified in the receiver channel, as shown in the adapter engine log.
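If you would rather check the bytes yourself than trust an editor's status bar, a strict decode plus a quick hex dump is enough. A minimal sketch: the helper names and the temporary file standing in for the receiver file are hypothetical.

```python
import os
import tempfile

def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes are well-formed UTF-8 (ASCII-only data also passes)."""
    try:
        data.decode("utf-8")    # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False

def first_bytes_hex(path: str, count: int = 16) -> str:
    """Hex dump of the first few bytes, useful for spotting a BOM."""
    with open(path, "rb") as f:
        return f.read(count).hex(" ").upper()

# Demo: a temporary file stands in for the file written by the receiver channel.
demo_path = os.path.join(tempfile.mkdtemp(), "receiver_output.txt")
with open(demo_path, "wb") as f:
    f.write("Müller;4711\r\n".encode("utf-8"))

with open(demo_path, "rb") as f:
    payload = f.read()

print(is_valid_utf8(payload))          # True
print(first_bytes_hex(demo_path, 4))   # 4D C3 BC 6C
```

The `C3 BC` pair is the two-byte UTF-8 form of "ü"; an ANSI file would show a single `FC` there, and that byte would make the strict decode fail.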
If the business is adamant that the file encoding must show as UTF-8 when they open the file, then pass the BOM character with the data (as header information) to the receiver, using a dummy field.
ï»¿ (0xEF, 0xBB, 0xBF): this BOM indicates that the encoding is UTF-8.
- There are different BOM representations for each encoding (UTF-8, UTF-16, UTF-32, etc.); see the wiki link above.
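For a quick look at those byte sequences without leaving your desk, Python's standard `codecs` module exposes them as constants. The sketch below just prints them, and also shows where the "ï»¿" characters come from, again assuming `cp1252` as the ANSI code page.

```python
import codecs

# BOM byte sequences as defined in Python's standard codecs module.
boms = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16 LE": codecs.BOM_UTF16_LE,
    "UTF-16 BE": codecs.BOM_UTF16_BE,
    "UTF-32 LE": codecs.BOM_UTF32_LE,
    "UTF-32 BE": codecs.BOM_UTF32_BE,
}

for name, bom in boms.items():
    print(f"{name:10} {bom.hex(' ').upper()}")

# The UTF-8 BOM bytes, misread as cp1252 ("ANSI"), show up as three characters.
bom_as_ansi = codecs.BOM_UTF8.decode("cp1252")
print(bom_as_ansi)   # ï»¿
```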
The BOM implicitly specifies which encoding to use while opening the file. Also inform the receiving business to ignore or handle this field at their end while processing.
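What "ignore or handle this field" means on the receiving side can be shown with a short Python sketch. It is not PI configuration; it only illustrates the bytes involved, using Python's `utf-8-sig` codec and a made-up file name and record.

```python
import os
import tempfile

# Hypothetical output file and record, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "out_with_bom.txt")

# "utf-8-sig" writes the three BOM bytes first, then the UTF-8 data.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    f.write("ORDER;4711;OPEN\r\n")

with open(path, "rb") as f:
    raw = f.read()
print(raw[:3] == b"\xef\xbb\xbf")      # True: the file starts with the BOM

# A receiver that reads with "utf-8-sig" gets the data without the BOM.
with open(path, encoding="utf-8-sig", newline="") as f:
    handled = f.read()

# A receiver that reads with plain "utf-8" sees U+FEFF as the first character
# and has to strip it explicitly.
with open(path, encoding="utf-8", newline="") as f:
    unhandled = f.read()

print(handled.startswith("ORDER"))     # True
print(unhandled.startswith("\ufeff"))  # True
```

A receiver that does not strip the BOM would end up with an invisible extra character glued to the first field, which is exactly why they need to be told about it.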
Note: the problem described above only occurs when we deal with Windows-based systems. There is not much we can do about it apart from controlling CRLF 🙂.
-Thoughts and suggestions appreciated.
Senthilprakash Selvaraj SAP PI Flextronics