Technology Blogs by Members
shayanmajumder

Introduction

When working with SAP CPI, a consultant often has to send various types of files to an external SFTP server. More often than not, however, simply defining the file type through the file extension is not enough: even when the extension is correct, the receiving software may not interpret the file's contents correctly. It therefore becomes important to explicitly set (hardcode) the file's encoding before the file is sent to the SFTP server.

A Byte Order Mark (BOM) is a sequence of bytes at the beginning of a text file that signals the file's encoding and, for multibyte encodings such as UTF-16 and UTF-32, its byte order (endianness). In every encoding the BOM is the same Unicode character, U+FEFF; only its byte representation differs. BOMs are also sometimes referred to as magic numbers, i.e. specific bytes at the beginning of a file that identify it as a certain file type. Magic numbers are also known as file signatures and can help a system identify a file even without a file extension.
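Because a BOM sits at a known position, a program can detect it by comparing the first bytes of a file against the known signatures (the same byte sequences listed in the BOM table later in this post). The Groovy sketch below shows the idea; `detectBomEncoding` is a hypothetical helper, not part of SAP CPI or any library.

```groovy
// Hypothetical helper (not an SAP CPI or library API): returns the encoding
// indicated by a leading BOM, or null if none of the listed signatures matches.
String detectBomEncoding(byte[] data) {
    // Groovy map literals keep insertion order. UTF-32LE (FF FE 00 00) must be
    // checked before UTF-16LE (FF FE), because the latter is a prefix of the former.
    def signatures = [
        'UTF-32BE': [0x00, 0x00, 0xFE, 0xFF],
        'UTF-32LE': [0xFF, 0xFE, 0x00, 0x00],
        'UTF-8'   : [0xEF, 0xBB, 0xBF],
        'UTF-16BE': [0xFE, 0xFF],
        'UTF-16LE': [0xFF, 0xFE]
    ]
    // Find the first signature whose bytes match the start of the data
    def match = signatures.find { name, sig ->
        data.length >= sig.size() &&
            (0..<sig.size()).every { i -> (data[i] & 0xFF) == sig[i] }
    }
    return match?.key
}

println detectBomEncoding([0xEF, 0xBB, 0xBF, 0x41] as byte[])   // UTF-8
println detectBomEncoding([0xFF, 0xFE, 0x41, 0x00] as byte[])   // UTF-16LE
println detectBomEncoding('plain text'.bytes)                     // null
```

The order of the checks is the main design point: without it, a UTF-32 little-endian file would be misreported as UTF-16 little-endian.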

Advantages of using a byte order mark include:

  1. The BOM plays a crucial role in identifying the character encoding of a text file, especially within Unicode. Because several Unicode encodings coexist, such as UTF-8, UTF-16 and UTF-32, the BOM distinguishes between them, which is particularly useful for anyone actively working with these different Unicode encodings.
  2. The BOM indicates the byte order for encodings such as UTF-16 and UTF-32, where each code unit spans several bytes and the order of those bytes (endianness) matters. The BOM's byte sequence shows whether the most significant byte comes first (big-endian) or the least significant byte comes first (little-endian).
  3. Including a BOM can improve compatibility, particularly in environments where different systems or software might interpret text files differently; a BOM helps ensure that programs which support it decode the text file correctly.
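To illustrate the byte-order point, the Groovy sketch below (assuming a standard JVM, which is what CPI Groovy scripts run on) decodes the same text stored in both byte orders with Java's `UTF-16` charset. That charset uses a leading BOM to choose the byte order and falls back to big-endian when no BOM is present.

```groovy
import java.nio.charset.StandardCharsets

// The text "Hi" stored once big-endian and once little-endian,
// each preceded by the matching UTF-16 BOM (FE FF or FF FE)
byte[] bigEndian    = [0xFE, 0xFF, 0x00, 0x48, 0x00, 0x69] as byte[]
byte[] littleEndian = [0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00] as byte[]

// Java's "UTF-16" charset reads the BOM to pick the byte order and does not
// include the BOM in the decoded string
def utf16 = StandardCharsets.UTF_16
println new String(bigEndian, utf16)      // Hi
println new String(littleEndian, utf16)   // Hi

// Without a BOM the decoder falls back to big-endian, so the little-endian bytes
// are misread as the characters U+4800 and U+6900 instead of "Hi"
byte[] noBom = [0x48, 0x00, 0x69, 0x00] as byte[]
def misread = new String(noBom, utf16)
println(misread == 'Hi')                  // false
```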

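Different types of byte order marks

In every case the BOM is the single character U+FEFF (written \uFEFF in a Java or Groovy string); the encoding used to write the file determines its bytes.

| Encoding | Representation (hexadecimal) | Unicode string format |
|---|---|---|
| UTF-8 | EF BB BF | \uFEFF |
| UTF-16, big-endian | FE FF | \uFEFF |
| UTF-16, little-endian | FF FE | \uFEFF |
| UTF-32, big-endian | 00 00 FE FF | \uFEFF |
| UTF-32, little-endian | FF FE 00 00 | \uFEFF |
| UTF-7 | 2B 2F 76 followed by 38, 39, 2B or 2F | \uFEFF (appears in the file as the ASCII text +/v8, +/v9, +/v+ or +/v/) |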
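The byte column of the table can be reproduced by encoding the single character U+FEFF with each charset. A minimal Groovy sketch, assuming a standard JVM; UTF-7 is left out because Java does not ship a UTF-7 charset by default.

```groovy
import java.nio.charset.Charset

// Formats a byte array as space-separated uppercase hex, e.g. "EF BB BF"
def hex = { byte[] bytes -> bytes.collect { String.format('%02X', it & 0xFF) }.join(' ') }

// Encode the single character U+FEFF with each charset. The explicit -BE/-LE
// names are used because Java's plain "UTF-16" encoder adds a BOM of its own.
['UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-32BE', 'UTF-32LE'].each { name ->
    println "${name.padRight(9)} ${hex('\uFEFF'.getBytes(Charset.forName(name)))}"
}

// Output:
// UTF-8     EF BB BF
// UTF-16BE  FE FF
// UTF-16LE  FF FE
// UTF-32BE  00 00 FE FF
// UTF-32LE  FF FE 00 00
```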

A comprehensive list of all file magic numbers can be found here.

Using BOM in SAP CPI Groovy Script

Suppose we need to send a simple CSV file containing some Chinese characters to an SFTP server. If we send it without a BOM and open it in Excel, the output is displayed as:

[Screenshot: the CSV file without a BOM, opened in Excel]

However, when we prepend a BOM using the following code, the output is as follows:

 

def csvString = "Name,Age,City,ChineseText\nJohn,30,Beijing,你好世界"
// Prepend U+FEFF; when the string is written out as UTF-8 it becomes the UTF-8 BOM bytes EF BB BF
csvString = "\uFEFF" + csvString

 

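[Screenshot: the same CSV file with a UTF-8 BOM, opened in Excel]

This shows how explicitly adding a BOM can help when dealing with non-Latin characters and specific encodings: Excel uses the BOM to recognize the CSV file as UTF-8, whereas without it Excel typically falls back to the system's default code page and misinterprets the Chinese characters.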
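Prepending \uFEFF only yields the bytes EF BB BF if the string is later converted to bytes as UTF-8; if it is converted with a charset that cannot represent U+FEFF, such as ISO-8859-1, Java substitutes a `?` instead. One way to avoid depending on a later conversion is to produce the UTF-8 bytes explicitly. The sketch below is an illustration under assumptions: `toUtf8WithBom` is a hypothetical helper, and in an actual CPI script the resulting bytes would typically be passed to `message.setBody(...)` inside `processData`, which is omitted here so the sketch runs on its own.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: returns the UTF-8 bytes of `text`, prefixed with the
// UTF-8 BOM (EF BB BF). If the text already starts with U+FEFF, no second BOM is added.
byte[] toUtf8WithBom(String text) {
    String withBom = text.startsWith('\uFEFF') ? text : '\uFEFF' + text
    return withBom.getBytes(StandardCharsets.UTF_8)
}

def csvString = "Name,Age,City,ChineseText\nJohn,30,Beijing,你好世界"
byte[] payload = toUtf8WithBom(csvString)

// The first three bytes are the BOM
println payload[0..2].collect { String.format('%02X', it & 0xFF) }.join(' ')   // EF BB BF

// For comparison: ISO-8859-1 cannot represent U+FEFF, so Java substitutes '?' (0x3F)
println String.format('%02X', '\uFEFF'.getBytes(StandardCharsets.ISO_8859_1)[0])   // 3F
```

The `startsWith` check is a defensive choice: if an earlier step already added a BOM, the helper does not add a second one.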

Use of directional marks in case of bidirectional text

Certain integrations involve bidirectional text, that is, a mix of left-to-right (LTR) and right-to-left (RTL) text. These may require Unicode directional formatting characters, which control the direction in which each part of the text is displayed. Like the BOM, these are invisible characters, but they are not byte order marks. A good example of such an integration is the Hilan interface, which mixes Hebrew and English characters; without such markers, viewing the data correctly becomes very difficult.
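As a sketch of how such markers might be inserted in a Groovy script, the example below wraps an English fragment in LRE ... PDF (code points as listed in the table that follows) so that it is laid out as one left-to-right block inside Hebrew text. Without the embedding, trailing punctuation such as the closing parenthesis may take the paragraph's right-to-left direction and be displayed on the wrong side. The helper `embedLtr` and the sample values are hypothetical and not taken from the Hilan interface.

```groovy
// Directional formatting characters
def LRE = '\u202A'   // Left-to-Right Embedding
def PDF = '\u202C'   // Pop Directional Formatting

// Hypothetical helper: wraps an LTR fragment in LRE ... PDF so that it is
// laid out as a single left-to-right block inside right-to-left text
def embedLtr = { String fragment -> LRE + fragment + PDF }

// Hypothetical record: Hebrew label "name" (U+05E9 U+05DD) followed by an English value
String hebrewLabel = '\u05E9\u05DD'
String record = hebrewLabel + ': ' + embedLtr('John Smith (ID 1234)')

// Print the code points, since the markers themselves are invisible
println record.collect { String.format('U+%04X', (int) it.charAt(0)) }.join(' ')
```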

| Direction | Unicode character | Description |
|---|---|---|
| Left-to-Right Mark (LRM) | \u200E | An invisible character with left-to-right direction; signals left-to-right text. |
| Right-to-Left Mark (RLM) | \u200F | An invisible character with right-to-left direction; signals right-to-left text. |
| Pop Directional Formatting (PDF) | \u202C | Terminates the most recent embedding or override by popping the last direction setting. |
| Left-to-Right Embedding (LRE) | \u202A | Treats the following text, up to the matching PDF, as an embedded left-to-right block. |
| Right-to-Left Embedding (RLE) | \u202B | Treats the following text, up to the matching PDF, as an embedded right-to-left block. |
| Left-to-Right Override (LRO) | \u202D | Forces the enclosed text to be displayed left-to-right, overriding the inherent direction of its characters. |
| Right-to-Left Override (RLO) | \u202E | Forces the enclosed text to be displayed right-to-left, overriding the inherent direction of its characters. |
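Since these characters are invisible, it can help to make them visible when checking a payload, similar to a "show all characters" view. A minimal Groovy sketch; `reveal` and the tag names are illustrative assumptions:

```groovy
// Tags for the directional formatting characters listed in the table above
def BIDI_NAMES = [
    '\u200E': 'LRM', '\u200F': 'RLM',
    '\u202A': 'LRE', '\u202B': 'RLE', '\u202C': 'PDF',
    '\u202D': 'LRO', '\u202E': 'RLO'
]

// Hypothetical helper: replaces each invisible directional character with a
// visible tag such as <RLE> and leaves all other characters unchanged
def reveal = { String text ->
    text.collect { ch -> BIDI_NAMES.containsKey(ch) ? "<${BIDI_NAMES[ch]}>".toString() : ch }.join('')
}

println reveal('\u202BABC\u202C 123')   // <RLE>ABC<PDF> 123
```

Each marker still counts as one character (for example, '\u202BABC\u202C'.length() is 5), which matters for fixed-width formats.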

Conclusion

While a BOM can be beneficial in certain situations, it is not always required or preferred. In some cases, such as HTTP responses and scripts, a BOM can cause unexpected issues; for example, a BOM in front of a Unix shebang line (#!) prevents the script's interpreter line from being recognized. It is therefore important to evaluate the requirements and compatibility of the systems and tools involved before deciding to add a BOM to text files. A BOM should also be used carefully with fixed-width files, because it adds extra bytes at the start of the file (three bytes for UTF-8) that can shift the field positions expected by the interpreting software.
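The fixed-width problem can be illustrated with a hypothetical layout in which the first five characters of a record hold an ID and the next ten hold a name. Java's UTF-8 decoder does not remove a BOM, so after decoding it remains as the character U+FEFF and shifts every position in the first record by one. A minimal Groovy sketch; the layout and the `stripBom` helper are assumptions for illustration:

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical fixed-width record: columns 0-4 = ID, 5-14 = name, preceded by a UTF-8 BOM
byte[] fileBytes = ('\uFEFF' + '00042John      ').getBytes(StandardCharsets.UTF_8)

// Java's UTF-8 decoder keeps the BOM as the character U+FEFF
String firstLine = new String(fileBytes, StandardCharsets.UTF_8)
println firstLine.substring(0, 5)             // BOM + "0004": the ID field is shifted by one

// Hypothetical helper: drop a leading BOM before parsing fixed positions
String stripBom(String s) {
    return s.startsWith('\uFEFF') ? s.substring(1) : s
}

println stripBom(firstLine).substring(0, 5)   // 00042
```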
