As middleware, SAP PI / PO integrates SAP and non-SAP systems, which use different formats (text such as XML or CSV, or binary) to represent data. Sometimes they even encode text differently or use different code pages. This document helps you understand and handle those situations.

    A code page is a table that assigns a number to each character. For example, ‘A’ is 65, ‘a’ is 97, ‘b’ is 98 and so on.

          HTML versions of the screenshots below are attached (please rename .txt to .html): ASCII, ISO 8859-1, CP-1252 and Unicode.

0 Code page.PNG
Code page.gif Unicode.PNG

    ‘A’ is 65. 65 = 100 0001 (64*1 + 32*0 + 16*0 + 8*0 + 4*0 + 2*0 + 1*1). Representing the code-page number in 0’s and 1’s is encoding.

    100 0001 is 65. Look up 65 in the code page: it is ‘A’. Looking up the code-page number is decoding.
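The encode and decode steps described above can be sketched in a few lines of Java. This is only an illustration; the class and method names (CodePointBits, toBits, fromBits) are made up for this example and are not part of any SAP API.

```java
public class CodePointBits {
    // Encode: character -> code-page number -> string of 0s and 1s.
    static String toBits(char c) {
        return Integer.toBinaryString(c);
    }

    // Decode: string of 0s and 1s -> code-page number -> character.
    static char fromBits(String bits) {
        return (char) Integer.parseInt(bits, 2);
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');            // 65
        System.out.println(toBits('A'));          // 1000001
        System.out.println(fromBits("1000001"));  // A
    }
}
```

Note that 65 needs seven binary digits (1000001), which is why 7-bit ASCII can hold it.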

    Some encodings are fixed length, for example ASCII, ISO 8859-1, cp1252 and UTF-32. ISO 8859-1 and cp1252 use 1 byte to represent the code-page number. ASCII also uses 1 byte (it actually uses only 7 bits; the first bit is ignored). UTF-32 uses 4 bytes.

    Some encodings are variable length, for example UTF-8 and UTF-16. UTF-8 starts with 1 byte; if the code-page number is too big to be represented in 1 byte, it uses 2, 3 or 4 bytes. UTF-16 starts with 2 bytes and uses 4 bytes if needed (i.e., 2 bytes or 4 bytes).
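To see the variable lengths in practice, the following sketch counts the bytes that a few sample characters occupy in UTF-8 and UTF-16. The class name EncodedLength and the helper byteLength are hypothetical, chosen only for this illustration; UTF-16BE is used so that Java does not add a BOM to the count.

```java
import java.nio.charset.Charset;

public class EncodedLength {
    // Number of bytes the given text occupies in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // A (U+0041), e-acute (U+00E9), Euro sign (U+20AC), an emoji (U+1F600).
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.println("UTF-8=" + byteLength(s, "UTF-8")
                    + " UTF-16BE=" + byteLength(s, "UTF-16BE"));
        }
    }
}
```

The four samples take 1, 2, 3 and 4 bytes in UTF-8, and 2, 2, 2 and 4 bytes in UTF-16.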

UTF-8: UTF-8 is the preferred encoding on the internet. XML and JSON are encoded in UTF-8 by default, and it is the recommended encoding for HTML.

To understand UTF-8, BOM and endianness, see: Characters, Symbols and the Unicode Miracle – Computerphile – YouTube; Characters in a computer – Unicode Tutorial UTF-8 – YouTube.

    Byte Order Mark (BOM): It is a heads-up notice to the target system about the encoding. Some Microsoft Windows applications require a BOM to properly decode UTF text. This is how the BOM works. If we are sending UTF-8 encoded text, we prefix the text stream with the bytes EF BB BF (hex). The target system reads these bytes and understands: “This text stream starts with EF BB BF, so the text must be UTF-8 and I should use UTF-8 decoding logic.” It will not display EF BB BF. If we are sending UTF-16 Big-Endian, we prefix the text stream with FE FF (hex). The target system reads these bytes and understands: “This text stream starts with FE FF, so the text must be UTF-16 BE.”

    If the target program does not understand the BOM, i.e., it sees EF BB BF (hex) at the start of the text stream and is not programmed to handle it, it may interpret those bytes as the cp1252 characters ï»¿. If you see an error or a display starting with ï»¿ OR þÿ OR ÿþ, the target program is not decoding the data properly.
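A receiver that does understand the BOM typically checks the first few bytes of the stream. The following is a minimal sketch of that check for the three BOMs mentioned in this document; BomSniffer and detect are hypothetical names, and the sketch deliberately ignores the UTF-32 BOMs.

```java
public class BomSniffer {
    // Returns the encoding announced by a leading BOM, or null when there is none.
    static String detect(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        byte[] withoutBom = {0x41};
        System.out.println(detect(withBom));    // UTF-8
        System.out.println(detect(withoutBom)); // null
    }
}
```

The `& 0xFF` mask is needed because Java bytes are signed, so 0xEF would otherwise compare as a negative number.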


BOM.gif

    To test whether the source system, PI/PO and the target system are using the proper encoding, you can ask the source system to send the Euro sign € in one of the data elements. If the target system does not decode € properly, there is an issue with the code page / encoding.

Notepad.gif

Why is the Euro sign € displayed as â‚¬?

€ -> U+20AC (hex) -> 0010 0000 1010 1100 -> 11100010 10000010 10101100 (UTF-8) -> E2 82 AC -> â‚¬ (when the three bytes are decoded as cp1252)
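The chain above can be reproduced in Java by encoding the Euro sign as UTF-8 and then deliberately decoding the bytes with the wrong code page. This is a sketch for illustration only; EuroMojibake, misread and hex are hypothetical names, and "windows-1252" is the Java charset name assumed here for cp1252.

```java
import java.nio.charset.Charset;

public class EuroMojibake {
    // Encode text as UTF-8, then (wrongly) decode the bytes as cp1252.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Render bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(hex(euro.getBytes(Charset.forName("UTF-8")))); // E2 82 AC
        // Three characters: U+00E2, U+201A, U+00AC.
        System.out.println(misread(euro));
    }
}
```

Decoding the same three bytes as UTF-8 gives back the single character U+20AC, which is what a correctly configured target system does.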

Please go through How to Work with Character Encodings in Process Integration.

Here are some points to note from the above document.

    When reading XML, SAP recommends setting “File Type” to ‘Binary’, because the XML prolog already carries the encoding details: <?xml version="1.0" encoding="utf-8"?>. See SAP note 821267.

    You can use the adapter modules below to change the encoding.

    MessageTransformBean: Transform.ContentType = text/xml;charset="cp1252"

    TextCodepageConversionBean: Conversion.charset = "utf-8"

    XMLAnonymizerBean: anonymizer.encoding = "utf-8"

    FYI: cp1252 is a superset of ASCII and of the printable characters of ISO 8859-1. Unicode (and therefore UTF-8) covers every cp1252 character, but the byte values and the number of bytes used may differ.

Let’s handle the issues mentioned in sections 5 and 6 of How to Work with Character Encodings in Process Integration.

1) Java mapping to change code-page/encoding. Supported Encodings.


package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class ChangeEncoding_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
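            //Read the whole input in chunks; available() may report fewer bytes than the payload holds.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            //Decode input as cp1252 and write output as UTF-8.
            String inS = new String(buffer.toByteArray(), "Cp1252");
            outputStream.write(inS.getBytes("UTF-8"));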
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}

Result:

1JavaMapping.PNG

2) Java mapping to handle Quoted-Printable input.


package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class QuotedPrintable_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Decode quoted-printable input to raw bytes. Add JAX-WS library when compiling (needed for javax.mail).
            inputStream = javax.mail.internet.MimeUtility.decode(inputStream, "quoted-printable");
            //Copy decoded content to output in chunks; available() may under-report the payload size.
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                outputStream.write(chunk, 0, n);
            }
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}

Result:

2JavaMapping.PNG

3) Java mapping to handle Base64 input.


package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class Base64_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
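            //Decode Base64 input content to output content. FYI: Java 8 and later have java.util.Base64.
            //sun.misc.BASE64Decoder is an internal class, so javax.xml.bind.DatatypeConverter is used here instead.
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            //Base64 text is plain ASCII, so decode the bytes as US-ASCII before parsing.
            byte[] b = javax.xml.bind.DatatypeConverter.parseBase64Binary(new String(buffer.toByteArray(), "US-ASCII"));
            outputStream.write(b);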
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}

Result:

3JavaMapping.PNG

4) Java mapping to add BOM.


package com.map;
import com.sap.aii.mapping.api.*;
import java.io.*;
public class BOM_JavaMapping extends AbstractTransformation {
    @Override
    public void transform(TransformationInput transformationInput, TransformationOutput transformationOutput) throws StreamTransformationException {
        try {
            InputStream inputStream = transformationInput.getInputPayload().getInputStream();
            OutputStream outputStream = transformationOutput.getOutputPayload().getOutputStream();
            //Prefix BOM. For UTF-8 use 0xEF,0xBB,0xBF. For UTF-16BE use 0xFE,0xFF. For UTF-16LE use 0xFF,0xFE.
            outputStream.write(0xEF);  outputStream.write(0xBB);  outputStream.write(0xBF);
            //Copy input content to output content in chunks; available() may under-report the payload size.
            byte[] chunk = new byte[8192];
            int n;
            while ((n = inputStream.read(chunk)) != -1) {
                outputStream.write(chunk, 0, n);
            }
        } catch (Exception ex) {
            getTrace().addDebugMessage(ex.getMessage());
            throw new StreamTransformationException(ex.toString());
        }
    }
}

Result: BOM characters will not be displayed.

4JavaMapping.PNG

5) Java mapping to handle XML Escape Sequence.

Not well-formed XML – & issue

FYI…How to create Java mapping.

How to create Java Mapping in SAP PI / PO


9 Comments


  1. Eng Swee Yeoh

    Hi Raghu

    Thanks for sharing this 😉

    A couple of feedback:

    – Can you provide links in a reference section to the different code-pages shown in the animated GIF? It’s a bit hard if anyone wants to view the details of a particular codepage as the animation keeps on switching images.

    – It is not recommended to use the Sun Base64 decoder. Neither can we use Java 8’s Base64 library (yet!) as the latest PI versions are still using JVM 6.1. Refer to the following thread for the different options for handling Base64.

    Base64 Encoding using UDF

    Rgds

    Eng Swee

    1. Raghu Vamseedhar Reddy KadipiReddy Post author

      Eng,

      Thank you.

        – I have used GIF to explain the concept better and save space. I would recommend taking a screenshot, when constant picture is required. Code-pages are available on wiki, ASCII, ISO 8859-1, CP-1252, Unicode.

        – I agree, sun.misc.BASE64Decoder is internal class.

      Readers can use byte[] b = javax.xml.blind.DatatypeConverter().decodeBuffer(inputStream); OR byte[] b = new  byte[inputStream.available()]; inputSteam.read(b); b = javax.xml.bind.DatatypeConverter.parseBase64Binary(b.toString());

      FYI: this option did not give me the correct result.

      Document is updated with your suggestions.

      1. Eng Swee Yeoh

        Hi Raghu

        Thanks for the update. Yes, I understand that it is possible to take a screenshot, but it would have been nicer to have the links – just my two cents 😉

        Hmm.. not sure why javax.xml.bind.DatatypeConverter.parseBase64Binary is not giving you the correct results. I do have some interfaces running on PO7.4 using both the parse and print methods to handle Base64 decode/encode and they are working fine. Anyway, there are a few options around so readers can choose whichever works on their system 😉

        Rgds

        Eng Swee

    1. Praveen B

      Hi Raghu,

      Thanks for sharing such useful information.

      I currently have the same requirement: to add a BOM to the file. As you described, I installed NWDS and imported the jar file into the ESR in PI. On the first attempt it worked and I could see the BOM added to the file. But when I tested the following day with one more file, it was not working. I checked that all the settings are normal. I deleted the Java mapping, created a new one with new names and imported it again, but the BOM is still not added to the file. Could you please help me with this?

      Thanks

      Praveen

  2. RAVIJEET DAS

    Hi Raghu,

    I am getting below error in RECEIVER REST adapter audit logs, not sure what is causing this error. I believe it is some invalid character in the response. I am trying to do a JSON to XML conversion in response.


    Information Server returned code: 200

    MP: exception caught with cause java.lang.RuntimeException:

    com.ctc.wstx.exc.WstxIOException: Invalid white space character (0x8) in text to output (in xml 1.1, could output as a character entity)

    Exception caught by adapter framework: com.ctc.wstx.exc.WstxIOException: Invalid white space character (0x8) in text to output (in xml 1.1, could output as a character entity)

    Transmitting the message using connection JPR failed, due to: com.sap.engine.interfaces.messaging.api.exception.MessagingException: java.lang.RuntimeException: com.ctc.wstx.exc.WstxIOException: Invalid white space character (0x8) in text to output (in xml 1.1, could output as a character entity)

    As this message is getting stuck in the receiver REST adapter on the synchronous response side, I believe I would need to either write an adapter module or change the code page to ISO Latin / UTF-16?

    I tried setting the data format of the response JSON to cp1252, but I still get the same error.


    Thx

    Ravijeet

  3. Shivduttsinh Mahida

    Hi Raghu,

     

    I have a requirement to generate a .txt file from PI encoded in UTF-8 with BOM. I implemented example 4 suggested here and added it to the operation mapping after my usual message mapping, but it still doesn’t work. Could you please let me know what I am doing wrong here?

    Thanks

    Shiv

     
