former_member207438

to Unicode or not to Unicode?


If you work for a company where all business is done in one language, English, chances are you will never come across Unicode.

The first time I read about Unicode was when, after a migration to ECC 6.0 (from a non-Unicode 4.7 system), some of the file interfaces stopped working.

The second time I came across Unicode was when I found the SAPlink code removing the Byte Order Mark character from the XML file. There must have been some reason for doing that, but I don't know what it was. See the details in the "SAPlink removing Byte Order Mark" section at the end of this post.

Now I was really puzzled and decided to research how well ABAP can work with XML and Unicode.

Research


My research goal was to develop an ABAP program that can:

  • capture a "Hello" message in different languages

  • save the messages in a Unicode XML file on the frontend

  • display the message from the Unicode XML file in a given language or display all messages


I knew that computers need some help to speak a language other than English. Remembering how I once "taught" MS-DOS to speak my language, I started to explore.

I found a very informative collection of articles about "Characters and encodings" by Jukka "Yucca" Korpela.

Jukka "Yucca" Korpela writes:
(...) a sequence of octets can be interpreted in a multitude of ways when processed as character data. By looking at the octet sequence only, you cannot even know whether each octet presents one character or just part of a two-octet presentation of a character, or something more complicated. Sometimes one can guess the encoding, but data processing and transfer shouldn't be guesswork.

Nicely said! That was it. To avoid "the guesswork" XML has the encoding declaration.

For example, <?xml version="1.0" encoding="UTF-8"?> declares that the document is encoded in UTF-8.
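Korpela's point about guesswork is easy to reproduce. The sketch below decodes the same two octets twice, once as UTF-8 and once as ISO-8859-1 (SAP code page 1100). The report name and variable names are made up for illustration, and it assumes a Unicode system where CL_ABAP_CONV_IN_CE is available.

```abap
REPORT z_octet_demo.

DATA: lo_conv    TYPE REF TO cl_abap_conv_in_ce,
      lv_octets  TYPE xstring,
      lv_as_utf8 TYPE string,
      lv_as_lat1 TYPE string,
      lv_len     TYPE i.

START-OF-SELECTION.
  " The same two octets, C3 A9, are used for both reads.
  lv_octets = 'C3A9'.

  " Interpreted as UTF-8 they form a single character (e with acute accent).
  lo_conv = cl_abap_conv_in_ce=>create( encoding = 'UTF-8' input = lv_octets ).
  lo_conv->read( IMPORTING data = lv_as_utf8 ).
  lv_len = strlen( lv_as_utf8 ).
  WRITE: / 'UTF-8     :', lv_as_utf8, lv_len.

  " Interpreted as ISO-8859-1 (SAP code page 1100) they are two characters.
  lo_conv = cl_abap_conv_in_ce=>create( encoding = '1100' input = lv_octets ).
  lo_conv->read( IMPORTING data = lv_as_lat1 ).
  lv_len = strlen( lv_as_lat1 ).
  WRITE: / 'ISO-8859-1:', lv_as_lat1, lv_len.
```

Nothing in the octets themselves says which reading is the intended one, which is exactly the gap the encoding declaration fills.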

The Byte Order Mark plays an important role too. The XML 1.0 specification says:
Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark (...) This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors MUST be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.
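To illustrate what "encoding signature" means in practice, here is a minimal sketch that inspects the leading octets of a binary buffer and compares them with the Byte Order Mark constants from CL_ABAP_CHAR_UTILITIES. The report name, variable names, and sample data are assumptions for illustration; a real XML processor does considerably more than this.

```abap
REPORT z_bom_demo.

DATA: lv_data     TYPE xstring,
      lv_head2    TYPE x LENGTH 2,
      lv_head3    TYPE x LENGTH 3,
      lv_encoding TYPE string.

START-OF-SELECTION.
  " Sample input: the UTF-8 BOM (EF BB BF) followed by the octets of <a/>.
  lv_data = 'EFBBBF3C612F3E'.

  " Copy the leading octets. Input shorter than the target is padded
  " with 00, which cannot match any of the BOM constants.
  lv_head2 = lv_data.
  lv_head3 = lv_data.

  IF lv_head3 = cl_abap_char_utilities=>byte_order_mark_utf8.
    lv_encoding = `UTF-8`.
  ELSEIF lv_head2 = cl_abap_char_utilities=>byte_order_mark_big.
    lv_encoding = `UTF-16, big endian`.
  ELSEIF lv_head2 = cl_abap_char_utilities=>byte_order_mark_little.
    lv_encoding = `UTF-16, little endian`.
  ELSE.
    lv_encoding = `no BOM, rely on the encoding declaration`.
  ENDIF.

  WRITE: / 'Detected:', lv_encoding.
```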

The question I needed to answer now was: Is it possible to open an XML file as binary and let ABAP XML processor figure out what encoding to use?

After some coding the answer I found was: Yes.

This program shows how to load and display an XML file. You can download a test UTF-8 or UTF-16 file or use one of your own.

Note that the exception handling is not done properly in the code below.
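A minimal sketch of this approach, not the original program, might look as follows. It uploads the file with filetype 'BIN' so that no code page conversion happens, and passes the raw octets to the iXML parser, which works out the encoding on its own. The report name, the parameter P_FILE, and all variable names are made up for illustration, and error handling is kept to a bare minimum.

```abap
REPORT z_xml_load_demo.

TYPES ty_line TYPE x LENGTH 255.

DATA: lt_bin     TYPE STANDARD TABLE OF ty_line WITH DEFAULT KEY,
      lv_size    TYPE i,
      lv_xml     TYPE xstring,
      lv_root    TYPE string,
      lo_ixml    TYPE REF TO if_ixml,
      lo_factory TYPE REF TO if_ixml_stream_factory,
      lo_istream TYPE REF TO if_ixml_istream,
      lo_doc     TYPE REF TO if_ixml_document,
      lo_parser  TYPE REF TO if_ixml_parser,
      lo_element TYPE REF TO if_ixml_element.

PARAMETERS p_file TYPE string LOWER CASE OBLIGATORY.

START-OF-SELECTION.
  " 1. Read the file from the frontend as raw octets.
  cl_gui_frontend_services=>gui_upload(
    EXPORTING
      filename   = p_file
      filetype   = 'BIN'
    IMPORTING
      filelength = lv_size
    CHANGING
      data_tab   = lt_bin
    EXCEPTIONS
      OTHERS     = 1 ).
  IF sy-subrc <> 0.
    WRITE / 'File could not be read'.
    RETURN.
  ENDIF.

  " 2. Join the table lines into one xstring of exactly lv_size octets.
  CALL FUNCTION 'SCMS_BINARY_TO_XSTRING'
    EXPORTING
      input_length = lv_size
    IMPORTING
      buffer       = lv_xml
    TABLES
      binary_tab   = lt_bin
    EXCEPTIONS
      failed       = 1
      OTHERS       = 2.
  IF sy-subrc <> 0.
    WRITE / 'Binary data could not be converted'.
    RETURN.
  ENDIF.

  " 3. Parse the octets. No encoding is passed in: the parser takes it
  "    from the Byte Order Mark and the XML encoding declaration.
  lo_ixml    = cl_ixml=>create( ).
  lo_factory = lo_ixml->create_stream_factory( ).
  lo_istream = lo_factory->create_istream_xstring( string = lv_xml ).
  lo_doc     = lo_ixml->create_document( ).
  lo_parser  = lo_ixml->create_parser( stream_factory = lo_factory
                                       istream        = lo_istream
                                       document       = lo_doc ).
  IF lo_parser->parse( ) <> 0.
    WRITE / 'XML could not be parsed'.
    RETURN.
  ENDIF.

  " 4. Show that the document was understood.
  lo_element = lo_doc->get_root_element( ).
  lv_root    = lo_element->get_name( ).
  WRITE: / 'Root element:', lv_root.
```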


Say "Hello" program




The Say "Hello" program prompts you to enter a "Hello" message in your language.
Select "Update" for your message to be added to the file.
You can download a sample XML file in UTF-8 or UTF-16 format.

You can download a SAPlink installation of the program and related utility classes.

XML utility class


I have developed a class, ZCL_EVP_XML_UTILS, that has two methods, LOAD_IXML_DOC_FROM_FILE and SAVE_IXML_DOC_TO_FILE, to load XML files from and save them to the frontend.

When you save the XML you have to specify the encoding. When you load the XML you do not, because the parser determines the encoding from the Byte Order Mark and the XML encoding declaration.
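The saving side can be sketched with plain iXML calls. This is not the code of ZCL_EVP_XML_UTILS, whose method signatures are not shown here. It is a hedged illustration of where the encoding has to be supplied when a document is rendered. The report name, the parameter P_FILE, the element name, and all variable names are assumptions.

```abap
REPORT z_xml_save_demo.

TYPES ty_line TYPE x LENGTH 255.

DATA: lo_ixml     TYPE REF TO if_ixml,
      lo_factory  TYPE REF TO if_ixml_stream_factory,
      lo_doc      TYPE REF TO if_ixml_document,
      lo_encoding TYPE REF TO if_ixml_encoding,
      lo_ostream  TYPE REF TO if_ixml_ostream,
      lo_renderer TYPE REF TO if_ixml_renderer,
      lv_xml      TYPE xstring,
      lv_size     TYPE i,
      lt_bin      TYPE STANDARD TABLE OF ty_line WITH DEFAULT KEY.

PARAMETERS p_file TYPE string LOWER CASE OBLIGATORY.

START-OF-SELECTION.
  " Build a tiny document with a single root element.
  lo_ixml    = cl_ixml=>create( ).
  lo_factory = lo_ixml->create_stream_factory( ).
  lo_doc     = lo_ixml->create_document( ).
  lo_doc->create_simple_element( name = 'messages' parent = lo_doc ).

  " The encoding must be chosen explicitly when rendering.
  " The byte order value is not relevant for UTF-8.
  lo_encoding = lo_ixml->create_encoding( character_set = 'UTF-8'
                                          byte_order    = 0 ).
  lo_doc->set_encoding( lo_encoding ).

  " Render into an xstring so that the octets are written unchanged.
  lo_ostream = lo_factory->create_ostream_xstring( string = lv_xml ).
  lo_ostream->set_encoding( encoding = lo_encoding ).
  lo_renderer = lo_ixml->create_renderer( document = lo_doc
                                          ostream  = lo_ostream ).
  lo_renderer->render( ).

  " Split the xstring into table lines and download it as binary.
  CALL FUNCTION 'SCMS_XSTRING_TO_BINARY'
    EXPORTING
      buffer        = lv_xml
    IMPORTING
      output_length = lv_size
    TABLES
      binary_tab    = lt_bin.

  cl_gui_frontend_services=>gui_download(
    EXPORTING
      bin_filesize = lv_size
      filename     = p_file
      filetype     = 'BIN'
    CHANGING
      data_tab     = lt_bin
    EXCEPTIONS
      OTHERS       = 1 ).
  IF sy-subrc <> 0.
    WRITE / 'File could not be written'.
  ENDIF.
```

Downloading with filetype 'BIN' matters here: a text download would let the frontend services convert the characters again and could undo the encoding chosen for the renderer.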

Source Code


The live version of the Say "Hello" program and the XML utility class is hosted here in the Google Code project under SVN. The Say "Hello" program is in the test directory.

Unicode!


Going back to my question from the beginning: to Unicode or not to Unicode?

A fellow ABAP-er asked me a few days ago "How many bytes are in that string?". I replied back with a question (not polite, I know ;-)): "What encoding are you using?"
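The counter-question is not just pedantry, as a small sketch shows. It converts one string to UTF-8 and to UTF-16 little endian (SAP code page 4103) and counts the octets. The report name, variable names, and sample text are made up for illustration, and a Unicode system is assumed.

```abap
REPORT z_byte_count_demo.

DATA: lo_conv  TYPE REF TO cl_abap_conv_out_ce,
      lv_text  TYPE string,
      lv_utf8  TYPE xstring,
      lv_utf16 TYPE xstring,
      lv_chars TYPE i,
      lv_b8    TYPE i,
      lv_b16   TYPE i.

START-OF-SELECTION.
  " Five characters, two of them outside ASCII.
  lv_text  = `Grüße`.
  lv_chars = strlen( lv_text ).

  " UTF-8: the ASCII letters take 1 octet each, ü and ß take 2 each.
  lo_conv = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
  lo_conv->convert( EXPORTING data = lv_text IMPORTING buffer = lv_utf8 ).
  lv_b8 = xstrlen( lv_utf8 ).

  " UTF-16 little endian (SAP code page 4103): 2 octets per character here.
  lo_conv = cl_abap_conv_out_ce=>create( encoding = '4103' ).
  lo_conv->convert( EXPORTING data = lv_text IMPORTING buffer = lv_utf16 ).
  lv_b16 = xstrlen( lv_utf16 ).

  WRITE: / 'Characters     :', lv_chars,   " 5
         / 'Octets (UTF-8) :', lv_b8,      " 7
         / 'Octets (UTF-16):', lv_b16.     " 10
```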

After learning a lot about Unicode I really liked its universal concept.

My answer is definitely: to Unicode!




SAPlink removing Byte Order Mark


Here is what I found in revision 280 of the ZSAPLINK class:

Method CONVERTIXMLDOCTOSTRING shifts the XML string left until its first character is '<', which removes a leading Byte Order Mark:
  while _tempString(1) <> '<'.
    shift _tempString left by 1 places.
  endwhile.
