Saying “Hello” in different languages or Unicode a...

former_member207438 · ‎02-09-2010

To Unicode or not to Unicode?

If you work for a company where all business is done in one language - English - chances are you will never come across Unicode.

The first time I started reading about Unicode was when, after a migration to ECC 6.0 (from a non-Unicode 4.7), some of the file interfaces stopped working.

The second time I came across Unicode was when I found the SAPlink code removing the Byte Order Mark character from the XML file. There must have been some reason for doing that, but I don't know what it was. See the details in the section "SAPlink removing Byte Order Mark" at the end of this post.

Now I was really puzzled and decided to research how well ABAP can work with XML and Unicode.

Research

My research goal was to develop an ABAP program that can:

capture a "Hello" message in different languages

save the messages in a Unicode XML file on the frontend

display the message from the Unicode XML file in a given language or display all messages

I knew that computers need some help to speak a language other than English. Remembering how I once "taught" MS-DOS to speak my language, I started to explore.

I found a very informative collection of articles about "Characters and encodings" by Jukka "Yucca" Korpela.

Jukka "Yucca" Korpela writes

(...) a sequence of octets can be interpreted in a multitude of ways when processed as character data. By looking at the octet sequence only, you cannot even know whether each octet presents one character or just part of a two-octet presentation of a character, or something more complicated. Sometimes one can guess the encoding, but data processing and transfer shouldn't be guesswork.

Nicely said! That was it. To avoid "the guesswork", XML has the encoding declaration.

For example, a UTF-8 file can start with the declaration <?xml version="1.0" encoding="utf-8"?>.

The Byte Order Mark plays an important role too:

Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark (...) This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors MUST be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents.
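To make the "encoding signature" idea concrete, here is a minimal sketch of how the Byte Order Mark could be recognized in the raw bytes of a file. The report name and the sample bytes are hypothetical. This is only an illustration of the signature bytes; it is not how the ABAP XML processor is implemented.

```abap
report z_bom_sketch.

" Byte Order Mark signatures as raw bytes
constants lc_bom_utf8     type x length 3 value 'EFBBBF'.
constants lc_bom_utf16_be type x length 2 value 'FEFF'.
constants lc_bom_utf16_le type x length 2 value 'FFFE'.

data lv_xml      type xstring.
data lv_len      type i.
data lv_encoding type string.

" Hypothetical sample content: UTF-8 BOM followed by the bytes of '<a/>'
lv_xml = 'EFBBBF3C612F3E'.
lv_len = xstrlen( lv_xml ).

" Compare the leading bytes with the known signatures
if lv_len >= 3 and lv_xml(3) = lc_bom_utf8.
  lv_encoding = 'UTF-8'.
elseif lv_len >= 2 and lv_xml(2) = lc_bom_utf16_be.
  lv_encoding = 'UTF-16, big endian'.
elseif lv_len >= 2 and lv_xml(2) = lc_bom_utf16_le.
  lv_encoding = 'UTF-16, little endian'.
else.
  lv_encoding = 'no Byte Order Mark found'.
endif.

write: / lv_encoding.
```

A file without a Byte Order Mark is not necessarily broken: UTF-8 files may omit it, and then the encoding declaration is what the parser has to rely on.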

The question I needed to answer now was: is it possible to open an XML file as binary and let the ABAP XML processor figure out which encoding to use?

After some coding the answer I found was: Yes.

This program shows how to load and display an XML file. You can download a test UTF-8 or UTF-16 file or use one of your own.

Note that the exception handling is not done properly in the code below.

data lt_xml type dcxmllines.
data lv_size type i.
data lv_filename type string.
data lo_ixml_stream_factory type ref to if_ixml_stream_factory.
data lo_input_stream type ref to if_ixml_istream.
data lo_ixml type ref to if_ixml.
data lo_parser type ref to if_ixml_parser.
data lo_ixml_doc type ref to if_ixml_document.
data lo_xml_doc type ref to cl_xml_document.

lv_filename = 'c:\temp\message-UTF-8.xml'. "#EC NOTEXT

" Upload the file as binary so that no character conversion takes place
call method cl_gui_frontend_services=>gui_upload
  exporting
    filename   = lv_filename
    filetype   = 'BIN'
  importing
    filelength = lv_size
  changing
    data_tab   = lt_xml
  exceptions
    others     = 19.

lo_ixml = cl_ixml=>create( ).
lo_ixml_stream_factory = lo_ixml->create_stream_factory( ).
lo_ixml_doc = lo_ixml->create_document( ).

" Wrap the binary table in an input stream
lo_input_stream = lo_ixml_stream_factory->create_istream_itable(
  table = lt_xml
  size  = lv_size ).

" The parser works out the encoding from the file content
lo_parser = lo_ixml->create_parser(
  stream_factory = lo_ixml_stream_factory
  istream        = lo_input_stream
  document       = lo_ixml_doc ).
lo_parser->parse( ).
lo_input_stream->close( ).

" Display the parsed document
create object lo_xml_doc.
lo_xml_doc->create_with_dom( lo_ixml_doc ).
lo_xml_doc->display( ).

Say "Hello" program

The Say "Hello" program prompts you to enter a "Hello" message in your language.
Select "Update" for your message to be added to the file.
You can download a sample XML file in UTF-8 or UTF-16 format.

You can download a SAPlink installation of the program and related utility classes.

XML utility class

I have developed a class, ZCL_EVP_XML_UTILS, that has two methods, LOAD_IXML_DOC_FROM_FILE and SAVE_IXML_DOC_TO_FILE, to load XML files from the frontend and save them to the frontend.

When you save the XML, you have to specify the encoding. When you load the XML, you do not need to specify it, because the XML encoding declaration defines it.
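To show what "specifying the encoding" can look like with the iXML library, here is a minimal sketch that renders a small document into a byte string in UTF-8. It is not the code of SAVE_IXML_DOC_TO_FILE. The report name and the one-element document are hypothetical, and the sketch stops before the download to the frontend.

```abap
report z_render_sketch.

data lo_ixml           type ref to if_ixml.
data lo_doc            type ref to if_ixml_document.
data lo_element        type ref to if_ixml_element.
data lo_stream_factory type ref to if_ixml_stream_factory.
data lo_encoding       type ref to if_ixml_encoding.
data lo_ostream        type ref to if_ixml_ostream.
data lo_renderer       type ref to if_ixml_renderer.
data lv_xml            type xstring.
data lv_size           type i.

lo_ixml = cl_ixml=>create( ).
lo_doc  = lo_ixml->create_document( ).

" Hypothetical one-element document, just to have something to render
lo_element = lo_doc->create_simple_element(
  name   = 'message'
  value  = 'Hello'
  parent = lo_doc ).

" When writing, the encoding has to be chosen explicitly
lo_encoding = lo_ixml->create_encoding(
  byte_order    = 0
  character_set = 'UTF-8' ).
lo_doc->set_encoding( lo_encoding ).

" Render into a byte string using the same encoding
lo_stream_factory = lo_ixml->create_stream_factory( ).
lo_ostream = lo_stream_factory->create_ostream_xstring( string = lv_xml ).
lo_ostream->set_encoding( encoding = lo_encoding ).

lo_renderer = lo_ixml->create_renderer(
  document = lo_doc
  ostream  = lo_ostream ).
lo_renderer->render( ).

lv_size = xstrlen( lv_xml ).
write: / 'Rendered bytes:', lv_size.
```

The resulting byte string would still have to be converted to a binary table and downloaded with file type 'BIN', so that the bytes reach the file unchanged.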

Source Code

The live version of the Say "Hello" program and the XML utility class is hosted here in the Google Code project under SVN. The Say "Hello" program is in the test directory.

Unicode!

Going back to my question from the beginning: to Unicode or not to Unicode?

A fellow ABAP-er asked me a few days ago "How many bytes are in that string?". I replied back with a question (not polite, I know ;-)): "What encoding are you using?"
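A short sketch of why the counter-question matters: the same text takes a different number of bytes depending on the encoding. The report name and sample text are hypothetical. The sketch assumes that CL_ABAP_CONV_OUT_CE accepts 'UTF-8' and the SAP code page '4103' (UTF-16, little endian) as encoding values.

```abap
report z_bytes_sketch.

data lv_text  type string.
data lo_conv  type ref to cl_abap_conv_out_ce.
data lv_bytes type xstring.
data lv_len   type i.

lv_text = 'Hello'.

" The text converted to UTF-8
lo_conv = cl_abap_conv_out_ce=>create( encoding = 'UTF-8' ).
lo_conv->convert(
  exporting data   = lv_text
  importing buffer = lv_bytes ).
lv_len = xstrlen( lv_bytes ).
write: / 'UTF-8 bytes: ', lv_len.

" The same text converted to UTF-16, little endian
lo_conv = cl_abap_conv_out_ce=>create( encoding = '4103' ).
lo_conv->convert(
  exporting data   = lv_text
  importing buffer = lv_bytes ).
lv_len = xstrlen( lv_bytes ).
write: / 'UTF-16 bytes:', lv_len.
```

For an ASCII-only text such as 'Hello', UTF-8 needs one byte per character and UTF-16 needs two, so the two counts should differ by a factor of two. For other scripts the ratio changes.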

After learning a lot about Unicode I really liked its universal concept.

My answer is definitely: to Unicode!

SAPlink removing Byte Order Mark

Here is what I found in revision 280 of the ZSAPLINK class:

Method CONVERTIXMLDOCTOSTRING strips every character before the first '<' of the XML, which removes the Byte Order Mark:

  while _tempString(1) <> '<'.
    shift _tempString left by 1 places.
  endwhile.