
I have come across a few people, including myself, who got confused by single-byte codepages vs. multi-byte codepages. Going through some materials on SDN and the web, I came across a few useful links.

While I was researching codepages, I kept running into a term called Unicode. There are a lot of good resources on Unicode and Unicode conversions on the web, and I have linked some of them below. To simplify things for myself, I think of Unicode as a mixture of multiple languages in one character set.

 

  • For more detailed information on Unicode

You may want to have a look at:
    - What Unicode is, from Unicode.org: http://www.unicode.org/standard/WhatIsUnicode.html
    - SAP’s Unicode Technology: http://service.sap.com/unicode (SMP logon required)
    - Unicode Technology at SDN: Overview of Unicode (SDN logon required)

 

  • Some references on Unicode conversion, and blogs

    - Unicode conversion: Overview of Unicode -> Unicode Conversion – Upgrade and conversion (SDN logon required)
    - Blog: Jim Spath’s Unicode – Episode series: /people/jim.spath/blog/2008/02/12/unicode–episode-1000-the-final-chapter-updated-18-feb-2008
    - Blog: Martin Riedel’s SAP Upgrades: When Should my Organization Convert to Unicode?

 

  

Before Unicode.

Codepages

In the SAP language world there is something called a codepage. My simplified understanding is that a codepage corresponds to a set of languages, e.g. English, French, Japanese, etc. Strictly speaking, it is a table that maps characters to byte values. For instance, a Latin-1 codepage can certainly handle English and most Latin-based languages, but it cannot handle non-Latin languages like Japanese. Another example is a Japanese codepage, which can handle Japanese and also English, but not the accented letters of other European languages such as French or German.

I can see similarities between the characters of some European languages, and interestingly enough, there are similarities among the characters of Asian languages as well. Somehow, these related languages have been grouped together into the same codepages.

Going back to single-byte vs. multi-byte codepages: alphabet-based languages are usually encoded in single-byte codepages, while the relatively complex-looking Asian languages are encoded in multi-byte codepages.

They are named this way because of how the characters are stored. Every character of an alphabet-based codepage fits into a box called a byte, but the characters of certain Asian languages do not fit into a box of the same size. For instance, all English letters fit into one box (byte), so English uses a single-byte codepage. With a multi-byte codepage like Japanese, on the other hand, the characters do not fit into one box, but they do fit into two boxes of the same size (2 bytes: double-byte).

Before my strange analogy of a byte as a box misleads you, let me give you some short background on bytes. A byte is a collection of 8 bits. A bit, the smallest unit of electronic storage, can represent only two values, 1 and 0 (on and off), and a combination of eight 1s and 0s represents one character in a single-byte codepage. Oh, why 1s and 0s? Believe it or not, in the digital world there is only on and off. I thought computers were smarter... well, not to worry: smart people came up with the idea of giving those bunches of 1s and 0s a meaning by setting up conventions called code sets to represent recognizable information, e.g. ASCII, Shift-JIS, EBCDIC, etc.

Just for fun: in an ASCII-based Latin codepage, [Hello] can be represented as [0100100001100101011011000110110001101111] in bits and stored in 5 bytes [H=01001000 e=01100101 l=01101100 l=01101100 o=01101111].
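If you want to check these bit patterns yourself, here is a small Python sketch. It works outside SAP and uses Python's standard `ascii` codec as a stand-in for an ASCII-based codepage. Note that uppercase H is 01001000; the pattern 01101000 belongs to lowercase h.

```python
# Print the 8-bit pattern of each character of "Hello" in ASCII,
# where every character takes exactly one byte.
text = "Hello"
encoded = text.encode("ascii")  # bytes object, one byte per character

for char, byte in zip(text, encoded):
    # Iterating over a bytes object yields integers 0..255
    print(f"{char} = {byte:08b}")

# All bits in a row, and the total storage in bytes
bits = "".join(f"{byte:08b}" for byte in encoded)
print(bits)
print(len(encoded), "bytes")
```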

For those who are good with numbers: a byte can represent at most 256 (2 to the power of 8) different values... hmm, in some East Asian primary schools, students need to learn at least 3,000 Chinese characters. 256 is not going to be enough.

How about doubling the bytes? In theory, 2 bytes can represent at most 65,536 (2 to the power of 16) different values. So codepages that represent each character as a combination of sixteen 1s and 0s were introduced as double-byte codepages.

In short:
  • Single-byte – e.g. English, German, etc.: 1 byte per character
  • Multi-byte – e.g. Japanese, Korean, etc.: two or more bytes per character
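To see this difference outside SAP, here is a minimal Python sketch. It assumes Python's standard `latin-1` and `shift_jis` codecs as stand-ins for a Latin-1 codepage and a Japanese codepage. These are Python codec names, not SAP codepage numbers, and the helper `encoded_size` is a hypothetical name chosen for illustration.

```python
from typing import Optional

# Sketch: bytes per character in a single-byte vs. a multi-byte codepage.
# "latin-1" and "shift_jis" are Python codec names standing in for codepages.

def encoded_size(text: str, codec: str) -> Optional[int]:
    """Return how many bytes `text` needs in `codec`, or None if it cannot be encoded."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None

# One byte has 2**8 = 256 possible values; two bytes have 2**16 = 65536.
print("1 byte:", 2 ** 8, "values; 2 bytes:", 2 ** 16, "values")

samples = {"English": "Hello", "German": "Grüße", "Japanese": "日本語"}
for language, text in samples.items():
    for codec in ("latin-1", "shift_jis"):
        size = encoded_size(text, codec)
        result = f"{size} bytes" if size is not None else "cannot be encoded"
        print(f"{language:<9}{codec:<10}{len(text)} chars -> {result}")
```

The Latin texts should take one byte per character in Latin-1, while each Japanese character takes two bytes in Shift-JIS. The output also shows which text/codepage combinations cannot be encoded at all.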

MDMP

In the beginning, SAP could only speak one codepage. But of course, the need to handle more than one language grew over time as more and more companies had to deal with more languages, e.g. I want to use German and Japanese on my SAP system for local legal requirements. For that, SAP came up with MDMP (Multi-Display, Multi-Processing). MDMP is a way to teach SAP more than one codepage, whether single-byte or multi-byte.

With MDMP, SAP could speak more than one language. This was quite successful, but with some limitations. To give you an idea: if you talk in German to someone who only understands Japanese, it is never going to work, and it can even cause misunderstandings. MDMP was quite similar: you had to make sure you were talking in the correct language when you talked to an MDMP SAP system. And of course, it was very hard to teach an old dog new tricks... installing MDMP was a pain.
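Here is a Python analogy for the "wrong language" problem. It is not SAP's actual MDMP mechanism: text written with one codepage and read with another turns into garbage. The codecs `shift_jis` and `cp1252` are assumptions standing in for a Japanese and a Western European codepage.

```python
# Sketch: the same bytes read with two different codepages.
# Python codecs stand in for SAP codepages here.

name = "山田"                         # a Japanese family name
stored = name.encode("shift_jis")     # written using a Japanese codepage

right = stored.decode("shift_jis")                 # reader uses the same codepage
wrong = stored.decode("cp1252", errors="replace")  # reader assumes a Western codepage

print("same codepage: ", right)
print("wrong codepage:", wrong)
```

The bytes themselves never change. Only the codepage used to interpret them decides whether the reader sees the name or garbage.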

The slide deck from Hideki Yamamoto demonstrates this very well, including some common error cases: Session ID: IM101 Dealing with Multi-Language Garbage? Data – Lessons Learned http://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502d4e05-2111-2a10-9e9c-a9951a3110ac (SDN logon required)

After Unicode...

OK, so Unicode is a mixture of most of the world's languages, encoded with multiple bytes per character. If you are interested in how those different languages are arranged in Unicode, you might want to have a look at these charts: http://www.unicode.org/charts/

In fact, in SAP systems it is very similar to having MDMP within one codepage:
  • It can display all languages.
  • Technically, all SAP-supported languages are already enabled.
      - A few parameter toggles and the import of translations will enable a language.
 

Unicode can talk in a lot of languages, and communicating in different languages is very easy in a Unicode-enabled SAP system. The merit of a Unicode-enabled SAP system is that it speaks (displays) all available languages correctly, regardless of the language I am speaking (the language I use to log on to SAP). Under MDMP, on the other hand, I could only speak (display) one specific language and would get my reply only in that particular language, ignoring the original language of the contents. E.g., if a name was entered in Japanese but I log on to the same SAP system in Russian, the Japanese name will not be displayed properly. Hideki Yamamoto's slides above also demonstrate this well.
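The difference can be sketched in Python. A Unicode string can hold a German, a Japanese and a Russian name side by side, whereas a single legacy codepage cannot hold all three. UTF-8 and UTF-16 are two common Unicode encodings, and `latin-1` and `shift_jis` stand in for single codepages. The sample names and the helper `round_trips` are made up for illustration.

```python
# Sketch: one Unicode string holding several scripts at once, compared with
# legacy single codepages. Codec names are Python's, chosen for illustration.

mixed = "Müller / 山田 / Иванов"   # German, Japanese and Russian names

def round_trips(text: str, codec: str) -> bool:
    """True if `text` survives encoding to `codec` and decoding back unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeEncodeError:
        return False

for codec in ("utf-8", "utf-16-le", "latin-1", "shift_jis"):
    print(f"{codec:<10} round trip ok: {round_trips(mixed, codec)}")
```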

Happily ever after??? 

Well, all the fuss about Unicode sounded really good, and it is, in fact, quite good, until data communication scenarios between Unicode and non-Unicode systems came into my life, i.e. how a Unicode system talks to others in the ABAP world:
  • Unicode <-> MDMP
  • Unicode <-> Single-byte single codepage
  • Unicode <-> Multi-byte single codepage
 

To start with, you can have a look at the TechEd material by Alexander Davidenkoff, SPC203 Integration Between Heterogeneous SAP Unicode and Third Party Systems: http://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/60a35b0b-de0a-2a10-4ea5-dd82e6bbd74d (SDN logon required)

This becomes a challenge in certain landscapes with SAP instances on old releases. For instance, a multi-byte single-codepage BW 3.1 is upgraded to BW 7.0 while it is still connected to a multi-byte single-codepage R/3 4.6C. Note that BW 7.0 has to be Unicode, so the connection multi-byte single codepage <-> multi-byte single codepage becomes Unicode <-> multi-byte single codepage.

Recent releases of the NetWeaver platform come with all the software components required for the data communication scenarios mentioned above. However, for some old components it is very important to check whether the sender and the receiver are ready to communicate with each other. This applies especially to systems with multi-byte single codepages. Logically, such a system is treated as a single-codepage system, but at the technical (byte) level it shares a characteristic with Unicode: one character can take more than one byte. An old Basis system does not know how to handle different codepages, and in the worst case, the receiving end does not know how to read what it has just received.

In my case, I had problems transferring data from R/3 4.6 to BW 7.0. I thought that when a sender specifies its RFC codepage, a Unicode receiver would translate the sender's codepage to Unicode and store the data in its database. This actually works well with single-byte single-codepage data. With multi-byte single-codepage data, however, there were some difficulties because of the character length (two or more bytes).
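Here is a Python sketch of why character length in bytes matters. It is not SAP's RFC logic. It only assumes a hypothetical receiver field measured in bytes (`FIELD_BYTES = 5`) and shows what happens when a 6-byte Shift-JIS string is cut to fit. A naive cut splits the last character, while a character-aware cut stops at a character boundary. `naive_cut` and `safe_cut` are hypothetical helper names.

```python
# Sketch: cutting multi-byte data to a fixed byte length.
# FIELD_BYTES is a hypothetical field length, not an SAP setting.
FIELD_BYTES = 5

def naive_cut(text: str, codec: str, limit: int) -> bytes:
    """Encode, then chop at `limit` bytes, possibly in the middle of a character."""
    return text.encode(codec)[:limit]

def safe_cut(text: str, codec: str, limit: int) -> bytes:
    """Encode character by character and stop before a character would be split.
    Assumes a stateless codec such as shift_jis (not UTF-16 with a BOM)."""
    out = b""
    for ch in text:
        piece = ch.encode(codec)
        if len(out) + len(piece) > limit:
            break
        out += piece
    return out

text = "日本語"  # 3 characters, 6 bytes in Shift-JIS
naive = naive_cut(text, "shift_jis", FIELD_BYTES)
safe = safe_cut(text, "shift_jis", FIELD_BYTES)

try:
    naive.decode("shift_jis")
except UnicodeDecodeError as err:
    # The last byte is the first half of a two-byte character
    print("naive cut cannot be decoded:", err.reason)

print("safe cut:", safe.decode("shift_jis"))
```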

Later I found out that the sender of a multi-byte data transfer has to be on a certain technical release to send its data correctly. In my particular case, the multi-byte sender needed SAP NetWeaver WAS 6.20 or above.

If you have similar data transfer scenarios, such as Unicode <-> MDMP or Unicode <-> multi-byte single codepage, and the sending system with multi-byte single-codepage data is on a release lower than SAP NetWeaver WAS 6.20, you might want to check out the following notes:
  • Note 647495 – RFC for Unicode ./. non-Unicode Connections
  • Note 547444 – RFC Enhancement for Unicode ./. non-Unicode Connections
  • Note 752835 – Usage of the file interfaces in Unicode systems
  • Note 745030 – MDMP – Unicode Interfaces: Solution Overview

You can also ask for advice at globalization@sap.com.

Of course, consulting your local SAP Consulting organization would give you a more comprehensive assessment of your situation, as well as project-based solutions.


1 Comment


  1. Christine Puccio
    Great article about Unicode.  I wanted to add in a few notes myself about System Performance.  As of January 1st 2009, five major changes have to be considered when looking at SAP SD benchmark results (see SAP benchmark Update):

    -all new benchmarks have to use Unicode
    -average response time may not exceed 1 second (previous limit was 2 seconds)
    -the new general ledger is used
    -credit limit checking functionality has to be activated
    -new benchmarks are based on SAP ERP 6.0 enhancement pack 4 (EHP4)

    Unicode imposes 10-30% overhead (depending on specific platform) while almost 10% additional resources may be required for the new release and use of the new general ledger. Credit limit checking impact seems to be pretty low (around 2 %).
    Overall this leads to significantly lower SAPS values for a given system / CPU. 

    When looking at systems and benchmark results, you should look to see if the vendor is using Unicode vs. Non Unicode when running the benchmark.  Sun has been running all SD 2-Tier benchmarks with Unicode before this rule was in place.  Take a look at our wiki site to understand more about the SAP SD 2 Tier Benchmark: http://wikis.sun.com/display/SAPonSun/Understanding+a+SAP+SD+2-tier+benchmark+result

    For a look at Sun’s benchmark results on all our latest systems for SAP go to sun.com/sap/platform

