Jim Spath

Unicode 0.002

Like Finnegans Wake, I’m starting this story in the middle.  We’ve converted two huge SAP systems from MDMP to Unicode and are beginning work on the other huge R/3 system and a growing SCM system.

I want to outline ideas for a Unicode session during Community Day in Las Vegas:

  • What are proactive developers doing to prepare for Unicode conversion?
  • What do business content owners do when preparing to enter new markets?
  • How does IT best support the business with the FUD that comes with Unicode?

Later in the week at TechEd, I’ll be presenting on how Black & Decker is managing our Unicode conversions (session ID LCM300).

I’ll point you to SAP resources who have helped us:

Of course others from SAP may wander in.

One of my ASUG colleagues who will likewise present on Unicode is Atul Patankar.

As background, large SAP customer shops have been struggling with Unicode conversions since SAP set the direction to move to a single code page for all of their applications. Getting downtime for technology work is not trivial when multi-terabyte databases support global corporations 24×7. However, ASUG members including Microsoft, Dow Corning, Chevron and others have exchanged ideas over the past several years. I can provide references to the best practices we’ve learned.

A lot of these concepts will apply to technology changes such as database, operating system and chip family migrations.

So, what do you want from Community Day?  If you want heavy developer content, then we’ll have some tips & tricks ready.  If you want information life cycle management, the Basis, database and archiving gurus will gravitate.  If you’re a business process expert and just want to know what the heck this all means, we’ll have some 101 on hand.

And maybe some buttons. ASUG BITI We Make the Pieces Fit.

Updated 12-Feb-2008

All Unicode Episodes

  • 0.001 – [Destroyed by space pirates]
  • 0.002 – Community Day and Onward
  • 0.003 – Revenge of The Space Monster
  • 0.004 – Children of Murphy
  • 0.005 – Journey to the Edge of Quality
  • 0.006 – The Calm Before the Storm of Quality
  • 0.007 – Passage to Production
  • 0.008 – Unicode The Final Descent
  • 1.000 – The Final Chapter

      Srini Tanikella

      Thanks for the blog - I look forward to the session. Curious to get some statistics on the database growth, time it took to complete the project, team size and skill set of resources. We plan to embark on a Unicode conversion sometime next year.


      Jim Spath
      Blog Post Author

        I'll try to answer your first question ahead of time, as one of the most frequently asked Unicode questions is "won't my database double in size?". 
      This has been answered elsewhere, but here's a technical view, based on my DBA background. 

      Say we have existing data; for simplicity one row with 4 columns:

      BORN: 03-APR-1926
      DOGTAG: 7588654

      Different databases and versions store dates, text and numbers differently, but here's one possible layout:

      NAME - 8 bytes (variable length character)
      BORN - 3 bytes (24 bit integer)
      DOGTAG - 8 bytes (64 bit integer)
      RANK - 22 bytes (variable length character)

      So, the first issue is which columns need to be represented in different code pages?  The date and number columns do not, meaning that tables with lots of non-text fields won't grow much.
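To put rough numbers on that, here is a minimal sketch in Python. It uses the column sizes from the example row and assumes a worst case where every text column doubles in size and the date and number columns stay put. The doubling factor, the `row_size` helper and the `NUMERIC_HEAVY` table are illustrative assumptions, not measurements from our systems.

```python
# Column name -> (size in bytes, is it a text column?)
# Sizes for EXAMPLE_ROW come from the example layout above.
EXAMPLE_ROW = {
    "NAME": (8, True),
    "BORN": (3, False),
    "DOGTAG": (8, False),
    "RANK": (22, True),
}

# Hypothetical table made up mostly of numeric fields (assumption for illustration).
NUMERIC_HEAVY = {
    "ID": (8, False),
    "AMOUNT": (8, False),
    "QTY": (8, False),
    "CODE": (4, True),
}


def row_size(columns, text_factor=1.0):
    """Sum the column sizes, scaling only the text columns by text_factor."""
    return sum(
        size * text_factor if is_text else size
        for size, is_text in columns.values()
    )


# Assumed worst case: text columns double, everything else is unchanged.
before = row_size(EXAMPLE_ROW)
after = row_size(EXAMPLE_ROW, text_factor=2.0)
growth = after / before

numeric_growth = row_size(NUMERIC_HEAVY, text_factor=2.0) / row_size(NUMERIC_HEAVY)

print(f"example row: {before:.0f} -> {after:.0f} bytes ({growth:.2f}x)")
print(f"numeric-heavy row: {numeric_growth:.2f}x")
```

Even under that pessimistic assumption the example row goes from 41 to 71 bytes, about 1.7 times rather than double, and the mostly numeric table grows by only about 14 percent.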

      The next issue is character and field size.  The name field might be 80, 256 or more characters wide, but this example uses much less.  Then, we need to know if the database uses UTF-8 or UTF-16 encoding.  Single-byte character sets such as LATIN-1 and LATIN-2 store 8 bits per character.  In UTF-8, plain ASCII letters and digits still take 8 bits while accented characters (the German ü or the Czech č, for example) take 16, so a text column holding German, Italian and Czech translations ends up only slightly larger in Unicode than before.  In UTF-16, most characters take 16 bits, but again, only for the data actually stored.  Asian language characters require 16 bits (or more) per character in UTF-16, and typically 24 bits in UTF-8. Growth, then, will be based on your specific data content.
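If you want to see the per-character effect for yourself, the following Python sketch encodes a few sample words in a legacy code page, in UTF-8 and in UTF-16, then counts the bytes. The sample words and the choice of legacy code pages (Latin-1, Latin-2, Shift JIS) are my own assumptions for illustration; a database adds its own storage overhead on top of these raw counts.

```python
import unicodedata

# Sample word -> legacy single- or double-byte code page it would have used.
# Words and code pages are illustrative assumptions.
SAMPLES = {
    "Street": "latin-1",     # English, pure ASCII
    "Straße": "latin-1",     # German
    "Příliš": "iso8859-2",   # Czech (Latin-2)
    "東京都": "shift_jis",    # Japanese
}


def encoded_sizes(text, legacy_codec):
    """Return the byte length of text in the legacy codec, UTF-8 and UTF-16."""
    # Normalize so accented letters are single precomposed code points.
    text = unicodedata.normalize("NFC", text)
    return {
        "legacy": len(text.encode(legacy_codec)),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" leaves out the 2-byte byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
    }


sizes = {word: encoded_sizes(word, codec) for word, codec in SAMPLES.items()}

for word, result in sizes.items():
    print(word, result)
```

Pure ASCII text keeps its size in UTF-8 and doubles in UTF-16. The accented European words grow by a byte per accented letter in UTF-8, and the Japanese sample is the same size in UTF-16 as in Shift JIS but half again as large in UTF-8.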

      The last, and definitely not least, issue is reorganization.  If significant volumes of data have been deleted, whether through archiving or purging, and the database tables have not been reorganized, this space will be recovered during a Unicode conversion.  The DBAs will appreciate the opportunity.

      Looking past the Unicode data conversion, companies have seen different storage growth rates, depending on how many languages data is being recorded in.

      Hope this helps!

      Srini Tanikella
      Thank you. Definitely helps!