As part of my preparation for teaching a SAPUI5 workshop in Israel, I was asked to provide some explanation on how SAPUI5 handles the topic of right-to-left languages.  Since I’ve never had to look at that subject before, I’ve done some digging and asking around, and here’s what I’ve found.

First we must understand how a browser handles the direction in which characters should be printed.  This could be right-to-left, left-to-right or some mixture of both (known as bi-directional).


Once we understand how browsers handle this topic, then the way SAPUI5 handles the subject will make a lot more sense (and take far less time to explain).


Since I am not a Hebrew speaker, I trust that Google Translate is telling me the truth when it says that “Hello World!” is “Shalom Olam!” or !שלום עולם


Text Direction in Browsers

There are two things to understand here:

  1. The order in which words are displayed to form a sentence
  2. The order in which the characters within a contiguous text string are displayed to form one or more words

Any browser that supports the use of Unicode characters must implement the Unicode bidirectional algorithm.  As end-users, we don’t have to care too much about how this algorithm works internally, but we do need to understand how it behaves.  See section 8.2 of the W3C’s HTML 4 specification on language information and text direction: http://www.w3.org/TR/html4/struct/dirlang.html

The behaviour of this algorithm can be loosely summarised in the following way:

  1. All HTML pages contain text that belongs to a base or default language.
  2. If there is any ambiguity about text directionality within an HTML page, then the dir="rtl|ltr" attribute should be used on a block element (usually the <html> tag) to specify the predominant directionality.
  3. The default value for the dir attribute is ltr (left-to-right).
  4. The dir attribute is inherited by any nested elements within the current block; therefore if it is applied to the <html> element, you have defined the default directionality for the entire web page.
  5. The inherited dir value may be overridden by specifying a new value on a nested block.
  6. Here’s the important bit: once the browser has determined the predominant text directionality, it lays out all text in that block using this direction as the base, irrespective of the language to which that text belongs!
  7. If a contiguous string of one or more characters is found that has the opposite directionality, then that string is automatically displayed in its own direction, so its character order appears reversed relative to the surrounding text.
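
The inheritance rules in points 3 to 5 can be sketched in a few lines of JavaScript.  This is only a toy model: the node objects and the effectiveDir function are hypothetical names invented for illustration, and real browsers also handle cases (such as dir="auto") that are ignored here.

```javascript
// Toy model of dir inheritance. Each node is a plain object with an optional
// "dir" property and a "parent" link; these names are illustrative only.
function effectiveDir(node) {
  for (let n = node; n; n = n.parent) {
    if (n.dir === "rtl" || n.dir === "ltr") {
      return n.dir; // the nearest explicit value wins
    }
  }
  return "ltr"; // nothing specified anywhere: the default is left-to-right
}

// A hypothetical page: <html dir="rtl"> containing two paragraphs.
const html = { tag: "html", dir: "rtl", parent: null };
const body = { tag: "body", parent: html };
const english = { tag: "p", dir: "ltr", parent: body };
const hebrew = { tag: "p", parent: body };

console.log(effectiveDir(hebrew));  // value inherited from <html>
console.log(effectiveDir(english)); // value overridden on the paragraph
```

The paragraph without its own dir value picks up the value from <html>, while the paragraph that specifies dir="ltr" keeps its own.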


For example, the following HTML page contains two paragraphs that both say “Hello World”, first in English then in Hebrew.  The Hebrew characters have been represented as Unicode escape characters (HTML numeric character references) rather than the actual Hebrew glyphs.


<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta charset="UTF-8">
  <title>Bidirectional Text</title>
  </head>
<body>
  <p>Hello World!</p>
  <p>&#x05E9;&#x05DC;&#x05D5;&#x05DD; &#x05E2;&#x05D5;&#x05DC;&#x05DD;!</p>
</body>
</html>


When the browser renders this page, the following logic is used:

  1. None of the block elements in the HTML page specify a directionality, so the default value of LTR is assumed.
  2. The browser therefore uses LTR as the base direction for all text within this file, irrespective of the actual language represented by the text!
  3. A sequence of Unicode escape characters is used to represent the Hebrew text.  This sequence is written in reading order, first letter first, so when you scan the source from left to right, the Hebrew text looks as if it has been written backwards!

If we display this HTML file as is, we get the following:

[Screenshot of the rendered page]

Notice that the following things have happened here:

  • Irrespective of their content, all paragraphs have been left aligned
  • The character order of the Hebrew text has been reversed automatically.  This is because:
    • The browser supports Unicode text
    • The dominant text direction of the current web page is LTR
    • A contiguous string of characters was found that all have the opposite text directionality
    • Therefore, the browser automatically displays the characters in this contiguous string (and only this contiguous string) from right to left; otherwise, it could not be rendered correctly
  • In “Unicode speak”, white space and most punctuation characters are said to be “neutral”: that is, they have no inherent directionality; therefore, the exclamation mark is positioned according to the dominant direction of the web page.  This puts it at the right-hand end of the Hebrew sentence, which is not correct.

Overall, this is not an acceptable way to render both English and Hebrew text together on the same web page.  So let’s correct the situation.


<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta charset="UTF-8">
  <title>Bidirectional Text</title>
</head>
<body>
  <p>Hello World!</p>
  <p dir="rtl">&#x05E9;&#x05DC;&#x05D5;&#x05DD; &#x05E2;&#x05D5;&#x05DC;&#x05DD;!</p>
</body>
</html>


Simply by adding the dir="rtl" attribute to a block element (the paragraph element in this case), we are telling the browser that we want to change the directionality of the text within this block.


Displaying this web page now gives the following:

[Screenshot of the rendered page]

That’s better!  Now the Hebrew text has been rendered correctly.  The entire paragraph is right justified and the exclamation mark is at the left-hand end of the sentence.


But there’s one more twist to the story…


Let’s change the HTML page to add a third paragraph in which the actual Hebrew glyphs have been entered – not their Unicode escaped representation:


<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta charset="UTF-8">
  <title>Bidirectional Text</title>
</head>
<body>
  <p>Hello World!</p>
  <p dir="rtl">&#x05E9;&#x05DC;&#x05D5;&#x05DD; &#x05E2;&#x05D5;&#x05DC;&#x05DD;!</p>
  <p>!שלום עולם</p>
</body>
</html>


Now let’s see what that looks like…

[Screenshot of the rendered page]

Hmmm, that’s not right.  OK, let’s add the direction attribute back in…


<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta charset="UTF-8">
  <title>Bidirectional Text</title>
</head>
<body>
  <p>Hello World!</p>
  <p dir="rtl">&#x05E9;&#x05DC;&#x05D5;&#x05DD; &#x05E2;&#x05D5;&#x05DC;&#x05DD;!</p>
  <p dir="rtl">!שלום עולם</p>
</body>
</html>

[Screenshot of the rendered page]

What the…!!  The exclamation mark is now in the wrong place!


First of all, entering the text directly as Hebrew glyphs makes no difference to the browser: a glyph and its Unicode escape are the same character, and the bidirectional algorithm treats them identically.  What has changed is the position of the exclamation mark.  In the source of this paragraph, it has been typed before the Hebrew letters, so it is the first character of the string.


Secondly though, the exclamation mark has no directionality of its own.  Consequently, the browser looks at the overall contents of this paragraph and sees only two things:

  1. A leading exclamation mark with no inherent directionality
  2. An RTL character sequence.


The Unicode bidirectional algorithm now kicks in and reasons that, since we have just told the browser that the text direction of this paragraph is RTL, the first character of the string belongs at the right-hand end of the line.  Therefore, the exclamation mark is displayed to the right of the Hebrew character string.


Oops!

This can be fixed by moving the exclamation mark so that it follows the Hebrew letters in the source, that is, to the end of the character string in reading order.

<!DOCTYPE html>
<html>
<head>
  <meta http-equiv="X-UA-Compatible" content="IE=edge" />
  <meta charset="UTF-8">
  <title>Bidirectional Text</title>
</head>
<body>
  <p>Hello World!</p>
  <p dir="rtl">&#x05E9;&#x05DC;&#x05D5;&#x05DD; &#x05E2;&#x05D5;&#x05DC;&#x05DD;!</p>
  <p dir="rtl">שלום עולם!</p>
</body>
</html>


[Screenshot of the rendered page]

For more information on this topic, see the W3C’s web page on creating HTML pages using RTL scripts.
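
To see why the position of the exclamation mark in the source matters, here is a heavily simplified sketch of the reordering step, written in JavaScript.  It is not the real Unicode bidirectional algorithm: it only distinguishes Hebrew letters, Latin letters and "everything else", it ignores digits and explicit direction controls, and the function names (bidiClass, visualOrder, reverseRange) are invented for this illustration.

```javascript
// Simplified sketch of bidirectional reordering. Assumption: only three kinds
// of character exist (Hebrew, Latin letters, neutrals); the whole Hebrew block
// is treated as strong RTL, which is a simplification.
function bidiClass(ch) {
  if (/[\u0590-\u05FF]/.test(ch)) return "R"; // Hebrew block: right-to-left
  if (/[A-Za-z]/.test(ch)) return "L";        // Latin letters: left-to-right
  return "N";                                 // spaces, punctuation: neutral
}

// Reverse arr[start..end) in place.
function reverseRange(arr, start, end) {
  for (let a = start, b = end - 1; a < b; a++, b--) {
    [arr[a], arr[b]] = [arr[b], arr[a]];
  }
}

// Takes text in reading order and a base direction ("ltr" or "rtl");
// returns the characters in the order they appear on screen, left to right.
function visualOrder(text, baseDir) {
  const base = baseDir === "rtl" ? "R" : "L";
  const chars = Array.from(text);
  const types = chars.map(bidiClass);

  // A run of neutrals takes the direction of its neighbours if they agree,
  // otherwise the base direction. The edges of the text count as the base.
  let i = 0;
  while (i < types.length) {
    if (types[i] !== "N") { i++; continue; }
    let j = i;
    while (j < types.length && types[j] === "N") j++;
    const before = i > 0 ? types[i - 1] : base;
    const after = j < types.length ? types[j] : base;
    const resolved = before === after ? before : base;
    for (let k = i; k < j; k++) types[k] = resolved;
    i = j;
  }

  // Nesting levels: even levels run left-to-right, odd levels right-to-left.
  const levels = types.map(t => (t === "R" ? 1 : base === "R" ? 2 : 0));

  // From the highest level down to 1, reverse each run at that level or above.
  for (let level = Math.max(0, ...levels); level >= 1; level--) {
    let s = 0;
    while (s < chars.length) {
      if (levels[s] < level) { s++; continue; }
      let e = s;
      while (e < chars.length && levels[e] >= level) e++;
      reverseRange(chars, s, e);
      reverseRange(levels, s, e);
      s = e;
    }
  }
  return chars.join("");
}

const hebrewFirst = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!";
const markFirst = "!\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD";
console.log(visualOrder(hebrewFirst, "ltr").indexOf("!")); // 9: right-hand end
console.log(visualOrder(hebrewFirst, "rtl").indexOf("!")); // 0: left-hand end
console.log(visualOrder(markFirst, "rtl").indexOf("!"));   // 9: right-hand end
```

The three calls correspond to the paragraphs shown earlier: with a base direction of LTR the exclamation mark stays at the right-hand end, with RTL it moves to the left-hand end, and when it is typed before the Hebrew letters in an RTL paragraph it ends up on the right again.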


Now we should turn our attention to .properties files.


.properties files

For .properties files, there are several factors involved here:

  1. All .properties files must be encoded using the LATIN-1 code page (ISO-8859-1)
  2. As a consequence of point 1), all non-Latin characters must be represented as Unicode escape characters in the form of \uHHHH, where HHHH is a four-digit hexadecimal number (a two-byte value)
  3. The characters of a text string are written in the order in which they are read, first character first, irrespective of the language in which any particular text string is represented
  4. As a consequence of point 3), text strings belonging to RTL languages such as Hebrew look as if their character order has been reversed when the line is scanned from left to right

So in a .properties file in which you wish to place the Hebrew text “Shalom Olam”, it is incorrect to add the text like this:

shalomOlam=!שלום עולם

This is wrong for two reasons:

  1. The exclamation mark has been typed before the Hebrew text, so it is the first character of the string rather than the last
  2. The Hebrew characters have been entered as the actual glyphs, rather than as Unicode escape characters

If you enter the text string in the way shown above, it will create a similar problem to the one described immediately above, when the exclamation mark was typed in front of the Hebrew glyphs in the HTML page.  When rendered in the browser with RTL directionality, the exclamation mark will appear at the wrong end of the sentence.  In addition, Hebrew glyphs cannot be represented in ISO-8859-1 at all, so they will not survive in a file that is read using that encoding.

The correct entry in the .properties file is this:

shalomOlam=\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!

Notice that the exclamation mark is now the last character of the text string (the right-hand end when the line is read from left to right), because it follows the Hebrew text in reading order.
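
If you would rather not look up the code points by hand, a small helper can produce the escaped form for you.  The following JavaScript is a minimal sketch: toPropertiesEscapes is a made-up name, and the function deliberately ignores other .properties rules such as escaping backslashes.

```javascript
// Sketch: convert a string into the \uHHHH form used in .properties files.
// Printable ASCII characters are kept as they are; everything else is escaped.
function toPropertiesEscapes(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i); // one UTF-16 code unit at a time
    if (code >= 0x20 && code <= 0x7e) {
      out += text[i];
    } else {
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out;
}

// The Hebrew text is typed in reading order, with the exclamation mark last.
console.log("shalomOlam=" + toPropertiesEscapes("שלום עולם!"));
// prints: shalomOlam=\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!
```

Because the helper walks the string from its first character to its last, the escapes come out in reading order, which is exactly the order the .properties file needs.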


Creating a basic SAPUI5 app that uses RTL text

Using step 8 of the SAPUI5 Walkthrough tutorial as the starting point, here’s a simple modification that includes “Hello World” in Hebrew.


Modify i18n/i18n.properties to add a property called shalomOlam


shalomOlam=\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD!


Modify view/App.view.xml to include the following extra <Text> element (the last element in the view):

<mvc:View controllerName="sap.ui.demo.wt.controller.App"
    xmlns="sap.m"
    xmlns:mvc="sap.ui.core.mvc" >
  <Button text="Say Hello" press="onShowHello"/>
  <Input value="{/recipient/name}"
      description="{i18n>showHelloButtonText}"
      valueLiveUpdate="true"
      width="60%"/>
  <Text text="{i18n>shalomOlam}" class="sapUiMarginSmall" textDirection="RTL" />
</mvc:View>

[Screenshot of the running app]

So there you have it – how to get RTL text to appear in the correct order in a SAPUI5 app.  As you can see, the majority of this blog discusses how a browser handles this situation.  Therefore, if you understand that, you’re about 90% of the way to getting this working in SAPUI5.




Chris W




1 Comment

  1. Jocelyn Dart

    Nice blog Chris! Really like the clear explanation and the use of the SAPUI5 Walkthrough Tutorial for the final example. Takes me back to when I first had to learn about multi-languages, date formats and decimal formats.. 😉
