Have you ever asked yourself whether an HTML document encoded in ISO-8859-1 can display Unicode symbols? Honestly, I hadn't until yesterday, when I ran into exactly that problem. We had to display a localized message in a browser using an existing HTML page, sent over HTTP with the “content-type: text/html; charset=ISO-8859-1” header field.
Yes, I know it sounds impossible. If the character encoding were UTF-8, it wouldn't be a problem, but it is ISO-8859-1. Fortunately, the HTML 4.01 specification has a Unicode backdoor called numeric character references. Here is an excerpt from the specification that describes them:
Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms:
- The syntax “&#D;”, where D is a decimal number, refers to the ISO 10646 decimal character number D.
- The syntax “&#xH;” or “&#XH;”, where H is a hexadecimal number, refers to the ISO 10646 hexadecimal character number H. Hexadecimal numbers in numeric character references are case-insensitive.
Here are some examples of numeric character references:
- &#229; (in decimal) represents the letter “a” with a small circle above it (used, for example, in Norwegian) (å).
- &#xE5; (in hexadecimal) represents the same character (å).
- &#Xe5; (in hexadecimal) represents the same character as well (å).
- &#1048; (in decimal) represents the Cyrillic capital letter “I” (И).
- &#x6C34; (in hexadecimal) represents the Chinese character for water (水).
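If you want to check what a given reference resolves to without opening a browser, a quick sketch in Python can help. It uses the standard library's html.unescape on the five references from the specification's examples (&#229;, &#xE5;, &#Xe5;, &#1048; and &#x6C34;); the variable names are just for illustration.

```python
import html

# The five numeric character references from the specification's examples.
references = ["&#229;", "&#xE5;", "&#Xe5;", "&#1048;", "&#x6C34;"]

# html.unescape resolves decimal and hexadecimal references alike.
decoded = [html.unescape(ref) for ref in references]

for ref, char in zip(references, decoded):
    # Show each reference next to its character and Unicode code point.
    print(f"{ref} -> {char} (U+{ord(char):04X})")
```

The first three references all resolve to the same character, which shows that the decimal and hexadecimal forms (with either “x” or “X”) are interchangeable.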
So the problem isn't really a problem any more. It is enough to write every character that ISO-8859-1 cannot represent as a numeric character reference, and it will look fine in the browser. Luckily, this works with both Internet Explorer and Mozilla Firefox.
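Writing the references by hand gets tedious quickly, so here is a minimal sketch of how the conversion could be automated. It is not what we used on the project, just an illustration in Python: the built-in "xmlcharrefreplace" error handler of str.encode replaces every character the target encoding cannot represent with a decimal reference. The function name to_latin1_html and the sample message are made up for this example.

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1 bytes.

    Characters that ISO-8859-1 cannot represent are written as
    decimal numeric character references (&#D;).
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# Hypothetical localized message mixing Latin-1 and non-Latin-1 characters.
message = "å И 水"
encoded = to_latin1_html(message)
print(encoded)
```

Note that “å” is left as a single ISO-8859-1 byte because the encoding already covers it; only “И” and “水” become references. The sketch also does not escape markup characters such as “&” or “<” in the message, so that would have to be handled separately.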