Have you ever asked yourself whether an HTML document encoded in ISO-8859-1 can display Unicode symbols? Honestly, I hadn’t until yesterday, when I faced exactly that problem. We had to display a localized message in a browser using an existing HTML page that was sent over HTTP with the header field “content-type: text/html; charset=ISO-8859-1”.

Yes, I know it sounds impossible. If the character encoding were UTF-8, it wouldn’t be a problem, but it is ISO-8859-1. Fortunately, the HTML 4.01 specification has a Unicode backdoor named “numeric character references”. Here is the excerpt from the specification that describes them:

Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms:

  • The syntax “&#D;”, where D is a decimal number, refers to the ISO 10646 decimal character number D.
  • The syntax “&#xH;” or “&#XH;”, where H is a hexadecimal number, refers to the ISO 10646 hexadecimal character number H. Hexadecimal numbers in numeric character references are case-insensitive.

Here are some examples of numeric character references:

  • &#229; (in decimal) represents the letter “a” with a small circle above it (used, for example, in Norwegian) (å).
  • &#xE5; (in hexadecimal) represents the same character (å).
  • &#Xe5; (in hexadecimal) represents the same character as well (å).
  • &#1048; (in decimal) represents the Cyrillic capital letter “I” (И).
  • &#x6C34; (in hexadecimal) represents the Chinese character for water (水).
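To check these mappings outside a browser, here is a minimal Python sketch. It assumes the standard library's html.unescape resolves numeric character references the same way a browser would for these cases; the helper name decode_reference and the SPEC_EXAMPLES table are mine, for illustration only.

```python
import html

# The five references from the specification's examples, paired with
# the character each one is expected to produce.
SPEC_EXAMPLES = {
    "&#229;": "\u00e5",    # a with ring above, decimal form
    "&#xE5;": "\u00e5",    # same character, hexadecimal form
    "&#Xe5;": "\u00e5",    # hexadecimal digits and the x are case-insensitive
    "&#1048;": "\u0418",   # Cyrillic capital letter I
    "&#x6C34;": "\u6c34",  # Chinese character for water
}


def decode_reference(ref: str) -> str:
    """Resolve a numeric character reference to the character it names."""
    return html.unescape(ref)


if __name__ == "__main__":
    for ref, expected in SPEC_EXAMPLES.items():
        print(ref, "->", decode_reference(ref), decode_reference(ref) == expected)
```

Each reference is written with plain ASCII characters only, which is why it survives in a page declared as ISO-8859-1.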

So the problem isn’t really a problem any more. It is enough to write every character that ISO-8859-1 cannot represent as a numeric character reference, and it will display correctly in the browser. Luckily, this works in both Internet Explorer and Mozilla Firefox.
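The conversion does not have to be done by hand. As a sketch of one way to automate it (not what we used on the original page), Python's built-in "xmlcharrefreplace" error handler emits a decimal reference for every character the target encoding cannot represent. The function name to_latin1_html is hypothetical.

```python
def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1 bytes, replacing every character the
    encoding cannot represent with a decimal numeric character reference.

    Note: this does not escape markup characters such as & or <;
    that has to be handled separately before the text goes into a page.
    """
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


if __name__ == "__main__":
    message = "\u00e5 \u0418 \u6c34"  # å (in ISO-8859-1), И and 水 (not in it)
    print(to_latin1_html(message))
```

Characters that ISO-8859-1 already covers, such as å, are kept as single bytes; only the others become references.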
