Alan Wood’s Unicode resources

Unicode and multilingual support in HTML, fonts, Web browsers and other applications

Caution and apology

I regret that I no longer have the time to keep this website up to date. The test pages include the Unicode 6.3 characters, and some of the Unicode 7.0 characters, but nothing more recent. The pages of fonts and utilities have not been updated for several years.

Introduction

Before Unicode became widely supported, it was not uncommon to face problems such as trying to include a passage in a different alphabet in one of your documents, for example a quotation in Russian in an English document, only to find that you had no Cyrillic characters available. Or you might send a Spanish document in electronic form to someone in Greece, only to be told that the accented Latin characters had been replaced by Greek characters. Or you might produce a Web page that included technical symbols, only to find that it worked with Windows but not with Mac OS or Unix. Problems like these arose with non-Latin alphabets and the Symbol font because in those days most computers used fonts that contained a maximum of 256 characters. The first 128 characters (the ASCII characters) of most fonts included punctuation marks, numbers and the letters a–z and A–Z, and were not a problem. In the USA, Canada, the United Kingdom, the rest of the English-speaking world and much of Western Europe, the second set of 128 characters comprised more punctuation marks, some currency symbols (such as £ and ¥) and a lot of accented letters (such as á, ç, è, ñ, ô and ü). Older English versions of Microsoft Windows, and several other language editions, used this set of 256 characters, which is known as the ANSI character set.

If you lived in a country such as Egypt, Greece, Israel, Russia or Thailand that uses a different alphabet, then your version of Windows used a different character set. The first 128 characters were the same as in ANSI, but many of the places in the second set of 128 were taken by characters from the Arabic, Greek, Hebrew, Cyrillic or Thai alphabets. When documents started to be transferred electronically as e-mail messages, e-mail attachments or Web pages, instead of on paper, reading documents from another country, particularly a country with a different alphabet, became more and more of a problem. There were similar problems when moving documents between operating systems such as DOS, Windows, Mac OS and Unix.
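The clash between these 8-bit character sets can be reproduced with a short Python sketch. It assumes that the "ANSI" set corresponds to Python's `cp1252` codec, and that the Greek and Russian editions of Windows used the sets that Python calls `cp1253` and `cp1251`. The sample word and the variable names are only illustrative.

```python
# One byte value, three different characters, depending on the code page.
# Assumption: cp1252 = Western "ANSI", cp1253 = Greek, cp1251 = Cyrillic.
byte = bytes([0xE1])
as_western = byte.decode("cp1252")    # Latin small a with acute accent
as_greek = byte.decode("cp1253")      # Greek small alpha
as_cyrillic = byte.decode("cp1251")   # Cyrillic small be

# A Spanish word saved on a Western system and opened on a Greek one.
spanish = "Espa\u00f1a"               # "España"
raw = spanish.encode("cp1252")        # the bytes as stored in the file
misread = raw.decode("cp1253")        # what the reader in Greece would see

# With a Unicode encoding such as UTF-8 the round trip is lossless.
round_trip = spanish.encode("utf-8").decode("utf-8")

print(as_western, as_greek, as_cyrillic)
print(spanish, "->", misread)
print(round_trip == spanish)
```

The byte for ñ is reinterpreted as a Greek letter, which is the kind of substitution described for the Spanish document sent to Greece.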

The solution was to leave behind the assortment of 8-bit fonts with their limit of 256 characters, where the same character number represented a different character in different alphabets, and move to a system that assigns a unique number to each character in each of the major languages of the world. Such a system was developed and is known as Unicode. It is intended for use on all computer systems, not just Windows, and covers Chinese, Japanese and Korean as well as the alphabets for many other languages and scripts, plus a large number of special characters. Some Unicode support has been included in Microsoft Windows since Windows 95, while Windows NT 4, Windows 2000, Windows XP, Windows Vista, Windows 7 and Windows 8 are based on Unicode instead of the ANSI or WGL4 character sets. Some Unicode support has been included in Mac OS since Mac OS 8.5, but prior to Mac OS X 10 applications made only limited use of it. Unicode is sometimes referred to as a 16-bit system, which would allow for only 65,536 characters, but this is not correct: the Unicode code space runs from U+0000 to U+10FFFF, which gives room for 1,114,112 code points, so Unicode can cope with over one million unique characters.
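The arithmetic behind "over one million" can be checked with a few lines of Python. This is a minimal sketch: the constant names are illustrative, and the musical G clef is simply a convenient example of a character that lies beyond the first 65,536 code points.

```python
import sys

PLANES = 17                        # plane 0 plus 16 supplementary planes
CODE_POINTS_PER_PLANE = 0x10000    # 65,536 code points in each plane

code_space = PLANES * CODE_POINTS_PER_PLANE   # size of the whole code space
highest = sys.maxunicode                      # highest code point Python accepts

# A character outside the first plane: U+1D11E MUSICAL SYMBOL G CLEF.
clef = "\U0001D11E"
utf16_units = len(clef.encode("utf-16-be")) // 2   # 16-bit units needed
utf8_bytes = len(clef.encode("utf-8"))             # bytes needed in UTF-8

print(f"{code_space:,} code points, highest is U+{highest:X}")
print(f"U+{ord(clef):X} needs {utf16_units} UTF-16 units, {utf8_bytes} UTF-8 bytes")
```

A single 16-bit unit cannot hold such a character, which is why UTF-16 represents it as a pair of units. Not every code point in the code space is available for characters, so the total is an upper bound rather than a count of assignable characters.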

The current version (6.3.0) of the Unicode Standard, developed by the Unicode Consortium, assigns a unique identifier to each of 110,187 graphical, formatting and control characters, covering the scripts of the world’s principal written languages and many mathematical and other symbols. A previous version (2.1) of the Unicode Standard encompassed 38,887 characters and was adopted as part of the recommendations for HTML 4.0.
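To see what a "unique identifier" looks like in practice, the following Python sketch prints the code point and official name of a few characters using the standard `unicodedata` module. The `describe` helper and the choice of sample characters are illustrative assumptions, and the version of the Unicode database reported depends on the Python release in use, so it will usually be later than 6.3.0.

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME' for one character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# Latin, accented Latin, Cyrillic, Greek and a currency symbol.
samples = ["A", "\u00e9", "\u0416", "\u03b1", "\u20ac"]

for ch in samples:
    print(ch, describe(ch))

# Version of the Unicode Character Database bundled with this Python.
print("Unicode database version:", unicodedata.unidata_version)
```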

Version	Date	Increase	Characters
1.0.0	October 1991	-	7,161
1.0.1	June 1992	21,198	28,359
1.1.0	June 1993	5,620	33,979
2.0.0	July 1996	4,971	38,950
2.1.2	May 1998	2	38,952
3.0.0	September 1999	10,307	49,259
3.1.0	March 2001	44,946	94,205
3.2.0	March 2002	1,016	95,221
4.0.0	April 2003	1,226	96,447
4.1.0	March 2005	1,273	97,720
5.0.0	July 2006	1,369	99,089
5.1.0	April 2008	1,624	100,713
5.2.0	October 2009	6,648	107,361
6.0.0	October 2010	2,088	109,449
6.1.0	January 2012	732	110,181
6.2.0	September 2012	1	110,182
6.3.0	October 2013	5	110,187

On this Web site, I have tried to gather together practical information about Unicode and the increasing number of applications and fonts that support it, intended to help people who are trying to use Unicode to produce standardised multilingual and technical documents. The pages on the site include:

Test pages for Unicode ranges: Lists of Unicode characters that you can use to test the Unicode support of your Web browser and fonts.
Search for a Unicode character: Search the test pages to find any character that you want to use.
Fonts for each Unicode range: A list of Unicode ranges and the fonts that support them.
Unicode fonts: Lists of fonts for Windows, Mac OS 9, Mac OS X 10 and Unix, with the Unicode ranges they support, and where to obtain them.
Browsers for Apple Macintosh computers: How to enable Unicode support in Web browsers under Mac OS 9.
Internet Explorer for Windows: How to enable Unicode support in IE 4, IE 5, IE 5.5 and IE 6.
Netscape for Windows: How to enable Unicode support in Netscape Communicator 4.x and 6.x.
Editors and word processors: Applications for Windows, Mac OS 9, Mac OS X 10 and Unix that can produce Unicode text, HTML and word processor documents.
File conversion, font and keyboard utilities: Utilities for Mac OS 9, Mac OS X 10, Windows and Unix that can convert files to and from Unicode, view the characters in Unicode fonts, or re-map your keyboard to type Unicode characters.
Creating multilingual Web pages: HTML code, fonts and editors to help you produce Web pages with multiple scripts and languages.

For a complete list of pages on the site, please see the Site Map.

Test pages for Unicode character ranges

The pages in the following list can be used to display the ranges of characters defined in the Unicode 6.3.0 Character Database, within the limitations imposed by your Web browser and the fonts that you have installed. There is also a page with a sample of Unicode characters from each range.

General Scripts

Symbols

Miscellaneous

Chinese, Japanese and Korean

Web sites of other Unicode proponents

Alan Flavell: Unicode test material
Andrew Cunningham: Multilingual Unicode web page development
Apple Computer, Inc.: Unicode Utilities
Babel: Towards communicating on the Internet in any language...
Brian Wilson: Text in HTML...
Bruno Haible: The Unicode HOWTO (for Linux)
Christoph Singer: Slavic Text Processing and Typography
Daniel Tobias: Dan's Web Tips: Characters and Fonts
Frank da Cruz: UTF-8 Sampler
Henry Churchyard: Latin 1 and Unicode characters in &ampersand; entities
James Kass: Does Your Browser Support Multi-language?
Jukka Korpela: Using national and special characters in HTML
Markus Kuhn: UTF-8 and Unicode FAQ for Unix/Linux
Michael Everson: Evertype
Microsoft: Global Software Development and Computing Portal
Nelson H. F. Beebe: Fonts for the Unicode Character Set
Oscar van Vlijmen: Unicode browser display
Roman Czyborra: Unicode in the Unix Environment
Sun: Unicode Support in Solaris
Tex Texin: Internationalization (I18n), Localization (L10n), Standards, and Amusements
TITUS: Titus Is Testing Unicode Scriptmanagement
Tom Gewecke: Unleash your Multilingual Mac
Unicode Consortium: Unicode Home Page
Wazu Japan: Gallery of Unicode Fonts
