Unicode 11.0 Bookmarks

Unicode 11.0 Core Specification Bookmarks

This page contains links to sections, tables, and figures of the core specification for The Unicode Standard, Version 11.0. See Unicode 11.0.0 for full context about the Unicode Standard.

Preface

Why Unicode?

What’s New?

Organization of This Standard

The Unicode Character Database

Unicode Code Charts

Unicode Standard Annexes

Unicode Technical Standards and Unicode Technical Reports

Updates and Errata

Acknowledgements

1 Introduction

Figure 1-1. Wide ASCII

1.1 Coverage

Standards Coverage

New Characters

1.2 Design Goals

Figure 1-2. Unicode Compared to the 2022 Framework

1.3 Text Handling

Characters and Glyphs

Text Elements

2 General Structure

2.1 Architectural Context

Basic Text Processes

Text Elements, Characters, and Text Processes

Figure 2-1. Text Elements and Characters

Text Processes and Encoding

2.2 Unicode Design Principles

Table 2-1. The 10 Unicode Design Principles

Universality

Efficiency

Characters, Not Glyphs

Figure 2-2. Characters Versus Glyphs

Table 2-2. User-Perceived Characters with Multiple Code Points

Figure 2-3. Unicode Character Code to Rendered Glyphs

Semantics

Plain Text

Logical Order

Figure 2-4. Bidirectional Ordering

Figure 2-5. Writing Direction and Numbers

Unification

Figure 2-6. Typeface Variation for the Bone Character

Dynamic Composition

Figure 2-7. Dynamic Composition

Stability

Convertibility

2.3 Compatibility Characters

Compatibility Variants

Compatibility Decomposable Characters

2.4 Code Points and Characters

Figure 2-8. Abstract and Encoded Characters

Types of Code Points

Table 2-3. Types of Code Points

2.5 Encoding Forms

Figure 2-9. Overlap in Legacy Mixed-Width Encodings

Figure 2-10. Boundaries and Interpretation

Figure 2-11. Unicode Encoding Forms

UTF-32

UTF-16

UTF-8

Comparison of the Advantages of UTF-32, UTF-16, and UTF-8

2.6 Encoding Schemes

Table 2-4. The Seven Unicode Encoding Schemes

Figure 2-12. Unicode Encoding Schemes

2.7 Unicode Strings

2.8 Unicode Allocation

Planes

Allocation Areas and Blocks

Assignment of Code Points

2.9 Details of Allocation

Figure 2-13. Unicode Allocation

Plane 0 (BMP)

Figure 2-14. Allocation on the BMP

Plane 1 (SMP)

Figure 2-15. Allocation on Plane 1

Plane 2 (SIP)

Other Planes

2.10 Writing Direction

Figure 2-16. Writing Directions

2.11 Combining Characters

Figure 2-17. Combining Enclosing Marks for Symbols

Sequence of Base Characters and Diacritics

Figure 2-18. Sequence of Base Characters and Diacritics

Figure 2-19. Reordered Indic Vowel Signs

Figure 2-20. Properties and Combining Character Sequences

Multiple Combining Characters

Figure 2-21. Stacking Sequences

Table 2-5. Interaction of Combining Characters

Table 2-6. Nondefault Stacking

Ligated Multiple Base Characters

Figure 2-22. Ligated Multiple Base Characters

Exhibiting Nonspacing Marks in Isolation

"Characters" and Grapheme Clusters

2.12 Equivalent Sequences

Figure 2-23. Equivalent Sequences

Normalization

Figure 2-24. Canonical Ordering

Decompositions

Figure 2-25. Types of Decomposables

Non-decomposition of Certain Diacritics

2.13 Special Characters

Special Noncharacter Code Points

Byte Order Mark (BOM)

Layout and Format Control Characters

The Replacement Character

Control Codes

2.14 Conforming to the Unicode Standard

Characteristics of Conformant Implementations

Unacceptable Behavior

Acceptable Behavior

Supported Subsets

3 Conformance

3.1 Versions of the Unicode Standard

Stability

Version Numbering

Errata and Corrigenda

References to the Unicode Standard

Precision in Version Citation

References to Unicode Character Properties

References to Unicode Algorithms

3.2 Conformance Requirements

Code Points Unassigned to Abstract Characters

Interpretation

Modification

Character Encoding Forms

Character Encoding Schemes

Bidirectional Text

Normalization Forms

Normative References

Unicode Algorithms

Default Casing Algorithms

Unicode Standard Annexes

3.3 Semantics

Definitions

Character Identity and Semantics

3.4 Characters and Encoding

Table 3-1. Named Unicode Algorithms

3.5 Properties

Types of Properties

Property Values

Default Property Values

Classification of Properties by Their Values

Property Status

Table 3-2. Normative Character Properties

Table 3-3. Informative Character Properties

Context Dependence

Stability of Properties

Simple and Derived Properties

Property Aliases

Private Use

3.6 Combination

Combining Character Sequences

Grapheme Clusters

Application of Combining Marks

Figure 3-1. Enclosing Marks

3.7 Decomposition

Compatibility Decomposition

Canonical Decomposition

3.8 Surrogates

3.9 Unicode Encoding Forms

Table 3-4. Examples of Unicode Encoding Forms

UTF-32

UTF-16

Table 3-5. UTF-16 Bit Distribution

UTF-8

Table 3-6. UTF-8 Bit Distribution

Table 3-7. Well-Formed UTF-8 Byte Sequences

Encoding Form Conversion

Constraints on Conversion Processes

U+FFFD Substitution of Maximal Subparts

Table 3-8. U+FFFD for Non-Shortest Form Sequences

Table 3-9. U+FFFD for Ill-Formed Sequences for Surrogates

Table 3-10. U+FFFD for Other Ill-Formed Sequences

Table 3-11. U+FFFD for Truncated Sequences

3.10 Unicode Encoding Schemes

Table 3-12. Summary of UTF-16BE, UTF-16LE, and UTF-16

Table 3-13. Summary of UTF-32BE, UTF-32LE, and UTF-32

3.11 Normalization Forms

Normalization Stability

Combining Classes

Specification of Unicode Normalization Forms

Starters

Table 3-14. Combining Marks and Starter Status

Canonical Ordering Algorithm

Table 3-15. Reorderable Pairs

Canonical Composition Algorithm

Definition of Normalization Forms

3.12 Conjoining Jamo Behavior

Definitions

Hangul Syllable Decomposition

Table 3-16. Hangul Characters Used in Examples

Hangul Syllable Composition

Hangul Syllable Name Generation

Sample Code for Hangul Algorithms

3.13 Default Case Algorithms

Definitions

Table 3-17. Context Specification for Casing

Default Case Conversion

Default Case Folding

Default Case Detection

Table 3-18. Case Detection Examples

Default Caseless Matching

4 Character Properties

4.1 Unicode Character Database

4.2 Case

Definitions of Case and Casing

Table 4-1. Relationship of Casing Definitions

Table 4-2. Case Function Values for Strings

Case Mapping

Table 4-3. Sources for Case Mapping Information

4.3 Combining Classes

Figure 4-1. Positions of Common Combining Marks

Reordrant, Split, and Subjoined Combining Marks

4.4 Directionality

4.5 General Category

Table 4-4. General Category

4.6 Numeric Value

Ideographic Numeric Values

Table 4-5. Primary Numeric Ideographs

Table 4-6. Ideographs Used as Accounting Numbers

4.7 Bidi Mirrored

4.8 Name

Table 4-7. Types of Character Name Aliases

Unicode Name Property

Table 4-8. Name Derivation Rule Prefix Strings

Code Point Labels

Table 4-9. Construction of Code Point Labels

Use of Character Names in APIs and User Interfaces

4.9 Unicode 1.0 Names

4.10 Letters, Alphabetic, and Ideographic

4.11 Properties for Text Boundaries

4.12 Characters with Unusual Properties

Table 4-10. Unusual Properties

5 Implementation Guidelines

5.1 Data Structures for Character Conversion

Issues

Multistage Tables

Figure 5-1. Two-Stage Tables

5.2 Programming Languages and Data Types

Unicode Data Types for C

5.3 Unknown and Missing Characters

5.4 Handling Surrogate Pairs in UTF-16

5.5 Handling Numbers

5.6 Normalization

Figure 5-2. Normalization

5.7 Compression

5.8 Newline Guidelines

Definitions

Table 5-1. Hex Values for Acronyms

Table 5-2. NLF Platform Correlations

Line Separator and Paragraph Separator

Recommendations

5.9 Regular Expressions

5.10 Language Information in Plain Text

Requirements for Language Tagging

Language Tags and Han Unification

5.11 Editing and Selection

Consistent Text Elements

Figure 5-3. Consistent Character Boundaries

5.12 Strategies for Handling Nonspacing Marks

Keyboard Input

Figure 5-4. Dead Keys Versus Handwriting Sequence

Truncation

Figure 5-5. Truncating Grapheme Clusters

5.13 Rendering Nonspacing Marks

Figure 5-6. Inside-Out Rule

Figure 5-7. Fallback Rendering

Figure 5-8. Bidirectional Placement

Figure 5-9. Justification

Canonical Equivalence

Table 5-3. Typing Order Differing from Canonical Order

Table 5-4. Permuting Combining Class Weights

Positioning Methods

Figure 5-10. Positioning with Ligatures

Figure 5-11. Positioning with Contextual Forms

Figure 5-12. Positioning with Enhanced Kerning

5.14 Locating Text Element Boundaries

5.15 Identifiers

5.16 Sorting and Searching

Culturally Expected Sorting and Searching

Language-Insensitive Sorting

Searching

Sublinear Searching

Figure 5-13. Sublinear Searching

5.17 Binary Order

UTF-8 in UTF-16 Order

UTF-16 in UTF-8 Order

5.18 Case Mappings

Titlecasing

Complications for Case Mapping

Figure 5-14. Uppercase Mapping for Turkish I

Figure 5-15. Lowercase Mapping for Turkish I

Figure 5-16. Casing of German Sharp S

Reversibility

Caseless Matching

Normalization and Casing

Table 5-5. Casing and Normalization in Strings

5.19 Mapping Compatibility Variants

5.20 Unicode Security

5.21 Ignoring Characters in Processing

Characters Ignored in Text Segmentation

Characters Ignored in Line Breaking

Characters Ignored in Cursive Joining

Characters Ignored in Identifiers

Characters Ignored in Searching and Sorting

Characters Ignored for Display

5.22 U+FFFD Substitution in Conversion

6 Writing Systems and Punctuation

6.1 Writing Systems

Figure 6-1. Overriding Inherent Vowels

Table 6-1. Typology of Scripts in the Unicode Standard

6.2 General Punctuation

Figure 6-2. Forms of CJK Punctuation

Blocks Devoted to Punctuation

Format Control Characters

Space Characters

Table 6-2. Unicode Space Characters

Dashes and Hyphens

Table 6-3. Unicode Dash Characters

Paired Punctuation

Language-Based Usage of Quotation Marks

Figure 6-3. European Quotation Marks

Table 6-4. Models of Visual Relationship between Quote Glyphs

Table 6-5. East Asian Quotation Marks

Figure 6-4. Asian Quotation Marks

Table 6-6. Opening and Closing Forms

Apostrophes

Other Punctuation

Table 6-7. Names for the @

Archaic Punctuation and Editorial Marks

Figure 6-5. Examples of Ancient Greek Editorial Marks

Figure 6-6. Use of Greek Paragraphos

Indic Punctuation

Table 6-8. Unicode Danda Characters

CJK Punctuation

Figure 6-7. CJK Parentheses

Unknown or Unavailable Ideographs

CJK Compatibility Forms

7 Europe-I

7.1 Latin

Figure 7-1. Alternative Glyphs in Latin

Table 7-1. Preferred Rendering of Cedilla versus Comma Below

Figure 7-2. Diacritics on i and j

Figure 7-3. Vietnamese Letters and Tone Marks

Letters of Basic Latin: U+0041–U+007A

Letters of the Latin-1 Supplement: U+00C0–U+00FF

Latin Extended-A: U+0100–U+017F

Latin Extended-B: U+0180–U+024F

IPA Extensions: U+0250–U+02AF

Phonetic Extensions: U+1D00–U+1DBF

Latin Extended Additional: U+1E00–U+1EFF

Latin Extended-C: U+2C60–U+2C7F

Latin Extended-D: U+A720–U+A7FF

Latin Extended-E: U+AB30–U+AB6F

Latin Ligatures: U+FB00–U+FB06

7.2 Greek

Greek: U+0370–U+03FF

Table 7-2. Nonspacing Marks Used with Greek

Figure 7-4. Variations in Greek Capital Letter Upsilon

Greek Extended: U+1F00–U+1FFF

Table 7-3. Greek Spacing and Nonspacing Pairs

Ancient Greek Numbers: U+10140–U+1018F

7.3 Coptic

Figure 7-5. Coptic Numerals

7.4 Cyrillic

Cyrillic: U+0400–U+04FF

Cyrillic Supplement: U+0500–U+052F

Cyrillic Extended-A: U+2DE0–U+2DFF

Figure 7-6. Combination of Titlo Letters

Cyrillic Extended-B: U+A640–U+A69F

Cyrillic Extended-C: U+1C80–U+1C8F

7.5 Glagolitic

Glagolitic: U+2C00–U+2C5F

Glagolitic Supplement: U+1E000–U+1E02F

7.6 Armenian

7.7 Georgian

Georgian: U+10A0–U+10FF

Georgian Extended: U+1C90–U+1CBF

Georgian Supplement: U+2D00–U+2D2F

Figure 7-7. Georgian Scripts and Casing

7.8 Modifier Letters

Spacing Modifier Letters: U+02B0–U+02FF

Figure 7-8. Tone Letters

Modifier Tone Letters: U+A700–U+A71F

7.9 Combining Marks

Figure 7-9. Double Diacritics

Figure 7-10. Positioning of Double Diacritics

Figure 7-11. Use of CGJ with Double Diacritics

Figure 7-12. Interaction of Combining Marks with Ligatures

Combining Diacritical Marks: U+0300–U+036F

Combining Diacritical Marks Extended: U+1AB0–U+1AFF

Figure 7-13. Positioning of Combining Parentheses

Combining Diacritical Marks Supplement: U+1DC0–U+1DFF

Table 7-4. Typicon Kavyka Symbols

Combining Diacritical Marks for Symbols: U+20D0–U+20FF

Figure 7-14. Use of Vertical Line Overlay for Negation

Combining Half Marks: U+FE20–U+FE2F

Figure 7-15. Double Diacritics and Half Marks

Combining Marks in Other Blocks

8 Europe-II

8.1 Linear A

8.2 Linear B

Linear B Syllabary: U+10000–U+1007F

Linear B Ideograms: U+10080–U+100FF

Aegean Numbers: U+10100–U+1013F

8.3 Cypriot Syllabary

Table 8-1. Similar Characters in Linear B and Cypriot

8.4 Ancient Anatolian Alphabets

Lycian: U+10280–U+1029F

Carian: U+102A0–U+102DF

Lydian: U+10920–U+1093F

8.5 Old Italic

Figure 8-1. Distribution of Old Italic

8.6 Runic

8.7 Old Hungarian

8.8 Gothic

8.9 Elbasan

8.10 Caucasian Albanian

8.11 Old Permic

Table 8-2. Combining Marks Used in Old Permic

8.12 Ogham

8.13 Shavian

9 Middle East-I

9.1 Hebrew

Hebrew: U+0590–U+05FF

Alphabetic Presentation Forms: U+FB1D–U+FB4F

9.2 Arabic

Arabic: U+0600–U+06FF

Figure 9-1. Directionality and Cursive Connection

Figure 9-2. Using a Joiner

Figure 9-3. Using a Non-joiner

Figure 9-4. Combinations of Joiners and Non-joiners

Figure 9-5. Placement of Harakat

Table 9-1. Arabic Digit Names

Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits

Figure 9-6. Arabic Year Sign

Arabic Cursive Joining

Table 9-3. Primary Arabic Joining Types

Table 9-4. Derived Arabic Joining Types

Table 9-5. Arabic Glyph Types

Arabic Ligatures

Table 9-6. Arabic Obligatory Ligature Joining Groups

Table 9-7. Arabic Ligature Notation

Arabic Joining Groups

Table 9-8. Dual-Joining Arabic Characters

Table 9-9. Right-Joining Arabic Characters

Table 9-10. Forms of the Arabic Letter yeh

Combining Hamza

Table 9-11. Arabic Letters With Hamza Above

Other Letters for Extended Arabic

Arabic Supplement: U+0750–U+077F

Arabic Extended-A: U+08A0–U+08FF

Arabic Presentation Forms-A: U+FB50–U+FDFF

Arabic Presentation Forms-B: U+FE70–U+FEFF

9.3 Syriac

Syriac: U+0700–U+074F

Figure 9-7. Syriac Abbreviation

Figure 9-8. Use of SAM

Table 9-12. Miscellaneous Syriac Diacritic Use

Syriac Shaping

Table 9-13. Syriac Final Alaph Glyph Types

Table 9-14. Dual-Joining Syriac Characters

Table 9-15. Right-Joining Syriac Characters

Table 9-16. Syriac Alaph Glyph Forms

Table 9-17. Syriac Ligatures

Syriac Supplement: U+0860–U+086F

9.4 Samaritan

Table 9-18. Samaritan Performative Punctuation Marks

9.5 Mandaic

Table 9-19. Dual-Joining Mandaic Characters

Table 9-20. Right-Joining Mandaic Characters

10 Middle East-II

10.1 Old North Arabian

10.2 Old South Arabian

Table 10-1. Old South Arabian Numeric Characters

Table 10-2. Number Formation in Old South Arabian

10.3 Phoenician

10.4 Imperial Aramaic

Table 10-3. Number Formation in Aramaic

10.5 Manichaean

Table 10-4. Dual-Joining Manichaean Letters

Table 10-5. Right-Joining Manichaean Letters

Table 10-6. Left-Joining Manichaean Letters

Table 10-7. Non-Joining Manichaean Letters

Table 10-8. Manichaean Ligatures

10.6 Pahlavi and Parthian

Inscriptional Parthian: U+10B40–U+10B5F

Inscriptional Pahlavi: U+10B60–U+10B7F

Table 10-9. Inscriptional Parthian Shaping Behavior

Psalter Pahlavi: U+10B80–U+10BAF

10.7 Avestan

Table 10-10. Avestan Shaping Behavior

10.8 Nabataean

10.9 Palmyrene

10.10 Hatran

11 Cuneiform and Hieroglyphs

11.1 Sumero-Akkadian

Cuneiform: U+12000–U+123FF

Table 11-1. Cuneiform Script Usage

Cuneiform Numbers and Punctuation: U+12400–U+1247F

Early Dynastic Cuneiform: U+12480–U+1254F

11.2 Ugaritic

11.3 Old Persian

11.4 Egyptian Hieroglyphs

Table 11-2. Hieroglyphic Character Sequence

Figure 11-1. Interpretation of Hieroglyphic Markup

11.5 Meroitic

11.6 Anatolian Hieroglyphs

12 South and Central Asia-I

12.1 Devanagari

Devanagari: U+0900–U+097F

Principles of the Devanagari Script

Table 12-1. Devanagari Vowel Letters

Figure 12-1. Dead Consonants in Devanagari

Table 12-2. Devanagari Atomic Consonants

Figure 12-2. Conjunct Formations in Devanagari

Figure 12-3. Multi-Consonant Conjuncts in Devanagari

Table 12-3. Devanagari Consonant Conjuncts

Figure 12-4. Preventing Conjunct Forms in Devanagari

Figure 12-5. Half-Consonants in Devanagari

Figure 12-6. Independent Half-Forms in Devanagari

Figure 12-7. Half-Consonants in Oriya

Figure 12-8. Consonant Forms in Devanagari and Oriya

Rendering Devanagari

Figure 12-9. Rendering Order in Devanagari

Table 12-4. Sample Devanagari Half-Forms

Table 12-5. Sample Devanagari Ligatures

Table 12-6. RA + Vocalic Letter Ligature Forms

Table 12-7. Sample Devanagari Half-Ligature Forms

Table 12-8. Marathi and Nepali Allographs

Devanagari Digits, Punctuation, and Symbols

Extensions in the Main Devanagari Block

Figure 12-10. Use of Apostrophe in Bodo, Dogri and Maithili

Figure 12-11. Use of Avagraha in Dogri

Table 12-9. Devanagari Vowels Used in Bihari Languages

Table 12-10. Prishthamatra Orthography

Devanagari Extended: U+A8E0–U+A8FF

Vedic Extensions: U+1CD0–U+1CFF

12.2 Bengali (Bangla)

Table 12-11. Bengali Vowel Letters

Table 12-12. Diphthong Vowel Letters in Kokborok

Table 12-13. Assamese Consonant-Vowel Combinations

Table 12-14. Bengali Consonant-Vowel Combinations

Figure 12-12. Requesting Bengali Consonant-Vowel Ligature

Figure 12-13. Blocking Bengali Consonant-Vowel Ligature

Figure 12-14. Bengali Syllable tta

Table 12-15. Use of Apostrophe in Bangla

12.3 Gurmukhi

Table 12-16. Gurmukhi Vowel Letters

Table 12-17. Gurmukhi Conjuncts

Table 12-18. Additional Pairin and Addha Forms in Gurmukhi

Table 12-19. Use of Joiners in Gurmukhi

12.4 Gujarati

Table 12-20. Gujarati Vowel Letters

Table 12-21. Gujarati Conjuncts

12.5 Oriya (Odia)

Table 12-22. Oriya Vowel Letters

Table 12-23. Oriya Conjuncts

Table 12-24. Oriya Vowel Placement

Table 12-25. Ligation for the Syllable om

12.6 Tamil

Tamil: U+0B80–U+0BFF

Figure 12-15. Kssa Ligature in Tamil

Tamil Vowels

Figure 12-16. Tamil Vowel Reordering

Figure 12-17. Tamil Two-Part Vowels

Figure 12-18. Tamil Vowel Splitting and Reordering

Figure 12-19. Vowel Reordering Around a Tamil Conjunct

Tamil Ligatures

Figure 12-20. Tamil Ligatures with i

Table 12-26. Tamil Ligatures with u

Figure 12-21. Spacing Forms of Tamil u

Figure 12-22. Tamil Ligatures with ra

Figure 12-23. Tamil Ligatures for shri

Figure 12-24. Traditional Tamil Ligatures with aa

Figure 12-25. Traditional Tamil Ligatures with o

Figure 12-26. Traditional Tamil Ligatures with ai

Figure 12-27. Vowel ai in Modern Tamil

Tamil Named Character Sequences

Table 12-27. Tamil Vowels, Consonants, and Syllables

12.7 Telugu

Table 12-28. Telugu Vowel Letters

Table 12-29. Rendering of Telugu na + virama

12.8 Kannada

Kannada: U+0C80–U+0CFF

Principles of the Kannada Script

Table 12-30. Kannada Vowel Letters

Figure 12-28. Indicating Retroflexion in Badaga Vowels

Rendering Kannada

Table 12-31. Rendering of Kannada na + virama

12.9 Malayalam

Malayalam: U+0D00–U+0D7F

Table 12-32. Malayalam Vowel Letters

Malayalam Orthographic Reform

Table 12-33. Malayalam Orthographic Reform

Rendering Malayalam

Table 12-34. Malayalam Conjuncts

Table 12-35. Candrakkala Examples

Table 12-36. Use of Joiners in Malayalam

Table 12-37. Malayalam /rara/ and /uua/

Table 12-38. Malayalam /nr/ and /nt/

Table 12-39. Atomic Encoding of Malayalam Chillus

Malayalam Numbers and Punctuation

13 South and Central Asia-II

13.1 Thaana

Table 13-1. Thaana Glyph Placement

13.2 Sinhala

Sinhala: U+0D80–U+0DFF

Table 13-2. Sinhala Vowel Letters

Sinhala Archaic Numbers: U+111E0–U+111FF

13.3 Newa

Table 13-3. Murmured Resonants in Nepal Bhasa

13.4 Tibetan

Figure 13-1. Tibetan Syllable Structure

Figure 13-2. Justifying Tibetan Tseks

13.5 Mongolian

Mongolian: U+1800–U+18AF

Figure 13-3. Mongolian Glyph Convergence

Figure 13-4. Mongolian Consonant Ligation

Figure 13-5. Mongolian Positional Forms

Figure 13-6. Mongolian Free Variation Selector

Figure 13-7. Mongolian Gender Forms

Figure 13-8. Mongolian Vowel Separator

Mongolian Supplement: U+11660–U+1167F

13.6 Limbu

Table 13-4. Positions of Limbu Combining Characters

13.7 Meetei Mayek

Meetei Mayek: U+ABC0–U+ABFF

Meetei Mayek Extensions: U+AAE0–U+AAF6

13.8 Mro

13.9 Warang Citi

13.10 Ol Chiki

13.11 Chakma

13.12 Lepcha

Table 13-5. Lepcha Syllabic Structure

13.13 Saurashtra

13.14 Masaram Gondi

Figure 13-9. Masaram Gondi Consonant Clusters

Figure 13-10. Rendering of ra in Masaram Gondi

Table 13-6. Various Signs in Masaram Gondi

13.15 Gunjala Gondi

Figure 13-11. Gunjala Gondi Conjunct Formation

14 South and Central Asia-III

14.1 Brahmi

Table 14-1. Brahmi Vowel Letters

Figure 14-1. Consonant Ligatures in Brahmi

Table 14-2. Brahmi Positional Digits

14.2 Kharoshthi

Kharoshthi: U+10A00–U+10A5F

Figure 14-2. Geographical Extent of the Kharoshthi Script

Figure 14-3. Kharoshthi Number 1996

Rendering Kharoshthi

Figure 14-4. Kharoshthi Rendering Example

Table 14-3. Kharoshthi Vowel Signs

Table 14-4. Kharoshthi Vowel Modifiers

Table 14-5. Kharoshthi Consonant Modifiers

Table 14-6. Examples of Kharoshthi Virama

Figure 14-5. Subjoined Forms of ya

14.3 Bhaiksuki

14.4 Phags-pa

Figure 14-6. Phags-pa Syllable Om

Table 14-7. Phags-pa Positional Forms of I, U, E, and O

Table 14-8. Contextual Glyph Mirroring in Phags-pa

Table 14-9. Phags-pa Standardized Variants

Figure 14-7. Phags-pa Reversed Shaping

14.5 Marchen

14.6 Zanabazar Square

Figure 14-8. Conjunct Stacking in Zanabazar Square

14.7 Soyombo

14.8 Old Turkic

14.9 Old Sogdian

14.10 Sogdian

15 South and Central Asia-IV

15.1 Syloti Nagri

15.2 Kaithi

15.3 Sharada

15.4 Takri

Table 15-1. Takri Vowel Letters

15.5 Siddham

Figure 15-1. Siddham Consonant Cluster

Table 15-2. Siddham Punctuation Characters

15.6 Mahajani

15.7 Khojki

15.8 Khudawadi

Table 15-3. Khudawadi Vowel Letters

Table 15-4. Representation of Arabic Sounds in Khudawadi

15.9 Multani

15.10 Tirhuta

Table 15-5. Tirhuta Vowel Letters

15.11 Modi

Table 15-6. Modi Vowel Letters

Figure 15-2. Modi Shaping for ra

15.12 Grantha

Grantha: U+11300–U+1137F

Rendering Grantha

Figure 15-3. Splitting Large Conjunct Stacks in Grantha

Table 15-7. Rendering of Explicit Virama Forms in Grantha

Table 15-8. Additional Svara Marks used in Grantha

15.13 Ahom

15.14 Sora Sompeng

15.15 Dogra

16 Southeast Asia

16.1 Thai

Table 16-1. Glyph Positions in Thai Syllables

16.2 Lao

Table 16-2. Glyph Positions in Lao Syllables

16.3 Myanmar

Myanmar: U+1000–U+109F

Table 16-3. Modern Burmese Syllabic Structure

Myanmar Extended-A: U+AA60–U+AA7F

Khamti Shan

Table 16-4. Khamti Shan Tone Marks

Aiton and Phake

Myanmar Extended-B: U+A9E0–U+A9FF

16.4 Khmer

Khmer: U+1780–U+17FF

Principles of the Khmer Script

Table 16-5. Independent Khmer Vowel Characters

Table 16-6. Two Registers of Khmer Consonants

Table 16-7. Khmer Subscript Consonant Signs

Table 16-8. Khmer Composite Dependent Vowel Signs with Nikahit

Table 16-9. Khmer Subscript Independent Vowel Signs

Figure 16-1. Common Ligatures in Khmer

Figure 16-2. Common Multiple Forms in Khmer

Figure 16-3. Examples of Syllabic Order in Khmer

Figure 16-4. Ligation in Muul Style in Khmer

Khmer Symbols: U+19E0–U+19FF

16.5 Tai Le

Table 16-10. Tai Le Tone Marks

Table 16-11. Myanmar Digits in Tai Le

16.6 New Tai Lue

Table 16-12. New Tai Lue Vowel Placement

Table 16-13. New Tai Lue Registers and Tones

16.7 Tai Tham

16.8 Tai Viet

Table 16-14. Tai Viet Symbols and Punctuation

16.9 Kayah Li

16.10 Cham

Table 16-15. Cham Syllabic Structure

16.11 Pahawh Hmong

Figure 16-5. Pahawh Hmong Syllable Structure

16.12 Pau Cin Hau

16.13 Hanifi Rohingya

17 Indonesia and Oceania

17.1 Philippine Scripts

Tagalog: U+1700–U+171F

Hanunóo: U+1720–U+173F

Buhid: U+1740–U+175F

Tagbanwa: U+1760–U+177F

Principles of the Philippine Scripts

Table 17-1. Hanunóo and Buhid Vowel Sign Combinations

17.2 Buginese

Figure 17-1. Buginese Ligature

17.3 Balinese

Table 17-2. Balinese Base Consonants and Conjunct Forms

Table 17-3. Sasak Extensions for Balinese

Figure 17-2. Writing dharma in Balinese

Table 17-4. Balinese Consonant Clusters with u and u:

17.4 Javanese

Figure 17-3. Representation of Javanese Two-Part Vowels

17.5 Rejang

17.6 Batak

17.7 Sundanese

Sundanese: U+1B80–U+1BBF

Table 17-5. Modern Sundanese Syllabic Structure

Sundanese Supplement: U+1CC0–U+1CCF

17.8 Makasar

18 East Asia

18.1 Han

CJK Unified Ideographs

Blocks Containing Han Ideographs

Table 18-1. Blocks Containing Han Ideographs

Table 18-2. Small Extensions to the URO

General Characteristics of Han Ideographs

Table 18-3. Common Han Characters

Figure 18-1. Han Spelling

Figure 18-2. Semantic Context for Han Characters

Principles of Han Unification

Figure 18-3. Three-Dimensional Conceptual Model

Unification Rules

Figure 18-4. CJK Source Separation

Table 18-4. Source Encoding for Sword Variants

Figure 18-5. Not Cognates, Not Unified

Abstract Shape

Figure 18-6. Ideographic Component Structure

Figure 18-7. The Most Superior Node of an Ideographic Component

Table 18-5. Ideographs Not Unified

Table 18-6. Ideographs Unified

Han Ideograph Arrangement

Table 18-7. Han Ideograph Arrangement

Radical-Stroke Indices

Mappings for Han Ideographs

CJK Unified Ideographs Extension B: U+20000–U+2A6D6

CJK Unified Ideographs Extension C: U+2A700–U+2B734

CJK Unified Ideographs Extension D: U+2B740–U+2B81D

CJK Unified Ideographs Extension E: U+2B820–U+2CEA1

CJK Unified Ideographs Extension F: U+2CEB0–U+2EBE0

CJK Compatibility Ideographs: U+F900–U+FAFF

CJK Compatibility Supplement: U+2F800–U+2FA1D

Kanbun: U+3190–U+319F

Symbols Derived from Han Ideographs

CJK and KangXi Radicals: U+2E80–U+2FD5

CJK Additions from HKSCS and GB 18030

CJK Strokes: U+31C0–U+31EF

18.2 Ideographic Description Characters

Figure 18-8. Examples of Ideographic Description Characters

Figure 18-9. Using the Ideographic Description Characters

18.3 Bopomofo

Table 18-8. Mandarin Tone Marks

Table 18-9. Minnan and Hakka Tone Marks

18.4 Hiragana and Katakana

Hiragana: U+3040–U+309F

Katakana: U+30A0–U+30FF

Katakana Phonetic Extensions: U+31F0–U+31FF

Kana Supplement: U+1B000–U+1B0FF

Kana Extended-A: U+1B100–U+1B12F

Figure 18-10. Japanese Historic Kana for e and ye

Figure 18-11. Hentaigana Distinct Parent Ideographs

Figure 18-12. Other Hentaigana Examples

18.5 Halfwidth and Fullwidth Forms

18.6 Hangul

Hangul Jamo: U+1100–U+11FF

Hangul Jamo Extended-A: U+A960–U+A97F

Hangul Jamo Extended-B: U+D7B0–U+D7FF

Hangul Compatibility Jamo: U+3130–U+318F

Table 18-10. Separating Jamo Characters

Hangul Syllables: U+AC00–U+D7A3

Table 18-11. Line-Based Placement of Jungseong

18.7 Yi

18.8 Nüshu

18.9 Lisu

Table 18-12. Lisu Tone Letters

Table 18-13. Punctuation Adopted in Lisu Orthography

18.10 Miao

18.11 Tangut

Tangut: U+17000–U+187FF

Tangut Components: U+18800–U+18AFF

19 Africa

19.1 Ethiopic

Ethiopic: U+1200–U+137F

Table 19-1. Labialized Forms in Ethiopic -WAA

Table 19-2. Labialized Forms in Ethiopic -WE

Ethiopic Extensions

19.2 Osmanya

19.3 Tifinagh

Figure 19-1. Tifinagh Contextual Shaping

Figure 19-2. Tifinagh Consonant Joiner and Bi-consonants

19.4 N’Ko

Table 19-3. N’Ko Diacritic Usage

Table 19-4. N’Ko Tone Diacritics on Vowels

Figure 19-3. Examples of N’Ko Ordinals

Table 19-5. N’Ko Letter Shaping

19.5 Vai

19.6 Bamum

Bamum: U+A6A0–U+A6FF

Bamum Supplement: U+16800–U+16A3F

19.7 Bassa Vah

19.8 Mende Kikakui

Table 19-6. Number Formation in Mende Kikakui

19.9 Adlam

19.10 Medefaidrin

20 Americas

20.1 Cherokee

20.2 Canadian Aboriginal Syllabics

Canadian Aboriginal Syllabics: U+1400–U+167F

Canadian Aboriginal Syllabics Extended: U+18B0–U+18FF

20.3 Osage

Table 20-1. Combining Marks used in Osage

20.4 Deseret

Figure 20-1. Short Words Equivalent to Deseret Letter Names

Table 20-2. IPA Transcription of Deseret

21 Notational Systems

21.1 Braille

21.2 Western Musical Symbols

Figure 21-1. Examples of Specialized Music Layout

Figure 21-2. Precomposed Note Characters

Figure 21-3. Alternative Noteheads

Figure 21-4. Augmentation Dots and Articulation Symbols

Table 21-1. Examples of Ornamentation

21.3 Byzantine Musical Symbols

21.4 Ancient Greek Musical Notation

Table 21-2. Representation of Ancient Greek Vocal and Instrumental Notation

21.5 Duployan

Duployan: U+1BC00–U+1BC9F

Shorthand Format Controls: U+1BCA0–U+1BCAF

21.6 Sutton SignWriting

Sutton SignWriting: U+1D800–U+1DAAF

22 Symbols

22.1 Currency Symbols

Figure 22-1. Alternative Glyphs for Dollar Sign

Currency Symbols: U+20A0–U+20CF

Table 22-1. Currency Symbols Encoded in Other Blocks

22.2 Letterlike Symbols

Letterlike Symbols: U+2100–U+214F

Figure 22-2. Alternative Glyphs for Numero Sign

Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF

Mathematical Alphabets

Figure 22-3. Wide Mathematical Accents

Figure 22-4. Style Variants and Semantic Distinctions in Mathematics

Table 22-2. Mathematical Alphanumeric Symbols

Fonts Used for Mathematical Alphabets

Figure 22-5. Easily Confused Shapes for Mathematical Glyphs

Arabic Mathematical Alphabetic Symbols: U+1EE00–U+1EEFF

22.3 Numerals

Decimal Digits

Table 22-3. Script-Specific Decimal Digits

Figure 22-6. CJK Ideographic Numbers

Other Digits

Table 22-4. Compatibility Digits

Figure 22-7. Regular and Old Style Digits

Non-Decimal Radix Systems

Acrophonic Systems and Other Letter-based Numbers

Coptic Epact Numbers: U+102E0–U+102FF

Rumi Numeral Symbols: U+10E60–U+10E7E

Siyaq Numerical Notation Systems

CJK Numerals

Fractions

Figure 22-8. Alternate Forms of Vulgar Fractions

Common Indic Number Forms: U+A830–U+A83F

22.4 Superscript and Subscript Symbols

Superscripts and Subscripts: U+2070–U+209F

22.5 Mathematical Symbols

Mathematical Operators: U+2200–U+22FF

Table 22-5. Mathematical Operators Disunified from Punctuation

Supplements to Mathematical Symbols and Arrows

Supplemental Mathematical Operators: U+2A00–U+2AFF

Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF

Miscellaneous Mathematical Symbols-B: U+2980–U+29FF

Miscellaneous Symbols and Arrows: U+2B00–U+2B7F

Arrows: U+2190–U+21FF

Supplemental Arrows

Standardized Variants of Mathematical Symbols

22.6 Invisible Mathematical Operators

22.7 Technical Symbols

Control Pictures: U+2400–U+243F

Miscellaneous Technical: U+2300–U+23FF

Figure 22-9. Usage of Crops and Quine Corners

Table 22-6. Use of Mathematical Symbol Pieces

Figure 22-10. Usage of the Decimal Exponent Symbol

Optical Character Recognition: U+2440–U+245F

22.8 Geometrical Symbols

Box Drawing and Block Elements

Geometric Shapes: U+25A0–U+25FF

Geometric Shapes Extended: U+1F780–U+1F7FF

Table 22-7. Geometric Shape Collections

22.9 Miscellaneous Symbols

Miscellaneous Symbols and Pictographs

Emoticons: U+1F600–U+1F64F

Transport and Map Symbols: U+1F680–U+1F6FF

Dingbats: U+2700–U+27BF

Ornamental Dingbats: U+1F650–U+1F67F

Alchemical Symbols: U+1F700–U+1F77F

Mahjong Tiles: U+1F000–U+1F02F

Domino Tiles: U+1F030–U+1F09F

Playing Cards: U+1F0A0–U+1F0FF

Chess Symbols: U+1FA00–U+1FA6F

Yijing Hexagram Symbols: U+4DC0–U+4DFF

Tai Xuan Jing Symbols: U+1D300–U+1D356

Ancient Symbols: U+10190–U+101CF

Phaistos Disc Symbols: U+101D0–U+101FF

22.10 Enclosed and Square

Enclosed Alphanumerics: U+2460–U+24FF

Enclosed CJK Letters and Months: U+3200–U+32FF

CJK Compatibility: U+3300–U+33FF

Table 22-8. Japanese Era Names

Enclosed Alphanumeric Supplement: U+1F100–U+1F1FF

Enclosed Ideographic Supplement: U+1F200–U+1F2FF

23 Special Areas and Format Characters

23.1 Control Codes

Representing Control Sequences

Specification of Control Code Semantics

Table 23-1. Control Codes Specified in the Unicode Standard

23.2 Layout Controls

Line and Word Breaking

Table 23-2. Letter Spacing

Cursive Connection and Ligatures

Figure 23-1. Prevention of Joining

Figure 23-2. Exhibition of Joining Glyphs in Isolation

Figure 23-3. Effect of Intervening Joiners

Combining Grapheme Joiner

Bidirectional Ordering Controls

Table 23-3. Bidirectional Ordering Controls

Stateful Format Controls

Table 23-4. Paired Stateful Controls

Table 23-5. Paired Stateful Controls (Deprecated)

23.3 Deprecated Format Characters

23.4 Variation Selectors

23.5 Private-Use Characters

Private Use Area: U+E000–U+F8FF

Supplementary Private Use Areas

23.6 Surrogates Area

23.7 Noncharacters

23.8 Specials

Byte Order Mark (BOM): U+FEFF

Table 23-6. Unicode Encoding Scheme Signatures

Table 23-7. U+FEFF Signature in Other Charsets

Specials: U+FFF0–U+FFF8

Annotation Characters: U+FFF9–U+FFFB

Figure 23-4. Annotation Characters

Replacement Characters: U+FFFC–U+FFFD

23.9 Tag Characters

Tag Characters: U+E0000–U+E007F

Deprecated Use for Language Tagging

Syntax for Embedding Tags

Figure 23-5. Tag Characters

Working with Language Tags

Unicode Conformance Issues

Formal Tag Syntax

24 About the Code Charts

24.1 Character Names List

Images in the Code Charts and Character Lists

Special Characters and Code Points

Character Names

Informative Aliases

Normative Aliases

Cross References

Information About Languages

Case Mappings

Decompositions

Standardized Variation Sequences

Positional Forms

Figure 24-1. Mongolian Positional Forms

Block Headers

Subheads

24.2 CJK Ideographs

CJK Unified Ideographs

Table 24-1. IRG Sources

Figure 24-2. CJK Chart Format for the Main CJK Block

Figure 24-3. CJK Chart Format for CJK Extension A

Figure 24-4. CJK Chart Format for CJK Extension B

Compatibility Ideographs

Figure 24-5. CJK Chart Format for Compatibility Ideographs

Figure 24-6. Annotations Identifying CJK Unified Ideographs

24.3 Hangul Syllables

A Notational Conventions

Code Points

Character Names

Character Blocks

Sequences

Rendering

Figure A-1. Example of Rendering

Properties and Property Values

Miscellaneous

Extended BNF

Table A-1. Extended BNF

Table A-2. Character Class Examples

Operators

Table A-3. Operators

B Unicode Publications and Resources

B.1 The Unicode Consortium

The Unicode Technical Committee

Other Activities

B.2 Unicode Publications

B.3 Other Unicode Online Resources

Unicode Online Resources

How to Contact the Unicode Consortium

C Relationship to ISO/IEC 10646

C.1 History

Table C-1. Timeline

C.2 Encoding Forms in ISO/IEC 10646

Zero Extending

Table C-2. Zero Extending

C.3 UTF-8 and UTF-16

UTF-8

UTF-16

C.4 Synchronization of the Standards

C.5 Identification of Features for Unicode

C.6 Character Names

C.7 Character Functional Specifications

D Version History of the Standard

Table D-1. Versions of Unicode and ISO/IEC 10646

Table D-2. Allocation of Code Points by Type (Versions 1.0.0 to 3.0)

Table D-3. Allocation of Code Points by Type (Versions 3.1 to 5.1)

Table D-4. Allocation of Code Points by Type (Versions 5.2 to 7.0)

Table D-5. Allocation of Code Points by Type (Versions 8.0 to 11.0)

E Han Unification History

E.1 Development of the URO

E.2 Ideographic Rapporteur Group

E.3 CJK Sources

F Documentation of CJK Strokes

Table F-1. CJK Strokes

I Index

A

B

C

D

E

F

G

H

I

J

K

L

M

N

O

P

Q

R

S

T

U

V

W

X

Y

Z