Unicode 11.0 Core Specification Bookmarks
This page links to the sections, tables, and figures of the core specification for The Unicode Standard, Version 11.0. See Unicode 11.0.0 for full context about the Unicode Standard.
- Preface
- 1 Introduction
- 2 General Structure
- 2.1 Architectural Context
- 2.2 Unicode Design Principles
- 2.3 Compatibility Characters
- 2.4 Code Points and Characters
- 2.5 Encoding Forms
- 2.6 Encoding Schemes
- 2.7 Unicode Strings
- 2.8 Unicode Allocation
- 2.9 Details of Allocation
- 2.10 Writing Direction
- 2.11 Combining Characters
- 2.12 Equivalent Sequences
- 2.13 Special Characters
- 2.14 Conforming to the Unicode Standard
- 3 Conformance
- 3.1 Versions of the Unicode Standard
- 3.2 Conformance Requirements
- 3.3 Semantics
- 3.4 Characters and Encoding
- 3.5 Properties
- 3.6 Combination
- 3.7 Decomposition
- 3.8 Surrogates
- 3.9 Unicode Encoding Forms
- 3.10 Unicode Encoding Schemes
- 3.11 Normalization Forms
- 3.12 Conjoining Jamo Behavior
- 3.13 Default Case Algorithms
- 4 Character Properties
- 5 Implementation Guidelines
- 5.1 Data Structures for Character Conversion
- 5.2 Programming Languages and Data Types
- 5.3 Unknown and Missing Characters
- 5.4 Handling Surrogate Pairs in UTF-16
- 5.5 Handling Numbers
- 5.6 Normalization
- 5.7 Compression
- 5.8 Newline Guidelines
- 5.9 Regular Expressions
- 5.10 Language Information in Plain Text
- 5.11 Editing and Selection
- 5.12 Strategies for Handling Nonspacing Marks
- 5.13 Rendering Nonspacing Marks
- 5.14 Locating Text Element Boundaries
- 5.15 Identifiers
- 5.16 Sorting and Searching
- 5.17 Binary Order
- 5.18 Case Mappings
- 5.19 Mapping Compatibility Variants
- 5.20 Unicode Security
- 5.21 Ignoring Characters in Processing
- 5.22 U+FFFD Substitution in Conversion
- 6 Writing Systems and Punctuation
- 6.1 Writing Systems
- 6.2 General Punctuation
- Figure 6-2. Forms of CJK Punctuation
- Blocks Devoted to Punctuation
- Format Control Characters
- Space Characters
- Dashes and Hyphens
- Paired Punctuation
- Language-Based Usage of Quotation Marks
- Apostrophes
- Other Punctuation
- Archaic Punctuation and Editorial Marks
- Indic Punctuation
- CJK Punctuation
- Unknown or Unavailable Ideographs
- CJK Compatibility Forms
- 7 Europe-I
- 7.1 Latin
- Figure 7-1. Alternative Glyphs in Latin
- Table 7-1. Preferred Rendering of Cedilla versus Comma Below
- Figure 7-2. Diacritics on i and j
- Figure 7-3. Vietnamese Letters and Tone Marks
- Letters of Basic Latin: U+0041–U+007A
- Letters of the Latin-1 Supplement: U+00C0–U+00FF
- Latin Extended-A: U+0100–U+017F
- Latin Extended-B: U+0180–U+024F
- IPA Extensions: U+0250–U+02AF
- Phonetic Extensions: U+1D00–U+1DBF
- Latin Extended Additional: U+1E00–U+1EFF
- Latin Extended-C: U+2C60–U+2C7F
- Latin Extended-D: U+A720–U+A7FF
- Latin Extended-E: U+AB30–U+AB6F
- Latin Ligatures: U+FB00–U+FB06
- 7.2 Greek
- 7.3 Coptic
- 7.4 Cyrillic
- 7.5 Glagolitic
- 7.6 Armenian
- 7.7 Georgian
- 7.8 Modifier Letters
- 7.9 Combining Marks
- Figure 7-9. Double Diacritics
- Figure 7-10. Positioning of Double Diacritics
- Figure 7-11. Use of CGJ with Double Diacritics
- Figure 7-12. Interaction of Combining Marks with Ligatures
- Combining Diacritical Marks: U+0300–U+036F
- Combining Diacritical Marks Extended: U+1AB0–U+1AFF
- Combining Diacritical Marks Supplement: U+1DC0–U+1DFF
- Combining Diacritical Marks for Symbols: U+20D0–U+20FF
- Combining Half Marks: U+FE20–U+FE2F
- Combining Marks in Other Blocks
- 8 Europe-II
- 9 Middle East-I
- 9.1 Hebrew
- 9.2 Arabic
- Arabic: U+0600–U+06FF
- Figure 9-1. Directionality and Cursive Connection
- Figure 9-2. Using a Joiner
- Figure 9-3. Using a Non-joiner
- Figure 9-4. Combinations of Joiners and Non-joiners
- Figure 9-5. Placement of Harakat
- Table 9-1. Arabic Digit Names
- Table 9-2. Glyph Variation in Eastern Arabic-Indic Digits
- Figure 9-6. Arabic Year Sign
- Arabic Cursive Joining
- Arabic Ligatures
- Arabic Joining Groups
- Combining Hamza
- Other Letters for Extended Arabic
- Arabic Supplement: U+0750–U+077F
- Arabic Extended-A: U+08A0–U+08FF
- Arabic Presentation Forms-A: U+FB50–U+FDFF
- Arabic Presentation Forms-B: U+FE70–U+FEFF
- 9.3 Syriac
- 9.4 Samaritan
- 9.5 Mandaic
- 10 Middle East-II
- 11 Cuneiform and Hieroglyphs
- 12 South and Central Asia-I
- 12.1 Devanagari
- Devanagari: U+0900–U+097F
- Principles of the Devanagari Script
- Table 12-1. Devanagari Vowel Letters
- Figure 12-1. Dead Consonants in Devanagari
- Table 12-2. Devanagari Atomic Consonants
- Figure 12-2. Conjunct Formations in Devanagari
- Figure 12-3. Multi-Consonant Conjuncts in Devanagari
- Table 12-3. Devanagari Consonant Conjuncts
- Figure 12-4. Preventing Conjunct Forms in Devanagari
- Figure 12-5. Half-Consonants in Devanagari
- Figure 12-6. Independent Half-Forms in Devanagari
- Figure 12-7. Half-Consonants in Oriya
- Figure 12-8. Consonant Forms in Devanagari and Oriya
- Rendering Devanagari
- Devanagari Digits, Punctuation, and Symbols
- Extensions in the Main Devanagari Block
- Devanagari Extended: U+A8E0–U+A8FF
- Vedic Extensions: U+1CD0–U+1CFF
- 12.2 Bengali (Bangla)
- Table 12-11. Bengali Vowel Letters
- Table 12-12. Diphthong Vowel Letters in Kokborok
- Table 12-13. Assamese Consonant-Vowel Combinations
- Table 12-14. Bengali Consonant-Vowel Combinations
- Figure 12-12. Requesting Bengali Consonant-Vowel Ligature
- Figure 12-13. Blocking Bengali Consonant-Vowel Ligature
- Figure 12-14. Bengali Syllable tta
- Table 12-15. Use of Apostrophe in Bangla
- 12.3 Gurmukhi
- 12.4 Gujarati
- 12.5 Oriya (Odia)
- 12.6 Tamil
- Tamil: U+0B80–U+0BFF
- Tamil Vowels
- Tamil Ligatures
- Figure 12-20. Tamil Ligatures with i
- Table 12-26. Tamil Ligatures with u
- Figure 12-21. Spacing Forms of Tamil u
- Figure 12-22. Tamil Ligatures with ra
- Figure 12-23. Tamil Ligatures for shri
- Figure 12-24. Traditional Tamil Ligatures with aa
- Figure 12-25. Traditional Tamil Ligatures with o
- Figure 12-26. Traditional Tamil Ligatures with ai
- Figure 12-27. Vowel ai in Modern Tamil
- Tamil Named Character Sequences
- 12.7 Telugu
- 12.8 Kannada
- 12.9 Malayalam
- 13 South and Central Asia-II
- 14 South and Central Asia-III
- 15 South and Central Asia-IV
- 16 Southeast Asia
- 16.1 Thai
- 16.2 Lao
- 16.3 Myanmar
- 16.4 Khmer
- Khmer: U+1780–U+17FF
- Principles of the Khmer Script
- Table 16-5. Independent Khmer Vowel Characters
- Table 16-6. Two Registers of Khmer Consonants
- Table 16-7. Khmer Subscript Consonant Signs
- Table 16-8. Khmer Composite Dependent Vowel Signs with Nikahit
- Table 16-9. Khmer Subscript Independent Vowel Signs
- Figure 16-1. Common Ligatures in Khmer
- Figure 16-2. Common Multiple Forms in Khmer
- Figure 16-3. Examples of Syllabic Order in Khmer
- Figure 16-4. Ligation in Muul Style in Khmer
- Khmer Symbols: U+19E0–U+19FF
- 16.5 Tai Le
- 16.6 New Tai Lue
- 16.7 Tai Tham
- 16.8 Tai Viet
- 16.9 Kayah Li
- 16.10 Cham
- 16.11 Pahawh Hmong
- 16.12 Pau Cin Hau
- 16.13 Hanifi Rohingya
- 17 Indonesia and Oceania
- 18 East Asia
- 18.1 Han
- CJK Unified Ideographs
- Blocks Containing Han Ideographs
- General Characteristics of Han Ideographs
- Principles of Han Unification
- Unification Rules
- Abstract Shape
- Han Ideograph Arrangement
- Radical-Stroke Indices
- Mappings for Han Ideographs
- CJK Unified Ideographs Extension B: U+20000–U+2A6D6
- CJK Unified Ideographs Extension C: U+2A700–U+2B734
- CJK Unified Ideographs Extension D: U+2B740–U+2B81D
- CJK Unified Ideographs Extension E: U+2B820–U+2CEA1
- CJK Unified Ideographs Extension F: U+2CEB0–U+2EBE0
- CJK Compatibility Ideographs: U+F900–U+FAFF
- CJK Compatibility Supplement: U+2F800–U+2FA1D
- Kanbun: U+3190–U+319F
- Symbols Derived from Han Ideographs
- CJK and KangXi Radicals: U+2E80–U+2FD5
- CJK Additions from HKSCS and GB 18030
- CJK Strokes: U+31C0–U+31EF
- 18.2 Ideographic Description Characters
- 18.3 Bopomofo
- 18.4 Hiragana and Katakana
- 18.5 Halfwidth and Fullwidth Forms
- 18.6 Hangul
- 18.7 Yi
- 18.8 Nüshu
- 18.9 Lisu
- 18.10 Miao
- 18.11 Tangut
- 19 Africa
- 20 Americas
- 21 Notational Systems
- 22 Symbols
- 22.1 Currency Symbols
- 22.2 Letterlike Symbols
- 22.3 Numerals
- 22.4 Superscript and Subscript Symbols
- 22.5 Mathematical Symbols
- Mathematical Operators: U+2200–U+22FF
- Supplements to Mathematical Symbols and Arrows
- Supplemental Mathematical Operators: U+2A00–U+2AFF
- Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF
- Miscellaneous Mathematical Symbols-B: U+2980–U+29FF
- Miscellaneous Symbols and Arrows: U+2B00–U+2B7F
- Arrows: U+2190–U+21FF
- Supplemental Arrows
- Standardized Variants of Mathematical Symbols
- 22.6 Invisible Mathematical Operators
- 22.7 Technical Symbols
- 22.8 Geometrical Symbols
- 22.9 Miscellaneous Symbols
- Miscellaneous Symbols and Pictographs
- Emoticons: U+1F600–U+1F64F
- Transport and Map Symbols: U+1F680–U+1F6FF
- Dingbats: U+2700–U+27BF
- Ornamental Dingbats: U+1F650–U+1F67F
- Alchemical Symbols: U+1F700–U+1F77F
- Mahjong Tiles: U+1F000–U+1F02F
- Domino Tiles: U+1F030–U+1F09F
- Playing Cards: U+1F0A0–U+1F0FF
- Chess Symbols: U+1FA00–U+1FA6F
- Yijing Hexagram Symbols: U+4DC0–U+4DFF
- Tai Xuan Jing Symbols: U+1D300–U+1D356
- Ancient Symbols: U+10190–U+101CF
- Phaistos Disc Symbols: U+101D0–U+101FF
- 22.10 Enclosed and Square
- 23 Special Areas and Format Characters
- 23.1 Control Codes
- 23.2 Layout Controls
- 23.3 Deprecated Format Characters
- 23.4 Variation Selectors
- 23.5 Private-Use Characters
- 23.6 Surrogates Area
- 23.7 Noncharacters
- 23.8 Specials
- 23.9 Tag Characters
- 24 About the Code Charts
- 24.1 Character Names List
- 24.2 CJK Ideographs
- 24.3 Hangul Syllables
- A Notational Conventions
- B Unicode Publications and Resources
- C Relationship to ISO/IEC 10646
- D Version History of the Standard
- Table D-1. Versions of Unicode and ISO/IEC 10646
- Table D-2. Allocation of Code Points by Type (Versions 1.0.0 to 3.0)
- Table D-3. Allocation of Code Points by Type (Versions 3.1 to 5.1)
- Table D-4. Allocation of Code Points by Type (Versions 5.2 to 7.0)
- Table D-5. Allocation of Code Points by Type (Versions 8.0 to 11.0)
- E Han Unification History
- F Documentation of CJK Strokes
- I Index