Basic Latin (Unicode block)
| Basic Latin or C0 Controls and Basic Latin | |
|---|---|
| Range | U+0000..U+007F (128 code points) |
| Plane | BMP |
| Scripts | Latin (52 characters), Common (76 characters) |
| Major alphabets | English, French, German, Spanish, Vietnamese |
| Symbol sets | Arabic numerals, Punctuation |
| Assigned | 128 code points, including 33 Control or Format |
| Unused | 0 reserved code points |
| Source standards | ISO/IEC 8859, ISO 646 |
| Unicode version history | |
| 1.0.0 (1991) | 128 (+128) |
| Note: [1] [2] | |
The Basic Latin Unicode block,[3] sometimes informally called C0 Controls and Basic Latin,[4] is the first block of the Unicode standard and the only block whose characters are encoded in a single byte in UTF-8. The block contains all the characters and control codes of the ASCII encoding. It ranges from U+0000 to U+007F, contains 128 characters, and comprises the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and one further control character (Delete).
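The one-byte property can be checked directly. The following is a minimal sketch in Python; the names `BASIC_LATIN` and `utf8_length` are illustrative and not part of any standard.

```python
# Range of the Basic Latin block: U+0000..U+007F (128 code points).
BASIC_LATIN = range(0x0000, 0x0080)


def utf8_length(code_point: int) -> int:
    """Return the number of bytes UTF-8 uses for one code point."""
    return len(chr(code_point).encode("utf-8"))


# Code points of the block that occupy exactly one byte in UTF-8.
one_byte = [cp for cp in BASIC_LATIN if utf8_length(cp) == 1]

# For this block the UTF-8 bytes are identical to the ASCII bytes.
same_as_ascii = all(
    chr(cp).encode("utf-8") == chr(cp).encode("ascii") for cp in BASIC_LATIN
)

# The first code point after the block (U+0080) already needs two bytes.
first_outside = utf8_length(0x0080)

print(len(one_byte), same_as_ascii, first_outside)
```

Because the UTF-8 bytes match the ASCII bytes, any pure ASCII file is also a valid UTF-8 file.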
The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.[5] Its block name in Unicode 1.0 was ASCII.[6]
Table of characters
- A: The character U+005C (\) may be displayed as a yen sign (¥) or won sign (₩) in Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.[7]
Subheadings
The C0 Controls and Basic Latin block contains six subheadings.[8]
C0 controls
The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.[8]
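The C0 controls have no formal character names in the Unicode Character Database, which is why they are identified by aliases. The sketch below uses Python's standard `unicodedata` module to show this; the variable names are illustrative.

```python
import unicodedata

# The 32 C0 controls occupy U+0000..U+001F.
C0_CONTROLS = [chr(cp) for cp in range(0x00, 0x20)]

# Every C0 control has General Category "Cc" (Other, control).
categories = {unicodedata.category(ch) for ch in C0_CONTROLS}

# unicodedata.name() returns the supplied default (None here) when a
# character has no formal name, which is the case for all C0 controls.
names = {unicodedata.name(ch, None) for ch in C0_CONTROLS}

print(categories, names)
```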
ASCII punctuation and symbols
This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.[8]
ASCII digits
The ASCII digits subheading contains the standard European digits 0–9.[8]
Uppercase Latin alphabet
The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.[8]
Lowercase Latin alphabet
The Lowercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.[8]
Control character
The Control character subheading contains the "Delete" character.[8]
Number of symbols, letters and control codes
The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.
| Subheading | Number of characters | Range of characters |
|---|---|---|
| C0 controls | 32 control codes | U+0000 to U+001F |
| ASCII punctuation and symbols | 33 punctuation marks and symbols | U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E |
| ASCII digits | 10 digits | U+0030 to U+0039 |
| Uppercase Latin alphabet | 26 unaccented Latin letters in the majuscule | U+0041 to U+005A |
| Lowercase Latin alphabet | 26 unaccented Latin letters in the minuscule | U+0061 to U+007A |
| Control character | 1 control code (the "Delete" character) | U+007F |
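The counts in the table can be reproduced from the listed ranges. The sketch below hard-codes those ranges in a hypothetical `SUBHEADINGS` mapping and tallies all 128 code points; it does not read the ranges from any Unicode data file.

```python
from collections import Counter

# Inclusive code point ranges for each subheading, copied from the table.
SUBHEADINGS = {
    "C0 controls": [(0x00, 0x1F)],
    "ASCII punctuation and symbols": [
        (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E),
    ],
    "ASCII digits": [(0x30, 0x39)],
    "Uppercase Latin alphabet": [(0x41, 0x5A)],
    "Lowercase Latin alphabet": [(0x61, 0x7A)],
    "Control character": [(0x7F, 0x7F)],
}


def subheading_of(code_point: int) -> str:
    """Return the subheading whose ranges contain the code point."""
    for name, ranges in SUBHEADINGS.items():
        if any(low <= code_point <= high for low, high in ranges):
            return name
    raise ValueError(f"U+{code_point:04X} is outside the Basic Latin block")


# Tally every code point in U+0000..U+007F by subheading.
counts = Counter(subheading_of(cp) for cp in range(0x80))

for name in SUBHEADINGS:
    print(f"{name}: {counts[name]}")
```

Every code point in the block falls under exactly one subheading, so the six counts sum to 128.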
Chart
- ^ As of Unicode version 17.0
Variants
Several of the characters are defined to render as a standardized variant when followed by a variation selector.
A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).[9] [10]
Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.[11] [12] [13] [14] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".[10]
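The sequences described above are ordinary strings of code points. The following sketch builds them in Python; the helper names (`text_variant`, `emoji_variant`, `keycap`, `code_points`) are illustrative, and whether a given sequence renders as intended depends on font and platform support.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1
VS15 = "\uFE0E"    # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

# The twelve Basic Latin characters that take emoji variation selectors.
KEYCAP_BASES = "#*0123456789"


def _check(base: str) -> None:
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"{base!r} is not one of the twelve base characters")


def text_variant(base: str) -> str:
    """Base character followed by VS15."""
    _check(base)
    return base + VS15


def emoji_variant(base: str) -> str:
    """Base character followed by VS16."""
    _check(base)
    return base + VS16


def keycap(base: str) -> str:
    """Keycap sequence: base, VS16, COMBINING ENCLOSING KEYCAP."""
    return emoji_variant(base) + KEYCAP


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Zero with a short diagonal stroke: DIGIT ZERO followed by VS1.
slashed_zero = "0" + VS1

print(code_points(keycap("#")))
print(code_points(slashed_zero))
```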
History
The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:
| Version | Final code points[a] | Count | UTC ID | L2 ID | WG2 ID | Document |
|---|---|---|---|---|---|---|
| 1.0.0 | U+0000..007F | 128 | | | | (to be determined) |
| | | | UTC/1999-013 | | | Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions |
| | | | | L2/99-176R | | Moore, Lisa (1999-11-04), "Micro Sign Case Mappings", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999 |
| | | | | L2/04-145 | | Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey) |
| | | | | L2/04-202 | | Anderson, Deborah (2004-06-07), Slashed C Feedback |
| | | | | | N3046 | Suignard, Michel (2006-02-22), Improving formal definition for control characters |
| | | | | | N3103 (pdf, doc) | Umamaheswaran, V. S. (2006-08-25), "M48.33", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27 |
| | | | | L2/11-043 | | Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters |
| | | | | L2/11-160 | | PRI #181 Changing General Category of Twelve Characters, 2011-05-02 |
| | | | | L2/11-261R2 | | Moore, Lisa (2011-08-16), "Consensus 128-C3", UTC #128 / L2 #225 Minutes, Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL. |
| | | | | L2/11-438 [b] [c] | N4182 | Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429) |
| | | | | L2/15-107 | | Moore, Lisa (2015-05-12), "Consensus 143-C5", UTC #143 Minutes, Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0. |
| | | | | L2/15-268 | | Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set |
| | | | | L2/15-301 [d] [c] | | Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji |
| | | | | L2/15-254 | | Moore, Lisa (2015-11-16), "B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set", UTC #145 Minutes |
| | | | | L2/17-294 | N4914 | Lunde, Ken (2017-08-14), Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO |
| | | | | L2/22-019 | | Scherer, Markus; et al. (2022-01-19), "F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt", UTC #170 properties feedback & recommendations |
| | | | | L2/22-016 | | Constable, Peter (2022-04-21), "Consensus 170-C24", UTC #170 Minutes, For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0. |
See also
- Latin script in Unicode
- Latin-1 Supplement
- Character encoding
- ISO/IEC 8859-1
- Latin script
- ISO basic Latin alphabet
References
- ^ "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
- ^ "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
- ^ "block.txt". The Unicode Consortium. Retrieved 2023-03-23.
- ^ "C0 Controls and Basic Latin" (PDF). The Unicode Standard, Version 15.0. Unicode, Inc. 2022. Retrieved 2023-03-22.
- ^ The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
- ^ "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
- ^ Michael S. Kaplan (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. Archived from the original on 2010-06-12. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
- ^ a b c d e f g "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
- ^ Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
- ^ a b "UTS #51 Emoji Variation Sequences". The Unicode Consortium.
- ^ Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)" (PDF).
- ^ Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji" (PDF).
- ^ "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
- ^ "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.