Basic Latin (Unicode block)

Unicode character block

Basic Latin or C0 Controls and Basic Latin
Range	U+0000..U+007F (128 code points)
Plane	BMP
Scripts	Latin (52 characters) Common (76 characters)
Major alphabets	English French German Spanish Vietnamese
Symbol sets	Arabic numerals Punctuation
Assigned	128 code points 33 Control or Format
Unused	0 reserved code points
Source standards	ISO/IEC 8859, ISO 646
Unicode version history

1.0.0 (1991)	128 (+128)
Unicode documentation
Code chart ∣ Web page
Note: ^[1]^[2]

The Basic Latin Unicode block,^[3] sometimes informally called C0 Controls and Basic Latin,^[4] is the first block of the Unicode Standard and the only block whose code points are each encoded as a single byte in UTF-8. The block contains every character and control code of the ASCII encoding. It spans U+0000 to U+007F (128 code points) and comprises the C0 controls, ASCII punctuation and symbols, ASCII digits, the uppercase and lowercase letters of the English alphabet, and the Delete control character.
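
The single-byte property can be checked with a short Python sketch. It relies only on the built-in `str.encode`; the names `BASIC_LATIN`, `utf8_length` and `single_byte` are illustrative and do not come from the Unicode Standard.

```python
# Every Basic Latin code point (U+0000..U+007F) should encode to exactly one
# UTF-8 byte whose value equals the code point itself.
BASIC_LATIN = range(0x00, 0x80)


def utf8_length(code_point: int) -> int:
    """Return the number of bytes used to encode one code point in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


# True only if each code point maps to the single byte with the same value.
single_byte = all(chr(cp).encode("utf-8") == bytes([cp]) for cp in BASIC_LATIN)

print(single_byte)        # True
print(utf8_length(0x7F))  # 1 (last code point of the block)
print(utf8_length(0x80))  # 2 (first code point after the block)
```

The last line shows that the first code point outside the block, U+0080, already needs two bytes.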

The Basic Latin block was included in its present form from version 1.0.0 of the Unicode Standard, without addition or alteration of the character repertoire.^[5] Its block name in Unicode 1.0 was ASCII.^[6]

Table of characters

Code	Result	Description	Acronym
C0 controls
U+0000		Null character	NUL
U+0001		Start of Heading	SOH
U+0002		Start of Text	STX
U+0003		End-of-text character	ETX
U+0004		End-of-transmission character	EOT
U+0005		Enquiry character	ENQ
U+0006		Acknowledge character	ACK
U+0007		Bell character	BEL
U+0008		Backspace	BS
U+0009		Horizontal tab	HT
U+000A		Line feed	LF
U+000B		Vertical tab	VT
U+000C		Form feed	FF
U+000D		Carriage return	CR
U+000E		Shift Out	SO
U+000F		Shift In	SI
U+0010		Data Link Escape	DLE
U+0011		Device Control 1	DC1
U+0012		Device Control 2	DC2
U+0013		Device Control 3	DC3
U+0014		Device Control 4	DC4
U+0015		Negative-acknowledge character	NAK
U+0016		Synchronous Idle	SYN
U+0017		End of Transmission Block	ETB
U+0018		Cancel character	CAN
U+0019		End of Medium	EM
U+001A		Substitute character	SUB
U+001B		Escape character	ESC
U+001C		File Separator	FS
U+001D		Group Separator	GS
U+001E		Record Separator	RS
U+001F		Unit Separator	US
ASCII punctuation and symbols
U+0020		Space	SP
U+0021	!	Exclamation mark	EXC
U+0022	"	Quotation mark	QUO
U+0023	#	Number sign
U+0024	$	Dollar sign
U+0025	%	Percent sign
U+0026	&	Ampersand
U+0027	'	Apostrophe
U+0028	(	Left parenthesis
U+0029	)	Right parenthesis
U+002A	*	Asterisk
U+002B	+	Plus sign
U+002C	,	Comma
U+002D	-	Hyphen-minus
U+002E	.	Full stop or period
U+002F	/	Solidus or Slash
ASCII digits
U+0030	0	Digit Zero
U+0031	1	Digit One
U+0032	2	Digit Two
U+0033	3	Digit Three
U+0034	4	Digit Four
U+0035	5	Digit Five
U+0036	6	Digit Six
U+0037	7	Digit Seven
U+0038	8	Digit Eight
U+0039	9	Digit Nine
ASCII punctuation and symbols
U+003A	:	Colon
U+003B	;	Semicolon
U+003C	<	Less-than sign
U+003D	=	Equal sign
U+003E	>	Greater-than sign
U+003F	?	Question mark
U+0040	@	At sign or Commercial at
Uppercase Latin alphabet
U+0041	A	Latin Capital letter A
U+0042	B	Latin Capital letter B
U+0043	C	Latin Capital letter C
U+0044	D	Latin Capital letter D
U+0045	E	Latin Capital letter E
U+0046	F	Latin Capital letter F
U+0047	G	Latin Capital letter G
U+0048	H	Latin Capital letter H
U+0049	I	Latin Capital letter I
U+004A	J	Latin Capital letter J
U+004B	K	Latin Capital letter K
U+004C	L	Latin Capital letter L
U+004D	M	Latin Capital letter M
U+004E	N	Latin Capital letter N
U+004F	O	Latin Capital letter O
U+0050	P	Latin Capital letter P
U+0051	Q	Latin Capital letter Q
U+0052	R	Latin Capital letter R
U+0053	S	Latin Capital letter S
U+0054	T	Latin Capital letter T
U+0055	U	Latin Capital letter U
U+0056	V	Latin Capital letter V
U+0057	W	Latin Capital letter W
U+0058	X	Latin Capital letter X
U+0059	Y	Latin Capital letter Y
U+005A	Z	Latin Capital letter Z
ASCII punctuation and symbols
U+005B	[	Left Square Bracket
U+005C	\	Backslash ^[A]
U+005D	]	Right Square Bracket
U+005E	^	Circumflex accent
U+005F	_	Low line
U+0060	`	Grave accent
Lowercase Latin alphabet
U+0061	a	Latin Small Letter A
U+0062	b	Latin Small Letter B
U+0063	c	Latin Small Letter C
U+0064	d	Latin Small Letter D
U+0065	e	Latin Small Letter E
U+0066	f	Latin Small Letter F
U+0067	g	Latin Small Letter G
U+0068	h	Latin Small Letter H
U+0069	i	Latin Small Letter I
U+006A	j	Latin Small Letter J
U+006B	k	Latin Small Letter K
U+006C	l	Latin Small Letter L
U+006D	m	Latin Small Letter M
U+006E	n	Latin Small Letter N
U+006F	o	Latin Small Letter O
U+0070	p	Latin Small Letter P
U+0071	q	Latin Small Letter Q
U+0072	r	Latin Small Letter R
U+0073	s	Latin Small Letter S
U+0074	t	Latin Small Letter T
U+0075	u	Latin Small Letter U
U+0076	v	Latin Small Letter V
U+0077	w	Latin Small Letter W
U+0078	x	Latin Small Letter X
U+0079	y	Latin Small Letter Y
U+007A	z	Latin Small Letter Z
ASCII punctuation and symbols
U+007B	{	Left Curly Bracket
U+007C	|	Vertical bar
U+007D	}	Right Curly Bracket
U+007E	~	Tilde
Control character
U+007F	␡	Delete	DEL

^A The character U+005C (\) may be displayed as a yen sign (¥) or a won sign (₩) by Japanese or Korean fonts that treat Unicode text (especially UTF-8) as a legacy character set in which the backslash was replaced by these signs.^[7]

Subheadings

The C0 Controls and Basic Latin block contains six subheadings.^[8]

C0 controls

The C0 controls, referred to as C0 ASCII control codes in version 1.0, are inherited from ASCII and other 7-bit and 8-bit encoding schemes. The alias names for the C0 controls are taken from the ISO/IEC 6429:1992 standard.^[8]
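
As an illustration, the control codes of the block can be found by their Unicode General Category. The sketch below assumes Python's standard `unicodedata` module, which reports `Cc` for control characters; the variable names are illustrative only.

```python
import unicodedata

# Collect every Basic Latin code point whose General Category is Cc (control).
controls = [cp for cp in range(0x80) if unicodedata.category(chr(cp)) == "Cc"]

# The C0 controls are the ones in U+0000..U+001F.
c0_controls = [cp for cp in controls if cp <= 0x1F]

print(len(c0_controls))                                 # 32
print([f"U+{cp:04X}" for cp in controls if cp > 0x1F])  # ['U+007F']
```

The one control code outside the C0 range is U+007F (Delete), which the block lists under its own subheading.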

ASCII punctuation and symbols

This subheading refers to standard punctuation characters, simple mathematical operators, and symbols like the dollar sign, percent, ampersand, underscore, and pipe.^[8]

ASCII digits

The ASCII Digits subheading contains the ten standard European digits, 0 through 9.^[8]
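
Because the digits occupy the contiguous range U+0030 to U+0039 in numeric order, a digit's value is its code point minus 0x30. A minimal Python sketch follows; the helper name `ascii_digit_value` is hypothetical.

```python
def ascii_digit_value(ch: str) -> int:
    """Return the numeric value of one ASCII digit (U+0030..U+0039)."""
    cp = ord(ch)
    if not 0x30 <= cp <= 0x39:
        raise ValueError(f"not an ASCII digit: {ch!r}")
    # U+0030 is DIGIT ZERO, so subtracting it yields the digit's value.
    return cp - 0x30


print([ascii_digit_value(c) for c in "0123456789"])
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```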

Uppercase Latin alphabet

The Uppercase Latin alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the majuscule.^[8]

Lowercase Latin alphabet

The Lowercase Latin Alphabet subheading contains the standard 26-letter unaccented Latin alphabet in the minuscule.^[8]
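
The two alphabets sit at a fixed distance from each other: each lowercase letter is exactly 0x20 above its uppercase counterpart (U+0041 A, U+0061 a). The sketch below uses that offset to swap case; `CASE_BIT` and `swap_ascii_case` are illustrative names, not part of any standard.

```python
CASE_BIT = 0x20  # difference between 'a' (U+0061) and 'A' (U+0041)


def swap_ascii_case(ch: str) -> str:
    """Flip the case of one Basic Latin letter; return other characters unchanged."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:
        # Toggling the 0x20 bit moves between the two alphabets.
        return chr(cp ^ CASE_BIT)
    return ch


print(ord("a") - ord("A"))                                  # 32
print("".join(swap_ascii_case(c) for c in "Basic Latin!"))  # bASIC lATIN!
```

This shortcut applies only to the 52 letters of this block; letters in other blocks need the general case mappings, for example `str.upper` and `str.lower`.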

Control character

The Control Character subheading contains the "Delete" character.^[8]

Number of symbols, letters and control codes

The table below shows the number of letters, symbols and control codes in each of the subheadings in the C0 Controls and Basic Latin block.

Subheading	Number of characters	Range of characters
C0 controls	32 control codes	U+0000 to U+001F
ASCII punctuation and symbols	33 punctuation marks and symbols	U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060 and U+007B to U+007E
ASCII digits	10 digits	U+0030 to U+0039
Uppercase Latin alphabet	26 unaccented uppercase (majuscule) Latin letters	U+0041 to U+005A
Lowercase Latin alphabet	26 unaccented lowercase (minuscule) Latin letters	U+0061 to U+007A
Control character	1 control code (Delete)	U+007F
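
The counts in the table can be recomputed from the ranges. In this Python sketch the dictionary `SUBHEADINGS` copies the ranges from the table, and the helper `count` is an illustrative name.

```python
# Inclusive (first, last) code point ranges, copied from the table above.
SUBHEADINGS = {
    "C0 controls": [(0x00, 0x1F)],
    "ASCII punctuation and symbols": [
        (0x20, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E),
    ],
    "ASCII digits": [(0x30, 0x39)],
    "Uppercase Latin alphabet": [(0x41, 0x5A)],
    "Lowercase Latin alphabet": [(0x61, 0x7A)],
    "Control character": [(0x7F, 0x7F)],
}


def count(ranges):
    """Count the code points covered by a list of inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


counts = {name: count(ranges) for name, ranges in SUBHEADINGS.items()}
for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", sum(counts.values()))  # total: 128
```

The six subheadings add up to 32 + 33 + 10 + 26 + 26 + 1 = 128, so they cover the whole block with no gaps.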

Chart

C0 Controls and Basic Latin ^[a]
Official Unicode Consortium code chart (PDF)

0 1 2 3 4 5 6 7 8 9 A B C D E F

U+000x NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI

U+001x DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US

U+002x SP ! " # $ % & ' ( ) * + , - . /

U+003x 0 1 2 3 4 5 6 7 8 9 : ; < = > ?

U+004x @ A B C D E F G H I J K L M N O

U+005x P Q R S T U V W X Y Z [ \ ] ^ _

U+006x ` a b c d e f g h i j k l m n o

U+007x p q r s t u v w x y z { | } ~ DEL

^ As of Unicode version 17.0

Variants

Several of the characters are defined to render as a standardized variant if followed by variant indicators.

A variant is defined for a zero with a short diagonal stroke: U+0030 DIGIT ZERO, U+FE00 VS1 (0︀).^[9]^[10]

Twelve characters (#, *, and the digits) can be followed by U+FE0E VS15 or U+FE0F VS16 to create emoji variants.^[11]^[12]^[13]^[14] They are keycap base characters, for example #️⃣ (U+0023 NUMBER SIGN U+FE0F VS16 U+20E3 COMBINING ENCLOSING KEYCAP). The VS15 version is "text presentation" while the VS16 version is "emoji-style".^[10]

Emoji variation sequences

U+ 0023 002A 0030 0031 0032 0033 0034 0035 0036 0037 0038 0039

base # * 0 1 2 3 4 5 6 7 8 9

base+VS15+keycap #︎⃣ *︎⃣ 0︎⃣ 1︎⃣ 2︎⃣ 3︎⃣ 4︎⃣ 5︎⃣ 6︎⃣ 7︎⃣ 8︎⃣ 9︎⃣

base+VS16+keycap #️⃣ *️⃣ 0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣
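
The sequences above can be built by plain string concatenation. The following Python sketch uses hypothetical helper names (`keycap`, `code_points`) and takes the code point values from this section. How a sequence is displayed depends on the font and the platform.

```python
VS1 = "\uFE00"     # VARIATION SELECTOR-1 (used for the slashed-zero variant)
VS15 = "\uFE0E"    # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"    # VARIATION SELECTOR-16 (emoji presentation)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP

KEYCAP_BASES = "#*0123456789"  # the twelve keycap base characters


def keycap(base: str, emoji: bool = True) -> str:
    """Build a keycap sequence: base, then VS16 (or VS15), then U+20E3."""
    if len(base) != 1 or base not in KEYCAP_BASES:
        raise ValueError(f"not a keycap base character: {base!r}")
    return base + (VS16 if emoji else VS15) + KEYCAP


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


print(code_points(keycap("#")))               # U+0023 U+FE0F U+20E3
print(code_points(keycap("1", emoji=False)))  # U+0031 U+FE0E U+20E3
print(code_points("0" + VS1))                 # U+0030 U+FE00
```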

History

The following Unicode-related documents record the purpose and process of defining specific characters in the Basic Latin block:

Version	Final code points^[a]	Count	UTC ID	L2 ID	WG2 ID
1.0.0	U+0000..007F	128	(to be determined)
			UTC/1999-013	Karlsson, Kent (1999-05-27), Tildes and micro sign decompositions
			L2/99-176R	Moore, Lisa (1999-11-04), "Micro Sign Case Mappings", Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999
			L2/04-145	Starner, David (2004-04-30), C with stroke character examples from BAE report 1884 (Dorsey)
			L2/04-202	Anderson, Deborah (2004-06-07), Slashed C Feedback
			N3046	Suignard, Michel (2006-02-22), Improving formal definition for control characters
			N3103 (pdf, doc)	Umamaheswaran, V. S. (2006-08-25), "M48.33", Unconfirmed minutes of WG 2 meeting 48, Mountain View, CA, USA; 2006-04-24/27
			L2/11-043	Freytag, Asmus; Karlsson, Kent (2011-02-02), Proposal to correct mistakes and inconsistencies in certain property assignments for super and subscripted letters
			L2/11-160	PRI #181 Changing General Category of Twelve Characters, 2011-05-02
			L2/11-261R2	Moore, Lisa (2011-08-16), "Consensus 128-C3", UTC #128 / L2 #225 Minutes, Accept Ken Whistler's recommendations in L2/11-281 on name aliases for control characters with the addition of the abbreviations BEL and NUL.
			L2/11-438 ^[b]^[c]	N4182	Edberg, Peter (2011-12-22), Emoji Variation Sequences (Revision of L2/11-429)
			L2/15-107	Moore, Lisa (2015-05-12), "Consensus 143-C5", UTC #143 Minutes, Add the 12 keycap sequences in emoji-data.txt as provisional named sequences in Unicode 8.0.
			L2/15-268	Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30), Proposal to Represent the Slashed Zero Variant of Empty Set
			L2/15-301 ^[d]^[c]	Pournader, Roozbeh (2015-11-01), A proposal for 278 standardized variation sequences for emoji
			L2/15-254	Moore, Lisa (2015-11-16), "B.12.1.2 Proposal to Represent the Slashed Zero Variant of Empty Set", UTC #145 Minutes
			L2/17-294	N4914	Lunde, Ken (2017-08-14), Proposal to add standardized variation sequence for U+FF10 FULLWIDTH DIGIT ZERO
			L2/22-019	Scherer, Markus; et al. (2022-01-19), "F.2 F4: U+0019 in ISO vs. NameAliases.txt vs. chart/NamesList.txt", UTC #170 properties feedback & recommendations
			L2/22-016	Constable, Peter (2022-04-21), "Consensus 170-C24", UTC #170 Minutes, For U+0019, add a Name alias "EM" of type abbreviation, for Unicode version 15.0.
^[a] Proposed code points and character names may differ from final code points and names
^[b] See also L2/10-458, L2/11-414, L2/11-415, and L2/11-429
^[c] Refer to the history section of the Miscellaneous Symbols and Pictographs block for additional emoji-related documents
^[d] See also L2/15-198 and L2/15-275

References

1. "Unicode character database". The Unicode Standard. Retrieved 2023-07-26.
2. "Enumerated Versions of The Unicode Standard". The Unicode Standard. Retrieved 2023-07-26.
3. "block.txt". The Unicode Consortium. Retrieved 2023-03-23.
4. "C0 Controls and Basic Latin" (PDF). The Unicode Standard, Version 15.0. Unicode, Inc. 2022. Retrieved 2023-03-22.
5. The Unicode Standard Version 1.0, Volume 1. Addison-Wesley Publishing Company, Inc. 1990. ISBN 0-201-56788-1.
6. "3.8: Block-by-Block Charts" (PDF). The Unicode Standard. version 1.0. Unicode Consortium.
7. Michael S. Kaplan (2005-09-17). "When is a backslash not a backslash?". Sorting it all Out. Microsoft. Archived from the original on 2010-06-12. Also available at: http://archives.miloush.net/michkap/archive/2005/09/17/469941.html
8. "Unicode 6.2 code charts" (PDF). The Unicode Standard. Retrieved 2013-04-01.
9. Beeton, Barbara; Freytag, Asmus; Iancu, Laurențiu; Sargent, Murray (2015-10-30). "L2/15-268: Proposal to Represent the Slashed Zero Variant of Empty Set" (PDF).
10. "UTS #51 Emoji Variation Sequences". The Unicode Consortium.
11. Edberg, Peter (2011-12-22). "L2/11-438: Emoji Variation Sequences (Revision of L2/11-429)" (PDF).
12. Pournader, Roozbeh (2015-11-01). "L2/15-301: A proposal for 278 standardized variation sequences for emoji" (PDF).
13. "UTR #51: Unicode Emoji". Unicode Consortium. 2023-09-05.
14. "UCD: Emoji Data for UTR #51". Unicode Consortium. 2023-02-01.
