Cork encoding
The Cork encoding (also known as T1 or EC) is a character encoding used for glyphs in fonts.[1] It is named after the city of Cork in Ireland, where it was introduced for LaTeX during the 1990 TeX Users Group (TUG) conference.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]
Details
In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, and the Cork encoding is the one most commonly used for those patterns.[3] In LaTeX, one can switch to this encoding with \usepackage[T1]{fontenc}; in ConTeXt MkII it is already the default. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
Character set
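Slots 0x00–0x1F
Slot  Glyph     Unicode
0x00  `         0060
0x01  ´         00B4
0x02  ˆ         02C6
0x03  ˜         02DC
0x04  ¨         00A8
0x05  ˝         02DD
0x06  ˚         02DA
0x07  ˇ         02C7
0x08  ˘         02D8
0x09  ¯         00AF
0x0A  ˙         02D9
0x0B  ¸         00B8
0x0C  ˛         02DB
0x0D  ‚         201A
0x0E  ‹         2039
0x0F  ›         203A
0x10  “         201C
0x11  ”         201D
0x12  „         201E
0x13  «         00AB
0x14  »         00BB
0x15  –         2013
0x16  —         2014
0x17  ZWSP [a]  200B
0x18  ₀ [b]     2080
0x19  ı [c]     0131
0x1A  ȷ [c]     0237
0x1B  ff        FB00
0x1C  fi        FB01
0x1D  fl        FB02
0x1E  ffi       FB03
0x1F  ffl       FB04

Slots 0x80–0xBF
Slot  Glyph     Unicode
0x80  Ă         0102
0x81  Ą         0104
0x82  Ć         0106
0x83  Č         010C
0x84  Ď         010E
0x85  Ě         011A
0x86  Ę         0118
0x87  Ğ         011E
0x88  Ĺ         0139
0x89  Ľ         013D
0x8A  Ł         0141
0x8B  Ń         0143
0x8C  Ň         0147
0x8D  Ŋ         014A
0x8E  Ő         0150
0x8F  Ŕ         0154
0x90  Ř         0158
0x91  Ś         015A
0x92  Š         0160
0x93  Ş         015E
0x94  Ť         0164
0x95  Ţ         0162
0x96  Ű         0170
0x97  Ů         016E
0x98  Ÿ         0178
0x99  Ź         0179
0x9A  Ž         017D
0x9B  Ż         017B
0x9C  IJ        0132
0x9D  İ         0130
0x9E  đ         0111
0x9F  §         00A7
0xA0  ă         0103
0xA1  ą         0105
0xA2  ć         0107
0xA3  č         010D
0xA4  ď         010F
0xA5  ě         011B
0xA6  ę         0119
0xA7  ğ         011F
0xA8  ĺ         013A
0xA9  ľ         013E
0xAA  ł         0142
0xAB  ń         0144
0xAC  ň         0148
0xAD  ŋ         014B
0xAE  ő         0151
0xAF  ŕ         0155
0xB0  ř         0159
0xB1  ś         015B
0xB2  š         0161
0xB3  ş         015F
0xB4  ť         0165
0xB5  ţ         0163
0xB6  ű         0171
0xB7  ů         016F
0xB8  ÿ         00FF
0xB9  ź         017A
0xBA  ž         017E
0xBB  ż         017C
0xBC  ij        0133
0xBD  ¡         00A1
0xBE  ¿         00BF
0xBF  £         00A3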
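To make the layout of the table concrete, here is a minimal Python sketch of a decoder built only from the slots listed above (0x00–0x1F and 0x80–0xBF). The names `CORK_TO_UNICODE`, `decode_cork` and `lowercase_slot` are hypothetical, and bytes outside those two ranges are rejected rather than guessed. The sketch also encodes one regularity that can be read off the table: for slots 0x80–0x9C, the lowercase partner sits exactly 0x20 higher.

```python
# A sketch covering only the slots listed in the table above (0x00-0x1F and
# 0x80-0xBF); the remaining slots are intentionally not modeled here.

# Unicode code points for slots 0x00-0x1F, in slot order.
LOW_ROWS = [
    0x0060, 0x00B4, 0x02C6, 0x02DC, 0x00A8, 0x02DD, 0x02DA, 0x02C7,
    0x02D8, 0x00AF, 0x02D9, 0x00B8, 0x02DB, 0x201A, 0x2039, 0x203A,
    0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB, 0x2013, 0x2014, 0x200B,
    0x2080, 0x0131, 0x0237, 0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04,
]

# Unicode code points for slots 0x80-0xBF, in slot order.
HIGH_ROWS = [
    0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
    0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154,
    0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
    0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7,
    0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
    0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155,
    0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
    0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3,
]

# Slot number -> Unicode character, for the 96 slots above.
CORK_TO_UNICODE = {}
for first_slot, code_points in ((0x00, LOW_ROWS), (0x80, HIGH_ROWS)):
    for offset, cp in enumerate(code_points):
        CORK_TO_UNICODE[first_slot + offset] = chr(cp)


def decode_cork(data: bytes) -> str:
    """Decode Cork-encoded bytes, rejecting slots this partial table omits."""
    chars = []
    for byte in data:
        if byte not in CORK_TO_UNICODE:
            raise ValueError(f"slot 0x{byte:02X} is not covered by this table")
        chars.append(CORK_TO_UNICODE[byte])
    return "".join(chars)


def lowercase_slot(slot: int) -> int:
    """Map an uppercase slot in 0x80-0x9C to its lowercase partner 0x20 higher."""
    # Read off the table; 0x9D-0x9F (İ, đ, §) do not follow this pattern.
    if 0x80 <= slot <= 0x9C:
        return slot + 0x20
    return slot
```

For example, `decode_cork(bytes([0x83, 0xA3]))` yields "Čč", and `lowercase_slot(0x8A)` gives 0xAA, the slot of ł. The offset trick is a property of this half of the table, not a general rule for the whole encoding.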
Notes
- The four-digit hexadecimal values in the table are the Unicode code points of the characters.
- The first 13 characters (slots 0x00–0x0C) are accents and are often used as combining characters.
- [a] 0x17 is called a "compound word mark" (CWM) in the Cork encoding and is an innovation of this standard. It is an invisible character that separates the parts of a compound word, for instance in German, so that typographic ligatures are not formed across compound boundaries.[2] It is mapped to the Unicode "zero-width space" (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
- [b] 0x18 is a "small o", used together with the percent sign (%) to compose ‰ or ‱ (or marks for even smaller fractions).[2]
- [c] Dotless i and dotless j can be used to compose accented variants such as i with macron (ī).
- [d] 0x7F is the hyphenation character; it is not strictly a soft hyphen (SHY) as defined by Unicode.
- [e] 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
- [f] 0xDF contains SS (two letters S). It allows TeX to convert the German lowercase ß to its uppercase form automatically.
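Several of these notes describe slots whose Unicode counterpart is not a plain one-to-one mapping. The Python sketch below illustrates them. `decode_special_slot` and `compose_accent` are hypothetical names, and the `d_with_stroke` flag is an assumed way to resolve the 0xD0 ambiguity, since the slot itself cannot tell Ð from Đ. The dotless-letter handling relies on the fact that Unicode decomposes ī into a plain i followed by a combining macron (U+0304). TeX, by contrast, places the accent on the dotless ı.

```python
import unicodedata


def decode_special_slot(slot: int, d_with_stroke: bool = False) -> str:
    """Hypothetical helper: Unicode text for the special Cork slots in the notes."""
    if slot == 0x17:
        # Compound word mark: invisible, mapped to ZERO WIDTH SPACE.
        return "\u200B"
    if slot == 0x18:
        # The "small o", listed in the table as SUBSCRIPT ZERO.
        return "\u2080"
    if slot == 0xD0:
        # One slot, two letters: the caller must say which one the text means.
        return "\u0110" if d_with_stroke else "\u00D0"
    if slot == 0xDF:
        # Uppercase German sharp s is set as two letters S.
        return "SS"
    raise ValueError(f"slot 0x{slot:02X} is not one of the special slots")


def compose_accent(base: str, combining_mark: str) -> str:
    """Combine a base letter and a Unicode combining mark into NFC text.

    TeX puts accents on dotless ı or ȷ, but Unicode precomposed letters
    decompose to plain i or j, so a dotless base is swapped first.
    """
    base = {"\u0131": "i", "\u0237": "j"}.get(base, base)
    return unicodedata.normalize("NFC", base + combining_mark)
```

For example, `compose_accent("\u0131", "\u0304")` returns "ī" (U+012B). `decode_special_slot(0xDF)` agrees with Python's own full case mapping, where `"ß".upper()` is also "SS".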
Supported languages
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
- Esperanto and Maltese (using IL3)
- Latvian and Lithuanian (using L7X)
- Welsh
Languages with slightly suboptimal support include:
- Galician, Portuguese and Spanish – due to the lack of the ordinal indicators ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner) and are often underlined
- Croatian, Bosnian and Serbian – because uppercase Đ shares slot 0xD0 with Eth (Ð)
- Turkish – because dotless ı and dotted İ pair with their other-case forms differently than in other languages (ı with I, and i with İ)
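Several of the exceptions above come down to letters that have no slot at all. A minimal Python sketch of that check follows. `missing_letters` is a hypothetical name. The letters for slots 0x80–0xBF (plus ı and ȷ) come from the table above. Treating the slots not shown in the table as holding the ASCII letters, the Latin-1 letters U+00C0–U+00FF, and Œ/œ is an assumption made for illustration. The sketch only detects absent letters. It cannot detect shared-slot or case-mapping problems such as the Croatian and Turkish cases.

```python
import string

# Unicode code points for slots 0x80-0xBF, copied from the table above.
HIGH_ROWS = [
    0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
    0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154,
    0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
    0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7,
    0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
    0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155,
    0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
    0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3,
]

# Letters taken from the table: the alphabetic entries of 0x80-0xBF plus ı and ȷ.
TABLE_LETTERS = {chr(cp) for cp in HIGH_ROWS if chr(cp).isalpha()} | {"\u0131", "\u0237"}

# ASSUMPTION (not from the table): the unlisted slots hold the ASCII letters,
# the Latin-1 letters U+00C0-U+00FF, and the ligatures Œ/œ.
ASSUMED_OTHER_LETTERS = (
    set(string.ascii_letters)
    | {chr(cp) for cp in range(0x00C0, 0x0100) if chr(cp).isalpha()}
    | {"\u0152", "\u0153"}
)

COVERED = TABLE_LETTERS | ASSUMED_OTHER_LETTERS


def missing_letters(text: str) -> set:
    """Return the letters in text that have no Cork slot under these assumptions."""
    return {ch for ch in text if ch.isalpha() and ch not in COVERED}
```

Under these assumptions, the Polish "Łódź" is fully covered. The Esperanto ĉ, the Latvian ī in "Rīga", and the ordinal indicator ª used in Galician, Portuguese and Spanish are all reported as missing.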
References
- [1] Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
- [2] Ferguson, Michael (1990). "Report on Multilingual Activities" (PDF). TUGboat. 11 (4): 514–516.
- [3] TeX hyphenation patterns.