KOI8-R
Alias(es) | cp878 (code page 878) |
---|---|
Languages | Russian, Bulgarian |
Classification | 8-bit KOI, extended ASCII |
Extends | KOI8-B |
Based on | KOI-8 |
Other related encodings | KOI8-U, KOI8-RU |
KOI8-R (RFC 1489) is an 8-bit character encoding derived from the KOI-8 encoding by the programmer Andrey Chernov in 1993. It is designed to cover Russian, which is written in the Russian variant of the Cyrillic script. KOI-8, in turn, is an 8-bit extension of the KOI-7 encoding, which inherited a phonetic correspondence between Russian and Latin letters from the MTK-2 teletype code. As a result, the Russian Cyrillic letters in KOI8-R are arranged in pseudo-Latin alphabetical order rather than in the standard Cyrillic order used by encodings such as ISO 8859-5. Although this may seem unnatural, it has a useful consequence: if the 8th bit is stripped, the text remains partially readable in any ASCII-based encoding (including KOI8-R itself) as a case-reversed transliteration. For example, "Код для обмена и обработки информации" (the Russian phrase that the acronym "KOI" abbreviates) becomes kOD DLQ OBMENA I OBRABOTKI INFORMACII.
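The case-reversed transliteration can be reproduced with Python's built-in `koi8_r` codec. This is a minimal sketch: the helper name `strip_eighth_bit` is hypothetical, and it simply clears the high bit of every KOI8-R byte and reads what is left as ASCII.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of every byte, decode as ASCII.

    Hypothetical helper for illustration only.
    """
    koi8_bytes = text.encode("koi8_r")               # one byte per character
    seven_bit = bytes(b & 0x7F for b in koi8_bytes)  # drop the 8th bit
    return seven_bit.decode("ascii")                 # every byte is now < 0x80


phrase = "Код для обмена и обработки информации"
print(strip_eighth_bit(phrase))  # kOD DLQ OBMENA I OBRABOTKI INFORMACII
```

Lowercase Cyrillic letters (0xC0–0xDF) land on uppercase Latin letters (0x40–0x5F) and vice versa, which is why the case appears reversed. Letters without a close Latin counterpart fall on punctuation; ю (0xC0), for instance, becomes @.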
KOI-8 stands for 8-bitnyy kod dlya obmena i obrabotki informatsii (Russian: 8-битный код для обмена и обработки информации), which means "8-bit code for information interchange and processing".[1] Microsoft Windows assigns KOI8-R the code page number 20866, and IBM assigns it code page 878.[2][3] Because the modern Bulgarian alphabet uses only letters that also appear in the Russian alphabet, KOI8-R covers Bulgarian as well.
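Whether an alphabet fits in KOI8-R can be checked by trying to encode it. The sketch below assumes Python's built-in `koi8_r` codec follows RFC 1489; the alphabet strings and the helper `encodes_in` are written out here purely for illustration.

```python
RUSSIAN = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"  # 33 letters
BULGARIAN = "АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЬЮЯ"   # 30 letters


def encodes_in(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Every Bulgarian letter is also a Russian letter...
print(set(BULGARIAN) <= set(RUSSIAN))                       # True
# ...so both alphabets, in upper and lower case, fit in KOI8-R.
print(encodes_in(RUSSIAN + RUSSIAN.lower(), "koi8_r"))      # True
print(encodes_in(BULGARIAN + BULGARIAN.lower(), "koi8_r"))  # True
```

The same check fails for Ukrainian letters such as Ї, which KOI8-R lacks and KOI8-U adds (`encodes_in("Ї", "koi8_r")` is False, `encodes_in("Ї", "koi8_u")` is True).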
KOI8-R lacks the quotation marks these languages use: both the Russian «...» and the Bulgarian „...“ are missing. Windows-1251 supports both, along with additional letters, and has consequently become more popular. KOI8-R is used by less than 0.004% of websites, mostly Russian and Bulgarian ones.[citation needed] In modern applications, Unicode (typically encoded as UTF-8) is preferred over single-byte Cyrillic encodings. Unicode contains 436 Cyrillic letters, including those for Old Cyrillic.
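The missing quotation marks are easy to confirm by encoding them. This sketch uses Python's built-in `koi8_r`, `cp1251` (Windows-1251) and `utf-8` codecs; the helper `encode_or_none` and the output format are only illustrative.

```python
from typing import Optional

QUOTES = "«»„“"  # Russian guillemets, then Bulgarian low-high quotes


def encode_or_none(ch: str, codec: str) -> Optional[bytes]:
    """Return the encoded byte(s) for ch, or None if codec cannot represent it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


for ch in QUOTES:
    print(ch,
          "KOI8-R:", encode_or_none(ch, "koi8_r"),
          "Windows-1251:", encode_or_none(ch, "cp1251"),
          "UTF-8:", encode_or_none(ch, "utf-8"))
```

KOI8-R has no byte for any of the four marks, whereas Windows-1251 encodes « and » as 0xAB and 0xBB, and „ and “ as 0x84 and 0x93. UTF-8 represents all of them, at the cost of two or three bytes each.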
Character set
The following table shows the upper half (bytes 0x80–0xFF) of the KOI8-R encoding; bytes 0x00–0x7F are identical to ASCII. Each cell gives the character followed by its Unicode code point in hexadecimal.
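KOI8-R[4][5][6][7]
Byte | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8_ | ─ 2500 | │ 2502 | ┌ 250C | ┐ 2510 | └ 2514 | ┘ 2518 | ├ 251C | ┤ 2524 | ┬ 252C | ┴ 2534 | ┼ 253C | ▀ 2580 | ▄ 2584 | █ 2588 | ▌ 258C | ▐ 2590 |
9_ | ░ 2591 | ▒ 2592 | ▓ 2593 | ⌠ 2320 | ■ 25A0 | ∙ 2219 | √ 221A | ≈ 2248 | ≤ 2264 | ≥ 2265 | NBSP 00A0 | ⌡ 2321 | ° 00B0 | ² 00B2 | · 00B7 | ÷ 00F7 |
A_ | ═ 2550 | ║ 2551 | ╒ 2552 | ё 0451 | ╓ 2553 | ╔ 2554 | ╕ 2555 | ╖ 2556 | ╗ 2557 | ╘ 2558 | ╙ 2559 | ╚ 255A | ╛ 255B | ╜ 255C | ╝ 255D | ╞ 255E |
B_ | ╟ 255F | ╠ 2560 | ╡ 2561 | Ё 0401 | ╢ 2562 | ╣ 2563 | ╤ 2564 | ╥ 2565 | ╦ 2566 | ╧ 2567 | ╨ 2568 | ╩ 2569 | ╪ 256A | ╫ 256B | ╬ 256C | © 00A9 |
C_ | ю 044E | а 0430 | б 0431 | ц 0446 | д 0434 | е 0435 | ф 0444 | г 0433 | х 0445 | и 0438 | й 0439 | к 043A | л 043B | м 043C | н 043D | о 043E |
D_ | п 043F | я 044F | р 0440 | с 0441 | т 0442 | у 0443 | ж 0436 | в 0432 | ь 044C | ы 044B | з 0437 | ш 0448 | э 044D | щ 0449 | ч 0447 | ъ 044A |
E_ | Ю 042E | А 0410 | Б 0411 | Ц 0426 | Д 0414 | Е 0415 | Ф 0424 | Г 0413 | Х 0425 | И 0418 | Й 0419 | К 041A | Л 041B | М 041C | Н 041D | О 041E |
F_ | П 041F | Я 042F | Р 0420 | С 0421 | Т 0422 | У 0423 | Ж 0416 | В 0412 | Ь 042C | Ы 042B | З 0417 | Ш 0428 | Э 042D | Щ 0429 | Ч 0427 | Ъ 042A |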
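The table can be regenerated, and its letter layout checked, with Python's built-in `koi8_r` codec. This sketch assumes the codec matches RFC 1489; `LOWER` is transcribed from rows C_ and D_ of the table, and `swap_case` is a hypothetical helper that relies on each uppercase letter sitting exactly 0x20 above its lowercase counterpart.

```python
# Lowercase letters at 0xC0-0xDF in KOI8-R's pseudo-Latin order (rows C_ and D_).
LOWER = "юабцдефгхийклмнопярстужвьызшэщчъ"
UPPER = LOWER.upper()  # rows E_ and F_ repeat the same order in uppercase


def upper_half_rows() -> list[str]:
    """Decode bytes 0x80-0xFF in rows of 16, one string per table row."""
    return [bytes(range(hi, hi + 16)).decode("koi8_r")
            for hi in range(0x80, 0x100, 0x10)]


def swap_case(data: bytes) -> bytes:
    """Flip the case of KOI8-R letters in 0xC0-0xFF by toggling bit 5 (0x20).

    Hypothetical helper. Ё/ё (0xB3/0xA3) lie outside this block and differ
    by 0x10, so they are deliberately left unchanged.
    """
    return bytes(b ^ 0x20 if b >= 0xC0 else b for b in data)


for hi, row in zip(range(0x8, 0x10), upper_half_rows()):
    print(f"{hi:X}_", row)

print(swap_case("Код".encode("koi8_r")).decode("koi8_r"))  # кОД
```

Because the lowercase block (0xC0–0xDF) sits exactly 0x20 below the uppercase block (0xE0–0xFF), case conversion for these letters is a single bit flip, just as ASCII "a" (0x61) and "A" (0x41) differ by 0x20.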
See also
- KOI8-B, a derivative of KOI8-R that implements only the letter subset
- KOI8-U, another derivative encoding which adds Ukrainian characters
- KOI character encodings
- RELCOM
- Windows-1251, another common Cyrillic character encoding
References
- (in Russian) ГОСТ 19768-74 (СТ СЭВ 358-76). Машины вычислительные и система обработки данных. Коды 8-битные для обмена и обработки информации. [GOST 19768-74 (ST SEV 358-76). Computers and data processing systems. 8-bit codes for information interchange and processing.]
- "SBCS code page information - CPGID: 00878 / Name: Russian internet koi8-r". IBM Software: Globalization: Coded character sets and related resources: Code pages by CPGID: Code page identifiers. IBM. C-H 3-3220-050. Archived from the original on 2017-02-18. Retrieved 2017-02-18.
- "CCSID information document; CCSID 878; KOI8-R CYRILLIC". IBM. Retrieved 2017-02-18.
- Richter, Helmut (2016-01-04) [1999-08-18]. "KOI8-R.TXT". 2.0. Retrieved 2016-12-09.
- Code Page CPGID 00878 (PDF), IBM
- Code Page CPGID 00878 (txt), IBM
- International Components for Unicode (ICU), ibm-878_P100-1996.ucm, 2002-12-03
Further reading
- Flohr, Guido; Kiss, Gabor; Chernov, Andrey A. (2016) [2006]. "Locale::RecodeData::KOI8_R - Conversion routines for KOI8-R". CPAN libintl-perl. 1.0. Archived from the original on 2017-01-15. Retrieved 2017-01-15.
- Kostis, Kosta. "koi8-r (Russian U*IX encoding, also used by RELCOM)". 1.20. Archived from the original on 2017-01-16. Retrieved 2017-01-16.
- RFC 1489
- "KOI8-R (RFC 1489)". Kermit. Columbia University. Retrieved 2020-06-24.
- Kornai, Andras; Birnbaum, David J.; da Cruz, Frank; Davis, Bur; Fowler, George; Paine, Richard B.; Paperno, Slava; Simonsen, Keld J.; Thobe, Glenn E.; Vulis, Dimitri; van Wingen, Johan W. (1993-03-13). "CYRILLIC ENCODING FAQ Version 1.3". 1.3. Retrieved 2020-06-24.
External links
- Universal Cyrillic decoder, an online program that may help recover Cyrillic text garbled by a mismatched KOI8-R or other character encoding.
- "The Home of the KOI8-R since 1995". 1995. Retrieved 2016-12-05.
- Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
- Hohlov, Yu. E. "Cyrillic Information Representation in Electronic Form - Character Set (Code Page) Tables". Archived from the original on 2016-12-05. Retrieved 2016-12-05.
- Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived from the original on 2016-12-05. Retrieved 2016-12-05.