Default locale for Russian/Russia should be ru_RU.CP1251
Marco Atzeri
marco.atzeri@gmail.com
Thu Dec 24 18:22:00 GMT 2015
On 24/12/2015 16:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
>
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
>
>   CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
>
> Windows regional settings are set to Russian/Russia.
>
> In the absence of any settings in bashrc/bash_profile, the `locale` command
> outputs the following:
>
>   LANG=ru_RU
>   LC_CTYPE="ru_RU"
>   LC_NUMERIC="ru_RU"
>   LC_TIME="ru_RU"
>   LC_COLLATE="ru_RU"
>   LC_MONETARY="ru_RU"
>   LC_MESSAGES="ru_RU"
>   LC_ALL=
>
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used (historically, DOS used CP866, Windows used the CP1251 ANSI
> codepage, and various Unices stuck with KOI8-R before the rise of the
> Unicode era).
>
> The above is consistent with the `locale charmap` output, which is again
> ISO-8859-5.
>
> A short C example also confirms that ISO-8859-5 is used:
>
>   #include <stdio.h>
>   #include <locale.h>
>   #include <langinfo.h>
>
>   int main() {
>       const char *locale = setlocale(LC_ALL, "");
>       const char *codeset = nl_langinfo(CODESET);
>       printf("locale: %s\n", locale);
>       printf("codeset: %s\n", codeset);
>
>       return 0;
>   }
>
> outputs
>
>   locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
>   codeset: ISO-8859-5
>
> The Cygwin docs state that
>
>   Starting with Cygwin 1.7.2, the default character set is determined by
>   the default Windows ANSI codepage for this language and territory.
>
> which is not true in my case (the Windows ANSI codepage for Cyrillic is
> CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
> Belorussian, an East Slavic language very close to Russian) "be_BY"
> locale the default charset is indeed CP1251, which is in accordance with
> both the documentation and common sense.
>
> Additionally, in the `strace locale -u` output, I see multiple
>
>   __get_lcid_from_locale: LCID=0x0419
>
> lines.
>
> "0x0419" corresponds to Russian/Russia (see
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396>).
>
> Despite that, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct
> either, and needs to be fixed.
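
For reference, the LCID reported by strace can be split into its parts by hand. The following is only a sketch: the helper names primary_lang and sub_lang are made up for illustration and are not part of Cygwin or the Windows headers. It assumes the usual Windows layout, where the low 10 bits of the language identifier are the primary language and the next 6 bits are the sublanguage.

```c
#include <stdint.h>

/* Hypothetical helper: primary language, i.e. the low 10 bits of the
   language identifier stored in the low word of the LCID. */
unsigned primary_lang(uint32_t lcid)
{
    return lcid & 0x3ff;
}

/* Hypothetical helper: sublanguage, i.e. bits 10..15 of the low word. */
unsigned sub_lang(uint32_t lcid)
{
    return (lcid & 0xffff) >> 10;
}
```

For 0x0419 this gives primary language 0x19 (Russian) and sublanguage 0x01, so the lookup itself appears to resolve to Russian/Russia.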
The current code in winsup/cygwin/nlsfuncs.cc is responsible for the
ISO-8859-5 default:
--------------------------------------------------------------
    case 1251:
      if (lcid == 0x0c1a        /* sr_CS (Serbian Language/Former Serbia and Montenegro) */
          || lcid == 0x1c1a     /* sr_BA (Serbian Language/Bosnia and Herzegovina) */
          || lcid == 0x281a     /* sr_RS (Serbian Language/Serbia) */
          || lcid == 0x301a     /* sr_ME (Serbian Language/Montenegro) */
          || lcid == 0x0440     /* ky_KG (Kyrgyz/Kyrgyzstan) */
          || lcid == 0x0843     /* uz_UZ (Uzbek/Uzbekistan) */
          /* tt_RU (Tatar/Russia), IQTElif alphabet */
          || (lcid == 0x0444 && has_modifier ("@iqtelif"))
          || lcid == 0x0450)    /* mn_MN (Mongolian/Mongolia) */
        cs = "UTF-8";
      else if (lcid == 0x0423)  /* be_BY (Belarusian/Belarus) */
        cs = has_modifier ("@latin") ? "UTF-8" : "CP1251";
      else if (lcid == 0x0402)  /* bg_BG (Bulgarian/Bulgaria) */
        cs = "CP1251";
      else if (lcid == 0x0422)  /* uk_UA (Ukrainian/Ukraine) */
        cs = "KOI8-U";
      else
        cs = "ISO-8859-5";
--------------------------------------------------------------
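
To make the effect of that branch easier to try out in isolation, here is a standalone sketch of the same decision table. It is not the Cygwin source: the function name charset_for_cp1251 and the helper has_mod are hypothetical, and the locale modifier is passed in as a plain string (or NULL) instead of going through has_modifier().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for has_modifier(): the modifier (e.g. "@latin")
   is passed explicitly, or NULL when the locale has none. */
static int has_mod(const char *modifier, const char *wanted)
{
    return modifier != NULL && strcmp(modifier, wanted) == 0;
}

/* Default charset for an LCID whose Windows ANSI codepage is 1251,
   following the branch order of the excerpt above. */
const char *charset_for_cp1251(unsigned lcid, const char *modifier)
{
    switch (lcid) {
    case 0x0c1a: /* sr_CS */
    case 0x1c1a: /* sr_BA */
    case 0x281a: /* sr_RS */
    case 0x301a: /* sr_ME */
    case 0x0440: /* ky_KG */
    case 0x0843: /* uz_UZ */
    case 0x0450: /* mn_MN */
        return "UTF-8";
    case 0x0444: /* tt_RU: UTF-8 only with the IQTElif alphabet */
        return has_mod(modifier, "@iqtelif") ? "UTF-8" : "ISO-8859-5";
    case 0x0423: /* be_BY */
        return has_mod(modifier, "@latin") ? "UTF-8" : "CP1251";
    case 0x0402: /* bg_BG */
        return "CP1251";
    case 0x0422: /* uk_UA */
        return "KOI8-U";
    default:     /* everything else, including ru_RU (0x0419) */
        return "ISO-8859-5";
    }
}
```

In this model charset_for_cp1251(0x0419, NULL) falls through to the default and yields "ISO-8859-5", while 0x0423 yields "CP1251", which matches the difference between ru_RU and be_BY reported above.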
> Regards,
> Andrey.
As a temporary workaround, can you use UTF-8?
export LANG=ru_RU.UTF-8
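
To check from C which codeset a given locale name ends up with, something along these lines should do. It is a minimal sketch: codeset_for is a hypothetical helper name, and it only relies on the standard setlocale() and nl_langinfo() calls already used in the test program quoted above.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to the named locale and report its
   codeset, or return NULL if the C library rejects the locale name. */
const char *codeset_for(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Calling codeset_for("ru_RU.UTF-8") is expected to report UTF-8 wherever that locale name is accepted, whereas codeset_for("ru_RU") would show whatever default the code above picks.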
Regards
Marco
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple