Default locale for Russian/Russia should be ru_RU.CP1251

Marco Atzeri marco.atzeri@gmail.com
Thu Dec 24 18:22:00 GMT 2015


On 24/12/2015 16:40, Andrey ``Bass'' Shcheglov wrote:
> Hi,
>
> I'm running Cygwin 2.2.0 on an English Windows 8.1 box:
>
> > CYGWIN_NT-6.3 UNIT-725 2.2.0(0.289/5/3) 2015-08-03 12:51 x86_64 Cygwin
>
> Windows regional settings are set to Russian/Russia.
>
> In the absence of any settings in bashrc/bash_profile, the `locale` command
> outputs the following:
>
> > LANG=ru_RU
> > LC_CTYPE="ru_RU"
> > LC_NUMERIC="ru_RU"
> > LC_TIME="ru_RU"
> > LC_COLLATE="ru_RU"
> > LC_MONETARY="ru_RU"
> > LC_MESSAGES="ru_RU"
> > LC_ALL=
>
> This is perfectly fine, except that "no charset" in the locale output
> means "ISO charset", which is ISO-8859-5 for Russian/Russia and has
> never been used (historically, DOS used CP866, Windows used the CP1251 ANSI
> codepage, and various Unices stuck to KOI8-R before the rise of the
> Unicode era).
>
> The above is consistent with the `locale charmap` output, which is again
> ISO-8859-5.
>
> A short C example also confirms that ISO-8859-5 is used:
>
> > #include <stdio.h>
> > #include <locale.h>
> > #include <langinfo.h>
> >
> > int main() {
> >     const char *locale = setlocale(LC_ALL, "");
> >     const char *codeset = nl_langinfo(CODESET);
> >     printf("locale: %s\n", locale);
> >     printf("codeset: %s\n", codeset);
> >
> >     return 0;
> > }
>
> outputs
>
> > locale: ru_RU/ru_RU/ru_RU/ru_RU/ru_RU/C
> > codeset: ISO-8859-5
>
> Cygwin docs state that
>
> > Starting with Cygwin 1.7.2, the default character set is determined by the default Windows ANSI codepage for this language and territory.
>
> which is not true in my case (the Windows ANSI codepage for Cyrillic is
> CP1251, not ISO-8859-5!). Surprisingly, for the Belarusian (a.k.a.
> Belorussian, an Eastern Slavic language very close to Russian) "be_BY"
> locale the default charset is indeed CP1251, which is in accordance with
> both the documentation and common sense.
>
> Additionally, in the `strace locale -u` output, I see multiple
>
> > __get_lcid_from_locale: LCID=0x0419
>
> lines.
>
> "0x0419" corresponds to Russian/Russia (see
> <https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396>).
>
> Despite that, $(locale -u) returns "en_GB", even though all regional
> settings are set to Russian/Russia. I believe this is not correct,
> either, and needs to be fixed.

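The current code in
 winsup/cygwin/nlsfuncs.cc
is responsible for the ISO-8859-5 default. Within the codepage 1251
case, ru_RU (LCID 0x0419) matches none of the special cases and falls
through to the final else branch: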
--------------------------------------------------------------
 case 1251:
   if (lcid == 0x0c1a        /* sr_CS (Serbian Language/Former
                                Serbia and Montenegro) */
       || lcid == 0x1c1a     /* sr_BA (Serbian Language/Bosnia
                                and Herzegovina) */
       || lcid == 0x281a     /* sr_RS (Serbian Language/Serbia) */
       || lcid == 0x301a     /* sr_ME (Serbian Language/Montenegro) */
       || lcid == 0x0440     /* ky_KG (Kyrgyz/Kyrgyzstan) */
       || lcid == 0x0843     /* uz_UZ (Uzbek/Uzbekistan) */
       /* tt_RU (Tatar/Russia), IQTElif alphabet */
       || (lcid == 0x0444 && has_modifier ("@iqtelif"))
       || lcid == 0x0450)    /* mn_MN (Mongolian/Mongolia) */
     cs = "UTF-8";
   else if (lcid == 0x0423)  /* be_BY (Belarusian/Belarus) */
     cs = has_modifier ("@latin") ? "UTF-8" : "CP1251";
   else if (lcid == 0x0402)  /* bg_BG (Bulgarian/Bulgaria) */
     cs = "CP1251";
   else if (lcid == 0x0422)  /* uk_UA (Ukrainian/Ukraine) */
     cs = "KOI8-U";
   else
     cs = "ISO-8859-5";
--------------------------------------------------------------
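
To make the fall-through easier to see, here is a minimal standalone C
sketch that models the branches of the excerpt. It is not the Cygwin
code: the function name default_charset_cp1251, the helper modifier_is,
and the explicit modifier parameter are hypothetical stand-ins (the real
has_modifier takes only the modifier string and inspects the locale
being parsed).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for has_modifier() from the excerpt: here the
   locale modifier (e.g. "@latin") is passed in explicitly. */
static int modifier_is(const char *modifier, const char *wanted)
{
    return modifier != NULL && strcmp(modifier, wanted) == 0;
}

/* Models the excerpt's choice of default charset for LCIDs whose
   Windows ANSI codepage is 1251. Hypothetical name, illustration only. */
const char *default_charset_cp1251(unsigned lcid, const char *modifier)
{
    switch (lcid) {
    case 0x0c1a: /* sr_CS */
    case 0x1c1a: /* sr_BA */
    case 0x281a: /* sr_RS */
    case 0x301a: /* sr_ME */
    case 0x0440: /* ky_KG */
    case 0x0843: /* uz_UZ */
    case 0x0450: /* mn_MN */
        return "UTF-8";
    case 0x0444: /* tt_RU: UTF-8 only with @iqtelif, otherwise the final else */
        return modifier_is(modifier, "@iqtelif") ? "UTF-8" : "ISO-8859-5";
    case 0x0423: /* be_BY */
        return modifier_is(modifier, "@latin") ? "UTF-8" : "CP1251";
    case 0x0402: /* bg_BG */
        return "CP1251";
    case 0x0422: /* uk_UA */
        return "KOI8-U";
    default:     /* everything else, including ru_RU (0x0419) */
        return "ISO-8859-5";
    }
}
```

In this model, default_charset_cp1251(0x0419, NULL) takes the default
branch, while 0x0423 (be_BY) and 0x0402 (bg_BG) get CP1251, which
matches the behaviour described in the report.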
> Regards,
> Andrey.

As a temporary workaround, can you use UTF-8?
export LANG=ru_RU.UTF-8
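
If you want to check which codeset a given locale name resolves to
without changing your environment, a small helper along these lines
should do. This is only a sketch: the name codeset_for is made up for
the example, and it relies on nothing beyond the standard setlocale()
and nl_langinfo() calls already used in your test program.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Returns the codeset the C library reports for the named locale, or
   NULL if that locale cannot be selected on this system. The returned
   string may be overwritten by later setlocale()/nl_langinfo() calls.
   Hypothetical helper name, for illustration only. */
const char *codeset_for(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Calling codeset_for("ru_RU.UTF-8") and codeset_for("ru_RU") side by
side lets you compare the explicit charset with the default one.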
Regards
Marco
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple

