This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2013-11-09 08:02 by mfabian, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files

| File name | Uploaded | Description |
|---|---|---|
| mike-test.py | mfabian, 2013-11-09 08:05 | test program to see what goes wrong in locale normalization |
| 0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch | mfabian, 2013-11-09 08:22 | 0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch |
| Messages (8) | |||
|---|---|---|---|
| msg202467 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:02 |

Originally reported here: https://bugzilla.redhat.com/show_bug.cgi?id=1024667

I found that Serbian translations in Latin script do not work when the locale name is written as sr_RS.UTF-8@latin (one gets the Cyrillic translations instead), but they *do* work when the locale name is written as sr_RS@latin (i.e. omitting the '.UTF-8'):

$ LANG='sr_RS.UTF-8' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Koji jezik biste želeli da koristite tokom procesa instalacije?
mfabian@ari:~
$

The "gettext" command-line tool does not have this problem:

mfabian@ari:~
$ LANG='sr_RS@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8' gettext anaconda "What language would you like to use during the installation process?"
Који језик бисте желели да користите током процеса инсталације?mfabian@ari:~
$
| msg202468 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:05 |

The problem turns out to be caused by incorrect normalization of the locale name; see the output of this test program:

mfabian@ari:~
$ cat ~/tmp/mike-test.py
#!/usr/bin/python2

import sys
import os
import locale
import encodings
import encodings.aliases

test_locales = [
    'ja_JP.UTF-8',
    'de_DE.SJIS',
    'de_DE.foobar',
    'sr_RS.UTF-8@latin',
    'sr_rs@latin',
    'sr@latin',
    'sr_yu',
    'sr_yu.SJIS@devanagari',
    'sr@foobar',
    'sR@foObar',
    'sR',
]

for test_locale in test_locales:
    print("%(orig)s -> %(norm)s"
          %{'orig': test_locale,
            'norm': locale.normalize(test_locale)}
    )

mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.utf_8_latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
sr@foobar -> sr@foobar
sR@foObar -> sR@foObar
sR -> sr_RS.UTF-8
mfabian@ari:~
$

That is, "sr_RS.UTF-8@latin" is normalized to "sr_RS.utf_8_latin", which is clearly wrong. When gettext uses this name it falls back to sr_RS, which gives the Cyrillic translations.
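A rough sketch of why the mis-normalized name ends up in a Cyrillic catalog: Python's gettext module expands a normalized locale name into a list of candidate catalog names, most specific first (CPython does this in the private helper gettext._expand_lang). The function expand_lang below is a hypothetical, simplified re-implementation of that expansion for illustration only; it skips the normalize() call so the broken name can be passed in directly. Because "sr_RS.utf_8_latin" contains no "@", none of its candidates carries the latin modifier:

```python
# Hypothetical, simplified sketch of how gettext expands a locale name
# into candidate catalog names, most specific first. It mirrors the
# approach of CPython's private gettext._expand_lang() but skips the
# locale.normalize() call, so an already (mis)normalized name can be
# fed in directly.

CODESET = 1 << 0
TERRITORY = 1 << 1
MODIFIER = 1 << 2


def expand_lang(loc: str) -> list:
    mask = 0
    modifier = codeset = territory = ""
    # Split off the components in order: @modifier, .codeset, _territory.
    if "@" in loc:
        loc, modifier = loc.split("@", 1)
        modifier = "@" + modifier
        mask |= MODIFIER
    if "." in loc:
        loc, codeset = loc.split(".", 1)
        codeset = "." + codeset
        mask |= CODESET
    if "_" in loc:
        loc, territory = loc.split("_", 1)
        territory = "_" + territory
        mask |= TERRITORY
    language = loc

    candidates = []
    for i in range(mask + 1):
        if i & ~mask:
            continue  # combination needs a component the name lacks
        val = language
        if i & TERRITORY:
            val += territory
        if i & CODESET:
            val += codeset
        if i & MODIFIER:
            val += modifier
        candidates.append(val)
    candidates.reverse()  # most specific candidate first
    return candidates


print(expand_lang("sr_RS.UTF-8@latin"))
# ['sr_RS.UTF-8@latin', 'sr_RS@latin', 'sr.UTF-8@latin', 'sr@latin',
#  'sr_RS.UTF-8', 'sr_RS', 'sr.UTF-8', 'sr']
print(expand_lang("sr_RS.utf_8_latin"))
# ['sr_RS.utf_8_latin', 'sr_RS', 'sr.utf_8_latin', 'sr']
```

With the correct name, "sr_RS@latin" and "sr@latin" are tried before any Cyrillic candidate; with the mis-normalized name they never appear, so only catalogs such as sr_RS or sr can match.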
| msg202469 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:09 |

A simple fix for that problem could look like this:

mfabian@ari:~
$ diff -u /usr/lib64/python2.7/locale.py.orig /usr/lib64/python2.7/locale.py
--- /usr/lib64/python2.7/locale.py.orig 2013-11-09 09:08:24.807331535 +0100
+++ /usr/lib64/python2.7/locale.py 2013-11-09 09:08:34.526390646 +0100
@@ -377,7 +377,7 @@
     # First lookup: fullname (possibly with encoding)
     norm_encoding = encoding.replace('-', '')
     norm_encoding = norm_encoding.replace('_', '')
-    lookup_name = langname + '.' + encoding
+    lookup_name = langname + '.' + norm_encoding
     code = locale_alias.get(lookup_name, None)
     if code is not None:
         return code
@@ -1457,6 +1457,7 @@
     'sr_cs@latn': 'sr_RS.UTF-8@latin',
     'sr_me': 'sr_ME.UTF-8',
     'sr_rs': 'sr_RS.UTF-8',
+    'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
     'sr_rs.utf8@latn': 'sr_RS.UTF-8@latin',
     'sr_rs@latin': 'sr_RS.UTF-8@latin',
     'sr_rs@latn': 'sr_RS.UTF-8@latin',
mfabian@ari:~
$
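The effect of the one-line change in the first hunk can be sketched with a heavily reduced, hypothetical model of the first lookup in Python 2.7's normalize(). LOCALE_ALIAS below contains only entries taken from the diff, and first_lookup is an illustrative name, not a real function:

```python
# Hypothetical, heavily reduced model of the "first lookup" step of
# Python 2.7's locale.normalize(), before and after the patch above.
# LOCALE_ALIAS holds only entries that appear in the diff; the real
# locale_alias table is far larger.

LOCALE_ALIAS = {
    "sr_cs@latn": "sr_RS.UTF-8@latin",
    "sr_me": "sr_ME.UTF-8",
    "sr_rs": "sr_RS.UTF-8",
    "sr_rs.utf8@latin": "sr_RS.UTF-8@latin",  # entry added by the patch
    "sr_rs.utf8@latn": "sr_RS.UTF-8@latin",
    "sr_rs@latin": "sr_RS.UTF-8@latin",
    "sr_rs@latn": "sr_RS.UTF-8@latin",
}


def first_lookup(localename, patched):
    """Return the lookup key and the alias it finds (None on a miss)."""
    fullname = localename.lower()  # input assumed to be ASCII
    if "." in fullname:
        langname, encoding = fullname.split(".")[:2]
    else:
        langname, encoding = fullname, ""
    # '-' and '_' are removed from the encoding part ...
    norm_encoding = encoding.replace("-", "").replace("_", "")
    # ... but before the patch the key was built from the raw encoding.
    key_encoding = norm_encoding if patched else encoding
    lookup_name = langname + "." + key_encoding
    return lookup_name, LOCALE_ALIAS.get(lookup_name)


print(first_lookup("sr_RS.UTF-8@latin", patched=False))
# ('sr_rs.utf-8@latin', None): a miss, so normalize() moves on
print(first_lookup("sr_RS.UTF-8@latin", patched=True))
# ('sr_rs.utf8@latin', 'sr_RS.UTF-8@latin')
```

The patched key only hits because of the new 'sr_rs.utf8@latin' entry; the existing 'sr_rs.utf8@latn' entry spells the modifier differently.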
| msg202470 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:15 |

In locale.py, the comment above "locale_alias = {" says:
# Note that the normalize() function which uses this tables
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.
But in normalize(), this is actually not done:
# First lookup: fullname (possibly with encoding)
norm_encoding = encoding.replace('-', '')
norm_encoding = norm_encoding.replace('_', '')
lookup_name = langname + '.' + encoding
code = locale_alias.get(lookup_name, None)
"norm_encoding" holds the encoding part with these replacements applied,
but then it is not used in the lookup.
The patch in http://bugs.python.org/msg202469
fixes that: using norm_encoding in the lookup, together with adding the alias
+ 'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
makes it work for sr_RS.UTF-8@latin. My test program then outputs:
mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
sr@foobar -> sr@foobar
sR@foObar -> sR@foObar
sR -> sr_RS.UTF-8
mfabian@ari:~
$
But note that the normalization of the "sr_yu.SJIS@devanagari"
locale is still weird (of course, a "sr_yu.SJIS@devanagari" locale
is quite silly and does not exist anyway), but the code in normalize()
does not seem to work as intended.
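A hypothetical sketch of the second lookup in Python 2.7's normalize() shows where "sjis_devanagari" and "utf_8_latin" come from: the "@modifier" is never split off, so it stays attached to the encoding, and encodings.normalize_encoding() turns the "@" into "_". LOCALE_ALIAS is a two-entry illustrative subset and second_lookup is a made-up name; the real 2.7 function also consults a locale-specific encoding alias table, which is omitted here:

```python
# Hypothetical sketch of the second lookup in Python 2.7's
# locale.normalize(), for names with an explicit encoding only.
# LOCALE_ALIAS is an illustrative two-entry subset of the real table.
import encodings
import encodings.aliases

LOCALE_ALIAS = {
    "sr_rs": "sr_RS.UTF-8",
    "sr_yu": "sr_RS.UTF-8@latin",
}


def second_lookup(localename: str) -> str:
    fullname = localename.lower()
    if "." not in fullname:
        return localename  # names without an encoding are not modelled here
    # The "@modifier" is NOT split off, so it stays inside `encoding`.
    langname, encoding = fullname.split(".")[:2]
    code = LOCALE_ALIAS.get(langname)
    if code is None:
        return localename
    # Keep only language and territory from the alias; its default
    # encoding (and any default modifier) is discarded.
    langname = code.split(".")[0]
    # normalize_encoding() collapses runs of punctuation such as '-'
    # and '@' into a single '_'.
    norm_encoding = encodings.normalize_encoding(encoding)
    norm_encoding = encodings.aliases.aliases.get(norm_encoding, norm_encoding)
    return langname + "." + norm_encoding


for name in ("sr_RS.UTF-8@latin", "sr_yu.SJIS@devanagari"):
    print(name, "->", second_lookup(name))
# sr_RS.UTF-8@latin -> sr_RS.utf_8_latin
# sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
```

Both results match the unpatched output of mike-test.py, which points at the missing modifier split rather than at the alias table itself.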
| msg202471 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:22 |

I think the patch I attach here is a better fix than the patch in http://bugs.python.org/msg202469 because it makes the normalize() function behave more logically overall. With this patch, my test program prints:

mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$ ./python ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.SJIS@devanagari
sr@foobar -> sr_RS.UTF-8@foobar
sR@foObar -> sr_RS.UTF-8@foobar
sR -> sr_RS.UTF-8
[18995 refs]
mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$

The patch also contains a small fix for the "ks" and "sd" locales in the locale_alias dictionary; they had the ".UTF-8" in the wrong place:

-    'ks_in@devanagari': 'ks_IN@devanagari.UTF-8',
+    'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',

-    'sd': 'sd_IN@devanagari.UTF-8',
+    'sd': 'sd_IN.UTF-8@devanagari',

(This error is inherited from X.org's locale.alias file, from which the locale_alias dictionary is generated.)
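The general idea of treating the "@modifier" as a separate component can be sketched as below. This is not the attached patch, whose contents are not reproduced on this page; it is a hypothetical illustration with a four-entry alias table (LOCALE_ALIAS) and a two-entry encoding table (ENCODING_ALIAS), both made up for the example, and normalize_sketch is an illustrative name. For the names in mike-test.py it happens to give the same results as the patched output shown above:

```python
# Hypothetical sketch (not the attached patch): split off "@modifier"
# before touching the encoding, so the two never get mixed up.
# Both tables are tiny, made-up subsets for illustration.

LOCALE_ALIAS = {
    "sr": "sr_RS.UTF-8",
    "sr_rs": "sr_RS.UTF-8",
    "sr_rs@latin": "sr_RS.UTF-8@latin",
    "sr_yu": "sr_RS.UTF-8@latin",
}
# Keys are encodings with '-' and '_' removed.
ENCODING_ALIAS = {"utf8": "UTF-8", "sjis": "SJIS"}


def normalize_sketch(localename: str) -> str:
    name = localename.lower()
    name, _, modifier = name.partition("@")      # 1. "@modifier" first
    langname, _, encoding = name.partition(".")  # 2. then ".encoding"

    # 3. Try language+modifier, then the bare language.
    code = LOCALE_ALIAS.get(langname + "@" + modifier) if modifier else None
    if code is None:
        code = LOCALE_ALIAS.get(langname)
        if code is None:
            return localename  # unknown language: leave the name as-is
        if modifier:
            # The user's modifier replaces any default one from the alias.
            code = code.partition("@")[0] + "@" + modifier

    base, _, alias_modifier = code.partition("@")
    lang_territory, _, default_encoding = base.partition(".")

    # 4. An explicit encoding overrides the alias default.
    if encoding:
        key = encoding.replace("-", "").replace("_", "")
        chosen = ENCODING_ALIAS.get(key, encoding)
    else:
        chosen = default_encoding

    result = lang_territory + ("." + chosen if chosen else "")
    return result + ("@" + alias_modifier if alias_modifier else "")


for name in ("sr_RS.UTF-8@latin", "sr_yu.SJIS@devanagari", "sR@foObar", "sR"):
    print(name, "->", normalize_sketch(name))
# sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
# sr_yu.SJIS@devanagari -> sr_RS.SJIS@devanagari
# sR@foObar -> sr_RS.UTF-8@foobar
# sR -> sr_RS.UTF-8
```

Because the modifier is removed before the encoding is looked at, "SJIS" and "devanagari" are never merged into one "sjis_devanagari" token.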
| msg202472 | Author: Mike FABIAN (mfabian) | Date: 2013-11-09 08:24 |

The patch http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch is against the current HEAD of the 2.7 branch, but Python 3.3 has exactly the same problem; the same patch fixes it for Python 3.3 as well.
| msg202473 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2013-11-09 08:44 |

This seems to be a duplicate of issue5815.
| msg208116 | Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) | Date: 2014-01-14 21:38 |

locale.normalize() was fixed in issue5815 (and a new entry for 'sr_RS.UTF-8@latin' is no longer needed). The Devanagari entries were fixed in issue20027. In any case, thank you, Mike, for your report and proposed patch.
History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:53 | admin | set | github: 63733 |
| 2014-01-14 21:38:10 | serhiy.storchaka | set | status: open -> closed; type: behavior; messages: + msg208116; resolution: duplicate; stage: resolved |
| 2013-11-09 08:44:39 | serhiy.storchaka | set | superseder: locale.getdefaultlocale() missing corner case; messages: + msg202473; nosy: + serhiy.storchaka |
| 2013-11-09 08:24:28 | mfabian | set | messages: + msg202472 |
| 2013-11-09 08:22:39 | mfabian | set | files: + 0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch; keywords: + patch; messages: + msg202471 |
| 2013-11-09 08:15:40 | mfabian | set | messages: + msg202470 |
| 2013-11-09 08:09:25 | mfabian | set | messages: + msg202469 |
| 2013-11-09 08:05:26 | mfabian | set | files: + mike-test.py; messages: + msg202468 |
| 2013-11-09 08:02:59 | mfabian | create | |