
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: normalize() in locale.py fails for sr_RS.UTF-8@latin
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies: (none)
Superseder: locale.getdefaultlocale() missing corner case (issue 5815)
Assigned To: (none)
Nosy List: mfabian, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2013-11-09 08:02 by mfabian, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name  Uploaded  Description
mike-test.py  mfabian, 2013-11-09 08:05  test program to see what goes wrong in locale normalization
0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch  mfabian, 2013-11-09 08:22  0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch
Messages (8)
msg202467 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:02
Originally reported here: 
https://bugzilla.redhat.com/show_bug.cgi?id=1024667
I found that Serbian translations in Latin script do not work when the locale
name is written as sr_RS.UTF-8@latin (one gets the Cyrillic
translations instead), but they *do* work when the locale name is
written as sr_RS@latin (i.e. omitting the '.UTF-8'):
$ LANG='sr_RS.UTF-8' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Који језик бисте желели да користите током процеса инсталације?
mfabian@ari:~
$ LANG='sr_RS@latin' python2 -c 'import gettext; print(gettext.ldgettext("anaconda", "What language would you like to use during the installation process?").decode("UTF-8"))'
Koji jezik biste želeli da koristite tokom procesa instalacije?
mfabian@ari:~
$ 
The "gettext" command line tool does not have this problem:
mfabian@ari:~
$ LANG='sr_RS@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8@latin' gettext anaconda "What language would you like to use during the installation process?"
Koji jezik biste želeli da koristite tokom procesa instalacije?mfabian@ari:~
$ LANG='sr_RS.UTF-8' gettext anaconda "What language would you like to use during the installation process?"
Који језик бисте желели да користите током процеса инсталације?mfabian@ari:~
$
msg202468 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:05
The cause turns out to be a bug in the normalization of
the locale name; see the output of this test program:
mfabian@ari:~
$ cat ~/tmp/mike-test.py
#!/usr/bin/python2
import sys
import os
import locale
import encodings
import encodings.aliases
test_locales = [
    'ja_JP.UTF-8',
    'de_DE.SJIS',
    'de_DE.foobar',
    'sr_RS.UTF-8@latin',
    'sr_rs@latin',
    'sr@latin',
    'sr_yu',
    'sr_yu.SJIS@devanagari',
    'sr@foobar',
    'sR@foObar',
    'sR',
]
for test_locale in test_locales:
    print("%(orig)s -> %(norm)s"
          % {'orig': test_locale,
             'norm': locale.normalize(test_locale)}
          )
mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.utf_8_latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
sr@foobar -> sr@foobar
sR@foObar -> sR@foObar
sR -> sr_RS.UTF-8
mfabian@ari:~
$ 
I.e. "sr_RS.UTF-8@latin" is normalized to "sr_RS.utf_8_latin", which
is clearly wrong and causes gettext to fall back to sr_RS,
which gives the Cyrillic translations.
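To illustrate why the broken name loses the Latin catalog, here is a minimal sketch of a fallback search. It is a simplified stand-in written for this illustration, not the actual algorithm used by the gettext module; the function name lookup_candidates and the candidate order are assumptions.

```python
def lookup_candidates(name):
    """Return catalog names to try, most specific first (simplified sketch)."""
    # Split off an '@modifier' suffix, if the name has one.
    base, sep, modifier = name.partition('@')
    modifier = sep + modifier
    # Split off a '.codeset' part.
    base, sep, codeset = base.partition('.')
    codeset = sep + codeset
    # Split 'language_TERRITORY'.
    language, sep, territory = base.partition('_')
    territory = sep + territory

    candidates = []
    for terr in (territory, ''):
        for cs in (codeset, ''):
            for mod in (modifier, ''):
                candidate = language + terr + cs + mod
                if candidate not in candidates:
                    candidates.append(candidate)
    return candidates


good = lookup_candidates('sr_RS.UTF-8@latin')
bad = lookup_candidates('sr_RS.utf_8_latin')
print(good)  # includes 'sr_RS@latin'
print(bad)   # no candidate carries a modifier, so only sr_RS and sr remain
```

Because "sr_RS.utf_8_latin" contains no '@', the "latin" part is treated as part of the codeset and no "@latin" candidate is ever tried.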
msg202469 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:09
A simple fix for that problem could look like this:
mfabian@ari:~
$ diff -u /usr/lib64/python2.7/locale.py.orig /usr/lib64/python2.7/locale.py
--- /usr/lib64/python2.7/locale.py.orig 2013-11-09 09:08:24.807331535 +0100
+++ /usr/lib64/python2.7/locale.py 2013-11-09 09:08:34.526390646 +0100
@@ -377,7 +377,7 @@
     # First lookup: fullname (possibly with encoding)
     norm_encoding = encoding.replace('-', '')
     norm_encoding = norm_encoding.replace('_', '')
-    lookup_name = langname + '.' + encoding
+    lookup_name = langname + '.' + norm_encoding
     code = locale_alias.get(lookup_name, None)
     if code is not None:
         return code
@@ -1457,6 +1457,7 @@
     'sr_cs@latn': 'sr_RS.UTF-8@latin',
     'sr_me': 'sr_ME.UTF-8',
     'sr_rs': 'sr_RS.UTF-8',
+    'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
     'sr_rs.utf8@latn': 'sr_RS.UTF-8@latin',
     'sr_rs@latin': 'sr_RS.UTF-8@latin',
     'sr_rs@latn': 'sr_RS.UTF-8@latin',
mfabian@ari:~
$
msg202470 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:15
In locale.py, the comment above "locale_alias = {" says:
# Note that the normalize() function which uses this tables
# removes '_' and '-' characters from the encoding part of the
# locale name before doing the lookup. This saves a lot of
# space in the table.
But in normalize(), this is actually not done:
 # First lookup: fullname (possibly with encoding)
 norm_encoding = encoding.replace('-', '')
 norm_encoding = norm_encoding.replace('_', '')
 lookup_name = langname + '.' + encoding
 code = locale_alias.get(lookup_name, None)
"norm_encoding" holds the encoding with these replacements applied,
but it is then not used in the lookup.
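The effect of looking up the raw encoding instead of the normalized one can be sketched as follows. The one-entry alias_table and the helper lookup() are hypothetical and only mimic the first lookup step; the real locale_alias table and normalize() do more.

```python
# Hypothetical one-entry alias table; the real locale_alias is far larger.
alias_table = {'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin'}


def lookup(localename, use_normalized):
    """Mimic the first lookup step, with or without the normalized encoding."""
    fullname = localename.lower()
    langname, encoding = fullname.split('.', 1)
    # Strip '-' and '_' from the encoding part, as the table comment describes.
    norm_encoding = encoding.replace('-', '').replace('_', '')
    chosen = norm_encoding if use_normalized else encoding
    return alias_table.get(langname + '.' + chosen)


before = lookup('sr_RS.UTF-8@latin', use_normalized=False)
after = lookup('sr_RS.UTF-8@latin', use_normalized=True)
print(before)  # None: the key 'sr_rs.utf-8@latin' is not in the table
print(after)   # sr_RS.UTF-8@latin
```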
The patch in http://bugs.python.org/msg202469
fixes that by using norm_encoding in the lookup. Together with the added alias
+ 'sr_rs.utf8@latin': 'sr_RS.UTF-8@latin',
this makes it work for sr_RS.UTF-8@latin; my test program then outputs:
mfabian@ari:~
$ python2 ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.sjis_devanagari
sr@foobar -> sr@foobar
sR@foObar -> sR@foObar
sR -> sr_RS.UTF-8
mfabian@ari:~
$ 
But note that the normalization of the "sr_yu.SJIS@devanagari"
locale is still weird. Of course "sr_yu.SJIS@devanagari"
is quite silly and does not exist anyway, but the code in normalize()
does not seem to work as intended.
msg202471 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:22
I think the patch I attach here is a better fix than the
patch in http://bugs.python.org/msg202469 because
it makes the normalize() function behave more logically overall.
With this patch, my test program prints:
mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$ ./python ~/tmp/mike-test.py
ja_JP.UTF-8 -> ja_JP.UTF-8
de_DE.SJIS -> de_DE.SJIS
de_DE.foobar -> de_DE.foobar
sr_RS.UTF-8@latin -> sr_RS.UTF-8@latin
sr_rs@latin -> sr_RS.UTF-8@latin
sr@latin -> sr_RS.UTF-8@latin
sr_yu -> sr_RS.UTF-8@latin
sr_yu.SJIS@devanagari -> sr_RS.SJIS@devanagari
sr@foobar -> sr_RS.UTF-8@foobar
sR@foObar -> sr_RS.UTF-8@foobar
sR -> sr_RS.UTF-8
[18995 refs]
mfabian@ari:/local/mfabian/src/cpython (2.7-mike %)
$ 
The patch also contains a small fix for the "ks" and "sd"
locales in the locale_alias dictionary, which had the ".UTF-8"
in the wrong place:
- 'ks_in@devanagari': 'ks_IN@devanagari.UTF-8',
+ 'ks_in@devanagari': 'ks_IN.UTF-8@devanagari',
- 'sd': 'sd_IN@devanagari.UTF-8',
+ 'sd': 'sd_IN.UTF-8@devanagari',
(This error is inherited from the locale.alias file from X.org,
from which the locale_alias dictionary is generated.)
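As an illustration of the corrected ordering (codeset before modifier), the following sketch moves a misplaced codeset back in front of the '@modifier' part. The helper fix_modifier_order is hypothetical and is not part of the attached patch; it assumes the part before '@' does not already contain a codeset.

```python
def fix_modifier_order(name):
    """Move a codeset that follows the modifier back in front of it."""
    base, sep, modifier = name.partition('@')
    if not sep or '.' not in modifier:
        return name  # no modifier, or the codeset is already in place
    # 'devanagari.UTF-8' -> modifier 'devanagari', codeset 'UTF-8'
    modifier, _, codeset = modifier.partition('.')
    return base + '.' + codeset + '@' + modifier


fixed_ks = fix_modifier_order('ks_IN@devanagari.UTF-8')
fixed_sd = fix_modifier_order('sd_IN@devanagari.UTF-8')
unchanged = fix_modifier_order('sr_RS.UTF-8@latin')
print(fixed_ks)   # ks_IN.UTF-8@devanagari
print(fixed_sd)   # sd_IN.UTF-8@devanagari
print(unchanged)  # sr_RS.UTF-8@latin
```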
msg202472 - Author: Mike FABIAN (mfabian) Date: 2013-11-09 08:24
The patch
http://bugs.python.org/file32552/0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch
is against the current HEAD of the 2.7 branch, but
Python 3.3 has exactly the same problem, and the same patch fixes it for Python
3.3 as well.
msg202473 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-09 08:44
This seems to be a duplicate of issue5815.
msg208116 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-01-14 21:38
locale.normalize() was fixed in issue5815 (and a new entry for 'sr_RS.UTF-8@latin' is no longer needed). Devanagari entries were fixed in issue20027. In any case, thank you, Mike, for your report and proposed patch.
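A quick way to check an interpreter for the fix is to normalize the two spellings from the original report. This is only a sketch; the exact strings returned depend on the Python version in use, and on a version that includes the issue5815 fix the "@latin" modifier should be preserved.

```python
import locale

# The two spellings compared in the original report.
for name in ('sr_RS.UTF-8@latin', 'sr_RS@latin'):
    print("%s -> %s" % (name, locale.normalize(name)))

normalized = locale.normalize('sr_RS.UTF-8@latin')
```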
History
Date                 User              Action  Args
2022-04-11 14:57:53  admin             set     github: 63733
2014-01-14 21:38:10  serhiy.storchaka  set     status: open -> closed; type: behavior; messages: + msg208116; resolution: duplicate; stage: resolved
2013-11-09 08:44:39  serhiy.storchaka  set     superseder: locale.getdefaultlocale() missing corner case; messages: + msg202473; nosy: + serhiy.storchaka
2013-11-09 08:24:28  mfabian           set     messages: + msg202472
2013-11-09 08:22:39  mfabian           set     files: + 0001-Issue-19534-fix-normalize-in-locale.py-to-make-it-wo.patch; keywords: + patch; messages: + msg202471
2013-11-09 08:15:40  mfabian           set     messages: + msg202470
2013-11-09 08:09:25  mfabian           set     messages: + msg202469
2013-11-09 08:05:26  mfabian           set     files: + mike-test.py; messages: + msg202468
2013-11-09 08:02:59  mfabian           create
