This issue tracker has been migrated to GitHub
and is currently read-only.
For more information,
see the GitHub FAQs in the Python Developer Guide.
Created on 2015-01-08 20:30 by pnugues, last changed 2022-04-11 14:58 by admin.
| Messages (4) | |||
|---|---|---|---|
| msg233685 - (view) | Author: Pierre Nugues (pnugues) | Date: 2015-01-08 20:30 | |
The sorted() function does not work properly on Mac OS X. Here is a script to reproduce the issue:
import locale
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
a = ["A", "E", "Z", "a", "e", "é", "z"]
sorted(a)
sorted(a, key=locale.strxfrm)
The execution on Mac OS X produces:
pierre:Flaubert pierre$ sw_vers -productVersion
10.10.1
pierre:Flaubert pierre$ python3
Python 3.4.2 (v3.4.2:ab2c023a9432, Oct 5 2014, 20:42:22)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
'fr_FR.UTF-8'
>>> a = ["A", "E", "Z", "a", "e", "é", "z"]
>>> sorted(a)
['A', 'E', 'Z', 'a', 'e', 'z', 'é']
>>> sorted(a, key=locale.strxfrm)
['A', 'E', 'Z', 'a', 'e', 'z', 'é']
while it produces this on your interactive shell (python.org):
In [10]: import locale
In [11]: locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
Out[11]: 'fr_FR.UTF-8'
In [12]: a = ["A", "E", "Z", "a", "e", "é", "z"]
In [13]: sorted(a)
Out[13]: ['A', 'E', 'Z', 'a', 'e', 'z', 'é']
In [14]: sorted(a, key=locale.strxfrm)
Out[14]: ['a', 'A', 'e', 'E', 'é', 'z', 'Z']
which is correct. |
|||
| msg233687 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2015-01-08 21:27 | |
locale.strxfrm() has a different implementation in Python 2 and in Python 3:
- Python 2 uses strxfrm(), so it works on byte strings
- Python 3 uses wcsxfrm(), so it works on wide-character strings ("unicode" strings)
It looks like Python 2 and 3 have the same behaviour on Mac OS X: the list is not sorted as expected. Tested on Mac OS X 10.9.2.
Imac-Photo:~ haypo$ cat collate2.py
#coding:utf8
import locale, random
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print("LC_COLLATE = %s" % locale.setlocale(locale.LC_COLLATE, None))
a = ["A", "E", "Z", "\xc9", "a", "e", "\xe9", "z"]
random.shuffle(a)
print(sorted(a))
print(sorted(a, key=locale.strxfrm))
Imac-Photo:~ haypo$ cat collate3.py
#coding:utf8
import locale, random
locale.setlocale(locale.LC_ALL, "fr_FR.UTF-8")
print("LC_COLLATE = %s" % locale.setlocale(locale.LC_COLLATE, None))
a = ["A", "E", "Z", "\xc9", "a", "e", "\xe9", "z"]
random.shuffle(a)
print(ascii(sorted(a)))
print(ascii(sorted(a, key=locale.strxfrm)))
Imac-Photo:~ haypo$ LC_ALL=fr_FR.utf8 python collate2.py
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
Imac-Photo:~ haypo$ LC_ALL=fr_FR.utf8 ~/prog/python/default/python.exe ~/collate3.py
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
On Linux, I get the expected order with Python 3:
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['a', 'A', 'e', 'E', '\xe9', '\xc9', 'z', 'Z']
On Linux, Python 2 gives me a strange order. It may be an issue in my program:
haypo@selma$ python x.py
LC_COLLATE = fr_FR.UTF-8
['A', 'E', 'Z', 'a', 'e', 'z', '\xc9', '\xe9']
['\xe9', '\xc9', 'a', 'A', 'e', 'E', 'z', 'Z']
|
|||
| msg233690 - (view) | Author: Ned Deily (ned.deily) * (Python committer) | Date: 2015-01-08 22:26 | |
The initial difference appears to be a long-standing BSD (including OS X) versus GNU/Linux platform difference. See, for example: http://www.postgresql.org/message-id/18C8A481-33A6-4483-8C24-B8CE70DB7F27@eggerapps.at
Why there is no difference between en and fr UTF-8 is obvious when you look under the covers at the system locale definitions. This is on FreeBSD 10; OS X 10.10 is the same:
$ cd /usr/share/locale/fr_FR.UTF-8/
$ ls -l
total 8
lrwxr-xr-x 1 root wheel 28 Jan 16 2014 LC_COLLATE -> ../la_LN.US-ASCII/LC_COLLATE
lrwxr-xr-x 1 root wheel 17 Jan 16 2014 LC_CTYPE -> ../UTF-8/LC_CTYPE
lrwxr-xr-x 1 root wheel 30 Jan 16 2014 LC_MESSAGES -> ../fr_FR.ISO8859-1/LC_MESSAGES
-r--r--r-- 1 root wheel 36 Jan 16 2014 LC_MONETARY
lrwxr-xr-x 1 root wheel 29 Jan 16 2014 LC_NUMERIC -> ../fr_FR.ISO8859-1/LC_NUMERIC
-r--r--r-- 1 root wheel 364 Jan 16 2014 LC_TIME
For some reason US-ASCII is used for UTF-8 collation; this is also true for en_US.UTF-8 and de_DE.UTF-8, the only other ones I checked.
The PostgreSQL discussion and some earlier Python issues suggest using ICU to properly implement Unicode functions like collation across all platforms. But that has never been implemented in Python.
Nosying Marc-Andre. |
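One way to check whether a given platform's locale data actually influences collation is to compare plain code point order with the order produced by locale.strxfrm. The following is a minimal sketch, not part of the original report: the helper name `collation_is_locale_aware` is hypothetical, and it assumes the requested locale may not be installed at all (in which case it returns None).

```python
import locale


def collation_is_locale_aware(locale_name="fr_FR.UTF-8"):
    """Hypothetical helper: report whether strxfrm changes the sort order.

    Returns True if sorting with locale.strxfrm differs from code point
    order, False if both orders are identical (as reported on BSD/OS X),
    and None if the locale cannot be set on this system.
    """
    sample = ["A", "E", "Z", "a", "e", "\xe9", "z"]
    # Query the current LC_COLLATE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(sample, key=locale.strxfrm) != sorted(sample)
    finally:
        # The locale is process-wide, so always put the old value back.
        locale.setlocale(locale.LC_COLLATE, saved)


print(collation_is_locale_aware())
```

A False result would be consistent with a system whose LC_COLLATE data for the UTF-8 locale is not language-specific, such as the US-ASCII symlink shown above.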
|||
| msg233691 - (view) | Author: STINNER Victor (vstinner) * (Python committer) | Date: 2015-01-08 22:37 | |
> The PostgreSQL discussion and some earlier Python issues suggest using ICU to properly implement Unicode functions like collation across all platforms.
In my experience, the locale module is error-prone and not reliable, especially if you want portability. It just uses functions provided by the OS. And the locales (LC_CTYPE, LC_MESSAGES, etc.) are process-wide, which becomes a major issue if you want to serve different clients using different locales... Windows supports a different locale per thread, if I remember correctly.
It would be more reliable to use a good library like ICU. You may try: https://pypi.python.org/pypi/PyICU
Link showing how to use PyICU to sort a Python sequence: https://stackoverflow.com/questions/11121636/sorting-list-of-string-with-specific-locale-in-python
=> strings.sort(key=lambda x: collator[loc].getCollationKey(x).getByteArray()) |
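The PyICU suggestion can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: it assumes the third-party PyICU package is installed (imported as `icu`), the wrapper name `icu_sorted` is hypothetical, and it uses the collator's `getSortKey` method rather than the `getCollationKey(x).getByteArray()` form quoted above.

```python
try:
    import icu  # provided by the third-party PyICU package (assumed installed)
except ImportError:
    icu = None


def icu_sorted(strings, locale_name="fr_FR"):
    """Hypothetical wrapper: sort strings with an ICU collator.

    Unlike locale.strxfrm, the collator is an ordinary object, so it does
    not depend on the process-wide C locale.
    """
    if icu is None:
        raise RuntimeError("PyICU is not installed")
    collator = icu.Collator.createInstance(icu.Locale(locale_name))
    # getSortKey returns a byte string whose ordering follows the collation.
    return sorted(strings, key=collator.getSortKey)


if icu is not None:
    print(icu_sorted(["A", "E", "Z", "a", "e", "\xe9", "z"]))
```

Because each collator carries its own locale, separate collators can serve clients with different locales in the same process, which addresses the process-wide limitation mentioned above.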
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:11 | admin | set | github: 67384 |
| 2015-01-08 22:37:48 | vstinner | set | messages: + msg233691 |
| 2015-01-08 22:27:21 | ned.deily | set | title: Sorting with locale (strxfrm) does not work properly with Python3 on Macos -> Sorting with locale (strxfrm) does not work properly with Python3 on BSD or OS X |
| 2015-01-08 22:26:41 | ned.deily | set | nosy: + lemburg messages: + msg233690 |
| 2015-01-08 21:48:54 | r.david.murray | set | nosy: + r.david.murray |
| 2015-01-08 21:48:27 | r.david.murray | link | issue23196 superseder |
| 2015-01-08 21:46:17 | r.david.murray | set | title: Sorting with locale does not work properly with Python3 on Macos -> Sorting with locale (strxfrm) does not work properly with Python3 on Macos |
| 2015-01-08 21:27:27 | vstinner | set | messages: + msg233687 |
| 2015-01-08 20:33:56 | ned.deily | set | nosy: + ned.deily |
| 2015-01-08 20:30:56 | pnugues | create | |