classification
Title: locale.normalize strips "-" from UTF-8, which fails on Mac
Type: behavior
Stage: resolved
Components: Library (Lib), macOS
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ronaldoussoren
Nosy List: Boris.FELD, PiotrSikora, georg.brandl, ixokai, lemburg, pitrou, python-dev, ronaldoussoren, ruseel, vstinner
Priority: normal
Keywords: patch

Created on 2010-10-20 15:31 by ixokai, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded
issue10154.patch | ronaldoussoren, 2011-05-07 07:19
Messages (15)
msg119213 - Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-20 15:31
In the course of investigating issue10092, Georg discovered that locale.normalize() misbehaves on Mac.
Basically, "en_US.UTF-8" is how the "correct" locale string should be spelled on the Mac. If you drop the dash, setlocale fails, and dropping the dash is exactly what locale.normalize does. So you can't pass the return value of the function to setlocale, even though that's what it's documented to be for.
If that isn't clear, this should demonstrate (from /branches/py3k):
Top-2:build pythonbuildbot$ ./python.exe
Python 3.2a3+ (py3k:85631, Oct 17 2010, 06:45:22) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
[51767 refs]
>>> locale.normalize("en_US.UTF-8")
'en_US.UTF8'
[51770 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pythonbuildbot/test/build/Lib/locale.py", line 538, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
[51816 refs]
>>> locale.setlocale(locale.LC_TIME, 'en_US.UTF-8')
'en_US.UTF-8'
[51816 refs]
The precise same behavior exists on my stock/system Python 2.6, too, fwiw. (Not that it can be fixed on 2.6, but maybe 2.7?)
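For anyone who wants to check their own platform, the round trip above can be wrapped in a small helper. This is only a sketch: the function name `normalized_name_is_usable` is made up for illustration, and the result depends entirely on which locales the local C library has installed.

```python
import locale


def normalized_name_is_usable(name, category=locale.LC_TIME):
    """Return True if setlocale() accepts locale.normalize(name).

    Hypothetical helper, not part of the locale module.
    """
    normalized = locale.normalize(name)
    saved = locale.setlocale(category)  # query the current setting
    try:
        locale.setlocale(category, normalized)
        return True
    except locale.Error:
        # The C library rejected the normalized spelling.
        return False
    finally:
        # Always restore whatever was active before the probe.
        locale.setlocale(category, saved)


if __name__ == "__main__":
    for candidate in ("en_US.UTF-8", "C"):
        print(candidate, "->", locale.normalize(candidate),
              "usable:", normalized_name_is_usable(candidate))
```

On an affected interpreter the first line would be expected to report the name as unusable, matching the traceback above.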
msg119216 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-10-20 15:46
This patch solves the immediate failure:
Index: Lib/locale.py
===================================================================
--- Lib/locale.py	(revision 85743)
+++ Lib/locale.py	(working copy)
@@ -396,6 +396,9 @@
         else:
             encoding = defenc
         #print 'found encoding %r' % encoding
+        if sys.platform == 'darwin' and encoding == 'UTF8':
+            encoding = 'UTF-8'
+
         if encoding:
             return langname + '.' + encoding
         else:
I'm not happy about hardcoding this specific exception, though; there should be a better solution than this.
Ronald
msg119236 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-20 21:47
Ronald Oussoren wrote:
> 
> Ronald Oussoren <ronaldoussoren@mac.com> added the comment:
> 
> This patch solves the immediate failure:
> 
> Index: Lib/locale.py
> ===================================================================
> --- Lib/locale.py	(revision 85743)
> +++ Lib/locale.py	(working copy)
> @@ -396,6 +396,9 @@
>          else:
>              encoding = defenc
>          #print 'found encoding %r' % encoding
> +        if sys.platform == 'darwin' and encoding == 'UTF8':
> +            encoding = 'UTF-8'
> +
>          if encoding:
>              return langname + '.' + encoding
>          else:
> 
> I'm not happy about hardcoding this specific exception though, there should be a better solution than this.
Could you tell me the values of localename, code, langname and encoding
at that step in the process?
We may need to add a locale_encoding_alias entry from 'UTF8' to 'UTF-8',
since the version with the hyphen is what the C lib uses.
msg119298 - Author: Stephen Hansen (ixokai) (Python triager) Date: 2010-10-21 13:53
Mark, the locals() right before "if encoding:" (line 399) are:
>>> locale.normalize("en_US.UTF-8")
{'code': 'en_US.ISO8859-1', 'langname': 'en_US', 'encoding': 'UTF8', 'norm_encoding': 'utf_8', 'defenc': 'ISO8859-1', 'localename': 'en_US.UTF-8', 'lookup_name': 'en_us.utf-8', 'fullname': 'en_us.utf-8'}
'en_US.UTF8'
msg119301 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-21 14:15
Stephen Hansen wrote:
> 
> Stephen Hansen <me+python@ixokai.io> added the comment:
> 
> Mark, the locals() right before "if encoding:" (line 399) are:
> 
> >>> locale.normalize("en_US.UTF-8")
> {'code': 'en_US.ISO8859-1', 'langname': 'en_US', 'encoding': 'UTF8', 'norm_encoding': 'utf_8', 'defenc': 'ISO8859-1', 'localename': 'en_US.UTF-8', 'lookup_name': 'en_us.utf-8', 'fullname': 'en_us.utf-8'}
> 'en_US.UTF8'
Thanks.
Line 646 in the alias table is wrong:
 'utf_8': 'UTF8',
should read:
 'utf_8': 'UTF-8',
I wonder why this wasn't reported earlier. Did glibc change
the UTF-8 spelling at some point? I do vaguely remember that I
had to remove the hyphen due to problems with setlocale() not
accepting 'UTF-8', but that was at the time I wrote that part
of locale.py, i.e. many years ago.
It doesn't appear to be necessary anymore. I checked on openSUSE
10.3 and 11.3. Both work fine with 'UTF-8' and 'UTF8'.
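The effect of that one-entry change can be illustrated with a stripped-down model of the final lookup step. This is a sketch only: the two dictionaries and `build_locale_name` are hypothetical stand-ins, not the actual table or code in Lib/locale.py.

```python
# Hypothetical one-entry stand-ins for the encoding alias table,
# before and after the proposed change.
ENCODING_ALIAS_OLD = {"utf_8": "UTF8"}
ENCODING_ALIAS_FIXED = {"utf_8": "UTF-8"}


def build_locale_name(langname, norm_encoding, alias_table):
    """Map a normalized encoding name through the alias table and
    join it to the language part, as the end of normalize() does."""
    encoding = alias_table.get(norm_encoding, norm_encoding)
    if encoding:
        return langname + "." + encoding
    return langname


print(build_locale_name("en_US", "utf_8", ENCODING_ALIAS_OLD))    # en_US.UTF8
print(build_locale_name("en_US", "utf_8", ENCODING_ALIAS_FIXED))  # en_US.UTF-8
```

The inputs match the locals() dump above, where norm_encoding is 'utf_8' and langname is 'en_US'.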
msg119309 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-21 15:27
If other Posix-y systems accept both spellings and only Macs insist on the dash, we should probably indeed change the alias entry to use it.
msg122374 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-25 15:42
Mandriva and Debian also work fine with both "UTF8" and "UTF-8". For the record, the canonical spelling inside /usr/share/locale is "UTF-8". I suppose glibc does its own normalization.
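One way a C library could treat "UTF8" and "UTF-8" as the same name is to compare codesets after lowercasing them and dropping everything that is not a letter or digit. The snippet below sketches that idea under that assumption; it is not glibc's code, and `codeset_key` and `same_codeset` are names invented here.

```python
def codeset_key(codeset):
    """Reduce a codeset name to lowercase letters and digits only."""
    return "".join(ch for ch in codeset.lower() if ch.isalnum())


def same_codeset(a, b):
    """Return True if two spellings reduce to the same key."""
    return codeset_key(a) == codeset_key(b)


print(same_codeset("UTF-8", "UTF8"))       # True
print(same_codeset("utf8", "UTF-8"))       # True
print(same_codeset("UTF-8", "ISO8859-1"))  # False
```

A platform that compares names literally instead would accept only one spelling, which is consistent with the Mac behavior reported in this issue.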
msg123553 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-12-07 14:29
UTF-8 works on SuSE Enterprise Linux 9 and 10 as well.
BTW, neither UTF8 nor UTF-8 works on HPUX 10. That platform requires spelling it as utf8.
Sadly enough, this means that the following code doesn't work on HPUX 10:
>>> locale.setlocale(locale.LC_ALL, locale.getdefaultlocale())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/lib/python2.7/locale.py", line 531, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
That's because getdefaultlocale returns 'UTF8' as the encoding, even though LANG is set to 'nl_NL.utf8' (which is a working locale on the machine I tested).
BTW, I'm +1 on changing the alias table as Marc-Andre proposed.
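Since no single spelling is accepted everywhere, application code that must run on such platforms could try several spellings in turn. The following is a workaround sketch, not something proposed in this issue: `set_first_supported`, its default spelling order, and the `setter` parameter (there only so the logic can be exercised without installed locales) are all assumptions.

```python
import locale


def set_first_supported(lang, spellings=("UTF-8", "utf8", "UTF8"),
                        category=locale.LC_ALL, setter=locale.setlocale):
    """Try lang + '.' + spelling for each spelling in order.

    Returns the value from the first successful setter call,
    or None if every spelling is rejected.
    """
    for spelling in spellings:
        name = lang + "." + spelling
        try:
            return setter(category, name)
        except locale.Error:
            continue  # this spelling is not supported here; try the next
    return None


# Example with a stand-in setter that accepts only the HPUX-style name.
def fake_setlocale(category, name):
    if name != "nl_NL.utf8":
        raise locale.Error("unsupported locale setting")
    return name


print(set_first_supported("nl_NL", setter=fake_setlocale))  # nl_NL.utf8
```

With the default `setter`, a successful call changes the process locale, so it should be used as deliberately as setlocale itself.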
msg123667 - Author: MunSic JEONG (ruseel) Date: 2010-12-09 02:34
Ubuntu 10.04.1 LTS also works fine with both "UTF8" and "UTF-8".
msg129662 - Author: Boris FELD (Boris.FELD) * Date: 2011-02-27 22:00
Bug confirmed on python2.5+ and python3.2-.
If it works with the dash, I agree with Marc-Andre's solution.
msg134271 - Author: Piotr Sikora (PiotrSikora) Date: 2011-04-22 16:52
It's the same on OpenBSD (and I'm pretty sure it's true for other BSDs as well).
>>> locale.resetlocale()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/locale.py", line 523, in resetlocale
    _setlocale(category, _build_localename(getdefaultlocale()))
locale.Error: unsupported locale setting
>>> locale._build_localename(locale.getdefaultlocale())
'en_US.UTF8'
Works fine with Marc-Andre's alias table fix.
Any chance this will eventually be fixed in 2.x?
msg134450 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2011-04-26 10:18
Piotr Sikora wrote:
> 
> Piotr Sikora <piotr.sikora@frickle.com> added the comment:
> 
> It's the same on OpenBSD (and I'm pretty sure it's true for other BSDs as well).
> 
> >>> locale.resetlocale()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.6/locale.py", line 523, in resetlocale
>     _setlocale(category, _build_localename(getdefaultlocale()))
> locale.Error: unsupported locale setting
> >>> locale._build_localename(locale.getdefaultlocale())
> 'en_US.UTF8'
> 
> Works fine with Marc-Andre's alias table fix.
> 
> Any chances this will be eventually fixed in 2.x?
This can go into Python 2.7, and, of course, into the 3.x
branches.
msg135406 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2011-05-07 07:19
The attached patch implements the change that Marc-Andre proposed.
I intend to apply this patch to all active branches later today (after some more testing).
msg136150 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-17 12:10
New changeset 932de36903e7 by Ronald Oussoren in branch '2.7':
(backport)Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/932de36903e7
New changeset 28e410eb86af by Ronald Oussoren in branch '3.1':
Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/28e410eb86af
New changeset 454d13e535ff by Ronald Oussoren in branch '3.2':
(merge) Fix #10154 and #10090: locale normalizes the UTF-8 encoding to "UTF-8" instead of "UTF8"
http://hg.python.org/cpython/rev/454d13e535ff 
msg136154 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-17 12:49
New changeset 3d7cb852a176 by Ronald Oussoren in branch 'default':
Fix for issue 10154, merge from 3.2
http://hg.python.org/cpython/rev/3d7cb852a176 
History
Date | User | Action | Args
2022-04-11 14:57:07 | admin | set | github: 54363
2014-10-02 08:28:35 | serhiy.storchaka | link | issue1176504 superseder
2011-05-17 12:49:51 | python-dev | set | messages: + msg136154
2011-05-17 12:14:26 | ronaldoussoren | set | status: open -> closed
2011-05-17 12:14:03 | ronaldoussoren | set | resolution: fixed; stage: needs patch -> resolved
2011-05-17 12:10:12 | python-dev | set | nosy: + python-dev; messages: + msg136150
2011-05-07 08:07:34 | vstinner | set | nosy: + vstinner
2011-05-07 07:19:43 | ronaldoussoren | set | files: + issue10154.patch; keywords: + patch; messages: + msg135406
2011-04-26 10:18:54 | lemburg | set | messages: + msg134450; title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac
2011-04-23 15:46:10 | eric.araujo | set | title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac; stage: needs patch; versions: + Python 3.3, - Python 2.6, Python 2.5
2011-04-22 16:52:24 | PiotrSikora | set | nosy: + PiotrSikora; messages: + msg134271
2011-02-27 22:00:08 | Boris.FELD | set | nosy: + Boris.FELD; messages: + msg129662; versions: + Python 2.6, Python 2.5
2010-12-09 02:34:31 | ruseel | set | messages: + msg123667
2010-12-07 14:29:42 | ronaldoussoren | set | messages: + msg123553
2010-11-25 15:42:48 | pitrou | set | nosy: + pitrou; messages: + msg122374
2010-11-25 02:12:40 | ruseel | set | nosy: + ruseel
2010-10-22 17:37:08 | eric.araujo | link | issue10090 dependencies
2010-10-21 15:27:04 | georg.brandl | set | nosy: + georg.brandl; messages: + msg119309
2010-10-21 14:15:06 | lemburg | set | messages: + msg119301
2010-10-21 13:53:57 | ixokai | set | messages: + msg119298
2010-10-20 21:47:40 | lemburg | set | nosy: + lemburg; title: locale.normalize strips "-" from UTF-8, which fails on Mac -> locale.normalize strips "-" from UTF-8, which fails on Mac; messages: + msg119236
2010-10-20 15:49:01 | ronaldoussoren | set | files: - smime.p7s
2010-10-20 15:46:22 | ronaldoussoren | set | files: + smime.p7s; messages: + msg119216
2010-10-20 15:31:23 | ixokai | create |