
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Obsolete default file encoding "mac-roman" on OS X, not influenced by locale env variables
Type: behavior Stage:
Components: IO, Library (Lib), macOS Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ronaldoussoren Nosy List: benjamin.peterson, ned.deily, ronaldoussoren
Priority: release blocker Keywords: patch

Created on 2009-06-05 10:37 by ned.deily, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fix_mac_encoding.patch benjamin.peterson, 2009-06-05 17:53
Messages (6)
msg88929 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-06-05 10:37
Potential Release Blocker
The default file encoding for 3.x file objects is the value of 
locale.getpreferredencoding(). Currently, the locale module's behavior on 
OS X deviates from that on other POSIX platforms in a few unexpected and 
bad ways:
1. On OS X, locale.getpreferredencoding() returns "mac-roman", an 
obsolete encoding from the old "Classic" MacOS days.
2. Unlike on other POSIX platforms (at least Debian Linux), the values 
returned by locale.getdefaultlocale() and locale.getpreferredencoding() 
on OS X are not influenced by the settings of the POSIX locale 
environment variables, e.g. LANG. So, unlike on the other POSIX 
platforms, one can't override the (obsolete) encoding without explicitly 
setting the encoding argument to open().
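As a minimal sketch of the workaround mentioned in item 2, the encoding can be passed to open() explicitly so that the value of locale.getpreferredencoding() is never consulted. The helper names write_text_utf8 and read_text_utf8 and the temporary file are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_text_utf8(path, text):
    # Pass encoding explicitly so the locale-derived default is not used.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text_utf8(path):
    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical file location, for illustration only.
    path = os.path.join(tempfile.mkdtemp(), "blah")
    write_text_utf8(path, "caf\u00e9")
    print(read_text_utf8(path) == "caf\u00e9")
```

With the encoding fixed at the call site, the file contents are the same whether the platform default is 'mac-roman', 'ANSI_X3.4-1968', or 'UTF-8'.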
Compare the results from Debian Linux:
$ unset LANG
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'ANSI_X3.4-1968'
>>> open('blah','r').encoding
'ANSI_X3.4-1968'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, None)
>>> 
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1a1+ (py3k, Mar 23 2009, 00:12:12) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'
>>> open('blah','r').encoding
'UTF-8'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
('en_US', 'UTF8')
>>> 
... to OS X:
$ unset LANG
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41) 
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
(None, None)
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>> 
$ export LANG=en_US.UTF-8
$ python3.1
Python 3.1rc1+ (py3k, Jun 3 2009, 14:31:41) 
[GCC 4.0.1 (Apple Computer, Inc. build 5370)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.getpreferredencoding()
'mac-roman'
>>> open('blah','r').encoding
'mac-roman'
>>> locale.getlocale()
('en_US', 'UTF8')
>>> locale.getdefaultlocale()
(None, 'mac-roman')
>>> 
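The comparison above can be scripted rather than typed by hand. The following is only a sketch: it starts a child interpreter with LANG set or unset and prints what locale.getpreferredencoding() reports there. The function name preferred_encoding_with_lang is hypothetical, and the values printed depend on the platform and Python version, so no expected output is shown.

```python
import os
import subprocess
import sys

# Code executed in the child interpreter.
SNIPPET = "import locale; print(locale.getpreferredencoding())"


def preferred_encoding_with_lang(lang):
    """Return the child's preferred encoding with LANG=lang (unset if None)."""
    env = dict(os.environ)
    # Drop LC_ALL and LC_CTYPE too, since they take precedence over LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        env.pop(name, None)
    if lang is not None:
        env["LANG"] = lang
    out = subprocess.check_output([sys.executable, "-c", SNIPPET], env=env)
    return out.decode("ascii").strip()


if __name__ == "__main__":
    for lang in (None, "en_US.UTF-8"):
        print(repr(lang), "->", preferred_encoding_with_lang(lang))
```

On a platform that honors the POSIX locale variables, the two lines printed should differ in the way the Debian session does; on the OS X build described here, both would be expected to report 'mac-roman'.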
A quick look at the code shows that part of the problem is in 
Modules/_localemodule.c, where there is a #if defined(__APPLE__) version 
of PyLocale_getdefaultlocale that appears to have its origins in MacOS. 
It should probably just be removed, and locale.py modified to 
eliminate or minimize the special-case mac/darwin code. For the case of no 
locale, "UTF-8" would seem to be a reasonable default. In any case, 
"mac-roman" is not.
msg88938 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-06-05 11:47
I'm setting the priority to "release blocker" because the current 
behaviour is completely unwanted: the "mac-roman" encoding is no longer 
used by default on OS X. All system tools write UTF-8 encoded files by 
default, and the LANG variable is set to a UTF-8 encoding as well.
I won't be able to look into this before Sunday, and possibly only after next 
week (that is, June 15th or later), because I'll be at a conference and 
don't know if I'll have spare time to spend on this after Sunday.
msg88957 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-06-05 17:53
Here's a patch (for the trunk, as it is also afflicted). It simply
removes the Mac-specific cases and uses POSIX detection.
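For readers unfamiliar with what POSIX detection means here, the sketch below shows the general technique from Python: activate the user's LC_CTYPE locale and ask the C library for its codeset. It does not show the contents of fix_mac_encoding.patch, which are not reproduced in this thread; the function name posix_codeset is hypothetical.

```python
import locale


def posix_codeset():
    """Return the codeset of the user's LC_CTYPE locale, or None."""
    # nl_langinfo is not available on every platform (e.g. Windows).
    if not hasattr(locale, "nl_langinfo"):
        return None
    # Remember the current setting so it can be restored afterwards.
    old = locale.setlocale(locale.LC_CTYPE)
    try:
        # "" selects the locale named by LC_ALL / LC_CTYPE / LANG.
        locale.setlocale(locale.LC_CTYPE, "")
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        # The environment names a locale the system does not support.
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, old)


if __name__ == "__main__":
    print(posix_codeset())
```

Because the answer comes from the C library's view of the environment, changing LANG changes the result, which is the behavior the OS X special case was missing.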
msg88978 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-06-05 22:22
A very quick test of the patch on trunk for 10.4 and 10.5 looks good, 
though it should be re-tested once the unrelated current breakage of 
test__locale is fixed.
msg89043 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2009-06-07 15:29
The patch looks good, and tests pass on 10.5.7.
I've committed this as r73268.
msg89048 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2009-06-07 17:43
(and committed to trunk in r73270 by Benjamin)
History
Date User Action Args
2022-04-11 14:56:49  admin  set  github: 50451
2010-10-26 18:16:58  ned.deily  link  issue3362 superseder
2009-06-25 13:48:42  r.david.murray  link  issue6315 superseder
2009-06-07 17:43:56  ned.deily  set  messages: + msg89048
2009-06-07 15:29:59  ronaldoussoren  set  status: open -> closed; resolution: fixed; messages: + msg89043
2009-06-05 22:22:56  ned.deily  set  messages: + msg88978
2009-06-05 17:53:13  benjamin.peterson  set  files: + fix_mac_encoding.patch; keywords: + patch; messages: + msg88957; versions: + Python 2.7
2009-06-05 11:47:06  ronaldoussoren  set  priority: release blocker; messages: + msg88938
2009-06-05 10:37:09  ned.deily  create
