
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: open: avoid the locale encoding when possible
Type:
Stage:
Components: Unicode
Versions: Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, python-dev, vstinner
Priority: normal
Keywords: patch

Created on 2011-06-30 12:20 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name        Uploaded                    Description
open_hook.patch  vstinner, 2011-06-30 12:20
Messages (15)
msg139473 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-06-30 12:20
In Python 3, open() uses the locale encoding when opening a text file if the encoding argument is not specified (implicit). Some functions rely on this implicit locale encoding even though it is not the right encoding. I see at least three cases where the encoding should be changed:
 - UTF-8 should be used instead, for portability: using the locale encoding is a bug in the module
 - ASCII must be used instead: the module doesn't support non-ASCII characters (old file formats, old network protocols, some fields of a document, etc.)
 - ASCII can be used instead: it's just a micro-optimization, since the ASCII encoding is a little bit faster
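The difference between the first two cases can be illustrated with a minimal sketch. The helper name write_text is hypothetical and is not part of the attached patch; it only shows that an explicit encoding makes the behaviour independent of the locale, and that ASCII rejects non-ASCII text outright.

```python
def write_text(path, text, encoding):
    """Write text with an explicit encoding instead of the implicit locale one.

    Hypothetical helper for illustration only.
    """
    with open(path, "w", encoding=encoding) as f:
        f.write(text)

# With encoding="utf-8" the bytes on disk are the same on every platform.
# With encoding="ascii" any non-ASCII character raises UnicodeEncodeError,
# which is the desired outcome when a format does not support such characters.
```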
To detect uses of the implicit locale encoding, some functions can be monkeypatched:
 - builtins.open, io.open, _pyio.open
 - io.TextIOWrapper, _pyio.TextIOWrapper
 - more functions that use open/TextIOWrapper directly or indirectly could be patched to emit the warning earlier
The attached open_hook.patch implements these hooks (hacks?) in the site module: it emits a ResourceWarning. Use python -Werror to raise an error if the locale encoding is used implicitly. If you really want to use the locale encoding, pass encoding='locale' to silence the warning.
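The idea behind such a hook can be sketched roughly as follows. This is not the content of open_hook.patch: the names warning_open and install_hook are hypothetical, only builtins.open is wrapped, and the handling of encoding='locale' (translating it to the preferred locale encoding) is an assumption made so that the sketch is self-contained.

```python
import builtins
import locale
import warnings

_original_open = builtins.open  # keep a reference to the real open()


def warning_open(file, mode="r", buffering=-1, encoding=None, *args, **kwargs):
    """Hypothetical wrapper: warn when text mode uses the implicit locale encoding."""
    if "b" not in mode:
        if encoding is None:
            # Text mode without an explicit encoding: report it to the caller.
            warnings.warn(
                "open() is using the implicit locale encoding",
                ResourceWarning,
                stacklevel=2,
            )
        elif encoding == "locale":
            # Assumption: an explicit 'locale' request is honoured silently.
            encoding = locale.getpreferredencoding(False)
    return _original_open(file, mode, buffering, encoding, *args, **kwargs)


def install_hook():
    """Replace builtins.open with the warning wrapper (hypothetical helper)."""
    builtins.open = warning_open
```

ResourceWarning is ignored by the default warning filters, so the hook only becomes visible with an option such as -Wdefault or -Werror.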
Almost all functions in Python use the implicit locale encoding. For example, with the patch and -Werror, Python doesn't even start. If you use -Werror, you have to patch *all* calls to open()/TextIOWrapper to be able to locate real bugs, otherwise the program stops before hitting the real problems. Each time, you have to check what the expected encoding really is, which takes a lot of time.
I have started this huge project. I'm using ASCII most of the time (especially in Python tests), but I don't know if that's correct. A second step will be required to ensure that these functions really don't use or support non-ASCII characters.
I will use this issue for my commits, to attach patches, and more generally to discuss this topic.
msg139477 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 13:42
New changeset bd73edea78dc by Victor Stinner in branch '3.2':
Issue #12451: distutils now opens the setup script in binary mode to read the
http://hg.python.org/cpython/rev/bd73edea78dc
New changeset 8a7fd54cba01 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: distutils now opens the setup script in binary mode
http://hg.python.org/cpython/rev/8a7fd54cba01 
msg139478 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 14:00
New changeset 1942f7c8f51c by Victor Stinner in branch '3.2':
Issue #12451: pydoc.synopsis() now reads the encoding cookie if available, to
http://hg.python.org/cpython/rev/1942f7c8f51c
New changeset 3e627877b5a9 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: pydoc.synopsis() now reads the encoding cookie if
http://hg.python.org/cpython/rev/3e627877b5a9 
msg139495 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 15:39
New changeset bafc5c7d24b2 by Victor Stinner in branch '3.2':
Issue #12451: doctest.debug_script() doesn't create a temporary file anymore to
http://hg.python.org/cpython/rev/bafc5c7d24b2
New changeset 77c589b27e90 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: doctest.debug_script() doesn't create a temporary
http://hg.python.org/cpython/rev/77c589b27e90 
msg139496 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-06-30 15:41
See also issue #9561 for distutils: I just attached a new patch for PKG-INFO / .egg-info files.
msg139497 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 16:11
New changeset 45e3dafb3dbe by Victor Stinner in branch '3.2':
Issue #12451: The XInclude default loader of xml.etree now decodes files from
http://hg.python.org/cpython/rev/45e3dafb3dbe
New changeset e8eea84a90dc by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: The XInclude default loader of xml.etree now decodes
http://hg.python.org/cpython/rev/e8eea84a90dc 
msg139498 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 16:21
New changeset 68bc1a29ba5a by Victor Stinner in branch '3.2':
Issue #12451: Open files in binary mode in some tests when the text file is not
http://hg.python.org/cpython/rev/68bc1a29ba5a
New changeset 3969b6377f52 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: Open files in binary mode in some tests when the text
http://hg.python.org/cpython/rev/3969b6377f52 
msg139500 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 16:25
New changeset c4388478f9b2 by Victor Stinner in branch 'default':
Issue #12451: Open the test file in binary mode in test_bz2, the text file is
http://hg.python.org/cpython/rev/c4388478f9b2 
msg139503 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-06-30 16:46
See also issue #12454 for the mailbox module (.mh_sequences).
msg139525 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-06-30 21:26
New changeset 0c49260e85a0 by Victor Stinner in branch 'default':
Issue #12451: Add support.create_empty_file()
http://hg.python.org/cpython/rev/0c49260e85a0 
msg139716 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-03 23:29
New changeset 81424281ee59 by Victor Stinner in branch '3.2':
Issue #12451: xml.dom.pulldom: parse() now opens files in binary mode instead
http://hg.python.org/cpython/rev/81424281ee59
New changeset c039c6b58907 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: xml.dom.pulldom: parse() now opens files in binary
http://hg.python.org/cpython/rev/c039c6b58907 
msg139717 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-03 23:47
New changeset cd1759711357 by Victor Stinner in branch '3.2':
Issue #12451: runpy: run_path() now opens the Python script in binary mode,
http://hg.python.org/cpython/rev/cd1759711357
New changeset e240af1f0ae1 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: runpy: run_path() now opens the Python script in
http://hg.python.org/cpython/rev/e240af1f0ae1 
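The commit message above is truncated, so its stated motivation is not shown here. As a hedged illustration of why binary mode suits Python scripts: compile() accepts bytes and honours a PEP 263 encoding cookie itself, so no locale-dependent decoding is involved. The helper name compile_script is hypothetical and is not taken from runpy.

```python
def compile_script(path):
    """Compile a Python script read as bytes (hypothetical helper).

    compile() decodes the bytes itself, using the encoding cookie if one
    is present and UTF-8 otherwise, independently of the locale.
    """
    with open(path, "rb") as f:
        source = f.read()
    return compile(source, path, "exec")
```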
msg139718 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 00:14
New changeset a1b4f1716b73 by Victor Stinner in branch '3.2':
Issue #12451: pydoc: importfile() now opens the Python script in binary mode,
http://hg.python.org/cpython/rev/a1b4f1716b73
New changeset 5ca136dccbf7 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: pydoc: importfile() now opens the Python script in
http://hg.python.org/cpython/rev/5ca136dccbf7 
msg139864 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-05 12:32
New changeset 8b62f5d722f4 by Victor Stinner in branch '3.2':
Issue #12451: pydoc: html_getfile() now uses tokenize.open() to support Python
http://hg.python.org/cpython/rev/8b62f5d722f4
New changeset 2fbfb7ea362f by Victor Stinner in branch 'default':
(merge 3.2) Issue #12451: pydoc: html_getfile() now uses tokenize.open() to
http://hg.python.org/cpython/rev/2fbfb7ea362f 
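For reference, tokenize.open() opens a Python source file in text mode using the encoding detected from its BOM or encoding cookie, rather than the locale encoding. A minimal sketch of its use, with a hypothetical helper name read_python_source that does not come from pydoc:

```python
import tokenize


def read_python_source(path):
    """Return the text of a Python source file (hypothetical helper).

    tokenize.open() picks the encoding from the file itself, so the
    result does not depend on the current locale.
    """
    with tokenize.open(path) as f:
        return f.read()
```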
msg145754 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-10-17 18:45
Ok, it should be enough :-)
History
Date                 User        Action  Args
2022-04-11 14:57:19  admin       set     github: 56660
2011-10-17 18:45:19  vstinner    set     status: open -> closed
                                         resolution: fixed
                                         messages: + msg145754
2011-07-05 12:32:01  python-dev  set     messages: + msg139864
2011-07-04 00:14:57  python-dev  set     messages: + msg139718
2011-07-03 23:47:59  python-dev  set     messages: + msg139717
2011-07-03 23:29:13  python-dev  set     messages: + msg139716
2011-06-30 21:26:00  python-dev  set     messages: + msg139525
2011-06-30 16:51:16  Arfrever    set     nosy: + Arfrever
2011-06-30 16:46:58  vstinner    set     messages: + msg139503
2011-06-30 16:25:28  python-dev  set     messages: + msg139500
2011-06-30 16:21:45  python-dev  set     messages: + msg139498
2011-06-30 16:11:26  python-dev  set     messages: + msg139497
2011-06-30 15:41:25  vstinner    set     messages: + msg139496
2011-06-30 15:39:42  python-dev  set     messages: + msg139495
2011-06-30 14:00:04  python-dev  set     messages: + msg139478
2011-06-30 13:42:02  python-dev  set     nosy: + python-dev
                                         messages: + msg139477
2011-06-30 12:20:50  vstinner    create
