[Python-checkins] peps: Add PEP 488: elimination of PYO files

brett.cannon python-checkins at python.org
Fri Mar 6 17:31:05 CET 2015


https://hg.python.org/peps/rev/5ff20cda4c34
changeset: 5721:5ff20cda4c34
user: Brett Cannon <brett at python.org>
date: Fri Mar 06 11:29:18 2015 -0500
summary:
 Add PEP 488: elimination of PYO files
files:
 pep-0488.txt | 303 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 303 insertions(+), 0 deletions(-)
diff --git a/pep-0488.txt b/pep-0488.txt
new file mode 100644
--- /dev/null
+++ b/pep-0488.txt
@@ -0,0 +1,303 @@
+PEP: 488
+Title: Elimination of PYO files
+Version: $Revision$
+Last-Modified: $Date$
+Author: Brett Cannon <brett at python.org>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 20-Feb-2015
+Post-History:
+ 2015年03月06日
+
+Abstract
+========
+
+This PEP proposes eliminating the concept of PYO files from Python.
+To continue the support of the separation of bytecode files based on
+their optimization level, this PEP proposes extending the PYC file
+name to include the optimization level in bytecode repository
+directory (i.e., the ``__pycache__`` directory).
+
+
+Rationale
+=========
+
+As of today, bytecode files come in two flavours: PYC and PYO. A PYC
+file is the bytecode file generated and read from when no
+optimization level is specified at interpreter startup (i.e., ``-O``
+is not specified). A PYO file represents the bytecode file that is
+read/written when **any** optimization level is specified (i.e., when
+``-O`` is specified, including ``-OO``). This means that while PYC
+files clearly delineate the optimization level used when they were
+generated -- namely no optimizations beyond the peepholer -- the same
+is not true for PYO files. Put in terms of optimization levels and
+the file extension:
+
+ - 0: ``.pyc``
+ - 1 (``-O``): ``.pyo``
+ - 2 (``-OO``): ``.pyo``
+
+The reuse of the ``.pyo`` file extension for both level 1 and 2
+optimizations means that there is no clear way to tell what
+optimization level was used to generate the bytecode file. In terms
+of reading PYO files, this can lead to an interpreter using a mixture
+of optimization levels with its code if the user was not careful to
+make sure all PYO files were generated using the same optimization
+level (typically done by blindly deleting all PYO files and then
+using the `compileall` module to compile all-new PYO files [1]_).
+This issue is only compounded when people optimize Python code beyond
+what the interpreter natively supports, e.g., using the astoptimizer
+project [2]_.
+
+In terms of writing PYO files, the need to delete all PYO files
+every time one either changes the optimization level they want to use
+or are unsure of what optimization was used the last time PYO files
+were generated leads to unnecessary file churn. The change proposed
+by this PEP also allows for **all** optimization levels to be
+pre-compiled for bytecode files ahead of time, something that is
+currently impossible thanks to the reuse of the ``.pyo`` file
+extension for multiple optimization levels.
+
+As for distributing bytecode-only modules, having to distribute both
+``.pyc`` and ``.pyo`` files is unnecessary for the common use-case
+of code obfuscation and smaller file deployments.
+
+
+Proposal
+========
+
+To eliminate the ambiguity that PYO files present, this PEP proposes
+eliminating the concept of PYO files and their accompanying ``.pyo``
+file extension. To allow for the optimization level to be unambiguous
+as well as to avoid having to regenerate optimized bytecode files
+needlessly in the `__pycache__` directory, the optimization level
+used to generate a PYC file will be incorporated into the bytecode
+file name. Currently bytecode file names are created by
+``importlib.util.cache_from_source()``, approximately using the
+following expression defined by PEP 3147 [3]_, [4]_, [5]_::
+
+ '{name}.{cache_tag}.pyc'.format(name=module_name,
+ cache_tag=sys.implementation.cache_tag)
+
+This PEP proposes to change the expression to::
+
+ '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
+ name=module_name,
+ cache_tag=sys.implementation.cache_tag,
+ optimization=str(sys.flags.optimize))
+
+The "opt-" prefix was chosen so as to provide a visual separator
+from the cache tag. The placement of the optimization level after
+the cache tag was chosen to preserve lexicographic sort order of
+bytecode file names based on module name and cache tag which will
+not vary for a single interpreter. The "opt-" prefix was chosen over
+"o" so as to be somewhat self-documenting. The "opt-" prefix was
+chosen over "O" so as to not have any confusion with "0" while being
+so close to the interpreter version number.
+
+A period was chosen over a hyphen as a separator so as to distinguish
+clearly that the optimization level is not part of the interpreter
+version as specified by the cache tag. It also lends to the use of
+the period in the file name to delineate semantically different
+concepts.
+
+For example, the bytecode file name of ``importlib.cpython-35.pyc``
+would become ``importlib.cpython-35.opt-0.pyc``. If ``-OO`` had been
+passed to the interpreter then instead of
+``importlib.cpython-35.pyo`` the file name would be
+``importlib.cpython-35.opt-2.pyc``.
+
+
+Implementation
+==============
+
+importlib
+---------
+
+As ``importlib.util.cache_from_source()`` is the API that exposes
+bytecode file paths as while as being directly used by importlib, it
+requires the most critical change. As of Python 3.4, the function's
+signature is::
+
+ importlib.util.cache_from_source(path, debug_override=None)
+
+This PEP proposes changing the signature in Python 3.5 to::
+
+ importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
+
+The introduced ``optimization`` keyword-only parameter will control
+what optimization level is specified in the file name. If the
+argument is ``None`` then the current optimization level of the
+interpreter will be assumed. Any argument given for ``optimization``
+will be passed to ``str()`` and must have ``str.isalnum()`` be true,
+else ``ValueError`` will be raised (this prevents invalid characters
+being used in the file name). If the empty string is passed in for
+``optimization`` then the addition of the optimization will be
+suppressed, reverting to the file name format which predates this
+PEP.
+
+It is expected that beyond Python's own
+0-2 optimization levels, third-party code will use a hash of
+optimization names to specify the optimization level, e.g.
+``hashlib.sha256(','.join(['dead code elimination', 'constant folding'])).hexdigest()``.
+While this might lead to long file names, it is assumed that most
+users never look at the contents of the __pycache__ directory and so
+this won't be an issue.
+
+The ``debug_override`` parameter will be deprecated. As the parameter
+expects a boolean, the integer value of the boolean will be used as
+if it had been provided as the argument to ``optimization`` (a
+``None`` argument will mean the same as for ``optimization``). A
+deprecation warning will be raised when ``debug_override`` is given a
+value other than ``None``, but there are no plans for the complete
+removal of the parameter as this time (but removal will be no later
+than Python 4).
+
+The various module attributes for importlib.machinery which relate to
+bytecode file suffixes will be updated [7]_. The
+``DEBUG_BYTECODE_SUFFIXES`` and ``OPTIMIZED_BYTECODE_SUFFIXES`` will
+both be documented as deprecated and set to the same value as
+``BYTECODE_SUFFIXES`` (removal of ``DEBUG_BYTECODE_SUFFIXES`` and
+``OPTIMIZED_BYTECODE_SUFFIXES`` is not currently planned, but will be
+not later than Python 4).
+
+All various finders and loaders will also be updated as necessary,
+but updating the previous mentioned parts of importlib should be all
+that is required.
+
+
+Rest of the standard library
+----------------------------
+
+The various functions exposed by the ``py_compile`` and
+``compileall`` functions will be updated as necessary to make sure
+they follow the new bytecode file name semantics [6]_, [1]_. The CLI
+for the ``compileall`` module will not be directly affected (the
+``-b`` flag will be implicitly as it will no longer generate ``.pyo``
+files when ``-O`` is specified).
+
+
+Compatibility Considerations
+============================
+
+Any code directly manipulating bytecode files from Python 3.2 on
+will need to consider the impact of this change on their code (prior
+to Python 3.2 -- including all of Python 2 -- there was no
+__pycache__ which already necessitates bifurcating bytecode file
+handling support). If code was setting the ``debug_override``
+argument to ``importlib.util.cache_from_source()`` then care will be
+needed if they want the path to a bytecode file with an optimization
+level of 2. Otherwise only code **not** using
+``importlib.util.cache_from_source()`` will need updating.
+
+As for people who distribute bytecode-only modules (i.e., use a
+bytecode file instead of a source file), they will have to choose
+which optimization level they want their bytecode files to be since
+distributing a ``.pyo`` file with a ``.pyc`` file will no longer be
+of any use. Since people typically only distribute bytecode files for
+code obfuscation purposes or smaller distribution size then only
+having to distribute a single ``.pyc`` should actually be beneficial
+to these use-cases. And since the magic number for bytecode files
+changed in Python 3.5 to support PEP 465 there is no need to support
+pre-existing ``.pyo`` files [8]_.
+
+
+Rejected Ideas
+==============
+
+N/A
+
+
+Open Issues
+===========
+
+Formatting of the optimization level in the file name
+-----------------------------------------------------
+
+Using the "opt-" prefix and placing the optimization level between
+the cache tag and file extension is not critical. All options which
+have been considered are:
+
+* ``importlib.cpython-35.opt-0.pyc``
+* ``importlib.cpython-35.opt0.pyc``
+* ``importlib.cpython-35.o0.pyc``
+* ``importlib.cpython-35.O0.pyc``
+* ``importlib.cpython-35.0.pyc``
+* ``importlib.cpython-35-O0.pyc``
+* ``importlib.O0.cpython-35.pyc``
+* ``importlib.o0.cpython-35.pyc``
+* ``importlib.0.cpython-35.pyc``
+
+These were initially rejected either because they would change the
+sort order of bytecode files, possible ambiguity with the cache tag,
+or were not self-documenting enough.
+
+
+Not specifying the optimization level when it is at 0
+-----------------------------------------------------
+
+It has been suggested that for the common case of when the
+optimizations are at level 0 that the entire part of the file name
+relating to the optimization level be left out. This would allow for
+file names of ``.pyc`` files to go unchanged, potentially leading to
+less backwards-compatibility issues (although Python 3.5 introduces a
+new magic number for bytecode so all bytecode files will have to be
+regenerated regardless of the outcome of this PEP).
+
+It would also allow a potentially redundant bit of information to be
+left out of the file name if an implementation of Python did not
+allow for optimizing bytecode. This would only occur, though, if the
+interpreter didn't support ``-O`` **and** didn't implement the ast
+module, else users could implement their own optimizations.
+
+Arguments against allow this special case is "explicit is better than
+implicit" and "special cases aren't special enough to break the
+rules". There are also currently no Python 3 interpreters that don't
+support ``-O``, so a potential Python 3 implementation which doesn't
+allow bytecode optimization is entirely theoretical at the moment.
+
+
+References
+==========
+
+.. [1] The compileall module
+ (https://docs.python.org/3/library/compileall.html#module-compileall)
+
+.. [2] The astoptimizer project
+ (https://pypi.python.org/pypi/astoptimizer)
+
+.. [3] ``importlib.util.cache_from_source()``
+ (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
+
+.. [4] Implementation of ``importlib.util.cache_from_source()`` from CPython 3.4.3rc1
+ (https://hg.python.org/cpython/file/038297948389/Lib/importlib/_bootstrap.py#l437)
+
+.. [5] PEP 3147, PYC Repository Directories, Warsaw
+ (http://www.python.org/dev/peps/pep-3147)
+
+.. [6] The py_compile module
+ (https://docs.python.org/3/library/compileall.html#module-compileall)
+
+.. [7] The importlib.machinery module
+ (https://docs.python.org/3/library/importlib.html#module-importlib.machinery)
+
+.. [8] ``importlib.util.MAGIC_NUMBER``
+ (https://docs.python.org/3/library/importlib.html#importlib.util.MAGIC_NUMBER)
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+
+..
+ Local Variables:
+ mode: indented-text
+ indent-tabs-mode: nil
+ sentence-end-double-space: t
+ fill-column: 70
+ coding: utf-8
+ End:
-- 
Repository URL: https://hg.python.org/peps


More information about the Python-checkins mailing list

AltStyle によって変換されたページ (->オリジナル) /