[Python-checkins] r78122 - peps/trunk/pep-3146.txt

Tue Feb 9 03:51:27 CET 2010

Author: collin.winter
Date: Tue Feb 9 03:51:26 2010
New Revision: 78122
Log:
Commit updated version of PEP 3146.
Modified:
 peps/trunk/pep-3146.txt
Modified: peps/trunk/pep-3146.txt
==============================================================================

--- peps/trunk/pep-3146.txt	(original)
+++ peps/trunk/pep-3146.txt	Tue Feb 9 03:51:26 2010
@@ -165,6 +165,50 @@
 framework that the wider CPython development community can build upon it for
 years to come, extracting increased performance in each subsequent release.
 
+Alternatives
+------------
+
+There are a number of alternative strategies for improving Python performance
+which we considered, but found unsatisfactory.
+
+- *Cython, Shedskin*: Cython [#cython]_ and Shedskin [#shedskin]_ are both
+ static compilers for Python. We view these as useful-but-limited workarounds
+ for CPython's historically-poor performance. Shedskin does not support the
+ full Python standard library [#shedskin-library-limits]_, while Cython
+ requires manual Cython-specific annotations for optimum performance.
+
+ Static compilers like these are useful for writing extension modules without
+ worrying about reference counting, but because they are static, ahead-of-time
+ compilers, they cannot optimize the full range of code under consideration by
+ a just-in-time compiler informed by runtime data.
+- *IronPython*: IronPython [#ironpython]_ is Python on Microsoft's .NET
+ platform. It is not actively tested on Mono [#mono]_, meaning that it is
+ essentially Windows-only, making it unsuitable as a general CPython
+ replacement.
+- *Jython*: Jython [#jython]_ is a complete implementation of Python 2.5, but
+ is significantly slower than Unladen Swallow (3-5x on measured benchmarks) and
+ has no support for CPython extension modules [#jython-c-ext]_, which would
+ make migration of large applications prohibitively expensive.
+- *Psyco*: Psyco [#psyco]_ is a specializing JIT compiler for CPython,
+ implemented as an extension module. It primarily improves performance for
+ numerical code. Pros: exists; makes some code faster. Cons: 32-bit only, with
+ no plans for 64-bit support; supports x86 only; very difficult to maintain;
+ incompatible with SSE2-optimized code due to alignment issues.
+- *PyPy*: PyPy [#pypy]_ has good performance on numerical code, but is slower
+ than Unladen Swallow on non-numerical workloads. PyPy only supports 32-bit
+ x86 code generation. It has poor support for CPython extension modules,
+ making migration for large applications prohibitively expensive.
+- *PyV8*: PyV8 [#pyv8]_ is an alpha-stage experimental Python-to-JavaScript
+ compiler that runs on top of V8. PyV8 does not implement the whole Python
+ language, and has no support for CPython extension modules.
+- *WPython*: WPython [#wpython]_ is a wordcode-based reimplementation of
+ CPython's interpreter loop. While it provides a modest improvement to
+ interpreter performance [#wpython-performance]_, it is not an either-or
+ substitute for a just-in-time compiler. An interpreter will never be as fast
+ as optimized machine code. We view WPython and similar interpreter
+ enhancements as complementary to our work, rather than as competitors.
+
+
 
 Performance
 ===========
@@ -411,6 +455,25 @@
 Stddev: 0.00214 -> 0.00240: 1.1209x larger
 Timeline: http://tinyurl.com/yajn8fa
 
+ ### bzr_startup ###
+ Min: 0.067990 -> 0.097985: 1.4412x slower
+ Avg: 0.084322 -> 0.111348: 1.3205x slower
+ Significant (t=-37.432534, a=0.95)
+ Stddev: 0.00793 -> 0.00643: 1.2330x smaller
+ Timeline: http://tinyurl.com/ybdm537
+
+ ### hg_startup ###
+ Min: 0.016997 -> 0.024997: 1.4707x slower
+ Avg: 0.026990 -> 0.036772: 1.3625x slower
+ Significant (t=-53.104502, a=0.95)
+ Stddev: 0.00406 -> 0.00417: 1.0273x larger
+ Timeline: http://tinyurl.com/ycout8m
+
+
+``bzr_startup`` and ``hg_startup`` measure how long it takes Bazaar and
+Mercurial, respectively, to display their help screens. ``startup_nosite``
+runs ``python -S`` many times; usage of the ``-S`` option is rare, but we feel
+this gives a good indication of where increased startup time is coming from.
 
 Unladen Swallow has made headway toward optimizing startup time, but there is
 still more work to do and further optimizations to implement. Improving start-up
@@ -422,40 +485,31 @@
 -----------
 
 Statically linking LLVM's code generation, analysis and optimization libraries
-significantly increases the size of the ``python`` binary.
-
-
-32-bit; gcc 4.0.3
-
-+-------------+---------------+---------------+----------------------+
-| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
-+=============+===============+===============+======================+
-| Release | 3.8M | 4.0M | 74M |
-+-------------+---------------+---------------+----------------------+
-| Debug | 3.3M | 3.6M | 118M |
-+-------------+---------------+---------------+----------------------+
-
-64-bit; gcc 4.2.4
-
-+-------------+---------------+---------------+----------------------+
-| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
-+=============+===============+===============+======================+
-| Release | 5.5M | 5.7M | 89M |
-+-------------+---------------+---------------+----------------------+
-| Debug | 4.1M | 4.4M | 128M |
-+-------------+---------------+---------------+----------------------+
-
-The increased binary size is due to statically linking LLVM's code generation,
-analysis and optimization libraries into the ``python`` binary. This can be
-straightforwardly addressed by modifying LLVM to better support shared linking
-and then using that, instead of the current static linking. For the moment,
-though, static linking provides an accurate look at the cost of linking against
-LLVM.
-
-Unladen Swallow recently experienced a regression in binary size, going from
-19MB in Unladen's 2009Q3 release up to the current 74MB shown in the table
-above. Resolution of this issue [#us-binary-size]_ will block final merger into
-the ``py3k`` branch.
+significantly increases the size of the ``python`` binary. The tables below
+report stripped on-disk binary sizes; the binaries are stripped to better
+correspond with the configurations used by system package managers. We feel this
+is the most realistic measure of any change in binary size.
+
+
++-------------+---------------+---------------+-----------------------+
+| Binary size | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1041 |
++=============+===============+===============+=======================+
+| 32-bit | 1.3M | 1.4M | 12M |
++-------------+---------------+---------------+-----------------------+
+| 64-bit | 1.6M | 1.6M | 12M |
++-------------+---------------+---------------+-----------------------+
+
+
+The increased binary size is caused by statically linking LLVM's code
+generation, analysis and optimization libraries into the ``python`` binary.
+This can be straightforwardly addressed by modifying LLVM to better support
+shared linking and then using that, instead of the current static linking. For
+the moment, though, static linking provides an accurate look at the cost of
+linking against LLVM.
+
+Even when statically linking, we believe there is still headroom to improve
+on-disk binary size by narrowing Unladen Swallow's dependencies on LLVM. This
+issue is actively being addressed [#us-binary-size]_.
 
 
 Performance Retrospective
@@ -610,7 +664,8 @@
 best support on x86 and x86-64 systems, and these are the platforms where
 Unladen Swallow has received the most testing. We are confident in LLVM/Unladen
 Swallow's support for x86 and x86-64 hardware. PPC and ARM support exists, but
-is not widely used and may be buggy.
+is not widely used and may be buggy (for example, [#llvm-ppc-eager-jit-issue]_,
+[#llvm-far-call-issue]_, [#llvm-arm-jit-issue]_).
 
 Unladen Swallow is known to work on the following operating systems: Linux,
 Darwin, Windows. Unladen Swallow has received the most testing on Linux and
@@ -631,7 +686,7 @@
 --------------------------------------------------------
 
 Unladen Swallow's JIT compiler operates on CPython bytecode, and as such, it is
-immune to Python languages changes that only affect the parser.
+immune to Python language changes that affect only the parser.
 
 We recommend that changes to the CPython bytecode compiler or the semantics of
 individual bytecodes be prototyped in the interpreter loop first, then be ported
@@ -765,6 +820,10 @@
 Unladen Swallow [#us-oprofile-change]_, other profiling tools should be easy as
 well, provided they support a similar JIT interface [#oprofile-jit-interface]_.
 
+We have documented the process for using oProfile to profile Unladen Swallow
+[#oprofile-workflow]_. This document will be added to CPython's ``Doc/`` tree
+as part of the merge.
+
 
 Addition of C++ to CPython
 --------------------------
@@ -781,12 +840,17 @@
 - Easy use of LLVM's full, powerful code generation and related APIs.
 - Convenient, abstract data structures simplify code.
 - C++ is limited to relatively small corners of the CPython codebase.
+- C++ can be disabled via ``./configure --without-llvm``, which even omits the
+ dependency on ``libstdc++``.
 
 Lowlights:
 
 - Developers must know two related languages, C and C++ to work on the full
 range of CPython's internals.
 - A C++ style guide will need to be developed and enforced. See `Open Issues`_.
+- Different C++ compilers emit different ABIs; this can cause problems if
+ CPython is compiled with one C++ compiler and extension modules are compiled
+ with a different C++ compiler.
 
 
 Managing LLVM Releases, C++ API Changes
@@ -813,20 +877,26 @@
 following an LLVM release, and failing that, llvm.org itself includes binary
 releases.
 
-Pre-built LLVM packages are available from MacPorts [#llvm-macports]_ for
-Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
+Unladen Swallow has historically included a copy of the LLVM and Clang source
+trees in the Unladen Swallow tree; this was done to allow us to closely track
+LLVM trunk as we made patches to it. We do not recommend this model of
+development for CPython. CPython releases should be based on official LLVM
+releases. Pre-built LLVM packages are available from MacPorts [#llvm-macports]_
+for Darwin, and from most major Linux distributions ([#llvm-ubuntu]_,
 [#llvm-debian]_, [#llvm-fedora]_). LLVM itself provides additional binaries,
 such as for MinGW [#llvm-mingw]_.
 
 LLVM is currently intended to be statically linked; this means that binary
 releases of CPython will include the relevant parts (not all!) of LLVM. This
-will increase the binary size, as noted above.
+will increase the binary size, as noted above. To simplify downstream package
+management, we will modify LLVM to better support shared linking. This issue
+will block final merger [#us-shared-link-issue]_.
 
 Unladen Swallow has tasked a full-time engineer with fixing any remaining
-critical issues in LLVM before LLVM's 2.7 release. We would like CPython 3.x to
-be able to depend on a released version of LLVM, rather than closely tracking
-LLVM trunk as Unladen Swallow has done. We believe we will finish this work
-before the release of LLVM 2.7, expected in May 2010.
+critical issues in LLVM before LLVM's 2.7 release. We consider it essential that
+CPython 3.x be able to depend on a released version of LLVM, rather than closely
+tracking LLVM trunk as Unladen Swallow has done. We believe we will finish this
+work [#us-llvm-punchlist]_ before the release of LLVM 2.7, expected in May 2010.
 
 
 Building CPython
@@ -868,27 +938,22 @@
 interaction, b) statically linking LLVM into ``libpython``, c) compiling parts
 of the Python runtime to LLVM IR to enable cross-language inlining.
 
-Incremental builds, however, are significantly slower. The table below shows
-incremental rebuild times after touching ``Objects/listobject.c``.
-
-+-------------+---------------+---------------+----------------------+
-| Incr make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r988 |
-+=============+===============+===============+======================+
-| Run 1 | 0m1.854s | 0m1.456s | 0m24.464s |
-+-------------+---------------+---------------+----------------------+
-| Run 2 | 0m1.437s | 0m1.442s | 0m24.416s |
-+-------------+---------------+---------------+----------------------+
-| Run 3 | 0m1.440s | 0m1.425s | 0m24.352s |
-+-------------+---------------+---------------+----------------------+
+Incremental builds are also somewhat slower than mainline CPython. The table
+below shows incremental rebuild times after touching ``Objects/listobject.c``.
 
-As with full builds, this extra time comes from a) additional ``.cc`` files
-needed for LLVM interaction, and b) statically linking LLVM into ``libpython``.
-
-If ``libpython`` were linked shared against LLVM, this overhead would go down.
-Incremental builds of Unladen Swallow also currently (as of r988) suffer from a
-known bug in the Unladen Swallow ``Makefile`` [#rebuild-too-much]_ where too
-many ``.cc`` files are recompiled. We consider this a blocking issue for full
-merger with the ``py3k`` branch.
++-------------+---------------+---------------+-----------------------+
+| Incr make | CPython 2.6.4 | CPython 3.1.1 | Unladen Swallow r1024 |
++=============+===============+===============+=======================+
+| Run 1 | 0m1.854s | 0m1.456s | 0m6.680s |
++-------------+---------------+---------------+-----------------------+
+| Run 2 | 0m1.437s | 0m1.442s | 0m5.310s |
++-------------+---------------+---------------+-----------------------+
+| Run 3 | 0m1.440s | 0m1.425s | 0m7.639s |
++-------------+---------------+---------------+-----------------------+
+
+As with full builds, this extra time comes from statically linking LLVM
+into ``libpython``. If ``libpython`` were linked shared against LLVM, this
+overhead would go down.
 
 
 Proposed Merge Plan
@@ -930,6 +995,31 @@
 ``py3k-jit`` branch.
 
 
+Contingency Plans
+-----------------
+
+There is a chance that we will not be able to reduce memory usage or startup
+time to a level satisfactory to the CPython community. Our primary contingency
+plan for this situation is to shift from an online just-in-time compilation
+strategy to an offline ahead-of-time strategy using an instrumented CPython
+interpreter loop to obtain feedback. This is the same model used by gcc's
+feedback-directed optimizations (``-fprofile-generate``) [#gcc-fdo]_ and
+Microsoft Visual Studio's profile-guided optimizations [#msvc-pgo]_; we will
+refer to this as "feedback-directed optimization" here, or FDO.
+
+We believe that an FDO compiler for Python would be inferior to a JIT compiler.
+FDO requires a high-quality, representative benchmark suite, which is a relative
+rarity in both open- and closed-source development. A JIT compiler can
+dynamically find and optimize the hot spots in any application -- benchmark
+suite or no -- allowing it to adapt to changes in application bottlenecks
+without human intervention.
+
+If an ahead-of-time FDO compiler is required, it should be able to leverage a
+large percentage of the code and infrastructure already developed for Unladen
+Swallow's JIT compiler. Indeed, these two compilation strategies could exist
+side-by-side.
+
+
 Future Work
 ===========
 
@@ -959,6 +1049,9 @@
 initially avoided a purely-tracing JIT compiler in favor of a simpler,
 function-at-a-time compiler. However this function-at-a-time compiler has laid
 the groundwork for a future tracing compiler implemented in the same terms.
+- Profile generation/reuse. The runtime data gathered by the JIT could be
+ persisted to disk and reused by subsequent JIT compilations, or by external
+ tools such as Cython [#cython]_ or a feedback-enhanced code coverage tool.
 
 This list is by no means exhaustive. There is a vast literature on optimizations
 for dynamic languages that could and should be implemented in terms of Unladen
@@ -977,8 +1070,6 @@
 organization. We would like a non-Google-affiliated member of the CPython
 development team to review our work for correctness and compatibility, but we
 realize this may not be possible for every commit.
-- *How to link LLVM.* Should we change LLVM to better support shared linking,
- and then use shared linking to link the parts of it we need into CPython?
 - *Prioritization of remaining issues.* We would like input from the CPython
 development team on how to prioritize the remaining issues in the Unladen
 Swallow codebase. Some issues like memory usage are obviously critical before
@@ -1007,6 +1098,10 @@
 under the terms of the Python Software Foundation License v2 [#psf-lic]_ under
 the umbrella of Google's blanket Contributor License Agreement with the PSF.
 
+LLVM is licensed [#llvm-lic]_ under the University of Illinois/NCSA Open Source
+License [#ui-lic]_, a liberal, OSI-approved license. The University of Illinois
+Urbana-Champaign is the sole copyright holder for LLVM.
+
 
 References
 ==========
@@ -1026,9 +1121,6 @@
 .. [#llvm-hardware]
 http://llvm.org/docs/GettingStarted.html#hardware
 
-.. [#rebuild-too-much]
- http://code.google.com/p/unladen-swallow/issues/detail?id=115
-
 .. [#llvm-c-api]
 http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm-c/
 
@@ -1077,6 +1169,9 @@
 .. [#oprofile-jit-interface]
 http://oprofile.sourceforge.net/doc/devel/jit-interface.html
 
+.. [#oprofile-workflow]
+ http://code.google.com/p/unladen-swallow/wiki/UsingOProfile
+
 .. [#llvm-mingw]
 http://llvm.org/releases/download.html
 
@@ -1179,6 +1274,12 @@
 .. [#psf-lic]
 http://www.python.org/psf/license/
 
+.. [#llvm-lic]
+ http://llvm.org/docs/DeveloperPolicy.html#clp
+
+.. [#ui-lic]
+ http://www.opensource.org/licenses/UoI-NCSA.php
+
 .. [#v8]
 http://code.google.com/p/v8/
 
@@ -1296,6 +1397,54 @@
 .. [#us-nbody]
 http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_nbody.py
 
+.. [#us-shared-link-issue]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=130
+
+.. [#us-llvm-punchlist]
+ http://code.google.com/p/unladen-swallow/issues/detail?id=131
+
+.. [#llvm-ppc-eager-jit-issue]
+ http://llvm.org/PR4816
+
+.. [#llvm-arm-jit-issue]
+ http://llvm.org/PR6065
+
+.. [#cython]
+ http://www.cython.org/
+
+.. [#shedskin]
+ http://shed-skin.blogspot.com/
+
+.. [#shedskin-library-limits]
+ http://shedskin.googlecode.com/files/shedskin-tutorial-0.3.html
+
+.. [#wpython]
+ http://code.google.com/p/wpython/
+
+.. [#wpython-performance]
+ http://www.mail-archive.com/python-dev@python.org/msg45143.html
+
+.. [#ironpython]
+ http://ironpython.net/
+
+.. [#mono]
+ http://www.mono-project.com/
+
+.. [#jython]
+ http://www.jython.org/
+
+.. [#jython-c-ext]
+ http://wiki.python.org/jython/JythonFaq/GeneralInfo
+
+.. [#pyv8]
+ http://code.google.com/p/pyv8/
+
+.. [#gcc-fdo]
+ http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
+
+.. [#msvc-pgo]
+ http://msdn.microsoft.com/en-us/library/e7k32f4k.aspx
+
 
 Copyright
 =========