[Python-ideas] Updated PEP 432: Simplifying the CPython update sequence

Sun Jan 6 08:26:14 CET 2013

On Sun, Jan 6, 2013 at 7:42 AM, Barry Warsaw <barry at python.org> wrote:
> Hi Nick,
>
> PEP 432 is looking very nice. It'll be fun to watch the implementation come
> together. :)
>
> Some comments...
>
> The start up sequences:
>
>> * Pre-Initialization - no interpreter available
>> * Initialization - interpreter partially available
>
> What about "Initializing"?

Makes sense, changed.
>> * Initialized - full interpreter available, __main__ related metadata
>>   incomplete
>> * Main Execution - optional state, __main__ related metadata populated,
>>   bytecode executing in the __main__ module namespace
>
> What is "optional" about this state? Maybe it should be called "Operational"?

Unlike the other phases which are sequential and distinct, "Main
Execution" is a subphase of Initialized. Embedding applications
without the concept of a "__main__" module (e.g. mod_wsgi) will never
use it.
>> ... separate system Python (spython) executable ...
>
> I love the idea, but I'm not crazy about the name. What about
> `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :)

Yeah, I'll go with "python-minimal".
>> <TBD: Did I miss anything?>
>
> What about sys.implementation?

Unaffected, since that's all configured at build time. I've added an
explicit note that sys.implementation and sysconfig.get_config_vars()
are not affected by this initial proposal.
>> as it failed to be updated for the virtual environment support added in
>> Python 3.3 (detailed in PEP 420).
>
> venv is defined in PEP 405 (there are two cases of mis-referencing).

Oops, fixed.
> Note that there may be other important build time settings on some platforms.
> An example is Debian/Ubuntu, where we define the multiarch triplet in the
> configure script, and pass that through Makefile(.pre.in) to sysmodule.c for
> exposure as sys.implementation._multiarch.

Yeah, I don't want to mess with adding new runtime configuration
options at this point, beyond the features inherent in breaking up the
existing initialization phases.
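
For anyone who wants to see the build-time values under discussion, here is a minimal sketch. It assumes only the documented sys.implementation and sysconfig APIs; _multiarch is looked up defensively because it is a platform-specific private attribute and may be absent.

```python
import sys
import sysconfig

impl = sys.implementation

# Required attributes, fixed when the interpreter was built.
print(impl.name, impl.cache_tag)

# Platform-specific extra (e.g. on Debian/Ubuntu builds); may not exist.
multiarch = getattr(impl, "_multiarch", None)
print("multiarch triplet:", multiarch)

# Build-time configuration recorded by the build system.
config_vars = sysconfig.get_config_vars()
print(len(config_vars), "build-time config variables")
```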
>> For a command executed with -c, it will be the string "-c"
>> For explicitly requested input from stdin, it will be the string "-"
>
> Wow, I couldn't believe it but it's true! That seems crazy useless. :)

Yup. While researching this PEP I had many moments where I was looking
at the screen going "WTF, we seriously do that?" (most notably when I
learned that using the -W and -X options means we create Python
objects in Py_Main() before the call to Py_Initialize(). This is why
there has to be an explicit call to _Py_Random_Init() before the
option processing code)
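
The sys.argv[0] placeholders quoted above are easy to confirm from the outside. A minimal sketch, assuming only that sys.executable points at a working CPython; the helper name argv0_for is made up for illustration.

```python
import subprocess
import sys

PROBE = "import sys; print(sys.argv[0])"


def argv0_for(args, stdin_text=None):
    """Run a child interpreter and return what it saw as sys.argv[0]."""
    result = subprocess.run(
        [sys.executable, *args],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Code passed on the command line with -c.
from_dash_c = argv0_for(["-c", PROBE])

# Code explicitly requested from stdin with "-".
from_stdin = argv0_for(["-"], stdin_text=PROBE)

print(repr(from_dash_c), repr(from_stdin))
```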
>> Embedding applications must call Py_SetArgv themselves. The CPython logic
>> for doing so is part of Py_Main() and is not exposed separately. However,
>> the runpy module does provide roughly equivalent logic in runpy.run_module
>> and runpy.run_path.
>
> As I've mentioned before on the python-porting mailing list, this is actually
> more difficult than it seems because main() takes char*s but Py_SetArgv() and
> Py_SetProgramName() take wchar_t*s.
>
> Maybe Python's own conversion could be refactored to make this easier either
> as part of this PEP or after the PEP is implemented.

Yeah, one of the changes in the PEP is that you can pass program_name
and raw_argv as a Unicode object or a list of Unicode objects instead
of using wchar_t.
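
As a rough illustration of the "roughly equivalent logic" in runpy mentioned in the quoted PEP text, the sketch below runs a throwaway script with runpy.run_path and records what the script saw. The file name demo_script.py and the variable names are invented for the example; this is the Python-level helper, not the C-level argv handling in Py_Main().

```python
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo_script.py")
    with open(script, "w") as f:
        f.write("import sys\n"
                "seen_argv0 = sys.argv[0]\n"
                "seen_name = __name__\n")

    original_argv0 = sys.argv[0]
    # run_path temporarily swaps sys.argv[0] and sys.modules[run_name]
    # while the script executes, then returns the script's globals.
    namespace = runpy.run_path(script, run_name="__main__")
    restored = sys.argv[0] == original_argv0

print(namespace["seen_argv0"], namespace["seen_name"], restored)
```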
>> int Py_ReadConfiguration(PyConfig *config);
>
>> The config argument should be a pointer to a Python dictionary. For any
>> supported configuration setting already in the dictionary, CPython will
>> sanity check the supplied value, but otherwise accept it as correct.
>
> So why not define this to take a PyObject* or a PyDictObject* ?

That wording is a holdover from a previous version of the PEP where
this was indeed a dictionary pointer. I came around to Antoine's point
of view that since we have a fixed list of supported settings at any
given point in time, a struct would be easier to deal with on the C
side. However, I missed a few spots (including this one) when I made
the change to the PEP.
> (also: the Py_Config struct members need the correct concrete type pointers,
> e.g. PyDictObject*)

Fixed.
>> Alternatively, settings may be overridden after the Py_ReadConfiguration
>> call (this can be useful if an embedding application wants to adjust a
>> setting rather than replace it completely, such as removing sys.path[0]).
>
> How will setting something after Py_ReadConfiguration() is called change a
> value such as sys.path? Or is this the reason why you pass a Py_Config to
> Py_EndInitialization()?

Correct - calling Py_ReadConfiguration has no effect on the
interpreter state. The interpreter state only changes in
Py_EndInitialization. I'll include a more explicit explanation of
that behaviour.
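
To make that ordering concrete, here is a toy model in Python. It is not the proposed C API: the class names, field names and placeholder path entries are all invented, and the only thing it tries to show is that reading the configuration fills in a plain data structure, while the final step alone touches the interpreter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Config:
    # Hypothetical stand-ins for Py_Config fields; None means "not set".
    search_path: Optional[List[str]] = None
    enable_site_config: Optional[bool] = None


@dataclass
class ToyInterpreter:
    phase: str = "pre-initialization"
    path: List[str] = field(default_factory=list)
    site_enabled: bool = False


def begin_initialization(interp):
    interp.phase = "initializing"


def read_configuration(config):
    # Fills in defaults for unset fields only; no interpreter involved.
    if config.search_path is None:
        config.search_path = ["<script dir>", "<stdlib>"]
    if config.enable_site_config is None:
        config.enable_site_config = True


def end_initialization(interp, config):
    # The only step that copies settings into the interpreter.
    interp.path = list(config.search_path)
    interp.site_enabled = config.enable_site_config
    interp.phase = "initialized"


interp = ToyInterpreter()
begin_initialization(interp)

config = Config(enable_site_config=False)  # supplied value is kept as-is
read_configuration(config)
path_before_end = list(interp.path)        # still empty at this point

config.search_path.pop(0)                  # adjust rather than replace
end_initialization(interp, config)
print(interp)
```

The pop(0) call corresponds to the "removing sys.path[0]" case in the quoted PEP text: the embedding application edits the calculated value between the two calls instead of supplying a complete replacement.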
> (also, see the type typo <wink> in the definition of Py_EndInitialization())
>
> Also, I suggest taking the opportunity to change the sense of flags such as
> no_site and dont_write_bytecode. I find it much more difficult to reason that
> "dont_write_bytecode = 0" means *do* write bytecode, rather than
> "write_bytecode = 1". I.e. positives are better than double-negatives.

While I agree with this principle in general, I'm deliberately not doing
anything about most of these because these settings are already
exposed in their double-negative form as environment variables
(PYTHONDONTWRITEBYTECODE, PYTHONNOUSERSITE), as global variables that
can be set by an embedding application (Py_DontWriteBytecodeFlag,
Py_NoSiteFlag, Py_NoUserSiteDirectory) and as sys module attributes
(sys.dont_write_bytecode, sys.flags.no_site, sys.flags.no_user_site).
However, I *am* going to change the sense of the no_site setting to
"enable_site_config". The reason for this is that the meaning of the
setting actually changed in Python 3.3 to also mean "disable the side
effects that are currently implicit in importing the site module", in
addition to implicitly importing that module as part of the startup
sequence.
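
The existing double-negative spellings can be observed directly. A small sketch, assuming a standard CPython command line where -S suppresses the implicit site import and -B suppresses bytecode writing; the helper name flags_for is invented for illustration.

```python
import subprocess
import sys

PROBE = (
    "import sys; "
    "print(bool(sys.flags.no_site), "
    "bool(sys.flags.dont_write_bytecode), "
    "bool(sys.dont_write_bytecode))"
)


def flags_for(*options):
    """Return the child's view of the flags as a list of 'True'/'False'."""
    result = subprocess.run(
        [sys.executable, *options, "-c", PROBE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


# Every value is True when the negative options are given.
with_flags = flags_for("-S", "-B")
print(with_flags)
```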
>> sys.argv[0] may not yet have its final value
>> it will be -m when executing a module or package with CPython
>
> Gosh, wouldn't it be nice if this could have a more useful value?

It does once runpy is done with it (it has the __file__ attribute
corresponding to whatever code is actually being run as __main__). At
this point in the initialisation sequence, though, __main__ is still
the builtin __main__ module, and there's no getting around the fact
that we need to be able to import and run arbitrary Python code (both
from the standard library and from package __init__ files) in order to
properly locate __main__.
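
The "once runpy is done with it" part can be checked with a throwaway module. A minimal sketch; the module name demo_main is made up, and PYTHONPATH is set explicitly so the example does not depend on the current directory being importable.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo_main.py"), "w") as f:
        f.write("import sys\nprint(sys.argv[0])\n")

    env = dict(os.environ, PYTHONPATH=tmp)
    result = subprocess.run(
        [sys.executable, "-m", "demo_main"],
        cwd=tmp,
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    # By the time the module body runs, the "-m" placeholder has been
    # replaced with the path of the file being executed as __main__.
    final_argv0 = result.stdout.strip()

print(final_argv0)
```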
>> Initial thought is that hiding the various options behind a single API would
>> make that API too complicated, so 3 separate APIs is more likely:
>
> +1
>
>> The interpreter state will be updated to include details of the
>> configuration settings supplied during initialization by extending the
>> interpreter state object with an embedded copy of the Py_CoreConfig and
>> Py_Config structs.
>
> Couldn't it just have a dict with all the values from both structs collapsed
> into it?

It could, but that's substantially less convenient from the C side of the API.
>> For debugging purposes, the configuration settings will be exposed as a
>> sys._configuration simple namespace
>
> I suggest un-underscoring the name and making it public. It might be useful
> for other than debugging purposes.

The underscore is there because the specific fields are currently
CPython specific. Another implementation may not make these settings
configurable at all.

If there are particular settings that would be useful to modules like
importlib or site, then we may want to look at exposing them through
sys.implementation as required attributes, but that's a distinct PEP
from this one.
>> Is Py_IsRunningMain() worth keeping?
>
> Perhaps. Does it provide any additional information above Py_IsInitialized()?

Yes - it indicates that sys.argv[0] and the metadata in __main__ are
fully updated (i.e. the placeholder info used while executing Python
code in order to locate __main__ in the first place has been replaced
with the real info).
>> Should the answers to Py_IsInitialized() and Py_IsRunningMain() be exposed via
>> the sys module?
>
> I can't think of a use case.

Neither can I. I'll leave them as "for embedding apps only" until
someone comes up with an actual reason to expose them.
>> Is the Py_Config struct too unwieldy to be practical? Would a Python
>> dictionary be a better choice?
>
> Although I see why you've spec'd it this way, I don't like having *two* config
> structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter
> would probably be fine, and in fact you could copy the Py_Config values into
> it (when possible during the init sequence) and expose it in the sys module.

Yeah, I originally had just Py_CoreConfig and then a PyDictObject for
the rest of it. The first draft of Py_Config embedded a copy of
Py_CoreConfig as the first field. However, I eventually settled on the
current scheme as best aligning the model with the reality that we
really do have two kinds of configuration setting which need to be
handled differently:

- Py_CoreConfig holds the settings that are required to create a
  PyInterpreterState at all (passed to Py_BeginInitialization)
- Py_Config holds the settings that are required to get to a fully
  functional interpreter (passed to Py_EndInitialization)

Using a struct for both of them is easier to work with from C, and
makes the number vs string vs list vs mapping distinction for the
various settings self-documenting.
>> Would it be better to manage the flag variables in Py_Config as Python
>> integers so the struct can be initialized with a simple memset(&config, 0,
>> sizeof(*config))?
>
> Would we even notice the optimization?

I'll clarify this a bit - it's a maintainability question, rather than
an optimization. (i.e. I think _Py_Config_INIT is ugly as hell, I just
don't have any better ideas)
>> A System Python Executable
>
> This should probably at least mention Christian's idea of the -I flag (which I
> think hasn't been PEP'd yet). We can bikeshed about the name of the
> executable later. :)

Yeah, I've gone through and added a bunch of tracker links, including
that one. There's a significant number of things which this should make
easier in the future (e.g. I haven't linked to it, but the proposal to
support custom memory allocators could be handled by adding more
fields to Py_CoreConfig rather than more C level global variables).

Cheers,
Nick.
-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia