[Python-ideas] Tulip / PEP 3156 - subprocess events
Guido van Rossum
guido at python.org
Fri Jan 18 23:15:07 CET 2013
On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
>> PS From the PEP, it seems that a protocol must implement the 4 methods
>> connection_made, data_received, eof_received and connection_lost. For
>> a process, which has 2 output streams involved, a single data_received
>> method isn't enough.
> It looks like there would have to be at least two Transport instances
> involved, one for stdin/stdout and one for stderr.
> Connecting them both to a single Protocol object doesn't seem to be
> possible with the framework as defined. You would have to use a
> couple of adapter objects to translate the data_received calls into
> calls on different methods of another object.
So far this makes sense.
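For concreteness, such an adapter might look roughly like this (all the names below are invented for illustration; one adapter instance would be connected to the stdout transport and another to the stderr transport):

    class ReceiveAdapter:
        # Illustrative only: translates data_received() calls into a
        # call on some other method of a shared handler object.
        def __init__(self, handler, receive_method):
            self.handler = handler
            self.receive_method = receive_method  # e.g. 'stderr_received'

        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):
            getattr(self.handler, self.receive_method)(data)

        def eof_received(self):
            pass

        def connection_lost(self, exc):
            pass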
But for this specific case there's a simpler solution -- require the
protocol to support a few extra methods, in particular,
err_data_received() and err_eof_received(), which are to stderr what
data_received() and eof_received() are for stdout. (After all, the
point of a subprocess is that "normal" data goes to stdout.) There's
only one input stream to the subprocess, so there's no ambiguity for
write(), and neither is there a need for multiple
connection_made()/lost() methods. (However, we could argue endlessly
over whether connection_lost() should be called when the subprocess
exits, or when the other side of all three pipes is closed. :-)
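So a subprocess protocol under this scheme might look roughly like the sketch below. The class as a whole is illustrative, not a final API; only the method names are from the proposal:

    class SubprocessProtocol:
        def connection_made(self, transport):
            # transport.write() feeds the child's stdin.
            self.transport = transport

        def data_received(self, data):
            pass  # "normal" output: the child's stdout

        def err_data_received(self, data):
            pass  # diagnostic output: the child's stderr

        def eof_received(self):
            pass  # the child closed its stdout

        def err_eof_received(self):
            pass  # the child closed its stderr

        def connection_lost(self, exc):
            pass  # the child exited (or all pipes closed -- see above)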
> This sort of thing would be easier if, instead of the Transport calling
> a predefined method of the Protocol, the Protocol installed a callback
> into the Transport. Then a Protocol designed for dealing with subprocesses
> could hook different methods of itself into a pair of Transports.
Hm. Not excited. I like everyone using the same names for these
callback methods, so that a reader (who is familiar with the
transport/protocol API) can instantly know what kind of callback it is
and what its arguments are. (But see Nick's simple solution for having
your cake and eating it, too.)
> Stepping back a bit, I must say that from the coroutine viewpoint,
> the Protocol/Transport stuff just seems to get in the way. If I were
> writing coroutine-based code to deal with a subprocess, I would want
> to be able to write coroutines like
> def handle_output(stdout):
>     while 1:
>         line = yield from stdout.readline()
>         if not line:
>             break
>         mungulate_line(line)
> def handle_errors(stderr):
>     while 1:
>         line = yield from stderr.readline()
>         if not line:
>             break
>         complain_to_user(line)
> In other words, I don't want Transports or Protocols or any of that
> cruft, I just want a simple pair of async stream objects that I can
> read and write using yield-from calls. There doesn't seem to be
> anything like that specified in PEP 3156.
This is a good observation -- one that I've made myself as well. I
also have a plan for dealing with it -- but I haven't coded it up
properly yet and consequently I haven't written it up for the PEP yet
either.
The idea is that there will be some even-higher-level functions for
tasks to call to open connections (etc.) which just give you two
unidirectional streams (one for reading, one for writing). The
write-stream can just be the transport (its write() and writelines()
methods are familiar from regular I/O streams) and the read-stream can
be a StreamReader -- a class I've written but which needs to be moved
into a better place:
http://code.google.com/p/tulip/source/browse/tulip/http_client.py#37
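To give an idea, code using those functions might end up looking something like this (the helper's name, signature and return order are provisional, and process_line() is a placeholder for application code):

    def fetch_lines(host, port):
        # Hypothetical high-level helper: yields a read-stream and a
        # write-stream instead of a transport/protocol pair.
        reader, writer = yield from open_connection(host, port)
        writer.write(b'GET / HTTP/1.0\r\n\r\n')
        while True:
            line = yield from reader.readline()
            if not line:
                break
            process_line(line)  # placeholder for application code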
Anyway, the reason for having the transport/protocol abstractions in
the middle is so that other frameworks can ignore coroutines if they
want to -- all they have to do is work with Futures, which can be
fully controlled through callbacks (which are native at the lowest
level of almost all frameworks, including Tulip / PEP 3156).
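For example, a callback-style framework can consume the same Futures without writing a single coroutine; the operation and handlers here are placeholders:

    def on_done(fut):
        # Plain callback, invoked when the Future completes.
        try:
            result = fut.result()
        except Exception as exc:
            handle_error(exc)  # placeholder
        else:
            handle_result(result)  # placeholder

    fut = some_async_operation()  # any PEP 3156 API returning a Future
    fut.add_done_callback(on_done)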
> It does mention something about implementing a streaming buffer on
> top of a Transport, but in a way that makes it sound like a suggested
> recipe rather than something to be provided by the library. Also it
> seems like a lot of layers of overhead to go through.
It'll be in the stdlib, no worries. I don't expect the overhead to be a problem.
> On the whole, in PEP 3156 the idea of providing callback-based
> interfaces with yield-from-based ones built on top has been
> pushed way further up the stack than I imagined it would. I don't
> want to be *forced* to write my coroutine code at the level of
> Protocols; I want to be able to work at a lower level than that.
You can write an alternative framework using coroutines and callbacks,
bypassing transports and protocols. (You'll still need Futures.)
However you'd be missing the interoperability offered by the
protocol/transport abstractions: in an IOCP world you'd have to
interact with the event loop's callbacks differently than in a
select/poll/etc. world.
PEP 3156 is trying to make different groups happy: people who like
callbacks, people who like coroutines; people who like UNIX, people
who like Windows. Everybody may have to compromise a little bit, but
the reward will (hopefully) be better portability and better
interoperability.
--
--Guido van Rossum (python.org/~guido)