[Python-Dev] the email module, text, and bytes (was Re: Dropping bytes "support" in json)

Fri Apr 10 05:03:35 CEST 2009

On Apr 9, 2009, at 11:11 PM, glyph at divmod.com wrote:
> I think this is a problematic way to model bytes vs. text; it gives 
> text a special relationship to bytes which should be avoided.
>
> IMHO the right way to think about domains like this is a multi-level
> representation. The "low level" representation is always bytes, 
> whether your MIME type is text/whatever or application/x-i-dont-know.

This is a really good point, and I really should be clearer when 
describing my current thinking (sleep would help :).

> The thing that's "special" about text is that it's a "high level"
> representation that the standard library can know about. But the 
> 'email' package ought to support being extended to support other 
> types just as well. For example, I want to ask for image/png 
> content as PIL.Image objects, not bags of bytes. Of course this 
> presupposes some way for PIL itself to get at some bytes, but then 
> you need the email module itself to get at the bytes to convert to 
> text in much the same way. There also needs to be layering at the 
> level of bytes->base64->some different bytes->PIL->Image. There are 
> mail clients that will base64-encode unusual encodings so you have 
> to do that same layering for text sometimes.
>
> I'm also being somewhat handwavy with talk of "low" and "high" level
> representations; of course there are actually multiple levels beyond 
> that. I might want text/x-python content to show up as an AST, but 
> the intermediate DOM-parsing representation really wants to operate 
> on characters. Similarly for a DOM and text/html content. (Modulo 
> the usual encoding-detection weirdness present in parsers.)

When I was talking about supporting text/* content types as strings, I 
was definitely thinking about using basically the same plug-in or 
higher level or whatever API to do that as you might use to get PIL 
images from an image/gif.
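
To make the plug-in idea concrete, here is a minimal sketch of what such a layer might look like. Everything in it (register_converter, get_content, the registry dict) is hypothetical and is not an existing or proposed 'email' package API. It assumes the lower layer has already produced the decoded payload bytes plus the Content-Type parameters, and it uses a JSON converter as a stand-in for something like a PIL one so that the example needs only the standard library.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical converter signature: (payload bytes, Content-Type params) -> object
Converter = Callable[[bytes, Dict[str, str]], Any]

# Hypothetical registry, keyed by "maintype/subtype" or "maintype/*".
_converters: Dict[str, Converter] = {}


def register_converter(content_type: str, func: Converter) -> None:
    """Register a converter for an exact type or a 'maintype/*' wildcard."""
    _converters[content_type.lower()] = func


def get_content(payload: bytes, content_type: str,
                params: Optional[Dict[str, str]] = None) -> Any:
    """Return a high-level object for the payload, or the bytes unchanged."""
    params = params or {}
    ctype = content_type.lower()
    maintype = ctype.split("/", 1)[0]
    # Prefer an exact match, then fall back to the maintype wildcard.
    for key in (ctype, maintype + "/*"):
        if key in _converters:
            return _converters[key](payload, params)
    # No converter registered: the low-level representation is always bytes.
    return payload


def text_converter(payload: bytes, params: Dict[str, str]) -> str:
    # Text is just one more converter layered on top of the bytes.
    return payload.decode(params.get("charset", "us-ascii"))


def json_converter(payload: bytes, params: Dict[str, str]) -> Any:
    return json.loads(payload.decode(params.get("charset", "utf-8")))


register_converter("text/*", text_converter)
register_converter("application/json", json_converter)
```

With this shape, text/* gets no special relationship to bytes: it goes through the same lookup as any other type, and a type with no registered converter (image/png here) simply comes back as bytes. A third-party package could register its own converter for image/* in the same way.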
> So, as long as there's a crisp definition of what layer of the MIME 
> stack one is operating on, I don't think that there's really any 
> ambiguity at all about what type you should be getting.

In that case, we really need the
bytes-in-bytes-out-bytes-in-the-chewy-center API first, and build
things on top of that.
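
As a rough illustration of the layering, the sketch below uses calls from the current Python 3 standard library (email.message_from_bytes, Message.get_payload(decode=True), Message.get_content_charset()). It is not a claim about the API being discussed in this thread; the sample message and the decode_layers helper are made up for the example. The point is only the order of the layers: bytes in, transfer encoding undone to give different bytes, and text produced last as a separate step.

```python
import base64
from email import message_from_bytes

# Made-up sample: a base64-encoded UTF-8 text part, supplied as raw bytes.
RAW = (
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode("na\u00efve caf\u00e9".encode("utf-8"))
    + b"\r\n"
)


def decode_layers(raw: bytes):
    """Hypothetical helper: return (payload bytes, text) for a single-part message."""
    msg = message_from_bytes(raw)             # layer 0: bytes in
    payload = msg.get_payload(decode=True)    # layer 1: base64 undone, still bytes
    charset = msg.get_content_charset() or "us-ascii"
    text = payload.decode(charset)            # layer 2: text built on top of the bytes
    return payload, text


payload_bytes, text = decode_layers(RAW)
```

Here payload_bytes is the UTF-8 encoded body and text is the decoded string; a caller who wants something other than text can stop at the bytes layer.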
-Barry